XR (Extended Reality) refers to “real and virtual” environments that are represented jointly in a computing environment. XR has recently gained popularity with Meta’s Metaverse initiative. Given rising interest in XR, the Streaming Video Alliance has published an eXtended Reality (XR) white paper. Before we jump into the future, we wanted to give you a peek at some recent developments in VR video streaming over the last 12 months that we feel exemplify the state of XR video experiences. The SVA will also update its VR paper in the coming months to include more details on the deployments referenced in this blog.
XR video experience: tiling technology in the UK
The most talked-about VR deployments in 2020 were those of BT and Sky, both covering soccer and both using the tiling technology developed by Tiledmedia. In short, tiling is a form of viewport-dependent streaming that transmits only the field of view being watched by the user on their HMD or mobile device, thus saving a significant amount of bandwidth compared with the traditional (viewport-independent) approach that sends the full sphere. Both services were launched in 2020 but with different parameters. For the BT deployment, the capture is 8K 360 degrees, with off-camera stitching. The 8K video is encoded using viewport-dependent technology and streamed to mobile devices, both Android and iOS. Note that the service is not limited to 5G and also works on 4G networks. Bitrates are between 8 and 15 Mbit/s, depending on how dynamically the user changes their viewport. More details can be found here.
For the Sky deployment, the content is captured using a broadcast-grade 4K camera with a 180-degree fisheye lens, which means that stitching is not required. The video is also encoded using viewport-dependent technology. It is then streamed to Oculus Quest devices (both the original Quest and the Quest 2 are supported). The bandwidth used on the network is on the order of 15 Mbit/s. Users also get a virtual, HD-resolution Jumbotron that can be freely moved around. Fans see four thumbnail feeds and can choose one for the Jumbotron, which shows the TV director cut by default.
The feedback from users of both services has been great, and engagement time is reportedly an order of magnitude higher than the usual average of 3 minutes for VR video. More details on the Sky deployment can be found here.
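To make the idea of viewport-dependent tiling more concrete, here is a minimal sketch of how a player might decide which tiles fall inside the viewer's field of view. It is an illustration only and not Tiledmedia's implementation: the function name `visible_tiles`, the eight-column tile grid, and the 90-degree horizontal field of view are all assumptions chosen for readability, and only the horizontal direction is considered.

```python
def visible_tiles(yaw_deg, fov_deg=90.0, columns=8):
    """Return the tile columns that overlap the horizontal field of view.

    Assumptions (hypothetical, for illustration): the 360-degree sphere is
    cut into `columns` equal vertical strips, and the viewer looks in the
    direction `yaw_deg` with a horizontal field of view of `fov_deg`.
    """
    tile_width = 360.0 / columns
    visible = []
    for col in range(columns):
        center = (col + 0.5) * tile_width
        # Smallest angular distance between tile centre and gaze direction,
        # taking the wrap-around at 360 degrees into account.
        diff = abs((center - yaw_deg + 180.0) % 360.0 - 180.0)
        # A tile overlaps the view if its centre is closer than half the
        # field of view plus half a tile width.
        if diff < (fov_deg + tile_width) / 2.0:
            visible.append(col)
    return visible


def fraction_sent(yaw_deg, fov_deg=90.0, columns=8):
    """Share of tile columns that would be requested for this gaze direction."""
    return len(visible_tiles(yaw_deg, fov_deg, columns)) / columns
```

Under these assumptions, a viewer looking straight ahead needs only a fraction of the tile columns, which is where the bandwidth saving over sending the full sphere comes from. A real player would also handle the vertical direction and fetch a margin around the viewport to cover head motion.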
XR video experience: layered technology in Canada
5G mobile operators in Canada have already trialed dual-layer viewport-dependent schemes for premium sports content, allowing a user to experience full 4K in their field of view while the rest of the sphere is delivered in SD. As a user moves their head and changes their gaze, the SD tiles that were in their peripheral vision are replaced with 4K content. The ultra-low latency provided by the 5G network, together with deep edge caching in that network, is essential to provide a flawless transition between resolutions. 5G was leveraged as a complete end-to-end bearer network. Eight 12K contribution streams were captured on site and delivered via 5G to a local edge, where the live transcoding, brand safety, and stitching were completed to produce the dual-layer encoding used to send HD (FoV) + SD (non-FoV). The final output was made available to consumers over 5G and fixed access networks, with the highest bandwidths delivered on the order of 42 Mbit/s.
The feedback from the trial audience was generally positive. Most consumers described the events as something very close to the actual in-venue experience, while low-latency context switching, particularly over the 5G network, provided relatively seamless transitions between content renditions, minimizing motion sickness symptoms. A frequent complaint, however, was that the transition between the 4K FoV and the peripheral content was obvious, which implies that the experience could be greatly enhanced by increasing the resolution of non-FoV content. The delivery of premium VR and gaming content is expected to be one of the use cases that will play an important role in monetizing the operator’s investment in 5G and edge technology, as relying solely on consumer cellular services may not provide a sufficient ROI.
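The layer switching described in this trial can be sketched as a small piece of bookkeeping: every tile always has a base layer, and only the tiles in the field of view are upgraded. The sketch below is a hedged illustration and not the operators' actual implementation; the function names, the labels "high" and "base", and the tile numbering are hypothetical.

```python
def plan_layers(fov_tiles, all_tiles):
    """Map each tile to 'high' if it is in the field of view, else 'base'.

    The labels are placeholders for the two layers in the text
    (high-resolution FoV layer and SD layer for the rest of the sphere).
    """
    fov = set(fov_tiles)
    return {tile: ("high" if tile in fov else "base") for tile in all_tiles}


def transition(previous_fov, current_fov):
    """List the tiles whose layer must change after a head movement."""
    previous_fov, current_fov = set(previous_fov), set(current_fov)
    return {
        # Tiles that entered the field of view: replace base with high.
        "upgrade": sorted(current_fov - previous_fov),
        # Tiles that left the field of view: fall back to the base layer.
        "downgrade": sorted(previous_fov - current_fov),
    }
```

For example, if the viewer's gaze moves from tiles {0, 7} to tiles {0, 1}, then `transition({0, 7}, {0, 1})` reports that tile 1 must be upgraded and tile 7 can drop back to the base layer. The time needed to fetch the upgraded tile is the interval during which the viewer sees the lower-resolution layer, which is why network latency and edge caching matter so much in this design.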
XR video experience: the Tokyo Olympics plays it safe
NBC decided to deploy an Oculus app on Quest 2 that could only be watched in the US. The project was a collaboration among several companies: COSM for the production, B-Stream for the streaming, EZDRM on the DRM side, and Tiledmedia on the player side. The production used a traditional 4K capture setup, like the one previously deployed by NBCU during the Rio Olympics in 2016. In the Tokyo case, the quality of capture and stitching was improved, DRM content protection was added, and the user had a choice of five 180-degree camera viewpoints, a VR180 “director cut” and one VR360 feed, all with 4K resolution and seamless switching. The app used ABR streaming with bandwidths up to about 20 Mbit/s. User feedback was quite positive, especially as some viewers prefer 180 over 360, depending on the type of sporting event (some events are more naturally viewed straight on, while others lend themselves to a more immersive 360 viewing experience). We also saw some people more interested in the 2D experience, in which the user had access to the same feeds on a mobile device. They could look around by swiping or by moving their device around in “magic window” mode. We believe there is a correspondence between the content type, the type of workflow used, and the type of user, which is what the trial aimed to determine.
XR video experience: the European 5G 4K trial
A large European Tier-1 telco has run several trials in 2021; here are some excerpts from the most recent one. The goal for the Mobile Network Operator (MNO) is to promote the 5G network and demonstrate how it can help the take-off of immersive experiences such as VR and multiview. The content selected for the trials was premium events (concerts and soccer matches) as well as e-sports. The capture was done at the highest resolution, 8K, but due to the constraints of the live production workflow, it was decided to process live in 4K resolution. Based on the SVA trial results, this made perfect sense since the target devices were mobile phones and tablets. The 8K content is always available for offline stitching and production of VOD assets offering the best quality. As with the Tokyo Olympics, the content is stitched live and encoded in 4K using viewport-independent technology. On the client side, the user can select either the produced VR view or individual camera views, with seamless switching between these views. No details can be provided on the technical parameters or the user feedback, as this information is confidential at this moment.
The SVA trialing 8K VR
The SVA has tested the workflow described in the VR Industry Forum Guidelines, which specify 8K capture and distribution of the full sphere using viewport-independent technology to 8K-capable devices (which include most recent 5G phones, Qualcomm XR2 devices, and the Oculus Quest 2). The test used the technologies of Verizon (fiber network), Qwilt (CDN and Open Caching), VRIF (content production), Harmonic (media processing and streaming), Viaccess Orca (secured player), and Nice People At Work (analytics). Testers were from Verizon, Lumen, and Viaccess Orca, accessing the content from typical residential and mobile networks. The capture was done using an 8K VR camera rig, stitching was performed offline, and the content was encoded offline using traditional 8K ABR encoding, packaged in DASH, and sent to the Qwilt CDN. The goals of the trial were to assess streaming quality on the FiOS fiber network and to measure any improvement brought by Open Caching. The trial was done by users located outside of the FiOS network on very high-speed networks (Lumen, Viaccess Orca) as well as on the FiOS network (Verizon, Lumen), with 5G phones and Oculus Quest 2. The SVA is in the process of documenting the trial; in the meantime, Harmonic has presented, together with Viaccess Orca and Orange (all SVA members), a paper on the 8K viewport-independent technology at the SMPTE’21 conference.
The bandwidth used on the network is on the order of 40 Mbit/s for the top 8K profile. The feedback from users on high-speed connections was that the quality was better when SBR (single bitrate) was used rather than ABR (adaptive bitrate). This was largely due to the conservative nature of the player’s ABR algorithm, which started on the lowest-quality rendition and gradually ramped up (with SBR, the experience jumped right into the highest-quality rendition). An additional observation was that the 8K improvement over 4K was not always visible on an HMD.
Consequently, Viaccess Orca has modified its player behavior to change the resolution much faster, and Harmonic has submitted content where 8K resolution will be more visible (using static scenes with a high level of detail). It should be noted that on mobile devices, we do not see any difference between 4K and 8K content, which is explained by the fact that the majority of 5G phones are still at 1440p resolution. Next, the SVA will test the same technology using a 5G Verizon network and will also test the Tiledmedia viewport-dependent technology on FiOS, and will therefore be able to compare that experience with the trial done on 8K viewport-independent technology.
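The startup behavior observed in the trial can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the Viaccess Orca player's algorithm: the function names are hypothetical, and the bitrate ladder in the usage note is invented for illustration apart from the roughly 40 Mbit/s top profile mentioned above. The ladder is assumed to be sorted from lowest to highest bitrate.

```python
def conservative_ramp(ladder, throughput, segments):
    """Start on the lowest rung and climb at most one rung per segment.

    `ladder` lists rendition bitrates in Mbit/s in ascending order,
    `throughput` is the (constant) available bandwidth in Mbit/s, and
    `segments` is how many media segments to simulate.
    """
    index = 0
    chosen = []
    for _ in range(segments):
        chosen.append(ladder[index])
        # Step up only if a higher rung exists and fits the throughput.
        if index + 1 < len(ladder) and ladder[index + 1] <= throughput:
            index += 1
    return chosen


def fast_start(ladder, throughput, segments):
    """Jump straight to the highest rung that fits the throughput."""
    best = max((rate for rate in ladder if rate <= throughput), default=ladder[0])
    return [best] * segments
```

With an assumed ladder of `[5, 10, 20, 40]` Mbit/s and a 100 Mbit/s connection, `conservative_ramp` plays the first three segments below the top profile before reaching 40 Mbit/s, while `fast_start` delivers 40 Mbit/s from the first segment. On a fast connection this is the difference testers perceived between ABR and SBR, and it is the gap that a faster ramp-up narrows.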
Conclusion
The VR video streaming market is split across different technologies: 4K viewport-independent, which can be considered legacy; 8K viewport-dependent, which is the state of the art; and 8K viewport-independent (not yet deployed but showing its maturity in the SVA trial). The 8K viewport-independent approach has the benefit of being integrated directly into a standard OTT video encoding and streaming workflow, thus enabling key features such as DRM (Widevine level 1), low latency (~5s), and targeted advertising. This reliance on standard 2D streaming technologies can reduce integration time and cost.
We expect networks used for VR applications to be 5G for mobile consumption on the go or at the venue, and very high-speed fixed networks such as fiber or DOCSIS 3.1 for HMD and mobile consumption. Furthermore, we expect the quality to be best when content is captured in 8K and delivered either to 4K-capable devices using the viewport-dependent technique or to 8K-capable devices using the viewport-independent technique. Of course, with viewport-dependent techniques a 12-16K VR capture could be transmitted to 8K-capable devices, the next frontier of VR experience!
The techniques developed in VR, such as the 8K production workflow, multi-view, and viewport-dependent schemes, are the building blocks of future volumetric video systems.