Techniques for Interactive Region-Based Scalability
Techniques are provided herein for optimizing encoding and decoding operations for video data streams. An encoded video data stream is received, and select image segments of the encoded video data stream are identified. Each of the select image segments is an independently decodable portion of the encoded video data stream. Enhanced layer decoding operations are performed on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments. Base layer decoding operations on each of the select image segments of the encoded video data stream are performed to obtain a base layer decoded output for the select image segments.
The present disclosure relates to enhancing video data streams.
BACKGROUND
In a video conference environment, endpoint devices may send and receive communications (e.g., video data streams) between each other. For example, endpoint devices may send video data streams directly to each other or via a video conference bridge. The video data streams may be encoded in multiple data layers. For example, the video data streams may be encoded in a base layer and in an enhancement layer. One or more layers of the video data streams may be decoded by an endpoint device before the video is presented.
Techniques are provided herein for optimizing encoding and decoding operations for video data streams. An encoded video data stream is received, and select image segments of the encoded video data stream are identified. Each of the select image segments is an independently decodable portion of the encoded video data stream. Enhancement layer decoding operations are performed on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments. Base layer decoding operations on each of the select image segments of the encoded video data stream are performed to obtain a base layer decoded output for the select image segments.
Example Embodiments
Techniques are presented herein for optimizing video data streams. An example audio/video network environment (“network”) is shown in
The endpoint device 102 and the endpoint device 104 may each service a plurality of participants (not shown in
As stated above, the endpoint device 102 may be configured to send encoded video data to the endpoint device 104. As such,
In general, the techniques described herein support enhanced quality for interactive spatial ROIs for image segments of video data. ROIs refer to specific portions of the image segments of video data for which participants (viewers of the video) are interested in receiving enhanced quality. For example, there may be some applications where at least two video views are presented to participants. A participant may wish to see, in one video view, an enhanced selected portion of a video, while in a second video view, may wish to see the entire video. For example, a teacher presenting an online lecture remotely may wish to have an overall first view of the class but also a zoomed-in high-quality second view of a student who is asking a question. In another example, some participants of a video conference may wish to see everyone in a room, while others may wish to see a zoomed-in image of a speaker. In a third example, when viewing large data set visualizations in a unified collaboration tool, a user may wish to zoom into certain areas of the data, and different participants (e.g., at different endpoints) may wish to zoom into different areas simultaneously.
Traditionally, using existing decoding techniques, the multiple views may be provided to participants by decoding both the base layer and the enhancement layer of encoded video data to access these ROIs. In other words, participants may wish to view an enhanced ROI in a video, and according to existing decoding techniques, both the base layer for the entire video data and the enhancement layer of the entire video data may be decoded simply to provide the enhancements to the ROI, which may be a small portion of the entire video data.
The techniques described herein overcome these limitations by enabling a decoder unit to perform enhanced layer decoding operations on select image segments corresponding to a ROI. Thus, a participant can receive an enhanced view of a ROI without requiring decoding of the entire enhancement layer of the video data (e.g., image segments of the video data outside of the ROI).
For example, in
In the case of sending limited enhancement information, a region of interest is identified in the base layer consisting of a (probably rectangular) subset of the video data. The encoder may then use spatial predictions from just this region in order to encode the restricted enhancement information. The enhancement layer may be at the same or higher resolution than the region of interest in the base layer, and the type of enhancement may be of improved resolution or improved quality or both. In this case, it is possible that the entire base layer needs to be decoded, or special coding tools are used to avoid this. For example, if tiles are used in the base layer along with restrictions on the motion vectors used in the base layer, only the tiles covering the region of interest in the base layer need decoding.

In the second case of decoding selected portions of a full enhancement layer, once again, if tiles, slices or similar segments are used at the encoder for the enhancement layer to determine an independently decodable region of a frame, and restrictions on motion vectors are used to make these independently decodable across a sequence of frames, then only portions of the enhancement layer need to be decoded. If these restrictions are implemented both at the base layer and the enhancement layer, then only portions of both need to be decoded.

It should be understood that, as used herein, the term “image segments” may refer to tiles as defined in any video encoding standard now known or hereinafter developed, such as the MPEG HEVC/ITU-T H.265 standard, VP9 or similar technologies, or slices as defined in any video encoding standard now known or hereinafter developed, such as the MPEG AVC/ITU-T H.264, MPEG HEVC/ITU-T H.265 or similar technologies.
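The step of determining which tiles cover a region of interest can be sketched in a few lines. The `Rect` type, the uniform tile grid, and the function name below are illustrative assumptions for this sketch, not part of any codec standard (real codecs may allow non-uniform tile column/row widths):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width, in pixels
    h: int  # height, in pixels

def tiles_covering_roi(roi: Rect, frame_w: int, frame_h: int,
                       tile_w: int, tile_h: int) -> set[tuple[int, int]]:
    """Return (col, row) indices of the tiles a ROI rectangle intersects.

    Only these tiles need enhancement-layer decoding; all other tiles
    can be skipped or left at base-layer quality.
    """
    # Clamp the ROI to the frame so out-of-bounds requests stay valid.
    x0 = max(0, roi.x)
    y0 = max(0, roi.y)
    x1 = min(frame_w, roi.x + roi.w)
    y1 = min(frame_h, roi.y + roi.h)
    if x1 <= x0 or y1 <= y0:
        return set()
    # Integer tile indices spanned by the clamped rectangle.
    return {(col, row)
            for row in range(y0 // tile_h, (y1 - 1) // tile_h + 1)
            for col in range(x0 // tile_w, (x1 - 1) // tile_w + 1)}
```

For example, a 200×100-pixel ROI at (100, 100) in a 1920×1080 frame with 256×256 tiles intersects only two tiles, so only those two tiles' enhancement data would be decoded.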
Furthermore, select image segments may be identified for a region of interest of the encoded video data stream based on video and/or audio analysis, such as based on detection of a loudest speaker in the classroom example referred to above.
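A minimal sketch of the loudest-speaker case follows. It assumes each speaker contributes a separate microphone channel and has a known on-screen region; the data structures and function names are hypothetical, and real systems would use more robust voice-activity detection than raw RMS energy:

```python
import math

def loudest_speaker_roi(speaker_regions: dict, audio_by_speaker: dict):
    """Pick the ROI of the speaker with the highest RMS audio energy.

    speaker_regions: {speaker_id: roi_rect} mapping speakers to their
        on-screen regions (hypothetical structure).
    audio_by_speaker: {speaker_id: [pcm_samples]} from each speaker's
        microphone channel over a recent window.
    """
    def rms(samples):
        # Root-mean-square energy of the sample window; 0 if empty.
        return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

    loudest = max(audio_by_speaker, key=lambda sid: rms(audio_by_speaker[sid]))
    return speaker_regions[loudest]
```

The returned region would then be mapped onto image segments (e.g., via a tile-coverage computation) to select the segments needing enhancement-layer decoding.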
To elaborate, restrictions on motion vectors are needed because tiles/slices and similar segmentations break spatial dependencies within a frame, allowing data within a frame to be decoded independently. However, frames are also decoded with reference to previously encoded frames, by means of motion-compensated (i.e., displaced) prediction. The motion vectors are restricted so that each tile depends only on data from within the co-located tile in previous frames. This makes a tile akin to a sub-stream of independently decodable video. Thus, select image segments may be identified such that they are independently decodable by virtue of restricting prediction to be from the same image segments in a current video frame or from a previously decoded video frame.
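The motion-vector restriction above amounts to a containment check an encoder could apply when selecting vectors. The sketch below assumes whole-pixel motion vectors; a real encoder would also shrink the allowed area by the sub-pixel interpolation filter's support, which this illustration omits:

```python
def mv_keeps_block_in_tile(block_x: int, block_y: int,
                           block_w: int, block_h: int,
                           mv_x: int, mv_y: int,
                           tile_x: int, tile_y: int,
                           tile_w: int, tile_h: int) -> bool:
    """True if the motion-compensated reference block lies entirely inside
    the co-located tile in the reference frame.

    Accepting only vectors that pass this check keeps each tile an
    independently decodable sub-stream across a sequence of frames.
    """
    # Displace the current block by the candidate motion vector.
    ref_x0 = block_x + mv_x
    ref_y0 = block_y + mv_y
    ref_x1 = ref_x0 + block_w
    ref_y1 = ref_y0 + block_h
    # The whole reference block must fall within the tile boundaries.
    return (tile_x <= ref_x0 and ref_x1 <= tile_x + tile_w and
            tile_y <= ref_y0 and ref_y1 <= tile_y + tile_h)
```

An encoder enforcing this would clamp or discard candidate vectors that fail the check, trading some compression efficiency for tile independence.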
Thus, according to the present techniques, the base layer may be a spatial superset of the enhancement layer, and the ROIs that require enhancement may be smaller than the overall picture/image area of the base layer. As a result, it may be advantageous to perform enhanced decoding for only a small area of the image corresponding to the ROIs. These techniques are described herein.
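The resulting asymmetric decode, full base layer plus enhancement for only the ROI tiles, can be sketched as below. The callback signatures and the per-tile frame representation are hypothetical stand-ins for a real decoder's internals:

```python
def decode_frame_with_roi(base_frame_bits, enh_tile_bits, roi_tiles,
                          decode_base, decode_enh_tile):
    """Decode the whole base layer, then enhance only the ROI tiles.

    base_frame_bits: encoded base-layer data for the full frame.
    enh_tile_bits:   {tile_id: encoded enhancement data} per tile.
    roi_tiles:       tile ids covering the region of interest
                     (e.g., from a tile-coverage computation).
    decode_base:     callback returning {tile_id: decoded_tile}.
    decode_enh_tile: callback upgrading one tile given its base output.
    """
    # Base-layer quality is produced for the entire picture.
    frame = decode_base(base_frame_bits)
    for tile_id in roi_tiles:
        # Each ROI tile is independently decodable, so it can be enhanced
        # without touching enhancement data for the rest of the frame.
        frame[tile_id] = decode_enh_tile(enh_tile_bits[tile_id],
                                         base_context=frame[tile_id])
    return frame
```

The cost of enhancement-layer decoding thus scales with the ROI size rather than with the full picture area.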
Reference is now made to
The spatial region 200 in
To be clear, the base layer may not be segmented. An encoder may only segment the enhancement layer, and require decoding of the whole base layer. It is desirable to allow the encoder not to use techniques like tiles or slices in the base layer, since the base layer may be provided by some simpler legacy equipment and the more complex enhancement layer is an add-on that can be used without direct communication with or configuration of the legacy equipment.
Reference is now made to
Reference is now made to
Upon receiving the encoded high-resolution video data, the decoder unit 106 of the bridge device 402, at 406, outputs high-resolution decoded data to the ROI analysis unit 110. The ROI analysis unit 110 sends, at 408, the ROI descriptions to the UI units 302 of each of the receiving endpoint devices 104(1) and 104(2). The ROI descriptions are similar to those described at reference numeral 304 in connection with
Reference is now made to
The media switch 502 then sends the appropriate base layer data 114 and enhancement layer data 116 to the decoder units 108 of the receiving endpoint devices 104(1) and 104(2). For example, the media switch 502 may send enhancement layer data 116 to the decoder unit 108 of receiving endpoint device 104(1) corresponding to the ROI selection performed by a user at receiving endpoint device 104(1). Likewise, the media switch 502 may send enhancement layer data 116 to the decoder unit 108 of receiving endpoint device 104(2) corresponding to the ROI selection performed by a user at receiving endpoint device 104(2).
As explained above, the ROI descriptions 408 are forwarded as shown in
Reference is now made to
At 606, the decoder unit 108 selects an enhancement layer (EL) configuration and at 608 requests from an encoder unit (e.g., encoder unit 106 in
Reference is now made to
Reference is now made to
The decoder unit 108 is coupled to the processor 804. The decoder unit 108 may be, for example, a video codec hardware element of the video conference endpoint device 800 that performs video decoding operations, as described herein. The UI unit 302 and the ROI unit 110 are also coupled to the processor 804 and are configured to perform the operations described herein. In one example, the UI unit 302 (e.g., a mouse, keyboard, joystick, etc.) and the ROI unit 110 may be hardware elements of the video conference endpoint device 800. In another example, the UI unit 302 and the ROI unit 110 may be executable software components of the video conference endpoint device 800. It should be appreciated that the decoder unit 108, the UI unit 302 and the ROI unit 110 operate in the same manner with the same functions as described in connection with
The memory 806 may comprise read only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible (non-transitory) memory storage devices. The memory 806 stores software instructions for the enhanced decoding software 808. Thus, in general, the memory 806 may comprise one or more computer readable storage media (e.g., a memory storage device) encoded with software comprising computer executable instructions and when the software is executed (e.g., by the processor 804) it is operable to perform the operations described for the enhanced decoding software 808.
The enhanced decoding software 808 may take any of a variety of forms, so as to be encoded in one or more tangible computer readable memory media or storage devices for execution, such as fixed logic or programmable logic (e.g., software/computer instructions executed by a processor), and the processor 804 may be an ASIC that comprises fixed digital logic, or a combination thereof.
For example, the processor 804 may be embodied by digital logic gates in a fixed or programmable digital logic integrated circuit, which digital logic gates are configured to perform the operations of the enhanced decoding software 808. In general, the enhanced decoding software 808 may be embodied in one or more computer readable storage media encoded with software comprising computer executable instructions and when the software is executed operable to perform the operations described hereinafter.
It should be appreciated that the techniques described above in connection with all embodiments may be performed by one or more computer readable storage media that is encoded with software comprising computer executable instructions to perform the methods and steps described herein. For example, the operations performed by the endpoint devices and the intermediate devices may be performed by one or more computer or machine readable storage media (non-transitory) or device executed by a processor and comprising software, hardware or a combination of software and hardware to perform the techniques described herein.
In summary, a method is provided comprising: receiving an encoded video data stream; identifying select image segments of the encoded video data stream, wherein each of the select image segments is an independently decodable portion of the encoded video data stream; performing enhanced layer decoding operations on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments; and performing base layer decoding operations on each of the select image segments of the encoded video data stream to obtain a base layer decoded output for the select image segments.
In addition, a computer readable storage media is provided that is encoded with software comprising computer executable instructions and when the software is executed operable to: obtain an encoded video data stream; identify select image segments of the encoded video data stream, wherein each of the select image segments is an independently decodable portion of the encoded video data stream; perform enhanced layer decoding operations on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments; and perform base layer decoding operations on each of the select image segments of the encoded video data stream to obtain a base layer decoded output for the select image segments.
Furthermore, an apparatus is provided comprising: a decoder unit configured to decode an encoded video data stream; and a processor coupled to the decoder unit, and further configured to: identify select image segments of the encoded video data stream, wherein each of the select image segments is an independently decodable portion of the encoded video data stream; cause the decoder unit to perform enhanced layer decoding operations on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments; and cause the decoder unit to perform base layer decoding operations on each of the select image segments of the encoded video data stream to obtain a base layer decoded output for the select image segments.
The above description is intended by way of example only. Various modifications and structural changes may be made therein without departing from the scope of the concepts described herein and within the scope and range of equivalents of the claims.
Claims
1. A method comprising:
- receiving an encoded video data stream;
- identifying select image segments of the encoded video data stream, wherein each of the select image segments is an independently decodable portion of the encoded video data stream;
- performing enhanced layer decoding operations on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments; and
- performing base layer decoding operations on each of the select image segments of the encoded video data stream to obtain a base layer decoded output for the select image segments.
2. The method of claim 1, further comprising performing base layer decoding operations on every image segment of the encoded video data stream.
3. The method of claim 1, wherein identifying comprises receiving from a device that generates the encoded video data stream an indication of the select image segments.
4. The method of claim 3, wherein receiving comprises receiving the indication of the select image segments that represents at least one region of interest of the encoded video data stream.
5. The method of claim 4, wherein identifying comprises identifying the select image segments of the region of interest that identifies a spatial location for enhancement in an image of the encoded video data stream.
6. The method of claim 1, wherein receiving the encoded video data stream comprises receiving the encoded video data stream that comprises base layer encoded data and enhancement layer encoded data.
7. The method of claim 1, wherein receiving the encoded video data stream comprises receiving the encoded video data stream that comprises base layer encoded data; and further comprising:
- after identifying the select image segments, requesting from a device that generates the encoded video data stream, enhancement layer encoded data for the select image segments of the encoded video data stream.
8. The method of claim 1, wherein performing the enhanced layer decoding operations comprises performing the enhanced layer decoding operations based on an enhancement decoding configuration.
9. The method of claim 1, wherein the image segments are tiles as defined in the MPEG HEVC/ITU-T H.265 standard, VP9 or similar technologies, or slices as defined in the MPEG AVC/ITU-T H.264, MPEG HEVC/ITU-T H.265 or similar technologies.
10. The method of claim 1, wherein identifying comprises identifying the select image segments such that they are independently decodable by virtue of restricting prediction to be from the same image segments in a current video frame or from a previously decoded video frame.
11. The method of claim 1, wherein identifying comprises identifying the select image segments that represent a region of interest of the encoded video data stream based on video and/or audio analysis.
12. A computer readable storage media encoded with software comprising computer executable instructions and when the software is executed operable to:
- obtain an encoded video data stream;
- identify select image segments of the encoded video data stream, wherein each of the select image segments is an independently decodable portion of the encoded video data stream;
- perform enhanced layer decoding operations on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments; and
- perform base layer decoding operations on each of the select image segments of the encoded video data stream to obtain a base layer decoded output for the select image segments.
13. The computer readable storage media of claim 12, further comprising instructions that are operable to perform base layer decoding operations on every image segment of the encoded video data stream.
14. The computer readable storage media of claim 12, wherein the instructions that are operable to identify comprise instructions that are operable to receive an indication of the select image segments from a device that generates the encoded video data stream.
15. The computer readable storage media of claim 12, wherein the instructions that are operable to obtain comprise instructions that are operable to receive an indication of the select image segments that represents at least one region of interest of the encoded video data stream.
16. The computer readable storage media of claim 15, wherein the instructions that are operable to identify comprise instructions that are operable to identify the select image segments of the region of interest that identifies a spatial location for enhancement in an image of the encoded video data stream.
17. The computer readable storage media of claim 12, wherein the instructions that are operable to obtain comprise instructions that are operable to receive the encoded video data stream that comprises base layer encoded data and enhancement layer encoded data.
18. The computer readable storage media of claim 12, wherein the instructions that are operable to obtain comprise instructions that are operable to obtain the encoded video data stream that comprises base layer encoded data; and further comprising instructions operable to:
- request from a device that generates the encoded video data stream, enhancement layer encoded data for the select image segments of the encoded video data stream after identifying the select image segments.
19. The computer readable storage media of claim 12, wherein the instructions that are operable to perform the enhanced layer decoding operations comprise instructions operable to perform the enhanced layer decoding operations based on an enhancement decoding configuration.
20. An apparatus comprising:
- a decoder unit that decodes an encoded video data stream;
- a processor coupled to the decoder unit, wherein the processor is configured to: identify select image segments of the encoded video data stream, wherein each of the select image segments is an independently decodable portion of the encoded video data stream; cause the decoder unit to perform enhanced layer decoding operations on each of the select image segments of the encoded video data stream to obtain an enhanced decoded output for the select image segments; and cause the decoder unit to perform base layer decoding operations on each of the select image segments of the encoded video data stream to obtain a base layer decoded output for the select image segments.
21. The apparatus of claim 20, wherein the processor causes the decoder unit to perform base layer decoding operations on every image segment of the encoded video data stream.
22. The apparatus of claim 20, wherein the processor obtains an indication of the select image segments received from a device that generates the encoded video data stream.
23. The apparatus of claim 22, wherein the processor obtains the indication of the select image segments that represents at least one region of interest of the encoded video data stream.
Type: Application
Filed: Jun 23, 2014
Publication Date: Dec 24, 2015
Inventor: Thomas Davies (Guildford)
Application Number: 14/311,741