HIERARCHICAL VIDEO COMPRESSION SUPPORTING SELECTIVE DELIVERY OF TWO-DIMENSIONAL AND THREE-DIMENSIONAL VIDEO CONTENT

Info

Publication number: 20110157309
Type: Application
Filed: Dec 30, 2010
Publication Date: Jun 30, 2011
Applicant: BROADCOM CORPORATION (Irvine, CA)
Inventors: James D. Bennett (Hroznetin), Jeyhan Karaoguz (Irvine, CA)
Application Number: 12/982,053

Abstract

Systems, methods and apparatuses are described herein for encoding a plurality of video frame sequences, wherein each video frame sequence corresponds to a different perspective view of the same subject matter. In accordance with various embodiments, the encoding is performed in a hierarchical manner that leverages referencing between frames of different ones of the video frame sequences (so-called “external referencing”), but that also allows for encoded representations of only a subset of the video frame sequences to be provided when less than all sequences are required to support a particular viewing mode of a display system that is capable of displaying the subject matter in a two-dimensional mode, a three-dimensional mode, or a multi-view three-dimensional mode.

Description

This application claims the benefit of U.S. Provisional Patent Application No. 61/291,818, filed on Dec. 31, 2009, and U.S. Provisional Patent Application No. 61/303,119, filed on Feb. 10, 2010. The entirety of each of these applications is incorporated by reference herein.

This application is also related to the following U.S. Patent Applications, each of which also claims the benefit of U.S. Provisional Patent Application Nos. 61/291,818 and 61/303,119 and each of which is incorporated by reference herein:

U.S. patent application Ser. No. 12/845,409, filed on Jul. 28, 2010, and entitled “Display with Adaptable Parallax Barrier”;

U.S. patent application Ser. No. 12/845,440, filed on Jul. 28, 2010, and entitled “Adaptable Parallax Barrier Supporting Mixed 2D and Stereoscopic 3D Display Regions”; and

U.S. patent application Ser. No. 12/845,461, filed on Jul. 28, 2010, and entitled “Display Supporting Multiple Simultaneous 3D Views.”

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to the compression of digital video content for transmission and/or storage thereof, and the decompression of such compressed digital video content.

2. Background Art

Recently, there has been a substantial increase in the types and amount of digital video content available to consumers and a concomitant increase in demand for such content. Today, consumers can purchase or otherwise obtain digital video content via a variety of distribution channels. For example, consumers can view digital video content transmitted via terrestrial broadcast, cable TV or satellite TV services, downloaded or streamed over a wired/wireless Internet Protocol (IP) network connection, or recorded on Digital Versatile Discs (DVDs) and Blu-ray™ discs. The proliferation of high-quality low-cost digital video cameras, such as those being incorporated into the latest generations of smart telephones and media players, has also enabled consumers to easily create and distribute their own digital video content.

For a variety of reasons that include increasing sales, reducing costs and improving customer satisfaction, those in the business of distributing digital media content are highly motivated to minimize the amount of bandwidth required for transmitting content as well as to maximize the amount of content that can be stored on a given recording medium. Video compression, which refers to techniques that can be used to reduce the quantity of data used to represent digital video images, has become an important tool for achieving these goals. For example, the H.263 video compression standard, which is often used to compress digital video content for video conferencing, video telephony, and video on mobile phones, and the MPEG-2 video compression standard, which is often used to compress digital video content distributed via the Internet, can each achieve a compression ratio of about 30:1 for many types of digital video content while maintaining excellent image quality. As another example, the H.264/MPEG-4 Advanced Video Coding (AVC) video compression standard, which is currently used to compress Blu-ray™, Digital Video Broadcasting (DVB), iPod® video and HD DVD digital video content, can achieve a compression ratio of about 50:1 for many types of digital video content while maintaining excellent image quality.

Even though video compression techniques such as those mentioned above can be used to reduce the amount of data needed to represent digital video content, there is still a finite amount of bandwidth available on networks that transmit digital video content and there is also a finite amount of storage capacity on even the most advanced recording mediums. Consequently, distributors of digital video content continue to look for even more efficient video data transmission and storage techniques.

A recent development in the area of digital video entertainment involves the production and distribution of digital video content for viewing in three dimensions (also referred to herein as “three-dimensional video content”). Such video content includes a left-eye view and a right-eye view that must be concurrently presented to a viewer. Various techniques (e.g., colored, polarizing or shuttering glasses, light manipulation via a parallax barrier or lenticular lens) may be used to ensure that the left-eye view is perceived only by the left eye of a viewer and the right-eye view is perceived only by the right eye of the viewer. The mind of the viewer combines the left-eye view and the right-eye view to perceive a series of three-dimensional images. If each view is of the same resolution as a traditional two-dimensional video stream, then transmitting or storing the three-dimensional video content will consume even more network bandwidth/storage space than transmitting or storing the two-dimensional video stream. Thus, such three-dimensional video content may present a special challenge in terms of achieving efficient data transmission and storage.

BRIEF SUMMARY OF THE INVENTION

Systems, methods and apparatuses are described herein for encoding a plurality of video frame sequences, wherein each video frame sequence corresponds to a different perspective view of the same subject matter. In accordance with various embodiments, the encoding is performed in a hierarchical manner that leverages referencing between frames of different ones of the video frame sequences (so-called “external referencing”), but that also allows for encoded representations of only a subset of the video frame sequences to be provided when less than all sequences are required to support a particular viewing mode of a display system that is capable of displaying the subject matter in a two-dimensional mode, a three-dimensional mode, or a multi-view three-dimensional mode. Such apparatuses, systems, and methods are substantially as shown in and/or described herein in connection with at least one of the figures, as set forth more completely in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.

FIG. 1 is a block diagram of an example system for generating three-dimensional video content that may be encoded in accordance with an embodiment.

FIG. 2 depicts an exemplary two-dimensional (2D)/three-dimensional (3D) display system in accordance with an embodiment that renders a single digital video stream to a display in a manner that produces a single two-dimensional view.

FIG. 3 depicts an exemplary 2D/3D display system in accordance with an embodiment that renders two digital video streams to a display in an integrated manner that produces a single three-dimensional view.

FIG. 4 depicts an exemplary 2D/3D display system in accordance with an embodiment that renders four digital video streams to a display in an integrated manner that produces two three-dimensional views.

FIG. 5 is a block diagram of a hierarchical video encoding system in accordance with an embodiment.

FIG. 6 is a diagram that shows how internal and external referencing can be used to encode frames of two different digital video streams in accordance with an embodiment.

FIG. 7 is a block diagram of a hierarchical video encoding system in accordance with an alternate embodiment.

FIG. 8 depicts a flowchart of a method for hierarchically encoding multiple video frame sequences that represent different perspective views of the same subject matter in accordance with an embodiment.

FIG. 9 depicts a flowchart of a method for encoding a plurality of video elements, each of the plurality of video elements representing a video frame sequence from a selected perspective view, in accordance with an embodiment.

FIG. 10 is a block diagram of a hierarchical video decoding system in accordance with an embodiment.

FIG. 11 is a block diagram of a hierarchical video decoding system in accordance with an alternate embodiment.

FIG. 12 depicts a flowchart of a method for decoding first and second encoded portions of video content, the first and second portions relating to first and second perspective views of three-dimensional content, in accordance with an embodiment.

FIG. 13 is a block diagram of an example system that performs hierarchical video encoding and decoding in accordance with an embodiment.

FIGS. 14-16 are block diagrams of a system that performs hierarchical video encoding and decoding to produce different types of two-dimensional and three-dimensional visual presentations in accordance with various embodiments.

FIG. 17 illustrates a hardware configuration that may be used to implement a hierarchical video encoding and/or decoding system in accordance with an embodiment.

FIG. 18 is a block diagram of a software or firmware based system for implementing a hierarchical video encoder and/or decoder in accordance with an embodiment.

The present invention will now be described with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.

DETAILED DESCRIPTION OF THE INVENTION

I. Introduction

The present specification discloses one or more embodiments that incorporate the features of the invention. The disclosed embodiment(s) merely exemplify the invention. The scope of the invention is not limited to the disclosed embodiment(s). The invention is defined by the claims appended hereto.

References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

Furthermore, it should be understood that spatial descriptions (e.g., “above,” “below,” “up,” “left,” “right,” “down,” “top,” “bottom,” “vertical,” “horizontal,” etc.) used herein are for purposes of illustration only, and that practical implementations of the structures described herein can be spatially arranged in any orientation or manner.

In accordance with embodiments described herein, three-dimensional video content is represented as a plurality of separate digital video streams, wherein each video stream comprises a sequence of frames (alternatively referred to as “images” or “pictures”). Each video stream provides a different perspective view of the same subject matter. This is illustrated by FIG. 1, which is a diagram of one example system 100 for generating three-dimensional video content. As shown in FIG. 1, system 100 includes eight video cameras 102₁-102₈ that are directed at and operate to record images of the same subject matter 104 from different perspectives over the same period of time. This results in the generation of eight different digital video streams that provide different perspective views of subject matter 104 over the same period of time.

Of course, techniques other than utilizing video cameras may be used to produce the different digital video streams. For example, one or more of the digital video streams may be created in a manual or automated fashion by digital animators using advanced graphics and animation tools. Additionally, at least one of the digital video streams may be created by using a manual or automated interpolation process that creates a digital video stream based on analysis of at least two of the other digital video streams. For example, with reference to FIG. 1, if camera 102₂ were absent, a digital video stream corresponding to the perspective view of subject matter 104 provided by that camera could nevertheless be created by performing an interpolation process on the digital video streams produced by cameras 102₁ and 102₃. Still other techniques not described herein may be used to produce one or more of the different digital video streams.

Display systems have been described that can display a single image of certain subject matter to provide a two-dimensional view thereof and that can also display two images of the same subject matter viewed from different perspectives in an integrated manner to provide a three-dimensional view thereof. Such two-dimensional (2D)/three-dimensional (3D) display systems can further display a multiple of two images (e.g., four images, eight images, etc.) of the same subject matter viewed from different perspectives in an integrated manner to simultaneously provide multiple three-dimensional views thereof, wherein the particular three-dimensional view perceived by a viewer is determined based at least in part on the position of the viewer. Examples of such 2D/3D display systems are described in the following commonly-owned, co-pending U.S. Patent Applications: U.S. patent application Ser. No. 12/845,409, filed on Jul. 28, 2010, and entitled “Display with Adaptable Parallax Barrier”; U.S. patent application Ser. No. 12/845,440, filed on Jul. 28, 2010, and entitled “Adaptable Parallax Barrier Supporting Mixed 2D and Stereoscopic 3D Display Regions”; and U.S. patent application Ser. No. 12/845,461, filed on Jul. 28, 2010, and entitled “Display Supporting Multiple Simultaneous 3D Views.” The entirety of each of these applications is incorporated by reference herein.

The different digital video streams produced by system 100 can be obtained and provided to a 2D/3D display system as described above in order to facilitate the presentation of a two-dimensional view of subject matter 104, a single three-dimensional view of subject matter 104, or multiple three-dimensional views of subject matter 104. For example, FIG. 2 depicts an exemplary 2D/3D display system 210 that receives a single digital video stream 202, wherein digital video stream 202 represents a single perspective view of certain subject matter. 2D/3D display system 210 renders each image of digital video stream 202 to a display in a manner that produces a two-dimensional view 220 of the subject matter that can be perceived by a viewer. Digital video stream 202 may comprise, for example, a digital video stream produced by a single one of video cameras 102₁-102₈ in FIG. 1 (e.g., the digital video stream produced by video camera 102₅).

FIG. 3 further illustrates that 2D/3D display system 210 can also receive a first digital video stream 302 and a second digital video stream 304, wherein each of first digital video stream 302 and second digital video stream 304 represents a different perspective view of the same subject matter. 2D/3D display system 210 renders corresponding images of first and second digital video streams 302 and 304 to a display in an integrated manner that produces a three-dimensional view 320 of the subject matter that can be perceived by a viewer. First digital video stream 302 may comprise, for example, a digital video stream produced by a first one of video cameras 102₁-102₈ in FIG. 1 (e.g., the digital video stream produced by video camera 102₅) and second digital video stream 304 may comprise, for example, a digital video stream produced by a second one of video cameras 102₁-102₈ in FIG. 1 (e.g., the digital video stream produced by video camera 102₃).

FIG. 4 further illustrates that 2D/3D display system 210 can also receive a first digital video stream 402, a second digital video stream 404, a third digital video stream 406 and a fourth digital video stream 408, wherein each of the digital video streams represents a different perspective of the same subject matter. 2D/3D display system 210 renders corresponding images of first, second, third and fourth digital video streams 402, 404, 406 and 408 to a display in an integrated manner that simultaneously produces a first three-dimensional view 420 of the subject matter that can be perceived by a first viewer and a second three-dimensional view 422 that can be perceived by a second viewer. For example, corresponding images from first digital video stream 402 and second digital video stream 404 can be rendered in a manner that produces first three-dimensional view 420 and corresponding images from third digital video stream 406 and fourth digital video stream 408 can be rendered in a manner that produces second three-dimensional view 422. First digital video stream 402 may comprise, for example, a digital video stream produced by a first one of video cameras 102₁-102₈ in FIG. 1 (e.g., the digital video stream produced by video camera 102₅), second digital video stream 404 may comprise, for example, a digital video stream produced by a second one of video cameras 102₁-102₈ in FIG. 1 (e.g., the digital video stream produced by video camera 102₃), third digital video stream 406 may comprise, for example, a digital video stream produced by a third one of video cameras 102₁-102₈ in FIG. 1 (e.g., the digital video stream produced by video camera 102₄), and fourth digital video stream 408 may comprise, for example, a digital video stream produced by a fourth one of video cameras 102₁-102₈ in FIG. 1 (e.g., the digital video stream produced by video camera 102₆).

It is to be understood that 2D/3D display system 210 can further receive additional pairs of digital video streams and utilize those streams to simultaneously produce additional three-dimensional views beyond first three-dimensional view 420 and second three-dimensional view 422 shown in FIG. 4. Additional details regarding the display capabilities of exemplary implementations of 2D/3D display system 210 can be found in the aforementioned, incorporated U.S. patent application Ser. Nos. 12/845,409, 12/845,440 and 12/845,461.

In order to provide the desired digital video streams to 2D/3D display system 210, such digital video streams may be transmitted across a network, including but not limited to a terrestrial broadcast, cable TV or satellite TV network, or a wired/wireless Internet Protocol (IP) network. In addition, such digital video streams may be stored on a recording medium (e.g., a Digital Versatile Disc (DVD) or Blu-ray™ disc) and accessed by a suitable reading device connected to or integrated with 2D/3D display system 210. Since network bandwidth and storage space are both finite resources, it would be advantageous if the amount of data required to represent each of the digital video streams could be reduced. To this end, encoding systems and methods will be described herein that provide a compressed data representation of each of the different digital video streams.

Also, as noted above, depending upon the display mode, only a subset of a plurality of digital video streams may need to be provided to 2D/3D display system 210. For example, only a single digital video stream need be provided to support a two-dimensional display mode, only two digital video streams need be provided to support a three-dimensional display mode, while only four digital video streams need be provided to support a dual-view three-dimensional display mode. As will be described below, encoding systems and methods described herein perform encoding in a hierarchical manner that leverages referencing between frames of different ones of the digital video streams (so-called “external referencing”), but that also allows for encoded representations of only a subset of the digital video streams to be provided when less than all streams are required to support a particular viewing mode.
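By way of illustration only, the relationship between display mode and the number of streams that need be provided can be sketched as follows. The mode names, the function `select_streams` and the notion of a preferred stream ordering are hypothetical and are not part of the described embodiments; only the stream counts (one, two and four) and the example camera assignments of FIGS. 2-4 come from the text.

```python
# Number of digital video streams needed per display mode, as stated in the text.
# The mode names themselves are hypothetical labels chosen for this sketch.
STREAMS_PER_MODE = {
    "2d": 1,            # single two-dimensional view
    "3d": 2,            # single three-dimensional view
    "dual_view_3d": 4,  # two simultaneous three-dimensional views
}


def select_streams(preferred_order, mode):
    """Return the subset of streams to provide for the given display mode.

    preferred_order is an assumed priority list of stream identifiers;
    the first N entries are taken, where N depends on the mode.
    """
    count = STREAMS_PER_MODE[mode]
    if len(preferred_order) < count:
        raise ValueError(
            f"mode {mode!r} needs {count} streams, "
            f"only {len(preferred_order)} available"
        )
    return preferred_order[:count]


# Ordering taken from the camera examples given for FIGS. 2-4 (cameras 5, 3, 4, 6).
example_order = [5, 3, 4, 6]
```

With this ordering, a two-dimensional mode would be served by the stream from camera 102₅ alone, and a dual-view three-dimensional mode by all four listed streams.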

II. Hierarchical Video Encoding in Accordance with Embodiments

FIG. 5 is a block diagram of a hierarchical video encoding system 500 in accordance with an embodiment. Video encoding system 500 operates to encode a plurality of digital video streams, wherein each digital video stream represents a different perspective view of the same subject matter. As noted in the preceding section, various subsets of such digital video streams may be used to support two-dimensional and three-dimensional display modes in suitably configured 2D/3D display systems.

Video encoding system 500 is configured to encode eight different digital video streams, denoted streams 1-8. With continued reference to the example of FIG. 1, streams 1-8 may represent digital video streams produced by corresponding ones of video cameras 102₁-102₈. Thus, for example, stream 1 may represent a digital video stream produced by video camera 102₁, stream 2 may represent a digital video stream produced by video camera 102₂, stream 3 may represent a digital video stream produced by video camera 102₃, and so forth and so on. However, this is only an example, and each of streams 1-8 may originate from one of a variety of different sources.

As shown in FIG. 5, video encoding system 500 comprises at least first tier encoding logic 502, second tier encoding logic 504, third tier encoding logic 506 and fourth tier encoding logic 508. Each of these components may be implemented in hardware using analog and/or digital circuits or in software, through the execution of instructions by one or more general purpose or special-purpose processors, or as a combination of hardware and software.

First tier encoding logic 502 is configured to receive stream 3 and apply a first tier encoding algorithm thereto to produce an encoded version thereof, wherein the encoded version is compressed with respect to the original version (i.e., the encoded version is represented by less data than the original version). The first tier encoding is performed in a manner that utilizes internal referencing only. As used herein, the term “internal referencing” refers to the application of encoding to a frame of a digital video stream in a manner that does not depend on a frame of any other digital video stream. Thus, for example, a frame of a digital video stream may be encoded without reference to any other frame except itself. An example of this type of encoding is the encoding that is applied to produce so-called “intra-coded frames” in various well-known video compression schemes, such as the encoding that is applied to produce I-frames in MPEG-2 and other video compression schemes. This type of encoding essentially constitutes a form of image compression. A frame that is encoded without reference to any other frame may likewise be decoded without reference to any other frame.

Another example of internal referencing is the encoding of a frame of a digital video stream with reference to one or more other frames of the same digital video stream. An example of this type of encoding is the encoding that is applied to produce so-called “inter-coded frames” in various well-known video compression schemes, such as the encoding that is applied to produce P-frames (predicted frames) and B-frames (bi-directional predicted frames) in MPEG-2 and other video compression schemes. In accordance with certain implementations of this type of encoding, a block of a given frame is encoded by encoding information that identifies a block of a preceding frame in the same digital video stream and by also encoding the differences between the two blocks. Additionally, a block of a given frame may be encoded by encoding information that identifies a first block of a preceding frame in the same digital video stream and a second block in a subsequent frame in the same digital video stream and by also encoding the differences between the block to be encoded and the first and second blocks. Typically, the reference block(s) that are selected are ones that are deemed closely matching to the block to be encoded, thereby reducing the size of the difference(s) that must be encoded. These are only examples, however, and numerous other techniques for encoding a frame of a digital video stream with reference to one or more other frames of the same digital video stream are known in the art.
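The block-based form of internal referencing described above can be illustrated with a minimal sketch. It assumes frames are small two-dimensional lists of integer pixel values, uses an exhaustive search with a sum-of-absolute-differences cost, and encodes a block as the position of the best-matching reference block plus the per-pixel differences. The function names are hypothetical, and real codecs add transforms, quantization and entropy coding that are omitted here.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def encode_block(block, ref_frame):
    """Encode a block as (position of best match in ref_frame, differences)."""
    size = len(block)
    best = None
    for top in range(len(ref_frame) - size + 1):
        for left in range(len(ref_frame[0]) - size + 1):
            candidate = get_block(ref_frame, top, left, size)
            cost = sad(block, candidate)
            if best is None or cost < best[0]:
                best = (cost, top, left, candidate)
    _, top, left, candidate = best
    residual = [
        [b - c for b, c in zip(row_b, row_c)]
        for row_b, row_c in zip(block, candidate)
    ]
    return (top, left), residual


def decode_block(position, residual, ref_frame):
    """Rebuild the block from the identified reference block and the differences."""
    top, left = position
    candidate = get_block(ref_frame, top, left, len(residual))
    return [
        [r + c for r, c in zip(row_r, row_c)]
        for row_r, row_c in zip(residual, candidate)
    ]


# A preceding frame of the same stream and a block of the current frame.
preceding_frame = [
    [0, 0, 0, 0],
    [0, 10, 20, 0],
    [0, 30, 40, 0],
    [0, 0, 0, 0],
]
current_block = [[10, 21], [30, 40]]
position, residual = encode_block(current_block, preceding_frame)
```

Because the selected reference block closely matches the block being encoded, the differences are mostly zero, which is what makes them cheap to represent.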

Since first tier encoding logic 502 utilizes only internal referencing to encode stream 3, encoded stream 3 can be properly decoded without requiring access to an encoded representation of any of streams 1, 2 and 4-8.

Second tier encoding logic 504 is configured to receive stream 7 and apply a second tier encoding algorithm thereto to produce an encoded version thereof, wherein the encoded version is compressed with respect to the original version. The second tier encoding that is applied to stream 7 is performed in a manner that allows the utilization of both internal referencing to stream 7 and external referencing to stream 3. In accordance with the explanation provided above, internal referencing means that a frame of stream 7 may be encoded without reference to any other frame except itself or with reference to one or more frames in the same digital video stream. In this sense, second tier encoding logic 504 is similar to first tier encoding logic 502.

However, second tier encoding logic 504 also allows for external referencing to stream 3 to encode stream 7. This means that a frame of stream 7 may be encoded by referencing one or more frames of stream 3. For example, a block of a given frame of stream 7 may be encoded by encoding information that identifies a block of a frame of stream 3 and by also encoding the differences between the two blocks. As another example, a block of a given frame of stream 7 may be encoded by encoding information that identifies a first block of a first frame of stream 3 and a second block of a second frame of stream 3 and by also encoding the differences between the block to be encoded and the first and second blocks.

Second tier encoding logic 504 may further apply encoding to stream 7 in a manner that combines internal referencing to stream 7 and external referencing to stream 3. For example, a block of a first frame of stream 7 may be encoded by encoding information that identifies a first block of a first frame of stream 3 and a second block of a second frame of stream 7 and by also encoding the differences between the block to be encoded and the first and second blocks. Note that although the preceding description refers to the encoding of blocks of frames, encoding may be applied to entire frames or other portions of frames depending upon the implementation.

By allowing for external referencing to stream 3 when encoding stream 7, second tier encoding logic 504 can advantageously exploit the content of frames present in both stream 3 and stream 7 to find the best image content to use for encoding. For example, when looking for a matching block in another frame to encode a block of a frame of stream 7, second tier encoding logic 504 may find a better matching block in a frame of stream 3 than in any available frame of stream 7. By permitting referencing to the better-matching block, the size of a difference that must be encoded can be reduced and the compression ratio can be increased. Since each digital video stream represents a view of the same subject matter, albeit captured from a slightly different perspective, it is likely that temporally-corresponding frames of the different video streams will represent similar content and thus provide good content for matching. Thus, external referencing can be very helpful in achieving improved compression ratios.
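The benefit of widening the search to frames of stream 3 can be sketched as a choice among candidate reference blocks, some internal (stream 7) and some external (stream 3). This is an illustrative sketch only: blocks are flattened to lists of pixel values, candidates are assumed to have been located already, and the function names and the `(stream, frame_index)` keys are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two flattened blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_reference(block, candidates):
    """Pick the candidate reference block that minimizes the difference.

    candidates maps a hypothetical (stream, frame_index) key to a
    candidate block. Returns the chosen key and the differences that
    would have to be encoded against it.
    """
    key = min(candidates, key=lambda k: sad(block, candidates[k]))
    residual = [a - b for a, b in zip(block, candidates[key])]
    return key, residual


# Block of a stream 7 frame to be encoded.
block = [10, 12, 14, 16]
candidates = {
    (7, 0): [0, 5, 9, 20],     # internal reference: earlier frame of stream 7
    (3, 1): [10, 12, 15, 16],  # external reference: temporally corresponding frame of stream 3
}
chosen, residual = best_reference(block, candidates)
```

In this example the external candidate from stream 3 is the closer match, so the encoder would reference it and encode a much smaller difference than the internal candidate would require.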

Since second tier encoding logic 504 can encode stream 7 by utilizing both internal referencing to stream 7 and external referencing to stream 3, encoded stream 7 can only be properly decoded by acquiring access to encoded stream 3.

FIG. 6 is a diagram that further illustrates how first and second tier encoding logic 502 and 504 may operate in accordance with one implementation. FIG. 6 depicts a first encoded digital video stream 602 that comprises a series of encoded frames 610, 612, 614, 616, 618, 620 and 622. FIG. 6 also depicts a second encoded digital video stream 604 that comprises a series of encoded frames 630, 632, 634, 636, 638, 640 and 642. The frames represented by encoded frames 610, 612, 614, 616, 618, 620 and 622 temporally correspond to the frames represented by encoded frames 630, 632, 634, 636, 638, 640 and 642, but provide different perspective views of the same subject matter. In FIG. 6, an arrow pointing from a first encoded frame to a second encoded frame is intended to indicate that the first encoded frame was generated by referencing a frame represented by the second encoded frame.

First encoded digital video stream 602 has been generated using internal referencing only. Thus, first encoded digital video stream 602 may represent encoded stream 3 that is generated by first tier encoding logic 502 of video encoding system 500. For example, as shown in FIG. 6, encoded frame 610 has been generated without reference to any other frame and is thus labeled an I-frame in accordance with conventional video compression nomenclature. Encoded frames 616 and 622 have been generated by referencing a preceding frame in the same digital video stream represented by encoded frame 610 and are thus labeled P-frames in accordance with conventional video compression nomenclature. Encoded frames 612, 614, 618 and 620 have been generated by referencing both a preceding frame and a subsequent frame in the same digital video stream and are thus labeled B-frames in accordance with conventional video compression nomenclature. However, none of the frames represented by encoded digital video stream 602 have been generated by referencing a frame of any other digital video stream.

In contrast, second encoded digital video stream 604 has been generated using internal referencing as well as external referencing to frames represented by first encoded digital video stream 602. Thus, second encoded digital video stream 604 may represent encoded stream 7 that is generated by second tier encoding logic 504 of video encoding system 500. For example, as shown in FIG. 6, encoded frame 630 has been generated by referencing the frame represented by encoded frame 610 of first encoded digital video stream 602. Furthermore, encoded frame 634 has been generated by referencing both the frame represented by encoded frame 632 of second encoded digital video stream 604 and the frame represented by encoded frame 612 of first encoded digital video stream 602. As yet another example, encoded frame 640 has been generated by referencing the frames represented by encoded frames 618 and 620 of first encoded digital video stream 602.

It can be seen from FIG. 6 that first encoded digital video stream 602 can be decoded without accessing second encoded digital video stream 604 since there are no encoding dependencies running from first encoded digital video stream 602 to the frames represented by second encoded digital video stream 604. In contrast, however, second encoded digital video stream 604 cannot be properly decoded without accessing first encoded digital video stream 602 since there are encoding dependencies running from second encoded digital video stream 604 to the frames represented by first encoded digital video stream 602.
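The one-way dependency between the two streams of FIG. 6 can be illustrated with a short sketch. The following Python is illustrative only and is not part of the described embodiments. The names STREAM_OF, REFERENCES and external_dependencies are hypothetical, and only the cross-stream references expressly described above are recorded; the internal P-frame and B-frame references of stream 602 are omitted because they do not affect the result.

```python
# Stream membership of the encoded frames named in the description of FIG. 6.
STREAM_OF = {
    610: 602, 612: 602, 614: 602, 616: 602, 618: 602, 620: 602, 622: 602,
    630: 604, 632: 604, 634: 604, 640: 604,
}

# References expressly described for frames of second encoded stream 604.
# Keys are encoded frames; values are the frames they reference.
REFERENCES = {
    630: [610],        # external reference into stream 602
    634: [632, 612],   # one internal and one external reference
    640: [618, 620],   # two external references into stream 602
}


def external_dependencies(stream):
    """Return the set of other streams referenced by frames of `stream`."""
    deps = set()
    for frame, refs in REFERENCES.items():
        if STREAM_OF[frame] != stream:
            continue
        for ref in refs:
            if STREAM_OF[ref] != stream:
                deps.add(STREAM_OF[ref])
    return deps


# Stream 602 depends on no other stream; stream 604 depends on stream 602.
deps_602 = external_dependencies(602)
deps_604 = external_dependencies(604)
```

Under these assumptions, deps_602 is empty and deps_604 contains only 602, which corresponds to the observation that stream 602 is independently decodable while stream 604 is not.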

Returning now to the description of video encoding system 500, third tier encoding logic 506 is configured to receive stream 1 and stream 5 and apply a third tier encoding algorithm thereto to produce encoded versions thereof, wherein each encoded version is compressed with respect to the corresponding original version. The third tier encoding that is applied to stream 1 is performed in a manner that allows the utilization of both internal referencing to stream 1 and external referencing to any of streams 3, 5 and 7. The third tier encoding that is applied to stream 5 is performed in a manner that allows the utilization of both internal referencing to stream 5 and external referencing to any of streams 1, 3 and 7.

By allowing for external referencing to streams 3, 5 and 7 when encoding stream 1, third tier encoding logic 506 can advantageously exploit the content of frames present in streams 1, 3, 5 and 7 to find the best image content to use for encoding. As a result, however, encoded stream 1 can only be properly decoded by accessing encoded streams 3, 5 and 7. Furthermore, by allowing for external referencing to streams 1, 3 and 7 when encoding stream 5, third tier encoding logic 506 can advantageously exploit the content of frames present in streams 1, 3, 5 and 7 to find the best image content to use for encoding. As a result, however, encoded stream 5 can only be properly decoded by accessing encoded streams 1, 3 and 7.

As further shown in FIG. 5, fourth tier encoding logic 508 is configured to receive streams 2, 4, 6 and 8 and apply a fourth tier encoding algorithm thereto to produce encoded versions thereof, wherein each encoded version is compressed with respect to the corresponding original version. The fourth tier encoding that is applied to stream 2 is performed in a manner that allows the utilization of both internal referencing to stream 2 and external referencing to any of streams 1 and 3-8. The fourth tier encoding that is applied to stream 4 is performed in a manner that allows the utilization of both internal referencing to stream 4 and external referencing to any of streams 1-3 and 5-8. The fourth tier encoding that is applied to stream 6 is performed in a manner that allows the utilization of both internal referencing to stream 6 and external referencing to any of streams 1-5, 7 and 8. The fourth tier encoding that is applied to stream 8 is performed in a manner that allows the utilization of both internal referencing to stream 8 and external referencing to any of streams 1-7.

It can be seen that by allowing for external referencing to all other streams when encoding streams 2, 4, 6 and 8, fourth tier encoding logic 508 can advantageously exploit the content of frames present in all the streams to find the best image content for encoding. As a result, however, each of encoded streams 2, 4, 6 and 8 can only be properly decoded by accessing all the other encoded streams.

The manner in which video encoding system 500 operates to encode streams 1-8 provides a number of significant benefits. For example, by allowing for external referencing when encoding streams 1, 2 and 4-8, video encoding system 500 provides a greater number of candidate frames that can be searched to identify matching image content during encoding. Moreover, since each of streams 1-8 represents a different perspective view of the same subject matter, it is likely that the frames of the externally referenced streams will contain a significant amount of matching image content. This is particularly true for streams that represent similar perspective views (e.g., the streams produced by video cameras 102₁ and 102₂ in FIG. 1). By enabling better matching when performing encoding of streams 1, 2 and 4-8, video encoding system 500 can provide increased compression of those streams as compared to encoding each stream using a conventional scheme that allows for internal referencing only.

Furthermore, video encoding system 500 generates a plurality of distinct encoded video streams, different subsets of which can be selectively transmitted or accessed to support different desired viewing modes of a 2D/3D display system. For example, to enable a two-dimensional viewing mode, only encoded stream 3 need be transmitted to or accessed by a 2D/3D display system. Since encoded stream 3 was generated using internal referencing only, a decoder connected to or integrated with the 2D/3D display system can perform all necessary decoding operations without access to any encoded stream other than encoded stream 3.

As a further example, to enable a three-dimensional viewing mode, only encoded streams 3 and 7 need be transmitted to or accessed by the 2D/3D display system. Since encoded stream 3 was generated using internal referencing only and encoded stream 7 was generated using internal referencing to stream 7 and external referencing to stream 3, a decoder connected to or integrated with the 2D/3D display system can perform all necessary decoding operations without access to any encoded streams other than encoded streams 3 and 7.

As a still further example, to enable a viewing mode that allows for two simultaneous three-dimensional views, only encoded streams 3, 7, 1 and 5 need be transmitted to or accessed by the 2D/3D display system. Due to the way such encoded streams were generated, a decoder connected to or integrated with the 2D/3D display system can perform all necessary decoding operations without access to any encoded streams other than encoded streams 3, 7, 1 and 5. As yet another example, to enable a viewing mode that allows for four simultaneous three-dimensional views, all encoded streams 1-8 can be transmitted to or accessed by the 2D/3D display system.
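The selection of encoded streams for each of the foregoing viewing modes can be illustrated with a short sketch. The following Python is illustrative only and is not part of the described embodiments. The names PERMITTED_REFS, VIEWING_MODES and streams_required are hypothetical, and the sketch conservatively assumes that every permitted external reference is actually used by the encoder.

```python
ALL_STREAMS = set(range(1, 9))

# External referencing permissions of video encoding system 500 as described
# above: stream -> streams whose frames it may reference.
PERMITTED_REFS = {
    3: set(),                # first tier: internal referencing only
    7: {3},                  # second tier
    1: {3, 5, 7},            # third tier
    5: {1, 3, 7},            # third tier
    2: ALL_STREAMS - {2},    # fourth tier
    4: ALL_STREAMS - {4},
    6: ALL_STREAMS - {6},
    8: ALL_STREAMS - {8},
}

# Streams displayed in each viewing mode (mode labels are hypothetical).
VIEWING_MODES = {
    "2D": {3},
    "3D": {3, 7},
    "two 3D views": {1, 3, 5, 7},
    "four 3D views": set(ALL_STREAMS),
}


def streams_required(displayed):
    """Return every stream needed to decode the streams in `displayed`."""
    required = set()
    pending = list(displayed)
    while pending:
        stream = pending.pop()
        if stream in required:
            continue
        required.add(stream)
        # Follow permitted references transitively.
        pending.extend(PERMITTED_REFS[stream])
    return required


# For each mode, the streams to transmit equal the streams displayed.
transmit = {mode: streams_required(s) for mode, s in VIEWING_MODES.items()}
```

Under these assumptions, each viewing mode is closed under its own dependencies, so no stream outside the displayed set needs to be transmitted or accessed.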

By producing the encoded streams in a manner that allows for only those streams that are needed to support a particular two-dimensional or three-dimensional viewing mode to be transmitted or accessed, video encoding system 500 advantageously allows for reducing the network bandwidth required for transmitting such digital video content as well as for reducing the amount of storage space necessary for producing a recorded representation of such content.

Furthermore, the tiered approach used by video encoding system 500 advantageously allows for gradual migration of video content from a form that supports two-dimensional viewing to a form that supports three-dimensional viewing, and from a form that supports three-dimensional viewing to a form that supports simultaneous presentation of multiple three-dimensional views.

For example, assume that stream 3 represents an existing digital video stream that presents a single perspective view of certain subject matter and that the other streams do not exist. In this case, stream 3 may be passed to first tier encoding logic 502 which operates in the manner described above to produce encoded stream 3. Encoded stream 3 can be used to support a two-dimensional viewing mode. Then assume at some later point in time that stream 7 is created through a manual or automated process and provides a second perspective view of the same subject matter. At this point in time, stream 3 and stream 7 may be passed to second tier encoding logic 504 which operates in the manner described above to produce encoded stream 7. Encoded streams 3 and 7 can then be used to support a three-dimensional viewing mode. Then assume at some even later point in time that streams 1 and 5 are created through a manual or automated process and provide third and fourth perspective views of the same subject matter. At this point in time, streams 3, 7, 1 and 5 may be passed to third tier encoding logic 506 which operates in the manner described above to produce encoded streams 1 and 5. Encoded streams 3, 7, 1 and 5 can then be used to support a viewing mode that simultaneously provides two different three-dimensional views. Thus, it can be seen that as additional streams providing additional perspective views of the subject matter of stream 3 are created, they can be encoded by encoding system 500 to support additional viewing configurations.

In alternate implementations of video encoding system 500, different external referencing permissions may be utilized by third tier encoding logic 506 and fourth tier encoding logic 508 in order to reduce inter-stream dependencies that can adversely affect decoding if one or more streams are corrupted or unavailable. For example, in accordance with the foregoing description, third tier encoding logic 506 encodes stream 1 in a manner that allows for external referencing to any of streams 3, 5 and 7 and encodes stream 5 in a manner that allows for external referencing to any of streams 1, 3 and 7. However, in an alternate embodiment, third tier encoding logic 506 may encode stream 1 in a manner that does not allow for external referencing to stream 5 and encode stream 5 in a manner that does not allow for external referencing to stream 1. In this way, if either stream 1 or stream 5 is corrupted or otherwise unavailable at the decoder, the other stream can still be properly decoded.

As a further example, in accordance with the foregoing description, fourth tier encoding logic 508 encodes streams 2, 4, 6 and 8 in a manner that allows for external referencing to any other stream. However, in an alternate embodiment, fourth tier encoding logic 508 may encode each of these streams in a manner that allows for external referencing to streams 1, 3, 5 and 7 only. In this way, if any of streams 2, 4, 6 or 8 is corrupted or otherwise unavailable at the decoder, the other streams can still be properly decoded.
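The effect of the alternate referencing permissions on robustness can be illustrated with a short sketch. The following Python is illustrative only and is not part of the described embodiments. The names DEFAULT_REFS, ALTERNATE_REFS, closure and still_decodable are hypothetical, and the sketch treats every permitted external reference as if it were actually used, which is a worst-case assumption.

```python
ALL_STREAMS = set(range(1, 9))
EVEN = (2, 4, 6, 8)

# Permissions per the foregoing description of video encoding system 500.
DEFAULT_REFS = {3: set(), 7: {3}, 1: {3, 5, 7}, 5: {1, 3, 7}}
DEFAULT_REFS.update({s: ALL_STREAMS - {s} for s in EVEN})

# Alternate embodiment: streams 1 and 5 do not reference each other, and
# streams 2, 4, 6 and 8 reference streams 1, 3, 5 and 7 only.
ALTERNATE_REFS = {3: set(), 7: {3}, 1: {3, 7}, 5: {3, 7}}
ALTERNATE_REFS.update({s: {1, 3, 5, 7} for s in EVEN})


def closure(refs, stream):
    """Return `stream` plus every stream it depends on, transitively."""
    seen = set()
    stack = [stream]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(refs[current])
    return seen


def still_decodable(refs, available):
    """Return the available streams whose dependencies are all available."""
    return {s for s in available if closure(refs, s) <= available}


# Stream 5 is corrupted or unavailable at the decoder.
without_5 = ALL_STREAMS - {5}
default_result = still_decodable(DEFAULT_REFS, without_5)
alternate_result = still_decodable(ALTERNATE_REFS, without_5)
```

Under these assumptions, losing stream 5 leaves only streams 3 and 7 decodable with the default permissions, whereas stream 1 also remains decodable with the alternate permissions.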

Video encoding system 500 is merely one embodiment of a hierarchical encoding system. A wide variety of other implementations may be used. For example, FIG. 7 is a block diagram of an alternative hierarchical video encoding system 700 in accordance with an embodiment. Like video encoding system 500, video encoding system 700 operates to encode a plurality of digital video streams, denoted streams 1-8, wherein each digital video stream represents a different perspective view of the same subject matter. As shown in FIG. 7, video encoding system 700 comprises at least first tier encoding logic 702, second tier encoding logic 704 and third tier encoding logic 706, each of which may be implemented in hardware, in software, or as a combination of hardware and software.

First tier encoding logic 702 is configured to receive stream 4 and stream 6 and apply a first tier encoding algorithm thereto to produce encoded versions thereof, wherein each encoded version is compressed with respect to the corresponding original version. In one implementation, the first tier encoding of stream 4 is performed in a manner that allows for internal referencing to stream 4 only and the first tier encoding of stream 6 is performed in a manner that allows for internal referencing to stream 6 only. In an alternate implementation, the first tier encoding of stream 4 is performed in a manner that allows for internal referencing to stream 4 and external referencing to stream 6 and the first tier encoding of stream 6 is performed in a manner that allows for internal referencing to stream 6 and external referencing to stream 4.

Second tier encoding logic 704 is configured to receive stream 2 and stream 8 and to apply a second tier encoding algorithm thereto to produce encoded versions thereof, wherein each encoded version is compressed with respect to the corresponding original version. In one implementation, the second tier encoding that is applied to stream 2 is performed in a manner that allows for internal referencing to stream 2 and external referencing to streams 4, 6 and 8 and the second tier encoding that is applied to stream 8 is performed in a manner that allows for internal referencing to stream 8 and external referencing to streams 2, 4 and 6. In an alternate implementation, the second tier encoding of stream 2 is performed in a manner that allows for internal referencing to stream 2 and external referencing to streams 4 and 6 only and the second tier encoding of stream 8 is performed in a manner that allows for internal referencing to stream 8 and external referencing to streams 4 and 6 only.

Third tier encoding logic 706 is configured to receive streams 1, 3, 5 and 7 and apply a third tier encoding algorithm thereto to produce encoded versions thereof, wherein each encoded version is compressed with respect to the corresponding original version. In one implementation, the third tier encoding that is applied to each of streams 1, 3, 5 and 7 is performed in a manner that allows for internal referencing to the stream to be encoded and external referencing to any of the other streams. In an alternate implementation, the third tier encoding that is applied to each of streams 1, 3, 5 and 7 is performed in a manner that allows for internal referencing to the stream to be encoded and external referencing to streams 2, 4, 6 and 8 only.

In an embodiment that does not allow for external referencing between streams 4 and 6, either encoded stream 4 or encoded stream 6 can be provided to support a two-dimensional viewing mode of a 2D/3D display system. To support a three-dimensional viewing mode, both encoded streams 4 and 6 may be provided. To support the simultaneous display of two three-dimensional views, encoded streams 2, 4, 6 and 8 may be provided, and to support the simultaneous display of four three-dimensional views, all the encoded streams may be provided.

FIG. 8 depicts a flowchart 800 of a method that further illustrates the concept of hierarchical encoding. In particular, flowchart 800 shows a method for encoding a first video frame sequence, a second video frame sequence and a third video frame sequence, wherein the first video frame sequence corresponds to a first perspective view of certain subject matter, the second video frame sequence corresponds to a second perspective view of the same subject matter, and the third video frame sequence corresponds to a third perspective view of the same subject matter.

As shown in FIG. 8, the method of flowchart 800 begins at step 802, during which a first tier of hierarchical encoding is applied to the first video frame sequence to produce first tier encoded data. In accordance with the method, the first tier of hierarchical encoding uses first internal referencing only, wherein the first internal referencing involves multiple frames within the first video frame sequence. Consequently, the first tier encoded data is independently decodable (i.e., it can be decoded without having access to any other encoded data). Step 802 may be performed, for example, by first tier encoding logic 502 of video encoding system 500, which encodes stream 3 using internal referencing that involves multiple frames of stream 3 to produce independently-decodable encoded stream 3.

At step 804, a second tier of hierarchical encoding is applied to the second video frame sequence to produce second tier encoded data. In accordance with the method, the second tier of hierarchical encoding uses both second internal referencing and first external referencing, wherein the second internal referencing involves multiple frames within the second video frame sequence and the first external referencing involves references between at least one frame of the second video frame sequence and at least one frame of the first video frame sequence. Consequently, the second tier encoded data is decodable only with access to the first tier encoded data. Step 804 may be performed, for example, by second tier encoding logic 504 of video encoding system 500, which encodes stream 7 using internal referencing that involves multiple frames of stream 7 and external referencing that involves at least one frame of stream 7 and at least one frame of stream 3 to produce encoded stream 7. Encoded stream 7 produced by second tier encoding logic 504, as discussed above, can only be properly decoded by having access to encoded stream 3.

At step 806, a third tier of hierarchical encoding is applied to the third video frame sequence to produce third tier encoded data. In accordance with one implementation, the third tier of hierarchical encoding uses both third internal referencing and second external referencing, wherein the third internal referencing involves multiple frames within the third video frame sequence and the second external referencing involves references between at least one frame of the third video frame sequence and a frame of the first video frame sequence, the second video frame sequence, or frames of both the first and second video frame sequences. Consequently, decoding the third tier encoded data will require access to the first tier and/or second tier encoded data. Step 806 may be performed, for example, by third tier encoding logic 506 of video encoding system 500, which encodes stream 1 using internal referencing that involves multiple frames of stream 1 and external referencing that involves at least one frame of stream 1 and at least one frame of stream 3, one frame of stream 7, or frames of both streams 3 and 7 to produce encoded stream 1. Encoded stream 1 produced by third tier encoding logic 506, as discussed above, can only be properly decoded with access to encoded streams 3 and 7.
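The three steps of flowchart 800 can be illustrated with a toy sketch. The following Python is illustrative only and is not the encoding algorithm of any described embodiment. It assumes lossless residual coding, frames modeled as short lists of integers, one internal candidate (the preceding frame) and one external candidate per permitted sequence (the co-timed frame); the names encode_sequence and decode_sequence are hypothetical.

```python
def encode_sequence(name, frames, references):
    """Encode `frames` using internal referencing plus external referencing
    to the sequences in `references` (dict: sequence name -> frames)."""
    encoded = []
    for t, frame in enumerate(frames):
        candidates = {}
        if t > 0:
            candidates[(name, t - 1)] = frames[t - 1]   # internal reference
        for ref_name, ref_frames in references.items():
            candidates[(ref_name, t)] = ref_frames[t]   # external reference
        if not candidates:
            encoded.append((None, list(frame)))         # no reference used
            continue

        def cost(key):
            return sum(abs(a - b) for a, b in zip(frame, candidates[key]))

        best = min(candidates, key=cost)                # best-matching frame
        residual = [a - b for a, b in zip(frame, candidates[best])]
        encoded.append((best, residual))
    return encoded


def decode_sequence(name, encoded, decoded):
    """Decode `encoded`, given already-decoded sequences in `decoded`."""
    out = []
    available = {**decoded, name: out}
    for ref, residual in encoded:
        if ref is None:
            out.append(list(residual))
        else:
            ref_name, ref_t = ref
            base = available[ref_name][ref_t]
            out.append([r + b for r, b in zip(residual, base)])
    return out


first = [[10, 10, 10], [11, 10, 10], [12, 10, 10]]
second = [[10, 11, 10], [11, 11, 10], [12, 11, 10]]
third = [[10, 11, 11], [11, 11, 11], [12, 11, 11]]

tier1 = encode_sequence("first", first, {})                     # step 802
tier2 = encode_sequence("second", second, {"first": first})     # step 804
tier3 = encode_sequence("third", third,
                        {"first": first, "second": second})     # step 806

d1 = decode_sequence("first", tier1, {})
d2 = decode_sequence("second", tier2, {"first": d1})
d3 = decode_sequence("third", tier3, {"first": d1, "second": d2})
```

In this sketch the first tier encoded data decodes without any other input, whereas attempting to decode the second tier encoded data without the decoded first sequence fails because the referenced frames are missing.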

FIG. 9 depicts a flowchart 900 of another method that illustrates aspects of hierarchical encoding. In particular, flowchart 900 shows a method for encoding a plurality of video elements, wherein each of the plurality of video elements represents a video frame sequence from a selected perspective view.

As shown in FIG. 9, the method of flowchart 900 begins at step 902, during which a first of the plurality of video elements is identified, the first of the plurality of video elements having a first video frame sequence corresponding to a first perspective view. At step 904, an encoded version of the first video frame sequence is produced, the encoded version of the first video frame sequence being independently decodable. Steps 902 and 904 may be performed, for example, by first tier encoding logic 502 of video encoding system 500 which operates to identify stream 3 in the plurality of streams 1-8 and to encode stream 3 in a manner that produces independently-decodable encoded stream 3.

At step 906, a second of the plurality of video elements is identified, the second of the plurality of video elements having a second video frame sequence corresponding to a second perspective view. At step 908, an encoded version of the second video frame sequence is produced, the encoded version of the second video frame sequence being decodable with reference to at least a representation of a frame of the first video frame sequence. Steps 906 and 908 may be performed, for example, by second tier encoding logic 504 of video encoding system 500 which operates to identify stream 7 in the plurality of streams 1-8 and to encode stream 7 in a manner that produces encoded stream 7 which can only be decoded with reference to at least a representation of a frame of stream 3.

The method of flowchart 900 may include additional steps such as identifying a third of the plurality of video elements, the third of the plurality of video elements having a third video frame sequence corresponding to a third perspective view. The method may still further include producing an encoded version of the third video frame sequence, the encoded version of the third video frame sequence being decodable with reference to at least a representation of a frame of the first video frame sequence, at least a representation of a frame of the second video frame sequence, or at least a representation of a frame of the first video frame sequence and a representation of a frame of the second video frame sequence. These additional steps may be performed, for example, by third tier encoding logic 506 of video encoding system 500 which operates to identify stream 1 in the plurality of streams 1-8 and to encode stream 1 in a manner that produces encoded stream 1 which can only be decoded with reference to at least a representation of a frame of stream 3, at least a representation of a frame of stream 7, or at least a representation of a frame of stream 3 and a representation of a frame of stream 7.

III. Hierarchical Video Decoding in Accordance with Embodiments

FIG. 10 is a block diagram of a hierarchical video decoding system 1000 in accordance with an embodiment. Video decoding system 1000 operates to decode a plurality of encoded digital video streams, wherein each digital video stream represents a different perspective view of the same subject matter. Video decoding system 1000 is intended to represent a system that can be utilized to decode encoded streams 1-8 produced by example video encoding system 500 as described above in reference to FIG. 5.

As shown in FIG. 10, video decoding system 1000 comprises at least first tier decoding logic 1002, second tier decoding logic 1004, third tier decoding logic 1006 and fourth tier decoding logic 1008. Each of these components may be implemented in hardware using analog and/or digital circuits or in software, through the execution of instructions by one or more general purpose or special-purpose processors, or as a combination of hardware and software.

First tier decoding logic 1002 is configured to receive encoded stream 3 and apply a first tier decoding algorithm thereto to produce a decoded version thereof. Since encoded stream 3 was generated using internal referencing only, first tier decoding logic 1002 can decode the stream without having access to any of encoded streams 1, 2, or 4-8. In an embodiment in which the compression algorithm is lossless, the first tier decoding applied to encoded stream 3 will produce a representation of the stream that is the same as the original representation received by the encoder, assuming no corruption of the encoded data during transmission or access thereof. In an embodiment in which the compression algorithm is lossy, the first tier decoding applied to encoded stream 3 will produce a representation of the stream that is an estimation of the original representation received by the encoder.

Second tier decoding logic 1004 is configured to receive encoded stream 7 and apply a second tier decoding algorithm thereto to produce a decoded version thereof. Since encoded stream 7 was generated using both internal referencing and external referencing to stream 3, second tier decoding logic 1004 decodes the stream, in part, by accessing decoded frames of stream 3 produced by first tier decoding logic 1002. The second tier decoding applied to encoded stream 7 will produce a representation of the stream that is the same as or an estimate of the original representation received by the encoder, depending upon whether the encoder applied lossless or lossy compression.

Third tier decoding logic 1006 is configured to receive encoded streams 1 and 5 and apply a third tier decoding algorithm thereto to produce decoded versions thereof. Since encoded stream 1 was generated using both internal referencing and external referencing to streams 3, 5 and 7, third tier decoding logic 1006 decodes the stream, in part, by accessing decoded frames of stream 3 produced by first tier decoding logic 1002, decoded frames of stream 7 produced by second tier decoding logic 1004, and decoded frames of stream 5 produced by third tier decoding logic 1006. Additionally, since encoded stream 5 was generated using both internal referencing and external referencing to streams 3, 7 and 1, third tier decoding logic 1006 decodes the stream, in part, by accessing decoded frames of stream 3 produced by first tier decoding logic 1002, decoded frames of stream 7 produced by second tier decoding logic 1004, and decoded frames of stream 1 produced by third tier decoding logic 1006. The third tier decoding applied to encoded streams 1 and 5 will produce a representation of each stream that is the same as or an estimate of the corresponding original representation received by the encoder, depending upon whether the encoder applied lossless or lossy compression.

Fourth tier decoding logic 1008 is configured to receive encoded streams 2, 4, 6 and 8, and apply a fourth tier decoding algorithm thereto to produce decoded versions thereof. Since each encoded stream was generated using both internal referencing and external referencing to all the other streams, fourth tier decoding logic 1008 decodes each stream, in part, by accessing decoded frames of stream 3 produced by first tier decoding logic 1002, decoded frames of stream 7 produced by second tier decoding logic 1004, decoded frames of streams 1 and 5 produced by third tier decoding logic 1006, and decoded frames of three out of four of streams 2, 4, 6 and 8 produced by fourth tier decoding logic 1008. The fourth tier decoding applied to encoded streams 2, 4, 6 and 8 will produce a representation of each stream that is the same as or an estimate of the corresponding original representation received by the encoder, depending upon whether the encoder applied lossless or lossy compression.
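The assignment of received encoded streams to the decoding logic of video decoding system 1000 can be illustrated with a short sketch. The following Python is illustrative only and is not part of the described embodiments. The names TIER_OF_STREAM, DECODING_LOGIC and decoding_plan are hypothetical, and the sketch addresses only which logic handles which stream, not the frame-level ordering needed where streams of the same tier reference each other.

```python
# Tier of each encoded stream produced by video encoding system 500.
TIER_OF_STREAM = {3: 1, 7: 2, 1: 3, 5: 3, 2: 4, 4: 4, 6: 4, 8: 4}

# Reference numeral of the decoding logic for each tier (FIG. 10).
DECODING_LOGIC = {1: 1002, 2: 1004, 3: 1006, 4: 1008}


def decoding_plan(received):
    """Group received streams by decoding logic, lowest tier first."""
    plan = {}
    ordered = sorted(received, key=lambda s: (TIER_OF_STREAM[s], s))
    for stream in ordered:
        logic = DECODING_LOGIC[TIER_OF_STREAM[stream]]
        plan.setdefault(logic, []).append(stream)
    return plan


# Three-dimensional viewing mode: only encoded streams 3 and 7 are received.
plan_3d = decoding_plan({3, 7})

# Two simultaneous three-dimensional views: streams 3, 7, 1 and 5.
plan_two_views = decoding_plan({3, 7, 1, 5})
```

Under these assumptions, only the decoding logic for the tiers actually received is exercised, so a two-dimensional or single three-dimensional viewing mode leaves third tier decoding logic 1006 and fourth tier decoding logic 1008 idle.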

FIG. 11 is a block diagram of a hierarchical video decoding system 1100 in accordance with an alternate embodiment. Like video decoding system 1000 of FIG. 10, video decoding system 1100 operates to decode a plurality of encoded digital video streams, wherein each digital video stream represents a different perspective view of the same subject matter. Video decoding system 1100 is intended to represent a system that can be utilized to decode encoded streams 1-8 produced by example video encoding system 700 as described above in reference to FIG. 7.

As shown in FIG. 11, video decoding system 1100 comprises at least first tier decoding logic 1102, second tier decoding logic 1104 and third tier decoding logic 1106. Each of these components may be implemented in hardware using analog and/or digital circuits or in software, through the execution of instructions by one or more general purpose or special-purpose processors, or as a combination of hardware and software.

First tier decoding logic 1102 is configured to receive encoded streams 4 and 6 and apply a first tier decoding algorithm thereto to produce decoded versions thereof. In an embodiment in which each of encoded streams 4 and 6 is generated using internal referencing only, first tier decoding logic 1102 decodes each stream without accessing decoded frames associated with any of the other encoded streams. In an embodiment in which each of encoded streams 4 and 6 is generated using internal referencing and external referencing to the other stream, first tier decoding logic 1102 decodes each stream, in part, by accessing decoded frames of the other stream. The first tier decoding applied to encoded streams 4 and 6 will produce a representation of each stream that is the same as or an estimate of the corresponding original representation received by the encoder, depending upon whether the encoder applied lossless or lossy compression.

Second tier decoding logic 1104 is configured to receive encoded streams 2 and 8 and apply a second tier decoding algorithm thereto to produce decoded versions thereof. In an embodiment in which encoded stream 2 is generated using both internal referencing and external referencing to streams 4, 6 and 8, second tier decoding logic 1104 decodes the stream, in part, by accessing decoded frames of streams 4 and 6 produced by first tier decoding logic 1102 and decoded frames of stream 8 produced by second tier decoding logic 1104. In an embodiment in which encoded stream 2 is generated using both internal referencing and external referencing to streams 4 and 6 only, second tier decoding logic 1104 decodes the stream, in part, by accessing decoded frames of streams 4 and 6 produced by first tier decoding logic 1102. Additionally, in an embodiment in which encoded stream 8 is generated using both internal referencing and external referencing to streams 4, 6 and 2, second tier decoding logic 1104 decodes the stream, in part, by accessing decoded frames of streams 4 and 6 produced by first tier decoding logic 1102 and decoded frames of stream 2 produced by second tier decoding logic 1104. In an embodiment in which encoded stream 8 is generated using both internal referencing and external referencing to streams 4 and 6 only, second tier decoding logic 1104 decodes the stream, in part, by accessing decoded frames of streams 4 and 6 produced by first tier decoding logic 1102. The second tier decoding applied to encoded streams 2 and 8 will produce a representation of each stream that is the same as or an estimate of the corresponding original representation received by the encoder, depending upon whether the encoder applied lossless or lossy compression.

Third tier decoding logic 1106 is configured to receive encoded streams 1, 3, 5 and 7, and apply a third tier decoding algorithm thereto to produce decoded versions thereof. In an embodiment in which each encoded stream is generated using both internal referencing and external referencing to all the other streams, third tier decoding logic 1106 decodes each stream, in part, by accessing decoded frames of streams 4 and 6 produced by first tier decoding logic 1102, decoded frames of streams 2 and 8 produced by second tier decoding logic 1104, and decoded frames of three out of four of streams 1, 3, 5 and 7 produced by third tier decoding logic 1106. In an embodiment in which each encoded stream is generated using both internal referencing and external referencing to streams 4, 6, 2 and 8 only, third tier decoding logic 1106 decodes each stream, in part, by accessing decoded frames of streams 4 and 6 produced by first tier decoding logic 1102 and decoded frames of streams 2 and 8 produced by second tier decoding logic 1104. The third tier decoding applied to encoded streams 1, 3, 5 and 7 will produce a representation of each stream that is the same as or an estimate of the corresponding original representation received by the encoder, depending upon whether the encoder applied lossless or lossy compression.

Video decoding systems 1000 and 1100 have been presented herein by way of example only and represent only two possible embodiments of a hierarchical decoding system. A wide variety of other implementations may be used.

FIG. 12 depicts a flowchart 1200 of a method that further illustrates the concept of hierarchical decoding. In particular, flowchart 1200 shows a method for decoding encoded video content, the encoded video content having both a first encoded portion relating to a first perspective view of three-dimensional content and a second encoded portion relating to a second perspective view of three-dimensional content.

As shown in FIG. 12, the method of flowchart 1200 begins at step 1202, during which the first encoded portion of the encoded video content is operated on using a first predefined decoding approach that does not support external referencing. This step results in the production of first video data, wherein the first video data is representative of the first perspective view of the three-dimensional content. Step 1202 may be performed, for example, by first tier decoding logic 1002 of video decoding system 1000, which decodes encoded stream 3 in a manner that does not support external referencing to any other digital video stream as described above to produce decoded stream 3.

At step 1204, the second encoded portion of the encoded video content is operated on using a second predefined decoding approach that differs from the first predefined decoding approach in at least that the second predefined decoding approach supports external referencing. This step results in the production of second video data, wherein the second video data is representative of the second perspective view of the three-dimensional content. Step 1204 may be performed, for example, by second tier decoding logic 1004 of video decoding system 1000, which decodes encoded stream 7 in a manner that supports external referencing to stream 3 as described above to produce decoded stream 7.
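
The two steps of flowchart 1200 can be illustrated with a toy Python sketch. It is not the decoding algorithm of any embodiment: frames are modeled as short lists of integers, the first approach is assumed to store each frame as a difference from the previous frame of the same view (internal referencing only), and the second approach is assumed to store each frame as a difference from the co-timed decoded frame of the first view (external referencing only, for brevity). The function names decode_first and decode_second are hypothetical.

```python
def decode_first(encoded):
    """Internal referencing only: each frame after the first is a
    residual added to the previously decoded frame of the same view."""
    frames, prev = [], None
    for residual in encoded:
        if prev is None:
            frame = list(residual)  # first frame is stored as-is
        else:
            frame = [p + r for p, r in zip(prev, residual)]
        frames.append(frame)
        prev = frame
    return frames


def decode_second(encoded, reference_frames):
    """External referencing: each frame is a residual added to the
    co-timed decoded frame of the first perspective view."""
    return [
        [ref + r for ref, r in zip(ref_frame, residual)]
        for ref_frame, residual in zip(reference_frames, encoded)
    ]


# Step 1202: decode the first view without any external reference.
first_view = decode_first([[10, 20, 30], [1, 0, -1]])
# Step 1204: decode the second view using the decoded first view.
second_view = decode_second([[2, 2, 2], [0, 1, 0]], first_view)
```

In this sketch the first view can be decoded on its own, while the second view cannot be reconstructed unless the decoded frames of the first view are available.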

IV. Hierarchical Encoding/Decoding System in Accordance with Embodiments

FIG. 13 is a block diagram of an example system 1300 that performs hierarchical video encoding and decoding in accordance with an embodiment. As shown in FIG. 13, system 1300 includes a plurality of digital video streams 1-8 that are processed by one or more hierarchical video encoders 1302 to generate a plurality of encoded streams 1-8. For the purposes of this example, it is to be assumed that encoded streams 1-8 are generated in a like fashion to encoded streams 1-8 described above in reference to video encoding system 500 of FIG. 5.

Digital video streams 1-8 may be received from the same source. Alternatively, different digital video streams may be received from different sources. All of the digital video streams may be encoded at the same time by the same encoder or, alternatively, different combinations of different digital video streams may be encoded at different times by different encoders to produce different ones of encoded streams 1-8. For example, digital video stream 1 may be encoded by a first hierarchical video encoder at a first time to generate encoded stream 1 and digital video stream 2 may be encoded by a second hierarchical video encoder that also has access to digital video stream 1 at a second time to generate encoded stream 2.

Encoded streams 1-8 generated by hierarchical video encoder(s) 1302 are provided in real-time or stored for subsequent distribution via one or more digital video content source(s) 1304. A digital video content source may comprise, for example, an entity capable of transmitting one or more of encoded streams 1-8 such as, but not limited to, entities associated with a terrestrial broadcasting, cable TV or satellite TV service or a server connected to an IP network. A digital video content source may also comprise, for example, a data storage device capable of storing one or more of encoded streams 1-8 such as, but not limited to, a DVD, a Blu-ray™ disc, a hard disk drive or the like. All of encoded streams 1-8 may be transmitted or stored by a single digital video content source. Alternatively, different ones of encoded streams 1-8 may be transmitted or stored by different digital video content sources (e.g., encoded streams 1, 3, 5 and 7 may be stored on a DVD and encoded streams 2, 4, 6 and 8 may be served by an Internet server).

Digital video content presentation system 1308 comprises a system used by an end user to view two-dimensional and three-dimensional video content. As shown in FIG. 13, system 1308 includes a multisource combiner 1310, a hierarchical video decoder 1312, and a 2D/3D display system 1314. System 1308 is capable of requesting digital video content that comprises one or more of encoded streams 1-8. Such requests are directed to the appropriate digital video content source(s). These source(s) then deliver the requested video content to system 1308 via one or more distribution channels 1306. For example, if encoded streams 1, 3, 5 and 7 are stored on a DVD and encoded streams 2, 4, 6 and 8 are served by an Internet server, digital video content presentation system 1308 can obtain encoded streams 1, 3, 5 and 7 by sending commands to and receiving the streams from a DVD player connected thereto and can obtain encoded streams 2, 4, 6 and 8 by requesting and receiving those streams over a suitable Internet connection.

Hierarchical video decoder 1312 operates to decode the encoded streams received via channel(s) 1306 to generate decoded versions thereof, which are then passed to 2D/3D display system 1314. For the purposes of this example, it is to be assumed that hierarchical video decoder 1312 operates in a like fashion to example video decoding system 700 of FIG. 7, although other hierarchical decoder implementations may be used. When the encoded streams are obtained from different digital video content sources, multisource combiner 1310 operates to identify related encoded streams as they are received over different channels and to present them to hierarchical video decoder 1312 in a synchronized manner. 2D/3D display system 1314 operates in a like fashion to example 2D/3D display system 210 as described above in reference to FIGS. 2-4 to process the decoded streams produced by hierarchical video decoder 1312 to display a two-dimensional view, a single three-dimensional view, or multiple three-dimensional views simultaneously.
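
One way a multisource combiner might present related streams in a synchronized manner is sketched below in Python. The description does not specify how related streams are identified or aligned, so the sketch assumes, purely for illustration, that every encoded frame arrives tagged with a stream identifier and a shared timestamp. The class name MultisourceCombiner and its push method are hypothetical.

```python
from collections import defaultdict


class MultisourceCombiner:
    """Buffers encoded frames from several channels and releases them
    only when every expected stream has delivered the same timestamp."""

    def __init__(self, expected_streams):
        self.expected = frozenset(expected_streams)
        self.pending = defaultdict(dict)  # timestamp -> {stream_id: payload}

    def push(self, stream_id, timestamp, payload):
        """Buffer one encoded frame; return a complete group or None."""
        if stream_id not in self.expected:
            return None  # unrelated stream, ignored in this sketch
        group = self.pending[timestamp]
        group[stream_id] = payload
        if set(group) == self.expected:
            # All related streams are present: hand the group to the decoder.
            return self.pending.pop(timestamp)
        return None


combiner = MultisourceCombiner([1, 2])
from_disc = combiner.push(1, 0, "frame-1@0")     # e.g. read from a disc
from_network = combiner.push(2, 0, "frame-2@0")  # e.g. received over a network
```

Here the first call returns None because stream 2 has not yet arrived, and the second call returns the complete group for timestamp 0, which could then be passed to the hierarchical video decoder.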

System 1308 may advantageously request only those encoded streams that it requires to implement a desired display mode of 2D/3D display system 1314. For example, to support a two-dimensional display mode of 2D/3D display system 1314, system 1308 may request only a single one of encoded streams 1-8. To support a three-dimensional display mode of 2D/3D display system 1314, system 1308 may request only two of encoded streams 1-8. To support a display mode of 2D/3D display system 1314 that provides multiple three-dimensional views simultaneously, system 1308 may request 4, 6 or 8 of encoded streams 1-8.
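
The relationship between display mode and the number of requested streams can be expressed as a short Python sketch. It borrows the "3Dx" notation used elsewhere herein, in which x is the number of perspective views, and assumes that each requested encoded stream carries exactly one perspective view. The function name streams_needed and the mode strings are hypothetical.

```python
def streams_needed(mode):
    """Return how many encoded streams a display mode requires.

    "2D" needs one stream; "3Dx" needs x streams (x >= 2), one per
    perspective view. Mode strings are illustrative assumptions.
    """
    if mode == "2D":
        return 1
    if mode.startswith("3D") and mode[2:].isdigit():
        views = int(mode[2:])
        if views >= 2:
            return views
    raise ValueError(f"unsupported display mode: {mode!r}")
```

For instance, streams_needed("3D2") corresponds to the single three-dimensional view case, and streams_needed("3D4"), streams_needed("3D6") and streams_needed("3D8") correspond to the modes that provide multiple three-dimensional views simultaneously.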

The determination of which encoded streams are to be obtained by or provided to system 1308 may be based on a variety of factors other than or in addition to the current display mode of 2D/3D display system 1314. For example and without limitation, such a determination may additionally or alternatively be made based on the innate display capabilities of 2D/3D display system 1314, the availability of a particular encoded stream or digital video content source 1304, and/or the quality or availability of a channel 1306 over which an encoded stream is to be delivered. Furthermore, depending upon the implementation, the determination of which encoded streams are to be obtained by or provided to system 1308 may be made by system 1308, by one or more of digital video content sources 1304, or by one or more nodes included within any one of video content distribution channel(s) 1306.

FIG. 14 is a block diagram of another example system 1400 that performs hierarchical video encoding and decoding in accordance with an embodiment. As shown in FIG. 14, system 1400 includes a media source 1402 that includes a hierarchical video encoder 1412. Media source 1402 may comprise, for example, a terrestrial broadcasting, cable TV or satellite TV service or a server connected to an IP network that has access to various types of media. Media source 1402 may also comprise, for example, a device with media generation capability. In any case, media source 1402 generates or has access to a plurality of “raw” (i.e., unencoded) video frame sequences 1414, each of which represents a different perspective view.

Hierarchical video encoder 1412, which may represent for example any of video encoding system 500, video encoding system 700, or hierarchical video encoder 1302 as described above, processes the plurality of raw video frame sequences 1414 in a manner described herein to generate a plurality of hierarchically-encoded video streams. As discussed above in reference to other embodiments, certain ones of these hierarchically-encoded video streams may be produced in a manner that does not utilize external referencing while certain other ones of these hierarchically-encoded video streams may be produced in a manner that does utilize external referencing.

As further shown in FIG. 14, all or a selected number of the hierarchically-encoded video streams may be transmitted to or retrieved by one or more 2D/3D display systems 1406. The hierarchically-encoded video streams may be transmitted or retrieved in real time over one or more communication links. Alternatively, the hierarchically-encoded video streams may be stored on a suitable storage medium 1404 for subsequent transmission to or retrieval by 2D/3D display system(s) 1406. Storage medium 1404 may comprise for example and without limitation a DVD or Blu-ray™ disc, a hard disk drive, a distributed storage system, or the like. Such storage medium 1404 may be co-located with the media source, co-located with 2D/3D display system(s) 1406, or accessible to a node located along a communication link between media source 1402 and 2D/3D display system(s) 1406. Depending upon the implementation, various elements of 2D/3D display system(s) 1406 may operate to retrieve the hierarchically-encoded video stream(s). Such elements may be included within a set-top box, gateway device, personal computer, telephone, television, stand-alone display, an Internet or NAS server, or some other network node located internal or external to a premises, or circuitry located there within.

As further shown in FIG. 14, 2D/3D display system(s) 1406 include a hierarchical video decoder 1416. Hierarchical video decoder 1416, which may represent for example any of video decoding system 1000, video decoding system 1100, or hierarchical video decoder 1312 as described above, selectively decodes one or more of the received or retrieved hierarchically-encoded video stream(s) in a manner described herein to produce one or more raw video streams suitable for supporting a particular two-dimensional or three-dimensional video presentation to one or more viewers. As shown in the example embodiment of FIG. 14, hierarchical video decoder 1416 operates to decode a single hierarchically-encoded video stream that was encoded without external referencing to produce a single raw video stream in support of a two-dimensional visual presentation 1408. FIG. 15 is a block diagram of an alternate embodiment 1500 of the system shown in FIG. 14 in which hierarchical video decoder 1416 operates to decode a pair of selected hierarchically-encoded video streams, only one of which was encoded with external referencing, to produce two raw video streams in support of a 3D2 visual presentation 1502, wherein a 3D2 visual presentation is a three-dimensional visual presentation based on two different perspective views. FIG. 16 is a block diagram of an alternate embodiment 1600 of the system shown in FIG. 14 in which hierarchical video decoder 1416 operates to decode selected ones of a plurality of hierarchically-encoded video streams to produce one or more video streams in support of a particular 2D or 3Dx visual presentation 1602, wherein a 3Dx visual presentation is a three-dimensional presentation based on x different perspective views. As further shown in FIG. 16, the raw video stream(s) produced by hierarchical video decoder 1416 may simultaneously be used to support another 2D or 3Dx visual presentation 1604 for a second viewer.

V. Example Hardware and Software Implementations

Video encoding system 500, video encoding system 700, video decoding system 1000, video decoding system 1100, hierarchical video encoder(s) 1302, hierarchical video decoder 1312, hierarchical video encoder 1412, hierarchical video decoder 1416, and any sub-components thereof may be implemented in hardware, software, firmware, or any combination thereof.

For example, FIG. 17 illustrates a hardware configuration 1700 that may be used to implement any of the aforementioned video encoding or decoding systems. In an encoder implementation, input circuitry 1702 operates to receive one or more digital video streams that represent different perspective views of the same subject matter and processing circuitry 1704 operates to apply hierarchical encoding thereto in accordance with any of the methods described herein. The operation of processing circuitry 1704 produces one or more encoded streams which are then communicated to another entity via output circuitry 1706.

In a decoder implementation, input circuitry 1702 operates to receive one or more encoded digital video streams, wherein the digital video streams represent different perspective views of the same subject matter, and processing circuitry 1704 operates to apply hierarchical decoding thereto in accordance with any of the methods described herein. The operation of processing circuitry 1704 produces one or more decoded streams which are then communicated to another entity via output circuitry 1706.

In contrast, FIG. 18 shows a block diagram of a software or firmware based system 1800 for implementing a hierarchical video encoder and/or decoder in accordance with an embodiment. As shown in the example of FIG. 18, system 1800 may include one or more processors (also called central processing units, or CPUs), such as a processor 1804. Processor 1804 is connected to a communication infrastructure 1802, such as a communication bus. In some embodiments, processor 1804 can simultaneously operate multiple computing threads.

System 1800 also includes a primary or main memory 1806, such as random access memory (RAM). Main memory 1806 has stored therein control logic 1828A (computer software), and data.

System 1800 also includes one or more secondary storage devices 1810. Secondary storage devices 1810 include, for example, a hard disk drive 1812 and/or a removable storage device or drive 1814, as well as other types of storage devices, such as memory cards and memory sticks. For instance, system 1800 may include an industry standard interface, such as a universal serial bus (USB) interface for interfacing with devices such as a memory stick. Removable storage drive 1814 represents a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup, etc.

Removable storage drive 1814 interacts with a removable storage unit 1816. Removable storage unit 1816 includes a computer useable or readable storage medium 1824 having stored therein computer software 1828B (control logic) and/or data. Removable storage unit 1816 represents a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, or any other computer data storage device. Removable storage drive 1814 reads from and/or writes to removable storage unit 1816 in a well known manner.

System 1800 further includes a communication or network interface 1818. Communication interface 1818 enables system 1800 to communicate with remote devices. For example, communication interface 1818 allows system 1800 to communicate over communication networks or mediums 1842 (representing a form of a computer useable or readable medium), such as LANs, WANs, the Internet, etc. Network interface 1818 may interface with remote sites or networks via wired or wireless connections.

Control logic 1828C may be transmitted to and from system 1800 via the communication medium 1842.

Any apparatus or manufacture comprising a computer useable or readable medium having control logic (software) stored therein is referred to herein as a computer program product or program storage device. This includes, but is not limited to, system 1800, main memory 1806, secondary storage devices 1810, and removable storage unit 1816. Such computer program products, having control logic stored therein that, when executed by one or more data processing devices, cause such data processing devices to operate as described herein, represent embodiments of the invention.

Devices in which embodiments may be implemented may include storage, such as storage drives, memory devices, and further types of computer-readable media. Examples of such computer-readable storage media include a hard disk, a removable magnetic disk, a removable optical disk, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like. As used herein, the terms “computer program medium” and “computer-readable medium” are used to generally refer to the hard disk associated with a hard disk drive, a removable magnetic disk, a removable optical disk (e.g., CDROMs, DVDs, etc.), zip disks, tapes, magnetic storage devices, MEMS (micro-electromechanical systems) storage, nanotechnology-based storage devices, as well as other media such as flash memory cards, digital video discs, RAM devices, ROM devices, and the like. Such computer-readable storage media may store program modules that include computer program logic for implementing the functions of video encoding system 500, video encoding system 700, video decoding system 1000, video decoding system 1100, hierarchical video encoder(s) 1302, hierarchical video decoder 1312, hierarchical video encoder 1412, hierarchical video decoder 1416, and any of the sub-components thereof or for performing the steps of any of the flowcharts described herein. Embodiments of the invention are directed to computer program products comprising such logic (e.g., in the form of program code or software) stored on any computer useable medium. Such program code, when executed in one or more processors, causes a device to operate as described herein.

The invention can work with software, hardware, and/or operating system implementations other than those described herein. Any software, hardware, and operating system implementations suitable for performing the functions described herein can be used.

VI. Conclusion

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A method for encoding a first video frame sequence and a second video frame sequence, the first video frame sequence corresponding to a first perspective view, the second video frame sequence corresponding to a second perspective view, the method comprising:

applying a first tier of hierarchical encoding to the first video frame sequence to produce first tier encoded data, the first tier of hierarchical encoding using first internal referencing, the first internal referencing involving multiple frames within the first video frame sequence;

applying a second tier of hierarchical encoding to the second video frame sequence to produce second tier encoded data, the second tier of hierarchical encoding using both second internal referencing and first external referencing, the second internal referencing involving multiple frames within the second video frame sequence, the first external referencing involving references between a frame of the second video frame sequence and a frame of the first video frame sequence; and

the first tier encoded data being independently decodable, and the second tier encoded data being decodable with access to the first tier encoded data.

2. The method of claim 1, further comprising delivering the first tier encoded data to service a two-dimensional viewing environment.

3. The method of claim 1, further comprising delivering both the first tier encoded data and the second tier encoded data to service a three-dimensional viewing environment.

4. The method of claim 1, further comprising applying a third tier of hierarchical encoding to a third video frame sequence corresponding to a third perspective view, the third tier of hierarchical encoding using both third internal referencing and second external referencing, the third internal referencing involving multiple frames within the third video frame sequence, the second external referencing involving references between a frame of the third video frame sequence and a frame of the second video frame sequence.

5. The method of claim 1, further comprising applying a third tier of hierarchical encoding to a third video frame sequence corresponding to a third perspective view, the third tier of hierarchical encoding using both third internal referencing and second external referencing, the third internal referencing involving multiple frames within the third video frame sequence, the second external referencing involving references between a frame of the third video frame sequence and a frame of the first video frame sequence.

6. The method of claim 1, further comprising applying a third tier of hierarchical encoding to a third video frame sequence corresponding to a third perspective view, the third tier of hierarchical encoding using both third internal referencing and second external referencing, the third internal referencing involving multiple frames within the third video frame sequence, the second external referencing involving references between a first frame of the third video frame sequence and a frame of the first video frame sequence and references between a second frame of the third video frame sequence and a frame of the second video frame sequence.

7. A method used to encode a plurality of video elements, each of the plurality of video elements representing a video frame sequence from a selected perspective view, the method comprising:

identifying a first of the plurality of video elements, the first of the plurality of video elements having a first video frame sequence corresponding to a first perspective view;

producing an encoded version of the first video frame sequence, the encoded version of the first video frame sequence being independently decodable;

identifying a second of the plurality of video elements, the second of the plurality of video elements having a second video frame sequence corresponding to a second perspective view; and

producing an encoded version of the second video frame sequence, the encoded version of the second video frame sequence being decodable with reference to at least a representation of a frame of the first video frame sequence.

8. The method of claim 7, wherein the encoded version of the first video frame sequence can be decoded to produce a two-dimensional output.

9. The method of claim 7, wherein the encoded versions of both the first video frame sequence and the second video frame sequence can be decoded to produce a three-dimensional output.

10. The method of claim 7, further comprising producing an encoded version of a third video frame sequence, the encoded version of the third video frame sequence being decodable with reference to at least a representation of a frame of the second video frame sequence.

11. The method of claim 7, further comprising producing an encoded version of a third video frame sequence, the encoded version of the third video frame sequence being decodable with reference to at least a representation of a frame of the first video frame sequence.

12. The method of claim 7, further comprising producing an encoded version of a third video frame sequence, the encoded version of the third video frame sequence being decodable with reference to at least a representation of a frame of the first video frame sequence and a representation of a frame of the second video frame sequence.

13. An encoding system that receives video content, the video content having both first video data representative of a first perspective view of three-dimensional content and second video data representative of a second perspective view of three-dimensional content, the encoding system comprising:

a processing circuit that, using a first predefined encoding approach, operates on the first video data to produce a first encoded portion of an encoded version of the video content;

the processing circuit that, using a second predefined encoding approach, operates on the second video data to produce a second encoded portion of the encoded version of the video content, the second predefined encoding approach differing from the first predefined encoding approach in at least that the second predefined encoding approach supports external referencing; and

output circuitry, coupled to the processing circuit, through which the first encoded portion and the second encoded portion are communicated.

14. The encoding system of claim 13, wherein the first predefined encoding approach comprises first tier hierarchical encoding and the second predefined encoding approach comprises second tier hierarchical encoding.

15. The encoding system of claim 13, wherein the video content also has third video data representative of a third perspective view of three-dimensional content;

wherein the second predefined encoding approach supports external referencing between frames of the second video data and frames of the first video data;

wherein the processing circuit, using a third predefined encoding approach, operates on the third video data to produce a third encoded portion of the encoded version of the video content, the third predefined encoding approach supporting external referencing between frames of the third video data and frames of the second video data and external referencing between frames of the third video data and frames of the first video data.

16. The encoding system of claim 15, wherein the first predefined encoding approach comprises first tier hierarchical encoding, the second predefined encoding approach comprises second tier hierarchical encoding, and the third predefined encoding approach comprises third tier hierarchical encoding.

17. A decoding system that receives encoded video content, the encoded video content having both a first encoded portion relating to a first perspective view of three-dimensional content and a second encoded portion relating to a second perspective view of three-dimensional content, the decoding system comprising:

a processing circuit that, using a first predefined decoding approach, operates on the first encoded portion of the encoded video content to produce first video data, the first video data representative of the first perspective view of the three-dimensional content;

the processing circuit that, using a second predefined decoding approach, operates on the second encoded portion of the encoded video content to produce second video data, the second video data representative of the second perspective view of the three-dimensional content, and the second predefined decoding approach differing from the first predefined decoding approach in at least that the second predefined decoding approach supports external referencing; and

output circuitry, coupled to the processing circuit, through which the first video data and the second video data are communicated.

18. The decoding system of claim 17, wherein the first predefined decoding approach comprises first tier hierarchical decoding and the second predefined decoding approach comprises second tier hierarchical decoding.

19. The decoding system of claim 17, wherein the encoded video content also has a third encoded portion relating to a third perspective view of three-dimensional content;

wherein the second predefined decoding approach supports external referencing between frames represented by the second encoded portion of the encoded video content and frames represented by the first encoded portion of the encoded video content;

wherein the processing circuit, using a third predefined decoding approach, operates on the third encoded portion of the encoded video content to produce third video data, the third video data representative of the third perspective view of the three-dimensional content, the third predefined decoding approach supporting external referencing between frames represented by the third encoded portion of the encoded video content and frames represented by the second encoded portion of the encoded video content and external referencing between frames represented by the third encoded portion of the encoded video content and frames represented by the first encoded portion of the encoded video content.

20. The decoding system of claim 19, wherein the first predefined decoding approach comprises first tier hierarchical decoding, the second predefined decoding approach comprises second tier hierarchical decoding, and the third predefined decoding approach comprises third tier hierarchical decoding.