CAMERA TAP TRANSCODER ARCHITECTURE WITH FEED FORWARD ENCODE DATA
Embodiments of the present disclosure include transcoder architecture that can decode an input encoded media data as raw media data and then utilize feed forward encode data (provided with the encoded media data) to encode the raw media data. Further, embodiments of the transcoder architecture include a camera tap in the transcoder architecture.
This application claims priority to copending U.S. provisional application entitled, “Image Capture Device Systems and Methods,” having Ser. No. 61/509,747, filed Jul. 20, 2011, which is entirely incorporated herein by reference.
BACKGROUND
Video and other media are often streamed in compressed form over a communication network to a destination and rendered in real time by a media player. Instead of downloading the media as a file in its entirety and then playing the file, encoded media is sent in a continuous stream of data, decoded at a destination decoder, and played as the data arrives at a media player. As a result, the streaming of media places a great deal of stress on destination decoders and media players, especially when the encoded media data may need to be adjusted to accommodate constraints at the destination.
Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
Embodiments of the present disclosure utilize supplemental encoding information (“feed forward encode data”) provided from an upstream encoder to assist in the encoding of media, where an upstream encoding of the raw media data generated contents of the supplemental feed forward encode data as a by-product. Embodiments include transcoder architecture that can decode an input encoded media data as raw media data and then utilize feed forward encode data (provided with the encoded media data) to encode the raw media data. Further, embodiments of the transcoder architecture include a camera tap in the transcoder architecture.
The first encoded media stream is received and decoded by a transcoder 120 (via decoder 122) and then encoded by the transcoder 120 (via encoder 124) as a second encoded media stream. The encoder 124, in encoding the media, may also scale the raw media data before generating the second encoded media stream for a destination decoder 130.
Over a communication pathway 125, the second encoded media stream is transmitted to and received by the destination decoder 130. A video-image camera 140 is also provided and is shown to contain taps or inputs into a decoder 122 and/or encoder 124 of the transcoder 120 over communication pathways. Various elements of the system components (e.g., encoders 110, 124, decoders 122, 130, camera 140, etc.) may be implemented in hardware and/or software.
Feed forward encode data 105 is shown to be supplied from and to encoders 110, 124 in the environment. In one embodiment, feed forward encode data 105 comprises supplemental encoding information that is provided from an encoder and sent downstream in addition to the encoded media being supplied from that encoder, such as the media source encoder 110 or an encoder 124 downstream from the media source encoder 110. As an example, video encoding standards such as MPEG-2 and ITU-T H.264 (also known as MPEG-4, Part 10 and Advanced Video Coding) use motion compensation for compressing video data comprising a series of pictures. Therefore, intermediate results from motion compensation processes may be provided as feed forward encode data 105 from media source encoder 110 in generating the first encoded media stream, where a downstream encoder 124 utilizes the feed forward encode data to supplement its motion compensation processes used to generate the second encoded media stream. As a result, the downstream encoder 124 can rely on computations and configurations of an upstream or previous encoder to assist in encoding of the raw media data.
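As a rough illustration of the idea, the following Python sketch models feed forward encode data as a side-channel record that travels alongside the encoded stream. The class and field names (FeedForwardEncodeData, EncoderOutput, motion_vectors, settings) and the "search_range" setting are hypothetical and are not taken from the disclosure; the encoder shown is a stand-in that performs no real compression.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FeedForwardEncodeData:
    """Hypothetical container for intermediate results kept by an encoder."""
    # Block index -> (dx, dy) motion vector chosen by the upstream search.
    motion_vectors: Dict[int, Tuple[int, int]] = field(default_factory=dict)
    # Configuration used by upstream stages (illustrative keys only).
    settings: Dict[str, int] = field(default_factory=dict)


@dataclass
class EncoderOutput:
    encoded_stream: bytes                  # consumed by a downstream decoder
    feed_forward: FeedForwardEncodeData    # consumed by a downstream encoder


def upstream_encode(blocks: List[bytes]) -> EncoderOutput:
    """Stand-in encoder that keeps its intermediate results instead of discarding them."""
    ff = FeedForwardEncodeData(settings={"search_range": 16})
    for index, _block in enumerate(blocks):
        # Placeholder for the result of a real motion search.
        ff.motion_vectors[index] = (0, 0)
    # Placeholder "compression": simply concatenate the blocks.
    return EncoderOutput(encoded_stream=b"".join(blocks), feed_forward=ff)


out = upstream_encode([b"ab", b"cd"])
```

The point of the sketch is only the shape of the output: one payload for the decoder and a second, separate payload for whichever encoder comes next.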
In embodiments of the present disclosure, however, encoder data from the upstream encoder is not discarded and is rather output from the upstream encoder 110 and received by the downstream or secondary encoder 124 to accelerate the secondary encoder's task of encoding the raw media data. Embodiments of the transcoder 120 may also serve up the media data after scaling or converting the media data into a format suitable for and supported by the destination decoder 130 and/or display device. In general, scaling may involve temporal, spatial, and quality modifications, and various factors may govern the applicability of scaling, such as with scalable video coding (SVC). Such factors include the screen size and processing capabilities of a display device, such as how many frames per second the device can handle, whether the device can process 3D images, and current power constraints (e.g., a limited battery). These are types of possible constraints that may cause the transcoder 120 (or another network encoder) to implement SVC adjustments. For example, in one embodiment, the video-image camera 140 may be equipped with its own encoder and may perform its own scaling adjustments before outputting a bit stream to the transcoder 120 and its decoder 122. In an alternative embodiment, the video-image camera 140 may not be equipped with its own encoder and may feed raw media data (video, image, audio, etc.) to the encoder 124 of the transcoder 120. In this case, the video-image camera 140 can still perform temporal, spatial, and quality modifications before sending the raw data to the transcoder 120. The media source encoder 110 may also implement SVC adjustments before sending an output downstream. Accordingly, a bit stream may be scaled to remove parts of the bit stream in order to adapt the output to the various needs or preferences of downstream devices or users as well as varying terminal capabilities or network conditions.
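The spatial and temporal parts of such a scaling decision can be sketched as follows. This is a minimal illustration under assumed inputs: the names StreamParams and scale_for_device, and the idea of describing a destination by a maximum width, height, and frame rate, are hypothetical and are not defined by the disclosure or by any SVC specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamParams:
    width: int
    height: int
    fps: int


def scale_for_device(source: StreamParams, max_width: int, max_height: int,
                     max_fps: int) -> StreamParams:
    """Pick output parameters that fit the destination without upscaling."""
    # Spatial scaling: shrink along whichever dimension is the tighter limit,
    # using integer arithmetic to preserve the aspect ratio.
    if source.width * max_height > source.height * max_width:
        width = min(source.width, max_width)
        height = source.height * width // source.width
    else:
        height = min(source.height, max_height)
        width = source.width * height // source.height
    # Temporal scaling: never exceed what the destination can display.
    fps = min(source.fps, max_fps)
    return StreamParams(width, height, fps)
```

For example, a 1920×1080 stream at 60 frames per second sent to a device limited to 1280×720 at 30 frames per second would be scaled to 1280×720 at 30 frames per second, while a 640×480 source would be left at its original size.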
The transcoder 120 in the media processing environment of
In
Referring now to
Accordingly, the transcoder 120 may provide multiple encoders and decoders to handle the multiple possible standards and formats that are received and required by upstream and downstream nodes in a streaming environment. In various embodiments, encoders and decoders may be hardware accelerated and/or comprised of a general purpose processor and applicable software.
In the present disclosure, a media source encoder 110 in addition to providing a media stream may also provide feed forward encode data 105. In
In
In addition to the encode streams 230, feed forward encode data 105 is supplied to decode architecture of decoder 122. The decode architecture 122 is shown to output raw stream(s) and/or groups of raw stream(s) 232, where a grouping of raw streams may all be sent to a particular destination device 250. The decode architecture 122 passes the raw stream(s) and the feed forward encode data 105 to the Multiple Input, Multiple Output encode architecture 124. In addition, in one embodiment, the encoder 124 is configured to provide overlay support so that multiple input streams may be combined, with content of one stream overlaid on content of another when displayed.
Further, the encode architecture 124 may receive input streams (via the interface circuitry 202) from local memory storage 240 (that can be removable) or from internal or external video-image cameras 140i, 140e. These streams may be encoded or raw streams 230, 232, as the case may be.
During encoding operations, the encode architecture 124 may scale a bit stream during SVC coding and therefore SVC feedback data 242 is passed to the decode architecture 122 and interface circuitry 202 so that SVC feedback data 242 may be provided to upstream nodes. On the downstream side, the encoded output bit stream is provided to destination devices. In the figure, screen assemblies 250 (e.g., a device having display hardware and a display driver) are depicted for the destination devices. It is submitted that two screen assemblies may actually be located in the same device or serviced by the same device.
Referring now to
For example, in
The searches associated with the motion prediction block 350 (as discussed below) are generally intense since many different directions in many different neighboring frames are analyzed. A particular encode standard may define a size of a search area (e.g., how many frames backwards and forwards) to be searched for possible matches with a current block. However, the motion prediction block 350 may initiate SVC adjustments and adapt on the directions that are searched (e.g., only search backwards, do not look back more than 3 frames, etc.) in response to a buffer constraint, a power constraint, limited processing capabilities, etc., as indicated by line 307. Also, other blocks or stages may be adjusted, including motion compensation 352, frame buffer 354, etc.
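A minimal sketch of this kind of adaptation is shown below. The SearchConfig type, the adapt_search function, and the specific policy (cap backward references at three frames and disable forward references when power is limited) are assumptions chosen to mirror the examples in the text, not behavior defined by the disclosure or by any encode standard.

```python
from dataclasses import dataclass


@dataclass
class SearchConfig:
    frames_back: int      # how many earlier frames may be searched
    frames_forward: int   # how many later frames may be searched


def adapt_search(standard: SearchConfig, low_power: bool,
                 buffer_frames: int) -> SearchConfig:
    """Narrow the search window allowed by the standard to fit current constraints."""
    # A buffer constraint limits how many past frames are available at all.
    back = min(standard.frames_back, buffer_frames)
    forward = standard.frames_forward
    if low_power:
        # Illustrative policy: search backwards only, at most 3 frames.
        back = min(back, 3)
        forward = 0
    return SearchConfig(back, forward)
```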
Focusing on operations of the encoder, the encoding operation consists of the forward encoding path 310 and an inverse decoding path 320. Following a typical H.264 encoding operation, input media data, such as a video frame, is divided into smaller blocks of pixels or samples. In one embodiment, input media data is processed in units of a macroblock (MB) corresponding to 16×16 displayed pixels.
In the encoder, the forward encoding path 310 predicts each macroblock using intra-prediction or inter-prediction. In intra-prediction mode, spatial correlation is used in each macroblock to reduce the amount of transmission data necessary to represent an image. In turn, redundancies in a frame are removed without comparing with other media frames. Conversely, in inter-prediction mode, redundancies are removed by comparing with other media frames.
The encoder 120 then searches for a block of pixels similar to the current macroblock, known as a reference block. The reference block is identified and subtracted from the current macroblock to form a residual macroblock, or prediction error. Identification of the similar block is known as motion estimation. A memory (frame buffer 354) stores the reference block and other reference blocks. The motion prediction block or stage 350 searches the memory for a reference block that is similar to the current macroblock.
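The search and subtraction described above can be sketched as a plain full-search block matcher. The sketch assumes a sum of absolute differences (SAD) as the similarity measure and a single reference frame; the disclosure does not name a particular cost metric, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # frame[y][x] holds one sample value


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between two n-by-n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur: Frame, ref: Frame, cx: int, cy: int, n: int,
                search_range: int) -> Tuple[Tuple[int, int], Optional[int]]:
    """Try every offset within +/-search_range and return (motion vector, cost)."""
    height, width = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def residual(cur: Frame, ref: Frame, cx: int, cy: int,
             mv: Tuple[int, int], n: int) -> List[List[int]]:
    """Prediction error: current block minus the selected reference block."""
    dx, dy = mv
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(n)] for j in range(n)]
```

When the current block is an exact copy of a displaced region of the reference frame, the search returns that displacement with a cost of zero and the residual is all zeros, which is the best case for compression.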
Once a reference block is selected, the reference block is identified by a motion vector MV and the prediction error during motion compensation 352. The residual macroblock and motion vectors are transformed (in DCT stage 356), quantized (in quantizer stage 358), and encoded (in entropy encoder stage 360) before being output.
The transformation is used to compress the image in Inter-frames or Intra-frames. The quantization stage 358 reduces the amount of information by dividing each coefficient by a particular number to reduce the quantity of possible values that the coefficient could have. Because the values then fall into a narrower range, the entropy coding 360 can express them more compactly. The entropy encoder 360 removes the redundancies in the final bit-stream, such as recurring patterns in the bit-stream.
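The effect of the quantization step can be seen in a few lines. This is a simplified sketch that uses plain division and rounding with a single step size, as the description above suggests; practical codecs typically use more elaborate integer arithmetic, and the function names and step value here are illustrative only.

```python
from typing import List


def quantize(coeffs: List[int], step: int) -> List[int]:
    """Divide each coefficient by the step size and round to an integer level."""
    return [round(c / step) for c in coeffs]


def dequantize(levels: List[int], step: int) -> List[int]:
    """Re-scale the levels; the rounding loss is not recovered."""
    return [level * step for level in levels]


coeffs = [52, -31, 7, 3, -2, 0]
levels = quantize(coeffs, 10)          # [5, -3, 1, 0, 0, 0]
restored = dequantize(levels, 10)      # [50, -30, 10, 0, 0, 0]
```

Six distinct input values collapse to four distinct levels, with the small coefficients all becoming zero, which is what lets the entropy coder represent them compactly.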
In parallel, the quantized data are re-scaled (in inverse quantizer stage 359) and inverse transformed (in inverse DCT stage 357) and added to the prediction macroblock to reconstruct a coded version of the media frame which is stored for later predictions in the frame buffer 354.
Motion estimation can potentially use a very large number of memory accesses for determining a reference block. For an input frame, the frame is segmented into multiple macroblocks which are reduced to sets of motion vectors. Accordingly, one whole frame is reduced into many sets of motion vectors.
To illustrate, a high definition television (HDTV) video comprises 1920×1080 pixel pictures per second, for example. A common block size can be, for example, a 16×16 block of pixels. Therefore, an exhaustive search may not be practical, especially for encoding in real time. In one approach, the encoder 300 may limit the search for samples of the current macroblock by reducing a search area. Although the foregoing may be faster than an exhaustive search, this can also be time-consuming and computationally intense.
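The scale of the problem can be checked with simple arithmetic. The sketch below counts 16×16 macroblocks in a 1920×1080 frame and the number of pixel comparisons an exhaustive search would need against a single reference frame. The ±16 pixel search range is an assumed value chosen for illustration, and clipping at frame edges is ignored.

```python
import math


def macroblocks_per_frame(width: int, height: int, mb: int = 16) -> int:
    """Number of macroblocks, rounding partial blocks at the edges up."""
    return math.ceil(width / mb) * math.ceil(height / mb)


def full_search_comparisons(width: int, height: int, search_range: int,
                            mb: int = 16) -> int:
    """Pixel comparisons for an exhaustive search over one reference frame."""
    candidates = (2 * search_range + 1) ** 2   # offsets tried per macroblock
    return macroblocks_per_frame(width, height, mb) * candidates * mb * mb


blocks = macroblocks_per_frame(1920, 1080)               # 120 * 68 = 8160
comparisons = full_search_comparisons(1920, 1080, 16)    # about 2.27 billion
```

Under these assumptions a single frame needs 8,160 macroblock searches and roughly 2.27 billion pixel comparisons per reference frame, before multiplying by the frame rate or by the number of reference frames searched.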
Referring now to
As an illustration, consider an upstream encoder that encodes raw video input. For each input block of a video frame, the upstream encoder will search neighboring frames in the inter-prediction stage (or the same frame in an intra-prediction stage) for a reference block. In an exhaustive search, the upstream encoder does not know which motion vector to send until all possible frames and blocks have been checked in all possible directions. Once the best matches have been determined and the residuals computed, a motion vector output can be generated and sent downstream to a downstream transcoder. At the receiving decoder, the output stream from the upstream encoder is decoded into raw data once again and supplied to the downstream encoder of the transcoder 120. The downstream encoder, however, may not currently be capable of performing an exhaustive search, as carried out by the upstream encoder, and therefore may not be capable of producing a high-quality compressed stream, but for the existence of the feed forward encode data 105 provided from the upstream encoder.
In particular, an embodiment of the upstream encoder 110 extracts results of its search operations and provides them to the downstream encoder 124 as one possible form of feed forward encode data 105. Based on the feed forward encode data 105, the encoder 124 may be able to identify the best match for a current block, since the search operation was previously performed by the upstream encoder 110 and the results of the search are now provided to the downstream encoder 124 as part of feed forward encode data 105. Further, due to the constraints on the downstream encoder 124, the encoder may only be able to search for neighboring blocks within a set distance or search area from the current block. Therefore, the best match, as indicated in the feed forward encode data 105, may not be within the search area. However, the fourth best match (in the exhaustive search area) may be within the search area (being utilized by the current encoder) and may be selected as the best match for the current encode operation. Basically, the feed forward encode data 105 may allow the downstream encoder 124 to limit its motion estimation searching and still produce a high-quality output quickly, because a full search is avoided.
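The selection described above can be sketched as follows, assuming the feed forward encode data carries the upstream encoder's candidate motion vectors ranked from best to worst. That ranked-list representation, the square search area, and the function name pick_candidate are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

MotionVector = Tuple[int, int]  # (dx, dy) offset from the current block


def pick_candidate(ranked: List[MotionVector],
                   search_range: int) -> Optional[MotionVector]:
    """Return the best upstream candidate that lies inside the local search area."""
    for dx, dy in ranked:
        if abs(dx) <= search_range and abs(dy) <= search_range:
            return (dx, dy)
    # No upstream candidate is reachable; the caller must search locally.
    return None


# Upstream results, best match first (illustrative values).
ranked = [(40, -3), (-25, 12), (18, 18), (4, -2), (1, 0)]
choice = pick_candidate(ranked, search_range=8)
```

Here the three best upstream matches lie outside a ±8 search area, so the fourth best match, (4, -2), is selected without the downstream encoder running a search of its own.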
Consider an embodiment in which the transcoder 120 is integrated as part of a personal device, such as a tablet, that does not have processing power or battery power comparable to that of the upstream encoder. Using the feed forward encode data, however, the tablet device may provide a compressed video stream that is comparable with that provided by the more powerful upstream encoder. In a manner of speaking, use of the feed forward encode data 105 can appear to increase the processing speed of the encoder 124 acting on the data.
Referring back to
Next,
In step 608, the encoder 124 initiates execution of an encoding process on the raw media data, where the encoding process contains multiple stages in a pipeline arrangement that are to be completed. During the encoding process, information is extracted from the feed forward encode data and used to assist in completion of a respective task by a particular stage, in step 610. For example, during a DCT transform stage, coefficient values or weights previously used in computing an output transform of an input signal by an upstream encoder are reused in completing a DCT transform stage in the current encoding process. Also, during a motion prediction stage, the results of comparisons completed in a motion prediction stage by an upstream encoder may also be extracted and used by a motion prediction stage in the current encoding process. In step 612, a second primary encoded media stream is output from the encoder 124. Further, in some embodiments, the encoder continues to pass or output feed forward encode data downstream that has been used in the encoding process, in step 614.
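The flow of steps 608 through 614 can be outlined as a small pipeline runner. This is only a sketch of the control flow: the stage functions are toy placeholders, and the representation of feed forward encode data as a dictionary keyed by stage name is an assumption, not a format defined by the disclosure.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the data so far plus any upstream hints for that stage.
Stage = Callable[[Any, Dict[str, Any]], Any]


def motion_prediction(data: Any, hints: Dict[str, Any]) -> Any:
    # Reuse the upstream result when supplied; otherwise fall back to a default.
    mv = hints.get("motion_vector", (0, 0))
    return {"samples": data, "motion_vector": mv}


def entropy_stage(data: Any, hints: Dict[str, Any]) -> bytes:
    # Placeholder for real entropy coding.
    return repr(data).encode("ascii")


def encode(raw: Any, stages: List[Tuple[str, Stage]],
           feed_forward: Dict[str, Dict[str, Any]]) -> Tuple[Any, Dict[str, Dict[str, Any]]]:
    data = raw
    for name, stage in stages:                 # step 608: run the staged process
        hints = feed_forward.get(name, {})     # step 610: extract per-stage data
        data = stage(data, hints)
    # Step 612: output the encoded stream; step 614: pass the data downstream.
    return data, feed_forward


ff = {"motion_prediction": {"motion_vector": (3, -1)}}
pipeline = [("motion_prediction", motion_prediction), ("entropy", entropy_stage)]
encoded, passed_on = encode([1, 2, 3], pipeline, ff)
```

Stages that have no entry in the feed forward data simply receive an empty set of hints and run as they would without assistance.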
Electronic device 700 also includes a primary or main memory 706, such as random access memory (RAM). Main memory 706 has stored therein control logic 728A (computer software), and data.
Electronic device 700 also includes one or more secondary storage devices 710. Secondary storage devices 710 include, for example, a hard disk drive 712 and/or a removable storage device or drive 714, as well as other types of storage devices, such as memory cards and memory sticks. For instance, electronic device 700 may include an industry standard interface, such as a universal serial bus (USB) interface for interfacing with devices such as a memory stick. Removable storage drive 714 represents a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup, etc. As shown in
Removable storage drive 714 interacts with a removable storage unit 716. Removable storage unit 716 includes a computer useable or readable storage medium 724 having stored therein computer software 728B (control logic) and/or data. Removable storage unit 716 represents a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, or any other computer data storage device. Removable storage drive 714 reads from and/or writes to removable storage unit 716 in a well known manner.
Electronic device 700 further includes a communication or network interface 718. Communication interface 718 enables the electronic device 700 to communicate with remote devices. For example, communication interface 718 allows electronic device 700 to communicate over communication networks or mediums 742 (representing a form of a computer useable or readable medium), such as LANs, WANs, the Internet, etc. Network interface 718 may interface with remote sites or networks via wired or wireless connections.
Control logic 728C may be transmitted to and from electronic device 700 via the communication medium 742. Any apparatus or manufacture comprising a computer useable or readable medium having control logic (software) stored therein is referred to herein as a computer program product or program storage device. This includes, but is not limited to, electronic device 700, main memory 706, secondary storage devices 710, and removable storage unit 716. Such computer program products, having control logic stored therein that, when executed by one or more data processing devices, cause such data processing devices to operate as described herein, represent embodiments of the present disclosure.
Electronic device 700 may be implemented in association with a variety of types of display devices. For instance, electronic device 700 may be one of a variety of types of media devices, such as a stand-alone display (e.g., a television display such as flat panel display, etc.), a computer, a tablet, a smart phone, a game console, a set top box, a digital video recorder (DVR), a networking device (e.g., a router, a switch, etc.), a server, or other electronic device mentioned elsewhere herein, etc. Media content that is delivered in two-dimensional or three-dimensional form according to embodiments described herein may be stored locally or received from remote locations. For instance, such media content may be locally stored for playback (replay TV, DVR), may be stored in removable memory (e.g. DVDs, memory sticks, etc.), may be received on wireless and/or wired pathways through a network such as a home network, through Internet download streaming, through a cable network, a satellite network, and/or a fiber network, etc. For instance,
Video-image camera 140 may include an image sensor device and image processor and/or additional/alternative elements. The video-image camera 140 captures video images, and generates corresponding video data that is output on a video data signal. In an embodiment, the video data signal contains the video data that is output on an image processor output signal, including processed pixel data values that correspond to images captured by the image sensor device. The video data signal may include video data captured on a frame-by-frame basis or other basis. In an embodiment, the video data signal may include video data formatted as Bayer pattern data or in another image pattern data type known in the art.
Any process descriptions or blocks in flow charts should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of an embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present disclosure.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the present disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of the present disclosure and protected by the following claims.
Claims
1. A video transcoding system that processes encoded media data generated by a primary encoder, the video transcoding comprising:
- at least one decoder that produces decoded media data by decoding the encoded media data generated by the primary encoder;
- at least one secondary encoder that receives the decoded media data from the at least one decoder;
- the at least one secondary encoder also receiving feed forward encode data generated by the primary encoder; and
- the at least one secondary encoder that uses the feed forward encode data to assist in encoding the decoded media data.
2. The video transcoding system of claim 1, further comprising:
- a camera that produces an imaging output; and
- the at least one secondary encoder producing an encoded output that is related to the imaging output.
3. The video transcoding system of claim 1, wherein the at least one secondary encoder limits a search area size performed in a motion prediction stage of an encoding process, wherein the at least one secondary encoder utilizes the feed forward encode data to assist in the motion prediction stage, wherein the primary encoder that generated the feed forward encode data performed the motion prediction stage for a greater search area size.
4. The video transcoding system of claim 1, wherein the feed forward encode data comprises motion vectors computed by the primary encoder.
5. The video transcoding system of claim 1, wherein the feed forward encode data comprises configuration settings used in completing at least one encoding process stage.
6. The video transcoding system of claim 1, wherein the at least one secondary encoder passes the feed forward encode data along with second encoded media data as outputs.
7. A method used by an encoder for encoding media data, the method comprising:
- receiving the media data;
- receiving feed forward encode data; and
- using the feed forward encode data to assist in the encoding of the media data.
8. The method of claim 7, wherein the feed forward encode data is utilized to limit a search area size performed in a motion prediction stage of the encoding of the media data, wherein an upstream encoder that generated the feed forward encode data performed the motion prediction stage for a greater search area size.
9. The method of claim 7, further comprising:
- passing the feed forward encode data along with encoded media data as outputs.
10. The method of claim 7, wherein the feed forward encode data comprises motion vectors from an upstream encoder.
11. The method of claim 7, wherein the feed forward encode data comprises configuration settings used in completing at least one upstream encoding process stage.
12. A video processing system that operates on source encoded media generated by a source encoder, the video processing system comprising:
- a transcoding system having at least one decoder and at least one secondary encoder;
- the at least one decoder of the transcoding system receives the source encoded media generated by the source encoder;
- the at least one decoder of the transcoding system processes the source encoded media to generate decoded media;
- the at least one secondary encoder of the transcoding system processes the decoded media to generate secondary encoded media;
- a camera, coupled to the transcoding system, that produces an imaging output; and
- at least a portion of the transcoding system processing the imaging output of the camera to generate encoded imaging output.
13. The video processing system of claim 12, wherein the at least one secondary encoder of the transcoding system uses feed forward encode data produced by the source encoder to generate the secondary encoded media.
14. The video processing system of claim 12, wherein the imaging output of the camera is encoded and delivered to the at least one decoder.
15. The video processing system of claim 12, wherein the imaging output of the camera is raw and delivered to the at least one decoder.
16. A method used by an encoder that operates on media data, the method comprising:
- receiving the media data;
- generating an encoded media data output to be consumed by a downstream decoder; and
- generating a feed forward encode data output to be consumed by a downstream encoder.
17. The method of claim 16, wherein the feed forward encode data comprises motion vectors from an upstream encoder.
18. The method of claim 16, wherein the feed forward encode data comprises configuration settings used in completing at least one upstream encoding process stage.
19. The method of claim 16, wherein the feed forward encode data is utilized to limit a search area size performed in a motion prediction stage of encoding of the media data at the downstream encoder, wherein the encoder that generated the feed forward encode data performed the motion prediction stage for a greater search area size.
20. The method of claim 16, wherein the encoder concurrently generates the feed forward encode data and the encoded media data output.
Type: Application
Filed: Dec 7, 2011
Publication Date: Jan 24, 2013
Applicant: BROADCOM CORPORATION (Irvine, CA)
Inventor: James D. Bennett (Hroznetin)
Application Number: 13/313,345
International Classification: H04N 7/32 (20060101); H04N 7/26 (20060101);