SYSTEM AND METHOD FOR VIDEO ERROR CONCEALMENT
The present invention provides, in one embodiment, a system and method for concealing video errors. The system encodes, reorders, and packetizes video information into video data packets for transmission over a communication network such that the system conceals errors caused by lost video data packets when the system receives, depacketizes, orders, and decodes the data packets. In one embodiment, the system and method encodes and packetizes video information, such that adjacent macroblocks are not placed in the same video data packets. Additionally, the system and method may provide information accompanying the video data packets to facilitate the decoding process. An advantage to such a scheme is that errors due to video data packet loss are spatially distributed over a video frame. Thus, if regions of data surrounding a lost macroblock are successfully decoded, the decoder may predict motion vectors and spatial content with a higher degree of accuracy, which leads to higher video quality.
This application is a continuation application of U.S. patent application Ser. No. 10/226,504, filed Aug. 23, 2002, which is incorporated by reference in its entirety, and to which priority is claimed. This application also claims the benefit of Provisional Patent Application Ser. No. 60/314,413, filed Aug. 23, 2001, entitled which is also incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to video communication, and more particularly to video error concealment.
2. Description of Related Art
Video images have become an increasingly important part of global communication. In particular, video conferencing and video telephony have a wide range of applications such as desktop and room-based conferencing, video over the Internet and over telephone lines, surveillance and monitoring, telemedicine, and computer-based training and education. In each of these applications, video and accompanying audio information is transmitted across telecommunication links, including telephone lines, ISDN, DSL, and radio frequencies.
A standard video format used in video conferencing is Common Intermediate Format (CIF), which is part of the International Telecommunications Union (ITU) H.261 videoconferencing standard. The primary CIF format is also known as Full CIF or FCIF.
Additional formats with resolutions higher and lower than FCIF have also been established.
Presently, efficient transmission and reception of video signals may require encoding and compression of video and accompanying audio data. Video compression coding is a method of encoding digital video data such that it requires less memory to store the video data and reduces required transmission bandwidth. Certain compression/decompression (CODEC) schemes are frequently used to compress video frames to reduce required transmission bit rates. Thus, CODEC hardware and software allow digital video data to be compressed into a smaller binary format than required by the original (i.e., uncompressed) digital video format.
Several conventional approaches and standards to encoding and compressing source video signals exist. Some standards are designed for a particular application such as JPEG (Joint Photographic Experts Group) for still images, and H.261, H.263, MPEG (Moving Pictures Experts Group), MPEG-2 and MPEG-4 for moving images. These coding standards, typically, use block-based motion-compensated prediction on 16×16 pixels, commonly referred to as macroblocks. A macroblock is a unit of information containing four 8×8 blocks of luminance data and two corresponding 8×8 blocks of chrominance data in accordance with a 4:2:0 sampling structure, where the chrominance data is subsampled 2:1 in both vertical and horizontal directions.
As a practicality, audio data also must be compressed, transmitted, and synchronized along with the video data. Synchronization, multiplexing, and protocol issues are covered by standards such as H.320 (ISDN-based video conferencing), H.324 (POTS-based video telephony), and H.323 (LAN or IP-based video conferencing). H.263 (or its predecessor, H.261) provides the video coding part of these standards groups.
A motion estimation and compensation scheme is one conventional method typically used for reducing transmission bandwidth requirements for a video signal. Because the macroblock is the basic data unit, the motion estimation and compensation scheme may compare a given macroblock in a current video frame with the given macroblock's surrounding area in a previously transmitted video frame, and attempt to find a close data match. Typically, a closely matched macroblock in the previously transmitted video frame is spatially offset from the given macroblock by less than a width of the given macroblock. If a close data match is found, the scheme subtracts the given macroblock in the current video frame from the closely matched, offset macroblock in the previously transmitted video frame so that only a difference (i.e., residual) and the spatial offset need to be encoded and transmitted. The spatial offset is commonly referred to as a motion vector. If the motion estimation and compensation process is efficient, the remaining residual macroblock should contain only an amount of information necessary to describe data associated with pixels that change from the previous video frame to the current video frame and a motion vector. Thus, areas of a video frame that do not change (e.g., the background) are not encoded and transmitted.
Conventionally, the H.263 standard specifies that the motion vectors used for motion estimation and motion compensation be differentially encoded. Although differential encoding reduces data amounts required for transmission, any error in which motion vector data is lost or corrupted for one macroblock negatively impacts adjacent macroblocks. The result is a propagation of error due to the corrupted data which leads to lower video quality.
When preparing video frame information for transmission over a packet switched communication network, encoding schemes transform the video frame information, compressed by motion estimation and compensation techniques, into data packets for transmission across a communication network. Although data packets allow for greater transmission efficiency, lost, corrupted, or delayed data packets can also introduce errors resulting in video quality degradation. Alternatively, video data may be transmitted on heterogeneous communications networks in which one of the endpoints is associated with a circuit-switched network and a gateway or other packet-switched to circuit switched network bridging device is used.
Currently, lost or corrupted data packets often cause reduced video quality. Therefore, there is a need for a system and method which organizes and transmits data packets in order to conceal errors caused by data packet loss.
SUMMARY OF THE INVENTION
The present system and method overcome or substantially alleviate prior problems associated with packet loss of video data. In general, the present invention provides a system and method that encodes, reorders, and packetizes video information for transmission across a packet switched network with a capability to conceal video error caused by video data packet loss.
In an exemplary embodiment, video signals are encoded into sets of macroblocks. A macroblock reordering engine then assigns integer labels called macroblock group identifiers (MBGIDs) to each macroblock. Advantageously, adjacent macroblocks are not assigned identical MBGIDs in one exemplary embodiment. A macroblock packetization engine then enables packetizing of the macroblocks, such that macroblocks assigned identical MBGIDs are packetized together. For embodiments of the invention in which adjacent macroblocks are not assigned identical MBGIDs, it follows that spatially adjacent macroblocks are not packetized together. Additionally, corresponding data, such as an intra-macroblock map, may be incorporated in a picture header or conveyed by some other mechanism to facilitate a corresponding decoding process.
In yet another embodiment of the invention, when an image processing engine receives data packets containing encoded macroblocks, the data packets are depacketized, and the encoded macroblocks are ordered and decoded. In an alternate embodiment, the image processing engine depacketizes the received data packets, then decodes the macroblocks in an order in which they were received to reduce processing delay. If one or more data packets are lost, data accompanying the macroblocks of successfully transmitted data packets are used to attenuate effects of the lost data packets. Various methods based on whether the lost macroblocks were intra-coded or inter-coded compensate for the missing macroblocks. Upon compensation, the video signal may then be displayed. As a result, the present system and method is capable of concealing video errors resulting from data packet loss.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention conceals errors in video signals caused by data packet loss. The present system and method departs from existing technologies by packetizing macroblocks in a flexible (e.g., non-raster-scan) order in a video frame. In contrast to existing video coding standards, macroblocks are packetized in an order specified by a macroblock reorder pattern. In addition, motion vectors for each macroblock may be non-differentially encoded. These improvements seek to attenuate the disturbances caused by data packet loss across a communication link. The scope of the present invention covers a variety of video standards, including, but not limited to, H.261, H.263, H.264, MPEG, MPEG-2, and MPEG-4.
In addition, the coding engine 402 encodes (i.e., compresses) each macroblock to reduce the number of bits used to represent data content. Each macroblock may be “intra-coded” or “inter-coded,” and a frame may be comprised of any combination of intra-coded and inter-coded macroblocks. Inter-coded macroblocks are encoded using temporal similarities (i.e., similarities that exist between a macroblock from one frame and a closely matched macroblock from a previous frame). Specifically, a given inter-coded macroblock comprises encoded differences between the given macroblock and a closely matched macroblock from a previous video frame. The closely matched macroblock from the previous video frame may comprise data associated with pixels that are offset from the pixels associated with the given macroblock. Alternatively, intra-coded macroblocks are encoded without use of information from other video frames in a manner similar to that employed by the JPEG still image encoding standard.
For example, to determine if a given macroblock may be encoded as an inter-coded macroblock, the coding engine 402 computes differences between data of the given macroblock of a current video frame with data of a macroblock from a previous video frame (referred to as an offset macroblock), where the differences may be realized, for example, by a mean-absolute error or a mean-squared error between data corresponding to pixels located at co-located positions within the macroblocks. For the given macroblock, the coding engine 402 computes errors for a plurality of offset macroblocks. If the coding engine 402 only finds errors greater than a predetermined difference threshold value, then significant similarities do not exist between data from the given macroblock and data from the previous frame, and the macroblock is intra-coded. However, if one error is found to be less than the predetermined difference threshold value for the given macroblock and a given offset macroblock from the previous frame, then the given macroblock is inter-coded.
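By way of illustration only, the mode decision described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the coding engine 402 itself: the function names, the search range, and the threshold value are hypothetical, frames are modeled as lists of rows of luminance samples, and the error measure is a mean-absolute error over one 16×16 macroblock.

```python
MB = 16  # macroblock size in luminance pixels


def mean_abs_error(cur, prev, row, col, dy, dx):
    """Mean-absolute error between the macroblock at (row, col) in the current
    frame and the macroblock offset by (dy, dx) in the previous frame."""
    total = 0
    for y in range(MB):
        for x in range(MB):
            total += abs(cur[row + y][col + x] - prev[row + dy + y][col + dx + x])
    return total / (MB * MB)


def choose_coding_mode(cur, prev, row, col, search=4, threshold=4.0):
    """Return ("inter", motion_vector, error) when some offset macroblock has an
    error below the threshold, otherwise ("intra", None, best_error).

    `search` and `threshold` are hypothetical tuning values."""
    height, width = len(prev), len(prev[0])
    best_err, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = row + dy, col + dx
            # Skip offset macroblocks that fall outside the previous frame.
            if r < 0 or c < 0 or r + MB > height or c + MB > width:
                continue
            err = mean_abs_error(cur, prev, row, col, dy, dx)
            if best_err is None or err < best_err:
                best_err, best_mv = err, (dy, dx)
    if best_err is not None and best_err < threshold:
        return "inter", best_mv, best_err
    return "intra", None, best_err
```

The returned offset plays the role of the motion vector from the given macroblock to the offset macroblock. An exhaustive search is used here only for clarity; a practical encoder may use a faster search strategy.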
To inter-code the given macroblock, the coding engine 402 subtracts the given macroblock's data from the offset macroblock's data (i.e., luminance and chrominance data associated with a pixel of the given macroblock is subtracted from luminance and chrominance data associated with a corresponding pixel of the offset macroblock for every pixel) to give difference data, encodes the difference data using standard coding techniques such as Discrete Cosine Transforms and quantization methods among others, determines an offset vector from the given macroblock to the offset macroblock (referred to as a motion vector), and encodes the motion vector.
Presently, video coding standards, such as H.261 and H.263, specify that motion vectors of inter-coded macroblocks be differentially encoded to improve coding efficiency. However, differential encoding causes errors created by lost or corrupted motion vector data to propagate to adjacent macroblocks that would otherwise be decoded without error, since encoded motion vector data associated with a given macroblock is, in general, not independent of the motion vector data of neighboring macroblocks. Thus, the effects of the motion vector data of a given macroblock are not spatially localized to the given macroblock. However, if the motion vectors of each inter-coded macroblock are non-differentially encoded, then the effects of the motion vector data are localized to the given macroblock, resulting in a significant increase in error resilience. In most cases, a change in motion vector coding method from a differential to a non-differential technique results in a small loss in overall coding efficiency (typically less than a few percent). Advantageously, the motion vector components associated with each inter-coded macroblock, contrary to conventional methods, are not differentially encoded, according to one embodiment of the present invention.
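The difference in error propagation can be illustrated with a small sketch. It assumes a deliberately simplified predictor in which each motion vector is coded as a difference from the previously coded vector; this is not the predictor defined by any particular standard, and all names are hypothetical. A lost entry is modeled as `None`, and the hypothetical decoder substitutes zero for it.

```python
def encode_differential(mvs):
    """Code each vector as a difference from the previous vector.
    The predictor starts at (0, 0). Simplified, hypothetical scheme."""
    out, pred = [], (0, 0)
    for mv in mvs:
        out.append((mv[0] - pred[0], mv[1] - pred[1]))
        pred = mv
    return out


def decode_differential(deltas):
    """Rebuild vectors from differences; a lost difference (None) is treated
    as (0, 0), so every later vector inherits the resulting error."""
    out, pred = [], (0, 0)
    for d in deltas:
        if d is None:
            d = (0, 0)
        pred = (pred[0] + d[0], pred[1] + d[1])
        out.append(pred)
    return out


def decode_independent(coded):
    """Non-differential decoding: a lost vector (None) becomes (0, 0) and
    affects only its own macroblock."""
    return [(0, 0) if mv is None else mv for mv in coded]


def count_wrong(decoded, original):
    """Number of macroblocks whose decoded vector differs from the original."""
    return sum(1 for d, o in zip(decoded, original) if d != o)


original = [(1, 0), (2, 1), (2, 1), (3, -1), (0, 0)]

deltas = encode_differential(original)
deltas[1] = None                      # second macroblock's data is lost
damaged_diff = decode_differential(deltas)

independent = list(original)
independent[1] = None                 # same loss, non-differential coding
damaged_indep = decode_independent(independent)
```

In this example the single loss corrupts four of the five vectors under the differential scheme, but only the lost vector itself under the non-differential scheme.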
In another embodiment of the invention, the coding engine 402 may intra-code macroblocks of a frame using a “walk-around-refresh” mechanism. The “walk-around-refresh” mechanism is a deterministic mechanism to clean up reference frame mismatches, called data drift, by intra-coding a specific pattern of macroblocks for each frame. The coding engine 402 uses macroblocks of a reference frame as offset macroblocks in decoding inter-coded macroblocks of a current frame. In one embodiment of the invention, the “walk-around-refresh” mechanism is enabled to intra-code a pattern of macroblocks using an integer walk-around interval w selected from a set of predetermined integer walk-around intervals. For example, if w=47, then the coding engine 402 intra-codes every w-th macroblock. The walk-around interval may be selected based upon video data transmission rates and error rates. When the “walk-around-refresh” intra-coded macroblocks are received by the coding engine of the remote video conference station 204 (
Furthermore, the coding engine 402 may generate an intra-macroblock map that identifies which macroblocks in a coded video frame are intra-coded. After the intra-macroblock map is generated, the image processing engine 310 sends the map to the remote video conference station 204. The map may be sent as part of a picture header field associated with the coded video frame, for example, although other fields may be used.
According to the present invention, the coding engine 402 may generate the intra-macroblock map in one of two ways. In one embodiment of the invention, the coding engine 402 uses run-length encoding to describe locations of intra-coded macroblocks within the frame. Run-length encoding is a technique to reduce the size of a repeating string of characters. In another embodiment of the invention, the coding engine 402 generates a bitmap, where each bit in the bitmap corresponds to one macroblock of the frame. A bit's value identifies a corresponding macroblock's coding type. For example, in one embodiment of the invention, a “1” bit signifies that a corresponding macroblock is intra-coded. In another embodiment of the invention, a “1” bit signifies that the corresponding macroblock is inter-coded. Other methods for generating the intra-macroblock map may be contemplated for use in the present invention.
In yet another embodiment of the invention, the coding engine 402 selects the intra-macroblock map coding method that produces the fewest number of bits. For example, a 352×288 pixel (i.e., a 352 pixel horizontal resolution by 288 pixel vertical resolution) FCIF video frame comprises 396 macroblocks configured as a 22×18 macroblock matrix. Not including any bit overhead that may be required, the bitmap encoding method requires 396 bits (one bit for each macroblock). Thus, 396 bits are used to transmit the bitmap encoded intra-macroblock map, independent of the number of intra-coded macroblocks within the FCIF frame. In contrast, however, the number of bits utilized to transmit the run-length encoded intra-macroblock map is dependent upon the number of intra-coded macroblocks within the FCIF frame. The cost of transmitting a run-length encoded intra-macroblock map is eight bits per intra-coded macroblock (i.e., eight bits per run value), where the run value identifies a location of the intra-coded macroblock within the FCIF frame. Therefore, if the FCIF frame contains n intra-coded macroblocks, then 8n bits are required to transfer the run-length encoded intra-macroblock map.
Thus, if the FCIF frame contains fewer than 50 intra-coded macroblocks (n<50, so that 8n<396), then the source coding engine 402 selects the run-length encoding method; otherwise, the source coding engine 402 selects the bitmap encoding method. The selection of an intra-macroblock map encoding method depends upon the video format, of which the FCIF video frame is one example.
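The bit-count comparison above can be worked through in a short sketch. The function names are hypothetical, and the costs are those stated in the text: one bit per macroblock for the bitmap and eight bits per intra-coded macroblock for the run-length map, with any header overhead ignored.

```python
def macroblock_count(width, height, mb_size=16):
    """Number of macroblocks in a frame whose dimensions are multiples of mb_size."""
    return (width // mb_size) * (height // mb_size)


def bitmap_bits(num_macroblocks):
    """Bitmap cost: one bit per macroblock, regardless of coding type."""
    return num_macroblocks


def run_length_bits(num_intra, bits_per_run=8):
    """Run-length cost: one eight-bit run value per intra-coded macroblock."""
    return bits_per_run * num_intra


def select_map_encoding(num_intra, width=352, height=288):
    """Pick the intra-macroblock map encoding that needs fewer bits."""
    total = macroblock_count(width, height)
    if run_length_bits(num_intra) < bitmap_bits(total):
        return "run-length"
    return "bitmap"
```

For an FCIF frame, `macroblock_count(352, 288)` is 396, so 49 intra-coded macroblocks cost 392 bits with run-length encoding and 50 cost 400 bits, which is where the selection switches to the bitmap.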
Subsequently, the encoded macroblocks are forwarded to the macroblock reordering engine 404. The macroblock reordering engine 404 reorders the encoded macroblocks. Specifically, each macroblock is assigned a macroblock group identifier (MBGID) from a plurality of MBGIDs. In an exemplary embodiment, the macroblocks are numbered one to six according to an exemplary macroblock assignment pattern illustrated in
As will be discussed further below in conjunction with
The coding engine 402 (
Different reorder patterns and MBGIDs may be utilized according to the present invention. In one embodiment of the invention, the macroblock reordering engine 404 selects an MBGID based on video data rates and/or video format.
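As one illustration of a reorder pattern, the following sketch assigns six MBGIDs so that no macroblock shares an MBGID with any of its horizontal, vertical, or diagonal neighbors, and then groups macroblocks by MBGID for packetization. The formula is a hypothetical pattern chosen for this sketch; it is not asserted to be the assignment pattern of the figures, and the function names are likewise assumptions.

```python
from collections import defaultdict


def assign_mbgid(row, col, num_groups=6):
    """Hypothetical pattern: MBGIDs 1..num_groups, stepping by one along a row
    and by two between rows, so neighbouring macroblocks never match when
    num_groups is 6."""
    return (col + 2 * row) % num_groups + 1


def packetize(rows, cols, num_groups=6):
    """Group macroblock positions by MBGID; each group stands for the payload
    of one data packet (or one set of packets) for the frame."""
    packets = defaultdict(list)
    for r in range(rows):
        for c in range(cols):
            packets[assign_mbgid(r, c, num_groups)].append((r, c))
    return dict(packets)


def intact_neighbours(lost_mbgid, row, col, rows, cols):
    """Count the neighbours of a lost macroblock that were NOT in the lost group."""
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr, dc) != (0, 0) and 0 <= r < rows and 0 <= c < cols:
                if assign_mbgid(r, c) != lost_mbgid:
                    count += 1
    return count


# An FCIF frame is 18 rows by 22 columns of macroblocks.
packets = packetize(18, 22)
```

With this pattern, losing all macroblocks of one MBGID leaves every lost interior macroblock surrounded by eight successfully received neighbors, which is the property the concealment steps rely on.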
Referring back to
Subsequently, the data packets and picture header are forwarded to the communication buffer 408 for transmission across the network 206 (
Conversely, the image processing engine 310 also processes video data packets received from a remote location and provides video signals for display. Initially, video data packets are received by the communication interface 312 (
Subsequently, the coding engine 402 functions as a decoder, and determines whether a video data packet was lost in transit across the network 206.
Referring back to
If the lost macroblocks are intra-coded, then several error concealment techniques may be utilized. For example, if the lost macroblock is intra-coded as part of a “walk-around-refresh” mechanism, the coding engine 402 replaces the lost macroblock with contents of a “corresponding” macroblock from a previous frame, where two “corresponding” macroblocks cover the same spatial area of their respective frames. According to the present invention, the “walk-around-refresh” mechanism's clean up rate is a function of the data and error rates.
Alternatively, if a lost intra-coded macroblock is not coded as part of the “walk-around-refresh” mechanism, then the coding engine 402 spatially interpolates the contents of the lost macroblock from adjacent macroblocks. In one embodiment of the invention, each 8×8 block of the lost macroblock is spatially interpolated from the two nearest blocks located in adjacent macroblocks.
Similarly, to reconstruct data for an 8×8 upper right-hand block 750 of the lost macroblock 705, the coding engine 402 interpolates data in a first column of data 755 from an 8×8 upper left-hand block 760 of the right adjacent macroblock 720, and data in a last row of data 765 from an 8×8 lower right-hand block 770 of the upper adjacent macroblock 715. Other forms of interpolation may also be applied and other blocks of adjacent macroblocks may be utilized, and are within the scope of the present invention.
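One possible form of such an interpolation, for the upper right-hand block case, is sketched below. The text does not specify the interpolation formula, so the inverse-distance weighting used here is an assumption made for illustration, as are the function and parameter names. The inputs are the last row of the block above and the first column of the block to the right.

```python
BLOCK = 8  # block size in pixels


def interpolate_block(top_row, right_col):
    """Estimate a lost 8x8 block from the row of pixels directly above it and
    the column of pixels directly to its right.

    Each pixel is a weighted average of the sample above it and the sample to
    its right, with the nearer sample weighted more heavily (hypothetical
    inverse-distance weighting)."""
    block = []
    for y in range(BLOCK):
        d_top = y + 1                  # distance to the row above the block
        line = []
        for x in range(BLOCK):
            d_right = BLOCK - x        # distance to the column right of the block
            value = (top_row[x] * d_right + right_col[y] * d_top) / (d_top + d_right)
            line.append(round(value))
        block.append(line)
    return block
```

If both neighboring edges carry the same value, the reconstructed block is flat at that value; otherwise the block blends smoothly between the two edges.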
If the lost macroblock is inter-coded, then the coding engine 402 computes an estimate of the lost macroblock's motion vector by examining the motion vectors of adjacent macroblocks.
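A minimal sketch of one such estimate follows, using the median of three neighboring motion vectors that is mentioned for one embodiment elsewhere in this description. Taking the median separately for the horizontal and vertical components is an assumption of this sketch, as are the function name and the fallback to a zero vector when no neighbor is available.

```python
from statistics import median


def estimate_motion_vector(neighbour_mvs):
    """Estimate a lost macroblock's motion vector as the component-wise median
    of its neighbours' motion vectors. Falls back to (0, 0) when no neighbour
    vector is available (hypothetical fallback)."""
    if not neighbour_mvs:
        return (0, 0)
    return (
        median(mv[0] for mv in neighbour_mvs),
        median(mv[1] for mv in neighbour_mvs),
    )
```

For three neighbors with vectors (1, 4), (3, -2), and (2, 0), the estimate is (2, 0). Because neighboring macroblocks are carried in different data packets, these vectors are likely to be available even when the macroblock itself is lost.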
Once the lost macroblock's motion vector is estimated, the coding engine 402 (
Subsequently, in step 915, the coding engine 402 generates an intra-macroblock map that identifies locations of the intra-coded macroblocks. In one embodiment of the present invention, the intra-macroblock map is coded using either a run-length encoding method or a bitmap encoding method based upon the total number of bits required to code the intra-macroblock map.
Next, a macroblock reordering engine 404 (
Subsequently, the macroblock packetization engine 406 (
Next, the coding engine 402 (
For example, if the lost macroblock is intra-coded as part of the “walk-around-refresh” mechanism, then the coding engine 402 replaces the lost macroblock's content with the data content of a corresponding macroblock from a previous frame. Alternatively, if a lost intra-coded macroblock is not coded as part of the “walk-around-refresh” mechanism, then the lost macroblock's contents are spatially interpolated from nearest-neighbor adjacent macroblocks. In one embodiment of the present invention, the coding engine 402 uses a two-dimensional interpolation to interpolate data from adjacent macroblocks (
Alternatively, if the lost macroblock is inter-coded, then the coding engine 402 estimates the lost macroblock's motion vector by examining the motion vectors of adjacent macroblocks. In one embodiment of the invention, the motion vector is computed as a median of three neighboring macroblocks' motion vectors (
The invention has been explained above with reference to exemplary embodiments. It will be evident to those skilled in the art that various modifications may be made thereto without departing from the broader spirit and scope of the invention. Further, although the invention has been described in the context of its implementation in particular environments and for particular applications, those skilled in the art will recognize that the present invention's usefulness is not limited thereto and that the invention can be beneficially utilized in any number of environments and implementations. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims
1. A system for processing video data, comprising:
- a coding engine for processing each frame of a video signal to generate macroblocks and to encode the macroblocks;
- a macroblock reordering engine for assigning a macroblock group identifier (MBGID) from a plurality of MBGIDs to each encoded macroblock; and
- a macroblock packetization engine for placing each of the encoded macroblocks into a particular data packet according to the MBGID.
Type: Application
Filed: May 24, 2007
Publication Date: Oct 4, 2007
Applicant: POLYCOM, INC. (PLEASANTON, CA)
Inventors: MICHAEL HOROWITZ (Austin, TX), Rick Flott (Austin, TX)
Application Number: 11/753,465
International Classification: H04N 7/12 (20060101);