PICTURE ENCODING METHOD AND APPARATUS AND PICTURE DECODING METHOD AND APPARATUS
A picture encoding method includes receiving an input video signal, encoding the video signal using a reference picture signal to generate a video code stream, encoding the reference picture signal to generate a reference picture code stream, and multiplexing the video code stream with the reference picture code stream to generate an output code stream.
The present divisional application claims the benefit of priority under 35 U.S.C. §120 to application Ser. No. 10/661,697, filed Sep. 15, 2003, which is a Continuation Application of PCT Application No. PCT/JP03/00426, filed Jan. 20, 2003, which was not published under PCT Article 21(2) in English; and under 35 U.S.C. § 119 from Japanese applications Nos. 2002-010875, filed on Jan. 18, 2002, and 2003-010135, filed on Jan. 17, 2003, the entire contents of each are hereby incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a picture encoding method of compression-encoding a picture in a small number of bits and a picture decoding method of playing back a picture by decoding a code stream obtained by compression encoding and, more particularly, to a picture encoding method and apparatus and a picture decoding method and apparatus which can recover from the adverse effect of an error as quickly as possible without degrading the encoding efficiency when transmitting or storing encoded data through a transmission path susceptible to errors.
2. Description of the Related Art
Pictures must be compression-encoded in a small number of bits in order to be transmitted or stored in systems such as a videophone, video conference system, portable information terminal, digital video disk system, and digital TV broadcasting system.
As such compression encoding techniques, various schemes have been developed, including a motion compensation scheme, discrete cosine transform scheme, subband encoding scheme, pyramid encoding scheme, and combinations thereof. The following are defined as international standards for video compression encoding: ISO MPEG-1, MPEG-2, MPEG-4, ITU-T H.261, H.262, H.263, and the like.
All these schemes are compression encoding schemes based on a combination of motion compensation adaptive prediction and discrete cosine transform, which are described in detail in reference 1 (Hiroshi Yasuda, “MPEG/International Standardization of Multimedia Encoding”, Maruzen) and the like.
A conventional picture encoding/decoding apparatus has the following problems. First, in an error-prone communication path, such as a radio communication path, performing only encoding, without any measure against errors, leads to considerable deterioration in decoded picture quality upon occurrence of an error. When errors occur in signals such as a sync signal, mode information, and motion vector information, in particular, picture quality noticeably deteriorates.
Second, in motion compensation adaptive predictive encoding used for picture encoding, only the difference between frames is encoded. For this reason, if an error occurs, decoding of the corresponding frame fails, and an erroneous picture is stored in a frame memory. A predictive picture is generated by using the erroneous picture, and the residual error is added to this predictive picture. As a consequence, even if subsequent frames are properly decoded, proper pictures cannot be obtained from them until either information is sent in an encoding mode (INTRA mode) that encodes pictures only within frames without using the differences between frames, or the influence of the error gradually wanes and the original pictures are restored.
If the information of one frame, for example the second frame, is lost due to an error, that frame is not decoded at all and, for example, the first frame is output in its place. At the third frame, a residual error which allows proper decoding only when it is added to the second frame is added to the first frame. As a consequence, the third frame is decoded into a picture totally different from the proper picture. Subsequently, residual errors are added to wrong pictures. Basically, therefore, the error does not disappear, and proper decoded pictures cannot be played back.
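The drift described above can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: each "frame" is a single integer sample, there is no motion compensation, and the names `encode_residuals` and `decode_with_loss` are hypothetical, not taken from this document.

```python
def encode_residuals(frames):
    """Inter-frame coding of a toy sequence: the first frame is sent as-is,
    every later frame as the difference from its predecessor."""
    residuals = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        residuals.append(cur - prev)
    return residuals


def decode_with_loss(residuals, lost):
    """Decode the residuals. Indices in `lost` never arrive, so the decoder
    repeats its previous output and keeps it in the frame memory."""
    memory = 0
    output = []
    for i, residual in enumerate(residuals):
        if i not in lost:
            memory += residual  # residual is added to the stored picture
        output.append(memory)
    return output


frames = [10, 12, 15, 19]
residuals = encode_residuals(frames)        # [10, 2, 3, 4]
clean = decode_with_loss(residuals, set())  # [10, 12, 15, 19]
damaged = decode_with_loss(residuals, {1})  # [10, 10, 13, 17]
```

Once the second frame is lost, every later output stays offset by the missing residual, even though all later residuals arrive intact.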
In order to solve this problem, in the prior art, a technique called “refresh” is generally used, in which encoding is performed in the INTRA mode in a predetermined cycle. When encoding is performed in the INTRA mode, since the number of coded bits increases, the quality of a picture without any error greatly deteriorates. For this reason, a periodic refresh method or the like is usually used, which refreshes several macroblocks per frame instead of refreshing an entire frame at once. In this periodic refresh method, however, although an increase in the number of coded bits can be suppressed, a long period of time is required to recover a normal state.
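The recovery time of the periodic refresh method can be estimated with simple arithmetic. The sketch below assumes a QCIF picture (176x144 pixels), 16x16 macroblocks, three refreshed macroblocks per frame, and 15 frames per second; all of these figures are illustrative assumptions, and `refresh_cycle` is a hypothetical helper.

```python
import math


def refresh_cycle(width, height, mbs_per_frame, mb_size=16):
    """Return (total macroblocks, frames needed to refresh all of them once)."""
    cols = math.ceil(width / mb_size)
    rows = math.ceil(height / mb_size)
    total_mbs = cols * rows
    return total_mbs, math.ceil(total_mbs / mbs_per_frame)


# Assumed example: QCIF picture, 3 macroblocks refreshed per frame.
total_mbs, frames_needed = refresh_cycle(176, 144, mbs_per_frame=3)
seconds = frames_needed / 15.0  # assumed frame rate of 15 frames per second
```

For these assumed figures, 99 macroblocks at 3 per frame need 33 frames, or about 2.2 seconds, before the whole picture has been refreshed.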
Other measures against errors include a measure of using error correction codes. Although this scheme can correct errors that occur randomly, it has difficulty in coping with errors of several hundred bits that consecutively occur in a burst manner. Even if the scheme can cope with such errors, considerable redundancy occurs.
Techniques have been studied to receive error information and the like about a network from a system and adaptively process the error information and the like on the server side. More specifically, such a technique uses a method of performing re-encoding upon reception of error information or switching a plurality of files. In this method, the server needs to have an encoding function and a function of adaptively switching a plurality of files, resulting in extra processing.
As described above, according to the conventional picture encoding techniques, loss of information due to an error causes a great deterioration in picture quality. In addition, a technique such as the periodic refresh method of reconstructing information lost due to an error requires a long period of time to achieve error recovery in consideration of the encoding efficiency. Shortening the time required for recovery will increase the number of encoded bits to result in a deterioration in encoding efficiency.
BRIEF SUMMARY OF THE INVENTION
It is an object of the present invention to provide a picture encoding method and apparatus and a picture decoding method and apparatus which can quickly recover from an error even if information is lost by the error, exhibit high encoding efficiency, and need not perform any re-encoding.
According to a first aspect of the present invention, there is provided a picture encoding method which comprises receiving an input video signal, encoding the video signal using a reference picture signal to generate a video code stream, encoding the reference picture signal to generate a reference picture code stream, and multiplexing the video code stream with the reference picture code stream to generate an output code stream.
According to a second aspect of the present invention, there is provided a picture encoding apparatus comprising a receiving unit configured to receive an input video signal, a first encoding unit configured to encode the video signal by using a reference picture signal to generate a video code stream, a second encoding unit configured to encode the reference picture signal to generate a reference picture code stream, and a multiplexing unit configured to multiplex the video code stream and the reference picture code stream to generate an output code stream.
According to a third aspect of the present invention, there is provided a picture decoding method which comprises receiving an input code stream containing a video code stream obtained by encoding a video signal and a reference picture code stream obtained by encoding a reference picture signal, decoding the reference picture code stream contained in the input code stream to generate a first reference picture signal, and decoding the video code stream contained in the input code stream by selectively using one of a second reference picture signal obtained from a previous picture signal and the first reference picture signal to generate a playback picture signal.
According to a fourth aspect of the present invention, there is provided a picture decoding apparatus which comprises an input unit configured to receive an input code stream containing a video code stream obtained by encoding a video signal and a reference picture code stream obtained by encoding a reference picture signal, a first decoding unit configured to decode the reference picture code stream contained in the input code stream to generate a first reference picture signal, and a second decoding unit configured to decode the video code stream contained in the input code stream by selectively using one of a second reference picture signal obtained from a previous picture signal and the first reference picture signal to generate a playback picture signal.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING
An input video signal 131 is first divided into a plurality of predetermined areas by an area divider 101 and then subjected to the following motion compensation adaptive prediction. A motion compensation adaptive predictor 111 detects a motion vector 143 between an input picture signal 132 and a reference picture signal 141 of the previous frame, which is stored in a frame memory 110 and has already been encoded and locally decoded. Motion compensation is performed for the reference picture signal 141 by using this motion vector. This generates a predictive picture signal (the reference picture signal after motion compensation) 142. The motion compensation adaptive predictor 111 selects the more suitable of two prediction modes, namely the motion compensation prediction mode and the intra encoding mode (predictive picture signal = 0), in which the input picture signal 132 is encoded without any change, and outputs the predictive picture signal 142 corresponding to the selected prediction mode.
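Motion vector detection of the kind performed by the predictor 111 can be sketched as a full-search block match. The text does not specify the search method or the matching criterion, so the exhaustive search and the sum of absolute differences (SAD) used here are assumptions, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, n, search):
    """Try every displacement within +/- `search` pixels and return
    (dx, dy, cost) for the candidate with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the picture.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def predict_block(ref, bx, by, dx, dy, n):
    """Motion-compensated prediction: the displaced block of the reference."""
    return [[ref[by + dy + y][bx + dx + x] for x in range(n)] for y in range(n)]


# Toy reference picture and a current picture whose content is the reference
# shifted by 2 pixels horizontally and 1 pixel vertically.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
dx, dy, cost = find_motion_vector(cur, ref, bx=2, by=2, n=2, search=2)
prediction = predict_block(ref, 2, 2, dx, dy, 2)
```

With this toy data the search finds the displacement (2, 1) at zero cost, so the prediction equals the current block and the residual to be encoded is zero.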
A subtracter 102 subtracts the predictive picture signal 142 from the input picture signal 132 and outputs a predictive residual error signal 133. The predictive residual error signal 133 is subjected to discrete cosine transform (DCT) for each block having a given size in a first discrete cosine transformer 103. DCT coefficients 134 obtained by the discrete cosine transform are quantized by a first quantizer 104. A first variable length encoder 105 encodes quantized DCT coefficients 135 to obtain a DCT coefficient code stream 136. A multiplexer 106 multiplexes the DCT coefficient code stream 136 with a motion vector code stream 144 obtained by encoding motion vector information using a second variable length encoder 112. The resultant data is output as a video code stream 137.
On the other hand, the quantized DCT coefficients 135 are dequantized by a dequantizer 107 and then subjected to an inverse discrete cosine transform (inverse DCT) by an inverse discrete cosine transformer 108. An adder 109 adds an output 139 from the inverse discrete cosine transformer 108 to the predictive picture signal 142 to generate a local decoded picture signal 140. The local decoded picture signal 140 is stored as a reference picture signal in the frame memory 110.
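The forward path (subtraction, DCT, quantization) and the local decoding path (dequantization, inverse DCT, addition) can be sketched for a single block as follows. This is a minimal illustration, not the apparatus itself: it assumes a 4x4 block for brevity, an orthonormal DCT-II, and a single uniform quantization step, and it omits variable length encoding. All function names are hypothetical.

```python
import math

N = 4  # block size; 4 keeps the sketch short (an assumption)


def _scale(k):
    """Normalisation factor that makes the DCT orthonormal."""
    return math.sqrt((1.0 if k == 0 else 2.0) / N)


def _basis(i, k):
    """Sample i of the k-th DCT-II basis function."""
    return _scale(k) * math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """2-D DCT-II of an N x N block."""
    return [[sum(block[y][x] * _basis(y, u) * _basis(x, v)
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse of dct2."""
    return [[sum(coef[u][v] * _basis(y, u) * _basis(x, v)
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]


def encode_block(inp, pred, step):
    """Subtract the prediction, transform, and quantize with a uniform step."""
    residual = [[inp[y][x] - pred[y][x] for x in range(N)] for y in range(N)]
    return [[round(c / step) for c in row] for row in dct2(residual)]


def local_decode(levels, pred, step):
    """Dequantize, inverse-transform, and add the prediction back."""
    residual = idct2([[q * step for q in row] for row in levels])
    return [[pred[y][x] + residual[y][x] for x in range(N)] for y in range(N)]


inp = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
pred = [[50] * N for _ in range(N)]  # stand-in for the predictive picture signal
levels = encode_block(inp, pred, step=4)
recon = local_decode(levels, pred, step=4)
max_err = max(abs(recon[y][x] - inp[y][x]) for y in range(N) for x in range(N))
```

The locally decoded block differs from the input only by quantization error, which is why the encoder stores the locally decoded picture, and not the original, as the reference: it is the picture the decoder will actually have.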
The reference picture signal 141 of the previous frame output from the frame memory 110 is encoded by a reference picture encoding unit comprising blocks denoted by reference numerals 113 to 115. More specifically, the reference picture signal 141 is input to both the motion compensation adaptive predictor 111 and the second discrete cosine transformer 113. In the second discrete cosine transformer 113, the reference picture signal 141 is subjected to a discrete cosine transform (DCT) for each block having a predetermined size. The second quantizer 114 quantizes transform coefficients 145 obtained by this operation. The third variable length encoder 115 encodes the quantized transform coefficients. A code stream (to be referred to as a reference picture code stream hereinafter) 147 obtained by the third variable length encoder 115 is output as a frame different from the video code stream 137.
In contrast to the case shown in
With regard to a Timestamp indicating the display time of a frame or the like, it is preferable to describe in an R-picture the Timestamp of the frame that uses it, i.e., the Timestamp of the next frame. Assume that the reference picture code stream 147 is omitted due to an error or a frame using the reference picture code stream 147 is omitted due to an error. In this case, such a Timestamp is effective information to identify the association between the frame and the reference picture code stream 147. In addition, using the same code stream structure as that of a general frame eliminates the necessity of a special additional circuit, so a general circuit can be used.
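The Timestamp association can be sketched as a simple lookup. The `CodedFrame` record and its field names are hypothetical, chosen only for illustration; the document does not define a concrete data layout here.

```python
from dataclasses import dataclass


@dataclass
class CodedFrame:
    frame_type: str  # "I", "P", "B", or "R" (reference picture code stream)
    timestamp: int   # for "R": the Timestamp of the frame that uses it
    payload: bytes


def reference_for(stream, target_timestamp):
    """Return the R-picture whose Timestamp matches the frame that needs it,
    or None if no such R-picture was received."""
    for frame in stream:
        if frame.frame_type == "R" and frame.timestamp == target_timestamp:
            return frame
    return None


received = [
    CodedFrame("I", 0, b"i0"),
    # The P-picture with Timestamp 1 was lost in transmission.
    CodedFrame("R", 2, b"ref-for-2"),  # carries the reference needed at Timestamp 2
    CodedFrame("P", 2, b"p2"),
]
match = reference_for(received, 2)
```

Because the R-picture carries the Timestamp of the frame that uses it, the decoder can still pair the two even when neighbouring frames are missing.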
The use of the scheme of discriminating the modes in accordance with mode information in this manner can implement a recovery function by not only encoding a reference picture signal used in this embodiment but also intra-encoding, for example, the target frame itself, which is to be recovered from an error, and implementing redundancy. A recovery function can be implemented by encoding in advance, in the intra mode, a frame to be subjected to motion compensation adaptive predictive encoding, and designating only mode information in an R-Picture or the like. In this case, when it is determined on the transmission side to transmit an R-Picture, there is no need to send the code stream of a corresponding general frame (mainly a P-Picture or B-Picture). This embodiment is therefore useful for the effective use of a transmission path.
The basic arrangement of a picture decoding apparatus corresponding to the picture encoding apparatus according to this embodiment will be described with reference to
The reference picture code stream 241 demultiplexed from the input code stream by a header demultiplexing unit (not shown) is transformed into a reference picture signal 244 through a variable length decoder 209, dequantizer 210, and inverse discrete cosine transformer 211. This signal is then stored in a frame memory 208. Using the motion vector information 238, the motion compensation predictor 207 performs motion compensation for a reference picture signal 239 of the previous frame read out from the frame memory 208 to generate a predictive picture signal (a reference picture signal after motion compensation) 240. An adder 205 adds the predictive error signal 235 and the predictive picture signal 240 to generate a playback picture signal 236. The playback picture signal 236 is output to the outside of the apparatus and stored as a reference picture signal in the frame memory 208.
In this embodiment, the picture encoding apparatus sends out the information of a reference picture as a reference picture code stream to the transmission system or storage system independently of a video code stream. The picture decoding apparatus then decodes the reference picture code stream to reconstruct the information of the reference picture. This makes it possible to properly cope with the occurrence of an error. As described above, according to this embodiment, the picture recovery ability upon occurrence of an error can be improved.
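The recovery effect can be illustrated with a toy decoder. As a simplifying assumption, each picture is a single integer, a "P" entry carries a residual, and an "R" entry carries a complete copy of the reference picture for the next "P" entry; the tuple layout and the function name are hypothetical.

```python
def decode(stream, lost):
    """Decode a toy stream of (kind, value) entries.
    Entries whose index is in `lost` never reach the decoder."""
    memory = 0  # stands in for the frame memory
    output = []
    for i, (kind, value) in enumerate(stream):
        if i in lost:
            if kind == "P":
                output.append(memory)  # conceal by repeating the last picture
            continue
        if kind == "R":
            memory = value  # replace the stored reference picture
        else:
            memory += value  # add the residual to the reference
            output.append(memory)
    return output


# Original pictures: 10, 12, 15, 19. The R entry repeats the second picture,
# which is the reference that the third picture needs.
with_reference = [("P", 10), ("P", 2), ("R", 12), ("P", 3), ("P", 4)]
without_reference = [("P", 10), ("P", 2), ("P", 3), ("P", 4)]

recovered = decode(with_reference, lost={1})   # [10, 10, 15, 19]
drifting = decode(without_reference, lost={1})  # [10, 10, 13, 17]
```

Only the lost picture itself is affected when the reference picture code stream is available; without it, the error persists in every later picture.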
This effect will be further described below. Consider, for example, video encoding operation using a prediction like that shown in
In contrast to this, according to this embodiment, in the picture decoding apparatus shown in
In the arrangement of this embodiment, the total number of codes generated in the picture encoding apparatus increases by the extent to which a reference picture signal is separately encoded. This problem can be solved by outputting a reference picture code stream only when needed. For example, mode information indicating a reference picture code stream is written at the head of a frame as frame type information indicating the type of the frame. This mode information is analyzed on the picture encoding apparatus side to determine whether or not to output a reference picture code stream. In the normal mode, no reference picture code stream is output.
A determination result 432 from the additional information determination unit 401 is input to an additional information output determination unit 403. This unit determines in accordance with state information 433 indicating the current state whether or not to output the reference picture code stream 147. Assume that the state information 433 is information indicating whether or not an error is currently occurring. In this case, if an error is occurring, the additional information output determination unit 403 determines to output the reference picture code stream 147. In the normal state in which no error is occurring, the additional information output determination unit 403 determines not to output the reference picture code stream 147.
A determination result 434 from the additional information output determination unit 403 is transferred to an output unit 402. The output unit 402 outputs the reference picture code stream 147, contained in a code stream 435 input through the additional information determination unit 401, as an output code stream 436 in accordance with the determination result 434 from the additional information output determination unit 403. This makes it possible to adaptively output the reference picture code stream 147, and hence prevents unnecessary information from being output in the normal state without any error.
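The cooperation of the three units on the transmission side can be sketched as three small functions. This is a sketch only: the frame type label "R", the boolean state information, and the function names are assumptions made for illustration.

```python
def is_additional_information(frame_type):
    """Role of determination unit 401: flag reference picture code streams."""
    return frame_type == "R"


def should_output(is_additional, error_occurring):
    """Role of output determination unit 403: additional information is sent
    only while an error is occurring; ordinary frames are always sent."""
    return (not is_additional) or error_occurring


def output_unit(code_stream, error_occurring):
    """Role of output unit 402: emit only the frames that passed the check."""
    return [(frame_type, payload) for frame_type, payload in code_stream
            if should_output(is_additional_information(frame_type),
                             error_occurring)]


code_stream = [("I", b"i0"), ("R", b"r1"), ("P", b"p1")]
normal = output_unit(code_stream, error_occurring=False)
during_error = output_unit(code_stream, error_occurring=True)
```

In the normal state only the I and P entries are emitted; while an error is reported, the R entry is emitted as well.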
In the picture decoding apparatus shown in
A determination result 532 from the additional information determination unit 501 is input to a decoding method determination unit 503 to be used to determine whether or not to decode the reference picture code stream 241. Information indicating whether the current decoding operation is local decoding or an error has occurred is supplied as state information 533 to the decoding method determination unit 503. The decoding method determination unit 503 determines from the determination result 532 from the additional information determination unit 501 and the state information 533 whether or not to decode the reference picture code stream 241 contained in a code stream 535 input through the additional information determination unit 501. A decoding unit 502 performs decoding in accordance with a determination result 534 from the decoding method determination unit 503 and outputs a playback signal 536. With this operation, in the case of local decoding or the like, the picture decoding apparatus can be controlled not to decode additional information. In the normal state without any transmission error, for example, the reference picture code stream 241 is discarded by the decoding method determination unit 503 without being decoded. Assume that a frame to be referred to is omitted and a playback picture cannot be normally decoded because an error has occurred in the transmission path. In this case, since a reference picture required to decode the playback picture is not stored in the frame memory, the reference picture code stream 241 is decoded to replace the picture stored in the frame memory. This prevents a deterioration in the playback picture caused by the error. This apparatus can also use a technique of decoding a reference picture code stream and replacing the reference frame with the resultant data only when an error has occurred.
The reception side can also be configured to decode an entire reference picture code stream upon receiving it regardless of whether or not the reference frame is to be replaced.
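The decision made on the reception side can be sketched as a single function. The action labels, the parameter names, and the `eager` switch (modelling the variant that always decodes the reference picture code stream on receipt) are assumptions for illustration; in particular, how `eager` interacts with local decoding is not stated in the text and is chosen arbitrarily here.

```python
def decoding_action(is_reference_stream, local_decoding, error_occurred,
                    eager=False):
    """Decide what to do with one received unit of the code stream."""
    if not is_reference_stream:
        return "decode"  # ordinary video code stream
    if local_decoding:
        return "discard"  # additional information is not needed
    if error_occurred:
        return "decode_and_replace"  # overwrite the picture in the frame memory
    # No error: either skip the stream, or decode it without replacing.
    return "decode_without_replacing" if eager else "discard"


normal = decoding_action(True, local_decoding=False, error_occurred=False)
after_error = decoding_action(True, local_decoding=False, error_occurred=True)
```

Here `normal` is "discard" and `after_error` is "decode_and_replace", matching the two cases described above.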
This embodiment has been described on the premise that one reference frame is used. However, a plurality of reference frames may be used. In this case, if all the pictures of a plurality of frames are added, the number of coded bits may become excessively large, resulting in lack of practicality. For this reason, only a small area (e.g., a macroblock in this case) of a plurality of reference frames which is to be referred to in motion compensation is selected and output as the reference picture code stream 241. In this case, a data structure per macroblock replaces the data structure per frame in
With this arrangement, a reference picture signal required to encode and decode a specific frame of a video code stream is added to the frame. More specifically, as shown in
In the scheme of adding a reference picture code stream to a specific frame on a small area (macroblock) basis as described in the latter part of the first embodiment, the apparatus can use a scheme of multiplexing the video code stream 137 and the reference picture code stream 147 on a macroblock basis and adding, to the video code stream 137, determination flag information indicating whether or not the reference picture code stream 147 is added to a specific macroblock.
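Macroblock-level multiplexing with a determination flag can be sketched as follows. The tuple layout (flag, video payload, reference payload) is a hypothetical in-memory representation, not a bitstream syntax defined by this document.

```python
def multiplex(video_mbs, reference_mbs):
    """Combine per-macroblock video payloads with optional reference payloads.
    `reference_mbs` maps a macroblock index to its reference payload.
    Each output entry starts with a flag telling whether a reference follows."""
    muxed = []
    for index, video in enumerate(video_mbs):
        reference = reference_mbs.get(index)
        muxed.append((reference is not None, video, reference))
    return muxed


def demultiplex(muxed):
    """Split the multiplexed entries back into video and reference parts."""
    video = [v for _, v, _ in muxed]
    references = {i: r for i, (flag, _, r) in enumerate(muxed) if flag}
    return video, references


video_mbs = [b"mb0", b"mb1", b"mb2"]
reference_mbs = {1: b"ref1"}  # only macroblock 1 carries a reference
muxed = multiplex(video_mbs, reference_mbs)
video_out, references_out = demultiplex(muxed)
```

The flag lets the decoder skip macroblocks that carry no reference picture code stream without parsing any further data for them.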
Referring to
For example, when an error occurs, the motion compensation predictor 207 can output the reference picture signal 244 after motion compensation which is reconstructed in the above manner as a predictive picture signal 240 instead of input motion vector information 238 and a reference picture signal 239 from the frame memory 208.
According to this embodiment, even when a plurality of reference pictures are used with some manipulation applied to them, what is encoded is the reference picture signal 142 after motion compensation, i.e., the predictive picture signal that the subtracter 102 directly subtracts from the input picture signal 132 after area segmentation. This signal 142 is used as a signal for restoration upon occurrence of an error. This makes it possible to solve the above problems.
In many cases, a picture frame subjected to predictive encoding is predictively encoded while it is selected whether predictive encoding (INTER mode) is performed on a macroblock (small area) basis or intra-frame encoding (INTRA mode) is performed. In this case, since there is no predictive picture signal in any intra-frame-encoded macroblocks, if a reference picture signal is output as one frame, an unnecessary portion may be produced. It is therefore possible to select and store the reference picture code streams 147 required for decoding operation on a macroblock basis as well as storing reference picture signals in the frame memory on a frame basis.
Fourth Embodiment
The first to third embodiments have exemplified the case wherein video encoding is performed by a combination of motion compensation prediction, discrete cosine transform, quantization, and variable length encoding. However, the present invention is not limited to such an encoding scheme. For example, the present invention can be applied to next-generation encoding techniques such as wavelet encoding.
In general, when a reference picture is encoded in the INTRA mode, an error is produced between this reference picture and the original reference picture due to quantization. For this reason, in encoding operation, the picture encoded signal obtained by conversion/encoding and quantization is used as a reference picture instead of a reference picture signal as a predictive signal. By transmitting this signal as an additional reference picture encoded signal to the decoding apparatus side, a system free from errors due to quantization can be realized.
By using the present invention in combination with feedback information such as RTCP, the control protocol that accompanies RTP (Real-time Transport Protocol), or the like, the effect of the present invention can be enhanced. This is because when error information of a network is sent from the reception side to the transmission side, the information can be used as a condition for determining whether or not to transmit additional information. If, for example, it is determined from RTCP that an error has occurred, the reference picture code stream of the next frame is transmitted to the reception side.
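A feedback-driven decision of this kind can be sketched as follows. RTCP receiver reports carry a "fraction lost" field, an 8-bit fixed-point value equal to the loss ratio multiplied by 256. How that value is obtained from the network stack is outside this sketch, and the threshold and function names are assumptions for illustration.

```python
def fraction_lost_to_ratio(fraction_lost):
    """Convert the 8-bit RTCP 'fraction lost' field to a ratio in [0, 1)."""
    if not 0 <= fraction_lost <= 255:
        raise ValueError("fraction lost is an 8-bit field")
    return fraction_lost / 256.0


def send_reference_for_next_frame(fraction_lost, threshold=0.0):
    """Decide whether to transmit the reference picture code stream of the
    next frame, given the latest loss report from the reception side."""
    return fraction_lost_to_ratio(fraction_lost) > threshold


no_loss = send_reference_for_next_frame(0)
some_loss = send_reference_for_next_frame(64)  # 64 / 256 = 25 % loss
```

With the default threshold, any reported loss triggers transmission of the reference picture code stream; a higher threshold would trade recovery speed for fewer transmitted bits.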
Picture encoding and decoding in the present invention described above may be implemented by hardware, or part or all of processing may be implemented by software using a computer. Such software (computer program) may be distributed upon being recorded on a recording medium such as a semiconductor memory or CD-ROM, or can be distributed through a transmission medium such as a radio channel or wire.
As described above, according to the present invention, the recovery ability upon occurrence of an error can be improved without any deterioration in transmission efficiency. In addition, processing within an encoding framework and preparing all data at the time of encoding will eliminate the necessity to perform re-encoding, complicated processing at the time of transmission, or the like. This makes it possible to construct a simple picture transmission/reception system.
As has been described above, the video encoding and decoding apparatuses according to the present invention can be used for a system designed to compression-encode pictures in a small information amount and transmit or store the resultant data in a videophone, video conference system, portable information terminal, digital video disk system, and digital TV broadcasting system.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Claims
1. A picture decoding method comprising:
- receiving an input code stream containing a video code stream obtained by encoding a video signal and a reference picture code stream obtained by encoding a reference picture signal;
- decoding the reference picture code stream contained in the input code stream to generate a first reference picture signal;
- decoding the video code stream contained in the input code stream by selectively using one of a second reference picture signal obtained from a previous picture signal and the first reference picture signal to generate a playback picture signal; and
- storing in a frame memory the first reference picture signal and the playback picture signal as the second reference picture signal;
- the decoding the video code stream decoding the video code stream by selectively reading out the second reference picture signal and the first reference picture signal from the frame memory.
2. A picture decoding apparatus comprising:
- an input unit configured to receive an input code stream containing a video code stream obtained by encoding a video signal and a reference picture code stream obtained by encoding a reference picture signal;
- a first decoding unit configured to decode the reference picture code stream contained in the input code stream to generate a first reference picture signal; and
- a second decoding unit configured to decode the video code stream contained in the input code stream by selectively using one of a second reference picture signal obtained from a previous picture signal and the first reference picture signal to generate a playback picture signal, the second decoding unit including a frame memory which stores the playback picture signal as the second reference picture signal, together with the first reference picture signal, and decodes the video code stream by selectively reading out the second reference picture signal and the first reference picture signal from the frame memory.
3. A computer readable storage medium storing instructions of a computer program which when executed by a computer results in performance of steps comprising:
- receiving an input code stream containing a video code stream obtained by encoding a video signal and a reference picture code stream obtained by encoding a reference picture signal;
- decoding the reference picture code stream contained in the input code stream to generate a first reference picture signal;
- decoding the video code stream contained in the input code stream by selectively using one of a second reference picture signal obtained from a previous picture signal and the first reference picture signal to generate a playback picture signal; and
- storing in a frame memory the first reference picture signal and the playback picture signal as the second reference picture signal, wherein
- the decoding the video code stream decodes the video code stream by selectively reading out the second reference picture signal and the first reference picture signal from the frame memory.
4. A computer system comprising:
- means for receiving an input code stream containing a video code stream obtained by encoding a video signal and a reference picture code stream obtained by encoding a reference picture signal;
- means for decoding the reference picture code stream contained in the input code stream to generate a first reference picture signal;
- means for decoding the video code stream contained in the input code stream by selectively using one of a second reference picture signal obtained from a previous picture signal and the first reference picture signal to generate a playback picture signal; and
- means for storing in a frame memory the first reference picture signal and the playback picture signal as the second reference picture signal;
- the means for decoding the video code stream decoding the video code stream by selectively reading out the second reference picture signal and the first reference picture signal from the frame memory.
Type: Application
Filed: Mar 15, 2007
Publication Date: Jul 5, 2007
Inventors: Takeshi Nagai (Tokorozawa-shi), Takeshi Chujoh (Tokyo), Shinichiro Koto (Yokohama-shi), Yoshihiro Kikuchi (Yokohama-shi), Wataru Asano (Yokohama-shi)
Application Number: 11/686,372
International Classification: H04N 11/02 (20060101); H04N 7/12 (20060101);