Method and system of noise-adaptive motion detection in an interlaced video sequence
A motion decision value provides a dependable estimate of whether motion occurs in a given region of a video image in an interlaced video sequence. The motion detection is particularly applicable in the conversion from interlaced video to progressive video. An input is first fed to an absolute value former, which computes a frame difference signal from a difference between the first field and the second field in one frame. A point-wise motion detection signal is computed based on the frame difference signal and noise in the video sequence, wherein the point-wise motion detection signal is noise-adaptive. The point-wise motion detection is then followed by a region-wise motion detection that combines the point-wise motion detection signal with an adjacent point-wise motion detection signal delayed by one field. The motion decision value is then computed from the region-wise motion detection signal and output for further processing in the video signal processing system, such as for choosing whether the spatially interpolated video signal value or the temporally interpolated video signal value should be used for the output.
The present invention relates generally to motion detection in video sequences, and in particular to noise-adaptive motion detection in interlaced video sequences.
BACKGROUND OF THE INVENTION
In the development of current Digital TV (DTV) systems, it is essential to employ video format conversion units because of the variety of the video formats adopted in many different DTV standards worldwide. For example, the ATSC DTV standard system of North America adopted 1080×1920 interlaced video, 720×1280 progressive video, 720×480 interlaced and progressive video, etc. as its standard video formats for digital TV broadcasting.
A video format conversion operation converts an incoming video format to a specified output video format so that the video signal is properly presented on a display device (e.g., monitor, FLCD, plasma display) that has a fixed resolution. A proper video format conversion system is important because it can directly affect the visual quality of the video of a DTV receiver. Fundamentally, video format conversion requires advanced algorithms for multi-rate system design, poly-phase filter design, and interlaced-to-progressive scanning rate conversion, or simply deinterlacing, where deinterlacing represents an operation that doubles the vertical scanning rate of the interlaced video signal.
Historically, deinterlacing algorithms were developed to enhance the video quality of NTSC TV receivers by reducing the intrinsic annoying artifacts of the interlaced video signal, such as serrated lines observed when there is motion between fields, line flickering, raster line visibility, and field flickering. This also applies to the DTV receiver.
Elaborate deinterlacing algorithms utilizing motion detection or motion compensation allow doubling the vertical scanning rate of the interlaced video signal especially for stationary (motionless) objects in the video signal. Motion detection based deinterlacing operation can be used for analog and digital TV receivers.
A number of deinterlacing algorithms exist. Such deinterlacing algorithms can be categorized into two classes: 2-D (spatial) deinterlacing algorithms and 3-D (spatio-temporal) deinterlacing algorithms, depending on the use of motion information embedded in the consecutive interlaced video sequence. It is well known that a 3-D deinterlacing algorithm based on motion detection provides more pleasing performance than a 2-D deinterlacing algorithm. The key point of a 3-D deinterlacing algorithm is precisely detecting motion in the interlaced video signals.
Existing methods disclose estimating a motion decision factor based on the frame difference signal and the sample correlation in vertical direction. These methods provide a way of reducing the visual artifacts that can arise from false motion detection by utilizing the sample correlation in vertical direction of the sampling point where the value is to be interpolated. However, such methods may not provide a true motion detection method when there are high frequency components in vertical direction. As a consequence, such methods do not increase the vertical resolution even when there is no real motion between fields. In other methods, the filtering result of the frame difference is compared with a constant value to determine the motion detection signal. However, in such methods, performance deteriorates when noise in the video sequence increases.
BRIEF SUMMARY OF THE INVENTION
The present invention addresses the above shortcomings. In one embodiment the present invention provides a method of computing a motion decision value for a video processing system, comprising the steps of: inputting a video signal with an interlaced video sequence of fields; computing a frame difference signal from a difference between a previous field and a next field in the video sequence; computing a point-wise motion detection signal based on the frame difference signal and noise in the video sequence, wherein the point-wise motion detection signal is noise-adaptive; and computing the motion decision value as a function of the point-wise motion detection signal.
In another aspect, the present invention provides a method of processing interlaced video signals, comprising the steps of: spatially interpolating a value of the video signal at a given location from a video signal of at least one adjacent location in a given video field; temporally interpolating the value of the video signal at the given location from a video signal at the same location in temporally adjacent video fields; forming a motion decision value for the same location as discussed above; and mixing an output signal for the video signal at the given location from the spatially interpolated signal and the temporally interpolated signal and weighting the output signal in accordance with the motion decision value.
The present invention further provides systems to implement the above methods. Other embodiments, features and advantages of the present invention will be apparent from the following specification taken in conjunction with the following drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 5A-F show examples of calculating a motion decision signal according to the present invention.
In one embodiment the present invention provides a robust method of estimating a noise-adaptive motion decision parameter in an interlaced video sequence. Further, the present invention provides a deinterlacing system utilizing the motion decision parameter estimation method.
In order to systematically describe the deinterlacing problem and the methods of the present invention, in the following description let xn denote the incoming interlaced video field at time instant t=n, and xn(v,h) denote the associated value of the video signal at the geometrical location (v,h) where v represents vertical location and h represents horizontal location.
Referring to the example in
Top and bottom fields 10, 20 are typically available in turn in time. It is assumed that the input interlaced video is corrupted by independent, identically distributed, additive, stationary zero-mean Gaussian noise with variance σ0², that is, each available signal value xn(v,h) can be denoted as xn(v,h)=x̂n(v,h)+δn(v,h), where x̂n(v,h) is the true pixel value without noise corruption and δn(v,h) is the Gaussian distributed noise component. It is further assumed that the noise variance σ0² is already known, manually set, or pre-detected by a separate noise estimation unit. σ0 represents the noise standard deviation.
Based upon the above description of the interlaced video signal, a deinterlacing problem can be stated as a process to reconstruct or interpolate the unavailable signal values in each field. That is, the deinterlacing problem is to reconstruct the signal values of xn at odd lines (v=1, 3, 5, . . . ) for top field xn and to reconstruct the signal values of xn at even lines (v=0, 2, 4, . . . ) for bottom field xn.
For clarity of description herein, the deinterlacing problem is simplified as a process which reconstructs or interpolates the unavailable signal value of xn at the ith line where the signal values of the lines at i±1, i±3, i±5, . . . are available. More simply, deinterlacing is to interpolate the value of xn(i,h), which is not originally available. Because xn−1 and xn+1 have different sampling phase from xn, the signal values of xn−1(i,h) and xn+1(i,h) are available, whereby motion detection can be incorporated with the deinterlacing problem. This relation is depicted by example in
Referring to
First, the frame difference signal Dn is computed as the difference between the fields in one frame interval as Dn=|xn+1−xn−1| which is associated with a scene change that occurred between the fields xn+1 and xn−1. The frame difference signal is then low pass filtered as dn=LPF(Dn) where LPF(•) represents a low pass filtering process over the input video signal.
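The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fields are plain 2-D lists indexed as [v][h], the helper names `frame_difference` and `low_pass` are hypothetical, the 1×3 kernel is an arbitrary normalized example, and clamping coordinates at the borders is an assumed policy that the text does not specify.

```python
def frame_difference(prev_field, next_field):
    """D_n = |x_{n+1} - x_{n-1}|, computed sample by sample."""
    return [[abs(a - b) for a, b in zip(row_next, row_prev)]
            for row_next, row_prev in zip(next_field, prev_field)]


def low_pass(signal, kernel):
    """d_n = LPF(D_n) with a normalized M x N kernel.

    Coordinates that fall outside the array are clamped to the nearest
    edge sample (assumed border handling, not taken from the text).
    """
    rows, cols = len(signal), len(signal[0])
    m, n = len(kernel), len(kernel[0])
    out = [[0.0] * cols for _ in range(rows)]
    for v in range(rows):
        for h in range(cols):
            acc = 0.0
            for p in range(m):
                for q in range(n):
                    vv = min(max(v + p - m // 2, 0), rows - 1)
                    hh = min(max(h + q - n // 2, 0), cols - 1)
                    acc += kernel[p][q] * signal[vv][hh]
            out[v][h] = acc
    return out


prev_field = [[10, 10, 10], [10, 10, 10]]   # x_{n-1}
next_field = [[10, 10, 40], [10, 10, 40]]   # x_{n+1}

D = frame_difference(prev_field, next_field)
d_allpass = low_pass(D, [[1.0]])              # M = N = 1, w11 = 1
d_smooth = low_pass(D, [[0.25, 0.5, 0.25]])   # hypothetical 1 x 3 kernel
```

With the 1×1 kernel the filter leaves the difference signal unchanged, while the 1×3 kernel spreads the isolated difference over neighboring samples.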
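The M×N kernel (WM×N) of the low pass filter, LPF(•), can be expressed as
$$W_{M\times N} = \begin{bmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{M1} & \cdots & w_{MN} \end{bmatrix}$$
where (w11, . . . , wMN) represents a set of predetermined normalized coefficients (i.e., $\sum_{p=1}^{M}\sum_{q=1}^{N} w_{pq} = 1$).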
Based on the analysis in the commonly assigned patent application titled “Methods to estimate noise variance from a video sequence,” filed Nov. 17, 2004, Ser. No. 10/991,265 (incorporated herein by reference), it can be seen that any value Dn in the non-motion region is a random variable with probability density function (p.d.f.):
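For the noise model stated earlier (independent zero-mean Gaussian noise of variance σ0² on every field), a density of this kind can be worked out directly. The following is a sketch derived from those assumptions, not a reproduction of the formula in the referenced application; the symbol p_D is introduced here for illustration.

```latex
% Non-motion region: the true pixel values of fields n-1 and n+1 coincide,
% so only the two noise terms remain in the frame difference.
D_n = \left| \delta_{n+1} - \delta_{n-1} \right|, \qquad
\delta_{n+1} - \delta_{n-1} \sim \mathcal{N}\!\left(0,\, 2\sigma_0^2\right)

% Taking the absolute value folds the Gaussian onto y >= 0 (half-normal).
p_D(y) =
\begin{cases}
\dfrac{1}{\sqrt{\pi}\,\sigma_0}
  \exp\!\left(-\dfrac{y^2}{4\sigma_0^2}\right), & y \ge 0 \\[1ex]
0, & y < 0
\end{cases}
```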
The filtered result dn in the non-motion region is also a random variable with a p.d.f. pd(z), satisfying:
In one example, if the noise standard deviation is σ0=3.0, and the kernel is
the p.d.f. pd(z) is as shown in
It should be mentioned that LPF(•) can be an all-pass filter depending on the choice of the kernel WM×N. As such, if the kernel is set as M=N=1 and w11=1, the LPF(•) becomes the all-pass filter and, thus, dn=Dn.
Next, a point-wise motion detection signal is computed as:
fn(i,h)=TK(dn(i,h)) (1)
where TK(•) denotes a threshold function. An example implementation of TK(•) can be represented as:
$$T_K(y) = \begin{cases} 1 & \text{if } y > K\sigma_0 \\ 0 & \text{otherwise,} \end{cases}$$
in which K is a constant value. The above function TK(•) outputs hard-switching motion detection signals, illustrated by the example curve in
The threshold Kσ0 is automatically adjusted according to the noise standard deviation of the video sequence. Robust performance can thus be obtained against noise. The value K can be determined by the error probability of detecting a non-motion pixel as a motion pixel:
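That error probability is the chance that dn exceeds Kσ0 at a pixel where nothing moves. As a minimal sketch, assuming the all-pass case (dn=Dn) and the Gaussian noise model described earlier, Dn is the absolute difference of two independent noise samples, so the probability reduces to erfc(K/2) regardless of σ0. The function names below are hypothetical, and the bisection search is just one convenient way to invert the relation.

```python
import math


def false_alarm_probability(k):
    """P(D_n > K * sigma0) in a non-motion region, all-pass case.

    D_n = |N(0, 2 * sigma0^2)|, so the tail probability is erfc(K / 2)
    and does not depend on sigma0.
    """
    return math.erfc(k / 2.0)


def k_for_error_probability(target, lo=0.0, hi=20.0, iterations=60):
    """Bisection search for K with false_alarm_probability(K) ~= target."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if false_alarm_probability(mid) > target:
            lo = mid   # too many false alarms: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2.0


def hard_threshold(d, k, sigma0):
    """Hard-switching T_K: 1 if d exceeds K * sigma0, otherwise 0."""
    return 1 if d > k * sigma0 else 0


k = k_for_error_probability(0.01)   # K for a 1% false-alarm rate
```

Because the threshold is Kσ0, the same K gives the same false-alarm rate at any noise level under these assumptions. When a wider low-pass kernel is used, the distribution of dn changes and K would need to be derived from that distribution instead.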
Other noise-adaptive methods can also be used for computing soft-switching motion detection signals. From the stochastic characteristic of dn(v,h), a monotonically increasing curve can be used for implementing the function TK(•) as illustrated by examples in FIGS. 5B-F.
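One possible soft-switching curve is a piecewise-linear ramp whose two break points scale with σ0. The sketch below is an assumption for illustration only: it is not claimed to match any of the curves in FIGS. 5B-F, and the parameters `k_low` and `k_high` are hypothetical. The function is non-decreasing in its input, with a strictly increasing middle segment.

```python
def soft_threshold(d, sigma0, k_low=2.0, k_high=4.0):
    """Piecewise-linear soft-switching T_K.

    Returns 0 below k_low * sigma0, 1 above k_high * sigma0, and a
    linear ramp in between. k_low and k_high are illustrative values.
    """
    lo, hi = k_low * sigma0, k_high * sigma0
    if d <= lo:
        return 0.0
    if d >= hi:
        return 1.0
    return (d - lo) / (hi - lo)
```

With σ0=3.0 the ramp runs from 6 to 12, so a filtered difference of 9 maps to a motion detection value of 0.5 rather than being forced to 0 or 1.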
Then, the point-wise motion detection signal is filtered in spatial and temporal domains to obtain the motion decision parameter mn(i,h):
mn(i,h)=F(fn(i,h)).
An example implementation of the filter F(•) is shown in
φn(i,h)=fn(i,h)∥fn−1(i−1,h)∥fn−1(i+1,h),
where fn−1(•) denotes the one-field-delayed motion detection signal in relation (1), and the notation ∥ denotes the logical OR operation. Other methods can be used if a soft-switching point-wise motion detection signal is used, such as
φn(i,h)=max(fn(i,h),fn−1(i−1,h),fn−1(i+1,h)).
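The two combinations above can share one implementation, since for 0/1 inputs the maximum equals the logical OR. The sketch below assumes both motion detection signals are stored as 2-D lists indexed in frame coordinates [i][h]; the function name `region_wise` and the policy of skipping rows outside the array are assumptions.

```python
def region_wise(f_cur, f_prev, i, h):
    """phi_n(i,h) = max(f_n(i,h), f_{n-1}(i-1,h), f_{n-1}(i+1,h)).

    For hard-switching (0/1) inputs this is the logical OR of the three
    samples. Rows outside the array are skipped (assumed border policy).
    """
    candidates = [f_cur[i][h]]
    for row in (i - 1, i + 1):
        if 0 <= row < len(f_prev):
            candidates.append(f_prev[row][h])
    return max(candidates)


# Hard-switching example: motion seen one field earlier, one line above.
f_cur_hard = [[0, 0], [0, 0], [0, 0]]
f_prev_hard = [[0, 1], [0, 0], [0, 0]]

# Soft-switching example with values in [0, 1].
f_cur_soft = [[0.0], [0.2], [0.0]]
f_prev_soft = [[0.1], [0.9], [0.7]]
```

Note that only the rows above and below in the delayed signal are consulted, so the value 0.9 at the same row of the delayed signal does not contribute in the soft-switching example.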
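The region-wise motion detection signal is then low-pass filtered to form the motion decision parameter mn(i,h). The A×B kernel, ΘA×B, of the low-pass filter can be expressed as
$$\Theta_{A\times B} = \begin{bmatrix} \theta_{11} & \cdots & \theta_{1B} \\ \vdots & \ddots & \vdots \\ \theta_{A1} & \cdots & \theta_{AB} \end{bmatrix}$$
where (θ11, . . . , θAB) represents a set of predetermined normalized coefficients (i.e., $\sum_{p=1}^{A}\sum_{q=1}^{B} \theta_{pq} = 1$).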
For example, the kernel θA×B can be
The computed motion decision parameter mn(i,h) can then be used to mix a spatially interpolated signal and a temporally interpolated signal.
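The low-pass step that turns the region-wise signal into the motion decision parameter can be sketched as follows. The 1×3 kernel is a hypothetical normalized example rather than the kernel of the described embodiment, the name `motion_decision` is illustrative, and treating samples outside the array as 0 is an assumed border policy.

```python
def motion_decision(phi, kernel):
    """m_n = low-pass filtered phi_n using a normalized A x B kernel.

    Samples outside the array contribute 0 (assumed border policy).
    """
    rows, cols = len(phi), len(phi[0])
    a, b = len(kernel), len(kernel[0])
    m = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for h in range(cols):
            acc = 0.0
            for p in range(a):
                for q in range(b):
                    ii, hh = i + p - a // 2, h + q - b // 2
                    if 0 <= ii < rows and 0 <= hh < cols:
                        acc += kernel[p][q] * phi[ii][hh]
            m[i][h] = acc
    return m


kernel = [[0.25, 0.5, 0.25]]   # hypothetical normalized 1 x 3 kernel
phi = [[0, 1, 1, 0]]           # hard-switching region-wise signal
m = motion_decision(phi, kernel)
```

Because the coefficients are non-negative and sum to 1, a region-wise signal in [0, 1] yields a motion decision parameter in [0, 1], and hard 0/1 transitions become graded values near region boundaries.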
The spatial interpolator 204 spatially interpolates the value of xn(i,h) by using a predetermined algorithm. The temporal interpolator 206 temporally interpolates the value of xn(i,h) by using a predetermined algorithm. The motion decision processor 208 computes the motion decision value, mn(i,h), as described above (e.g.
Conceptually, the value of the motion decision parameter is bounded as 0≦mn(i,h)≦1, wherein mn(i,h)=0 implies “no motion” and mn(i,h)=1 implies “motion”. The mixer 210 mixes the output signal of the spatial interpolator 204 and the output signal of the temporal interpolator 206 in accordance with the motion decision value mn(i,h). Denoting xns(i,h) and xnt(i,h) as the output signals of the spatial interpolator 204 and the temporal interpolator 206, respectively, then the output signal of the mixer 210 (i.e., the interpolated signal) is represented as
xn(i,h)=(1−mn(i,h))·xnt(i,h)+mn(i,h)·xns(i,h). (3)
Note that xn(i,h)=xnt(i,h) when mn(i,h)=0 (no motion), and xn(i,h)=xns(i,h) when mn(i,h)=1 (motion).
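Relation (3) is a per-sample linear blend, which can be sketched as below. The function name `mix` and the range check are illustrative additions, not part of the described mixer.

```python
def mix(m, x_temporal, x_spatial):
    """Relation (3): (1 - m) * x_temporal + m * x_spatial, 0 <= m <= 1."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("motion decision value must lie in [0, 1]")
    return (1.0 - m) * x_temporal + m * x_spatial
```

For instance, with a temporal estimate of 100 and a spatial estimate of 60, m=0 returns 100, m=1 returns 60, and m=0.25 returns 90, leaning toward the temporal estimate when little motion is detected.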
In the example of
Examples of the spatially interpolated signal xns(v,h) are
xns(i,h)=(xn(i−1,h)+xn(i+1,h))/2,
which corresponds to a line average, and
xns(i,h)=xn(i−1,h)
which corresponds to a method known as line doubling.
Examples of temporally interpolated signal xnt(v,h) are
xnt(i,h)=(xn+1(i,h)+xn−1(i,h))/2
and
xnt(i,h)=xn−1(i,h).
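The four example interpolators above translate directly into code. The sketch assumes each field is stored on the full frame grid as a 2-D list indexed [i][h], with lines that the field does not carry set to None; that storage layout and the function names are assumptions made for illustration.

```python
def spatial_line_average(x_n, i, h):
    """Average of the lines directly above and below in the same field."""
    return (x_n[i - 1][h] + x_n[i + 1][h]) / 2.0


def spatial_line_double(x_n, i, h):
    """Line doubling: repeat the line directly above."""
    return x_n[i - 1][h]


def temporal_average(x_prev, x_next, i, h):
    """Average of the same location in the previous and next fields."""
    return (x_next[i][h] + x_prev[i][h]) / 2.0


def temporal_previous(x_prev, i, h):
    """Copy the same location from the previous field."""
    return x_prev[i][h]


# Field n carries lines 0 and 2; fields n-1 and n+1 carry line 1.
x_n = [[10, 20], [None, None], [30, 40]]
x_prev = [[None, None], [50, 60], [None, None]]
x_next = [[None, None], [70, 80], [None, None]]
```

In this layout the missing line i=1 of field n is exactly the line that fields n−1 and n+1 carry, which is why the temporal interpolators can read xn−1(i,h) and xn+1(i,h) directly.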
The present invention has been described in considerable detail with reference to certain preferred versions thereof; however, other versions are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the preferred versions contained herein.
Claims
1. In a video signal processing system, a method of computing a motion decision value, comprising the steps of:
- inputting a video signal with an interlaced video sequence of fields;
- computing a frame difference signal from a difference between a previous field and a next field in the video sequence;
- computing a point-wise motion detection signal based on the frame difference signal and noise in the video sequence, wherein the point-wise motion detection signal is noise-adaptive; and
- computing the motion decision value as a function of the point-wise motion detection signal.
2. The method of claim 1 wherein the step of computing the point-wise detection signal further includes the steps of forming the point-wise motion detection signal based on the frame difference signal and a threshold value that is a function of noise in the video sequence.
3. The method of claim 2 wherein the step of calculating the point-wise motion detection signal further includes the steps of:
- comparing the frame difference signal to the threshold value;
- forming the point-wise motion detection signal based on the comparison results.
4. The method of claim 1 wherein the step of computing the motion decision value further includes the steps of:
- filtering the point-wise motion detection signal in spatial and temporal domains; and
- forming the motion decision value as a function of the filtered point-wise motion detection signal.
5. The method of claim 4 wherein the step of filtering the point-wise motion detection signal in spatial and temporal domains comprises the steps of:
- computing a region-wise motion detection signal from the point-wise motion detection signal and an adjacent point-wise motion detection signal delayed by one field.
6. The method of claim 5 wherein the step of forming the motion decision value further comprises the steps of forming the motion decision value as a function of the region-wise motion detection signal.
7. The method of claim 6 wherein the step of forming the motion decision value further comprises the steps of low-pass filtering the region-wise motion detection signal to form the motion decision value.
8. The method of claim 1 further including the steps of low-pass filtering the difference signal prior to the step of computing the point-wise motion detection signal.
9. The method of claim 1 wherein the steps of computing the point-wise motion detection signal comprises computing fn(i,h)=TK(dn(i,h))
- where fn(•) is a point-wise motion detection signal, i and h define a spatial location of the respective video signal value in a Cartesian matrix, and TK(•) denotes a noise-adaptive threshold function.
10. The method of claim 9 wherein:
$$T_K(y) = \begin{cases} 1 & \text{if } y > K\sigma_0 \\ 0 & \text{otherwise,} \end{cases}$$
- where K is a constant value and σ0 represents noise standard deviation.
11. The method of claim 9 wherein TK(•) comprises a monotonically increasing function.
12. The method of claim 9 wherein the step of computing the motion decision value as a function of the point-wise motion detection signal comprises computing mn(i,h)=F(fn(i,h))
- where mn(i,h) is the motion decision value and F(•) comprises a filtering function.
13. The method of claim 12 wherein the filtering process F(•) comprises the steps of:
- computing a region-wise motion detection signal as
- φn(i,h)=fn(i,h)∥fn−1(i−1,h)∥fn−1(i+1,h),
- where fn−1(•) denotes a one field delayed motion detection signal, and the notation ∥ denotes the logical OR operation;
- low-pass filtering the region-wise motion detection signal to form the motion decision value mn(i,h).
14. A method of processing interlaced video signals, comprising the steps of:
- spatially interpolating a value of the video signal at a given location from a video signal of at least one adjacent location in a given video field;
- temporally interpolating the value of the video signal at the given location from a video signal at the same location in temporally adjacent video fields;
- forming a motion decision value for the same location in accordance with claim 1; and
- mixing an output signal for the video signal at the given location from the spatially interpolated signal and the temporally interpolated signal and weighting the output signal in accordance with the motion decision value.
15. The method of claim 14 further including the steps of varying the motion decision value between 0 and 1 as a function of an estimate of the degree of motion at the given location and, upon estimating a high degree of motion, heavily weighting the output signal towards the spatially interpolated signal and, upon estimating a low degree of motion, heavily weighting the output signal towards the temporally interpolated signal.
16. The method of claim 15 further including the steps of outputting the spatially interpolated signal as the output signal upon estimating a high degree of motion, and outputting the temporally interpolated signal as the output signal upon estimating a low degree of motion.
17. In a video signal processing system, an apparatus for computing a motion decision value, comprising:
- an input for receiving a video signal with an interlaced video sequence;
- difference forming means that computes a frame difference signal from a difference between a previous field and a next field in the video sequence;
- means for forming a point-wise motion detection signal based on the frame difference signal and noise in the video sequence, wherein the point-wise motion detection signal is noise-adaptive; and
- means for forming the motion decision value as a function of the point-wise motion detection signal.
18. The apparatus of claim 17 wherein the means for forming a point-wise motion detection signal further forms the point-wise motion detection signal based on the frame difference signal and a threshold value that is a function of noise in the video sequence.
19. The apparatus of claim 18 wherein the means for forming a point-wise motion detection signal forms the point-wise motion detection signal by further comparing the frame difference signal to the threshold value, and generating the point-wise motion detection signal based on the comparison results.
20. The apparatus of claim 17 wherein the means for forming the motion decision value further comprises:
- filter means for filtering the point-wise motion detection signal in spatial and temporal domains; and
- means for forming the motion decision value as a function of the filtered point-wise motion detection signal.
21. The apparatus of claim 20 wherein in filtering the point-wise motion detection signal in spatial and temporal domains, the filter means further computes a region-wise motion detection signal from the point-wise motion detection signal and an adjacent point-wise motion detection signal delayed by one field.
22. The apparatus of claim 21 wherein the means for forming the motion decision value further forms the motion decision value as a function of the region-wise motion detection signal.
23. The apparatus of claim 22 wherein the means for forming the motion decision value further comprises a low-pass filter for filtering the region-wise motion detection signal to form the motion decision value.
24. The apparatus of claim 17 further including a low-pass filter for low-pass filtering the difference signal prior to forming the point-wise motion detection signal.
25. The apparatus of claim 17 wherein the means for forming point-wise motion detection signal is programmed to compute fn(i,h)=TK(dn(i,h))
- where fn(•) is a point-wise motion detection signal, i and h define a spatial location of the respective video signal value in a Cartesian matrix, and TK(•) denotes a noise-adaptive threshold function.
26. The apparatus of claim 25 wherein:
$$T_K(y) = \begin{cases} 1 & \text{if } y > K\sigma_0 \\ 0 & \text{otherwise,} \end{cases}$$
- where K is a constant value and σ0 represents noise standard deviation.
27. The apparatus of claim 25 wherein TK(•) comprises a monotonically increasing function.
28. The apparatus of claim 25 wherein the means for forming the motion decision value as a function of the point-wise motion detection signal is programmed to compute mn(i,h)=F(fn(i,h))
- where mn(i,h) is the motion decision value and F(•) comprises a filter.
29. The apparatus of claim 28 wherein the F(•) filter comprises:
- means for forming a region-wise motion detection signal as
- φn(i,h)=fn(i,h)∥fn−1(i−1,h)∥fn−1(i+1,h),
- where fn−1(•) denotes a one field delayed motion detection signal, and the notation ∥ denotes the logical OR operation; and
- a low-pass filter for low-pass filtering the region-wise motion detection signal to form the motion decision value mn(i,h).
30. An apparatus of processing interlaced video signals, comprising:
- an input for receiving a video signal with an interlaced video sequence of fields;
- a spatial interpolator connected to said input and configured for spatially interpolating a value of the video signal at a given location from a video signal of at least one adjacent location in a given video field;
- a temporal interpolator connected to said input in parallel with said spatial interpolator for temporally interpolating the value of the video signal at the given location from a video signal at the same location in temporally adjacent video fields;
- a computing apparatus according to claim 17 connected to said input and in parallel with said spatial interpolator and said temporal interpolator for forming a motion decision value for the same location; and
- a mixer connected to receive an output signal from each of said spatial interpolator, said temporal interpolator, and said computing apparatus, said mixer configured for mixing an output signal for the video signal at the given location from the spatially interpolated signal and the temporally interpolated signal based on the motion decision value output by said computing apparatus.
31. The apparatus of claim 30 wherein the mixer further includes means for varying the motion decision value between 0 and 1 as a function of an estimate of the degree of motion at the given location and, upon estimating a high degree of motion, heavily weighting the output signal towards the spatially interpolated signal and, upon estimating a low degree of motion, heavily weighting the output signal towards the temporally interpolated signal.
32. The apparatus of claim 31 further comprising means for outputting the spatially interpolated signal as the output signal upon estimating a high degree of motion, and outputting the temporally interpolated signal as the output signal upon estimating a low degree of motion.
Type: Application
Filed: Jan 20, 2005
Publication Date: Jul 20, 2006
Patent Grant number: 7542095
Applicant: Samsung Electronics Co., Ltd. (Suwon City)
Inventors: Zhi Zhou (Irvine, CA), Yeong-Taeg Kim (Irvine, CA)
Application Number: 11/040,578
International Classification: H04N 7/01 (20060101); H04N 5/14 (20060101);