Adaptive Approach to Improve G.711 Perceptual Quality
In order to achieve the best improvement of ITU G.711 related codec perceptual quality, perceptual weighting controlling parameter(s) should be at least adaptive to relative quantization error statistics or adaptive to signal level. When the relative quantization error statistics are larger or the signal level is lower, the perceptual weighting should be “stronger”; when the relative quantization error statistics are smaller or the signal level is larger, the perceptual weighting should be “weaker”.
This application is a continuation of U.S. application Ser. No. 12/203,052, filed on Sep. 2, 2008, which claims priority to U.S. Provisional Application No. 60/997,663, filed on Oct. 4, 2007. Both of the aforementioned patent applications are hereby incorporated by reference in their entireties.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is generally in the field of signal coding. In particular, the present invention is in the field of speech/signal coding and specifically in the application where ITU G.711 A-law or μ-law codec is involved.
2. Background Art
ITU G.711
G.711 is an old ITU speech and audio codec standard which has been widely used in communication systems. G.711 is a PCM-based codec. Every signal sample is encoded with 8 bits. If the sampling rate is 8 kHz, the resulting codec bit rate is 64 kb/sec. Two encoding laws are recommended, commonly referred to as the A-law and the μ-law. The definition of these laws is given in tables published in the ITU recommendation. When using the μ-law in networks where suppression of the all-0 character signal is required, the character signal corresponding to negative input values between decision value numbers 127 and 128 should be 00000010 and the value at the decoder output is −7519. The corresponding decoder output value number is 125. The number of quantized values results from the encoding law.
Digital paths between countries which have adopted different encoding laws should carry signals encoded in accordance with the A-law. Where both countries have adopted the same law, that law should be used on digital paths between them. Any necessary conversion will be done by the countries using the μ-law. The rules for conversion are given in the ITU publication.
Every “decision value” and “quantized value” of the A (resp. μ) law should be associated with a “uniform PCM value”. (For a definition of “decision value” and “quantized value”, see ITU Recommendation G.701 and in particular FIG. 2/G.701). This requires the application of a 13 (14) bit uniform PCM code. The mapping from A-law PCM, and μ-law PCM, respectively, to the uniform code is given in the ITU publication. The conversion to A-law or μ-law values from uniform PCM values corresponding to the decision values is left to the individual equipment specification. One option is described in ITU Recommendation G.721, §4.2.8 subblock COMPRESS.
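As an illustration of the 8-bit μ-law encoding described above, the following is a minimal sketch of the commonly used segment-search formulation operating on 16-bit signed PCM input. It is not taken from the normative tables of the Recommendation; the function names, the 16-bit input scaling, and the constants `BIAS` and `CLIP` are assumptions of this sketch, and the all-0 suppression rule mentioned above is not implemented.

```python
BIAS = 0x84    # 132, added to the magnitude before the segment search (assumed 16-bit scaling)
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS

BITS_PER_SAMPLE = 8
SAMPLE_RATE_HZ = 8000
BIT_RATE = BITS_PER_SAMPLE * SAMPLE_RATE_HZ   # 64000 bit/s, i.e. 64 kb/sec


def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit signed PCM sample into an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8            # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F  # 4 bits inside the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # mu-law codes are bit-inverted


def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit mu-law code to the mid-point of its quantization interval."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

In this sketch a sample of 1000 is encoded in segment 3 (step size 64 on the 16-bit scale) and decoded back to the centre of that interval, so the decoded value differs from the input by less than half a step.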
Perceptual Weighting Filter
Perceptual weighting filtering is a technology which exploits the human ear's masking effect to improve the perceptual quality of signal coding or speech coding. This technology has been widely used in many standards during recent decades. One typical application of perceptual weighting is shown in
where * denotes mathematical convolution and h_w(n) is the impulse response of the weighting filter W(z).
The equation (1) can be re-written in another form:
where h_F(n) is the impulse response of the modified weighting filter F(z)=W(z)−1.
The equation (2) can be expressed in the diagram shown in
The weighting filter presented above is used on the encoder side only. This paragraph will describe the usage of a weighting filter in both the encoder and the decoder; such an example can be seen in ITU G.729.1 and other standards.
All of the above-mentioned weighting filters are normally estimated from the unquantized original signal in the encoder or from the quantized original signal in the decoder.
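The encoder-side use of the modified filter F(z)=W(z)−1 can be illustrated with a small noise feedback loop. This is only a sketch under stated assumptions: the loop arrangement and sign convention (past reconstruction errors filtered by F(z) and subtracted from the quantizer input) are one common choice and are not confirmed by the equations and figures of this document, the filter taps are fixed illustrative values rather than the result of an LPC analysis, and a uniform quantizer stands in for G.711.

```python
import math


def noise_feedback_quantize(signal, f_taps, quantize):
    """Quantize sample by sample, feeding past errors back through F(z).

    f_taps = [f1, ..., fP] are the taps of F(z) = W(z) - 1 (assumed convention).
    Returns (decoded, recon_err, quant_err).
    """
    order = len(f_taps)
    history = [0.0] * order            # r(n-1), ..., r(n-P)
    decoded, recon_err, quant_err = [], [], []
    for s in signal:
        feedback = sum(f * r for f, r in zip(f_taps, history))
        u = s - feedback               # quantizer input after noise feedback
        s_hat = quantize(u)            # decoded sample
        r = s_hat - s                  # reconstruction error seen by the listener
        decoded.append(s_hat)
        recon_err.append(r)
        quant_err.append(s_hat - u)    # raw error introduced by the quantizer
        history = ([r] + history)[:order]
    return decoded, recon_err, quant_err


def uniform_quantizer(u, step=8.0):
    """Stand-in quantizer with a fixed step (hypothetical, not G.711)."""
    return step * round(u / step)


# Illustrative taps for W(z) = 1 - 0.72 z^-1 + 0.2 z^-2 (hypothetical values).
f_taps = [-0.72, 0.2]
signal = [1000.0 * math.sin(2 * math.pi * 300 * n / 8000) for n in range(200)]
decoded, recon_err, quant_err = noise_feedback_quantize(signal, f_taps, uniform_quantizer)
```

With this arrangement the reconstruction error filtered by W(z) equals the raw quantizer error sample by sample, so the error heard at the decoder output has the spectral shape of 1/W(z) applied to the quantizer noise.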
This invention proposes a way to control weighting filter parameters; in particular, the invention is used to improve the quantizer (encoder) and/or de-quantizer (decoder), which is related to ITU standard G.711.
SUMMARY OF THE INVENTION
This invention proposes a way to control weighting filter parameters; in particular, the invention is used to improve the quantizer (encoder) and/or de-quantizer (decoder), which is related to ITU standard G.711. When the relative quantization error is larger or the signal level is very low, the perceptual weighting filter should be tuned in one way; when the relative quantization error is small or the signal level is high, the perceptual weighting filter should be tuned in another (opposite) way.
The features and advantages of the present invention will become more readily apparent to those ordinarily skilled in the art after reviewing the following detailed description and accompanying drawings, wherein:
G.711. In the similar way as
1. Characteristics of G.711 with General Weighting Filters
G.711 is a very old ITU speech and audio codec standard which is widely used in communication systems. G.711 is a PCM-based codec. Every signal sample is encoded with 8 bits. At a sampling rate of 8 kHz, the resulting bit rate is 64 kb/sec. G.711 can be used alone or as the core layer of a scalable codec. There are two coding schemes for G.711: one is called A-law and the other μ-law. Both are scalar quantization approaches. The quantization step size changes according to the sample magnitude and can be 8, 16, 32, 64, and so on. If we define the absolute quantization error and the relative quantization error as follows,
where s(n) is the unquantized original signal sample entering the G.711 encoder and ŝ(n) is the quantized original signal sample output by the G.711 decoder, the statistical absolute error is determined by the quantization step size. Both the A-law and μ-law coding schemes generate larger absolute errors and smaller relative errors in the high signal level area; they produce smaller absolute errors and larger relative errors in the low signal level area.
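The behaviour described above can be checked numerically. The sketch below assumes the natural per-sample definitions |s(n) − ŝ(n)| for the absolute error and that value divided by the signal magnitude for the relative error, and summarises them as a mean absolute error and its ratio to the mean absolute signal. The quantizer is a compact segment-based μ-law round trip on a 16-bit scale, which matches the step sizes 8, 16, 32, 64, ... mentioned above but is not the normative table; the function names, the test tone, and the two amplitudes are hypothetical choices.

```python
import math


def ulaw_roundtrip(sample: int) -> int:
    """Encode-then-decode one 16-bit sample with a segment-based mu-law quantizer."""
    sign = -1 if sample < 0 else 1
    biased = min(abs(sample), 32635) + 132   # clip, then add the usual bias
    exponent = biased.bit_length() - 8       # segment 0..7
    step = 8 << exponent                     # step size 8, 16, 32, ..., 1024
    recon = (biased // step) * step + step // 2
    return sign * (recon - 132)


def error_stats(amplitude, n=8000, freq=440.0, rate=8000.0):
    """Return (mean absolute error, mean absolute error / mean absolute signal)."""
    abs_err = 0.0
    abs_sig = 0.0
    for k in range(n):
        s = round(amplitude * math.sin(2 * math.pi * freq * k / rate))
        abs_err += abs(s - ulaw_roundtrip(s))
        abs_sig += abs(s)
    return abs_err / n, abs_err / abs_sig


abs_low, rel_low = error_stats(20)        # low-level tone
abs_high, rel_high = error_stats(20000)   # high-level tone
print(f"low level : abs={abs_low:.2f}  rel={rel_low:.4f}")
print(f"high level: abs={abs_high:.2f}  rel={rel_high:.4f}")
```

For these two test tones the high-level signal shows the larger absolute error and the low-level signal the larger relative error, in line with the statement above.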
It is well known that perceptual weighting technology (or quantization noise feedback technology), which uses the human ear's masking effect, can improve the perceptual quality of any speech or audio codec. The quantization error spectrum with the original G.711 is flat as shown in
In similar way as
The perceptual weighting filter can be expressed as W(z, α), where the parameter α is traditionally a constant (0 ≤ α ≤ 1) which controls how “strong” the weighting should be. A typical example weighting filter could be
$$W(z) = A(z/\alpha) = 1 + \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
where {a_i, i=1,2, . . . , P} are the LPC coefficients obtained from LPC analysis on the unquantized original signal or the quantized original signal. Sometimes, several controlling parameters are used to determine a weighting filter; a popular example of such a weighting filter is
$$W(z) = \frac{A(z/\alpha)}{A(z/\beta)}$$
where β<α. Another popular weighting filter is
$$W(z) = \frac{A(z/\alpha)}{1 + \beta \, z^{-1}}$$
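As a minimal sketch of how a controlling parameter shapes the filter, the code below builds the coefficients of the all-zero form W(z) = A(z/α) = 1 + Σ a_i·α^i·z^−i given in the claims, and pairs it with a level-dependent α. The mapping from signal level to α, including the names `adaptive_alpha`, `alpha_max` and `knee` and their default values, is purely hypothetical; it is chosen only so that α approaches 0 as the input level approaches zero, which is the limiting behaviour stated in the claims. The LP coefficients are illustrative values, not the output of an actual LPC analysis.

```python
import math


def weighting_filter(lp_coeffs, alpha):
    """Coefficients [1, a1*alpha, a2*alpha^2, ...] of W(z) = A(z/alpha)."""
    return [1.0] + [a * alpha ** i for i, a in enumerate(lp_coeffs, start=1)]


def frame_level(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def adaptive_alpha(level, alpha_max=0.92, knee=200.0):
    """Hypothetical mapping: 0 at zero level, rising smoothly towards alpha_max."""
    return alpha_max * level / (level + knee)


lp = [-1.2, 0.5]                      # illustrative a_1, a_2 (assumed, P = 2)
silence = [0] * 80
loud = [8000 * math.sin(2 * math.pi * 300 * n / 8000) for n in range(80)]

w_silence = weighting_filter(lp, adaptive_alpha(frame_level(silence)))
w_loud = weighting_filter(lp, adaptive_alpha(frame_level(loud)))
f_loud = w_loud[1:]                   # taps of F(z) = W(z) - 1
```

For the silent frame the level is zero, so α is 0 and W(z) reduces to 1 (F(z) reduces to 0); for the loud frame α is close to `alpha_max` and the filter keeps most of the spectral shape of A(z).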
Due to special quantization error structure (shown in
The above description contains specific information pertaining to the adaptive weighting filter parameter control. However, one skilled in the art will recognize that the present invention may be practiced in conjunction with various encoding/decoding algorithms different from those specifically discussed in the present application.
Moreover, some of the specific details, which are within the knowledge of a person of ordinary skill in the art, are not discussed to avoid obscuring the present invention.
The drawings in the present application and their accompanying detailed description are directed to merely example embodiments of the invention. To maintain brevity, other embodiments of the invention which use the principles of the present invention are not specifically described in the present application and are not specifically illustrated by the present drawings.
Claims
1. A method for improving a perceptual weighting filter W(z) or a perceptual noise shaping filter F(z), wherein F(z)=W(z)−1, and the weighting filter W(z) or the perceptual noise shaping filter F(z) is used to enhance perceptual performance of a G.711 codec, the method comprising:
- receiving, by the G.711 codec, an unquantized signal;
- encoding, by the G.711 codec, said unquantized signal to produce an encoded bitstream; and
- sending, by the G.711 codec, the encoded bitstream to a decoder for outputting an enhanced decoded signal;
wherein the unquantized signal is encoded using the W(z) or the F(z) as follows:
$$W(z) = A(z/\alpha) = 1 + \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$F(z) = W(z) - 1 = \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing an input signal, {a_i, i=1,2,..., P} are the LP predictor coefficients, P is the LP predictor order and α is a controlling parameter controlling the W(z) or the F(z), and wherein the controlling parameter α depends on an input signal level, when the input signal level becomes low (towards zero), α approaches 0.
2. The method of claim 1, wherein the G.711 codec performs as a core layer of a scalable encoder.
3. The method of claim 1, wherein the G.711 codec is compatible with International Telecommunication Union (ITU) G.711 A-law or μ-law codec standard.
4. A method for improving a perceptual weighting filter W(z) or a perceptual noise shaping filter F(z), wherein F(z)=W(z)−1, and the weighting filter W(z) or the perceptual noise shaping filter F(z) is used to enhance perceptual performance of a G.711 codec, the method comprising:
- receiving, by the G.711 codec, an unquantized signal;
- encoding, by the G.711 codec, said unquantized signal using the W(z) or the F(z) to produce an encoded bitstream; and
- sending, by the G.711 codec, the encoded bitstream to a decoder for outputting an enhanced decoded signal;
wherein the W(z) or the F(z) is characterized as follows:
- the W(z) or the F(z) is controlled by one or more parameters; and
- at least one of the parameters controls the W(z) or the F(z) based on an input signal level, and when the input signal level becomes low (towards zero), the W(z) approaches 1 or equivalently the F(z) approaches 0.
5. The method of claim 4, wherein the perceptual weighting filter W(z) or the perceptual noise shaping filter F(z) is defined by the following equations:
$$W(z) = A(z/\alpha) = 1 + \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$F(z) = W(z) - 1 = \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing the input signal, {a_i, i=1,2,..., P} are the LP predictor coefficients, P is the LP predictor order and α is the controlling parameter controlling the W(z) or the F(z).
6. The method of claim 4, wherein the perceptual weighting filter W(z) or the perceptual noise shaping filter F(z) is defined by the following equations:
$$W(z) = \frac{A(z/\alpha)}{A(z/\beta)}$$
$$F(z) = W(z) - 1$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing the input signal; {a_i, i=1,2,..., P} are the LP predictor coefficients; P is the LP predictor order; α and β (β<α) are the controlling parameters controlling the W(z) or the F(z).
7. The method of claim 4, wherein the perceptual weighting filter W(z) or the perceptual noise shaping filter F(z) is defined by the following equations:
$$W(z) = \frac{A(z/\alpha)}{1 + \beta \, z^{-1}}$$
$$F(z) = W(z) - 1$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing the input signal; {a_i, i=1,2,..., P} are the LP predictor coefficients; P is the LP predictor order; α and β (β<α) are the controlling parameters controlling the W(z) or the F(z).
8. The method of claim 4, wherein the G.711 encoder performs as a core layer of a scalable encoder.
9. The method of claim 4, wherein the G.711 encoder is compatible with International Telecommunication Union (ITU) G.711 A-law or μ-law codec standard.
10. A codec, comprising:
- a receiving unit, configured to receive an unquantized signal;
- an encoding unit, configured to encode said unquantized signal to produce an encoded bitstream; and
- a sending unit, configured to send the encoded bitstream to a decoder for outputting an enhanced decoded signal;
wherein the unquantized signal is encoded using a perceptual weighting filter W(z) or a perceptual noise shaping filter F(z), the W(z) or the F(z) is characterized as follows:
$$W(z) = A(z/\alpha) = 1 + \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$F(z) = W(z) - 1 = \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing an input signal, {a_i, i=1,2,..., P} are the LP predictor coefficients, P is the LP predictor order and α is a controlling parameter controlling the W(z) or the F(z), and wherein the controlling parameter α depends on an input signal level, when the input signal level becomes low (towards zero), α approaches 0.
11. The codec according to claim 10, wherein the codec performs as a core layer of a scalable encoder.
12. The codec according to claim 10, wherein the codec is compatible with International Telecommunication Union (ITU) G.711 A-law or μ-law codec standard.
13. A codec, comprising:
- a receiving unit, configured to receive an unquantized signal;
- an encoding unit, configured to encode said unquantized signal to produce an encoded bitstream; and
- a sending unit, configured to send the encoded bitstream to a decoder for outputting an enhanced decoded signal;
wherein the unquantized signal is encoded using a perceptual weighting filter W(z) or a perceptual noise shaping filter F(z), wherein F(z)=W(z)−1, and the W(z) or the F(z) is characterized as follows:
- the W(z) or the F(z) is controlled by one or more parameters; and
- at least one of the parameters controls the W(z) or the F(z) based on an input signal level, and when the input signal level becomes low (towards zero), the W(z) approaches 1 or equivalently the F(z) approaches 0.
14. The codec according to claim 13, wherein the perceptual weighting filter W(z) or the perceptual noise shaping filter F(z) is defined by the following equations:
$$W(z) = A(z/\alpha) = 1 + \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$F(z) = W(z) - 1 = \sum_{i=1}^{P} a_i \, \alpha^{i} \, z^{-i}$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing the input signal, {a_i, i=1,2,..., P} are the LP predictor coefficients, P is the LP predictor order and α is the controlling parameter controlling the W(z) or the F(z).
15. The codec according to claim 13, wherein the perceptual weighting filter W(z) or the perceptual noise shaping filter F(z) is defined by the following equations:
$$W(z) = \frac{A(z/\alpha)}{A(z/\beta)}$$
$$F(z) = W(z) - 1$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing the input signal; {a_i, i=1,2,..., P} are the LP predictor coefficients; P is the LP predictor order; α and β (β<α) are the controlling parameters controlling the W(z) or the F(z).
16. The codec according to claim 13, wherein the perceptual weighting filter W(z) or the perceptual noise shaping filter F(z) is defined by the following equations:
$$W(z) = \frac{A(z/\alpha)}{1 + \beta \, z^{-1}}$$
$$F(z) = W(z) - 1$$
$$A(z) = 1 + \sum_{i=1}^{P} a_i \, z^{-i}$$
where A(z) is a linear prediction (LP) predictor obtained from analyzing the input signal; {a_i, i=1,2,..., P} are the LP predictor coefficients; P is the LP predictor order; α and β (β<α) are the controlling parameters controlling the W(z) or the F(z).
17. The codec according to claim 13, wherein the codec performs as a core layer of a scalable encoder.
18. The codec according to claim 13, wherein the codec is compatible with International Telecommunication Union (ITU) G.711 A-law or μ-law codec standard.
Type: Application
Filed: Aug 20, 2012
Publication Date: Aug 22, 2013
Applicant: Huawei Technologies Co., Ltd. (Shenzhen)
Inventor: Yang Gao (Mission Viejo, CA)
Application Number: 13/590,157
International Classification: G10L 21/0208 (20060101);