Bandwidth Extension via Constrained Synthesis
Audio signal bandwidth extension may be performed on a narrow bandwidth signal received from a remote source over an audio communication network. The narrow band signal bandwidth may be extended such that the bandwidth is greater than that of the audio communication network. The signal may be extended by synthesizing an audio signal having spectral values within an extended bandwidth from synthetic components. The synthetic components may be generated using parameters derived from the original narrowband audio signal. The audio signal may be synthesized in the form of an excitation signal and a vocal tract envelope. The excitation signal and the vocal tract envelope may be extended independently. In various embodiments, excitation components may be derived from constrained synthesis using a constraint filter with nulls in regions where the extension is desired.
This application claims the benefit of U.S. Provisional Application Ser. No. 61/658,831, filed Jun. 12, 2012. The disclosure of the aforementioned application is incorporated herein by reference in its entirety for all purposes.
BACKGROUND
Audio communication networks often have bandwidth limitations affecting the quality of the audio transmitted over the networks. For example, telephone channel networks limit the bandwidth of audio signal frequencies to between 300 Hz and 3500 Hz. As a result, speech transmitted using only this limited bandwidth sounds thin and dull due to the lack of low and high frequency content in the audio signal, thereby limiting speech quality.
A challenge in bandwidth enhancement systems is creating a natural and perceptually fused enhancement signal with frequency components outside the bandwidth of the original narrowband signal.
One of the common methods for creating higher frequency components may include (optionally without low-pass filtering) using the narrowband signal to create spectrally-folded energy in the higher band. This method may create a distinct distortion due to the aliasing which is difficult to (e.g., perceptually) conceal. Additionally, this method may fail to cover spectral holes near the folding frequency (e.g., a hole from 3.5 to 4.5 kHz for telephone speech).
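The folding artifact described above can be reproduced with a few lines of code. The following is a minimal sketch, not part of the disclosure: it assumes a hypothetical 1 kHz test tone sampled at 8 kHz and doubles the rate by zero insertion with no low-pass filter. The helper names are illustrative only.

```python
import cmath
import math

def dft_magnitudes(x):
    """Return |X[k]| for k = 0..N-1 using a direct DFT (fine for short signals)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def zero_insert_upsample(x):
    """Double the sampling rate by inserting a zero after every sample (no low-pass filter)."""
    y = []
    for sample in x:
        y.extend([sample, 0.0])
    return y

# Hypothetical test tone: 1 kHz sampled at 8 kHz, 64 samples (exactly 8 cycles).
fs_narrow = 8000
n = 64
tone = [math.cos(2 * math.pi * 1000 * t / fs_narrow) for t in range(n)]

wide = zero_insert_upsample(tone)       # nominally 16 kHz, 128 samples
mags = dft_magnitudes(wide)
bin_hz = 2 * fs_narrow / len(wide)      # 125 Hz per bin
half = len(wide) // 2

# Frequencies (up to the new Nyquist rate) that carry significant energy.
peaks_hz = [k * bin_hz for k in range(half + 1) if mags[k] > 1.0]
print(peaks_hz)
```

The tone at 1 kHz reappears as a mirror image at 7 kHz, that is, reflected about the 4 kHz folding frequency. Because a telephone-band input has little energy between 3.5 kHz and 4 kHz, the mirrored copy also leaves the region around the folding frequency empty.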
Other methods may copy harmonics of the narrowband signal and transpose the harmonics to the higher empty frequency bands. These methods may rely (heavily) on accurate pitch detection for computing the translation parameters, and also require explicit phase alignment for achieving perceptual fusion.
SUMMARY
Embodiments of the present disclosure may address limitations present in the methods described above. Embodiments may, for example, create missing excitation components and may include envelope shaping methods to produce the final excitation-filter model output.
Embodiments of the present disclosure may treat the empty frequency bands where new components are sought as missing data regions. For example, for extending the higher band of telephone speech, the signal may be resampled to the desired rate (e.g., 16 kHz) with the frequency band above 3.5 kHz being treated as missing data. Signal reconstruction methods may be used to restore missing components.
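As an illustration of the missing-data view, the sketch below marks which frequency bins of a resampled frame are present and which are to be reconstructed. The frame size of 256 samples and the function name are assumptions made for the example; only the 16 kHz rate and the 300 Hz to 3500 Hz channel limits come from the text.

```python
def band_mask(n_fft, fs_hz, low_hz, high_hz):
    """Return booleans for bins 0..n_fft//2: True where the bin lies in the present band."""
    bin_hz = fs_hz / n_fft
    return [low_hz <= k * bin_hz <= high_hz for k in range(n_fft // 2 + 1)]

# Assumed analysis size of 256 samples at the resampled rate of 16 kHz.
present = band_mask(n_fft=256, fs_hz=16000, low_hz=300, high_hz=3500)
missing_bins = [k for k, ok in enumerate(present) if not ok]

print(sum(present), "bins present,", len(missing_bins), "bins treated as missing data")
```

With these assumed values, bins 5 through 56 (312.5 Hz to 3500 Hz) are present, and the remaining bins are the regions a reconstruction method would need to fill.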
In some embodiments, the methods described herein may be applied to the Linear Predictive Coding (LPC) residual of a resampled narrowband signal. The reconstruction method may be based at least on the properties of Code-Excited Linear Prediction (CELP) coding, where a Long-Term Predictor (LTP) and a fixed codebook may be used in an analysis-by-synthesis framework for replicating the residual signal with constrained degrees of freedom. In general, a “perceptual” filter may be applied to a matching error signal for shaping coding noise. Such a perceptual filter may be generally derived from at least the input envelope parameters.
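For readers unfamiliar with long-term prediction, the sketch below shows one conventional open-loop form of an LTP search: for each candidate lag, the past excitation is repeated to fill the frame, an optimal gain is computed, and the lag with the smallest squared error is kept. This is a generic textbook formulation under assumed names and parameters, not the specific search used in the embodiments, and it omits the perceptual weighting.

```python
def ltp_search(history, frame, min_lag, max_lag):
    """Return (lag, gain, error) minimizing the squared error between the frame
    and a gain-scaled, periodically extended copy of the past excitation."""
    best_lag, best_gain, best_err = None, 0.0, float("inf")
    frame_energy = sum(v * v for v in frame)
    start_base = len(history)
    for lag in range(min_lag, max_lag + 1):
        # Periodic extension handles lags shorter than the frame length.
        past = [history[start_base - lag + (t % lag)] for t in range(len(frame))]
        cross = sum(f * p for f, p in zip(frame, past))
        past_energy = sum(p * p for p in past)
        if past_energy == 0.0:
            continue
        gain = cross / past_energy
        err = frame_energy - gain * cross
        if err < best_err:
            best_lag, best_gain, best_err = lag, gain, err
    return best_lag, best_gain, best_err

# Hypothetical periodic excitation with a pitch period of 40 samples.
signal = [((t % 40) / 40.0) - 0.5 for t in range(200)]
history, frame = signal[:160], signal[160:]

lag, gain, err = ltp_search(history, frame, min_lag=20, max_lag=70)
print(lag, gain, err)
```

The search range of 20 to 70 samples is chosen here only so that a single pitch multiple falls inside it; a practical coder would pick the range from the expected pitch range at the operating sample rate.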
Embodiments of the present disclosure may augment the perceptual filter by cascading it with a filter whose shape is similar to the passband characteristics of the telephone channel (e.g., the same filter that rejected the missing components). Such a filter may place emphasis on the present components and de-emphasize the missing components, so that the LTP creates a fullband signal (i.e., increased entropy) with the same periodicity as the narrowband input. A restored excitation signal may include estimates of the missing components and may be used to synthesize the enhancement signal using a bandwidth extended envelope filter.
Further embodiments of the present disclosure may include a non-transitory computer readable storage medium including a program executable by a processor to perform methods for extending a spectral bandwidth of an acoustic signal as described above.
The present technology may extend the bandwidth of an audio signal received over an audio communication network with a limited bandwidth. The audio signal bandwidth extension may commence with receiving a narrow bandwidth signal from a remote source transmitted over the audio communication network. The narrow band signal bandwidth may then be extended such that the bandwidth is greater than that of the audio communication network.
The present technology may treat an empty frequency band in regions of the bandwidth extension as missing data and synthesize new components in the extended bandwidth based on a spectral envelope and excitation components. In various embodiments, the spectral envelope for the narrow bandwidth may be mapped to the extended bandwidth using a statistical model, while the excitation components for the extended bandwidth may be generated by Code-Excited Linear Prediction (CELP) closed loop coding in an analysis-by-synthesis framework with constrained degrees of freedom. A perceptual filter used in the CELP closed loop coding may be based on a spectral envelope mapped to the extended bandwidth. Embodiments of the present disclosure may also provide for augmenting a perceptual filter by cascading the filter with a filter having a shape similar to the passband characteristics of the telephone channel.
Various embodiments may be practiced with any audio device configured to receive and/or provide audio such as, but not limited to, cellular phones, phone handsets, headsets, and conferencing systems. It should be understood that while some embodiments will be described in reference to operations of a cellular phone, the present technology may be practiced with any audio device.
Processor 202 may execute instructions and modules stored in a memory (not illustrated in
The example receiver 200 is configured to receive an audio signal from the communications network 120. In the illustrated embodiment, the receiver 200 may include an antenna device (not shown on
The plot of
Audio processing system 210 may receive an audio signal including one or more time-domain input signals and provide the input signals for frequency analysis module 410. Audio processing system 210 may receive a narrow band acoustic signal from audio communication network 120.
The input signals may be received from receiver 200. Frequency analysis module 410 may generate frequency sub-bands from the time-domain signals and output the frequency sub-band signals.
Noise reduction module 420 may receive the narrow band signal (comprised of frequency sub-bands) and provide a noise reduced version to bandwidth extension module 430. An audio processing system suitable for performing noise reduction by noise reduction module 420 is discussed in more detail in U.S. patent application Ser. No. 12/832,901, titled “Method for Jointly Optimizing Noise Reduction and Voice Quality in a Mono or Multi-Microphone System,” filed on Jul. 8, 2010, the disclosure of which is incorporated herein by reference for all purposes.
Bandwidth extension module 430 may process the noise reduced narrow band signal to extend the bandwidth of the signal. Bandwidth extension module 430 is discussed in more detail below with reference to
Reconstruction module 440 may receive signals from bandwidth extension module 430 and reconstruct the synthetically generated extended bandwidth signal into a single audio signal.
The envelope mapper module 520 may receive the spectral envelope component created from the narrow band signal and may generate a spectral envelope component for the extended bandwidth signal. The extended bandwidth envelope may be represented using a Line Spectral Frequencies (LSF) model.
The excitation processing module 530 may generate the Linear Predictive Coding (LPC) residual of the narrowband signal by removing the spectral envelope component from the narrowband signal. The LPC residual data may be passed to resampling processing module 540. The resampling processing module 540 may receive the LPC residual of the narrowband signal. The signal may be resampled to a desired rate.
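Removing the spectral envelope to obtain an LPC residual is commonly done by estimating an all-pole model and inverse filtering. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard choice and is assumed here for illustration; the disclosure does not state which estimation method module 530 uses. The test signal and its coefficients are hypothetical.

```python
import random

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the frame x."""
    return [sum(x[t] * x[t - lag] for t in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return a[0..order] with a[0] = 1 for the inverse filter A(z) = sum a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a

def lpc_residual(x, a):
    """Inverse-filter x with A(z): e[t] = sum_k a[k] * x[t - k]."""
    return [sum(a[k] * x[t - k] for k in range(len(a)) if t - k >= 0)
            for t in range(len(x))]

# Hypothetical test signal: white noise shaped by a known two-pole envelope.
rng = random.Random(0)
x = []
for t in range(2000):
    prev1 = x[t - 1] if t >= 1 else 0.0
    prev2 = x[t - 2] if t >= 2 else 0.0
    x.append(1.3 * prev1 - 0.6 * prev2 + rng.uniform(-1.0, 1.0))

a = levinson_durbin(autocorrelation(x, 2), 2)
residual = lpc_residual(x, a)
signal_energy = sum(v * v for v in x)
residual_energy = sum(v * v for v in residual)
print(a, residual_energy / signal_energy)
```

The estimated coefficients land close to the generating values (1, -1.3, 0.6), and the residual carries much less energy than the input because the envelope has been removed. A real system would run this per frame, typically with windowing and a higher model order.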
The CELP/LTP processing module 550 may receive the resampled LPC residual signal from resampling processing module 540 (and the extended bandwidth spectral envelope for the current frame from envelope mapper module 520) to determine an excitation component for the extended band signal. The CELP/LTP processing module 550 is discussed in more detail below with reference to
Synthesis module 560 may receive an excitation signal for the extended bandwidth from CELP/LTP processing module 550 and an extended bandwidth spectral envelope for the current frame from envelope mapper module 520. Synthesis module 560 may generate and output a synthesized audio signal having spectral values within the extended bandwidth (i.e., an Extended Bandwidth Signal). Synthesis module 560 is discussed in more detail below and in
Long term prediction module 610 may receive current frame band signals as well as pitch data and output an actual excitation for each band. The pitch may be determined based on audio signal data. An example method for determining a pitch is described in U.S. patent application Ser. No. 12/860,043, entitled “Monaural Noise Suppression Based on Computational Auditory Scene Analysis,” filed on Aug. 20, 2010, the disclosure of which is incorporated herein by reference for all purposes.
The actual excitations are provided by long term prediction module 610 to codebook look-up module 630. Codebook look-up module 630 receives the actual excitations, and compares them to a set of excitation values associated with a clean signal and stored in codebook 640. The set of clean excitation data stored in codebook 640 may represent different types of speech. Codebook look-up module 630 may select the clean excitation value set that best matches the reliable excitation values and provide the complete excitation data associated with the matching excitation value set e′j(t) as an output for the CELP/LTP processing module 550.
A weighted error metric may be used inside codebook look-up module 630 in order to find the best matched excitation set. The weighting parameters of the error metric can be based on a perceptual filter. The perceptual filter may be constructed using the spectral envelope for the extended bandwidth provided by envelope mapper module 520 (coupling between these modules is shown in
In some embodiments, additional constraints may be applied in reconstruction of the excitation components by codebook look-up module 630. The perceptual filter may be augmented by cascading the filter with a constrained filter 650. The constrained filter 650 may have nulls in the regions of the extension of the bandwidth. The constrained filter 650 may be of shape similar to a shape of a passband characteristic of a telephone channel.
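The effect of a constraint with nulls in the extension region can be sketched as a frequency-weighted codebook search. In the example below, hard weights of 1 inside the 300 Hz to 3500 Hz passband and 0 elsewhere stand in for constrained filter 650; the perceptual filter is omitted, and the frame length, tones, and three-entry codebook are all hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct DFT; adequate for the short frames used in this sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def constraint_weights(n, fs_hz, low_hz, high_hz):
    """Weight 1 inside the channel passband, 0 (a null) elsewhere, for bins 0..n-1."""
    weights = []
    for k in range(n):
        freq = min(k, n - k) * fs_hz / n    # fold negative-frequency bins
        weights.append(1.0 if low_hz <= freq <= high_hz else 0.0)
    return weights

def weighted_error(target, candidate, weights):
    """Squared spectral error, counted only where the weights are non-zero."""
    t_spec, c_spec = dft(target), dft(candidate)
    return sum(w * abs(t - c) ** 2 for w, t, c in zip(weights, t_spec, c_spec))

def best_match(target, codebook, weights):
    errors = [weighted_error(target, entry, weights) for entry in codebook]
    return min(range(len(codebook)), key=lambda i: errors[i]), errors

def tone(freq_hz, n=32, fs_hz=16000):
    return [math.cos(2 * math.pi * freq_hz * t / fs_hz) for t in range(n)]

def mix(a, b):
    return [u + v for u, v in zip(a, b)]

# Narrowband target (1 kHz) and fullband candidates, all hypothetical.
target = tone(1000)
codebook = [mix(tone(2000), tone(6000)), mix(tone(1000), tone(6000)), tone(1500)]
weights = constraint_weights(32, 16000, 300, 3500)

index, errors = best_match(target, codebook, weights)
flat_error = weighted_error(target, codebook[index], [1.0] * 32)
print(index, errors[index], flat_error)
```

The selected entry matches the target inside the passband, so its constrained error is essentially zero, while its 6 kHz component goes unpenalized because it falls in a null. Under a flat, unconstrained weighting the same entry would show a nonzero error, which is the sense in which the constraint leaves the search free to supply content in the extension region.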
Envelope processing may be performed at operation 830. The envelope processing may generate a spectral envelope component for the narrowband signal. The envelope mapping process may be carried out at operation 840. The envelope mapping process may map the spectral envelope for the narrowband signal to the extended bandwidth.
Excitation processing may be performed at operation 850. The excitation processing may generate excitation components for the extended bandwidth signal. The excitation components may be generated by CELP/LTP processing module 550 within bandwidth extension module 430.
Synthesis processing may be performed at operation 860. The synthesis processing may generate an extended band signal using the spectral envelope generated by envelope mapper module 520 and excitation components generated by CELP/LTP processing module 550 within bandwidth extension module 430.
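In a source-filter model, synthesis amounts to driving an all-pole envelope filter with the excitation. The following minimal sketch assumes the envelope is available as inverse-filter coefficients a[0..p] with a[0] = 1 and applies a scalar gain; the coefficient values and the function name are illustrative and are not taken from the disclosure.

```python
def synthesize(excitation, a, gain=1.0):
    """Filter the excitation through gain / A(z), where A(z) = sum a[k] z^-k and a[0] = 1."""
    out = []
    for t, e in enumerate(excitation):
        feedback = sum(a[k] * out[t - k] for k in range(1, len(a)) if t - k >= 0)
        out.append(gain * e - feedback)
    return out

# Hypothetical one-pole envelope driven by a unit impulse.
impulse = [1.0, 0.0, 0.0, 0.0]
response = synthesize(impulse, a=[1.0, -0.5])
print(response)
```

Using the same coefficient convention for analysis (inverse filtering with A(z)) and synthesis (filtering with 1/A(z)) means that an unmodified residual would reproduce the input, so any change in the output comes from the extended excitation and the extended envelope.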
The components shown in
Mass storage device 930, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor 910. Mass storage device 930 may store the system software for implementing embodiments of the present disclosure for purposes of loading that software into main memory 920.
Portable storage device 940 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk, digital video disc, or USB storage device, to input and output data and code to and from the computer system 900 of
Input devices 960 provide a portion of a user interface. Input devices 960 may include an alphanumeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Input devices 960 may also include a touchscreen. Additionally, the system 900 as shown in
Display system 970 may include a liquid crystal display (LCD) or other suitable display device. Display system 970 receives textual and graphical information, and processes the information for output to the display device.
Peripherals 980 may include any type of computer support device to add additional functionality to the computer system. Peripheral device(s) 980 may include a modem or a router.
The components provided in the computer system 900 of
It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the embodiments provided herein. Computer-readable storage media refer to any medium or media that participate in providing instructions to a central processing unit (CPU), a processor, a microcontroller, or the like. Such media may take forms including, but not limited to, non-volatile and volatile media such as optical or magnetic disks and dynamic memory, respectively. Common forms of computer-readable storage media include a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic storage medium, a CD-ROM disk, digital video disk (DVD), Blu-ray Disc (BD), any other optical storage medium, RAM, PROM, EPROM, EEPROM, FLASH memory, and/or any other memory chip, module, or cartridge.
Claims
1. A method for extending bandwidth of an audio signal, the method comprising:
- receiving, by a processor, an audio signal having spectral values within a narrow bandwidth;
- determining, via instructions stored in a memory and executed by the processor, synthetic components of an audio signal having spectral values within an extended bandwidth; and
- synthesizing, via instructions stored in the memory and executed by the processor and based on the synthetic components, an extended audio signal having spectral values within an extended bandwidth.
2. The method of claim 1, wherein the extended bandwidth includes a frequency outside the narrow bandwidth.
3. The method of claim 1, wherein the synthetic components are divided into a spectral envelope and excitation components.
4. The method of claim 3, wherein the spectral envelope and the excitation components are estimated independently.
5. The method of claim 3, wherein the spectral envelope for the extended bandwidth signal is estimated based on information derived from the spectral envelope of the narrow bandwidth signal.
6. The method of claim 3, wherein the spectral envelope for the extended bandwidth is estimated based on a statistical model, the statistical model mapping the spectral envelope for the narrow bandwidth signal to the spectral envelope for the extended bandwidth signal.
7. The method of claim 3, wherein synthesizing includes applying a gain to excitation components of the extended bandwidth signal, the gain being based on the spectral envelope of the extended bandwidth signal.
8. The method of claim 3, wherein the excitation components are derived using a constrained filter, the constrained filter having nulls in regions of extension of the narrow bandwidth.
9. The method of claim 8, wherein the constrained filter has a shape similar to a shape of a passband filter of a telephone channel.
10. A system for bandwidth extension of an audio signal, the system comprising:
- a processor; and
- a memory communicatively coupled with the processor, the memory storing instructions which when executed by the processor perform a method comprising: receiving an audio signal having spectral values within a narrow bandwidth; determining synthetic components of an audio signal having spectral values within an extended bandwidth; and synthesizing, based on the synthetic components, the extended audio signal having spectral values within the extended bandwidth.
11. The system of claim 10, wherein the extended bandwidth includes a frequency outside of the narrow bandwidth.
12. The system of claim 10, wherein the synthetic components are divided into a spectral envelope and excitation components.
13. The system of claim 12, wherein the spectral envelope and the excitation components are estimated independently.
14. The system of claim 12, wherein the spectral envelope for the extended bandwidth signal is estimated based on information derived from the spectral envelope of the narrow bandwidth signal.
15. The system of claim 12, wherein the spectral envelope is estimated based on a statistical model, the statistical model mapping the spectral envelope for the narrow bandwidth signal to the spectral envelope of the extended bandwidth signal.
16. The system of claim 12, wherein synthesizing includes applying a gain to excitation components of the extended bandwidth signal, the gain being based on the spectral envelope of the extended bandwidth signal.
17. The system of claim 12, wherein the excitation components are derived using a constrained filter with nulls in regions of extension of the narrow bandwidth.
18. The system of claim 17, wherein the constrained filter has a shape similar to a shape of a passband filter of a telephone channel.
19. A non-transitory computer-readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for bandwidth extension, the method comprising:
- receiving an audio signal having spectral values within a narrow bandwidth;
- determining synthetic components of an audio signal having spectral values within an extended bandwidth; and
- synthesizing, based on the synthetic components, the extended audio signal having spectral values within the extended bandwidth.
20. The non-transitory computer-readable storage medium of claim 19, wherein the extended bandwidth includes a frequency outside the narrow bandwidth.
Type: Application
Filed: Jun 12, 2013
Publication Date: Dec 12, 2013
Inventors: Carlos Avendano (Campbell, CA), Marios Athineos (San Francisco, CA), Ethan Duni (Mountain View, CA)
Application Number: 13/916,388
International Classification: G10L 19/12 (20060101);