Transformation Patents (Class 704/203)
  • Publication number: 20090222258
    Abstract: A voice activity detection method in a low SNR environment. The voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing difference in feature vectors between speech and non-speech (i) using the long-term spectrum variation component feature or (ii) using a long-term spectrum variation component extraction and a harmonic structure feature extraction. A correct rate and an accuracy rate of the voice activity detection are improved over conventional methods by using a long-term spectrum variation component having a window length over an average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provide speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 3, 2009
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
  • Patent number: 7580893
    Abstract: An acoustic signal encoder is provided which comprises a subband filter bank to divide an original signal into a plurality of frequency bands, a spectrum transformation circuit to detect the amplitude of a signal in each of the plurality of frequency bands in each of sub-blocks obtained by dividing a block length for signal coding, process the signal amplitude in each band based on the detected amplitude and transform the signals divided into the frequency bands to spectra, a normalizing circuit and quantizing circuit to normalize and quantize the spectrum, respectively, and a code row generator to generate a code row from the signals processed by the above circuits.
    Type: Grant
    Filed: October 5, 1999
    Date of Patent: August 25, 2009
    Assignee: Sony Corporation
    Inventor: Shiro Suzuki
  • Publication number: 20090210219
    Abstract: Provided is a residual signal coding/decoding apparatus and method. The residual signal coding apparatus includes a transformer, a band splitter, a pulse searcher, and a pulse quantizer. The transformer transforms time-domain residual signals into a frequency domain to output transform coefficients. The band splitter splits the transform coefficients into bands to output the transform coefficients. The pulse searcher searches the transform coefficients for the respective bands to select optimal pulses and output parameters of the optimal pulses. The pulse quantizer quantizes the parameters of the optimal pulses.
    Type: Application
    Filed: April 8, 2009
    Publication date: August 20, 2009
    Inventors: Jong-Mo SUNG, Hyun-Woo KIM, Mi-Suk LEE, Do-Young KIM
  • Patent number: 7577564
    Abstract: Method and apparatus for the classification of speech signals. Speech is classified into two broad classes of speech production—whispered speech and normally phonated speech. Speech classified in this manner will yield increased performance of automated speech processing systems because the erroneous results that occur when typical automated speech processing systems encounter non-typical speech such as whispered speech, will be avoided.
    Type: Grant
    Filed: March 3, 2003
    Date of Patent: August 18, 2009
    Assignee: The United States of America as represented by the Secretary of the Air Force
    Inventors: Stanley J. Wenndt, Edward J. Cupples
  • Patent number: 7552225
    Abstract: A communication system can include a telephony application server and at least one speech engine, where the system uses a standardized messaging protocol that specifies a standard for media resources. The telephony application server can have at least one voice server component. The speech engines can be allocated to handle requests for the voice server component on a per turn basis. The standardized messaging protocol can define a message format for request messages sent from the voice server component to a selected speech engine, a message format for response messages sent from the speech engine to the voice server component, and a message format for event messages sent from the speech engine to the voice server component. Each message format can include a field for a call identifier.
    Type: Grant
    Filed: April 28, 2004
    Date of Patent: June 23, 2009
    Assignee: International Business Machines Corporation
    Inventors: Thomas E. Creamer, Victor S. Moore, Wendi L. Nusbickel, Ricardo Dos Santos, James J. Sliwa
  • Publication number: 20090157391
    Abstract: An audio fingerprint is extracted from an audio sample, where the fingerprint contains information that is characteristic of the content in the sample. The fingerprint may be generated by computing an energy spectrum for the audio sample, resampling the energy spectrum logarithmically in the time dimension, transforming the resampled energy spectrum to produce a series of feature vectors, and computing the fingerprint using differential coding of the feature vectors. The generated fingerprint can be compared to a set of reference fingerprints in a database to identify the original audio content.
    Type: Application
    Filed: February 24, 2009
    Publication date: June 18, 2009
    Inventor: Sergiy Bilobrov
  • Publication number: 20090157393
    Abstract: An encoding device (200) includes an MDCT unit (202) that transforms an input signal in a time domain into a frequency spectrum including a lower frequency spectrum, a BWE encoding unit (204) that generates extension data which specifies a higher frequency spectrum at a higher frequency than the lower frequency spectrum, and an encoded data stream generating unit (205) that encodes to output the lower frequency spectrum obtained by the MDCT unit (202) and the extension data obtained by the BWE encoding unit (204). The BWE encoding unit (204) generates as the extension data (i) a first parameter which specifies a lower subband which is to be copied as the higher frequency spectrum from among a plurality of the lower subbands which form the lower frequency spectrum obtained by the MDCT unit (202) and (ii) a second parameter which specifies a gain of the lower subband after being copied.
    Type: Application
    Filed: February 12, 2009
    Publication date: June 18, 2009
    Inventors: Mineo TSUSHIMA, Takeshi NORIMATSU, Kosuke NISHIO, Naoya TANAKA
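The two extension parameters described in this abstract (which lower subband to copy, and the gain applied after copying) can be illustrated with a minimal sketch. The selection rule below (pick the lower subband whose scaled copy has the smallest squared error, with a least-squares gain) and the names `bwe_encode` and `bwe_decode` are assumptions made for illustration; the abstract does not say how the encoder chooses the subband or computes the gain.

```python
def bwe_encode(low_bands, high_band):
    """Return (index, gain) describing one high subband.

    Assumption: the source subband is the one whose scaled copy gives the
    smallest squared error, and the gain is the least-squares scale factor.
    """
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for index, band in enumerate(low_bands):
        band_energy = sum(c * c for c in band)
        if band_energy == 0.0:
            continue  # an all-zero band cannot be scaled to match anything
        dot = sum(a * b for a, b in zip(band, high_band))
        score = dot * dot / band_energy  # error reduction achieved by this band
        if score > best_score:
            best_index, best_gain, best_score = index, dot / band_energy, score
    return best_index, best_gain


def bwe_decode(low_bands, index, gain):
    """Rebuild a high subband by copying a lower subband and applying the gain."""
    return [gain * c for c in low_bands[index]]


low = [[1.0, 0.0, -1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
high = [1.0, 1.0, 1.0, 1.0]
index, gain = bwe_encode(low, high)
rebuilt = bwe_decode(low, index, gain)
```

Only the pair `(index, gain)` would need to be transmitted for each high subband, which is why this kind of extension data is compact.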
  • Publication number: 20090150143
    Abstract: A post-filtering apparatus and method for speech enhancement in a modified discrete cosine transform (MDCT) domain are disclosed. In the apparatus and method, previous and current MDCT coefficients are used for obtaining a speech spectrum coefficient similar to a real speech spectrum, and a convex function is used for transforming the speech spectrum coefficient and obtaining a post-filter coefficient so that difference can increase in the case where the speech spectrum coefficient is small but decrease in the case where the coefficient is large. Then, the post-filter coefficient is applied to the MDCT coefficient. With this configuration, both the current and previous MDCT values are used, so that it is possible to obtain a spectrum coefficient similar to the real speech spectrum and to obtain a more accurate filter coefficient. Further, the coefficient is adaptively transformed through the convex function, thereby enhancing speech quality.
    Type: Application
    Filed: June 5, 2008
    Publication date: June 11, 2009
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Hyun-woo Kim, Jong-mo Sung, Mi-suk Lee, Do-young Kim, Byung-sun Lee
  • Patent number: 7546240
    Abstract: A transform coder is described that performs a time-split transform in addition to a discrete cosine type transform. A time-split transform is selectively performed based on characteristics of media data. Transient detection identifies a changing signal characteristic, such as a transient in media data. After encoding an input signal from a time domain to a transform domain, a time-splitting transformer selectively performs an orthogonal sum-difference transform on adjacent coefficients indicated by a changing signal characteristic location. The orthogonal sum-difference transform on adjacent coefficients results in transforming a vector of coefficients in the transform domain as if they were multiplied by an identity matrix including at least one 2×2 time-split block along a diagonal of the matrix. A decoder performs an inverse of the described transforms.
    Type: Grant
    Filed: July 15, 2005
    Date of Patent: June 9, 2009
    Assignee: Microsoft Corporation
    Inventors: Sanjeev Mehrotra, Wei-Ge Chen, Henrique Sarmento Malvar
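As a rough illustration of the sum-difference step in this abstract, the sketch below applies a 2×2 orthogonal block to selected pairs of adjacent coefficients and leaves every other coefficient untouched, which is equivalent to multiplying by an identity matrix with 2×2 blocks on its diagonal. The scaling by 1/sqrt(2), the function name `time_split`, and the requirement that the selected pairs do not overlap are assumptions for this example, not details taken from the patent.

```python
import math


def time_split(coeffs, positions):
    """Apply an orthogonal sum-difference step to the pairs (i, i + 1).

    Assumption: the pairs named in `positions` do not overlap, so the
    transform is its own inverse.
    """
    out = list(coeffs)
    scale = 1.0 / math.sqrt(2.0)
    for i in positions:
        a, b = out[i], out[i + 1]
        out[i] = scale * (a + b)      # sum term
        out[i + 1] = scale * (a - b)  # difference term
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = time_split(x, [1, 3])        # transform pairs (1, 2) and (3, 4)
restored = time_split(y, [1, 3]) # applying it again undoes it
```

Because the 2×2 block is both orthogonal and symmetric, the step preserves energy and a decoder can invert it by applying the same operation to the same positions.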
  • Publication number: 20090132241
    Abstract: A method is provided whereby, before being subjected to a low rate voice coding, an incoming digital voice signal is chronologically segmented into blocks, the blocks are broken down respectively, in chronological order, into frequency components by a transformation in the frequency range and the frequency components are multiplied by weight factors depending on the frequency and modifiable in time, a frequency component being multiplied by the last weight factor calculated for the frequency component if the factor is less than the current weight factor.
    Type: Application
    Filed: May 20, 2008
    Publication date: May 21, 2009
    Applicant: Palm, Inc.
    Inventors: Walter Frank, Marc Ihle
  • Patent number: 7536299
    Abstract: Transmitters and receivers in multiple description coding systems use correlating and decorrelating transforms to generate and process multiple descriptions of elements of an input signal. The multiple descriptions include groups of correlating transform coefficients that permit recovery of an inexact facsimile of the signal if some of the correlating transform coefficients are lost or corrupted during transmission. Noiseless implementations of the correlating and decorrelating transforms are described that allow the signal elements to be quantized with different quantizing resolutions. Implementations using the Fast Hadamard Transform are described that reduce the resources needed to perform the transforms.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: May 19, 2009
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Corey I. Cheng, Claus Bauer
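For readers unfamiliar with the Fast Hadamard Transform mentioned in this abstract, here is a minimal textbook sketch of the unnormalized transform in natural (Hadamard) order. It is a generic butterfly implementation, not the correlating or decorrelating transform claimed in the patent, and the function name is an assumption.

```python
def fast_hadamard(values):
    """Unnormalized Fast Hadamard Transform of a power-of-two-length sequence."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    half = 1
    while half < n:
        # Butterfly stage: combine elements that are `half` apart.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b
        half *= 2
    return out


x = [3, 1, 4, 1, 5, 9, 2, 6]
spectrum = fast_hadamard(x)
# Applying the transform twice returns the input scaled by its length.
restored = [v // len(x) for v in fast_hadamard(spectrum)]
```

The transform uses only additions and subtractions, about n·log2(n) of them, and stays in integer arithmetic when the input is integer.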
  • Patent number: 7529941
    Abstract: A system and method of retrieving a watermark in a watermarked signal are disclosed. The watermarked signal comprises odd and even overlapped blocks where the watermark is contained in the even blocks. The method comprises, for each k-th even block, subtracting the two adjacent odd numbered blocks from the k-th even block of the watermarked signal to retrieve s*k(n), transforming s*k(n) into the frequency domain to generate Sk(f), calculating a phase of Sk(f) as ?(f) and a phase of Sk(f) as ?(f), calculating the difference ?(f) between ?(f) and ?(f), unwrapping ?(f) to obtain the phase modulation {tilde over (?)}k(f), and using a Viterbi search to retrieve the watermark embedded in {tilde over (?)}k(f).
    Type: Grant
    Filed: September 12, 2006
    Date of Patent: May 5, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: James David Johnston, Shyh-Shiaw Kuo, Schuyler Reynier Quackenbush, William Turin
  • Patent number: 7519532
    Abstract: Transcoding from EVRC to G.729ab with LSP parameters interpolated from EVRC to G.729ab, EVRC pitch used as input to G.729ab closed-loop pitch search, and G.729ab fixed codebook pulses found from a search limited to positions of EVRC fixed codebook pulses together with positions of target-impulse correlation maxima on the subframe tracks or full track search if no EVRC pulses.
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: April 14, 2009
    Assignee: Texas Instruments Incorporated
    Inventor: Pankaj K. Rabha
  • Publication number: 20090094022
    Abstract: A transformation-parameter calculating unit calculates a first model parameter indicating a parameter of a speaker model that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. The transformation parameter transforms, for each of the speakers, a distribution of the clean feature corresponding to the identification information of the speaker to a distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to identification information for each of the speakers by using the transformation parameter, and calculates a second model parameter indicating a parameter of the speaker model that maximizes a second likelihood for the transformed noisy feature.
    Type: Application
    Filed: October 2, 2008
    Publication date: April 9, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Yusuke Shinohara, Masami Akamine
  • Publication number: 20090076804
    Abstract: A portable assistive listening system for enhancing sound for hearing impaired individuals includes a fully functional hearing aid and a separate handheld digital signal processing (DSP) device. The focus of the present invention is directed to the handheld DSP device. The DSP device includes a programmable digital signal processor, a UWB transceiver for communicating with the hearing aid and/or other wireless audio sources, an LCD display, a user input device (keypad) and at least one memory device for storing programming settings and data. Specifically, the invention focuses on a memory buffer within the DSP device, and configuration of the device to buffer an incoming audio signal, to enhance the audio for replay, and to replay that audio. The device is further configured to convert the audio signal to text for display on the handheld DSP device. The device enhances the audio and then buffers the enhanced audio stream prior to output.
    Type: Application
    Filed: September 13, 2007
    Publication date: March 19, 2009
    Applicant: BIONICA CORPORATION
    Inventors: KIPP BRADFORD, RALPH A. BECKMAN, JOHN F. MURPHY, III
  • Patent number: 7505897
    Abstract: The subject matter includes systems, engines, and methods for generalizing a class of Lempel-Ziv algorithms for lossy compression of multimedia. One implementation of the subject matter compresses audio signals. Because music, especially electronically generated music, has a substantial level of repetitiveness within a single audio clip, the basic Lempel-Ziv compression technique can be generalized to support representing a single window of an audio signal using a linear combination of filtered past windows. Exemplary similarity searches and filtering strategies for finding the past windows are described.
    Type: Grant
    Filed: January 27, 2005
    Date of Patent: March 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Darko Kirovski, Zeph Landau
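The idea of describing a window of audio in terms of past windows can be sketched in its simplest form: search earlier, non-overlapping positions for the single window that, after scaling by a least-squares gain, best matches the current one. This one-term, unfiltered version and the name `best_past_match` are simplifying assumptions; the abstract describes a linear combination of filtered past windows and does not specify the search.

```python
def best_past_match(signal, start, width):
    """Find (offset, gain, error) so that gain * signal[offset:offset + width]
    approximates signal[start:start + width], using only earlier samples.

    Returns None when no usable past window exists.
    """
    target = signal[start:start + width]
    best = None
    for offset in range(0, start - width + 1):
        candidate = signal[offset:offset + width]
        cand_energy = sum(c * c for c in candidate)
        if cand_energy == 0.0:
            continue  # a silent window cannot be scaled to fit
        # Least-squares gain for this candidate.
        gain = sum(c * t for c, t in zip(candidate, target)) / cand_energy
        error = sum((t - gain * c) ** 2 for c, t in zip(candidate, target))
        if best is None or error < best[2]:
            best = (offset, gain, error)
    return best


signal = [0.0, 1.0, 2.0, 1.0, 0.0, 0.5, 1.0, 0.5]
match = best_past_match(signal, start=5, width=3)
```

A lossy coder built on this idea would store the offset and gain instead of the samples whenever the error is acceptably small.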
  • Patent number: 7505898
    Abstract: A simple and efficient method for producing an obfuscated speech signal which may be used to mask a stream of speech, is disclosed. A speech signal representing the speech stream to be masked is obtained. The speech signal is then temporally partitioned into segments, preferably corresponding to phonemes within the speech stream. The segments are then stored in a memory, and some or all of the segments are subsequently selected, retrieved, and assembled into an obfuscated speech signal representing an unintelligible speech stream that, when combined with the speech signal or reproduced and combined with the speech stream, provides a masking effect. While the presently preferred embodiment finds application most readily in an open plan office, embodiments suitable for use in restaurants, classrooms, and in telecommunications systems are also disclosed.
    Type: Grant
    Filed: July 11, 2006
    Date of Patent: March 17, 2009
    Assignee: Applied Minds, Inc.
    Inventors: W. Daniel Hillis, Bran Ferren, Russel Howe
  • Patent number: 7496482
    Abstract: A method and a device for signal separation. First, values of signals observed by M sensors are transformed into frequency domain values, and these frequency domain values are used to calculate relative values of the observed values between the sensors at each frequency. These relative values are clustered into N clusters, and the representative value of each cluster is calculated. Then, using these representative values, a mask is produced to extract the values of the signals emitted by V (1≤V≤M) signal sources from the frequency-domain signal values, and this mask is applied to the frequency-domain signal values. After that, if V=1 then the limited signal is output directly as a separated signal, while if V≥2 then the separated values are obtained by separating this limited signal with separation techniques such as ICA.
    Type: Grant
    Filed: September 1, 2004
    Date of Patent: February 24, 2009
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Shoko Araki, Hiroshi Sawada, Shoji Makino, Ryo Mukai
  • Publication number: 20090043566
    Abstract: A speech processing apparatus includes a plurality of microphones which receive speech produced by a first sound source to obtain first speech signals for a plurality of channels having one-to-one correspondence with the plurality of microphones, a calculation unit configured to calculate a first characteristic amount indicative of an inter-channel correlation of the first speech signals, a storage unit configured to store in advance a second characteristic amount indicative of an inter-channel correlation of second speech signals for the plurality of channels obtained by receiving speech produced by a second sound source by the plurality of microphones, and a collation unit configured to collate the first characteristic amount with the second characteristic amount to determine whether the first sound source matches with the second sound source.
    Type: Application
    Filed: July 21, 2008
    Publication date: February 12, 2009
    Inventor: Tadashi Amada
  • Publication number: 20090030676
    Abstract: A method of deriving a compressed acoustic model for speech recognition is disclosed herein. In a described embodiment, the method comprises transforming an acoustic model into an eigenspace at step 20, determining eigenvectors of the eigenspace and their eigenvalues, and selectively encoding dimensions of the eigenvectors based on values of the eigenspace at step 30 to obtain a compressed acoustic model at steps 40 and 50.
    Type: Application
    Filed: July 26, 2007
    Publication date: January 29, 2009
    Applicant: CREATIVE TECHNOLOGY LTD
    Inventors: Jun XU, Huayun ZHANG
  • Publication number: 20090024396
    Abstract: An audio signal encoding method and apparatus for efficiently encoding an audio signal in an interval having many birth sinusoids and enabling tracking of sinusoidal signals in the next interval, and a computer readable recording medium having embodied thereon a computer program for executing the audio signal encoding method are provided. According to the method and apparatus, by applying transform coding instead of parametric coding to a frame having many birth sinusoids, the sinusoids are encoded, thereby reducing the number of bits required for the encoding and enabling efficient coding. Also, when transform coding is applied to a frame of a predetermined interval, an inverse transform of the transform coding is applied to the encoded data in order to decode the data, and then sinusoids are extracted from the decoded data, thereby enabling tracking of sinusoids of the next frame.
    Type: Application
    Filed: February 8, 2008
    Publication date: January 22, 2009
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Chul-woo Lee, Geon-hyoung Lee, Jae-one Oh, Jong-hoon Jeong, Nam-suk Lee
  • Publication number: 20090018824
    Abstract: Provided is an audio encoding device for modeling a spectrum waveform and accurately restoring the spectrum waveform. The audio encoding device includes: an FFT unit (104) for subjecting a spectrum amplitude of a drive sound source signal to an FFT process to obtain an FFT transform coefficient; a second spectrum amplitude calculation unit (105) for calculating a second spectrum amplitude of the FFT transform coefficient; a peak point position identification unit (106) for identifying the positions of the most significant N peaks of the second spectrum amplitude; a coefficient selection unit (107) for selecting FFT transform coefficients corresponding to the identified positions; and a quantization unit (108) for quantizing the selected FFT transform coefficients.
    Type: Application
    Filed: January 30, 2007
    Publication date: January 15, 2009
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventor: Chun Woei Teo
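The selection step in this abstract (keep only the transform coefficients at the N most significant positions) can be sketched as follows. Treating "most significant peaks" as the N largest magnitudes, rather than as local maxima, is an assumption, as is the name `select_peaks`.

```python
def select_peaks(coeffs, n):
    """Return (position, coefficient) pairs for the n largest-magnitude
    coefficients, ordered by position.

    Works for real or complex coefficients because it ranks by abs().
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    positions = sorted(order[:n])
    return [(i, coeffs[i]) for i in positions]


coeffs = [0.1, -3.0, 0.5, 2.0, -0.2]
kept = select_peaks(coeffs, 2)  # positions 1 and 3 carry the largest magnitudes
```

Only the kept positions and their coefficients would then go on to quantization; all other coefficients are implicitly treated as zero.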
  • Publication number: 20080312912
    Abstract: Provided is an audio signal encoding method including transforming an input signal from a time domain to a time/frequency domain using a first transformation method, extracting a stereo parameter from a signal of the time/frequency domain, encoding the stereo parameter, and down-mixing the signal of the time/frequency domain, transforming each of sub-bands of the down-mixed signal to a frequency domain by using a second transformation method, and encoding the signal of the frequency domain in the frequency domain.
    Type: Application
    Filed: October 4, 2007
    Publication date: December 18, 2008
    Applicant: Samsung Electronics Co., Ltd
    Inventors: Ki-hyun CHOO, Eun-mi OH, Jung-hoe Kim, Konstantin Osipov, Sergey Petrov
  • Publication number: 20080281584
    Abstract: A speech enhancement system improves the perceptual quality of an aural signal. A receiver detects and receives an unvoiced signal, a fully voiced signal, or a mixed voice remote signal. A coherence processor identifies the similarities or differences between a local signal and the remote signal. A cancellation processor or controller dampens reflected signals that may be part of the local signal.
    Type: Application
    Filed: June 29, 2007
    Publication date: November 13, 2008
    Applicant: QNX SOFTWARE SYSTEMS (WAVEMAKERS), INC.
    Inventors: Phillip A. Hetherington, Shreyas A. Paranjpe
  • Patent number: 7451077
    Abstract: Complex acoustic information, such as music, is presented as visual information or as movement of an object in a manner simulating the reception of the complex acoustic information by the human auditory system including a complexity of tempo, rhythms, intensity variation from highs to lows, and silences of the audio, providing a synchronicity with these characteristics. The acoustic information is processed by an acoustic human-like auditory transformation. The transformation may be varied depending on the presentation controlled by the device. The transformed signal is then applied to a tactile or visual presentation. The audience reception of the invention is through light, color, or animation of an image or object complementing the reception of the acoustic information.
    Type: Grant
    Filed: September 23, 2004
    Date of Patent: November 11, 2008
    Inventors: Felicia Lindau, Chuck Wooters, James Beck
  • Patent number: 7447317
    Abstract: In processing a multi-channel audio signal having at least three original channels, a first downmix channel and a second downmix channel are provided, which are derived from the original channels. For a selected original channel of the original channels, channel side information are calculated such that a downmix channel or a combined downmix channel including the first and the second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel. The channel side information and the first and second downmix channels form output data to be transmitted to a decoder, which, in case of a low level decoder only decodes the first and second downmix channels or, in case of a high level decoder provides a full multi-channel audio signal based on the downmix channels and the channel side information.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: November 4, 2008
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E.V
    Inventors: Jürgen Herre, Johannes Hilpert, Stefan Geyersberger, Andreas Hölzer, Claus Spenger
  • Publication number: 20080249766
    Abstract: A scalable decoder which does not frequently switch the band of the decoded signal even if the signal in an expanded layer in band scalable encoding disappears and does not give any strangeness or discomfort to the subjective quality. If frame disappearance does not occur, the signal is a signal (S101). However, if a high-band packet is made to disappear, the actually received signal is only a low-band packet. Therefore, the scalable decoder subjects the signal of a low-band packet to an upsample processing. As a result, a signal (S102) where the sampling rate is a wide band and only the low-frequency component is left is generated. From the signal (S103) of the (n−1)-th frame, a compensation signal (S104) is generated by hiding and passed through an HPF to extract only the high-frequency component to generate a signal (S105). The signal (S101) where only the low-frequency component is left is added to the signal (S105) where high-frequency component is left to generate a decoded signal (S106).
    Type: Application
    Filed: April 25, 2005
    Publication date: October 9, 2008
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventor: Hiroyuki Ehara
  • Publication number: 20080249765
    Abstract: A decoder particularly, but not exclusively, for MPEG-1 layer III data signals, in which recovered spectral coefficients are transformed into time domain signal components, the time domain signal components then being transformed, using a forward transform which is orthogonally modulated with respect to the forward transform that was used at the encoder, to produce a set of second spectral coefficients. In this way, the first and second spectral coefficients may be used as complex-valued spectral coefficients which are amenable to post-processing. In the preferred embodiment, the complex-valued frequency components are, after post-processing, transformed to the time domain using an odd-frequency modulated Discrete Fourier Transform (DFT).
    Type: Application
    Filed: January 13, 2005
    Publication date: October 9, 2008
    Applicant: KONINKLIJKE PHILIPS ELECTRONIC, N.V.
    Inventor: Erik Gosuinus Petrus Schuijers
  • Patent number: 7433815
    Abstract: A variable-rate voice transcoder that transcodes a bitstream representing frames of data encoded according to a first compression standard to a bitstream representing frames of data according to a second compression standard; the second compression standard defines a variable-rate voice codec. The method includes unquantizing a bitstream into a first set of parameters compatible with the first compression standard. The first set of parameters in addition to external control commands are then used to determine a frame class and a rate for the second compression standard. Next, the first set of parameters are transformed into a second set of parameters compatible with the second compression standard according to the frame-classification and rate determination decision. Lastly, the second set of parameters is packed into a bitstream compatible with the second compression standard.
    Type: Grant
    Filed: September 10, 2003
    Date of Patent: October 7, 2008
    Assignee: Dilithium Networks Pty Ltd.
    Inventors: Marwan A. Jabri, Jianwei Wang, Nicola Chong-White
  • Publication number: 20080235007
    Abstract: A method and system for speaker recognition and identification includes transforming features of a speaker utterance in a first condition state to match a second condition state and provide a transformed utterance. A discriminative criterion is used to generate a transform that maps an utterance to obtain a computed result. The discriminative criterion is maximized over a plurality of speakers to obtain a best transform for recognizing speech and/or identifying a speaker under the second condition state. Speech recognition and speaker identity may be determined by employing the best transform for decoding speech to reduce channel mismatch.
    Type: Application
    Filed: June 3, 2008
    Publication date: September 25, 2008
    Inventors: Jiri Navratil, Jason Pelecanos, Ganesh N. Ramaswamy
  • Publication number: 20080228470
    Abstract: A signal separating device that is inputted with signals formed by mixing plural signals and separates the signals into individual signals includes a signal converting unit that converts input signals into signals in the time-frequency domain and generates observation spectrograms and a signal separating unit that generates separated results from the observation spectrograms generated by the signal converting unit. The signal separating unit interprets the observation spectrograms as observation signals subjected to convolutive mixtures in the time-frequency domain and generates separated results by executing processing for solving convolutive mixtures in the time-frequency domain.
    Type: Application
    Filed: February 19, 2008
    Publication date: September 18, 2008
    Inventor: Atsuo Hiroe
  • Publication number: 20080228471
    Abstract: Methods and apparatus are disclosed for approximating an MDCT coefficient of a block of windowed sinusoid having a defined frequency, the block being multiplied by a window sequence and having a block length and a block index. A finite trigonometric series is employed to approximate the window sequence. A window summation table is pre-computed using the finite trigonometric series and the defined frequency of the sinusoid. A block phase is computed for each block with the defined frequency, the block length and the block index. An MDCT coefficient is approximated by the dot product of a phase vector computed using the block phase with a corresponding row of the window summation table.
    Type: Application
    Filed: March 14, 2007
    Publication date: September 18, 2008
    Applicant: XFRM, INC.
    Inventors: Richard C. Cabot, Matthew S. Ashman
  • Publication number: 20080208571
    Abstract: This application for patent describes an invention toward achieving potentially hundred- to thousand-fold enhancement in the efficiency of the utilization of frequency-bandwidth for digital transmission of speech. This invention is based on the observation that human speech can be assumed to be composed of a series of contiguous fundamental ‘phonic elements’ (“phonoms”) that could be judiciously used toward developing an extremely low bit-rate digital coding of the speech signals. A generic example of a simple implementation of this invention (the basic equipment and associated device(s), methodologies and technologies) for ultra-low bit-rate voice-telecommunications over any transmission channel is also presented.
    Type: Application
    Filed: November 19, 2007
    Publication date: August 28, 2008
    Inventor: Ashok Kumar Sinha
  • Patent number: 7418396
    Abstract: Presented herein is a reduced memory implementation technique of filterbank and block switching for real-time audio applications. Calculation of the pulse code modulated samples from the IMDCT samples and inverse window functions is simplified by exploiting the symmetric qualities of the IMDCT function. As a result, memory requirements and operations are significantly reduced.
    Type: Grant
    Filed: October 14, 2003
    Date of Patent: August 26, 2008
    Assignee: Broadcom Corporation
    Inventor: Sunoj Koshy
  • Patent number: 7418394
    Abstract: The time needed to encode an input audio stream is reduced by dividing the stream into two or more overlapping segments of audio information blocks, applying an encoding process to each segment to generate encoded segments in parallel, and appending the encoded segments to form an encoded output signal. The encoding process is responsive to one or more control parameters. Some of the control parameters, which apply to a given block, are calculated from audio information in one or more previous blocks. The length of the overlap between adjacent segments is chosen such that the differences between control parameter values and corresponding reference values at the end of the overlap interval are small enough to avoid producing audible artifacts in a signal that is obtained by decoding the encoded output signal.
    Type: Grant
    Filed: April 28, 2005
    Date of Patent: August 26, 2008
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: James Stuart Jeremy Cowdery
  • Publication number: 20080195382
    Abstract: An audio enhancement refines a short-time spectrum. The refinement may reduce overlap between audio sub-bands. The sub-bands are transformed into sub-band short-time spectra. A portion of the spectra are time-delayed. The sub-band short-time spectrum and the time-delayed portion are filtered to obtain a refined sub-band short-time spectrum. The refined spectrum improves audio processing.
    Type: Application
    Filed: November 30, 2007
    Publication date: August 14, 2008
    Inventors: Mohamed Krini, Gerhard Uwe Schmidt
  • Publication number: 20080162122
    Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighting function was then applied to the power spectrum. The harmonics weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter's output was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromise of the gain invariance properties.
    Type: Application
    Filed: February 1, 2008
    Publication date: July 3, 2008
    Inventors: Kenneth Rose, Liang Gu
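    Illustration (not taken from the application): the last step of the abstract, turning mel filter-bank energies into cepstral coefficients, might look like the sketch below. It assumes the filter energies are already computed and strictly positive, uses an unnormalized DCT-II, and leaves out the harmonic weighting and cubic-root compression; the function name is hypothetical.

```python
import math

def cepstral_coefficients(filter_energies, num_coeffs):
    # Log-energy of each mel band-pass filter output.
    log_energies = [math.log(e) for e in filter_energies]
    m = len(log_energies)
    # Unnormalized DCT-II of the log-energies; coefficient 0 is their sum.
    return [sum(log_energies[j] * math.cos(math.pi * k * (j + 0.5) / m)
                for j in range(m))
            for k in range(num_coeffs)]
```

    A flat set of filter energies puts everything into coefficient 0 and leaves the higher coefficients at zero, which is why the higher coefficients are read as a description of spectral shape.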
  • Patent number: 7395210
    Abstract: A system and method for lossless and/or progressive to lossless data coding (e.g., audio and/or image) is provided. The system and method employ a multiple factorization reversible transform component that provides quantized coefficients based, at least in part, upon a multiple factorization reversible transform. The multiple factorization reversible transform component can employ an N-point modulated lapped transform in accordance with one aspect of the present invention. The multiple factorization reversible transform component can comprise a modulation stage, a pre-FFT rotation stage, a complex FFT stage and a post-FFT rotation stage.
    Type: Grant
    Filed: November 21, 2002
    Date of Patent: July 1, 2008
    Assignee: Microsoft Corporation
    Inventor: Jin Li
  • Patent number: 7386445
    Abstract: The present invention provides a method for compensating transient effects in transform coding and decoding of a combined speech and audio in electronic devices by using a transform based time-frequency domain codec. The method can combine, e.g., a CELP (code excited linear prediction) type speech codec and a transform type audio codec. The invention describes a compensation method to handle the transient (e.g., from the CELP coding to the transform coding) in transform coding when the number of quantized transform coding coefficients is lower than in the output of the transform.
    Type: Grant
    Filed: January 18, 2005
    Date of Patent: June 10, 2008
    Assignee: Nokia Corporation
    Inventor: Pasi Ojala
  • Patent number: 7383174
    Abstract: A method of generating and assigning identifying tags to sound files according to standardized criteria that result in substantially unique tags while minimizing differences in sound files that are ideally identical. A number of points in the sound file's unique frequency domain are chosen to create a position in N dimensional space, and this position is used to determine similarities and differences among sound files.
    Type: Grant
    Filed: October 3, 2003
    Date of Patent: June 3, 2008
    Inventor: Matthew A. Paulin
  • Publication number: 20080120095
    Abstract: A method and apparatus to encode and/or decode a speech signal and/or an audio signal. The apparatus includes a first domain transforming unit, a frequency domain encoding unit, and a multiplexing unit to encode the speech signal and/or an audio signal. The apparatus includes a demultiplexing unit, a frequency domain decoding unit, and a second domain inverse transformation unit to decode the speech signal and/or the audio signal. The method and apparatus are capable of effectively encoding or decoding all of a speech signal, an audio signal, and a mixed signal of a speech signal and an audio signal, and improving the quality of sound by using a small number of bits.
    Type: Application
    Filed: November 16, 2007
    Publication date: May 22, 2008
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Eun-mi OH, Chang-youg Son, Ki-hyun Choo, Jung-hoe Kim
  • Patent number: 7369989
    Abstract: A unified filter bank for use in encoding and decoding MPEG-1 audio data, wherein input audio data is encoded into coded audio data and the coded audio data is subsequently decoded into output audio data. The unified filter bank includes a plurality of filters, with each filter of the plurality of filters being a cosine modulation of a prototype filter. The unified filter bank is operational as an analysis filter bank during audio data encoding and as a synthesis filter bank during audio data decoding, wherein the unified filter bank is effective to substantially eliminate the effects of aliasing, phase distortion and amplitude distortion in the output audio data.
    Type: Grant
    Filed: June 8, 2001
    Date of Patent: May 6, 2008
    Assignee: STMicroelectronics Asia Pacific Pte, Ltd.
    Inventors: Mohammed Javed Absar, Sapna George
  • Patent number: 7359854
    Abstract: A solution for improving the perceived sound quality of a decoded acoustic signal is accomplished by extending the spectrum of a received narrow-band acoustic signal (aNB). A wide-band acoustic signal (AWB) is produced by extracting at least one essential attribute (zNB) from the narrow-band acoustic signal (aNB). Parameters, e.g., representing signal energies, with respect to wide-band frequency components outside the spectrum (ANB) of the narrow-band acoustic signal (aNB), are estimated based on the at least one essential attribute (zNB). This estimation involves allocating a parameter value to a wide-band frequency component, based on a corresponding confidence level.
    Type: Grant
    Filed: April 10, 2002
    Date of Patent: April 15, 2008
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Mattias Nilsson, Bastiaan Kleijn
  • Patent number: 7356465
    Abstract: The invention relates to a computer device comprising a memory 108 for storing audio signals 114, in part pre-recorded, each corresponding to a defined source, by means of spatial position data 116, and a processing module 110 for processing these audio signals in real time as a function of the spatial position data. The processing module 110 allows for the instantaneous power level parameters to be calculated on the basis of audio signals 114, the corresponding sources being defined by instantaneous power level parameters. The processing module 110 comprises a selection module 120 for regrouping certain of the audio signals into a variable number of audio signal groups, and the processing module 110 is capable of calculating spatial position data which is representative of a group of audio signals as a function of the spatial position data 116 and instantaneous power level parameters for each corresponding source.
    Type: Grant
    Filed: December 31, 2003
    Date of Patent: April 8, 2008
    Assignee: Inria Institut National de Recherche en Informatique et en Automatique
    Inventors: Nicolas Tsingos, Emmanuel Gallo, George Drettakis
  • Publication number: 20080082321
    Abstract: In an encoding process, a CPU transforms an audio signal from the real-time domain to the frequency domain, and transforms the signal into spectra consisting of MDCT coefficients. The CPU separates the audio signal into several frequency bands, and performs bit shifting in each band such that the MDCT coefficients can be expressed with pre-configured numbers of bits. The CPU re-quantizes the MDCT coefficients at a precision differing for each band, and transmits the values acquired thereby and shift bit numbers as encoded data. Meanwhile, in a decoding process, a CPU receives encoded data and inverse re-quantizes and inverse bit shifts the data, thereby restoring the MDCT coefficients. Furthermore, the CPU transforms the data from frequency domain to the real-time domain by using the inverse MDCT, and restores and outputs the audio signal.
    Type: Application
    Filed: October 1, 2007
    Publication date: April 3, 2008
    Applicant: Casio Computer Co., Ltd.
    Inventor: Hiroyasu Ide
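    Illustration (not taken from the application): the per-band bit shifting described in the abstract could be sketched as follows. The sketch assumes integer MDCT coefficients and a signed target width of `bits` bits; the function names and the round-toward-zero choice are assumptions, and the per-band re-quantization step is left out.

```python
def encode_band(coeffs, bits):
    # Largest magnitude representable in a signed field of the given width.
    limit = (1 << (bits - 1)) - 1
    peak = max((abs(c) for c in coeffs), default=0)
    # Smallest right shift that brings the band's peak within the limit.
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    # Shift magnitudes so that negative values round toward zero.
    shifted = [(abs(c) >> shift) * (1 if c >= 0 else -1) for c in coeffs]
    return shift, shifted

def decode_band(shift, shifted):
    # Inverse bit shift; the low-order bits dropped by the encoder are lost.
    return [(abs(v) << shift) * (1 if v >= 0 else -1) for v in shifted]
```

    For example, `encode_band([1000, -37, 512, 3], 8)` needs a shift of 3, and decoding gives `[1000, -32, 512, 0]`: each coefficient is recovered to within 2**shift, so bands with small peaks keep more precision.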
  • Publication number: 20080071522
    Abstract: A method for the protected transmission of data words involves provision of a first data word (X1), transformation of the first data word (X1) into a sequence comprising at least one second data word (X2) by a first transformation rule (T1), transformation of at least one of the second data words (X2) into a third data word (X3) by a second transformation rule (T2), and checking whether a prescribed relationship exists between the third data word (X3) and a comparison data word (VX).
    Type: Application
    Filed: March 20, 2006
    Publication date: March 20, 2008
    Inventors: Franz Klug, Thomas Kuenemund, Steffen Sonnekalb, Andreas Wenzel
  • Patent number: 7346498
    Abstract: Systems and methods are described for a fast paired method of 1-D cyclic convolution. A method includes calculating a paired transform of a signal, grouping components of the paired transform to form a plurality of splitting-signals, shifting the plurality of splitting signals, multiplying the plurality of splitting signals by a plurality of corresponding Fourier transforms, and calculating an inverse paired transform of the plurality of splitting signals.
    Type: Grant
    Filed: January 31, 2003
    Date of Patent: March 18, 2008
    Assignee: Board of Regents, the University of Texas System
    Inventor: Artyom M. Grigoryan
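    Illustration (not taken from the patent): for reference, the operation being accelerated is the 1-D cyclic convolution. The sketch below shows its definition and the conventional route through the discrete Fourier transform. It does not implement the paired transform or the splitting-signals of the abstract; the function names are hypothetical and both inputs are assumed to have the same length.

```python
import cmath

def cyclic_convolve_direct(x, h):
    # Definition: y[i] = sum over m of x[m] * h[(i - m) mod n].
    n = len(x)
    return [sum(x[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]

def dft(x, inverse=False):
    # Plain O(n^2) discrete Fourier transform, kept simple for clarity.
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def cyclic_convolve_dft(x, h):
    # Convolution theorem: multiply the spectra, then transform back.
    spectrum = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(spectrum, inverse=True)]
```

    Both routes give the same result for real inputs, up to floating-point rounding in the DFT version.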
  • Patent number: 7343284
    Abstract: A method for discriminating noise from signal in a noise-contaminated signal involves decomposing a frame of samples of the signal into decorrelated components, and using a difference between probability distributions of the noise contributions and the signal contributions to identify signal and noise. A Gaussian distribution is used to determine whether the components are only noise whereas a Laplacian distribution is used to determine whether the components contain the signal. Such discrimination may be used in speech enhancement or voice activity detection apparatus.
    Type: Grant
    Filed: July 17, 2003
    Date of Patent: March 11, 2008
    Assignee: Nortel Networks Limited
    Inventors: Saeed Gazor, Mohamed El-Hennawey
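    Illustration (not taken from the patent): one generic way to use the Gaussian/Laplacian distinction is to compare maximum-likelihood fits of the two zero-mean distributions to the samples of a decorrelated component. The decision rule, the function names, and the quantile-based demo data below are assumptions made for this sketch; the abstract does not specify them.

```python
import math
from statistics import NormalDist

def gaussian_log_likelihood(samples):
    # Zero-mean Gaussian with maximum-likelihood variance mean(x^2).
    n = len(samples)
    var = sum(s * s for s in samples) / n
    if var == 0:
        raise ValueError("samples must not all be zero")
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def laplacian_log_likelihood(samples):
    # Zero-mean Laplacian with maximum-likelihood scale mean(|x|).
    n = len(samples)
    scale = sum(abs(s) for s in samples) / n
    if scale == 0:
        raise ValueError("samples must not all be zero")
    return -n * (math.log(2 * scale) + 1)

def classify_component(samples):
    # Better Laplacian fit -> treat as containing signal; otherwise noise only.
    if laplacian_log_likelihood(samples) > gaussian_log_likelihood(samples):
        return "signal"
    return "noise"

def gaussian_quantiles(n):
    # Deterministic demo data: evenly spaced quantiles of a standard Gaussian.
    return [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

def laplace_quantiles(n):
    # Deterministic demo data: evenly spaced quantiles of a unit Laplacian.
    out = []
    for i in range(n):
        p = (i + 0.5) / n
        out.append(math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p)))
    return out
```

    With the demo data, `classify_component(gaussian_quantiles(1000))` yields "noise" and `classify_component(laplace_quantiles(1000))` yields "signal". Short frames give a much less reliable decision than these idealized inputs suggest.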
  • Patent number: 7337107
    Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighting function was then applied to the power spectrum. The harmonics weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter's output was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromise of the gain invariance properties.
    Type: Grant
    Filed: October 2, 2001
    Date of Patent: February 26, 2008
    Assignee: The Regents of the University of California
    Inventors: Kenneth Rose, Liang Gu
  • Patent number: RE40691
    Abstract: An audio type signal is encoded. The signal is first divided into bands. For each band, a yardstick signal element is selected. The yardstick may be the signal element having the largest magnitude in the band, the second largest, closest to the median magnitude, or having some other selected magnitude. This magnitude is used for various purposes, including assigning bits to the different bands, and for establishing reconstruction levels within a band. The magnitude of non-yardstick signal elements is also quantized. The encoded signal is also decoded. Apparatus for both encoding and decoding are also disclosed. The location of the yardstick element within its band may also be recorded and encoded, and used for efficiently allocating bits to non-yardstick signal elements. Split bands may be established, such that each split band includes a yardstick signal element and each full band includes a major and a minor yardstick signal element.
    Type: Grant
    Filed: June 17, 1999
    Date of Patent: March 31, 2009
    Assignee: Massachusetts Institute of Technology
    Inventor: Jae S. Lim