Time Patents (Class 704/211)
  • Patent number: 6842735
    Abstract: A data-compressed audio waveform is temporally modified without requiring complete decompression of the audio signal. Packets of compressed audio data are first unpacked, to remove scaling that was applied in the formation of the packets. The unpacked data is then temporally modified, using one of a number of different approaches. This modification takes place while the audio information remains in a data-compressed format. New packets are then assembled from the modified data, to produce a data-compressed output stream that can be subsequently processed in a conventional manner to reproduce the desired sound. The assembly of the new packets employs a technique for inferring an auditory model from the original packets, to requantize the data in the output packets.
    Type: Grant
    Filed: September 13, 2000
    Date of Patent: January 11, 2005
    Assignee: Interval Research Corporation
    Inventors: Michele M. Covell, Malcolm Slaney, Arthur Rothstein
  • Publication number: 20040260541
    Abstract: An exemplary multi-channel speech processor comprises a controller capable of interfacing with a plurality of channels, and at least one signal processing unit (SPU) coupled to the controller, where the multi-channel speech processor has a maximum execution time for processing all frames, one channel at a time, by processing a single frame from each of the plurality of channels. The signal processing unit encodes each of the single frames from each of the plurality of channels, one channel at a time, to generate encoded frames until the maximum execution time elapses or is about to elapse. The controller also transmits a predetermined frame for each of the plurality of channels not processed during the encoding step, due to the maximum execution time elapsing or being about to elapse, such that the predetermined frame causes a decoder which receives the predetermined frame to generate a frame erase frame.
    Type: Application
    Filed: June 17, 2003
    Publication date: December 23, 2004
    Applicant: Conexant Systems, Inc.
    Inventors: Carlo Murgia, Jeffrey D. Klein, Huan-Yu Su
  • Patent number: 6829578
    Abstract: Robust acoustic tone features are achieved first by the introduction of on-line, look-ahead trace back of the fundamental frequency (F0) contour with adaptive pruning; this fundamental frequency serves as the signal preprocessing front-end. The F0 contour is subsequently decomposed into lexical tone effect, phrase intonation effect, and random effect by means of a time-variant, weighted moving average (MA) filter in conjunction with weighted (placing more emphasis on vowels) least squares of the F0 contour. The intonation effect is removed by subtraction of the F0 contour under the superposition assumption. The acoustic tone features are defined in two parts. The first is the coefficients of the second order weighted regression of the de-intonation of the F0 contour over neighbouring frames. The second deals with the degree of periodicity of the signal, given by the coefficients of the second order regression of the auto-correlation.
    Type: Grant
    Filed: July 9, 2001
    Date of Patent: December 7, 2004
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventors: Chang-Han Huang, Frank Torsten Bernd Seide
  • Patent number: 6826525
    Abstract: A method for detecting a transient in a discrete-time audio signal is performed completely in the time domain and includes the step of segmenting the discrete-time audio signal so as to generate consecutive segments of the same length with unfiltered discrete-time audio signals xs(T−1). The discrete-time audio signal in a current segment is subsequently filtered. Then either the energy of the filtered discrete-time audio signal in the current segment can be compared with the energy of the filtered discrete-time audio signal in a preceding segment or a current relationship between the energy of the filtered discrete-time audio signal in the current segment and the energy of the unfiltered discrete-time audio signal in the current segment can be formed and this current relationship compared with a preceding corresponding relationship. On the basis of the one and/or the other of these comparisons it is detected whether a transient is present in the discrete-time audio signal.
    Type: Grant
    Filed: June 25, 2002
    Date of Patent: November 30, 2004
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E.V.
    Inventors: Johannes Hilpert, Jürgen Herre, Bernhard Grill, Rainer Buchta, Karlheinz Brandenburg, Heinz Gerhäuser
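The first comparison described in the abstract above (filtered energy of the current segment against that of the preceding segment) can be sketched roughly as follows. This is only an illustration, not the patented method: the first-difference high-pass filter, the name `detect_transients`, and the `ratio_threshold` value are assumptions chosen for the example.

```python
def detect_transients(samples, segment_len, ratio_threshold=4.0):
    """Return indices of segments whose filtered energy jumps versus the previous one.

    Assumptions for illustration: the filter is a simple first difference
    (a crude high-pass), and a transient is flagged when the filtered energy
    exceeds `ratio_threshold` times that of the preceding segment.
    """
    # Time-domain high-pass filtering: y[n] = x[n] - x[n-1].
    filtered = [0.0] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]

    # Consecutive segments of equal length; a trailing partial segment is ignored.
    n_segments = len(samples) // segment_len
    energies = [
        sum(v * v for v in filtered[k * segment_len:(k + 1) * segment_len])
        for k in range(n_segments)
    ]

    eps = 1e-12  # avoids flagging pure numerical noise after a silent segment
    return [
        k for k in range(1, n_segments)
        if energies[k] > ratio_threshold * (energies[k - 1] + eps)
    ]


# Usage: a quiet periodic background with a single click at sample 40.
background = [0.001 * (i % 4) for i in range(64)]
with_click = list(background)
with_click[40] += 1.0
flagged = detect_transients(with_click, segment_len=16)
```

The second comparison in the abstract (ratio of filtered to unfiltered energy within a segment) could be added in the same style by keeping a second list of unfiltered energies.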
  • Publication number: 20040236572
    Abstract: The invention concerns audio signal processing, comprising: a first processing of an audio source signal, using at least a mathematical transform applied on first sequences of samples obtained by applying first segmentation windows on the audio source signal; and a second audio processing applied on second sequences of samples obtained by applying second segmentation windows on the signal delivered by the first step; the two successive first windows and/or the two successive second windows overlapping, the overlaps being such that the segmentations are synchronous.
    Type: Application
    Filed: May 24, 2004
    Publication date: November 25, 2004
    Inventors: Franck Bietrix, Hubert Cadusseau
  • Publication number: 20040220805
    Abstract: In order to obtain an integer transform, which provides integer output values, the TDAC function of a MDCT is explicitly carried out in the time domain before the forward transform. In overlapping windows, this results in a Givens rotation which may be represented by lifting matrices, wherein time-discrete sampled values of an audio signal may at first be summed up on a pair-wise basis to build a vector so as to be sequentially provided with a lifting matrix. In accordance with the invention, after each multiplication of a vector by a lifting matrix, a rounding step is carried out such that, on the output-side, only integers will result. By transforming the windowed integer sampled value with an integer transform, a spectral representation with integer spectral values may be obtained. The inverse mapping with an inverse rotation matrix and corresponding inverse lifting matrices results in an exact reconstruction.
    Type: Application
    Filed: June 25, 2004
    Publication date: November 4, 2004
    Inventors: Ralf Geiger, Thomas Sporer, Jurgen Koller, Karlheinz Brandenburg, Jurgen Herre
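The lifting idea in the abstract above can be illustrated with a minimal sketch: a Givens rotation is factored into three lifting steps, each followed by rounding, so the outputs are integers and the inverse recovers the inputs exactly. The function names and the round-half-up rule are assumptions made for this example; the sketch covers only a single rotation, not the windowing or the full transform.

```python
import math


def _round(v):
    # Round half up. Any deterministic rule works, provided the forward
    # and inverse paths use the same one.
    return math.floor(v + 0.5)


def lifting_rotate(x1, x2, theta):
    """Integer approximation of the rotation [[cos, -sin], [sin, cos]].

    Requires sin(theta) != 0. Uses the factorisation into three lifting
    matrices with coefficients c = (cos(theta) - 1) / sin(theta) and s = sin(theta).
    """
    c = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    x1 = x1 + _round(c * x2)   # first lifting step, then rounding
    x2 = x2 + _round(s * x1)   # second lifting step, then rounding
    x1 = x1 + _round(c * x2)   # third lifting step, then rounding
    return x1, x2


def lifting_rotate_inverse(y1, y2, theta):
    """Undo lifting_rotate exactly by reversing the steps with subtraction."""
    c = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    y1 = y1 - _round(c * y2)
    y2 = y2 - _round(s * y1)
    y1 = y1 - _round(c * y2)
    return y1, y2


# Usage: rotate an integer sample pair by 45 degrees and recover it.
THETA = math.pi / 4
rotated = lifting_rotate(1000, -337, THETA)
restored = lifting_rotate_inverse(rotated[0], rotated[1], THETA)
```

Because each inverse step subtracts exactly the rounded quantity the forward step added, the reconstruction is exact regardless of how coarse the rounding is.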
  • Publication number: 20040215449
    Abstract: A system and method related to a new approach to speech recognition that reacts to concepts conveyed through speech. In its fullest implementation, the system and method shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. This is done by using a probabilistically unbiased multi-phoneme recognition process, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response.
    Type: Application
    Filed: June 30, 2003
    Publication date: October 28, 2004
    Inventor: Philippe Roy
  • Publication number: 20040204932
    Abstract: A method and device within a speech processing unit (SPU) for reducing scheduling delay between the SPU and a radio network node. Within the SPU, data packets are processed in a plurality of time slots that are subunits of frames. The device receives timing information from the node that identifies a beginning and an ending of processing periods in the node. The timing information is utilized to select a time slot within each frame as a target time slot. The target time slot has a position within each frame such that the scheduling delay between the ending of a processing period in the node and the beginning of the target time slot is minimized. Data packets for a particular channel are assigned to the target time slot to reduce the scheduling delay. The phase of the frame is then adjusted by erasing superfluous data packets.
    Type: Application
    Filed: April 9, 2003
    Publication date: October 14, 2004
    Applicant: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Eckhard Delfs, Emilian Ertel
  • Patent number: 6801888
    Abstract: An embodiment of the present invention is a method for generating a listener-interest-filtered work for an audio or audio-visual work, which method includes steps of: (a) generating one or more average speed contours for one or more audio or audio-visual works for one or more categories of users; (b) converting the one or more average speed contours to one or more conceptual speed association data structures; and (c) forming a listener-interest-filtered conceptual speed association data structure from the one or more conceptual speed association data structures.
    Type: Grant
    Filed: February 25, 2002
    Date of Patent: October 5, 2004
    Assignee: Enounce Incorporated
    Inventor: Donald J. Hejna, Jr.
  • Publication number: 20040186709
    Abstract: A system and method of synthesizing a plurality of voices are described. The system has a processing unit, a register, a latch unit, a timer and a digital/analog converter. The processing unit decodes voice data into decoded voices and the decoded voices are then transmitted to the register. A plurality of different sampling signals of the timer are transmitted to the latch unit to periodically trigger the latch unit, and the latch unit sequentially fetches the decoded voices stored in the register, effectively preventing jitter when the voice data are synthesized.
    Type: Application
    Filed: March 17, 2003
    Publication date: September 23, 2004
    Inventor: Chao-Wen Chi
  • Patent number: 6789058
    Abstract: A multi-channel speech processor for encoding speech in a packet network environment is disclosed. In one illustrative aspect, a complexity resource manager (CRM) is executed by a controller or processor. The CRM manages the level of complexity of encoding which is used by a signal processing unit (SPU) to convert the speech signal into packet data. In general, the CRM determines the level of complexity of encoding based on a calculated complexity budget, where the complexity budget is determined based on the time required to process prior speech signal channels and the time available to process the remaining channels. In this way, the CRM is able to control the overall complexity of the speech processor through its ability to signal the SPU to encode speech signal in a complexity reduced mode based on the calculated complexity budget under certain conditions.
    Type: Grant
    Filed: October 15, 2002
    Date of Patent: September 7, 2004
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Eyal Shlomot, Huan-Yu Su
  • Publication number: 20040162722
    Abstract: A voice communications device (4) and speech processing method are described. A speech signal, generated by a microphone (41) in response to speech input (2) into the microphone (41) by a user (1), has a proportion extracted therefrom. The speech signal is transmitted to an application apparatus that may be integral or remote. A speech quality value is evaluated for the extracted speech signal. An indication of the quality of the speech signal, based on the speech quality value, is indicated to the user. Thus a direct indication of the current quality of a received speech signal, in a form that is easily interpreted by a non-expert user of the device, is provided, thereby providing the user with an opportunity to improve the speech quality. Examples of application apparatus include hands-free telephones and automatic speech recognition systems.
    Type: Application
    Filed: November 17, 2003
    Publication date: August 19, 2004
    Inventors: James Alexander Rex, David John Benjamin Pearce
  • Publication number: 20040148159
    Abstract: A method for time aligning audio signals, wherein one signal has been derived from the other or both have been derived from another signal, comprises deriving reduced-information characterizations of the audio signals based on auditory scene analysis. The time offset of one characterization with respect to the other characterization is calculated and the temporal relationship of the audio signals with respect to each other is modified in response to the time offset such that the audio signals are coincident with each other. These principles may also be applied to a method for time aligning a video signal and an audio signal that will be subjected to differential time offsets.
    Type: Application
    Filed: November 20, 2003
    Publication date: July 29, 2004
    Inventors: Brett G. Crockett, Michael J. Smithers
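The offset calculation mentioned in the abstract above can be pictured with a small sketch. It assumes, purely for illustration, that each characterization is already available as a short list of per-frame values (for example, 1 where an event boundary was found and 0 elsewhere) and that the offset is taken as the shift maximising their cross-correlation. The name `best_offset` and the `max_shift` parameter are hypothetical; the abstract does not specify how the offset is computed.

```python
def best_offset(ref, other, max_shift):
    """Return the shift (in frames) of `other` relative to `ref` with the highest correlation.

    A positive result means `other` lags `ref` by that many frames.
    Ties are resolved in favour of the first shift tried (most negative).
    """
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0
        for i, r in enumerate(ref):
            j = i + shift
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Usage: the second characterization is the first one delayed by three frames.
ref_events = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
delayed_events = [0, 0, 0] + ref_events[:-3]
offset = best_offset(ref_events, delayed_events, max_shift=5)
```

Working on the reduced characterizations rather than the raw audio keeps the search short, since each list has one entry per frame instead of one per sample.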
  • Patent number: 6766290
    Abstract: An audio/video system may generate audio data for a user. The user in turn may provide voice commands to the audio/video system. The audio generated by the system may be adaptively delayed, amplitude adjusted, and subjected to sampling interval shifting before subtracting it from the composite signal received from a microphone. As a result, the audio generated by the system can be subtracted from a signal representing both the audio generated and the spoken command to facilitate the recognition of the spoken command. In this way, a voice responsive audio/video system may be implemented.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: July 20, 2004
    Assignee: Intel Corporation
    Inventor: Iwan R. Grau
  • Patent number: 6766300
    Abstract: A method and apparatus for transient detection and time-scaling an audio signal detects transients and scales only intervals located between transients to avoid artifacts. In one embodiment, the transient detection process compares frequency characteristic energy between succeeding windows of the audio signal and calculates values of an energy curve where the energy increases. Transients are detected at maxima of the energy curve.
    Type: Grant
    Filed: August 20, 1999
    Date of Patent: July 20, 2004
    Assignee: Creative Technology Ltd.
    Inventor: Jean Laroche
  • Publication number: 20040138877
    Abstract: A receiving unit receives a speech signal. A signal processing unit processes the speech signal. A memory stores environment information related to time. A time measurement unit measures a time. A control unit retrieves environment information related to the time from the memory, and controls the processing of the signal processing unit in accordance with the retrieved environment information.
    Type: Application
    Filed: December 23, 2003
    Publication date: July 15, 2004
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Masahide Ariu
  • Publication number: 20040088161
    Abstract: To address the need for a method and apparatus for preventing speech dropout in a low-latency text-to-speech system, a method and apparatus for preventing such speech dropout is described herein. In accordance with the preferred embodiment of the present invention the rate of speech is allowed to vary based on an amount of data existing within the buffer. More particularly, as the buffer empties, the rate of speech slows, reducing the chances that the output buffer will empty.
    Type: Application
    Filed: October 30, 2002
    Publication date: May 6, 2004
    Inventors: Gerald Corrigan, Steven Albrecht
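One way to picture the buffer-driven rate control in the abstract above is a mapping from buffer fill level to a speech-rate multiplier. The sketch below is an assumption-laden illustration: the linear mapping, the `low_water` mark, and the `min_rate` floor are hypothetical choices, not values taken from the application.

```python
def speech_rate(buffered, capacity, min_rate=0.75, nominal_rate=1.0, low_water=0.5):
    """Return a speech-rate multiplier that falls as the output buffer drains.

    Above the (hypothetical) low-water mark the nominal rate is used; below it
    the rate decreases linearly toward `min_rate` at an empty buffer.
    """
    fill = max(0.0, min(1.0, buffered / capacity))
    if fill >= low_water:
        return nominal_rate
    return min_rate + (nominal_rate - min_rate) * (fill / low_water)


# Usage: rates for a full, quarter-full, and empty buffer of 100 units.
rates = [speech_rate(level, 100) for level in (100, 25, 0)]
```

Slowing playback when little data is buffered gives the synthesizer more time to refill the buffer before it runs dry.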
  • Publication number: 20040064309
    Abstract: This invention relates to a mobile communicator which is arranged to save power by efficiently controlling a power amplifier by selecting an adequate speech coding rate. An inventive mobile communicator comprises a plurality of encoding means for encoding a speech signal with different encoding rates, respectively; encoding rate inputting means for accepting an input of encoding rate selected by a terminal user; and encoding rate deciding means for deciding an encoding rate applied for encoding the speech signal based on the encoding rate inputted via the encoding rate inputting means and for selectively switching the plurality of encoding means corresponding to this encoding rate.
    Type: Application
    Filed: September 26, 2003
    Publication date: April 1, 2004
    Applicant: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Atsushi Kosai
  • Publication number: 20040064308
    Abstract: A system includes a frame reception device to receive a stream of frames. An energy determination device determines a first energy of a first frame preceding a gap, and a second energy of a second frame, and the second frame is received after the first frame. A candidate testing and blending device determines at least one of a first portion of the first frame and a second portion of the second frame to insert in place of the gap, based on the first energy trajectory and the second energy trajectory, and on a determination of an optimal blend point, and blends with at least one of the first frame and the second frame.
    Type: Application
    Filed: September 30, 2002
    Publication date: April 1, 2004
    Applicant: Intel Corporation
    Inventor: Michael E. Deisher
  • Publication number: 20040054528
    Abstract: When M observed signals xi(k) are sequentially inputted into a noise removing part 12 via M channels 11a of an input part 11 in time series, processing is sequentially performed on the observed signals xi(k) by singular value decomposition units 12a of N stages cascaded to one another. Specifically, the singular value decomposition unit 12a of each stage separates M input signals into a signal subspace and a noise subspace by a singular value decomposition and extracts M output signals, which are signals over a time region, by orthonormal projection of the M input signals onto the separated signal subspace.
    Type: Application
    Filed: May 1, 2003
    Publication date: March 18, 2004
    Inventors: Tetsuya Hoya, Andrzej Cichocki, Takahiro Murakami, Yoshihisa Ishida
  • Publication number: 20040030547
    Abstract: A system for transmitting audio signals over a telecommunications link generates the signals as two or more alternative feeds, for example at different data rates. The two feeds are encoded using coding methods having a frame structure with different frame lengths. To facilitate switching between the two, the input signal is notionally divided into temporal portions and each is coded by taking it, plus enough of the next (or preceding) portion to make up a whole number of frames, and encoding it, whereby the encoded portions overlap—at least for one of the feeds. The overlap is lost upon decoding by discarding duplicate material.
    Type: Application
    Filed: May 30, 2003
    Publication date: February 12, 2004
    Inventors: Anthony R Leaning, Richard J Whiting
  • Patent number: 6678651
    Abstract: A speech-coding device includes a fixed codebook, an adaptive codebook, a short-term enhancement circuit, and a summing circuit. The short-term enhancement circuit connects an output of the fixed codebook to a summing circuit. The summing circuit adds an adaptive codebook contribution to a fixed codebook contribution. The short-term enhancement circuit can also be connected to a synthesis filter to emphasize the spectral formants in an encoder and a decoder.
    Type: Grant
    Filed: January 25, 2001
    Date of Patent: January 13, 2004
    Assignee: Mindspeed Technologies, Inc.
    Inventor: Yang Gao
  • Patent number: 6678650
    Abstract: An apparatus and method for converting the speed of reproducing an input acoustic signal. The apparatus and method can efficiently delay the output signal without using an output-data storage section of a large storage capacity even if the input acoustic signal has a high sampling frequency. In the apparatus, the speech-speed converting section generates an acoustic frame signal s6 which has been converted in speech speed and which has a predetermined length. The frame-signal encoding section encodes the acoustic frame signal s6 generated by the speech-speed converting section, thereby generating coded data s10 that is smaller than the data represented by the acoustic frame signal s6. The coded data storage section stores the coded data s10. The frame-signal decoding section decodes the coded data s11 read from the storage section, generating an output acoustic signal s9 having a particular length.
    Type: Grant
    Filed: March 9, 2001
    Date of Patent: January 13, 2004
    Assignee: Sony Corporation
    Inventor: Akira Inoue
  • Patent number: 6675140
    Abstract: The signal processing method includes the steps of: wavelet-transforming an input signal in a computer; and extracting features of the signal by Mellin-transforming the output of the wavelet transform step in synchrony with the input signal in a computer.
    Type: Grant
    Filed: January 28, 2000
    Date of Patent: January 6, 2004
    Assignee: Seiko Epson Corporation
    Inventors: Toshio Irino, Roy D. Patterson
  • Patent number: 6664913
    Abstract: In a method of lossless processing of an integer value signal in a prediction filter which includes a quantiser, a numerator of the prediction filter is implemented prior to the quantiser and a denominator of the prediction filter is implemented recursively around the quantiser to reduce the peak data rate of an output signal. In the lossless processor, at each sample instant, an input to the quantiser is jointly responsive to a first sample value of a signal input to the prediction filter, a second sample value of a signal input to the prediction filter at a previous sample instant, and an output value of the quantiser at a previous sample instant. In a preferred embodiment, the prediction filter includes noise shaping for affecting the output of the quantiser.
    Type: Grant
    Filed: May 17, 1999
    Date of Patent: December 16, 2003
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Peter G. Craven, Michael A. Gerzon
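The structure described in the abstract above can be illustrated with a first-order sketch: the quantiser input depends on the current input sample, the previous input sample (numerator term), and the previous quantiser output (recursive denominator term). The coefficients `a` and `b`, the function names, and the round-half-up quantiser are assumptions for this example, and noise shaping is not shown.

```python
import math


def _quantise(v):
    # Round half up to the nearest integer.
    return math.floor(v + 0.5)


def encode(samples, a=-1.0, b=0.0):
    """y[n] = Q(x[n] + a*x[n-1] - b*y[n-1]) for integer input samples.

    Because x[n] is an integer, quantising the whole sum equals
    x[n] + Q(a*x[n-1] - b*y[n-1]), which is how it is computed here.
    """
    out, x_prev, y_prev = [], 0, 0
    for x in samples:
        y = x + _quantise(a * x_prev - b * y_prev)
        out.append(y)
        x_prev, y_prev = x, y
    return out


def decode(coded, a=-1.0, b=0.0):
    """Invert encode exactly by recomputing the same quantised term."""
    out, x_prev, y_prev = [], 0, 0
    for y in coded:
        x = y - _quantise(a * x_prev - b * y_prev)
        out.append(x)
        x_prev, y_prev = x, y
    return out


# Usage: a slowly rising integer signal yields small residuals after the first sample.
ramp = [1000 + 2 * i for i in range(16)]
residual = encode(ramp)
recovered = decode(residual)
```

The decoder sees the same previous input and output values as the encoder, so it forms the identical quantised term and the round trip is lossless even though the filter coefficients are not integers.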
  • Publication number: 20030229490
    Abstract: Time-scaled sound signals (i.e. sounds output at differing speeds) are generated by mixing weighted time- and frequency-domain processed signals, the former signal generally representing speech-based signals while the latter representing music-based signals. The weights applied to each type of signal may be determined by a scaling factor, which in turn is related to the desired speed at which a listener desires to hear a sound signal. In one example of the invention, only stationary signal portions of an input sound signal are used to generate time-scaled processed signals. An adaptive frame-size may also be used to pre-process the separate signals prior to being weighted, which at least decreases the amount of unwanted reverberative sound qualities in a resulting sound signal. Together, techniques envisioned by the present invention produce improved, speed adjusted sound signals.
    Type: Application
    Filed: June 7, 2002
    Publication date: December 11, 2003
    Inventor: Walter Etter
  • Patent number: 6658382
    Abstract: An input signal is time-frequency transformed, then the frequency-domain coefficients are divided into coefficient segments of about 100 Hz width to generate a sequence of coefficient segments, and the sequence of coefficient segments is split into subbands each consisting of plural coefficient segments. A threshold value is determined based on the intensity of each coefficient segment in each subband. The intensity of each coefficient segment is compared with the threshold value, and the coefficient segments are classified into low- and high-intensity groups. The coefficient segments are quantized for each group, or they are flattened respectively and then quantized through recombination.
    Type: Grant
    Filed: March 23, 2000
    Date of Patent: December 2, 2003
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Naoki Iwakami, Takehiro Moriya, Akio Jin, Kazuaki Chikira, Takeshi Mori
  • Patent number: 6647366
    Abstract: A method and a system are provided for controlling the coding rates of a multimode coding system with respect to a sequence of input audio signal frames. The method eliminates or minimizes the overflow and underflow of a bit-stream buffer maintained by the coding system for temporarily recording bit-stream data prior to transmission or storage.
    Type: Grant
    Filed: December 28, 2001
    Date of Patent: November 11, 2003
    Assignee: Microsoft Corporation
    Inventors: Tian Wang, Kazuhito Koishida, Vladimir Cuperman
  • Publication number: 20030182127
    Abstract: The invention is a low-speed speech encoding method based on Internet protocol (IP), comprising: determining speech characteristic parameters in a duration TN, determining an optimized frame length T for successive speech data processing according to the characteristic parameters, compression-encoding the speech data in every T, assembling the encoded bits into packets with TCP or UDP, encapsulating those packets with IP, and finally outputting to the channel. The method uses a single-frame, variable-length-frame, intra-frame adaptive low-speed speech encoding method, which helps reduce the bit rate and raise transmission efficiency. The method takes the optimized-length encoded frame as the unit for breaking up the IP datagram, so it greatly raises the encoding and decoding quality of the speech data. Informal tests show that the method can raise the MOS (mean opinion score) value by 0.1 to 0.2.
    Type: Application
    Filed: February 19, 2003
    Publication date: September 25, 2003
    Applicant: Huawei Technologies Co., Ltd.
    Inventors: Shengxi Pan, Yingtao Li
  • Patent number: 6611797
    Abstract: An input speech signal to an input terminal is supplied to a speech synthesizer section through a speech analyzer section and frequency parameter quantizer section to form a synthesis filter, and the input speech signal is expressed by quantized LPC coefficients representing the characteristics of the synthesis filter and an excitation signal for exciting the synthesis filter. In this case, in a pulse excitation section, a pulse position selector selects pulse position candidates from the integer pulse positions and non-integer pulse positions stored in a pulse position codebook, and an integer position pulse generator and non-integer position pulse generator respectively generate integer position pulses set at sampling points of the excitation signal and non-integer position pulses set at positions located between sampling points. These pulses are synthesized into a pulse train serving as a source of an excitation signal.
    Type: Grant
    Filed: January 21, 2000
    Date of Patent: August 26, 2003
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Tadashi Amada, Katsumi Tsuchiya
  • Publication number: 20030158729
    Abstract: Methods and systems for digitally generating sound from phase and amplitude information of a narrow bandwidth signal, such as a narrow bandwidth locator signal. Phase-derivative information is determined from the phase information. The bandwidth of the phase-derivative information is spread out, or stretched, over a wider bandwidth, so that the frequency variations will be more perceptible to users. The result is combined with an audio band carrier frequency, the result of which controls an oscillator. The oscillator output is combined with the amplitude information to generate an analog audio signal that is modulated with the amplitude information and the phase-derivative information. The amplitude information and the wider bandwidth phase-derivative information are used to modulate an audio carrier in both frequency and amplitude.
    Type: Application
    Filed: February 15, 2002
    Publication date: August 21, 2003
    Applicant: Radiodetection Limited
    Inventors: John Mark Royle, James Ian King, Richard David Pearson
  • Patent number: 6594307
    Abstract: A device for determining quality of an output signal to be generated by a signal processing circuit, including a radio link, with respect to a reference signal. The device has a first and second series circuits for receiving the output signal and the reference signal, respectively. The device generates an objective quality signal through a combining circuit coupled to the two series circuits, wherein a scaling circuit is disposed between the two series circuits for scaling at least one series circuit signal. A poor correlation between the objective quality signal and a subjective quality signal to be assessed by human observers can be considerably improved by disposing a discounting arrangement inside the combining circuit, and coupling the discounting arrangement to the scaling circuit so as to receive a comparison signal and discount the comparison signal while generating the objective quality signal.
    Type: Grant
    Filed: August 30, 1999
    Date of Patent: July 15, 2003
    Assignee: Koninklijke KPN N.V.
    Inventor: John Gerard Beerends
  • Publication number: 20030125936
    Abstract: In the method according to the invention for determining a characteristic data set (“fingerprint”) for a sound signal, the sound signal itself is searched for characteristic locations, and these characteristic locations are used for producing a characteristic data set. For this, the frequency spectrum is evaluated over a time interval, subdivided into frequency bands and averaged over each frequency band into a value.
    Type: Application
    Filed: November 25, 2002
    Publication date: July 3, 2003
    Inventor: Christoph Dworzak
  • Patent number: 6587817
    Abstract: A method which comprises forming a first noise reduction frame (18) containing speech samples, which is windowed by a first window function. For the windowed frame, noise reduction is performed for producing a second noise reduction frame (19; 45). A speech coding frame (44) to be formed comprises noise-reduced samples of at least two successive second noise reduction frames (45, 46), partly summed with one another. On the basis of said speech coding frame (44), a set of speech coding parameters pj are determined. A lookahead part (42) of the speech coding frame is at least partly formed of a first slope (41), the first slope (10, 41) comprising a set of most recent noise-reduced samples of the second noise reduction frame, not summed with the samples of any other second noise reduction frame. The method reduces the delay caused by speech coding and noise reduction.
    Type: Grant
    Filed: January 7, 2000
    Date of Patent: July 1, 2003
    Assignee: Nokia Mobile Phones Ltd.
    Inventors: Antti Vähätalo, Erkki Paajanen
  • Patent number: 6581032
    Abstract: A speech compression system capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech is disclosed. The speech compression system optimizes the bandwidth consumed by the bitstream by balancing the desired average bit rate with the perceptual quality of the reconstructed speech. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. The codecs are selectively activated based on a rate selection. In addition, the full and half-rate codecs are selectively activated based on a type classification. Each codec is selectively activated to encode and decode the speech signals at different bit rates emphasizing different aspects of the speech signal to enhance overall quality of the synthesized speech.
    Type: Grant
    Filed: September 15, 2000
    Date of Patent: June 17, 2003
    Assignee: Conexant Systems, Inc.
    Inventors: Yang Gao, Adil Benyassine, Jes Thyssen, Eyal Shlomot, Huan-yu Su
  • Publication number: 20030078769
    Abstract: A method and system are provided for synthesizing a number of corrupted frames output from a decoder including one or more predictive filters. The corrupted frames are representative of one segment of a decoded signal (sq(n)) output from the decoder. The method comprises determining a first preliminary time lag (ppfe1) based upon examining a predetermined number (K) of samples of another segment of the decoded signal and determining a scaling factor (ptfe) associated with the examined number (K) of samples when the first preliminary time lag (ppfe1) is determined. The method also comprises extrapolating one or more replacement frames based upon the first preliminary time lag (ppfe1) and the scaling factor (ptfe).
    Type: Application
    Filed: August 19, 2002
    Publication date: April 24, 2003
    Applicant: Broadcom Corporation
    Inventor: Juin-Hwey Chen
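    Note: the extrapolation step in the abstract above can be illustrated with a minimal sketch. It assumes the replacement samples are produced by periodic repetition, each new sample being a scaled copy of the sample one time lag earlier. The function name, the parameter names `lag` and `scale` (standing in for ppfe1 and ptfe), and the periodic-repetition rule itself are assumptions for illustration, not details taken from the publication.

```python
from typing import List


def extrapolate_frames(history: List[float], lag: int, scale: float, count: int) -> List[float]:
    """Extend `history` by `count` samples using periodic repetition.

    Each new sample is `scale` times the sample `lag` positions earlier,
    so the extrapolation feeds on its own output once it runs past the
    end of the decoded history. (Hypothetical helper, not from the patent.)
    """
    if lag < 1 or lag > len(history):
        raise ValueError("lag must be between 1 and len(history)")
    buffer = list(history)
    for _ in range(count):
        buffer.append(scale * buffer[-lag])
    # Return only the newly generated replacement samples.
    return buffer[len(history):]
```

    With a scale below 1, each repetition of the period is quieter than the one before, so the replacement signal decays rather than ringing indefinitely.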
  • Patent number: 6549886
    Abstract: A lost packet recovery device, method and computer program for a VoIP system. Lost packets containing voice information are replaced using time domain interpolation techniques. A first embodiment relies on time domain harmonic scaling to interpolate a replacement frame, using the frames that come before and after the missing frame. A second embodiment replicates a frame immediately prior to the missing frame, with an energy reduction function applied that reduces the energy level of the data samples in the frame. This replicated frame replaces the missing frame. Duplicating the prior frame and reducing its energy levels are repeated until a further frame is detected. An energy restoration function is then applied to the next available frame to gradually increase its energy level and provide for a smooth transition. Using these techniques, missing frames of voice data may be replaced to mask the effects of missing frames to a listener.
    Type: Grant
    Filed: November 3, 1999
    Date of Patent: April 15, 2003
    Assignee: Nokia IP Inc.
    Inventor: Momir Partalo
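    Note: the second embodiment described above (replicate the prior frame at reduced energy until a frame arrives, then restore energy gradually) can be sketched as follows. The fixed per-loss decay factor, the linear restoration ramp, and all names are assumptions made for illustration; the abstract does not specify the energy reduction or restoration functions.

```python
from typing import List, Optional


def conceal_lost_frames(frames: List[Optional[List[float]]], decay: float = 0.5) -> List[List[float]]:
    """Replace lost frames (None) with attenuated copies of the last good frame.

    `decay` is an assumed per-loss gain factor. After a run of losses, the
    next received frame is ramped linearly from the current gain back to 1.0.
    """
    output: List[List[float]] = []
    last_good: Optional[List[float]] = None
    gain = 1.0
    for frame in frames:
        if frame is None:
            if last_good is None:
                # Nothing received yet, so there is nothing to replicate.
                output.append([])
                continue
            gain *= decay  # each consecutive loss lowers the energy further
            output.append([gain * s for s in last_good])
        else:
            if gain < 1.0:
                n = len(frame)
                # Energy restoration: ramp from the attenuated gain up to 1.0.
                output.append([(gain + (1.0 - gain) * (i + 1) / n) * s
                               for i, s in enumerate(frame)])
                gain = 1.0
            else:
                output.append(list(frame))
            last_good = frame
    return output
```

    Keeping the unmodified received frame as `last_good` means a later loss replicates the original samples rather than the ramped ones.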
  • Publication number: 20030055631
    Abstract: A method and system are provided for processing an extrapolated signal including a number of consecutive replacement frames. The method comprises attenuating a portion of the extrapolated signal when the extrapolated signal reaches a predetermined duration. The attenuating produces an output signal having an attenuated portion, wherein the output signal includes the number of consecutive replacement frames. Each of the consecutive frames within the attenuated portion is attenuated by applying an attenuation window with a starting magnitude value of approximately 1 and including a unique ending magnitude. The unique ending magnitudes decrease over time.
    Type: Application
    Filed: June 28, 2002
    Publication date: March 20, 2003
    Applicant: Broadcom Corporation
    Inventor: Juin-Hwey Chen
  • Publication number: 20030046065
    Abstract: Methods and systems for handling speech recognition processing in effectively real-time, via the internet, in order that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech in context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again, in approximate real-time, with minimum latency delays. The client upon receiving responsive feedback from the server, displays, or otherwise provides, the feedback to the user.
    Type: Application
    Filed: July 19, 2002
    Publication date: March 6, 2003
    Applicant: Global English Corporation
    Inventor: Christopher S. Jochumson
  • Publication number: 20030040903
    Abstract: A start of an input speech signal is detected during presentation of an output audio signal and an input start time, relative to the output audio signal, is determined. The input start time is then provided for use in responding to the input speech signal. In another embodiment, the output audio signal has a corresponding identification. When the input speech signal is detected during presentation of the output audio signal, the identification of the output audio signal is provided for use in responding to the input speech signal. Information signals comprising data and/or control signals are provided in response to at least the contextual information provided, i.e., the input start time and/or the identification of the output audio signal. In this manner, the present invention accurately establishes a context of an input speech signal relative to an output audio signal regardless of the delay characteristics of the underlying communication system.
    Type: Application
    Filed: October 5, 1999
    Publication date: February 27, 2003
    Inventor: Ira A. Gerson
  • Patent number: 6526377
    Abstract: A system and terminal for facilitating a “virtual presence” allows users on a communication network to simply begin speaking through other users. A system immediately detects the destination party's name, and begins routing the audio signal to a particular destination without any noticeable call set-up. Additionally, the system performs pitch corrected speed control in order to allow the detection and processing of a speech pattern without causing delay to an end user.
    Type: Grant
    Filed: November 2, 1999
    Date of Patent: February 25, 2003
    Assignee: Intel Corporation
    Inventor: Howard Bubb
  • Patent number: 6519558
    Abstract: A signal processing method and apparatus is disclosed, which is capable of reproducing a coded audio signal by decoding it while shifting its pitch, and reproducing, from an original sound, a sound having a sufficiently higher pitch than the original sound with few operations and less cost for the decoder used in the signal processing apparatus, and an information serving medium for serving a program which implements the signal decoding and pitch shifting. In one embodiment, the method of providing a signal processing method for decoding a coded signal for reading, includes setting a pitch for the coded signal, decoding only a low frequency portion of the coded signal according to the set pitch, and shifting the pitch of the decoded read signal based on the set pitch.
    Type: Grant
    Filed: May 19, 2000
    Date of Patent: February 11, 2003
    Assignee: Sony Corporation
    Inventor: Kyoya Tsutsui
  • Publication number: 20030009325
    Abstract: A method for signal controlled switching between audio coding schemes includes receiving input audio signals, classifying a first set of the input audio signals as speech or non-speech signals, coding the speech signals using a time domain coding scheme, and coding the non-speech signals using a transform coding scheme. A multicode coder has an audio signal input and a switch for receiving the audio signal inputs, the switch having a time domain encoder, a transform encoder, and a signal classifier for classifying the audio signals generally as speech or non-speech, the signal classifier directing speech audio signals to the time domain encoder and non-speech audio signals to the transform encoder. A multicode decoder is also provided.
    Type: Application
    Filed: January 22, 1999
    Publication date: January 9, 2003
    Inventors: Raif Kirchherr, Joachim Stegmann
  • Patent number: 6505153
    Abstract: Disclosed is a five-step process for producing closed captions for a television program, subtitles for a movie or other uses for time-aligned transcripts. An operator transcribes the audio track while listening to the recorded material. The system helps him/her to work efficiently and produce precisely aligned captions. The first step consists of identifying the portions of the input audio that contain spoken text. Only the spoken parts are further processed by the invention system. The other parts may be used to generate non-spoken captions. The second step controls the rate of speech depending on how fast the operator types. While the operator types, the third module records the time the words were typed in. This provides a rough time alignment for the transcribed text. Then the fourth module realigns precisely the transcribed text on the audio track. A final module segments the transcribed text into captions, based on acoustic clues and natural language constraints.
    Type: Grant
    Filed: May 22, 2000
    Date of Patent: January 7, 2003
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: Jean-Manuel Van Thong, Michael Swain, Beth Logan
  • Patent number: 6499009
    Abstract: A speech quality estimation technique that employs an arbitrary, speech quality estimation algorithm. The speech quality estimation technique analyzes a reference speech signal and a test speech signal, and based on this analysis, identifies the level of continuous delay variation, if any, and the location of and size of any intermittent delay variations along the test signal. The reference speech signal and/or the test speech signal are adjusted to account for continuous delay variation and intermittent delay variations, such that the reference speech signal and the test signal are similarly scaled with respect to the time domain. The reference speech signal and the test speech signal are then compared for the purpose of generating a speech quality estimation. The resulting speech quality estimation is then adjusted based on the level of continuous delay variation and any intermittent delay variations.
    Type: Grant
    Filed: February 25, 2000
    Date of Patent: December 24, 2002
    Assignee: Telefonaktiebolaget LM Ericsson
    Inventors: Jonas Lundberg, Arne Steinarson, Anders Karlsson
  • Patent number: 6490553
    Abstract: The disclosed method and apparatus controls the rate of playback of audio data corresponding to a stream of speech. Using speech recognition, the rate of speech of the audio data is determined. The determined rate of speech is compared to a target rate. Based on the comparison, the playback rate is adjusted, i.e. increased or decreased, to match the target rate.
    Type: Grant
    Filed: February 12, 2001
    Date of Patent: December 3, 2002
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: Jean-Manuel Van Thong, Davis Pan
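    Note: the compare-and-adjust step above can be illustrated with a short sketch. It assumes the speech recognizer supplies a (start, end) time in seconds for each recognized word and that the rate of speech is measured in words per minute. The function names, the clamping limits, and the words-per-minute measure are assumptions, not details from the patent.

```python
from typing import List, Tuple


def speech_rate_wpm(words: List[Tuple[float, float]]) -> float:
    """Estimate words per minute from recognizer word timings (start, end)."""
    if not words:
        raise ValueError("no recognized words")
    duration = words[-1][1] - words[0][0]
    if duration <= 0:
        raise ValueError("word timings must span a positive duration")
    return len(words) / duration * 60.0


def playback_factor(measured_wpm: float, target_wpm: float,
                    lowest: float = 0.5, highest: float = 2.0) -> float:
    """Return the playback speed-up (>1) or slow-down (<1) needed to hit the target.

    The clamp to [lowest, highest] is an assumed safeguard against extreme rates.
    """
    return max(lowest, min(highest, target_wpm / measured_wpm))
```

    For example, speech measured at 120 words per minute with a target of 180 yields a factor of 1.5, meaning playback would be sped up by half.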
  • Patent number: 6484137
    Abstract: An audio reproducing apparatus comprises: audio decoding means for decoding an input audio signal frame by frame; data expanding/compressing means for subjecting data in a decoded frame to time-scale modification process; a frame sequence table which contains a sequence determined according to a given speed rate in which respective frames are expanded/compressed; frame counting means for counting the number of frames of the input audio signal; and data expansion/compression control means for instructing the data expanding/compressing means to subject the frame to one of time-scale compression process, time-scale expansion process, and process without time-scale modification process, with reference to the frame sequence table based on a count value output from the frame counting means, the data expanding/compressing means subjecting the audio signal to time-scale modification process in accordance with an instruction signal from the data expansion/compression control means.
    Type: Grant
    Filed: October 29, 1998
    Date of Patent: November 19, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Hirotsugu Taniguchi, Masayuki Misaki, Junichi Tagawa, Michio Matsumoto
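    Note: the table-driven control described above can be sketched as a lookup keyed by the frame count. The table contents below are made up for illustration and are not claimed to produce the speed rates they are filed under; the names and the wrap-around indexing are likewise assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical frame sequence tables, one per speed rate. Each entry tells the
# expander/compressor what to do with the frame at that position in the cycle.
FRAME_SEQUENCE_TABLES: Dict[float, Tuple[str, ...]] = {
    1.5: ("compress", "none", "compress"),
    1.0: ("none",),
    0.75: ("expand", "none", "none"),
}


def action_for_frame(frame_count: int, speed_rate: float) -> str:
    """Look up the instruction for a frame from its count and the speed rate."""
    table = FRAME_SEQUENCE_TABLES[speed_rate]
    # Assumed behavior: the sequence repeats once the count passes its length.
    return table[frame_count % len(table)]


def schedule(num_frames: int, speed_rate: float) -> List[str]:
    """Return the instruction for each of the first `num_frames` frames."""
    return [action_for_frame(count, speed_rate) for count in range(num_frames)]
```

    Driving the decision from a precomputed table keeps the per-frame control logic to a single indexed read, leaving the signal processing itself to the expanding/compressing stage.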
  • Patent number: 6484140
    Abstract: An apparatus and a method for encoding an input signal on the time base through orthogonal transform involves removing the correlation of signal waveform based on parameters obtained by linear predictive coding (LPC) analysis and pitch analysis of the input signal on the time base prior to the orthogonal transform. A normalization circuit section removes the correlation of the signal waveform and takes out the residue by an LPC inverse filter and pitch inverse filter and sends the residue to an orthogonal transform circuit section. The LPC parameters and the pitch parameters are sent to a bit allocation calculation circuit. A coefficient quantization section quantizes the coefficients from the orthogonal transform circuit section according to the number of allocated bits from the bit allocation calculation section.
    Type: Grant
    Filed: August 23, 2001
    Date of Patent: November 19, 2002
    Assignee: Sony Corporation
    Inventors: Jun Matsumoto, Masayuki Nishiguchi, Kenichi Makino
  • Publication number: 20020169602
    Abstract: Various methods and apparatus are described for implementing effective echo suppression in a wide variety of telephony system architectures. These methods and apparatus include broadband and multi-band techniques for speech detection, estimation of near-end transmission path attenuation, and estimation of far-end transmission path attenuation and delay.
    Type: Application
    Filed: December 3, 2001
    Publication date: November 14, 2002
    Applicant: Octiv, Inc.
    Inventor: Richard Hodges
  • Patent number: 6473732
    Abstract: A signal analyzer (303) and method thereof using short-time signal analysis, preferably recursive, to obtain a time variant feature from a signal, the signal analyzer including a signal sampler (401) with an input register (403) for storing a sequence of samples of the signal, a multiplier (405) for weighting in accordance with, alternatively, a half-sine, cosine, 2nd order complex pole, or 3rd order complex pole function the sequence of samples to provide weighted samples of the signal, and a combiner (407) for combining the weighted samples to provide a signal feature estimate, such as a signal average or frequency dependent energy estimate, for the signal.
    Type: Grant
    Filed: October 18, 1995
    Date of Patent: October 29, 2002
    Assignee: Motorola, Inc.
    Inventor: Weizhong Chen
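    Note: the weight-and-combine structure above can be illustrated with a direct (non-recursive) sketch that uses a half-sine weighting to estimate a short-time signal average. The patent prefers a recursive form, which is not shown here; the normalization of the weights, the window placement, and all names are assumptions for illustration.

```python
import math
from typing import List


def half_sine_weights(length: int) -> List[float]:
    """Half-sine window over `length` samples, normalized to sum to 1."""
    raw = [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def short_time_average(samples: List[float], length: int) -> List[float]:
    """Weighted average over each run of `length` consecutive samples.

    Mirrors the register -> multiplier -> combiner chain: hold the latest
    samples, weight them, and sum the weighted values into one estimate.
    """
    weights = half_sine_weights(length)
    estimates = []
    for end in range(length, len(samples) + 1):
        register = samples[end - length:end]  # most recent `length` samples
        estimates.append(sum(w * x for w, x in zip(weights, register)))
    return estimates
```

    Because the weights sum to 1, a constant input produces the same constant as its estimate, which is a quick sanity check for the weighting.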