Patents Examined by David D. Knepper
  • Patent number: 6772119
    Abstract: A speaker recognition technique is provided that can operate within the memory and processing constraints of existing portable computing devices. A smaller memory footprint and computational efficiency are achieved using single Gaussian models for each enrolled speaker. During enrollment, features are extracted from one or more enrollment utterances from each enrolled speaker, to generate a target speaker model based on a sample covariance matrix. During a recognition phase, features are extracted from one or more test utterances to generate a test utterance model that is also based on the sample covariance matrix. A sphericity ratio is computed that compares the test utterance model to the target speaker model, as well as a background model. The sphericity ratio indicates how similar test utterance speech is to the speech used when the user was enrolled, as represented by the target speaker model, and how dissimilar the test utterance speech is from the background model.
    Type: Grant
    Filed: December 10, 2002
    Date of Patent: August 3, 2004
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Ganesh N. Ramaswamy, Ran Zilca
  • Patent number: 6772113
    Abstract: A data processing apparatus and method in which spectral characteristic information and waveform characteristic information within a time area are detected from inputted audio data and the detected spectral characteristic information and waveform characteristic information are recorded together with information indicating a correspondence relationship with the audio data. As a result, an efficient search can be achieved when searching audio data.
    Type: Grant
    Filed: January 21, 2000
    Date of Patent: August 3, 2004
    Assignee: Sony Corporation
    Inventors: Noriaki Fujita, Yasuhiro Toguri
  • Patent number: 6772117
    Abstract: In a speech recognition method and apparatus according to the present invention, feature vectors produced by an analysing unit of a speech recognition device are modified to compensate for the effects of noise. The feature vectors are normalized using a sliding normalization buffer (31). The method improves the performance of the speech recognition device in situations where its training phase was carried out in a noise environment that differs from the noise environment of the actual speech recognition phase.
    Type: Grant
    Filed: April 9, 1998
    Date of Patent: August 3, 2004
    Assignee: Nokia Mobile Phones Limited
    Inventors: Kari Laurila, Olli Viikki
  • Patent number: 6772124
    Abstract: The Internet is searched in order to find resources that provide streamable audio such as live Internet broadcasts. The resources are identified based on their file extension and are categorized according to, e.g., the natural language or music style. The user is enabled to browse the collection based on textual or musical input.
    Type: Grant
    Filed: November 5, 2002
    Date of Patent: August 3, 2004
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Mark B. Hoffberg, Yevgeniy Eugene Shteyn
  • Patent number: 6772122
    Abstract: The present invention provides a method and apparatus for generating an animated character representation. This is achieved by using marked-up data including both content data and presentation data. The system then uses this information to generate phoneme and viseme data representing the speech to be presented by the character. By providing the presentation data this ensures that at least some variation in character appearance will automatically occur beyond that of the visemes required to make the character appear to speak. This contributes to the character having a far more lifelike appearance.
    Type: Grant
    Filed: February 11, 2003
    Date of Patent: August 3, 2004
    Assignee: Ananova Limited
    Inventors: Jonathan Simon Jowitt, William James Cooper, Andrew Robert Burgess
  • Patent number: 6768979
    Abstract: The noise suppressor utilizes statistical characteristics of the noise signal to attenuate amplitude values of the noisy speech signal that have a probability of containing noise. In one embodiment, the noise suppressor utilizes an attenuation function having a shape determined in part by a noise average and a noise standard deviation. In a further embodiment, the noise suppressor also utilizes an adaptive attenuation coefficient that depends on signal-to-noise conditions in the speech recognition system.
    Type: Grant
    Filed: March 31, 1999
    Date of Patent: July 27, 2004
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menéndez-Pidal, Miyuki Tanaka, Ruxin Chen
  • Patent number: 6763331
    Abstract: In the prior art, it has been difficult to perform proper sentence recognition by using speech recognition or text sentence recognition. The present invention provides a sentence recognition apparatus comprising: a data base for storing a plurality of predetermined standard content word pairs each formed from a plurality of predetermined content words; a speech recognition means of recognizing an input sentence made up of a plurality of words; a content word selection means of selecting content words from among the plurality of words forming the recognized sentence; a judging means of judging whether a content word pair arbitrarily formed from the selected content words matches any one of the standard content word pairs stored in the data base; and an erroneously recognized content word determining means 105 of determining, based on the result of the judgement, an erroneously recognized content word for which the recognition failed from among the selected content words.
    Type: Grant
    Filed: April 9, 2003
    Date of Patent: July 13, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yumi Wakita, Kenji Matsui
  • Patent number: 6757651
    Abstract: A system, method and computer program product for performing speech detection. The method first receives a sound signal and determines if the energy value of the sound signal is above a threshold energy value. If the energy level of the signal is above the threshold energy value, the method determines a predictive signal of the received signal, subtracts the predictive signal from the signal, and determines if the result of the subtraction indicates the presence of speech. If it is determined that no presence of speech is indicated, the threshold energy value is set to the energy level of the present received signal. If it is determined that the result of the subtraction indicates the presence of speech, the received signal is sent to a speech recognition engine. The speech recognition engine generates control system commands for controlling one or more system components. The system components are vehicle system components.
    Type: Grant
    Filed: December 17, 2001
    Date of Patent: June 29, 2004
    Assignee: Intellisist, LLC
    Inventor: Julien Rivarol Vergin
  • Patent number: 6748360
    Abstract: It is determined whether audio identifying information generated for an audio content image matches audio identifying information in an audio content database. If the audio identifying information generated for the audio content image matches audio identifying information in the audio content database, at least one product containing or relating to audio content that corresponds to the matching audio identifying information is identified. In one embodiment, the audio content image is received, and the audio identifying information is generated for the audio content image. In another embodiment, the audio identifying information for the audio content image is received. Also provided is a system for selling products.
    Type: Grant
    Filed: February 21, 2002
    Date of Patent: June 8, 2004
    Assignee: International Business Machines Corporation
    Inventors: Michael C. Pitman, Blake G. Fitch, Steven Abrams, Robert S. Germain
  • Patent number: 6741965
    Abstract: A first audio signal is generated from a number of stereo input channels (such as a left and a right channel). A signal level that corresponds to one of the plurality of input channels and another signal level from another of the plurality of input channels are determined. A second audio signal is selected on the basis of the signal levels such that the second audio signal is selected from the group consisting of the one input channel, the other input channel, and a signal generated from the number of input channels that is different than the first audio signal. The first audio signal and the selected second audio signal are separately coded.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: May 25, 2004
    Assignee: Sony Corporation
    Inventors: Osamu Shimoyoshi, Kyoya Tsutsui
  • Patent number: 6732073
    Abstract: A method and apparatus for enhancing an auditory signal to make sounds, particularly speech sounds, more distinguishable. An input auditory signal is divided into a plurality of spectral channels. An output gain for each channel is derived based on the time varying history of the energy in the channel and, preferably, the time varying history of energy in neighboring channels. The magnitude of the output gain for each channel thus derived is preferably inversely related to the history of energy in the channel. The output gain derived for each channel is applied to the channel to form a plurality of modified spectral channel signals. The plurality of modified spectral channel signals are combined to form an enhanced output auditory signal. The present invention is particularly applicable to electronic hearing aid devices, speech recognition systems, and the like.
    Type: Grant
    Filed: September 7, 2000
    Date of Patent: May 4, 2004
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Keith R. Kluender, Rick L. Jenison
  • Patent number: 6732077
    Abstract: A speech recognition equipped geographic information recording apparatus and method. In one embodiment, a mobile data terminal has a communication node therein. A geographic mapping system is integral with the mobile data terminal, and is coupled to the communication node. A speech recognition system adapted to receive verbal information is also coupled to the mobile data terminal. The speech recognition system is adapted to receive attribute data verbalized by an operator of the mobile data terminal. Additionally, the speech recognition system is adapted to receive operating commands verbalized by an operator of the mobile data terminal. The communication node of the mobile data terminal includes a transmitter for sending information from the mobile data terminal to a desired location, and a receiver for receiving information from a desired location. In the present embodiment, a real-time communication link exists between the mobile data terminal and the desired location.
    Type: Grant
    Filed: May 28, 1996
    Date of Patent: May 4, 2004
    Assignee: Trimble Navigation Limited
    Inventors: Charles Gilbert, James M. Janky, Charles N. Branch, Mark E. Nichols
  • Patent number: 6728682
    Abstract: Audio associated with a video program, such as an audio track or live or recorded commentary, may be analyzed to recognize or detect one or more predetermined sound patterns, such as words or sound effects. The recognized or detected sound patterns may be used to enhance video processing, by controlling video capture and/or delivery during editing, or to facilitate selection of clips or splice points during editing.
    Type: Grant
    Filed: October 25, 2001
    Date of Patent: April 27, 2004
    Assignee: Avid Technology, Inc.
    Inventor: Peter Fasciano
  • Patent number: 6721712
    Abstract: In an exemplary conversion scheme, a frame of a first speech signal comprising a plurality of frames encoded at a plurality of first rates, including a first non-speech rate, is received. The rate of the received frame is determined, and if the received frame is encoded at the first non-speech rate, then the received frame is re-encoded at either a second or third non-speech rate to generate a frame of a second speech signal. Moreover, a system for converting a speech signal comprises a receiver for receiving a frame of a first speech signal and a processor capable of determining the encoding rate of the received frame and re-encoding the received frame at either a second or third non-speech rate if the received frame was originally encoded at a first non-speech rate.
    Type: Grant
    Filed: January 24, 2002
    Date of Patent: April 13, 2004
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Adil Benyassine, Eyal Shlomot, Huan-Yu Su
  • Patent number: 6718300
    Abstract: A method and apparatus are disclosed for reducing aliasing between neighboring subbands in cascaded filter banks. An alias reduction filter bank is included to reduce the aliasing components between different subbands. Generally, the magnitude response and phase of the alias reduction filter bank are similar to the magnitude response of the synthesis filter bank of the first stage filter bank. The alias reduction filter bank filters and adds the signals from a set of M2 subbands from the M1 subbands of the first stage analysis filter bank. A higher frequency resolution is obtained after the alias reduction stage by a following analysis filter bank. The signals of these subbands are first fed into an alias reduction filter bank to reduce the aliasing.
    Type: Grant
    Filed: June 2, 2000
    Date of Patent: April 6, 2004
    Assignee: Agere Systems Inc.
    Inventor: Gerald Dietrich Schuller
  • Patent number: 6718295
    Abstract: A volume control unit 3 obtains, from a scale factor updating unit 2, data about how much the playback scale factor of sub-bands adjacent to a sub-band to be emphasized has been reduced, and increases the analog audio signal output level of a volume control 77 to an extent corresponding to the playback scale factor necessary for restoring the playback scale factor before the reduction. As a result, the signal level of the desired sub-band (i.e., sub-band to be emphasized) is increased to obtain a sufficient sense of emphasis of the desired sub-band.
    Type: Grant
    Filed: November 5, 1998
    Date of Patent: April 6, 2004
    Assignee: NEC Corporation
    Inventor: Satoshi Hasegawa
  • Patent number: 6708146
    Abstract: A method and apparatus for classifying signals into a multiplicity of signal classes which employs discriminant functions of low-complexity discriminant variables that are computed directly from the passband signal. The method can be applied to the problem of classifying voiceband data (VBD), facsimile (FAX), native binary data, and speech on a 64 Kbps digital channel. In a hybrid two stage classification system, the first stage employs linear discriminant functions to make classification decisions into a smaller number of possible preliminary signal classes. The decisions of the first stage are then refined by a second stage that uses nonlinear discriminant functions such as quadratic or pseudo-quadratic functions. The second stage of a hybrid classifier then assigns the signal into a larger number of possible classes than does the first stage of the classifier alone.
    Type: Grant
    Filed: April 30, 1999
    Date of Patent: March 16, 2004
    Assignee: Telecommunications Research Laboratories
    Inventors: Jeremy S. Sewall, Bruce F. Cockburn, Deepak P. Sarda
  • Patent number: 6697777
    Abstract: A system and method for generating a user interface for a speech recognition program module which provides user feedback by inserting a place mark or bar into the text of the document at the insertion point. The place mark indicates to the user that the speech recognition program module has recorded the dictated speech string and is in the process of translating the speech string. The place mark consists of a string of characters, such as a string of ellipses. The place mark has a length that is proportional to the expected length of the text that the user has dictated. The length of the place mark is based on the elapsed time of the speech string dictated by the user. When the speech recognition engine has completed the translation of the speech string into text, the final text replaces the place mark in the document. The place mark may be highlighted in different colors or the characters rendered in different colors to indicate to the user the volume level of the speech string being translated.
    Type: Grant
    Filed: June 28, 2000
    Date of Patent: February 24, 2004
    Assignee: Microsoft Corporation
    Inventors: Sebastian Sai Dun Ho, Jeffrey C. Reynar
  • Patent number: 6684186
    Abstract: In an illustrative embodiment, a speaker model is generated for each of a number of speakers from which speech samples have been obtained. Each speaker model contains a collection of distributions of audio feature data derived from the speech sample of the associated speaker. A hierarchical speaker model tree is created by merging similar speaker models on a layer by layer basis. Each time two or more speaker models are merged, a corresponding parent speaker model is created in the next higher layer of the tree. The tree is useful in applications such as speaker verification and speaker identification.
    Type: Grant
    Filed: January 26, 1999
    Date of Patent: January 27, 2004
    Assignee: International Business Machines Corporation
    Inventors: Homayoon S. M. Beigi, Stephane H. Maes, Jeffrey S. Sorensen
  • Patent number: 6678656
    Abstract: A voice sample characterization front-end suitable for use in a distributed speech recognition context. A digitized voice sample 31 is split between a low frequency path 32 and a high frequency path 33. Both paths are used to determine spectral content suitable for use when determining speech recognition parameters (such as cepstral coefficients) that characterize the speech sample for recognition purposes. The low frequency path 32 has a thorough noise reduction capability. In one embodiment, the results of this noise reduction are used by the high frequency path 33 to aid in de-noising without requiring the same level of resource capacity as used by the low frequency path 32.
    Type: Grant
    Filed: January 30, 2002
    Date of Patent: January 13, 2004
    Assignee: Motorola, Inc.
    Inventors: Dusan Macho, Yan Ming Cheng
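
Illustrative sketch for patent 6772117 (sliding normalization buffer). The abstract states only that feature vectors are normalized using a sliding normalization buffer; it does not say which statistics are used. The Python sketch below assumes per-dimension mean and standard-deviation normalization over the most recent frames. The function name sliding_normalize and the parameters buffer_size and eps are hypothetical and are not taken from the patent.

```python
from collections import deque
import math


def sliding_normalize(frames, buffer_size=3, eps=1e-8):
    """Normalize each feature vector against the most recent frames.

    Assumption: "normalization" means subtracting the per-dimension mean
    and dividing by the per-dimension standard deviation of the frames
    currently held in the buffer. The abstract does not specify this.
    """
    buffer = deque(maxlen=buffer_size)  # oldest frame is dropped automatically
    normalized = []
    for frame in frames:
        buffer.append(frame)
        n = len(buffer)
        dims = range(len(frame))
        # Statistics are recomputed over the current buffer contents only.
        means = [sum(f[d] for f in buffer) / n for d in dims]
        stds = [
            math.sqrt(sum((f[d] - means[d]) ** 2 for f in buffer) / n)
            for d in dims
        ]
        # eps guards against division by zero for constant dimensions.
        normalized.append(
            [(frame[d] - means[d]) / (stds[d] + eps) for d in dims]
        )
    return normalized


if __name__ == "__main__":
    # Two-dimensional toy features; the second dimension is constant.
    toy_frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
    for vector in sliding_normalize(toy_frames, buffer_size=2):
        print(vector)
```

Because the statistics follow the recent frames, a slow change in the noise environment shifts the mean and variance along with it, which is one plausible reading of how such a buffer could reduce a mismatch between training and recognition conditions.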