Patents Examined by David D. Knepper
-
Patent number: 6772119
Abstract: A speaker recognition technique is provided that can operate within the memory and processing constraints of existing portable computing devices. A smaller memory footprint and computational efficiency are achieved using single Gaussian models for each enrolled speaker. During enrollment, features are extracted from one or more enrollment utterances from each enrolled speaker, to generate a target speaker model based on a sample covariance matrix. During a recognition phase, features are extracted from one or more test utterances to generate a test utterance model that is also based on the sample covariance matrix. A sphericity ratio is computed that compares the test utterance model to the target speaker model, as well as a background model. The sphericity ratio indicates how similar test utterance speech is to the speech used when the user was enrolled, as represented by the target speaker model, and how dissimilar the test utterance speech is from the background model.
Type: Grant
Filed: December 10, 2002
Date of Patent: August 3, 2004
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Ganesh N. Ramaswamy, Ran Zilca
-
Patent number: 6772113
Abstract: A data processing apparatus and method in which spectral characteristic information and waveform characteristic information within a time area are detected from inputted audio data and the detected spectral characteristic information and waveform characteristic information are recorded together with information indicating a correspondence relationship with the audio data. As a result, an efficient search can be achieved when searching audio data.
Type: Grant
Filed: January 21, 2000
Date of Patent: August 3, 2004
Assignee: Sony Corporation
Inventors: Noriaki Fujita, Yasuhiro Toguri
-
Patent number: 6772117
Abstract: In a speech recognition method and apparatus, according to the present invention, feature vectors produced by an analysing unit of a speech recognition device are modified for compensating the effects of noise. According to the invention, feature vectors are normalized using a sliding normalization buffer (31). By means of the method according to the invention, the performance of the speech recognition device improves in situations, wherein the speech recognition device's training phase has been carried out in a noise environment that differs from the noise environment of the actual speech recognition phase.
Type: Grant
Filed: April 9, 1998
Date of Patent: August 3, 2004
Assignee: Nokia Mobile Phones Limited
Inventors: Kari Laurila, Olli Viikki
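The normalization step in this abstract can be illustrated with a minimal sketch. The abstract does not say which statistics the buffer supplies, so mean and standard deviation normalization over the most recent frames is an assumption here, and the names `sliding_normalize`, `buffer_len`, and `eps` are hypothetical.

```python
from collections import deque
from math import sqrt


def sliding_normalize(frames, buffer_len=3, eps=1e-8):
    """Normalize each feature vector using statistics from a sliding buffer.

    Assumption (not stated in the abstract): each frame is normalized by the
    per-dimension mean and standard deviation of the last `buffer_len`
    frames, including the current one.
    """
    buf = deque(maxlen=buffer_len)  # oldest frame is dropped automatically
    normalized = []
    for frame in frames:
        buf.append(frame)
        n = len(buf)
        dim = len(frame)
        # Per-dimension mean and (population) standard deviation over the buffer.
        means = [sum(f[d] for f in buf) / n for d in range(dim)]
        stds = [
            sqrt(sum((f[d] - means[d]) ** 2 for f in buf) / n)
            for d in range(dim)
        ]
        # eps avoids division by zero when the buffer holds constant values.
        normalized.append(
            [(frame[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        )
    return normalized
```

Because the statistics follow the recent input, a slowly changing noise offset is subtracted out in both training and recognition, which is one plausible reading of how the mismatch described above is reduced.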
-
Patent number: 6772124
Abstract: The Internet is searched in order to find resources that provide streamable audio such as live Internet broadcasts. The resources are identified based on their file extension and are categorized according to, e.g., the natural language or music style. The user is enabled to browse the collection based on textual or musical input.
Type: Grant
Filed: November 5, 2002
Date of Patent: August 3, 2004
Assignee: Koninklijke Philips Electronics N.V.
Inventors: Mark B. Hoffberg, Yevgeniy Eugene Shteyn
-
Patent number: 6772122
Abstract: The present invention provides a method and apparatus for generating an animated character representation. This is achieved by using marked-up data including both content data and presentation data. The system then uses this information to generate phoneme and viseme data representing the speech to be presented by the character. By providing the presentation data this ensures that at least some variation in character appearance will automatically occur beyond that of the visemes required to make the character appear to speak. This contributes to the character having a far more lifelike appearance.
Type: Grant
Filed: February 11, 2003
Date of Patent: August 3, 2004
Assignee: Ananova Limited
Inventors: Jonathan Simon Jowitt, William James Cooper, Andrew Robert Burgess
-
Patent number: 6768979
Abstract: The noise suppressor utilizes statistical characteristics of the noise signal to attenuate amplitude values of the noisy speech signal that have a probability of containing noise. In one embodiment, the noise suppressor utilizes an attenuation function having a shape determined in part by a noise average and a noise standard deviation. In a further embodiment, the noise suppressor also utilizes an adaptive attenuation coefficient that depends on signal-to-noise conditions in the speech recognition system.
Type: Grant
Filed: March 31, 1999
Date of Patent: July 27, 2004
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Xavier Menéndez-Pidal, Miyuki Tanaka, Ruxin Chen
-
Patent number: 6763331
Abstract: In the prior art, it has been difficult to perform proper sentence recognition by using speech recognition or text sentence recognition. The present invention provides a sentence recognition apparatus comprising: a data base for storing a plurality of predetermined standard content word pairs each formed from a plurality of predetermined content words; a speech recognition means of recognizing an input sentence made up of a plurality of words; a content word selection means of selecting content words from among the plurality of words forming the recognized sentence; a judging means of judging whether a content word pair arbitrarily formed from the selected content words matches any one of the standard content word pairs stored in the data base; and an erroneously recognized content word determining means 105 of determining, based on the result of the judgement, an erroneously recognized content word for which the recognition failed from among the selected content words.
Type: Grant
Filed: April 9, 2003
Date of Patent: July 13, 2004
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Yumi Wakita, Kenji Matsui
-
Patent number: 6757651
Abstract: A system, method and computer program product for performing speech detection. The method first receives a sound signal and determines if the energy value of the sound signal is above a threshold energy value. If the energy level of the signal is above the threshold energy value, the method determines a predictive signal of the received signal, subtracts the predictive signal from the signal, and determines if the result of the subtraction indicates the presence of speech. If it is determined that no presence of speech is indicated, the threshold energy value is set to the energy level of the present received signal. If it is determined that the result of the subtraction indicates the presence of speech, the received signal is sent to a speech recognition engine. The speech recognition engine generates control system commands for controlling one or more system components. The system components are vehicle system components.
Type: Grant
Filed: December 17, 2001
Date of Patent: June 29, 2004
Assignee: Intellisist, LLC
Inventor: Julien Rivarol Vergin
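The decision flow in this abstract can be sketched roughly as follows. The abstract specifies neither the predictor nor the rule applied to the subtraction result, so both are assumptions here: the predictor is a first-order one (each sample is predicted by the previous sample), and a frame counts as speech when the residual energy is small relative to the frame energy. The names `SpeechDetector`, `max_residual_ratio`, and the helper functions are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of a frame; 0.0 for an empty frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def prediction_residual(frame):
    """Subtract a predicted signal from the frame.

    Assumption: a first-order predictor in which each sample is predicted
    by the sample before it.
    """
    return [frame[i] - frame[i - 1] for i in range(1, len(frame))]


class SpeechDetector:
    def __init__(self, threshold, max_residual_ratio=0.5):
        self.threshold = threshold  # adaptive energy threshold
        self.max_residual_ratio = max_residual_ratio  # hypothetical decision rule

    def process(self, frame):
        """Return True if the frame should go to the recognition engine."""
        energy = frame_energy(frame)
        if energy <= self.threshold:
            return False  # too quiet: no further analysis
        residual = prediction_residual(frame)
        ratio = frame_energy(residual) / energy
        if ratio < self.max_residual_ratio:
            return True  # well predicted, treated as speech (assumed rule)
        # No speech indicated: raise the threshold to the current energy level.
        self.threshold = energy
        return False
```

In this sketch a loud but unpredictable frame raises the threshold, so later frames at the same level are rejected by the cheap energy test alone.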
-
Patent number: 6748360
Abstract: It is determined whether audio identifying information generated for an audio content image matches audio identifying information in an audio content database. If the audio identifying information generated for the audio content image matches audio identifying information in the audio content database, at least one product containing or relating to audio content that corresponds to the matching audio identifying information is identified. In one embodiment, the audio content image is received, and the audio identifying information is generated for the audio content image. In another embodiment, the audio identifying information for the audio content image is received. Also provided is a system for selling products.
Type: Grant
Filed: February 21, 2002
Date of Patent: June 8, 2004
Assignee: International Business Machines Corporation
Inventors: Michael C. Pitman, Blake G. Fitch, Steven Abrams, Robert S. Germain
-
Patent number: 6741965
Abstract: A first audio signal is generated from a number of stereo input channels (such as a left and a right channel). A signal level that corresponds to one of the plurality of input channels and another signal level from another of the plurality of input channels are determined. A second audio signal is selected on the basis of the signal levels such that the second audio signal is selected from the group consisting of the one input channel, the other input channel, and a signal generated from the number of input channels that is different than the first audio signal. The first audio signal and the selected second audio signal are separately coded.
Type: Grant
Filed: March 12, 1999
Date of Patent: May 25, 2004
Assignee: Sony Corporation
Inventors: Osamu Shimoyoshi, Kyoya Tsutsui
-
Patent number: 6732073
Abstract: A method and apparatus for enhancing an auditory signal to make sounds, particularly speech sounds, more distinguishable. An input auditory signal is divided into a plurality of spectral channels. An output gain for each channel is derived based on the time varying history of the energy in the channel and, preferably, the time varying history of energy in neighboring channels. The magnitude of the output gain for each channel thus derived is preferably inversely related to the history of energy in the channel. The output gain derived for each channel is applied to the channel to form a plurality of modified spectral channel signals. The plurality of modified spectral channel signals are combined to form an enhanced output auditory signal. The present invention is particularly applicable to electronic hearing aid devices, speech recognition systems, and the like.
Type: Grant
Filed: September 7, 2000
Date of Patent: May 4, 2004
Assignee: Wisconsin Alumni Research Foundation
Inventors: Keith R. Kluender, Rick L. Jenison
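A minimal sketch of the gain derivation described in this abstract, under several assumptions that the abstract does not confirm: the input is already split into channels (one amplitude per channel per time step), the energy history is an exponentially decaying average, the gain is `1 / (1 + history)`, and neighboring channels are ignored. The names `enhance` and `decay` are hypothetical.

```python
def enhance(frames, decay=0.9):
    """Apply per-channel gains that shrink as a channel's past energy grows.

    `frames` is a list of time steps; each time step is a list holding one
    amplitude per spectral channel. Returns one combined output sample per
    time step.
    """
    if not frames:
        return []
    history = [0.0] * len(frames[0])  # decaying energy history per channel
    output = []
    for frame in frames:
        # Gain is inversely related to the channel's energy history (assumed form).
        gains = [1.0 / (1.0 + h) for h in history]
        modified = [g * x for g, x in zip(gains, frame)]
        output.append(sum(modified))  # combine the modified channels
        # Update the history only after the gain for this step was applied.
        history = [
            decay * h + (1.0 - decay) * x * x for h, x in zip(history, frame)
        ]
    return output
```

With this form, a channel that has carried sustained energy is attenuated while a channel that has just become active passes at full gain, so spectral changes stand out relative to steady components.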
-
Patent number: 6732077
Abstract: A speech recognition equipped geographic information recording apparatus and method. In one embodiment, a mobile data terminal has a communication node therein. A geographic mapping system is integral with the mobile data terminal, and is coupled to the communication node. A speech recognition system adapted to receive verbal information is also coupled to the mobile data terminal. The speech recognition system is adapted to receive attribute data verbalized by an operator of the mobile data terminal. Additionally, the speech recognition system is adapted to receive operating commands verbalized by an operator of the mobile data terminal. The communication node of the mobile data terminal includes a transmitter for sending information from the mobile data terminal to a desired location, and a receiver for receiving information from a desired location. In the present embodiment, a real-time communication link exists between the mobile data terminal and the desired location.
Type: Grant
Filed: May 28, 1996
Date of Patent: May 4, 2004
Assignee: Trimble Navigation Limited
Inventors: Charles Gilbert, James M. Janky, Charles N. Branch, Mark E. Nichols
-
Patent number: 6728682
Abstract: Audio associated with a video program, such as an audio track or live or recorded commentary, may be analyzed to recognize or detect one or more predetermined sound patterns, such as words or sound effects. The recognized or detected sound patterns may be used to enhance video processing, by controlling video capture and/or delivery during editing, or to facilitate selection of clips or splice points during editing.
Type: Grant
Filed: October 25, 2001
Date of Patent: April 27, 2004
Assignee: Avid Technology, Inc.
Inventor: Peter Fasciano
-
Patent number: 6721712
Abstract: In an exemplary conversion scheme, a frame of a first speech signal comprising a plurality of frames encoded at a plurality of first rates, including a first non-speech rate, is received. The rate of the received frame is determined, and if the received frame is encoded at the first non-speech rate, then the received frame is re-encoded at either a second or third non-speech rate to generate a frame of a second speech signal. Moreover, a system for converting a speech signal comprises a receiver for receiving a frame of a first speech signal and a processor capable of determining the encoding rate of the received frame and re-encoding the received frame at either a second or third non-speech rate if the received frame was originally encoded at a first non-speech rate.
Type: Grant
Filed: January 24, 2002
Date of Patent: April 13, 2004
Assignee: Mindspeed Technologies, Inc.
Inventors: Adil Benyassine, Eyal Shlomot, Huan-Yu Su
-
Patent number: 6718300
Abstract: A method and apparatus are disclosed for reducing aliasing between neighboring subbands in cascaded filter banks. An alias reduction filter bank is included to reduce the aliasing components between different subbands. Generally, the magnitude response and phase of the alias reduction filter bank is similar to the magnitude response of the synthesis filter bank of the first stage filter bank. The alias reduction filter bank filters and adds the signals from a set of M2 subbands from the M1 subbands of the first stage analysis filter bank. A higher frequency resolution is obtained after the alias reduction stage by a following analysis filter bank. The signals of these subbands are first fed into an alias reduction filter bank to reduce the aliasing.
Type: Grant
Filed: June 2, 2000
Date of Patent: April 6, 2004
Assignee: Agere Systems Inc.
Inventor: Gerald Dietrich Schuller
-
Patent number: 6718295
Abstract: A volume control unit 3 obtains, from a scale factor updating unit 2, data about how much the playback scale factor of sub-bands adjacent to a sub-band to be emphasized has been reduced, and increases the analog audio signal output level of a volume control 77 to an extent corresponding to the playback scale factor necessary for restoring the playback scale factor before the reduction. As a result, the signal level of the desired sub-band (i.e., sub-band to be emphasized) is increased to obtain a sufficient sense of emphasis of the desired sub-band.
Type: Grant
Filed: November 5, 1998
Date of Patent: April 6, 2004
Assignee: NEC Corporation
Inventor: Satoshi Hasegawa
-
Patent number: 6708146
Abstract: A method and apparatus for classifying signals into a multiplicity of signal classes which employs discriminant functions of low-complexity discriminant variables that are computed directly from the passband signal. The method can be applied to the problem of classifying voiceband data (VBD), facsimile (FAX), native binary data, and speech on a 64 Kbps digital channel. In a hybrid two stage classification system, the first stage employs linear discriminant functions to make classification decisions into a smaller number of possible preliminary signal classes. The decisions of the first stage are then refined by a second stage that uses nonlinear discriminant functions such as quadratic or pseudo-quadratic functions. The second stage of a hybrid classifier then assigns the signal into a larger number of possible classes than does the first stage of the classifier alone.
Type: Grant
Filed: April 30, 1999
Date of Patent: March 16, 2004
Assignee: Telecommunications Research Laboratories
Inventors: Jeremy S. Sewall, Bruce F. Cockburn, Deepak P. Sarda
-
Patent number: 6697777
Abstract: A system and method for generating a user interface for a speech recognition program module which provides user feedback by inserting a place mark or bar into the text of the document at the insertion point. The place mark indicates to the user that the speech recognition program module has recorded the dictated speech string and is in the process of translating the speech string. The place mark consists of a string of characters, such as a string of ellipses. The place mark has a length that is proportional in length to the expected length of the text that the user has dictated. The length of the place mark is based on the elapsed time of the speech string dictated by the user. When the speech recognition engine has completed the translation of the speech string into text, the final text replaces the place mark in the document. The place mark may be highlighted in different colors or the characters rendered in different colors to indicate to the user the volume level of the speech string being translated.
Type: Grant
Filed: June 28, 2000
Date of Patent: February 24, 2004
Assignee: Microsoft Corporation
Inventors: Sebastian Sai Dun Ho, Jeffrey C. Reynar
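The place-mark behavior described in this abstract can be sketched in a few lines. The conversion from elapsed dictation time to mark length is not given in the abstract; the `chars_per_second` rate below is a hypothetical constant, as are the function names.

```python
def place_mark(elapsed_seconds, chars_per_second=12.0, mark_char="."):
    """Build a place mark whose length is proportional to dictation time.

    `chars_per_second` is an assumed estimate of how much text a second of
    speech produces; at least one character is always shown.
    """
    length = max(1, round(elapsed_seconds * chars_per_second))
    return mark_char * length


def replace_place_mark(document, start, mark, final_text):
    """Swap the place mark at index `start` for the recognized text."""
    if document[start:start + len(mark)] != mark:
        raise ValueError("place mark not found at the given position")
    return document[:start] + final_text + document[start + len(mark):]
```

For example, half a second of speech at the assumed rate yields a six-character mark, which is later replaced in place once recognition finishes.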
-
Patent number: 6684186
Abstract: In an illustrative embodiment, a speaker model is generated for each of a number of speakers from which speech samples have been obtained. Each speaker model contains a collection of distributions of audio feature data derived from the speech sample of the associated speaker. A hierarchical speaker model tree is created by merging similar speaker models on a layer by layer basis. Each time two or more speaker models are merged, a corresponding parent speaker model is created in the next higher layer of the tree. The tree is useful in applications such as speaker verification and speaker identification.
Type: Grant
Filed: January 26, 1999
Date of Patent: January 27, 2004
Assignee: International Business Machines Corporation
Inventors: Homayoon S. M. Beigi, Stephane H. Maes, Jeffrey S. Sorensen
-
Patent number: 6678656
Abstract: A voice sample characterization front-end suitable for use in a distributed speech recognition context. A digitized voice sample 31 is split between a low frequency path 32 and a high frequency path 33. Both paths are used to determine spectral content suitable for use when determining speech recognition parameters (such as cepstral coefficients) that characterize the speech sample for recognition purposes. The low frequency path 32 has a thorough noise reduction capability. In one embodiment, the results of this noise reduction are used by the high frequency path 33 to aid in de-noising without requiring the same level of resource capacity as used by the low frequency path 32.
Type: Grant
Filed: January 30, 2002
Date of Patent: January 13, 2004
Assignee: Motorola, Inc.
Inventors: Dusan Macho, Yan Ming Cheng