Distance Patents (Class 704/238)
  • Patent number: 7664639
    Abstract: A telephone dialing speech recognition method includes determining a location associated with a cellular telephone from geographic indications provided by the cellular telephone, and selecting associated search information as a function of the location. Speech-based dialers operating in a car environment often have difficulty determining the digits spoken, since some digits have similar-sounding names in certain languages. To improve recognition performance, constraints are added to the recognition process based on the natural constraints of the dialing process. The method utilizes the selected associated search information when recognizing the incoming speech signal. For speech dialing, if the user defines a location where the phone is used, then the “numbering plan” of that country may be used to constrain certain digits. Such constraining of the speech recognizer significantly improves the recognition results.
    Type: Grant
    Filed: January 14, 2004
    Date of Patent: February 16, 2010
    Assignee: Art Advanced Recognition Technologies, Inc.
    Inventors: Ran Mochary, Eran Dukas
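    The entry above constrains a digit recognizer with the numbering plan of the caller's location. The sketch below is only an illustration of that idea: the three-position plan table and the per-digit acoustic scores are invented placeholders, not the patented method.
```python
# Hypothetical sketch of numbering-plan-constrained digit decoding.
# The plan table and the acoustic scores are invented for illustration only.

# Allowed digits per dial position for a fictitious 3-digit plan:
# e.g. the first digit may not be 0 in this made-up plan.
PLAN = {
    0: set("123456789"),
    1: set("0123456789"),
    2: set("0123456789"),
}

def decode_digits(frame_scores):
    """frame_scores: list of dicts digit -> acoustic score (higher is better)."""
    result = []
    for pos, scores in enumerate(frame_scores):
        allowed = PLAN.get(pos, set("0123456789"))
        # Restrict the search space to digits the numbering plan allows here.
        best = max((d for d in scores if d in allowed), key=lambda d: scores[d])
        result.append(best)
    return "".join(result)

# Example: an "oh"/"four" confusion resolved because position 0 cannot be 0.
scores = [{"0": 0.9, "4": 0.85}, {"5": 0.7, "9": 0.2}, {"1": 0.6, "7": 0.5}]
print(decode_digits(scores))  # -> "451"
```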
  • Patent number: 7629902
    Abstract: The present invention relates to methods and apparatus for preventing power imbalance in a multiple input multiple output (MIMO) wireless precoding system. According to one aspect of the present invention, a codebook is constructed with a first subset of codewords that are constant modulus matrices, and a second subset of codewords that are non-constant modulus matrices. A mapping scheme is established between the first subset of codewords and the second subset of codewords. When a unit of user equipment feeds back a first codeword that is a non-constant modulus matrix, the Node-B may replace the first codeword with a second codeword that is selected from the first subset of codewords and that corresponds to the first codeword in accordance with the mapping scheme.
    Type: Grant
    Filed: May 30, 2008
    Date of Patent: December 8, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jianzhong Zhang, Cornelius Van Rensburg, Farooq Khan, Bruno Clerckx, Juho Lee, Zhouyue Pi
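    The abstract above replaces a fed-back non-constant-modulus precoding codeword with a mapped constant-modulus one at the Node-B. The sketch below shows only that replacement step, using an invented two-entry codebook and mapping rather than any standardized codebook.
```python
import numpy as np

def is_constant_modulus(W, tol=1e-9):
    """True if every entry of the precoding matrix has the same magnitude."""
    mags = np.abs(W)
    return bool(np.all(np.abs(mags - mags.flat[0]) < tol))

# Invented toy codebook: index 0 is constant modulus, index 1 is not.
codebook = [
    np.array([[1.0, 1.0], [1.0, -1.0]]) / 2.0,   # constant modulus
    np.array([[1.0, 0.0], [0.0, 0.5]]),          # non-constant modulus
]
# Invented mapping from non-constant-modulus indices to their replacements.
cm_replacement = {1: 0}

def apply_feedback(index):
    """Replace a fed-back non-constant-modulus codeword with its mapped codeword."""
    if not is_constant_modulus(codebook[index]):
        index = cm_replacement[index]
    return codebook[index]

print(apply_feedback(1))   # Node-B precodes with the constant-modulus matrix
```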
  • Patent number: 7590626
    Abstract: A distributional similarity between a word of a search query and a term of a candidate word sequence is used to determine an error model probability that describes the probability of the search query given the candidate word sequence. The error model probability is used to determine a probability of the candidate word sequence given the search query. The probability of the candidate word sequence given the search query is used to select a candidate word sequence as a corrected word sequence for the search query. Distributional similarity is also used to build features that are applied in a maximum entropy model to compute the probability of the candidate word sequence given the search query.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: September 15, 2009
    Assignee: Microsoft Corporation
    Inventors: Mu Li, Ming Zhou
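    The abstract above selects a corrected query by combining an error model probability P(query | candidate) with a prior over candidates. The sketch below is a generic noisy-channel selection with invented toy probabilities; the patent's distributional-similarity error model and maximum entropy features are not reproduced here.
```python
import math

# Invented toy probabilities, for illustration only.
error_model = {          # P(query | candidate)
    ("briteny spears", "britney spears"): 0.10,
    ("briteny spears", "briteny spears"): 0.60,
}
prior = {                # P(candidate), e.g. estimated from query-log frequencies
    "britney spears": 1e-3,
    "briteny spears": 1e-7,
}

def correct(query, candidates):
    """argmax_c P(c | q), proportional to P(q | c) * P(c)."""
    def score(c):
        return math.log(error_model.get((query, c), 1e-12)) + math.log(prior.get(c, 1e-12))
    return max(candidates, key=score)

print(correct("briteny spears", ["britney spears", "briteny spears"]))
```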
  • Patent number: 7590537
    Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model whose variation between a test speaker ML model and the speaker group ML model to which the test speaker belongs is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variations in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to speaker adaptation algorithms such as MLLR and MAP.
    Type: Grant
    Filed: December 27, 2004
    Date of Patent: September 15, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
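    The abstract above compares model variations by both their quantity (magnitude) and their direction. The sketch below is a simplified illustration that treats a model as a single stacked mean vector, an assumption made here only for brevity.
```python
import numpy as np

def model_variation(model_a, model_b):
    """Return (quantity, direction) of the variation vector model_b - model_a."""
    delta = model_b - model_a
    quantity = np.linalg.norm(delta)                 # how much the model moved
    direction = delta / (quantity + 1e-12)           # unit vector: where it moved
    return quantity, direction

def variation_similarity(var_a, var_b):
    """Compare two variations on both magnitude and direction."""
    qa, da = var_a
    qb, db = var_b
    quantity_ratio = min(qa, qb) / (max(qa, qb) + 1e-12)
    directional_sim = float(np.dot(da, db))          # cosine, since both are unit vectors
    return quantity_ratio * directional_sim

si = np.zeros(4)                                      # toy speaker-independent model
trained = np.array([1.0, 0.5, 0.0, -0.5])
test = np.array([0.9, 0.6, 0.1, -0.4])
print(variation_similarity(model_variation(si, trained), model_variation(si, test)))
```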
  • Patent number: 7565290
    Abstract: A speech recognition apparatus includes a word dictionary having recognition target words, a first acoustic model which expresses a reference pattern of a speech unit by one or more states, a second acoustic model which is lower in precision than said first acoustic model, selection means for selecting one of said first acoustic model and said second acoustic model on the basis of a parameter associated with a state of interest, and likelihood calculation means for calculating a likelihood of an acoustic feature parameter with respect to said acoustic model selected by said selection means.
    Type: Grant
    Filed: June 24, 2005
    Date of Patent: July 21, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hideo Kuboyama, Toshiaki Fukada, Yasuhiro Komori
  • Patent number: 7478045
    Abstract: In a method for characterizing a signal representing audio content, a measure of the tonality of the signal is determined, whereupon a statement is made about the audio content of the signal on the basis of that measure. The measure of tonality is derived from a quotient whose numerator is the mean of the summed values of spectral components of the signal raised to a first power and whose denominator is the mean of the summed values of spectral components raised to a second power, the first and second powers differing from each other. The tonality measure used for the content analysis is robust to signal distortion, e.g. due to MP3 coding, and correlates highly with the content of the analyzed signal.
    Type: Grant
    Filed: July 15, 2002
    Date of Patent: January 13, 2009
    Assignee: M2ANY GmbH
    Inventors: Eric Allamanche, Jürgen Herre, Oliver Hellmuth, Thorsten Kastner
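    The abstract above defines tonality as a quotient of means of spectral components raised to two different powers. The sketch below implements that quotient directly; the exponent values and FFT size are illustrative assumptions, not the patented parameters.
```python
import numpy as np

def tonality_measure(signal, p1=0.25, p2=2.0, n_fft=1024):
    """Quotient of means of spectral components raised to two different powers.
    The exponents p1 and p2 are illustrative choices, not the patented values."""
    spectrum = np.abs(np.fft.rfft(signal, n_fft))
    numerator = np.mean(spectrum ** p1)
    denominator = np.mean(spectrum ** p2)
    return numerator / (denominator + 1e-12)

t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440 * t)       # tonal signal
noise = np.random.randn(16000)           # noise-like signal
print(tonality_measure(tone), tonality_measure(noise))
```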
  • Patent number: 7475013
    Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real-time, without extensive training data or parametric assumptions about data distribution.
    Type: Grant
    Filed: March 26, 2004
    Date of Patent: January 6, 2009
    Assignee: Honda Motor Co., Ltd.
    Inventor: Ryan Rifkin
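    The abstract above indexes enrollment feature vectors in a kd-tree and scores an unknown sample with a Parzen-window estimate over its approximate nearest neighbors. The sketch below uses SciPy's cKDTree and a Gaussian window; the feature extraction step and the enrollment data are invented placeholders.
```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
dim = 16                                            # toy feature dimensionality
enroll = {                                          # invented enrollment vectors per speaker
    "alice": rng.normal(0.0, 1.0, (200, dim)),
    "bob":   rng.normal(2.0, 1.0, (200, dim)),
}
points = np.vstack(list(enroll.values()))
labels = np.array(sum(([name] * len(v) for name, v in enroll.items()), []))
tree = cKDTree(points)                              # kd-tree over enrollment points

def identify(sample, k=20, bandwidth=1.0):
    """Parzen-window score per speaker over the k nearest enrollment points."""
    dist, idx = tree.query(sample, k=k)
    kernel = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian window
    scores = {name: kernel[labels[idx] == name].sum() for name in enroll}
    return max(scores, key=scores.get)

print(identify(rng.normal(2.0, 1.0, dim)))           # expected: "bob"
```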
  • Publication number: 20080255839
    Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least o
    Type: Application
    Filed: September 14, 2005
    Publication date: October 16, 2008
    Applicant: ZENTIAN LIMITED
    Inventors: Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
  • Patent number: 7421305
    Abstract: The present invention relates to a system and methodology to facilitate automatic management and pruning of audio files residing in a database. Audio fingerprinting is a powerful tool for identifying streaming or file-based audio, using a database of fingerprints. Duplicate detection identifies duplicate audio clips in a set, even if the clips differ in compression quality or duration. The present invention can be provided as a self-contained application that does not require an external database of fingerprints. Also, a user interface provides various options for managing and pruning the audio files.
    Type: Grant
    Filed: February 24, 2004
    Date of Patent: September 2, 2008
    Assignee: Microsoft Corporation
    Inventors: Christopher J. C. Burges, John C. Platt, Daniel Plastina, Erin L. Renshaw
  • Patent number: 7403891
    Abstract: The present invention relates to an apparatus and method for recognizing biological named entities in biological literature based on the Unified Medical Language System (UMLS). The apparatus and method receive the Metathesaurus from the UMLS; construct a concept name database, a single name database and a category keyterm database, which are language resources used to recognize a named entity; receive each concept name stored in the concept name database; extract features of each of the concept names by using data stored in the single name database and the category keyterm database; construct a rule database by creating rules used to recognize the named entity and filtering the rules by using the extracted features; receive a piece of biological literature; extract nouns and noun phrases that are candidate named entities; apply the rules stored in the rule database to the nouns and noun phrases; and recognize the named entities.
    Type: Grant
    Filed: February 13, 2004
    Date of Patent: July 22, 2008
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Soo Jun Park, Tae Hyun Kim, Hyun Sook Lee, Hyun Chul Jang, Seon Hee Park
  • Patent number: 7379868
    Abstract: A differential compression technique is disclosed for compressing individual speaker models, such as Gaussian mixture models, by computing a delta model from the difference between an individual speaker model and a baseline model. Further compression may be applied to the delta model to reduce the large storage requirements generally attributed to speaker models.
    Type: Grant
    Filed: January 2, 2003
    Date of Patent: May 27, 2008
    Assignee: Massachusetts Institute of Technology
    Inventor: Douglas A. Reynolds
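    The abstract above stores a speaker model as its difference from a baseline model, with optional further compression. The sketch below applies the idea to GMM mean vectors only; the uniform quantization step used for the extra compression is an illustrative assumption.
```python
import numpy as np

def compress(speaker_means, baseline_means, step=0.01):
    """Store a coarsely quantized delta instead of the full speaker model."""
    delta = speaker_means - baseline_means
    return np.round(delta / step).astype(np.int16)   # small integers compress well

def decompress(delta_q, baseline_means, step=0.01):
    return baseline_means + delta_q.astype(np.float32) * step

baseline = np.random.randn(512, 39).astype(np.float32)   # toy baseline (UBM-style) means
speaker = baseline + 0.05 * np.random.randn(512, 39).astype(np.float32)
restored = decompress(compress(speaker, baseline), baseline)
print(float(np.max(np.abs(restored - speaker))))          # reconstruction error <= step/2
```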
  • Patent number: 7369993
    Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: December 29, 2006
    Date of Patent: May 6, 2008
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
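    The abstract above matches a received phoneme to stored phonemes by comparing their distances from the center of a hypersphere. The sketch below reduces this to the radial-distance comparison only; the SVD transformation and the stored radii are invented stand-ins, not the trained representations.
```python
import numpy as np

rng = np.random.default_rng(1)
center = np.zeros(32)                                    # toy hypersphere center
radii = {"aa": 2.0, "iy": 4.0, "sh": 6.0, "t": 8.0}      # invented, well-separated radii
stored = {}
for phoneme, r in radii.items():
    v = rng.normal(size=32)
    stored[phoneme] = center + r * v / np.linalg.norm(v)  # point at distance r from center

def radial_distance(point):
    return float(np.linalg.norm(point - center))

def match(received):
    """Pick the stored phoneme whose distance from the center is closest to the
    received phoneme's distance from the center, as the abstract describes."""
    d_recv = radial_distance(received)
    return min(stored, key=lambda p: abs(radial_distance(stored[p]) - d_recv))

print(match(stored["iy"] + 0.01 * rng.normal(size=32)))   # expected: "iy"
```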
  • Patent number: 7363222
    Abstract: A method and database system are disclosed for searching data in at least two databases (Dn), particularly for searching telephone directories or the like. To allow simultaneous access to two or more databases by means of speech recognition, in order to perform a search therein as in a single database, a search term is input by speech via a voice-controlled user interface (28), which is connected to a database primary control apparatus (26) and comprises speech recognition front end means (8, 9) for processing a sound sequence of a search term input by speech to obtain a comparable speech pattern (X) thereof. By means of speech recognition back end means (6) associated with databases (D1-D6), the comparable speech pattern (X) is compared with corresponding speech patterns (An,i) of database entries (En,i) to determine, for each of the at least two databases (Dn), at least that database entry (En,j) whose speech pattern (An,j) best matches the comparable speech pattern (X) of the search term.
    Type: Grant
    Filed: June 24, 2002
    Date of Patent: April 22, 2008
    Assignee: Nokia Corporation
    Inventor: Michael Josenhans
  • Patent number: 7356466
    Abstract: A method and apparatus for calculating an observation probability includes a first operation unit that subtracts a mean of a first plurality of parameters of an input voice signal from a second parameter of the input voice signal, and multiplies the subtraction result to obtain a first output. The first output is squared and accumulated N times in a second operation unit to obtain a second output. A third operation unit subtracts a given weighted value from the second output to obtain a third output, and a comparator stores the third output in order to extract L outputs therefrom, storing the L extracted outputs based on an order of magnitude of the extracted L outputs.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: April 8, 2008
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Byung-Ho Min, Tae-Su Kim, Hyun-Woo Park, Ho-Rang Jang, Keun-Cheol Hong, Sung-Jae Kim
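    The abstract above pipelines the score computation as subtract, multiply, square-and-accumulate N times, subtract a weight, and keep the best L results. The sketch below mirrors that pipeline under a diagonal-Gaussian reading of the operations; the scale and weight values are placeholders (in that reading they would come from the variances and the log normalization term).
```python
import numpy as np

def observation_scores(x, means, scales, weights, L=3):
    """x: (N,) feature vector; means, scales: (M, N); weights: (M,).
    Returns indices and values of the L best-scoring entries (lowest value = best here)."""
    diff = (x - means) * scales              # subtract mean, multiply (first unit)
    acc = np.sum(diff ** 2, axis=1)          # square and accumulate N times (second unit)
    score = acc - weights                    # subtract a given weighted value (third unit)
    order = np.argsort(score)                # comparator: keep L outputs by magnitude
    return order[:L], score[order[:L]]

rng = np.random.default_rng(0)
M, N = 8, 13
idx, best = observation_scores(rng.normal(size=N), rng.normal(size=(M, N)),
                               np.ones((M, N)), np.zeros(M))
print(idx, best)
```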
  • Patent number: 7349576
    Abstract: A method for recognition of a handwritten character comprises the steps of determining a plurality of position features defining the handwritten character, and comparing the handwritten character to reference characters stored in a database in order to find the closest matching reference character. The step of comparing comprises the steps of computing a difference between one of the plurality of position features of the handwritten character and a corresponding position feature of one of the reference characters, determining, by lookup in a predefined table, a distance measure based on the computed difference and determining a distance measure for each of the plurality of position features of the handwritten character, and computing a cost function based on the determined distance measures. A device and a computer program for implementing the method are also described.
    Type: Grant
    Filed: January 11, 2002
    Date of Patent: March 25, 2008
    Assignee: Zi Decuma AB
    Inventor: Anders Holtsberg
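    The abstract above replaces the per-feature distance computation with a lookup in a predefined table indexed by the quantized feature difference, and sums the looked-up values into a cost function. The quantization grid and table contents in the sketch below are invented for illustration.
```python
import numpy as np

# Invented lookup table: distance measure indexed by quantized feature difference.
DIFF_BINS = np.linspace(-1.0, 1.0, 21)             # quantization grid for differences
DIST_TABLE = np.abs(np.linspace(-1.0, 1.0, 21))    # here simply |diff|, but any table works

def cost(candidate, reference):
    """Sum of table-looked-up distance measures over all position features."""
    diffs = candidate - reference
    bins = np.clip(np.digitize(diffs, DIFF_BINS) - 1, 0, len(DIST_TABLE) - 1)
    return float(DIST_TABLE[bins].sum())

def recognize(candidate, references):
    """Return the reference character with the lowest cost function value."""
    return min(references, key=lambda name: cost(candidate, references[name]))

refs = {"a": np.array([0.1, 0.2, 0.3, 0.4]), "b": np.array([0.9, 0.8, 0.7, 0.6])}
print(recognize(np.array([0.15, 0.25, 0.35, 0.35]), refs))   # expected: "a"
```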
  • Patent number: 7328153
    Abstract: Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.
    Type: Grant
    Filed: July 22, 2002
    Date of Patent: February 5, 2008
    Assignee: Gracenote, Inc.
    Inventors: Maxwell Wells, Vidya Venkatachalam, Luca Cazzanti, Kwan Fai Cheung, Navdeep Dhillon, Somsak Sukittanon
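    The abstract above identifies a recording by comparing a compact fingerprint vector against a database of vectors. The sketch below is a brute-force nearest-neighbor lookup with a decision threshold; the fingerprint extraction itself and the vector size are placeholders, not the patented features.
```python
import numpy as np

rng = np.random.default_rng(2)
db_vectors = rng.normal(size=(10000, 24)).astype(np.float32)   # toy fingerprint database
db_titles = [f"track_{i}" for i in range(len(db_vectors))]

def identify(query, threshold=1.0):
    """Return the best-matching title, or None if nothing is close enough."""
    d = np.linalg.norm(db_vectors - query, axis=1)
    best = int(np.argmin(d))
    return db_titles[best] if d[best] < threshold else None

# A slightly degraded copy (e.g. after lossy compression) of database entry 42:
print(identify(db_vectors[42] + 0.05 * rng.normal(size=24)))   # expected: "track_42"
```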
  • Patent number: 7321853
    Abstract: The present invention relates to a speech recognition apparatus and a speech recognition method for speech recognition with improved accuracy. A distance calculator 47 determines the distance from a microphone 21 to a user uttering speech. Data indicating the determined distance is supplied to a speech recognition unit 41B. The speech recognition unit 41B has plural sets of acoustic models produced from speech data obtained by capturing speech uttered at various distances. From those sets of acoustic models, the speech recognition unit 41B selects the set of acoustic models produced from speech data uttered at the distance closest to the distance determined by the distance calculator 47, and performs speech recognition using the selected set of acoustic models.
    Type: Grant
    Filed: February 24, 2006
    Date of Patent: January 22, 2008
    Assignee: Sony Corporation
    Inventor: Yasuharu Asano
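    The abstract above selects, from acoustic model sets trained at different speaker-to-microphone distances, the set whose training distance is closest to the measured one. A minimal selection sketch, with placeholder model objects and distances:
```python
# Acoustic model sets keyed by the speaker-to-microphone distance (in meters)
# at which their training speech was captured; the values are placeholders.
model_sets = {0.3: "close_talk_models", 1.0: "mid_range_models", 3.0: "far_field_models"}

def select_model_set(measured_distance):
    """Pick the set whose training distance is closest to the measured distance."""
    best = min(model_sets, key=lambda d: abs(d - measured_distance))
    return model_sets[best]

print(select_model_set(1.4))   # -> "mid_range_models"
```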
  • Patent number: 7315819
    Abstract: A process of identifying a speaker in coded speech data and a process of searching for the speaker are efficiently performed with fewer computations and with a smaller storage capacity. In an information search apparatus, an LSP decoding section extracts and decodes only LSP information from coded speech data which is read for each block. An LPC conversion section converts the LSP information into LPC information. A Cepstrum conversion section converts the obtained LPC information into an LPC Cepstrum which represents features of speech. A vector quantization section performs vector quantization on the LPC Cepstrum. A speaker identification section identifies a speaker on the basis of the result of the vector quantization. Furthermore, the identified speaker is compared with a search condition in a condition comparison section, and based on the result, the search result is output.
    Type: Grant
    Filed: July 23, 2002
    Date of Patent: January 1, 2008
    Assignee: Sony Corporation
    Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
  • Patent number: 7315813
    Abstract: A method of speech segment selection for concatenative synthesis based on a prosody-aligned distance measure is disclosed. The method is based on comparison of speech segments segmented from a speech corpus, wherein speech segments are fully prosody-aligned to each other before distortion is measured. With prosody alignment embedded in the selection process, distortion resulting from possible prosody modification in synthesis can be taken into account objectively in the selection phase. To carry out the purpose of the present invention, automatic segmentation, pitch marking and the PSOLA method work together for prosody alignment. Two distortion measures, MFCC and PSQM, are used for comparing two prosody-aligned speech segments because of human perceptual considerations.
    Type: Grant
    Filed: July 29, 2002
    Date of Patent: January 1, 2008
    Assignee: Industrial Technology Research Institute
    Inventors: Chih-Chung Kuo, Chi-Shiang Kuo
  • Patent number: 7295978
    Abstract: A system for recognizing speech receives an input speech vector and identifies a Gaussian distribution. The system determines an address from the input speech vector (610) and uses the address to retrieve a distance value for the Gaussian distribution from a table (620). The system then determines the probability of the Gaussian distribution using the distance value (630) and recognizes the input speech vector based on the determined probability (640).
    Type: Grant
    Filed: September 5, 2000
    Date of Patent: November 13, 2007
    Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
    Inventors: Richard Mark Schwartz, Jason Charles Davenport, James Donald Van Sciver, Long Nguyen
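    The abstract above turns the input vector into a table address, reads a precomputed distance value, and converts it to a Gaussian probability. The sketch below uses scalar quantization as an invented addressing scheme for a single one-dimensional Gaussian; the actual addressing and table layout are not specified here.
```python
import numpy as np

# Precompute a distance table for one Gaussian over a quantized feature grid.
mean, var = 0.5, 0.04
grid = np.linspace(-2.0, 2.0, 256)                  # 8-bit address space (invented)
distance_table = (grid - mean) ** 2 / var           # Mahalanobis-style distance values

def address(x):
    """Map an input value to a table address (scalar quantization, invented)."""
    step = grid[1] - grid[0]
    return int(np.clip(np.round((x - grid[0]) / step), 0, len(grid) - 1))

def gaussian_probability(x):
    d = distance_table[address(x)]                  # table lookup instead of arithmetic
    return float(np.exp(-0.5 * d) / np.sqrt(2 * np.pi * var))

print(gaussian_probability(0.52))
```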
  • Publication number: 20070185712
    Abstract: A method of measuring the confidence of speech recognition in a speech recognizer compares a phase change point with a phoneme string change point and uses both the difference between the two points and a likelihood ratio; an apparatus using the method is also provided. That is, the method of the present invention includes detecting a phase change point of a speech signal; detecting a phoneme string change point according to a result of speech recognition; and calculating the confidence of the speech recognition by using a difference between the detected phase change point and phoneme string change point. According to the present invention, the performance of confidence measurement may be improved by simultaneously taking into consideration not only a likelihood ratio but also the result of comparing a phase change point with a phoneme string change point.
    Type: Application
    Filed: June 30, 2006
    Publication date: August 9, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jae-Hoon Jeong, Kwang Cheol Oh
  • Patent number: 7219059
    Abstract: A method and apparatus for generating a pronunciation score by receiving a user phrase intended to conform to a reference phrase and processing the user phrase in accordance with at least one of an articulation-scoring engine, a duration scoring engine and an intonation-scoring engine to derive thereby the pronunciation score.
    Type: Grant
    Filed: July 3, 2002
    Date of Patent: May 15, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Sunil K. Gupta, Ziyi Lu, Fengguang Zhao
  • Patent number: 7219058
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: October 1, 2001
    Date of Patent: May 15, 2007
    Assignee: AT&T Corp.
    Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Patent number: 7216076
    Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: May 8, 2007
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
  • Patent number: 7165030
    Abstract: A method for concatenative speech synthesis includes a processing stage that selects segments based on their symbolic labeling in an efficient graph-based search, which uses a finite-state transducer formalism. This graph-based search uses a representation of concatenation constraints and costs that does not necessarily grow with the size of the source corpus, thereby limiting the increase in computation required for the search as the size of the source corpus increases. In one application of this method, multiple alternative segment sequences are generated and a best segment sequence is then selected using characteristics that depend on specific signal characteristics of the segments.
    Type: Grant
    Filed: September 17, 2001
    Date of Patent: January 16, 2007
    Assignee: Massachusetts Institute of Technology
    Inventors: Jon Rong-Wei Yi, James Robert Glass, Irvine Lee Hetherington
  • Patent number: 7031917
    Abstract: The present invention relates to a speech recognition apparatus and a speech recognition method for speech recognition with improved accuracy. A distance calculator 47 determines the distance from a microphone 21 to a user uttering speech. Data indicating the determined distance is supplied to a speech recognition unit 41B. The speech recognition unit 41B has plural sets of acoustic models produced from speech data obtained by capturing speech uttered at various distances. From those sets of acoustic models, the speech recognition unit 41B selects the set of acoustic models produced from speech data uttered at the distance closest to the distance determined by the distance calculator 47, and performs speech recognition using the selected set of acoustic models.
    Type: Grant
    Filed: October 21, 2002
    Date of Patent: April 18, 2006
    Assignee: Sony Corporation
    Inventor: Yasuharu Asano
  • Patent number: 7006969
    Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: November 1, 2001
    Date of Patent: February 28, 2006
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
  • Patent number: 6983245
    Abstract: A spectral distance calculator includes a calculator for performing spectral distance calculations to compare a spectrum of an input signal in the presence of a noise signal with a reference spectrum. A memory pre-stores a noise spectrum from the noise signal. A masking unit masks the spectral distance between the input spectrum and the reference spectrum with respect to the pre-stored noise spectrum.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: January 3, 2006
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Alberto Jimenez Felstrom, Jim Rasmusson
  • Patent number: 6978237
    Abstract: A speech recognition support method in a system to retrieve a map in response to a user's input speech. The user's speech is recognized and a recognition result is obtained. If the recognition result represents a point on the map, a distance between the point and a base point on the map is calculated. It is then determined whether the distance is above a threshold. If the distance is above the threshold, an inquiry to confirm whether the recognition result is correct is output to the user.
    Type: Grant
    Filed: October 20, 2003
    Date of Patent: December 20, 2005
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Mitsuyoshi Tachimori, Hiroshi Kanazawa
  • Patent number: 6976019
    Abstract: The present invention relates to phonetic self-improving search engines. The search engine may include a phonetic database having a plurality of phonetic equivalent formulas stored therein, each of the phonetic equivalent formulas being associated with at least one respective pronounceable unit. After an initial query in a primary database fails to produce a positive result, an error memory database may be queried with a search string to obtain a positive result based on records of previously failed searches which ultimately found a positive result. If no record is found, the search string may be parsed into at least one pronounceable unit. Phonetically equivalent formulas may be applied to the at least one pronounceable unit to create at least one phonetic search string which is re-queried into the error memory database and the primary database. Successful positive results may be stored with the search string in the error memory database.
    Type: Grant
    Filed: April 19, 2002
    Date of Patent: December 13, 2005
    Inventor: Arash M Davallou
  • Patent number: 6961702
    Abstract: The invention relates to a method for generating an adapted reference for automatic speech recognition. In a first step, recognition is performed based on a spoken utterance and a recognition result corresponding to a currently valid reference is obtained. In a second step, the currently valid reference is adapted in accordance with the utterance in order to create an adapted reference. In a third step, the adapted reference is assessed and it is decided whether the adapted reference is to be used for further recognition.
    Type: Grant
    Filed: November 6, 2001
    Date of Patent: November 1, 2005
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Stefan Dobler, Andreas Kiessling, Ralph Schleifer, Raymond Brückner
  • Patent number: 6907398
    Abstract: A method is described for compressing the storage space required by HMM prototypes in an electronic memory. For this purpose prescribed HMM prototypes are mapped onto compressed HMM prototypes with the aid of a neural network (encoder). These can be stored with a smaller storage space than the uncompressed HMM prototypes. A second neural network (decoder) serves to reconstruct the HMM prototypes.
    Type: Grant
    Filed: September 6, 2001
    Date of Patent: June 14, 2005
    Assignee: Siemens Aktiengesellschaft
    Inventor: Harald Hoege
  • Patent number: 6879954
    Abstract: A method is provided for improving pattern matching in a speech recognition system having a plurality of acoustic models. The improved method includes: receiving continuous speech input; generating a sequence of acoustic feature vectors that represent temporal and spectral behavior of the speech input; loading a first group of acoustic feature vectors from the sequence of acoustic feature vectors into a memory workspace accessible to a processor; loading an acoustic model from the plurality of acoustic models into the memory workspace; and determining a similarity measure for each acoustic feature vector of the first group of acoustic feature vectors in relation to the acoustic model. Prior to retrieving another group of acoustic feature vectors, similarity measures are computed for the first group of acoustic feature vectors in relation to each of the acoustic models employed by the speech recognition system.
    Type: Grant
    Filed: April 22, 2002
    Date of Patent: April 12, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Patrick Nguyen, Luca Rigazio
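    The abstract above scores one group of feature vectors against every acoustic model before the next group is fetched, a cache-blocking loop order. The sketch below shows that loop order with each model reduced to a single diagonal Gaussian for brevity; the block size and model form are assumptions.
```python
import numpy as np

def blocked_similarities(features, means, inv_vars, block=64):
    """features: (T, D); means, inv_vars: (M, D). Returns (T, M) similarity scores,
    computed one block of frames at a time against all models (cache blocking)."""
    T, M = features.shape[0], means.shape[0]
    scores = np.empty((T, M))
    for start in range(0, T, block):
        group = features[start:start + block]           # load one group of feature vectors
        for m in range(M):                               # score it against every model...
            diff = group - means[m]
            scores[start:start + block, m] = -0.5 * np.sum(diff * diff * inv_vars[m], axis=1)
    return scores                                        # ...before fetching the next group

rng = np.random.default_rng(3)
print(blocked_similarities(rng.normal(size=(300, 39)),
                           rng.normal(size=(10, 39)),
                           np.ones((10, 39))).shape)     # (300, 10)
```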
  • Patent number: 6873955
    Abstract: Partial waveform data representative of a waveform shape variation are extracted from supplied waveform data, and the extracted partial waveform data are stored along with time position information indicative of their respective time positions. In reproduction, the partial waveform data and time position information are read out, then the partial waveform data are arranged on the time axis in accordance with the time position information, and a waveform is produced on the basis of the waveform data arranged on the time axis. In another implementation, sets of sample identification information and time position information are obtained in accordance with a performance tone waveform to be reproduced, and sample data are obtained from a database in accordance with the sample identification information. The thus-obtained sample data are arranged on the time axis in accordance with the time position information, and the desired waveform is produced on the basis of the sample data arranged on the time axis.
    Type: Grant
    Filed: September 22, 2000
    Date of Patent: March 29, 2005
    Assignee: Yamaha Corporation
    Inventors: Hideo Suzuki, Motoichi Tamura, Satoshi Usa
  • Patent number: 6870848
    Abstract: A communication system includes a packet-based data network coupled to various network entities. The communications system includes a community that has a call processing system and various other devices, such as an integrated voice response (IVR) system and plural agent systems. The call processing system includes a combination of logical entities that perform call processing tasks. As an example, the logical entities may include a Session Initiation Protocol (SIP) proxy, a SIP client, and a SIP server. In response to a call request from outside the community, the call processing system, under control of the server, sends back responses to the originating system. The call processing system also establishes a call with a network element inside the community, such as the IVR system. The IVR system is capable of receiving input data from the originating system. Based on the received input data, the call processing system can reconnect or forward the call to one of the agent systems.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: March 22, 2005
    Assignee: Nortel Networks Limited
    Inventor: Andrew J. Prokop
  • Patent number: 6868378
    Abstract: The invention relates to a process and a system for voice recognition in a noisy signal. In a preferred embodiment, the system (2) comprises modules for detecting speech (30) and for formulating a noise model (31), a module (40) for quantifying the energy level of the noise and for comparing it with preestablished energy spans, a parameterization pathway (5) comprising an optional denoising module (51) with a Wiener filter, a module (52) for calculating the spectral energy in Bark windows, a module (50, 530) for applying a configuration of shift values (531), by adding these values to the Bark coefficients as a function of the quantification (40), so as to modify the parameterization, a module (54) for calculating vectors of parameters, and a block (6) for recognizing shapes, which performs the voice recognition by comparison with vectors of parameters prerecorded during a learning phase.
    Type: Grant
    Filed: November 19, 1999
    Date of Patent: March 15, 2005
    Assignee: Thomson-CSF Sextant
    Inventor: Pierre-Albert Breton
  • Patent number: 6839671
    Abstract: In this invention, dialogue states for a dialogue model are created using a training corpus of example human-human dialogues. Dialogue states are modelled at the turn level rather than at the move level, and the dialogue states are derived from the training corpus. The range of operator dialogue utterances is actually quite small in many services and therefore may be categorized into a set of predetermined meanings. This is an important assumption which is not true of general conversation, but is often true of conversations between telephone operators and people. Phrases are specified which have specific substitution and deletion penalties; for example, the two phrases “I would like to” and “can I” may be specified as a possible substitution with a low or zero penalty. This allows common equivalent phrases to be given low substitution penalties. Insignificant phrases such as ‘erm’ are given low or zero deletion penalties.
    Type: Grant
    Filed: December 19, 2000
    Date of Patent: January 4, 2005
    Assignee: British Telecommunications public limited company
    Inventors: David J. Attwater, Michael D. Edgington, Peter J. Durston
  • Patent number: 6836758
    Abstract: A method and system for speech recognition combines different types of engines in order to recognize user-defined digits and control words, predefined digits and control words, and nametags. Speaker-independent engines are combined with speaker-dependent engines. A Hidden Markov Model (HMM) engine is combined with Dynamic Time Warping (DTW) engines.
    Type: Grant
    Filed: January 9, 2001
    Date of Patent: December 28, 2004
    Assignee: Qualcomm Incorporated
    Inventors: Ning Bi, Andrew P. DeJaco, Harinath Garudadri, Chienchung Chang, William Yee-Ming Huang, Narendranath Malayath, Suhail Jalil, David Puig Oses, Yingyong Qi
  • Patent number: 6836760
    Abstract: A method and apparatus to use semantic inference with speech recognition systems includes recognizing at least one spoken word, processing the spoken word using a context-free grammar, deriving an output from the context-free grammar, and translating the output to a predetermined command.
    Type: Grant
    Filed: September 29, 2000
    Date of Patent: December 28, 2004
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim E. A. Silverman
  • Patent number: 6823304
    Abstract: A lead consonant buffer stores a feature parameter preceding a lead voiced sound detected by a voiced sound detector as a feature parameter of a lead consonant. A matching processing unit performs matching processing of a feature parameter of a lead consonant stored in the lead consonant buffer with a feature parameter of a registered pattern. Hence, the matching processing unit can perform matching processing reflecting information on a lead consonant even when no lead consonant can be detected due to noise.
    Type: Grant
    Filed: July 19, 2001
    Date of Patent: November 23, 2004
    Assignee: Renesas Technology Corp.
    Inventor: Masahiko Ikeda
  • Patent number: 6788767
    Abstract: An apparatus and method for enabling provision of a call return service is disclosed. The apparatus utilizes a method of generating telephone numbers from voice messages. The method includes the step of using speech recognition to isolate a spoken number in a voice message, and confirming to a high degree of accuracy that the spoken number represents a telephone number. The method further includes the step of converting the spoken number into a data sequence representing the telephone number. This data sequence is then made available for immediate or later use.
    Type: Grant
    Filed: December 28, 2000
    Date of Patent: September 7, 2004
    Assignee: Gateway, Inc.
    Inventor: Jay V. Lambke
  • Patent number: 6778957
    Abstract: Disclosed is a method of automated handset identification, comprising receiving a sample speech input signal from a sample handset; deriving a cepstral covariance sample matrix from said sample speech signal; calculating, with a distance metric, all distances between said sample matrix and one or more cepstral covariance handset matrices, wherein each said handset matrix is derived from a plurality of speech signals taken from different speakers through the same handset; and determining if the smallest of said distances is below a predetermined threshold value.
    Type: Grant
    Filed: August 21, 2001
    Date of Patent: August 17, 2004
    Assignee: International Business Machines Corporation
    Inventors: Zhong-Hua Wang, David Lubensky, Cheng Wu
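    The abstract above compares a cepstral covariance matrix of the sample against per-handset covariance matrices using a distance metric and a threshold. The sketch below uses one common symmetric covariance distance as an illustrative stand-in; the patent does not specify that this is its metric, and the data here are synthetic.
```python
import numpy as np

def cepstral_covariance(cepstra):
    """cepstra: (T, D) frame-level cepstral vectors -> (D, D) covariance matrix."""
    return np.cov(cepstra, rowvar=False)

def cov_distance(A, B):
    """Symmetric distance: tr(A B^-1) + tr(B A^-1) - 2D (zero iff A == B)."""
    D = A.shape[0]
    return float(np.trace(A @ np.linalg.inv(B)) + np.trace(B @ np.linalg.inv(A)) - 2 * D)

def identify_handset(sample_cov, handset_covs, threshold=5.0):
    name, dist = min(((n, cov_distance(sample_cov, C)) for n, C in handset_covs.items()),
                     key=lambda t: t[1])
    return name if dist < threshold else None            # unknown handset otherwise

rng = np.random.default_rng(4)
handsets = {"carbon_button": cepstral_covariance(rng.normal(size=(2000, 12))),
            "electret": cepstral_covariance(2.0 * rng.normal(size=(2000, 12)))}
sample = cepstral_covariance(2.0 * rng.normal(size=(500, 12)))
print(identify_handset(sample, handsets))                 # expected: "electret"
```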
  • Publication number: 20040158468
    Abstract: A method, program product, and system for speech recognition, the method comprising in one embodiment pruning a hypothesis based on a first criterion; storing information about the pruned hypothesis; and reactivating the pruned hypothesis if a second criterion is met. In an embodiment, the first criterion may be that another hypothesis has a better score at that time by some predetermined amount. In an embodiment, the stored information may comprise at least one of a score for the pruned hypothesis, an identification of the hypothesis that caused the pruning, and the frame in which the pruning took place. In a further embodiment, the reactivating step may use at least some of the stored information about the pruned hypothesis in performing the reactivation, and the second criterion may be that a revised score for the hypothesis that caused the pruning is worse by some predetermined amount than an original expected score calculated for that hypothesis.
    Type: Application
    Filed: February 12, 2003
    Publication date: August 12, 2004
    Applicant: Aurilab, LLC
    Inventor: James K. Baker
  • Patent number: 6754626
    Abstract: The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models, each of which can correspond to a node in a hierarchy of the plurality of contextual models. The method also can include identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the at least one identified contextual model.
    Type: Grant
    Filed: March 1, 2001
    Date of Patent: June 22, 2004
    Assignee: International Business Machines Corporation
    Inventor: Mark E. Epstein
  • Patent number: 6754624
    Abstract: A method and apparatus for enhancing coding efficiency by reducing illegal or other undesirable packet generation while encoding a signal. The probability of generating illegal or other undesirable packets while encoding a signal is reduced by first analyzing a history of the frequency of codebook values selected while quantizing speech parameters. Codebook entries are then reordered so that the index/indices that create illegal or other undesirable packets contain the least frequently used entry/entries. Reordering multiple codebooks for various parameters further reduces the probability that an illegal or other undesirable packet will be created during signal encoding. The method and apparatus may be applied to reduce the probability of generating illegal null traffic channel data packets while encoding eighth rate speech.
    Type: Grant
    Filed: February 13, 2001
    Date of Patent: June 22, 2004
    Assignee: Qualcomm, Inc.
    Inventors: Eddie-Lun Tik Choy, Arasanipalai K. Ananthapadmanabhan, Andrew P. DeJaco
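    The abstract above reorders codebook entries so that the indices which would produce illegal packets hold the least frequently selected entries. The sketch below uses invented usage counts and illegal indices to show the reordering step only.
```python
from collections import Counter

# Invented example: an 8-entry codebook, a usage history, and indices whose
# bit patterns would produce illegal/undesirable packets.
codebook = ["e0", "e1", "e2", "e3", "e4", "e5", "e6", "e7"]
usage = Counter({"e0": 500, "e1": 20, "e2": 310, "e3": 3,
                 "e4": 150, "e5": 75, "e6": 990, "e7": 1})
illegal_indices = {0, 7}

def reorder(codebook, usage, illegal_indices):
    """Place the least-used entries at illegal indices, the most-used elsewhere."""
    by_use = sorted(codebook, key=lambda e: usage[e])     # rarest entries first
    rare = by_use[:len(illegal_indices)]
    common = by_use[len(illegal_indices):][::-1]          # most-used entries first
    new = [None] * len(codebook)
    for i in sorted(illegal_indices):
        new[i] = rare.pop(0)                              # rare entry at illegal index
    for i in range(len(codebook)):
        if new[i] is None:
            new[i] = common.pop(0)
    return new

print(reorder(codebook, usage, illegal_indices))
```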
  • Patent number: 6754629
    Abstract: A method and system that combines voice recognition engines and resolves differences between the results of individual voice recognition engines using a mapping function. Speaker independent voice recognition engines and speaker-dependent voice recognition engines are combined. Hidden Markov Model (HMM) engines and Dynamic Time Warping (DTW) engines are combined.
    Type: Grant
    Filed: September 8, 2000
    Date of Patent: June 22, 2004
    Assignee: Qualcomm Incorporated
    Inventors: Yingyong Qi, Ning Bi, Harinath Garudadri
  • Patent number: 6754628
    Abstract: Methods and apparatus for facilitating speaker recognition, wherein, from target data that is provided relating to a target speaker and background data that is provided relating to at least one background speaker, a set of cohort data is selected from the background data that has at least one proximate characteristic with respect to the target data. The target data and the cohort data are then combined in a manner to produce at least one new cohort model for use in subsequent speaker verification. Similar methods and apparatus are contemplated for non-voice-based applications, such as verification through fingerprints.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: June 22, 2004
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Publication number: 20040107100
    Abstract: A method is provided for real-time speaker change detection and speaker tracking in a speech signal. The method is a “coarse-to-refine” process consisting of two stages: pre-segmentation and refinement. In the pre-segmentation process, the covariance of the feature vectors of each segment of speech is first built. A distance is determined based on the covariances of the current segment and a previous segment, and the distance is used to determine whether there is a potential speaker change between these two segments. If there is no speaker change, the model of the currently identified speaker is updated by incorporating data of the current segment. Otherwise, if there is a speaker change, a refinement process is utilized to confirm the potential speaker change point.
    Type: Application
    Filed: November 29, 2002
    Publication date: June 3, 2004
    Inventors: Lie Lu, Hong-Jiang Zhang
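    The abstract above flags a potential speaker change when a covariance-based distance between the current and previous segments is large, and otherwise folds the current segment into the current speaker model. The sketch below uses a symmetric KL-style divergence between full-covariance Gaussians as a stand-in distance, on synthetic segments; the refinement stage and the patent's actual distance are not reproduced.
```python
import numpy as np

def gaussian_divergence(x, y):
    """Symmetric KL divergence between full-covariance Gaussians fit to x and y."""
    mx, my = x.mean(0), y.mean(0)
    Cx, Cy = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    iCx, iCy = np.linalg.inv(Cx), np.linalg.inv(Cy)
    dm = mx - my
    return float(0.5 * (np.trace(iCy @ Cx) + np.trace(iCx @ Cy) - 2 * x.shape[1]
                        + dm @ (iCx + iCy) @ dm))

def pre_segment(segments, threshold=10.0):
    """Yield indices where a potential speaker change is detected."""
    current = segments[0]
    for i in range(1, len(segments)):
        if gaussian_divergence(segments[i], current) > threshold:
            yield i                                        # hand off to refinement stage
            current = segments[i]                          # start a new speaker model
        else:
            current = np.vstack([current, segments[i]])    # update current speaker model

rng = np.random.default_rng(5)
segs = [rng.normal(0, 1, (300, 12)) for _ in range(3)] + \
       [rng.normal(1.5, 1, (300, 12)) for _ in range(3)]
print(list(pre_segment(segs)))                             # expected: [3]
```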
  • Patent number: 6741962
    Abstract: A speech recognition system for recognizing an input voice of a narrow frequency band. The speech recognition system includes: a frequency band converting unit for converting the input voice of the narrow frequency band into a pseudo voice of a wide frequency band which covers an entirety of the narrow frequency band and which is wider than the narrow frequency band.
    Type: Grant
    Filed: March 7, 2002
    Date of Patent: May 25, 2004
    Assignee: NEC Corporation
    Inventor: Kenichi Iso
  • Patent number: 6725196
    Abstract: A method and apparatus is provided for matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal. The system uses a plurality of different pruning thresholds (th) to control the propagation of paths which represent possible matchings between a sequence of second signal patterns and a sequence of first signal patterns ending at the current first signal pattern. In particular, the pruning threshold used for a given path during the processing of a current first signal pattern depends upon the position, within the sequence of patterns representing the second signal, of the second signal pattern which is at the end of the given path.
    Type: Grant
    Filed: March 20, 2001
    Date of Patent: April 20, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Robert Alexander Keiller, Eli Tzirkel-Hancock, Julian Richard Seward