Specialized Equations Or Comparisons Patents (Class 704/236)
-
Patent number: 7035867
Abstract: A system for identifying files can use fingerprints to compare various files and determine redundant files. Frequency representations of portions of files, such as Fast Fourier Transforms, can be used as the fingerprints.
Type: Grant
Filed: November 28, 2001
Date of Patent: April 25, 2006
Assignee: Aerocast.com, Inc.
Inventors: Mark R. Thompson, Nathan F. Raciborski
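The fingerprint comparison described above can be sketched as follows. The naive DFT, the number of bins, and the redundancy tolerance are all illustrative assumptions, not details from the patent:

```python
import math

def dft_magnitudes(samples, n_bins=8):
    """Naive DFT magnitude spectrum over the first n_bins frequencies."""
    n = len(samples)
    mags = []
    for k in range(n_bins):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags

def fingerprint(samples, n_bins=8):
    """Normalized coarse spectrum of a file portion, used as its fingerprint."""
    mags = dft_magnitudes(samples, n_bins)
    total = sum(mags) or 1.0
    return [m / total for m in mags]

def likely_redundant(fp_a, fp_b, tol=1e-3):
    """Flag two files as redundant when their fingerprints are near-identical."""
    return all(abs(a - b) < tol for a, b in zip(fp_a, fp_b))
```

Identical content produces identical fingerprints, so byte-for-byte copies are always flagged; distinct signals separate because their spectral energy lands in different bins.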
-
Patent number: 7035798
Abstract: A trained vector generation section 16 generates beforehand a trained vector V of unvoiced sounds. An LPC Cepstrum analysis section 18 generates a feature vector A of a voice within the non-voice period, an inner product operation section 19 calculates an inner product value V^T A between the feature vector A and the trained vector V, and a threshold generation section 20 generates a threshold ?v on the basis of the inner product value V^T A. Also, the LPC Cepstrum analysis section 18 generates a prediction residual power ? of the signal within the non-voice period, and the threshold generation section 22 generates a threshold THD on the basis of the prediction residual power ?.
Type: Grant
Filed: September 12, 2001
Date of Patent: April 25, 2006
Assignee: Pioneer Corporation
Inventor: Hajime Kobayashi
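The inner-product step above is a projection of a frame's feature vector onto the trained unvoiced-sound vector, with a threshold derived from the observed value. The sketch below uses plain Python lists and an invented margin rule; the patent's actual threshold generation is not specified in the abstract:

```python
def inner_product(v, a):
    """V^T A: projection of feature vector a onto trained vector v."""
    return sum(vi * ai for vi, ai in zip(v, a))

def make_threshold(v, a, margin=0.8):
    """Derive a decision threshold from an observed inner-product value.

    The margin factor is an illustrative assumption, not the patent's rule."""
    return margin * inner_product(v, a)

def looks_unvoiced(v, a_new, threshold):
    """Flag a new frame as unvoiced-like when its projection clears the threshold."""
    return inner_product(v, a_new) >= threshold
```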
-
Patent number: 7031915
Abstract: A speech recognition method, system and program product, the method in one embodiment comprising: obtaining input speech data; initiating a first speech recognition search process with at least one hypothesis; initiating a second speech recognition search process with a plurality of hypotheses; obtaining partial results from the second speech recognition search process, where the partial results include an evaluation of at least one hypothesis that the first speech recognition search process has not evaluated at this point in time; and utilizing the partial results to alter the first speech recognition search process.
Type: Grant
Filed: January 23, 2003
Date of Patent: April 18, 2006
Assignee: Aurilab LLC
Inventor: James K. Baker
-
Patent number: 7031921
Abstract: A method is provided for monitoring audio content available over a network. According to the method, the network is searched for audio files, and audio identifying information is generated for each audio file that is found. It is determined whether the audio identifying information generated for each audio file matches audio identifying information in an audio content database. In one preferred embodiment, each audio file that is found is analyzed so as to generate the audio file information, which is an audio feature signature that is based on the content of the audio file. Also provided is a system for monitoring audio content available over a network.
Type: Grant
Filed: June 29, 2001
Date of Patent: April 18, 2006
Assignee: International Business Machines Corporation
Inventors: Michael C. Pitman, Blake G. Fitch, Steven Abrams, Robert S. Germain
-
Patent number: 7027987
Abstract: A system provides search results from a voice search query. The system receives a voice search query from a user, derives one or more recognition hypotheses, each being associated with a weight, from the voice search query, and constructs a weighted Boolean query using the recognition hypotheses. The system then provides the weighted Boolean query to a search system and provides the results of the search system to a user.
Type: Grant
Filed: February 7, 2001
Date of Patent: April 11, 2006
Assignee: Google Inc.
Inventors: Alexander Mark Franz, Monika H. Henzinger, Sergey Brin, Brian Christopher Milch
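Constructing a weighted Boolean query from ranked recognition hypotheses can be sketched as below. The textual syntax (AND-joined terms with a `^weight` suffix, OR-joined clauses) is an illustrative convention, not the format the patent uses:

```python
def build_weighted_query(hypotheses):
    """Combine recognition hypotheses into a weighted Boolean OR query.

    `hypotheses` is a list of (phrase, weight) pairs; higher-weight
    hypotheses are emitted first."""
    clauses = []
    for phrase, weight in sorted(hypotheses, key=lambda hw: -hw[1]):
        terms = " AND ".join(phrase.split())       # all words of one hypothesis must match
        clauses.append(f"({terms})^{weight:.2f}")  # weight attached to the whole clause
    return " OR ".join(clauses)                    # any hypothesis may satisfy the query
```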
-
Patent number: 7016835
Abstract: A characteristic-specific digitization method and apparatus are disclosed that reduce the error rate in converting input information into a computer-readable format. The input information is analyzed and subsets of the input information are classified according to whether the input information exhibits a specific physical parameter affecting recognition accuracy. If the input information exhibits the specific physical parameter affecting recognition accuracy, the characteristic-specific digitization system recognizes the input information using a characteristic-specific recognizer that demonstrates improved performance for the given physical parameter. If the input information does not exhibit the specific physical parameter affecting recognition accuracy, the characteristic-specific digitization system recognizes the input information using a general recognizer that performs well for typical input information.
Type: Grant
Filed: December 19, 2002
Date of Patent: March 21, 2006
Assignee: International Business Machines Corporation
Inventors: Ellen Marie Eide, Ramesh Ambat Gopinath, Dimitri Kanevsky, Peder Andreas Olsen
-
Patent number: 7010486
Abstract: The invention relates to a speech recognition system and a method of calculating iteration values for free parameters λα^ortho(n) of a maximum-entropy speech model (MESM) with the aid of the generalized iterative scaling training algorithm in a computer-supported speech recognition system, in accordance with the formula λα^ortho(n+1) = G(λα^ortho(n), mα^ortho, …), where n is an iteration parameter, G a mathematical function, α an attribute in the MESM, and mα^ortho a desired orthogonalized boundary value in the MESM for the attribute α. It is an object of the invention to further develop the system and method so that they make a fast computation of the free parameters λ possible without a change of the original training object. According to the invention this object is achieved in that the desired orthogonalized boundary value mα^ortho is calculated by a linear combination of the desired boundary value mα with desired boundary values mβ from attributes β that have a larger range than the attribute α.
Type: Grant
Filed: February 13, 2002
Date of Patent: March 7, 2006
Assignee: Koninklijke Philips Electronics, N.V.
Inventor: Jochen Peters
-
Patent number: 7010484
Abstract: A method of phrase verification verifies a phrase not only according to its confidence measures but also according to neighboring concepts and their confidence tags. First, an utterance is received, and the received utterance is parsed to find a concept sequence. Subsequently, a plurality of tag sequences corresponding to the concept sequence is produced. Then, a first score of each of the tag sequences is calculated. Finally, the tag sequence with the highest first score is selected as the most probable tag sequence, and the tags contained therein are selected as the most probable confidence tags, respectively corresponding to the concepts in the concept sequence.
Type: Grant
Filed: December 12, 2001
Date of Patent: March 7, 2006
Assignee: Industrial Technology Research Institute
Inventor: Yi-Chung Lin
-
Patent number: 7003458
Abstract: An automated voice pattern filtering method implemented in a system having a client side and a server side is disclosed. At the client side, a speech signal is transformed into a first set of spectral parameters which are encoded into a set of spectral shapes that are compared to a second set of spectral parameters corresponding to one or more keywords. From the comparison, the client side determines if the speech signal is acceptable. If so, spectral information indicating a difference in a voice pattern between the speech signal and the keyword(s) is encoded and utilized as a basis to generate a voice pattern filter.
Type: Grant
Filed: January 15, 2002
Date of Patent: February 21, 2006
Assignee: General Motors Corporation
Inventors: Kai-Ten Feng, Jane F. MacFarlane, Stephen C. Habermas
-
Patent number: 6999925
Abstract: The present invention provides a computerized method and apparatus for automatically generating from a first speech recognizer a second speech recognizer which can be adapted to a specific domain. The first speech recognizer can include a first acoustic model with a first decision network and corresponding first phonetic contexts. The first acoustic model can be used as a starting point for the adaptation process. A second acoustic model with a second decision network and corresponding second phonetic contexts for the second speech recognizer can be generated by re-estimating the first decision network and the corresponding first phonetic contexts based on domain-specific training data.
Type: Grant
Filed: November 13, 2001
Date of Patent: February 14, 2006
Assignee: International Business Machines Corporation
Inventors: Volker Fischer, Siegfried Kunzmann, Eric-W. Janke, A. Jon Tyrrell
-
Patent number: 6993483
Abstract: A speech recognizer suitable for distributed speech recognition is robust to missing speech feature vectors. Speech is transmitted via a packet-switched network in the form of basic feature vectors. Missing feature vectors are detected, and replacement feature vectors are estimated by interpolation of received data prior to speech recognition. Features may be converted and interpolation may be accomplished in a spectral domain.
Type: Grant
Filed: November 2, 2000
Date of Patent: January 31, 2006
Assignee: British Telecommunications public limited company
Inventor: Benjamin P Milner
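Estimating replacement feature vectors by interpolating between the nearest received frames can be sketched as below. This is a minimal per-dimension linear interpolation; the patent also covers conversion to a spectral domain before interpolating, which is not shown:

```python
def fill_missing(frames):
    """Replace None entries with linear interpolation between the nearest
    received frames; boundary gaps copy the nearest received frame."""
    filled = list(frames)
    n = len(filled)
    for i in range(n):
        if frames[i] is not None:
            continue
        # nearest received neighbours on each side of the gap
        left = next((j for j in range(i - 1, -1, -1) if frames[j] is not None), None)
        right = next((j for j in range(i + 1, n) if frames[j] is not None), None)
        if left is None and right is None:
            continue  # nothing received at all; leave the gap
        if left is None or right is None:
            src = left if left is not None else right
            filled[i] = list(frames[src])  # extrapolate by repetition at the edges
            continue
        w = (i - left) / (right - left)    # position of the gap between its neighbours
        filled[i] = [(1 - w) * a + w * b for a, b in zip(frames[left], frames[right])]
    return filled
```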
-
Patent number: 6993481
Abstract: According to the invention, a method for detecting speech activity in a signal is disclosed. In one step, a plurality of features is extracted from the signal. An active speech probability density function (PDF) of the plurality of features is modeled, and an inactive speech PDF of the plurality of features is modeled. The active and inactive speech PDFs are adapted to respond to changes in the signal over time. The signal is given a probability-based classification based, at least in part, on the plurality of features. Speech in the signal is distinguished based, at least in part, upon the probability-based classification.
Type: Grant
Filed: December 4, 2001
Date of Patent: January 31, 2006
Assignee: Global IP Sound AB
Inventors: Jan K. Skoglund, Jan T. Linden
-
Patent number: 6985860
Abstract: To achieve an improvement in recognition performance, a non-speech acoustic model correction unit adapts a non-speech acoustic model representing a non-speech state using input data observed during an interval immediately before a speech recognition interval during which speech recognition is performed, by means of one of the maximum likelihood method, the complex statistic method, and the minimum distance-maximum separation theorem.
Type: Grant
Filed: August 30, 2001
Date of Patent: January 10, 2006
Assignee: Sony Corporation
Inventor: Hironaga Nakatsuka
-
Patent number: 6973427
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Grant
Filed: December 26, 2000
Date of Patent: December 6, 2005
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
-
Patent number: 6970818
Abstract: The present invention comprises a methodology for implementing a vocabulary set for use in a speech recognition system, and may preferably include a recognizer for analyzing utterances from the vocabulary set to generate N-best lists of recognition candidates. The N-best lists may then be utilized to create an acoustical matrix configured to relate said utterances to top recognition candidates from said N-best lists, as well as a lexical matrix configured to relate the utterances to the top recognition candidates from the N-best lists only when second-highest recognition candidates from the N-best lists are correct recognition results. An utterance ranking may then preferably be created according to composite individual error/accuracy values for each of the utterances. The composite individual error/accuracy values may preferably be derived from both the acoustical matrix and the lexical matrix.
Type: Grant
Filed: March 14, 2002
Date of Patent: November 29, 2005
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Xavier Menedez-Pidal, Lex S. Olorenshaw
-
Patent number: 6963834
Abstract: A method for performing speech recognition can include determining a recognition result for received user speech. The recognition result can include recognized text and a corresponding confidence score. The confidence score of the recognition result can be compared to a predetermined minimum threshold. If the confidence score does not exceed the predetermined minimum threshold, the user can be presented with at least one empirically determined alternate word candidate corresponding to the recognition result.
Type: Grant
Filed: May 29, 2001
Date of Patent: November 8, 2005
Assignee: International Business Machines Corporation
Inventors: Matthew W. Hartley, James R. Lewis, David E. Reich
-
Patent number: 6961701
Abstract: An extended-word selecting section calculates a score for a phoneme string formed of one or more phonemes corresponding to a user's speech, and searches a large-vocabulary dictionary for a word having one or more phonemes equal or similar to those of a phoneme string having a score equal to or higher than a predetermined value. A matching section calculates scores for the word found by the extended-word selecting section in addition to a word selected by a preliminary word-selecting section. A control section determines a word string as the result of recognition of the speech uttered by the user.
Type: Grant
Filed: March 3, 2001
Date of Patent: November 1, 2005
Assignee: Sony Corporation
Inventors: Hiroaki Ogawa, Katsuki Minamino, Yasuharu Asano, Helmut Lucke
-
Patent number: 6957183
Abstract: A method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition. A primary transformation is applied to a source speech signal to extract primary features therefrom. Each of at least one secondary transformation is applied to the source speech signal or extracted primary features to yield at least one set of secondary features statistically dependent on the primary features. At least one predetermined function is then applied to combine the primary features with the secondary features. A recognition answer is generated by pattern matching this combination against predetermined voice recognition templates.
Type: Grant
Filed: March 20, 2002
Date of Patent: October 18, 2005
Assignee: Qualcomm Inc.
Inventors: Narendranath Malayath, Harinath Garudadri
-
Patent number: 6910010
Abstract: A feature extraction and pattern recognition system in which an observation vector forming input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.
Type: Grant
Filed: October 28, 1998
Date of Patent: June 21, 2005
Assignee: Sony Corporation
Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
-
Patent number: 6907367
Abstract: A method for segmenting a signal into segments having similar spectral characteristics is provided. Initially the method generates a table of previous values from older signal values that contains a scoring value for the best segmentation of previous values and a segment length of the last previously identified segment. The method then receives a new sample of the signal and computes a new spectral characteristic function for the signal based on the received sample. A new scoring function is computed from the spectral characteristic function. Segments of the signal are recursively identified based on the newly computed scoring function and the table of previous values. The spectral characteristic function can be a selected one of an autocorrelation function and a discrete Fourier transform. An example is provided for segmenting a speech signal.
Type: Grant
Filed: August 31, 2001
Date of Patent: June 14, 2005
Assignee: The United States of America as represented by the Secretary of the Navy
Inventor: Paul M. Baggenstoss
-
Patent number: 6901365
Abstract: The invention enables even a CPU having low processing performance to find an HMM output probability by simplifying arithmetic operations. The dimensions of an input vector are grouped into several sets, and tables are created for the sets. When an output probability is calculated, codes corresponding to the first through n-th dimensions of the input vector are sequentially obtained, and for each code, by referring to the corresponding table, output values for each table are obtained. By substituting the output values for each table into a formula for finding an output probability, the output probability is found.
Type: Grant
Filed: September 19, 2001
Date of Patent: May 31, 2005
Assignee: Seiko Epson Corporation
Inventor: Yasunaga Miyazawa
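The table-lookup idea above replaces per-frame density evaluation with precomputed per-group scores that are simply summed. The sketch below assumes log-domain scores and an arbitrary `score_fn` standing in for the model's per-group computation; both are illustrative, not the patent's actual tables:

```python
def precompute_tables(groups, score_fn, codebook_size):
    """Build one lookup table per group of vector dimensions.

    `groups` lists the dimension indices in each set; `score_fn(g, code)`
    is whatever per-group log-probability the full model would compute
    (an illustrative stand-in for the patent's codebook scoring)."""
    return [
        [score_fn(g, code) for code in range(codebook_size)]
        for g in range(len(groups))
    ]

def output_log_prob(codes, tables):
    """Sum per-group table entries instead of evaluating HMM densities.

    `codes` holds the codebook index observed for each dimension group."""
    return sum(tables[g][code] for g, code in enumerate(codes))
```

At recognition time only integer indexing and addition remain, which is the point of the simplification for a low-power CPU.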
-
Patent number: 6882970
Abstract: A system is provided for comparing an input query with a number of stored annotations to identify information to be retrieved from a database. The comparison technique divides the input query into a number of fixed-size fragments and identifies how many times each of the fragments occurs within each annotation using a dynamic programming matching technique. The frequencies of occurrence of the fragments in both the query and the annotation are then compared to provide a measure of the similarity between the query and the annotation. The information to be retrieved is then determined from the similarity measures obtained for all the annotations.
Type: Grant
Filed: October 25, 2000
Date of Patent: April 19, 2005
Assignee: Canon Kabushiki Kaisha
Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
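The fragment-frequency comparison can be sketched as below. Note the simplifications: exact substring counting stands in for the patent's dynamic-programming matching, and the overlap score is an illustrative choice:

```python
from collections import Counter

def fragments(seq, size=3):
    """Counts of fixed-size overlapping fragments of a sequence."""
    return Counter(seq[i:i + size] for i in range(len(seq) - size + 1))

def similarity(query, annotation, size=3):
    """Fraction of the query's fragments that also occur in the annotation.

    A stand-in score; the patent compares frequencies found by dynamic
    programming, which also tolerates recognition errors."""
    fq, fa = fragments(query, size), fragments(annotation, size)
    overlap = sum(min(fq[f], fa[f]) for f in fq)
    return overlap / max(sum(fq.values()), 1)
```

Retrieval then reduces to scoring the query against every stored annotation and returning the best-scoring entries.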
-
Patent number: 6879955
Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or another method, to reduce the complexity of coding and allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.
Type: Grant
Filed: June 29, 2001
Date of Patent: April 12, 2005
Assignee: Microsoft Corporation
Inventor: Ajit V. Rao
-
Patent number: 6868382
Abstract: The generic word label series used for recognition of words uttered by unspecified speakers are stored in the vocabulary label network accumulation processing. The speech of a particular speaker is entered. Based on the input speech, the registered word label series extraction processing generates the registered word label series. The registered word label series of the particular speaker can then be registered with the vocabulary label network accumulation processing.
Type: Grant
Filed: March 9, 2001
Date of Patent: March 15, 2005
Assignee: Asahi Kasei Kabushiki Kaisha
Inventor: Makoto Shozakai
-
Patent number: 6868381
Abstract: A speech recognition system having an input for receiving an input signal indicative of a spoken utterance that is indicative of at least one speech element. The system further includes a first processing unit operative for processing the input signal to derive from a speech recognition dictionary a speech model associated with a given speech element that constitutes a potential match to the at least one speech element. The system further comprises a second processing unit for generating a modified version of the speech model on the basis of the input signal. The system further provides a third processing unit for processing the input signal on the basis of the modified version of the speech model to generate a recognition result indicative of whether the modified version of the at least one speech model constitutes a match to the input signal.
Type: Grant
Filed: December 21, 1999
Date of Patent: March 15, 2005
Assignee: Nortel Networks Limited
Inventors: Stephen Douglas Peters, Daniel Boies, Benoit Dumoulin
-
Patent number: 6868380
Abstract: A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates includes a frequency analyzer for generating a short-time frequency representation of the acoustic signal. A novelty processor separates background components of the representation from region-of-interest components of the representation. The output of the novelty processor includes the region-of-interest components of the representation according to the novelty parameters. An attention processor produces a gating signal as a function of the novelty output according to attention parameters. A coincidence processor produces information regarding co-occurrences between samples of the novelty output over time and frequency. The coincidence processor selectively gates the coincidence output as a function of the gating signal according to one or more coincidence parameters.
Type: Grant
Filed: March 23, 2001
Date of Patent: March 15, 2005
Assignee: Eliza Corporation
Inventor: John Kroeker
-
Patent number: 6850885
Abstract: To increase the accuracy and the flexibility of a method for recognizing speech which employs a keyword spotting process on the basis of a combination of a keyword model (KM) and a garbage model (GM), it is suggested to associate at least one variable penalty value (Ptrans, P1, . . . , P6) with a global penalty (Pglob) so as to increase the recognition of keywords (Kj).
Type: Grant
Filed: December 12, 2001
Date of Patent: February 1, 2005
Assignee: Sony International (Europe) GmbH
Inventors: Daniela Raddino, Ralf Kompe, Thomas Kemp
-
Patent number: 6836758
Abstract: A method and system for speech recognition combines different types of engines in order to recognize user-defined digits and control words, predefined digits and control words, and nametags. Speaker-independent engines are combined with speaker-dependent engines. A Hidden Markov Model (HMM) engine is combined with Dynamic Time Warping (DTW) engines.
Type: Grant
Filed: January 9, 2001
Date of Patent: December 28, 2004
Assignee: Qualcomm Incorporated
Inventors: Ning Bi, Andrew P. DeJaco, Harinath Garudadri, Chienchung Chang, William Yee-Ming Huang, Narendranath Malayath, Suhail Jalil, David Puig Oses, Yingyong Qi
-
Publication number: 20040260548
Abstract: A system and method that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of a segmental switching state space model that employs model parameters including those describing the unobserved speech dynamics and those describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector is provided. The model parameters are modified based, at least in part, upon a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production, are provided. For example, modification of model parameters can be based upon an approximate mixture of Gaussians (MOG) posterior and/or an approximate hidden Markov model (HMM) posterior using a variational technique.
Type: Application
Filed: June 20, 2003
Publication date: December 23, 2004
Inventors: Hagai Attias, Li Deng, Leo J. Lee
-
Patent number: 6823308
Abstract: A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data in at least one further modality are identified. The identified features in the speech and in the data are used in the recognition of words by comparing the identified features with states in models for the words. The models have states for the recognition of speech and for words having features in at least one further modality associated with the words; the models also have states for the recognition of events in the further modality or each further modality.
Type: Grant
Filed: February 16, 2001
Date of Patent: November 23, 2004
Assignee: Canon Kabushiki Kaisha
Inventors: Robert Alexander Keiller, Nicolas David Fortescue
-
Patent number: 6823304
Abstract: A lead consonant buffer stores a feature parameter preceding a lead voiced sound detected by a voiced sound detector as a feature parameter of a lead consonant. A matching processing unit performs matching processing of a feature parameter of a lead consonant stored in the lead consonant buffer with a feature parameter of a registered pattern. Hence, the matching processing unit can perform matching processing reflecting information on a lead consonant even when no lead consonant can be detected due to noise.
Type: Grant
Filed: July 19, 2001
Date of Patent: November 23, 2004
Assignee: Renesas Technology Corp.
Inventor: Masahiko Ikeda
-
Publication number: 20040186715
Abstract: This invention relates to a non-intrusive speech quality assessment system. The invention provides a method and apparatus for training a quality assessment tool in which a database comprising a plurality of samples, each with an associated mean opinion score, is divided into a plurality of distortion sets of samples according to a distortion criterion; and a distortion-specific assessment handler for each distortion set is trained, such that a fit between a distortion-specific quality measure generated from a distortion-specific plurality of parameters for a sample and the mean opinion score associated with said sample is optimised.
Type: Application
Filed: January 14, 2004
Publication date: September 23, 2004
Applicant: PSYTECHNICS LIMITED
Inventors: Philip Gray, Ludovic Malfait
-
Publication number: 20040186716
Abstract: A processing unit and method are described herein that are capable of estimating a quality of a speech signal transmitted through a wireless network. The processing unit uses a logistic function to map a score output by an objective voice quality method (the PESQ algorithm) into a mean opinion score (MOS), which is an estimate of the quality of the speech signal that was transmitted through the wireless network. The logistic function has the form y = 1 + 4/(1 + exp(−1.7244·x + 5.0187)), where x is the score from the PESQ algorithm, in the range of −0.5 to 4.5, and y is the mapped MOS score, in the range of 1 to 5; if y = 5 the quality of the speech signal is considered excellent, and if y = 1 it is considered bad.
Type: Application
Filed: January 20, 2004
Publication date: September 23, 2004
Applicant: Telefonaktiebolaget LM Ericsson
Inventors: John C. Morfitt, Irina C. Cotanis
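The logistic mapping quoted in this abstract is concrete enough to implement directly (the constants 1.7244 and 5.0187 are from the abstract itself; only the function name is invented here):

```python
import math

def pesq_to_mos(x):
    """Map a PESQ score (roughly -0.5 to 4.5) to an estimated MOS (1 to 5)
    using the logistic function given in the abstract."""
    return 1 + 4 / (1 + math.exp(-1.7244 * x + 5.0187))
```

The function is monotonically increasing, so a better PESQ score never maps to a worse MOS estimate; it crosses the MOS midpoint of 3 at x = 5.0187/1.7244 ≈ 2.91.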
-
Publication number: 20040186714
Abstract: A method, program product and system for speech recognition for use with a base speech recognition process, but which does not affect scoring models in the base speech recognition process, the method comprising in one embodiment: obtaining an output hypothesis from a base speech recognition process that uses a first set of scoring models; obtaining a set of alternative hypotheses; scoring the set of alternative hypotheses based on a second set of different scoring models that is separate from and external to the base speech recognition process and does not affect the scoring models thereof; and selecting a hypothesis with a best score.
Type: Application
Filed: March 18, 2003
Publication date: September 23, 2004
Applicant: Aurilab, LLC
Inventor: James K. Baker
-
Patent number: 6792405
Abstract: A feature extraction process for use in a wireless communication system provides automatic speech recognition based on both spectral envelope and voicing information. The shape of the spectral envelope is used to determine the LSPs of the incoming bitstream, and the adaptive gain coefficients and fixed gain coefficients are used to generate the “voiced” and “unvoiced” feature parameter information.
Type: Grant
Filed: December 5, 2000
Date of Patent: September 14, 2004
Assignee: AT&T Corp.
Inventors: Richard Vandervoort Cox, Hong Kook Kim
-
Patent number: 6788767
Abstract: An apparatus and method for enabling provision of a call return service is disclosed. The apparatus utilizes a method of generating telephone numbers from voice messages. The method includes the step of using speech recognition to isolate a spoken number in a voice message, and confirming to a high degree of accuracy that the spoken number represents a telephone number. The method further includes the step of converting the spoken number into a data sequence representing the telephone number. This data sequence is then made available for immediate or later use.
Type: Grant
Filed: December 28, 2000
Date of Patent: September 7, 2004
Assignee: Gateway, Inc.
Inventor: Jay V. Lambke
-
Publication number: 20040162725
Abstract: A stochastic processor of the present invention comprises a fluctuation generator (15) configured to output an analog quantity having a fluctuation, a fluctuation difference calculation means (401) configured to output fluctuation difference data with an output of the fluctuation generator added to an analog difference between two data, a thresholding unit (47) configured to perform thresholding on an output of the fluctuation difference calculation means to thereby generate a pulse, and a pulse detection means configured to detect the pulse output from the thresholding unit.
Type: Application
Filed: February 20, 2004
Publication date: August 19, 2004
Applicant: Matsushita Electric Industrial Co., Ltd.
Inventors: Michihito Ueda, Kiyoyuki Morita
-
Publication number: 20040158467
Abstract: An automated speech recognition filter is disclosed. The automated speech recognition filter device provides a speech signal to an automated speech platform that approximates an original speech signal as spoken into a transceiver by a user. In providing the speech signal, the automated speech recognition filter determines various models representative of a cumulative signal degradation of the original speech signal from various devices along a transmission signal path and a reception signal path between the transceiver and a device housing the filter. The automated speech platform can thereby provide an audio signal corresponding to a context of the original speech signal.
Type: Application
Filed: February 6, 2004
Publication date: August 12, 2004
Inventors: Stephen C. Habermas, Ognjen Todic, Kai-Ten Feng, Jane F. MacFarlane
-
Publication number: 20040158466
Abstract: Vocal and vocal-like sounds can be characterised and/or identified by using an intelligent classifying method adapted to determine prosodic attributes of the sounds and base a classificatory scheme upon composite functions of these attributes, the composite functions defining a discrimination space. The sounds are segmented before prosodic analysis on a segment-by-segment basis. The prosodic analysis of the sounds involves pitch analysis, intensity analysis, formant analysis and timing analysis. This method can be implemented in systems including language-identification and singing-style-identification systems.
Type: Application
Filed: April 9, 2004
Publication date: August 12, 2004
Inventor: Eduardo Reck Miranda
-
Patent number: 6775652Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. Potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.Type: GrantFiled: June 30, 1998Date of Patent: August 10, 2004Assignee: AT&T Corp.Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
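A minimal sketch of recognition restricted to valid features, assuming diagonal-Gaussian acoustic models (the patent does not specify the model form):

```python
import math

def masked_log_likelihood(features, valid, means, variances):
    """Score a speech vector against a diagonal-Gaussian state using only
    the features flagged as valid; dimensions marked corrupted during
    transmission are skipped (marginalised out) rather than trusted."""
    ll = 0.0
    for x, ok, mean, var in zip(features, valid, means, variances):
        if ok:
            ll += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
    return ll
```

If the score computed over the surviving features is too poor to support a confident hypothesis, the recognizer can then fall back to requesting retransmission of the missing packet, as the abstract describes.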
-
Patent number: 6772119Abstract: A speaker recognition technique is provided that can operate within the memory and processing constraints of existing portable computing devices. A smaller memory footprint and computational efficiency are achieved using single Gaussian models for each enrolled speaker. During enrollment, features are extracted from one or more enrollment utterances from each enrolled speaker, to generate a target speaker model based on a sample covariance matrix. During a recognition phase, features are extracted from one or more test utterances to generate a test utterance model that is also based on the sample covariance matrix. A sphericity ratio is computed that compares the test utterance model to the target speaker model, as well as a background model. The sphericity ratio indicates how similar test utterance speech is to the speech used when the user was enrolled, as represented by the target speaker model, and how dissimilar the test utterance speech is from the background model.Type: GrantFiled: December 10, 2002Date of Patent: August 3, 2004Assignee: International Business Machines CorporationInventors: Upendra V. Chaudhari, Ganesh N. Ramaswamy, Ran Zilca
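One common form of such a comparison is the arithmetic-harmonic sphericity measure. The sketch below simplifies to diagonal covariances (per-dimension variances), whereas the patent works with full sample covariance matrices:

```python
import math

def sphericity(test_variances, target_variances):
    """Arithmetic-harmonic sphericity between two diagonal covariance
    models: log of the ratio between the arithmetic and harmonic means
    of the per-dimension variance ratios. Zero when the models coincide,
    growing as they diverge."""
    d = len(test_variances)
    ratios = [t / g for t, g in zip(test_variances, target_variances)]
    arith = sum(ratios) / d
    harm = d / sum(1.0 / r for r in ratios)
    return math.log(arith / harm)
```

Because the arithmetic mean always dominates the harmonic mean, the score is non-negative, and a small value indicates the test utterance resembles the enrolled speaker.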
-
Patent number: 6772116Abstract: A method of selecting a language model for decoding received user spoken utterances in a speech recognition system can include a series of steps. The steps can include computing confidence scores for identified closed-class words and computing a running average of the confidence scores for a predetermined number of decoded closed-class words. Additionally, based upon the running average, telegraphic decoding can be selectively enabled.Type: GrantFiled: March 27, 2001Date of Patent: August 3, 2004Assignee: International Business Machines CorporationInventor: James R. Lewis
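The running-average gate can be sketched as follows. The window size, the threshold, and the choice that *low* confidence on closed-class words enables telegraphic decoding are illustrative assumptions; the patent only specifies that the decision is based on the running average:

```python
from collections import deque

def telegraphic_gate(confidence_scores, window=10, threshold=0.6):
    """After each decoded closed-class word, yield the running average of
    the last `window` confidence scores and a flag selecting telegraphic
    decoding when that average falls below the threshold."""
    recent = deque(maxlen=window)
    for score in confidence_scores:
        recent.append(score)
        avg = sum(recent) / len(recent)
        yield avg, avg < threshold
```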
-
Publication number: 20040138884Abstract: A method compresses one or more ordered arrays of integer values. The integer values can represent a vocabulary of a language model, in the form of an N-gram, of an automated speech recognition system. For each ordered array A[.] to be compressed, an inverse array I[.] is defined. One or more split inverse arrays are also defined for each ordered array. The minimum and optimum number of bits required to store the array A[.] in terms of the split arrays and split inverse arrays are determined. Then, the original array is stored in such a way that the total amount of memory used is minimized.Type: ApplicationFiled: January 13, 2003Publication date: July 15, 2004Inventors: Edward W. D. Whittaker, Bhiksha Ramakrishnan
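The core idea of trading a sorted array for its inverse can be sketched as follows; the split-array construction and the bit-level storage optimisation described in the abstract are omitted here:

```python
from bisect import bisect_left

def inverse_array(A, max_value):
    """For a sorted (non-decreasing) integer array A, define
    I[v] = number of entries of A strictly below v."""
    return [bisect_left(A, v) for v in range(max_value + 2)]

def recover(I, i):
    """A[i] is the largest v with I[v] <= i (linear scan for clarity;
    a binary search would be used in practice)."""
    v = 0
    while v + 1 < len(I) and I[v + 1] <= i:
        v += 1
    return v
```

When A is long but its values span a small range, I is shorter and its entries need fewer bits, which is the situation that arises with N-gram index arrays.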
-
Publication number: 20040138883Abstract: A method compresses one or more ordered arrays of integer values. The integer values can represent a vocabulary of a language model, in the form of an N-gram, of an automated speech recognition system. For each ordered array A[.] to be compressed, an inverse array I[.] is defined. One or more split inverse arrays are also defined for each ordered array. The minimum and optimum number of bits required to store the array A[.] in terms of the split arrays and split inverse arrays are determined. Then, the original array is stored in such a way that the total amount of memory used is minimized.Type: ApplicationFiled: January 13, 2003Publication date: July 15, 2004Inventors: Bhiksha Ramakrishnan, Edward W. D. Whittaker
-
Publication number: 20040128130Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighting function was then applied to the power spectrum. The harmonics-weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter outputs was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromising the gain-invariance properties.Type: ApplicationFiled: May 19, 2003Publication date: July 1, 2004Inventors: Kenneth Rose, Liang Gu
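The final feature-extraction steps (within-filter cubic-root compression, log filter energies, DCT) can be sketched as follows; pitch estimation, harmonic weighting, and the mel filter-bank design are assumed to have run already:

```python
import math

def filter_cepstra(band_amplitudes, n_ceps=4):
    """band_amplitudes: per-filter lists of spectral amplitudes.
    Applies cubic-root amplitude compression within each filter, takes
    log energies, then a DCT-II to produce cepstral coefficients."""
    energies = [sum(a ** (1.0 / 3.0) for a in band) for band in band_amplitudes]
    n = len(energies)
    log_e = [math.log(max(e, 1e-10)) for e in energies]
    return [sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / n)
                for m in range(n))
            for k in range(n_ceps)]
```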
-
Patent number: 6754624Abstract: A method and apparatus for enhancing coding efficiency by reducing illegal or other undesirable packet generation while encoding a signal. The probability of generating illegal or other undesirable packets while encoding a signal is reduced by first analyzing a history of the frequency of codebook values selected while quantizing speech parameters. Codebook entries are then reordered so that the index/indices that create illegal or other undesirable packets contain the least frequently used entry/entries. Reordering multiple codebooks for various parameters further reduces the probability that an illegal or other undesirable packet will be created during signal encoding. The method and apparatus may be applied to reduce the probability of generating illegal null traffic channel data packets while encoding eighth rate speech.Type: GrantFiled: February 13, 2001Date of Patent: June 22, 2004Assignee: Qualcomm, Inc.Inventors: Eddie-Lun Tik Choy, Arasanipalai K. Ananthapadmanabhan, Andrew P. DeJaco
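The reordering step can be sketched as follows (a simplified illustration; the encoder and decoder must of course agree on the permuted ordering):

```python
from collections import Counter

def reorder_codebook(codebook, usage_history, illegal_indices):
    """Permute codebook entries so the least frequently selected ones sit
    at the indices that would produce illegal/undesirable packets.
    usage_history is a list of indices selected during past quantization."""
    counts = Counter(usage_history)
    # entry indices, most frequently used first (ties broken by index)
    by_freq = sorted(range(len(codebook)), key=lambda i: (-counts[i], i))
    safe = [i for i in range(len(codebook)) if i not in illegal_indices]
    new_book = [None] * len(codebook)
    # frequent entries claim safe slots; rare entries land on risky ones
    for entry, slot in zip(by_freq, safe + sorted(illegal_indices)):
        new_book[slot] = codebook[entry]
    return new_book
```

Because the risky indices are now chosen only when their rarely-used entries happen to win the quantization search, the probability of emitting an illegal packet drops without changing the codec's rate or quality.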
-
Patent number: 6754626Abstract: The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.Type: GrantFiled: March 1, 2001Date of Patent: June 22, 2004Assignee: International Business Machines CorporationInventor: Mark E. Epstein
-
Publication number: 20040111261Abstract: A speaker recognition technique is provided that can operate within the memory and processing constraints of existing portable computing devices. A smaller memory footprint and computational efficiency are achieved using single Gaussian models for each enrolled speaker. During enrollment, features are extracted from one or more enrollment utterances from each enrolled speaker, to generate a target speaker model based on a sample covariance matrix. During a recognition phase, features are extracted from one or more test utterances to generate a test utterance model that is also based on the sample covariance matrix. A sphericity ratio is computed that compares the test utterance model to the target speaker model, as well as a background model. The sphericity ratio indicates how similar test utterance speech is to the speech used when the user was enrolled, as represented by the target speaker model, and how dissimilar the test utterance speech is from the background model.Type: ApplicationFiled: December 10, 2002Publication date: June 10, 2004Applicant: International Business Machines CorporationInventors: Upendra V. Chaudhari, Ganesh N. Ramaswamy, Ran Zilca
-
Publication number: 20040102971Abstract: In a particular embodiment, the disclosure is directed to a method of recognizing input that includes receiving input data; receiving context data associated with the input data, the context data associated with an interpretation mapping; and generating symbolic data from the input data using the interpretation mapping. In another particular embodiment, the disclosure is directed to an input recognition system that includes a context module, an input capture module, and a recognition module. The context module is configured to receive context input and provide context data. The input capture module is configured to receive input data and is configured to provide digitized input data. The recognition module is coupled to the context module and is coupled to the input capture module. The recognition module is configured to receive the digitized input data and to interpret the digitized input data utilizing an interpretation mapping associated with the context data.Type: ApplicationFiled: August 11, 2003Publication date: May 27, 2004Applicant: RECARE, Inc.Inventors: Randolph B. Lipscher, Michael D. Dahlin
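A toy sketch of context-dependent interpretation follows. The mapping contents are entirely hypothetical (RECARE's domain is medical records, so the example uses a clinical abbreviation):

```python
# Hypothetical interpretation mappings keyed by context.
MAPPINGS = {
    "medication": {"asa": "aspirin"},
    "general": {"asa": "as soon as able"},
}

def recognize(digitized_input, context):
    """Interpret digitized input using the mapping selected by context,
    falling back to the input itself when no interpretation applies."""
    mapping = MAPPINGS.get(context, MAPPINGS["general"])
    return mapping.get(digitized_input, digitized_input)
```

The same captured token yields different symbolic data depending on which context the context module reported, which is the behavior the abstract describes.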
-
Patent number: 6725193Abstract: A voice recognition system for use with a communication system having an incoming line carrying an incoming signal from a first end to a second end operably attached to a speaker, and an outgoing line carrying an outgoing signal from a microphone near the speaker. A first speech recognition unit (SRU) detects selected incoming words and a second SRU detects outgoing words. A comparator/signal generator compares the outgoing word with the incoming word and outputs the outgoing word when the outgoing word does not match the incoming word. The first SRU may be delayed relative to the second SRU. The SRUs may also search only for selected words in a template, or may ignore words that are first detected by the other SRU. A signaler may also provide a signal indicating inclusion of one of the selected words in a known incoming signal, with an SRU responding to that signal by ignoring that command word in its template for a selected period of time.Type: GrantFiled: September 13, 2000Date of Patent: April 20, 2004Assignee: Telefonaktiebolaget LM EricssonInventor: Thomas J. Makovicka
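The comparator logic can be sketched as follows, assuming word detections aligned per time step and a hypothetical two-word memory of recent incoming detections:

```python
def filter_outgoing(outgoing_words, incoming_words, window=2):
    """Suppress outgoing detections that merely echo a recently detected
    incoming word (e.g. a far-end prompt leaking into the microphone).
    None marks time steps with no detection."""
    recent_incoming = []
    kept = []
    for out_word, in_word in zip(outgoing_words, incoming_words):
        if in_word is not None:
            recent_incoming.append(in_word)
            recent_incoming = recent_incoming[-window:]
        if out_word is not None and out_word not in recent_incoming:
            kept.append(out_word)
    return kept
```

Only outgoing words that do not match a recent incoming word survive, which mirrors the comparator/signal generator's behavior in the abstract.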