Specialized Equations Or Comparisons Patents (Class 704/236)
-
Patent number: 7636426
Abstract: A telecommunications device includes a voice dialer and a text-to-speech engine. The text-to-speech engine is configured to convert at least a portion of the user contact list information to speech, and the voice dialer is configured to receive an audio input and perform voice recognition, comparing said audio input to the converted user contact list information.
Type: Grant
Filed: August 10, 2005
Date of Patent: December 22, 2009
Assignee: Siemens Communications, Inc.
Inventors: Sarah Korah, John Vuong
-
Publication number: 20090313015
Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
Type: Application
Filed: June 13, 2008
Publication date: December 17, 2009
Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
-
Patent number: 7630894
Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system, particularly suited for use in a wireless communication system, operates to "delete" each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single-word and "string" tests of the deletion technique.
Type: Grant
Filed: August 1, 2006
Date of Patent: December 8, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard Vandervoort Cox, Hong Kook Kim
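The "deletion" strategy described in this abstract is simple enough to sketch in a few lines: erased frames are dropped outright, shortening the observation sequence the recognizer scores. The helper name and the list-of-frames representation below are illustrative assumptions, not taken from the patent.

```python
def conceal_erasures(frames, erased):
    """Frame-erasure concealment by deletion: drop every frame whose
    index was declared erased, shortening the observation sequence."""
    return [f for i, f in enumerate(frames) if i not in erased]

# four feature frames; erasures were declared for frames 1 and 3
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
observation = conceal_erasures(frames, erased={1, 3})
# the recognizer now scores a 2-frame observation sequence
```

The recognizer itself then runs unchanged on the shorter sequence, which is what makes the approach attractive for a bitstream-based front end.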
-
Patent number: 7627468
Abstract: An apparatus enabling automatic determination of a portion that reliably represents a feature of a speech waveform includes: an acoustic/prosodic analysis unit calculating, from data, the distribution of the energy of a prescribed frequency range of the speech waveform on a time axis, and extracting, among the various syllables of the speech waveform, a range that is generated stably, based on the distribution and the pitch of the speech waveform; a cepstral analysis unit estimating, based on the spectral distribution of the speech waveform on the time axis, a range of the speech waveform whose change is well controlled by the speaker; and a pseudo-syllabic center extracting unit extracting, as a portion of high reliability of the speech waveform, the range that has been estimated to be stably generated and whose change is estimated to be well controlled by the speaker.
Type: Grant
Filed: February 21, 2003
Date of Patent: December 1, 2009
Assignees: Japan Science and Technology Agency, Advanced Telecommunication Research Institute International
Inventors: Nick Campbell, Parham Mokhtari
-
Patent number: 7620263
Abstract: An image processing system provides image enhancement and anti-clipping units. The anti-clipping unit for image sharpness enhancement operates such that any shoot artifacts in the enhanced image that go beyond pixel value lower/upper bounds are properly adjusted back within the lower and upper bounds, without causing prominent edge jaggedness artifacts in the final resulting output image.
Type: Grant
Filed: October 6, 2005
Date of Patent: November 17, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Surapong Lertrattanapanich, Yeong-Taeg Kim, Zhi Zhou
-
Patent number: 7617102
Abstract: A speaker identifying apparatus includes: a module for performing a principal component analysis on predetermined vocal tract geometrical parameters of a plurality of speakers and calculating an average and principal component vectors representing speaker-dependent variation; a module for performing acoustic analysis on the speech data uttered by each of the speakers to calculate cepstrum coefficients; a module for calculating principal component coefficients for approximating the vocal tract geometrical parameter of each of the plurality of speakers by a linear sum of principal component coefficients; a module for determining, by multiple regression analysis, a coefficient sequence for estimating principal component coefficients by a linear sum of the plurality of prescribed features, for each of the plurality of speakers; and a module for calculating a plurality of features from speech data of the speaker to be identified, and estimating principal component coefficients for calculating the vocal tract geometrical parameters.
Type: Grant
Filed: September 27, 2006
Date of Patent: November 10, 2009
Assignee: Advanced Telecommunications Research Institute International
Inventors: Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
-
Publication number: 20090276216
Abstract: A method for speech recognition, the method includes: extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows; aligning reference speech elements that are not of equal time span duration; constructing a common subspace for the aligned speech features; determining a first set of coefficient vectors; extracting a time-frequency feature image from a test speech stream spanned by a second sampling window; approximating the extracted image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector; computing a similarity measure between the first and the second coefficient vector; determining if the similarity measure is below a predefined threshold; and wherein a match between the reference speech elements and a portion of the test speech stream is made in response to a similarity measure below a predefined threshold.
Type: Application
Filed: May 2, 2008
Publication date: November 5, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lisa Amini, Pascal Frossard, Effrosyni Kokiopoulou, Oliver Verscheure
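The matching step above can be roughly illustrated as follows: features are projected onto a basis for the common subspace, and the resulting coefficient vectors are compared against a threshold. The basis, the Euclidean similarity measure, and the threshold value are all hypothetical stand-ins for whatever the publication actually specifies.

```python
def project(basis, x):
    """Coefficient vector of feature vector x in the common subspace,
    assuming the basis vectors are orthonormal."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]

def similarity(c1, c2):
    """Euclidean distance between coefficient vectors (lower = more alike)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

# toy 2-D common subspace of R^3; basis and threshold are made up
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ref_coeffs = project(basis, [0.9, 0.1, 0.0])    # first coefficient vector
test_coeffs = project(basis, [0.8, 0.2, 0.0])   # second coefficient vector
THRESHOLD = 0.5
match = similarity(ref_coeffs, test_coeffs) < THRESHOLD   # declare a match
```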
-
Publication number: 20090265170
Abstract: An audio feature is extracted from audio signal data for each analysis frame and stored in a storage part. Then, the audio feature is read from the storage part, and an emotional state probability of the audio feature corresponding to an emotional state is calculated using one or more statistical models constructed based on previously input learning audio signal data. Then, based on the calculated emotional state probability, the emotional state of a section including the analysis frame is determined.
Type: Application
Filed: September 13, 2007
Publication date: October 22, 2009
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Go Irie, Kouta Hidaka, Takashi Satou, Yukinobu Taniguchi, Shinya Nakajima
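A minimal sketch of the "emotional state probability" step: per-state statistical models score the frame's audio feature, and the likelihoods are normalized into a posterior. The 1-D Gaussian models and state names here are toy assumptions, not the publication's actual models.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def state_probabilities(feature, models):
    """Posterior over emotional states for one analysis frame.
    `models` maps state name -> (mean, variance) of a 1-D audio feature."""
    likes = {s: gauss(feature, m, v) for s, (m, v) in models.items()}
    total = sum(likes.values())
    return {s: l / total for s, l in likes.items()}

models = {"neutral": (0.0, 1.0), "excited": (3.0, 1.0)}   # toy 1-D models
post = state_probabilities(2.5, models)
state = max(post, key=post.get)   # most probable emotional state
```

A real system would accumulate these frame posteriors over a section before deciding the section's emotional state.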
-
Patent number: 7603274
Abstract: A method and apparatus for determining the possibility of pattern recognition of a time series signal, independent of a pattern recognition ratio, is provided. The method includes extracting a time forward feature and a time reversed feature from an input signal having a time series pattern, generating a time forward alignment and a time reversed alignment by using the time forward feature and the time reversed feature, comparing the time forward alignment with the time reversed alignment to compute a likelihood of pattern recognition, and determining that the input signal can be recognized if the likelihood is larger than a predetermined threshold value.
Type: Grant
Filed: November 2, 2005
Date of Patent: October 13, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventor: Kwangil Hwang
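The forward/reversed comparison can be sketched with a toy monotonic aligner: a clean time-series pattern should yield near-identical labelings whether aligned forward or backward, while a scrambled signal will not. The greedy aligner and the agreement-ratio "likelihood" below are simplified stand-ins for whatever alignment and likelihood the patent actually uses.

```python
def align(features, states):
    """Greedy monotonic alignment: the state index may only advance."""
    s, path = 0, []
    for f in features:
        if s + 1 < len(states) and abs(f - states[s + 1]) < abs(f - states[s]):
            s += 1
        path.append(s)
    return path

def recognizable(features, states, threshold=0.8):
    """Compare the forward alignment with the time-reversed alignment;
    their agreement ratio stands in for the likelihood of recognition."""
    n = len(states)
    fwd = align(features, states)
    # align the reversed signal against reversed states, then map back
    rev = [n - 1 - s for s in align(features[::-1], states[::-1])][::-1]
    agreement = sum(a == b for a, b in zip(fwd, rev)) / len(fwd)
    return agreement >= threshold
```

A well-ordered signal such as [0, 0, 5, 5, 9, 9] against states [0, 5, 9] aligns identically in both directions; a scrambled one does not.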
-
Patent number: 7603278
Abstract: A segment set before updating is read, and clustering considering a phoneme environment is performed on it. For each cluster obtained by the clustering, a representative segment of the segments belonging to the cluster is generated. For each cluster, the segments belonging to the cluster are replaced with the representative segment so as to update the segment set.
Type: Grant
Filed: September 14, 2005
Date of Patent: October 13, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Toshiaki Fukada, Masayuki Yamada, Yasuhiro Komori
-
Patent number: 7593842
Abstract: A device and method for translating language is disclosed. In one embodiment, for example, a method for providing a translated output signal derived from a speech input signal comprises: receiving a speech input signal in a first language; converting the speech input signal into a digital format comprising a voice model component representing a speech pattern of the speech input signal and a content component representing the content of the speech input signal; translating the content component from the first language into a second language to provide a translated content component; and generating an audible output signal comprising the translated content in an approximation of the speech pattern of the speech input signal.
Type: Grant
Filed: December 10, 2003
Date of Patent: September 22, 2009
Inventor: Leslie Rousseau
-
Patent number: 7590605
Abstract: A system is described for matching lattices, such as phoneme lattices generated by an automatic speech recognition unit. The system can be used to retrieve files from a database by comparing a query lattice with the annotation lattices associated with the data files that can be retrieved, and by retrieving the data files having an annotation lattice most similar to the query lattice.
Type: Grant
Filed: July 16, 2004
Date of Patent: September 15, 2009
Assignee: Canon Kabushiki Kaisha
Inventor: Ljubomir Josifovski
-
Patent number: 7590537
Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers, analyzing both the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model whose variation between a test speaker ML model and the speaker group ML model to which the test speaker belongs is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variations in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The invention may be applied to any speaker adaptation algorithm, such as MLLR and MAP.
Type: Grant
Filed: December 27, 2004
Date of Patent: September 15, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
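The "quantity" and "direction" of a model variation can be pictured as the norm and the cosine of the difference between model mean vectors. The sketch below is a toy illustration under that assumption; the vector representations and group names are invented for the example.

```python
import math

def variation(model_a, model_b):
    """Model variation: componentwise difference of two mean vectors."""
    return [b - a for a, b in zip(model_a, model_b)]

def magnitude(v):
    """Quantity variation amount (Euclidean norm)."""
    return math.sqrt(sum(x * x for x in v))

def direction_similarity(u, v):
    """Directional variation agreement (cosine of the angle between
    two variation vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (magnitude(u) * magnitude(v))

# compare a test speaker's variation against two speaker-group variations
test_var = variation([0.0, 0.0], [1.0, 1.1])
group_a = variation([0.0, 0.0], [2.0, 2.0])    # same direction, larger amount
group_b = variation([0.0, 0.0], [-1.0, 1.0])   # different direction
best = max([("a", group_a), ("b", group_b)],
           key=lambda g: direction_similarity(test_var, g[1]))[0]
```

Here the test speaker is assigned to group "a" because the direction of its variation matches, even though the magnitudes differ.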
-
Patent number: 7587318
Abstract: A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech.
Type: Grant
Filed: September 12, 2003
Date of Patent: September 8, 2009
Assignee: Broadcom Corporation
Inventor: Nambi Seshadri
-
Publication number: 20090210226
Abstract: A method, system and communication device for enabling voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content, which tags generate uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one of the content with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search utilizing uniterms of the audio tag, scored against the phoneme lattice model generated by the voice query, to identify matching terms within the audio tags and corresponding stored content. The retrieved content(s) associated with the identified audio tags having uniterms that score within the phoneme lattice model are outputted in an order corresponding to the order in which the uniterms are structured within the voice query.
Type: Application
Filed: February 15, 2008
Publication date: August 20, 2009
Inventor: Changxue Ma
-
Patent number: 7574357
Abstract: Method and system for generating electromyographic or sub-audible signals ("SAWPs") and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.
Type: Grant
Filed: June 24, 2005
Date of Patent: August 11, 2009
Assignee: The United States of America as represented by the Administrator of the National Aeronautics and Space Administration (NASA)
Inventors: C. Charles Jorgensen, Bradley J. Betts
-
Patent number: 7571098
Abstract: Word lattices that are generated by an automatic speech recognition system are used to generate a modified word lattice that is usable by a spoken language understanding module. In one embodiment, the spoken language understanding module determines a set of salient phrases by calculating an intersection of the modified word lattice, which is optionally preprocessed, and a finite state machine that includes a plurality of salient grammar fragments.
Type: Grant
Filed: May 29, 2003
Date of Patent: August 4, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, Dilek Z. Hakkani-Tur, Giuseppe Riccardi, Gokhan Tur, Jeremy Huntley Wright
-
Publication number: 20090192788
Abstract: In a sound processing device, a modulation spectrum specifier specifies a modulation spectrum of an input sound for each of a plurality of unit intervals. An index calculator calculates an index value corresponding to the magnitude of components of modulation frequencies belonging to a predetermined range of the modulation spectrum. A determinator determines whether the input sound of each of the unit intervals is a vocal sound or a non-vocal sound based on the index value.
Type: Application
Filed: January 23, 2009
Publication date: July 30, 2009
Applicant: Yamaha Corporation
Inventor: Yasuo Yoshioka
-
Patent number: 7567903
Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine another best hypothesis. Another Vocal Tract Length Normalization factor is estimated based on that best hypothesis and at least one previous best hypothesis.
Type: Grant
Filed: January 12, 2005
Date of Patent: July 28, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
-
Patent number: 7565213
Abstract: A significant short-time spectrum is extracted from an information signal, the means for extracting being configured to extract those short-time spectra which come closer to a specific characteristic than others. The extracted short-time spectra are then decomposed into component signals using ICA analysis, a component signal spectrum representing a profile spectrum of a tone source which generates a tone corresponding to the characteristic sought. From a sequence of short-time spectra of the information signal and from the profile spectra determined, an amplitude envelope is calculated for each profile spectrum to indicate how a tone source profile spectrum changes over time. The profile spectra and all the amplitude envelopes associated therewith provide a description of the information signal which may be evaluated further, for example for transcription purposes in the case of a music signal.
Type: Grant
Filed: May 5, 2005
Date of Patent: July 21, 2009
Assignee: Gracenote, Inc.
Inventors: Christian Dittmar, Christian Uhle, Jürgen Herre
-
Patent number: 7562014
Abstract: A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and an active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
Type: Grant
Filed: September 26, 2007
Date of Patent: July 14, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Giuseppe Riccardi, Gokhan Tur
-
Publication number: 20090157400
Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the processing of the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation on the cepstral feature vector is performed properly to improve the anti-noise ability of speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, have low complexity, and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
Type: Application
Filed: October 1, 2008
Publication date: June 18, 2009
Applicant: Industrial Technology Research Institute
Inventor: Shih-Ming Huang
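One plausible reading of "two scalar coefficients and a determining condition" is over-subtraction control: one coefficient scales the subtracted noise estimate and the other floors the result so subtraction can never over-shoot. The sketch below is an assumption-laden illustration of that general idea, not the publication's actual rule.

```python
def cepstral_noise_subtract(c, noise, alpha=1.0, beta=0.1):
    """Cepstral noise subtraction with over-subtraction control.
    alpha scales the subtracted noise estimate (first scalar); beta
    defines a floor for each coefficient (second scalar); the abs()
    comparison plays the role of the determining condition.
    All three choices are illustrative, not from the publication."""
    out = []
    for c_i, n_i in zip(c, noise):
        d = c_i - alpha * n_i
        # keep the subtraction only while it stays above a beta-scaled floor
        out.append(d if abs(d) > beta * abs(c_i) else beta * c_i)
    return out
```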
-
Patent number: 7546236
Abstract: This invention identifies anomalies in a data stream, without prior training, by measuring the difficulty of finding similarities between neighborhoods in the ordered sequence of elements. Data elements in an area that is similar to much of the rest of the scene score few mismatches. On the other hand, a region that possesses many dissimilarities with other parts of the ordered sequence will attract a high score of mismatches. The invention makes use of a trial and error process to find dissimilarities between parts of the data stream and does not require prior knowledge of the nature of the anomalies that may be present. The method avoids the use of processing dependencies between data elements and is capable of a straightforward parallel implementation for each data element. The invention is applicable to searching for anomalous patterns in data streams, including audio signals, health screening and geographical data. A method of error correction is also described.
Type: Grant
Filed: March 24, 2003
Date of Patent: June 9, 2009
Assignee: British Telecommunications public limited company
Inventor: Frederick W. M. Stentiford
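The trial-and-error mismatch scoring can be sketched directly: each neighborhood is compared against randomly chosen neighborhoods elsewhere, and the count of failed matches is its anomaly score. Window width, trial count, and tolerance below are arbitrary illustrative parameters.

```python
import random

def anomaly_scores(seq, width=2, trials=40, tol=1, seed=0):
    """Score each neighborhood by trial and error: count how many
    randomly chosen same-width neighborhoods elsewhere fail to match
    it elementwise within a tolerance. No prior training is needed."""
    rng = random.Random(seed)
    scores = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        mismatches = 0
        for _ in range(trials):
            j = rng.randrange(len(seq) - width + 1)
            other = seq[j:j + width]
            if any(abs(a - b) > tol for a, b in zip(window, other)):
                mismatches += 1
        scores.append(mismatches)
    return scores

seq = [1, 1, 1, 1, 9, 9, 1, 1, 1]   # the 9s form a dissimilar region
scores = anomaly_scores(seq)
# windows covering the 9s attract high mismatch counts; the rest score low
```

Because each position is scored independently, the loop over `i` parallelizes trivially, matching the abstract's claim about per-element parallel implementation.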
-
Publication number: 20090125306
Abstract: The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising the n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists, and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
Type: Application
Filed: September 19, 2008
Publication date: May 14, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Remi Lejeune, Hubert Crepy
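The combination step can be sketched by taking the Cartesian product of the per-segment n-best lists, scoring each global candidate as the product of its segment confidences, and pruning low scorers. The product-of-confidences score and the threshold-style pruning criterion are assumptions for illustration.

```python
from itertools import product

def global_nbest(segment_lists, n=3, prune=0.2):
    """Combine per-segment n-best lists into a global n-best list.
    Each segment list holds (text, confidence) pairs; a candidate's
    global score is the product of its segment confidences, and
    candidates scoring below `prune` are dropped (toy pruning rule)."""
    candidates = []
    for combo in product(*segment_lists):
        text = " ".join(t for t, _ in combo)
        score = 1.0
        for _, conf in combo:
            score *= conf
        if score >= prune:
            candidates.append((text, score))
    candidates.sort(key=lambda c: -c[1])
    return candidates[:n]

segments = [[("four", 0.9), ("for", 0.5)],
            [("two", 0.8), ("too", 0.4)]]
best = global_nbest(segments)   # top entry: "four two"
```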
-
Patent number: 7533015
Abstract: Provides speech enhancement techniques for extemporaneous noise without a noise interval and for unknown extemporaneous noise. Signal enhancement includes: subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In the signal enhancement, a database of a signal model concerning the target signal, expressing a given feature by a given statistical model, is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.
Type: Grant
Filed: February 28, 2005
Date of Patent: May 12, 2009
Assignee: International Business Machines Corporation
Inventors: Tetsuya Takiguchi, Masafumi Nishimura
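The spectral-subtraction core can be sketched as subtracting a filtered reference magnitude spectrum from the input, with a floor to avoid negative magnitudes. The NLMS-style weight update below is a toy stand-in; the patent instead controls the coefficients by the likelihood of a statistical model of the target signal.

```python
def spectral_subtract(x_mag, ref_mag, w, floor=0.05):
    """Subtract a filtered reference magnitude spectrum from the input
    magnitude spectrum, flooring each bin so no magnitude goes negative."""
    return [max(x - w_i * r, floor * x)
            for x, r, w_i in zip(x_mag, ref_mag, w)]

def update_weights(w, x_mag, ref_mag, mu=0.5):
    """Toy coefficient control: nudge each filter weight to shrink the
    residual in its bin (a likelihood-based control, as in the patent,
    would replace this rule)."""
    return [w_i + mu * (x - w_i * r) * r / (r * r + 1e-9)
            for w_i, x, r in zip(w, x_mag, ref_mag)]

enhanced = spectral_subtract([1.0, 2.0], [0.5, 3.0], [1.0, 1.0])
w_next = update_weights([0.0], [1.0], [2.0])
```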
-
Patent number: 7529666
Abstract: In connection with speech recognition, the design of a linear transformation θ ∈ R^(p×n), of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p such as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.
Type: Grant
Filed: October 30, 2000
Date of Patent: May 5, 2009
Assignee: International Business Machines Corporation
Inventors: Mukund Padmanabhan, George A. Saon
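The two ingredients, the projection y = θx and the Bhattacharyya bound on the Bayes error, can be made concrete for the 1-D Gaussian case. The matrix and class parameters below are toy values; the patent's optimization over θ is not shown.

```python
import math

def project(theta, x):
    """y = theta @ x : project an n-dim feature onto a p-dim subspace."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in theta]

def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between two 1-D Gaussian class densities
    (mean, variance); the union bound on the Bayes error shrinks as
    exp(-distance), so larger distance means a tighter error bound."""
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))

theta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy rank-2 projection, p=2, n=3
y = project(theta, [1.0, 2.0, 3.0])
d = bhattacharyya(0.0, 1.0, 2.0, 1.0)        # separation of the two classes
```

Minimizing the union Bhattacharyya bound amounts to choosing θ so that distances like `d`, computed on the projected features, are as large as possible across class pairs.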
-
Patent number: 7529665
Abstract: A two-stage utterance verification device and a method thereof are provided. The two-stage utterance verification method includes performing a first utterance verification function, based on an SVM pattern classification method, by using feature data inputted from a search block of a speech recognizer, and performing a second utterance verification function, based on a CART pattern classification method, by using heterogeneous feature data including meta data extracted from a preprocessing module, intermediate results from function blocks of the speech recognizer, and the result of the first utterance verification function. Therefore, the two-stage utterance verification device and the method thereof provide a high-quality speech recognition service to the user.
Type: Grant
Filed: April 1, 2005
Date of Patent: May 5, 2009
Assignee: Electronics and Telecommunications Research Institute
Inventors: Sanghun Kim, YoungJik Lee
-
Patent number: 7529668
Abstract: A system and method for implementing a refined dictionary for speech recognition includes a database analyzer that initially identifies first vocabulary words that are present in a training database and second vocabulary words that are not present in the training database. A relevance module then performs refinement procedures upon the first vocabulary words to produce refined short word pronunciations and refined long word pronunciations that are added to a refined dictionary. A consensus module compares the second pronunciations with calculated plurality pronunciations to identify final consensus pronunciations that are then included in the refined dictionary.
Type: Grant
Filed: August 3, 2004
Date of Patent: May 5, 2009
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Gustavo Abrego, Lex S. Olorenshaw
-
Publication number: 20090112585
Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing, for a speech recognizer, a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
Type: Application
Filed: December 29, 2008
Publication date: April 30, 2009
Applicant: AT&T Corp.
Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
-
Patent number: 7509256
Abstract: It is intended to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.
Type: Grant
Filed: March 29, 2005
Date of Patent: March 24, 2009
Assignee: Sony Corporation
Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
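The point-to-distribution idea can be illustrated in one dimension: instead of classifying a single feature value, the input is summarized by a (mean, variance) feature distribution parameter and compared against class distributions. The Gaussian summary and the KL-divergence decision rule below are illustrative choices, not the patent's specific mapping.

```python
import math

def to_distribution(samples):
    """Map an observation (here, several noisy 1-D measurements of one
    point in observation space) to a feature distribution parameter
    (mean, variance); the variance is floored to stay positive."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, max(v, 1e-6)

def kl_gauss(p, q):
    """KL divergence between two 1-D Gaussians given as (mean, var)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)

classes = {"a": (0.0, 1.0), "b": (5.0, 1.0)}    # toy class distributions
obs = to_distribution([4.8, 5.1, 5.3])
label = min(classes, key=lambda c: kl_gauss(obs, classes[c]))
```

Carrying the spread of the input through to the classifier, rather than collapsing it to a point, is what the abstract credits with the improved recognition rate.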
-
Patent number: 7505897
Abstract: The subject matter includes systems, engines, and methods for generalizing a class of Lempel-Ziv algorithms for lossy compression of multimedia. One implementation of the subject matter compresses audio signals. Because music, especially electronically generated music, has a substantial level of repetitiveness within a single audio clip, the basic Lempel-Ziv compression technique can be generalized to support representing a single window of an audio signal using a linear combination of filtered past windows. Exemplary similarity searches and filtering strategies for finding the past windows are described.
Type: Grant
Filed: January 27, 2005
Date of Patent: March 17, 2009
Assignee: Microsoft Corporation
Inventors: Darko Kirovski, Zeph Landau
-
Publication number: 20090070110
Abstract: An MMR system for newspaper publishing comprises a plurality of mobile devices, an MMR gateway, an MMR matching unit and an MMR publisher. The MMR matching unit receives an image query from the MMR gateway and sends it to one or more of the recognition units to identify a result including a document, the page and the location on the page. The MMR matching unit also includes a result combiner coupled to each of the recognition units to receive recognition results. The result combiner produces a list of the most likely results and associated confidence scores. This list of results is sent by the result combiner back to the MMR gateway for presentation on the mobile device. The result combiner uses the quality predictor as an input in deciding which results are best. The present invention also includes a number of novel methods, including a method for generating the list of best results.
Type: Application
Filed: September 15, 2008
Publication date: March 12, 2009
Inventors: Berna Erol, Jonathan J. Hull, Jorge Moraleda
-
Publication number: 20090063144
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a communications device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Application
Filed: November 4, 2008
Publication date: March 5, 2009
Applicant: AT&T Corp.
Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Publication number: 20090048835
Abstract: A feature extracting apparatus includes: a spectrum calculator that calculates a logarithmic frequency spectrum including frequency components obtained from an input speech signal at regular intervals on a logarithmic frequency scale of a frame; a function calculator that calculates a cross-correlation function between a logarithmic frequency spectrum of a time and a logarithmic frequency spectrum of one or plural times included in a certain temporal width before and after the time, from a sequence of the logarithmic frequency spectra calculated at each time; and a feature extractor that extracts a set of the cross-correlation functions as a local and relative fundamental-frequency pattern feature at the frame.
Type: Application
Filed: March 4, 2008
Publication date: February 19, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Takashi Masuko
-
Patent number: 7493257
Abstract: To handle portions of a recognized sentence having an error, a user is questioned about the contents associated with those portions, and a result is obtained according to the user's answer. A speech recognition unit extracts a speech feature from a speech signal inputted by the user and finds the phoneme nearest to the speech feature to recognize a word. A recognition error determination unit finds a sentence confidence based on the confidence of the recognized word, examines the semantic structure of the recognized sentence, and determines whether or not an error exists in the recognized sentence according to a predetermined criterion based on both the sentence confidence and the result of examining the semantic structure. A meta-dialogue generation unit generates a question asking the user for additional information based on the content of the portion where the error exists and the type of the error.
Type: Grant
Filed: August 5, 2004
Date of Patent: February 17, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jung-eun Kim, Jae-won Lee
-
Publication number: 20090030685
Abstract: Speech recorded by an audio capture facility of a navigation facility is processed by a speech recognition facility to generate results that are provided to the navigation facility. When information related to a navigation application running on the navigation facility is provided to the speech recognition facility, the results generated are based at least in part on the application-related information. The speech recognition facility uses an unstructured language model for generating results. The user of the navigation facility may optionally be allowed to edit the results being provided to the navigation facility. The speech recognition facility may also adapt speech recognition based on usage of the results.
Type: Application
Filed: August 1, 2008
Publication date: January 29, 2009
Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Yongdeng Chen
-
Publication number: 20090030683
Abstract: Disclosed are methods, systems, and computer-readable media for tracking dialog states in a spoken dialog system. The method comprises casting a plurality of dialog states, or particles, as a network describing the probability relationships between a plurality of variables; sampling a subset of the particles in the network; projecting each sampled particle into the future; assigning a weight to each sampled particle; and normalizing the assigned weights to yield a new estimated distribution over each variable's values, which is used in the spoken dialog system. Also disclosed is a method of tuning performance by adding or removing particles to or from the network.
Type: Application
Filed: July 26, 2007
Publication date: January 29, 2009
Applicant: AT&T Labs, Inc.
Inventor: Jason Williams
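The sample-project-weight-normalize loop described in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented system: the dialog states, transition model, and observation likelihood below are invented stand-ins.

```python
import random

# Rough sketch of particle-based dialog state tracking: sample particles,
# project each forward, weight by how well it explains an observation,
# then normalize to get a distribution over states.

def track_dialog_states(particles, observation, transition, likelihood):
    # Project each sampled particle one step into the future.
    projected = [transition(p) for p in particles]
    # Weight each particle by the observation likelihood.
    weights = [likelihood(p, observation) for p in projected]
    total = sum(weights)
    # Normalize the weights to yield an estimated distribution over states.
    dist = {}
    for p, w in zip(projected, weights):
        dist[p] = dist.get(p, 0.0) + w / total
    return dist

random.seed(0)
states = ["want_flight", "want_hotel"]
particles = [random.choice(states) for _ in range(100)]
dist = track_dialog_states(
    particles,
    observation="flight",
    transition=lambda s: s,                      # toy model: states persist
    likelihood=lambda s, o: 0.9 if o in s else 0.1,
)
print(dist)  # probability mass shifts toward the state matching the observation
```

Adding or removing particles, as the publication's tuning method describes, trades estimation accuracy against per-turn computation.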
-
Publication number: 20090030684
Abstract: A method and system for entering information into a software application resident on a mobile communication facility is provided. The method and system may include recording speech presented by a user using a capture facility resident on the mobile communication facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, transmitting the results to the mobile communication facility, loading the results into the software application, and simultaneously displaying the results as a set of words and as a set of application results based on those words.
Type: Application
Filed: August 1, 2008
Publication date: January 29, 2009
Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke
-
Publication number: 20090024390
Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
Type: Application
Filed: May 2, 2008
Publication date: January 22, 2009
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Neeraj Deshmukh, Puming Zhan
-
Patent number: 7480615
Abstract: A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two, but fewer than all, of the frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames, and a separate posterior probability parameter is determined for each frame in the shifted window. This method closely approximates a more rigorous solution but reduces the computational cost by two to three orders of magnitude. Further, a method of determining the optimal discrete state sequence in the switching state space model is provided that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.
Type: Grant
Filed: January 20, 2004
Date of Patent: January 20, 2009
Assignee: Microsoft Corporation
Inventors: Hagai Attias, Li Deng, Leo Lee
-
Patent number: 7478045
Abstract: In a method for characterizing a signal representing audio content, a measure of the tonality of the signal is determined, and a statement about the audio content is then made on the basis of that measure. The measure of tonality is derived from a quotient whose numerator is the mean of the summed values of the spectral components of the signal exponentiated with a first power and whose denominator is the mean of the summed values of the spectral components exponentiated with a second power, the first and second powers differing from each other. The measure of tonality is robust against signal distortion, due for example to MP3 coding, and correlates highly with the content of the analyzed signal.
Type: Grant
Filed: July 15, 2002
Date of Patent: January 13, 2009
Assignee: M2ANY GmbH
Inventors: Eric Allamanche, Jürgen Herre, Oliver Hellmuth, Thorsten Kastner
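The quotient this abstract describes can be computed directly. The specific powers below (0.5 and 2) are an assumption for illustration; the patent only requires that the two powers differ.

```python
# Sketch of a tonality measure as the quotient of means of spectral
# magnitudes raised to two different powers.

def tonality_measure(spectrum, p1=0.5, p2=2.0):
    """Quotient: mean(|X|^p1) / mean(|X|^p2) over the spectral components."""
    n = len(spectrum)
    num = sum(abs(x) ** p1 for x in spectrum) / n
    den = sum(abs(x) ** p2 for x in spectrum) / n
    return num / den

# A flat (noise-like) spectrum and a peaky (tonal) spectrum yield clearly
# different values, which is what makes the measure usable for content
# analysis.
flat = [1.0] * 8
peaky = [8.0] + [0.001] * 7
print(tonality_measure(flat), tonality_measure(peaky))
```

Ratios of differently exponentiated spectral means are in the same family as the spectral flatness measure, which likewise separates tone-like from noise-like spectra.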
-
Patent number: 7475012
Abstract: Robust signal detection against various types of background noise is implemented. A signal detection apparatus extracts the feature amount of an input signal sequence and the feature amount of a noise component contained in the signal sequence. A first likelihood, indicating the probability that the signal sequence is detected, and a second likelihood, indicating the probability that the noise component is detected, are then calculated on the basis of a predetermined signal-to-noise ratio and the extracted feature amount of the signal sequence. A likelihood ratio between the first and second likelihoods is calculated, and detection of the signal sequence is determined on the basis of that ratio.
Type: Grant
Filed: December 9, 2004
Date of Patent: January 6, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Philip Garner, Toshiaki Fukada, Yasuhiro Komori
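A likelihood-ratio detector of this general shape can be sketched with simple Gaussian models. The zero-mean Gaussian likelihoods and the way the SNR sets the signal variance are assumptions for illustration; the patent does not fix a particular likelihood form.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def detect_signal(frame_feature, noise_var, snr_db, threshold=0.0):
    """Return True when the log likelihood ratio favours 'signal present'."""
    # Signal-plus-noise variance implied by the assumed signal-to-noise ratio.
    signal_var = noise_var * (1.0 + 10 ** (snr_db / 10.0))
    ll_signal = gaussian_loglik(frame_feature, 0.0, signal_var)
    ll_noise = gaussian_loglik(frame_feature, 0.0, noise_var)
    # Detection is determined by comparing the ratio to a threshold.
    return (ll_signal - ll_noise) > threshold

# A high-amplitude frame is attributed to signal, a low one to noise.
print(detect_signal(5.0, noise_var=1.0, snr_db=10))   # -> True
print(detect_signal(0.1, noise_var=1.0, snr_db=10))   # -> False
```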
-
Patent number: 7472063
Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.
Type: Grant
Filed: December 19, 2002
Date of Patent: December 30, 2008
Assignee: Intel Corporation
Inventors: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao
-
Publication number: 20080306738
Abstract: Voice processing methods and systems are provided. An utterance is received and compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. The respective voice units are then scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated from training utterances, corresponding to at least one training sentence in a phonetic-balanced sentence set, produced by a plurality of learners and at least one real teacher, together with the scores that the real teacher provided for the respective voice units of the learners' training utterances in the first scoring item.
Type: Application
Filed: June 6, 2008
Publication date: December 11, 2008
Applicant: NATIONAL TAIWAN UNIVERSITY
Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
-
Patent number: 7464031
Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
Type: Grant
Filed: November 28, 2003
Date of Patent: December 9, 2008
Assignee: International Business Machines Corporation
Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
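The core computation of a log-linear posterior, including its tolerance of missing features, can be shown with a toy example. The classes, feature names, and weights here are hypothetical.

```python
import math

# Minimal log-linear posterior: P(class | features) proportional to
# exp(w . f). Features absent from the input simply contribute nothing,
# so not every training feature needs to appear at recognition time.

def loglinear_posterior(features, weights_per_class):
    """Posterior over classes given a sparse feature dict."""
    scores = {}
    for cls, weights in weights_per_class.items():
        # Sparse dot product over only the features actually observed.
        scores[cls] = sum(weights.get(f, 0.0) * v for f, v in features.items())
    z = sum(math.exp(s) for s in scores.values())  # partition function
    return {cls: math.exp(s) / z for cls, s in scores.items()}

weights = {
    "yes": {"energy": 1.0, "pitch": 0.5},
    "no":  {"energy": -1.0, "pitch": 0.2},
}
# Only "energy" is observed; "pitch" is missing, which is handled naturally.
post = loglinear_posterior({"energy": 2.0}, weights)
print(post)
```

Because the score is a sum over whatever features are present, overlapping and statistically dependent features pose no structural problem, although they do affect how the weights should be trained.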
-
Publication number: 20080300875
Abstract: A speech recognition method and system, the method comprising the steps of providing a speech model that includes at least a portion of a state of Gaussians, clustering the Gaussians of the speech model to give N clusters of Gaussians, wherein N is an integer, and utilizing the Gaussians in recognizing an utterance.
Type: Application
Filed: June 4, 2008
Publication date: December 4, 2008
Inventors: Kaisheng Yao, Yu Tsao
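Grouping a model's Gaussians into N clusters can be illustrated with a plain k-means over the Gaussian means. This is only a sketch: a real system would more likely use a distribution-aware distance such as KL divergence rather than the squared distance between means.

```python
# Toy clustering of scalar Gaussian means into N clusters via k-means.

def cluster_gaussians(means, n_clusters, iters=10):
    centers = means[:n_clusters]  # naive initialization from the first means
    for _ in range(iters):
        # Assign each Gaussian mean to its nearest cluster center.
        assign = [min(range(n_clusters), key=lambda c: (m - centers[c]) ** 2)
                  for m in means]
        # Re-estimate each center as the average of its members.
        for c in range(n_clusters):
            members = [m for m, a in zip(means, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assign

# Two well-separated groups of Gaussian means end up in two clusters.
means = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
centers, assign = cluster_gaussians(means, 2)
print(centers, assign)
```

Clustering like this is typically used to cheapen likelihood evaluation: only the Gaussians in the clusters nearest an observation need full evaluation.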
-
Publication number: 20080270129
Abstract: A method for automatically providing a hypothesis of a linguistic formulation that is uttered by users of a voice service based on an automatic speech recognition system and that is outside a recognition domain of the automatic speech recognition system. The method includes providing a constrained and an unconstrained speech recognition from an input speech signal, identifying a part of the constrained speech recognition outside the recognition domain, identifying a part of the unconstrained speech recognition corresponding to the identified part of the constrained speech recognition, and providing the linguistic formulation hypothesis based on the identified part of the unconstrained speech recognition.
Type: Application
Filed: February 17, 2005
Publication date: October 30, 2008
Applicant: Loquendo S.p.A.
Inventors: Daniele Colibro, Claudio Vair, Luciano Fissore, Cosmin Popovici
-
Publication number: 20080270130
Abstract: Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
Type: Application
Filed: July 1, 2008
Publication date: October 30, 2008
Applicant: AT&T Corp.
Inventors: Tirso M. Alonso, Ilana Bromberg, Dilek Z. Hakkani-Tur, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Lawrence Lyon Rose, Daniel Leon Stern, Gokhan Tur, James M. Wilson
-
Publication number: 20080249773
Abstract: A method and system for automatically generating a scoring model for scoring a speech sample are disclosed. One or more training speech samples are received in response to a prompt. One or more speech features are determined for each of the training speech samples. A scoring model is then generated based on the speech features. At least one of the training speech samples may be a high entropy speech sample. An evaluation speech sample is received and a score is assigned to the evaluation speech sample using the scoring model. The evaluation speech sample may be a high entropy speech sample.
Type: Application
Filed: June 16, 2008
Publication date: October 9, 2008
Inventors: Isaac Bejar, Klaus Zechner
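The generate-then-apply flow of a scoring model can be sketched with the simplest possible model: a one-feature linear fit by ordinary least squares. The feature values and target scores below are made up, and the abstract does not specify what form the real scoring model takes.

```python
# Fit score = slope * feature + intercept from training samples,
# then apply the fitted model to a new evaluation sample.

def fit_scoring_model(features, scores):
    """Closed-form ordinary least squares for one feature."""
    n = len(features)
    mx = sum(features) / n
    my = sum(scores) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(features, scores))
             / sum((x - mx) ** 2 for x in features))
    intercept = my - slope * mx
    return slope, intercept

def score_sample(model, feature):
    slope, intercept = model
    return slope * feature + intercept

# Training samples: (hypothetical speech-rate feature, assigned score).
model = fit_scoring_model([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(score_sample(model, 2.5))  # -> 5.0
```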
-
Publication number: 20080235015
Abstract: A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.
Type: Application
Filed: June 2, 2008
Publication date: September 25, 2008
Inventors: Stephen Mingyu Chu, Vaibhava Goel, Etienne Marcheret, Gerasimos Potamianos