Update Patterns Patents (Class 704/244)
-
Patent number: 7957968
Abstract: The invention includes a computer-based system or method for automatically generating a grammar associated with a first task comprising the steps of: receiving first data representing the first task based on responses received from a distributed network; automatically tagging the first data into parts of speech to form first tagged data; identifying filler words and core words from said first tagged data; modeling sentence structure based upon said first tagged data using a first set of rules; identifying synonyms of said core words; and creating the grammar for the first task using said modeled sentence structure, first tagged data and said synonyms.
Type: Grant
Filed: December 12, 2006
Date of Patent: June 7, 2011
Assignee: Honda Motor Co., Ltd.
Inventors: Rakesh Gupta, Ken Hennacy
-
Publication number: 20110119059
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
Type: Application
Filed: November 13, 2009
Publication date: May 19, 2011
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Andrej LJOLJE, Bernard S. RENGER, Steven Neil TISCHER
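The model-selection fallback described in this abstract (user-specific supervised model, then unsupervised model, then generic model) can be sketched as follows. The dictionary-based model stores and the name `select_model` are illustrative assumptions, not the publication's actual API.

```python
def select_model(user_id, supervised, unsupervised, generic="generic"):
    """Return the best available speech model for a user.

    supervised and unsupervised map user_id -> model name; the generic
    model is the last resort when no user-specific model exists.
    """
    if user_id in supervised:        # user-specific supervised model first
        return supervised[user_id]
    if user_id in unsupervised:      # fall back to the unsupervised model
        return unsupervised[user_id]
    return generic                   # finally, the generic model
```

Recognition then proceeds with whichever model the cascade returned.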
-
Patent number: 7933777
Abstract: A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.Type: Grant
Filed: August 30, 2009
Date of Patent: April 26, 2011
Assignee: Multimodal Technologies, Inc.
Inventor: Detlef Koll
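A minimal sketch of the arbitration step in such a hybrid system. Choosing by confidence score is an assumption made for illustration; the patent does not commit to one specific arbitration rule.

```python
def arbitrate(client_result, server_result):
    """Pick one recognition output from client- and server-side results.

    Each result is a (text, confidence) pair or None if that engine
    produced nothing; the more confident available result wins.
    """
    if client_result is None:
        return server_result
    if server_result is None:
        return client_result
    return max(client_result, server_result, key=lambda r: r[1])
```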
-
Patent number: 7933774
Abstract: A system and method is provided for rapidly generating a new spoken dialog application. In one embodiment, a user experience person labels the transcribed data (e.g., 3000 utterances) using a set of interactive tools. The labeled data is then stored in a processed data database. During the labeling process, the user experience person not only groups utterances in various call type categories, but also flags (e.g., 100-200) specific utterances as positive and negative examples for use in an annotation guide. The labeled data in the processed data database can also be used to generate an initial natural language understanding (NLU) model.
Type: Grant
Filed: March 18, 2004
Date of Patent: April 26, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Lee Begeja, Mazin G. Rahim, Allen Louis Gorin, Behzad Shahraray, David Crawford Gibbon, Zhu Liu, Bernard S. Renger, Patrick Guy Haffner, Harris Drucker, Steven Hart Lewis
-
Patent number: 7925505
Abstract: Architecture is disclosed herewith for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application. The architecture allows assignment of an improved weighting value to each term or phrase to reduce empirical error. Empirical errors are minimized whether a user provides correction results or not based on criteria for discriminatively adapting the user language model (LM)/context-free grammar (CFG) to the target. Moreover, algorithms are provided for the training and adaptation processes of LM/CFG parameters for criteria optimization.
Type: Grant
Filed: April 10, 2007
Date of Patent: April 12, 2011
Assignee: Microsoft Corporation
Inventor: Jian Wu
-
Publication number: 20110082697
Abstract: A method is described for correcting and improving the functioning of certain devices for the diagnosis and treatment of speech that dynamically measure the functioning of the velum in the control of nasality during speech. The correction method uses an estimate of the vowel frequency spectrum to greatly reduce the variation of nasalance with the vowel being spoken, so as to result in a corrected value of nasalance that reflects with greater accuracy the degree of velar opening. Correction is also described for reducing the effect on nasalance values of energy from the oral and nasal channels crossing over into the other channel because of imperfect acoustic separation.
Type: Application
Filed: October 6, 2009
Publication date: April 7, 2011
Applicant: Rothenberg Enterprises
Inventor: Martin ROTHENBERG
-
Publication number: 20110077942
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and/or a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.
Type: Application
Filed: September 30, 2009
Publication date: March 31, 2011
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Andrej LJOLJE, Diamantino Antonio Caseiro
-
Patent number: 7912828
Abstract: A computer-based method for identifying patterns in computer text using structures defining types of patterns which are to be identified, wherein a structure comprises one or more definition items, the method comprising assigning a weighting to each structure and each definition item; searching the computer text for a pattern to be identified on the basis of a particular structure, a pattern being provisionally identified if it matches the definition given by said particular structure; in a provisionally identified pattern, determining those of the definition items making up said particular structure that have been identified in the provisionally identified pattern; combining the weightings of the determined definition items and optionally, the weighting of the particular structure, to a single quantity; assessing whether the single quantity fulfils a given condition; depending on the result of said assessment, rejecting or confirming the provisionally identified pattern.
Type: Grant
Filed: February 23, 2007
Date of Patent: March 22, 2011
Assignee: Apple Inc.
Inventors: Olivier Bonnet, Frédéric De Jaeger, Toby Paterson
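The confirmation step in this abstract (combine the weights of the matched definition items, optionally the structure's own weight, and test the result against a condition) can be sketched as below. Summing the weights and comparing against a fixed threshold are illustrative assumptions; the patent leaves the combining rule and condition open.

```python
def confirm_pattern(matched_items, item_weights, structure_weight=0.0,
                    threshold=1.0):
    """Confirm or reject a provisionally identified pattern.

    matched_items: definition items found in the provisional match.
    item_weights: mapping from definition item to its weighting.
    """
    score = structure_weight + sum(item_weights[i] for i in matched_items)
    return score >= threshold
```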
-
Patent number: 7904298
Abstract: This disclosure describes a practical system/method for predicting spoken text (a spoken word or a spoken sentence/phrase) given that text's partial spelling (example, initial characters forming the spelling of a word/sentence). The partial spelling may be given using "Speech" or may be inputted using the keyboard/keypad or may be obtained using other input methods. The disclosed system is an alternative method for inputting text into devices; the method is faster (especially for long words or phrases) compared to existing predictive-text-input and/or word-completion methods.
Type: Grant
Filed: November 16, 2007
Date of Patent: March 8, 2011
Inventor: Ashwin P. Rao
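The core idea, predicting a word from its initial characters, reduces to prefix matching over a vocabulary. A frequency-ranked vocabulary is an assumption made for this sketch; the patent's actual prediction model is not specified here.

```python
def predict(prefix, vocabulary):
    """Return vocabulary words starting with prefix, most frequent first.

    vocabulary maps word -> usage count (assumed available from, e.g.,
    the user's input history).
    """
    candidates = [w for w in vocabulary if w.startswith(prefix)]
    return sorted(candidates, key=lambda w: -vocabulary[w])
```

For example, a spoken or typed partial spelling "rec" would rank "recognize" above "record" if the former is more frequent.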
-
Patent number: 7899826
Abstract: Determining a semantic relationship is disclosed. Source content is received. Cluster analysis is performed at least in part by using at least a portion of the source content. At least a portion of a result of the cluster analysis is used to determine the semantic relationship between two or more content elements comprising the source content.
Type: Grant
Filed: August 31, 2009
Date of Patent: March 1, 2011
Assignee: Apple Inc.
Inventors: Philip Andrew Mansfield, Michael Robert Levy, Yuri Khramov, Darryl Will Fuller
-
Patent number: 7890328
Abstract: Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database and if the received speech matches data in the precompiled grammar then returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, then dynamically compiling a new grammar based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names.
Type: Grant
Filed: September 7, 2006
Date of Patent: February 15, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Harry Blanchard, Steven Lewis, Shankarnarayan Sivaprasad, Lan Zhang
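The two-tier lookup described above (match against the precompiled grammar first; on a miss, compile a grammar only from records added after precompilation) can be sketched as follows. The data structures and timestamp-based "new data" test are illustrative assumptions.

```python
def recognize(speech, precompiled, database, compiled_at):
    """Match speech against a precompiled grammar, then new entries only.

    precompiled: set of names in the precompiled grammar.
    database: list of (name, added_at) records, e.g. a name directory.
    compiled_at: timestamp of the precompiled grammar.
    """
    if speech in precompiled:
        return speech
    # Dynamically compile only entries newer than the precompiled grammar.
    new_grammar = {name for name, added in database if added > compiled_at}
    return speech if speech in new_grammar else None
```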
-
Patent number: 7885812
Abstract: Parameters for a feature extractor and acoustic model of a speech recognition module are trained. An objective function is utilized to determine values for the feature extractor parameters and the acoustic model parameters.
Type: Grant
Filed: November 15, 2006
Date of Patent: February 8, 2011
Assignee: Microsoft Corporation
Inventors: Alejandro Acero, James G. Droppo, Milind V. Mahajan
-
Patent number: 7881932
Abstract: The present invention extends the VoiceXML language model to natively support voice enrolled grammars. Specifically, three VoiceXML tags can be added to the language model to add, modify, and delete acoustically provided phrases to voice enrolled grammars. Once created, the voice enrolled grammars can be used in normal speaker dependent speech recognition operations. That is, the voice enrolled grammars can be referenced and utilized just like text enrolled grammars can be referenced and utilized. For example, using the present invention, voice enrolled grammars can be referenced by standard text-based Speech Recognition Grammar Specification (SRGS) grammars to create more complex, usable grammars.
Type: Grant
Filed: October 2, 2006
Date of Patent: February 1, 2011
Assignee: Nuance Communications, Inc.
Inventor: Brien H. Muschett
-
Patent number: 7877258
Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model including receiving a collection of n-grams from the corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
Type: Grant
Filed: March 29, 2007
Date of Patent: January 25, 2011
Assignee: Google Inc.
Inventors: Ciprian Chelba, Thorsten Brants
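A trie over n-grams, as mentioned in this abstract, shares prefixes between n-grams so the collection is stored compactly. The minimal structure below only illustrates the trie idea; the patent's actual encoding is more compact than plain dictionaries.

```python
class NgramTrie:
    """A trie keyed by words; shared prefixes are stored once."""

    def __init__(self):
        self.children = {}   # word -> NgramTrie
        self.prob = None     # probability of the n-gram ending at this node

    def insert(self, ngram, prob):
        node = self
        for word in ngram:
            node = node.children.setdefault(word, NgramTrie())
        node.prob = prob

    def lookup(self, ngram):
        node = self
        for word in ngram:
            if word not in node.children:
                return None
            node = node.children[word]
        return node.prob
```

Inserting ("the", "cat") and ("the", "dog") stores the shared prefix "the" only once.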
-
Publication number: 20100318355
Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
Type: Application
Filed: June 10, 2009
Publication date: December 16, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
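The segment-selection step above (keep only runs of at least Q contiguous matching aligned words as trusted training material) can be sketched as below. The time alignment and silence insertion are omitted; the function assumes the two transcriptions are already aligned word-by-word, which is a simplification of the publication's time-based alignment.

```python
def matching_segments(original, decoded, q):
    """Return (start, end) index pairs of runs of >= q matching words.

    original and decoded are equal-length, already-aligned word lists;
    end is exclusive.
    """
    segments, start = [], None
    for i, (a, b) in enumerate(zip(original, decoded)):
        if a == b:
            if start is None:
                start = i           # a matching run begins
        else:
            if start is not None and i - start >= q:
                segments.append((start, i))
            start = None            # run broken by a mismatch
    if start is not None and len(original) - start >= q:
        segments.append((start, len(original)))
    return segments
```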
-
Patent number: 7853448
Abstract: An electronic instrument includes: a display control unit for displaying a control content corresponding to the command information based on the result of the speech recognition; an instruction unit for instructing that a control for the control content displayed by the display control unit is cancelled; a control unit for performing the control based on the command information based on the result of the speech recognition after a predetermined standby time elapses since the control content corresponding to the command information based on the result of the speech recognition starts to be displayed by the display control unit when the instruction unit does not instruct that the control for the control content is cancelled within the predetermined standby time, and for canceling the control based on the command information based on the result of the speech recognition when the instruction unit instructs that the control is cancelled within the predetermined standby time.
Type: Grant
Filed: April 16, 2007
Date of Patent: December 14, 2010
Assignee: Funai Electric Co., Ltd.
Inventors: Shusuke Narita, Susumu Tokoshima
-
Publication number: 20100312556
Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions.
Type: Application
Filed: June 9, 2009
Publication date: December 9, 2010
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Andrej LJOLJE, Alistair D. CONKIE, Ann K. SYRDAL
-
Patent number: 7844457
Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.
Type: Grant
Filed: February 20, 2007
Date of Patent: November 30, 2010
Assignee: Microsoft Corporation
Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu
-
Patent number: 7823144
Abstract: A method, apparatus and computer program product for comparing two computer program codes is disclosed. For each code, a stream of lexemes is generated for the program text of each code. The streams are concatenated in the same order as the program text. The two concatenated streams of lexemes are compared on a language-type by language-type basis to identify lexemes present only in one stream. The comparison derives a set of edit operations including minimal text block moves needed to convert one program code into the other program code.
Type: Grant
Filed: December 29, 2005
Date of Patent: October 26, 2010
Assignee: International Business Machines Corporation
Inventors: Donald P Pazel, Pradeep Varma
-
Patent number: 7813926
Abstract: A training system for a speech recognition application is disclosed. In embodiments described, the training system is used to train a classification model or language model. The classification model is trained using an adaptive language model generated by an iterative training process. In embodiments described, the training data is recognized by the speech recognition component and the recognized text is used to create the adaptive language model which is used for speech recognition in a following training iteration.
Type: Grant
Filed: March 16, 2006
Date of Patent: October 12, 2010
Assignee: Microsoft Corporation
Inventors: Ye-Yi Wang, John Sie Yuen Lee, Alex Acero
-
Publication number: 20100256978
Abstract: A method for performing speech recognition relating to an object for the purpose of affecting automatic processing of the object by a processing system. The object carries information with at least a character string of processing information. The character string spoken by an operator is processed by way of a speech recognition procedure to generate a first result. Based on the need for more information about an element of the first result, additional processing data is requested. An operator's response generates a second result. The first result is then modified to achieve consistency with the operator's response.
Type: Application
Filed: April 6, 2010
Publication date: October 7, 2010
Applicant: SIEMENS AKTIENGESELLSCHAFT
Inventor: Walter Rosenbaum
-
Patent number: 7797156
Abstract: Presented herein are systems and methods for generating an adaptive noise codebook for use with electronic speech systems. The noise codebook includes a plurality of entries which may be updated based on environmental noise sounds. The speech system includes a speech codebook and the adaptive noise codebook. The system identifies speech sounds in an audio signal using the speech and noise codebooks.
Type: Grant
Filed: February 15, 2006
Date of Patent: September 14, 2010
Assignee: Raytheon BBN Technologies Corp.
Inventors: Robert David Preuss, Darren Ross Fabbri, Daniel Ramsay Cruthirds
-
Publication number: 20100204988
Abstract: A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.
Type: Application
Filed: April 20, 2010
Publication date: August 12, 2010
Inventors: Haitian XU, Kean Kheong Chin
-
Patent number: 7769588
Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.
Type: Grant
Filed: August 20, 2008
Date of Patent: August 3, 2010
Assignee: Sony Deutschland GmbH
Inventors: Ralf Kompe, Thomas Kemp
-
Publication number: 20100191530
Abstract: A speech understanding apparatus includes a speech recognition unit which performs speech recognition of an utterance using multiple language models, and outputs multiple speech recognition results obtained by the speech recognition, a language understanding unit which uses multiple language understanding models to perform language understanding for each of the multiple speech recognition results output from the speech recognition unit, and outputs multiple speech understanding results obtained from the language understanding, and an integrating unit which calculates, based on values representing features of the speech understanding results, utterance batch confidences that numerically express accuracy of the speech understanding results for each of the multiple speech understanding results output from the language understanding unit, and selects one of the speech understanding results with a highest utterance batch confidence among the calculated utterance batch confidences.
Type: Application
Filed: January 22, 2010
Publication date: July 29, 2010
Applicant: HONDA MOTOR CO., LTD.
Inventors: Mikio NAKANO, Masaki KATSUMARU, Kotaro FUNAKOSHI, Hiroshi OKUNO
-
Patent number: 7761302
Abstract: A method for generating output data identifying a class of a predetermined plurality of classes. The method comprises receiving data representing an acoustic signal; determining an amplitude of at least a first predetermined frequency component of said acoustic signal; and comparing the or each amplitude with a respective primary threshold; and generating output data identifying one of said classes to which said acoustic signal should be allocated, based upon said comparison.
Type: Grant
Filed: June 7, 2005
Date of Patent: July 20, 2010
Assignee: South Manchester University Hospitals NHS Trust
Inventors: Ashley Arthur Woodcock, Jaclyn Ann Smith, Kevin McGuinness
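The measurement and comparison steps above can be sketched with a direct DFT projection onto each predetermined frequency. The two-class decision rule (all thresholds exceeded or not) and the class names are assumptions for illustration; the patent only requires classification based on the threshold comparison.

```python
import math

def component_amplitude(signal, sample_rate, freq):
    """Amplitude of one frequency component via a direct DFT projection."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

def classify(signal, sample_rate, freqs, thresholds):
    """Allocate the signal to a class by comparing each predetermined
    frequency component's amplitude with its respective threshold."""
    amps = [component_amplitude(signal, sample_rate, f) for f in freqs]
    exceeded = all(a > t for a, t in zip(amps, thresholds))
    return "class_a" if exceeded else "class_b"
```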
-
Publication number: 20100179812
Abstract: Provided are an apparatus and method for recognizing voice commands, the apparatus including: a voice command recognition unit which recognizes an input voice command; a voice command recognition learning unit which learns a recognition-targeted voice command; and a controller which controls the voice command recognition unit to recognize the recognition-targeted voice command from an input voice command, controls the voice command recognition learning unit to learn the input voice command if the voice command recognition is unsuccessful, and performs a particular operation corresponding to the recognized voice command if the voice command recognition is successful.
Type: Application
Filed: September 2, 2009
Publication date: July 15, 2010
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jong-hyuk Jang, Seung-kwon Park, Jong-ho Lea
-
Patent number: 7756708
Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
Type: Grant
Filed: April 3, 2006
Date of Patent: July 13, 2010
Assignee: Google Inc.
Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno
-
Publication number: 20100169094
Abstract: A speaker adaptation apparatus includes an acquiring unit configured to acquire an acoustic model including HMMs and decision trees for estimating what type of the phoneme or the word is included in a feature value used for speech recognition, the HMMs having a plurality of states on a phoneme-to-phoneme basis or a word-to-word basis, and the decision trees being configured to reply to questions relating to the feature value and output likelihoods in the respective states of the HMMs, and a speaker adaptation unit configured to adapt the decision trees to a speaker, the decision trees being adapted using speaker adaptation data vocalized by the speaker of an input speech.
Type: Application
Filed: September 17, 2009
Publication date: July 1, 2010
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masami Akamine, Jitendra Ajmera, Partha Lal
-
Publication number: 20100161332
Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.
Type: Application
Filed: March 8, 2010
Publication date: June 24, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Michael L. Seltzer, Alejandro Acero
-
Publication number: 20100161331
Abstract: In many application environments, it is desirable to provide voice access to tables on Internet pages, where the user asks a subject-related question in a natural language and receives an adequate answer from the table read out to him in a natural language. A method is disclosed for preparing information presented in a tabular form for a speech dialogue system so that the information of the table can be consulted in a user dialogue in a targeted manner.
Type: Application
Filed: October 25, 2006
Publication date: June 24, 2010
Applicant: Siemens Aktiengesellschaft
Inventors: Hans-Ulrich Block, Manfred Gehrke, Stefanie Schachchti
-
Publication number: 20100161327
Abstract: A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.
Type: Application
Filed: December 16, 2009
Publication date: June 24, 2010
Inventors: Nishant CHANDRA, Reiner Wilhelms-Tricarico, Rattima Nitisaroj, Brian Mottershead, Gary A. Marple, John B. Reichenbach
-
Patent number: 7742919
Abstract: The present invention provides various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. One embodiment of the invention relates to a method of correcting a database associated with the development of a text-to-speech (TTS) voice. The method comprises generating a pronunciation dictionary for use with a TTS voice, generating a TTS voice to a stage wherein it is prepared to be tested before being deployed, identifying mislabeled phonetic units associated with the TTS voice, for each identified mislabeled phonetic unit, linking to an entry within the pronunciation dictionary to correct the entry and deleting utterances and all associated data for unacceptable utterances.
Type: Grant
Filed: September 27, 2005
Date of Patent: June 22, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schulz, Beverly Gustafson, Louise Loney
-
Patent number: 7734466
Abstract: A method for reducing a computational complexity of an m-stage adaptive filter is provided by updating recursively forward and backward error prediction square terms for a first portion of a length of the adaptive filter, and keeping the updated forward and backward error prediction square terms constant for a second portion of the length of the adaptive filter.
Type: Grant
Filed: April 7, 2006
Date of Patent: June 8, 2010
Assignee: Motorola, Inc.
Inventors: David L. Barron, Kyle K. Iwai, James B. Piket
-
Patent number: 7725316
Abstract: A speech recognition adaptation method for a vehicle having a telematics unit with an embedded speech recognition system. Speech is received and pre-processed to generate acoustic feature vectors, and an adaptation parameter is applied to the acoustic feature vectors to yield transformed acoustic feature vectors. The transformed acoustic feature vectors are decoded and a hypothesis of the speech is selected, and the adaptation parameter is trained using acoustic feature vectors from the hypothesis. The method also includes one or more of the following steps: the speech is observed for a certain characteristic and the trained adaptation parameter is saved in accordance with the certain characteristic for use in transforming feature vectors of subsequent speech having the certain characteristic; use of the trained adaptation parameter persists from one vehicle ignition cycle to the next; and use of the trained adaptation parameter is ceased upon detection of a system fault.
Type: Grant
Filed: July 5, 2006
Date of Patent: May 25, 2010
Assignee: General Motors LLC
Inventors: Rathinavelu Chengalvarayan, John J Correia, Scott M Pennock
-
Patent number: 7720681
Abstract: Generally described, the present invention is directed toward generating, maintaining, updating, and applying digital voice profiles. Voice profiles may be generated for individuals. The voice profiles include information that is unique to each individual and which may be applied to digital representations of that individual's voice to improve the quality of a transmitted digital representation of that individual's voice. A voice profile may include, but is not limited to, basic information about the individual, and filter definitions relating to the individual's voice patterns, such as a frequency range and amplitude range. The voice profile may also include a speech definition that includes digital representations of the individual's unique speech patterns.
Type: Grant
Filed: March 23, 2006
Date of Patent: May 18, 2010
Assignee: Microsoft Corporation
Inventors: David Milstein, Kuansan Wang, Linda Criddle
-
Patent number: 7716049
Abstract: An apparatus for providing adaptive language model scaling includes an adaptive scaling element and an interface element. The adaptive scaling element is configured to receive input speech comprising a sequence of spoken words and to determine a plurality of candidate sequences of text words in which each of the candidate sequences has a corresponding sentence score representing a probability that a candidate sequence matches the sequence of spoken words. Each corresponding sentence score is calculated using an adaptive scaling factor. The interface element is configured to receive a user input selecting one of the candidate sequences. The adaptive scaling element is further configured to estimate an objective function based on the user input and to modify the adaptive scaling factor based on the estimated objective function.
Type: Grant
Filed: June 30, 2006
Date of Patent: May 11, 2010
Assignee: Nokia Corporation
Inventor: Jilei Tian
-
Publication number: 20100100380
Abstract: A system, method and computer-readable medium provide a multitask learning method for intent or call-type classification in a spoken language understanding system. Multitask learning aims at training tasks in parallel while using a shared representation. A computing device automatically re-uses the existing labeled data from various applications, which are similar but may have different call-types, intents or intent distributions to improve the performance. An automated intent mapping algorithm operates across applications. In one aspect, active learning is employed to selectively sample the data to be re-used.
Type: Application
Filed: December 28, 2009
Publication date: April 22, 2010
Applicant: AT&T Corp.
Inventor: Gokhan Tur
-
Patent number: 7702509
Abstract: Pronunciation for an input word is modeled by generating a set of candidate phoneme strings having pronunciations close to the input word in an orthographic space. Phoneme sub-strings in the set are selected as the pronunciation. In one aspect, a first closeness measure between phoneme strings for words chosen from a dictionary and contexts within the input word is used to determine the candidate phoneme strings. The words are chosen from the dictionary based on a second closeness measure between a representation of the input word in the orthographic space and orthographic anchors corresponding to the words in the dictionary. In another aspect, the phoneme sub-strings are selected by aligning the candidate phoneme strings on common phoneme sub-strings to produce an occurrence count, which is used to choose the phoneme sub-strings for the pronunciation.
Type: Grant
Filed: November 21, 2006
Date of Patent: April 20, 2010
Assignee: Apple Inc.
Inventor: Jerome R. Bellegarda
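For illustration only, the "second closeness measure" between an input word and dictionary entries in orthographic space can be stood in for by character n-gram overlap; the patent's orthographic anchors are a different construction, and the function names here are invented:

```python
def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, e.g. '#na', 'nat', ..."""
    word = f"#{word}#"
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def orthographic_closeness(a, b, n=3):
    """Jaccard overlap of character n-grams: a simple stand-in for a
    closeness measure in orthographic space."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

def nearest_dictionary_words(word, dictionary, k=3):
    """The k dictionary words orthographically closest to the input word,
    whose phoneme strings would seed the candidate set."""
    return sorted(dictionary, key=lambda w: -orthographic_closeness(word, w))[:k]
```

The phoneme strings of the returned neighbors would then be aligned and voted over to pick the output pronunciation.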
-
Publication number: 20100094629
Abstract: A weighting factor learning system includes an audio recognition section that recognizes learning audio data and outputs the recognition result; a weighting factor updating section that updates a weighting factor applied to a score obtained from an acoustic model and a language model so that the difference between a correct-answer score, calculated using a correct-answer text of the learning audio data, and a score of the recognition result becomes large; a convergence determination section that determines, using the score after updating, whether to return to the weighting factor updating section to update the weighting factor again; and a weighting factor convergence determination section that determines, using the score after updating, whether to return to the audio recognition section to perform the process again and update the weighting factor using the weighting factor updating section.
Type: Application
Filed: February 19, 2008
Publication date: April 15, 2010
Inventors: Tadashi Emori, Yoshifumi Onishi
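A toy version of the discriminative update, assuming a single language-model weight and log-domain scores; the gradient step and tolerance are illustrative assumptions, not the publication's actual procedure:

```python
def combined_score(acoustic_logprob, lm_logprob, weight):
    """Score of a hypothesis under the current weighting factor."""
    return acoustic_logprob + weight * lm_logprob

def update_weight(weight, correct, hypo, lr=0.05):
    """correct, hypo: (acoustic_logprob, lm_logprob) for the correct-answer
    text and the current 1-best recognition result. Take a gradient step
    that widens the margin correct_score - hypo_score."""
    grad = correct[1] - hypo[1]  # d(margin)/d(weight)
    return weight + lr * grad

def converged(old_weight, new_weight, tol=1e-3):
    """Stop iterating once the weight barely moves."""
    return abs(new_weight - old_weight) < tol
```

Re-recognizing with the new weight and repeating until `converged` holds mirrors the two determination loops in the abstract.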
-
Publication number: 20100076968
Abstract: Implementations relate to systems and methods for aggregating and presenting data related to geographic locations. Geotag data related to geographic locations and associated features or attributes can be collected to build a regional profile characterizing a set of locations within the region. Geotag data related to the constituent locations, such as user ratings or popularity ranks for restaurants, shops, parks, or other features, sites, or attractions, can be combined to generate a profile of characteristics of locations in the region. The platform can generate recommendations of locations to transmit to the user of a mobile device, based for instance on the location of the device in the region as reported by GPS or other location service and the regional profile. Geotag data can include audio data analyzed using region-specific terms, and user recommendations can be presented via dynamic menus based on regional profiles, user preferences or other criteria.
Type: Application
Filed: May 21, 2009
Publication date: March 25, 2010
Inventors: Mark R. BOYNS, Chand MEHTA, Jeffrey C. TSAY, Giridhar D. MANDYAM
-
Patent number: 7676366
Abstract: A speaker adaptation system and method for speech models of symbols displays a multi-word symbol to be spoken as a symbol. The supervised adaptation system and method has unsupervised adaptation for multi-word symbols, limited to the set of words associated with each multi-word symbol.
Type: Grant
Filed: January 13, 2003
Date of Patent: March 9, 2010
Assignee: Art Advanced Recognition Technologies Inc.
Inventors: Ran Mochary, Sasi Solomon, Tal El-Hay, Tal Yadid, Itamar Bartur
-
Patent number: 7664640
Abstract: A signal processing system is disclosed which is implemented using a Gaussian Mixture Model (GMM) based Hidden Markov Model (HMM), or a GMM alone, the parameters of which are constrained during its optimization procedure. Also disclosed is a constraint system applied to input vectors representing the input signal to the system. The invention is particularly, but not exclusively, related to speech recognition systems. The invention reduces the tendency, common in prior-art systems, to get caught in local minima associated with highly anisotropic Gaussian components, which degrades recognizer performance, by employing the constraint system above, whereby the anisotropy of such components is minimized. The invention also covers a method of processing a signal, and a speech recognizer trained according to the method.
Type: Grant
Filed: March 24, 2003
Date of Patent: February 16, 2010
Assignee: Qinetiq Limited
Inventor: Christopher John St. Clair Webber
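One simple way to bound a Gaussian component's anisotropy, offered only as a sketch of the general idea (the patent's actual constraint system is not reproduced here), is to floor the diagonal variances so the largest-to-smallest ratio cannot exceed a cap:

```python
def constrain_variances(variances, max_ratio=100.0):
    """Floor each diagonal variance of a Gaussian component so that
    max(var) / min(var) <= max_ratio, limiting how elongated (anisotropic)
    the component can become during EM-style optimization."""
    floor = max(variances) / max_ratio
    return [max(v, floor) for v in variances]
```

Applied after each re-estimation step, this keeps components from collapsing along single dimensions, which is one common source of the degenerate local minima the abstract mentions.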
-
Patent number: 7660715
Abstract: A system and method to improve the automatic adaptation of one or more speech models in automatic speech recognition systems. After a dialog begins, the dialog asks the customer to provide spoken input, which is recorded. If the speech recognizer determines it may not have correctly transcribed the verbal response, i.e., the voice input, the invention uses monitoring and, if necessary, intervention to guarantee that the next transcription of the verbal response is correct. The dialog asks the customer to repeat the verbal response, which is recorded, and a transcription of the input is sent to a human monitor, i.e., an agent or operator. If the transcription of the spoken input is correct, the human does not intervene and the transcription remains unmodified. If the transcription of the verbal response is incorrect, the human intervenes and the transcription of the misrecognized word is corrected. In both cases, the dialog asks the customer to confirm the unmodified or corrected transcription.
Type: Grant
Filed: January 12, 2004
Date of Patent: February 9, 2010
Assignee: Avaya Inc.
Inventor: David Preshan Thambiratnam
-
Patent number: 7660717
Abstract: Speech recognition is performed by matching, for each speech frame of an inputted speech, a characteristic quantity of the speech against a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM.
Type: Grant
Filed: January 9, 2008
Date of Patent: February 9, 2010
Assignee: Nuance Communications, Inc.
Inventors: Tetsuya Takiguchi, Masafumi Nishimura
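Composite speech+noise models typically rest on the additive-noise assumption: in the linear spectral domain the speech and noise contributions add, which in the log-spectral domain becomes a log-add of the two models' means. A minimal per-bin sketch of that combination, not the patented synthesis itself:

```python
import math

def combine_log_means(speech_log_mean, noise_log_mean):
    """Combine a speech-state mean and a noise-state mean (both in the
    log-spectral domain, one value per frequency bin) under the additive
    assumption: log(exp(s) + exp(n)) per bin."""
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(speech_log_mean, noise_log_mean)]
```

Doing this for every (speech state, noise state) pair yields the states of a composite model that can be matched directly against noisy input frames.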
-
Patent number: 7657430
Abstract: An apparatus inputs an utterance and performs speech recognition on the input utterance. The speech processing apparatus determines whether the recognition result contains an unknown word. If it is determined that the recognition result contains an unknown word, it is then determined whether the recognition result is rejected or not. If it is determined that the recognition result is not rejected, a word corresponding to the unknown word contained in the recognition result is acquired. The apparatus can be used as a speech processing apparatus.
Type: Grant
Filed: July 20, 2005
Date of Patent: February 2, 2010
Assignee: Sony Corporation
Inventor: Hiroaki Ogawa
-
Publication number: 20100023329
Abstract: Speech recognition of even a speaker who uses a speech recognition system is enabled by using an extended recognition dictionary suited to the speaker without requiring any previous learning using an utterance label corresponding to the speech of the speaker.
Type: Application
Filed: January 15, 2008
Publication date: January 28, 2010
Applicant: NEC CORPORATION
Inventor: Yoshifumi Onishi
-
Publication number: 20100004931
Abstract: An apparatus is provided for speech utterance verification. The apparatus is configured to compare a first prosody component from a recorded speech with a second prosody component from a reference speech. The apparatus determines a prosodic verification evaluation for the recorded speech utterance in dependence on the comparison.
Type: Application
Filed: September 15, 2006
Publication date: January 7, 2010
Inventors: Bin Ma, Haizhou Li, Minghui Dong
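A hedged sketch of what a prosodic comparison could look like: cosine similarity of mean-removed pitch (F0) contours, with an illustrative acceptance threshold. The actual prosody components and evaluation in the publication may differ, and every name below is invented:

```python
import math

def _centered(xs):
    """Remove the mean so only the contour shape is compared."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def prosody_similarity(recorded_f0, reference_f0):
    """Cosine similarity of two mean-removed pitch contours in [-1, 1]."""
    a, b = _centered(recorded_f0), _centered(reference_f0)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def verify_utterance(recorded_f0, reference_f0, threshold=0.7):
    """Accept the utterance if its pitch contour matches the reference."""
    return prosody_similarity(recorded_f0, reference_f0) >= threshold
```

A rising contour matched against the same rising contour scores 1.0, while the same contour reversed (falling) scores -1.0 and is rejected.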
-
Patent number: 7643985
Abstract: Architecture that interacts with a user, or users of different tongues, to enhance speech translation. A recognized concept or situation is sensed and/or converged upon, and disambiguated with mixed-initiative user interaction with a device to provide simplified inferences about user communication goals in working with others who speak another language. Reasoning is applied about communication goals based on the concept or situation at the current focus of attention, or the probability distribution over the likely focus of attention, and the user or the user's conversational partner is provided with appropriately triaged choices and images, text and/or speech translations for review or perception. The inferences can also process an utterance or other input from a user as part of the evidence in reasoning about a concept, situation, goals, and/or disambiguating the latter.
Type: Grant
Filed: June 27, 2005
Date of Patent: January 5, 2010
Assignee: Microsoft Corporation
Inventor: Eric J. Horvitz
-
Publication number: 20090313017
Abstract: A framework is included in which the numerical value representing the statistical appearance tendency of each word in a language model is set not only as a constant, but also as an update function that changes in time. The numerical value representing a word's statistical appearance tendency is then automatically updated with the passage of time.
Type: Application
Filed: July 6, 2007
Publication date: December 17, 2009
Inventors: Satoshi Nakazawa, Hitoshi Yamamoto, Tasuku Kitade
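One way such an update function could behave, offered purely as an illustration (the half-life parameterization and names are assumptions, not the publication's formula), is an exponential decay of a word's count since it was last observed:

```python
def word_weight(base_count, days_since_seen, half_life_days=30.0):
    """Appearance tendency that decays over time since the word was last
    observed: an update function of time rather than a fixed constant."""
    return base_count * 0.5 ** (days_since_seen / half_life_days)

def unigram_prob(word, counts, last_seen, today, half_life_days=30.0):
    """Time-adjusted unigram probability: decay every word's count by its
    age, then renormalize."""
    weights = {w: word_weight(c, today - last_seen[w], half_life_days)
               for w, c in counts.items()}
    return weights[word] / sum(weights.values())
```

A word unseen for one half-life contributes half its raw count, so recently observed vocabulary automatically gains probability mass as time passes.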