Specialized Models Patents (Class 704/255)
  • Patent number: 8612212
    Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
    Type: Grant
    Filed: March 4, 2013
    Date of Patent: December 17, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Giuseppe Riccardi
  • Patent number: 8612207
    Abstract: Language analysis means 21 analyzes texts read from a text DB 11 and generates a sentence structure as the analysis result. Similar-structure generation adjustment means 25 generates, from an input of an input device, a determination item for deciding, for every type of difference between sentence structures, whether or not the structures are identical. Similar-structure determination adjustment means 26 generates, from an input of the input device 6, a determination item for deciding, for every type of attribute value, whether or not a difference between attribute values is ignored. Similar-structure generating means 22 generates a similar structure of a partial structure forming the sentence structure obtained by language analysis means 21, in accordance with the determination item from similar-structure generation adjustment means 25, and sets the generated similar structure as an equivalence class of the generation-source partial structure.
    Type: Grant
    Filed: March 17, 2005
    Date of Patent: December 17, 2013
    Assignee: NEC Corporation
    Inventors: Yousuke Sakao, Kenji Satoh, Susumu Akamine
  • Patent number: 8606580
    Abstract: A data process unit and data process unit control program are provided that are suitable for generating acoustic models for unspecified speakers, taking the distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment, and that are suitable for providing acoustic models intended for unspecified speakers and adapted to the speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: December 10, 2013
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventors: Makoto Shozakai, Goshu Nagino
  • Patent number: 8605885
    Abstract: Systems and methods for handling information communicated by voice. The method may comprise: (i) receiving a call from a caller, the call comprising utterances from the caller; (ii) verbally communicating information to the caller through a customer service representative, the representative interacting with a display; (iii) processing the utterances with a computing device; (iv) determining content of the utterances; and (v) displaying information on the display based on the content.
    Type: Grant
    Filed: October 23, 2009
    Date of Patent: December 10, 2013
    Assignee: Next IT Corporation
    Inventor: Charles C. Wooters
  • Patent number: 8600743
    Abstract: Systems, methods, and devices for noise profile determination for a voice-related feature of an electronic device are provided. In one example, an electronic device capable of such noise profile determination may include a microphone and data processing circuitry. When a voice-related feature of the electronic device is not in use, the microphone may obtain ambient sounds. The data processing circuitry may determine a noise profile based at least in part on the obtained ambient sounds. The noise profile may enable the data processing circuitry to at least partially filter other ambient sounds obtained when the voice-related feature of the electronic device is in use.
    Type: Grant
    Filed: January 6, 2010
    Date of Patent: December 3, 2013
    Assignee: Apple Inc.
    Inventors: Aram Lindahl, Joseph M. Williams, Gints Valdis Klimanis
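The idle-time noise-profile idea above can be sketched as a magnitude-spectrum average followed by subtraction during active use; the function names, the use of magnitude subtraction, and the spectral floor are assumptions for illustration, not details from the patent.

```python
def build_noise_profile(ambient_frames):
    """Average the magnitude spectra captured while the voice-related
    feature is idle to form a noise profile."""
    n_frames = len(ambient_frames)
    profile = [0.0] * len(ambient_frames[0])
    for frame in ambient_frames:
        for i, mag in enumerate(frame):
            profile[i] += mag / n_frames
    return profile

def filter_with_profile(frame, profile, floor=0.0):
    """Partially filter a frame obtained during active use by
    subtracting the noise profile, clamped at a spectral floor."""
    return [max(mag - noise, floor) for mag, noise in zip(frame, profile)]

ambient = [[0.2, 0.4, 0.1], [0.4, 0.2, 0.3]]   # idle-time magnitude spectra
profile = build_noise_profile(ambient)
speech = [1.0, 0.5, 0.2]
print([round(x, 3) for x in filter_with_profile(speech, profile)])
# [0.7, 0.2, 0.0]
```

The same profile would then be applied to every frame captured while the voice-related feature is active.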
  • Patent number: 8595010
    Abstract: An information storage medium stores a program for generating Hidden Markov Models to be used for speech recognition with a given speech recognition system. The program causes a computer to function as a scheduled-to-be-used model group storage section, which stores a scheduled-to-be-used model group including a plurality of Hidden Markov Models scheduled to be used by the given speech recognition system, and a filler model generation section, which generates Hidden Markov Models to be used as filler models by the given speech recognition system based on all or at least a part of the Hidden Markov Model group in the scheduled-to-be-used model group.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: November 26, 2013
    Assignee: Seiko Epson Corporation
    Inventors: Paul W. Shields, Matthew E. Dunnachie, Yasutoshi Takizawa
  • Patent number: 8589334
    Abstract: Methods and systems are provided for developing decision information relating to a single system based on data received from a plurality of sensors. The method includes receiving first data from a first sensor that defines first information of a first type that is related to a system, receiving second data from a second sensor that defines second information of a second type that is related to said system, wherein the first type is different from the second type, generating a first decision model, a second decision model, and a third decision model, determining whether data is available from only the first sensor, only the second sensor, or both the first and second sensors, and selecting, based on the determination of availability, an additional model to which to apply the available data, wherein the additional model is selected from a plurality of additional decision models including the third decision model.
    Type: Grant
    Filed: January 18, 2011
    Date of Patent: November 19, 2013
    Assignee: Telcordia Technologies, Inc.
    Inventor: Akshay Vashist
  • Patent number: 8589163
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing speech recognition based on a masked language model. A system configured to practice the method receives a masked language model including a plurality of words, wherein a bit mask identifies whether each of the plurality of words is allowed or disallowed with regard to an adaptation subset, receives input speech, generates a speech recognition lattice based on the received input speech using the masked language model, removes from the generated lattice words identified as disallowed by the bit mask for the adaptation subset, and recognizes the received speech based on the lattice. Alternatively during the generation step, the system can only add words indicated as allowed by the bit mask. The bit mask can be separate from or incorporated as part of the masked language model. The system can dynamically update the adaptation subset and bit mask.
    Type: Grant
    Filed: December 4, 2009
    Date of Patent: November 19, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Mazin Gilbert
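As a rough illustration, the bit-mask filtering step might look like the following sketch. The mask layout (a per-subset mapping of word to allowed flag) and the default-allow behavior for unlisted words are assumptions; the patent only specifies that the mask marks words as allowed or disallowed per adaptation subset.

```python
def prune_lattice(lattice_words, bit_mask, adaptation_subset):
    """Remove words the bit mask marks as disallowed for the given
    adaptation subset; unlisted words are kept by default."""
    allowed = bit_mask.get(adaptation_subset, {})
    return [w for w in lattice_words if allowed.get(w, True)]

bit_mask = {"banking": {"transfer": True, "touchdown": False}}
lattice = ["transfer", "touchdown", "balance"]
print(prune_lattice(lattice, bit_mask, "banking"))
# ['transfer', 'balance']
```

The alternative path described in the abstract, adding only mask-allowed words during lattice generation, would use the same lookup at insertion time instead of as a post-filter.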
  • Patent number: 8583436
    Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: November 12, 2013
    Assignee: NEC Corporation
    Inventors: Hitoshi Yamamoto, Kiyokazu Miki
  • Publication number: 20130297313
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
    Type: Application
    Filed: April 12, 2013
    Publication date: November 7, 2013
    Inventors: Matthew I. Lloyd, Trausti T. Kristjansson
  • Patent number: 8577670
    Abstract: A statistical language model (SLM) may be iteratively refined by considering N-gram counts in new data, and blending the information contained in the new data with the existing SLM. A first group of documents is evaluated to determine the probabilities associated with the different N-grams observed in the documents. An SLM is constructed based on these probabilities. A second group of documents is then evaluated to determine the probabilities associated with each N-gram in that second group. The existing SLM is then evaluated to determine how well it explains the probabilities in the second group of documents, and a weighting parameter is calculated from that evaluation. Using the weighting parameter, a new SLM is then constructed as a weighted average of the existing SLM and the new probabilities.
    Type: Grant
    Filed: January 8, 2010
    Date of Patent: November 5, 2013
    Assignee: Microsoft Corporation
    Inventors: Kuansan Wang, Xiaolong Li, Jiangbo Miao, Frederic H. Behr, Jr.
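The blending step above can be sketched as a weighted average over the union of observed N-grams. In this sketch the weighting parameter is passed in directly; the patent derives it by evaluating how well the existing SLM explains the second group of documents.

```python
from collections import Counter

def ngram_probs(tokens, n=2):
    """Relative frequencies of the n-grams observed in a token list."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def blend_slm(existing, new, weight):
    """New SLM as a weighted average of the existing model and the
    probabilities observed in the new document group."""
    grams = set(existing) | set(new)
    return {g: weight * existing.get(g, 0.0) + (1 - weight) * new.get(g, 0.0)
            for g in grams}

old = ngram_probs("the cat sat".split())   # each bigram has probability 0.5
new = ngram_probs("the cat ran".split())
blended = blend_slm(old, new, weight=0.5)
print(blended[("the", "cat")], blended[("cat", "ran")])
# 0.5 0.25
```

Iterating this over successive document groups refines the model while retaining mass on N-grams seen only in earlier data.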
  • Patent number: 8577678
    Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: March 10, 2011
    Date of Patent: November 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
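A continuous mask in (0, 1) per spectral component can be sketched with a logistic squashing of the separation-reliability score. The patent derives its mask from distributions of speech signal and noise against separation reliability, so the logistic form and its parameters here are assumptions that only match the mask's shape, not its derivation.

```python
import math

def soft_mask(reliability, midpoint=0.5, steepness=10.0):
    """Map a per-bin separation-reliability score to a continuous
    mask value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (reliability - midpoint)))

spectrum      = [1.0, 1.0, 1.0, 1.0]   # magnitudes of one separated frame
reliabilities = [0.9, 0.6, 0.4, 0.1]   # per-bin separation reliability
masked = [s * soft_mask(r) for s, r in zip(spectrum, reliabilities)]
print([round(m, 2) for m in masked])
# [0.98, 0.73, 0.27, 0.02]
```

Unreliable bins are attenuated rather than zeroed, which is the advantage of a soft mask over a binary one for the downstream recognizer.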
  • Patent number: 8571866
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.
    Type: Grant
    Filed: October 23, 2009
    Date of Patent: October 29, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Dan Melamed, Srinivas Bangalore, Michael Johnston
  • Patent number: 8566095
    Abstract: Systems, methods, and apparatuses including computer program products for segmenting words using scaled probabilities. In one implementation, a method is provided. The method includes receiving a probability of a n-gram identifying a word, determining a number of atomic units in the corresponding n-gram, identifying a scaling weight depending on the number of atomic units in the n-gram, and applying the scaling weight to the probability of the n-gram identifying a word to determine a scaled probability of the n-gram identifying a word.
    Type: Grant
    Filed: October 11, 2011
    Date of Patent: October 22, 2013
    Assignee: Google Inc.
    Inventor: Mark Davis
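The scaling step can be sketched as below. The abstract says only that a weight keyed on the number of atomic units is applied to the n-gram's probability; the specific weights, and the choice to apply them as multipliers in log-probability space, are assumptions for illustration.

```python
import math

# Illustrative scaling weights keyed on the number of atomic units.
SCALING = {1: 0.8, 2: 1.0, 3: 1.2}

def scaled_log_prob(prob, n_atomic):
    """Apply the length-dependent scaling weight to an n-gram's
    word probability in log space."""
    weight = SCALING.get(n_atomic, 1.2)
    return weight * math.log(prob)

# Two segmentation candidates with equal raw probability no longer tie
# once scaled (with these weights the shorter candidate is favored):
print(scaled_log_prob(0.1, 1) > scaled_log_prob(0.1, 3))
# True
```

In a real segmenter the weights would be tuned so that the scaled probabilities compare fairly across candidate words of different lengths.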
  • Patent number: 8560319
    Abstract: The present invention provides a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment, a method of classifying an audio stream is provided. The method includes receiving an audio stream, sampling the audio stream at a predetermined rate, and combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
    Type: Grant
    Filed: January 15, 2008
    Date of Patent: October 15, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Qian Huang, Zhu Liu
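The per-clip pipeline (samples combined into a clip, features computed, then a linear analysis) can be sketched as follows. The two features, the class labels, and the linear weights are illustrative stand-ins for the patent's feature set and trained linear approximation.

```python
def clip_features(samples):
    """Short-time energy and zero-crossing rate for one clip."""
    energy = sum(s * s for s in samples) / len(samples)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return [energy, crossings / (len(samples) - 1)]

def classify_clip(features, weights, bias):
    """Characterize a clip with a linear score over its features."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return "speech" if score > 0 else "music"

clip = [0.5, -0.5, 0.4, -0.4, 0.3, -0.3]   # toy clip of combined samples
feats = clip_features(clip)
print(classify_clip(feats, weights=[1.0, 2.0], bias=-1.0))
# speech
```

Segment boundaries in the program would then be placed where consecutive clips change class.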
  • Patent number: 8560324
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: October 15, 2013
    Assignee: LG Electronics Inc.
    Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
  • Publication number: 20130262117
    Abstract: The invention presents a method for analyzing speech in a spoken dialog system, comprising the steps of: accepting an utterance by at least one means for accepting acoustical signals, in particular a microphone, analyzing the utterance and obtaining prosodic cues from the utterance using at least one processing engine, wherein the utterance is evaluated based on the prosodic cues to determine a prominence of parts of the utterance, and wherein the utterance is analyzed to detect at least one marker feature, e.g. a negative statement, indicative of the utterance containing at least one part to replace at least one part in a previous utterance, the part to be replaced in the previous utterance being determined based on the prominence determined for the parts of the previous utterance and the replacement parts being determined based on the prominence of the parts in the utterance, and wherein the previous utterance is evaluated with the replacement part(s).
    Type: Application
    Filed: March 18, 2013
    Publication date: October 3, 2013
    Applicant: HONDA RESEARCH INSTITUTE EUROPE GMBH
    Inventor: Martin HECKMANN
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
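The "weighted sum of acoustic models" idea can be sketched with one-dimensional Gaussians standing in for full acoustic models; the phoneme labels, mixture weights, and Gaussian parameters below are illustrative assumptions, not values from the patent.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian, standing in for a phoneme's
    acoustic model."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, components):
    """Custom phoneme model: a weighted sum of plausible phonemes'
    models. components: list of (weight, mean, var), weights sum to 1."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

# The dictionary entry for 'ae' keeps its pronunciation, but its
# acoustic space becomes mostly 'ae' blended with some 'eh', as heard
# from the new speaker:
custom_ae = [(0.7, 1.0, 0.5), (0.3, 1.8, 0.5)]
print(round(mixture_pdf(1.0, custom_ae), 4))
# 0.4842
```

This captures the abstract's key point: the pronouncing dictionary is unchanged, and only the acoustic model behind each phoneme is restructured.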
  • Patent number: 8543400
    Abstract: Voice processing methods and systems are provided. An utterance is received. The utterance is compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. Respective voice units are scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated according to training utterances corresponding to at least one training sentence in a phonetic-balanced sentence set of a plurality of learners and at least one real teacher, and scores corresponding to the respective voice units of the training utterances of the learners in the first scoring item provided by the real teacher.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: September 24, 2013
    Assignee: National Taiwan University
    Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
  • Patent number: 8538757
    Abstract: In embodiments of the present invention, a system and computer-implemented method for enabling a user to interact with a computer platform using a voice command may include the steps of defining a structured grammar for generating a global voice command, defining a global voice command of the structured grammar, wherein defining the global voice command includes building a custom list of objects, and mapping at least one function of a listed object from the custom list of objects to the global voice command, wherein upon receiving voice input from the user the platform recognizes at least one global voice command in the voice input and executes the function on the listed object in accordance with the recognized global voice command.
    Type: Grant
    Filed: December 21, 2009
    Date of Patent: September 17, 2013
    Assignee: Redstart Systems, Inc.
    Inventor: Kimberly Patch
  • Patent number: 8538752
    Abstract: The invention comprises a method and apparatus for predicting word accuracy. Specifically, the method comprises obtaining an utterance in speech data where the utterance comprises an actual word string, processing the utterance for generating an interpretation of the actual word string, processing the utterance to identify at least one utterance frame, and predicting a word accuracy associated with the interpretation according to at least one stationary signal-to-noise ratio and at least one non-stationary signal-to-noise ratio, wherein the at least one stationary signal-to-noise ratio and the at least one non-stationary signal-to-noise ratio are determined according to a frame energy associated with each of the at least one utterance frame.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: September 17, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Hong Kook Kim
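The frame-energy inputs to the prediction can be sketched as below: a stationary SNR against a single time-averaged noise energy, and a non-stationary SNR that tracks noise frame by frame. The abstract does not define the estimators, so these formulas are illustrative assumptions.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one utterance frame."""
    return sum(s * s for s in frame) / len(frame)

def stationary_snr_db(speech_frames, noise_frames):
    """SNR against a single time-averaged noise energy."""
    s = sum(frame_energy(f) for f in speech_frames) / len(speech_frames)
    n = sum(frame_energy(f) for f in noise_frames) / len(noise_frames)
    return 10 * math.log10(s / n)

def nonstationary_snr_db(speech_frames, noise_frames):
    """Average of per-frame SNRs, following noise that varies in time."""
    ratios = [frame_energy(s) / frame_energy(n)
              for s, n in zip(speech_frames, noise_frames)]
    return 10 * math.log10(sum(ratios) / len(ratios))

speech = [[0.8, -0.8], [0.4, -0.4]]
noise  = [[0.1, -0.1], [0.2, -0.2]]
print(round(stationary_snr_db(speech, noise), 2))
# 12.04
```

The two SNR figures would then feed a trained predictor of word accuracy for the recognizer's interpretation.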
  • Publication number: 20130238336
    Abstract: Speech recognition systems may perform the following operations: receiving audio; recognizing the audio using language models for different languages to produce recognition candidates for the audio, where the recognition candidates are associated with corresponding recognition scores; identifying a candidate language for the audio; selecting a recognition candidate based on the recognition scores and the candidate language; and outputting data corresponding to the selected recognition candidate as a recognized version of the audio.
    Type: Application
    Filed: December 26, 2012
    Publication date: September 12, 2013
    Inventors: Yun-hsuan Sung, Francoise Beaufays, Brian Strope, Hui Lin, Jui-Ting Huang
  • Patent number: 8532995
    Abstract: A method, system and machine-readable medium are provided. Speech input is received at a speech recognition component and recognized output is produced. A common dialog cue from the received speech input or input from a second source is recognized. An action is performed corresponding to the recognized common dialog cue. The performed action includes sending a communication from the speech recognition component to the speech generation component while bypassing a dialog component.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: September 10, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent J. Goffin, Sarangarajan Parthasarathy
  • Patent number: 8532991
    Abstract: Speech models are trained using one or more of three different training systems. They include competitive training which reduces a distance between a recognized result and a true result, data boosting which divides and weights training data, and asymmetric training which trains different model components differently.
    Type: Grant
    Filed: March 10, 2010
    Date of Patent: September 10, 2013
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Jian Wu
  • Patent number: 8532992
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Grant
    Filed: February 8, 2013
    Date of Patent: September 10, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
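The retrieval order described above is a straightforward fallback chain; the store layout and names in this sketch are assumptions.

```python
def select_model(user_id, supervised, unsupervised, generic):
    """Prefer the user's supervised model, then an unsupervised model
    for that user, then the generic model."""
    if user_id in supervised:
        return supervised[user_id]
    if user_id in unsupervised:
        return unsupervised[user_id]
    return generic

supervised = {"alice": "alice-supervised"}
unsupervised = {"bob": "bob-unsupervised"}
print(select_model("alice", supervised, unsupervised, "generic"))
print(select_model("bob", supervised, unsupervised, "generic"))
print(select_model("carol", supervised, unsupervised, "generic"))
# alice-supervised
# bob-unsupervised
# generic
```

The selected model is then handed to the recognizer for the user's speech, per the abstract.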
  • Patent number: 8532993
    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
    Type: Grant
    Filed: July 2, 2012
    Date of Patent: September 10, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8527273
    Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: September 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mehryar Mohri, Michael Dennis Riley
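N-best path search over an already-deterministic graph can be sketched with a priority queue; the on-the-fly determinization and state potentials that make the patent's method efficient on nondeterministic weighted automata are omitted from this illustration.

```python
import heapq

def n_best_paths(graph, start, goal, n):
    """graph: {state: [(next_state, weight), ...]}, assumed acyclic
    with nonnegative weights. Returns up to n (cost, path) pairs in
    increasing cost order."""
    results = []
    heap = [(0.0, [start])]
    while heap and len(results) < n:
        cost, path = heapq.heappop(heap)
        state = path[-1]
        if state == goal:
            results.append((cost, path))
            continue
        for nxt, w in graph.get(state, []):
            heapq.heappush(heap, (cost + w, path + [nxt]))
    return results

g = {"s": [("a", 1.0), ("b", 2.0)], "a": [("t", 1.0)], "b": [("t", 0.5)]}
for cost, path in n_best_paths(g, "s", "t", 2):
    print(cost, "->".join(path))
# 2.0 s->a->t
# 2.5 s->b->t
```

In the patent's setting the input automaton is determinized lazily, so only the states actually reached by the N-best search are ever expanded.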
  • Patent number: 8527279
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.
    Type: Grant
    Filed: August 23, 2012
    Date of Patent: September 3, 2013
    Assignee: Google Inc.
    Inventors: David P. Singleton, Debajit Ghosh
  • Patent number: 8521527
    Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 27, 2013
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Patent number: 8515753
    Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. To adapt the acoustic models, pronunciation variations are first examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during the acoustic model training process. When the present invention for adapting acoustic models is combined with a conventional acoustic model adaptation scheme, further-enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: August 20, 2013
    Assignee: Gwangju Institute of Science and Technology
    Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
  • Patent number: 8515734
    Abstract: An integrated language model includes an upper-level language model component and a lower-level language model component, with the upper-level language model component including a non-terminal and the lower-level language model component being applied to the non-terminal. The upper-level and lower-level language model components can be of the same or different language model formats, including finite state grammar (FSG) and statistical language model (SLM) formats. Systems and methods for making integrated language models allow designation of language model formats for the upper-level and lower-level components and identification of non-terminals. Automatic non-terminal replacement and retention criteria can be used to facilitate the generation of one or both language model components, which can include the modification of existing language models.
    Type: Grant
    Filed: February 8, 2010
    Date of Patent: August 20, 2013
    Assignee: Adacel Systems, Inc.
    Inventors: Chang-Qing Shu, Han Shu, John M. Mervin
  • Patent number: 8510111
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and select a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 13, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
  • Patent number: 8504364
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: August 6, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: William K. Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Thomas James Watson, Daniel Mark Schumacher
  • Patent number: 8504366
    Abstract: Method, system, and computer program product are provided for Joint Factor Analysis (JFA) scoring in speech processing systems. The method includes: carrying out an enrollment session offline to enroll a speaker model in a speech processing system using JFA, including: extracting speaker factors from the enrollment session; estimating first components of channel factors from the enrollment session. The method further includes: carrying out a test session including: calculating second components of channel factors strongly dependent on the test session; and generating a score based on speaker factors, channel factors, and test session Gaussian mixture model sufficient statistics to provide a log-likelihood ratio for a test session.
    Type: Grant
    Filed: November 16, 2011
    Date of Patent: August 6, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Hagai Aronowitz, Oren Barkan
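As a rough intuition for the scoring step, the log-likelihood ratio in factor-analysis systems is often approximated by a linear score between the channel-compensated speaker offset and the test session's first-order statistics. The sketch below is that common simplification, not the patent's exact computation:

```python
def linear_jfa_score(speaker_offset, channel_offset, first_order_stats):
    """Toy linear scoring: subtract the session's channel offset from
    the enrolled speaker offset, then take the inner product with the
    test session's first-order statistics. A higher score means the
    test session more likely matches the enrolled speaker."""
    compensated = [s - c for s, c in zip(speaker_offset, channel_offset)]
    return sum(m * f for m, f in zip(compensated, first_order_stats))

# The statistics point in the enrolled speaker's direction -> high score.
same = linear_jfa_score([1.0, -0.5], [0.1, 0.0], [0.9, -0.4])
# A mismatched speaker offset -> low score.
diff = linear_jfa_score([-1.0, 0.5], [0.1, 0.0], [0.9, -0.4])
print(same > diff)  # True
```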
  • Patent number: 8503662
    Abstract: A method includes receiving speech of a call from a caller at a processor of a call routing system. The method includes using the processor to determine a first call destination for the call based on the speech. The method includes using the processor to determine whether the caller is in compliance with at least one business rule related to an account of the caller. The method includes routing the call to the first call destination when the caller is in compliance with the at least one business rule and routing the call to a second call destination when the caller is not in compliance with the at least one business rule.
    Type: Grant
    Filed: May 26, 2010
    Date of Patent: August 6, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Robert R. Bushey, Benjamin Anthony Knott, Sarah Korth
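The routing logic described in the abstract can be sketched as a compliance gate over the speech-derived destination. Destination names and the example rule below are hypothetical:

```python
def route_call(speech_destination: str, caller_account: dict,
               business_rules: list) -> str:
    """Route a call to the destination derived from the caller's
    speech only if the account satisfies every business rule;
    otherwise divert to a second, fallback destination."""
    compliant = all(rule(caller_account) for rule in business_rules)
    return speech_destination if compliant else "compliance_desk"

# Hypothetical rule: the account must have no past-due balance.
rules = [lambda acct: acct.get("past_due", 0) == 0]

print(route_call("billing", {"past_due": 0}, rules))    # billing
print(route_call("billing", {"past_due": 120}, rules))  # compliance_desk
```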
  • Patent number: 8504363
    Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process reduces the amount of human supervision required for training acoustic and language models and increases performance given both the transcribed and un-transcribed data.
    Type: Grant
    Filed: April 9, 2012
    Date of Patent: August 6, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Zeynep Hakkani-Tur, Giuseppe Riccardi
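A common way to combine active and unsupervised learning, sketched below under assumed names and an illustrative confidence threshold: low-confidence hypotheses are routed to human transcribers (active learning), while high-confidence hypotheses are kept with their machine transcripts (unsupervised learning).

```python
def split_for_training(hypotheses, threshold=0.8):
    """Split recognizer output into data needing human transcription
    and data usable as-is. `hypotheses` is a list of
    (machine_transcript, confidence) pairs; the threshold is an
    illustrative choice, not a value from the patent."""
    to_transcribe = [h for h in hypotheses if h[1] < threshold]
    auto_labeled = [h for h in hypotheses if h[1] >= threshold]
    return to_transcribe, auto_labeled

hyps = [("call my bank", 0.95), ("uh check balanse", 0.42)]
manual, auto = split_for_training(hyps)
print(len(manual), len(auto))  # 1 1
```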
  • Patent number: 8504359
    Abstract: A speech recognition method using a domain ontology includes: constructing a domain ontology DB; forming a speech recognition grammar using the constructed domain ontology DB; extracting a feature vector from a speech signal; and modeling the speech signal using an acoustic model. The method performs speech recognition by using the acoustic model, the speech recognition dictionary, and the speech recognition grammar on the basis of the feature vector.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: August 6, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Yun, Soo Jong Lee, Jeong Se Kim, Il Bin Lee, Jun Park, Sang Kyu Park
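The grammar-forming step can be illustrated by harvesting lexical entries from ontology triples. A real grammar would also encode structure; this sketch (with hypothetical triples) only gathers vocabulary from the ontology DB:

```python
def grammar_terms_from_ontology(triples):
    """Collect the vocabulary for a flat speech-recognition grammar
    from domain-ontology triples of (subject, relation, object)."""
    terms = set()
    for subject, _relation, obj in triples:
        terms.update((subject, obj))
    return sorted(terms)

triples = [("seoul", "capital_of", "korea"), ("busan", "city_in", "korea")]
print(grammar_terms_from_ontology(triples))  # ['busan', 'korea', 'seoul']
```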
  • Patent number: 8484025
    Abstract: Disclosed embodiments relate to mapping an utterance to an action using a classifier. One illustrative computing device includes a user interface having an input component. The computing device further includes a processor and a computer-readable storage medium, having stored thereon program instructions that, upon execution by the processor, cause the computing device to perform a set of operations including: receiving an audio utterance via the input component; determining a text string based on the utterance; determining a string-feature vector based on the text string; selecting a target classifier from a set of classifiers, wherein the target classifier is selected based on a determination that a string-feature criteria of the target classifier corresponds to at least one string-feature of the string-feature vector; and initiating a target action that corresponds to the target classifier.
    Type: Grant
    Filed: October 4, 2012
    Date of Patent: July 9, 2013
    Assignee: Google Inc.
    Inventors: Pedro J. Moreno Mengibar, Martin Jansche, Fadi Biadsy
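The selection step in the abstract can be sketched with a bag-of-words string-feature vector and a first-match criterion check. Classifier names, criteria, and actions below are hypothetical:

```python
def select_action(text, classifiers):
    """Map a transcribed utterance to an action: build a string-feature
    vector (here, the set of lowercased words), then return the action
    of the first classifier whose feature criterion appears in that
    vector; None if no classifier matches."""
    features = set(text.lower().split())
    for criterion, action in classifiers:
        if criterion in features:
            return action
    return None

classifiers = [("call", "place_call"), ("play", "play_media")]
print(select_action("Call mom now", classifiers))  # place_call
print(select_action("what time is it", classifiers))  # None
```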
  • Patent number: 8468013
    Abstract: Disclosed is a method, system and computer readable recording medium for correcting an OCR result. According to an exemplary embodiment of the present invention, there is provided a method for correcting an OCR result, the method including performing character recognition on content including character information using an OCR technique, removing extra carriage return information from the content, outputting the character recognition result, and correcting word spacing on the outputted result.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: June 18, 2013
    Assignee: NHN Corporation
    Inventors: Byoung Seok Yang, Hee Cheol Seo, Do Gil Lee, Ki Joon Sung
  • Patent number: 8442828
    Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: May 14, 2013
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
  • Patent number: 8442831
    Abstract: A speech recognition capability in which words of spoken text are identified based on the contour of sound waves representing the spoken text. Variations in the contour of the sound waves are identified, features are assigned to those variations, and then the features are mapped to sound constructs to provide the words.
    Type: Grant
    Filed: October 31, 2008
    Date of Patent: May 14, 2013
    Assignee: International Business Machines Corporation
    Inventor: Mukundan Sundararajan
  • Patent number: 8438030
    Abstract: A method of and system for automated distortion classification. The method includes steps of (a) receiving audio including a user speech signal and at least some distortion associated with the signal; (b) pre-processing the received audio to generate acoustic feature vectors; (c) decoding the generated acoustic feature vectors to produce a plurality of hypotheses for the distortion; and (d) post-processing the plurality of hypotheses to identify at least one distortion hypothesis of the plurality of hypotheses as the received distortion. The system can include one or more distortion models including distortion-related acoustic features representative of various types of distortion and used by a decoder to compare the acoustic feature vectors with the distortion-related acoustic features to produce the plurality of hypotheses for the distortion.
    Type: Grant
    Filed: November 25, 2009
    Date of Patent: May 7, 2013
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
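The decode-and-post-process pipeline can be sketched by scoring feature vectors against per-distortion models and keeping the best hypothesis. The models here are simple mean vectors and the score is a negative Euclidean distance, a stand-in for the patent's decoder:

```python
import math

def classify_distortion(feature_vectors, distortion_models):
    """Produce a hypothesis score for each distortion model by
    comparing it with the acoustic feature vectors, then post-process
    by returning the best-scoring distortion label."""
    def score(model):
        # Closer frames -> smaller distances -> higher (less negative) score.
        return -sum(math.dist(frame, model) for frame in feature_vectors)
    hypotheses = {name: score(m) for name, m in distortion_models.items()}
    return max(hypotheses, key=hypotheses.get)

models = {"wind_noise": [0.9, 0.1], "clipping": [0.1, 0.9]}
frames = [[0.85, 0.15], [0.8, 0.2]]
print(classify_distortion(frames, models))  # wind_noise
```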
  • Patent number: 8438031
    Abstract: A conversation manager processes spoken utterances from a user of a computer. The conversation manager includes a semantics analysis module and a syntax manager. A domain model that is used in processing the spoken utterances includes an ontology (i.e., world view for the relevant domain of the spoken utterances), lexicon, and syntax definitions. The syntax manager combines the ontology, lexicon, and syntax definitions to generate a grammatic specification. The semantics module uses the grammatic specification and the domain model to develop a set of frames (i.e., internal representation of the spoken utterance). The semantics module then develops a set of propositions from the set of frames. The conversation manager then uses the set of propositions in further processing to provide a reply to the spoken utterance.
    Type: Grant
    Filed: June 7, 2007
    Date of Patent: May 7, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Steven I. Ross, Robert C. Armes, Julie F. Alweis, Elizabeth A. Brownholtz, Jeffrey G. MacAllister
  • Patent number: 8433558
    Abstract: Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
    Type: Grant
    Filed: July 25, 2005
    Date of Patent: April 30, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Mazin Gilbert, Narendra K. Gupta
  • Patent number: 8433572
    Abstract: A method for multiple value confirmation and correction in spoken dialog systems. A user is allowed to correct errors in values captured by the spoken dialog system, such that the interaction necessary for error correction between the system and the user is reduced. When the spoken dialog system collects a set of values from a user, the system provides a spoken confirmation of the set of values to the user. The spoken confirmation comprises the set of values and possibly a pause associated with each value. Upon hearing an incorrect value, the user may react, barge in on the spoken confirmation, and provide a corrected value. Responsive to detecting the user interruption during the pause or after the system speaks a value, the system halts the spoken confirmation and collects the corrected value. The system then provides a new spoken confirmation to the user, wherein the new spoken confirmation includes the corrected value.
    Type: Grant
    Filed: April 2, 2008
    Date of Patent: April 30, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Sasha Porto Caskey, Juan Manuel Huerta, Roberto Pieraccini
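The confirm-and-correct loop can be sketched as follows. The `barge_in` callback stands in for the audio front end: it "speaks" a value, waits through the pause, and returns a corrected value if the user interrupted, else None. Restarting confirmation from the first value after a correction is an assumption of this sketch:

```python
def confirm_values(values, barge_in):
    """Confirm collected values one at a time; on a barge-in, halt,
    substitute the corrected value, and issue a new confirmation
    that includes the correction."""
    values = list(values)
    i = 0
    while i < len(values):
        correction = barge_in(i, values[i])  # speak value, then pause
        if correction is not None:
            values[i] = correction           # collect the corrected value
            i = 0                            # new confirmation from the top
        else:
            i += 1
    return values

# Simulated user: interrupts on "Boston" and corrects it to "Austin".
corrections = {"Boston": "Austin"}
result = confirm_values(["Boston", "Friday"], lambda i, v: corrections.get(v))
print(result)  # ['Austin', 'Friday']
```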
  • Patent number: 8433573
    Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
    Type: Grant
    Filed: February 11, 2008
    Date of Patent: April 30, 2013
    Assignee: Fujitsu Limited
    Inventors: Kentaro Murase, Nobuyuki Katae
  • Patent number: 8428944
    Abstract: A speech recognition system prompts a user to provide a first utterance, which is recorded. Speech recognition is performed on the first user utterance to yield a recognition result. The user is prompted to provide a second user utterance, which is recorded, processed, and compared to the first utterance to detect an acoustic difference for each of a plurality of acoustic parameters. The acoustic model used by the speech recognition engine is modified as a function of the acoustic differences.
    Type: Grant
    Filed: May 7, 2007
    Date of Patent: April 23, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Timothy David Poultney, Matthew Whitbourne, Kamourudeen Larry Yusuf
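The adaptation step can be illustrated as shifting acoustic-model parameter means by a fraction of the per-parameter difference between the two utterances. The update rate is an illustrative choice; the patent only says the model is modified as a function of the differences:

```python
def adapt_model(model_means, utt1_params, utt2_params, rate=0.5):
    """Return new acoustic-model means, each moved toward the
    observed per-parameter difference between the two recorded
    utterances by the given update rate."""
    diffs = [b - a for a, b in zip(utt1_params, utt2_params)]
    return [m + rate * d for m, d in zip(model_means, diffs)]

adapted = adapt_model([1.0, 2.0], [0.8, 2.0], [1.2, 2.4])
print(adapted)  # approximately [1.2, 2.2]
```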
  • Patent number: 8428951
    Abstract: A speech recognition apparatus includes a speech recognition dictionary and a speech recognition unit. The speech recognition dictionary includes comparison data used to recognize a voice input. The speech recognition unit is adapted to calculate the score for each comparison data by comparing voice input data generated based on the voice input with each comparison data, recognize the voice input based on the score, and produce the recognition result of the voice input. The speech recognition apparatus further includes data indicating score weights associated with particular comparison data, used to weight the scores calculated for the particular comparison data. After the score is calculated for each comparison data, the score weights are added to the scores of the particular comparison data, and the voice input is recognized based on total scores including the added score weights.
    Type: Grant
    Filed: July 6, 2006
    Date of Patent: April 23, 2013
    Assignee: Alpine Electronics, Inc.
    Inventor: Toshiyuki Hyakumoto
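The weighted-total scoring in the abstract can be sketched as below. The base score here is a stand-in (negative distance between a scalar feature and a template); the patent does not specify the comparison metric:

```python
def recognize(voice_feature, dictionary, score_weights):
    """Score each comparison entry against the voice input, add the
    per-entry score weight where one is defined, and return the entry
    with the best total score."""
    best, best_total = None, float("-inf")
    for word, template in dictionary.items():
        score = -abs(voice_feature - template)         # base score
        total = score + score_weights.get(word, 0.0)   # weighted total
        if total > best_total:
            best, best_total = word, total
    return best

dictionary = {"home": 1.0, "office": 1.3}
# Without weights the closer template wins; a weight can tip the result.
print(recognize(1.1, dictionary, {}))               # home
print(recognize(1.1, dictionary, {"office": 0.2}))  # office
```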
  • Patent number: 8423348
    Abstract: A method and system is disclosed herein for generating a plurality of equivalent sentence patterns from a declared sentence pattern for a specific language. The declared pattern is fed into a pattern selector. The pattern selector reads a predetermined library of equivalent pattern sets and selects an equivalent pattern set for the declared pattern. The selected equivalent pattern set corresponds to the declared pattern and represents a set of equivalent declared patterns. The set of equivalent declared patterns and the declared pattern are fed to a rules generator. The rules generator outputs executable semantic pattern recognition rules. A reader module, using the generated executable semantic pattern recognition rules, reads the given information source to determine the information of interest.
    Type: Grant
    Filed: June 10, 2006
    Date of Patent: April 16, 2013
    Assignee: Trigent Software Ltd.
    Inventors: Charles Rehberg, Krishnamurthy Satyanarayana, Rengarajan Seshadri, Vasudevan Comandur, Abhishek Mehta, Amit Goel
  • Patent number: 8423363
    Abstract: Occurrences of one or more keywords in audio data are identified using a speech recognizer employing a language model to derive a transcript of the keywords. The transcript is converted into a phoneme sequence. The phonemes of the phoneme sequence are mapped to the audio data to derive a time-aligned phoneme sequence that is searched for occurrences of keyword phoneme sequences corresponding to the phonemes of the keywords. Searching includes computing a confusion matrix. The language model used by the speech recognizer is adapted to keywords by increasing the likelihoods of the keywords in the language model. For each potential occurrence of a keyword detected, a corresponding subset of the audio data may be played back to an operator to confirm whether the potential occurrence corresponds to an actual occurrence of the keyword.
    Type: Grant
    Filed: January 13, 2010
    Date of Patent: April 16, 2013
    Assignee: CRIM (Centre de Recherche Informatique de Montréal)
    Inventors: Vishwa Nath Gupta, Gilles Boulianne
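The language-model adaptation step can be sketched for a unigram model: multiply keyword probabilities by a boost factor and renormalize, so the recognizer becomes more likely to emit the keywords. The boost factor is an illustrative value; the patent does not fix one:

```python
def boost_keywords(lm_probs, keywords, factor=5.0):
    """Return a new unigram language model in which the given
    keywords' probabilities are multiplied by `factor` and the whole
    distribution is renormalized to sum to 1."""
    boosted = {w: p * (factor if w in keywords else 1.0)
               for w, p in lm_probs.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}

lm = {"the": 0.5, "revenue": 0.1, "cat": 0.4}
adapted = boost_keywords(lm, {"revenue"})
print(adapted["revenue"] > lm["revenue"])  # True
```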