Creating Patterns For Matching Patents (Class 704/243)
  • Patent number: 8170874
    Abstract: A speech recognition apparatus which improves the sound quality of speech output as a speech recognition result is provided. The speech recognition apparatus includes a recognition unit, which recognizes speech based on a recognition dictionary, and a registration unit, which registers a dictionary entry of a new recognition word in the recognition dictionary. The recognition unit includes a generation unit, which generates a dictionary entry including speech of the new recognition word item and feature parameters of the speech, and a modification unit, which makes a modification for improving the sound quality of the speech included in the dictionary entry generated by the generation unit. The recognition unit includes a speech output unit, which outputs speech which is included in a dictionary entry corresponding to the recognition result of input speech, and is modified by the modification unit.
    Type: Grant
    Filed: July 1, 2008
    Date of Patent: May 1, 2012
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Toshiaki Fukada, Yasuo Okutani, Michio Aizawa
  • Publication number: 20120101820
    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
    Type: Application
    Filed: October 24, 2011
    Publication date: April 26, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
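The barge-in decision described in this abstract — classifying accumulated audio frames with speech and non-speech acoustic models and cutting off the prompt once speech is detected — can be sketched roughly as follows. This is a toy illustration, not the patented implementation: single 1-D Gaussians over frame energy stand in for the one-state and three-state HMMs, and all parameters are invented for the example.

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a scalar frame feature under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def is_barge_in(frame_energies, speech=(0.8, 0.05), nonspeech=(0.1, 0.05),
                min_speech_frames=5):
    """Classify each frame as speech or non-speech by comparing model
    likelihoods; report barge-in once enough frames favor speech."""
    speech_frames = sum(
        1 for e in frame_energies
        if gauss_loglik(e, *speech) > gauss_loglik(e, *nonspeech)
    )
    return speech_frames >= min_speech_frames
```

In a real dialogue system the accumulated frames would be fed to the HMM components incrementally, and a `True` result would trigger termination of the prompt playback.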
  • Publication number: 20120101821
    Abstract: A speech recognition apparatus is disclosed. The apparatus converts a speech signal into digitized speech data and performs speech recognition based on that data. In response to a user's indication that speech recognition has produced erroneous results multiple times in a row, the apparatus compares the most recently input speech data with the speech data input immediately before it. When the two are determined to substantially match, the apparatus outputs guidance prompting the user to utter the input target by calling it by another name.
    Type: Application
    Filed: October 13, 2011
    Publication date: April 26, 2012
    Applicant: DENSO CORPORATION
    Inventor: Takahiro TSUDA
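The retry logic in the abstract above — detect that the user repeated essentially the same utterance after consecutive recognition errors, then suggest an alternate name — can be sketched as below. The similarity test and prompt strings are illustrative assumptions; a production system would compare the speech data with DTW or model-level matching rather than a plain average distance.

```python
def substantially_match(a, b, tol=0.1):
    """Crude similarity test between two equal-rate feature sequences.
    Returns True when the mean absolute difference is small."""
    if not a or not b:
        return False
    n = min(len(a), len(b))
    dist = sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n
    return dist < tol

def guidance_after_errors(last, before_last, error_count):
    """If recognition failed twice in a row on near-identical input,
    prompt the user to call the target by another name."""
    if error_count >= 2 and substantially_match(last, before_last):
        return "Please try saying the destination by another name."
    return "Please repeat."
```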
  • Patent number: 8165878
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. The system and methods determine at least one exact, inexact, and partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template.
    Type: Grant
    Filed: April 26, 2010
    Date of Patent: April 24, 2012
    Assignee: Cyberpulse L.L.C.
    Inventors: James Roberge, Jeffrey Soble
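The exact/inexact/partial match classification in the abstract above can be sketched as follows. The patent does not specify its inexact-match criterion, so `difflib.SequenceMatcher` similarity is used here as a stand-in, with an assumed 0.8 cutoff; substring containment stands in for partial matching.

```python
import difflib

def match_type(word, term):
    """Classify a word/term pair as an exact, inexact (close spelling),
    or partial (substring) match; None if no match."""
    w, t = word.lower(), term.lower()
    if w == t:
        return "exact"
    if difflib.SequenceMatcher(None, w, t).ratio() >= 0.8:
        return "inexact"
    if w in t or t in w:
        return "partial"
    return None

def populate_template(utterance, template_terms):
    """Fill template slots with the best-matching utterance words."""
    filled = {}
    for term in template_terms:
        for word in utterance.split():
            kind = match_type(word, term)
            if kind:
                filled[term] = (word, kind)
                break
    return filled
```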
  • Patent number: 8160869
    Abstract: Provided are a method and apparatus for encoding an audio signal and a method and apparatus for decoding an audio signal. The method includes performing sinusoidal analysis on an audio signal in order to extract a sinusoidal signal of a current frame, determining continuation sinusoidal signal information indicating a number of continuation sinusoidal signals of next frames, which continue from the sinusoidal signal of the current frame, by performing sinusoidal tracking on the extracted sinusoidal signal of the current frame, and encoding the determined continuation sinusoidal signal information by using different Huffman tables according to index information of the current frame, thereby allowing efficient encoding with a low bitrate.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: April 17, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Nam-suk Lee, Geon-hyoung Lee, Jae-one Oh, Jong-hoon Jeong
  • Patent number: 8150690
    Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the processing of the cepstral feature vector, avoiding excessive enhancement or subtraction so that the cepstral feature vector is processed properly and the anti-noise ability of speech recognition is improved. Furthermore, the speech recognition system and method can be applied in any environment, have low complexity, and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: April 3, 2012
    Assignee: Industrial Technology Research Institute
    Inventor: Shih-Ming Huang
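The two scalar coefficients and determining condition described above can be illustrated with a minimal sketch. Here `alpha` scales the noise subtraction, `beta` sets a floor, and the condition keeps the floor whenever subtraction would overshoot — a common guard in spectral/cepstral subtraction schemes. The exact roles of the coefficients in the patent may differ; this is an assumption-laden illustration.

```python
def cepstral_noise_subtract(c, noise_mean, alpha=1.0, beta=0.1):
    """Subtract an estimated noise cepstrum from a cepstral feature
    vector, limiting the result to avoid over-subtraction."""
    out = []
    for x, n in zip(c, noise_mean):
        y = x - alpha * n
        floor = beta * abs(x)
        # Determining condition: keep a floored value if the
        # subtraction over-shoots toward zero.
        if abs(y) < floor:
            y = floor if x >= 0 else -floor
        out.append(y)
    return out
```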
  • Patent number: 8150694
    Abstract: The system and method described herein may provide an acoustic grammar to dynamically sharpen speech interpretation. In particular, the acoustic grammar may be used to map one or more phonemes identified in a user verbalization to one or more syllables or words, wherein the acoustic grammar may have one or more linking elements to reduce a search space associated with mapping the phonemes to the syllables or words. As such, the acoustic grammar may be used to generate one or more preliminary interpretations associated with the verbalization, wherein one or more post-processing techniques may then be used to sharpen accuracy associated with the preliminary interpretations. For example, a heuristic model may assign weights to the preliminary interpretations based on context, user profiles, or other knowledge and a probable interpretation may be identified based on confidence scores associated with one or more candidate interpretations generated with the heuristic model.
    Type: Grant
    Filed: June 1, 2011
    Date of Patent: April 3, 2012
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
  • Patent number: 8145484
    Abstract: The described implementations relate to speech spelling by a user. One method identifies one or more symbols that may match a user utterance and displays an individual symbol for confirmation by the user.
    Type: Grant
    Filed: November 11, 2008
    Date of Patent: March 27, 2012
    Assignee: Microsoft Corporation
    Inventor: Geoffrey Zweig
  • Patent number: 8145482
    Abstract: Methods and apparatus for the enhancement of speech to text engines, by providing indications to the correctness of the found words, based on additional sources besides the internal indication provided by the STT engine. The enhanced indications comprise sources of data such as acoustic features, CTI features, phonetic search and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, thus enabling more efficient usages, such as further processing or transfer of interactions to relevant agents, escalation of issues, or the like. The methods and apparatus employ a training phase in which word model and key phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.
    Type: Grant
    Filed: May 25, 2008
    Date of Patent: March 27, 2012
    Inventors: Ezra Daya, Oren Pereg, Yuval Lubowich, Moshe Wasserblat
  • Patent number: 8145483
    Abstract: The invention can recognize several languages at the same time without using samples. The key technique is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word of any language, represented by a matrix, is simulated by the surrounding unknown words. The invention uses 12 elastic frames of equal length, without filter and without overlap, to normalize the signal waveform of variable length for a word, which has one to several syllables, into a 12×12 matrix as the feature of the word. The invention can improve the feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize any language without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.
    Type: Grant
    Filed: August 5, 2009
    Date of Patent: March 27, 2012
    Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
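The fixed-size normalization above — splitting a variable-length waveform into 12 equal, non-overlapping frames and producing a 12×12 feature matrix — can be sketched as below. The per-frame coefficients here (lag-averaged sample products) are a placeholder; the patent's actual per-frame features are not reproduced.

```python
def elastic_frames_feature(waveform, nframes=12, ncoef=12):
    """Split a variable-length waveform into `nframes` equal, non-
    overlapping frames and compute `ncoef` crude coefficients per
    frame, yielding a fixed nframes x ncoef matrix for any input."""
    step = len(waveform) / nframes
    matrix = []
    for i in range(nframes):
        frame = waveform[int(i * step):int((i + 1) * step)] or [0.0]
        row = []
        for k in range(ncoef):
            # Average product at lag k; 0.0 when the frame is shorter
            # than the lag.
            pairs = [frame[t] * frame[t + k] for t in range(len(frame) - k)]
            row.append(sum(pairs) / len(pairs) if pairs else 0.0)
        matrix.append(row)
    return matrix
```

The point of the sketch is the shape invariance: whatever the word's duration, the output is always a 12×12 matrix, so words of different lengths become directly comparable.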
  • Publication number: 20120072217
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 22, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas BANGALORE, Junlan Feng, Michael Johnston, Taniya Mishra
  • Patent number: 8140331
    Abstract: Characteristic features are extracted from an audio sample based on its acoustic content. The features can be coded as fingerprints, which can be used to identify the audio from a fingerprints database. The features can also be used as parameters to separate the audio into different categories.
    Type: Grant
    Filed: July 4, 2008
    Date of Patent: March 20, 2012
    Inventor: Xia Lou
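The feature-to-fingerprint coding described above can be illustrated with a very small sketch: per frame, measure energy in a few equal sub-bands (time-domain sub-bands stand in here for the spectral bands a real system would use), binarize against the frame mean, and hash the bit pattern. All frame/band sizes are illustrative.

```python
import hashlib

def audio_fingerprint(samples, nbands=8, frame=256):
    """Code characteristic features of an audio sample as a compact
    fingerprint: binarized per-band energies, hashed."""
    bits = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        band = frame // nbands
        energies = [sum(x * x for x in chunk[b * band:(b + 1) * band])
                    for b in range(nbands)]
        mean = sum(energies) / nbands
        bits.extend('1' if e > mean else '0' for e in energies)
    return hashlib.sha1(''.join(bits).encode()).hexdigest()
```

Matching against a fingerprint database then reduces to exact or near-exact lookup of these digests (or, more robustly, Hamming-distance search over the raw bit strings).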
  • Patent number: 8135590
    Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
    Type: Grant
    Filed: January 11, 2007
    Date of Patent: March 13, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
  • Publication number: 20120059654
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text with a target F0 pattern of the same learning text by associating their peaks and troughs. For each of the points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Application
    Filed: March 16, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Masafumi Nishimura, Ryuki Tachibana
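The peak association and shift computation in the abstract above can be sketched as follows. This toy version pairs source and target F0 peaks in order and returns the (time, frequency) shifts that the patent's decision tree would be trained to predict from linguistic features; real contours would also need trough association and smoothing.

```python
def peaks(contour):
    """Indices of simple local maxima in an F0 contour."""
    return [i for i in range(1, len(contour) - 1)
            if contour[i] > contour[i - 1] and contour[i] >= contour[i + 1]]

def shift_amounts(source, target):
    """Pair source/target F0 peaks in order; return per-peak shifts
    (dt in frames, df in Hz) from the source to the target pattern."""
    s, t = peaks(source), peaks(target)
    return [(tj - si, target[tj] - source[si]) for si, tj in zip(s, t)]
```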
  • Publication number: 20120059849
    Abstract: In one embodiment, a system and method is provided to browse and analyze files comprising text strings tagged with metadata. The system and method comprise various functions including browsing the metadata tags in the file, browsing the text strings, selecting subsets of the text strings by including or excluding strings tagged with specific metadata tags, selecting text strings by matching patterns of words and/or parts of speech in the text string and matching selected text strings to a database to identify similar text string. The system and method further provide functions to generate suggested text selection rules by analyzing a selected subset of a plurality of text strings.
    Type: Application
    Filed: September 8, 2010
    Publication date: March 8, 2012
    Applicant: DEMAND MEDIA, INC.
    Inventors: David M. Yehaskel, Henrik M. Kjallbring
  • Publication number: 20120059653
    Abstract: A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results.
    Type: Application
    Filed: August 30, 2011
    Publication date: March 8, 2012
    Inventors: Jeffrey P. Adams, Kenneth Basye, Ryan Thomas, Jeffrey C. O'Neill
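The two-pass idea in this abstract — rescore first-pass recognition results with a language model built from information stored on the device — can be sketched as below. The "language model" here is deliberately trivial (fraction of hypothesis words found on the device, e.g. contact names), and the interpolation weight is an assumption.

```python
def rescore(nbest, device_words, lam=0.5):
    """Second-pass rescoring: combine first-pass ASR scores with a
    toy device-side language model. `nbest` is a list of
    (hypothesis, score) pairs, higher scores better."""
    def local_lm(hyp):
        words = hyp.split()
        return sum(1.0 for w in words if w in device_words) / len(words)

    rescored = [(h, (1 - lam) * s + lam * local_lm(h)) for h, s in nbest]
    return max(rescored, key=lambda x: x[1])[0]
```

A hypothesis containing the user's actual contact names can thus overtake a higher-scoring but on-device-implausible first-pass result.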
  • Patent number: 8131554
    Abstract: A tool, method, and system for use in the development of sentence-based test items are disclosed. The tool may include a user interface that may include a database selection field, a sentence pattern entry field, an option pane, and an output pane. The tool may search a database for one or more sentences and may generate one or more responses to the one or more sentences. The one or more sentences and one or more responses may be used to produce the sentence-based test items. The tool may allow test items to be developed more quickly and easily than manual test item authoring. Accordingly, test item development costs may be lowered and test security may be enhanced.
    Type: Grant
    Filed: March 11, 2011
    Date of Patent: March 6, 2012
    Assignee: Educational Testing Service
    Inventor: Derrick Higgins
  • Patent number: 8131547
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Grant
    Filed: August 20, 2009
    Date of Patent: March 6, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 8126711
    Abstract: A modifying method for a speech model and a modifying module thereof are provided. The modifying method is as follows. First, a correct sequence of a speech is generated according to a correct sequence generating method and the speech model. Next, a candidate sequence generating method is selected from a plurality of candidate sequence generating methods, and a candidate sequence of the speech is generated according to the selected candidate sequence generating method and the speech model. Finally, the speech model is modified according to the correct sequence and the candidate sequence. Therefore, the present invention increases the discriminative power of the speech model.
    Type: Grant
    Filed: January 10, 2008
    Date of Patent: February 28, 2012
    Assignee: Industrial Technology Research Institute
    Inventors: Jia-Jang Tu, Yuan-Fu Liao
  • Publication number: 20120046946
    Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
    Type: Application
    Filed: August 20, 2010
    Publication date: February 23, 2012
    Applicant: ADACEL SYSTEMS, INC.
    Inventor: Chang-Qing Shu
  • Patent number: 8117030
    Abstract: A method for analyzing and adjusting the performance of a speech-enabled application includes selecting a number of user utterances that were previously received by the speech-enabled application. The speech-enabled application receives such user utterances and associates each user utterance with an action-object based on one or more salient terms in the user utterance that are associated with the action-object. The method further includes associating one of a number of action-objects with each of the selected user utterances. Furthermore, for each action-object, the percentage of the utterances associated with the action-object that include at least one of the salient terms associated with the action-object is determined. If the percentage does not exceed a selected threshold, the method also includes adjusting the one or more salient terms associated with the action-object.
    Type: Grant
    Filed: September 13, 2006
    Date of Patent: February 14, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Robert R. Bushey, Benjamin A. Knott, John M. Martin
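The threshold test in the abstract above — for each action-object, check what fraction of its utterances contain at least one of its salient terms, and flag it for adjustment when coverage is too low — can be sketched directly. The 0.8 threshold is an illustrative assumption.

```python
def coverage(utterances, salient_terms):
    """Fraction of utterances containing at least one salient term."""
    hits = sum(1 for u in utterances
               if any(t in u.lower() for t in salient_terms))
    return hits / len(utterances)

def needs_adjustment(utterances, salient_terms, threshold=0.8):
    """Flag an action-object whose salient terms fail to cover enough
    of the utterances mapped to it."""
    return coverage(utterances, salient_terms) < threshold
```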
  • Patent number: 8116445
    Abstract: An apparatus and method for monitoring an interaction between a caller and an automated voice response (AVR) system is provided. An audio communication from a caller is processed by executing an AVR script, which includes a plurality of instructions. A visual representation of the audio communication is presented substantially simultaneously with the audio communication to an agent based on the AVR script. The visual representation includes at least one field to be populated with information obtained from the caller and the information populated in the field can be updated by the agent.
    Type: Grant
    Filed: April 3, 2007
    Date of Patent: February 14, 2012
    Assignee: Intellisist, Inc.
    Inventors: Gilad Odinak, Alastair Sutherland, William A. Tolhurst
  • Patent number: 8108205
    Abstract: A system and method of refining context-free grammars (CFGs). The method includes deriving back-off grammar (BOG) rules from an initially developed CFG and utilizing the initial CFG and the derived BOG rules to recognize user utterances. Based on the response of the initial CFG and the derived BOG rules to the user utterances, at least a portion of the derived BOG rules are utilized to modify the initial CFG and thereby produce a refined CFG. The above method can be carried out iteratively, with each new iteration utilizing the refined CFG from preceding iterations.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: January 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Timothy Paek, Max Chickering, Eric Badger
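One common way to derive back-off grammar rules from a CFG rule is to relax carrier/filler words so that slightly out-of-grammar utterances still parse; the sketch below drops leading and trailing fillers from a tokenized rule. This is a guess at what "deriving BOG rules" involves, not the patent's actual derivation; the filler list is invented.

```python
# Hypothetical carrier/filler words that a back-off rule may omit.
FILLERS = {"please", "could", "you", "um", "uh"}

def derive_bog_rules(cfg_rule):
    """Derive back-off grammar rules from a tokenized CFG rule by
    progressively dropping leading and trailing filler words."""
    rules = [list(cfg_rule)]
    core = list(cfg_rule)
    while core and core[0].lower() in FILLERS:
        core = core[1:]
        rules.append(list(core))
    while core and core[-1].lower() in FILLERS:
        core = core[:-1]
        rules.append(list(core))
    # Deduplicate while preserving order.
    seen, out = set(), []
    for r in rules:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```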
  • Patent number: 8103502
    Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: January 24, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Patent number: 8099278
    Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: January 17, 2012
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
  • Publication number: 20120010885
    Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.
    Type: Application
    Filed: September 19, 2011
    Publication date: January 12, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Zeynep Hakkani-Tür, Giuseppe Riccardi
  • Patent number: 8095372
    Abstract: A digital process for authentication of a user of a database, for access to protected data, a service reserved for a defined circle of users, or the use of data currently entered by the user. A voice sample currently enunciated during an access attempt by the user is routed to a voice analysis unit, where a current voice profile is computed and compared in a voice profile comparison unit against a previously stored initial voice profile. In response to a positive comparison result, the user is authenticated and a first control signal enabling access is generated; in response to a negative comparison result, a second control signal disabling access or triggering a substitute authentication procedure is generated.
    Type: Grant
    Filed: January 7, 2008
    Date of Patent: January 10, 2012
    Assignee: VOICECASH IP GmbH
    Inventors: Raja Kuppuswamy, Hermann Geupel
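The comparison step above can be sketched with a toy voice-profile comparator: profiles as feature vectors, cosine similarity against a stored enrollment profile, and two control-signal outcomes. The similarity measure and threshold are illustrative assumptions; real speaker verification uses far richer models.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def authenticate(current_profile, stored_profile, threshold=0.9):
    """Compare the current voice profile against the enrolled one;
    enable access on a positive result, else fall back."""
    if cosine(current_profile, stored_profile) >= threshold:
        return "ACCESS_ENABLED"
    return "FALLBACK_AUTH"
```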
  • Patent number: 8090738
    Abstract: A multi-modal search system (and corresponding methodology) that employs wildcards is provided. Wildcards can be employed in the search query either initiated by the user or inferred by the system. These wildcards can represent uncertainty conveyed by a user in a multi-modal search query input. In examples, the words “something” or “whatchamacallit” can be used to convey uncertainty and partial knowledge about portions of the query and to dynamically trigger wildcard generation.
    Type: Grant
    Filed: August 28, 2008
    Date of Patent: January 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Timothy Seung Yoon Paek, Bo Thiesson, Yun-Cheng Ju, Bongshin Lee, Christopher A. Meek
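The wildcard substitution described above can be sketched with a small query matcher: the uncertainty words the abstract cites ("something", "whatchamacallit") become wildcards in a pattern matched against candidate listings. The regex-based matching is an illustrative stand-in for the system's actual search machinery.

```python
import re

# Uncertainty words cited in the abstract above.
UNCERTAIN = {"something", "whatchamacallit"}

def search(query, candidates):
    """Replace uncertainty words in a spoken query with wildcards and
    match the resulting pattern against candidate listings."""
    pattern = "^" + r"\s+".join(
        r"\S+" if w.lower() in UNCERTAIN else re.escape(w)
        for w in query.split()) + "$"
    rx = re.compile(pattern, re.IGNORECASE)
    return [c for c in candidates if rx.match(c)]
```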
  • Patent number: 8086455
    Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Yifan Gong, Ye Tian
  • Patent number: 8082150
    Abstract: A system for determining an identity of a received work. The system receives audio data for an unknown work. The audio data is divided into segments. The system generates a signature of the unknown work from each of the segments. Reduced dimension signatures are then generated from at least a portion of the signatures. The reduced dimension signatures are then compared to reduced dimension signatures of known works that are stored in a database. A list of candidates of known works is generated from the comparison. The signatures of the unknown work are then compared to the signatures of the known works in the list of candidates. The unknown work is then identified as the known work having signatures matching within a threshold.
    Type: Grant
    Filed: March 24, 2009
    Date of Patent: December 20, 2011
    Assignee: Audible Magic Corporation
    Inventor: Erling H. Wold
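The two-stage lookup above — shortlist candidates cheaply with reduced-dimension signatures, then confirm with full signatures under a threshold — can be sketched as follows. Chunk-averaging as the dimension reduction, squared-Euclidean distance, and the threshold value are all illustrative assumptions.

```python
def reduce_dim(sig, k=4):
    """Reduce a signature to k dimensions by chunk averaging."""
    n = len(sig) // k
    return [sum(sig[i * n:(i + 1) * n]) / n for i in range(k)]

def dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify(unknown, database, shortlist=2, threshold=0.5):
    """Two-stage identification: shortlist by reduced-dimension
    signature distance, then confirm with the full signatures."""
    ru = reduce_dim(unknown)
    candidates = sorted(
        database, key=lambda name: dist(ru, reduce_dim(database[name]))
    )[:shortlist]
    best = min(candidates, key=lambda name: dist(unknown, database[name]))
    return best if dist(unknown, database[best]) <= threshold else None
```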
  • Patent number: 8082148
    Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: December 20, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Michael H. Mirt
  • Patent number: 8078463
    Abstract: A method and apparatus for spotting a target speaker within a call interaction by generating speaker models based on one or more speakers' speech, and by searching for speaker models associated with one or more target-speaker speech files.
    Type: Grant
    Filed: November 23, 2004
    Date of Patent: December 13, 2011
    Assignee: Nice Systems, Ltd.
    Inventors: Moshe Wasserblat, Yaniv Zigel, Oren Pereg
  • Publication number: 20110301953
    Abstract: Provided is a voice recognition system that adapts a speaker's voice, feature by feature, to a basic voice model and to new independent multi-models, stores the results, and provides stable real-time voice recognition using the resulting multi-adaptive model.
    Type: Application
    Filed: April 11, 2011
    Publication date: December 8, 2011
    Applicant: Seoby Electronic Co., Ltd
    Inventor: Sung-Sub Lee
  • Patent number: 8073262
    Abstract: In an image matching apparatus of the present invention, only a connected region in which the number of pixels included therein exceeds a threshold value, among connected regions that are specified by a labeling process section, is sent to a centroid calculation process section from a threshold value processing section, and a centroid (feature point) of the connected region is calculated. When it is determined that a target document to be matched is an N-up document, the threshold value processing section uses, instead of a default threshold value, a variant threshold value that varies depending on the number of images laid out on the N-up document and a document size that are found and detected by an N-up document determination section and a document size detection section. This makes it possible to determine a similarity to a reference document with high accuracy even in a case of an N-up document, i.e., a case where each target image to be matched is reduced in size from an original image.
    Type: Grant
    Filed: September 8, 2008
    Date of Patent: December 6, 2011
    Assignee: Sharp Kabushiki Kaisha
    Inventor: Hitoshi Hirohata
  • Publication number: 20110295602
    Abstract: An apparatus and a method are provided for building a spoken language understanding model. Labeled data may be obtained for a target application. A new classification model may be formed for use with the target application by using the labeled data for adaptation of an existing classification model. In some implementations, the existing classification model may be used to determine the most informative examples to label.
    Type: Application
    Filed: August 8, 2011
    Publication date: December 1, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Gokhan Tur
  • Patent number: 8069044
    Abstract: Content matching using phoneme comparison and scoring is described, including extracting phonemes from a file, comparing the phonemes to other phonemes, associating a first score with the phonemes based on a probability of the other phonemes matching the phonemes, and providing the file with another file when a request is received to access one or more files having a second score that is substantially similar to the first score.
    Type: Grant
    Filed: March 16, 2007
    Date of Patent: November 29, 2011
    Assignee: Adobe Systems Incorporated
    Inventor: James Moorer
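The phoneme comparison and scoring above can be sketched with a simple sequence-similarity score: 1 minus the normalized edit distance between two phoneme sequences. This is a stand-in for the probability-based matching score the abstract describes, and the ARPAbet-style phoneme labels in the test are illustrative.

```python
def phoneme_match_score(a, b):
    """Similarity of two phoneme sequences as 1 - normalized edit
    distance (1.0 for identical sequences, lower for mismatches)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1.0 - d[m][n] / max(m, n)
```

Files whose extracted phoneme sequences score close to a query's score (or close to each other) would then be grouped and served together, as the abstract describes.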
  • Patent number: 8069042
    Abstract: A method and system for obtaining a pool of speech syllable models. The model pool is generated by first detecting a training segment using unsupervised speech segmentation or speech unit spotting. If the model pool is empty, a first speech syllable model is trained and added to the model pool. If the model pool is not empty, an existing model is determined from the model pool that best matches the training segment. Then the existing model is scored for the training segment. If the score is less than a predefined threshold, a new model for the training segment is created and added to the pool. If the score equals the threshold or is larger than the threshold, the training segment is used to improve or to re-estimate the model.
    Type: Grant
    Filed: September 21, 2007
    Date of Patent: November 29, 2011
    Assignee: Honda Research Institute Europe GmbH
    Inventors: Frank Joublin, Holger Brandl
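The pool-building loop in this abstract has a clear control flow: train the first model on an empty pool, otherwise find the best match and either improve it or add a new model depending on a score threshold. A minimal sketch follows; the toy `train`/`score`/`improve` stand-ins are assumptions, not the patented training method.

```python
def update_pool(pool, segment, score_fn, train_fn, improve_fn, threshold=0.5):
    """One iteration of the model-pool loop described in the abstract."""
    if not pool:                                  # empty pool: train first model
        pool.append(train_fn(segment))
        return
    best = max(pool, key=lambda m: score_fn(m, segment))
    if score_fn(best, segment) < threshold:
        pool.append(train_fn(segment))            # no good match: add new model
    else:
        improve_fn(best, segment)                 # good match: re-estimate it

# Toy stand-ins (assumptions): a "model" is just the list of its training data.
train = lambda seg: [seg]
score = lambda m, seg: 1.0 if seg in m else 0.0
improve = lambda m, seg: m.append(seg)

pool = []
update_pool(pool, "ba", score, train, improve)    # empty pool -> first model
update_pool(pool, "da", score, train, improve)    # no match -> new model
update_pool(pool, "ba", score, train, improve)    # match -> improve existing
```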
  • Patent number: 8065144
    Abstract: A method for speech recognition. The method uses a single pronunciation estimator to train acoustic phoneme models and recognize utterances from multiple languages. The method includes accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages. The method also includes, for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages. The method also includes training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
    Type: Grant
    Filed: February 3, 2010
    Date of Patent: November 22, 2011
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Laurence S. Gillick, Thomas E. Lynch, Michael J. Newman, Daniel L. Roth, Steven A. Wegmann, Jonathan P. Yamron
  • Patent number: 8065149
    Abstract: Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates.
    Type: Grant
    Filed: March 6, 2008
    Date of Patent: November 22, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Gakuto Kurata, Shinsuke Mori, Masafumi Nishimura
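The final selection step above — keeping only those (character string, pronunciation) combinations that also appear in the recognition data — can be sketched as a set intersection. The romanized strings below are illustrative assumptions.

```python
def select_pairs(candidates, pron_candidates, recognized):
    """Keep (string, pronunciation) combinations also present in the
    recognition data, as in the abstract's final selection step."""
    combos = {(s, p) for s in candidates for p in pron_candidates.get(s, [])}
    return sorted(combos & recognized)

cands = ["tokyo", "osaka"]                          # candidate character strings
prons = {"tokyo": ["toukyou", "tookyoo"], "osaka": ["oosaka"]}
recog = {("tokyo", "toukyou"), ("kyoto", "kyouto")}  # from the input speech
pairs = select_pairs(cands, prons, recog)
```

Only the combination actually observed in the speech survives, which is what makes the pair worth registering as a word.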
  • Patent number: 8065241
    Abstract: A new machine learning technique is herein disclosed which generalizes the support vector machine framework. A separating hyperplane in a separating space is optimized in accordance with generalized constraints which depend upon the clustering of the input vectors in the dataset.
    Type: Grant
    Filed: April 9, 2008
    Date of Patent: November 22, 2011
    Assignee: NEC Laboratories America, Inc.
    Inventors: Vladimir N. Vapnik, Michael R. Miller, Margaret A. Miller, legal representative
  • Patent number: 8060368
    Abstract: A voice recognition apparatus 10, which performs voice recognition of an input voice by referring to a voice recognition dictionary and outputs a voice recognition result, has an external information acquiring section 14 for acquiring, from each of the externally connected devices 20-1 to 20-N, the type of the device and the data recorded in it; vocabulary extracting and analyzing sections 15 and 16 for extracting a vocabulary item from the data as an extracted vocabulary item, and for producing analysis data by analyzing the extracted vocabulary item and providing it with a reading; and a dictionary generating section 17 for storing the analysis data in the voice recognition dictionary corresponding to the type. One of the voice recognition dictionaries 13-1 to 13-N is assigned to each type of externally connected device.
    Type: Grant
    Filed: August 18, 2006
    Date of Patent: November 15, 2011
    Assignee: Mitsubishi Electric Corporation
    Inventors: Masanobu Osawa, Reiko Okada, Takashi Ebihara
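The one-dictionary-per-device-type design above can be sketched as grouping extracted vocabulary by device type. The `lower()` call stands in for the analysis step that attaches a reading to each item; all names are illustrative assumptions.

```python
def build_dictionaries(devices):
    """Group vocabulary extracted from each device into a recognition
    dictionary keyed by device type (one dictionary per type)."""
    dictionaries = {}
    for dev in devices:
        vocab = dictionaries.setdefault(dev["type"], set())
        for item in dev["data"]:
            # (vocabulary item, reading) — the reading here is a placeholder
            # for the real analysis that provides a pronunciation
            vocab.add((item, item.lower()))
    return dictionaries

devices = [
    {"type": "music_player", "data": ["Abbey Road"]},
    {"type": "phone", "data": ["Alice", "Bob"]},
    {"type": "music_player", "data": ["Revolver"]},
]
dicts = build_dictionaries(devices)
```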
  • Patent number: 8060365
    Abstract: A dialog processing system which includes: a target expression data extraction unit for extracting, from a plurality of utterance data inputted by an utterance data input unit and obtained by converting the contents of a plurality of conversations in one field, a plurality of target expression data each including a portion that matches an utterance pattern, the pattern being inputted by an utterance pattern input unit and being an utterance structure derived from the contents of field-independent general conversations; a feature extraction unit for retrieving the pattern-matching portions from the extracted target expression data and extracting a feature quantity common to them; and a mandatory data extraction unit for extracting, by use of the extracted feature quantities, mandatory data of the one field included in the utterance data.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: November 15, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
  • Publication number: 20110276329
    Abstract: A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided that determine a user's proficiency level in dialogue behavior correctly, without being influenced by an accidental one-time behavior of the user, and perform dialogue control appropriate to the determined proficiency level. An input unit 1 inputs a speech uttered by the user. An extraction unit 3 extracts a proficiency level determination factor, a factor for determining the user's proficiency level in dialogue behavior, based upon the input result of the speech from the input unit 1. A history storage unit 4 stores the proficiency level determination factors extracted by the extraction unit 3 as a history.
    Type: Application
    Filed: January 20, 2010
    Publication date: November 10, 2011
    Inventors: Masaaki Ayabe, Jun Okamoto
  • Publication number: 20110276323
    Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 10, 2011
    Applicant: Senam Consulting, Inc.
    Inventor: Serge Olegovich Seyfetdinov
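The authentication logic above is a conjunction of two checks: the test passphrase must match the reference passphrase, and each voice feature must match within some tolerance. A minimal sketch, with feature names and the tolerance chosen purely for illustration:

```python
def authenticate(ref, test, feature_tol=0.1):
    """Authenticate only when the passphrase text matches AND every voice
    feature is within tolerance of the enrolled reference recording."""
    if ref["passphrase"] != test["passphrase"]:
        return False
    return all(abs(ref["features"][k] - test["features"][k]) <= feature_tol
               for k in ref["features"])

reference = {"passphrase": "open sesame", "features": {"pitch": 0.62, "rate": 0.40}}
genuine   = {"passphrase": "open sesame", "features": {"pitch": 0.60, "rate": 0.44}}
impostor  = {"passphrase": "open sesame", "features": {"pitch": 0.95, "rate": 0.40}}
```

Requiring both conditions means knowing the passphrase alone (or merely sounding similar alone) is not enough to authenticate.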
  • Patent number: 8055502
    Abstract: A voice dialing method includes the steps of receiving an utterance from a user, decoding the utterance to identify a recognition result for the utterance, and communicating the recognition result to the user. If an indication is received from the user that the communicated recognition result is incorrect, then the incorrect result is added to a rejection reference. When the user repeats the misunderstood utterance, the rejection reference can be used to eliminate the incorrect result as a potential subsequent recognition result. The method can be used for single digits, multiple digits, or digit strings.
    Type: Grant
    Filed: November 28, 2006
    Date of Patent: November 8, 2011
    Assignee: General Motors LLC
    Inventors: Jason W. Clark, Rathinavelu Chengalvarayan, Timothy J. Grost, Dana B. Fecher, Jeremy M. Spaulding
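The rejection-reference idea above amounts to filtering previously rejected hypotheses out of the candidate list before picking the best score. A minimal sketch, with candidate numbers and scores invented for illustration:

```python
def recognize(utterance, candidates, rejection):
    """Return the best-scoring candidate that has not been rejected."""
    viable = [(c, s) for c, s in candidates if c not in rejection]
    return max(viable, key=lambda cs: cs[1])[0] if viable else None

rejection = set()
candidates = [("555-0123", 0.9), ("555-0128", 0.8)]

first = recognize("call five five five oh one two eight", candidates, rejection)
rejection.add(first)    # user indicates the communicated result was wrong
second = recognize("call five five five oh one two eight", candidates, rejection)
```

On the repeated utterance, the earlier wrong answer is excluded, so the next-best hypothesis is returned instead.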
  • Patent number: 8055503
    Abstract: A system and method provide an audio analysis intelligence tool with ad-hoc search capabilities using spoken words as an organized data form. An SQL-like interface is used to process and search audio data and combine it with other traditional data forms to enhance searching of audio segments to identify those audio segments satisfying minimum confidence levels for a match.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 8, 2011
    Assignee: Siemens Enterprise Communications, Inc.
    Inventors: Robert Scarano, Lawrence Mark
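The SQL-like filtering this abstract describes — returning only audio segments that satisfy a minimum confidence level for a match — can be sketched as a SELECT-with-WHERE over spotted phrases. Field names and scores below are illustrative assumptions.

```python
def search_segments(segments, phrase, min_confidence):
    """SELECT-like filter: segments where `phrase` was spotted with at
    least `min_confidence`, ordered by confidence descending."""
    hits = [s for s in segments
            if s["phrase"] == phrase and s["confidence"] >= min_confidence]
    return sorted(hits, key=lambda s: -s["confidence"])

segments = [
    {"call_id": 1, "phrase": "cancel account", "confidence": 0.92},
    {"call_id": 2, "phrase": "cancel account", "confidence": 0.55},
    {"call_id": 3, "phrase": "upgrade plan",   "confidence": 0.88},
]
results = search_segments(segments, "cancel account", 0.7)
```

The low-confidence spot in call 2 is excluded, which is the point of the minimum-confidence clause.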
  • Patent number: 8050918
    Abstract: A method and system for evaluating the quality of voice input recognition by a voice portal is provided. An analysis interface extracts a set of current grammars from the voice portal. A test pattern generator generates a test input for each current grammar. The test input includes a test pattern and a set of active grammars corresponding to each current grammar. The system further includes a text-to-speech engine for entering each test pattern into the voice server. A results collector analyzes each test pattern entered into the voice server with the speech recognition engine against the set of active grammars corresponding to the current grammar for said test pattern. A results analyzer derives a set of statistics of a quality of recognition of each current grammar.
    Type: Grant
    Filed: December 11, 2003
    Date of Patent: November 1, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Reza Ghasemi, Walter Haenel
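The evaluation loop above — generate a test pattern per grammar, synthesize it, recognize it against the grammar's active set, and tally statistics — can be sketched as follows. The toy `tts`/`asr` stand-ins are assumptions; real engines would replace them.

```python
def evaluate_grammars(grammars, synthesize, recognize):
    """For each grammar, synthesize its test phrases, run recognition
    against that grammar's active set, and tally per-grammar accuracy."""
    stats = {}
    for name, phrases in grammars.items():
        correct = sum(recognize(synthesize(p), phrases) == p for p in phrases)
        stats[name] = correct / len(phrases)
    return stats

# Toy stand-ins for the text-to-speech and recognition engines (assumptions):
tts = lambda text: text.upper()                        # "audio" = uppercased text
asr = lambda audio, active: audio.lower() if audio.lower() in active else None

grammars = {"digits": ["one", "two", "three"], "cities": ["boston", "austin"]}
stats = evaluate_grammars(grammars, tts, asr)
```

A results analyzer would then inspect `stats` to flag grammars whose recognition quality falls below an acceptable rate.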
  • Patent number: 8046224
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: October 25, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 8036890
    Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.
    Type: Grant
    Filed: September 4, 2009
    Date of Patent: October 11, 2011
    Assignee: Zentian Limited
    Inventor: Mark Catchpole
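A lexical tree as described above is a trie in which words sharing a prefix share nodes, and each tree's initial component is unique. A minimal sketch over letters (real systems would use phones); all names are illustrative.

```python
def build_lexical_trees(words):
    """Group words into tries keyed by their first component, so each
    tree's initial component is unique and common prefixes are shared."""
    trees = {}
    for word in words:
        node = trees.setdefault(word[0], {})   # one tree per initial component
        for ch in word[1:]:
            node = node.setdefault(ch, {})     # shared path for common prefixes
        node["#"] = word                       # end-of-word marker stores the word
    return trees

trees = build_lexical_trees(["cat", "car", "dog"])
# "cat" and "car" share the c-a prefix inside the tree rooted at "c";
# separate trees could be handed to separate processors in parallel.
```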
  • Patent number: 8032373
    Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: October 4, 2011
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir