Creating Patterns For Matching Patents (Class 704/243)
-
Patent number: 8170874
Abstract: A speech recognition apparatus which improves the sound quality of speech output as a speech recognition result is provided. The speech recognition apparatus includes a recognition unit, which recognizes speech based on a recognition dictionary, and a registration unit, which registers a dictionary entry of a new recognition word in the recognition dictionary. The recognition unit includes a generation unit, which generates a dictionary entry including speech of the new recognition word item and feature parameters of the speech, and a modification unit, which makes a modification for improving the sound quality of the speech included in the dictionary entry generated by the generation unit. The recognition unit includes a speech output unit, which outputs speech which is included in a dictionary entry corresponding to the recognition result of input speech, and is modified by the modification unit.
Type: Grant
Filed: July 1, 2008
Date of Patent: May 1, 2012
Assignee: Canon Kabushiki Kaisha
Inventors: Masayuki Yamada, Toshiaki Fukada, Yasuo Okutani, Michio Aizawa
-
Publication number: 20120101820
Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
Type: Application
Filed: October 24, 2011
Publication date: April 26, 2012
Applicant: AT&T Intellectual Property I, L.P.
Inventor: Andrej Ljolje
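The barge-in idea above — score incoming frames under competing speech and non-speech models, and cut the prompt once speech wins — can be sketched minimally in Python. This is not the patent's multi-state HMM; it uses single Gaussians over a scalar frame energy and a hypothetical consecutive-frame rule as illustrative stand-ins.

```python
import math

def gauss_loglik(x, mean, var):
    # Log-likelihood of a scalar feature under a 1-D Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def detect_barge_in(energies, speech=(0.8, 0.05), nonspeech=(0.1, 0.05), run=3):
    """Return the frame index where barge-in is declared, or None.

    Barge-in fires once `run` consecutive frames score higher under the
    speech model than under the non-speech model (illustrative rule).
    """
    consecutive = 0
    for i, e in enumerate(energies):
        if gauss_loglik(e, *speech) > gauss_loglik(e, *nonspeech):
            consecutive += 1
            if consecutive >= run:
                return i
        else:
            consecutive = 0
    return None

frames = [0.1, 0.12, 0.09, 0.75, 0.8, 0.85, 0.9]
print(detect_barge_in(frames))  # barge-in declared at frame 5
```

In the patent's system the same decision would additionally distinguish phonetic categories via the five three-state HMMs before terminating the prompt.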
-
Publication number: 20120101821
Abstract: A speech recognition apparatus is disclosed. The apparatus converts a speech signal into digitized speech data, and performs speech recognition based on the speech data. The apparatus makes a comparison between the speech data inputted the last time and the speech data inputted the time before the last time in response to a user's indication that the speech recognition results in erroneous recognition multiple times in a row. When the speech data inputted the last time is determined to substantially match the speech data inputted the time before the last time, the apparatus outputs a guidance prompting the user to utter an input target by calling it by another name.
Type: Application
Filed: October 13, 2011
Publication date: April 26, 2012
Applicant: DENSO CORPORATION
Inventor: Takahiro Tsuda
-
Patent number: 8165878
Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. The system and methods determine at least one exact, inexact, and partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template.
Type: Grant
Filed: April 26, 2010
Date of Patent: April 24, 2012
Assignee: Cyberpulse L.L.C.
Inventors: James Roberge, Jeffrey Soble
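The exact/inexact/partial match hierarchy described above can be sketched as a small scoring function. This is a hypothetical reading, not the patent's implementation: "inexact" is approximated with a string-similarity cutoff and "partial" with a substring test, and the template names and weights are illustrative.

```python
import difflib

def match_type(word, term, inexact_cutoff=0.8):
    # Classify the match quality between an utterance word and a template term.
    if word == term:
        return "exact"
    if difflib.SequenceMatcher(None, word, term).ratio() >= inexact_cutoff:
        return "inexact"
    if word in term or term in word:
        return "partial"
    return "none"

def best_template(utterance_words, templates):
    """Pick the template (name -> term list) whose terms best match the words."""
    weights = {"exact": 3, "inexact": 2, "partial": 1, "none": 0}
    def score(name):
        terms = templates[name]
        return sum(max(weights[match_type(w, t)] for t in terms)
                   for w in utterance_words)
    return max(templates, key=score)

templates = {
    "fracture report": ["fracture", "bone", "displaced"],
    "cardiac report": ["heart", "cardiac", "murmur"],
}
print(best_template(["fractured", "bones"], templates))
```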
-
Patent number: 8160869
Abstract: Provided are a method and apparatus for encoding an audio signal and a method and apparatus for decoding an audio signal. The method includes performing sinusoidal analysis on an audio signal in order to extract a sinusoidal signal of a current frame, determining continuation sinusoidal signal information indicating a number of continuation sinusoidal signals of next frames, which continue from the sinusoidal signal of the current frame, by performing sinusoidal tracking on the extracted sinusoidal signal of the current frame, and encoding the determined continuation sinusoidal signal information by using different Huffman tables according to index information of the current frame, thereby allowing efficient encoding with a low bitrate.
Type: Grant
Filed: June 3, 2008
Date of Patent: April 17, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Nam-suk Lee, Geon-hyoung Lee, Jae-one Oh, Jong-hoon Jeong
-
Patent number: 8150690
Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
Type: Grant
Filed: October 1, 2008
Date of Patent: April 3, 2012
Assignee: Industrial Technology Research Institute
Inventor: Shih-Ming Huang
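The two scalar coefficients and the limiting condition described above can be illustrated with a minimal sketch. The coefficients `alpha` and `beta` and the specific limiting rule here are assumptions for illustration, not the patent's actual formula: `alpha` scales the subtracted noise estimate, and `beta` floors each coefficient so subtraction can never over-subtract or flip a coefficient's sign.

```python
def denoise_cepstrum(c, noise, alpha=1.0, beta=0.1):
    """Bounded cepstral noise subtraction (illustrative sketch).

    c      -- cepstral feature vector (list of floats)
    noise  -- estimated noise cepstrum, same length
    """
    out = []
    for x, n in zip(c, noise):
        y = x - alpha * n
        floor = beta * x
        # Limiting condition: if subtraction overshoots past the floor or
        # flips the coefficient's sign, clamp to the floored value.
        if abs(y) < abs(floor) or y * x < 0:
            y = floor
        out.append(y)
    return out
```

For example, `denoise_cepstrum([1.0], [2.0])` would over-subtract to -1.0, so the limiter clamps the coefficient to 0.1 instead.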
-
Patent number: 8150694
Abstract: The system and method described herein may provide an acoustic grammar to dynamically sharpen speech interpretation. In particular, the acoustic grammar may be used to map one or more phonemes identified in a user verbalization to one or more syllables or words, wherein the acoustic grammar may have one or more linking elements to reduce a search space associated with mapping the phonemes to the syllables or words. As such, the acoustic grammar may be used to generate one or more preliminary interpretations associated with the verbalization, wherein one or more post-processing techniques may then be used to sharpen accuracy associated with the preliminary interpretations. For example, a heuristic model may assign weights to the preliminary interpretations based on context, user profiles, or other knowledge and a probable interpretation may be identified based on confidence scores associated with one or more candidate interpretations generated with the heuristic model.
Type: Grant
Filed: June 1, 2011
Date of Patent: April 3, 2012
Assignee: VoiceBox Technologies, Inc.
Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
-
Patent number: 8145484
Abstract: The described implementations relate to speech spelling by a user. One method identifies one or more symbols that may match a user utterance and displays an individual symbol for confirmation by the user.
Type: Grant
Filed: November 11, 2008
Date of Patent: March 27, 2012
Assignee: Microsoft Corporation
Inventor: Geoffrey Zweig
-
Patent number: 8145482
Abstract: Methods and apparatus for the enhancement of speech-to-text (STT) engines, by providing indications of the correctness of the found words, based on additional sources besides the internal indication provided by the STT engine. The enhanced indications comprise sources of data such as acoustic features, CTI features, phonetic search and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, thus enabling more efficient usages, such as further processing or transfer of interactions to relevant agents, escalation of issues, or the like. The methods and apparatus employ a training phase in which a word model and a key phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.
Type: Grant
Filed: May 25, 2008
Date of Patent: March 27, 2012
Inventors: Ezra Daya, Oren Pereg, Yuval Lubowich, Moshe Wasserblat
-
Patent number: 8145483
Abstract: The invention can recognize several languages at the same time without using samples. The important skill is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words represented by matrices are spread in the 144-dimensional space. The feature of a known word of any language represented by a matrix is simulated by the surrounding unknown words. The invention includes 12 elastic frames of equal length without filter and without overlap to normalize the signal waveform of variable length for a word, which has one to several syllables, into a 12×12 matrix as a feature of the word. The invention can improve the feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize any language without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.
Type: Grant
Filed: August 5, 2009
Date of Patent: March 27, 2012
Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
-
Publication number: 20120072217
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
Type: Application
Filed: September 17, 2010
Publication date: March 22, 2012
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Junlan Feng, Michael Johnston, Taniya Mishra
-
Patent number: 8140331
Abstract: Characteristic features are extracted from an audio sample based on its acoustic content. The features can be coded as fingerprints, which can be used to identify the audio from a fingerprints database. The features can also be used as parameters to separate the audio into different categories.
Type: Grant
Filed: July 4, 2008
Date of Patent: March 20, 2012
Inventor: Xia Lou
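A minimal sketch of the feature-to-fingerprint coding idea above, under assumptions not taken from the patent: each audio frame is reduced to per-band energies, each band contributes one bit (did its energy rise since the previous frame?), and the resulting bit string is hashed into a compact, lookup-friendly fingerprint.

```python
import hashlib

def fingerprint(frames):
    """Code per-frame band energies into a compact fingerprint string.

    frames -- list of equal-length lists, one band-energy vector per frame.
    Each frame contributes one bit per band: 1 if that band's energy rose
    relative to the previous frame, else 0 (a common robust-hash idea).
    """
    bits = []
    for prev, cur in zip(frames, frames[1:]):
        bits.append("".join("1" if c > p else "0" for p, c in zip(prev, cur)))
    return hashlib.sha1("".join(bits).encode()).hexdigest()[:16]
```

Because the bits encode relative band movement rather than absolute levels, the fingerprint is unchanged by uniform volume changes, which is what makes database lookup of the same recording practical.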
-
Patent number: 8135590
Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
Type: Grant
Filed: January 11, 2007
Date of Patent: March 13, 2012
Assignee: Microsoft Corporation
Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
-
Publication number: 20120059654
Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text with a target F0 pattern of the same learning text by associating their peaks and troughs. For each of the points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
Type: Application
Filed: March 16, 2010
Publication date: March 8, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Masafumi Nishimura, Ryuki Tachibana
-
Publication number: 20120059849
Abstract: In one embodiment, a system and method is provided to browse and analyze files comprising text strings tagged with metadata. The system and method comprise various functions including browsing the metadata tags in the file, browsing the text strings, selecting subsets of the text strings by including or excluding strings tagged with specific metadata tags, selecting text strings by matching patterns of words and/or parts of speech in the text string and matching selected text strings to a database to identify similar text strings. The system and method further provide functions to generate suggested text selection rules by analyzing a selected subset of a plurality of text strings.
Type: Application
Filed: September 8, 2010
Publication date: March 8, 2012
Applicant: DEMAND MEDIA, INC.
Inventors: David M. Yehaskel, Henrik M. Kjallbring
-
Publication number: 20120059653
Abstract: A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results.
Type: Application
Filed: August 30, 2011
Publication date: March 8, 2012
Inventors: Jeffrey P. Adams, Kenneth Basye, Ryan Thomas, Jeffrey C. O'Neill
-
Patent number: 8131554
Abstract: A tool, method, and system for use in the development of sentence-based test items are disclosed. The tool may include a user interface that may include a database selection field, a sentence pattern entry field, an option pane, and an output pane. The tool may search a database for one or more sentences and may generate one or more responses to the one or more sentences. The one or more sentences and one or more responses may be used to produce the sentence-based test items. The tool may allow test items to be developed more quickly and easily than manual test item authoring. Accordingly, test item development costs may be lowered and test security may be enhanced.
Type: Grant
Filed: March 11, 2011
Date of Patent: March 6, 2012
Assignee: Educational Testing Service
Inventor: Derrick Higgins
-
Patent number: 8131547
Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, and correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
Type: Grant
Filed: August 20, 2009
Date of Patent: March 6, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alistair D. Conkie, Yeon-Jun Kim
-
Patent number: 8126711
Abstract: A modifying method for a speech model and a modifying module thereof are provided. The modifying method is as follows. First, a correct sequence of a speech is generated according to a correct sequence generating method and the speech model. Next, a candidate sequence generating method is selected from a plurality of candidate sequence generating methods, and a candidate sequence of the speech is generated according to the selected candidate sequence generating method and the speech model. Finally, the speech model is modified according to the correct sequence and the candidate sequence. Therefore, the present invention increases the discrimination of the speech model.
Type: Grant
Filed: January 10, 2008
Date of Patent: February 28, 2012
Assignee: Industrial Technology Research Institute
Inventors: Jia-Jang Tu, Yuan-Fu Liao
-
Publication number: 20120046946
Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time domain to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
Type: Application
Filed: August 20, 2010
Publication date: February 23, 2012
Applicant: ADACEL SYSTEMS, INC.
Inventor: Chang-Qing Shu
-
Patent number: 8117030
Abstract: A method for analyzing and adjusting the performance of a speech-enabled application includes selecting a number of user utterances that were previously received by the speech-enabled application. The speech-enabled application receives such user utterances and associates each user utterance with an action-object based on one or more salient terms in the user utterance that are associated with the action-object. The method further includes associating one of a number of action-objects with each of the selected user utterances. Furthermore, for each action-object, the percentage of the utterances associated with the action-object that include at least one of the salient terms associated with the action-object is determined. If the percentage does not exceed a selected threshold, the method also includes adjusting the one or more salient terms associated with the action-object.
Type: Grant
Filed: September 13, 2006
Date of Patent: February 14, 2012
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Robert R. Bushey, Benjamin A. Knott, John M. Martin
-
Patent number: 8116445
Abstract: An apparatus and method for monitoring an interaction between a caller and an automated voice response (AVR) system is provided. An audio communication from a caller is processed by executing an AVR script, which includes a plurality of instructions. A visual representation of the audio communication is presented substantially simultaneously with the audio communication to an agent based on the AVR script. The visual representation includes at least one field to be populated with information obtained from the caller and the information populated in the field can be updated by the agent.
Type: Grant
Filed: April 3, 2007
Date of Patent: February 14, 2012
Assignee: Intellisist, Inc.
Inventors: Gilad Odinak, Alastair Sutherland, William A. Tolhurst
-
Patent number: 8108205
Abstract: A system and method of refining context-free grammars (CFGs). The method includes deriving back-off grammar (BOG) rules from an initially developed CFG and utilizing the initial CFG and the derived BOG rules to recognize user utterances. Based on a response of the initial CFG and the derived BOG rules to the user utterances, at least a portion of the derived BOG rules are utilized to modify the initial CFG and thereby produce a refined CFG. The above method can be carried out iteratively, with each new iteration utilizing a refined CFG from preceding iterations.
Type: Grant
Filed: December 1, 2006
Date of Patent: January 31, 2012
Assignee: Microsoft Corporation
Inventors: Timothy Paek, Max Chickering, Eric Badger
-
Patent number: 8103502
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Grant
Filed: September 26, 2007
Date of Patent: January 24, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Michael J. Johnston
-
Patent number: 8099278
Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.
Type: Grant
Filed: December 22, 2010
Date of Patent: January 17, 2012
Assignee: Verizon Patent and Licensing Inc.
Inventor: Kevin R. Witzman
-
Publication number: 20120010885
Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.
Type: Application
Filed: September 19, 2011
Publication date: January 12, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Dilek Zeynep Hakkani-Tür, Giuseppe Riccardi
-
Patent number: 8095372
Abstract: Digital process for authentication of a user of a database for access to protected data or a service reserved for a defined circle of users or for the use of data currently entered by the user, wherein a voice sample currently enunciated during an access attempt by the user is routed to a voice analysis unit and, herein, a current voice profile is computed and this is compared in a voice profile comparison unit against a previously stored initial voice profile and, in response to a positive comparison result, the user is authenticated and a first control signal enabling access, but in response to a negative comparison result a second control signal disabling access or triggering a substitute authentication procedure is generated.
Type: Grant
Filed: January 7, 2008
Date of Patent: January 10, 2012
Assignee: VOICECASH IP GmbH
Inventors: Raja Kuppuswamy, Hermann Geupel
-
Patent number: 8090738
Abstract: A multi-modal search system (and corresponding methodology) that employs wildcards is provided. Wildcards can be employed in the search query either initiated by the user or inferred by the system. These wildcards can represent uncertainty conveyed by a user in a multi-modal search query input. In examples, the words "something" or "whatchamacallit" can be used to convey uncertainty and partial knowledge about portions of the query and to dynamically trigger wildcard generation.
Type: Grant
Filed: August 28, 2008
Date of Patent: January 3, 2012
Assignee: Microsoft Corporation
Inventors: Timothy Seung Yoon Paek, Bo Thiesson, Yun-Cheng Ju, Bongshin Lee, Christopher A. Meek
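The wildcard-triggering idea above — certain filler words in the query become match-anything slots — can be sketched with ordinary regular expressions. The set of uncertainty words and the one-word wildcard are illustrative assumptions, not the patent's actual list or matching engine.

```python
import re

# Hypothetical uncertainty vocabulary; the patent gives "something" and
# "whatchamacallit" as examples.
UNCERTAINTY_WORDS = {"something", "whatchamacallit"}

def query_to_pattern(query):
    """Compile a spoken-style query into a regex, mapping each
    uncertainty word to a single-word wildcard."""
    parts = []
    for word in query.lower().split():
        if word in UNCERTAINTY_WORDS:
            parts.append(r"\w+")       # wildcard slot
        else:
            parts.append(re.escape(word))
    return re.compile(r"\s+".join(parts))

pat = query_to_pattern("pizza something restaurant")
print(bool(pat.search("pizza hut restaurant")))  # True
```

A fuller system would allow multi-word wildcards and score candidate completions rather than simply matching, but the slot-substitution step is the core of the idea.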
-
Patent number: 8086455
Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
Type: Grant
Filed: January 9, 2008
Date of Patent: December 27, 2011
Assignee: Microsoft Corporation
Inventors: Yifan Gong, Ye Tian
-
Patent number: 8082150
Abstract: A system for determining an identity of a received work. The system receives audio data for an unknown work. The audio data is divided into segments. The system generates a signature of the unknown work from each of the segments. Reduced dimension signatures are then generated from at least a portion of the signatures. The reduced dimension signatures are then compared to reduced dimension signatures of known works that are stored in a database. A list of candidates of known works is generated from the comparison. The signatures of the unknown works are then compared to the signatures of the known works in the list of candidates. The unknown work is then identified as the known work having signatures matching within a threshold.
Type: Grant
Filed: March 24, 2009
Date of Patent: December 20, 2011
Assignee: Audible Magic Corporation
Inventor: Erling H. Wold
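The two-stage search described above — shortlist candidates cheaply on reduced-dimension signatures, then confirm on the full signatures — can be sketched as follows. Block averaging as the dimensionality reduction, squared Euclidean distance, and the shortlist size and threshold are all illustrative assumptions, not the patent's specifics.

```python
def reduce_sig(sig, k=4):
    """Reduce a signature vector to k dimensions by block averaging."""
    step = len(sig) // k
    return [sum(sig[i * step:(i + 1) * step]) / step for i in range(k)]

def dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify(unknown, database, shortlist=2, threshold=1.0):
    """Two-stage match: shortlist on reduced signatures, confirm on full ones.

    database -- dict mapping work name to its full signature vector.
    Returns the best-matching name, or None if nothing matches within
    the threshold.
    """
    reduced_u = reduce_sig(unknown)
    candidates = sorted(
        database, key=lambda name: dist(reduced_u, reduce_sig(database[name]))
    )[:shortlist]
    best = min(candidates, key=lambda name: dist(unknown, database[name]))
    return best if dist(unknown, database[best]) <= threshold else None
```

The point of the reduced pass is cost: the coarse distance prunes most of the database so the expensive full-signature comparison runs only on the shortlist.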
-
Patent number: 8082148
Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
Type: Grant
Filed: April 24, 2008
Date of Patent: December 20, 2011
Assignee: Nuance Communications, Inc.
Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Michael H. Mirt
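The mixing step above is the standard signal-processing operation of adding noise to speech at a controlled signal-to-noise ratio. A minimal sketch (plain sample lists rather than audio files, and a target SNR parameter that the patent's abstract does not specify):

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the mix has the requested SNR, then add it to speech.

    speech, noise -- equal-length lists of float samples.
    snr_db        -- desired speech-to-noise power ratio in decibels.
    """
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Solve p_speech / (scale^2 * p_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Running each recorded environment's noise through this at several SNRs, then decoding the mixes with the grammar under test, yields the per-environment reliability numbers the patent evaluates.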
-
Patent number: 8078463
Abstract: A method and apparatus for spotting a target speaker within a call interaction by generating speaker models based on one or more speakers' speech; and by searching for speaker models associated with one or more target speaker speech files.
Type: Grant
Filed: November 23, 2004
Date of Patent: December 13, 2011
Assignee: Nice Systems, Ltd.
Inventors: Moshe Wasserblat, Yaniv Zigel, Oren Pereg
-
Publication number: 20110301953
Abstract: Provided is a system of voice recognition that adapts and stores a voice of a speaker for each feature to each of a basic voice model and new independent multi models and provides stable real-time voice recognition through voice recognition using a multi adaptive model.
Type: Application
Filed: April 11, 2011
Publication date: December 8, 2011
Applicant: Seoby Electronic Co., Ltd
Inventor: Sung-Sub Lee
-
Patent number: 8073262
Abstract: In an image matching apparatus of the present invention, only a connected region in which the number of pixels included therein exceeds a threshold value, among connected regions that are specified by a labeling process section, is sent to a centroid calculation process section from a threshold value processing section, and a centroid (feature point) of the connected region is calculated. When it is determined that a target document to be matched is an N-up document, the threshold value processing section uses, instead of a default threshold value, a variant threshold value that varies depending on the number of images laid out on the N-up document and a document size that are found and detected by an N-up document determination section and a document size detection section. This makes it possible to determine a similarity to a reference document with high accuracy even in a case of an N-up document, i.e., a case where each target image to be matched is reduced in size from an original image.
Type: Grant
Filed: September 8, 2008
Date of Patent: December 6, 2011
Assignee: Sharp Kabushiki Kaisha
Inventor: Hitoshi Hirohata
-
Publication number: 20110295602
Abstract: An apparatus and a method are provided for building a spoken language understanding model. Labeled data may be obtained for a target application. A new classification model may be formed for use with the target application by using the labeled data for adaptation of an existing classification model. In some implementations, the existing classification model may be used to determine the most informative examples to label.
Type: Application
Filed: August 8, 2011
Publication date: December 1, 2011
Applicant: AT&T Intellectual Property II, L.P.
Inventor: Gokhan Tur
-
Patent number: 8069044
Abstract: Content matching using phoneme comparison and scoring is described, including extracting phonemes from a file, comparing the phonemes to other phonemes, associating a first score with the phonemes based on a probability of the other phonemes matching the phonemes, and providing the file with another file when a request is received to access one or more files having a second score that is substantially similar to the first score.
Type: Grant
Filed: March 16, 2007
Date of Patent: November 29, 2011
Assignee: Adobe Systems Incorporated
Inventor: James Moorer
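The phoneme comparison and scoring step above can be sketched with a classic sequence-comparison tool. Using Levenshtein distance, and normalizing it into a [0, 1] similarity score, is an illustrative choice on my part; the patent's abstract speaks only of probability-based scoring.

```python
def phoneme_distance(a, b):
    """Levenshtein edit distance between two phoneme sequences."""
    m, n = len(a), len(b)
    # d[i][j] = edits to turn a[:i] into b[:j]; first row/column count inserts.
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # delete a phoneme
                          d[i][j - 1] + 1,          # insert a phoneme
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitute
    return d[m][n]

def match_score(a, b):
    """Similarity in [0, 1]; 1.0 means identical phoneme strings."""
    return 1.0 - phoneme_distance(a, b) / max(len(a), len(b), 1)

# "cat" /K AE T/ vs "cot" /K AA T/: one substitution apart.
print(phoneme_distance(["K", "AE", "T"], ["K", "AA", "T"]))  # 1
```

Files whose phoneme-level scores land close together can then be grouped and served together, as the abstract describes.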
-
Patent number: 8069042
Abstract: A method and system for obtaining a pool of speech syllable models. The model pool is generated by first detecting a training segment using unsupervised speech segmentation or speech unit spotting. If the model pool is empty, a first speech syllable model is trained and added to the model pool. If the model pool is not empty, an existing model is determined from the model pool that best matches the training segment. Then the existing model is scored against the training segment. If the score is less than a predefined threshold, a new model for the training segment is created and added to the pool. If the score equals the threshold or is larger than the threshold, the training segment is used to improve or to re-estimate the model.
Type: Grant
Filed: September 21, 2007
Date of Patent: November 29, 2011
Assignee: Honda Research Institute Europe GmbH
Inventors: Frank Joublin, Holger Brandl
-
Patent number: 8065144
Abstract: A method for speech recognition. The method uses a single pronunciation estimator to train acoustic phoneme models and recognize utterances from multiple languages. The method includes accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages. The method also includes, for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages. The method also includes training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
Type: Grant
Filed: February 3, 2010
Date of Patent: November 22, 2011
Assignee: Voice Signal Technologies, Inc.
Inventors: Laurence S. Gillick, Thomas E. Lynch, Michael J. Newman, Daniel L. Roth, Steven A. Wegmann, Jonathan P. Yamron
-
Patent number: 8065149Abstract: Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates.Type: GrantFiled: March 6, 2008Date of Patent: November 22, 2011Assignee: Nuance Communications, Inc.Inventors: Gakuto Kurata, Shinsuke Mori, Masafumi Nishimura
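The final selection step in the abstract above, keeping only those (character string, pronunciation) pairs that the recognition data derived from the input speech confirms, might look like this sketch; the data shapes and the toy pronunciation generator are assumptions.

```python
def generate_candidates(strings, pron_fn):
    """Pair each candidate character string with each of its pronunciation candidates."""
    return {(s, p) for s in strings for p in pron_fn(s)}

def select_words(candidates, recognition_data):
    """Keep only the pairs that also occur in the recognition data from the speech."""
    return sorted(candidates & set(recognition_data))

# Toy pronunciation candidates (an assumption; the patent generates these
# from the candidate character strings selected out of the input text).
pron = {"read": ["r iy d", "r eh d"], "lead": ["l iy d"]}
pairs = generate_candidates(["read", "lead"], pron.get)
found = select_words(pairs, [("read", "r eh d"), ("reed", "r iy d")])
```

Only "read" with the pronunciation "r eh d" survives, since it is the one candidate pair the recognition data also contains.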
-
Patent number: 8065241Abstract: A new machine learning technique is herein disclosed which generalizes the support vector machine framework. A separating hyperplane in a separating space is optimized in accordance with generalized constraints which depend upon the clustering of the input vectors in the dataset.Type: GrantFiled: April 9, 2008Date of Patent: November 22, 2011Assignee: NEC Laboratories America, Inc.Inventors: Vladimir N. Vapnik, Michael R. Miller, Margaret A. Miller, legal representative
-
Patent number: 8060368Abstract: A voice recognition apparatus 10, which performs voice recognition of an input voice by referring to a voice recognition dictionary and outputs a voice recognition result, has an external information acquiring section 14 for acquiring, from externally connected devices 20-1-20-N connected thereto, the type of each externally connected device and the data recorded in each externally connected device; vocabulary extracting and analyzing sections 15 and 16 for extracting a vocabulary item from the data as an extracted vocabulary item, and for producing analysis data by analyzing the extracted vocabulary item and providing it with a reading; and a dictionary generating section 17 for storing the analysis data in the voice recognition dictionary corresponding to the type. For each type of the externally connected devices, one of the voice recognition dictionaries 13-1-13-N is assigned.Type: GrantFiled: August 18, 2006Date of Patent: November 15, 2011Assignee: Mitsubishi Electric CorporationInventors: Masanobu Osawa, Reiko Okada, Takashi Ebihara
-
Patent number: 8060365Abstract: A dialog processing system which includes a target expression data extraction unit for extracting, from among a plurality of utterance data inputted by an utterance data input unit and obtained by converting the contents of a plurality of conversations in one field, a plurality of target expression data each including a pattern matching portion that matches an utterance pattern, the utterance pattern being inputted by an utterance pattern input unit and being an utterance structure derived from the contents of field-independent general conversations; a feature extraction unit for retrieving the pattern matching portions from the plurality of extracted target expression data and extracting feature quantities common to the plurality of pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the extracted feature quantities.Type: GrantFiled: July 3, 2008Date of Patent: November 15, 2011Assignee: Nuance Communications, Inc.Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
-
Publication number: 20110276329Abstract: A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided, whereby appropriate dialogue control is enabled by correctly determining a user's proficiency level in a dialogue behavior and performing dialogue control according to the correctly determined proficiency level, without being influenced by an accidental one-time behavior of the user. An input unit 1 inputs a speech uttered by the user. An extraction unit 3 extracts a proficiency level determination factor, which is a factor for determining the user's proficiency level in a dialogue behavior, based upon the speech input result of the input unit 1. A history storage unit 4 stores the proficiency level determination factors extracted by the extraction unit 3 as a history.Type: ApplicationFiled: January 20, 2010Publication date: November 10, 2011Inventors: Masaaki Ayabe, Jun Okamoto
-
Publication number: 20110276323Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.Type: ApplicationFiled: May 6, 2010Publication date: November 10, 2011Applicant: Senam Consulting, Inc.Inventor: Serge Olegovich Seyfetdinov
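The two-condition authentication rule above is straightforward to sketch; the cosine-similarity voice-feature comparison and the threshold are assumptions standing in for whatever feature matching the application actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def authenticate(reference, test, feature_threshold=0.9):
    """Authenticate only if BOTH the passphrase and the voice features match."""
    phrase_ok = reference["passphrase"] == test["passphrase"]
    voice_ok = cosine(reference["features"], test["features"]) >= feature_threshold
    return phrase_ok and voice_ok
```

Requiring both checks means a recording of the right words in the wrong voice fails, as does the right voice saying the wrong words.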
-
Patent number: 8055502Abstract: A voice dialing method includes the steps of receiving an utterance from a user, decoding the utterance to identify a recognition result for the utterance, and communicating the recognition result to the user. If the user indicates that the communicated recognition result is incorrect, that result is added to a rejection reference. Then, when the user repeats the misunderstood utterance, the rejection reference can be used to eliminate the incorrect recognition result as a potential subsequent recognition result. The method can be used for single or multiple digits or digit strings.Type: GrantFiled: November 28, 2006Date of Patent: November 8, 2011Assignee: General Motors LLCInventors: Jason W. Clark, Rathinavelu Chengalvarayan, Timothy J. Grost, Dana B. Fecher, Jeremy M. Spaulding
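A minimal sketch of the rejection-reference idea, assuming the decoder produces an n-best candidate list (an interface assumption, not a detail from the patent):

```python
class VoiceDialer:
    def __init__(self):
        self.rejected = set()  # the "rejection reference"

    def recognize(self, n_best):
        """Return the best-scoring candidate the user has not already rejected."""
        for candidate in n_best:
            if candidate not in self.rejected:
                return candidate
        return None  # every hypothesis was previously rejected

    def user_rejects(self, result):
        """Record a result the user flagged as incorrect."""
        self.rejected.add(result)
```

When the user repeats the utterance, the decoder will likely produce a similar n-best list, but the previously rejected hypothesis is now skipped.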
-
Patent number: 8055503Abstract: A system and method provide an audio analysis intelligence tool with ad-hoc search capabilities using spoken words as an organized data form. An SQL-like interface is used to process and search audio data and combine it with other traditional data forms to enhance searching of audio segments to identify those audio segments satisfying minimum confidence levels for a match.Type: GrantFiled: November 1, 2006Date of Patent: November 8, 2011Assignee: Siemens Enterprise Communications, Inc.Inventors: Robert Scarano, Lawrence Mark
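An SQL-like interface of the kind described above could, for example, treat recognized words as rows keyed by audio segment, join them with conventional call metadata, and filter on a minimum confidence level. The schema and field names in this sketch are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- recognized words, one row per hit, with the recognizer's confidence
    CREATE TABLE words(segment_id INTEGER, word TEXT, confidence REAL);
    -- conventional (non-audio) call metadata to combine with the hits
    CREATE TABLE calls(segment_id INTEGER, agent TEXT);
    INSERT INTO words VALUES (1, 'refund', 0.95), (2, 'refund', 0.40);
    INSERT INTO calls VALUES (1, 'alice'), (2, 'bob');
""")

# Ad-hoc search: segments mentioning "refund" with at least 0.8 confidence.
rows = conn.execute("""
    SELECT c.segment_id, c.agent
    FROM words w JOIN calls c ON w.segment_id = c.segment_id
    WHERE w.word = 'refund' AND w.confidence >= 0.8
""").fetchall()
```

Segment 2 also contains the word, but its 0.40 confidence falls below the minimum and is filtered out.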
-
Patent number: 8050918Abstract: A method and system for evaluating the quality of voice input recognition by a voice portal is provided. An analysis interface extracts a set of current grammars from the voice portal. A test pattern generator generates a test input for each current grammar. The test input includes a test pattern and a set of active grammars corresponding to each current grammar. The system further includes a text-to-speech engine for entering each test pattern into the voice server. A results collector analyzes each test pattern entered into the voice server with the speech recognition engine against the set of active grammars corresponding to the current grammar for said test pattern. A results analyzer derives a set of statistics of a quality of recognition of each current grammar.Type: GrantFiled: December 11, 2003Date of Patent: November 1, 2011Assignee: Nuance Communications, Inc.Inventors: Reza Ghasemi, Walter Haenel
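The evaluation loop described above might be sketched as follows; `synthesize` and `recognize` are hypothetical stand-ins for the text-to-speech engine and the voice server's speech recognition engine.

```python
def evaluate(grammars, synthesize, recognize):
    """Return {grammar: fraction of test patterns recognized correctly}."""
    stats = {}
    for grammar, phrases in grammars.items():
        correct = 0
        for phrase in phrases:
            audio = synthesize(phrase)           # text-to-speech engine
            result = recognize(audio, grammar)   # recognition against the active grammar
            correct += (result == phrase)
        stats[grammar] = correct / len(phrases)
    return stats
```

Driving the recognizer with synthesized speech lets the whole grammar set be regression-tested without human callers.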
-
Patent number: 8046224Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.Type: GrantFiled: April 18, 2008Date of Patent: October 25, 2011Assignee: Nuance Communications, Inc.Inventors: Nitendra Rajput, Ashish Verma
-
Patent number: 8036890Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.Type: GrantFiled: September 4, 2009Date of Patent: October 11, 2011Assignee: Zentian LimitedInventor: Mark Catchpole
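The lexical tree data structure above is essentially a trie: words with a common prefix share nodes, so "card", "cart", and "care" are stored under a single "car" path. This is a plain single-threaded sketch; the patent's parallel tree processors and speech-parameter scoring are not modeled.

```python
class LexicalTree:
    def __init__(self):
        self.children = {}   # component -> child node
        self.is_word = False

    def add(self, word):
        """Insert a word, sharing nodes with any existing common prefix."""
        node = self
        for ch in word:
            node = node.children.setdefault(ch, LexicalTree())
        node.is_word = True

    def contains(self, word):
        """True only for complete stored words, not bare prefixes."""
        node = self
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_word
```

Because the initial component of each tree is unique, a bank of such trees can be partitioned across processors with no duplicated work on shared prefixes.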
-
Patent number: 8032373Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.Type: GrantFiled: February 28, 2007Date of Patent: October 4, 2011Assignee: Intellisist, Inc.Inventor: Martin R. M. Dunsmuir