Creating Patterns For Matching Patents (Class 704/243)
  • Patent number: 8170874
    Abstract: A speech recognition apparatus which improves the sound quality of speech output as a speech recognition result is provided. The speech recognition apparatus includes a recognition unit, which recognizes speech based on a recognition dictionary, and a registration unit, which registers a dictionary entry of a new recognition word in the recognition dictionary. The recognition unit includes a generation unit, which generates a dictionary entry including speech of the new recognition word item and feature parameters of the speech, and a modification unit, which makes a modification for improving the sound quality of the speech included in the dictionary entry generated by the generation unit. The recognition unit includes a speech output unit, which outputs speech which is included in a dictionary entry corresponding to the recognition result of input speech, and is modified by the modification unit.
    Type: Grant
    Filed: July 1, 2008
    Date of Patent: May 1, 2012
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Toshiaki Fukada, Yasuo Okutani, Michio Aizawa
  • Publication number: 20120101820
    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
    Type: Application
    Filed: October 24, 2011
    Publication date: April 26, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
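The barge-in decision described in this abstract — classifying accumulated audio frames with speech and non-speech acoustic models and cutting off the prompt once speech is detected — can be sketched roughly as follows. This is a toy illustration, not the patented implementation: single 1-D Gaussians over frame energy stand in for the one-state and three-state HMMs, and all parameters are invented for the example.

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a scalar frame feature under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def is_barge_in(frame_energies, speech=(0.8, 0.05), nonspeech=(0.1, 0.05),
                min_speech_frames=5):
    """Classify each frame as speech or non-speech by comparing model
    likelihoods; report barge-in once enough frames favor speech."""
    speech_frames = sum(
        1 for e in frame_energies
        if gauss_loglik(e, *speech) > gauss_loglik(e, *nonspeech)
    )
    return speech_frames >= min_speech_frames
```

In a real dialogue system the accumulated frames would be fed to the HMM components incrementally, and a `True` result would trigger termination of the prompt playback.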
  • Publication number: 20120101821
    Abstract: A speech recognition apparatus is disclosed. The apparatus converts a speech signal into digitized speech data and performs speech recognition based on that data. In response to a user's indication that speech recognition has produced erroneous results multiple times in a row, the apparatus compares the most recently input speech data with the speech data input immediately before it. When the two are determined to substantially match, the apparatus outputs guidance prompting the user to utter the input target by calling it by another name.
    Type: Application
    Filed: October 13, 2011
    Publication date: April 26, 2012
    Applicant: DENSO CORPORATION
    Inventor: Takahiro TSUDA
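The retry logic in the abstract above — detect that the user repeated essentially the same utterance after consecutive recognition errors, then suggest an alternate name — can be sketched as below. The similarity test and prompt strings are illustrative assumptions; a production system would compare the speech data with DTW or model-level matching rather than a plain average distance.

```python
def substantially_match(a, b, tol=0.1):
    """Crude similarity test between two equal-rate feature sequences.
    Returns True when the mean absolute difference is small."""
    if not a or not b:
        return False
    n = min(len(a), len(b))
    dist = sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n
    return dist < tol

def guidance_after_errors(last, before_last, error_count):
    """If recognition failed twice in a row on near-identical input,
    prompt the user to call the target by another name."""
    if error_count >= 2 and substantially_match(last, before_last):
        return "Please try saying the destination by another name."
    return "Please repeat."
```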
  • Patent number: 8165878
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. The system and methods determine at least one exact, inexact, and partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template.
    Type: Grant
    Filed: April 26, 2010
    Date of Patent: April 24, 2012
    Assignee: Cyberpulse L.L.C.
    Inventors: James Roberge, Jeffrey Soble
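The exact/inexact/partial match classification in the abstract above can be sketched as follows. The patent does not specify its inexact-match criterion, so `difflib.SequenceMatcher` similarity is used here as a stand-in, with an assumed 0.8 cutoff; substring containment stands in for partial matching.

```python
import difflib

def match_type(word, term):
    """Classify a word/term pair as an exact, inexact (close spelling),
    or partial (substring) match; None if no match."""
    w, t = word.lower(), term.lower()
    if w == t:
        return "exact"
    if difflib.SequenceMatcher(None, w, t).ratio() >= 0.8:
        return "inexact"
    if w in t or t in w:
        return "partial"
    return None

def populate_template(utterance, template_terms):
    """Fill template slots with the best-matching utterance words."""
    filled = {}
    for term in template_terms:
        for word in utterance.split():
            kind = match_type(word, term)
            if kind:
                filled[term] = (word, kind)
                break
    return filled
```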
  • Patent number: 8160869
    Abstract: Provided are a method and apparatus for encoding an audio signal and a method and apparatus for decoding an audio signal. The method includes performing sinusoidal analysis on an audio signal in order to extract a sinusoidal signal of a current frame, determining continuation sinusoidal signal information indicating a number of continuation sinusoidal signals of next frames, which continue from the sinusoidal signal of the current frame, by performing sinusoidal tracking on the extracted sinusoidal signal of the current frame, and encoding the determined continuation sinusoidal signal information by using different Huffman tables according to index information of the current frame, thereby allowing efficient encoding with a low bitrate.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: April 17, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Nam-suk Lee, Geon-hyoung Lee, Jae-one Oh, Jong-hoon Jeong
  • Patent number: 8150690
    Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the processing of the cepstral feature vector, avoiding excessive enhancement or subtraction so that the cepstral feature vector is processed properly and the anti-noise ability of speech recognition is improved. Furthermore, the speech recognition system and method can be applied in any environment, have low complexity, and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: April 3, 2012
    Assignee: Industrial Technology Research Institute
    Inventor: Shih-Ming Huang
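The two scalar coefficients and determining condition described above can be illustrated with a minimal sketch. Here `alpha` scales the noise subtraction, `beta` sets a floor, and the condition keeps the floor whenever subtraction would overshoot — a common guard in spectral/cepstral subtraction schemes. The exact roles of the coefficients in the patent may differ; this is an assumption-laden illustration.

```python
def cepstral_noise_subtract(c, noise_mean, alpha=1.0, beta=0.1):
    """Subtract an estimated noise cepstrum from a cepstral feature
    vector, limiting the result to avoid over-subtraction."""
    out = []
    for x, n in zip(c, noise_mean):
        y = x - alpha * n
        floor = beta * abs(x)
        # Determining condition: keep a floored value if the
        # subtraction over-shoots toward zero.
        if abs(y) < floor:
            y = floor if x >= 0 else -floor
        out.append(y)
    return out
```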
  • Patent number: 8150694
    Abstract: The system and method described herein may provide an acoustic grammar to dynamically sharpen speech interpretation. In particular, the acoustic grammar may be used to map one or more phonemes identified in a user verbalization to one or more syllables or words, wherein the acoustic grammar may have one or more linking elements to reduce a search space associated with mapping the phonemes to the syllables or words. As such, the acoustic grammar may be used to generate one or more preliminary interpretations associated with the verbalization, wherein one or more post-processing techniques may then be used to sharpen accuracy associated with the preliminary interpretations. For example, a heuristic model may assign weights to the preliminary interpretations based on context, user profiles, or other knowledge and a probable interpretation may be identified based on confidence scores associated with one or more candidate interpretations generated with the heuristic model.
    Type: Grant
    Filed: June 1, 2011
    Date of Patent: April 3, 2012
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
  • Patent number: 8145484
    Abstract: The described implementations relate to speech spelling by a user. One method identifies one or more symbols that may match a user utterance and displays an individual symbol for confirmation by the user.
    Type: Grant
    Filed: November 11, 2008
    Date of Patent: March 27, 2012
    Assignee: Microsoft Corporation
    Inventor: Geoffrey Zweig
  • Patent number: 8145482
    Abstract: Methods and apparatus for the enhancement of speech to text engines, by providing indications to the correctness of the found words, based on additional sources besides the internal indication provided by the STT engine. The enhanced indications comprise sources of data such as acoustic features, CTI features, phonetic search and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, thus enabling more efficient usages, such as further processing or transfer of interactions to relevant agents, escalation of issues, or the like. The methods and apparatus employ a training phase in which word model and key phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.
    Type: Grant
    Filed: May 25, 2008
    Date of Patent: March 27, 2012
    Inventors: Ezra Daya, Oren Pereg, Yuval Lubowich, Moshe Wasserblat
  • Patent number: 8145483
    Abstract: The invention can recognize several languages at the same time without using samples. The key technique is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word of any language, represented by a matrix, is simulated by the surrounding unknown words. The invention uses 12 elastic frames of equal length, without filter and without overlap, to normalize the signal waveform of variable length for a word, which has one to several syllables, into a 12×12 matrix as the feature of the word. The invention can improve the feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize any language without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.
    Type: Grant
    Filed: August 5, 2009
    Date of Patent: March 27, 2012
    Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
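The fixed-size normalization above — splitting a variable-length waveform into 12 equal, non-overlapping frames and producing a 12×12 feature matrix — can be sketched as below. The per-frame coefficients here (lag-averaged sample products) are a placeholder; the patent's actual per-frame features are not reproduced.

```python
def elastic_frames_feature(waveform, nframes=12, ncoef=12):
    """Split a variable-length waveform into `nframes` equal, non-
    overlapping frames and compute `ncoef` crude coefficients per
    frame, yielding a fixed nframes x ncoef matrix for any input."""
    step = len(waveform) / nframes
    matrix = []
    for i in range(nframes):
        frame = waveform[int(i * step):int((i + 1) * step)] or [0.0]
        row = []
        for k in range(ncoef):
            # Average product at lag k; 0.0 when the frame is shorter
            # than the lag.
            pairs = [frame[t] * frame[t + k] for t in range(len(frame) - k)]
            row.append(sum(pairs) / len(pairs) if pairs else 0.0)
        matrix.append(row)
    return matrix
```

The point of the sketch is the shape invariance: whatever the word's duration, the output is always a 12×12 matrix, so words of different lengths become directly comparable.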
  • Publication number: 20120072217
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 22, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas BANGALORE, Junlan Feng, Michael Johnston, Taniya Mishra
  • Patent number: 8140331
    Abstract: Characteristic features are extracted from an audio sample based on its acoustic content. The features can be coded as fingerprints, which can be used to identify the audio from a fingerprints database. The features can also be used as parameters to separate the audio into different categories.
    Type: Grant
    Filed: July 4, 2008
    Date of Patent: March 20, 2012
    Inventor: Xia Lou
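The feature-to-fingerprint coding described above can be illustrated with a very small sketch: per frame, measure energy in a few equal sub-bands (time-domain sub-bands stand in here for the spectral bands a real system would use), binarize against the frame mean, and hash the bit pattern. All frame/band sizes are illustrative.

```python
import hashlib

def audio_fingerprint(samples, nbands=8, frame=256):
    """Code characteristic features of an audio sample as a compact
    fingerprint: binarized per-band energies, hashed."""
    bits = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        band = frame // nbands
        energies = [sum(x * x for x in chunk[b * band:(b + 1) * band])
                    for b in range(nbands)]
        mean = sum(energies) / nbands
        bits.extend('1' if e > mean else '0' for e in energies)
    return hashlib.sha1(''.join(bits).encode()).hexdigest()
```

Matching against a fingerprint database then reduces to exact or near-exact lookup of these digests (or, more robustly, Hamming-distance search over the raw bit strings).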
  • Patent number: 8135590
    Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
    Type: Grant
    Filed: January 11, 2007
    Date of Patent: March 13, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
  • Publication number: 20120059654
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text with a target F0 pattern of the same learning text by associating their peaks and troughs. For each of the points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Application
    Filed: March 16, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Masafumi Nishimura, Ryuki Tachibana
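The peak association and shift computation in the abstract above can be sketched as follows. This toy version pairs source and target F0 peaks in order and returns the (time, frequency) shifts that the patent's decision tree would be trained to predict from linguistic features; real contours would also need trough association and smoothing.

```python
def peaks(contour):
    """Indices of simple local maxima in an F0 contour."""
    return [i for i in range(1, len(contour) - 1)
            if contour[i] > contour[i - 1] and contour[i] >= contour[i + 1]]

def shift_amounts(source, target):
    """Pair source/target F0 peaks in order; return per-peak shifts
    (dt in frames, df in Hz) from the source to the target pattern."""
    s, t = peaks(source), peaks(target)
    return [(tj - si, target[tj] - source[si]) for si, tj in zip(s, t)]
```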
  • Publication number: 20120059849
    Abstract: In one embodiment, a system and method is provided to browse and analyze files comprising text strings tagged with metadata. The system and method comprise various functions including browsing the metadata tags in the file, browsing the text strings, selecting subsets of the text strings by including or excluding strings tagged with specific metadata tags, selecting text strings by matching patterns of words and/or parts of speech in the text string and matching selected text strings to a database to identify similar text string. The system and method further provide functions to generate suggested text selection rules by analyzing a selected subset of a plurality of text strings.
    Type: Application
    Filed: September 8, 2010
    Publication date: March 8, 2012
    Applicant: DEMAND MEDIA, INC.
    Inventors: David M. Yehaskel, Henrik M. Kjallbring
  • Publication number: 20120059653
    Abstract: A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results.
    Type: Application
    Filed: August 30, 2011
    Publication date: March 8, 2012
    Inventors: Jeffrey P. Adams, Kenneth Basye, Ryan Thomas, Jeffrey C. O'Neill
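The two-pass idea in this abstract — rescore first-pass recognition results with a language model built from information stored on the device — can be sketched as below. The "language model" here is deliberately trivial (fraction of hypothesis words found on the device, e.g. contact names), and the interpolation weight is an assumption.

```python
def rescore(nbest, device_words, lam=0.5):
    """Second-pass rescoring: combine first-pass ASR scores with a
    toy device-side language model. `nbest` is a list of
    (hypothesis, score) pairs, higher scores better."""
    def local_lm(hyp):
        words = hyp.split()
        return sum(1.0 for w in words if w in device_words) / len(words)

    rescored = [(h, (1 - lam) * s + lam * local_lm(h)) for h, s in nbest]
    return max(rescored, key=lambda x: x[1])[0]
```

A hypothesis containing the user's actual contact names can thus overtake a higher-scoring but on-device-implausible first-pass result.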
  • Patent number: 8131554
    Abstract: A tool, method, and system for use in the development of sentence-based test items are disclosed. The tool may include a user interface that may include a database selection field, a sentence pattern entry field, an option pane, and an output pane. The tool may search a database for one or more sentences and may generate one or more responses to the one or more sentences. The one or more sentences and one or more responses may be used to produce the sentence-based test items. The tool may allow test items to be developed more quickly and easily than manual test item authoring. Accordingly, test item development costs may be lowered and test security may be enhanced.
    Type: Grant
    Filed: March 11, 2011
    Date of Patent: March 6, 2012
    Assignee: Educational Testing Service
    Inventor: Derrick Higgins
  • Patent number: 8131547
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Grant
    Filed: August 20, 2009
    Date of Patent: March 6, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 8126711
    Abstract: A modifying method for a speech model and a modifying module thereof are provided. The modifying method is as follows. First, a correct sequence of a speech is generated according to a correct sequence generating method and the speech model. Next, a candidate sequence generating method is selected from a plurality of candidate sequence generating methods, and a candidate sequence of the speech is generated according to the selected candidate sequence generating method and the speech model. Finally, the speech model is modified according to the correct sequence and the candidate sequence. Therefore, the present invention increases the discriminative power of the speech model.
    Type: Grant
    Filed: January 10, 2008
    Date of Patent: February 28, 2012
    Assignee: Industrial Technology Research Institute
    Inventors: Jia-Jang Tu, Yuan-Fu Liao
  • Publication number: 20120046946
    Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
    Type: Application
    Filed: August 20, 2010
    Publication date: February 23, 2012
    Applicant: ADACEL SYSTEMS, INC.
    Inventor: Chang-Qing Shu
  • Patent number: 8117030
    Abstract: A method for analyzing and adjusting the performance of a speech-enabled application includes selecting a number of user utterances that were previously received by the speech-enabled application. The speech-enabled application receives such user utterances and associates each user utterance with an action-object based on one or more salient terms in the user utterance that are associated with the action-object. The method further includes associating one of a number of action-objects with each of the selected user utterances. Furthermore, for each action-object, the percentage of the utterances associated with the action-object that include at least one of the salient terms associated with the action-object is determined. If the percentage does not exceed a selected threshold, the method also includes adjusting the one or more salient terms associated with the action-object.
    Type: Grant
    Filed: September 13, 2006
    Date of Patent: February 14, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Robert R. Bushey, Benjamin A. Knott, John M. Martin
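The threshold test in the abstract above — for each action-object, check what fraction of its utterances contain at least one of its salient terms, and flag it for adjustment when coverage is too low — can be sketched directly. The 0.8 threshold is an illustrative assumption.

```python
def coverage(utterances, salient_terms):
    """Fraction of utterances containing at least one salient term."""
    hits = sum(1 for u in utterances
               if any(t in u.lower() for t in salient_terms))
    return hits / len(utterances)

def needs_adjustment(utterances, salient_terms, threshold=0.8):
    """Flag an action-object whose salient terms fail to cover enough
    of the utterances mapped to it."""
    return coverage(utterances, salient_terms) < threshold
```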
  • Patent number: 8116445
    Abstract: An apparatus and method for monitoring an interaction between a caller and an automated voice response (AVR) system is provided. An audio communication from a caller is processed by executing an AVR script, which includes a plurality of instructions. A visual representation of the audio communication is presented substantially simultaneously with the audio communication to an agent based on the AVR script. The visual representation includes at least one field to be populated with information obtained from the caller and the information populated in the field can be updated by the agent.
    Type: Grant
    Filed: April 3, 2007
    Date of Patent: February 14, 2012
    Assignee: Intellisist, Inc.
    Inventors: Gilad Odinak, Alastair Sutherland, William A. Tolhurst
  • Patent number: 8108205
    Abstract: A system and method of refining context-free grammars (CFGs). The method includes deriving back-off grammar (BOG) rules from an initially developed CFG and utilizing the initial CFG and the derived BOG rules to recognize user utterances. Based on the response of the initial CFG and the derived BOG rules to the user utterances, at least a portion of the derived BOG rules are utilized to modify the initial CFG and thereby produce a refined CFG. The above method can be carried out iteratively, with each new iteration utilizing the refined CFG from preceding iterations.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: January 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Timothy Paek, Max Chickering, Eric Badger
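One common way to derive back-off grammar rules from a CFG rule is to relax carrier/filler words so that slightly out-of-grammar utterances still parse; the sketch below drops leading and trailing fillers from a tokenized rule. This is a guess at what "deriving BOG rules" involves, not the patent's actual derivation; the filler list is invented.

```python
# Hypothetical carrier/filler words that a back-off rule may omit.
FILLERS = {"please", "could", "you", "um", "uh"}

def derive_bog_rules(cfg_rule):
    """Derive back-off grammar rules from a tokenized CFG rule by
    progressively dropping leading and trailing filler words."""
    rules = [list(cfg_rule)]
    core = list(cfg_rule)
    while core and core[0].lower() in FILLERS:
        core = core[1:]
        rules.append(list(core))
    while core and core[-1].lower() in FILLERS:
        core = core[:-1]
        rules.append(list(core))
    # Deduplicate while preserving order.
    seen, out = set(), []
    for r in rules:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```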
  • Patent number: 8103502
    Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: January 24, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Patent number: 8099278
    Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: January 17, 2012
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
  • Publication number: 20120010885
    Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.
    Type: Application
    Filed: September 19, 2011
    Publication date: January 12, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Zeynep Hakkani-Tür, Giuseppe Riccardi
  • Patent number: 8095372
    Abstract: A digital process for authentication of a user of a database, for access to protected data, a service reserved for a defined circle of users, or the use of data currently entered by the user. A voice sample currently enunciated during an access attempt by the user is routed to a voice analysis unit, where a current voice profile is computed and compared in a voice profile comparison unit against a previously stored initial voice profile. In response to a positive comparison result, the user is authenticated and a first control signal enabling access is generated; in response to a negative comparison result, a second control signal disabling access or triggering a substitute authentication procedure is generated.
    Type: Grant
    Filed: January 7, 2008
    Date of Patent: January 10, 2012
    Assignee: VOICECASH IP GmbH
    Inventors: Raja Kuppuswamy, Hermann Geupel
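The comparison step above can be sketched with a toy voice-profile comparator: profiles as feature vectors, cosine similarity against a stored enrollment profile, and two control-signal outcomes. The similarity measure and threshold are illustrative assumptions; real speaker verification uses far richer models.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def authenticate(current_profile, stored_profile, threshold=0.9):
    """Compare the current voice profile against the enrolled one;
    enable access on a positive result, else fall back."""
    if cosine(current_profile, stored_profile) >= threshold:
        return "ACCESS_ENABLED"
    return "FALLBACK_AUTH"
```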
  • Patent number: 8090738
    Abstract: A multi-modal search system (and corresponding methodology) that employs wildcards is provided. Wildcards can be employed in the search query either initiated by the user or inferred by the system. These wildcards can represent uncertainty conveyed by a user in a multi-modal search query input. In examples, the words “something” or “whatchamacallit” can be used to convey uncertainty and partial knowledge about portions of the query and to dynamically trigger wildcard generation.
    Type: Grant
    Filed: August 28, 2008
    Date of Patent: January 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Timothy Seung Yoon Paek, Bo Thiesson, Yun-Cheng Ju, Bongshin Lee, Christopher A. Meek
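The wildcard substitution described above can be sketched with a small query matcher: the uncertainty words the abstract cites ("something", "whatchamacallit") become wildcards in a pattern matched against candidate listings. The regex-based matching is an illustrative stand-in for the system's actual search machinery.

```python
import re

# Uncertainty words cited in the abstract above.
UNCERTAIN = {"something", "whatchamacallit"}

def search(query, candidates):
    """Replace uncertainty words in a spoken query with wildcards and
    match the resulting pattern against candidate listings."""
    pattern = "^" + r"\s+".join(
        r"\S+" if w.lower() in UNCERTAIN else re.escape(w)
        for w in query.split()) + "$"
    rx = re.compile(pattern, re.IGNORECASE)
    return [c for c in candidates if rx.match(c)]
```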
  • Patent number: 8086455
    Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Yifan Gong, Ye Tian
  • Patent number: 8082150
    Abstract: A system for determining an identity of a received work. The system receives audio data for an unknown work. The audio data is divided into segments. The system generates a signature of the unknown work from each of the segments. Reduced dimension signatures are then generated from at least a portion of the signatures. The reduced dimension signatures are then compared to reduced dimension signatures of known works that are stored in a database. A list of candidates of known works is generated from the comparison. The signatures of the unknown work are then compared to the signatures of the known works in the list of candidates. The unknown work is then identified as the known work having signatures matching within a threshold.
    Type: Grant
    Filed: March 24, 2009
    Date of Patent: December 20, 2011
    Assignee: Audible Magic Corporation
    Inventor: Erling H. Wold
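The two-stage lookup above — shortlist candidates cheaply with reduced-dimension signatures, then confirm with full signatures under a threshold — can be sketched as follows. Chunk-averaging as the dimension reduction, squared-Euclidean distance, and the threshold value are all illustrative assumptions.

```python
def reduce_dim(sig, k=4):
    """Reduce a signature to k dimensions by chunk averaging."""
    n = len(sig) // k
    return [sum(sig[i * n:(i + 1) * n]) / n for i in range(k)]

def dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify(unknown, database, shortlist=2, threshold=0.5):
    """Two-stage identification: shortlist by reduced-dimension
    signature distance, then confirm with the full signatures."""
    ru = reduce_dim(unknown)
    candidates = sorted(
        database, key=lambda name: dist(ru, reduce_dim(database[name]))
    )[:shortlist]
    best = min(candidates, key=lambda name: dist(unknown, database[name]))
    return best if dist(unknown, database[best]) <= threshold else None
```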
  • Patent number: 8082148
    Abstract: Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: December 20, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Michael H. Mirt
  • Patent number: 8078463
    Abstract: A method and apparatus for spotting a target speaker within a call interaction by generating speaker models based on one or more speakers' speech, and by searching for speaker models associated with one or more target-speaker speech files.
    Type: Grant
    Filed: November 23, 2004
    Date of Patent: December 13, 2011
    Assignee: Nice Systems, Ltd.
    Inventors: Moshe Wasserblat, Yaniv Zigel, Oren Pereg
  • Publication number: 20110301953
    Abstract: Provided is a voice recognition system that adapts a speaker's voice, feature by feature, to a basic voice model and to new independent multi-models, stores the results, and provides stable real-time voice recognition using the resulting multi-adaptive model.
    Type: Application
    Filed: April 11, 2011
    Publication date: December 8, 2011
    Applicant: Seoby Electronic Co., Ltd
    Inventor: Sung-Sub Lee
  • Patent number: 8073262
    Abstract: In an image matching apparatus of the present invention, only a connected region in which the number of pixels included therein exceeds a threshold value, among connected regions that are specified by a labeling process section, is sent to a centroid calculation process section from a threshold value processing section, and a centroid (feature point) of the connected region is calculated. When it is determined that a target document to be matched is an N-up document, the threshold value processing section uses, instead of a default threshold value, a variant threshold value that varies depending on the number of images laid out on the N-up document and a document size that are found and detected by an N-up document determination section and a document size detection section. This makes it possible to determine a similarity to a reference document with high accuracy even in a case of an N-up document, i.e., a case where each target image to be matched is reduced in size from an original image.
    Type: Grant
    Filed: September 8, 2008
    Date of Patent: December 6, 2011
    Assignee: Sharp Kabushiki Kaisha
    Inventor: Hitoshi Hirohata
  • Publication number: 20110295602
    Abstract: An apparatus and a method are provided for building a spoken language understanding model. Labeled data may be obtained for a target application. A new classification model may be formed for use with the target application by using the labeled data for adaptation of an existing classification model. In some implementations, the existing classification model may be used to determine the most informative examples to label.
    Type: Application
    Filed: August 8, 2011
    Publication date: December 1, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Gokhan Tur
  • Patent number: 8069044
    Abstract: Content matching using phoneme comparison and scoring is described, including extracting phonemes from a file, comparing the phonemes to other phonemes, associating a first score with the phonemes based on a probability of the other phonemes matching the phonemes, and providing the file with another file when a request is received to access one or more files having a second score that is substantially similar to the first score.
    Type: Grant
    Filed: March 16, 2007
    Date of Patent: November 29, 2011
    Assignee: Adobe Systems Incorporated
    Inventor: James Moorer
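The phoneme comparison and scoring above can be sketched with a simple sequence-similarity score: 1 minus the normalized edit distance between two phoneme sequences. This is a stand-in for the probability-based matching score the abstract describes, and the ARPAbet-style phoneme labels in the test are illustrative.

```python
def phoneme_match_score(a, b):
    """Similarity of two phoneme sequences as 1 - normalized edit
    distance (1.0 for identical sequences, lower for mismatches)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1.0 - d[m][n] / max(m, n)
```

Files whose extracted phoneme sequences score close to a query's score (or close to each other) would then be grouped and served together, as the abstract describes.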
  • Patent number: 8069042
    Abstract: A method and system for obtaining a pool of speech syllable models. The model pool is generated by first detecting a training segment using unsupervised speech segmentation or speech unit spotting. If the model pool is empty, a first speech syllable model is trained and added to the model pool. If the model pool is not empty, an existing model is determined from the model pool that best matches the training segment. Then the existing model is scored for the training segment. If the score is less than a predefined threshold, a new model for the training segment is created and added to the pool. If the score equals the threshold or is larger than the threshold, the training segment is used to improve or to re-estimate the model.
    Type: Grant
    Filed: September 21, 2007
    Date of Patent: November 29, 2011
    Assignee: Honda Research Institute Europe GmbH
    Inventors: Frank Joublin, Holger Brandl
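The pool-building loop in this abstract has a clear control flow: train the first model on an empty pool, otherwise find the best match and either improve it or add a new model depending on a score threshold. A minimal sketch follows; the toy `train`/`score`/`improve` stand-ins are assumptions, not the patented training method.

```python
def update_pool(pool, segment, score_fn, train_fn, improve_fn, threshold=0.5):
    """One iteration of the model-pool loop described in the abstract."""
    if not pool:                                  # empty pool: train first model
        pool.append(train_fn(segment))
        return
    best = max(pool, key=lambda m: score_fn(m, segment))
    if score_fn(best, segment) < threshold:
        pool.append(train_fn(segment))            # no good match: add new model
    else:
        improve_fn(best, segment)                 # good match: re-estimate it

# Toy stand-ins (assumptions): a "model" is just the list of its training data.
train = lambda seg: [seg]
score = lambda m, seg: 1.0 if seg in m else 0.0
improve = lambda m, seg: m.append(seg)

pool = []
update_pool(pool, "ba", score, train, improve)    # empty pool -> first model
update_pool(pool, "da", score, train, improve)    # no match -> new model
update_pool(pool, "ba", score, train, improve)    # match -> improve existing
```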
  • Patent number: 8065144
    Abstract: A method for speech recognition. The method uses a single pronunciation estimator to train acoustic phoneme models and recognize utterances from multiple languages. The method includes accepting text spellings of training words in a plurality of sets of training words, each set corresponding to a different one of a plurality of languages. The method also includes, for each of the sets of training words in the plurality, receiving pronunciations for the training words in the set, the pronunciations being characteristic of native speakers of the language of the set, the pronunciations also being in terms of subword units at least some of which are common to two or more of the languages. The method also includes training a single pronunciation estimator using data comprising the text spellings and the pronunciations of the training words.
    Type: Grant
    Filed: February 3, 2010
    Date of Patent: November 22, 2011
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Laurence S. Gillick, Thomas E. Lynch, Michael J. Newman, Daniel L. Roth, Steven A. Wegmann, Jonathan P. Yamron
  • Patent number: 8065149
    Abstract: Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates.
    Type: Grant
    Filed: March 6, 2008
    Date of Patent: November 22, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Gakuto Kurata, Shinsuke Mori, Masafumi Nishimura
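The final selection step above — keeping only those (character string, pronunciation) combinations that also appear in the recognition data — can be sketched as a set intersection. The romanized strings below are illustrative assumptions.

```python
def select_pairs(candidates, pron_candidates, recognized):
    """Keep (string, pronunciation) combinations also present in the
    recognition data, as in the abstract's final selection step."""
    combos = {(s, p) for s in candidates for p in pron_candidates.get(s, [])}
    return sorted(combos & recognized)

cands = ["tokyo", "osaka"]                          # candidate character strings
prons = {"tokyo": ["toukyou", "tookyoo"], "osaka": ["oosaka"]}
recog = {("tokyo", "toukyou"), ("kyoto", "kyouto")}  # from the input speech
pairs = select_pairs(cands, prons, recog)
```

Only the combination actually observed in the speech survives, which is what makes the pair worth registering as a word.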
  • Patent number: 8065241
    Abstract: A new machine learning technique is herein disclosed which generalizes the support vector machine framework. A separating hyperplane in a separating space is optimized in accordance with generalized constraints which depend upon the clustering of the input vectors in the dataset.
    Type: Grant
    Filed: April 9, 2008
    Date of Patent: November 22, 2011
    Assignee: NEC Laboratories America, Inc.
    Inventors: Vladimir N. Vapnik, Michael R. Miller, Margaret A. Miller, legal representative
  • Patent number: 8060368
    Abstract: A voice recognition apparatus 10, which performs voice recognition of an input voice by referring to a voice recognition dictionary and outputs a voice recognition result, has an external information acquiring section 14 for acquiring, from each of the externally connected devices 20-1 to 20-N, the type of the device and the data recorded in it; vocabulary extracting and analyzing sections 15 and 16 for extracting a vocabulary item from the data as an extracted vocabulary item, and for producing analysis data by analyzing the extracted vocabulary item and providing it with a reading; and a dictionary generating section 17 for storing the analysis data in the voice recognition dictionary corresponding to the type. One of the voice recognition dictionaries 13-1 to 13-N is assigned to each type of externally connected device.
    Type: Grant
    Filed: August 18, 2006
    Date of Patent: November 15, 2011
    Assignee: Mitsubishi Electric Corporation
    Inventors: Masanobu Osawa, Reiko Okada, Takashi Ebihara
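The one-dictionary-per-device-type design above can be sketched as grouping extracted vocabulary by device type. The `lower()` call stands in for the analysis step that attaches a reading to each item; all names are illustrative assumptions.

```python
def build_dictionaries(devices):
    """Group vocabulary extracted from each device into a recognition
    dictionary keyed by device type (one dictionary per type)."""
    dictionaries = {}
    for dev in devices:
        vocab = dictionaries.setdefault(dev["type"], set())
        for item in dev["data"]:
            # (vocabulary item, reading) — the reading here is a placeholder
            # for the real analysis that provides a pronunciation
            vocab.add((item, item.lower()))
    return dictionaries

devices = [
    {"type": "music_player", "data": ["Abbey Road"]},
    {"type": "phone", "data": ["Alice", "Bob"]},
    {"type": "music_player", "data": ["Revolver"]},
]
dicts = build_dictionaries(devices)
```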
  • Patent number: 8060365
    Abstract: A dialog processing system which includes: a target expression data extraction unit for extracting, from a plurality of utterance data inputted by an utterance data input unit and obtained by converting the contents of a plurality of conversations in one field, a plurality of target expression data each including a portion that matches an utterance pattern, the pattern being inputted by an utterance pattern input unit and being an utterance structure derived from the contents of field-independent general conversations; a feature extraction unit for retrieving the pattern-matching portions from the extracted target expression data and extracting a feature quantity common to them; and a mandatory data extraction unit for extracting, by use of the extracted feature quantities, mandatory data of the one field included in the utterance data.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: November 15, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
  • Publication number: 20110276329
    Abstract: A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided that determine a user's proficiency level in dialogue behavior correctly, without being influenced by an accidental one-time behavior of the user, and perform dialogue control appropriate to the determined proficiency level. An input unit 1 inputs a speech uttered by the user. An extraction unit 3 extracts a proficiency level determination factor, a factor for determining the user's proficiency level in dialogue behavior, based upon the input result of the speech from the input unit 1. A history storage unit 4 stores the proficiency level determination factors extracted by the extraction unit 3 as a history.
    Type: Application
    Filed: January 20, 2010
    Publication date: November 10, 2011
    Inventors: Masaaki Ayabe, Jun Okamoto
  • Publication number: 20110276323
    Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 10, 2011
    Applicant: Senam Consulting, Inc.
    Inventor: Serge Olegovich Seyfetdinov
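The authentication logic above is a conjunction of two checks: the test passphrase must match the reference passphrase, and each voice feature must match within some tolerance. A minimal sketch, with feature names and the tolerance chosen purely for illustration:

```python
def authenticate(ref, test, feature_tol=0.1):
    """Authenticate only when the passphrase text matches AND every voice
    feature is within tolerance of the enrolled reference recording."""
    if ref["passphrase"] != test["passphrase"]:
        return False
    return all(abs(ref["features"][k] - test["features"][k]) <= feature_tol
               for k in ref["features"])

reference = {"passphrase": "open sesame", "features": {"pitch": 0.62, "rate": 0.40}}
genuine   = {"passphrase": "open sesame", "features": {"pitch": 0.60, "rate": 0.44}}
impostor  = {"passphrase": "open sesame", "features": {"pitch": 0.95, "rate": 0.40}}
```

Requiring both conditions means knowing the passphrase alone (or merely sounding similar alone) is not enough to authenticate.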
  • Patent number: 8055502
    Abstract: A voice dialing method includes the steps of receiving an utterance from a user, decoding the utterance to identify a recognition result for the utterance, and communicating the recognition result to the user. If an indication is received from the user that the communicated recognition result is incorrect, then the incorrect result is added to a rejection reference. When the user repeats the misunderstood utterance, the rejection reference can be used to eliminate the incorrect result as a potential subsequent recognition result. The method can be used for single digits, multiple digits, or digit strings.
    Type: Grant
    Filed: November 28, 2006
    Date of Patent: November 8, 2011
    Assignee: General Motors LLC
    Inventors: Jason W. Clark, Rathinavelu Chengalvarayan, Timothy J. Grost, Dana B. Fecher, Jeremy M. Spaulding
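The rejection-reference idea above amounts to filtering previously rejected hypotheses out of the candidate list before picking the best score. A minimal sketch, with candidate numbers and scores invented for illustration:

```python
def recognize(utterance, candidates, rejection):
    """Return the best-scoring candidate that has not been rejected."""
    viable = [(c, s) for c, s in candidates if c not in rejection]
    return max(viable, key=lambda cs: cs[1])[0] if viable else None

rejection = set()
candidates = [("555-0123", 0.9), ("555-0128", 0.8)]

first = recognize("call five five five oh one two eight", candidates, rejection)
rejection.add(first)    # user indicates the communicated result was wrong
second = recognize("call five five five oh one two eight", candidates, rejection)
```

On the repeated utterance, the earlier wrong answer is excluded, so the next-best hypothesis is returned instead.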
  • Patent number: 8055503
    Abstract: A system and method provide an audio analysis intelligence tool with ad-hoc search capabilities using spoken words as an organized data form. An SQL-like interface is used to process and search audio data and combine it with other traditional data forms to enhance searching of audio segments to identify those audio segments satisfying minimum confidence levels for a match.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 8, 2011
    Assignee: Siemens Enterprise Communications, Inc.
    Inventors: Robert Scarano, Lawrence Mark
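The SQL-like filtering this abstract describes — returning only audio segments that satisfy a minimum confidence level for a match — can be sketched as a SELECT-with-WHERE over spotted phrases. Field names and scores below are illustrative assumptions.

```python
def search_segments(segments, phrase, min_confidence):
    """SELECT-like filter: segments where `phrase` was spotted with at
    least `min_confidence`, ordered by confidence descending."""
    hits = [s for s in segments
            if s["phrase"] == phrase and s["confidence"] >= min_confidence]
    return sorted(hits, key=lambda s: -s["confidence"])

segments = [
    {"call_id": 1, "phrase": "cancel account", "confidence": 0.92},
    {"call_id": 2, "phrase": "cancel account", "confidence": 0.55},
    {"call_id": 3, "phrase": "upgrade plan",   "confidence": 0.88},
]
results = search_segments(segments, "cancel account", 0.7)
```

The low-confidence spot in call 2 is excluded, which is the point of the minimum-confidence clause.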
  • Patent number: 8050918
    Abstract: A method and system for evaluating the quality of voice input recognition by a voice portal is provided. An analysis interface extracts a set of current grammars from the voice portal. A test pattern generator generates a test input for each current grammar. The test input includes a test pattern and a set of active grammars corresponding to each current grammar. The system further includes a text-to-speech engine for entering each test pattern into the voice server. A results collector analyzes each test pattern entered into the voice server with the speech recognition engine against the set of active grammars corresponding to the current grammar for said test pattern. A results analyzer derives a set of statistics of a quality of recognition of each current grammar.
    Type: Grant
    Filed: December 11, 2003
    Date of Patent: November 1, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Reza Ghasemi, Walter Haenel
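The evaluation loop above — generate a test pattern per grammar, synthesize it, recognize it against the grammar's active set, and tally statistics — can be sketched as follows. The toy `tts`/`asr` stand-ins are assumptions; real engines would replace them.

```python
def evaluate_grammars(grammars, synthesize, recognize):
    """For each grammar, synthesize its test phrases, run recognition
    against that grammar's active set, and tally per-grammar accuracy."""
    stats = {}
    for name, phrases in grammars.items():
        correct = sum(recognize(synthesize(p), phrases) == p for p in phrases)
        stats[name] = correct / len(phrases)
    return stats

# Toy stand-ins for the text-to-speech and recognition engines (assumptions):
tts = lambda text: text.upper()                        # "audio" = uppercased text
asr = lambda audio, active: audio.lower() if audio.lower() in active else None

grammars = {"digits": ["one", "two", "three"], "cities": ["boston", "austin"]}
stats = evaluate_grammars(grammars, tts, asr)
```

A results analyzer would then inspect `stats` to flag grammars whose recognition quality falls below an acceptable rate.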
  • Patent number: 8046224
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: October 25, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 8036890
    Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.
    Type: Grant
    Filed: September 4, 2009
    Date of Patent: October 11, 2011
    Assignee: Zentian Limited
    Inventor: Mark Catchpole
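A lexical tree as described above is a trie in which words sharing a prefix share nodes, and each tree's initial component is unique. A minimal sketch over letters (real systems would use phones); all names are illustrative.

```python
def build_lexical_trees(words):
    """Group words into tries keyed by their first component, so each
    tree's initial component is unique and common prefixes are shared."""
    trees = {}
    for word in words:
        node = trees.setdefault(word[0], {})   # one tree per initial component
        for ch in word[1:]:
            node = node.setdefault(ch, {})     # shared path for common prefixes
        node["#"] = word                       # end-of-word marker stores the word
    return trees

trees = build_lexical_trees(["cat", "car", "dog"])
# "cat" and "car" share the c-a prefix inside the tree rooted at "c";
# separate trees could be handed to separate processors in parallel.
```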
  • Patent number: 8032373
    Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: October 4, 2011
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir