Subportions Patents (Class 704/254)
-
Patent number: 8527272
Abstract: A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at the word level based on phoneme similarity. The method can be applied to automatic archiving of a multimedia resource and to automatic searching of a multimedia resource.
Type: Grant
Filed: August 27, 2010
Date of Patent: September 3, 2013
Assignee: International Business Machines Corporation
Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang, Jie Zhou
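The patent does not publish its algorithm, but the idea of word-level alignment driven by phoneme similarity can be sketched as a Needleman-Wunsch alignment whose match score is a per-word phoneme edit-distance similarity. The toy `lexicon` mapping words to phoneme lists is an assumption for illustration (real systems would use a pronunciation dictionary):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def phoneme_similarity(w1, w2, lexicon):
    """Similarity in [0, 1] from phoneme edit distance (lexicon is assumed)."""
    p1, p2 = lexicon[w1], lexicon[w2]
    return 1.0 - edit_distance(p1, p2) / max(len(p1), len(p2))

def align(target, reference, lexicon, gap=-0.5):
    """Needleman-Wunsch word alignment maximizing phoneme similarity."""
    m, n = len(target), len(reference)
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        score[i][0] = i * gap
    for j in range(n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1]
                + phoneme_similarity(target[i - 1], reference[j - 1], lexicon),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap)
    # Trace back to recover the aligned word pairs.
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + phoneme_similarity(
            target[i - 1], reference[j - 1], lexicon)
        if score[i][j] == diag:
            pairs.append((target[i - 1], reference[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Homophones such as "two"/"too" then align even though their spellings differ, which is the point of using phoneme rather than string similarity.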
-
Publication number: 20130226583
Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring at each point in the audio files, across the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs, which are based on the set of unique phoneme patterns created for each language.
Type: Application
Filed: March 18, 2013
Publication date: August 29, 2013
Applicant: Autonomy Corporation Limited
Inventor: Autonomy Corporation Limited
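A minimal sketch of the SLM-scoring idea, not the patented system: train one smoothed phoneme-bigram model per language, then identify the language whose model assigns the decoded phoneme sequence the highest log-probability. The training data and phoneme symbols below are made up for illustration:

```python
import math
from collections import defaultdict

def train_bigram(sequences, alpha=1.0):
    """Add-alpha smoothed bigram model over phoneme symbols.

    Returns a scoring function mapping a phoneme list to a log-probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for seq in sequences:
        for a, b in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[a][b] += 1
            vocab.update([a, b])
    V = len(vocab)

    def logprob(seq):
        total = 0.0
        for a, b in zip(["<s>"] + seq, seq + ["</s>"]):
            row = counts[a]
            total += math.log((row[b] + alpha) / (sum(row.values()) + alpha * V))
        return total

    return logprob

def identify(phonemes, models):
    """Return the language whose SLM scores the phoneme sequence highest."""
    return max(models, key=lambda lang: models[lang](phonemes))
```

In the patented design the phonemes themselves come from the universal decoder, so one decoding pass feeds every language's SLM; the sketch only shows the scoring stage.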
-
Patent number: 8521527
Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts, each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized, is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with the signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
Type: Grant
Filed: September 10, 2012
Date of Patent: August 27, 2013
Assignee: Intellisist, Inc.
Inventor: Martin R. M. Dunsmuir
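The dispatch loop this describes can be sketched as follows. This is a hypothetical illustration: signatures here are plain strings and `signature_fn` is a stand-in for the patent's audio-signature computation, which is not published:

```python
def run_call(buffers, host_script, signature_fn):
    """Match each incoming buffer's signature against a host script.

    host_script maps phrase signatures to zero-argument action callables;
    the matching action is executed and its result collected.
    """
    executed = []
    for buf in buffers:
        sig = signature_fn(buf)          # stand-in for audio signature extraction
        action = host_script.get(sig)    # compare against expected phrases
        if action is not None:
            executed.append(action())    # run the script's stored action
    return executed
```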
-
Patent number: 8521529
Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
Type: Grant
Filed: April 18, 2005
Date of Patent: August 27, 2013
Assignee: Creative Technology Ltd
Inventors: Michael M. Goodwin, Jean Laroche
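The dynamic-programming step can be illustrated in isolation (the LDA projection is omitted here): given a 1-D feature track, choose segment boundaries that minimize total within-segment variance, which yields globally optimal boundaries rather than greedy local ones. A hedged sketch, not the patented algorithm:

```python
def within_cost(x, i, j):
    """Sum of squared deviations of x[i:j] from its mean."""
    seg = x[i:j]
    mu = sum(seg) / len(seg)
    return sum((v - mu) ** 2 for v in seg)

def segment(x, k):
    """Optimal split of x into k contiguous segments (O(k * n^2) DP).

    Returns the k-1 interior boundary indices.
    """
    n = len(x)
    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                c = cost[s - 1][i] + within_cost(x, i, j)
                if c < cost[s][j]:
                    cost[s][j] = c
                    back[s][j] = i
    # Walk back through the table to recover boundaries.
    bounds, j = [], n
    for s in range(k, 0, -1):
        bounds.append(j)
        j = back[s][j]
    return sorted(bounds)[:-1]  # drop the trailing end-of-signal index
```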
-
Publication number: 20130218563
Abstract: A speech recognition system includes a mobile device and a remote server. The mobile device receives the speech from the user and extracts the features and phonemes from the speech. Selected phonemes and measures of uncertainty are transmitted to the server, which processes the phonemes for speech understanding and transmits a text of the speech (or the context or understanding of the speech) back to the mobile device.
Type: Application
Filed: January 29, 2013
Publication date: August 22, 2013
Applicant: Intelligent Mechatronic Systems Inc.
Inventor: Intelligent Mechatronic Systems Inc.
-
Patent number: 8515749
Abstract: Systems and methods for facilitating communication, including recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.
Type: Grant
Filed: May 20, 2009
Date of Patent: August 20, 2013
Assignee: Raytheon BBN Technologies Corp.
Inventor: David G. Stallard
-
Patent number: 8515753
Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during the training process of the acoustic models. When the present invention for adapting acoustic models is combined with a conventional acoustic model adaptation scheme, further enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
Type: Grant
Filed: March 30, 2007
Date of Patent: August 20, 2013
Assignee: Gwangju Institute of Science and Technology
Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
-
Patent number: 8515750
Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
Type: Grant
Filed: September 19, 2012
Date of Patent: August 20, 2013
Assignee: Google Inc.
Inventors: Xin Lei, Petar Aleksic
-
Patent number: 8510111
Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of the frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
Type: Grant
Filed: February 8, 2008
Date of Patent: August 13, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
-
Patent number: 8504367
Abstract: Disclosed are a speech retrieval apparatus and a speech retrieval method for searching, in a speech database, for an audio file matching an input search term by using an acoustic model serialization code, a phonemic code, a sub-word unit, and a speech recognition result of speech. The speech retrieval apparatus comprises a first conversion device, a first division device, a first speech retrieval unit creation device, a second conversion device, a second division device, a second speech retrieval unit creation device, and a matching device. The speech retrieval method comprises a first conversion step, a first division step, a first speech retrieval unit creation step, a second conversion step, a second division step, a second speech retrieval unit creation step, and a matching step.
Type: Grant
Filed: August 31, 2010
Date of Patent: August 6, 2013
Assignee: Ricoh Company, Ltd.
Inventors: Dafei Shi, Yaojie Lu, Yueyan Yin, Jichuan Zheng, Lijun Zhao
-
Patent number: 8498859
Abstract: A language-processing system has an input for language in text or audio, as a message, an extractor operating to separate words and phrases from the input, to consult a knowledge base, and to assign a concept to individual ones of the words or phrases, and a connector operating to link the concepts to form a statement. In some cases there is a situation model updated as language is processed. The system may be used for controlling technical systems, such as robotic systems.
Type: Grant
Filed: November 12, 2003
Date of Patent: July 30, 2013
Inventor: Bernd Schönebeck
-
Patent number: 8498871
Abstract: A system for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes a system transaction manager, having a "system protocol," to receive a speech information request from an authorized user. The speech information request is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications. A speech recognition and/or transcription engine (ASR), in communication with the system transaction manager, receives the speech information request, generates a transcribed response, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR.
Type: Grant
Filed: May 24, 2011
Date of Patent: July 30, 2013
Assignee: Advanced Voice Recognition Systems, Inc.
Inventors: Joseph H. Miglietta, Michael K. Davis
-
Publication number: 20130191129
Abstract: System and method for performing speech recognition using acoustic invariant structure for large-vocabulary continuous speech. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit for outputting a speech recognition score; a structure score calculation unit for calculating a structure score, found for each hypothesis by applying phoneme-pair-by-phoneme-pair weighting to the phoneme pair inter-distribution distance likelihood over all phoneme pairs comprising the hypothesis and then summing; and a ranking unit for ranking the multiple hypotheses based on the sum of the speech recognition score and the structure score.
Type: Application
Filed: January 18, 2013
Publication date: July 25, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: International Business Machines Corporation
-
Publication number: 20130191128
Abstract: A continuous phonetic recognition method using a semi-Markov model, a system for processing the method, and a recording medium storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model.
Type: Application
Filed: August 28, 2012
Publication date: July 25, 2013
Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Chang Dong Yoo, Sung Woong Kim
-
Patent number: 8494850
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.
Type: Grant
Filed: June 29, 2012
Date of Patent: July 23, 2013
Assignee: Google Inc.
Inventors: Ciprian I. Chelba, Peng Xu, Fernando Pereira
-
Publication number: 20130185073
Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.
Type: Application
Filed: March 6, 2013
Publication date: July 18, 2013
Applicant: Nuance Communications Austria GmbH
Inventor: Nuance Communications Austria GmbH
-
Patent number: 8489398
Abstract: A method is performed by a communication device that is configured to communicate with a server over a network. The method includes outputting, to the server, speech data for spoken words; receiving, from the server, speech recognition candidates for a spoken word in the speech data; checking the speech recognition candidates against a database on the communication device; and selecting one or more of the speech recognition candidates for use by the communication device based on the checking.
Type: Grant
Filed: January 14, 2011
Date of Patent: July 16, 2013
Assignee: Google Inc.
Inventor: Alexander H. Gruenstein
-
Publication number: 20130179169
Abstract: A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and readability indices analysis in conjunction with the readability mathematical model.
Type: Application
Filed: July 5, 2012
Publication date: July 11, 2013
Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
Inventors: Yao-Ting Sung, Ju-Ling Chen
-
Patent number: 8478597
Abstract: The present disclosure presents a useful metric for assessing the relative difficulty which non-native speakers face in pronouncing a given utterance, and a method and systems for using such a metric in the evaluation and assessment of the utterances of non-native speakers. In an embodiment, the metric may be based on both known sources of difficulty for language learners and a corpus-based measure of cross-language sound differences. The method may be applied to speakers who primarily speak a first language and who produce utterances in any non-native second language.
Type: Grant
Filed: January 10, 2006
Date of Patent: July 2, 2013
Assignee: Educational Testing Service
Inventors: Derrick Higgins, Klaus Zechner, Yoko Futagi, Rene Lawless
-
Publication number: 20130159000
Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels, and updates a classification model associated with the classifier using the pseudo-semantic label.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Yun-Cheng Ju, James Garnet Droppo, III
-
Patent number: 8463609
Abstract: In the present invention, a voice input system and a voice input method are provided. The voice input method includes the steps of: (A) initiating a speech recognition process by a first input associated with a first parameter of a first speech recognition subject; (B) providing a voice and a searching space constructed by a speech recognition model associated with the first speech recognition subject; (C) obtaining a sub-searching space from the searching space based on the first parameter; (D) searching at least one candidate item associated with the voice from the sub-searching space; and (E) showing the at least one candidate item.
Type: Grant
Filed: April 29, 2009
Date of Patent: June 11, 2013
Assignee: Delta Electronics Inc.
Inventors: Keng-Hung Yeh, Liang-Sheng Huang, Chao-Jen Huang, Jia-Lin Shen
-
Patent number: 8457967
Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score to quantify the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
Type: Grant
Filed: August 15, 2009
Date of Patent: June 4, 2013
Assignee: Nuance Communications, Inc.
Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
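Two of the lexical features named in the abstract are easy to make concrete: the count of closely-occurring exact repeat N-grams, and the normalized average distance between consecutive occurrences of an N-gram. A hypothetical illustration (the patent's exact feature definitions, window size, and normalization are assumptions here):

```python
from collections import defaultdict

def ngram_positions(words, n):
    """Map each n-gram to the list of positions where it occurs."""
    pos = defaultdict(list)
    for i in range(len(words) - n + 1):
        pos[tuple(words[i:i + n])].append(i)
    return pos

def repeat_features(words, n=2, window=5):
    """Return (close repeat count, normalized mean gap between repeats).

    A "close" repeat is a pair of consecutive occurrences of the same
    n-gram no more than `window` words apart; gaps are normalized by
    the utterance length.
    """
    close, gaps = 0, []
    for gram, idxs in ngram_positions(words, n).items():
        for a, b in zip(idxs, idxs[1:]):
            gaps.append((b - a) / len(words))
            if b - a <= window:
                close += 1
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return close, mean_gap
```

On a disfluent fragment like "i want i want to go", the repeated bigram "i want" registers as one close repeat, which is exactly the kind of pattern the fluency score penalizes.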
-
Publication number: 20130138441
Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
Type: Application
Filed: August 14, 2012
Publication date: May 30, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
-
Publication number: 20130124205
Abstract: A system allows a user to obtain information about television programming and to make selections of programming using conversational speech. The system includes a speech recognizer that recognizes spoken requests for television programming information. A speech synthesizer generates spoken responses to the spoken requests for television programming information. A user may use a voice user interface as well as a graphical user interface to interact with the system to facilitate the selection of programming choices.
Type: Application
Filed: January 3, 2013
Publication date: May 16, 2013
Inventor: Christopher H. Genly
-
Patent number: 8442827
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages.
Type: Grant
Filed: June 18, 2010
Date of Patent: May 14, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Nicholas Duffield
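The aggregation step can be sketched under assumed data shapes: each customer service contributes a dictionary of country-specific weights, the totals are normalized into a summary weight per country, and the accent/language model is chosen from the largest. This is an illustrative reading of the abstract, not the patented implementation:

```python
def summary_weights(service_weights):
    """Aggregate per-service country weights into a normalized summary."""
    totals = {}
    for weights in service_weights:
        for country, w in weights.items():
            totals[country] = totals.get(country, 0.0) + w
    s = sum(totals.values())
    return {c: w / s for c, w in totals.items()}

def pick_language_model(service_weights):
    """Choose the accent source with the highest summary weight."""
    totals = summary_weights(service_weights)
    return max(totals, key=totals.get)
```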
-
Patent number: 8438027
Abstract: An object of the invention is to conveniently increase the standard patterns registered in a voice recognition device to efficiently extend the number of words that can be voice-recognized. New standard patterns are generated by modifying a part of an existing standard pattern. A pattern matching unit 16 of a modifying-part specifying unit 14 performs a pattern matching process to specify a part to be modified in the existing standard pattern of a usage source. A standard pattern generating unit 18 generates the new standard patterns by cutting or deleting voice data of the modifying part of the usage-source standard pattern, substituting the voice data of the modifying part of the usage-source standard pattern for other voice data, or combining the voice data of the modifying part of the usage-source standard pattern with other voice data. A standard pattern database update unit 20 adds the new standard patterns to a standard pattern database 24.
Type: Grant
Filed: May 25, 2006
Date of Patent: May 7, 2013
Assignee: Panasonic Corporation
Inventors: Toshiyuki Teranishi, Kouji Hatano
-
Patent number: 8438028
Abstract: A method of and system for managing nametags including receiving a command from a user to store a nametag, prompting the user to input a number to be stored in association with the nametag, receiving an input for the number from the user, prompting the user to input the nametag to be stored in association with the number, receiving an input for the nametag from the user, processing the nametag input, and calculating confusability of the nametag input in multiple individual domains including a nametag domain, a number domain, and a command domain.
Type: Grant
Filed: May 18, 2010
Date of Patent: May 7, 2013
Assignee: General Motors LLC
Inventors: Rathinavelu Chengalvarayan, Lawrence D. Cepuran
-
Patent number: 8433575
Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing, for example, a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules, and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
Type: Grant
Filed: December 10, 2003
Date of Patent: April 30, 2013
Assignee: AMBX UK Limited
Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
-
Patent number: 8433573
Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme, by using data representing a regular or statistical phoneme length in an utterance of a human, with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
Type: Grant
Filed: February 11, 2008
Date of Patent: April 30, 2013
Assignee: Fujitsu Limited
Inventors: Kentaro Murase, Nobuyuki Katae
-
Apparatus and method for classification and segmentation of audio content, based on the audio signal
Patent number: 8428949
Abstract: An apparatus for classifying an input audio signal into audio contents of a first and second class, comprising an audio segmentation module adapted to segment said input audio signal into segments of a predetermined length; a feature computation module adapted to calculate for the segments features characterizing said audio input signal; a threshold comparison module adapted to generate a feature vector for each of said one or more segments based on a plurality of predetermined thresholds, the thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold; and a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents.
Type: Grant
Filed: June 30, 2009
Date of Patent: April 23, 2013
Assignee: Waves Audio Ltd.
Inventors: Itai Neoran, Yizhar Lavner, Dima Ruinskiy
-
Patent number: 8428950
Abstract: A speech recognition apparatus (110) selects an optimum recognition result from recognition results output from a set of speech recognizers (s1-sM) based on a majority decision. This decision is implemented taking into account weight values, for the set of speech recognizers, learned by a learning apparatus (100). The learning apparatus includes a unit (103) selecting speech recognizers corresponding to characteristics of speech for learning (101), a unit (104) finding recognition results of the speech for learning by using the selected speech recognizers, a unit (105) unifying the recognition results and generating a word string network, and a unit (106) finding weight values concerning the set of speech recognizers by implementing learning processing.
Type: Grant
Filed: January 18, 2008
Date of Patent: April 23, 2013
Assignee: NEC Corporation
Inventors: Yoshifumi Onishi, Tadashi Emori
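The weighted majority decision can be sketched as follows. This is a simplification of the patented scheme: it assumes the recognizer outputs are already aligned position-by-position (the patent builds a word string network for this), and the per-recognizer weights are given rather than learned:

```python
from collections import defaultdict

def weighted_vote(hypotheses, weights):
    """Combine aligned word strings from several recognizers.

    hypotheses: list of equal-length word lists, one per recognizer.
    weights:    learned reliability weight per recognizer.
    At each position, the word with the largest total weight wins.
    """
    n = len(hypotheses[0])
    result = []
    for i in range(n):
        tally = defaultdict(float)
        for hyp, w in zip(hypotheses, weights):
            tally[hyp[i]] += w
        result.append(max(tally, key=tally.get))
    return result
```

Note that with suitable weights, two weaker recognizers agreeing can outvote a single stronger one, which is the benefit of the ensemble over picking one recognizer outright.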
-
Patent number: 8417528
Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.
Type: Grant
Filed: February 3, 2012
Date of Patent: April 9, 2013
Assignee: Nuance Communications Austria GmbH
Inventor: Zsolt Saffer
-
Patent number: 8417527
Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
Type: Grant
Filed: October 13, 2011
Date of Patent: April 9, 2013
Assignee: Nuance Communications, Inc.
Inventors: Nitendra Rajput, Ashish Verma
-
Publication number: 20130085757
Abstract: An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device; a selection unit, utilizing a signal from one or more sensors embedded in the device, configured to select one of the trigger detection units, the selected trigger detection unit being appropriate to the usage environment of the device; and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
Type: Application
Filed: June 29, 2012
Publication date: April 4, 2013
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masanobu NAKAMURA, Akinori KAWAMURA
-
Patent number: 8412525
Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.
Type: Grant
Filed: April 30, 2009
Date of Patent: April 2, 2013
Assignee: Microsoft Corporation
Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan
-
Patent number: 8407047
Abstract: A guidance information display device includes: a voice input unit; a display unit for displaying guidance information; an operation unit for accepting an operation; and a processor capable of executing the following processes: a voice recognition process operation of performing voice recognition based on inputted voice; a calculation operation of calculating an evaluation value for a recognition result of voice recognition by the voice recognition process operation; a display operation of reading out guidance information corresponding to the recognition result from a storage unit, which stores the guidance information, and displaying the guidance information at a display unit; and a decision operation of deciding a display mode of the guidance information at the display unit based on a variable value, which varies with an operation from the operation unit for the guidance information displayed by the display operation, and the evaluation value calculated by the calculation operation.
Type: Grant
Filed: March 31, 2009
Date of Patent: March 26, 2013
Assignee: Fujitsu Limited
Inventor: Kenji Abe
-
Patent number: 8401861Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.Type: GrantFiled: January 17, 2007Date of Patent: March 19, 2013Assignee: Nuance Communications, Inc.Inventors: Shuang Zhi Wei, Raimo Bakis, Ellen Marie Eide, Liqin Shen
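The piecewise linear warping described in patent 8401861 is straightforward to sketch: corresponding formant frequencies from the source and target speaker serve as key positions, and frequencies in between are mapped by linear interpolation. The formant values below are made up for illustration.

```python
import numpy as np

# Sketch of a piecewise-linear frequency warping function (patent 8401861).
# Key positions pair source-speaker formants with target-speaker formants;
# the endpoints pin the warp at 0 Hz and the Nyquist-like upper bound.
src_formants = [0.0, 700.0, 1200.0, 2600.0, 8000.0]  # source speaker (Hz)
tgt_formants = [0.0, 620.0, 1050.0, 2400.0, 8000.0]  # target speaker (Hz)

def warp(freq_hz):
    """Map a source-speaker frequency to the target speaker's scale."""
    return float(np.interp(freq_hz, src_formants, tgt_formants))

print(warp(700.0))  # 620.0: a key position maps exactly
print(warp(950.0))  # 835.0: halfway between the 700 Hz and 1200 Hz keys
```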
-
Patent number: 8401847Abstract: An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9 and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word.Type: GrantFiled: November 30, 2007Date of Patent: March 19, 2013Assignee: National Institute of Advanced Industrial Science and TechnologyInventors: Jun Ogata, Masataka Goto
-
Patent number: 8396714Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.Type: GrantFiled: September 29, 2008Date of Patent: March 12, 2013Assignee: Apple Inc.Inventors: Matthew Rogers, Kim Silverman, Devang Naik, Benjamin Rottler
-
Publication number: 20130060572Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript including updating the time location associated with terms of the transcript based on the determined correspondence.Type: ApplicationFiled: September 4, 2012Publication date: March 7, 2013Applicant: Nexidia Inc.Inventors: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
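The correspondence step in publication 20130060572 can be sketched as a robust offset estimate: search terms carry timestamps from the other version of the audio, locating them in the target recording yields time pairs, and a robust statistic of the pairwise differences updates every transcript timestamp. The times and the use of a median are illustrative assumptions.

```python
import statistics

# Toy sketch of transcript/audio alignment (publication 20130060572).
# Term times from the old audio version vs. where the same terms were
# located in the new recording; all values are invented.
transcript_times = [1.0, 5.5, 9.0]   # seconds, in the different version
located_times = [3.1, 7.5, 11.0]     # seconds, found in the recording

offsets = [b - a for a, b in zip(transcript_times, located_times)]
shift = statistics.median(offsets)   # robust to a few false matches
aligned = [t + shift for t in transcript_times]
print(aligned)  # [3.0, 7.5, 11.0]
```

A production aligner would handle piecewise offsets (edits, cuts) rather than a single global shift.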
-
Patent number: 8386254Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.Type: GrantFiled: May 2, 2008Date of Patent: February 26, 2013Assignee: Nuance Communications, Inc.Inventors: Neeraj Deshmukh, Puming Zhan
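The transform step in patent 8386254 resembles affine feature adaptation: incoming feature vectors are transformed per feature class before being scored against speaker-independent models. The sketch below assumes an fMLLR-style affine transform with invented matrices; the patent itself does not specify these values.

```python
import numpy as np

# Hypothetical per-class affine feature adaptation (cf. patent 8386254).
# A and b are speaker-dependent parameters for one feature class; they
# are made-up numbers for illustration.
A = np.array([[1.05, 0.0],
              [0.0, 0.92]])
b = np.array([0.1, -0.2])

def adapt(features):
    """Apply x -> A x + b to each row of a (frames x dims) array."""
    return features @ A.T + b

frames = np.array([[1.0, 2.0], [0.5, -1.0]])
print(adapt(frames))  # [[1.15, 1.64], [0.625, -1.12]]
```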
-
Patent number: 8380506Abstract: Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.Type: GrantFiled: November 29, 2007Date of Patent: February 19, 2013Assignee: Georgia Tech Research CorporationInventors: Woojay Jeon, Biing-Hwang Juang
-
Patent number: 8380505Abstract: A system is provided for recognizing speech for searching a database. The system receives speech input as a spoken search request and then processes the speech input in a speech recognition step using a vocabulary for recognizing the spoken request. By processing the speech input, words recognized in the speech input and included in the vocabulary are obtained to form at least one hypothesis. The hypothesis is then utilized to search a database using the at least one hypothesis as a search query. A search result is then received from the database and provided to the user.Type: GrantFiled: October 24, 2008Date of Patent: February 19, 2013Assignee: Nuance Communications, Inc.Inventors: Lars König, Andreas Löw, Udo Haiber
-
Patent number: 8374873Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.Type: GrantFiled: August 11, 2009Date of Patent: February 12, 2013Assignee: Morphism, LLCInventor: James H. Stephens, Jr.
-
Patent number: 8374845Abstract: A word coinciding with a key word input by speech and a word related to the word are set as retrieval candidate words based on a word dictionary in which words representing formal names and aliases of the formal names are registered in association with a family attribute indicating a familiar relation among the words. Content related to any one of retrieval words selected out of the retrieval candidate words and a word related to the retrieval word is retrieved.Type: GrantFiled: February 29, 2008Date of Patent: February 12, 2013Assignee: Kabushiki Kaisha ToshibaInventors: Miwako Doi, Kaoru Suzuki, Toshiyuki Koga, Koichi Yamamoto
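The dictionary lookup in patent 8374845 can be sketched as alias expansion: formal names and their aliases share a family attribute, so a spoken keyword expands into the full set of retrieval candidate words. The dictionary entries below are invented examples.

```python
# Toy word dictionary with a family attribute (cf. patent 8374845):
# formal names and aliases that belong together share a family tag.
WORD_DICT = {
    "television": "tv_family",
    "TV": "tv_family",
    "telly": "tv_family",
    "radio": "radio_family",
}

def retrieval_candidates(keyword):
    """Expand a recognized keyword into all words of its family."""
    family = WORD_DICT.get(keyword)
    if family is None:
        return [keyword]  # unknown word: search for it as-is
    return sorted(w for w, f in WORD_DICT.items() if f == family)

print(retrieval_candidates("TV"))  # ['TV', 'television', 'telly']
```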
-
Patent number: 8374868Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.Type: GrantFiled: August 21, 2009Date of Patent: February 12, 2013Assignee: General Motors LLCInventors: Uma Arun, Sherri J Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
-
Patent number: 8374869Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.Type: GrantFiled: August 4, 2009Date of Patent: February 12, 2013Assignee: Electronics and Telecommunications Research InstituteInventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
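The confidence measure in patent 8374869 is built from log likelihoods of phoneme and anti-phoneme models. A minimal sketch, assuming the score is the averaged log-likelihood ratio and using invented numbers and an invented threshold:

```python
# Sketch of utterance verification via log-likelihood ratios
# (cf. patent 8374869). Log likelihoods and threshold are invented.
def confidence(phone_loglikes, anti_loglikes):
    """Average per-phoneme log-likelihood ratio of phoneme vs.
    anti-phoneme models."""
    ratios = [p - a for p, a in zip(phone_loglikes, anti_loglikes)]
    return sum(ratios) / len(ratios)

def verify(phone_loglikes, anti_loglikes, threshold=0.5):
    """Accept the recognized word only if its score clears the threshold."""
    return confidence(phone_loglikes, anti_loglikes) >= threshold

print(verify([-2.0, -1.5, -2.2], [-3.1, -2.4, -2.6]))  # True: score 0.8
print(verify([-2.0, -2.1], [-2.1, -2.0]))              # False: score 0.0
```

The patent additionally compares an inter-phoneme distance against a mean distance; that second check is omitted here for brevity.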
-
Publication number: 20130035939Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.Type: ApplicationFiled: October 11, 2012Publication date: February 7, 2013Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
-
Patent number: 8369492Abstract: A method, apparatus, computer program product and service for directory dialer name recognition. The directory dialer has a directory of names and a first name grammar and a second name grammar representing phonetic baseforms of first names and second names respectively. The method includes: receiving voice data for a spoken name after requesting a user to speak the required name; extracting a set of phonetic baseforms for the voice data; and finding the best matches between the extracted set of phonetic baseforms for the voice data and any combination of the first name grammar and the second name grammar. The method can further include: checking the best match against the directory of names; if the best match does not exist in the directory, informing the user and prompting the next best match as an alternative; and if the best match does exist in the directory, forwarding the call to that best match.Type: GrantFiled: July 7, 2008Date of Patent: February 5, 2013Assignee: Nuance Communications, Inc.Inventors: Eric William Janke, Keith Sloan
-
Patent number: 8370144Abstract: A method for identifying end of voiced speech within an audio stream of a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream, producing an output corresponding to the window. The output is used to classify the window in one of several classes, for example, (1) speech, (2) silence, or (3) noise. A state machine processes the window classifications, incrementing counters as each window is classified: speech counter for speech windows, silence counter for silence, and noise counter for noise. If the speech counter indicates a predefined number of windows, the state machine clears all counters. Otherwise, the state machine appropriately weights the values in the silence and noise counters, adds the weighted values, and compares the sum to a limit imposed on the number of non-voice windows. When the non-voice limit is reached, the state machine terminates processing of the audio stream.Type: GrantFiled: June 3, 2010Date of Patent: February 5, 2013Assignee: Applied Voice & Speech Technologies, Inc.Inventor: Karl D. Gierach
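The state machine in patent 8370144 can be sketched directly from the abstract: each window is classified as speech, silence, or noise; enough speech windows clear all counters; otherwise a weighted sum of the silence and noise counters is compared against a non-voice limit. The weights, reset count, and limit below are invented parameters.

```python
# Sketch of the end-of-voiced-speech state machine (patent 8370144).
# All parameter values are illustrative assumptions.
SPEECH_RESET = 3        # speech windows needed to clear the counters
NON_VOICE_LIMIT = 5.0   # weighted non-voice budget
W_SILENCE, W_NOISE = 1.0, 0.5

def end_of_speech(window_labels):
    """Return True if the weighted non-voice count ever reaches the limit."""
    speech = silence = noise = 0
    for label in window_labels:
        if label == "speech":
            speech += 1
            if speech >= SPEECH_RESET:
                speech = silence = noise = 0  # clear all counters
        elif label == "silence":
            silence += 1
        else:
            noise += 1
        if W_SILENCE * silence + W_NOISE * noise >= NON_VOICE_LIMIT:
            return True  # terminate processing of the audio stream
    return False

print(end_of_speech(["speech"] * 4 + ["silence"] * 5))  # True
print(end_of_speech(["speech", "silence"] * 6))         # False
```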