Subportions Patents (Class 704/254)
-
Patent number: 8688435
Abstract: A method and system for processing input media for provision to a text to speech engine, comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify a content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; and a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the phrases and strings.
Type: Grant
Filed: September 22, 2010
Date of Patent: April 1, 2014
Assignee: Voice on the Go Inc.
Inventors: Babak Nasri, Selva Thayaparam
-
Publication number: 20140088968
Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information; each frame is converted into an amplitude spectrum using a Fourier analyzer, and then Laguerre functions are used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
Type: Application
Filed: December 3, 2012
Publication date: March 27, 2014
Inventor: Chengjun Julian Chen
-
Patent number: 8682668
Abstract: A speech recognition apparatus that performs frame synchronous beam search by using a language model score look-ahead value prevents the pruning of a correct answer hypothesis while suppressing an increase in the number of hypotheses. A language model score look-ahead value imparting device 108 is provided with a word dictionary 203 that defines a phoneme string of a word, a language model 202 that imparts a score of appearance easiness of a word, and a smoothing language model score look-ahead value calculation means 201. The smoothing language model score look-ahead value calculation means 201 obtains a language model score look-ahead value at each phoneme in the word from the phoneme string of the word defined by the word dictionary 203 and the language model score defined by the language model 202 so that the language model score look-ahead values are prevented from concentrating on the beginning of the word.
Type: Grant
Filed: March 27, 2009
Date of Patent: March 25, 2014
Assignee: NEC Corporation
Inventors: Koji Okabe, Ryosuke Isotani, Kiyoshi Yamabana, Ken Hanazawa
-
Patent number: 8676568
Abstract: A storage unit stores first filter information specifying the formats of messages and second filter information specifying weights for words or phrases. A first search unit selects messages matching the formats specified by the first filter information from a plurality of messages as messages to be extracted. A second search unit calculates the importance level of each message unselected by the first search unit, based on the words or phrases included in the message and the second filter information, and selects messages to be extracted, according to the calculated importance levels, from the messages unselected by the first search unit.
Type: Grant
Filed: April 23, 2013
Date of Patent: March 18, 2014
Assignee: Fujitsu Limited
Inventors: Akira Minegishi, Michio Sato, Takao Shibazaki, Naomi Ozawa
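The two-stage selection described in this abstract can be sketched as follows. The regex-based format matching, the additive word-weight scoring, and the threshold value are all illustrative assumptions of this sketch; the abstract specifies none of them.

```python
import re

def select_messages(messages, format_patterns, word_weights, importance_threshold):
    """Two-stage selection: format match first, then weighted-word importance.

    format_patterns  -- regexes standing in for the 'first filter information'
    word_weights     -- {word: weight} standing in for the 'second filter information'
    (hypothetical names and scoring, not the patented method itself)
    """
    # First search: messages whose format matches any specified pattern.
    first_pass = [m for m in messages
                  if any(re.search(p, m) for p in format_patterns)]

    # Second search: score only the messages the first pass did not select,
    # and keep those whose summed word weights reach the threshold.
    remaining = [m for m in messages if m not in first_pass]
    second_pass = []
    for m in remaining:
        score = sum(w for word, w in word_weights.items() if word in m)
        if score >= importance_threshold:
            second_pass.append(m)
    return first_pass + second_pass

msgs = ["ALERT: disk full on host-3",
        "nightly report finished",
        "urgent failure in backup job",
        "lunch menu updated"]
picked = select_messages(msgs,
                         format_patterns=[r"^ALERT:"],
                         word_weights={"urgent": 5, "failure": 4},
                         importance_threshold=5)
```

Here the alert matches on format alone, while the failure message is recovered by its word weights despite not matching any format pattern.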
-
Patent number: 8676577
Abstract: A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to correct a possible misrecognition of a spoken proper noun such as a name or company with its proper textual form, or a spoken phone number with a correctly formatted phone number with Arabic numerals, to improve the overall accuracy of the output of the voice recognition system.
Type: Grant
Filed: March 31, 2009
Date of Patent: March 18, 2014
Assignee: Canyon IP Holdings, LLC
Inventors: Igor Roditis Jablokov, Clifford J. Strohofer, III, Marc White, Victor Roditis Jablokov
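A minimal sketch of this kind of metadata-assisted correction: compare each n-best candidate against known contact names and substitute the closest match when it is close enough. The `difflib` similarity ratio, the cutoff value, and the contact data are assumptions of the sketch; the patent does not name a comparison measure.

```python
import difflib

def correct_with_metadata(n_best, contacts, cutoff=0.75):
    """Pick the contact name closest to any n-best candidate, if close enough.

    n_best   -- recognizer candidates for one output word, best first
    contacts -- names from an address book / Caller ID (hypothetical data)
    """
    best_match, best_score = None, 0.0
    for candidate in n_best:
        for name in contacts:
            score = difflib.SequenceMatcher(
                None, candidate.lower(), name.lower()).ratio()
            if score > best_score:
                best_match, best_score = name, score
    # Fall back to the top recognition result when nothing is close enough.
    return best_match if best_score >= cutoff else n_best[0]

contacts = ["Kristen", "Marc", "Victor"]
word = correct_with_metadata(["Christian", "Kristen", "crisp in"], contacts)
```

Because one of the n-best alternatives matches a contact exactly, the misrecognized top result "Christian" is replaced by "Kristen".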
-
Publication number: 20140074476
Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
Type: Application
Filed: November 15, 2013
Publication date: March 13, 2014
Applicant: AT&T Intellectual Property II, L.P.
Inventor: Giuseppe Riccardi
-
Patent number: 8670983
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
Type: Grant
Filed: August 30, 2011
Date of Patent: March 11, 2014
Assignee: Nexidia Inc.
Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
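The weighted-frequency comparison could be sketched like this, using phoneme n-gram counts with logarithmic damping as the weighting and cosine similarity as the comparison. Both choices are illustrative assumptions; the abstract says only that the frequencies are weighted and compared.

```python
import math
from collections import Counter

def phoneme_similarity(phones_a, phones_b, n=3):
    """Cosine similarity over weighted phoneme n-gram frequencies.

    phones_a, phones_b -- phoneme symbol lists for the two audio sources
    (the log weighting and cosine measure are this sketch's assumptions)
    """
    def weighted_freq(phones):
        grams = Counter(tuple(phones[i:i + n])
                        for i in range(len(phones) - n + 1))
        # Dampen raw counts so frequent n-grams do not dominate the score.
        return {g: 1.0 + math.log(c) for g, c in grams.items()}

    wa, wb = weighted_freq(phones_a), weighted_freq(phones_b)
    dot = sum(wa[g] * wb[g] for g in wa.keys() & wb.keys())
    norm = math.sqrt(sum(v * v for v in wa.values())) * \
           math.sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0

same = phoneme_similarity(["HH", "EH", "L", "OW"], ["HH", "EH", "L", "OW"])
diff = phoneme_similarity(["HH", "EH", "L", "OW"], ["G", "UH", "D", "B", "AY"])
```

Identical phoneme sequences score 1.0, while sequences sharing no trigrams score 0.0.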
-
Patent number: 8666729
Abstract: Creating and processing a natural language grammar set of data based on an input text string are disclosed. The method may include tagging the input text string, and examining, via a processor, the input text string for at least one first set of substitutions based on content of the input text string. The method may also include determining whether the input text string is a substring of a previously tagged input text string by comparing the input text string to a previously tagged input text string, such that the substring determination operation determines whether the input text string is wholly included in the previously tagged input text string.
Type: Grant
Filed: February 10, 2010
Date of Patent: March 4, 2014
Assignee: West Corporation
Inventor: Steven John Schanbacher
-
Patent number: 8666745
Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, and wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.
Type: Grant
Filed: March 6, 2013
Date of Patent: March 4, 2014
Assignee: Nuance Communications, Inc.
Inventor: Zsolt Saffer
-
Publication number: 20140058732
Abstract: Techniques disclosed herein include systems and methods for managing user interface responses to user input including spoken queries and commands. This includes providing incremental user interface (UI) response based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modifying the initial UI response after receiving secondary recognition results. Since an initial response begins immediately instead of waiting for results from all recognizers, the perceived delay before complete results are rendered to the user is reduced.
Type: Application
Filed: August 21, 2012
Publication date: February 27, 2014
Applicant: Nuance Communications, Inc.
Inventors: Martin Labsky, Tomas Macek, Ladislav Kunc, Jan Kleindienst
-
Patent number: 8650032
Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
Type: Grant
Filed: November 2, 2011
Date of Patent: February 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
-
Patent number: 8645139
Abstract: An apparatus and method for extending a pronunciation dictionary for speech recognition are provided. The apparatus and the method may segment speech information of an input utterance into at least one phoneme, collect segmentation information of the at least one segmented phoneme, analyze a pronunciation variation of the at least one segmented phoneme based on the collected segmentation information, and select a substitutable phoneme group for the at least one phoneme where the pronunciation variation occurs, and extend the pronunciation dictionary.
Type: Grant
Filed: February 23, 2010
Date of Patent: February 4, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventor: Gil Ho Lee
-
Publication number: 20140019131
Abstract: A method of recognizing a speech and an electronic device thereof are provided. The method includes: segmenting a speech signal into a plurality of sections at preset time intervals; performing a phoneme recognition with respect to one of the plurality of sections of the speech signal by using a first acoustic model; extracting a candidate word of the one of the plurality of sections of the speech signal by using the phoneme recognition result; and performing a speech recognition with respect to the one of the plurality of sections of the speech signal by using the candidate word.
Type: Application
Filed: July 12, 2013
Publication date: January 16, 2014
Inventors: Jae-won LEE, Dong-suk YOOK, Hyeon-taek LIM, Tae-yoon KIM
-
Publication number: 20140012578
Abstract: A speech recognition system that recognizes speech data is provided. The speech recognition system includes a speech recognition part that performs speech recognition of the speech data, and calculates a likelihood of the speech data with respect to a registered word that is pre-registered, a reliability judgment part that performs reliability judgment on the speech recognition based on the likelihood, and a judgment reference change processing part that changes a judgment reference for the reliability judgment, according to an utterance speed of the speech data.
Type: Application
Filed: June 24, 2013
Publication date: January 9, 2014
Applicant: SEIKO EPSON CORPORATION
Inventor: Kiyotaka MORIOKA
-
Patent number: 8626508
Abstract: Provided are a speech search device and a speech search method that perform fuzzy search with very fast search speed and excellent search performance. In addition to fuzzy search, the distance between phoneme discrimination features included in speech data is calculated to determine similarity with respect to the speech, using both a suffix array and dynamic programming. The object to be searched for is narrowed by dividing the search keyword based on phonemes and applying search thresholds to each of the divided search keywords; the object is repeatedly searched for while increasing the search thresholds in order, and whether or not to divide the keyword is determined according to the length of the search keywords. This implements speech search whose search speed is very fast and whose search performance is also excellent.
Type: Grant
Filed: February 10, 2010
Date of Patent: January 7, 2014
Assignee: National University Corporation TOYOHASHI UNIVERSITY OF TECHNOLOGY
Inventors: Koichi Katsurada, Tsuneo Nitta, Shigeki Teshima
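The threshold-escalation idea (without the suffix-array machinery) might look like the sketch below. A plain list scan with Levenshtein distance stands in for the patented suffix-array/dynamic-programming search; the corpus and threshold values are made up for illustration.

```python
def edit_distance(a, b):
    """Standard dynamic-programming Levenshtein distance between strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_search(keyword, corpus, max_threshold=3):
    """Search with gradually relaxed thresholds, as in the abstract.

    Starts strict (exact match, threshold 0) and loosens the threshold one
    step at a time, stopping as soon as any hit is found. The corpus here is
    a toy list of phoneme-like strings, not a real suffix-array index.
    """
    for threshold in range(max_threshold + 1):
        hits = [entry for entry in corpus
                if edit_distance(keyword, entry) <= threshold]
        if hits:
            return threshold, hits
    return None, []

threshold, hits = fuzzy_search("torashi", ["toyohashi", "nagoya", "tokyo"])
```

Starting strict keeps the candidate set small on early passes, which is the point of escalating the threshold rather than searching loosely from the start.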
-
Patent number: 8626506
Abstract: A method for dynamic nametag scoring includes receiving at least one confusion table including at least one circumstantial condition, wherein the confusion table is based on a plurality of phonetically balanced utterances, determining a plurality of templates for the nametag based on the received confusion tables, and determining a global nametag score for the nametag based on the determined templates. A computer usable medium with suitable computer program code is employed for dynamic nametag scoring.
Type: Grant
Filed: January 20, 2006
Date of Patent: January 7, 2014
Assignee: General Motors LLC
Inventors: Rathinavelu Chengalvarayan, John J. Correia
-
Publication number: 20140006029
Abstract: A non-transitory processor-readable medium storing code representing instructions to be executed by a processor includes code to cause the processor to receive acoustic data representing an utterance spoken by a language learner in a non-native language in response to prompting the language learner to recite a word in the non-native language and receive a pronunciation lexicon of the word in the non-native language. The pronunciation lexicon includes at least one alternative pronunciation of the word based on a pronunciation lexicon of a native language of the language learner. The code causes the processor to generate an acoustic model of the at least one alternative pronunciation in the non-native language and identify a mispronunciation of the word in the utterance based on a comparison of the acoustic data with the acoustic model. The code causes the processor to send feedback related to the mispronunciation of the word to the language learner.
Type: Application
Filed: July 1, 2013
Publication date: January 2, 2014
Inventors: Theban Stanley, Kadri Hacioglu, Vesa Siivola
-
Patent number: 8620656
Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
Type: Grant
Filed: March 4, 2012
Date of Patent: December 31, 2013
Assignee: Nuance Communications, Inc.
Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
-
Patent number: 8620655
Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic
Type: Grant
Filed: August 10, 2011
Date of Patent: December 31, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
-
Patent number: 8620657
Abstract: One aspect includes determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word, by obtaining a second voice signal of the speaker uttering at least one challenge utterance that includes at least one word not in the at least one enrollment utterance, obtaining at least one characteristic feature from the second voice signal, comparing the at least one characteristic feature with at least a portion of the voice print to determine a similarity between the at least one characteristic feature and the at least a portion of the voice print, and determining whether the speaker is the user based, at least in part, on the similarity.
Type: Grant
Filed: September 14, 2012
Date of Patent: December 31, 2013
Assignee: Nuance Communications, Inc.
Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
-
Publication number: 20130339020
Abstract: A display apparatus, an interactive server, and a method for providing response information are provided. The display apparatus includes: a voice collector which collects a user's uttered voice; a communication unit which communicates with an interactive server; and a controller which, if response information corresponding to the uttered voice which is transmitted to the interactive server is received from the interactive server, controls to perform an operation corresponding to the user's uttered voice based on the response information, wherein the response information is generated in a different form according to a function of the display apparatus which is classified based on an utterance element extracted from the uttered voice. Accordingly, the display apparatus can execute the function corresponding to each of the uttered voices and can output the response message corresponding to each of the uttered voices, even if a variety of uttered voices are input from the user.
Type: Application
Filed: May 6, 2013
Publication date: December 19, 2013
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Hye-hyun HEO, Hae-rim SON, Jun-hyung SHIN
-
Patent number: 8604327
Abstract: There is provided an information processing device including a storage unit that stores music data for playing music and lyrics data indicating lyrics of the music, a display control unit that displays the lyrics of the music on a screen, a playback unit that plays the music and a user interface unit that detects a user input. The lyrics data includes a plurality of blocks each having lyrics of at least one character. The display control unit displays the lyrics of the music on the screen in such a way that each block included in the lyrics data is identifiable to a user while the music is played by the playback unit. The user interface unit detects timing corresponding to a boundary of each section of the music corresponding to each displayed block in response to a first user input.
Type: Grant
Filed: March 2, 2011
Date of Patent: December 10, 2013
Assignee: Sony Corporation
Inventor: Haruto Takeda
-
Patent number: 8606581
Abstract: According to example configurations, a speech recognition system is configured to receive an utterance. Based on analyzing at least a portion of the utterance using a first speech recognition model on a first pass, the speech recognition system detects that the utterance includes a first group of one or more spoken words. The speech recognition system utilizes the first group of one or more spoken words identified in the utterance as detected on the first pass to locate a given segment of interest in the utterance. The given segment can include one or more words that are unrecognizable by the first speech recognition model. Based on analyzing the given segment using a second speech recognition model on a second pass, the speech recognition system detects one or more additional words in the utterance. A natural language understanding module utilizes the detected words to generate a command intended by the utterance.
Type: Grant
Filed: December 14, 2010
Date of Patent: December 10, 2013
Assignee: Nuance Communications, Inc.
Inventors: Holger Quast, Marcus Gröber, Mathias Maria Juliaan De Wachter, Frédéric Elie Ratle, Arthi Murugesan
-
Patent number: 8600752
Abstract: A search apparatus includes a sound recognition unit which recognizes input sound, a user information estimation unit which estimates at least one of a physical condition and emotional demeanor of a speaker of the input sound based on the input sound and outputs user information representing the estimation result, a matching unit which performs matching between a search result target pronunciation symbol string and a recognition result pronunciation symbol string for each of plural search result target word strings, and a generation unit which generates a search result word string as a search result for a word string corresponding to the input sound from the plural search result target word strings based on the matching result. At least one of the matching unit and the generation unit changes processing in accordance with the user information.
Type: Grant
Filed: May 18, 2011
Date of Patent: December 3, 2013
Assignee: Sony Corporation
Inventors: Keiichi Yamada, Hitoshi Honda
-
Patent number: 8600747
Abstract: A spoken dialog system and method having a dialog management module are disclosed. The dialog management module includes a plurality of dialog motivators for handling various operations during a spoken dialog. The dialog motivators comprise error handling, disambiguation, assumption, confirmation, missing information, and continuation. The spoken dialog system uses the assumption dialog motivator in either a-priori or a-posteriori modes. A-priori assumption is based on predefined requirements for the call flow, and a-posteriori assumption can work with the confirmation dialog motivator to assume the content of received user input and confirm received user input.
Type: Grant
Filed: June 17, 2008
Date of Patent: December 3, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alicia Abella, Allen Louis Gorin
-
Patent number: 8595004
Abstract: A problem to be solved is to robustly detect a pronunciation variation example and acquire a pronunciation variation rule having a high generalization property, with less effort. The problem can be solved by a pronunciation variation rule extraction apparatus including a speech data storage unit, a base form pronunciation storage unit, a sub word language model generation unit, a speech recognition unit, and a difference extraction unit. The speech data storage unit stores speech data. The base form pronunciation storage unit stores base form pronunciation data representing base form pronunciation of the speech data. The sub word language model generation unit generates a sub word language model from the base form pronunciation data. The speech recognition unit recognizes the speech data by using the sub word language model.
Type: Grant
Filed: November 27, 2008
Date of Patent: November 26, 2013
Assignee: NEC Corporation
Inventor: Takafumi Koshinaka
-
Patent number: 8595009
Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
Type: Grant
Filed: July 26, 2012
Date of Patent: November 26, 2013
Assignee: Dolby Laboratories Licensing Corporation
Inventors: Lie Lu, Claus Bauer
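The four section conditions can be checked with a toy enumeration over per-clip class labels. The clip length and all threshold values below are made-up parameters for illustration; the patent leaves them as predetermined constants.

```python
def candidate_sections(labels, clip_sec=1.0, min_song=3.0,
                       max_song=8.0, min_prop=0.6):
    """Enumerate sections satisfying the four conditions in the abstract.

    labels -- per-clip classes, 'm' (music) or 'n' (non-music); each clip is
    clip_sec seconds long. Returns (start, end) clip-index pairs, inclusive.
    """
    out = []
    n = len(labels)
    for i in range(n):
        if labels[i] != 'm':                 # 3) must start with a music clip
            continue
        for j in range(i, n):
            if labels[j] != 'm':             # 3) must end with a music clip
                continue
            seg = labels[i:j + 1]
            if len(seg) * clip_sec >= max_song:   # 2) shorter than max duration
                continue
            # 1) longest contiguous music run must exceed min song duration
            longest = max(len(run) for run in ''.join(seg).split('n'))
            if longest * clip_sec < min_song:
                continue
            if seg.count('m') / len(seg) < min_prop:  # 4) music proportion
                continue
            out.append((i, j))
    return out

sections = candidate_sections(list("mmmmnmm"))
```

Every surviving (start, end) pair is one possible song section; the patent then derives combinations of non-overlapping sections from such candidates.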
-
Publication number: 20130311182
Abstract: An apparatus for correcting errors in speech recognition is provided. The apparatus includes a feature vector extracting unit extracting feature vectors from a received speech. A speech recognizing unit recognizes the received speech as a word sequence on the basis of the extracted feature vectors. A phoneme weighted finite state transducer (WFST)-based converting unit converts the recognized word sequence recognized by the speech recognizing unit into a phoneme WFST. A speech recognition error correcting unit corrects errors in the converted phoneme WFST. The speech recognition error correcting unit includes a WFST synthesizing unit modeling a phoneme WFST transferred from the phoneme WFST-based converting unit as pronunciation variation on the basis of a Kullback-Leibler (KL) distance matrix.
Type: Application
Filed: May 16, 2013
Publication date: November 21, 2013
Inventors: Hong-Kook KIM, Woo-Kyeong SEONG, Ji-Hun PARK
-
Patent number: 8589162
Abstract: The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising the n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists, and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
Type: Grant
Filed: September 19, 2008
Date of Patent: November 19, 2013
Assignee: Nuance Communications, Inc.
Inventors: Remi Lejeune, Hubert Crepy
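One way to combine segment n-best lists into a global n-best list is sketched below. Taking the product of segment confidences as the joint score and using a fixed confidence floor as the pruning criterion are assumptions of this sketch, not details from the patent.

```python
import heapq
import itertools

def global_n_best(segment_lists, n=3, min_conf=0.05):
    """Combine per-segment n-best lists into a global n-best list.

    segment_lists -- one list per segment of (word, confidence) pairs.
    Joint confidence = product of segment confidences (illustrative choice);
    hypotheses below min_conf are pruned before the top n are kept.
    """
    combos = []
    for parts in itertools.product(*segment_lists):
        text = " ".join(word for word, _ in parts)
        conf = 1.0
        for _, c in parts:
            conf *= c
        combos.append((conf, text))
    # Prune by the criterion, then keep the n highest-confidence hypotheses.
    pruned = [(c, t) for c, t in combos if c >= min_conf]
    return heapq.nlargest(n, pruned)

seg1 = [("four", 0.7), ("for", 0.3)]
seg2 = [("seven", 0.8), ("heaven", 0.2)]
best = global_n_best([seg1, seg2], n=2)
```

With two segments of two candidates each, four global hypotheses are scored and the two most confident survive.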
-
Publication number: 20130304472
Abstract: Techniques are described for automatically measuring fluency of a patient's speech based on prosodic characteristics thereof. The prosodic characteristics may include statistics regarding silent pauses, filled pauses, repetitions, or fundamental frequency of the patient's speech. The statistics may include a count, average number of occurrences, duration, average duration, frequency of occurrence, standard deviation, or other statistics. In one embodiment, a method includes receiving an audio sample that includes speech of a patient, analyzing the audio sample to identify prosodic characteristics of the speech of the patient, and automatically measuring fluency of the speech of the patient based on the prosodic characteristics. These techniques may present several advantages, such as objectively measuring fluency of a patient's speech without requiring a manual transcription or other manual intervention in the analysis process.
Type: Application
Filed: July 17, 2013
Publication date: November 14, 2013
Inventor: Serguei V.S. Pakhomov
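The silent-pause statistics mentioned in the abstract could be computed from word time stamps as sketched below. The minimum-pause cutoff and the particular statistics returned are illustrative assumptions, not clinical values from the publication.

```python
import statistics

def pause_statistics(word_intervals, min_pause=0.25):
    """Silent-pause statistics from word (start, end) times in seconds.

    A gap between consecutive words of at least min_pause seconds counts
    as a silent pause (the cutoff is this sketch's assumption).
    """
    gaps = [word_intervals[i + 1][0] - word_intervals[i][1]
            for i in range(len(word_intervals) - 1)]
    pauses = [g for g in gaps if g >= min_pause]
    total_time = word_intervals[-1][1] - word_intervals[0][0]
    return {
        "pause_count": len(pauses),
        "mean_pause_dur": statistics.mean(pauses) if pauses else 0.0,
        "pauses_per_sec": len(pauses) / total_time,
    }

# (start, end) times of four recognized words, with two long gaps
stats = pause_statistics([(0.0, 0.4), (0.9, 1.3), (1.35, 1.7), (2.5, 2.9)])
```

Counts, durations, and rates like these are exactly the kind of objective measures that replace a manual transcription in the fluency analysis.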
-
Patent number: 8583433
Abstract: A system and method for efficiently transcribing verbal messages to text is provided. Verbal messages are received and at least one of the verbal messages is divided into segments. Automatically recognized text is determined for each of the segments by performing speech recognition and a confidence rating is assigned to the automatically recognized text for each segment. A threshold is applied to the confidence ratings and those segments with confidence ratings that fall below the threshold are identified. The segments that fall below the threshold are assigned to one or more human agents, starting with those segments that have the lowest confidence ratings. Transcription from the human agents is received for the segments assigned to each agent. The transcription is assembled with the automatically recognized text of the segments not assigned to the human agents as a text message for the at least one verbal message.
Type: Grant
Filed: August 6, 2012
Date of Patent: November 12, 2013
Assignee: Intellisist, Inc.
Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
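The confidence-threshold triage can be sketched as follows. The round-robin assignment across agents and the `[agent]` placeholder in the assembled text are assumptions of the sketch; the patent only says low-confidence segments go to agents, lowest first.

```python
def triage_segments(segments, threshold=0.8, num_agents=2):
    """Route low-confidence segments to human agents, lowest confidence first.

    segments -- list of (auto_text, confidence) per audio segment, in order.
    Returns the per-agent work queues (segment indices) and the assembled
    message, with pending human segments marked by a placeholder.
    """
    below = [i for i, (_, conf) in enumerate(segments) if conf < threshold]
    # Assign starting with the lowest-confidence segments, round-robin.
    below.sort(key=lambda i: segments[i][1])
    queues = {agent: [] for agent in range(num_agents)}
    for pos, seg_index in enumerate(below):
        queues[pos % num_agents].append(seg_index)
    assembled = " ".join(text if conf >= threshold else "[agent]"
                         for text, conf in segments)
    return queues, assembled

segs = [("please call", 0.95), ("bob nurse", 0.40),
        ("about the", 0.90), ("meat tomorrow", 0.60)]
queues, assembled = triage_segments(segs)
```

Once agent transcriptions arrive, the placeholders would be replaced to produce the final assembled text message.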
-
Patent number: 8583432
Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. One system for automatic speech recognition includes a dialect recognition unit and a controller. The dialect recognition unit is configured to analyze acoustic input data to identify portions of the acoustic input data that conform to a general language and to identify portions of the acoustic input data that conform to at least one dialect of the general language. In addition, the controller is configured to apply a general language model and at least one dialect language model to the input data to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions.
Type: Grant
Filed: July 25, 2012
Date of Patent: November 12, 2013
Assignee: International Business Machines Corporation
Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
-
Patent number: 8583436
Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
Type: Grant
Filed: December 19, 2008
Date of Patent: November 12, 2013
Assignee: NEC Corporation
Inventors: Hitoshi Yamamoto, Kiyokazu Miki
-
Patent number: 8577681
Abstract: A method of generating an alternative pronunciation for a word or phrase, given an initial pronunciation and a spoken example of the word or phrase, includes providing the initial pronunciation of the word or phrase, and generating the alternative pronunciation by searching a neighborhood of pronunciations about the initial pronunciation via a constrained hypothesis, wherein the neighborhood includes pronunciations that differ from the initial pronunciation by at most one phoneme. The method further includes selecting a highest scoring pronunciation within the neighborhood of pronunciations.
Type: Grant
Filed: September 13, 2004
Date of Patent: November 5, 2013
Assignee: Nuance Communications, Inc.
Inventors: Daniel L. Roth, Laurence S. Gillick, Mike Shire
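The neighborhood of pronunciations differing by at most one phoneme can be enumerated directly as all single-edit variants (substitution, insertion, or deletion of one phoneme). The phoneme inventory and list-of-phonemes representation here are illustrative assumptions:

```python
def pronunciation_neighborhood(pron, phoneme_set):
    """All pronunciations (as tuples) differing from `pron` (a list of
    phonemes) by at most one substitution, insertion, or deletion."""
    neighbors = {tuple(pron)}
    n = len(pron)
    for i in range(n):
        neighbors.add(tuple(pron[:i] + pron[i + 1:]))              # deletion
        for p in phoneme_set:                                      # substitution
            neighbors.add(tuple(pron[:i] + [p] + pron[i + 1:]))
    for i in range(n + 1):                                         # insertion
        for p in phoneme_set:
            neighbors.add(tuple(pron[:i] + [p] + pron[i:]))
    return neighbors
```

Each candidate would then be scored against the spoken example, and the highest-scoring one selected.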
-
Publication number: 20130289995
Abstract: The present invention discloses a method and device for voice control, which address the low success rate of voice control in the prior art. The method includes: classifying stored recognition information used for voice recognition to obtain a syntax packet corresponding to each type of recognition information (10); receiving an input voice signal and performing voice recognition on the received voice signal with each obtained syntax packet in turn (20); and performing a corresponding control operation based on the voice recognition result of the voice signal according to each syntax packet (30).
Type: Application
Filed: January 12, 2011
Publication date: October 31, 2013
Applicant: ZTE CORPORATION
Inventors: Manhai Li, Kaili Xiao, Jingping Wang, Xin Liao
-
Publication number: 20130289994
Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply.
Type: Application
Filed: April 26, 2012
Publication date: October 31, 2013
Inventors: Michael Jack Newman, Robert Roth, William D. Alexander, Paul van Mulbregt
-
Patent number: 8566097
Abstract: A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an input speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among the plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching words and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how well words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing a probability at which the words in the word sequence are adjacent to one another, and selects the word sequence for which the sum of the first evaluation value and the second evaluation value is highest.
Type: Grant
Filed: June 1, 2010
Date of Patent: October 22, 2013
Assignees: Honda Motor Co., Ltd., Advanced Telecommunications Research Institute International
Inventors: Mikio Nakano, Takashi Nose, Ryo Taguchi, Kotaro Funakoshi, Naoto Iwahashi
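The two evaluation values described above can be sketched as follows. Treating the first value as the fraction of words that are known teaching words, and the second as an average bigram (adjacency) probability, are assumptions made for illustration; the patent does not specify these exact formulas.

```python
def score_sequence(words, teaching_words, bigram_prob):
    """First evaluation value: fraction of words found in the teaching
    word list. Second: average adjacency (bigram) probability over
    consecutive word pairs. The winning sequence maximizes their sum."""
    first = sum(1 for w in words if w in teaching_words) / len(words)
    pairs = list(zip(words, words[1:]))
    second = sum(bigram_prob.get(p, 0.0) for p in pairs) / max(len(pairs), 1)
    return first + second
```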
-
Patent number: 8560319
Abstract: The present invention provides a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment, a method of classifying an audio stream is provided. This method includes receiving an audio stream, sampling the audio stream at a predetermined rate, and combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
Type: Grant
Filed: January 15, 2008
Date of Patent: October 15, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Qian Huang, Zhu Liu
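The sample-to-clip pipeline can be sketched end to end. The toy features (mean and peak amplitude) and the linear decision rule are illustrative assumptions; the patent's actual feature set and classifier parameters are not specified in the abstract.

```python
def make_clips(samples, clip_size):
    """Group consecutive samples into fixed-size clips, dropping a ragged tail."""
    return [samples[i:i + clip_size]
            for i in range(0, len(samples) - clip_size + 1, clip_size)]

def clip_features(clip):
    """Toy feature vector for a clip: (mean amplitude, peak magnitude)."""
    mean = sum(clip) / len(clip)
    peak = max(abs(x) for x in clip)
    return (mean, peak)

def classify(features, weights, bias):
    """Linear approximation: a weighted sum of features thresholded at zero."""
    s = sum(w * f for w, f in zip(weights, features)) + bias
    return "speech" if s > 0 else "non-speech"
```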
-
Patent number: 8560318
Abstract: A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
Type: Grant
Filed: May 14, 2010
Date of Patent: October 15, 2013
Assignee: Sony Computer Entertainment Inc.
Inventor: Gustavo A. Hernandez-Abrego
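The split into alignment regions and potential confusion zones can be sketched column by column, assuming (for simplicity) that the statements have already been aligned to equal length; the patent's actual alignment procedure is more general.

```python
def find_zones(statements):
    """statements: lists of words, assumed equal length after alignment.
    Returns (alignment_positions, confusion_zones): positions where all
    statements agree, and (position, competing word set) pairs elsewhere."""
    aligned, zones = [], []
    for i, column in enumerate(zip(*statements)):
        if len(set(column)) == 1:
            aligned.append(i)        # all statements share this word set
        else:
            zones.append((i, set(column)))  # potential confusion zone
    return aligned, zones
```

Each zone's word set would then be compared phonetically to estimate its confusion probability.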
-
Patent number: 8560324
Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
Type: Grant
Filed: January 31, 2012
Date of Patent: October 15, 2013
Assignee: LG Electronics Inc.
Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
-
Patent number: 8554566
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: November 29, 2012
Date of Patent: October 8, 2013
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Publication number: 20130262116
Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
Type: Application
Filed: March 27, 2012
Publication date: October 3, 2013
Applicant: NOVOSPEECH
Inventor: Yossef Ben-Ezra
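The final search step can be sketched by brute force: find the longest run of elements that appears in at least a predetermined number of the extracted sequences. Restricting the search to contiguous runs (substrings) and enumerating candidates from the first sequence are simplifying assumptions for illustration.

```python
def _contains(seq, sub):
    """True if `sub` occurs as a contiguous run inside `seq`."""
    m = len(sub)
    return any(seq[i:i + m] == sub for i in range(len(seq) - m + 1))

def common_subsequence(sequences, min_count):
    """Longest contiguous element run appearing in at least `min_count`
    of the given sequences, searching longest-first over runs taken
    from the first sequence. Returns [] if none qualifies."""
    first = sequences[0]
    n = len(first)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            cand = first[start:start + length]
            hits = sum(1 for seq in sequences if _contains(seq, cand))
            if hits >= min_count:
                return cand
    return []
```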
-
Patent number: 8548807
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method includes recognizing, via a processor, additional speech from the target speaker using the custom speech model.
Type: Grant
Filed: June 9, 2009
Date of Patent: October 1, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
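The restructured model (each dictionary phoneme scored as a weighted sum of the acoustic models of all plausible phonemes) can be sketched with one-dimensional Gaussians standing in for the per-phoneme acoustic models; real systems use multivariate mixture models, so this is purely illustrative.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian at x (toy stand-in for an acoustic model)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def restructured_likelihood(x, weights, phoneme_models):
    """Likelihood of observation x under one dictionary phoneme, modeled
    as a weighted sum over the acoustic models (mean, var pairs) of all
    plausible phonemes, as the abstract describes."""
    return sum(w * gaussian(x, m, v)
               for w, (m, v) in zip(weights, phoneme_models))
```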
-
Patent number: 8543404
Abstract: Embodiments of the present invention provide a method and computer program product for the proactive completion of input fields for automated voice enablement of a Web page. In an embodiment of the invention, a method for proactively completing empty input fields for voice enabling a Web page can be provided. The method can include receiving speech input for an input field in a Web page and inserting a textual equivalent to the speech input into the input field in a Web page. The method further can include locating an empty input field remaining in the Web page and generating a speech grammar for the input field based upon permitted terms in a core attribute of the empty input field and prompting for speech input for the input field. Finally, the method can include posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into the empty input field.
Type: Grant
Filed: April 7, 2008
Date of Patent: September 24, 2013
Assignee: Nuance Communications, Inc.
Inventors: Victor S. Moore, Wendi L. Nusbickel
-
Patent number: 8543400
Abstract: Voice processing methods and systems are provided. An utterance is received. The utterance is compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. Respective voice units are scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated according to training utterances corresponding to at least one training sentence in a phonetic-balanced sentence set of a plurality of learners and at least one real teacher, and scores corresponding to the respective voice units of the training utterances of the learners in the first scoring item provided by the real teacher.
Type: Grant
Filed: June 6, 2008
Date of Patent: September 24, 2013
Assignee: National Taiwan University
Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
-
Patent number: 8543402
Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
Type: Grant
Filed: April 29, 2011
Date of Patent: September 24, 2013
Assignee: The Intellisis Corporation
Inventor: Jiyong Ma
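The adapt-and-expand loop can be sketched with a deliberately simple stand-in for the statistical model: a running mean of feature vectors that is updated with every segment that matches well. Real systems would use Gaussian mixture or similar models; the distance threshold and incremental-mean update are illustrative assumptions.

```python
class SpeakerModel:
    """Toy statistical model: a running mean of feature vectors,
    seeded from one reliable segment and adapted with each match."""

    def __init__(self, seed_features):
        self.mean = list(seed_features)
        self.count = 1

    def distance(self, features):
        """Euclidean distance from the model mean."""
        return sum((a - b) ** 2 for a, b in zip(self.mean, features)) ** 0.5

    def matches(self, features, threshold):
        """A segment matches when it lies close enough to the model."""
        return self.distance(features) < threshold

    def adapt(self, features):
        """Incrementally fold a newly identified segment into the mean."""
        self.count += 1
        self.mean = [m + (f - m) / self.count
                     for m, f in zip(self.mean, features)]
```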
-
Patent number: 8532989
Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command of the word sequence information should be executed based on the degree of speech confidence and the degree of phrase confidence.
Type: Grant
Filed: September 2, 2010
Date of Patent: September 10, 2013
Assignee: Honda Motor Co., Ltd.
Inventors: Kotaro Funakoshi, Mikio Nakano, Xiang Zuo, Naoto Iwahashi, Ryo Taguchi
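The final gating decision can be sketched in a few lines. Combining the two confidence degrees by product against a single threshold is an assumption for illustration; the abstract does not specify the combination rule.

```python
def should_execute(speech_conf, phrase_conf, threshold):
    """Execute the command only when both confidence signals jointly
    support it (product rule, assumed here for illustration)."""
    return speech_conf * phrase_conf >= threshold
```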
-
Patent number: 8532993
Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
Type: Grant
Filed: July 2, 2012
Date of Patent: September 10, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Andrej Ljolje
-
Patent number: 8532988
Abstract: A method for searching for an input symbol string includes receiving (B) an input symbol string, proceeding (C) in a trie data structure to a calculation point indicated by the next symbol, calculating (D) distances at the calculation point, and repeatedly selecting (E) the next branch to follow (C) to the next calculation point and repeat the calculation (D). After the calculations are complete (G), the symbol string having the shortest distance to the input symbol string is selected on the basis of the performed calculations. To minimize the number of calculations, not only are the distances calculated (D) at the calculation points, but also the smallest possible length difference corresponding to each distance; on the basis of each distance and its corresponding length difference a reference value is calculated, and the branch is selected (E) in such a manner that the routine next proceeds from the calculation point producing the lowest reference value.
Type: Grant
Filed: July 3, 2003
Date of Patent: September 10, 2013
Assignee: Syslore Oy
Inventor: Jorkki Hyvonen
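The core idea (walking a trie while maintaining edit distances and pruning branches that cannot beat the best match found so far) can be sketched with a Levenshtein dynamic-programming row carried down each trie path. The pruning criterion here is the simple row-minimum bound rather than the patent's length-difference reference value, which is a simplification.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # symbol -> TrieNode
        self.word = None    # set on the node ending a stored string

def insert(root, word):
    """Store `word` in the trie rooted at `root`."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word

def nearest(root, query):
    """Return (best_word, distance): the stored string with the smallest
    edit distance to `query`, found by carrying a Levenshtein DP row
    down the trie and pruning hopeless branches."""
    best = [None, float("inf")]
    row = list(range(len(query) + 1))  # distance from the empty prefix
    for ch, child in root.children.items():
        _search(child, ch, query, row, best)
    return best[0], best[1]

def _search(node, ch, query, prev_row, best):
    cols = len(query) + 1
    row = [prev_row[0] + 1]
    for c in range(1, cols):
        cost = 0 if query[c - 1] == ch else 1
        row.append(min(row[c - 1] + 1,        # insertion
                       prev_row[c] + 1,       # deletion
                       prev_row[c - 1] + cost))  # match / substitution
    if node.word is not None and row[-1] < best[1]:
        best[0], best[1] = node.word, row[-1]
    if min(row) < best[1]:  # prune: no extension can beat the current best
        for nch, child in node.children.items():
            _search(child, nch, query, row, best)
```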
-
Publication number: 20130231934
Abstract: A system and method is provided for recognizing a speech input and selecting an entry from a list of entries. The method includes recognizing a speech input. A fragment list of fragmented entries is provided and compared to the recognized speech input to generate a candidate list of best matching entries based on the comparison result. The system includes a speech recognition module, and a database for storing the list of entries and the fragmented list. The speech recognition module may obtain the fragmented list from the database and store a candidate list of best matching entries in memory. A display may also be provided to allow a user to select from a list of best matching entries.
Type: Application
Filed: March 18, 2013
Publication date: September 5, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Markus Schwarz
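The fragment-matching step can be sketched by scoring each list entry by how many of its fragments occur in the recognized input. Whitespace fragmentation and the coverage-ratio score are assumptions for illustration; the publication's actual fragmentation scheme is not given in the abstract.

```python
def candidate_list(recognized, entries, fragments_of, top_n=3):
    """Rank `entries` by the fraction of their fragments found in the
    recognized input string; return the top_n best matching entries."""
    scored = []
    for entry in entries:
        frags = fragments_of(entry)
        score = sum(1 for f in frags if f in recognized) / len(frags)
        scored.append((score, entry))
    scored.sort(key=lambda t: -t[0])  # best matching entries first
    return [entry for score, entry in scored[:top_n]]
```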