Subportions Patents (Class 704/254)
  • Patent number: 8688435
    Abstract: A method and system for processing input media for provision to a text-to-speech engine, comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify a content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; and a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the units of interest.
    Type: Grant
    Filed: September 22, 2010
    Date of Patent: April 1, 2014
    Assignee: Voice on the Go Inc.
    Inventors: Babak Nasri, Selva Thayaparam
  • Publication number: 20140088968
    Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal therefrom. The speech signal is first segmented into non-overlapping frames using glottal closure instant information; each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming them into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using the Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Application
    Filed: December 3, 2012
    Publication date: March 27, 2014
    Inventor: Chengjun Julian Chen
  • Patent number: 8682668
    Abstract: A speech recognition apparatus that performs frame-synchronous beam search using a language model score look-ahead value prevents the pruning of a correct-answer hypothesis while suppressing an increase in the number of hypotheses. A language model score look-ahead value imparting device 108 is provided with a word dictionary 203 that defines the phoneme string of a word, a language model 202 that scores how likely a word is to appear, and a smoothing language model score look-ahead value calculation means 201. The smoothing language model score look-ahead value calculation means 201 obtains a language model score look-ahead value at each phoneme in the word from the phoneme string defined by the word dictionary 203 and the language model score defined by the language model 202, so that the language model score look-ahead values are prevented from concentrating at the beginning of the word.
    Type: Grant
    Filed: March 27, 2009
    Date of Patent: March 25, 2014
    Assignee: NEC Corporation
    Inventors: Koji Okabe, Ryosuke Isotani, Kiyoshi Yamabana, Ken Hanazawa
  • Patent number: 8676568
    Abstract: A storage unit stores first filter information specifying the formats of messages and second filter information specifying weights for words or phrases. A first search unit selects messages matching the formats specified by the first filter information from a plurality of messages as messages to be extracted. A second search unit calculates the importance level of each message unselected by the first search unit, based on the words or phrases included in the message and the second filter information, and selects messages to be extracted, according to the calculated importance levels from the messages unselected by the first search unit.
    Type: Grant
    Filed: April 23, 2013
    Date of Patent: March 18, 2014
    Assignee: Fujitsu Limited
    Inventors: Akira Minegishi, Michio Sato, Takao Shibazaki, Naomi Ozawa
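The two-stage selection this abstract describes (format filter first, then a weighted-word importance filter over the remainder) can be sketched as follows. The regex-based format matching, the summed-weight importance score, and all names are illustrative assumptions, not the patent's actual implementation.

```python
import re

def filter_messages(messages, format_patterns, word_weights, min_importance):
    """Two-stage message extraction (sketch).

    Stage 1: keep messages matching any format pattern (first filter info).
    Stage 2: score the remaining messages by summed word weights (second
    filter info) and keep those reaching min_importance.
    """
    selected, remainder = [], []
    for msg in messages:
        if any(re.search(p, msg) for p in format_patterns):
            selected.append(msg)          # first search unit: format match
        else:
            remainder.append(msg)
    for msg in remainder:
        importance = sum(w for word, w in word_weights.items() if word in msg)
        if importance >= min_importance:  # second search unit: importance
            selected.append(msg)
    return selected
```
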
  • Patent number: 8676577
    Abstract: A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to correct a possible misrecognition of a spoken proper noun, such as a name or company, with its proper textual form, or of a spoken phone number with a correctly formatted phone number in Arabic numerals, to improve the overall accuracy of the output of the voice recognition system.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: March 18, 2014
    Assignee: Canyon IP Holdings, LLC
    Inventors: Igor Roditis Jablokov, Clifford J. Strohofer, III, Marc White, Victor Roditis Jablokov
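The n-best-against-address-book comparison in this abstract can be sketched in a few lines. The string-similarity measure (difflib ratio) and the 0.8 threshold are illustrative stand-ins; the patent does not specify a scoring function.

```python
from difflib import SequenceMatcher

def correct_with_contacts(n_best, contacts, threshold=0.8):
    """Pick the output word for one recognition slot: prefer a contact
    name that closely matches any n-best candidate, otherwise fall back
    to the engine's top hypothesis. A simplified sketch of the idea."""
    best_word, best_score = n_best[0], 0.0
    for candidate in n_best:
        for name in contacts:
            score = SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
            if score >= threshold and score > best_score:
                best_word, best_score = name, score
    return best_word
```
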
  • Publication number: 20140074476
    Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain-independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
    Type: Application
    Filed: November 15, 2013
    Publication date: March 13, 2014
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Giuseppe Riccardi
  • Patent number: 8670983
    Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: March 11, 2014
    Assignee: Nexidia Inc.
    Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
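The weighted-frequency comparison this abstract describes can be sketched as follows. The abstract does not specify the weighting or comparison functions, so log(1 + count) weighting and cosine similarity are illustrative choices, and the phoneme sequences are assumed to be pre-extracted strings.

```python
import math
from collections import Counter

def weighted_frequencies(phoneme_sequences):
    """Frequency of occurrence per phoneme sequence, with a simple
    log weighting (an illustrative choice of weighting function)."""
    counts = Counter(phoneme_sequences)
    return {seq: math.log1p(c) for seq, c in counts.items()}

def similarity(source_a, source_b):
    """Similarity score between two audio sources, represented here by
    their lists of extracted phoneme sequences: compare the weighted
    frequency of each sequence via cosine similarity."""
    wa, wb = weighted_frequencies(source_a), weighted_frequencies(source_b)
    dot = sum(wa[s] * wb.get(s, 0.0) for s in wa)
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0
```
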
  • Patent number: 8666729
    Abstract: Creating and processing a natural language grammar set of data based on an input text string are disclosed. The method may include tagging the input text string, and examining, via a processor, the input text string for at least one first set of substitutions based on content of the input text string. The method may also include determining whether the input text string is a substring of a previously tagged input text string by comparing the input text string to a previously tagged input text string, such that the substring determination operation determines whether the input text string is wholly included in the previously tagged input text string.
    Type: Grant
    Filed: February 10, 2010
    Date of Patent: March 4, 2014
    Assignee: West Corporation
    Inventor: Steven John Schanbacher
  • Patent number: 8666745
    Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, and words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block, and with computer-readable code for implementing the method.
    Type: Grant
    Filed: March 6, 2013
    Date of Patent: March 4, 2014
    Assignee: Nuance Communications, Inc.
    Inventor: Zsolt Saffer
  • Publication number: 20140058732
    Abstract: Techniques disclosed herein include systems and methods for managing user interface responses to user input including spoken queries and commands. This includes providing incremental user interface (UI) responses based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modifying the initial UI response after receiving secondary recognition results. Since an initial response begins immediately rather than waiting for results from all recognizers, the delay the user perceives before complete results are rendered is reduced.
    Type: Application
    Filed: August 21, 2012
    Publication date: February 27, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Martin Labsky, Tomas Macek, Ladislav Kunc, Jan Kleindienst
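The incremental-response pattern described above reduces to a small state machine: show the fast local result immediately, then overwrite it when the slower remote result arrives. This minimal sketch uses illustrative class and method names, not anything from the patent.

```python
class IncrementalUI:
    """Render first-pass (local) recognition results immediately and
    refine the display when second-pass (remote) results arrive."""

    def __init__(self):
        self.displayed = None

    def on_local_result(self, text):
        # First recognizer finished: show an initial UI response now,
        # without waiting for the remote recognizer.
        if self.displayed is None:
            self.displayed = text
        return self.displayed

    def on_remote_result(self, text):
        # Secondary results arrived: modify the initial UI response.
        self.displayed = text
        return self.displayed
```
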
  • Patent number: 8650032
    Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
    Type: Grant
    Filed: November 2, 2011
    Date of Patent: February 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
  • Patent number: 8645139
    Abstract: An apparatus and method for extending a pronunciation dictionary for speech recognition are provided. The apparatus and the method may segment speech information of an input utterance into at least one phoneme, collect segmentation information of the at least one segmented phoneme, analyze a pronunciation variation of the at least one segmented phoneme based on the collected segmentation information, and select a substitutable phoneme group for the at least one phoneme where the pronunciation variation occurs, and extend the pronunciation dictionary.
    Type: Grant
    Filed: February 23, 2010
    Date of Patent: February 4, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Gil Ho Lee
  • Publication number: 20140019131
    Abstract: A method of recognizing a speech and an electronic device thereof are provided. The method includes: segmenting a speech signal into a plurality of sections at preset time intervals; performing a phoneme recognition with respect to one of the plurality of sections of the speech signal by using a first acoustic model; extracting a candidate word of the one of the plurality of sections of the speech signal by using the phoneme recognition result; and performing a speech recognition with respect to the one of the plurality of sections of the speech signal by using the candidate word.
    Type: Application
    Filed: July 12, 2013
    Publication date: January 16, 2014
    Inventors: Jae-won LEE, Dong-suk YOOK, Hyeon-taek LIM, Tae-yoon KIM
  • Publication number: 20140012578
    Abstract: A speech recognition system that recognizes speech data is provided. The speech recognition system includes a speech recognition part that performs speech recognition of the speech data, and calculates a likelihood of the speech data with respect to a registered word that is pre-registered, a reliability judgment part that performs reliability judgment on the speech recognition based on the likelihood, and a judgment reference change processing part that changes a judgment reference for the reliability judgment, according to an utterance speed of the speech data.
    Type: Application
    Filed: June 24, 2013
    Publication date: January 9, 2014
    Applicant: SEIKO EPSON CORPORATION
    Inventor: Kiyotaka MORIOKA
  • Patent number: 8626508
    Abstract: Provided are a speech search device and a speech search method that achieve very fast search speed and excellent search performance while supporting fuzzy search. Beyond fuzzy search alone, the distance between phoneme discrimination features included in the speech data is calculated to determine similarity with respect to the speech, using both a suffix array and dynamic programming. The search space is narrowed by dividing the search keyword at phoneme boundaries and applying search thresholds to the plurality of divided search keywords; the search is then repeated while increasing the thresholds in order, and whether to divide the keyword is determined from the length of the search keyword. This implements speech search whose search speed is very fast and whose search performance is also excellent.
    Type: Grant
    Filed: February 10, 2010
    Date of Patent: January 7, 2014
    Assignee: National University Corporation TOYOHASHI UNIVERSITY OF TECHNOLOGY
    Inventors: Koichi Katsurada, Tsuneo Nitta, Shigeki Teshima
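The narrowing strategy in this abstract (dynamic-programming distance plus progressively increasing search thresholds) can be sketched as below. The suffix-array index and keyword division are omitted for brevity, and plain Levenshtein distance over phoneme lists stands in for the phoneme-discrimination-feature distance.

```python
def edit_distance(a, b):
    """Dynamic-programming (Levenshtein) distance over phoneme lists."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            # prev holds the diagonal cell of the previous row
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (x != y))
    return dp[-1]

def fuzzy_search(keyword, corpus, thresholds=(0, 1, 2)):
    """Search with progressively looser thresholds, returning as soon
    as any threshold yields hits (the narrowing idea from the abstract)."""
    for t in thresholds:
        hits = [doc for doc in corpus
                if any(edit_distance(keyword, doc[i:i + len(keyword)]) <= t
                       for i in range(max(1, len(doc) - len(keyword) + 1)))]
        if hits:
            return t, hits
    return None, []
```
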
  • Patent number: 8626506
    Abstract: A method for dynamic nametag scoring includes receiving at least one confusion table including at least one circumstantial condition wherein the confusion table is based on a plurality of phonetically balanced utterances, determining a plurality of templates for the nametag based on the received confusion tables, and determining a global nametag score for the nametag based on the determined templates. A computer usable medium with suitable computer program code is employed for dynamic nametag scoring.
    Type: Grant
    Filed: January 20, 2006
    Date of Patent: January 7, 2014
    Assignee: General Motors LLC
    Inventors: Rathinavelu Chengalvarayan, John J. Correia
  • Publication number: 20140006029
    Abstract: A non-transitory processor-readable medium storing code representing instructions to be executed by a processor includes code to cause the processor to receive acoustic data representing an utterance spoken by a language learner in a non-native language in response to prompting the language learner to recite a word in the non-native language and receive a pronunciation lexicon of the word in the non-native language. The pronunciation lexicon includes at least one alternative pronunciation of the word based on a pronunciation lexicon of a native language of the language learner. The code causes the processor to generate an acoustic model of the at least one alternative pronunciation in the non-native language and identify a mispronunciation of the word in the utterance based on a comparison of the acoustic data with the acoustic model. The code causes the processor to send feedback related to the mispronunciation of the word to the language learner.
    Type: Application
    Filed: July 1, 2013
    Publication date: January 2, 2014
    Inventors: Theban Stanley, Kadri Hacioglu, Vesa Siivola
  • Patent number: 8620656
    Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
    Type: Grant
    Filed: March 4, 2012
    Date of Patent: December 31, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic model and the language model.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
  • Patent number: 8620657
    Abstract: One aspect includes determining the validity of an identity asserted by a speaker using a voice print associated with the user whose identity the speaker is asserting, the voice print having been obtained from characteristic features of at least one first voice signal of the user uttering at least one enrollment utterance including at least one enrollment word. A second voice signal is obtained of the speaker uttering at least one challenge utterance that includes at least one word not in the at least one enrollment utterance, and at least one characteristic feature is obtained from the second voice signal. The at least one characteristic feature is compared with at least a portion of the voice print to determine a similarity between them, and whether the speaker is the user is determined based, at least in part, on the similarity.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: December 31, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
  • Publication number: 20130339020
    Abstract: A display apparatus, an interactive server, and a method for providing response information are provided. The display apparatus includes: a voice collector which collects a user's uttered voice; a communication unit which communicates with an interactive server; and a controller which, if response information corresponding to the uttered voice transmitted to the interactive server is received from the interactive server, controls the display apparatus to perform an operation corresponding to the user's uttered voice based on the response information, wherein the response information is generated in a different form according to a function of the display apparatus which is classified based on an utterance element extracted from the uttered voice. Accordingly, the display apparatus can execute the function corresponding to each of the uttered voices and can output the response message corresponding to each of the uttered voices, even if a variety of uttered voices are input from the user.
    Type: Application
    Filed: May 6, 2013
    Publication date: December 19, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Hye-hyun HEO, Hae-rim SON, Jun-hyung SHIN
  • Patent number: 8604327
    Abstract: There is provided an information processing device including a storage unit that stores music data for playing music and lyrics data indicating lyrics of the music, a display control unit that displays the lyrics of the music on a screen, a playback unit that plays the music and a user interface unit that detects a user input. The lyrics data includes a plurality of blocks each having lyrics of at least one character. The display control unit displays the lyrics of the music on the screen in such a way that each block included in the lyrics data is identifiable to a user while the music is played by the playback unit. The user interface unit detects timing corresponding to a boundary of each section of the music corresponding to each displayed block in response to a first user input.
    Type: Grant
    Filed: March 2, 2011
    Date of Patent: December 10, 2013
    Assignee: Sony Corporation
    Inventor: Haruto Takeda
  • Patent number: 8606581
    Abstract: According to example configurations, a speech recognition system is configured to receive an utterance. Based on analyzing at least a portion of the utterance using a first speech recognition model on a first pass, the speech recognition system detects that the utterance includes a first group of one or more spoken words. The speech recognition system utilizes the first group of one or more spoken words identified in the utterance as detected on the first pass to locate a given segment of interest in the utterance. The given segment can include one or more words that are unrecognizable by the first speech recognition model. Based on analyzing the given segment using a second speech recognition model on a second pass, the speech recognition system detects one or more additional words in the utterance. A natural language understanding module utilizes the detected words to generate a command intended by the utterance.
    Type: Grant
    Filed: December 14, 2010
    Date of Patent: December 10, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Holger Quast, Marcus Gröber, Mathias Maria Juliaan De Wachter, Frédéric Elie Ratle, Arthi Murugesan
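The two-pass flow described above can be reduced to a toy sketch: the first pass keeps in-vocabulary tokens and marks out-of-vocabulary stretches as segments of interest, which a second model then decodes in place. The token-level vocabulary test and the callable standing in for the second model are illustrative assumptions, far simpler than the patent's acoustic models.

```python
def two_pass_recognize(utterance, first_pass_vocab, second_pass_model):
    """First pass: keep tokens the first model recognizes; collect the
    unrecognizable stretch as a segment of interest. Second pass: let
    second_pass_model (a callable) decode each segment in place."""
    output, segment = [], []
    for token in utterance:
        if token in first_pass_vocab:
            if segment:                          # close the pending segment
                output.extend(second_pass_model(segment))
                segment = []
            output.append(token)
        else:
            segment.append(token)                # unrecognizable on pass 1
    if segment:
        output.extend(second_pass_model(segment))
    return output
```
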
  • Patent number: 8600752
    Abstract: A search apparatus includes a sound recognition unit which recognizes input sound, a user information estimation unit which estimates at least one of a physical condition and emotional demeanor of a speaker of the input sound based on the input sound and outputs user information representing the estimation result, a matching unit which performs matching between a search result target pronunciation symbol string and a recognition result pronunciation symbol string for each of plural search result target word strings, and a generation unit which generates a search result word string as a search result for a word string corresponding to the input sound from the plural search result target word strings based on the matching result. At least one of the matching unit and the generation unit changes processing in accordance with the user information.
    Type: Grant
    Filed: May 18, 2011
    Date of Patent: December 3, 2013
    Assignee: Sony Corporation
    Inventors: Keiichi Yamada, Hitoshi Honda
  • Patent number: 8600747
    Abstract: A spoken dialog system and method having a dialog management module are disclosed. The dialog management module includes a plurality of dialog motivators for handling various operations during a spoken dialog. The dialog motivators comprise error handling, disambiguation, assumption, confirmation, missing information, and continuation. The spoken dialog system uses the assumption dialog motivator in either a-priori or a-posteriori mode. A-priori assumption is based on predefined requirements for the call flow, and a-posteriori assumption can work with the confirmation dialog motivator to assume the content of received user input and confirm received user input.
    Type: Grant
    Filed: June 17, 2008
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alicia Abella, Allen Louis Gorin
  • Patent number: 8595004
    Abstract: The problem to be solved is robustly detecting pronunciation variation examples and acquiring pronunciation variation rules with a high generalization property, with less effort. The problem can be solved by a pronunciation variation rule extraction apparatus including a speech data storage unit, a base form pronunciation storage unit, a sub-word language model generation unit, a speech recognition unit, and a difference extraction unit. The speech data storage unit stores speech data. The base form pronunciation storage unit stores base form pronunciation data representing the base form pronunciation of the speech data. The sub-word language model generation unit generates a sub-word language model from the base form pronunciation data. The speech recognition unit recognizes the speech data by using the sub-word language model.
    Type: Grant
    Filed: November 27, 2008
    Date of Patent: November 26, 2013
    Assignee: NEC Corporation
    Inventor: Takafumi Koshinaka
  • Patent number: 8595009
    Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
    Type: Grant
    Filed: July 26, 2012
    Date of Patent: November 26, 2013
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Lie Lu, Claus Bauer
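The four section conditions enumerated in this abstract translate directly into a predicate over a sequence of classified clips. The sketch below assumes one clip per unit of time with labels 'm' (music) and 'n' (non-music); all parameter names are illustrative.

```python
def is_candidate_song(clips, min_song, max_song, min_ratio):
    """Check the abstract's four conditions on a candidate section:
    1) at least one music segment longer than min_song clips,
    2) section shorter than max_song clips,
    3) section starts and ends with a music clip,
    4) proportion of music clips is at least min_ratio."""
    n = len(clips)
    if n == 0 or n >= max_song:                 # condition 2
        return False
    if clips[0] != 'm' or clips[-1] != 'm':     # condition 3
        return False
    if clips.count('m') / n < min_ratio:        # condition 4
        return False
    run = best = 0                              # condition 1: longest run
    for c in clips:
        run = run + 1 if c == 'm' else 0
        best = max(best, run)
    return best > min_song
```
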
  • Publication number: 20130311182
    Abstract: An apparatus for correcting errors in speech recognition is provided. The apparatus includes a feature vector extracting unit extracting feature vectors from a received speech. A speech recognizing unit recognizes the received speech as a word sequence on the basis of the extracted feature vectors. A phoneme weighted finite state transducer (WFST)-based converting unit converts the recognized word sequence recognized by the speech recognizing unit into a phoneme WFST. A speech recognition error correcting unit corrects errors in the converted phoneme WFST. The speech recognition error correcting unit includes a WFST synthesizing unit modeling a phoneme WFST transferred from the phoneme WFST-based converting unit as pronunciation variation on the basis of a Kullback-Leibler (KL) distance matrix.
    Type: Application
    Filed: May 16, 2013
    Publication date: November 21, 2013
    Inventors: Hong-Kook KIM, Woo-Kyeong SEONG, Ji-Hun PARK
  • Patent number: 8589162
    Abstract: The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
    Type: Grant
    Filed: September 19, 2008
    Date of Patent: November 19, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Remi Lejeune, Hubert Crepy
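Combining per-segment n-best lists into a global n-best list, as this abstract describes, can be sketched as below. Scoring a combination as the product of segment confidences and pruning by a minimum score are illustrative choices; the patent does not fix the pruning criterion here.

```python
from itertools import product

def global_n_best(segment_n_bests, n, min_score):
    """Build a global n-best list from per-segment n-best lists of
    (word, confidence) pairs, then prune by a minimum-score criterion."""
    combos = []
    for choice in product(*segment_n_bests):
        text = " ".join(word for word, _ in choice)
        score = 1.0
        for _, s in choice:
            score *= s                    # combined confidence score
        combos.append((text, score))
    combos.sort(key=lambda pair: -pair[1])
    return [(t, s) for t, s in combos[:n] if s >= min_score]
```
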
  • Publication number: 20130304472
    Abstract: Techniques are described for automatically measuring fluency of a patient's speech based on prosodic characteristics thereof. The prosodic characteristics may include statistics regarding silent pauses, filled pauses, repetitions, or fundamental frequency of the patient's speech. The statistics may include a count, average number of occurrences, duration, average duration, frequency of occurrence, standard deviation, or other statistics. In one embodiment, a method includes receiving an audio sample that includes speech of a patient, analyzing the audio sample to identify prosodic characteristics of the speech of the patient, and automatically measuring fluency of the speech of the patient based on the prosodic characteristics. These techniques may present several advantages, such as objectively measuring fluency of a patient's speech without requiring a manual transcription or other manual intervention in the analysis process.
    Type: Application
    Filed: July 17, 2013
    Publication date: November 14, 2013
    Inventor: Serguei V.S. Pakhomov
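The statistics listed in this abstract (counts, averages, durations, standard deviations, rates) are straightforward to compute once pauses have been detected. A minimal sketch over silent-pause durations, with generic measure names that are not the patent's:

```python
import statistics

def pause_statistics(pause_durations, speech_duration):
    """Summary statistics over detected silent pauses (in seconds), as
    a minimal illustration of prosody-based fluency measures."""
    if not pause_durations:
        return {"count": 0, "mean": 0.0, "stdev": 0.0, "rate": 0.0}
    return {
        "count": len(pause_durations),
        "mean": statistics.mean(pause_durations),
        "stdev": statistics.pstdev(pause_durations),
        "rate": len(pause_durations) / speech_duration,  # pauses per second
    }
```
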
  • Patent number: 8583433
    Abstract: A system and method for efficiently transcribing verbal messages to text is provided. Verbal messages are received and at least one of the verbal messages is divided into segments. Automatically recognized text is determined for each of the segments by performing speech recognition and a confidence rating is assigned to the automatically recognized text for each segment. A threshold is applied to the confidence ratings and those segments with confidence ratings that fall below the threshold are identified. The segments that fall below the threshold are assigned to one or more human agents starting with those segments that have the lowest confidence ratings. Transcription from the human agents is received for the segments assigned to that agent. The transcription is assembled with the automatically recognized text of the segments not assigned to the human agents as a text message for the at least one verbal message.
    Type: Grant
    Filed: August 6, 2012
    Date of Patent: November 12, 2013
    Assignee: Intellisist, Inc.
    Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
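The routing step in this abstract (threshold the per-segment confidence ratings, send the failures to human agents starting with the lowest-rated, then reassemble) can be sketched as follows; segment representation and function names are illustrative.

```python
def route_segments(segments, threshold):
    """Split recognized segments into auto-accepted text and segments
    queued for human transcription, lowest confidence first.
    Each segment is a (text, confidence) pair."""
    auto = [(i, t) for i, (t, c) in enumerate(segments) if c >= threshold]
    manual = sorted(
        (i for i, (_, c) in enumerate(segments) if c < threshold),
        key=lambda i: segments[i][1],       # lowest confidence first
    )
    return auto, manual

def assemble(segments, corrections):
    """Merge human corrections back with auto-recognized text, in order."""
    return " ".join(corrections.get(i, t) for i, (t, _) in enumerate(segments))
```
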
  • Patent number: 8583432
    Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. One system for automatic speech recognition includes a dialect recognition unit and a controller. The dialect recognition unit is configured to analyze acoustic input data to identify portions of the acoustic input data that conform to a general language and to identify portions of the acoustic input data that conform to at least one dialect of the general language. In addition, the controller is configured to apply a general language model and at least one dialect language model to the input data to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions.
    Type: Grant
    Filed: July 25, 2012
    Date of Patent: November 12, 2013
    Assignee: International Business Machines Corporation
    Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
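One way to picture the controller's dynamic selection between models: each identified portion carries a dialect label, and the matching model decodes it, falling back to the general-language model otherwise. The callable-model interface below is an assumption made for illustration:

```python
def recognize_portions(portions, general_model, dialect_models):
    """Decode each labeled audio portion with the model matching its label,
    falling back to the general-language model when no dialect model fits.

    portions: list of (label, audio) pairs; models are callables audio -> text.
    """
    return [dialect_models.get(label, general_model)(audio)
            for label, audio in portions]
```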
  • Patent number: 8583436
    Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: November 12, 2013
    Assignee: NEC Corporation
    Inventors: Hitoshi Yamamoto, Kiyokazu Miki
  • Patent number: 8577681
    Abstract: A method of generating an alternative pronunciation for a word or phrase, given an initial pronunciation and a spoken example of the word or phrase, includes providing the initial pronunciation of the word or phrase, and generating the alternative pronunciation by searching a neighborhood of pronunciations about the initial pronunciation via a constrained hypothesis, wherein the neighborhood includes pronunciations that differ from the initial pronunciation by at most one phoneme. The method further includes selecting a highest scoring pronunciation within the neighborhood of pronunciations.
    Type: Grant
    Filed: September 13, 2004
    Date of Patent: November 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel L. Roth, Laurence S. Gillick, Mike Shire
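Generating the neighborhood of pronunciations can be sketched as below; "differ by at most one phoneme" is taken here to mean one substitution, insertion, or deletion, which is an interpretation rather than the patent's exact constraint:

```python
def one_phoneme_neighborhood(pron, phonemes):
    """All pronunciations differing from `pron` (a tuple of phonemes) by at
    most one phoneme: one substitution, insertion, or deletion drawn from
    `phonemes`."""
    neighbors = {tuple(pron)}
    for i in range(len(pron)):
        neighbors.add(pron[:i] + pron[i + 1:])             # deletion
        for p in phonemes:
            neighbors.add(pron[:i] + (p,) + pron[i + 1:])  # substitution
    for i in range(len(pron) + 1):
        for p in phonemes:
            neighbors.add(pron[:i] + (p,) + pron[i:])      # insertion
    return neighbors
```

Scoring each neighbor against the spoken example (e.g. by forced alignment) and keeping the highest-scoring one would complete the method; the acoustic scorer is outside this sketch.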
  • Publication number: 20130289995
    Abstract: The present invention discloses a method and device for voice control that address the low success rate of prior-art voice control. The method includes: classifying stored recognition information used for voice recognition to obtain a syntax packet corresponding to each type of recognition information (10); receiving an input voice signal and performing voice recognition on the received signal with each obtained syntax packet in turn (20); and performing corresponding control processing based on the voice recognition result of the voice signal according to each syntax packet (30).
    Type: Application
    Filed: January 12, 2011
    Publication date: October 31, 2013
    Applicant: ZTE CORPORATION
    Inventors: Manhai Li, Kaili Xiao, Jingping Wang, Xin Liao
  • Publication number: 20130289994
    Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes-up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply.
    Type: Application
    Filed: April 26, 2012
    Publication date: October 31, 2013
    Inventors: Michael Jack Newman, Robert Roth, William D. Alexander, Paul van Mulbregt
  • Patent number: 8566097
    Abstract: A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an inputted speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among the plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching words and unknown words. The discrimination section 4 calculates, for each word sequence, a first evaluation value showing how well the words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing the probability that the words in the word sequence are adjacent to one another, and selects the word sequence for which the sum of the first and second evaluation values is highest.
    Type: Grant
    Filed: June 1, 2010
    Date of Patent: October 22, 2013
    Assignees: Honda Motor Co., Ltd., Advanced Telecommunications Research Institute International
    Inventors: Mikio Nakano, Takashi Nose, Ryo Taguchi, Kotaro Funakoshi, Naoto Iwahashi
  • Patent number: 8560319
    Abstract: The present invention provides a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment, a method of classifying an audio stream is provided. The method includes receiving an audio stream, sampling it at a predetermined rate, and combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of that analysis.
    Type: Grant
    Filed: January 15, 2008
    Date of Patent: October 15, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Qian Huang, Zhu Liu
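The clip-and-classify pipeline above can be sketched with a single toy feature; the patent analyzes a plurality of features with a linear approximation algorithm, so the zero-crossing-rate feature and simple threshold here are placeholder assumptions:

```python
def zero_crossing_rate(clip):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(clip, clip[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / max(len(clip) - 1, 1)


def classify_stream(samples, clip_samples, threshold):
    """Group consecutive samples into fixed-size clips and label each clip
    1 or 0 by thresholding one feature -- a stand-in for the patent's
    multi-feature linear analysis."""
    return [int(zero_crossing_rate(samples[s:s + clip_samples]) > threshold)
            for s in range(0, len(samples) - clip_samples + 1, clip_samples)]
```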
  • Patent number: 8560318
    Abstract: A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
    Type: Grant
    Filed: May 14, 2010
    Date of Patent: October 15, 2013
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Gustavo A. Hernandez-Abrego
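Assuming the statements have already been aligned position by position (the patent aligns on word sets; equal-length word lists keep this sketch simple), alignment regions and potential confusion zones fall out of a column scan:

```python
def find_zones(statements):
    """Return (alignment_regions, confusion_zones) for position-aligned
    statements: positions where all statements agree versus positions where
    they differ, with the differing words collected per zone."""
    aligned, zones = [], []
    for pos, column in enumerate(zip(*statements)):
        if len(set(column)) == 1:
            aligned.append(pos)
        else:
            zones.append((pos, sorted(set(column))))
    return aligned, zones
```

A pronunciation-level confusion measure (e.g. phonetic distance between each zone's words) would then be attached to each zone, which this sketch does not attempt.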
  • Patent number: 8560324
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: October 15, 2013
    Assignee: LG Electronics Inc.
    Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Publication number: 20130262116
    Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
    Type: Application
    Filed: March 27, 2012
    Publication date: October 3, 2013
    Applicant: NOVOSPEECH
    Inventor: Yossef Ben-Ezra
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
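The core restructuring idea above — scoring each dictionary phoneme as a weighted sum over native acoustic models — can be written down directly. The dict-of-callables interface is an illustrative assumption; real acoustic models return frame log-likelihoods from trained densities:

```python
def restructured_score(frame, mixture_weights, native_models):
    """Acoustic score of `frame` for one dictionary phoneme, restructured as
    a weighted sum over the native acoustic models of all plausible phonemes
    (weights estimated from the new speaker's phoneme lattice)."""
    return sum(w * native_models[p](frame)
               for p, w in mixture_weights.items())
```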
  • Patent number: 8543404
    Abstract: Embodiments of the present invention provide a method and computer program product for the proactive completion of input fields for automated voice enablement of a Web page. In an embodiment of the invention, a method for proactively completing empty input fields for voice enabling a Web page can be provided. The method can include receiving speech input for an input field in a Web page and inserting a textual equivalent to the speech input into the input field in a Web page. The method further can include locating an empty input field remaining in the Web page and generating a speech grammar for the input field based upon permitted terms in a core attribute of the empty input field and prompting for speech input for the input field. Finally, the method can include posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into the empty input field.
    Type: Grant
    Filed: April 7, 2008
    Date of Patent: September 24, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Victor S. Moore, Wendi L. Nusbickel
  • Patent number: 8543400
    Abstract: Voice processing methods and systems are provided. An utterance is received. The utterance is compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. Respective voice units are scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated according to training utterances corresponding to at least one training sentence in a phonetic-balanced sentence set of a plurality of learners and at least one real teacher, and scores corresponding to the respective voice units of the training utterances of the learners in the first scoring item provided by the real teacher.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: September 24, 2013
    Assignee: National Taiwan University
    Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
  • Patent number: 8543402
    Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
    Type: Grant
    Filed: April 29, 2011
    Date of Patent: September 24, 2013
    Assignee: The Intellisis Corporation
    Inventor: Jiyong Ma
  • Patent number: 8532989
    Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command in the word sequence information should be executed, based on the degree of speech confidence and the degree of phrase confidence.
    Type: Grant
    Filed: September 2, 2010
    Date of Patent: September 10, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kotaro Funakoshi, Mikio Nakano, Xiang Zuo, Naoto Iwahashi, Ryo Taguchi
  • Patent number: 8532993
    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
    Type: Grant
    Filed: July 2, 2012
    Date of Patent: September 10, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8532988
    Abstract: A method for searching for an input symbol string includes receiving (B) an input symbol string, proceeding (C) in a trie data structure to the calculation point indicated by the next symbol, calculating (D) distances at the calculation point, and repeatedly selecting (E) the next branch to follow (C) to the next calculation point, where the calculation (D) is repeated. After the calculations (G), the symbol string having the shortest distance to the input symbol string is selected on the basis of the performed calculations. To minimize the number of calculations, the calculation points yield not only the distances (D) but also the smallest possible length difference corresponding to each distance; a reference value is computed from each distance and its corresponding length difference, and the branch is selected (E) so that the routine next proceeds from the calculation point producing the lowest reference value.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: September 10, 2013
    Assignee: Syslore Oy
    Inventor: Jorkki Hyvonen
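A minimal sketch of distance-pruned trie search: one dynamic-programming row per calculation point, pruning any branch whose row minimum (a lower bound on every completion's distance) already reaches the best distance found. This is the classic DP-over-trie pruning; the patent's exact reference value additionally folds in the length-difference term, which is not modeled here:

```python
def trie_insert(trie, word):
    """Insert `word` into a nested-dict trie; '$' marks a terminal node."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})
    node["$"] = word


def nearest_in_trie(trie, query):
    """Return (best_word, distance) for `query` by edit distance."""
    best = [None, len(query) + 1]

    def walk(node, row):
        if "$" in node and row[-1] < best[1]:
            best[0], best[1] = node["$"], row[-1]
        if min(row) >= best[1]:
            return  # no completion of this prefix can improve on best
        for ch, child in node.items():
            if ch == "$":
                continue
            new_row = [row[0] + 1]
            for j, qch in enumerate(query, 1):
                new_row.append(min(new_row[j - 1] + 1,         # insertion
                                   row[j] + 1,                 # deletion
                                   row[j - 1] + (qch != ch)))  # substitution
            walk(child, new_row)

    walk(trie, list(range(len(query) + 1)))
    return best[0], best[1]
```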
  • Publication number: 20130231934
    Abstract: A system and method is provided for recognizing a speech input and selecting an entry from a list of entries. The method includes recognizing a speech input. A fragment list of fragmented entries is provided and compared to the recognized speech input to generate a candidate list of best matching entries based on the comparison result. The system includes a speech recognition module and a database for storing the list of entries and the fragmented list. The speech recognition module may obtain the fragmented list from the database and store a candidate list of best matching entries in memory. A display may also be provided to allow a user to select from the list of best matching entries.
    Type: Application
    Filed: March 18, 2013
    Publication date: September 5, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Markus Schwarz