Subportions Patents (Class 704/254)
-
Patent number: 8688435
Abstract: A method and system for processing input media for provision to a text to speech engine, comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify a content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; and a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the phrases and strings.
Type: Grant
Filed: September 22, 2010
Date of Patent: April 1, 2014
Assignee: Voice on the Go Inc.
Inventors: Babak Nasri, Selva Thayaparam
-
Publication number: 20140088968
Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information; each frame is converted into an amplitude spectrum using a Fourier analyzer, and then Laguerre functions are used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
Type: Application
Filed: December 3, 2012
Publication date: March 27, 2014
Inventor: Chengjun Julian Chen
-
Patent number: 8682668
Abstract: A speech recognition apparatus that performs frame synchronous beam search by using a language model score look-ahead value prevents the pruning of a correct answer hypothesis while suppressing an increase in the number of hypotheses. A language model score look-ahead value imparting device 108 is provided with a word dictionary 203 that defines a phoneme string of a word, a language model 202 that imparts a score of appearance easiness of a word, and a smoothing language model score look-ahead value calculation means 201. The smoothing language model score look-ahead value calculation means 201 obtains a language model score look-ahead value at each phoneme in the word from the phoneme string of the word defined by the word dictionary 203 and the language model score defined by the language model 202 so that the language model score look-ahead values are prevented from concentrating on the beginning of the word.
Type: Grant
Filed: March 27, 2009
Date of Patent: March 25, 2014
Assignee: NEC Corporation
Inventors: Koji Okabe, Ryosuke Isotani, Kiyoshi Yamabana, Ken Hanazawa
-
Patent number: 8676568
Abstract: A storage unit stores first filter information specifying the formats of messages and second filter information specifying weights for words or phrases. A first search unit selects messages matching the formats specified by the first filter information from a plurality of messages as messages to be extracted. A second search unit calculates the importance level of each message unselected by the first search unit, based on the words or phrases included in the message and the second filter information, and selects messages to be extracted, according to the calculated importance levels, from the messages unselected by the first search unit.
Type: Grant
Filed: April 23, 2013
Date of Patent: March 18, 2014
Assignee: Fujitsu Limited
Inventors: Akira Minegishi, Michio Sato, Takao Shibazaki, Naomi Ozawa
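The two-stage selection described in this abstract can be sketched as follows. The regex-based format matching, the additive word-weight scoring, and the threshold value are all illustrative assumptions of this sketch; the abstract specifies none of them.

```python
import re

def select_messages(messages, format_patterns, word_weights, importance_threshold):
    """Two-stage selection: format match first, then weighted-word importance.

    format_patterns  -- regexes standing in for the 'first filter information'
    word_weights     -- {word: weight} standing in for the 'second filter information'
    (hypothetical names and scoring, not the patented method itself)
    """
    # First search: messages whose format matches any specified pattern.
    first_pass = [m for m in messages
                  if any(re.search(p, m) for p in format_patterns)]

    # Second search: score only the messages the first pass did not select,
    # and keep those whose summed word weights reach the threshold.
    remaining = [m for m in messages if m not in first_pass]
    second_pass = []
    for m in remaining:
        score = sum(w for word, w in word_weights.items() if word in m)
        if score >= importance_threshold:
            second_pass.append(m)
    return first_pass + second_pass

msgs = ["ALERT: disk full on host-3",
        "nightly report finished",
        "urgent failure in backup job",
        "lunch menu updated"]
picked = select_messages(msgs,
                         format_patterns=[r"^ALERT:"],
                         word_weights={"urgent": 5, "failure": 4},
                         importance_threshold=5)
```

Here the alert matches on format alone, while the failure message is recovered by its word weights despite not matching any format pattern.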
-
Patent number: 8676577
Abstract: A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to correct a possible misrecognition of a spoken proper noun such as a name or company with its proper textual form, or a spoken phone number with a correctly formatted phone number with Arabic numerals, to improve the overall accuracy of the output of the voice recognition system.
Type: Grant
Filed: March 31, 2009
Date of Patent: March 18, 2014
Assignee: Canyon IP Holdings, LLC
Inventors: Igor Roditis Jablokov, Clifford J. Strohofer, III, Marc White, Victor Roditis Jablokov
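A minimal sketch of this kind of metadata-assisted correction: compare each n-best candidate against known contact names and substitute the closest match when it is close enough. The `difflib` similarity ratio, the cutoff value, and the contact data are assumptions of the sketch; the patent does not name a comparison measure.

```python
import difflib

def correct_with_metadata(n_best, contacts, cutoff=0.75):
    """Pick the contact name closest to any n-best candidate, if close enough.

    n_best   -- recognizer candidates for one output word, best first
    contacts -- names from an address book / Caller ID (hypothetical data)
    """
    best_match, best_score = None, 0.0
    for candidate in n_best:
        for name in contacts:
            score = difflib.SequenceMatcher(
                None, candidate.lower(), name.lower()).ratio()
            if score > best_score:
                best_match, best_score = name, score
    # Fall back to the top recognition result when nothing is close enough.
    return best_match if best_score >= cutoff else n_best[0]

contacts = ["Kristen", "Marc", "Victor"]
word = correct_with_metadata(["Christian", "Kristen", "crisp in"], contacts)
```

Because one of the n-best alternatives matches a contact exactly, the misrecognized top result "Christian" is replaced by "Kristen".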
-
Publication number: 20140074476
Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
Type: Application
Filed: November 15, 2013
Publication date: March 13, 2014
Applicant: AT&T Intellectual Property II, L.P.
Inventor: Giuseppe Riccardi
-
Patent number: 8670983
Abstract: A method for determining a similarity between a first audio source and a second audio source includes: for the first audio source, determining a first frequency of occurrence for each of a plurality of phoneme sequences and determining a first weighted frequency for each of the plurality of phoneme sequences based on the first frequency of occurrence for the phoneme sequence; for the second audio source, determining a second frequency of occurrence for each of a plurality of phoneme sequences and determining a second weighted frequency for each of the plurality of phoneme sequences based on the second frequency of occurrence for the phoneme sequence; comparing the first weighted frequency for each phoneme sequence with the second weighted frequency for the corresponding phoneme sequence; and generating a similarity score representative of a similarity between the first audio source and the second audio source based on the results of the comparing.
Type: Grant
Filed: August 30, 2011
Date of Patent: March 11, 2014
Assignee: Nexidia Inc.
Inventors: Jacob B. Garland, Jon A. Arrowood, Drew Lanham, Marsal Gavalda
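The weighted-frequency comparison could be sketched like this, using phoneme n-gram counts with logarithmic damping as the weighting and cosine similarity as the comparison. Both choices are illustrative assumptions; the abstract says only that the frequencies are weighted and compared.

```python
import math
from collections import Counter

def phoneme_similarity(phones_a, phones_b, n=3):
    """Cosine similarity over weighted phoneme n-gram frequencies.

    phones_a, phones_b -- phoneme symbol lists for the two audio sources
    (the log weighting and cosine measure are this sketch's assumptions)
    """
    def weighted_freq(phones):
        grams = Counter(tuple(phones[i:i + n])
                        for i in range(len(phones) - n + 1))
        # Dampen raw counts so frequent n-grams do not dominate the score.
        return {g: 1.0 + math.log(c) for g, c in grams.items()}

    wa, wb = weighted_freq(phones_a), weighted_freq(phones_b)
    dot = sum(wa[g] * wb[g] for g in wa.keys() & wb.keys())
    norm = math.sqrt(sum(v * v for v in wa.values())) * \
           math.sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0

same = phoneme_similarity(["HH", "EH", "L", "OW"], ["HH", "EH", "L", "OW"])
diff = phoneme_similarity(["HH", "EH", "L", "OW"], ["G", "UH", "D", "B", "AY"])
```

Identical phoneme sequences score 1.0, while sequences sharing no trigrams score 0.0.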
-
Patent number: 8666729
Abstract: Creating and processing a natural language grammar set of data based on an input text string are disclosed. The method may include tagging the input text string, and examining, via a processor, the input text string for at least one first set of substitutions based on content of the input text string. The method may also include determining whether the input text string is a substring of a previously tagged input text string by comparing the input text string to a previously tagged input text string, such that the substring determination operation determines whether the input text string is wholly included in the previously tagged input text string.
Type: Grant
Filed: February 10, 2010
Date of Patent: March 4, 2014
Assignee: West Corporation
Inventor: Steven John Schanbacher
-
Patent number: 8666745
Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, and wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.
Type: Grant
Filed: March 6, 2013
Date of Patent: March 4, 2014
Assignee: Nuance Communications, Inc.
Inventor: Zsolt Saffer
-
Publication number: 20140058732
Abstract: Techniques disclosed herein include systems and methods for managing user interface responses to user input including spoken queries and commands. This includes providing incremental user interface (UI) response based on multiple recognition results about user input that are received with different delays. Such techniques include providing an initial response to a user at an early time, before remote recognition results are available. Systems herein can respond incrementally by initiating an initial UI response based on first recognition results, and then modifying the initial UI response after receiving secondary recognition results. Since an initial response begins immediately instead of waiting for results from all recognizers, the perceived delay before complete results are rendered to the user is reduced.
Type: Application
Filed: August 21, 2012
Publication date: February 27, 2014
Applicant: Nuance Communications, Inc.
Inventors: Martin Labsky, Tomas Macek, Ladislav Kunc, Jan Kleindienst
-
Patent number: 8650032
Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
Type: Grant
Filed: November 2, 2011
Date of Patent: February 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
-
Patent number: 8645139
Abstract: An apparatus and method for extending a pronunciation dictionary for speech recognition are provided. The apparatus and the method may segment speech information of an input utterance into at least one phoneme, collect segmentation information of the at least one segmented phoneme, analyze a pronunciation variation of the at least one segmented phoneme based on the collected segmentation information, and select a substitutable phoneme group for the at least one phoneme where the pronunciation variation occurs, and extend the pronunciation dictionary.
Type: Grant
Filed: February 23, 2010
Date of Patent: February 4, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventor: Gil Ho Lee
-
Publication number: 20140019131
Abstract: A method of recognizing a speech and an electronic device thereof are provided. The method includes: segmenting a speech signal into a plurality of sections at preset time intervals; performing a phoneme recognition with respect to one of the plurality of sections of the speech signal by using a first acoustic model; extracting a candidate word of the one of the plurality of sections of the speech signal by using the phoneme recognition result; and performing a speech recognition with respect to the one of the plurality of sections of the speech signal by using the candidate word.
Type: Application
Filed: July 12, 2013
Publication date: January 16, 2014
Inventors: Jae-won LEE, Dong-suk YOOK, Hyeon-taek LIM, Tae-yoon KIM
-
Publication number: 20140012578
Abstract: A speech recognition system that recognizes speech data is provided. The speech recognition system includes a speech recognition part that performs speech recognition of the speech data, and calculates a likelihood of the speech data with respect to a registered word that is pre-registered, a reliability judgment part that performs reliability judgment on the speech recognition based on the likelihood, and a judgment reference change processing part that changes a judgment reference for the reliability judgment, according to an utterance speed of the speech data.
Type: Application
Filed: June 24, 2013
Publication date: January 9, 2014
Applicant: SEIKO EPSON CORPORATION
Inventor: Kiyotaka MORIOKA
-
Patent number: 8626508
Abstract: Provided are a speech search device and a speech search method that perform fuzzy search with very fast search speed and excellent search performance. In addition to fuzzy search, the distance between phoneme discrimination features included in speech data is calculated to determine similarity with respect to the speech, using both a suffix array and dynamic programming. The object to be searched for is narrowed by dividing the search keyword based on phonemes and applying search thresholds to each of the divided search keywords; the object is repeatedly searched for while increasing the search thresholds in order, and whether or not to divide the keyword is determined according to the length of the search keywords. This implements speech search whose search speed is very fast and whose search performance is also excellent.
Type: Grant
Filed: February 10, 2010
Date of Patent: January 7, 2014
Assignee: National University Corporation TOYOHASHI UNIVERSITY OF TECHNOLOGY
Inventors: Koichi Katsurada, Tsuneo Nitta, Shigeki Teshima
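The threshold-escalation idea (without the suffix-array machinery) might look like the sketch below. A plain list scan with Levenshtein distance stands in for the patented suffix-array/dynamic-programming search; the corpus and threshold values are made up for illustration.

```python
def edit_distance(a, b):
    """Standard dynamic-programming Levenshtein distance between strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_search(keyword, corpus, max_threshold=3):
    """Search with gradually relaxed thresholds, as in the abstract.

    Starts strict (exact match, threshold 0) and loosens the threshold one
    step at a time, stopping as soon as any hit is found. The corpus here is
    a toy list of phoneme-like strings, not a real suffix-array index.
    """
    for threshold in range(max_threshold + 1):
        hits = [entry for entry in corpus
                if edit_distance(keyword, entry) <= threshold]
        if hits:
            return threshold, hits
    return None, []

threshold, hits = fuzzy_search("torashi", ["toyohashi", "nagoya", "tokyo"])
```

Starting strict keeps the candidate set small on early passes, which is the point of escalating the threshold rather than searching loosely from the start.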
-
Patent number: 8626506
Abstract: A method for dynamic nametag scoring includes receiving at least one confusion table including at least one circumstantial condition, wherein the confusion table is based on a plurality of phonetically balanced utterances, determining a plurality of templates for the nametag based on the received confusion tables, and determining a global nametag score for the nametag based on the determined templates. A computer usable medium with suitable computer program code is employed for dynamic nametag scoring.
Type: Grant
Filed: January 20, 2006
Date of Patent: January 7, 2014
Assignee: General Motors LLC
Inventors: Rathinavelu Chengalvarayan, John J. Correia
-
Publication number: 20140006029
Abstract: A non-transitory processor-readable medium storing code representing instructions to be executed by a processor includes code to cause the processor to receive acoustic data representing an utterance spoken by a language learner in a non-native language in response to prompting the language learner to recite a word in the non-native language and receive a pronunciation lexicon of the word in the non-native language. The pronunciation lexicon includes at least one alternative pronunciation of the word based on a pronunciation lexicon of a native language of the language learner. The code causes the processor to generate an acoustic model of the at least one alternative pronunciation in the non-native language and identify a mispronunciation of the word in the utterance based on a comparison of the acoustic data with the acoustic model. The code causes the processor to send feedback related to the mispronunciation of the word to the language learner.
Type: Application
Filed: July 1, 2013
Publication date: January 2, 2014
Inventors: Theban Stanley, Kadri Hacioglu, Vesa Siivola
-
Patent number: 8620656
Abstract: The present invention discloses converting a text form into speech. In the present invention, partial word lists of a data source are obtained by parsing the data source in parallel or in series. The partial word lists are then compiled to obtain phoneme graphs corresponding, respectively, to the partial word lists, and then the obtained phoneme graphs are combined. Speech recognition is then conducted according to the combination results. According to the present invention, computational complexity may be reduced and recognition efficiency may be improved during speech recognition.
Type: Grant
Filed: March 4, 2012
Date of Patent: December 31, 2013
Assignee: Nuance Communications, Inc.
Inventors: Guo Kang Fu, Zhao Bing Han, Bin Jia, Ying Liu
-
Patent number: 8620655
Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic
Type: Grant
Filed: August 10, 2011
Date of Patent: December 31, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
-
Patent number: 8620657
Abstract: One aspect includes determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word, by obtaining a second voice signal of the speaker uttering at least one challenge utterance that includes at least one word not in the at least one enrollment utterance, obtaining at least one characteristic feature from the second voice signal, comparing the at least one characteristic feature with at least a portion of the voice print to determine a similarity between the at least one characteristic feature and the at least a portion of the voice print, and determining whether the speaker is the user based, at least in part, on the similarity.
Type: Grant
Filed: September 14, 2012
Date of Patent: December 31, 2013
Assignee: Nuance Communications, Inc.
Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
-
Publication number: 20130339020
Abstract: A display apparatus, an interactive server, and a method for providing response information are provided. The display apparatus includes: a voice collector which collects a user's uttered voice; a communication unit which communicates with an interactive server; and a controller which, if response information corresponding to the uttered voice which is transmitted to the interactive server is received from the interactive server, controls to perform an operation corresponding to the user's uttered voice based on the response information, wherein the response information is generated in a different form according to a function of the display apparatus which is classified based on an utterance element extracted from the uttered voice. Accordingly, the display apparatus can execute the function corresponding to each of the uttered voices and can output the response message corresponding to each of the uttered voices, even if a variety of uttered voices are input from the user.
Type: Application
Filed: May 6, 2013
Publication date: December 19, 2013
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Hye-hyun HEO, Hae-rim SON, Jun-hyung SHIN
-
Patent number: 8604327
Abstract: There is provided an information processing device including a storage unit that stores music data for playing music and lyrics data indicating lyrics of the music, a display control unit that displays the lyrics of the music on a screen, a playback unit that plays the music and a user interface unit that detects a user input. The lyrics data includes a plurality of blocks each having lyrics of at least one character. The display control unit displays the lyrics of the music on the screen in such a way that each block included in the lyrics data is identifiable to a user while the music is played by the playback unit. The user interface unit detects timing corresponding to a boundary of each section of the music corresponding to each displayed block in response to a first user input.
Type: Grant
Filed: March 2, 2011
Date of Patent: December 10, 2013
Assignee: Sony Corporation
Inventor: Haruto Takeda
-
Patent number: 8606581
Abstract: According to example configurations, a speech recognition system is configured to receive an utterance. Based on analyzing at least a portion of the utterance using a first speech recognition model on a first pass, the speech recognition system detects that the utterance includes a first group of one or more spoken words. The speech recognition system utilizes the first group of one or more spoken words identified in the utterance as detected on the first pass to locate a given segment of interest in the utterance. The given segment can include one or more words that are unrecognizable by the first speech recognition model. Based on analyzing the given segment using a second speech recognition model on a second pass, the speech recognition system detects one or more additional words in the utterance. A natural language understanding module utilizes the detected words to generate a command intended by the utterance.
Type: Grant
Filed: December 14, 2010
Date of Patent: December 10, 2013
Assignee: Nuance Communications, Inc.
Inventors: Holger Quast, Marcus Gröber, Mathias Maria Juliaan De Wachter, Frédéric Elie Ratle, Arthi Murugesan
-
Patent number: 8600752
Abstract: A search apparatus includes a sound recognition unit which recognizes input sound, a user information estimation unit which estimates at least one of a physical condition and emotional demeanor of a speaker of the input sound based on the input sound and outputs user information representing the estimation result, a matching unit which performs matching between a search result target pronunciation symbol string and a recognition result pronunciation symbol string for each of plural search result target word strings, and a generation unit which generates a search result word string as a search result for a word string corresponding to the input sound from the plural search result target word strings based on the matching result. At least one of the matching unit and the generation unit changes processing in accordance with the user information.
Type: Grant
Filed: May 18, 2011
Date of Patent: December 3, 2013
Assignee: Sony Corporation
Inventors: Keiichi Yamada, Hitoshi Honda
-
Patent number: 8600747
Abstract: A spoken dialog system and method having a dialog management module are disclosed. The dialog management module includes a plurality of dialog motivators for handling various operations during a spoken dialog. The dialog motivators comprise error handling, disambiguation, assumption, confirmation, missing information, and continuation. The spoken dialog system uses the assumption dialog motivator in either a-priori or a-posteriori modes. A-priori assumption is based on predefined requirements for the call flow, and a-posteriori assumption can work with the confirmation dialog motivator to assume the content of received user input and confirm received user input.
Type: Grant
Filed: June 17, 2008
Date of Patent: December 3, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alicia Abella, Allen Louis Gorin
-
Patent number: 8595004
Abstract: A problem to be solved is to robustly detect a pronunciation variation example and acquire a pronunciation variation rule having a high generalization property, with less effort. The problem can be solved by a pronunciation variation rule extraction apparatus including a speech data storage unit, a base form pronunciation storage unit, a sub word language model generation unit, a speech recognition unit, and a difference extraction unit. The speech data storage unit stores speech data. The base form pronunciation storage unit stores base form pronunciation data representing base form pronunciation of the speech data. The sub word language model generation unit generates a sub word language model from the base form pronunciation data. The speech recognition unit recognizes the speech data by using the sub word language model.
Type: Grant
Filed: November 27, 2008
Date of Patent: November 26, 2013
Assignee: NEC Corporation
Inventor: Takafumi Koshinaka
-
Patent number: 8595009
Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
Type: Grant
Filed: July 26, 2012
Date of Patent: November 26, 2013
Assignee: Dolby Laboratories Licensing Corporation
Inventors: Lie Lu, Claus Bauer
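The four section conditions can be checked with a toy enumeration over per-clip class labels. The clip length and all threshold values below are made-up parameters for illustration; the patent leaves them as predetermined constants.

```python
def candidate_sections(labels, clip_sec=1.0, min_song=3.0,
                       max_song=8.0, min_prop=0.6):
    """Enumerate sections satisfying the four conditions in the abstract.

    labels -- per-clip classes, 'm' (music) or 'n' (non-music); each clip is
    clip_sec seconds long. Returns (start, end) clip-index pairs, inclusive.
    """
    out = []
    n = len(labels)
    for i in range(n):
        if labels[i] != 'm':                 # 3) must start with a music clip
            continue
        for j in range(i, n):
            if labels[j] != 'm':             # 3) must end with a music clip
                continue
            seg = labels[i:j + 1]
            if len(seg) * clip_sec >= max_song:   # 2) shorter than max duration
                continue
            # 1) longest contiguous music run must exceed min song duration
            longest = max(len(run) for run in ''.join(seg).split('n'))
            if longest * clip_sec < min_song:
                continue
            if seg.count('m') / len(seg) < min_prop:  # 4) music proportion
                continue
            out.append((i, j))
    return out

sections = candidate_sections(list("mmmmnmm"))
```

Every surviving (start, end) pair is one possible song section; the patent then derives combinations of non-overlapping sections from such candidates.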
-
Publication number: 20130311182
Abstract: An apparatus for correcting errors in speech recognition is provided. The apparatus includes a feature vector extracting unit extracting feature vectors from a received speech. A speech recognizing unit recognizes the received speech as a word sequence on the basis of the extracted feature vectors. A phoneme weighted finite state transducer (WFST)-based converting unit converts the recognized word sequence recognized by the speech recognizing unit into a phoneme WFST. A speech recognition error correcting unit corrects errors in the converted phoneme WFST. The speech recognition error correcting unit includes a WFST synthesizing unit modeling a phoneme WFST transferred from the phoneme WFST-based converting unit as pronunciation variation on the basis of a Kullback-Leibler (KL) distance matrix.
Type: Application
Filed: May 16, 2013
Publication date: November 21, 2013
Inventors: Hong-Kook KIM, Woo-Kyeong SEONG, Ji-Hun PARK
-
Patent number: 8589162
Abstract: The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising the n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists, and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
Type: Grant
Filed: September 19, 2008
Date of Patent: November 19, 2013
Assignee: Nuance Communications, Inc.
Inventors: Remi Lejeune, Hubert Crepy
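One way to combine segment n-best lists into a global n-best list is sketched below. Taking the product of segment confidences as the joint score and using a fixed confidence floor as the pruning criterion are assumptions of this sketch, not details from the patent.

```python
import heapq
import itertools

def global_n_best(segment_lists, n=3, min_conf=0.05):
    """Combine per-segment n-best lists into a global n-best list.

    segment_lists -- one list per segment of (word, confidence) pairs.
    Joint confidence = product of segment confidences (illustrative choice);
    hypotheses below min_conf are pruned before the top n are kept.
    """
    combos = []
    for parts in itertools.product(*segment_lists):
        text = " ".join(word for word, _ in parts)
        conf = 1.0
        for _, c in parts:
            conf *= c
        combos.append((conf, text))
    # Prune by the criterion, then keep the n highest-confidence hypotheses.
    pruned = [(c, t) for c, t in combos if c >= min_conf]
    return heapq.nlargest(n, pruned)

seg1 = [("four", 0.7), ("for", 0.3)]
seg2 = [("seven", 0.8), ("heaven", 0.2)]
best = global_n_best([seg1, seg2], n=2)
```

With two segments of two candidates each, four global hypotheses are scored and the two most confident survive.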
-
Publication number: 20130304472
Abstract: Techniques are described for automatically measuring fluency of a patient's speech based on prosodic characteristics thereof. The prosodic characteristics may include statistics regarding silent pauses, filled pauses, repetitions, or fundamental frequency of the patient's speech. The statistics may include a count, average number of occurrences, duration, average duration, frequency of occurrence, standard deviation, or other statistics. In one embodiment, a method includes receiving an audio sample that includes speech of a patient, analyzing the audio sample to identify prosodic characteristics of the speech of the patient, and automatically measuring fluency of the speech of the patient based on the prosodic characteristics. These techniques may present several advantages, such as objectively measuring fluency of a patient's speech without requiring a manual transcription or other manual intervention in the analysis process.
Type: Application
Filed: July 17, 2013
Publication date: November 14, 2013
Inventor: Serguei V.S. Pakhomov
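The silent-pause statistics mentioned in the abstract could be computed from word time stamps as sketched below. The minimum-pause cutoff and the particular statistics returned are illustrative assumptions, not clinical values from the publication.

```python
import statistics

def pause_statistics(word_intervals, min_pause=0.25):
    """Silent-pause statistics from word (start, end) times in seconds.

    A gap between consecutive words of at least min_pause seconds counts
    as a silent pause (the cutoff is this sketch's assumption).
    """
    gaps = [word_intervals[i + 1][0] - word_intervals[i][1]
            for i in range(len(word_intervals) - 1)]
    pauses = [g for g in gaps if g >= min_pause]
    total_time = word_intervals[-1][1] - word_intervals[0][0]
    return {
        "pause_count": len(pauses),
        "mean_pause_dur": statistics.mean(pauses) if pauses else 0.0,
        "pauses_per_sec": len(pauses) / total_time,
    }

# (start, end) times of four recognized words, with two long gaps
stats = pause_statistics([(0.0, 0.4), (0.9, 1.3), (1.35, 1.7), (2.5, 2.9)])
```

Counts, durations, and rates like these are exactly the kind of objective measures that replace a manual transcription in the fluency analysis.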
-
Patent number: 8583433
Abstract: A system and method for efficiently transcribing verbal messages to text is provided. Verbal messages are received and at least one of the verbal messages is divided into segments. Automatically recognized text is determined for each of the segments by performing speech recognition and a confidence rating is assigned to the automatically recognized text for each segment. A threshold is applied to the confidence ratings and those segments with confidence ratings that fall below the threshold are identified. The segments that fall below the threshold are assigned to one or more human agents, starting with those segments that have the lowest confidence ratings. Transcription from the human agents is received for the segments assigned to each agent. The transcription is assembled with the automatically recognized text of the segments not assigned to the human agents as a text message for the at least one verbal message.
Type: Grant
Filed: August 6, 2012
Date of Patent: November 12, 2013
Assignee: Intellisist, Inc.
Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
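The confidence-threshold triage can be sketched as follows. The round-robin assignment across agents and the `[agent]` placeholder in the assembled text are assumptions of the sketch; the patent only says low-confidence segments go to agents, lowest first.

```python
def triage_segments(segments, threshold=0.8, num_agents=2):
    """Route low-confidence segments to human agents, lowest confidence first.

    segments -- list of (auto_text, confidence) per audio segment, in order.
    Returns the per-agent work queues (segment indices) and the assembled
    message, with pending human segments marked by a placeholder.
    """
    below = [i for i, (_, conf) in enumerate(segments) if conf < threshold]
    # Assign starting with the lowest-confidence segments, round-robin.
    below.sort(key=lambda i: segments[i][1])
    queues = {agent: [] for agent in range(num_agents)}
    for pos, seg_index in enumerate(below):
        queues[pos % num_agents].append(seg_index)
    assembled = " ".join(text if conf >= threshold else "[agent]"
                         for text, conf in segments)
    return queues, assembled

segs = [("please call", 0.95), ("bob nurse", 0.40),
        ("about the", 0.90), ("meat tomorrow", 0.60)]
queues, assembled = triage_segments(segs)
```

Once agent transcriptions arrive, the placeholders would be replaced to produce the final assembled text message.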
-
Patent number: 8583432
Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. One system for automatic speech recognition includes a dialect recognition unit and a controller. The dialect recognition unit is configured to analyze acoustic input data to identify portions of the acoustic input data that conform to a general language and to identify portions of the acoustic input data that conform to at least one dialect of the general language. In addition, the controller is configured to apply a general language model and at least one dialect language model to the input data to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions.
Type: Grant
Filed: July 25, 2012
Date of Patent: November 12, 2013
Assignee: International Business Machines Corporation
Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
-
Patent number: 8583436
Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
Type: Grant
Filed: December 19, 2008
Date of Patent: November 12, 2013
Assignee: NEC Corporation
Inventors: Hitoshi Yamamoto, Kiyokazu Miki
-
Patent number: 8577681
Abstract: A method of generating an alternative pronunciation for a word or phrase, given an initial pronunciation and a spoken example of the word or phrase, includes providing the initial pronunciation of the word or phrase, and generating the alternative pronunciation by searching a neighborhood of pronunciations about the initial pronunciation via a constrained hypothesis, wherein the neighborhood includes pronunciations that differ from the initial pronunciation by at most one phoneme. The method further includes selecting a highest scoring pronunciation within the neighborhood of pronunciations.
Type: Grant
Filed: September 13, 2004
Date of Patent: November 5, 2013
Assignee: Nuance Communications, Inc.
Inventors: Daniel L. Roth, Laurence S. Gillick, Mike Shire
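The neighborhood of pronunciations differing by at most one phoneme can be enumerated directly as all single-edit variants (substitution, insertion, or deletion of one phoneme). The phoneme inventory and list-of-phonemes representation here are illustrative assumptions:

```python
def pronunciation_neighborhood(pron, phoneme_set):
    """All pronunciations (as tuples) differing from `pron` (a list of
    phonemes) by at most one substitution, insertion, or deletion."""
    neighbors = {tuple(pron)}
    n = len(pron)
    for i in range(n):
        neighbors.add(tuple(pron[:i] + pron[i + 1:]))              # deletion
        for p in phoneme_set:                                      # substitution
            neighbors.add(tuple(pron[:i] + [p] + pron[i + 1:]))
    for i in range(n + 1):                                         # insertion
        for p in phoneme_set:
            neighbors.add(tuple(pron[:i] + [p] + pron[i:]))
    return neighbors
```

Each candidate would then be scored against the spoken example, and the highest-scoring one selected.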
-
Publication number: 20130289995
Abstract: The present invention discloses a method and device for voice control, which address the low success rate of voice control in the prior art. The method includes: classifying stored recognition information used for voice recognition to obtain a syntax packet corresponding to each type of recognition information (10); receiving an input voice signal and performing voice recognition on the received voice signal with each obtained syntax packet in turn (20); and performing a corresponding control operation based on the voice recognition result of the voice signal according to each syntax packet (30).
Type: Application
Filed: January 12, 2011
Publication date: October 31, 2013
Applicant: ZTE CORPORATION
Inventors: Manhai Li, Kaili Xiao, Jingping Wang, Xin Liao
-
Publication number: 20130289994
Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be continuously run during a standby mode, without quickly exhausting a battery supply.
Type: Application
Filed: April 26, 2012
Publication date: October 31, 2013
Inventors: Michael Jack Newman, Robert Roth, William D. Alexander, Paul van Mulbregt
-
Patent number: 8566097
Abstract: A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an input speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among the plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching words and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how well words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing a probability at which the words in the word sequence are adjacent to one another, and selects the word sequence for which the sum of the first evaluation value and the second evaluation value is highest.
Type: Grant
Filed: June 1, 2010
Date of Patent: October 22, 2013
Assignees: Honda Motor Co., Ltd., Advanced Telecommunications Research Institute International
Inventors: Mikio Nakano, Takashi Nose, Ryo Taguchi, Kotaro Funakoshi, Naoto Iwahashi
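The two evaluation values described above can be sketched as follows. Treating the first value as the fraction of words that are known teaching words, and the second as an average bigram (adjacency) probability, are assumptions made for illustration; the patent does not specify these exact formulas.

```python
def score_sequence(words, teaching_words, bigram_prob):
    """First evaluation value: fraction of words found in the teaching
    word list. Second: average adjacency (bigram) probability over
    consecutive word pairs. The winning sequence maximizes their sum."""
    first = sum(1 for w in words if w in teaching_words) / len(words)
    pairs = list(zip(words, words[1:]))
    second = sum(bigram_prob.get(p, 0.0) for p in pairs) / max(len(pairs), 1)
    return first + second
```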
-
Patent number: 8560319
Abstract: The present invention provides a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment, a method of classifying an audio stream is provided. This method includes receiving an audio stream, sampling the audio stream at a predetermined rate, and combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
Type: Grant
Filed: January 15, 2008
Date of Patent: October 15, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Qian Huang, Zhu Liu
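The sample-to-clip pipeline can be sketched end to end. The toy features (mean and peak amplitude) and the linear decision rule are illustrative assumptions; the patent's actual feature set and classifier parameters are not specified in the abstract.

```python
def make_clips(samples, clip_size):
    """Group consecutive samples into fixed-size clips, dropping a ragged tail."""
    return [samples[i:i + clip_size]
            for i in range(0, len(samples) - clip_size + 1, clip_size)]

def clip_features(clip):
    """Toy feature vector for a clip: (mean amplitude, peak magnitude)."""
    mean = sum(clip) / len(clip)
    peak = max(abs(x) for x in clip)
    return (mean, peak)

def classify(features, weights, bias):
    """Linear approximation: a weighted sum of features thresholded at zero."""
    s = sum(w * f for w, f in zip(weights, features)) + bias
    return "speech" if s > 0 else "non-speech"
```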
-
Patent number: 8560318
Abstract: A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement.
Type: Grant
Filed: May 14, 2010
Date of Patent: October 15, 2013
Assignee: Sony Computer Entertainment Inc.
Inventor: Gustavo A. Hernandez-Abrego
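The split into alignment regions and potential confusion zones can be sketched column by column, assuming (for simplicity) that the statements have already been aligned to equal length; the patent's actual alignment procedure is more general.

```python
def find_zones(statements):
    """statements: lists of words, assumed equal length after alignment.
    Returns (alignment_positions, confusion_zones): positions where all
    statements agree, and (position, competing word set) pairs elsewhere."""
    aligned, zones = [], []
    for i, column in enumerate(zip(*statements)):
        if len(set(column)) == 1:
            aligned.append(i)        # all statements share this word set
        else:
            zones.append((i, set(column)))  # potential confusion zone
    return aligned, zones
```

Each zone's word set would then be compared phonetically to estimate its confusion probability.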
-
Patent number: 8560324
Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
Type: Grant
Filed: January 31, 2012
Date of Patent: October 15, 2013
Assignee: LG Electronics Inc.
Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
-
Patent number: 8554566
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: November 29, 2012
Date of Patent: October 8, 2013
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Publication number: 20130262116
Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
Type: Application
Filed: March 27, 2012
Publication date: October 3, 2013
Applicant: NOVOSPEECH
Inventor: Yossef Ben-Ezra
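The final search step can be sketched by brute force: find the longest run of elements that appears in at least a predetermined number of the extracted sequences. Restricting the search to contiguous runs (substrings) and enumerating candidates from the first sequence are simplifying assumptions for illustration.

```python
def _contains(seq, sub):
    """True if `sub` occurs as a contiguous run inside `seq`."""
    m = len(sub)
    return any(seq[i:i + m] == sub for i in range(len(seq) - m + 1))

def common_subsequence(sequences, min_count):
    """Longest contiguous element run appearing in at least `min_count`
    of the given sequences, searching longest-first over runs taken
    from the first sequence. Returns [] if none qualifies."""
    first = sequences[0]
    n = len(first)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            cand = first[start:start + length]
            hits = sum(1 for seq in sequences if _contains(seq, cand))
            if hits >= min_count:
                return cand
    return []
```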
-
Patent number: 8548807
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method includes recognizing, via a processor, additional speech from the target speaker using the custom speech model.
Type: Grant
Filed: June 9, 2009
Date of Patent: October 1, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
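The restructured model (each dictionary phoneme scored as a weighted sum of the acoustic models of all plausible phonemes) can be sketched with one-dimensional Gaussians standing in for the per-phoneme acoustic models; real systems use multivariate mixture models, so this is purely illustrative.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian at x (toy stand-in for an acoustic model)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def restructured_likelihood(x, weights, phoneme_models):
    """Likelihood of observation x under one dictionary phoneme, modeled
    as a weighted sum over the acoustic models (mean, var pairs) of all
    plausible phonemes, as the abstract describes."""
    return sum(w * gaussian(x, m, v)
               for w, (m, v) in zip(weights, phoneme_models))
```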
-
Patent number: 8543404
Abstract: Embodiments of the present invention provide a method and computer program product for the proactive completion of input fields for automated voice enablement of a Web page. In an embodiment of the invention, a method for proactively completing empty input fields for voice enabling a Web page can be provided. The method can include receiving speech input for an input field in a Web page and inserting a textual equivalent to the speech input into the input field in a Web page. The method further can include locating an empty input field remaining in the Web page and generating a speech grammar for the input field based upon permitted terms in a core attribute of the empty input field and prompting for speech input for the input field. Finally, the method can include posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into the empty input field.
Type: Grant
Filed: April 7, 2008
Date of Patent: September 24, 2013
Assignee: Nuance Communications, Inc.
Inventors: Victor S. Moore, Wendi L. Nusbickel
-
Patent number: 8543400
Abstract: Voice processing methods and systems are provided. An utterance is received. The utterance is compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. Respective voice units are scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated according to training utterances corresponding to at least one training sentence in a phonetic-balanced sentence set of a plurality of learners and at least one real teacher, and scores corresponding to the respective voice units of the training utterances of the learners in the first scoring item provided by the real teacher.
Type: Grant
Filed: June 6, 2008
Date of Patent: September 24, 2013
Assignee: National Taiwan University
Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
-
Patent number: 8543402
Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
Type: Grant
Filed: April 29, 2011
Date of Patent: September 24, 2013
Assignee: The Intellisis Corporation
Inventor: Jiyong Ma
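The adapt-and-expand loop can be sketched with a deliberately simple stand-in for the statistical model: a running mean of feature vectors that is updated with every segment that matches well. Real systems would use Gaussian mixture or similar models; the distance threshold and incremental-mean update are illustrative assumptions.

```python
class SpeakerModel:
    """Toy statistical model: a running mean of feature vectors,
    seeded from one reliable segment and adapted with each match."""

    def __init__(self, seed_features):
        self.mean = list(seed_features)
        self.count = 1

    def distance(self, features):
        """Euclidean distance from the model mean."""
        return sum((a - b) ** 2 for a, b in zip(self.mean, features)) ** 0.5

    def matches(self, features, threshold):
        """A segment matches when it lies close enough to the model."""
        return self.distance(features) < threshold

    def adapt(self, features):
        """Incrementally fold a newly identified segment into the mean."""
        self.count += 1
        self.mean = [m + (f - m) / self.count
                     for m, f in zip(self.mean, features)]
```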
-
Patent number: 8532989
Abstract: A command recognition device includes: an utterance understanding unit that determines or selects word sequence information from speech information; a speech confidence degree calculating unit that calculates a degree of speech confidence based on the speech information and the word sequence information; a phrase confidence degree calculating unit that calculates a degree of phrase confidence based on image information and phrase information included in the word sequence information; and a motion control instructing unit that determines whether a command of the word sequence information should be executed based on the degree of speech confidence and the degree of phrase confidence.
Type: Grant
Filed: September 2, 2010
Date of Patent: September 10, 2013
Assignee: Honda Motor Co., Ltd.
Inventors: Kotaro Funakoshi, Mikio Nakano, Xiang Zuo, Naoto Iwahashi, Ryo Taguchi
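The final gating decision can be sketched in a few lines. Combining the two confidence degrees by product against a single threshold is an assumption for illustration; the abstract does not specify the combination rule.

```python
def should_execute(speech_conf, phrase_conf, threshold):
    """Execute the command only when both confidence signals jointly
    support it (product rule, assumed here for illustration)."""
    return speech_conf * phrase_conf >= threshold
```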
-
Patent number: 8532993
Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
Type: Grant
Filed: July 2, 2012
Date of Patent: September 10, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Andrej Ljolje
-
Patent number: 8532988
Abstract: A method for searching for an input symbol string includes receiving (B) an input symbol string, proceeding (C) in a trie data structure to a calculation point indicated by the next symbol, calculating (D) distances at the calculation point, and repeatedly selecting (E) the next branch to follow (C) to the next calculation point and repeat the calculation (D). After the calculations are complete (G), the symbol string having the shortest distance to the input symbol string is selected on the basis of the performed calculations. To minimize the number of calculations, not only are the distances calculated (D) at the calculation points, but also the smallest possible length difference corresponding to each distance; on the basis of each distance and its corresponding length difference a reference value is calculated, and the branch is selected (E) in such a manner that the routine next proceeds from the calculation point producing the lowest reference value.
Type: Grant
Filed: July 3, 2003
Date of Patent: September 10, 2013
Assignee: Syslore Oy
Inventor: Jorkki Hyvonen
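The core idea (walking a trie while maintaining edit distances and pruning branches that cannot beat the best match found so far) can be sketched with a Levenshtein dynamic-programming row carried down each trie path. The pruning criterion here is the simple row-minimum bound rather than the patent's length-difference reference value, which is a simplification.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # symbol -> TrieNode
        self.word = None    # set on the node ending a stored string

def insert(root, word):
    """Store `word` in the trie rooted at `root`."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word

def nearest(root, query):
    """Return (best_word, distance): the stored string with the smallest
    edit distance to `query`, found by carrying a Levenshtein DP row
    down the trie and pruning hopeless branches."""
    best = [None, float("inf")]
    row = list(range(len(query) + 1))  # distance from the empty prefix
    for ch, child in root.children.items():
        _search(child, ch, query, row, best)
    return best[0], best[1]

def _search(node, ch, query, prev_row, best):
    cols = len(query) + 1
    row = [prev_row[0] + 1]
    for c in range(1, cols):
        cost = 0 if query[c - 1] == ch else 1
        row.append(min(row[c - 1] + 1,        # insertion
                       prev_row[c] + 1,       # deletion
                       prev_row[c - 1] + cost))  # match / substitution
    if node.word is not None and row[-1] < best[1]:
        best[0], best[1] = node.word, row[-1]
    if min(row) < best[1]:  # prune: no extension can beat the current best
        for nch, child in node.children.items():
            _search(child, nch, query, row, best)
```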
-
Publication number: 20130231934
Abstract: A system and method is provided for recognizing a speech input and selecting an entry from a list of entries. The method includes recognizing a speech input. A fragment list of fragmented entries is provided and compared to the recognized speech input to generate a candidate list of best matching entries based on the comparison result. The system includes a speech recognition module, and a database for storing the list of entries and the fragmented list. The speech recognition module may obtain the fragmented list from the database and store a candidate list of best matching entries in memory. A display may also be provided to allow a user to select from a list of best matching entries.
Type: Application
Filed: March 18, 2013
Publication date: September 5, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Markus Schwarz
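The fragment-matching step can be sketched by scoring each list entry by how many of its fragments occur in the recognized input. Whitespace fragmentation and the coverage-ratio score are assumptions for illustration; the publication's actual fragmentation scheme is not given in the abstract.

```python
def candidate_list(recognized, entries, fragments_of, top_n=3):
    """Rank `entries` by the fraction of their fragments found in the
    recognized input string; return the top_n best matching entries."""
    scored = []
    for entry in entries:
        frags = fragments_of(entry)
        score = sum(1 for f in frags if f in recognized) / len(frags)
        scored.append((score, entry))
    scored.sort(key=lambda t: -t[0])  # best matching entries first
    return [entry for score, entry in scored[:top_n]]
```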