Subportions Patents (Class 704/254)
-
Patent number: 8527272
Abstract: A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at the word level based on phoneme similarity. The method can be applied to automatic archiving of a multimedia resource and to automatic searching of a multimedia resource.
Type: Grant
Filed: August 27, 2010
Date of Patent: September 3, 2013
Assignee: International Business Machines Corporation
Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang, Jie Zhou
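The patent does not publish its algorithm, but the idea of word-level alignment driven by phoneme similarity can be sketched as a Needleman-Wunsch alignment whose match score is a per-word phoneme edit-distance similarity. The toy `lexicon` mapping words to phoneme lists is an assumption for illustration (real systems would use a pronunciation dictionary):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def phoneme_similarity(w1, w2, lexicon):
    """Similarity in [0, 1] from phoneme edit distance (lexicon is assumed)."""
    p1, p2 = lexicon[w1], lexicon[w2]
    return 1.0 - edit_distance(p1, p2) / max(len(p1), len(p2))

def align(target, reference, lexicon, gap=-0.5):
    """Needleman-Wunsch word alignment maximizing phoneme similarity."""
    m, n = len(target), len(reference)
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        score[i][0] = i * gap
    for j in range(n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1]
                + phoneme_similarity(target[i - 1], reference[j - 1], lexicon),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap)
    # Trace back to recover the aligned word pairs.
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        diag = score[i - 1][j - 1] + phoneme_similarity(
            target[i - 1], reference[j - 1], lexicon)
        if score[i][j] == diag:
            pairs.append((target[i - 1], reference[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Homophones such as "two"/"too" then align even though their spellings differ, which is the point of using phoneme rather than string similarity.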
-
Publication number: 20130226583
Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring at each point in the audio files, across the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs, which are based on the set of unique phoneme patterns created for each language.
Type: Application
Filed: March 18, 2013
Publication date: August 29, 2013
Applicant: Autonomy Corporation Limited
Inventor: Autonomy Corporation Limited
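A minimal sketch of the SLM-scoring idea, not the patented system: train one smoothed phoneme-bigram model per language, then identify the language whose model assigns the decoded phoneme sequence the highest log-probability. The training data and phoneme symbols below are made up for illustration:

```python
import math
from collections import defaultdict

def train_bigram(sequences, alpha=1.0):
    """Add-alpha smoothed bigram model over phoneme symbols.

    Returns a scoring function mapping a phoneme list to a log-probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for seq in sequences:
        for a, b in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[a][b] += 1
            vocab.update([a, b])
    V = len(vocab)

    def logprob(seq):
        total = 0.0
        for a, b in zip(["<s>"] + seq, seq + ["</s>"]):
            row = counts[a]
            total += math.log((row[b] + alpha) / (sum(row.values()) + alpha * V))
        return total

    return logprob

def identify(phonemes, models):
    """Return the language whose SLM scores the phoneme sequence highest."""
    return max(models, key=lambda lang: models[lang](phonemes))
```

In the patented design the phonemes themselves come from the universal decoder, so one decoding pass feeds every language's SLM; the sketch only shows the scoring stage.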
-
Patent number: 8521527
Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts, each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized, is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with the signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
Type: Grant
Filed: September 10, 2012
Date of Patent: August 27, 2013
Assignee: Intellisist, Inc.
Inventor: Martin R. M. Dunsmuir
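The dispatch loop this describes can be sketched as follows. This is a hypothetical illustration: signatures here are plain strings and `signature_fn` is a stand-in for the patent's audio-signature computation, which is not published:

```python
def run_call(buffers, host_script, signature_fn):
    """Match each incoming buffer's signature against a host script.

    host_script maps phrase signatures to zero-argument action callables;
    the matching action is executed and its result collected.
    """
    executed = []
    for buf in buffers:
        sig = signature_fn(buf)          # stand-in for audio signature extraction
        action = host_script.get(sig)    # compare against expected phrases
        if action is not None:
            executed.append(action())    # run the script's stored action
    return executed
```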
-
Patent number: 8521529
Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
Type: Grant
Filed: April 18, 2005
Date of Patent: August 27, 2013
Assignee: Creative Technology Ltd
Inventors: Michael M. Goodwin, Jean Laroche
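The dynamic-programming step can be illustrated in isolation (the LDA projection is omitted here): given a 1-D feature track, choose segment boundaries that minimize total within-segment variance, which yields globally optimal boundaries rather than greedy local ones. A hedged sketch, not the patented algorithm:

```python
def within_cost(x, i, j):
    """Sum of squared deviations of x[i:j] from its mean."""
    seg = x[i:j]
    mu = sum(seg) / len(seg)
    return sum((v - mu) ** 2 for v in seg)

def segment(x, k):
    """Optimal split of x into k contiguous segments (O(k * n^2) DP).

    Returns the k-1 interior boundary indices.
    """
    n = len(x)
    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                c = cost[s - 1][i] + within_cost(x, i, j)
                if c < cost[s][j]:
                    cost[s][j] = c
                    back[s][j] = i
    # Walk back through the table to recover boundaries.
    bounds, j = [], n
    for s in range(k, 0, -1):
        bounds.append(j)
        j = back[s][j]
    return sorted(bounds)[:-1]  # drop the trailing end-of-signal index
```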
-
Publication number: 20130218563
Abstract: A speech recognition system includes a mobile device and a remote server. The mobile device receives the speech from the user and extracts the features and phonemes from the speech. Selected phonemes and measures of uncertainty are transmitted to the server, which processes the phonemes for speech understanding and transmits a text of the speech (or the context or understanding of the speech) back to the mobile device.
Type: Application
Filed: January 29, 2013
Publication date: August 22, 2013
Applicant: Intelligent Mechatronic Systems Inc.
Inventor: Intelligent Mechatronic Systems Inc.
-
Patent number: 8515749
Abstract: Systems and methods for facilitating communication, including recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.
Type: Grant
Filed: May 20, 2009
Date of Patent: August 20, 2013
Assignee: Raytheon BBN Technologies Corp.
Inventor: David G. Stallard
-
Patent number: 8515753
Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during the training process of the acoustic models. When the present invention for adapting acoustic models is combined with a conventional acoustic model adaptation scheme, further enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
Type: Grant
Filed: March 30, 2007
Date of Patent: August 20, 2013
Assignee: Gwangju Institute of Science and Technology
Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
-
Patent number: 8515750
Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
Type: Grant
Filed: September 19, 2012
Date of Patent: August 20, 2013
Assignee: Google Inc.
Inventors: Xin Lei, Petar Aleksic
-
Patent number: 8510111
Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of the frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
Type: Grant
Filed: February 8, 2008
Date of Patent: August 13, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
-
Patent number: 8504367
Abstract: Disclosed are a speech retrieval apparatus and a speech retrieval method for searching, in a speech database, for an audio file matching an input search term by using an acoustic model serialization code, a phonemic code, a sub-word unit, and a speech recognition result of speech. The speech retrieval apparatus comprises a first conversion device, a first division device, a first speech retrieval unit creation device, a second conversion device, a second division device, a second speech retrieval unit creation device, and a matching device. The speech retrieval method comprises a first conversion step, a first division step, a first speech retrieval unit creation step, a second conversion step, a second division step, a second speech retrieval unit creation step, and a matching step.
Type: Grant
Filed: August 31, 2010
Date of Patent: August 6, 2013
Assignee: Ricoh Company, Ltd.
Inventors: Dafei Shi, Yaojie Lu, Yueyan Yin, Jichuan Zheng, Lijun Zhao
-
Patent number: 8498859
Abstract: A language-processing system has an input for language in text or audio, as a message, an extractor operating to separate words and phrases from the input, to consult a knowledge base, and to assign a concept to individual ones of the words or phrases, and a connector operating to link the concepts to form a statement. In some cases there is a situation model updated as language is processed. The system may be used for controlling technical systems, such as robotic systems.
Type: Grant
Filed: November 12, 2003
Date of Patent: July 30, 2013
Inventor: Bernd Schönebeck
-
Patent number: 8498871
Abstract: A system for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes a system transaction manager, having a "system protocol," to receive a speech information request from an authorized user. The speech information request is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications. A speech recognition and/or transcription engine (ASR), in communication with the system transaction manager, receives the speech information request, generates a transcribed response, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR.
Type: Grant
Filed: May 24, 2011
Date of Patent: July 30, 2013
Assignee: Advanced Voice Recognition Systems, Inc.
Inventors: Joseph H. Miglietta, Michael K. Davis
-
Publication number: 20130191129
Abstract: System and method for performing speech recognition using acoustic invariant structure for large-vocabulary continuous speech. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit for outputting a speech recognition score; a structure score calculation unit for calculating a structure score, found for each hypothesis by applying phoneme-pair-by-phoneme-pair weighting to the phoneme pair inter-distribution distance likelihood over all phoneme pairs comprising the hypothesis and then summing; and a ranking unit for ranking the multiple hypotheses based on the sum of the speech recognition score and the structure score.
Type: Application
Filed: January 18, 2013
Publication date: July 25, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: International Business Machines Corporation
-
Publication number: 20130191128
Abstract: A continuous phonetic recognition method using a semi-Markov model, a system for processing the method, and a recording medium storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model.
Type: Application
Filed: August 28, 2012
Publication date: July 25, 2013
Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Chang Dong Yoo, Sung Woong Kim
-
Patent number: 8494850
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.
Type: Grant
Filed: June 29, 2012
Date of Patent: July 23, 2013
Assignee: Google Inc.
Inventors: Ciprian I. Chelba, Peng Xu, Fernando Pereira
-
Publication number: 20130185073
Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.
Type: Application
Filed: March 6, 2013
Publication date: July 18, 2013
Applicant: Nuance Communications Austria GmbH
Inventor: Nuance Communications Austria GmbH
-
Patent number: 8489398
Abstract: A method is performed by a communication device that is configured to communicate with a server over a network. The method includes outputting, to the server, speech data for spoken words; receiving, from the server, speech recognition candidates for a spoken word in the speech data; checking the speech recognition candidates against a database on the communication device; and selecting one or more of the speech recognition candidates for use by the communication device based on the checking.
Type: Grant
Filed: January 14, 2011
Date of Patent: July 16, 2013
Assignee: Google Inc.
Inventor: Alexander H. Gruenstein
-
Publication number: 20130179169
Abstract: A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and readability indices analysis in conjunction with the readability mathematical model.
Type: Application
Filed: July 5, 2012
Publication date: July 11, 2013
Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
Inventors: Yao-Ting Sung, Ju-Ling Chen
-
Patent number: 8478597
Abstract: The present disclosure presents a useful metric for assessing the relative difficulty which non-native speakers face in pronouncing a given utterance, and a method and systems for using such a metric in the evaluation and assessment of the utterances of non-native speakers. In an embodiment, the metric may be based on both known sources of difficulty for language learners and a corpus-based measure of cross-language sound differences. The method may be applied to speakers who primarily speak a first language and who produce utterances in any non-native second language.
Type: Grant
Filed: January 10, 2006
Date of Patent: July 2, 2013
Assignee: Educational Testing Service
Inventors: Derrick Higgins, Klaus Zechner, Yoko Futagi, Rene Lawless
-
Publication number: 20130159000
Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels, and updates a classification model associated with the classifier using the pseudo-semantic label.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Yun-Cheng Ju, James Garnet Droppo, III
-
Patent number: 8463609
Abstract: In the present invention, a voice input system and a voice input method are provided. The voice input method includes the steps of: (A) initiating a speech recognition process by a first input associated with a first parameter of a first speech recognition subject; (B) providing a voice and a searching space constructed by a speech recognition model associated with the first speech recognition subject; (C) obtaining a sub-searching space from the searching space based on the first parameter; (D) searching at least one candidate item associated with the voice from the sub-searching space; and (E) showing the at least one candidate item.
Type: Grant
Filed: April 29, 2009
Date of Patent: June 11, 2013
Assignee: Delta Electronics Inc.
Inventors: Keng-Hung Yeh, Liang-Sheng Huang, Chao-Jen Huang, Jia-Lin Shen
-
Patent number: 8457967
Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score to quantify the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
Type: Grant
Filed: August 15, 2009
Date of Patent: June 4, 2013
Assignee: Nuance Communications, Inc.
Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
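Two of the lexical features named in the abstract are easy to make concrete: the count of closely-occurring exact repeat N-grams, and the normalized average distance between consecutive occurrences of an N-gram. A hypothetical illustration (the patent's exact feature definitions, window size, and normalization are assumptions here):

```python
from collections import defaultdict

def ngram_positions(words, n):
    """Map each n-gram to the list of positions where it occurs."""
    pos = defaultdict(list)
    for i in range(len(words) - n + 1):
        pos[tuple(words[i:i + n])].append(i)
    return pos

def repeat_features(words, n=2, window=5):
    """Return (close repeat count, normalized mean gap between repeats).

    A "close" repeat is a pair of consecutive occurrences of the same
    n-gram no more than `window` words apart; gaps are normalized by
    the utterance length.
    """
    close, gaps = 0, []
    for gram, idxs in ngram_positions(words, n).items():
        for a, b in zip(idxs, idxs[1:]):
            gaps.append((b - a) / len(words))
            if b - a <= window:
                close += 1
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return close, mean_gap
```

On a disfluent fragment like "i want i want to go", the repeated bigram "i want" registers as one close repeat, which is exactly the kind of pattern the fluency score penalizes.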
-
Publication number: 20130138441
Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
Type: Application
Filed: August 14, 2012
Publication date: May 30, 2013
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
-
Publication number: 20130124205
Abstract: A system allows a user to obtain information about television programming and to make selections of programming using conversational speech. The system includes a speech recognizer that recognizes spoken requests for television programming information. A speech synthesizer generates spoken responses to the spoken requests for television programming information. A user may use a voice user interface as well as a graphical user interface to interact with the system to facilitate the selection of programming choices.
Type: Application
Filed: January 3, 2013
Publication date: May 16, 2013
Inventor: Christopher H. Genly
-
Patent number: 8442827
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages.
Type: Grant
Filed: June 18, 2010
Date of Patent: May 14, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Nicholas Duffield
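The aggregation step can be sketched under assumed data shapes: each customer service contributes a dictionary of country-specific weights, the totals are normalized into a summary weight per country, and the accent/language model is chosen from the largest. This is an illustrative reading of the abstract, not the patented implementation:

```python
def summary_weights(service_weights):
    """Aggregate per-service country weights into a normalized summary."""
    totals = {}
    for weights in service_weights:
        for country, w in weights.items():
            totals[country] = totals.get(country, 0.0) + w
    s = sum(totals.values())
    return {c: w / s for c, w in totals.items()}

def pick_language_model(service_weights):
    """Choose the accent source with the highest summary weight."""
    totals = summary_weights(service_weights)
    return max(totals, key=totals.get)
```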
-
Patent number: 8438027
Abstract: An object of the invention is to conveniently increase the standard patterns registered in a voice recognition device to efficiently extend the number of words that can be voice-recognized. New standard patterns are generated by modifying a part of an existing standard pattern. A pattern matching unit 16 of a modifying-part specifying unit 14 performs a pattern matching process to specify a part to be modified in the existing standard pattern of a usage source. A standard pattern generating unit 18 generates the new standard patterns by cutting or deleting voice data of the modifying part of the usage-source standard pattern, substituting the voice data of the modifying part of the usage-source standard pattern for other voice data, or combining the voice data of the modifying part of the usage-source standard pattern with other voice data. A standard pattern database update unit 20 adds the new standard patterns to a standard pattern database 24.
Type: Grant
Filed: May 25, 2006
Date of Patent: May 7, 2013
Assignee: Panasonic Corporation
Inventors: Toshiyuki Teranishi, Kouji Hatano
-
Patent number: 8438028
Abstract: A method of and system for managing nametags including receiving a command from a user to store a nametag, prompting the user to input a number to be stored in association with the nametag, receiving an input for the number from the user, prompting the user to input the nametag to be stored in association with the number, receiving an input for the nametag from the user, processing the nametag input, and calculating confusability of the nametag input in multiple individual domains including a nametag domain, a number domain, and a command domain.
Type: Grant
Filed: May 18, 2010
Date of Patent: May 7, 2013
Assignee: General Motors LLC
Inventors: Rathinavelu Chengalvarayan, Lawrence D. Cepuran
-
Patent number: 8433575
Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing, for example, a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules, and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
Type: Grant
Filed: December 10, 2003
Date of Patent: April 30, 2013
Assignee: AMBX UK Limited
Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
-
Patent number: 8433573
Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme, by using data representing a regular or statistical phoneme length in an utterance of a human, with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
Type: Grant
Filed: February 11, 2008
Date of Patent: April 30, 2013
Assignee: Fujitsu Limited
Inventors: Kentaro Murase, Nobuyuki Katae
-
Apparatus and method for classification and segmentation of audio content, based on the audio signal
Patent number: 8428949
Abstract: An apparatus for classifying an input audio signal into audio contents of a first and second class, comprising an audio segmentation module adapted to segment said input audio signal into segments of a predetermined length; a feature computation module adapted to calculate for the segments features characterizing said audio input signal; a threshold comparison module adapted to generate a feature vector for each of said one or more segments based on a plurality of predetermined thresholds, the thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold; and a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents.
Type: Grant
Filed: June 30, 2009
Date of Patent: April 23, 2013
Assignee: Waves Audio Ltd.
Inventors: Itai Neoran, Yizhar Lavner, Dima Ruinskiy
-
Patent number: 8428950
Abstract: A speech recognition apparatus (110) selects an optimum recognition result from recognition results output from a set of speech recognizers (s1-sM) based on a majority decision. This decision is implemented taking into account weight values, for the set of speech recognizers, learned by a learning apparatus (100). The learning apparatus includes a unit (103) selecting speech recognizers corresponding to characteristics of speech for learning (101), a unit (104) finding recognition results of the speech for learning by using the selected speech recognizers, a unit (105) unifying the recognition results and generating a word string network, and a unit (106) finding weight values concerning the set of speech recognizers by implementing learning processing.
Type: Grant
Filed: January 18, 2008
Date of Patent: April 23, 2013
Assignee: NEC Corporation
Inventors: Yoshifumi Onishi, Tadashi Emori
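The weighted majority decision can be sketched as follows. This is a simplification of the patented scheme: it assumes the recognizer outputs are already aligned position-by-position (the patent builds a word string network for this), and the per-recognizer weights are given rather than learned:

```python
from collections import defaultdict

def weighted_vote(hypotheses, weights):
    """Combine aligned word strings from several recognizers.

    hypotheses: list of equal-length word lists, one per recognizer.
    weights:    learned reliability weight per recognizer.
    At each position, the word with the largest total weight wins.
    """
    n = len(hypotheses[0])
    result = []
    for i in range(n):
        tally = defaultdict(float)
        for hyp, w in zip(hypotheses, weights):
            tally[hyp[i]] += w
        result.append(max(tally, key=tally.get))
    return result
```

Note that with suitable weights, two weaker recognizers agreeing can outvote a single stronger one, which is the benefit of the ensemble over picking one recognizer outright.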
-
Patent number: 8417528
Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.
Type: Grant
Filed: February 3, 2012
Date of Patent: April 9, 2013
Assignee: Nuance Communications Austria GmbH
Inventor: Zsolt Saffer
-
Patent number: 8417527
Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
Type: Grant
Filed: October 13, 2011
Date of Patent: April 9, 2013
Assignee: Nuance Communications, Inc.
Inventors: Nitendra Rajput, Ashish Verma
-
Publication number: 20130085757
Abstract: An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device; a selection unit, utilizing a signal from one or more sensors embedded in the device, configured to select one of the trigger detection units, the selected trigger detection unit being appropriate to the usage environment of the device; and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
Type: Application
Filed: June 29, 2012
Publication date: April 4, 2013
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masanobu NAKAMURA, Akinori KAWAMURA
-
Patent number: 8412525
Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.
Type: Grant
Filed: April 30, 2009
Date of Patent: April 2, 2013
Assignee: Microsoft Corporation
Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan
-
Patent number: 8407047
Abstract: A guidance information display device includes: a voice input unit; a display unit for displaying guidance information; an operation unit for accepting an operation; and a processor capable of executing the following processes: a voice recognition process operation of performing voice recognition based on inputted voice; a calculation operation of calculating an evaluation value for a recognition result of voice recognition by the voice recognition process operation; a display operation of reading out guidance information corresponding to the recognition result from a storage unit, which stores the guidance information, and displaying the guidance information at a display unit; and a decision operation of deciding a display mode of the guidance information at the display unit based on a variable value, which varies with an operation from the operation unit for the guidance information displayed by the display operation, and the evaluation value calculated by the calculation operation.
Type: Grant
Filed: March 31, 2009
Date of Patent: March 26, 2013
Assignee: Fujitsu Limited
Inventor: Kenji Abe
-
Patent number: 8401861Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.Type: GrantFiled: January 17, 2007Date of Patent: March 19, 2013Assignee: Nuance Communications, Inc.Inventors: Shuang Zhi Wei, Raimo Bakis, Ellen Marie Eide, Liqin Shen
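The piecewise linear warping described in patent 8401861 is straightforward to sketch: corresponding formant frequencies from the source and target speaker serve as key positions, and frequencies in between are mapped by linear interpolation. The formant values below are made up for illustration.

```python
import numpy as np

# Sketch of a piecewise-linear frequency warping function (patent 8401861).
# Key positions pair source-speaker formants with target-speaker formants;
# the endpoints pin the warp at 0 Hz and the Nyquist-like upper bound.
src_formants = [0.0, 700.0, 1200.0, 2600.0, 8000.0]  # source speaker (Hz)
tgt_formants = [0.0, 620.0, 1050.0, 2400.0, 8000.0]  # target speaker (Hz)

def warp(freq_hz):
    """Map a source-speaker frequency to the target speaker's scale."""
    return float(np.interp(freq_hz, src_formants, tgt_formants))

print(warp(700.0))  # 620.0: a key position maps exactly
print(warp(950.0))  # 835.0: halfway between the 700 Hz and 1200 Hz keys
```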
-
Patent number: 8401847Abstract: An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9 and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word.Type: GrantFiled: November 30, 2007Date of Patent: March 19, 2013Assignee: National Institute of Advanced Industrial Science and TechnologyInventors: Jun Ogata, Masataka Goto
-
Patent number: 8396714Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.Type: GrantFiled: September 29, 2008Date of Patent: March 12, 2013Assignee: Apple Inc.Inventors: Matthew Rogers, Kim Silverman, Devang Naik, Benjamin Rottler
-
Publication number: 20130060572Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript including updating the time location associated with terms of the transcript based on the determined correspondence.Type: ApplicationFiled: September 4, 2012Publication date: March 7, 2013Applicant: Nexidia Inc.Inventors: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
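The correspondence step in publication 20130060572 can be sketched as a robust offset estimate: search terms carry timestamps from the other version of the audio, locating them in the target recording yields time pairs, and a robust statistic of the pairwise differences updates every transcript timestamp. The times and the use of a median are illustrative assumptions.

```python
import statistics

# Toy sketch of transcript/audio alignment (publication 20130060572).
# Term times from the old audio version vs. where the same terms were
# located in the new recording; all values are invented.
transcript_times = [1.0, 5.5, 9.0]   # seconds, in the different version
located_times = [3.1, 7.5, 11.0]     # seconds, found in the recording

offsets = [b - a for a, b in zip(transcript_times, located_times)]
shift = statistics.median(offsets)   # robust to a few false matches
aligned = [t + shift for t in transcript_times]
print(aligned)  # [3.0, 7.5, 11.0]
```

A production aligner would handle piecewise offsets (edits, cuts) rather than a single global shift.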
-
Patent number: 8386254Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.Type: GrantFiled: May 2, 2008Date of Patent: February 26, 2013Assignee: Nuance Communications, Inc.Inventors: Neeraj Deshmukh, Puming Zhan
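The transform step in patent 8386254 resembles affine feature adaptation: incoming feature vectors are transformed per feature class before being scored against speaker-independent models. The sketch below assumes an fMLLR-style affine transform with invented matrices; the patent itself does not specify these values.

```python
import numpy as np

# Hypothetical per-class affine feature adaptation (cf. patent 8386254).
# A and b are speaker-dependent parameters for one feature class; they
# are made-up numbers for illustration.
A = np.array([[1.05, 0.0],
              [0.0, 0.92]])
b = np.array([0.1, -0.2])

def adapt(features):
    """Apply x -> A x + b to each row of a (frames x dims) array."""
    return features @ A.T + b

frames = np.array([[1.0, 2.0], [0.5, -1.0]])
print(adapt(frames))  # [[1.15, 1.64], [0.625, -1.12]]
```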
-
Patent number: 8380506Abstract: Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.Type: GrantFiled: November 29, 2007Date of Patent: February 19, 2013Assignee: Georgia Tech Research CorporationInventors: Woojay Jeon, Biing-Hwang Juang
-
Patent number: 8380505Abstract: A system is provided for recognizing speech for searching a database. The system receives speech input as a spoken search request and then processes the speech input in a speech recognition step using a vocabulary for recognizing the spoken request. By processing the speech input, words recognized in the speech input and included in the vocabulary are obtained to form at least one hypothesis. The hypothesis is then utilized to search a database using the at least one hypothesis as a search query. A search result is then received from the database and provided to the user.Type: GrantFiled: October 24, 2008Date of Patent: February 19, 2013Assignee: Nuance Communications, Inc.Inventors: Lars König, Andreas Löw, Udo Haiber
-
Patent number: 8374873Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.Type: GrantFiled: August 11, 2009Date of Patent: February 12, 2013Assignee: Morphism, LLCInventor: James H. Stephens, Jr.
-
Patent number: 8374845Abstract: A word coinciding with a key word input by speech and a word related to the word are set as retrieval candidate words based on a word dictionary in which words representing formal names and aliases of the formal names are registered in association with a family attribute indicating a familiar relation among the words. Content related to any one of retrieval words selected out of the retrieval candidate words and a word related to the retrieval word is retrieved.Type: GrantFiled: February 29, 2008Date of Patent: February 12, 2013Assignee: Kabushiki Kaisha ToshibaInventors: Miwako Doi, Kaoru Suzuki, Toshiyuki Koga, Koichi Yamamoto
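The dictionary lookup in patent 8374845 can be sketched as alias expansion: formal names and their aliases share a family attribute, so a spoken keyword expands into the full set of retrieval candidate words. The dictionary entries below are invented examples.

```python
# Toy word dictionary with a family attribute (cf. patent 8374845):
# formal names and aliases that belong together share a family tag.
WORD_DICT = {
    "television": "tv_family",
    "TV": "tv_family",
    "telly": "tv_family",
    "radio": "radio_family",
}

def retrieval_candidates(keyword):
    """Expand a recognized keyword into all words of its family."""
    family = WORD_DICT.get(keyword)
    if family is None:
        return [keyword]  # unknown word: search for it as-is
    return sorted(w for w, f in WORD_DICT.items() if f == family)

print(retrieval_candidates("TV"))  # ['TV', 'television', 'telly']
```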
-
Patent number: 8374868Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.Type: GrantFiled: August 21, 2009Date of Patent: February 12, 2013Assignee: General Motors LLCInventors: Uma Arun, Sherri J Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
-
Patent number: 8374869Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.Type: GrantFiled: August 4, 2009Date of Patent: February 12, 2013Assignee: Electronics and Telecommunications Research InstituteInventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
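The confidence measure in patent 8374869 is built from log likelihoods of phoneme and anti-phoneme models. A minimal sketch, assuming the score is the averaged log-likelihood ratio and using invented numbers and an invented threshold:

```python
# Sketch of utterance verification via log-likelihood ratios
# (cf. patent 8374869). Log likelihoods and threshold are invented.
def confidence(phone_loglikes, anti_loglikes):
    """Average per-phoneme log-likelihood ratio of phoneme vs.
    anti-phoneme models."""
    ratios = [p - a for p, a in zip(phone_loglikes, anti_loglikes)]
    return sum(ratios) / len(ratios)

def verify(phone_loglikes, anti_loglikes, threshold=0.5):
    """Accept the recognized word only if its score clears the threshold."""
    return confidence(phone_loglikes, anti_loglikes) >= threshold

print(verify([-2.0, -1.5, -2.2], [-3.1, -2.4, -2.6]))  # True: score 0.8
print(verify([-2.0, -2.1], [-2.1, -2.0]))              # False: score 0.0
```

The patent additionally compares an inter-phoneme distance against a mean distance; that second check is omitted here for brevity.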
-
Publication number: 20130035939Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.Type: ApplicationFiled: October 11, 2012Publication date: February 7, 2013Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
-
Patent number: 8369492Abstract: A method, apparatus, computer program product and service for directory dialer name recognition. The directory dialer has a directory of names and a first name grammar and a second name grammar representing phonetic baseforms of first names and second names respectively. The method includes: receiving voice data for a spoken name after requesting a user to speak the required name; extracting a set of phonetic baseforms for the voice data; and finding the best matches between the extracted set of phonetic baseforms for the voice data and any combination of the first name grammar and the second name grammar. The method can further include: checking the best match against the directory of names; if the best match does not exist in the directory, informing the user and prompting the next best match as an alternative; and if the best match does exist in the directory, forwarding the call to that best match.Type: GrantFiled: July 7, 2008Date of Patent: February 5, 2013Assignee: Nuance Communications, Inc.Inventors: Eric William Janke, Keith Sloan
-
Patent number: 8370144Abstract: A method for identifying end of voiced speech within an audio stream of a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream, producing an output corresponding to the window. The output is used to classify the window in one of several classes, for example, (1) speech, (2) silence, or (3) noise. A state machine processes the window classifications, incrementing counters as each window is classified: speech counter for speech windows, silence counter for silence, and noise counter for noise. If the speech counter indicates a predefined number of windows, the state machine clears all counters. Otherwise, the state machine appropriately weights the values in the silence and noise counters, adds the weighted values, and compares the sum to a limit imposed on the number of non-voice windows. When the non-voice limit is reached, the state machine terminates processing of the audio stream.Type: GrantFiled: June 3, 2010Date of Patent: February 5, 2013Assignee: Applied Voice & Speech Technologies, Inc.Inventor: Karl D. Gierach
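The state machine in patent 8370144 can be sketched directly from the abstract: each window is classified as speech, silence, or noise; enough speech windows clear all counters; otherwise a weighted sum of the silence and noise counters is compared against a non-voice limit. The weights, reset count, and limit below are invented parameters.

```python
# Sketch of the end-of-voiced-speech state machine (patent 8370144).
# All parameter values are illustrative assumptions.
SPEECH_RESET = 3        # speech windows needed to clear the counters
NON_VOICE_LIMIT = 5.0   # weighted non-voice budget
W_SILENCE, W_NOISE = 1.0, 0.5

def end_of_speech(window_labels):
    """Return True if the weighted non-voice count ever reaches the limit."""
    speech = silence = noise = 0
    for label in window_labels:
        if label == "speech":
            speech += 1
            if speech >= SPEECH_RESET:
                speech = silence = noise = 0  # clear all counters
        elif label == "silence":
            silence += 1
        else:
            noise += 1
        if W_SILENCE * silence + W_NOISE * noise >= NON_VOICE_LIMIT:
            return True  # terminate processing of the audio stream
    return False

print(end_of_speech(["speech"] * 4 + ["silence"] * 5))  # True
print(end_of_speech(["speech", "silence"] * 6))         # False
```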