Subportions Patents (Class 704/254)
  • Patent number: 8527272
    Abstract: A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and a method of automatically searching a multimedia resource.
    Type: Grant
    Filed: August 27, 2010
    Date of Patent: September 3, 2013
    Assignee: International Business Machines Corporation
    Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang, Jie Zhou
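The word-level alignment this abstract describes can be sketched as a dynamic-programming alignment whose substitution score is a phoneme-similarity measure. The toy `g2p` lexicon, the similarity measure, and the gap penalty below are illustrative assumptions, not the patent's actual method:

```python
from difflib import SequenceMatcher

def phoneme_similarity(p1, p2):
    """Similarity in [0, 1] between two phoneme sequences (hypothetical measure)."""
    return SequenceMatcher(None, p1, p2).ratio()

def align_words(target, reference, g2p, gap=-0.25):
    """Needleman-Wunsch-style alignment of two word lists by phoneme similarity.

    g2p maps a word to its phoneme list; returns (target_word, reference_word)
    pairs, with None marking insertions/deletions.
    """
    n, m = len(target), len(reference)
    # score[i][j]: best alignment score of target[:i] vs reference[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = phoneme_similarity(g2p[target[i - 1]], g2p[reference[j - 1]])
            score[i][j] = max(score[i - 1][j - 1] + sim,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # traceback from the bottom-right corner
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        sim = phoneme_similarity(g2p[target[i - 1]], g2p[reference[j - 1]])
        if score[i][j] == score[i - 1][j - 1] + sim:
            pairs.append((target[i - 1], reference[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            pairs.append((target[i - 1], None))
            i -= 1
        else:
            pairs.append((None, reference[j - 1]))
            j -= 1
    while i > 0:
        pairs.append((target[i - 1], None))
        i -= 1
    while j > 0:
        pairs.append((None, reference[j - 1]))
        j -= 1
    return list(reversed(pairs))
```

A target word with a spelling variant ("kat") still aligns to "cat" because the two share a phoneme sequence, which is the point of aligning at the phoneme rather than the orthographic level.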
  • Publication number: 20130226583
    Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set that 1) represents all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities is calculated in order to identify the most likely phoneme occurring at each point in the audio files, across the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between the spoken human languages in the set. The run-time language identifier module identifies the particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs, which are based on the set of unique phoneme patterns created for each language.
    Type: Application
    Filed: March 18, 2013
    Publication date: August 29, 2013
    Applicant: Autonomy Corporation Limited
    Inventor: Autonomy Corporation Limited
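The SLM scoring step can be sketched as per-language bigram models over decoded phoneme labels, with the run-time identifier picking the language whose model assigns the highest log-probability. The add-k smoothing and the tiny training sets below are illustrative assumptions:

```python
import math
from collections import defaultdict

def train_phoneme_slm(phoneme_seqs, smoothing=1.0):
    """Train an add-k smoothed bigram model over phoneme labels."""
    bigrams = defaultdict(float)
    unigrams = defaultdict(float)
    vocab = set()
    for seq in phoneme_seqs:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
            vocab.update((a, b))
    return bigrams, unigrams, vocab, smoothing

def log_prob(model, seq):
    """Log-probability of a decoded phoneme sequence under one language's SLM."""
    bigrams, unigrams, vocab, k = model
    v = max(len(vocab), 1)
    lp = 0.0
    for a, b in zip(seq, seq[1:]):
        lp += math.log((bigrams[(a, b)] + k) / (unigrams[a] + k * v))
    return lp

def identify_language(models, seq):
    """Return the language whose phoneme SLM scores the sequence highest."""
    return max(models, key=lambda lang: log_prob(models[lang], seq))
```

In a real system the phoneme sequences would come from the universal phoneme decoder rather than being given directly, and the SLMs would be trained on far more data.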
  • Patent number: 8521527
    Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 27, 2013
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Patent number: 8521529
    Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: August 27, 2013
    Assignee: Creative Technology Ltd
    Inventors: Michael M. Goodwin, Jean Laroche
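The "dynamic programming to find globally optimal cluster boundaries" step can be illustrated on a 1-D feature track: split the sequence into k contiguous segments so that total within-segment variance is minimized. This is a simplified stand-in; the patent operates on LDA-projected feature vectors, not scalars:

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations of x[i:j] from its mean, via prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n

def optimal_boundaries(x, k):
    """Dynamic programming: split x into k contiguous segments minimizing
    total within-segment variance; returns the interior boundary indices."""
    n = len(x)
    prefix, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    INF = float("inf")
    # dp[s][j]: best cost of splitting x[:j] into s segments
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                if dp[s - 1][i] == INF:
                    continue
                c = dp[s - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < dp[s][j]:
                    dp[s][j], back[s][j] = c, i
    # walk the backpointers to recover segment start positions
    bounds, j = [], n
    for s in range(k, 0, -1):
        j = back[s][j]
        bounds.append(j)
    return sorted(bounds)[1:]  # drop the leading 0
```

The O(k·n²) loop is fine for illustration; real audio segmenters restrict candidate boundaries or prune the inner loop.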
  • Publication number: 20130218563
    Abstract: A speech recognition system includes a mobile device and a remote server. The mobile device receives the speech from the user and extracts the features and phonemes from the speech. Selected phonemes and measures of uncertainty are transmitted to the server, which processes the phonemes for speech understanding and transmits a text of the speech (or the context or understanding of the speech) back to the mobile device.
    Type: Application
    Filed: January 29, 2013
    Publication date: August 22, 2013
    Applicant: Intelligent Mechatronic Systems Inc.
    Inventor: Intelligent Mechatronic Systems Inc.
  • Patent number: 8515749
    Abstract: Systems and methods for facilitating communication including recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.
    Type: Grant
    Filed: May 20, 2009
    Date of Patent: August 20, 2013
    Assignee: Raytheon BBN Technologies Corp.
    Inventor: David G. Stallard
  • Patent number: 8515753
    Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations in a non-native speaker's speech, acoustic models are adapted in a state-tying step during the training process of the acoustic models. When the acoustic model adaptation of the example embodiment is combined with a conventional acoustic model adaptation scheme, further enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: August 20, 2013
    Assignee: Gwangju Institute of Science and Technology
    Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
  • Patent number: 8515750
    Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
    Type: Grant
    Filed: September 19, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Xin Lei, Petar Aleksic
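The stability-gated update described above can be sketched with a toy profile that only incorporates a transcription segment's feature data once its stability measure clears a threshold. The running-mean "profile" is a hypothetical stand-in for a real adaptation transform (e.g. fMLLR), which the abstract does not specify:

```python
class SpeakerAdaptationProfile:
    """Toy speaker adaptation profile: a running mean of accepted feature vectors."""

    def __init__(self):
        self.n = 0
        self.mean = None

    def update(self, vectors):
        # incremental mean update over each accepted feature vector
        for v in vectors:
            self.n += 1
            if self.mean is None:
                self.mean = list(v)
            else:
                self.mean = [m + (x - m) / self.n for m, x in zip(self.mean, v)]

def maybe_adapt(profile, segment_vectors, stability, threshold=0.9):
    """Trigger a profile update only when the segment's stability measure
    satisfies the threshold; returns True when an update occurred."""
    if stability >= threshold:
        profile.update(segment_vectors)
        return True
    return False
```

The key behavior is the gate: low-stability segments (likely misrecognized) leave the profile untouched, so later portions of the session are decoded with a profile built only from trusted audio.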
  • Patent number: 8510111
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 13, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
  • Patent number: 8504367
    Abstract: Disclosed are a speech retrieval apparatus and a speech retrieval method for searching, in a speech database, for an audio file matching an input search term by using an acoustic model serialization code, a phonemic code, a sub-word unit, and a speech recognition result of speech. The speech retrieval apparatus comprises a first conversion device, a first division device, a first speech retrieval unit creation device, a second conversion device, a second division device, a second speech retrieval unit creation device, and a matching device. The speech retrieval method comprises a first conversion step, a first division step, a first speech retrieval unit creation step, a second conversion step, a second division step, a second speech retrieval unit creation step, and a matching step.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: August 6, 2013
    Assignee: Ricoh Company, Ltd.
    Inventors: Dafei Shi, Yaojie Lu, Yueyan Yin, Jichuan Zheng, Lijun Zhao
  • Patent number: 8498859
    Abstract: A language-processing system has an input for language in text or audio, as a message, an extractor operating to separate words and phrases from the input, to consult a knowledge base, and to assign a concept to individual ones of the words or phrases, and a connector operating to link the concepts to form a statement. In some cases there is a situation model updated as language is processed. The system may be used for controlling technical systems, such as robotic systems.
    Type: Grant
    Filed: November 12, 2003
    Date of Patent: July 30, 2013
    Inventor: Bernd Schönebeck
  • Patent number: 8498871
    Abstract: A system for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes a system transaction manager having a “system protocol,” to receive a speech information request from an authorized user. The speech information request is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications. A speech recognition and/or transcription engine (ASR), in communication with the systems transaction manager, receives the speech information request generates a transcribed response, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR.
    Type: Grant
    Filed: May 24, 2011
    Date of Patent: July 30, 2013
    Assignee: Advanced Voice Recognition Systems, Inc.
    Inventors: Joseph H. Miglietta, Michael K. Davis
  • Publication number: 20130191129
    Abstract: System and method for performing speech recognition using acoustic invariant structure for large vocabulary continuous speech. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit for outputting a speech recognition score; a structure score calculation unit for calculating a structure score, which, for each hypothesis, is found by applying phoneme-pair-by-pair weighting to the inter-distribution distance likelihood of all phoneme pairs comprising the hypothesis and then summing; and a ranking unit for ranking the multiple hypotheses based on the sum of the speech recognition score and the structure score.
    Type: Application
    Filed: January 18, 2013
    Publication date: July 25, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Publication number: 20130191128
    Abstract: A continuous phonetic recognition method using a semi-Markov model, a system for processing the method, and a recording medium for storing the method. In an embodiment of the phonetic recognition method of recognizing phones using a speech recognition system, a phonetic data recognition device receives speech, and a phonetic data processing device recognizes phones from the received speech using a semi-Markov model.
    Type: Application
    Filed: August 28, 2012
    Publication date: July 25, 2013
    Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Chang Dong Yoo, Sung Woong Kim
  • Patent number: 8494850
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: July 23, 2013
    Assignee: Google Inc.
    Inventors: Ciprian I. Chelba, Peng Xu, Fernando Pereira
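The back-off over contextual phones described above can be sketched directly: build test sequences with progressively wider context around a phone, then keep the widest one the acoustic model actually has data for. The symmetric-context construction and the dict-keyed "model" are illustrative assumptions:

```python
def extract_test_sequences(phones, index, max_context=3):
    """For the phone at `index`, build test sequences with max_context..1
    phones of symmetric context, widest first."""
    seqs = []
    for c in range(max_context, 0, -1):
        left = phones[max(0, index - c):index]
        right = phones[index + 1:index + 1 + c]
        seqs.append((tuple(left), phones[index], tuple(right)))
    return seqs

def select_test_sequence(seqs, acoustic_model):
    """Pick the test sequence with the most contextual phones among those
    the acoustic model has data for (the back-off selection step)."""
    available = [s for s in seqs if s in acoustic_model]
    if not available:
        return None
    return max(available, key=lambda s: len(s[0]) + len(s[2]))
```

For example, if the model only covers triphone-width context for "k" in "s ih k s", the pentaphone candidate is skipped and the triphone entry is scored instead.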
  • Publication number: 20130185073
    Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block, and with computer readable code for implementing the method.
    Type: Application
    Filed: March 6, 2013
    Publication date: July 18, 2013
    Applicant: Nuance Communications Austria GmbH
    Inventor: Nuance Communications Austria GmbH
  • Patent number: 8489398
    Abstract: A method is performed by a communication device that is configured to communicate with a server over a network. The method includes outputting, to the server, speech data for spoken words; receiving, from the server, speech recognition candidates for a spoken word in the speech data; checking the speech recognition candidates against a database on the communication device; and selecting one or more of the speech recognition candidates for use by the communication device based on the checking.
    Type: Grant
    Filed: January 14, 2011
    Date of Patent: July 16, 2013
    Assignee: Google Inc.
    Inventor: Alexander H. Gruenstein
  • Publication number: 20130179169
    Abstract: A Chinese text readability assessing system analyzes and evaluates the readability of text data. A word segmentation module compares the text data with a corpus to obtain a plurality of word segments from the text data and provide part-of-speech settings corresponding to the word segments. A readability index analysis module analyzes the word segments and the part-of-speech settings based on readability indices to calculate index values of the readability indices in the text data. The index values are inputted to a readability mathematical model in a knowledge-evaluated training module, and the readability mathematical model produces a readability analysis result. Accordingly, the Chinese text readability assessing system of the present invention evaluates the readability of Chinese texts by word segmentation and the readability indices analysis in conjunction with the readability mathematical model.
    Type: Application
    Filed: July 5, 2012
    Publication date: July 11, 2013
    Applicant: NATIONAL TAIWAN NORMAL UNIVERSITY
    Inventors: Yao-Ting Sung, Ju-Ling Chen
  • Patent number: 8478597
    Abstract: The present disclosure presents a useful metric for assessing the relative difficulty which non-native speakers face in pronouncing a given utterance and a method and systems for using such a metric in the evaluation and assessment of the utterances of non-native speakers. In an embodiment, the metric may be based on both known sources of difficulty for language learners and a corpus-based measure of cross-language sound differences. The method may be applied to speakers who primarily speak a first language speaking utterances in any non-native second language.
    Type: Grant
    Filed: January 10, 2006
    Date of Patent: July 2, 2013
    Assignee: Educational Testing Service
    Inventors: Derrick Higgins, Klaus Zechner, Yoko Futagi, Rene Lawless
  • Publication number: 20130159000
    Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels and updates a classification model associated with the classifier using the pseudo-semantic label.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Yun-Cheng Ju, James Garnet Droppo, III
  • Patent number: 8463609
    Abstract: In the present invention, a voice input system and a voice input method are provided. The voice input method includes the steps of: (A) initiating a speech recognition process by a first input associated with a first parameter of a first speech recognition subject; (B) providing a voice and a searching space constructed by a speech recognition model associated with the first speech recognition subject; (C) obtaining a sub-searching space from the searching space based on the first parameter; (D) searching at least one candidate item associated with the voice from the sub-searching space; and (E) showing the at least one candidate item.
    Type: Grant
    Filed: April 29, 2009
    Date of Patent: June 11, 2013
    Assignee: Delta Electronics Inc.
    Inventors: Keng-Hung Yeh, Liang-Sheng Huang, Chao-Jen Huang, Jia-Lin Shen
  • Patent number: 8457967
    Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score quantifying the speaker's spoken fluency skills. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
    Type: Grant
    Filed: August 15, 2009
    Date of Patent: June 4, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
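Two of the lexical features named in the abstract, counts of closely-occurring repeat N-grams and the normalized average distance between consecutive occurrences, can be sketched as below. The window size and the length normalization are hypothetical choices; the patent does not publish exact definitions here:

```python
def repeat_ngram_features(words, n=2, window=8):
    """Return (count of closely-occurring repeated n-grams, average distance
    between consecutive occurrences normalized by transcript length)."""
    positions = {}
    for i in range(len(words) - n + 1):
        positions.setdefault(tuple(words[i:i + n]), []).append(i)
    close_repeats, distances = 0, []
    for occ in positions.values():
        for a, b in zip(occ, occ[1:]):
            distances.append(b - a)
            if b - a <= window:  # "closely-occurring": within the window
                close_repeats += 1
    if not distances:
        return 0, 0.0
    avg_dist = sum(distances) / len(distances) / max(len(words), 1)
    return close_repeats, avg_dist
```

On a disfluent transcript like "i i want want to go to go home", unigram repeats fire on every stutter, while bigram repeats catch the repeated phrase "to go".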
  • Publication number: 20130138441
    Abstract: Disclosed is a method of generating a search network for voice recognition, the method including: generating a pronunciation transduction weighted finite state transducer by implementing a pronunciation transduction rule representing a phenomenon of pronunciation transduction between recognition units as a weighted finite state transducer; and composing the pronunciation transduction weighted finite state transducer and one or more weighted finite state transducers.
    Type: Application
    Filed: August 14, 2012
    Publication date: May 30, 2013
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Seung Hi Kim, Dong Hyun Kim, Young Ik Kim, Jun Park, Hoon Young Cho, Sang Hun Kim
  • Publication number: 20130124205
    Abstract: A system allows a user to obtain information about television programming and to make selections of programming using conversational speech. The system includes a speech recognizer that recognizes spoken requests for television programming information. A speech synthesizer generates spoken responses to the spoken requests for television programming information. A user may use a voice user interface as well as a graphical user interface to interact with the system to facilitate the selection of programming choices.
    Type: Application
    Filed: January 3, 2013
    Publication date: May 16, 2013
    Inventor: Christopher H. Genly
  • Patent number: 8442827
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages.
    Type: Grant
    Filed: June 18, 2010
    Date of Patent: May 14, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Nicholas Duffield
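The weight-aggregation step, per-service country or dialect weights summed into one summary distribution that selects the IVR language model, can be sketched as follows. The normalized-sum aggregation is an assumption; the abstract does not fix a formula:

```python
from collections import Counter

def summary_weights(service_weights):
    """Aggregate per-service country/dialect weight dicts into one
    normalized summary distribution."""
    total = Counter()
    for weights in service_weights:
        total.update(weights)  # Counter.update sums float-valued mappings
    norm = sum(total.values())
    return {k: v / norm for k, v in total.items()} if norm else {}

def pick_language_model(service_weights, models):
    """Choose the language model for the most heavily weighted accent source."""
    summary = summary_weights(service_weights)
    best = max(summary, key=summary.get)
    return models.get(best)
```

A user subscribed mostly to Spanish-language services would thus be routed to a Spanish-tuned language model before saying a word.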
  • Patent number: 8438027
    Abstract: An object of the invention is to conveniently increase the standard patterns registered in a voice recognition device, to efficiently extend the number of words that can be voice-recognized. New standard patterns are generated by modifying a part of an existing standard pattern. A pattern matching unit 16 of a modifying-part specifying unit 14 performs a pattern matching process to specify a part to be modified in the existing standard pattern of a usage source. A standard pattern generating unit 18 generates the new standard patterns by cutting or deleting voice data of the modifying part of the usage-source standard pattern, substituting the voice data of the modifying part of the usage-source standard pattern for another voice data, or combining the voice data of the modifying part of the usage-source standard pattern with another voice data. A standard pattern database update unit 20 adds the new standard patterns to a standard pattern database 24.
    Type: Grant
    Filed: May 25, 2006
    Date of Patent: May 7, 2013
    Assignee: Panasonic Corporation
    Inventors: Toshiyuki Teranishi, Kouji Hatano
  • Patent number: 8438028
    Abstract: A method of and system for managing nametags including receiving a command from a user to store a nametag, prompting the user to input a number to be stored in association with the nametag, receiving an input for the number from the user, prompting the user to input the nametag to be stored in association with the number, receiving an input for the nametag from the user, processing the nametag input, and calculating confusability of the nametag input in multiple individual domains including a nametag domain, a number domain, and a command domain.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: May 7, 2013
    Assignee: General Motors LLC
    Inventors: Rathinavelu Chengalvarayan, Lawrence D. Cepuran
  • Patent number: 8433575
    Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing for example a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
    Type: Grant
    Filed: December 10, 2003
    Date of Patent: April 30, 2013
    Assignee: AMBX UK Limited
    Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
  • Patent number: 8433573
    Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
    Type: Grant
    Filed: February 11, 2008
    Date of Patent: April 30, 2013
    Assignee: Fujitsu Limited
    Inventors: Kentaro Murase, Nobuyuki Katae
  • Patent number: 8428949
    Abstract: An apparatus for classifying an input audio signal into audio contents of a first and second class, comprising an audio segmentation module adapted to segment said input audio signal into segments of a predetermined length; a feature computation module adapted to calculate for the segments features characterizing said audio input signal; a threshold comparison module adapted to generate a feature vector for each of said one or more segments based on a plurality of predetermined thresholds, the thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold; and a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: April 23, 2013
    Assignee: Waves Audio Ltd.
    Inventors: Itai Neoran, Yizhar Lavner, Dima Ruinskiy
  • Patent number: 8428950
    Abstract: A speech recognition apparatus (110) selects an optimum recognition result from recognition results output from a set of speech recognizers (s1-sM) based on a majority decision. This decision takes into account weight values for the set of speech recognizers, learned by a learning apparatus (100). The learning apparatus includes a unit (103) selecting speech recognizers corresponding to characteristics of speech for learning (101), a unit (104) finding recognition results of the speech for learning by using the selected speech recognizers, a unit (105) unifying the recognition results and generating a word string network, and a unit (106) finding weight values concerning the set of the speech recognizers by implementing learning processing.
    Type: Grant
    Filed: January 18, 2008
    Date of Patent: April 23, 2013
    Assignee: NEC Corporation
    Inventors: Yoshifumi Onishi, Tadashi Emori
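The weighted majority decision over recognizer outputs can be sketched as a position-by-position vote over aligned word strings. Restricting to equal-length hypotheses is a simplification; the patent unifies results into a word string network, which also handles insertions and deletions:

```python
from collections import defaultdict

def weighted_vote(hypotheses, weights):
    """Position-by-position weighted majority vote over equal-length word
    strings, one per recognizer; weights reflect per-recognizer trust."""
    assert len({len(h) for h in hypotheses}) == 1, "equal-length hypotheses only"
    result = []
    for pos in range(len(hypotheses[0])):
        tally = defaultdict(float)
        for hyp, w in zip(hypotheses, weights):
            tally[hyp[pos]] += w
        result.append(max(tally, key=tally.get))
    return result
```

Two moderately trusted recognizers that agree can thus outvote a single higher-weighted recognizer that disagrees, which is the point of the learned weighting.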
  • Patent number: 8417528
    Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block, and with computer readable code for implementing the method.
    Type: Grant
    Filed: February 3, 2012
    Date of Patent: April 9, 2013
    Assignee: Nuance Communications Austria GmbH
    Inventor: Zsolt Saffer
  • Patent number: 8417527
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: April 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Publication number: 20130085757
    Abstract: An embodiment of an apparatus for speech recognition includes a plurality of trigger detection units, each of which is configured to detect a start trigger for recognizing a command utterance for controlling a device, a selection unit, utilizing a signal from one or more sensors embedded on the device, configured to select a selected trigger detection unit among the trigger detection units, the selected trigger detection unit being appropriate to a usage environment of the device, and a recognition unit configured to recognize the command utterance when the start trigger is detected by the selected trigger detection unit.
    Type: Application
    Filed: June 29, 2012
    Publication date: April 4, 2013
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Masanobu NAKAMURA, Akinori KAWAMURA
  • Patent number: 8412525
    Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.
    Type: Grant
    Filed: April 30, 2009
    Date of Patent: April 2, 2013
    Assignee: Microsoft Corporation
    Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan
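The ensemble-plus-decoder pipeline can be sketched with toy classifiers: each maps a feature value to log probabilities over symbols, and a decoder combines the sets and emits the best symbol per frame. The classifiers and symbol set below are illustrative assumptions.

```python
import math

SYMBOLS = ["a", "b"]

def classifier_one(x):
    p = (0.7, 0.3) if x > 0 else (0.2, 0.8)
    return {"a": math.log(p[0]), "b": math.log(p[1])}

def classifier_two(x):
    p = (0.6, 0.4) if x > 0 else (0.4, 0.6)
    return {"a": math.log(p[0]), "b": math.log(p[1])}

def decode(feature_stream, classifiers):
    """Sum log-probability sets across the ensemble, argmax per frame."""
    out = []
    for x in feature_stream:
        totals = {s: sum(c(x)[s] for c in classifiers) for s in SYMBOLS}
        out.append(max(totals, key=totals.get))
    return "".join(out)

print(decode([1.0, -1.0, 2.0], [classifier_one, classifier_two]))
```

The resulting output symbol sequences are what the query component would match against a speech database.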
  • Patent number: 8407047
    Abstract: A guidance information display device includes: a voice input unit; a display unit for displaying guidance information; an operation unit for accepting an operation; and a processor capable of executing the following processes of: a voice recognition process operation of performing voice recognition based on inputted voice; a calculation operation of calculating an evaluation value for a recognition result of voice recognition by the voice recognition process operation; a display operation of reading out guidance information corresponding to the recognition result from a storage unit, which stores the guidance information, and displaying the guidance information at a display unit; and a decision operation of deciding a display mode of the guidance information at the display unit based on a variable value, which varies with an operation from the operation unit for the guidance information displayed by the display operation, and the evaluation value calculated by the calculation operation.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: March 26, 2013
    Assignee: Fujitsu Limited
    Inventor: Kenji Abe
  • Patent number: 8401861
    Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.
    Type: Grant
    Filed: January 17, 2007
    Date of Patent: March 19, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Shuang Zhi Wei, Raimo Bakis, Ellen Marie Eide, Liqin Shen
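The abstract's piecewise linear frequency warping, with formant pairs as key positions, can be sketched directly; the formant values and Nyquist bound here are made-up examples.

```python
def make_warp(src_formants, tgt_formants, nyquist=8000.0):
    """Build f(src_hz) -> tgt_hz, linear between (source, target) formant
    key positions, pinned at 0 Hz and the Nyquist frequency."""
    xs = [0.0] + list(src_formants) + [nyquist]
    ys = [0.0] + list(tgt_formants) + [nyquist]
    def warp(f):
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
            if x0 <= f <= x1:
                return y0 + (y1 - y0) * (f - x0) / (x1 - x0)
        return f  # outside the range: identity
    return warp

warp = make_warp([500.0, 1500.0, 2500.0], [600.0, 1700.0, 2600.0])
print(warp(500.0))   # a key position maps exactly to its target formant
print(warp(1000.0))  # between key positions, linear interpolation
```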
  • Patent number: 8401847
    Abstract: An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9 and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word.
    Type: Grant
    Filed: November 30, 2007
    Date of Patent: March 19, 2013
    Assignee: National Institute of Advanced Industrial Science and Technology
    Inventors: Jun Ogata, Masataka Goto
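The extraction step (take the phonemes whose timings fall inside the corrected word's segment and adopt them as its pronunciation) can be sketched as below; the `(phoneme, start, end)` layout and timings are assumptions for illustration.

```python
# Time-annotated phoneme sequence, as produced by a phoneme sequence
# converting step (toy values, times in seconds).
phoneme_seq = [("k", 0.00, 0.08), ("ae", 0.08, 0.20), ("t", 0.20, 0.28),
               ("s", 0.28, 0.40), ("ih", 0.40, 0.50), ("t", 0.50, 0.58)]

def pronunciation_for_segment(phonemes, seg_start, seg_end):
    """Phonemes lying entirely within [seg_start, seg_end] become the
    pronunciation of the corrected word occupying that segment."""
    return [p for p, s, e in phonemes if s >= seg_start and e <= seg_end]

# Suppose the user-corrected word occupies 0.28-0.58 s of the audio:
print(pronunciation_for_segment(phoneme_seq, 0.28, 0.58))
```

The extracted portion would then be registered in the recognition dictionary as a new pronunciation of the corrected word.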
  • Patent number: 8396714
    Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: March 12, 2013
    Assignee: Apple Inc.
    Inventors: Matthew Rogers, Kim Silverman, Devang Naik, Benjamin Rottler
  • Publication number: 20130060572
    Abstract: In an aspect, in general, a method for aligning an audio recording and a transcript includes receiving a transcript including a plurality of terms, each term of the plurality of terms associated with a time location within a different version of the audio recording, forming a plurality of search terms from the terms of the transcript, determining possible time locations of the search terms in the audio recording, determining a correspondence between time locations within the different version of the audio recording associated with the search terms and the possible time locations of the search terms in the audio recording, and aligning the audio recording and the transcript including updating the time location associated with terms of the transcript based on the determined correspondence.
    Type: Application
    Filed: September 4, 2012
    Publication date: March 7, 2013
    Applicant: Nexidia Inc.
    Inventors: Jacob B. Garland, Drew Lanham, Daryl Kip Watters, Marsal Gavalda, Mark Finlay, Kenneth K. Griggs
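One way to realize the re-timing step is to treat matched search terms as anchors (old time in the other version, time found in this recording) and interpolate the remaining terms between them. This is a sketch of that interpretation, with illustrative anchors and timings, not Nexidia's implementation.

```python
def retime(term_times, anchors):
    """term_times: {term: time in the other version of the recording}.
    anchors: sorted (old_time, new_time) correspondences from matched
    search terms. Terms between anchors are linearly interpolated."""
    def map_time(t):
        for (o0, n0), (o1, n1) in zip(anchors, anchors[1:]):
            if o0 <= t <= o1:
                return n0 + (n1 - n0) * (t - o0) / (o1 - o0)
        return t  # outside the anchored range: leave unchanged
    return {term: map_time(t) for term, t in term_times.items()}

anchors = [(0.0, 0.0), (10.0, 12.0), (20.0, 21.0)]
print(retime({"hello": 5.0, "world": 15.0}, anchors))
```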
  • Patent number: 8386254
    Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: February 26, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Neeraj Deshmukh, Puming Zhan
  • Patent number: 8380506
    Abstract: Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.
    Type: Grant
    Filed: November 29, 2007
    Date of Patent: February 19, 2013
    Assignee: Georgia Tech Research Corporation
    Inventors: Woojay Jeon, Biing-Hwang Juang
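Category-dependent feature selection reduces, at its core, to keeping a different subset of feature dimensions for each sound category. A toy sketch; the index sets are invented, whereas the patent derives the distinguishing regions from a cortical response model.

```python
# Per-category feature indices (made up for illustration): each category
# is recognized using only its most distinguishing dimensions.
CATEGORY_FEATURES = {
    "vowel":     [0, 1, 2],   # e.g. low-frequency spectral shape
    "fricative": [5, 6, 7],   # e.g. high-frequency energy features
}

def select_features(vector, category):
    """Project a full feature vector onto the category's dimensions."""
    return [vector[i] for i in CATEGORY_FEATURES[category]]

full = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(select_features(full, "fricative"))
```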
  • Patent number: 8380505
    Abstract: A system is provided for recognizing speech for searching a database. The system receives speech input as a spoken search request and then processes the speech input in a speech recognition step using a vocabulary for recognizing the spoken request. By processing the speech input, words recognized in the speech input and included in the vocabulary are obtained to form at least one hypothesis. The hypothesis is then utilized to search a database using the at least one hypothesis as a search query. A search result is then received from the database and provided to the user.
    Type: Grant
    Filed: October 24, 2008
    Date of Patent: February 19, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Lars König, Andreas Löw, Udo Haiber
  • Patent number: 8374873
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: August 11, 2009
    Date of Patent: February 12, 2013
    Assignee: Morphism, LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8374845
    Abstract: A word coinciding with a key word input by speech and a word related to the word are set as retrieval candidate words based on a word dictionary in which words representing formal names and aliases of the formal names are registered in association with a family attribute indicating a familiar relation among the words. Content related to any one of retrieval words selected out of the retrieval candidate words and a word related to the retrieval word is retrieved.
    Type: Grant
    Filed: February 29, 2008
    Date of Patent: February 12, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Miwako Doi, Kaoru Suzuki, Toshiyuki Koga, Koichi Yamamoto
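The family-attribute expansion can be sketched as a dictionary lookup that pulls in every word sharing the matched word's family. The dictionary entries below are illustrative, not from the patent.

```python
# Word dictionary linking formal names and aliases via a shared
# family attribute (toy data).
WORD_DICT = [
    {"word": "International Business Machines", "family": "ibm"},
    {"word": "IBM", "family": "ibm"},
    {"word": "Big Blue", "family": "ibm"},
]

def retrieval_candidates(keyword):
    """Return every word sharing a family attribute with the keyword,
    so content can be retrieved under formal names and aliases alike."""
    families = {e["family"] for e in WORD_DICT
                if e["word"].lower() == keyword.lower()}
    return [e["word"] for e in WORD_DICT if e["family"] in families]

print(retrieval_candidates("IBM"))
```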
  • Patent number: 8374868
    Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: February 12, 2013
    Assignee: General Motors LLC
    Inventors: Uma Arun, Sherri J Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
  • Patent number: 8374869
    Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
    Type: Grant
    Filed: August 4, 2009
    Date of Patent: February 12, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
  • Publication number: 20130035939
    Abstract: Disclosed herein is a method for speech recognition. The method includes receiving speech utterances, assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at a unit of speech level to sum to 1, for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors, and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
    Type: Application
    Filed: October 11, 2012
    Publication date: February 7, 2013
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventor: AT&T INTELLECTUAL PROPERTY I, L.P.
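The normalization constraint (pronunciation weights summing to 1 per unit of speech) is straightforward to sketch; the word and raw weights below are made-up examples.

```python
def normalize_weights(weights):
    """weights: {word: {pronunciation: raw_weight}}.
    Rescale each word's variant weights to sum to 1."""
    out = {}
    for word, prons in weights.items():
        total = sum(prons.values())
        out[word] = {p: w / total for p, w in prons.items()}
    return out

raw = {"either": {"iy dh er": 3.0, "ay dh er": 1.0}}
print(normalize_weights(raw))
```

The discriminative adaptation described in the abstract would then adjust these normalized weights to minimize classification errors.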
  • Patent number: 8369492
    Abstract: A method, apparatus, computer program product and service for directory dialer name recognition. The directory dialer has a directory of names and a first name grammar and a second name grammar representing phonetic baseforms of first names and second names respectively. The method includes: receiving voice data for a spoken name after requesting a user to speak the required name; extracting a set of phonetic baseforms for the voice data; and finding the best matches between the extracted set of phonetic baseforms for the voice data and any combination of the first name grammar and the second name grammar. The method can further include: checking the best match against the directory of names; if the best match does not exist in the directory, informing the user and prompting the next best match as an alternative; and if the best match does exist in the directory, forwarding the call to that best match.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: February 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Eric William Janke, Keith Sloan
  • Patent number: 8370144
    Abstract: A method for identifying end of voiced speech within an audio stream of a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream, producing an output corresponding to the window. The output is used to classify the window in one of several classes, for example, (1) speech, (2) silence, or (3) noise. A state machine processes the window classifications, incrementing counters as each window is classified: speech counter for speech windows, silence counter for silence, and noise counter for noise. If the speech counter indicates a predefined number of windows, the state machine clears all counters. Otherwise, the state machine appropriately weights the values in the silence and noise counters, adds the weighted values, and compares the sum to a limit imposed on the number of non-voice windows. When the non-voice limit is reached, the state machine terminates processing of the audio stream.
    Type: Grant
    Filed: June 3, 2010
    Date of Patent: February 5, 2013
    Assignee: Applied Voice & Speech Technologies, Inc.
    Inventor: Karl D. Gierach
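The counter-based state machine described above can be sketched end to end: per-window labels increment the matching counter, a sustained speech run clears all counters, and a weighted sum of the silence and noise counters against a non-voice limit terminates processing. The run length, weights, and limit below are illustrative assumptions.

```python
def end_of_speech(windows, speech_run=3, silence_w=1.0, noise_w=0.5, limit=4.0):
    """windows: per-window classifications ('speech', 'silence', or 'noise').
    Return the index of the window at which processing stops, or None."""
    speech = silence = noise = 0
    for i, label in enumerate(windows):
        if label == "speech":
            speech += 1
            if speech >= speech_run:
                speech = silence = noise = 0  # sustained speech: clear all
        elif label == "silence":
            silence += 1
        else:
            noise += 1
        # Weighted non-voice total compared against the non-voice limit.
        if silence_w * silence + noise_w * noise >= limit:
            return i
    return None

print(end_of_speech(["speech", "silence", "noise",
                     "silence", "silence", "silence"]))
```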