Subportions Patents (Class 704/254)
  • Patent number: 8983839
    Abstract: The system and method described herein may dynamically generate a recognition grammar associated with a conversational voice user interface in an integrated voice navigation services environment. In particular, in response to receiving a natural language utterance that relates to a navigation context at the voice user interface, a conversational language processor may generate a dynamic recognition grammar that organizes grammar information based on one or more topological domains. For example, the one or more topological domains may be determined based on a current location associated with a navigation device, whereby a speech recognition engine may use the grammar information organized in the dynamic recognition grammar according to the one or more topological domains to generate one or more interpretations associated with the natural language utterance.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: March 17, 2015
    Assignee: VoiceBox Technologies Corporation
    Inventors: Michael R. Kennewick, Catherine Cheung, Larry Baldwin, Ari Salomon, Michael Tjalve, Sheetal Guttigoli, Lynn Armstrong, Philippe Di Cristo, Bernie Zimmerman, Sam Menaker
  • Publication number: 20150073803
    Abstract: A portion of an audio signal is identified corresponding to a spoken word and its phonemes. A set of alternate spoken words satisfying phonetic similarity criteria to the spoken word is generated. A subset of the set of alternate spoken words is also identified; each member of the subset shares the same phoneme in a similar temporal position as the spoken word. A significance factor is then calculated for the phoneme based on the number of alternates in the subset and on the total number of alternates. The calculated significance factor may then be used to lengthen or shorten the temporal duration of the phoneme in the audio signal according to its significance in the spoken word.
    Type: Application
    Filed: September 12, 2013
    Publication date: March 12, 2015
    Applicant: International Business Machines Corporation
    Inventors: Flemming Boegelund, Lav R. Varshney
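The significance-factor computation described in the abstract above can be sketched in a few lines. This is a minimal illustration rather than the patented method: the phoneme's index stands in for its temporal position, and the exact formula (one minus the fraction of alternates sharing the phoneme) is an assumption; the abstract says only that the factor depends on the subset size and the total number of alternates.

```python
def significance_factor(alternates, target_phoneme, position, tolerance=0):
    """Score a phoneme's significance in a spoken word.

    alternates: phoneme sequences of phonetically similar words.
    A phoneme shared by few alternates is more distinctive, so a
    smaller sharing fraction yields a larger significance factor.
    """
    sharing = [word for word in alternates
               if any(abs(i - position) <= tolerance and p == target_phoneme
                      for i, p in enumerate(word))]
    return 1.0 - len(sharing) / len(alternates)
```

For "cat" = (K, AE, T) with alternates (K, AE, P), (B, AE, T), (K, AE, D), the initial K is shared by two of three alternates, giving a factor of 1/3, while AE, shared by all three, scores 0.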
  • Patent number: 8977547
    Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold T1; and a standard pattern creation unit 14 for creating a standard pattern by using the voice data where the utterance stability verification unit 13 determines that registration is acceptable.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: March 10, 2015
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
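The stability check in the abstract above can be sketched as a pairwise similarity test over the repeated utterances. Cosine similarity on per-utterance feature vectors is an assumed stand-in for the patent's unspecified similarity measure.

```python
import itertools
import math

def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def registration_acceptable(utterance_features, threshold):
    """Accept enrollment only if every pair of repeated utterances is
    mutually similar enough; unstable repetitions are rejected before
    a standard pattern is created."""
    return all(cosine(a, b) > threshold
               for a, b in itertools.combinations(utterance_features, 2))
```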
  • Patent number: 8977539
    Abstract: A language analysis apparatus of the invention includes division rules, each of which is classified into one of levels according to the degree of risk of causing analysis accuracy problems when applied; a division point candidate generation unit 21 which, when a character string whose length is greater than the predetermined maximum input length is input, generates division point candidates for the input character string by applying the division rules sequentially one by one in the ascending order of the level of risk of causing problems; a division point adjustment unit 22 which, when the length of a division unit candidate obtained by the division point candidate generated by the division point candidate generation unit 21 is less than the maximum input length, selects a combination of division points from among the division point candidates obtained by applying division rules of the same level while ensuring that each division unit is not greater in length than the maximum input length; and a division unit
    Type: Grant
    Filed: March 23, 2010
    Date of Patent: March 10, 2015
    Assignee: NEC Corporation
    Inventors: Shinichi Ando, Kunihiko Sadamasa
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
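The frequency split described in the abstract above reduces to counting utterances and routing them to one of two training sets. A small sketch, with utterances whose count equals the threshold left unassigned, following the abstract's wording ("exceeds" vs. "below"):

```python
from collections import Counter

def split_by_frequency(utterances, threshold):
    """Return (high, low): utterances whose count exceeds the threshold
    (used to train a grammar-based language model) and those below it
    (used to train a statistical language model)."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]
    low = [u for u, c in counts.items() if c < threshold]
    return high, low
```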
  • Patent number: 8972432
    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for machine translation using information retrieval techniques. In general, in one implementation, a method is provided. The method includes providing a received input segment as a query to a search engine, the search engine searching an index of one or more collections of documents, receiving one or more candidate segments in response to the query, determining a similarity of each candidate segment to the received input segment, and for one or more candidate segments having a determined similarity that exceeds a threshold similarity, providing a translated target segment corresponding to the respective candidate segment.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: March 3, 2015
    Assignee: Google Inc.
    Inventors: Hayden Shaw, Thorsten Brants
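Retrieval-based translation as described in the abstract above can be sketched with a translation memory and a token-overlap similarity. The Jaccard measure and the dict standing in for the search index are assumptions; the patent specifies neither.

```python
def jaccard(a, b):
    """Token-overlap similarity between two text segments."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def retrieve_translations(segment, memory, threshold=0.5):
    """memory: {source segment: stored target translation}. Return the
    targets of all candidate segments whose similarity to the input
    segment meets the threshold."""
    return [target for source, target in memory.items()
            if jaccard(segment, source) >= threshold]
```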
  • Patent number: 8972259
    Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal, and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student, and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt is recorded.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: March 3, 2015
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Patent number: 8965766
    Abstract: Systems and methods for identifying music in a noisy environment are described. One of the methods includes receiving audio segment data. The audio segment data is generated from the portion that is captured in the noisy environment. The method further includes generating feature vectors from the audio segment data, identifying phonemes from the feature vectors, and comparing the identified phonemes with pre-assigned phoneme sequences. Each pre-assigned phoneme sequence identifies a known music piece. The method further includes determining an identity of the music based on the comparison.
    Type: Grant
    Filed: March 15, 2012
    Date of Patent: February 24, 2015
    Assignee: Google Inc.
    Inventors: Eugene Weinstein, Boulos Harb, Ananya Misra, Michael Dennis Riley, Pavel Golik, Alex Rudnick
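Matching recognized phonemes against pre-assigned phoneme sequences, as in the abstract above, can be sketched as a nearest-neighbor search. The use of Levenshtein edit distance as the comparison is an assumption; the abstract says only that identified phonemes are compared with the stored sequences.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            cur = min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
            prev, dp[j] = dp[j], cur
    return dp[-1]

def identify_music(phonemes, catalog):
    """catalog: {title: pre-assigned phoneme sequence}. Return the title
    whose sequence is closest to the recognized phonemes."""
    return min(catalog, key=lambda title: edit_distance(phonemes, catalog[title]))
```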
  • Publication number: 20150046163
    Abstract: On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance.
    Type: Application
    Filed: October 23, 2014
    Publication date: February 12, 2015
    Applicant: Microsoft Corporation
    Inventors: Michael Levit, Bruce Melvin Buntschuh
  • Publication number: 20150039314
    Abstract: A method and system for speech recognition using a microphone array directed at the face of a person speaking. The output from the microphone array is read/scanned to determine which part of the face sound is emitting from, and this information is used as input to a speech recognition system to improve speech recognition.
    Type: Application
    Filed: December 20, 2011
    Publication date: February 5, 2015
    Applicant: SQUAREHEAD TECHNOLOGY AS
    Inventor: Morgan Kjølerbakken
  • Patent number: 8949130
    Abstract: In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criteria.
    Type: Grant
    Filed: October 21, 2009
    Date of Patent: February 3, 2015
    Assignee: Vlingo Corporation
    Inventor: Michael S. Phillips
  • Publication number: 20150032453
    Abstract: This invention relates generally to software and computers, and more specifically, to systems and methods for providing information discovery and retrieval. In one embodiment, the invention includes a system for providing information discovery and retrieval, the system including a processor module configured to perform the steps of receiving an information request from a consumer device over a communications network; decoding the information request; discovering information using the decoded information request; preparing instructions for accessing the information; and communicating the prepared instructions to the consumer device, wherein the consumer device is configured to retrieve the information for presentation using the prepared instructions.
    Type: Application
    Filed: October 13, 2014
    Publication date: January 29, 2015
    Inventor: W. Leo Hoarty
  • Patent number: 8942977
    Abstract: The present invention defines a pitch-synchronous parametrical representation of speech signals as the basis of speech recognition, and discloses methods of generating the said pitch-synchronous parametrical representation from speech signals. The speech signal is first passed through a pitch-mark picking program to identify the pitch periods. The speech signal is then segmented into pitch-synchronous frames. An ends-matching program equalizes the values at the two ends of the waveform in each frame. Using Fourier analysis, the speech signal in each frame is converted into a pitch-synchronous amplitude spectrum. Using Laguerre functions, the said amplitude spectrum is converted into a unit vector, referred to as the timbre vector. By using a database of correlated phonemes and timbre vectors, the most likely phoneme sequence of an input speech signal can be decoded in the acoustic stage of a speech recognition system.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: January 27, 2015
    Inventor: Chengjun Julian Chen
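The ends-matching step described in the abstract above can be sketched by subtracting a linear ramp so the two endpoint samples of each pitch-synchronous frame coincide. The linear ramp is one simple realization; the abstract does not specify how the ends are equalized.

```python
def match_ends(frame):
    """Equalize the two endpoint values of a pitch-synchronous frame by
    removing the linear trend between them, so the frame's periodic
    extension has no discontinuity before Fourier analysis."""
    n = len(frame)
    ramp = [(frame[-1] - frame[0]) * i / (n - 1) for i in range(n)]
    return [x - r for x, r in zip(frame, ramp)]
```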
  • Patent number: 8942981
    Abstract: A natural language call router forwards an incoming call from a caller to an appropriate destination. The call router has a speech recognition mechanism responsive to words spoken by a caller for producing recognized text corresponding to the spoken words. A robust parsing mechanism is responsive to the recognized text for detecting a class of words in the recognized text. The class is defined as a group of words having a common attribute. An interpreting mechanism is responsive to the detected class for determining the appropriate destination for routing the call.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: January 27, 2015
    Assignee: Cellco Partnership
    Inventors: Veronica Klein, Deborah Washington Brown
  • Publication number: 20150019225
    Abstract: A method for arbitrating spoken dialog results includes receiving a spoken utterance from a user within an environment; receiving first recognition results and a first confidence level associated with the spoken utterance from a first source; receiving second recognition results and a second confidence level associated with the spoken utterance from a second source; receiving human-machine-interface (HMI) information associated with the user; selecting between the first recognition results and the second recognition results based on at least one of the first confidence level, the second confidence level, and the HMI information.
    Type: Application
    Filed: June 23, 2014
    Publication date: January 15, 2015
    Inventor: ROBERT D. SIMS, III
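The arbitration described in the abstract above can be sketched as a confidence comparison with an optional bias term. The additive hmi_bias is a hypothetical way to fold HMI information into the decision; the abstract does not state how the three inputs are combined.

```python
def arbitrate(first, second, hmi_bias=0.0):
    """first/second: (recognition results, confidence) from two sources.
    A positive hmi_bias favors the first source; in practice the bias
    would be derived from HMI information about the user."""
    (results1, conf1), (results2, conf2) = first, second
    return results1 if conf1 + hmi_bias >= conf2 else results2
```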
  • Publication number: 20150019226
    Abstract: A computerized information apparatus for providing information to a user of a transport device. In one embodiment, the apparatus includes data processing apparatus, speech recognition and synthesis apparatus, and a network interface to enable voice-driven provision of information obtained both locally within the transport device and from a remote source such as a networked server. In one implementation, the information relates to one or more business entities in an area local to the transport device's location. In another implementation, information can be both displayed and provided audibly to the user.
    Type: Application
    Filed: September 22, 2014
    Publication date: January 15, 2015
    Inventor: Robert F. Gazdzinski
  • Publication number: 20150006179
    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
    Type: Application
    Filed: September 17, 2014
    Publication date: January 1, 2015
    Inventors: Andrej LJOLJE, Alistair D. CONKIE, Ann K. SYRDAL
  • Publication number: 20150006178
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining pronunciations for particular terms. The methods, systems, and apparatus include actions of obtaining audio samples of speech corresponding to a particular term and obtaining candidate pronunciations for the particular term. Further actions include generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between of the candidate pronunciation and the audio sample. Additional actions include aggregating the scores for each candidate pronunciation and adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
    Type: Application
    Filed: June 28, 2013
    Publication date: January 1, 2015
    Inventors: Fuchun Peng, Francoise Beaufays, Brian Strope, Xin Lei, Pedro J. Moreno Mengibar, Trevor D. Strohman
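Aggregating per-sample scores and extending the lexicon, as described in the abstract above, can be sketched as follows. The mean as the aggregation function and the fixed keep count are assumptions; the abstract leaves both open.

```python
def select_pronunciations(sample_scores, keep=1):
    """sample_scores: {candidate pronunciation: [similarity score per
    audio sample]}. Aggregate each candidate's scores and return the
    best candidates to add to the pronunciation lexicon."""
    aggregated = {cand: sum(s) / len(s) for cand, s in sample_scores.items()}
    return sorted(aggregated, key=aggregated.get, reverse=True)[:keep]
```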
  • Publication number: 20140379347
    Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.
    Type: Application
    Filed: June 25, 2013
    Publication date: December 25, 2014
    Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
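Convolving a sparse phonetic impulse train with a keyword's filters, as in the abstract above, is cheap because only the impulse times contribute. A minimal sketch, with the impulse and filter representations assumed:

```python
def convolve_sparse(impulses, filters, length):
    """impulses: [(time, phoneme, weight)], a sparse set of phonetic
    impulses; filters: {phoneme: [taps]}, one keyword's filter ensemble.
    Each impulse adds its phoneme's filter response starting at its
    time index."""
    out = [0.0] * length
    for t, phoneme, weight in impulses:
        for i, tap in enumerate(filters.get(phoneme, [])):
            if t + i < length:
                out[t + i] += weight * tap
    return out
```

A keyword detector would then threshold peaks in the resulting response.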
  • Publication number: 20140379348
    Abstract: There is provided a method and an apparatus for processing a disordered voice. A method for processing a disordered voice according to an exemplary embodiment of the present invention includes: receiving a voice signal; recognizing the voice signal by phoneme; extracting multiple voice components from the voice signal; acquiring restored voice components by processing at least some disordered voice components of the multiple voice components by phoneme; and synthesizing a restored voice signal based on at least the restored voice components.
    Type: Application
    Filed: June 20, 2014
    Publication date: December 25, 2014
    Inventors: Myung Whun SUNG, Tack Kyun KWON, Hee Jin KIM, Wook Eun KIM, Woo Il KIM, Mee Young SUNG, Dong Wook KIM
  • Patent number: 8918408
    Abstract: A computing device maintains an input history in memory. This input history includes input strings that have been previously entered into the computing device. When the user begins entering characters of an input string, a predictive input engine is activated. The predictive input engine receives the input string and the input history to generate a candidate list of predictive inputs which are presented to the user. The user can select one of the inputs from the list, or otherwise continue entering characters. The computing device generates the candidate list by combining frequency and recency information of the matching strings from the input history. Additionally, the candidate list can be manipulated to present a variety of candidates. By using a combination of frequency, recency and variety, a favorable user experience is provided.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: December 23, 2014
    Assignee: Microsoft Corporation
    Inventors: Katsutoshi Ohtsuki, Koji Watanabe
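Combining frequency and recency as described in the abstract above can be sketched with exponentially decayed counts: each past entry of a matching string casts a vote that fades with age. The exponential decay and its time constant are assumptions, not the patented scoring.

```python
import math

def rank_candidates(history, prefix, now, decay=10.0):
    """history: [(entered_string, timestamp)]. Strings entered often
    (many votes) and recently (little decay) rank highest among those
    matching the typed prefix."""
    scores = {}
    for s, t in history:
        if s.startswith(prefix):
            scores[s] = scores.get(s, 0.0) + math.exp(-(now - t) / decay)
    return sorted(scores, key=scores.get, reverse=True)
```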
  • Publication number: 20140372121
    Abstract: A speech processing device includes a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: obtaining input speech, detecting a vowel segment contained in the input speech, estimating an accent segment contained in the input speech, calculating a first vowel segment length containing the accent segment and a second vowel segment length excluding the accent segment, and controlling at least one of the first vowel segment length and the second vowel segment length.
    Type: Application
    Filed: April 24, 2014
    Publication date: December 18, 2014
    Applicant: FUJITSU LIMITED
    Inventors: Taro TOGAWA, Chisato Shioda, Takeshi Otani
  • Patent number: 8913720
    Abstract: A method includes receiving a communication from a party at a voice response system and capturing verbal communication spoken by the party. Then a processor creates a voice model associated with the party, the voice model being created by processing the captured verbal communication spoken by the party. The creation of the voice model is imperceptible to the party. The voice model is then stored to provide voice verification of the party during a subsequent communication.
    Type: Grant
    Filed: February 14, 2013
    Date of Patent: December 16, 2014
    Assignee: AT&T Intellectual Property, L.P.
    Inventor: Mazin Gilbert
  • Patent number: 8914292
    Abstract: In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criteria.
    Type: Grant
    Filed: October 21, 2009
    Date of Patent: December 16, 2014
    Assignee: Vlingo Corporation
    Inventor: Michael S. Phillips
  • Patent number: 8914278
    Abstract: A computer-assisted language correction system including spelling correction functionality, misused word correction functionality, grammar correction functionality and vocabulary enhancement functionality utilizing contextual feature-sequence functionality employing an internet corpus.
    Type: Grant
    Filed: July 31, 2008
    Date of Patent: December 16, 2014
    Assignee: Ginger Software, Inc.
    Inventors: Yael Karov Zangvil, Avner Zangvil
  • Patent number: 8909529
    Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain-independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
    Type: Grant
    Filed: November 15, 2013
    Date of Patent: December 9, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Giuseppe Riccardi
  • Publication number: 20140358543
    Abstract: According to one embodiment, a linked-work assistance apparatus includes an analysis unit, an identification unit and a control unit. The analysis unit analyzes a speech of each of users by using a keyword list, to acquire a speech analysis result indicating a relation between a first keyword and a classification of the first keyword, the keyword list indicating a list of keywords classified based on concepts of the keywords and intentions of the keywords. The identification unit identifies a role of each of the users according to the classification of the first keyword, to acquire a correspondence relation between each of the users and the role. The control unit, if the speech includes a name of the role, transmits the speech to other users which relate to the role corresponding to the name, by referring to the correspondence relation.
    Type: Application
    Filed: March 6, 2014
    Publication date: December 4, 2014
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Tetsuro CHINO, Kentaro TORRI, Naoshi UCHIHIRA
  • Publication number: 20140358544
    Abstract: Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question—e.g., words that are not part of the proper name entities—may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input.
    Type: Application
    Filed: May 30, 2014
    Publication date: December 4, 2014
    Inventor: Harry William Printz
  • Patent number: 8898058
    Abstract: Systems, methods, apparatus, and machine-readable media for voice activity detection in a single-channel or multichannel audio signal are disclosed.
    Type: Grant
    Filed: October 24, 2011
    Date of Patent: November 25, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Jongwon Shin, Erik Visser, Ian Ernan Liu
  • Patent number: 8892420
    Abstract: Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results.
    Type: Grant
    Filed: November 17, 2011
    Date of Patent: November 18, 2014
    Assignee: Alibaba Group Holding Limited
    Inventors: Jian Sun, Lei Hou, Jing Ming Tang, Min Chu, Xiao Ling Liao, Bing Jing Xu, Ren Gang Peng, Yang Yang
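The fine-grained step described in the abstract above reduces to a lookup per medium-grained segment. A minimal sketch, with the lexicon of smallest semantic units represented as a plain dict:

```python
def to_fine_grained(medium_segments, smallest_unit_lexicon):
    """Expand each medium-grained segment into the search elements
    recorded for it in the lexicon of smallest semantic units; segments
    without an entry pass through unchanged."""
    fine = []
    for segment in medium_segments:
        fine.extend(smallest_unit_lexicon.get(segment, [segment]))
    return fine
```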
  • Patent number: 8886534
    Abstract: A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.
    Type: Grant
    Filed: January 27, 2011
    Date of Patent: November 11, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Mikio Nakano, Naoto Iwahashi, Kotaro Funakoshi, Taisuke Sumii
  • Patent number: 8880400
    Abstract: Voice recognition is realized by pattern matching with a voice pattern model, and when a large number of paraphrased words are required for one facility, such as the name of a hotel or a tourist facility, the pattern matching needs to be performed with the voice pattern models of all the paraphrased words, resulting in an enormous amount of calculation. Further, it is difficult to generate all the paraphrased words, and a large amount of labor is required.
    Type: Grant
    Filed: January 27, 2010
    Date of Patent: November 4, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Toshiyuki Hanazawa, Yohei Okato
  • Publication number: 20140324433
    Abstract: A method and a device for learning a language and a computer readable recording medium are provided. The method includes the following steps. An input voice from a voice receiver is transformed into an input sentence according to a grammar rule. Whether the input sentence is the same as a learning sentence displayed on a display is determined. If the input sentence is different from the learning sentence, ancillary information is generated that contains at least one error word in the input sentence that differs from the learning sentence.
    Type: Application
    Filed: February 13, 2014
    Publication date: October 30, 2014
    Applicant: Wistron Corporation
    Inventor: Hsi-chun Hsiao
  • Publication number: 20140316782
    Abstract: Methods and systems are provided for managing speech dialog of a speech system. In one embodiment, a method includes: receiving a first utterance from a user of the speech system; determining a first list of possible results from the first utterance, wherein the first list includes at least two elements that each represent a possible result; analyzing the at least two elements of the first list to determine an ambiguity of the elements; and generating a speech prompt to the user based on partial orthography and the ambiguity.
    Type: Application
    Filed: April 19, 2013
    Publication date: October 23, 2014
    Inventors: Eli TZIRKEL-HANCOCK, Gaurav TALWAR, Xufang ZHAO, Greg T. Lindemann
  • Publication number: 20140316785
    Abstract: A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different languages are used to support and detect a language spoken by a user. In some implementations an interactive electronic agent responds in the user's language to facilitate a real-time, human like dialogue.
    Type: Application
    Filed: April 18, 2014
    Publication date: October 23, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
  • Patent number: 8868431
    Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: October 21, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
  • Patent number: 8868427
    Abstract: Systems and methods for updating electronic calendar information. Speech is received from a user at a vehicle telematics unit (VTU), wherein the speech is representative of information related to a particular vehicle trip. The received speech is recorded in the VTU as a voice memo, and data associated with the voice memo is communicated from the VTU to a computer running a calendaring application. The data is associated with a field of the calendaring application, and stored in association with the calendaring application field.
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: October 21, 2014
    Assignee: General Motors LLC
    Inventor: Jeffrey P. Rysenga
  • Patent number: 8862470
    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
    Type: Grant
    Filed: November 22, 2011
    Date of Patent: October 14, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
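    The family-substitution idea above can be illustrated with a small sketch; the family table and phoneme symbols are assumptions for illustration:

    ```python
    # Each family of dialectal alternatives is labeled as one underlying phoneme.
    FAMILIES = {
        "aa": ["aa", "ao"],   # e.g. cot/caught merger
        "t":  ["t", "dx"],    # flapped /t/ in some dialects
    }

    def expand_pronunciation(phonemes):
        """Substitute each phoneme with its family of interchangeable alternatives."""
        return [FAMILIES.get(p, [p]) for p in phonemes]

    # "water" ~ /w aa t er/: each slot lists the acceptable alternatives
    print(expand_pronunciation(["w", "aa", "t", "er"]))
    # → [['w'], ['aa', 'ao'], ['t', 'dx'], ['er']]
    ```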
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Publication number: 20140297277
    Abstract: Systems and methods are provided for scoring spoken language in multiparty conversations. A computer receives a conversation between an examinee and at least one interlocutor. The computer selects a portion of the conversation. The portion includes one or more examinee utterances and one or more interlocutor utterances. The computer assesses the portion using one or more metrics, such as: a pragmatic metric for measuring a pragmatic fit of the one or more examinee utterances; a speech act metric for measuring a speech act appropriateness of the one or more examinee utterances; a speech register metric for measuring a speech register appropriateness of the one or more examinee utterances; and an accommodation metric for measuring a level of accommodation of the one or more examinee utterances. The computer computes a final score for the portion of the conversation based on the one or more metrics applied.
    Type: Application
    Filed: March 26, 2014
    Publication date: October 2, 2014
    Applicant: Educational Testing Service
    Inventors: Klaus Zechner, Keelan Evanini
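    The final-score computation over the listed metrics might look like the following sketch; the metric names follow the abstract, but the weighting and averaging scheme are assumptions:

    ```python
    def score_portion(metric_scores, weights=None):
        """Combine per-metric scores (dict of name -> score in [0, 1]) into a final score."""
        if weights is None:
            weights = {m: 1.0 for m in metric_scores}  # unweighted by default
        total_w = sum(weights[m] for m in metric_scores)
        return sum(metric_scores[m] * weights[m] for m in metric_scores) / total_w

    portion = {
        "pragmatic_fit": 0.8,
        "speech_act": 0.7,
        "speech_register": 0.9,
        "accommodation": 0.6,
    }
    print(round(score_portion(portion), 3))  # unweighted mean → 0.75
    ```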
  • Publication number: 20140297282
    Abstract: An ontology stores information about a domain of an automatic speech recognition (ASR) application program. The ontology is augmented with information that enables subsequent automatic generation of a speech understanding grammar for use by the ASR application program. The information includes hints about how a human might talk about objects in the domain, such as preludes (phrases that introduce an identification of the object) and postludes (phrases that follow an identification of the object).
    Type: Application
    Filed: March 28, 2013
    Publication date: October 2, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Stephen Douglas Peters, Réal Tremblay
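    The prelude/postlude hints can be pictured as expanding into grammar phrases around an object; the ontology shape and example values below are illustrative assumptions:

    ```python
    ontology_entry = {
        "object": "flight",
        "preludes": ["book a", "find me a"],      # phrases introducing the object
        "postludes": ["to boston", "for tomorrow"],  # phrases following it
    }

    def generate_phrases(entry):
        """Cross every prelude with every postlude around the object's name."""
        return [
            f"{pre} {entry['object']} {post}"
            for pre in entry["preludes"]
            for post in entry["postludes"]
        ]

    for phrase in generate_phrases(ontology_entry):
        print(phrase)
    # e.g. "book a flight to boston", "find me a flight for tomorrow", ...
    ```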
  • Patent number: 8849662
    Abstract: A method and system for segmenting phonemes from voice signals. A histogram of the peak distribution corresponding to each order is formed using a higher-order statistic, and a boundary marking the starting and ending points of each phoneme is determined by calculating a peak statistic from the histogram. The method substantially reduces the amount of computation and can be applied to sound-signal systems that perform sound coding, recognition, synthesis, reinforcement, and the like.
    Type: Grant
    Filed: December 28, 2006
    Date of Patent: September 30, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Hyun-Soo Kim
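    As a loose illustration only, boundary detection from peak statistics could be sketched as below: count local maxima per window and mark boundaries where the count jumps. The actual higher-order histogram statistic of the patent is not reproduced here:

    ```python
    def count_peaks(window):
        """Number of local maxima (samples above both neighbors) in a window."""
        return sum(
            1 for i in range(1, len(window) - 1)
            if window[i] > window[i - 1] and window[i] > window[i + 1]
        )

    def candidate_boundaries(signal, win=8, jump=2):
        """Mark window starts where the peak-count statistic jumps by >= `jump`."""
        stats = [count_peaks(signal[i:i + win]) for i in range(0, len(signal) - win, win)]
        return [
            (i + 1) * win
            for i in range(len(stats) - 1)
            if abs(stats[i + 1] - stats[i]) >= jump
        ]

    # A flat stretch followed by an oscillating stretch yields one boundary
    # where the statistic changes.
    print(candidate_boundaries([0] * 16 + [0, 1] * 8))  # → [16]
    ```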
  • Patent number: 8849667
    Abstract: A computer-implemented method, apparatus and computer program product. The computer-implemented method performed by a computerized device, comprising: transforming a hidden Markov model to qubits; transforming data into groups of qubits, the data being determined upon the hidden Markov model and features extracted from an audio signal, the data representing a likelihood observation matrix representing likelihood of phoneme and state combinations in an audio signal; applying a quantum search algorithm for finding a maximal value of the qubits; and transforming the maximal value of the qubits into a number, the number representing an entry in a delta array used in speech recognition.
    Type: Grant
    Filed: July 7, 2013
    Date of Patent: September 30, 2014
    Assignee: Novospeech Ltd.
    Inventor: Yossef Ben-Ezra
  • Patent number: 8849664
    Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: September 30, 2014
    Assignee: Google Inc.
    Inventors: Xin Lei, Petar Aleksic
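    The stability-triggered update logic described above reduces to a simple gate; the stability scores, threshold, and "profile update" below are stand-ins for illustration, not Google's implementation:

    ```python
    class SpeakerAdapter:
        def __init__(self, threshold=0.9):
            self.threshold = threshold
            self.adapted_segments = []

        def on_segment(self, text, stability):
            """Trigger a profile update only when the segment's stability
            measure satisfies the threshold."""
            if stability >= self.threshold:
                self.adapted_segments.append(text)  # stand-in for a real profile update
                return True
            return False

    adapter = SpeakerAdapter()
    adapter.on_segment("hello world", 0.95)   # stable → triggers update
    adapter.on_segment("uh maybe", 0.40)      # unstable → skipped
    print(adapter.adapted_segments)  # → ['hello world']
    ```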
  • Patent number: 8849666
    Abstract: Speech recognition processing captures the phonemes of words in a spoken speech string and retrieves the text of words corresponding to particular phoneme combinations from a phoneme dictionary. A text-to-speech synthesizer can then produce a synthesized pronunciation of each word and substitute it into the speech string. If recognition fails for a particular combination of phonemes, as may occur when a word is spoken with an accent or the speaker has a speech impediment, the speaker is prompted to clarify the word by entering it as text from a keyboard or the like; the entry is stored in the phoneme dictionary so that a synthesized pronunciation can be played out when the initially unrecognized word is encountered again, improving intelligibility, particularly on conference calls.
    Type: Grant
    Filed: February 23, 2012
    Date of Patent: September 30, 2014
    Assignee: International Business Machines Corporation
    Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
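    The lookup-or-clarify flow can be sketched as follows; the dictionary contents and phoneme-tuple key format are illustrative assumptions:

    ```python
    phoneme_dict = {("k", "ae", "t"): "cat"}

    def recognize_word(phonemes, clarify):
        """Return the known text for a phoneme combination, or prompt the
        speaker (via `clarify`) to type the word and learn it for next time."""
        key = tuple(phonemes)
        if key not in phoneme_dict:
            phoneme_dict[key] = clarify()  # stand-in for a keyboard-entry prompt
        return phoneme_dict[key]

    print(recognize_word(["k", "ae", "t"], clarify=lambda: "?"))    # known → cat
    print(recognize_word(["d", "ao", "g"], clarify=lambda: "dog"))  # learned → dog
    print(recognize_word(["d", "ao", "g"], clarify=lambda: "?"))    # now known → dog
    ```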
  • Publication number: 20140288934
    Abstract: A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment.
    Type: Application
    Filed: May 5, 2014
    Publication date: September 25, 2014
    Applicant: VoiceBox Technologies Corporation
    Inventors: Michael R. Kennewick, Catherine Cheung, Larry Baldwin, Ari Salomon, Michael Tjalve, Sheetal Guttigoli, Lynn Armstrong, Philippe Di Cristo, Bernie Zimmerman, Sam Menaker
  • Publication number: 20140288935
    Abstract: A combination and a method are provided. Automatic speech recognition is performed on a received utterance. A meaning of the utterance is determined based, at least in part, on the recognized speech. At least one query is formed based, at least in part, on the determined meaning of the utterance. The at least one query is sent to at least one searching mechanism to search for an address of at least one web page that satisfies the at least one query.
    Type: Application
    Filed: June 9, 2014
    Publication date: September 25, 2014
    Inventors: Steven Hart LEWIS, Kenneth H. Rosen
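    The meaning-to-query step might be as simple as the sketch below; the slot names (`topic`, `modifiers`) are assumptions for illustration:

    ```python
    def form_query(meaning):
        """Build a keyword search query from a determined utterance meaning."""
        parts = [meaning["topic"]] + meaning.get("modifiers", [])
        return " ".join(parts)

    meaning = {"topic": "pizza", "modifiers": ["delivery", "near me"]}
    print(form_query(meaning))  # → "pizza delivery near me"
    ```

    The resulting string would then be handed to one or more searching mechanisms to locate matching web-page addresses.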
  • Publication number: 20140278933
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to measure audience engagement with media. An example method for measuring audience engagement with media presented in an environment is disclosed herein. The method includes identifying the media presented by a presentation device in the environment, and obtaining a keyword list associated with the media. The method also includes analyzing audio data captured in the environment for an utterance corresponding to a keyword of the keyword list, and incrementing an engagement counter when the utterance is detected.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventor: F. Gavin McMillan
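    The keyword-matching engagement counter can be sketched in a few lines; the keyword list, utterances, and whitespace tokenization are assumptions:

    ```python
    def count_engagement(utterances, keywords):
        """Increment an engagement counter once per utterance that contains a
        keyword from the media's keyword list."""
        keywords = {k.lower() for k in keywords}
        counter = 0
        for utterance in utterances:
            if keywords & set(utterance.lower().split()):
                counter += 1
        return counter

    heard = ["that commercial was funny", "pass the salt", "I love this show"]
    print(count_engagement(heard, ["commercial", "show"]))  # → 2
    ```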
  • Publication number: 20140278423
    Abstract: A device, system and method for audio transmission quality assessment that occurs during the transmission. A transmission channel such as the internet is used to transmit speech that is spoken by a human speaker, captured at a first end, and transmitted over the transmission channel for reproduction at a second end. The processors at each end of the transmission channel are configured to determine one or more characteristics of the speech such as phonemes. The phonemes are transmitted over a backchannel of the transmission channel to a processor that compares the speech characteristics that were determined at both ends of the call. The participants are notified of a transmission problem that has had an effect on the intelligibility of the speech that was reproduced at the far end if the comparison does not meet a predetermined quality metric.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Inventors: Michael James Dellisanti, Lee Zamir
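    Comparing the phoneme sequences determined at the two ends could use a sequence distance like the sketch below; the 0.8 similarity threshold standing in for the predetermined quality metric is an assumption:

    ```python
    def edit_distance(a, b):
        """Classic Levenshtein distance between two phoneme sequences."""
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            cur = [i]
            for j, y in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
            prev = cur
        return prev[-1]

    def transmission_ok(near_end, far_end, quality=0.8):
        """Flag the transmission when the two ends' phoneme sequences diverge."""
        dist = edit_distance(near_end, far_end)
        similarity = 1 - dist / max(len(near_end), len(far_end), 1)
        return similarity >= quality

    sent = ["hh", "eh", "l", "ow"]
    print(transmission_ok(sent, ["hh", "eh", "l", "ow"]))  # → True
    print(transmission_ok(sent, ["hh", "ah", "b", "ow"]))  # 2 edits → False
    ```

    When `transmission_ok` returns False, the participants would be notified that intelligibility at the far end was affected.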
  • Patent number: 8838449
    Abstract: This document describes word-dependent language models, as well as their creation and use. A word-dependent language model can permit a speech-recognition engine to accurately verify that a speech utterance matches a multi-word phrase. This is useful in many contexts, including those where one or more letters of the expected phrase are known to the speaker.
    Type: Grant
    Filed: December 23, 2010
    Date of Patent: September 16, 2014
    Assignee: Microsoft Corporation
    Inventors: Yun-Cheng Ju, Ivan J. Tashev, Chad R. Heinemann