Subportions Patents (Class 704/254)
-
Patent number: 8983839
Abstract: The system and method described herein may dynamically generate a recognition grammar associated with a conversational voice user interface in an integrated voice navigation services environment. In particular, in response to receiving a natural language utterance that relates to a navigation context at the voice user interface, a conversational language processor may generate a dynamic recognition grammar that organizes grammar information based on one or more topological domains. For example, the one or more topological domains may be determined based on a current location associated with a navigation device, whereby a speech recognition engine may use the grammar information organized in the dynamic recognition grammar according to the one or more topological domains to generate one or more interpretations associated with the natural language utterance.
Type: Grant
Filed: November 30, 2012
Date of Patent: March 17, 2015
Assignee: VoiceBox Technologies Corporation
Inventors: Michael R. Kennewick, Catherine Cheung, Larry Baldwin, Ari Salomon, Michael Tjalve, Sheetal Guttigoli, Lynn Armstrong, Philippe Di Cristo, Bernie Zimmerman, Sam Menaker
-
Publication number: 20150073803
Abstract: A portion of an audio signal is identified corresponding to a spoken word and its phonemes. A set of alternate spoken words satisfying phonetic similarity criteria to the spoken word is generated. A subset of the set of alternate spoken words is also identified; each member of the subset shares the same phoneme in a similar temporal position as the spoken word. A significance factor is then calculated for the phoneme based on the number of alternates in the subset and on the total number of alternates. The calculated significance factor may then be used to lengthen or shorten the temporal duration of the phoneme in the audio signal according to its significance in the spoken word.
Type: Application
Filed: September 12, 2013
Publication date: March 12, 2015
Applicant: International Business Machines Corporation
Inventors: Flemming Boegelund, Lav R. Varshney
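The significance factor described in this abstract depends only on two counts: how many phonetically similar alternates share the phoneme in the same temporal position, and how many alternates there are in total. A minimal sketch of one plausible scoring rule (the exact formula is not given in the abstract, so this ratio-based version is an illustrative assumption):

```python
def significance_factor(num_sharing: int, num_alternates: int) -> float:
    """Hypothetical significance score for a phoneme: the fewer alternate
    words that share the phoneme in the same temporal position, the more
    that phoneme distinguishes the spoken word from its confusables."""
    if num_alternates == 0:
        return 1.0  # no confusable alternates: maximally significant
    return 1.0 - num_sharing / num_alternates

# e.g. 2 of 10 phonetically similar alternates share the phoneme
print(significance_factor(2, 10))  # → 0.8
```

A duration-adjustment stage could then lengthen phonemes whose score is high and shorten those whose score is low, as the abstract suggests.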
-
Patent number: 8977547
Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold T1; and a standard pattern creation unit 14 for creating a standard pattern by using the voice data where the utterance stability verification unit 13 determines that registration is acceptable.
Type: Grant
Filed: October 8, 2009
Date of Patent: March 10, 2015
Assignee: Mitsubishi Electric Corporation
Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
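The utterance stability check above reduces to a simple gate: only register the voice data when repeated utterances are mutually similar enough. A minimal sketch, assuming feature vectors per utterance and cosine similarity (the patented system would use its own acoustic features and similarity measure; the 0.9 threshold is an illustrative assumption):

```python
import itertools

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def registration_acceptable(feature_vectors, threshold=0.9):
    """Accept registration only if every pair of repeated utterances is
    mutually similar — a stand-in for the utterance stability
    verification unit; the standard pattern would then be built from
    the accepted voice data."""
    pairs = itertools.combinations(feature_vectors, 2)
    return all(cosine(a, b) > threshold for a, b in pairs)
```

An unstable speaker (dissimilar repetitions) would be asked to re-record rather than have a noisy standard pattern created.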
-
Patent number: 8977539
Abstract: A language analysis apparatus of the invention includes division rules, each of which is classified into one of levels according to the degree of risk of causing analysis accuracy problems when applied; a division point candidate generation unit 21 which, when a character string whose length is greater than the predetermined maximum input length is input, generates division point candidates for the input character string by applying the division rules sequentially one by one in the ascending order of the level of risk of causing problems; a division point adjustment unit 22 which, when the length of a division unit candidate obtained by the division point candidate generated by the division point candidate generation unit 21 is less than the maximum input length, selects a combination of division points from among the division point candidates obtained by applying division rules of the same level while ensuring that each division unit is not greater in length than the maximum input length; and a division unit
Type: Grant
Filed: March 23, 2010
Date of Patent: March 10, 2015
Assignee: NEC Corporation
Inventors: Shinichi Ando, Kunihiko Sadamasa
-
Patent number: 8972260
Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
Type: Grant
Filed: April 19, 2012
Date of Patent: March 3, 2015
Assignee: Robert Bosch GmbH
Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
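The partitioning step this abstract describes is straightforward to sketch: count each utterance, then route frequent utterances to the grammar-based model's training set and rare ones to the statistical model's. A minimal illustration (the threshold value is an arbitrary choice, and the abstract does not say which side an exact tie falls on; here ties go to the statistical model):

```python
from collections import Counter

def split_by_frequency(utterances, threshold=5):
    """Partition training utterances by frequency count: utterances whose
    count exceeds the threshold feed a grammar-based language model,
    the rest feed a statistical language model."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]
    low = [u for u, c in counts.items() if c <= threshold]
    return high, low
```

The intuition is that frequent, formulaic commands are well covered by a hand-crafted grammar, while the long tail of rare phrasings is better served by a statistical model.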
-
Patent number: 8972432
Abstract: Systems, methods, and apparatuses, including computer program products, are provided for machine translation using information retrieval techniques. In general, in one implementation, a method is provided. The method includes providing a received input segment as a query to a search engine, the search engine searching an index of one or more collections of documents, receiving one or more candidate segments in response to the query, determining a similarity of each candidate segment to the received input segment, and for one or more candidate segments having a determined similarity that exceeds a threshold similarity, providing a translated target segment corresponding to the respective candidate segment.
Type: Grant
Filed: April 23, 2008
Date of Patent: March 3, 2015
Assignee: Google Inc.
Inventors: Hayden Shaw, Thorsten Brants
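The retrieval-based translation flow above can be sketched with a toy translation memory standing in for the indexed document collections, and simple string similarity standing in for the search engine's ranking (the memory contents, similarity measure, and threshold are all illustrative assumptions):

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segments mapped to target segments.
memory = {
    "how are you": "comment allez-vous",
    "good morning": "bonjour",
}

def translate_by_retrieval(segment, threshold=0.8):
    """Retrieve stored segments similar to the input segment and return
    the translated target segments of candidates whose similarity
    exceeds the threshold."""
    results = []
    for candidate, target in memory.items():
        if SequenceMatcher(None, segment, candidate).ratio() > threshold:
            results.append(target)
    return results
```

A production system would query an inverted index over parallel corpora instead of scanning a dictionary, but the query → candidates → similarity gate → target segment pipeline is the same.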
-
Patent number: 8972259
Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt, is recorded.
Type: Grant
Filed: September 9, 2010
Date of Patent: March 3, 2015
Assignee: Rosetta Stone, Ltd.
Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
-
Patent number: 8965766
Abstract: Systems and methods for identifying music in a noisy environment are described. One of the methods includes receiving audio segment data. The audio segment data is generated from the portion that is captured in the noisy environment. The method further includes generating feature vectors from the audio segment data, identifying phonemes from the feature vectors, and comparing the identified phonemes with pre-assigned phoneme sequences. Each pre-assigned phoneme sequence identifies a known music piece. The method further includes determining an identity of the music based on the comparison.
Type: Grant
Filed: March 15, 2012
Date of Patent: February 24, 2015
Assignee: Google Inc.
Inventors: Eugene Weinstein, Boulos Harb, Anaya Misra, Michael Dennis Riley, Pavel Golik, Alex Rudnick
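The comparison step above amounts to matching a recognized phoneme sequence against a catalogue of pre-assigned sequences. A minimal sketch using Levenshtein edit distance as the comparison (the catalogue entries and the choice of edit distance are illustrative assumptions, not the patented matching method):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, via a rolling row."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev = dp[0]
        dp[0] = i
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (x != y))  # substitution or match
            prev = cur
    return dp[-1]

# Hypothetical catalogue: pre-assigned phoneme sequence per music piece.
catalogue = {
    "song_a": ["l", "ah", "l", "ah"],
    "song_b": ["d", "ow", "r", "ey"],
}

def identify(phonemes):
    """Return the catalogue entry whose pre-assigned phoneme sequence is
    closest to the phonemes recognized from the noisy audio segment."""
    return min(catalogue, key=lambda k: edit_distance(phonemes, catalogue[k]))
```

Edit distance tolerates the insertions, deletions, and substitutions that noisy capture introduces into the recognized phoneme string.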
-
Publication number: 20150046163
Abstract: On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance.
Type: Application
Filed: October 23, 2014
Publication date: February 12, 2015
Applicant: Microsoft Corporation
Inventors: Michael Levit, Bruce Melvin Buntschuh
-
Publication number: 20150039314
Abstract: A method and system for speech recognition using a microphone array directed at the face of a person speaking. The output from the microphone array is read/scanned to determine which part of the face sound is emitted from. This information is used as input to a speech recognition system to improve speech recognition.
Type: Application
Filed: December 20, 2011
Publication date: February 5, 2015
Applicant: SQUAREHEAD TECHNOLOGY AS
Inventor: Morgan Kjølerbakken
-
Patent number: 8949130
Abstract: In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criteria.
Type: Grant
Filed: October 21, 2009
Date of Patent: February 3, 2015
Assignee: Vlingo Corporation
Inventor: Michael S. Phillips
-
Publication number: 20150032453
Abstract: This invention relates generally to software and computers, and more specifically, to systems and methods for providing information discovery and retrieval. In one embodiment, the invention includes a system for providing information discovery and retrieval, the system including a processor module, the processor module configurable to performing the steps of receiving an information request from a consumer device over a communications network; decoding the information request; discovering information using the decoded information request; preparing instructions for accessing the information; and communicating the prepared instructions to the consumer device, wherein the consumer device is configurable to retrieving the information for presentation using the prepared instructions.
Type: Application
Filed: October 13, 2014
Publication date: January 29, 2015
Inventor: W. Leo Hoarty
-
Patent number: 8942977
Abstract: The present invention defines a pitch-synchronous parametrical representation of speech signals as the basis of speech recognition, and discloses methods of generating the said pitch-synchronous parametrical representation from speech signals. The speech signal first goes through a pitch-marks picking program to identify the pitch periods. The speech signal is then segmented into pitch-synchronous frames. An ends-matching program equalizes the values at the two ends of the waveform in each frame. Using Fourier analysis, the speech signal in each frame is converted into a pitch-synchronous amplitude spectrum. Using Laguerre functions, the said amplitude spectrum is converted into a unit vector, referred to as the timbre vector. By using a database of correlated phonemes and timbre vectors, the most likely phoneme sequence of an input speech signal can be decoded in the acoustic stage of a speech recognition system.
Type: Grant
Filed: March 17, 2014
Date of Patent: January 27, 2015
Inventor: Chengjun Julian Chen
-
Patent number: 8942981
Abstract: A natural language call router forwards an incoming call from a caller to an appropriate destination. The call router has a speech recognition mechanism responsive to words spoken by a caller for producing recognized text corresponding to the spoken words. A robust parsing mechanism is responsive to the recognized text for detecting a class of words in the recognized text. The class is defined as a group of words having a common attribute. An interpreting mechanism is responsive to the detected class for determining the appropriate destination for routing the call.
Type: Grant
Filed: October 28, 2011
Date of Patent: January 27, 2015
Assignee: Cellco Partnership
Inventors: Veronica Klein, Deborah Washington Brown
-
Publication number: 20150019225
Abstract: A method for arbitrating spoken dialog results includes receiving a spoken utterance from a user within an environment; receiving first recognition results and a first confidence level associated with the spoken utterance from a first source; receiving second recognition results and a second confidence level associated with the spoken utterance from a second source; receiving human-machine-interface (HMI) information associated with the user; selecting between the first recognition results and the second recognition results based on at least one of the first confidence level, the second confidence level, and the HMI information.
Type: Application
Filed: June 23, 2014
Publication date: January 15, 2015
Inventor: ROBERT D. SIMS, III
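The arbitration this abstract describes takes two recognizers' outputs plus HMI context and picks one result. A minimal sketch, assuming confidence comparison as the default rule and a hypothetical HMI override (the `prefer_first` context value and the tie-breaking rule are illustrative assumptions):

```python
def arbitrate(first_result, first_conf, second_result, second_conf,
              hmi_context=None):
    """Select between two recognition results. HMI information can
    override the confidence comparison — e.g. an embedded recognizer
    may be preferred while a particular on-screen task is active."""
    if hmi_context == "prefer_first":
        return first_result
    # default: pick the higher-confidence result (ties go to the first)
    return first_result if first_conf >= second_conf else second_result
```

This pattern is common in vehicles, where an on-board recognizer and a cloud recognizer both transcribe the same utterance and a policy layer reconciles them.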
-
Publication number: 20150019226
Abstract: A computerized information apparatus for providing information to a user of a transport device. In one embodiment, the apparatus includes data processing apparatus, speech recognition and synthesis apparatus, and a network interface to enable voice-driven provision of information obtained both locally within the transport device and from a remote source such as a networked server. In one implementation, the information relates to one or more business entities in an area local to the transport device's location. Information can be both displayed and provided to the user audibly in another implementation.
Type: Application
Filed: September 22, 2014
Publication date: January 15, 2015
Inventor: Robert F. Gazdzinski
-
Publication number: 20150006179
Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
Type: Application
Filed: September 17, 2014
Publication date: January 1, 2015
Inventors: Andrej LJOLJE, Alistair D. CONKIE, Ann K. SYRDAL
-
Publication number: 20150006178
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining pronunciations for particular terms. The methods, systems, and apparatus include actions of obtaining audio samples of speech corresponding to a particular term and obtaining candidate pronunciations for the particular term. Further actions include generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample. Additional actions include aggregating the scores for each candidate pronunciation and adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
Type: Application
Filed: June 28, 2013
Publication date: January 1, 2015
Inventors: Fuchun Peng, Francoise Beaufays, Brian Strope, Xin Lei, Pedro J. Moreno Mengibar, Trevor D. Strohman
-
Publication number: 20140379347
Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.
Type: Application
Filed: June 25, 2013
Publication date: December 25, 2014
Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
-
Publication number: 20140379348
Abstract: There is provided a method and an apparatus for processing a disordered voice. A method for processing a disordered voice according to an exemplary embodiment of the present invention includes: receiving a voice signal; recognizing the voice signal by phoneme; extracting multiple voice components from the voice signal; acquiring restored voice components by processing at least some disordered voice components of the multiple voice components by phoneme; and synthesizing a restored voice signal based on at least the restored voice components.
Type: Application
Filed: June 20, 2014
Publication date: December 25, 2014
Inventors: Myung Whun SUNG, Tack Kyun KWON, Hee Jin KIM, Wook Eun KIM, Woo Il KIM, Mee Young SUNG, Dong Wook KIM
-
Patent number: 8918408
Abstract: A computing device maintains an input history in memory. This input history includes input strings that have been previously entered into the computing device. When the user begins entering characters of an input string, a predictive input engine is activated. The predictive input engine receives the input string and the input history to generate a candidate list of predictive inputs which are presented to the user. The user can select one of the inputs from the list, or otherwise continue entering characters. The computing device generates the candidate list by combining frequency and recency information of the matching strings from the input history. Additionally, the candidate list can be manipulated to present a variety of candidates. By using a combination of frequency, recency and variety, a favorable user experience is provided.
Type: Grant
Filed: August 24, 2012
Date of Patent: December 23, 2014
Assignee: Microsoft Corporation
Inventors: Katsutoshi Ohtsuki, Koji Watanabe
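The frequency-plus-recency ranking above can be sketched with one common formulation: each matching history entry contributes a base count plus an exponentially decayed recency bonus. The scoring formula, half-life, and history layout here are all illustrative assumptions, not the patented method:

```python
def rank_candidates(prefix, history, now, half_life=3600.0):
    """Rank input-history strings matching the typed prefix by combining
    frequency (one point per occurrence) with a recency bonus that
    decays by half every `half_life` seconds.

    history: list of (timestamp_seconds, input_string) pairs.
    """
    scores = {}
    for timestamp, text in history:
        if text.startswith(prefix):
            decay = 0.5 ** ((now - timestamp) / half_life)
            scores[text] = scores.get(text, 0.0) + 1.0 + decay
    return sorted(scores, key=scores.get, reverse=True)
```

The abstract's third signal, variety, would be a post-processing step on this ranked list (e.g. demoting near-duplicate candidates), which is omitted here.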
-
Publication number: 20140372121
Abstract: A speech processing device includes a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: obtaining input speech, detecting a vowel segment contained in the input speech, estimating an accent segment contained in the input speech, calculating a first vowel segment length containing the accent segment and a second vowel segment length excluding the accent segment, and controlling at least one of the first vowel segment length and the second vowel segment length.
Type: Application
Filed: April 24, 2014
Publication date: December 18, 2014
Applicant: FUJITSU LIMITED
Inventors: Taro TOGAWA, Chisato Shioda, Takeshi Otani
-
Patent number: 8913720
Abstract: A method includes receiving a communication from a party at a voice response system and capturing verbal communication spoken by the party. Then a processor creates a voice model associated with the party, the voice model being created by processing the captured verbal communication spoken by the party. The creation of the voice model is imperceptible to the party. The voice model is then stored to provide voice verification of the party during a subsequent communication.
Type: Grant
Filed: February 14, 2013
Date of Patent: December 16, 2014
Assignee: AT&T Intellectual Property, L.P.
Inventor: Mazin Gilbert
-
Patent number: 8914292
Abstract: In embodiments of the present invention improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criteria.
Type: Grant
Filed: October 21, 2009
Date of Patent: December 16, 2014
Assignee: Vlingo Corporation
Inventor: Michael S. Phillips
-
Patent number: 8914278
Abstract: A computer-assisted language correction system including spelling correction functionality, misused word correction functionality, grammar correction functionality and vocabulary enhancement functionality utilizing contextual feature-sequence functionality employing an internet corpus.
Type: Grant
Filed: July 31, 2008
Date of Patent: December 16, 2014
Assignee: Ginger Software, Inc.
Inventors: Yael Karov Zangvil, Avner Zangvil
-
Patent number: 8909529
Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
Type: Grant
Filed: November 15, 2013
Date of Patent: December 9, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Giuseppe Riccardi
-
Publication number: 20140358543
Abstract: According to one embodiment, a linked-work assistance apparatus includes an analysis unit, an identification unit and a control unit. The analysis unit analyzes a speech of each of users by using a keyword list, to acquire a speech analysis result indicating a relation between a first keyword and a classification of the first keyword, the keyword list indicating a list of keywords classified based on concepts of the keywords and intentions of the keywords. The identification unit identifies a role of each of the users according to the classification of the first keyword, to acquire a correspondence relation between each of the users and the role. The control unit, if the speech includes a name of the role, transmits the speech to other users which relate to the role corresponding to the name, by referring to the correspondence relation.
Type: Application
Filed: March 6, 2014
Publication date: December 4, 2014
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Tetsuro CHINO, Kentaro TORRI, Naoshi UCHIHIRA
-
Publication number: 20140358544
Abstract: Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question, e.g., words that are not part of the proper name entities, may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input.
Type: Application
Filed: May 30, 2014
Publication date: December 4, 2014
Inventor: Harry William Printz
-
Patent number: 8898058
Abstract: Systems, methods, apparatus, and machine-readable media for voice activity detection in a single-channel or multichannel audio signal are disclosed.
Type: Grant
Filed: October 24, 2011
Date of Patent: November 25, 2014
Assignee: QUALCOMM Incorporated
Inventors: Jongwon Shin, Erik Visser, Ian Ernan Liu
-
Patent number: 8892420
Abstract: Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results.
Type: Grant
Filed: November 17, 2011
Date of Patent: November 18, 2014
Assignee: Alibaba Group Holding Limited
Inventors: Jian Sun, Lei Hou, Jing Ming Tang, Min Chu, Xiao Ling Liao, Bing Jing Xu, Ren Gang Peng, Yang Yang
-
Patent number: 8886534
Abstract: A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.
Type: Grant
Filed: January 27, 2011
Date of Patent: November 11, 2014
Assignee: Honda Motor Co., Ltd.
Inventors: Mikio Nakano, Naoto Iwahashi, Kotaro Funakoshi, Taisuke Sumii
-
Patent number: 8880400
Abstract: Voice recognition is realized by a pattern matching with a voice pattern model, and when a large number of paraphrased words are required for one facility, such as a name of a hotel or a tourist facility, the pattern matching needs to be performed with the voice pattern models of all the paraphrased words, resulting in an enormous amount of calculation. Further, it is difficult to generate all the paraphrased words, and a large amount of labor is required.
Type: Grant
Filed: January 27, 2010
Date of Patent: November 4, 2014
Assignee: Mitsubishi Electric Corporation
Inventors: Toshiyuki Hanazawa, Yohei Okato
-
Publication number: 20140324433
Abstract: A method and a device for learning a language and a computer readable recording medium are provided. The method includes the following steps. An input voice from a voice receiver is transformed into an input sentence according to a grammar rule. Whether the input sentence is the same as a learning sentence displayed on a display is determined. If the input sentence is different from the learning sentence, ancillary information containing at least one error word in the input sentence that differs from the learning sentence is generated.
Type: Application
Filed: February 13, 2014
Publication date: October 30, 2014
Applicant: Wistron Corporation
Inventor: Hsi-chun Hsiao
-
Publication number: 20140316782
Abstract: Methods and systems are provided for managing speech dialog of a speech system. In one embodiment, a method includes: receiving a first utterance from a user of the speech system; determining a first list of possible results from the first utterance, wherein the first list includes at least two elements that each represent a possible result; analyzing the at least two elements of the first list to determine an ambiguity of the elements; and generating a speech prompt to the user based on partial orthography and the ambiguity.
Type: Application
Filed: April 19, 2013
Publication date: October 23, 2014
Inventors: Eli TZIRKEL-HANCOCK, Gaurav TALWAR, Xufang ZHAO, Greg T. Lindemann
-
Publication number: 20140316785
Abstract: A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different languages are used to support and detect a language spoken by a user. In some implementations an interactive electronic agent responds in the user's language to facilitate a real-time, human like dialogue.
Type: Application
Filed: April 18, 2014
Publication date: October 23, 2014
Applicant: Nuance Communications, Inc.
Inventors: Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
-
Patent number: 8868431
Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
Type: Grant
Filed: February 5, 2010
Date of Patent: October 21, 2014
Assignee: Mitsubishi Electric Corporation
Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
-
Patent number: 8868427
Abstract: Systems and methods for updating electronic calendar information. Speech is received from a user at a vehicle telematics unit (VTU), wherein the speech is representative of information related to a particular vehicle trip. The received speech is recorded in the VTU as a voice memo, and data associated with the voice memo is communicated from the VTU to a computer running a calendaring application. The data is associated with a field of the calendaring application, and stored in association with the calendaring application field.
Type: Grant
Filed: June 10, 2010
Date of Patent: October 21, 2014
Assignee: General Motors LLC
Inventor: Jeffrey P. Rysenga
-
Patent number: 8862470
Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
Type: Grant
Filed: November 22, 2011
Date of Patent: October 14, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
-
Patent number: 8856008
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: September 18, 2013
Date of Patent: October 7, 2014
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Publication number: 20140297277
Abstract: Systems and methods are provided for scoring spoken language in multiparty conversations. A computer receives a conversation between an examinee and at least one interlocutor. The computer selects a portion of the conversation. The portion includes one or more examinee utterances and one or more interlocutor utterances. The computer assesses the portion using one or more metrics, such as: a pragmatic metric for measuring a pragmatic fit of the one or more examinee utterances; a speech act metric for measuring a speech act appropriateness of the one or more examinee utterances; a speech register metric for measuring a speech register appropriateness of the one or more examinee utterances; and an accommodation metric for measuring a level of accommodation of the one or more examinee utterances. The computer computes a final score for the portion of the conversation based on the one or more metrics applied.
Type: Application
Filed: March 26, 2014
Publication date: October 2, 2014
Applicant: Educational Testing Service
Inventors: Klaus Zechner, Keelan Evanini
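The final-score computation described here can be sketched as a weighted combination of the per-portion metrics. The metric names, the 0-1 scale, and the equal default weights below are illustrative assumptions; the publication does not disclose its actual scoring formula.

```python
# Minimal sketch: combining per-portion metric scores into a final
# score (hypothetical weights and scale).
def score_portion(metric_scores, weights=None):
    """Combine metric scores (each assumed on a 0-1 scale) into a
    final score; by default every metric contributes equally."""
    if weights is None:
        weights = {name: 1.0 for name in metric_scores}
    total_weight = sum(weights[name] for name in metric_scores)
    weighted = sum(metric_scores[name] * weights[name]
                   for name in metric_scores)
    return weighted / total_weight

scores = {
    "pragmatic_fit": 0.8,
    "speech_act": 0.6,
    "speech_register": 0.9,
    "accommodation": 0.7,
}
final = score_portion(scores)  # mean of the four metrics: 0.75
```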
-
Publication number: 20140297282
Abstract: An ontology stores information about a domain of an automatic speech recognition (ASR) application program. The ontology is augmented with information that enables subsequent automatic generation of a speech understanding grammar for use by the ASR application program. The information includes hints about how a human might talk about objects in the domain, such as preludes (phrases that introduce an identification of the object) and postludes (phrases that follow an identification of the object).
Type: Application
Filed: March 28, 2013
Publication date: October 2, 2014
Applicant: Nuance Communications, Inc.
Inventors: Stephen Douglas Peters, Réal Tremblay
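One way to picture the prelude/postlude hints is as a cross product that yields candidate grammar phrases for a domain object. The object name and phrase lists below are hypothetical; the sketch shows only the general idea of generating utterance patterns from ontology hints, not Nuance's actual grammar format.

```python
# Minimal sketch: crossing preludes and postludes with an object
# identification to produce candidate grammar phrases (all strings
# are hypothetical examples).
def generate_phrases(obj_name, preludes, postludes):
    """Each prelude/postlude is optional, so the empty string is
    included on both sides of the cross product."""
    phrases = []
    for pre in preludes + [""]:
        for post in postludes + [""]:
            phrases.append(" ".join(p for p in (pre, obj_name, post) if p))
    return phrases

phrases = generate_phrases(
    "flight 101",
    preludes=["book me on", "I want"],
    postludes=["please"],
)
```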
-
Patent number: 8849662
Abstract: A method and a system for segmenting phonemes from voice signals. In the method, a histogram showing a peak distribution corresponding to an order is formed using a high-order concept, and a boundary indicating a starting point and an ending point of each phoneme is determined by calculating a peak statistic based on the histogram. The phoneme segmentation method can remarkably reduce the amount of calculation, and has the advantage of being applicable to sound signal systems which perform sound coding, sound recognition, sound synthesizing, sound reinforcement, etc.
Type: Grant
Filed: December 28, 2006
Date of Patent: September 30, 2014
Assignee: Samsung Electronics Co., Ltd
Inventor: Hyun-Soo Kim
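The peak-statistic boundary idea can be sketched crudely: mark local maxima of frame-to-frame change and keep those at or above the mean peak height (standing in for the patent's histogram-based peak statistic). This is an illustrative simplification of the approach, not the patented algorithm.

```python
# Minimal sketch: boundary detection from prominent peaks in the
# frame-level change signal (crude stand-in for the peak statistic).
def find_boundaries(frames):
    """Return frame indices where the change signal has a peak at
    or above the mean peak height."""
    diffs = [abs(b - a) for a, b in zip(frames, frames[1:])]
    peaks = [i for i in range(1, len(diffs) - 1)
             if diffs[i] > diffs[i - 1] and diffs[i] >= diffs[i + 1]]
    if not peaks:
        return []
    threshold = sum(diffs[i] for i in peaks) / len(peaks)
    return [i + 1 for i in peaks if diffs[i] >= threshold]

# Flat region, a jump, flat region: the jump is reported as a boundary.
bounds = find_boundaries([0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9])
```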
-
Patent number: 8849667
Abstract: A computer-implemented method, apparatus and computer program product. The computer-implemented method performed by a computerized device, comprising: transforming a hidden Markov model to qubits; transforming data into groups of qubits, the data being determined upon the hidden Markov model and features extracted from an audio signal, the data representing a likelihood observation matrix representing likelihood of phoneme and state combinations in an audio signal; applying a quantum search algorithm for finding a maximal value of the qubits; and transforming the maximal value of the qubits into a number, the number representing an entry in a delta array used in speech recognition.
Type: Grant
Filed: July 7, 2013
Date of Patent: September 30, 2014
Assignee: Novospeech Ltd.
Inventor: Yossef Ben-Ezra
-
Patent number: 8849664
Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
Type: Grant
Filed: July 16, 2013
Date of Patent: September 30, 2014
Assignee: Google Inc.
Inventors: Xin Lei, Petar Aleksic
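The stability-gated update flow can be sketched briefly. The threshold value, the profile class, and the update (which here just records the segment) are placeholder assumptions; the point is only that an adaptation update fires when a segment's stability measure satisfies the threshold.

```python
# Minimal sketch: trigger a speaker-profile update only for
# transcript segments whose stability measure clears a threshold.
STABILITY_THRESHOLD = 0.9   # hypothetical value

class SpeakerProfile:
    def __init__(self):
        self.adaptation_segments = []

    def update(self, segment):
        # Placeholder for re-estimating adaptation parameters.
        self.adaptation_segments.append(segment)

def on_segment(profile, segment_text, stability):
    """Update the profile only when the segment is stable enough
    that later recognition passes are unlikely to revise it."""
    if stability >= STABILITY_THRESHOLD:
        profile.update(segment_text)
        return True
    return False

profile = SpeakerProfile()
on_segment(profile, "hello world", stability=0.95)  # update triggered
on_segment(profile, "uh maybe", stability=0.40)     # too unstable, skipped
```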
-
Patent number: 8849666
Abstract: Speech recognition processing captures phonemes of words in a spoken speech string and retrieves text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer then can produce and substitute a synthesized pronunciation of that word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a word, as spoken, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entry, as text, from a keyboard or the like for storage in the phoneme dictionary such that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is again encountered in a speech string to improve intelligibility, particularly for conference calls.
Type: Grant
Filed: February 23, 2012
Date of Patent: September 30, 2014
Assignee: International Business Machines Corporation
Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
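The lookup-or-clarify loop in this abstract reduces to a simple pattern: match a phoneme sequence against a dictionary, and on a miss, ask the speaker for the intended text and remember it for next time. The phoneme symbols and dictionary entry below are hypothetical illustrations.

```python
# Minimal sketch: phoneme-dictionary lookup with a clarification
# fallback that learns the new entry (hypothetical phoneme data).
PHONEME_DICT = {("HH", "AH", "L", "OW"): "hello"}

def recognize_word(phonemes, clarify):
    """Return the dictionary word for `phonemes`; on a miss, call
    `clarify()` (e.g., keyboard entry from the speaker) and store
    the result so future encounters succeed."""
    key = tuple(phonemes)
    if key not in PHONEME_DICT:
        PHONEME_DICT[key] = clarify()
    return PHONEME_DICT[key]

# First encounter prompts for clarification; the second does not.
word = recognize_word(["W", "ER", "L", "D"], clarify=lambda: "world")
again = recognize_word(["W", "ER", "L", "D"], clarify=lambda: "unused")
```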
-
Publication number: 20140288934
Abstract: A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment.
Type: Application
Filed: May 5, 2014
Publication date: September 25, 2014
Applicant: VoiceBox Technologies Corporation
Inventors: Michael R. Kennewick, Catherine Cheung, Larry Baldwin, Ari Salomon, Michael Tjalve, Sheetal Guttigoli, Lynn Armstrong, Philippe Di Cristo, Bernie Zimmerman, Sam Menaker
-
Publication number: 20140288935
Abstract: A combination and a method are provided. Automatic speech recognition is performed on a received utterance. A meaning of the utterance is determined based, at least in part, on the recognized speech. At least one query is formed based, at least in part, on the determined meaning of the utterance. The at least one query is sent to at least one searching mechanism to search for an address of at least one web page that satisfies the at least one query.
Type: Application
Filed: June 9, 2014
Publication date: September 25, 2014
Inventors: Steven Hart Lewis, Kenneth H. Rosen
-
Publication number: 20140278933
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to measure audience engagement with media. An example method for measuring audience engagement with media presented in an environment is disclosed herein. The method includes identifying the media presented by a presentation device in the environment, and obtaining a keyword list associated with the media. The method also includes analyzing audio data captured in the environment for an utterance corresponding to a keyword of the keyword list, and incrementing an engagement counter when the utterance is detected.
Type: Application
Filed: March 15, 2013
Publication date: September 18, 2014
Inventor: F. Gavin McMillan
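The counter logic described here can be sketched directly, under the assumption that captured audio has already been transcribed into utterance strings; the keyword list and utterances below are invented examples.

```python
# Minimal sketch: increment an engagement counter for each captured
# utterance containing a keyword from the media's keyword list.
def count_engagement(utterances, keyword_list):
    keywords = {k.lower() for k in keyword_list}
    counter = 0
    for utterance in utterances:
        if any(word in keywords for word in utterance.lower().split()):
            counter += 1
    return counter

engagement = count_engagement(
    ["did you see that goal", "pass the chips", "great goal"],
    keyword_list=["goal", "referee"],
)  # two utterances mention a keyword
```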
-
Publication number: 20140278423
Abstract: A device, system and method for audio transmission quality assessment that occurs during the transmission. A transmission channel such as the internet is used to transmit speech that is spoken by a human speaker, captured at a first end, and transmitted over the transmission channel for reproduction at a second end. The processors at each end of the transmission channel are configured to determine one or more characteristics of the speech such as phonemes. The phonemes are transmitted over a backchannel of the transmission channel to a processor that compares the speech characteristics that were determined at both ends of the call. The participants are notified of a transmission problem that has had an effect on the intelligibility of the speech that was reproduced at the far end if the comparison does not meet a predetermined quality metric.
Type: Application
Filed: March 14, 2013
Publication date: September 18, 2014
Inventors: Michael James Dellisanti, Lee Zamir
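The end-to-end comparison step can be sketched with a toy quality metric: compare the phoneme sequences determined at the sending and receiving ends and flag a problem when the per-position match ratio falls below a threshold. Both the metric and the threshold are illustrative choices, not the publication's actual quality metric.

```python
# Minimal sketch: compare phonemes determined at both ends of the
# call against a simple match-ratio quality metric (hypothetical).
QUALITY_THRESHOLD = 0.8

def transmission_ok(sent_phonemes, received_phonemes):
    """True when the two phoneme sequences agree closely enough."""
    length = max(len(sent_phonemes), len(received_phonemes))
    if length == 0:
        return True
    matches = sum(a == b for a, b in zip(sent_phonemes, received_phonemes))
    return matches / length >= QUALITY_THRESHOLD

ok = transmission_ok(["HH", "AH", "L", "OW"], ["HH", "AH", "L", "OW"])
bad = transmission_ok(["HH", "AH", "L", "OW"], ["HH", "AH"])  # half lost
```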
-
Patent number: 8838449
Abstract: This document describes word-dependent language models, as well as their creation and use. A word-dependent language model can permit a speech-recognition engine to accurately verify that a speech utterance matches a multi-word phrase. This is useful in many contexts, including those where one or more letters of the expected phrase are known to the speaker.
Type: Grant
Filed: December 23, 2010
Date of Patent: September 16, 2014
Assignee: Microsoft Corporation
Inventors: Yun-Cheng Ju, Ivan J. Tashev, Chad R. Heinemann
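The verification use case can be pictured with a heavily simplified stand-in for a word-dependent model: each expected word carries its own set of acceptable recognized forms, and the utterance matches the phrase only if every position verifies. The per-word alternatives below are hypothetical and much cruder than an actual statistical language model.

```python
# Minimal sketch: verifying an utterance against an expected
# multi-word phrase using per-word acceptance sets as a stand-in
# for word-dependent models (hypothetical data).
WORD_MODELS = {
    "open": {"open", "opened"},
    "sesame": {"sesame"},
}

def verify_phrase(recognized_words, expected_phrase):
    """True when every recognized word is acceptable for the
    corresponding expected word."""
    expected = expected_phrase.split()
    if len(recognized_words) != len(expected):
        return False
    return all(rec in WORD_MODELS.get(exp, {exp})
               for rec, exp in zip(recognized_words, expected))

match = verify_phrase(["opened", "sesame"], "open sesame")
miss = verify_phrase(["open", "says", "me"], "open sesame")
```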