Subportions Patents (Class 704/249)
-
Publication number: 20130158998
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Application
Filed: November 30, 2012
Publication date: June 20, 2013
Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
Inventor: AT&T INTELLECTUAL PROPERTY II, L.P.
-
Publication number: 20130151253
Abstract: A system and method of targeted tuning of a speech recognition system are disclosed. A particular method includes detecting that a frequency of occurrence of a particular type of utterance satisfies a threshold. The method further includes tuning a speech recognition system with respect to the particular type of utterance.
Type: Application
Filed: February 6, 2013
Publication date: June 13, 2013
Applicant: AT&T Intellectual Property I, L.P. (formerly known as SBC Knowledge Ventures, L.P.)
Inventor: AT&T Intellectual Property I, L.P. (formerly known as SBC Knowledge Ventures, L.P.)
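The two-step method in the abstract above (detect that an utterance type's frequency of occurrence satisfies a threshold, then tune the recognizer for that type) can be sketched as follows. The class name, the threshold value, and the `tune()` hook are illustrative assumptions, not details from the patent.

```python
from collections import Counter

class TargetedTuner:
    """Sketch: trigger targeted tuning once an utterance type becomes
    frequent enough. The frequency threshold and tune() hook are
    illustrative placeholders."""

    def __init__(self, threshold=0.10):
        self.threshold = threshold
        self.counts = Counter()
        self.total = 0
        self.tuned = set()

    def observe(self, utterance_type):
        # Track how often each utterance type occurs.
        self.counts[utterance_type] += 1
        self.total += 1
        freq = self.counts[utterance_type] / self.total
        if freq >= self.threshold and utterance_type not in self.tuned:
            self.tune(utterance_type)

    def tune(self, utterance_type):
        # Placeholder for retuning the recognizer on this utterance type.
        self.tuned.add(utterance_type)

tuner = TargetedTuner(threshold=0.5)
for u in ["billing", "billing", "support"]:
    tuner.observe(u)
```

After these three observations, only "billing" has cleared the 50% frequency threshold, so only that type is queued for tuning.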
-
Publication number: 20130144623
Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system ("AEFS") configured to determine and present speaker-related information based on speaker utterances. In one embodiment, the AEFS receives data that represents an utterance of a speaker received by a hearing device of the user, such as a hearing aid, smart phone, media player/device, or the like. The AEFS identifies the speaker based on the received data, such as by performing speaker recognition. The AEFS determines speaker-related information associated with the identified speaker, such as by determining an identifier (e.g., name or title) of the speaker, by locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs the user of the speaker-related information, such as by presenting the speaker-related information on a display of the hearing device or some other device accessible to the user.
Type: Application
Filed: December 13, 2011
Publication date: June 6, 2013
Inventors: Richard T. Lord, Robert W. Lord, Nathan P. Myhrvold, Clarence T. Tegreene, Roderick A. Hyde, Lowell L. Wood, JR., Muriel Y. Ishikawa, Victoria Y.H. Wood, Charles Whitmer, Paramvir Bahl, Douglas C. Burger, Ranveer Chandra, William H. Gates, III, Paul Holman, Jordin T. Kare, Craig J. Mundie, Tim Paek, Desney S. Tan, Lin Zhong, Matthew G. Dyor
-
Publication number: 20130144619
Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system ("AEFS") configured to enhance voice conferencing among multiple speakers. In one embodiment, the AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs a user of the speaker-related information, such as by presenting the speaker-related information on a display of a conferencing device associated with the user.
Type: Application
Filed: January 23, 2012
Publication date: June 6, 2013
Inventors: Richard T. Lord, Robert W. Lord, Nathan P. Myhrvold, Clarence T. Tegreene, Roderick A. Hyde, Lowell L. Wood, JR., Muriel Y. Ishikawa, Victoria Y.H. Wood, Charles Whitmer, Paramvir Bahl, Douglas C. Burger, Ranveer Chandra, William H. Gates, III, Paul Holman, Jordin T. Kare, Craig J. Mundie, Tim Paek, Desney S. Tan, Lin Zhong, Matthew G. Dyor
-
Patent number: 8442825
Abstract: A device for voice identification including a receiver, a segmenter, a resolver, two advancers, a buffer, and a plurality of IIR resonator digital filters where each IIR filter comprises a set of memory locations or functional equivalent to hold filter specifications, a memory location or functional equivalent to hold the arithmetic reciprocal of the filter's gain, a five cell controller array, several multipliers, an adder, a subtractor, and a logical non-shift register. Each cell of the five cell controller array has five logical states, each acting as a five-position single-pole rotating switch that operates in unison with the four others. Additionally, the device also includes an artificial neural network and a display means.
Type: Grant
Filed: August 16, 2011
Date of Patent: May 14, 2013
Assignee: The United States of America as Represented by the Director, National Security Agency
Inventor: Michael Sinutko
-
Publication number: 20130090928
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Application
Filed: November 30, 2012
Publication date: April 11, 2013
Applicant: AT&T Intellectual Property II, L.P.
Inventor: AT&T Intellectual Property II, L.P.
-
Publication number: 20130090927
Abstract: A system and a method for assessing a condition in a subject. Phones from speech of the subject are recognized, one or more prosodic or speech-excitation-source features of the phones are extracted, and an assessment of a condition of the subject is generated based on a correlation between the features of the phones and the condition.
Type: Application
Filed: July 30, 2012
Publication date: April 11, 2013
Applicant: Massachusetts Institute of Technology
Inventors: Thomas Francis Quatieri, Nicolas Malyska, Andrea Carolina Trevino
-
Publication number: 20130080169
Abstract: An audio analysis system includes a terminal apparatus and a host system. The terminal apparatus acquires an audio signal of a sound containing utterances of a user and another person, discriminates between portions of the audio signal corresponding to the utterances of the user and the other person, detects an utterance feature based on the portion corresponding to the utterance of the user or the other person, and transmits utterance information including the discrimination and detection results to the host system. The host system detects a part corresponding to a conversation from the received utterance information, detects portions of the part of the utterance information corresponding to the user and the other person, compares a combination of plural utterance features corresponding to the portions of the part of the utterance information of the user and the other person with relation information to estimate an emotion, and outputs estimation information.
Type: Application
Filed: February 10, 2012
Publication date: March 28, 2013
Applicant: FUJI XEROX Co., Ltd.
Inventors: Haruo HARADA, Hirohito YONEYAMA, Kei SHIMOTANI, Yohei NISHINO, Kiyoshi IIDA, Takao NAITO
-
Publication number: 20130080170
Abstract: An audio analysis apparatus includes the following components. A main body includes a discrimination unit and a transmission unit. A strap is used for hanging the main body from a user's neck. A first audio acquisition device is provided to the strap or the main body. A second audio acquisition device is provided to the strap at a position where a distance between the second audio acquisition device and the user's mouth is smaller than the distance between the first audio acquisition device and the user's mouth in a state where the strap is worn around the user's neck. The discrimination unit discriminates whether an acquired sound is an uttered voice of the user or of another person by comparing audio signals of the sound acquired by the first and second audio acquisition devices. The transmission unit transmits information including the discrimination result to an external apparatus.
Type: Application
Filed: March 5, 2012
Publication date: March 28, 2013
Applicant: FUJI XEROX CO., LTD.
Inventors: Haruo HARADA, Hirohito YONEYAMA, Kei SHIMOTANI, Yohei NISHINO, Kiyoshi IIDA, Takao NAITO
-
Patent number: 8406382
Abstract: A method includes registering a voice of a party in order to provide voice verification for communications with an entity. A call is received from a party at a voice response system. The party is prompted for information and verbal communication spoken by the party is captured. A voice model associated with the party is created by processing the captured verbal communication spoken by the party and is stored. The identity of the party is verified and a previously stored voice model of the party, registered during a previous call from the party, is updated. The creation of the voice model is imperceptible to the party.
Type: Grant
Filed: November 9, 2011
Date of Patent: March 26, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Mazin Gilbert
-
Patent number: 8396711
Abstract: A user's voice is authenticated by prompting a user to say a challenge phrase from a list of predetermined phrases and comparing the user's response with a prerecorded version of the same response. The user's stored recordings are associated with an electronic identification or serial number for a specific device, so that when communication is established using the device, only the specific user may authenticate the session. When several phrases and recordings are used, one may be selected at random for authentication so that fraudulent authentication using a recording of the user's voice may be thwarted. The system and method may be used for authenticating a device when it is first activated, such as a telephony device, or may be used when authenticating a specific communications session.
Type: Grant
Filed: May 1, 2006
Date of Patent: March 12, 2013
Assignee: Microsoft Corporation
Inventors: Dawson Yee, Gurdeep S. Pall
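The scheme in the abstract above (recordings tied to a device serial number, with a randomly selected challenge phrase to defeat replay of a single recording) can be sketched as below. Hash-matching the stored audio bytes is an illustrative stand-in for real voice comparison, and all names are assumptions.

```python
import hashlib
import random

class VoicePhraseAuth:
    """Sketch: enrollment recordings are stored per (device serial,
    phrase); a challenge phrase is picked at random so a replayed
    recording of one phrase cannot answer every session."""

    def __init__(self):
        self.store = {}  # (serial, phrase) -> fingerprint of enrolled audio

    def enroll(self, serial, phrase, audio):
        self.store[(serial, phrase)] = hashlib.sha256(audio).hexdigest()

    def challenge(self, serial, rng):
        # Randomly select one of this device's enrolled phrases.
        phrases = [p for (s, p) in self.store if s == serial]
        return rng.choice(phrases)

    def verify(self, serial, phrase, audio):
        expected = self.store.get((serial, phrase))
        return expected == hashlib.sha256(audio).hexdigest()

auth = VoicePhraseAuth()
auth.enroll("SN-1234", "blue horizon", b"alice-blue-horizon")
auth.enroll("SN-1234", "seven lanterns", b"alice-seven-lanterns")
phrase = auth.challenge("SN-1234", random.Random(0))
```

In a real system the `verify` step would score the live utterance against the stored voice recording rather than compare exact bytes.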
-
Patent number: 8392188
Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
Type: Grant
Filed: September 21, 2001
Date of Patent: March 5, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Giuseppe Riccardi
-
Patent number: 8392189
Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
Type: Grant
Filed: November 30, 2009
Date of Patent: March 5, 2013
Assignee: Broadcom Corporation
Inventor: Nambirajan Seshadri
-
Patent number: 8386251
Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user, increasing overall system efficiency and accuracy.
Type: Grant
Filed: June 8, 2009
Date of Patent: February 26, 2013
Assignee: Microsoft Corporation
Inventors: Nikko Strom, Julian Odell, Jon Hamaker
-
Patent number: 8374873
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: August 11, 2009
Date of Patent: February 12, 2013
Assignee: Morphism, LLC
Inventor: James H. Stephens, Jr.
-
Publication number: 20130030809
Abstract: One aspect includes determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word by obtaining a second voice signal of the speaker uttering at least one challenge utterance that includes at least one word not in the at least one enrollment utterance, obtaining at least one characteristic feature from the second voice signal, comparing the at least one characteristic feature with at least a portion of the voice print to determine a similarity between the at least one characteristic feature and the at least a portion of the voice print, and determining whether the speaker is the user based, at least in part, on the similarity.
Type: Application
Filed: September 14, 2012
Publication date: January 31, 2013
Applicant: Nuance Communications, Inc.
Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
-
Publication number: 20130018657
Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).
Type: Application
Filed: September 13, 2012
Publication date: January 17, 2013
Applicant: Porticus Technology, Inc.
Inventors: Germano Di Mambro, Bernardas Salna
-
Patent number: 8355916
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Grant
Filed: May 31, 2012
Date of Patent: January 15, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Michael J. Johnston
-
Patent number: 8346553
Abstract: A recognition result extraction unit and an agreement determination unit are provided. The recognition result extraction unit extracts, from a recognition result storage unit, N best solutions A and B obtained by an utterance B. The utterance B follows an utterance A corresponding to the N best solutions A and made by a speaker b who is different from a speaker of the utterance A. In a case where a repeat utterance determination unit determines that the N best solutions B are N best solutions obtained by a repeat utterance B according to the utterance A corresponding to the N best solutions A, when the best solutions A and B are different from each other, the agreement determination unit determines that some or all of the N best solutions A can be replaced with some or all of the N best solutions B.
Type: Grant
Filed: February 21, 2008
Date of Patent: January 1, 2013
Assignee: Fujitsu Limited
Inventor: Kenji Abe
-
Patent number: 8326625
Abstract: A system and method are provided to authenticate a voice in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice.
Type: Grant
Filed: November 10, 2009
Date of Patent: December 4, 2012
Assignee: Research In Motion Limited
Inventor: Sasan Adibi
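The abstract above digitizes five time-domain measurements into bits that form a voice ID. A minimal sketch of that packing step follows; the 8-bit quantization, millisecond scale, and exact-match authentication are illustrative assumptions, not details from the patent.

```python
def voice_id(timings_ms, bits_per_field=8, max_ms=255):
    """Sketch: pack the five measurements (initial rise, initial fall,
    second rise, second fall, final oscillation time) into one integer,
    one fixed-width field per measurement."""
    assert len(timings_ms) == 5
    out = 0
    for t in timings_ms:
        q = min(int(round(t)), max_ms)   # clamp to the field's range
        out = (out << bits_per_field) | q
    return out

def matches(id_a, id_b):
    # Sketch of authentication: exact match of the packed voice IDs.
    # A real system would tolerate small timing differences.
    return id_a == id_b

enrolled = voice_id((12, 30, 9, 25, 40))
```

A practical matcher would compare fields within a tolerance rather than require bit-exact equality, since timing measurements vary between utterances.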
-
Publication number: 20120303370
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Application
Filed: May 31, 2012
Publication date: November 29, 2012
Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
Inventors: Srinivas Bangalore, Michael J. Johnston
-
Publication number: 20120296651
Abstract: Methods and system for authenticating a user are disclosed. The present invention includes accessing a collection of personal information related to the user. The present invention also includes performing an authentication operation that is based on the collection of personal information. The authentication operation incorporates at least one dynamic component and prompts the user to give an audible utterance. The audible utterance is compared to a stored voiceprint.
Type: Application
Filed: July 26, 2012
Publication date: November 22, 2012
Applicant: MICROSOFT CORPORATION
Inventor: Kuansan Wang
-
Publication number: 20120253811
Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are updated.
Type: Application
Filed: August 23, 2011
Publication date: October 4, 2012
Applicant: Kabushiki Kaisha Toshiba
Inventors: Catherine BRESLIN, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
-
Publication number: 20120239400
Abstract: A speaker or a set of speakers can be recognized with high accuracy even when multiple speakers and a relationship between speakers change over time. A device comprises a speaker model derivation means for deriving a speaker model for defining a voice property per speaker from speech data made of multiple utterances to which speaker labels as information for identifying a speaker are given, a speaker co-occurrence model derivation means for, by use of the speaker model derived by the speaker model derivation means, deriving a speaker co-occurrence model indicating a strength of a co-occurrence relationship between the speakers from session data which is divided speech data in units of a series of conversation, and a model structure update means for, with reference to a session of newly-added speech data, detecting predefined events, and when the predefined event is detected, updating a structure of at least one of the speaker model and the speaker co-occurrence model.
Type: Application
Filed: October 21, 2010
Publication date: September 20, 2012
Applicant: NEC Corporation
Inventor: Takafumi Koshinaka
-
Patent number: 8271278
Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
Type: Grant
Filed: April 3, 2010
Date of Patent: September 18, 2012
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
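The second technique above, allocating a bit budget across feature dimensions, is commonly solved greedily: repeatedly give the next bit to the dimension where it reduces distortion most. The sketch below uses the textbook variance-times-2^(-2b) distortion of a uniform quantizer as a stand-in for the patent's log-likelihood-ratio loss, which is an assumption, not the patented objective.

```python
def allocate_bits(variances, total_bits):
    """Sketch: greedy bit allocation across feature dimensions.

    Each extra bit goes to the dimension where it currently buys the
    largest drop in expected distortion, modeled here (illustratively)
    as variance * 2**(-2 * bits).
    """
    bits = [0] * len(variances)

    def gain(i):
        # Distortion reduction from adding one bit to dimension i.
        b = bits[i]
        return variances[i] * (2.0 ** (-2 * b) - 2.0 ** (-2 * (b + 1)))

    for _ in range(total_bits):
        i = max(range(len(variances)), key=gain)
        bits[i] += 1
    return bits
```

High-variance dimensions absorb bits first: with variances `[4.0, 1.0]` and a 3-bit budget, the allocation comes out `[2, 1]`.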
-
Patent number: 8265932
Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples each having preceding audio, reference phrase audio, and trailing audio segments. The trailing segment is removed and each of the preceding and reference phrase segments are divided into buffers. The buffers are transformed into discrete Fourier transform buffers. One of the discrete Fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete Fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete Fourier transform. Each discrete Fourier transform for the audio command prompts is compared with each of the signatures and a correlation value is determined. One such audio command prompt matches one such signature when the correlation value for that audio command prompt satisfies a threshold.
Type: Grant
Filed: October 3, 2011
Date of Patent: September 11, 2012
Assignee: Intellisist, Inc.
Inventor: Martin R. M. Dunsmuir
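The signature-selection step above (pick the reference-phrase DFT buffer that is dissimilar to every preceding-segment DFT buffer) can be sketched with NumPy. The buffer length and the use of cosine similarity on magnitude spectra are illustrative assumptions; the patent only specifies dissimilarity between DFT buffers.

```python
import numpy as np

def build_signature(preceding, reference, buf_len=64):
    """Sketch: return the reference-phrase DFT buffer least similar to
    any preceding-audio DFT buffer, to serve as the signature."""
    def dft_buffers(x):
        n = len(x) // buf_len
        return [np.abs(np.fft.rfft(x[i * buf_len:(i + 1) * buf_len]))
                for i in range(n)]

    pre = dft_buffers(preceding)
    best, best_sim = None, None
    for cand in dft_buffers(reference):
        # Similarity to the closest preceding buffer (cosine similarity).
        sim = max(float(np.dot(cand, p) /
                        (np.linalg.norm(cand) * np.linalg.norm(p) + 1e-12))
                  for p in pre)
        if best_sim is None or sim < best_sim:
            best, best_sim = cand, sim
    return best

# Toy example: preceding audio is a 4 Hz-bin tone; the reference phrase
# contains that tone plus a distinct 16 Hz-bin tone.
t = np.arange(64)
low = np.sin(2 * np.pi * 4 * t / 64)
high = np.sin(2 * np.pi * 16 * t / 64)
preceding = np.tile(low, 4)
reference = np.concatenate([low, high])
sig = build_signature(preceding, reference)
```

The selected signature is the spectrum of the distinct tone, since the shared tone correlates strongly with the preceding buffers.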
-
Patent number: 8255178
Abstract: An apparatus for detecting an operational status of a semiconductor equipment includes an audio frequency signal receiving unit and an analysis and determination unit. The audio frequency signal receiving unit is used for receiving an audio frequency signal from the semiconductor equipment while the semiconductor equipment is working. The analysis and determination unit is used for analyzing the audio frequency signal to determine statuses of components of the semiconductor equipment.
Type: Grant
Filed: January 5, 2010
Date of Patent: August 28, 2012
Assignee: Inotera Memories, Inc.
Inventors: Yu-Chang Huang, Chia-Wei Fan
-
Publication number: 20120215537
Abstract: According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize the voice command after the keyword is detected by the keyword detection module, and transmit an operation signal corresponding to the voice command.
Type: Application
Filed: September 21, 2011
Publication date: August 23, 2012
Inventor: Yoshihiro Igarashi
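The control flow above (detect keyword, mute the audio so the device's own sound doesn't corrupt recognition, then treat the next recognized phrase as the command) is a small state machine. The sketch below is illustrative; the keyword, signal names, and class name are assumptions.

```python
class SoundRecognitionRemote:
    """Sketch of the keyword-then-command flow from the abstract:
    on the trigger keyword, emit a mute signal and arm command mode;
    the next recognized phrase is sent as the voice command."""

    def __init__(self, keyword="hello tv"):
        self.keyword = keyword
        self.armed = False
        self.signals = []  # operation signals that would be transmitted

    def on_recognized(self, phrase):
        if not self.armed:
            if phrase == self.keyword:
                self.signals.append("MUTE")  # mute audio before listening
                self.armed = True            # next phrase is the command
        else:
            self.signals.append(("COMMAND", phrase))
            self.armed = False

r = SoundRecognitionRemote()
r.on_recognized("hello tv")
r.on_recognized("channel up")
```

Muting before command recognition matters because a television's own audio output would otherwise feed back into the microphone.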
-
Publication number: 20120209609
Abstract: A method of automatic speech recognition includes receiving an utterance from a user via a microphone that converts the utterance into a speech signal, pre-processing the speech signal using a processor to extract acoustic data from the received speech signal, and identifying at least one user-specific characteristic in response to the extracted acoustic data. The method also includes determining a user-specific confidence threshold responsive to the at least one user-specific characteristic, and using the user-specific confidence threshold to recognize the utterance received from the user and/or to assess confusability of the utterance with stored vocabulary.
Type: Application
Filed: February 14, 2011
Publication date: August 16, 2012
Applicant: GENERAL MOTORS LLC
Inventors: Xufang Zhao, Gaurav Talwar
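The core idea above, deriving a per-user confidence threshold from detected user-specific characteristics, can be sketched as a base threshold plus per-characteristic offsets. The characteristic names and offset values below are illustrative assumptions, not values from the publication.

```python
def user_threshold(characteristics=(), base=0.5, adjustments=None):
    """Sketch: adjust a base acceptance threshold using user-specific
    characteristics detected in the acoustic data. Names and offsets
    are illustrative placeholders."""
    adjustments = adjustments or {
        "non_native_accent": +0.10,  # be stricter when accents raise confusability
        "fast_speech":       +0.05,
        "enrolled_voice":    -0.10,  # be more permissive for a known voice
    }
    t = base + sum(adjustments.get(c, 0.0) for c in characteristics)
    return min(max(t, 0.0), 1.0)

def accept(hypothesis_conf, characteristics):
    # Recognize only when confidence clears the user-specific threshold.
    return hypothesis_conf >= user_threshold(characteristics)
```

The same threshold can double as a confusability gate: a hypothesis scoring just below it is flagged as confusable with stored vocabulary rather than accepted outright.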
-
Patent number: 8244531
Abstract: A method is disclosed that enables the handling of audio streams for segments in the audio that might contain private information, in a way that is more straightforward than in some techniques in the prior art. The data-processing system of the illustrative embodiment receives a media stream that comprises an audio stream, possibly in addition to other types of media such as video. The audio stream comprises audio content, some of which can be private in nature. Once it receives the data, the data-processing system then analyzes the audio stream for private audio content by using one or more techniques that involve looking for private information as well as non-private information. As a result of the analysis, the data-processing system omits the private audio content from the resulting stream that contains the processed audio.
Type: Grant
Filed: September 28, 2008
Date of Patent: August 14, 2012
Assignee: Avaya Inc.
Inventors: George William Erhart, Valentine C. Matula, David Joseph Skiba, Lawrence O'Gorman
-
Publication number: 20120197643
Abstract: A speech signal processing system and method which uses the following steps: (a) receiving an utterance from a user via a microphone that converts the utterance into a speech signal; and (b) pre-processing the speech signal using a processor. The pre-processing step includes extracting acoustic data from the received speech signal, determining from the acoustic data whether the utterance includes one or more obstruents, estimating speech energy from higher frequencies associated with the identified obstruents, and mapping the estimated speech energy to lower frequencies.
Type: Application
Filed: January 27, 2011
Publication date: August 2, 2012
Applicant: GENERAL MOTORS LLC
Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
-
Publication number: 20120179467
Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
Type: Application
Filed: March 20, 2012
Publication date: July 12, 2012
Applicant: AT&T Intellectual Property I, L. P.
Inventor: Jason Williams
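The selection step above (form a belief distribution over intentions by pooling confidence across the current N-best list and contextually similar ones) can be sketched as a normalized confidence sum. Treating normalized confidence mass as the belief distribution is an illustrative simplification of the publication's probabilistic model.

```python
from collections import defaultdict

def select_intention(nbest_lists):
    """Sketch: accumulate normalized confidence for each hypothesis
    across several N-best lists, then return the intention with the
    highest cumulative belief."""
    belief = defaultdict(float)
    for nbest in nbest_lists:
        total = sum(conf for _, conf in nbest) or 1.0
        for hyp, conf in nbest:
            belief[hyp] += conf / total   # pool belief across lists
    return max(belief, key=belief.get)

# Current turn's N-best list plus one contextually similar list.
current = [("pay bill", 0.5), ("play bill", 0.4), ("say bye", 0.1)]
similar = [("pay bill", 0.5), ("play bill", 0.4)]
```

Even when the top hypotheses of individual lists disagree, the pooled belief can favor the intention that appears consistently near the top of every list.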
-
Patent number: 8214212
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Grant
Filed: November 8, 2011
Date of Patent: July 3, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Michael J. Johnston
-
Patent number: 8209169
Abstract: A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data including a ratio between the respective pronunciation times of words included in the received text in the generated synthetic speech is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device.
Type: Grant
Filed: October 24, 2011
Date of Patent: June 26, 2012
Assignee: Nuance Communications, Inc.
Inventors: Noriko Imoto, Tetsuya Uda, Takatoshi Watanabe
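The ratio idea above can be sketched simply: synthesize the text, measure each word's pronunciation time in the synthetic speech, and assume words consume the real recording in the same proportions to estimate where each word falls in time. The function below is that proportional mapping; the example durations are illustrative assumptions.

```python
def word_times(words, tts_durations, recording_length):
    """Sketch: map each word of the text to an estimated (start, end)
    time in the recording, assuming words take up the recording in
    proportion to their durations in the synthesized speech (the
    ratio data). Assumes distinct words, for brevity."""
    total = sum(tts_durations)
    times, elapsed = {}, 0.0
    for word, dur in zip(words, tts_durations):
        start = recording_length * elapsed / total
        elapsed += dur
        end = recording_length * elapsed / total
        times[word] = (start, end)
    return times

# e.g. TTS speaks the three words in 1 s, 2 s, 1 s; the recording is 8 s.
spans = word_times(["good", "morning", "everyone"], [1.0, 2.0, 1.0], 8.0)
```

Because only ratios are used, the mapping survives a recording that is faster or slower overall than the synthetic speech, which is what makes it useful for words the recognizer got wrong.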
-
Publication number: 20120150541Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.Type: ApplicationFiled: December 10, 2010Publication date: June 14, 2012Applicant: GENERAL MOTORS LLCInventors: Gaurav Talwar, Rathinavelu Chengalvarayan
-
Patent number: 8200491Abstract: In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.Type: GrantFiled: August 27, 2011Date of Patent: June 12, 2012Assignee: AT&T Intellectual Property II, L.P.Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
-
Publication number: 20120130714Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media relating to speaker verification. In one aspect, a system receives a first user identity from a second user, and, based on the identity, accesses voice characteristics. The system randomly generates a challenge sentence according to a rule and/or grammar, based on the voice characteristics, and prompts the second user to speak the challenge sentence. The system verifies that the second user is the first user if the spoken challenge sentence matches the voice characteristics. In an enrollment aspect, the system constructs an enrollment phrase that covers a minimum threshold of unique speech sounds based on speaker-distinctive phonemes, phoneme clusters, and prosody. The user then utters the enrollment phrase, and the system extracts voice characteristics for the user from the uttered phrase.Type: ApplicationFiled: November 24, 2010Publication date: May 24, 2012Applicant: AT&T Intellectual Property I, L.P.Inventors: Ilija Zeljkovic, Taniya Mishra, Amanda Stent, Ann K. Syrdal, Jay Wilpon
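A minimal sketch of the challenge-generation idea: fill slots of a tiny grammar, preferring words that exercise phonemes listed as distinctive for the claimed speaker. The grammar, lexicon, and phoneme tags are invented for illustration and are not from the publication.

```python
# Toy sketch: generate a challenge sentence from a tiny slot grammar,
# preferring words that contain phonemes tagged as distinctive for the
# claimed speaker. Grammar, lexicon, and phoneme tags are assumptions.

import random

LEXICON = {  # word -> phonemes it exercises (illustrative ARPAbet-like tags)
    "red":   {"r", "eh"}, "three": {"th", "r"},
    "boats": {"b", "s"},  "shoes": {"sh", "s"},
    "float": {"f", "l"},  "shine": {"sh", "ay"},
}
GRAMMAR = [["DIGIT_OR_COLOR", "NOUN", "VERB"]]  # one simple rule

def challenge_sentence(distinctive_phonemes, rng=random):
    slots = {
        "DIGIT_OR_COLOR": ["red", "three"],
        "NOUN": ["boats", "shoes"],
        "VERB": ["float", "shine"],
    }
    rule = rng.choice(GRAMMAR)  # pick a grammar rule at random
    words = []
    for slot in rule:
        # prefer words covering more of the speaker's distinctive phonemes
        candidates = sorted(
            slots[slot],
            key=lambda w: len(LEXICON[w] & distinctive_phonemes),
            reverse=True,
        )
        words.append(candidates[0])
    return " ".join(words)

sentence = challenge_sentence({"sh", "th"})
```

Because the sentence is generated fresh each time, a replayed recording of an earlier session is unlikely to match it, which is the point of a randomized challenge.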
-
Patent number: 8175883Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface implements a hands-free, voice-driven environment to control processes and applications. A natural language model is used to parse voice-initiated commands and data, and to route those voice-initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing the occurrence of distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.Type: GrantFiled: July 8, 2011Date of Patent: May 8, 2012Assignee: Nuance Communications, Inc.Inventors: Richard Grant, Pedro E. McGregor
-
Patent number: 8175874Abstract: A method of transferring a real-time audio signal transmission, including: registering voice patterns (or other characteristics) of one or more users to be used to identify the voices of the users, accepting an audio signal as it is created as a sequence of segments, analyzing each segment of the accepted audio signal to determine if it contains voice activity (314), determining a probability level that the voice activity of the segment is of a registered user (320 & 322); and selectively transferring the contents of a segment responsive to the determined probability level (324).Type: GrantFiled: July 18, 2006Date of Patent: May 8, 2012Inventor: Shaul Shimhi
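The segment-gating flow above can be sketched as: detect voice activity in each segment, estimate the probability that the voice belongs to a registered user, and transfer only the segments that pass both checks. The energy-based detector and the similarity score below are crude stand-ins for the real voice-pattern matching, used purely for illustration.

```python
# Toy sketch of the segment-gating idea: transfer a segment only when it
# contains voice activity and the voice is probably a registered user.
# The energy-based VAD and similarity score are illustrative stand-ins.

def has_voice_activity(segment, energy_threshold=0.1):
    """Crude VAD: mean absolute amplitude above a threshold."""
    return sum(abs(s) for s in segment) / len(segment) > energy_threshold

def speaker_probability(segment, registered_pattern):
    """Placeholder similarity in [0, 1]; real systems compare voice prints."""
    mean = sum(abs(s) for s in segment) / len(segment)
    return max(0.0, 1.0 - abs(mean - registered_pattern))

def transfer(segments, registered_pattern, prob_threshold=0.8):
    out = []
    for seg in segments:
        if has_voice_activity(seg) and \
           speaker_probability(seg, registered_pattern) >= prob_threshold:
            out.append(seg)  # selectively transfer this segment
    return out

silence = [0.01] * 8
speech = [0.5, -0.5] * 4            # mean |amplitude| = 0.5
kept = transfer([silence, speech], registered_pattern=0.5)
```

Only the segment that both contains voice activity and matches the registered pattern is transferred; the silent segment is dropped.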
-
Patent number: 8170885Abstract: Disclosed is a wideband audio signal coding/decoding device and method that may code a wideband audio signal while maintaining a low bit rate. The wideband audio signal coding device includes an enhancement layer that extracts a first spectrum parameter from an inputted wideband signal having a first bandwidth, quantizes the extracted first spectrum parameter, and converts the extracted first spectrum parameter into a second spectrum parameter; and a coding unit that extracts a narrowband signal from the inputted wideband signal and codes the narrowband signal based on the second spectrum parameter provided from the enhancement layer, wherein the narrowband signal has a second bandwidth smaller than the first bandwidth.Type: GrantFiled: October 15, 2008Date of Patent: May 1, 2012Assignee: Gwangju Institute of Science and TechnologyInventors: Hong Kook Kim, Young Han Lee
-
Publication number: 20120089396Abstract: A system that incorporates teachings of the present disclosure may include, for example, an interface for receiving an utterance of speech and converting the utterance into a speech signal, such as a digital representation including a waveform and/or spectrum; and a processor for dividing the speech signal into segments and detecting the emotional information from speech. The system compares the speech segments to a baseline to identify the emotion or emotions from the suprasegmental information (i.e., paralinguistic information) in speech, wherein the baseline is determined from acoustic characteristics of a plurality of emotion categories. Other embodiments are disclosed.Type: ApplicationFiled: June 16, 2010Publication date: April 12, 2012Applicant: University of Florida Research Foundation, Inc.Inventors: Sona Patel, Rahul Shrivastav
-
Patent number: 8145486Abstract: Acoustic models that characterize a speech signal are created from the speech features in regions where the similarities between acoustic models, each built from speech features over a certain time length, are equal to or greater than a predetermined value. Feature vectors, acquired by applying the acoustic models of those regions to the speech features of the speech signals of second segments, are grouped by speaker.Type: GrantFiled: January 9, 2008Date of Patent: March 27, 2012Assignee: Kabushiki Kaisha ToshibaInventor: Makoto Hirohata
-
Publication number: 20120041762Abstract: An apparatus and method for tracking dialogue and other sound signals in film, television or other systems with multiple channel sound is described. One or more audio channels that are expected to carry the speech of persons appearing in the program, or other particular types of sounds, are inspected to determine if the channel's audio includes particular sounds such as MUEVs, including phonemes corresponding to human speech patterns. If an improper number of particular sounds such as phonemes is found in the channel(s), an action such as a report, an alarm, a correction, or other action is taken. The inspection of the audio channel(s) may be made in conjunction with the appearance of corresponding images associated with the sound, such as visemes in the video signal, to improve the determination of types of sounds such as phonemes.Type: ApplicationFiled: December 7, 2010Publication date: February 16, 2012Applicant: Pixel Instruments CorporationInventors: J. Carl Cooper, Mirko Vojnovic, Christopher Smith
-
Patent number: 8117033Abstract: Disclosed herein are systems, methods, and computer-readable storage media for processing a message received from a user to determine whether an estimate of intelligibility is below an intelligibility threshold. The method includes recognizing a portion of a user's message that contains the one or more expected utterances from a critical information list, calculating an estimate of intelligibility for the recognized portion of the user's message that contains the one or more expected utterances, and prompting the user to repeat at least the recognized portion of the user's message if the calculated estimate of intelligibility for the recognized portion of the user's message is below an intelligibility threshold. In one aspect, the method further includes prompting the user to repeat at least a portion of the message if any of a measured speech level and a measured signal-to-noise ratio of the user's message are determined to be below their respective thresholds.Type: GrantFiled: August 8, 2011Date of Patent: February 14, 2012Assignee: AT&T Intellectual Property II, L.P.Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
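The thresholding logic in the entry above lends itself to a short sketch: re-prompt the user whenever the intelligibility estimate, the measured speech level, or the signal-to-noise ratio falls below its threshold. How the intelligibility estimate itself is computed is beyond this illustration; the function names and threshold values are assumptions, not the patented method.

```python
# Toy sketch of the re-prompting logic: the intelligibility estimate,
# speech level, and SNR are assumed to be already measured; only the
# threshold comparisons described in the abstract are shown here.

def needs_repeat(intelligibility, speech_level_db, snr_db,
                 min_intelligibility=0.7, min_level_db=-30.0,
                 min_snr_db=10.0):
    """Return a prompt string if the user should repeat, else None."""
    if intelligibility < min_intelligibility:
        return "Please repeat the highlighted portion of your message."
    if speech_level_db < min_level_db:
        return "Please speak louder and repeat your message."
    if snr_db < min_snr_db:
        return "Background noise is high; please repeat your message."
    return None

# Low intelligibility on the recognized critical portion triggers a prompt:
prompt = needs_repeat(intelligibility=0.55, speech_level_db=-20.0, snr_db=15.0)
```

A clean message (all three measurements above their thresholds) returns `None`, so the caller proceeds without re-prompting.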
-
Patent number: 8103502Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.Type: GrantFiled: September 26, 2007Date of Patent: January 24, 2012Assignee: AT&T Intellectual Property II, L.P.Inventors: Srinivas Bangalore, Michael J. Johnston
-
Patent number: 8086455Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.Type: GrantFiled: January 9, 2008Date of Patent: December 27, 2011Assignee: Microsoft CorporationInventors: Yifan Gong, Ye Tian
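The dependency-driven sequencing described above is, at its core, a topological sort over declared input/output files, with the two ill-defined cases (cyclic definitions and an output produced by more than one processor) rejected. A minimal sketch under assumed processor declarations:

```python
# Toy sketch of a data-dependence-driven build sequencer: each processor
# declares its input and output files, and the build order is derived by
# a topological sort. Cycles and outputs produced by two processors are
# rejected, mirroring the ill-defined-process checks described above.
# The processor names and files are illustrative assumptions.

def order_build(processors):
    """processors: {name: (inputs, outputs)} -> list of names in build order."""
    producer = {}
    for name, (_, outs) in processors.items():
        for f in outs:
            if f in producer:  # same file produced by two processors
                raise ValueError(f"{f} produced by both {producer[f]} and {name}")
            producer[f] = name

    order, state = [], {}  # state: 1 = visiting, 2 = done

    def visit(name):
        if state.get(name) == 2:
            return
        if state.get(name) == 1:  # revisiting an in-progress node = cycle
            raise ValueError(f"cyclic definition involving {name}")
        state[name] = 1
        for f in processors[name][0]:  # this processor's inputs
            if f in producer:          # produced by another step: run it first
                visit(producer[f])
        state[name] = 2
        order.append(name)

    for name in processors:
        visit(name)
    return order

steps = {
    "train_hmm": ({"features.bin"}, {"model.hmm"}),
    "extract":   ({"audio.wav"},    {"features.bin"}),
    "evaluate":  ({"model.hmm"},    {"report.txt"}),
}
sequence = order_build(steps)
```

Although `train_hmm` is declared first, the sequencer runs `extract` before it because `train_hmm` consumes the `features.bin` file that `extract` produces, which is exactly the "user does not need to determine the order of execution" property.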
-
Publication number: 20110295604Abstract: Disclosed herein are systems, methods, and computer-readable storage media for processing a message received from a user to determine whether an estimate of intelligibility is below an intelligibility threshold. The method includes recognizing a portion of a user's message that contains the one or more expected utterances from a critical information list, calculating an estimate of intelligibility for the recognized portion of the user's message that contains the one or more expected utterances, and prompting the user to repeat at least the recognized portion of the user's message if the calculated estimate of intelligibility for the recognized portion of the user's message is below an intelligibility threshold. In one aspect, the method further includes prompting the user to repeat at least a portion of the message if any of a measured speech level and a measured signal-to-noise ratio of the user's message are determined to be below their respective thresholds.Type: ApplicationFiled: August 8, 2011Publication date: December 1, 2011Applicant: AT&T Intellectual Property II, L.P.Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
-
Patent number: 8036892Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.Type: GrantFiled: July 8, 2010Date of Patent: October 11, 2011Assignee: American Express Travel Related Services Company, Inc.Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
-
Patent number: 8031881Abstract: Method and apparatus for microphone matching for wearable directional hearing assistance devices are provided. An embodiment includes a method for matching at least a first microphone to a second microphone, using a user's voice from the user's mouth. The user's voice is processed as received by at least one microphone to determine a frequency profile associated with voice of the user. Intervals are detected where the user is speaking using the frequency profile. Variations in microphone reception between the first microphone and the second microphone are adaptively canceled during the intervals and when the first microphone and second microphone are in relatively constant spatial position with respect to the user's mouth.Type: GrantFiled: September 18, 2007Date of Patent: October 4, 2011Assignee: Starkey Laboratories, Inc.Inventor: Tao Zhang
-
Patent number: 8015008Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect an expanded consonant phoneme inventory, and training a language model for an automated speech recognition (ASR) system based on the reformulated lexicon.Type: GrantFiled: October 31, 2007Date of Patent: September 6, 2011Assignee: AT&T Intellectual Property I, L.P.Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
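The inventory expansion above can be sketched concretely: each consonant is relabeled as a pre-vocalic or post-vocalic variant, and the lexicon is rewritten with the expanded labels. The simple first-vowel heuristic below stands in for real syllable-boundary detection, and the phone set and example lexicon are illustrative assumptions.

```python
# Toy sketch of the phoneme-inventory expansion: each consonant becomes
# a pre-vocalic (C_pre) or post-vocalic (C_post) variant. A first-vowel
# heuristic per word stands in for real syllable-boundary detection;
# the phone set and lexicon are illustrative assumptions.

VOWELS = {"aa", "ae", "ih", "iy", "eh"}

def expand_pronunciation(phones):
    """Relabel consonants before the first vowel as _pre, after it as _post."""
    out, seen_vowel = [], False
    for p in phones:
        if p in VOWELS:
            seen_vowel = True
            out.append(p)
        else:
            out.append(p + ("_post" if seen_vowel else "_pre"))
    return out

# Reformulate a tiny lexicon with the expanded consonant inventory:
lexicon = {"cat": ["k", "ae", "t"], "it": ["ih", "t"]}
expanded = {w: expand_pronunciation(p) for w, p in lexicon.items()}
```

Splitting each consonant by syllable position lets the acoustic models capture the pronunciation differences between, say, a syllable-initial and syllable-final /t/, at the cost of a larger phoneme inventory.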