Specialized Models Patents (Class 704/250)
  • Patent number: 8868409
    Abstract: In some implementations, audio data for an utterance is provided over a network. At a client device and over the network, information is received that indicates candidate transcriptions for the utterance and semantic information for the candidate transcriptions. A semantic parser is used at the client device to evaluate each of at least a plurality of the candidate transcriptions. One of the candidate transcriptions is selected based on at least the received semantic information and the output of the semantic parser for the plurality of candidate transcriptions that are evaluated.
    Type: Grant
    Filed: January 16, 2014
    Date of Patent: October 21, 2014
    Assignee: Google Inc.
    Inventors: Pedro J. Moreno Mengibar, Fadi Biadsy, Diego Melendo Casado
  • Patent number: 8854232
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: June 30, 2011
    Date of Patent: October 7, 2014
    Assignee: BlackBerry Limited
    Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
  • Patent number: 8843372
    Abstract: A system for analyzing conversations or speech, especially “turns” (a point in time in a person's or an animal's talk when another may or does speak) comprises a computer (100) with a memory (105), at least one microphone (115, 120), and software (110) running in the computer. The system is arranged to recognize and quantify utterances including spoken words, pauses between words, in-breaths, vowel extensions, and the like, and to recognize questions and sentences. The system rapidly and efficiently quantifies and qualifies speech and thus offers substantial improvement over prior-art computerized response systems that use traditional linguistic approaches that depend on single words or a small number of words in grammatical sentences as a basic unit of analysis. The system and method are useful in many applications, including teaching colloquial use of turn-taking, and in forensic linguistics.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: September 23, 2014
    Inventor: Herbert M. Isenberg
  • Patent number: 8831942
    Abstract: A method is provided for identifying a gender of a speaker. The method steps include obtaining speech data of the speaker, extracting vowel-like speech frames from the speech data, analyzing the vowel-like speech frames to generate a feature vector having pitch values corresponding to the vowel-like frames, analyzing the pitch values to generate a most frequent pitch value, determining, in response to the most frequent pitch value being between a first pre-determined threshold and a second pre-determined threshold, an output of a male Gaussian Mixture Model (GMM) and an output of a female GMM using the pitch values as inputs to the male GMM and the female GMM, and identifying the gender of the speaker by comparing the output of the male GMM and the output of the female GMM based on a pre-determined criterion.
    Type: Grant
    Filed: March 19, 2010
    Date of Patent: September 9, 2014
    Assignee: Narus, Inc.
    Inventor: Antonio Nucci
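The pitch-gated male/female GMM comparison described in this abstract can be sketched in a few lines of Python. All numbers here (the single-component 1-D "mixtures", the 50–400 Hz gate, the toy pitch tracks) are invented for illustration and are not taken from the patent:

```python
import math
from collections import Counter

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar pitch value under a 1-D Gaussian mixture."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return math.log(total)

def identify_gender(pitches, male_gmm, female_gmm, lo=50.0, hi=400.0):
    # gate: run the GMM comparison only if the most frequent pitch value
    # falls between the two pre-determined thresholds
    mode_pitch = Counter(round(p) for p in pitches).most_common(1)[0][0]
    if not lo <= mode_pitch <= hi:
        return "unknown"
    male_score = sum(gmm_loglik(p, *male_gmm) for p in pitches)
    female_score = sum(gmm_loglik(p, *female_gmm) for p in pitches)
    # pre-determined criterion: the higher total log-likelihood wins
    return "male" if male_score > female_score else "female"

# toy single-component "mixtures" centred on typical adult pitch ranges
male_gmm = ([1.0], [120.0], [400.0])
female_gmm = ([1.0], [210.0], [400.0])
print(identify_gender([118, 122, 119, 125, 121], male_gmm, female_gmm))  # male
```

A real system would, as the claims state, extract pitch only from vowel-like frames and use multi-component GMMs trained on labelled speech.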
  • Patent number: 8825482
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high-quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video cameras, GPS, or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation, and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creates user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.

    Type: Grant
    Filed: September 15, 2006
    Date of Patent: September 2, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao
  • Patent number: 8818810
    Abstract: A method for verifying that a person is registered to use a telemedical device includes identifying an unprompted trigger phrase in words spoken by a person and received by the telemedical device. The telemedical device prompts the person to state a name of a registered user and optionally prompts the person to state health tips for the person. The telemedical device verifies that the person is the registered user using utterance data generated from the unprompted trigger phrase, name of the registered user, and health tips.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: August 26, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Taufiq Hasan, Zhe Feng
  • Patent number: 8812318
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: February 6, 2012
    Date of Patent: August 19, 2014
    Assignee: III Holdings 1, LLC
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
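The one-to-many comparison step lends itself to a toy sketch: represent each voice print as a fixed-length vector and flag every enrolled print whose cosine similarity to the caller's print clears a threshold. The vectors and the 0.8 threshold below are invented for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_voiceprint(caller_vp, known_vps, threshold=0.8):
    # one-to-many comparison: return every enrolled identity whose stored
    # voice print is similar enough to the caller's segmented print
    return [name for name, vp in known_vps.items()
            if cosine(caller_vp, vp) >= threshold]

known = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.9, 0.3]}
print(match_voiceprint([0.88, 0.15, 0.22], known))  # ['alice']
```

In the patented system the caller's print is first segmented out of a two-party call recording; that step is omitted here.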
  • Patent number: 8805685
    Abstract: Disclosed herein are systems, methods, and tangible computer readable-media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: August 12, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst J. Schroeter
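The liveness test in this abstract, in which repeated samples of the same phrase that vary too little are treated as replayed or synthetic speech, can be sketched as follows (the feature vectors and the variance threshold are invented for illustration):

```python
def pairwise_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def verify_liveness(samples, min_variance=0.05):
    # identical or near-identical repetitions suggest a replayed recording
    # or synthesized voice; natural speech varies between utterances
    dists = [pairwise_distance(samples[i], samples[j])
             for i in range(len(samples))
             for j in range(i + 1, len(samples))]
    mean_dist = sum(dists) / len(dists)
    return mean_dist >= min_variance

replayed = [[0.5, 0.2, 0.7]] * 3  # exact copies: verification denied
live = [[0.5, 0.2, 0.7], [0.55, 0.18, 0.66], [0.48, 0.25, 0.71]]
print(verify_liveness(replayed), verify_liveness(live))  # False True
```

Per the claims, the threshold could be raised or lowered depending on the required authentication certainty, and further samples requested when the result is inconclusive.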
  • Publication number: 20140222428
    Abstract: Most speaker recognition systems use i-vectors which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification, comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of a linear operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition.
    Type: Application
    Filed: April 4, 2013
    Publication date: August 7, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Sandro Cumani, Pietro Laface
  • Patent number: 8798994
    Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, application of a transform is a relatively low memory and computationally cheap process compared to memory and resource requirements for computing the transform to be applied.
    Type: Grant
    Filed: February 6, 2008
    Date of Patent: August 5, 2014
    Assignee: International Business Machines Corporation
    Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
  • Patent number: 8793127
    Abstract: In addition to conveying primary information, human speech also conveys information concerning the speaker's gender, age, socioeconomic status, accent, language spoken, emotional state, or other personal characteristics, which is referred to as secondary information. Disclosed herein are both the means of automatic discovery and use of such secondary information to direct other aspects of the behavior of a controlled system. One embodiment of the invention comprises an improved method to determine, with high reliability, the gender of an adult speaker. A further embodiment of the invention comprises the use of this information to display a gender-appropriate advertisement to the user of an information retrieval system that uses a cell phone as the input and output device.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: July 29, 2014
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Vikas Gulati
  • Patent number: 8775178
    Abstract: Updating a voice template for recognizing a speaker on the basis of a voice uttered by the speaker is disclosed. Stored voice templates indicate distinctive characteristics of utterances from speakers. Distinctive characteristics are extracted for a specific speaker based on a voice message utterance received from that speaker. The distinctive characteristics are compared to the characteristics indicated by the stored voice templates to selected a template that matches within a predetermined threshold. The selected template is updated on the basis of the extracted characteristics.
    Type: Grant
    Filed: October 27, 2009
    Date of Patent: July 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: Yukari Miki, Masami Noguchi
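The select-then-update loop in this abstract can be illustrated with a minimal sketch: pick the stored template nearest the new utterance's characteristics, and if it matches within a threshold, blend the new data into it. The distance metric, threshold, and blending weight are all invented for illustration:

```python
def select_and_update(templates, features, threshold=1.0, alpha=0.1):
    # find the stored template closest to the new utterance's features;
    # if it matches within the threshold, nudge it toward the new data
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    name, best = min(templates.items(), key=lambda kv: dist(kv[1], features))
    if dist(best, features) > threshold:
        return None  # no template matches closely enough
    templates[name] = [(1 - alpha) * t + alpha * f
                       for t, f in zip(best, features)]
    return name

templates = {"spk1": [1.0, 2.0], "spk2": [5.0, 5.0]}
print(select_and_update(templates, [1.1, 2.1]))      # spk1
print([round(v, 2) for v in templates["spk1"]])      # [1.01, 2.01]
```

The exponential-average update keeps the template tracking gradual drift in a speaker's voice without being dominated by any single message.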
  • Publication number: 20140188468
    Abstract: An apparatus, system and method for calculating passphrase variability are disclosed. The passphrase variability value can then be used for generating phonetically rich passwords in text-dependent speaker recognition systems, or for estimating the variability of the input passphrase in text-independent system during the enrolling process in a speech recognition security system.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Dmitry Dyrmovskiy, Mikhail Khitrov
  • Patent number: 8762733
    Abstract: The invention provides a method for verifying a person's identity, which includes obtaining a password and/or random key from a person, and comparing the obtained password and/or random key to a plurality of known passwords and/or random keys to determine a likely identity of the person. The method further includes measuring a specific biometric of the person, the specific biometric comprising a respiratory, cardiac, or other physiologic biometric, and comparing the measured specific biometric to the known specific biometric of the person that is associated with the obtained password and/or random key to verify the likely identity of the person.
    Type: Grant
    Filed: January 25, 2007
    Date of Patent: June 24, 2014
    Assignee: adidas AG
    Inventors: P. Alexander Derchak, Lance Myers
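The two-stage verification described above, a password that establishes a likely identity followed by a physiologic biometric that confirms it, can be sketched as below. The resting-heart-rate stand-in for a respiratory or cardiac biometric, the database, and the tolerance are purely illustrative:

```python
def verify_identity(db, password, measured_biometric, tol=5.0):
    # stage 1: the password determines the likely identity
    person = db.get(password)
    if person is None:
        return None
    # stage 2: the measured physiologic biometric (here a toy resting
    # heart rate in bpm) must agree with the enrolled value
    if abs(person["biometric"] - measured_biometric) <= tol:
        return person["name"]
    return None

db = {"s3cret": {"name": "alice", "biometric": 62.0}}
print(verify_identity(db, "s3cret", 64.5))  # alice
print(verify_identity(db, "s3cret", 90.0))  # None
```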
  • Publication number: 20140172428
    Abstract: Provided is a method for context-independent gender recognition utilizing phoneme transition probability. The method includes detecting a voice section from a received voice signal, generating feature vectors within the detected voice section, applying a hidden Markov model to the feature vectors by using a search network that is set according to a phoneme rule to recognize a phoneme and obtain scores of first and second likelihoods, and comparing final scores of the first and second likelihoods, obtained while the phoneme recognition is performed up to the last section of the voice section, to finally decide the gender of the voice signal.
    Type: Application
    Filed: September 3, 2013
    Publication date: June 19, 2014
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Mun Sung HAN
  • Patent number: 8751231
    Abstract: Methods and systems for model-driven candidate sorting based on audio cues for evaluating digital interviews are described. In one embodiment, an audio cue generator identifies utterances in audio data of a digital interview. The utterances each include a group of one or more words spoken by a candidate in the digital interview. The audio cue generator generates audio cues of the digital interview based on the identified utterances. The audio cues are applied to a prediction model to predict an achievement index for the candidate. The candidate is displayed in a list of candidates sorted according to the candidates' achievement indexes.
    Type: Grant
    Filed: February 18, 2014
    Date of Patent: June 10, 2014
    Assignee: Hirevue, Inc.
    Inventors: Loren Larsen, Benjamin Taylor
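The prediction-and-sort pipeline in this abstract can be illustrated with a toy linear prediction model over per-candidate audio cues. The cue names, weights, and candidate data are all invented; the patented system would learn its model from historical interview outcomes:

```python
def achievement_index(audio_cues, weights, bias=0.0):
    # a toy linear prediction model: weighted sum of audio-cue values
    return bias + sum(weights[k] * v for k, v in audio_cues.items())

def sort_candidates(candidates, weights):
    # score every candidate, then sort the list by achievement index
    scored = [(name, achievement_index(cues, weights))
              for name, cues in candidates]
    return sorted(scored, key=lambda nc: nc[1], reverse=True)

weights = {"speech_rate": 0.4, "pause_ratio": -0.6, "utterance_len": 0.2}
cands = [
    ("cand_a", {"speech_rate": 3.1, "pause_ratio": 0.4, "utterance_len": 5.0}),
    ("cand_b", {"speech_rate": 2.0, "pause_ratio": 0.9, "utterance_len": 3.0}),
]
print(sort_candidates(cands, weights))  # cand_a ranks first
```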
  • Publication number: 20140142944
    Abstract: Systems and methods of diarization of audio files use an acoustic voiceprint model. A plurality of audio files are analyzed to arrive at an acoustic voiceprint model associated with an identified speaker. Metadata associated with an audio file is used to select an acoustic voiceprint model. The selected acoustic voiceprint model is applied in diarization to identify audio data of the identified speaker.
    Type: Application
    Filed: November 20, 2013
    Publication date: May 22, 2014
    Applicant: VERINT SYSTEMS LTD.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira, Jeremie Dreyfuss
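The metadata-driven model selection can be sketched as follows: a call-record field picks the enrolled voiceprint, and each audio segment is then labelled by its distance to that model. The two-dimensional "embeddings", the agent-ID metadata field, and the 0.5 cutoff are invented for illustration:

```python
def diarize(segments, voiceprints, metadata):
    # the metadata on the audio file (here a hypothetical agent ID)
    # selects which acoustic voiceprint model to apply
    target = voiceprints[metadata["agent_id"]]
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # segments close to the selected voiceprint are attributed to the
    # identified speaker; the rest to the other party
    return ["agent" if dist(seg, target) < 0.5 else "customer"
            for seg in segments]

prints = {"a42": [0.9, 0.1]}
segs = [[0.88, 0.12], [0.2, 0.8], [0.91, 0.09]]
print(diarize(segs, prints, {"agent_id": "a42"}))
```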
  • Patent number: 8732209
    Abstract: Computerized systems and methods for dynamically rendering reports in a healthcare environment are provided. In accordance with one method of the invention, two XML files are provided. The first XML file contains data representing information to be presented in the report. The second XML file contains data representing a format for the report. The second XML file is converted to an XSL stylesheet and applied to the data contained in the first XML file to create a third XML file. The third XML file contains the data representing the information to be presented in the report and the data representing the format for the report. The report is rendered using the third XML file.
    Type: Grant
    Filed: June 6, 2005
    Date of Patent: May 20, 2014
    Assignee: Cerner Innovation, Inc.
    Inventors: Sean Patrick Griffin, Brent W. Bossi
  • Patent number: 8719019
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters that space linearly at higher frequencies and logarithmically at lower frequencies, respectively, features that model the speaker's vocal tract transfer function, and features that indicate a vibration rate of vocal folds of the speaker of the sample data.
    Type: Grant
    Filed: April 25, 2011
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
  • Publication number: 20140100850
    Abstract: A method and apparatus of performing a preset operation by using voice recognition are provided. The method includes performing the preset operation of a preset operation mode according to a key input or a touch input in the preset operation mode; and recognizing an input voice during performance of the preset operation of the preset operation mode and assisting the performance of the preset operation according to the recognized voice.
    Type: Application
    Filed: July 30, 2013
    Publication date: April 10, 2014
    Applicant: Samsung Electronics Co., Ltd.
    Inventor: Sung-Joon WON
  • Patent number: 8650027
    Abstract: The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library, then facial images of a speaker are acquired and then transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes, then a waveform of a voice source is synthesized by a voice source synthesis module, finally, the waveform of the above voice source is output by an electrolarynx vibration output module, wherein the voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and select shape parameters of the vocal tract according to the vowel classes.
    Type: Grant
    Filed: September 4, 2012
    Date of Patent: February 11, 2014
    Assignee: Xi'an Jiaotong University
    Inventors: Mingxi Wan, Liang Wu, Supin Wang, Zhifeng Niu, Congying Wan
  • Patent number: 8645137
    Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
    Type: Grant
    Filed: June 11, 2007
    Date of Patent: February 4, 2014
    Assignee: Apple Inc.
    Inventors: Jerome R. Bellegarda, Kim E. A. Silverman
  • Patent number: 8645130
    Abstract: A processing unit is provided which executes speech recognition on speech signals captured by a microphone for capturing sounds uttered in an environment. The processing unit has: an initial reflection component extraction portion that extracts initial reflection components by removing diffuse reverberation components from a reverberation pattern of an impulse response generated in the environment; and an acoustic model learning portion that learns an acoustic model for the speech recognition by reflecting the initial reflection components to speech data for learning.
    Type: Grant
    Filed: November 20, 2008
    Date of Patent: February 4, 2014
    Assignees: Toyota Jidosha Kabushiki Kaisha, National University Corporation Nara Institute of Science and Technology
    Inventors: Narimasa Watanabe, Kiyohiro Shikano, Randy Gomez
  • Patent number: 8626505
    Abstract: A computer implemented method, system, and/or computer program product generates an audio cohort. Audio data from a set of audio sensors is received by an audio analysis engine. The audio data, which is associated with a plurality of objects, comprises a set of audio patterns. The audio data is processed to identify audio attributes associated with the plurality of objects to form digital audio data. This digital audio data comprises metadata that describes the audio attributes of the set of objects. A set of audio cohorts is generated using the audio attributes associated with the digital audio data and cohort criteria, where each audio cohort in the set of audio cohorts is a cohort of accompanied customers in a store, and where processing the audio data identifies a type of zoological creature that is accompanying each of the accompanied customers.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Robert L. Angell, Robert R. Friedlander, James R. Kraemer
  • Patent number: 8627230
    Abstract: A method, system, and computer program product for intelligent command prediction are provided. The method includes determining a command prediction preference associated with a user from user profile data, and selecting one or more command history repositories responsive to the command prediction preference. The one or more command history repositories include command history data collected from a plurality of users and classification data associated with the plurality of users. The method also includes calculating command probabilities for commands in the command history data of the selected one or more command history repositories as a function of the classification data associated with the plurality of users in relation to the user. The method additionally includes presenting a next suggested command as a command from the command history data of the selected one or more command history repositories with a highest calculated command probability.
    Type: Grant
    Filed: November 24, 2009
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Olivier Boehler, Gisela C. Cheng, Anuja Deedwaniya, Zamir G. Gonzalez, Shayne M. Grant, Jagadish B. Kotra
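The probability calculation in this abstract, weighting each repository's command history by how closely its users' classification matches the current user, can be sketched as below. The repository structure, class labels, and the 0.25 down-weight for non-matching classes are invented for illustration:

```python
from collections import Counter

def suggest_next(history_repos, user_class):
    # weight each repository's command counts by how closely its users'
    # classification matches the current user, then normalize to
    # probabilities and suggest the most probable next command
    scores = Counter()
    for repo in history_repos:
        weight = 1.0 if repo["class"] == user_class else 0.25
        for cmd, count in repo["commands"].items():
            scores[cmd] += weight * count
    total = sum(scores.values())
    probs = {cmd: s / total for cmd, s in scores.items()}
    return max(probs, key=probs.get), probs

repos = [
    {"class": "sysadmin", "commands": {"ls": 4, "grep": 6}},
    {"class": "developer", "commands": {"git": 10, "ls": 2}},
]
cmd, probs = suggest_next(repos, "sysadmin")
print(cmd)  # grep
```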
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic model and the language model.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
  • Patent number: 8612212
    Abstract: The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.
    Type: Grant
    Filed: March 4, 2013
    Date of Patent: December 17, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Giuseppe Riccardi
  • Patent number: 8612224
    Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are updated.
    Type: Grant
    Filed: August 23, 2011
    Date of Patent: December 17, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Catherine Breslin, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
  • Patent number: 8606580
    Abstract: To provide a data process unit and data process unit control program that are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and that are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: December 10, 2013
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventors: Makoto Shozakai, Goshu Nagino
  • Publication number: 20130311184
    Abstract: A method and a system for speech recognition are provided. In the method, vocal characteristics are captured from speech data and used to identify a speaker identification of the speech data. Next, a first acoustic model is used to recognize a speech in the speech data. According to the recognized speech and the speech data, a confidence score of the speech recognition is calculated and it is determined whether the confidence score is over a threshold. If the confidence score is over the threshold, the recognized speech and the speech data are collected, and the collected speech data is used for performing a speaker adaptation on a second acoustic model corresponding to the speaker identification.
    Type: Application
    Filed: December 5, 2012
    Publication date: November 21, 2013
    Inventors: Nilay Chokhoba Badavne, Tai-Ming Parng, Po-Yuan Yeh, Vinay Kumar Baapanapalli Yadaiah
  • Publication number: 20130311185
    Abstract: In accordance with an example embodiment a method and apparatus are provided. The method comprises identifying at least one subject voice in one or more media files. The method also comprises determining at least one prosodic feature of the at least one subject voice. The method also comprises determining at least one prosodic tag for the at least one subject voice based on the at least one prosodic feature.
    Type: Application
    Filed: January 19, 2012
    Publication date: November 21, 2013
    Applicant: NOKIA CORPORATION
    Inventors: Rohit Atri, Sidharth Patil
  • Patent number: 8589159
    Abstract: The present invention is a keyword display system that includes a speaker specifier for specifying a speaker; a weight determinator for determining a weight of the specified speaker; a keyword extractor for extracting keywords from a speech of the aforementioned speaker; a keyword relation degree calculator for calculating a relation degree between the aforementioned extracted keywords, carrying out a weighting of this calculated relation degree by using the weight of the speaker having spoken the aforementioned keywords, and calculating a keyword relation degree between the keywords; and a keyword display controller for displaying the relevancy between the aforementioned extracted keywords in response to the aforementioned keyword relation degree.
    Type: Grant
    Filed: April 19, 2011
    Date of Patent: November 19, 2013
    Assignee: NEC Corporation
    Inventor: Mitsunori Morisaki
  • Publication number: 20130297311
    Abstract: An information processing apparatus including: a high-quality-voice determining section configured to determine a voice, which can be determined to have been collected under a good condition, as a good-condition voice included in mixed voices pertaining to a group of voices collected under different conditions; and a voice recognizing section configured to carry out voice recognition processing by making use of a predetermined parameter on the good-condition voice determined by the high-quality-voice determining section, modify the value of the predetermined parameter on the basis of a result of the voice recognition processing carried out on the good-condition voice, and carry out the voice recognition processing by making use of the predetermined parameter having the modified value on a voice included in the mixed voices as a voice other than the good-condition voice.
    Type: Application
    Filed: March 15, 2013
    Publication date: November 7, 2013
    Applicant: Sony Corporation
    Inventors: Takeshi Yamaguchi, Yasuhiko Kato, Nobuyuki Kihara, Yohei Sakuraba
  • Patent number: 8571865
    Abstract: Systems, methods performed by data processing apparatus and computer storage media encoded with computer programs for receiving information relating to (i) a communication device that has received an utterance and (ii) a voice associated with the received utterance, comparing the received voice information with voice signatures in a comparison group, the comparison group including one or more individuals identified from one or more connections arising from the received information relating to the communication device, attempting to identify the voice associated with the utterance as matching one of the individuals in the comparison group, and based on a result of the attempt to identify, selectively providing the communication device with access to one or more resources associated with the matched individual.
    Type: Grant
    Filed: August 10, 2012
    Date of Patent: October 29, 2013
    Assignee: Google Inc.
    Inventor: Philip Hewinson
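A minimal sketch of matching a received voice against the signatures of a comparison group, assuming voices are represented as embedding vectors compared by cosine similarity; the vector representation and the 0.8 threshold are illustrative, not from the patent.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_speaker(voice_vector, comparison_group, threshold=0.8):
    """Return the best-matching individual's id, or None when no
    signature in the comparison group is similar enough."""
    best_id, best_score = None, threshold
    for person_id, signature in comparison_group.items():
        score = cosine_similarity(voice_vector, signature)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id
```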
  • Patent number: 8571867
    Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).
    Type: Grant
    Filed: September 13, 2012
    Date of Patent: October 29, 2013
    Assignee: Porticus Technology, Inc.
    Inventors: Germano Di Mambro, Bernardas Salna
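The three-factor check (phrase, voice print, device identifier) can be sketched as a single conjunction; the dictionary-based enrollment store and its field names are hypothetical.

```python
def authenticate(phrase, voiceprint, device_id, enrollment):
    """Grant access only when the recognized phrase, the biometric
    voice print, and the device identifier all match the record
    enrolled for that device."""
    record = enrollment.get(device_id)
    return (record is not None
            and record["phrase"] == phrase
            and record["voiceprint"] == voiceprint)
```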
  • Patent number: 8566091
    Abstract: A speech recognition system is provided for selecting, via a speech input, an item from a list of items. The speech recognition system detects a first speech input, recognizes the first speech input, compares the recognized first speech input with the list of items and generates a first candidate list of best matching items based on the comparison result. The system then informs the speaker of at least one of the best matching items of the first candidate list for a selection of an item by the speaker. If the intended item is not one of the best matching items presented to the speaker, the system then detects a second speech input, recognizes the second speech input, and generates a second candidate list of best matching items taking into account the comparison result obtained with the first speech input.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: October 22, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Andreas Löw, Lars König, Christian Hillebrecht
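One reading of the second pass is that the comparison result of the first utterance is combined with that of the second so items consistent with both rank highest. A sketch under that assumption, using string-similarity scores as a stand-in for real acoustic match scores:

```python
import difflib

def score_items(utterance, items):
    """Score each list item against a recognized utterance (0..1)."""
    return {item: difflib.SequenceMatcher(None, utterance.lower(),
                                          item.lower()).ratio()
            for item in items}

def second_pass_ranking(first_scores, second_scores, weight=0.5):
    """Combine the comparison results from both speech inputs and
    return the items ranked best-first."""
    combined = {item: weight * first_scores[item]
                      + (1 - weight) * second_scores[item]
                for item in first_scores}
    return sorted(combined, key=combined.get, reverse=True)
```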
  • Patent number: 8566093
    Abstract: A method for compensating inter-session variability for automatic extraction of information from an input voice signal representing an utterance of a speaker, includes: processing the input voice signal to provide feature vectors each formed by acoustic features extracted from the input voice signal at a time frame; computing an intersession variability compensation feature vector; and computing compensated feature vectors based on the extracted feature vectors and the intersession variability compensation feature vector.
    Type: Grant
    Filed: May 16, 2006
    Date of Patent: October 22, 2013
    Assignee: Loquendo S.p.A.
    Inventors: Claudio Vair, Daniele Colibro, Pietro Laface
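A minimal sketch of the compensation step, assuming the inter-session variability compensation vector is simply subtracted from the feature vector of every time frame:

```python
def compensate(frames, compensation):
    """Subtract the estimated inter-session variability offset from the
    acoustic feature vector extracted at each time frame."""
    return [[f - c for f, c in zip(frame, compensation)]
            for frame in frames]
```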
  • Patent number: 8554562
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: November 15, 2009
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
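The extended feature vector can be sketched as below; the unit-variance Gaussian means stand in for real pre-trained acoustic models and are an illustrative assumption.

```python
def log_likelihood(frame, mean):
    """Toy log-likelihood under a unit-variance diagonal Gaussian."""
    return -0.5 * sum((x - m) ** 2 for x, m in zip(frame, mean))

def extend_features(frame, speaker_models, background_model):
    """Append one log-likelihood ratio per pre-trained speaker model
    (relative to the background population model) to the raw acoustic
    feature vector of the frame."""
    background = log_likelihood(frame, background_model)
    llrs = [log_likelihood(frame, model) - background
            for model in speaker_models]
    return list(frame) + llrs
```

The extended vectors then feed the segmentation and clustering stages unchanged.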
  • Patent number: 8554563
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
  • Patent number: 8532992
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Grant
    Filed: February 8, 2013
    Date of Patent: September 10, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
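The retrieval order described in the abstract is a straightforward fallback chain; a sketch, with dictionary-based model stores as an assumed representation:

```python
def select_model(user_id, supervised, unsupervised, generic):
    """Prefer the user-specific supervised model, fall back to the
    unsupervised model, and finally to the generic model."""
    if user_id in supervised:
        return supervised[user_id]
    if user_id in unsupervised:
        return unsupervised[user_id]
    return generic
```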
  • Patent number: 8515756
    Abstract: A method and device are configured to receive voice data from a user and perform speech recognition on the received voice data. A confidence score is calculated that represents the likelihood that the received voice data has been accurately recognized. A likely age range associated with the user is then determined based on the confidence score.
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: August 20, 2013
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
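A sketch of the final mapping step, from recognition confidence to an age band; the thresholds and band labels are illustrative, not taken from the patent.

```python
def likely_age_range(confidence):
    """Map a recognition confidence score (0..1) to a coarse age band.
    Recognizers trained on adult speech tend to score adult voices
    higher, so low confidence hints at a younger speaker."""
    if confidence >= 0.8:
        return "adult"
    if confidence >= 0.5:
        return "teen"
    return "child"
```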
  • Patent number: 8510110
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (such as audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Grant
    Filed: July 11, 2012
    Date of Patent: August 13, 2013
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Patent number: 8510111
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 13, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
  • Publication number: 20130204621
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Application
    Filed: March 15, 2013
    Publication date: August 8, 2013
    Applicant: Nuance Communications, Inc.
    Inventor: Nuance Communications, Inc.
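The vocabulary reduction can be sketched as filtering pronunciation variants by the speaker's attributed styles; the (style, pronunciation) tuple representation and style tags are assumptions for illustration.

```python
def reduce_vocabulary(vocabulary, speaker_styles):
    """Keep only the pronunciation variants tagged with one of the
    speaker's attributed styles; if no variant matches, keep all
    variants rather than dropping the word."""
    reduced = {}
    for word, variants in vocabulary.items():
        matching = [pron for style, pron in variants
                    if style in speaker_styles]
        reduced[word] = matching or [pron for _, pron in variants]
    return reduced
```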
  • Patent number: 8504366
    Abstract: A method, system, and computer program product are provided for Joint Factor Analysis (JFA) scoring in speech processing systems. The method includes: carrying out an enrollment session offline to enroll a speaker model in a speech processing system using JFA, including: extracting speaker factors from the enrollment session; estimating first components of channel factors from the enrollment session. The method further includes: carrying out a test session including: calculating second components of channel factors strongly dependent on the test session; and generating a score based on speaker factors, channel factors, and test session Gaussian mixture model sufficient statistics to provide a log-likelihood ratio for a test session.
    Type: Grant
    Filed: November 16, 2011
    Date of Patent: August 6, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Hagai Aronowitz, Oren Barkan
  • Patent number: 8504365
    Abstract: Disclosed herein are systems, methods, and tangible computer readable-media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
    Type: Grant
    Filed: April 11, 2008
    Date of Patent: August 6, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst Schroeter
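The variance test can be sketched as below, assuming each speech sample has already been reduced to a single comparison score; the `1e-3` threshold is illustrative and would in practice be adjusted to the required authentication certainty.

```python
import statistics

def verify_variance(sample_scores, min_variance=1e-3):
    """Deny verification when repeated samples of the same phrase show
    little or no variance over time (a sign of a replayed or
    synthesized recording); verify otherwise."""
    if len(set(sample_scores)) == 1:
        return False  # identical samples: certainly not live speech
    return statistics.pvariance(sample_scores) >= min_variance
```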
  • Patent number: 8494856
    Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: July 23, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre, Masami Akamine
  • Patent number: 8489397
    Abstract: A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: July 16, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Charles David Caldwell, John Bruce Harlow, Robert J. Sayko, Norman Shaye
  • Patent number: 8473292
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for generating personalized user models. The method includes receiving automatic speech recognition (ASR) output of speech interactions with a user, receiving an ASR transcription error model characterizing how ASR transcription errors are made, generating guesses of a true transcription and a user model via an expectation maximization (EM) algorithm based on the error model and the respective ASR output where the guesses will converge to a personalized user model which maximizes the likelihood of the ASR output. The ASR output can be unlabeled. The method can include casting speech interactions as a dynamic Bayesian network with four variables: (s), (u), (r), (m), and encoding relationships between (s), (u), (r), (m) as conditional probability tables. At each dialog turn (r) and (m) are known and (s) and (u) are hidden.
    Type: Grant
    Filed: September 2, 2009
    Date of Patent: June 25, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Jason Williams, Umar Syed
  • Patent number: 8457965
    Abstract: A method is described for correcting and improving the functioning of certain devices for the diagnosis and treatment of speech that dynamically measure the functioning of the velum in the control of nasality during speech. The correction method uses an estimate of the vowel frequency spectrum to greatly reduce the variation of nasalance with the vowel being spoken, so as to result in a corrected value of nasalance that reflects with greater accuracy the degree of velar opening. Correction is also described for reducing the effect on nasalance values of energy from the oral and nasal channels crossing over into the other channel because of imperfect acoustic separation.
    Type: Grant
    Filed: October 6, 2009
    Date of Patent: June 4, 2013
    Assignee: Rothenberg Enterprises
    Inventor: Martin Rothenberg
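The nasalance ratio and a simple channel-crossover correction can be sketched as follows; the single `crosstalk` fraction is an illustrative simplification of the correction described in the abstract.

```python
def nasalance(nasal_energy, oral_energy, crosstalk=0.0):
    """Nasalance ratio nasal / (nasal + oral), after removing the
    fraction `crosstalk` of each channel's energy assumed to have
    leaked across from the other channel."""
    nasal = max(nasal_energy - crosstalk * oral_energy, 0.0)
    oral = max(oral_energy - crosstalk * nasal_energy, 0.0)
    return nasal / (nasal + oral)
```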