Specialized Models Patents (Class 704/250)
  • Patent number: 8108212
    Abstract: A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of the input speech, and a speech recognition step, which transcribes the input speech into text data based on the selected recognition model.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: January 31, 2012
    Assignee: NEC Corporation
    Inventor: Shuhei Maegawa
  • Patent number: 8099278
    Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: January 17, 2012
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
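The decision logic this abstract describes can be sketched in a few lines; the thresholds and age labels below are illustrative assumptions, not values from the patent:

```python
def likely_age_range(confidence: float) -> str:
    """Map a recognition confidence score (0.0-1.0) to a likely caller
    age range. A recognizer tuned toward adult speech tends to score
    lower on children's voices, so a low score suggests a younger
    caller. The thresholds below are illustrative only."""
    if confidence >= 0.85:
        return "adult"
    elif confidence >= 0.60:
        return "teen"
    else:
        return "child"

print(likely_age_range(0.92))  # adult
print(likely_age_range(0.40))  # child
```

In the patent's setting the recognizer is deliberately tuned toward a particular age range, which is what makes the confidence score informative about the caller.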
  • Patent number: 8099288
    Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, with the acoustic model of the speaker-independent recognizer serving as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), as is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: January 17, 2012
    Assignee: Microsoft Corp.
    Inventors: Zhengyou Zhang, Amarnag Subramaya
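A minimal sketch of the utterance-level score this abstract describes, assuming per-sub-unit log-likelihood ratios have already been computed against the background model (the uniform weights and zero threshold are placeholders):

```python
def utterance_score(subunit_scores, weights=None):
    """Combine per-sub-unit (word, tri-phone, or phone) log-likelihood
    ratios into one utterance-level verification score via a weighted
    sum. Each sub-unit score is log p(x|speaker) - log p(x|background),
    where the background model is the speaker-independent recognizer's
    acoustic model. Uniform weights are used when none are given."""
    if weights is None:
        weights = [1.0 / len(subunit_scores)] * len(subunit_scores)
    return sum(w * s for w, s in zip(weights, subunit_scores))

def verify(subunit_scores, threshold=0.0):
    """Accept the claimed speaker if the weighted score exceeds a
    decision threshold (0.0 here is an arbitrary placeholder)."""
    return utterance_score(subunit_scores) > threshold

# Positive scores mean the speaker model fits better than the background.
print(verify([1.2, 0.8, -0.1]))  # True
```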
  • Patent number: 8099290
    Abstract: A voice recognition unit is constructed to create, for each language, a voice label string for a voice uttered by a user, on the basis of the feature vector time series of the input voice and data from a sound standard model, and to register the voice label string into a voice label memory 2. Using a first language switching unit SW1 and a second language switching unit SW2, the unit automatically switches among languages for the sound standard model memory 1 used to create the voice label string, and among languages for the voice label memory 2 that holds the created string.
    Type: Grant
    Filed: October 20, 2009
    Date of Patent: January 17, 2012
    Assignee: Mitsubishi Electric Corporation
    Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta
  • Publication number: 20120010887
    Abstract: Embodiments include a speech recognition system and a personal speech profile data (PSPD) storage device that is physically distinct from the speech recognition system. In the speech recognition system, a PSPD interface receives voice training data, which is associated with an individual, from the PSPD storage device. A speech input module produces a digital speech signal derived from an utterance made by a system user. A speech processing module accesses voice training data stored on the PSPD storage device through the PSPD interface, and executes a speech processing algorithm that analyzes the digital speech signal using the voice training data, in order to identify one or more recognized terms from the digital speech signal. A command processing module initiates execution of various applications based on the recognized terms. Embodiments may be implemented in various types of host systems, including an aircraft cockpit-based system.
    Type: Application
    Filed: July 8, 2010
    Publication date: January 12, 2012
    Applicant: HONEYWELL INTERNATIONAL INC.
    Inventors: Lokesh Rayasandra Boregowda, Meruva Jayaprakash, Koushik Sinha
  • Patent number: 8086455
    Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors with input/output data relationships and data dependencies on predecessor and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationships, so the user does not need to determine the order of execution. The compiler also automatically detects ill-defined processes, including cyclic definitions and data produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Yifan Gong, Ye Tian
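The declarative ordering and error detection described in this entry amount to a topological sort over declared input/output files. A minimal sketch, with hypothetical processor and file names:

```python
def compile_build_order(processors):
    """Order model-build processors by their declared input/output
    files. `processors` maps a processor name to (inputs, outputs).
    The sketch detects the two ill-defined cases the abstract names:
    an output produced by more than one processor, and a cyclic
    definition. Illustrative only, not the patented implementation."""
    producer = {}
    for name, (_, outputs) in processors.items():
        for out in outputs:
            if out in producer:
                raise ValueError(f"{out} produced by both {producer[out]} and {name}")
            producer[out] = name

    # A processor depends on the producer of each of its inputs.
    deps = {name: {producer[i] for i in inputs if i in producer}
            for name, (inputs, _) in processors.items()}

    # Kahn's algorithm: repeatedly emit processors whose deps are met.
    order, done = [], set()
    while len(order) < len(processors):
        ready = [n for n in deps if n not in done and deps[n] <= done]
        if not ready:
            raise ValueError("cyclic process definition detected")
        for n in sorted(ready):
            order.append(n)
            done.add(n)
    return order

steps = {
    "collect": ([], ["raw.txt"]),
    "train":   (["feats.bin"], ["model.bin"]),
    "extract": (["raw.txt"], ["feats.bin"]),
}
print(compile_build_order(steps))  # ['collect', 'extract', 'train']
```

Kahn's algorithm surfaces both failure modes naturally: a duplicated producer is caught while indexing outputs, and a cycle leaves no processor ready.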
  • Patent number: 8078462
    Abstract: A transformation-parameter calculating unit calculates a first model parameter, a parameter of the speaker model that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. For each speaker, the transformation parameter transforms the distribution of the clean feature corresponding to that speaker's identification information into the distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to each speaker's identification information using the transformation parameter, and calculates a second model parameter, a parameter of the speaker model that maximizes a second likelihood for the transformed noisy feature.
    Type: Grant
    Filed: October 2, 2008
    Date of Patent: December 13, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Shinohara, Masami Akamine
  • Patent number: 8078465
    Abstract: Certain aspects and embodiments of the present invention are directed to systems and methods for monitoring and analyzing the language environment and development of a key child. A key child's language environment and language development can be monitored without placing artificial limitations on the child's activities or requiring a third-party observer. The language environment can be analyzed to identify words, vocalizations, or other noises directed to or spoken by the key child, independent of content. The analysis can include the number of responses between the child and another, such as an adult, and the number of words spoken by the child and/or another, independent of the content of the speech. One or more metrics can be determined based on the analysis and provided to assist in improving the language environment and/or tracking the language development of the key child.
    Type: Grant
    Filed: January 23, 2008
    Date of Patent: December 13, 2011
    Assignee: LENA Foundation
    Inventors: Terrance Paul, Dongxin Xu, Umit Yapenel, Sharmistha Gray
  • Publication number: 20110301953
    Abstract: Provided is a voice recognition system that adapts and stores a speaker's voice, per feature, in a basic voice model and in new independent multi-models, and that provides stable real-time voice recognition using the resulting multi-adaptive model.
    Type: Application
    Filed: April 11, 2011
    Publication date: December 8, 2011
    Applicant: Seoby Electronic Co., Ltd
    Inventor: Sung-Sub Lee
  • Patent number: 8050922
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. The speaker is categorized as a male, female, or child and the categorization is used as a basis for dynamically adjusting a maximum frequency fmax and a minimum frequency fmin of a filter bank used for processing the input utterance to produce an output. Corresponding gender or age specific acoustic models are used to perform voice recognition based on the filter bank output.
    Type: Grant
    Filed: July 21, 2010
    Date of Patent: November 1, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
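The category-dependent filter-bank adjustment might look like the following sketch; the frequency values are rough illustrative guesses, not figures from the patent:

```python
def filter_bank_bounds(category: str):
    """Return (fmin, fmax) in Hz for the filter bank used to process
    an utterance, chosen from the speaker's category. Children and,
    to a lesser degree, women have higher fundamental and formant
    frequencies, so the band is shifted upward. Values illustrative."""
    bounds = {
        "male":   (70.0, 3800.0),
        "female": (120.0, 4200.0),
        "child":  (180.0, 4800.0),
    }
    return bounds[category]

fmin, fmax = filter_bank_bounds("child")
print(fmin, fmax)  # 180.0 4800.0
```

The filter-bank output would then be fed to the matching gender- or age-specific acoustic model, as the abstract describes.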
  • Patent number: 8036892
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: July 8, 2010
    Date of Patent: October 11, 2011
    Assignee: American Express Travel Related Services Company, Inc.
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
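The one-to-many matching step can be sketched as a similarity search over stored voice prints. The vector representation, cosine similarity, and threshold here are all assumptions for illustration:

```python
def norm(v):
    """Euclidean norm of a voice-print vector."""
    return sum(x * x for x in v) ** 0.5

def cosine_similarity(a, b):
    """Cosine similarity between two fixed-length voice-print vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))

def find_matches(customer_print, known_prints, threshold=0.9):
    """One-to-many comparison: return the IDs of known voice prints
    whose similarity to the segmented customer print exceeds a
    threshold, meaning the two are likely from the same person."""
    return [pid for pid, vec in known_prints.items()
            if cosine_similarity(customer_print, vec) >= threshold]

known = {"fraudster_17": [0.9, 0.1, 0.4], "customer_02": [0.1, 0.8, 0.2]}
print(find_matches([0.88, 0.12, 0.41], known))  # ['fraudster_17']
```

Any returned matches could then drive a downstream decision, such as whether to authorize the customer's requested transaction.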
  • Patent number: 8031881
    Abstract: Method and apparatus for microphone matching for wearable directional hearing assistance devices are provided. An embodiment includes a method for matching at least a first microphone to a second microphone, using a user's voice from the user's mouth. The user's voice is processed as received by at least one microphone to determine a frequency profile associated with voice of the user. Intervals are detected where the user is speaking using the frequency profile. Variations in microphone reception between the first microphone and the second microphone are adaptively canceled during the intervals and when the first microphone and second microphone are in relatively constant spatial position with respect to the user's mouth.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: October 4, 2011
    Assignee: Starkey Laboratories, Inc.
    Inventor: Tao Zhang
  • Patent number: 8032373
    Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: October 4, 2011
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Patent number: 8024189
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Grant
    Filed: June 22, 2006
    Date of Patent: September 20, 2011
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Patent number: 8010358
    Abstract: Methods and apparatus for voice recognition are disclosed. A voice signal is obtained and two or more voice recognition analyses are performed on the voice signal. Each voice recognition analysis uses a filter bank defined by a different maximum frequency and a different minimum frequency and wherein each voice recognition analysis produces a recognition probability ri of recognition of one or more speech units, whereby there are two or more recognition probabilities ri. The maximum frequency and the minimum frequency may be adjusted every time speech is windowed and analyzed. A final recognition probability Pf is determined based on the two or more recognition probabilities ri.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 30, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
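The abstract leaves open how the final probability P_f is derived from the per-analysis probabilities r_i; a weighted average is one simple possibility, used here purely for illustration:

```python
def final_recognition_probability(r, weights=None):
    """Combine recognition probabilities r_i from parallel analyses,
    each run with a differently bounded filter bank, into a final
    probability P_f. A weighted average is an illustrative choice;
    the patent only states that P_f is determined from the r_i."""
    if weights is None:
        weights = [1.0 / len(r)] * len(r)
    return sum(w * ri for w, ri in zip(weights, r))

print(final_recognition_probability([0.7, 0.9, 0.8]))  # ~0.8
```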
  • Patent number: 8005674
    Abstract: A recognition model set is generated. A technique is described that uses the log-likelihood of real data as a cross-entropy measure of the mismatch between training data and a model derived from that training data, and compares such mismatches between class-dependent models and a class-independent model as evidence for model replacement. By using the change in cross entropy when deciding whether to add class-independent Gaussian Mixture Models (GMMs), the good performance of the class-dependent models is largely retained while the size and complexity of the model set decrease.
    Type: Grant
    Filed: July 10, 2007
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric W Janke, Bin Jia
  • Patent number: 8000971
    Abstract: Disclosed are systems and methods for training a barge-in model for speech processing in a spoken dialogue system, comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by allowing only speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in model for speech processing.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: August 16, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 7996213
    Abstract: A similarity degree estimation method is performed by two processes. In a first process, an inter-band correlation matrix is created from spectral data of an input voice such that the spectral data are divided into a plurality of discrete bands which are separated from each other with spaces therebetween along a frequency axis, a plurality of envelope components of the spectral data are obtained from the plurality of the discrete bands, and elements of the inter-band correlation matrix are correlation values between the respective envelope components of the input voice. In a second process, a degree of similarity is calculated between a pair of input voices to be compared with each other by using respective inter-band correlation matrices obtained for the pair of the input voices through the inter-band correlation matrix creation process.
    Type: Grant
    Filed: March 20, 2007
    Date of Patent: August 9, 2011
    Assignee: Yamaha Corporation
    Inventors: Mikio Tohyama, Michiko Kazama, Satoru Goto, Takehiko Kawahara, Yasuo Yoshioka
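The first process, building an inter-band correlation matrix from per-band envelope components, can be sketched with plain Pearson correlations (the envelope extraction itself is assumed already done):

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def inter_band_correlation_matrix(band_envelopes):
    """Each row of `band_envelopes` is the spectral-envelope component
    extracted from one discrete frequency band (the bands being
    separated by gaps along the frequency axis). Element (i, j) is
    the correlation between the envelopes of bands i and j."""
    n = len(band_envelopes)
    return [[pearson(band_envelopes[i], band_envelopes[j])
             for j in range(n)] for i in range(n)]

env = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
m = inter_band_correlation_matrix(env)
print(round(m[0][1], 3), round(m[0][2], 3))  # 1.0 -1.0
```

The second process would then compare the two matrices obtained for a pair of input voices, e.g. by an element-wise distance, to score their similarity.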
  • Patent number: 7996222
    Abstract: A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: August 9, 2011
    Assignee: Nokia Corporation
    Inventors: Jani K. Nurminen, Elina Helander
  • Patent number: 7994943
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: August 9, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
  • Patent number: 7983910
    Abstract: Communicating across channels with emotion preservation includes: receiving, by a processor in a communication device, a voice communication; analyzing, by the processor in the communication device, the voice communication for first emotion content; analyzing, by the processor in the communication device, textual content of the voice communication for second emotion content; and marking up, by the processor in the communication device, the textual content with emotion metadata for one of the first emotion content and the second emotion content.
    Type: Grant
    Filed: March 3, 2006
    Date of Patent: July 19, 2011
    Assignee: International Business Machines Corporation
    Inventors: Balan Subramanian, Deepa Srinivasan, Mohamad Reza Salahshoor
  • Patent number: 7983917
    Abstract: An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: July 19, 2011
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
  • Patent number: 7969329
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: June 28, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov, Sergey V. Kolomiets
  • Patent number: 7952497
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria.
    Type: Grant
    Filed: May 6, 2009
    Date of Patent: May 31, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov
  • Publication number: 20110119059
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Application
    Filed: November 13, 2009
    Publication date: May 19, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Bernard S. RENGER, Steven Neil TISCHER
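The fallback order in this abstract is a simple three-tier lookup. A sketch, with dictionaries standing in for the model stores (the dictionary representation is an assumption):

```python
def select_speech_model(user_id, supervised, unsupervised, generic):
    """Model-selection fallback described in the abstract: prefer a
    user-specific supervised model, fall back to an unsupervised
    model, and finally to a generic model associated with the user."""
    if user_id in supervised:
        return supervised[user_id]
    if user_id in unsupervised:
        return unsupervised[user_id]
    return generic[user_id]

supervised = {"alice": "alice-supervised"}
unsupervised = {"bob": "bob-unsupervised"}
generic = {"alice": "generic", "bob": "generic", "carol": "generic"}
print(select_speech_model("bob", supervised, unsupervised, generic))
# bob-unsupervised
```

The selected model is then used to recognize the received speech, as the abstract's final steps describe.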
  • Publication number: 20110119060
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Application
    Filed: November 15, 2009
    Publication date: May 19, 2011
    Applicant: International Business Machines Corporation
    Inventor: Hagai Aronowitz
  • Patent number: 7937269
    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
    Type: Grant
    Filed: August 22, 2005
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
  • Patent number: 7930179
    Abstract: Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined and remodeled and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
    Type: Grant
    Filed: October 2, 2007
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Zhu Liu, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
  • Publication number: 20110077943
    Abstract: A first system for generating a language model includes: a topic history dependent language model storing unit; a topic history accumulation unit; and a language score calculation unit. The language score calculation unit calculates a language score corresponding to the history of topics, using the topic history in an utterance accumulated in the topic history accumulation unit and the language model stored in the topic history dependent language model storing unit. The topic history dependent language model storing unit may store a language model dependent on only the most recent n topics, and the topic history accumulation unit may accumulate only the most recent n topics.
    Type: Application
    Filed: June 18, 2007
    Publication date: March 31, 2011
    Applicant: NEC CORPORATION
    Inventors: Kiyokazu Miki, Kentaro Nagatomo
  • Patent number: 7904295
    Abstract: Proposed is a text-independent automatic speaker recognition (ASkR) system that employs a new speech feature and a new classifier. The statistical feature pH is a vector of Hurst parameters obtained by applying a wavelet-based multi-dimensional estimator (M dim wavelets) to windowed short-time segments of speech. The proposed classifier for the speaker identification and verification tasks is based on the multi-dimensional fBm (fractional Brownian motion) model, denoted M dim fBm. For a given sequence of input speech features, the speaker model is obtained from the sequence of vectors of H parameters and the means and variances of these features.
    Type: Grant
    Filed: September 2, 2004
    Date of Patent: March 8, 2011
    Inventor: Rosangelo Fernandes Coelho
  • Patent number: 7895038
    Abstract: Speech enhancement techniques are provided for extemporaneous noise without a noise interval and for unknown extemporaneous noise, with a method of signal enhancement including: subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter to reduce components of the noise signal in the input signal. A database of a signal model for the target signal, expressing a given feature by a given statistical model, is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to the output signal of the spectral subtraction means.
    Type: Grant
    Filed: May 26, 2008
    Date of Patent: February 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: Masafumi Nishimura, Tetsuya Takiguchi
  • Patent number: 7881933
    Abstract: A device may include logic configured to receive voice data from a user, identify a result from the voice data, calculate a confidence score associated with the result, and determine a likely age range associated with the user based on the confidence score.
    Type: Grant
    Filed: March 23, 2007
    Date of Patent: February 1, 2011
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
  • Patent number: 7877254
    Abstract: The present invention provides a method and apparatus for enrollment and verification of speaker authentication. The method for enrollment of speaker authentication comprises: extracting an acoustic feature vector sequence from an enrollment utterance of a speaker; and generating a speaker template using the acoustic feature vector sequence. The step of extracting an acoustic feature vector sequence comprises: generating, based on the enrollment utterance, a filter bank for filtering the locations and energies of formants in the spectrum of the enrollment utterance; filtering the spectrum of the enrollment utterance with the generated filter bank; and generating the acoustic feature vector sequence from the filtered enrollment utterance.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: January 25, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Jian Luan, Pei Ding, Lei He, Jie Hao
  • Publication number: 20110004473
    Abstract: A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; activating an audio analysis engine which receives the acoustic feature to validate the result and obtain an enhanced result.
    Type: Application
    Filed: July 6, 2009
    Publication date: January 6, 2011
    Applicant: Nice Systems Ltd.
    Inventors: Ronen Laperdon, Moshe Wasserblat, Shimrit Artzi, Yuval Lubowich
  • Patent number: 7864987
    Abstract: In one embodiment, an access system first determines that someone has correct credentials by using a non-biometric authentication method, such as typing in a password, presenting a smart card containing a cryptographic secret, or providing a valid digital signature. Once the credentials are authenticated, the user must pass at least two biometric tests, which can be chosen randomly. In one approach, the biometric tests need only check that a template generated from the user who desires access matches the stored templates of the holder of the credentials authenticated by the non-biometric test. Access is desirably allowed when both biometric tests are passed.
    Type: Grant
    Filed: April 18, 2006
    Date of Patent: January 4, 2011
    Assignee: Infosys Technologies Ltd.
    Inventors: Kumar Balepur Venkatanna, Rajat Moona, S V Subrahmanya
  • Patent number: 7853450
    Abstract: A method of transmitting digital voice information comprises encoding raw speech into encoded digital speech data. The beginning and end of individual phonemes within the encoded digital speech data are marked. The encoded digital speech data is formed into packets. The packets are fed into a speech decoding mechanism.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: December 14, 2010
    Assignee: Alcatel-Lucent USA Inc.
    Inventor: Bryan Kadel
  • Publication number: 20100268538
    Abstract: Disclosed are an electronic apparatus and a voice recognition method for the same. The voice recognition method for the electronic apparatus includes: receiving an input voice of a user; determining characteristics of the user; and recognizing the input voice based on the determined characteristics of the user.
    Type: Application
    Filed: January 7, 2010
    Publication date: October 21, 2010
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Hee-seob RYU, Seung-kwon PARK, Jong-ho LEA, Jong-hyuk JANG
  • Patent number: 7813927
    Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: October 12, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
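The pooling step this abstract names — collecting Gaussians from multiple HMM states into one GMM and normalizing the Gaussian weights with respect to the pooled states — can be sketched with scalar Gaussians (the tuple representation is an assumption for illustration):

```python
def pool_states(hmm_states):
    """Pool the Gaussians of several HMM states into one GMM, then
    normalize the mixture weights so they sum to 1 across all pooled states.
    Each state is a list of (weight, mean, variance) tuples."""
    pooled = [g for state in hmm_states for g in state]
    total = sum(w for (w, _m, _v) in pooled)
    return [(w / total, m, v) for (w, m, v) in pooled]

states = [
    [(0.6, 0.0, 1.0), (0.4, 1.0, 1.0)],  # HMM state A
    [(1.0, 5.0, 2.0)],                   # HMM state B
]
gmm = pool_states(states)
```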
  • Patent number: 7809562
    Abstract: A voice recognition system has a recognition dictionary storing voice information, a primary voice recognition means for performing primary voice recognition in response to input voice information pronounced by a user by the use of the recognition dictionary, and a recognition result judging means for deciding whether the primary voice recognition result is to be accepted or rejected. The voice recognition system includes a transceiver means for sending the input voice information of the user to an additional voice recognition means when the primary voice recognition result is rejected by the recognition result judging means and for receiving a secondary voice recognition result produced as a result of secondary voice recognition by the additional voice recognition means, and a recognition result output means for outputting the primary or secondary voice recognition result to an exterior of the voice recognition system.
    Type: Grant
    Filed: July 26, 2006
    Date of Patent: October 5, 2010
    Assignee: NEC Corporation
    Inventor: Ken Hanazawa
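The primary-then-secondary control flow of this abstract can be sketched as below; the recognizers and the acceptance test are stand-in callables, not the patent's components:

```python
def recognize(audio, primary, secondary, accept):
    """Run the primary recognizer; if its result is rejected, forward the
    input to the additional recognizer and return that result instead."""
    result = primary(audio)
    if accept(result):
        return result
    return secondary(audio)  # e.g. shipped over the transceiver means

primary = lambda a: ("hello", 0.4)          # local, low confidence
secondary = lambda a: ("hello world", 0.9)  # additional, more capable
accept = lambda r: r[1] >= 0.6              # reject below 0.6 confidence
```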
  • Publication number: 20100223057
    Abstract: System and process for audio authentication of an individual or speaker, including a processor for decomposing an audio signal received at the sensor into vectors representative of the speaker to be authenticated; for transforming the super-vector V of the speaker, which results from the concatenation of the vectors associated with the said speaker, into binary data 1001100 . . . 0, taking as an input the mean super-vector M and comparing the super-vector V of the speaker with the mean super-vector M; the binary data thus obtained being transmitted to a module for extracting the speaker authentication, which takes as an input the public keys Kpub(1), in order to authenticate the speaker and/or to generate a cryptographic key associated with the speaker.
    Type: Application
    Filed: December 22, 2009
    Publication date: September 2, 2010
    Applicant: Thales
    Inventors: François Capman, Sandra Marcello, Jean Martinelli
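The binarization step in this abstract — comparing the speaker super-vector V against the mean super-vector M to obtain binary data — can be sketched as a component-wise comparison (the specific comparison rule is an assumption for illustration):

```python
def binarize_supervector(v, m):
    """Compare speaker super-vector V component-wise against the mean
    super-vector M, producing the binary string used downstream."""
    return "".join("1" if vi > mi else "0" for vi, mi in zip(v, m))

bits = binarize_supervector([1.2, 0.1, 3.0], [1.0, 0.5, 2.0])
```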
  • Patent number: 7788101
    Abstract: Embodiments of a system and method for verifying an identity of a claimant are described. In accordance with one embodiment, a feature may be extracted from a biometric sample captured from a claimant claiming an identity. The extracted feature may be compared to a template associated with the identity to determine the similarity between the extracted feature and the template, with the similarity between them being represented by a score. A determination may be made as to whether the identity has a correction factor associated therewith. If the identity is determined to have a correction factor associated therewith, then the score may be modified using the correction factor. The score may then be compared to a threshold to determine whether to accept the claimant as the identity.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: August 31, 2010
    Assignee: Hitachi, Ltd.
    Inventor: Clifford Tavares
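The score-correction decision in this abstract reduces to a few lines; the multiplicative form of the correction factor is an assumption for illustration, not specified by the patent:

```python
def verify(score, threshold, correction=None):
    """Modify the similarity score with the identity's correction factor
    (if one exists), then accept iff the score clears the threshold."""
    if correction is not None:
        score *= correction  # multiplicative correction is an assumption
    return score >= threshold
```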
  • Patent number: 7788095
    Abstract: A method and apparatus for indexing one or more audio signals using a speech to text engine and a phoneme detection engine, and generating a combined lattice comprising a text part and a phoneme part. A word to be searched is first looked up in the text part; if it is not found, or is found with low certainty, it is divided into phonemes and searched for in the phoneme part of the lattice.
    Type: Grant
    Filed: November 18, 2007
    Date of Patent: August 31, 2010
    Assignee: Nice Systems, Ltd.
    Inventors: Moshe Wasserblant, Barak Eilam, Yuval Lubowich, Maor Nissan
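The fallback search this abstract describes — text part first, phoneme part when the word is missing or low-confidence — can be sketched as below; the lattice representation and the `g2p` (grapheme-to-phoneme) table are illustrative assumptions:

```python
def search(lattice, word, g2p, min_conf=0.7):
    """Look the word up in the text part first; fall back to a phoneme
    search when it is missing or matched with low certainty."""
    conf = lattice["text"].get(word)
    if conf is not None and conf >= min_conf:
        return ("text", conf)
    if g2p[word] in lattice["phonemes"]:
        return ("phoneme", None)
    return None

lattice = {"text": {"hello": 0.9, "world": 0.4},
           "phonemes": "hh ah l ow . w er l d"}
g2p = {"hello": "hh ah l ow", "world": "w er l d"}
```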
  • Publication number: 20100217595
    Abstract: Disclosed herein is a method for emotion recognition based on a minimum classification error. In the method, a speaker's neutral emotion is extracted using a Gaussian mixture model (GMM), and the remaining emotions are classified using a GMM to which a discriminative weight is applied, chosen to minimize the loss function of the classification error for the emotion-recognition feature vector. By applying a discriminative weight, evaluated using the minimum-classification-error GMM, to the feature vectors of emotions that are difficult to classify, the performance of emotion recognition is enhanced.
    Type: Application
    Filed: February 23, 2010
    Publication date: August 26, 2010
    Applicants: KOREA INSTITUTE OF SCIENCE AND TECHNOLOGY, Electronics and Telecommunications Research Institute
    Inventors: Hyoung Gon KIM, Ig Jae KIM, Joon-Hyuk CHANG, Kye Hwan LEE, Chang Seok BAE
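GMM classification with a per-class discriminative weight, as in this abstract, can be sketched with one-dimensional Gaussians. The additive form of the weight and all names are assumptions for illustration; the patent's weights are trained under a minimum-classification-error criterion, which this sketch omits:

```python
import math

def gauss_ll(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(x, models, disc_weights):
    """Score each emotion's GMM, add its discriminative weight, and
    return the best-scoring emotion."""
    def score(emotion, comps):
        ll = math.log(sum(w * math.exp(gauss_ll(x, m, v)) for (w, m, v) in comps))
        return ll + disc_weights.get(emotion, 0.0)
    return max(models, key=lambda e: score(e, models[e]))

models = {"neutral": [(1.0, 0.0, 1.0)], "angry": [(1.0, 3.0, 1.0)]}
```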
  • Publication number: 20100211376
    Abstract: Computer implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
    Type: Application
    Filed: February 2, 2010
    Publication date: August 19, 2010
    Applicant: Sony Computer Entertainment Inc.
    Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
  • Patent number: 7778831
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instant during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 17, 2010
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
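The pitch-then-categorize step of this abstract can be sketched with a crude autocorrelation pitch estimator; the category thresholds and all names are assumptions for illustration, not the patent's values:

```python
import math

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Crude autocorrelation pitch estimate over one frame (sketch only)."""
    lo, hi = int(sample_rate / fmax), int(sample_rate / fmin)
    best_lag, best_corr = lo, float("-inf")
    for lag in range(lo, min(hi, len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

def categorize(pitch_hz):
    """Map runtime pitch to a coarse speaker category (thresholds assumed)."""
    if pitch_hz < 160:
        return "low-pitch"
    if pitch_hz < 255:
        return "mid-pitch"
    return "high-pitch"

sr = 8000
frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]  # 200 Hz tone
```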
  • Patent number: 7778832
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: August 17, 2010
    Assignee: American Express Travel Related Services Company, Inc.
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
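The one-to-many comparison this abstract describes can be sketched as scoring a caller's voice print against every enrolled print; representing voice prints as vectors and scoring with cosine similarity are assumptions for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_voiceprint(probe, known_prints, threshold=0.85):
    """Compare one caller print against many enrolled prints; return the
    identities whose similarity clears the threshold (likely same person)."""
    return [name for name, vp in known_prints.items()
            if cosine(probe, vp) >= threshold]

known = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.0]}
```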
  • Publication number: 20100205120
    Abstract: A method for researching and developing a recognition model in a computing environment, including gathering one or more data samples from one or more users in the computing environment into a training data set used for creating the recognition model, receiving one or more training parameters defining a feature extraction algorithm configured to analyze one or more features of the training data set, a classifier algorithm configured to associate the features to a template set, a selection of a subset of the training data set, a type of the data samples, or combinations thereof, creating the recognition model based on the training parameters, and evaluating the recognition model.
    Type: Application
    Filed: February 6, 2009
    Publication date: August 12, 2010
    Applicant: Microsoft Corporation
    Inventors: Yu Zou, Hao Wei, Gong Cheng, Dongmei Zhang, Jian Wang
  • Publication number: 20100204993
    Abstract: The present invention relates to a system and method of making a verification decision within a speaker recognition system. A speech sample is gathered from a speaker over a period of time, and a verification score is then produced for said sample over the period. Once the verification score is determined, a confidence measure is produced based on frame score observations from said sample over the period, calculated using the standard Gaussian distribution. If the confidence measure indicates, with a set level of confidence, that the verification score is below the verification threshold, the speaker is rejected and the gathering process is terminated.
    Type: Application
    Filed: December 19, 2007
    Publication date: August 12, 2010
    Inventor: Robert VOGT
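The early-rejection rule in this abstract can be sketched using a normal approximation of the running mean of frame scores; the one-sided confidence-bound formulation and all names are assumptions for illustration:

```python
import math

def early_reject(frame_scores, threshold, confidence=0.95):
    """Reject early if, at the requested confidence level, even the upper
    confidence bound on the mean frame score falls below the threshold."""
    n = len(frame_scores)
    if n < 2:
        return False
    mean = sum(frame_scores) / n
    var = sum((s - mean) ** 2 for s in frame_scores) / (n - 1)
    stderr = math.sqrt(var / n)
    z = 1.645 if confidence == 0.95 else 2.326  # one-sided normal quantiles
    return mean + z * stderr < threshold
```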
  • Publication number: 20100198598
    Abstract: A method for recognizing a speaker of an utterance in a speech recognition system is disclosed. A likelihood score is determined for each of a plurality of speaker models for different speakers, indicating how well that speaker model corresponds to the utterance. For each of the plurality of speaker models, a probability that the utterance originates from that speaker is then determined. The probability is based on the likelihood score for the speaker model and requires the estimation of a distribution of likelihood scores expected, based at least in part on the training state of the speaker.
    Type: Application
    Filed: February 4, 2010
    Publication date: August 5, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Tobias Herbig, Franz Gerl
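Turning per-speaker likelihood scores into probabilities, as this abstract describes, can be sketched with a softmax over log-likelihoods under a uniform prior; the patent additionally models the expected score distribution per training state, which this sketch omits:

```python
import math

def speaker_posteriors(log_likelihoods):
    """Convert per-speaker log-likelihood scores into posterior
    probabilities (uniform prior assumed; max-shifted for stability)."""
    m = max(log_likelihoods.values())
    exp = {s: math.exp(ll - m) for s, ll in log_likelihoods.items()}
    z = sum(exp.values())
    return {s: e / z for s, e in exp.items()}

posts = speaker_posteriors({"alice": -10.0, "bob": -12.0})
```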
  • Patent number: 7769583
    Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
    Type: Grant
    Filed: May 13, 2006
    Date of Patent: August 3, 2010
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
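The per-dimension bit-allocation idea in this abstract can be sketched with a standard greedy high-rate heuristic, used here as an illustrative stand-in for the patent's log-likelihood-ratio criterion (each extra bit roughly quarters a dimension's quantization distortion):

```python
def allocate_bits(variances, total_bits):
    """Greedy bit allocation: repeatedly give one bit to the dimension
    whose current distortion estimate (variance / 4**bits) is largest,
    subject to a fixed total-bit budget across all dimensions."""
    bits = [0] * len(variances)
    for _ in range(total_bits):
        dist = [v / (4 ** b) for v, b in zip(variances, bits)]
        bits[dist.index(max(dist))] += 1
    return bits
```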