Hidden Markov Model (HMM) (EPO) Patents (Class 704/256.1)
  • Patent number: 11961514
    Abstract: An acoustic event detection system may employ one or more recurrent neural networks (RNNs) to extract features from audio data, and use the extracted features to determine the presence of an acoustic event. The system may use self-attention to emphasize features extracted from portions of audio data that may include features more useful for detecting acoustic events. The system may perform self-attention in an iterative manner to reduce the amount of memory used to store hidden states of the RNN while processing successive portions of the audio data. The system may process the portions of the audio data using the RNN to generate a hidden state for each portion. The system may calculate an interim embedding for each hidden state. An interim embedding calculated for the last hidden state may be normalized to determine a final embedding representing features extracted from the input data by the RNN. An illustrative sketch of this iterative attention follows this entry.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: April 16, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Chia-Jung Chang, Qingming Tang, Ming Sun, Chao Wang
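    For readers who want a concrete picture of the memory-saving trick described in this abstract, the following sketch keeps only a running (interim) weighted sum of hidden states and a running normalizer rather than all hidden states. It is a loose NumPy interpretation; the toy recurrent cell, the attention query vector q, and all dimensions are assumptions, not the patented design.
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D = 16                                   # hidden-state size (illustrative)
    W = rng.standard_normal((D, D)) * 0.1    # toy recurrent weights
    U = rng.standard_normal((D, 8)) * 0.1    # toy input weights
    q = rng.standard_normal(D)               # attention query vector (assumption)

    def rnn_step(h, x):
        """One toy recurrent update; stands in for the RNN of the abstract."""
        return np.tanh(W @ h + U @ x)

    def iterative_attention(portions):
        """Accumulate an attention-weighted embedding one portion at a time,
        so only the running (interim) embedding and normalizer are stored."""
        h = np.zeros(D)
        interim = np.zeros(D)   # unnormalized weighted sum of hidden states
        z = 0.0                 # running sum of attention weights
        for x in portions:      # each x is the feature vector of one audio portion
            h = rnn_step(h, x)
            w = np.exp(q @ h)   # unnormalized self-attention weight for this state
            interim += w * h
            z += w
        return interim / z      # final embedding: normalized interim embedding

    portions = [rng.standard_normal(8) for _ in range(10)]
    embedding = iterative_attention(portions)
    print(embedding.shape)      # (16,)
    ```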
  • Patent number: 11942072
    Abstract: Disclosed is a wireless communication device including a voice recognition portion configured to convert a voice signal input through a microphone into a syllable information stream using voice recognition, an encoding portion configured to encode the syllable information stream to generate digital transmission data, a transmission portion configured to modulate the digital transmission data into a transmission signal and transmit the transmission signal through an antenna, a reception portion configured to demodulate a reception signal received through the antenna into digital reception data and output the digital reception data, a decoding portion configured to decode the digital reception data to generate the syllable information stream, and a voice synthesis portion configured to convert the syllable information stream into the voice signal using voice synthesis and output the voice signal through a speaker.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: March 26, 2024
    Inventor: Sang Rae Park
  • Patent number: 11837253
    Abstract: A device, system, and method whereby a speech-driven system can distinguish speech obtained from users of the system from other speech spoken by background persons, as well as from background speech from public address systems. In one aspect, the present system and method prepare, in advance of field use, a voice-data file which is created in a training environment. The training environment exhibits both desired user speech and unwanted background speech, including unwanted speech from persons other than a user and also speech from a PA system. The speech recognition system is trained or otherwise programmed to identify wanted user speech which may be spoken concurrently with the background sounds. In an embodiment, during the pre-field-use phase the training or programming may be accomplished by having training listeners audit the pre-recorded sounds to identify the desired user speech. A processor-based learning system is then trained to duplicate the assessments made by the human listeners.
    Type: Grant
    Filed: September 28, 2021
    Date of Patent: December 5, 2023
    Assignee: VOCOLLECT, INC.
    Inventor: David D. Hardek
  • Patent number: 11817013
    Abstract: A display apparatus and a method for questions and answers are provided. The display apparatus includes an input unit configured to receive a user's speech voice; a communication unit configured to perform data communication with an answer server; and a processor configured to create and display one or more question sentences using the speech voice in response to the speech voice being a word speech, create a question language corresponding to the question sentence selected from among the displayed one or more question sentences, transmit the created question language to the answer server via the communication unit, and, in response to one or more answer results related to the question language being received from the answer server, display the received one or more answer results. Accordingly, the display apparatus may provide an answer result appropriate to a user's question intention even when a non-sentence speech is input.
    Type: Grant
    Filed: November 13, 2020
    Date of Patent: November 14, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Eun-sang Bak
  • Patent number: 11783197
    Abstract: Systems and methods for neural machine translation are provided. In one example, a neural machine translation system translates text and comprises processors and a memory storing instructions that, when executed by at least one processor among the processors, cause the system to perform operations comprising, at least, obtaining a text as an input to a neural network system, supplementing the input text with meta information as an extra input to the neural network system, and delivering an output of the neural network system to a user as a translation of the input text, leveraging the meta information for translation.
    Type: Grant
    Filed: November 17, 2021
    Date of Patent: October 10, 2023
    Assignee: EBAY INC.
    Inventors: Evgeny Matusov, Wenhu Chen, Shahram Khadivi
  • Patent number: 11776530
    Abstract: An apparatus for speech model personalization via ambient context harvesting is described herein. The apparatus includes a microphone, a context harvesting module, a confidence module, and a training module. The context harvesting module is to determine a context associated with the captured audio signals. The confidence module is to determine a confidence of the context as applied to the audio signals. The training module is to train a neural network in response to the confidence being above a predetermined threshold.
    Type: Grant
    Filed: November 15, 2017
    Date of Patent: October 3, 2023
    Assignee: INTEL CORPORATION
    Inventors: Gabriel Amores, Guillermo Perez, Moshe Wasserblat, Michael Deisher, Loic Dufresne de Virel
  • Patent number: 11694685
    Abstract: A method includes receiving audio data corresponding to an utterance spoken by the user and captured by the user device. The utterance includes a command for a digital assistant to perform an operation. The method also includes determining, using a hotphrase detector configured to detect each trigger word in a set of trigger words associated with a hotphrase, whether any of the trigger words in the set of trigger words are detected in the audio data during the corresponding fixed-duration time window. The method also includes identifying, in the audio data corresponding to the utterance, the hotphrase when each other trigger word in the set of trigger words was also detected in the audio data. The method also includes triggering an automated speech recognizer to perform speech recognition on the audio data when the hotphrase is identified in the audio data corresponding to the utterance.
    Type: Grant
    Filed: December 10, 2020
    Date of Patent: July 4, 2023
    Assignee: Google LLC
    Inventors: Victor Carbune, Matthew Sharifi
  • Patent number: 11631399
    Abstract: According to some embodiments, a machine learning model may include an input layer to receive an input signal as a series of frames representing handwriting data, speech data, audio data, and/or textual data. A plurality of time layers may be provided, and each time layer may comprise a uni-directional recurrent neural network processing block. A depth processing block may scan hidden states of the recurrent neural network processing block of each time layer, and the depth processing block may be associated with a first frame and receive context frame information of a sequence of one or more future frames relative to the first frame. An output layer may output a final classification as a classified posterior vector of the input signal. For example, the depth processing block may receive the context frame information from an output of a time layer processing block or another depth processing block of the future frame.
    Type: Grant
    Filed: May 13, 2019
    Date of Patent: April 18, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Vadim Mazalov, Changliang Liu, Liang Lu, Yifan Gong
  • Patent number: 11620104
    Abstract: Characteristics of a speaker are estimated using speech processing and machine learning. The characteristics of the speaker are used to automatically customize a user interface of a client device for the speaker.
    Type: Grant
    Filed: July 11, 2022
    Date of Patent: April 4, 2023
    Assignee: Google LLC
    Inventors: Eugene Weinstein, Ignacio L. Moreno
  • Patent number: 11574646
    Abstract: A method of extracting a fundamental frequency of an input sound includes: generating a DJ transform spectrogram indicating estimated pure-tone amplitudes for respective natural frequencies of a plurality of springs and a plurality of time points, by calculating the estimated pure-tone amplitudes for the respective natural frequencies by modeling an oscillation motion of the plurality of springs having different natural frequencies with respect to an input sound; calculating degrees of fundamental frequency suitability based on a moving average of the estimated pure-tone amplitudes or on a moving standard deviation of the estimated pure-tone amplitudes with respect to each natural frequency of the DJ transform spectrogram; and extracting a fundamental frequency based on local maximum values of the degrees of fundamental frequency suitability for the respective natural frequencies at each of the plurality of time points.
    Type: Grant
    Filed: November 12, 2020
    Date of Patent: February 7, 2023
    Assignee: BRAINSOFT INC.
    Inventors: Dong Jin Kim, Ju Yong Shin
  • Patent number: 11514904
    Abstract: Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: receiving, from a user, voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system, wherein the candidate directive invoking vocal utterance includes at least one word or phrase of the text based command, wherein the computer system is configured to perform the first computer function in response to the first text based command and wherein the computer system is configured to perform a second computer function in response to a second text based command; and determining, based on machine logic, whether a word or phrase of the candidate vocal utterance sounds confusingly similar to a speech rendering of a word or phrase defining the second text based command.
    Type: Grant
    Filed: November 20, 2019
    Date of Patent: November 29, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeremy A. Greenberger, Nicholas R. Sandonato
  • Patent number: 10937444
    Abstract: A system for end-to-end automated scoring is disclosed. The system includes a word embedding layer for converting a plurality of ASR outputs into input tensors; a neural network lexical model encoder receiving the input tensors; a neural network acoustic model encoder implementing AM posterior probability, word duration, mean value of pitch, and mean value of intensity based on a plurality of cues; and a linear regression module for receiving concatenated encoded features from the neural network lexical model encoder and the neural network acoustic model encoder. An illustrative sketch of the feature fusion and scoring follows this entry.
    Type: Grant
    Filed: November 20, 2018
    Date of Patent: March 2, 2021
    Assignee: Educational Testing Service
    Inventors: David Suendermann-Oeft, Lei Chen, Jidong Tao, Shabnam Ghaffarzadegan, Yao Qian
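    The fusion step can be pictured as concatenating a lexical encoding with a few acoustic cues (AM posterior probability, word duration, mean pitch, mean intensity) and passing the result to a linear regressor. The sketch below uses toy encoders and made-up values purely for illustration; it is not the patented encoder architecture.
    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def lexical_encoder(token_ids, emb):
        """Toy lexical encoding: mean of word embeddings (stand-in for an RNN encoder)."""
        return emb[token_ids].mean(axis=0)

    def acoustic_encoder(am_posterior, duration, mean_pitch, mean_intensity):
        """Toy acoustic encoding: stack the per-response cues into one vector."""
        return np.array([am_posterior, duration, mean_pitch, mean_intensity])

    emb = rng.standard_normal((1000, 32))           # word-embedding table (illustrative)
    lex = lexical_encoder([12, 7, 433, 12], emb)    # ASR output tokens (assumed)
    aco = acoustic_encoder(0.82, 1.4, 180.0, 65.0)  # cue values (assumed)

    features = np.concatenate([lex, aco])           # concatenated encoded features
    w = rng.standard_normal(features.shape[0]) * 0.01
    b = 2.5
    score = float(features @ w + b)                 # linear-regression score
    print(round(score, 3))
    ```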
  • Patent number: 10930267
    Abstract: Provided is a speech recognition method for a recognition target language. According to an embodiment of the inventive concept, a speech recognition method for a recognition target language performed by a speech recognition apparatus includes obtaining an original learning data set for the recognition target language, constructing a target label by dividing the text information included in each piece of original learning data into letter units, and building an acoustic model based on a deep neural network by learning the learning speech data included in each piece of original learning data and the target label corresponding to the learning speech data.
    Type: Grant
    Filed: June 13, 2018
    Date of Patent: February 23, 2021
    Assignee: SAMSUNG SDS CO., LTD.
    Inventors: Min Soo Kim, Ji Hyeon Seo, Kyung Jun An, Seung Kyung Kim
  • Patent number: 10891573
    Abstract: A method can include receiving state information for a wellsite system; receiving contextual information for a role associated with a workflow; generating a natural language report based at least in part on the state information and based at least in part on the contextual information; and transmitting the natural language report via a network interface based at least in part on an identifier associated with the role.
    Type: Grant
    Filed: April 18, 2016
    Date of Patent: January 12, 2021
    Assignee: Schlumberger Technology Corporation
    Inventors: Benoit Foubert, Richard John Meehan, Jean-Pierre Poyet, Sandra Reyes, Raymond Lin, Sylvain Chambon
  • Patent number: 10283142
    Abstract: Systems and methods are provided for a processor-implemented method of analyzing quality of sound acquired via a microphone. An input metric is extracted from a sound recording at each of a plurality of time intervals. The input metric is provided at each of the time intervals to a neural network that includes a memory component, where the neural network provides an output metric at each of the time intervals, where the output metric at a particular time interval is based on the input metric at a plurality of time intervals other than the particular time interval using the memory component of the neural network. The output metrics from each of the time intervals are aggregated to generate a score indicative of the quality of the sound acquired via the microphone. An illustrative sketch of this memory-based scoring follows this entry.
    Type: Grant
    Filed: July 21, 2016
    Date of Patent: May 7, 2019
    Assignee: Educational Testing Service
    Inventors: Zhou Yu, Vikram Ramanarayanan, David Suendermann-Oeft, Xinhao Wang, Klaus Zechner, Lei Chen, Jidong Tao, Yao Qian
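    A network with a memory component lets the output at one interval depend on input metrics from other intervals, and the per-interval outputs are then aggregated into one quality score. The sketch below uses a single hand-rolled recurrent unit and mean aggregation as stand-ins; the actual network, metrics, and aggregation are not specified by this summary.
    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    a, b, c = 0.9, 0.5, 0.3     # toy recurrent weights (illustrative)

    def score_recording(input_metrics):
        """Run a one-unit recurrent 'memory' over per-interval input metrics and
        aggregate the per-interval outputs into one quality score."""
        memory = 0.0
        outputs = []
        for m in input_metrics:                    # one metric value per time interval
            memory = np.tanh(a * memory + b * m)   # memory carries context across intervals
            outputs.append(c * memory)             # per-interval output metric
        return float(np.mean(outputs))             # aggregate (mean) -> quality score

    metrics = rng.uniform(0.0, 1.0, size=50)       # e.g. per-interval SNR-like values (assumed)
    print(round(score_recording(metrics), 4))
    ```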
  • Patent number: 10229676
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Grant
    Filed: October 5, 2012
    Date of Patent: March 12, 2019
    Assignee: Avaya Inc.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
  • Patent number: 10008197
    Abstract: A keyword detector includes a processor configured to calculate a feature vector for each frame from a speech signal; input the feature vector for each frame to a DNN to calculate, for each of at least one state of an HMM, a first output probability for each triphone according to a sequence of phonemes contained in a predetermined keyword and a second output probability for each monophone; calculate a first likelihood representing the probability that the predetermined keyword is uttered in the speech signal by applying the first output probability to the HMM; calculate a second likelihood for the most probable phoneme string in the speech signal by applying the second output probability to the HMM; and determine whether the keyword is to be detected on the basis of the first likelihood and the second likelihood. An illustrative sketch of the likelihood comparison follows this entry.
    Type: Grant
    Filed: October 24, 2016
    Date of Patent: June 26, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Shoji Hayakawa
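    The detection decision can be read as comparing a keyword-constrained likelihood against the likelihood of the best free phoneme string, e.g. by thresholding their log-ratio. In the sketch below, the Viterbi pass over the keyword HMM is collapsed to a simple sum of per-frame keyword-path log-probabilities, and the threshold and all inputs are assumptions.
    ```python
    import numpy as np

    def keyword_detected(kw_frame_logp, mono_frame_logp, threshold=-2.0):
        """Compare the keyword-constrained likelihood with the best free-phoneme
        likelihood and report a detection when the log-ratio clears a threshold.

        kw_frame_logp:   per-frame log-probabilities under the keyword phoneme
                         sequence (what a Viterbi pass over the keyword HMM yields).
        mono_frame_logp: per-frame arrays of monophone log-probabilities; the best
                         one per frame approximates the most probable phoneme string.
        threshold:       tuning constant (assumed, not from the patent).
        """
        first = float(np.sum(kw_frame_logp))                        # first likelihood
        second = float(sum(np.max(f) for f in mono_frame_logp))     # second likelihood
        return (first - second) > threshold

    rng = np.random.default_rng(3)
    T = 40
    kw = rng.uniform(-3.0, -0.5, size=T)                            # toy keyword-path scores
    mono = [rng.uniform(-3.0, -0.5, size=40) for _ in range(T)]     # toy monophone scores
    print(keyword_detected(kw, mono))
    ```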
  • Patent number: 9881616
    Abstract: A method for improving speech recognition by a speech recognition system includes obtaining a voice sample from a speaker; storing the voice sample of the speaker as a voice model in a voice model database; identifying an area from which sound matching the voice model for the speaker is coming; and providing one or more audio signals corresponding to sound received from the identified area to the speech recognition system for processing.
    Type: Grant
    Filed: June 6, 2012
    Date of Patent: January 30, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Jeffrey D. Beckley, Pooja Aggarwal, Shivakumar Balasubramanyam
  • Patent number: 9418334
    Abstract: Pretraining for a DBN (Deep Belief Network) initializes the weights of the DBN using a hybrid pre-training methodology. Hybrid pre-training employs a generative component that allows the hybrid PT method to achieve better performance in WER (Word Error Rate) than the discriminative PT method. Hybrid pre-training learns weights which are more closely linked to the final objective function, allowing for a much larger batch size than generative PT, which improves speed; the larger batch size also allows for parallelization of the gradient computation, speeding up training further.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: August 16, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran
  • Patent number: 9177558
    Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: November 3, 2015
    Assignee: Educational Testing Service
    Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
  • Patent number: 9037462
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention. An illustrative sketch of this belief accumulation follows this entry.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: May 19, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
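    One simple way to read the belief distribution is as confidence mass accumulated per underlying intention across the current N-best list and contextually similar N-best lists, with the highest-belief intention selected. The sketch below is a deliberately simplified interpretation; the hypothesis-to-intention mapping and the weighting are assumptions, and the user-action model mentioned in the abstract is omitted.
    ```python
    from collections import defaultdict

    def select_intention(nbest_lists, hypothesis_to_intent):
        """Accumulate confidence mass per intention over several N-best lists
        (the current turn plus contextually similar ones) and return the argmax."""
        belief = defaultdict(float)
        for nbest in nbest_lists:
            total = sum(conf for _, conf in nbest) or 1.0
            for hyp, conf in nbest:
                intent = hypothesis_to_intent.get(hyp, "unknown")
                belief[intent] += conf / total        # normalized confidence contribution
        return max(belief.items(), key=lambda kv: kv[1])

    # Toy example: two N-best lists with (hypothesis, confidence-score) pairs.
    current = [("pay my bill", 0.6), ("play my bill", 0.3), ("say my bill", 0.1)]
    similar = [("pay the bill", 0.7), ("play bill", 0.3)]
    mapping = {"pay my bill": "make_payment", "pay the bill": "make_payment",
               "play my bill": "play_media", "play bill": "play_media",
               "say my bill": "unknown"}
    print(select_intention([current, similar], mapping))   # ('make_payment', 1.3)
    ```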
  • Patent number: 9009039
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Patent number: 8972254
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: March 3, 2015
    Assignee: Utah State University
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20140365221
    Abstract: A computer-implemented method performed by a computerized device, a computerized apparatus and a computer program product for recognizing speech, the method comprising: receiving a signal; extracting audio features from the signal; performing acoustic level processing on the audio features; receiving additional data; extracting additional features from the additional data; fusing the audio features and the additional features into a unified structure; receiving a Hidden Markov Model (HMM); and performing a quantum search over the features using the HMM and the unified structure.
    Type: Application
    Filed: August 27, 2014
    Publication date: December 11, 2014
    Inventor: Yossef Ben-Ezra
  • Publication number: 20140324426
    Abstract: The present invention, pertaining to the field of speech recognition, discloses a reminder setting method and apparatus. The method includes: acquiring speech signals; acquiring time information in speech signals by using keyword recognition, and determining reminder time for reminder setting according to the time information; acquiring text sequence corresponding to the speech signals by using continuous speech recognition, and determining reminder content for reminder setting according to the time information and the text sequence; and setting a reminder according to the reminder time and the reminder content.
    Type: Application
    Filed: May 28, 2013
    Publication date: October 30, 2014
    Inventors: Li LU, Feng RAO, Song LIU, Zongyao TANG, Xiang ZHANG, Shuai YUE, Bo CHEN
  • Patent number: 8793124
    Abstract: A scheme to judge emphasized speech portions, wherein the judgment is executed by statistical processing in terms of a set of speech parameters including a fundamental frequency, power, and a temporal variation of a dynamic measure and/or their derivatives. The emphasized speech portions are used as clues to summarize an audio content or a video content with speech.
    Type: Grant
    Filed: April 5, 2006
    Date of Patent: July 29, 2014
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Kota Hidaka, Shinya Nakajima, Osamu Mizuno, Hidetaka Kuwano, Haruhiko Kojima
  • Patent number: 8774261
    Abstract: A two-stage interference cancellation (IC) process includes a linear IC stage that suppresses co-channel interference (CCI) and adjacent channel interference (ACI). The linear IC stage disambiguates otherwise super-trellis data for non-linear cancellation. Soft linear IC processing is driven by a-posteriori probability (Apop) information. A second stage performs expectation maximization/Baum-Welch (EM-BW) processing that reduces residual ISI left over from the first stage and also generates the Apop which drives the soft linear IC in an iterative manner.
    Type: Grant
    Filed: June 11, 2012
    Date of Patent: July 8, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Farrokh Abrishamkar, Divaydeep Sikri, Ken Delgado
  • Publication number: 20140180693
    Abstract: Embodiments of the present invention include an acoustic processing device, a method for acoustic signal processing, and a speech recognition system. The acoustic processing device can include a processing unit, a histogram pruning unit, and a pre-pruning unit. The processing unit is configured to calculate one or more Hidden Markov Model (HMM) pruning thresholds. The histogram pruning unit is configured to prune one or more HMM states to generate one or more active HMM states. The pruning is based on the one or more pruning thresholds. The pre-pruning unit is configured to prune the one or more active HMM states based on an adjustable pre-pruning threshold. Further, the adjustable pre-pruning threshold is based on the one or more pruning thresholds.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Spansion LLC
    Inventor: Ojas Ashok BAPAT
  • Publication number: 20140180694
    Abstract: Embodiments of the present invention include an acoustic processing device and a method for traversing a Hidden Markov Model (HMM). The acoustic processing device can include a senone scoring unit (SSU), a memory device, an HMM module, and an interface module. The SSU is configured to receive feature vectors from an external computing device and to calculate senone scores. The memory device is configured to store the senone scores and HMM information, where the HMM information includes HMM IDs and HMM state scores. The HMM module is configured to traverse the HMM based on the senone scores and the HMM information. Further, the interface module is configured to transfer one or more HMM scoring requests from the external computing device to the HMM module and to transfer the HMM state scores to the external computing device.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Spansion LLC
    Inventors: Richard M. FASTOW, Ojas A. Bapat, Jens Olson
  • Patent number: 8719023
    Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
    Type: Grant
    Filed: May 21, 2010
    Date of Patent: May 6, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Xavier Menendez-Pidal, Ruxin Chen
  • Patent number: 8700403
    Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 15, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Lin Zhao
  • Patent number: 8700400
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
  • Patent number: 8639510
    Abstract: A hardware acoustic scoring unit for a speech recognition system and a method of operation thereof are provided. Rather than scoring all senones in an acoustic model used for the speech recognition system, acoustic scoring logic first scores a set of ciphones based on acoustic features for one frame of sampled speech. The acoustic scoring logic then scores senones associated with the N highest scored ciphones. In one embodiment, the number (N) is three. While the acoustic scoring logic scores the senones associated with the N highest scored ciphones, high score ciphone identification logic operates in parallel with the acoustic scoring unit to identify one or more additional ciphones that have scores greater than a threshold. Once the acoustic scoring unit finishes scoring the senones for the N highest scored ciphones, the acoustic scoring unit then scores senones associated with the one or more additional ciphones. An illustrative sketch of the two-stage scoring follows this entry.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: January 28, 2014
    Inventors: Kai Yu, Rob A. Rutenbar
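    The two-pass idea, score every context-independent phone first and then spend senone scoring only on the top-N phones plus any other phone above a threshold, can be sketched as follows. The scoring functions and the phone-to-senone mapping are placeholders, not the hardware design.
    ```python
    import numpy as np

    def score_frame(features, ci_phones, phone_to_senones,
                    score_ci, score_senone, n_best=3, extra_threshold=0.0):
        """Two-stage acoustic scoring for one frame:
        1) score every context-independent (CI) phone;
        2) score senones only for the N best CI phones and for any additional
           CI phone whose score exceeds a threshold."""
        ci_scores = {p: score_ci(p, features) for p in ci_phones}
        ranked = sorted(ci_scores, key=ci_scores.get, reverse=True)
        selected = set(ranked[:n_best])
        selected |= {p for p in ranked[n_best:] if ci_scores[p] > extra_threshold}

        senone_scores = {}
        for p in selected:
            for s in phone_to_senones[p]:
                senone_scores[s] = score_senone(s, features)
        return ci_scores, senone_scores

    # Toy setup with random stand-in scorers (illustrative only).
    rng = np.random.default_rng(4)
    phones = [f"p{i}" for i in range(8)]
    p2s = {p: [f"{p}_s{j}" for j in range(3)] for p in phones}
    feat = rng.standard_normal(13)
    ci_fn = lambda p, x: float(rng.normal())       # stand-in CI-phone scorer
    sen_fn = lambda s, x: float(rng.normal())      # stand-in senone scorer
    ci, sen = score_frame(feat, phones, p2s, ci_fn, sen_fn)
    print(len(ci), len(sen))
    ```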
  • Patent number: 8635067
    Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model. An illustrative sketch of pairwise component merging follows this entry.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
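    Building a merge sequence comes down to two ingredients: a moment-matched merge of two weighted Gaussian components and a cost for each candidate merge. The sketch below uses a likelihood/entropy-based cost, which is one common choice and an assumption here since the abstract leaves the cost function open, and applies it greedily to diagonal-covariance components of a single mixture.
    ```python
    import numpy as np

    def merge_pair(w1, m1, v1, w2, m2, v2):
        """Moment-matched merge of two weighted diagonal Gaussians."""
        w = w1 + w2
        m = (w1 * m1 + w2 * m2) / w
        v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
        return w, m, v

    def merge_cost(w1, m1, v1, w2, m2, v2):
        """Entropy-based cost of merging (an assumed cost function)."""
        w, _, v = merge_pair(w1, m1, v1, w2, m2, v2)
        return 0.5 * (w * np.sum(np.log(v))
                      - w1 * np.sum(np.log(v1)) - w2 * np.sum(np.log(v2)))

    def build_merge_sequence(comps, n_target):
        """Greedily record the cheapest pairwise mergers until n_target components remain."""
        comps = list(comps)
        sequence = []
        while len(comps) > n_target:
            best = None
            for i in range(len(comps)):
                for j in range(i + 1, len(comps)):
                    c = merge_cost(*comps[i], *comps[j])
                    if best is None or c < best[0]:
                        best = (c, i, j)
            c, i, j = best
            sequence.append((i, j, c))
            merged = merge_pair(*comps[i], *comps[j])
            comps = [x for k, x in enumerate(comps) if k not in (i, j)] + [merged]
        return comps, sequence

    rng = np.random.default_rng(5)
    mixture = [(1.0 / 6, rng.standard_normal(4), rng.uniform(0.5, 2.0, 4)) for _ in range(6)]
    reduced, seq = build_merge_sequence(mixture, n_target=3)
    print(len(reduced), [round(c, 3) for _, _, c in seq])
    ```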
  • Patent number: 8600749
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8504357
    Abstract: A related word presentation device includes a program information storage unit that stores program information of each program; and an information dividing unit that generates, for each of the attributes of the words included in the program information, at least one group which includes a reference word belonging to the attribute and a set of words which co-occur with the reference word in a program. A degree-of-relevance calculating unit stores attribute-based association dictionaries each of which indicates, for the corresponding attribute of words, (i) the words and (ii) the degrees of relevance between the words calculated based on the frequency of co-occurrence in each of groups. A search condition obtaining unit obtains the search word and the attribute; a substitute word obtaining unit selects substitute words from the attribute-based association dictionary for the obtained attribute; and an output unit presents the selected substitute word.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: August 6, 2013
    Assignee: Panasonic Corporation
    Inventors: Takashi Tsuzuki, Satoshi Matsuura, Kazutoyo Takata
  • Patent number: 8484035
    Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more of controllable parameters of input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: July 9, 2013
    Assignee: Massachusetts Institute of Technology
    Inventor: Alex Paul Pentland
  • Publication number: 20130151254
    Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
    Type: Application
    Filed: January 31, 2013
    Publication date: June 13, 2013
    Applicant: BROADCOM CORPORATION
    Inventor: Nambirajan Seshadri
  • Publication number: 20130132085
    Abstract: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.
    Type: Application
    Filed: February 21, 2011
    Publication date: May 23, 2013
    Inventors: Gautham J. Mysore, Paris Smaragdis
  • Patent number: 8392190
    Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
    Type: Grant
    Filed: December 1, 2009
    Date of Patent: March 5, 2013
    Assignee: Educational Testing Service
    Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
  • Patent number: 8364487
    Abstract: A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input.
    Type: Grant
    Filed: October 21, 2008
    Date of Patent: January 29, 2013
    Assignee: Microsoft Corporation
    Inventors: Yun-Cheng Ju, Julian J. Odell
  • Patent number: 8352265
    Abstract: A hardware-implemented backend search stage, or engine, for a speech recognition system is provided. In one embodiment, the backend search engine includes a number of pipelined stages including a fetch stage, an updating stage which may be a Viterbi stage, a transition and prune stage, and a language model stage. Each active triphone of each active word is represented by a corresponding triphone model. By being pipelined, the stages of the backend search engine are enabled to simultaneously process different triphone models, thereby providing high-rate backend searching for the speech recognition system. In one embodiment, caches may be used to cache frequently and/or recently accessed triphone information utilized by the fetch stage, frequently and/or recently accessed triphone-to-senone mappings utilized by the updating stage, or both. An illustrative sketch of the per-frame Viterbi update follows this entry.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: January 8, 2013
    Inventors: Edward Lin, Rob A. Rutenbar
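    The updating (Viterbi) stage amounts to a per-frame dynamic-programming step over each active triphone's HMM states. The sketch below shows that inner update for one left-to-right three-state triphone model; the fetch, transition-and-prune, and language-model stages, and all caching, are omitted.
    ```python
    import numpy as np

    NEG_INF = -np.inf

    def viterbi_update(state_scores, transition_logp, senone_logp):
        """One Viterbi time step for a left-to-right triphone HMM.

        state_scores:    current best log-scores per state (length S).
        transition_logp: (S, S) log transition probabilities.
        senone_logp:     per-state senone log-likelihoods for this frame.
        Returns the updated per-state scores.
        """
        S = len(state_scores)
        new_scores = np.full(S, NEG_INF)
        for j in range(S):
            best_prev = np.max(state_scores + transition_logp[:, j])
            new_scores[j] = best_prev + senone_logp[j]
        return new_scores

    # Toy 3-state left-to-right triphone model.
    trans = np.log(np.array([[0.6, 0.4, 0.0],
                             [0.0, 0.7, 0.3],
                             [0.0, 0.0, 1.0]]) + 1e-12)
    scores = np.array([0.0, NEG_INF, NEG_INF])     # start in state 0
    for frame_logp in np.random.default_rng(6).uniform(-3, -1, size=(5, 3)):
        scores = viterbi_update(scores, trans, frame_logp)
    print(np.round(scores, 2))
    ```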
  • Publication number: 20130006631
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 3, 2013
    Applicant: UTAH STATE UNIVERSITY
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20120330664
    Abstract: The present invention relates to a method and apparatus for computing Gaussian likelihoods. One embodiment of a method for processing a speech signal includes generating a feature vector for each frame of the speech signal, evaluating the feature vector in accordance with a hierarchical Gaussian shortlist, and producing a hypothesis regarding a content of the speech signal, based on the evaluating. An illustrative sketch of a Gaussian shortlist follows this entry.
    Type: Application
    Filed: June 24, 2011
    Publication date: December 27, 2012
    Inventors: XIN LEI, JING ZHENG
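    The shortlist idea is to evaluate a small set of coarse cluster Gaussians first and compute exact likelihoods only for the detailed Gaussians attached to the best-matching clusters. Below is a two-level sketch with diagonal Gaussians; the hierarchy depth, the clustering, and the shortlist size are assumptions.
    ```python
    import numpy as np

    def diag_gauss_logpdf(x, mean, var):
        """Log-density of a diagonal-covariance Gaussian."""
        return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

    def shortlist_loglike(x, coarse, detailed, top_k=2):
        """Two-level Gaussian shortlist evaluation.

        coarse:   list of (mean, var) cluster Gaussians.
        detailed: dict mapping cluster index -> list of (weight, mean, var) members.
        Only members of the top_k best-scoring clusters are evaluated exactly.
        """
        coarse_scores = [diag_gauss_logpdf(x, m, v) for m, v in coarse]
        best_clusters = np.argsort(coarse_scores)[-top_k:]
        log_terms = []
        for c in best_clusters:
            for w, m, v in detailed[c]:
                log_terms.append(np.log(w) + diag_gauss_logpdf(x, m, v))
        return float(np.logaddexp.reduce(log_terms))   # approximate frame log-likelihood

    rng = np.random.default_rng(7)
    D, C, M = 13, 4, 8
    coarse = [(rng.standard_normal(D), rng.uniform(0.5, 2.0, D)) for _ in range(C)]
    detailed = {c: [(1.0 / M, coarse[c][0] + 0.1 * rng.standard_normal(D),
                     rng.uniform(0.5, 2.0, D)) for _ in range(M)] for c in range(C)}
    x = rng.standard_normal(D)
    print(round(shortlist_loglike(x, coarse, detailed), 2))
    ```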
  • Patent number: 8335332
    Abstract: A method is provided for operating a hearing aid in a hearing aid system where the hearing aid is continuously learnable for the particular user. A sound environment classification system is provided for tracking and defining sound environment classes relevant to the user. In an ongoing learning process, the classes are redefined based on new environments to which the hearing aid is subjected by the user.
    Type: Grant
    Filed: June 23, 2008
    Date of Patent: December 18, 2012
    Assignees: Siemens Audiologische Technik GmbH, University Of Ottawa
    Inventors: Tyseer Aboulnasr, Eghart Fischer, Christian Giguère, Wail Gueaieb, Volkmar Hamacher, Luc Lamarche
  • Patent number: 8315857
    Abstract: Systems and methods for modification of an audio input signal are provided. In exemplary embodiments, an adaptive multiple-model optimizer is configured to generate at least one source model parameter for facilitating modification of an analyzed signal. The adaptive multiple-model optimizer comprises a segment grouping engine and a source grouping engine. The segment grouping engine is configured to group simultaneous feature segments to generate at least one segment model. The at least one segment model is used by the source grouping engine to generate at least one source model, which comprises the at least one source model parameter. Control signals for modification of the analyzed signal may then be generated based on the at least one source model parameter.
    Type: Grant
    Filed: May 30, 2006
    Date of Patent: November 20, 2012
    Assignee: Audience, Inc.
    Inventors: David Klein, Stephen Malinowski, Lloyd Watts, Bernard Mont-Reynaud
  • Patent number: 8307459
    Abstract: A botnet detection system is provided. A bursty feature extractor receives an Internet Relay Chat (IRC) packet value from a detection object network, and determines a bursty feature accordingly. A Hybrid Hidden Markov Model (HHMM) parameter estimator determines probability parameters for a Hybrid Hidden Markov Model according to the bursty feature. A traffic profile generator establishes a probability sequential model for the Hybrid Hidden Markov Model according to the probability parameters and pre-defined network traffic categories. A dubious state detector determines a traffic state corresponding to a network relaying the IRC packet in response to reception of a new IRC packet, determines whether the IRC packet flow of the object network is dubious by applying the bursty feature to the probability sequential model for the Hybrid Hidden Markov Model, and generates a warning signal when the IRC packet flow is regarded as having a dubious traffic state.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: November 6, 2012
    Assignee: National Taiwan University of Science and Technology
    Inventors: Hahn-Ming Lee, Ching-Hao Mao, Yu-Jie Chen, Yi-Hsun Wang, Jerome Yeh, Tsu-Han Chen
  • Patent number: 8301449
    Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time. An illustrative sketch of the generic growth-transformation update follows this entry.
    Type: Grant
    Filed: October 16, 2006
    Date of Patent: October 30, 2012
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Li Deng
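    Updates of this family usually take the growth-transformation (extended Baum-Welch) form, combining numerator (reference) occupancies, weighted denominator (competitor) occupancies, and a smoothing constant D. The one-dimensional sketch below shows only that generic form; the patent's specific handling of weighted N-best lists and word-bound lattice spans is not reproduced.
    ```python
    import numpy as np

    def gt_update_gaussian(mu, var, gamma_num, gamma_den, x, D):
        """Growth-transformation (extended Baum-Welch style) update of one Gaussian.

        gamma_num / gamma_den: per-frame occupancies from the reference transcription
        and from the weighted competitor word sequences (N-best or lattice).
        D: positive smoothing constant, chosen large enough to keep variances positive.
        """
        dg = gamma_num - gamma_den                      # signed occupancy per frame
        denom = dg.sum() + D
        new_mu = (dg @ x + D * mu) / denom
        new_var = (dg @ (x ** 2) + D * (var + mu ** 2)) / denom - new_mu ** 2
        return new_mu, np.maximum(new_var, 1e-6)        # floor keeps the variance positive

    rng = np.random.default_rng(8)
    T = 100
    x = rng.standard_normal(T) * 1.5 + 0.3              # 1-D features for simplicity
    g_num = rng.uniform(0, 1, T)                        # toy reference occupancies
    g_den = rng.uniform(0, 1, T) * 0.5                  # toy competitor occupancies
    print(gt_update_gaussian(0.0, 1.0, g_num, g_den, x, D=50.0))
    ```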
  • Patent number: 8265930
    Abstract: The present invention relates to recording voice data using a voice communication device connected to a communication network and converting the voice data into a text file for delivery to a text communication device. In accordance with the present invention, the voice communication device may transfer the voice data in real-time or store the voice data on the device to be transmitted at a later time. Transcribing the voice data into a text file may be accomplished by automated computer software, either speaker-independent or speaker-dependent or by a human who transcribes the voice data into a text file. After transcribing the voice data into a text file, the text file may be delivered to a text communication device in a number of ways, such as email, file transfer protocol (FTP), or hypertext transfer protocol (HTTP).
    Type: Grant
    Filed: April 13, 2005
    Date of Patent: September 11, 2012
    Assignee: Sprint Communications Company L.P.
    Inventors: Bryce A. Jones, Raymond Edward Dickensheets
  • Patent number: 8234116
    Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMMs) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state-pair-by-state-pair comparisons. An illustrative sketch of the unscented KLD approximation follows this entry.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
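    The unscented approximation replaces the intractable expectation in the KLD between Gaussian mixtures with an average of log-density ratios evaluated at sigma points drawn around each component of the first mixture. The sketch below implements that approximation for diagonal-covariance mixtures; the sigma-point scaling is a simplifying assumption, and the state-matching dynamic programming is omitted.
    ```python
    import numpy as np

    def gmm_logpdf(x, weights, means, variances):
        """Log-density of a diagonal-covariance Gaussian mixture at point x."""
        log_comp = [np.log(w) - 0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
                    for w, m, v in zip(weights, means, variances)]
        return np.logaddexp.reduce(log_comp)

    def unscented_kld(f, g):
        """Approximate KL(f || g) between two diagonal GMMs with the unscented transform:
        for each component of f, evaluate log f - log g at 2D sigma points and average."""
        wf, mf, vf = f
        D = mf.shape[1]
        kld = 0.0
        for w, m, v in zip(wf, mf, vf):
            offsets = np.sqrt(D * v)                 # sigma-point spread per dimension
            sigma_pts = np.concatenate([m + np.diag(offsets), m - np.diag(offsets)])
            vals = [gmm_logpdf(x, *f) - gmm_logpdf(x, *g) for x in sigma_pts]
            kld += w * np.mean(vals)
        return float(kld)

    rng = np.random.default_rng(9)
    def random_gmm(k, d):
        w = np.ones(k) / k
        return w, rng.standard_normal((k, d)), rng.uniform(0.5, 2.0, (k, d))

    f, g = random_gmm(3, 4), random_gmm(3, 4)
    print(round(unscented_kld(f, g), 3))             # approximation; may be slightly negative
    ```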