Hidden Markov Model (HMM) (EPO) Patents (Class 704/256.1)
-
Patent number: 12166886
Abstract: The present disclosure provides systems and methods for authenticated control of content delivery. The method includes receiving a request for an item of content from a computing device, the request comprising a security token associated with the computing device and an identifier of a group of domains, identifying the group of domains from the identifier, and retrieving a security key associated with the group of domains. The method further includes decrypting a signature of the security token, identifying an authentication string, determining that the authentication string matches a server authentication string, and identifying characteristics of the security token. The characteristics of the security token include a confidence score. The method further includes comparing the confidence score of the security token to a threshold, determining that the confidence score does not exceed the threshold, and preventing transmission of content to the computing device.
Type: Grant. Filed: June 21, 2022. Date of Patent: December 10, 2024. Assignee: Google LLC. Inventors: Gang Wang, Marcel Yung
-
Patent number: 12154452
Abstract: A communication method for hearing impaired communication comprising: providing a speech training device to a hearing impaired user, the speech training device configured to teach the hearing impaired user how to determine non-speech sounds. The method further includes providing a haptic output device to the hearing impaired user, where the haptic output device is configured to be releasably coupled to the hearing impaired user. The haptic output device receives a sound input signal comprising a non-speech sound and provides the haptic output signal to an actuator which is in electrical communication with the haptic output device. The actuator actuates in response to the haptic output signal and provides a haptic sensation to the hearing impaired user.
Type: Grant. Filed: August 23, 2021. Date of Patent: November 26, 2024. Inventor: Peter Stevens
-
Patent number: 11961514
Abstract: An acoustic event detection system may employ one or more recurrent neural networks (RNNs) to extract features from audio data, and use the extracted features to determine the presence of an acoustic event. The system may use self-attention to emphasize features extracted from portions of audio data that may include features more useful for detecting acoustic events. The system may perform self-attention in an iterative manner to reduce the amount of memory used to store hidden states of the RNN while processing successive portions of the audio data. The system may process the portions of the audio data using the RNN to generate a hidden state for each portion. The system may calculate an interim embedding for each hidden state. An interim embedding calculated for the last hidden state may be normalized to determine a final embedding representing features extracted from the input data by the RNN.
Type: Grant. Filed: December 10, 2021. Date of Patent: April 16, 2024. Assignee: Amazon Technologies, Inc. Inventors: Chia-Jung Chang, Qingming Tang, Ming Sun, Chao Wang
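The memory-saving trick described above, computing attention pooling incrementally so that past hidden states never need to be retained, can be sketched in a few lines. The weights, dimensions, and plain tanh RNN below are hypothetical stand-ins rather than the patented architecture; the point is the running numerator/denominator update that yields a normalized final embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a trained RNN and attention layer.
H, F = 16, 8                        # hidden size, acoustic feature size
Wx = rng.normal(scale=0.3, size=(H, F))
Wh = rng.normal(scale=0.3, size=(H, H))
w_att = rng.normal(size=H)          # attention scoring vector

def rnn_step(h, x):
    """One recurrent step (a plain tanh RNN standing in for the system's RNN)."""
    return np.tanh(Wx @ x + Wh @ h)

def streaming_attentive_embedding(frames):
    """Attention-pooled embedding computed iteratively over audio portions.

    Only a running weighted sum, a running weight total, and a running max
    attention score are kept, so past hidden states never need to be stored.
    """
    h = np.zeros(H)
    num = np.zeros(H)    # running sum of exp(score) * hidden_state
    den = 0.0            # running sum of exp(score)
    m = -np.inf          # running max score, for numerical stability
    for x in frames:
        h = rnn_step(h, x)
        s = float(w_att @ h)
        m_new = max(m, s)
        scale = np.exp(m - m_new) if np.isfinite(m) else 0.0
        num = num * scale + np.exp(s - m_new) * h
        den = den * scale + np.exp(s - m_new)
        m = m_new
    return num / den     # normalized final embedding

frames = rng.normal(size=(50, F))   # 50 frames of toy acoustic features
print(streaming_attentive_embedding(frames).shape)   # -> (16,)
```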
-
Patent number: 11942072
Abstract: Disclosed is a wireless communication device including a voice recognition portion configured to convert a voice signal input through a microphone into a syllable information stream using voice recognition, an encoding portion configured to encode the syllable information stream to generate digital transmission data, a transmission portion configured to modulate the digital transmission data into a transmission signal and transmit the transmission signal through an antenna, a reception portion configured to demodulate a reception signal received through the antenna into digital reception data and output the digital reception data, a decoding portion configured to decode the digital reception data to generate the syllable information stream, and a voice synthesis portion configured to convert the syllable information stream into the voice signal using voice synthesis and output the voice signal through a speaker.
Type: Grant. Filed: February 3, 2021. Date of Patent: March 26, 2024. Inventor: Sang Rae Park
-
Patent number: 11837253
Abstract: A device, system, and method whereby a speech-driven system can distinguish speech obtained from users of the system from other speech spoken by background persons, as well as from background speech from public address systems. In one aspect, the present system and method prepares, in advance of field use, a voice-data file which is created in a training environment. The training environment exhibits both desired user speech and unwanted background speech, including unwanted speech from persons other than a user and also speech from a PA system. The speech recognition system is trained or otherwise programmed to identify wanted user speech which may be spoken concurrently with the background sounds. In an embodiment, during the pre-field-use phase the training or programming may be accomplished by having persons acting as training listeners audit the pre-recorded sounds to identify the desired user speech. A processor-based learning system is trained to duplicate the assessments made by the human listeners.
Type: Grant. Filed: September 28, 2021. Date of Patent: December 5, 2023. Assignee: VOCOLLECT, INC. Inventor: David D. Hardek
-
Patent number: 11817013
Abstract: A display apparatus and a method for questions and answers are provided. The display apparatus includes a display unit; an input unit configured to receive a user's speech voice; a communication unit configured to perform data communication with an answer server; and a processor configured to create and display one or more question sentences using the speech voice in response to the speech voice being a word speech, create a question language corresponding to the question sentence selected from among the displayed one or more question sentences, transmit the created question language to the answer server via the communication unit, and, in response to one or more answer results related to the question language being received from the answer server, display the received one or more answer results. Accordingly, the display apparatus may provide an answer result appropriate to a user's question intention even when a non-sentence speech is input.
Type: Grant. Filed: November 13, 2020. Date of Patent: November 14, 2023. Assignee: SAMSUNG ELECTRONICS CO., LTD. Inventor: Eun-sang Bak
-
Patent number: 11783197
Abstract: Systems and methods for neural machine translation are provided. In one example, a neural machine translation system translates text and comprises processors and a memory storing instructions that, when executed by at least one processor among the processors, cause the system to perform operations comprising, at least, obtaining a text as an input to a neural network system, supplementing the input text with meta information as an extra input to the neural network system, and delivering an output of the neural network system to a user as a translation of the input text, leveraging the meta information for translation.
Type: Grant. Filed: November 17, 2021. Date of Patent: October 10, 2023. Assignee: EBAY INC. Inventors: Evgeny Matusov, Wenhu Chen, Shahram Khadivi
-
Patent number: 11776530
Abstract: An apparatus for a speech model with personalization via ambient context harvesting is described herein. The apparatus includes a microphone, a context harvesting module, a confidence module, and a training module. The context harvesting module is to determine a context associated with the captured audio signals. The confidence module is to determine a confidence of the context as applied to the audio signals. The training module is to train a neural network in response to the confidence being above a predetermined threshold.
Type: Grant. Filed: November 15, 2017. Date of Patent: October 3, 2023. Assignee: INTEL CORPORATION. Inventors: Gabriel Amores, Guillermo Perez, Moshe Wasserblat, Michael Deisher, Loic Dufresne de Virel
-
Patent number: 11694685
Abstract: A method includes receiving audio data corresponding to an utterance spoken by the user and captured by the user device. The utterance includes a command for a digital assistant to perform an operation. The method also includes determining, using a hotphrase detector configured to detect each trigger word in a set of trigger words associated with a hotphrase, whether any of the trigger words in the set of trigger words are detected in the audio data during the corresponding fixed-duration time window. The method also includes identifying, in the audio data corresponding to the utterance, the hotphrase when each other trigger word in the set of trigger words was also detected in the audio data. The method also includes triggering an automated speech recognizer to perform speech recognition on the audio data when the hotphrase is identified in the audio data corresponding to the utterance.
Type: Grant. Filed: December 10, 2020. Date of Patent: July 4, 2023. Assignee: Google LLC. Inventors: Victor Carbune, Matthew Sharifi
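A rough way to picture the fixed-duration-window check is sketched below. It is not Google's implementation; it simply assumes per-word trigger detections arrive as (timestamp, word) pairs and declares the hotphrase once every trigger word has been seen inside one window.

```python
from collections import deque

def hotphrase_detected(detections, trigger_words, window_sec=2.0):
    """Return True once every trigger word is seen within one fixed-duration window.

    detections: iterable of (timestamp_sec, word) pairs in time order, e.g. the
    output of per-trigger-word detectors running over the audio data.
    """
    triggers = set(trigger_words)
    window = deque()            # recent trigger detections inside the window
    for t, word in detections:
        if word not in triggers:
            continue
        window.append((t, word))
        while window and t - window[0][0] > window_sec:
            window.popleft()    # drop detections that fell out of the window
        if {w for _, w in window} == triggers:
            return True         # every trigger word seen close enough together
    return False

# Hypothetical detector output for the hotphrase "play some music".
stream = [(0.4, "play"), (0.9, "some"), (1.3, "music")]
print(hotphrase_detected(stream, ["play", "some", "music"]))   # -> True
```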
-
Patent number: 11631399
Abstract: According to some embodiments, a machine learning model may include an input layer to receive an input signal as a series of frames representing handwriting data, speech data, audio data, and/or textual data. A plurality of time layers may be provided, and each time layer may comprise a uni-directional recurrent neural network processing block. A depth processing block may scan hidden states of the recurrent neural network processing block of each time layer, and the depth processing block may be associated with a first frame and receive context frame information of a sequence of one or more future frames relative to the first frame. An output layer may output a final classification as a classified posterior vector of the input signal. For example, the depth processing block may receive the context frame information from an output of a time layer processing block or another depth processing block of the future frame.
Type: Grant. Filed: May 13, 2019. Date of Patent: April 18, 2023. Assignee: Microsoft Technology Licensing, LLC. Inventors: Jinyu Li, Vadim Mazalov, Changliang Liu, Liang Lu, Yifan Gong
-
Patent number: 11620104
Abstract: Characteristics of a speaker are estimated using speech processing and machine learning. The characteristics of the speaker are used to automatically customize a user interface of a client device for the speaker.
Type: Grant. Filed: July 11, 2022. Date of Patent: April 4, 2023. Assignee: Google LLC. Inventors: Eugene Weinstein, Ignacio L. Moreno
-
Patent number: 11574646
Abstract: A method of extracting a fundamental frequency of an input sound includes generating a DJ transform spectrogram indicating estimated pure-tone amplitudes for respective natural frequencies of a plurality of springs and a plurality of time points by calculating the estimated pure-tone amplitudes for the respective natural frequencies by modeling an oscillation motion of the plurality of springs having different natural frequencies with respect to an input sound, calculating degrees of fundamental frequency suitability based on a moving average of the estimated pure-tone amplitudes or on a moving standard deviation of the estimated pure-tone amplitudes with respect to each natural frequency of the DJ transform spectrogram, and extracting a fundamental frequency based on local maximum values of the degrees of fundamental frequency suitability for the respective natural frequencies at each of the plurality of time points.
Type: Grant. Filed: November 12, 2020. Date of Patent: February 7, 2023. Assignee: BRAINSOFT INC. Inventors: Dong Jin Kim, Ju Yong Shin
-
Patent number: 11514904
Abstract: Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: receiving, from a user, voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system, wherein the candidate directive invoking vocal utterance includes at least one word or phrase of the text based command, wherein the computer system is configured to perform the first computer function in response to the first text based command and wherein the computer system is configured to perform a second computer function in response to a second text based command; and determining, based on machine logic, whether a word or phrase of the candidate vocal utterance sounds confusingly similar to a speech rendering of a word or phrase defining the second text based command.
Type: Grant. Filed: November 20, 2019. Date of Patent: November 29, 2022. Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION. Inventors: Jeremy A. Greenberger, Nicholas R. Sandonato
-
Patent number: 10937444
Abstract: A system for end-to-end automated scoring is disclosed. The system includes a word embedding layer for converting a plurality of ASR outputs into input tensors; a neural network lexical model encoder receiving the input tensors; a neural network acoustic model encoder implementing AM posterior probability, word duration, mean value of pitch, and mean value of intensity based on a plurality of cues; and a linear regression module for receiving concatenated encoded features from the neural network lexical model encoder and the neural network acoustic model encoder.
Type: Grant. Filed: November 20, 2018. Date of Patent: March 2, 2021. Assignee: Educational Testing Service. Inventors: David Suendermann-Oeft, Lei Chen, Jidong Tao, Shabnam Ghaffarzadegan, Yao Qian
-
Patent number: 10930267
Abstract: Provided is a speech recognition method for a recognition target language. According to an embodiment of the inventive concept, a speech recognition method for a recognition target language performed by a speech recognition apparatus includes obtaining an original learning data set for the recognition target language, constructing a target label by dividing the text information included in each piece of original learning data into letter units, and building an acoustic model based on a deep neural network by learning the learning speech data included in each piece of original learning data and the target label corresponding to the learning speech data.
Type: Grant. Filed: June 13, 2018. Date of Patent: February 23, 2021. Assignee: SAMSUNG SDS CO., LTD. Inventors: Min Soo Kim, Ji Hyeon Seo, Kyung Jun An, Seung Kyung Kim
-
Patent number: 10891573
Abstract: A method can include receiving state information for a wellsite system; receiving contextual information for a role associated with a workflow; generating a natural language report based at least in part on the state information and based at least in part on the contextual information; and transmitting the natural language report via a network interface based at least in part on an identifier associated with the role.
Type: Grant. Filed: April 18, 2016. Date of Patent: January 12, 2021. Assignee: Schlumberger Technology Corporation. Inventors: Benoit Foubert, Richard John Meehan, Jean-Pierre Poyet, Sandra Reyes, Raymond Lin, Sylvain Chambon
-
Patent number: 10283142
Abstract: Systems and methods are provided for a processor-implemented method of analyzing quality of sound acquired via a microphone. An input metric is extracted from a sound recording at each of a plurality of time intervals. The input metric is provided at each of the time intervals to a neural network that includes a memory component, where the neural network provides an output metric at each of the time intervals, where the output metric at a particular time interval is based on the input metric at a plurality of time intervals other than the particular time interval using the memory component of the neural network. The output metric is aggregated from each of the time intervals to generate a score indicative of the quality of the sound acquired via the microphone.
Type: Grant. Filed: July 21, 2016. Date of Patent: May 7, 2019. Assignee: Educational Testing Service. Inventors: Zhou Yu, Vikram Ramanarayanan, David Suendermann-Oeft, Xinhao Wang, Klaus Zechner, Lei Chen, Jidong Tao, Yao Qian
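The flow from per-interval input metrics, through a network with memory, to one aggregated quality score can be mocked up briefly. The recurrent weights and the simple averaging below are hypothetical placeholders (the patented system would use a trained memory network such as an LSTM, possibly with context on both sides), but they show how each interval's output depends on other intervals via the recurrent state before being aggregated.

```python
import numpy as np

rng = np.random.default_rng(7)
H = 8                                   # hidden ("memory") size, toy value
Wx = rng.normal(scale=0.5, size=(H, 1))
Wh = rng.normal(scale=0.5, size=(H, H))
w_out = rng.normal(size=H)

def audio_quality_score(input_metrics):
    """Run a toy recurrent net over per-interval input metrics and aggregate.

    The recurrent state acts as the memory component: the output metric at
    interval t depends on metrics from earlier intervals, and the per-interval
    outputs are averaged into a single score for the recording.
    """
    h = np.zeros(H)
    outputs = []
    for m in input_metrics:
        h = np.tanh(Wx @ np.array([m]) + Wh @ h)   # update the memory state
        outputs.append(float(w_out @ h))           # output metric per interval
    return float(np.mean(outputs))                 # aggregated quality score

print(audio_quality_score(rng.uniform(size=100)))  # score for one toy recording
```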
-
Patent number: 10229676
Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
Type: Grant. Filed: October 5, 2012. Date of Patent: March 12, 2019. Assignee: Avaya Inc. Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
-
Patent number: 10008197
Abstract: A keyword detector includes a processor configured to calculate a feature vector for each frame from a speech signal, input the feature vector for each frame to a DNN to calculate a first output probability for each triphone according to a sequence of phonemes contained in a predetermined keyword and a second output probability for each monophone, for each of at least one state of an HMM, calculate a first likelihood representing the probability that the predetermined keyword is uttered in the speech signal by applying the first output probability to the HMM, calculate a second likelihood for the most probable phoneme string in the speech signal by applying the second output probability to the HMM, and determine whether the keyword is to be detected on the basis of the first likelihood and the second likelihood.
Type: Grant. Filed: October 24, 2016. Date of Patent: June 26, 2018. Assignee: FUJITSU LIMITED. Inventor: Shoji Hayakawa
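The two-likelihood comparison can be illustrated with a compact Viterbi sketch. This is a simplified reading of the abstract, not Fujitsu's implementation: the keyword HMM below is a generic left-to-right chain, and the "most probable phoneme string" likelihood is approximated by a free phone loop that just takes the best monophone per frame.

```python
import numpy as np

def viterbi_log_likelihood(log_obs, log_trans):
    """Best-path log-likelihood of a left-to-right HMM.

    log_obs: (T, S) per-frame log output probabilities for the S keyword states.
    log_trans: (S, S) log transition matrix.
    """
    T, S = log_obs.shape
    delta = np.full(S, -np.inf)
    delta[0] = log_obs[0, 0]                  # must start in the first state
    for t in range(1, T):
        delta = np.max(delta[:, None] + log_trans, axis=0) + log_obs[t]
    return delta[-1]                          # must end in the last state

def keyword_score(log_obs_keyword, log_trans_keyword, log_obs_phones):
    """First likelihood (keyword path) minus second likelihood (free phone path).

    The free phone path is approximated by taking the best monophone log
    probability in every frame, i.e. an unconstrained phone loop.
    """
    l_keyword = viterbi_log_likelihood(log_obs_keyword, log_trans_keyword)
    l_free = float(np.sum(np.max(log_obs_phones, axis=1)))
    return l_keyword - l_free                 # detect if above a tuned threshold

# Toy example: 30 frames, a 5-state keyword HMM, 40 monophones.
rng = np.random.default_rng(2)
T, S, P = 30, 5, 40
log_obs_kw = np.log(rng.dirichlet(np.ones(S), size=T))   # toy DNN state outputs
log_obs_ph = np.log(rng.dirichlet(np.ones(P), size=T))   # toy monophone outputs
log_trans = np.full((S, S), -np.inf)
for s in range(S):
    log_trans[s, s] = np.log(0.5)             # self-loop
    if s + 1 < S:
        log_trans[s, s + 1] = np.log(0.5)     # advance to the next state
print(keyword_score(log_obs_kw, log_trans, log_obs_ph))
```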
-
Patent number: 9881616
Abstract: A method for improving speech recognition by a speech recognition system includes obtaining a voice sample from a speaker; storing the voice sample of the speaker as a voice model in a voice model database; identifying an area from which sound matching the voice model for the speaker is coming; and providing one or more audio signals corresponding to sound received from the identified area to the speech recognition system for processing.
Type: Grant. Filed: June 6, 2012. Date of Patent: January 30, 2018. Assignee: QUALCOMM Incorporated. Inventors: Jeffrey D. Beckley, Pooja Aggarwal, Shivakumar Balasubramanyam
-
Patent number: 9418334
Abstract: Pretraining for a DBN (Deep Belief Network) initializes the weights of the DBN using a hybrid pre-training methodology. Hybrid pre-training employs a generative component that allows the hybrid PT method to achieve a better WER (Word Error Rate) than the discriminative PT method. Hybrid pre-training learns weights which are more closely linked to the final objective function, allowing for a much larger batch size compared to generative PT, which allows for improvements in speed; and a larger batch size allows for parallelization of the gradient computation, speeding up training further.
Type: Grant. Filed: December 6, 2012. Date of Patent: August 16, 2016. Assignee: Nuance Communications, Inc. Inventors: Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran
-
Patent number: 9177558
Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
Type: Grant. Filed: January 31, 2013. Date of Patent: November 3, 2015. Assignee: Educational Testing Service. Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
-
Patent number: 9037462
Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
Type: Grant. Filed: March 20, 2012. Date of Patent: May 19, 2015. Assignee: AT&T Intellectual Property I, L.P. Inventor: Jason Williams
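One simplified way to picture a belief distribution over intentions driven by an N-best list is shown below. The combination rule, intent labels, and probabilities are all hypothetical; the sketch only illustrates how hypothesis confidences and a user-action model can jointly reshape the belief, not the patented estimator.

```python
from collections import defaultdict

def update_belief(prior, nbest, p_action_given_intent):
    """Combine a prior belief over intentions with one turn's N-best list.

    prior: dict intent -> probability.
    nbest: list of (hypothesis, confidence) pairs for this dialog turn.
    p_action_given_intent: dict (intent, hypothesis) -> probability that a user
    with that intent would say that hypothesis.
    """
    posterior = defaultdict(float)
    for intent, p_intent in prior.items():
        for hyp, conf in nbest:
            p_act = p_action_given_intent.get((intent, hyp), 1e-6)
            posterior[intent] += p_intent * conf * p_act
    total = sum(posterior.values()) or 1.0
    return {i: p / total for i, p in posterior.items()}

# Hypothetical dialog turn: the recognizer's N-best list with confidences.
prior = {"pay_bill": 0.5, "check_balance": 0.5}
nbest = [("pay my bill", 0.6), ("play my bill", 0.3), ("check my balance", 0.1)]
p_action = {
    ("pay_bill", "pay my bill"): 0.8,
    ("pay_bill", "play my bill"): 0.1,
    ("check_balance", "check my balance"): 0.8,
}
belief = update_belief(prior, nbest, p_action)
print(max(belief, key=belief.get), belief)   # 'pay_bill' should dominate
```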
-
Patent number: 9009039
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
Type: Grant. Filed: June 12, 2009. Date of Patent: April 14, 2015. Assignee: Microsoft Technology Licensing, LLC. Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
-
Patent number: 8972254
Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
Type: Grant. Filed: June 28, 2012. Date of Patent: March 3, 2015. Assignee: Utah State University. Inventors: Jacob Gunther, Todd Moon
-
Publication number: 20140365221
Abstract: A computer-implemented method performed by a computerized device, a computerized apparatus and a computer program product for recognizing speech, the method comprising: receiving a signal; extracting audio features from the signal; performing acoustic level processing on the audio features; receiving additional data; extracting additional features from the additional data; fusing the audio features and the additional features into a unified structure; receiving a Hidden Markov Model (HMM); and performing a quantum search over the features using the HMM and the unified structure.
Type: Application. Filed: August 27, 2014. Publication date: December 11, 2014. Inventor: Yossef Ben-Ezra
-
Publication number: 20140324426
Abstract: The present invention, pertaining to the field of speech recognition, discloses a reminder setting method and apparatus. The method includes: acquiring speech signals; acquiring time information in the speech signals by using keyword recognition, and determining reminder time for reminder setting according to the time information; acquiring a text sequence corresponding to the speech signals by using continuous speech recognition, and determining reminder content for reminder setting according to the time information and the text sequence; and setting a reminder according to the reminder time and the reminder content.
Type: Application. Filed: May 28, 2013. Publication date: October 30, 2014. Inventors: Li LU, Feng RAO, Song LIU, Zongyao TANG, Xiang ZHANG, Shuai YUE, Bo CHEN
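As a toy end-to-end illustration of the flow (not the disclosed recognizers), the sketch below assumes the recognition passes have already produced a transcript: a simple relative-time pattern plays the role of the keyword recognition that extracts time information, and the remaining text becomes the reminder content.

```python
import re
from datetime import datetime, timedelta

def set_reminder(transcript, now=None):
    """Derive a reminder time and reminder content from a recognized transcript.

    A simplified stand-in: a relative-time pattern extracts the time
    information, and everything outside the match is kept as the content.
    """
    now = now or datetime(2024, 1, 1, 9, 0)
    m = re.search(r"in (\d+) (minute|hour)s?", transcript)
    if not m:
        return None                        # no time information recognized
    amount, unit = int(m.group(1)), m.group(2)
    delta = timedelta(minutes=amount) if unit == "minute" else timedelta(hours=amount)
    content = (transcript[:m.start()] + transcript[m.end():]).strip()
    return {"time": now + delta, "content": content}

print(set_reminder("call the dentist in 30 minutes"))
# -> reminder at 09:30 with content 'call the dentist'
```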
-
Patent number: 8793124
Abstract: A scheme to judge emphasized speech portions, wherein the judgment is executed by statistical processing in terms of a set of speech parameters including a fundamental frequency, power, and a temporal variation of a dynamic measure and/or their derivatives. The emphasized speech portions are used as clues to summarize an audio content or a video content with speech.
Type: Grant. Filed: April 5, 2006. Date of Patent: July 29, 2014. Assignee: Nippon Telegraph and Telephone Corporation. Inventors: Kota Hidaka, Shinya Nakajima, Osamu Mizuno, Hidetaka Kuwano, Haruhiko Kojima
-
Patent number: 8774261
Abstract: A two stage interference cancellation (IC) process includes a linear IC stage that suppresses co-channel interference (CCI) and adjacent channel interference (ACI). The linear IC stage disambiguates otherwise super-trellis data for non-linear cancellation. Soft linear IC processing is driven by a-posteriori probability (Apop) information. A second stage performs expectation maximization/Baum Welch (EM-BW) processing that reduces residual ISI left over from the first stage and also generates the Apop which drives the soft linear IC in an iterative manner.
Type: Grant. Filed: June 11, 2012. Date of Patent: July 8, 2014. Assignee: QUALCOMM Incorporated. Inventors: Farrokh Abrishamkar, Divaydeep Sikri, Ken Delgado
-
Publication number: 20140180693
Abstract: Embodiments of the present invention include an acoustic processing device, a method for acoustic signal processing, and a speech recognition system. The acoustic processing device can include a processing unit, a histogram pruning unit, and a pre-pruning unit. The processing unit is configured to calculate one or more Hidden Markov Model (HMM) pruning thresholds. The histogram pruning unit is configured to prune one or more HMM states to generate one or more active HMM states. The pruning is based on the one or more pruning thresholds. The pre-pruning unit is configured to prune the one or more active HMM states based on an adjustable pre-pruning threshold. Further, the adjustable pre-pruning threshold is based on the one or more pruning thresholds.
Type: Application. Filed: December 21, 2012. Publication date: June 26, 2014. Applicant: Spansion LLC. Inventor: Ojas Ashok BAPAT
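Beam-style threshold pruning plus a histogram cap can be written in a few lines; the sketch below is a generic software rendering of those ideas with made-up numbers and an assumed lower-is-better cost convention, not the hardware design in the application.

```python
import numpy as np

def prune_states(scores, beam, max_active):
    """Threshold pruning followed by histogram pruning of HMM state scores.

    scores: array of path costs (negative log probabilities, lower is better).
    beam: states worse than best + beam are dropped (the pruning threshold).
    max_active: histogram-style cap on the number of surviving states.
    Returns (indices of active states, the pruning threshold used).
    """
    best = scores.min()
    threshold = best + beam
    active = np.flatnonzero(scores <= threshold)
    if active.size > max_active:
        order = np.argsort(scores[active])       # keep only the best-scoring
        active = active[order[:max_active]]
    return active, threshold

def pre_prune(candidate_scores, threshold, margin):
    """Pre-pruning with an adjustable threshold derived from the main one."""
    return np.flatnonzero(candidate_scores <= threshold + margin)

scores = np.random.default_rng(3).uniform(0.0, 100.0, size=5000)
active, thr = prune_states(scores, beam=10.0, max_active=300)
extra = pre_prune(scores, thr, margin=-2.0)      # tighter, adjustable threshold
print(active.size, "active states;", extra.size, "pass the pre-pruning threshold")
```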
-
Publication number: 20140180694
Abstract: Embodiments of the present invention include an acoustic processing device and a method for traversing a Hidden Markov Model (HMM). The acoustic processing device can include a senone scoring unit (SSU), a memory device, an HMM module, and an interface module. The SSU is configured to receive feature vectors from an external computing device and to calculate senone scores. The memory device is configured to store the senone scores and HMM information, where the HMM information includes HMM IDs and HMM state scores. The HMM module is configured to traverse the HMM based on the senone scores and the HMM information. Further, the interface module is configured to transfer one or more HMM scoring requests from the external computing device to the HMM module and to transfer the HMM state scores to the external computing device.
Type: Application. Filed: December 21, 2012. Publication date: June 26, 2014. Applicant: Spansion LLC. Inventors: Richard M. FASTOW, Ojas A. Bapat, Jens Olson
-
Patent number: 8719023
Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
Type: Grant. Filed: May 21, 2010. Date of Patent: May 6, 2014. Assignee: Sony Computer Entertainment Inc. Inventors: Xavier Menendez-Pidal, Ruxin Chen
-
Patent number: 8700400
Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
Type: Grant. Filed: December 30, 2010. Date of Patent: April 15, 2014. Assignee: Microsoft Corporation. Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
-
Patent number: 8700403
Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
Type: Grant. Filed: November 3, 2005. Date of Patent: April 15, 2014. Assignee: Robert Bosch GmbH. Inventors: Fuliang Weng, Lin Zhao
-
Patent number: 8639510
Abstract: A hardware acoustic scoring unit for a speech recognition system and a method of operation thereof are provided. Rather than scoring all senones in an acoustic model used for the speech recognition system, acoustic scoring logic first scores a set of ciphones based on acoustic features for one frame of sampled speech. The acoustic scoring logic then scores senones associated with the N highest scored ciphones. In one embodiment, the number (N) is three. While the acoustic scoring logic scores the senones associated with the N highest scored ciphones, high score ciphone identification logic operates in parallel with the acoustic scoring unit to identify one or more additional ciphones that have scores greater than a threshold. Once the acoustic scoring unit finishes scoring the senones for the N highest scored ciphones, the acoustic scoring unit then scores senones associated with the one or more additional ciphones.
Type: Grant. Filed: December 22, 2008. Date of Patent: January 28, 2014. Inventors: Kai Yu, Rob A. Rutenbar
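The scheduling idea, score the ciphones first and spend senone evaluations only where they look worthwhile, can be mocked up in software. The linear "models", the back-off to a floor score for unchosen ciphones, and all sizes below are hypothetical; in the patent this is parallel scoring hardware, not Python.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 13                                   # toy feature dimensionality

# Hypothetical scoring functions standing in for hardware acoustic scoring.
ci_models = [lambda x, w=rng.normal(size=D): float(w @ x) for _ in range(10)]
senone_models = [lambda x, w=rng.normal(size=D): float(w @ x) for _ in range(40)]
senones_of_ci = {ci: list(range(4 * ci, 4 * ci + 4)) for ci in range(10)}

def two_stage_senone_scores(feat, top_n=3, extra_threshold=None):
    """Score ciphones first, then only the senones tied to promising ciphones."""
    ci_scores = np.array([m(feat) for m in ci_models])
    chosen = set(np.argsort(ci_scores)[-top_n:])          # N best ciphones
    if extra_threshold is not None:
        # Additional ciphones whose score clears the threshold get scored too.
        chosen |= set(np.flatnonzero(ci_scores > extra_threshold))
    floor = float(ci_scores.min())
    scores = {}
    for ci, senones in senones_of_ci.items():
        for s in senones:
            # Unchosen ciphones: back off to a floor instead of paying for a
            # full senone evaluation.
            scores[s] = senone_models[s](feat) if ci in chosen else floor
    return scores

feat = rng.normal(size=D)
scores = two_stage_senone_scores(feat, top_n=3, extra_threshold=2.0)
print(len(scores), "senone scores produced")
```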
-
Patent number: 8635067
Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model.
Type: Grant. Filed: December 9, 2010. Date of Patent: January 21, 2014. Assignee: International Business Machines Corporation. Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
-
Patent number: 8600749
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
Type: Grant. Filed: December 8, 2009. Date of Patent: December 3, 2013. Assignee: AT&T Intellectual Property I, L.P. Inventor: Andrej Ljolje
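The last sentence, moving features closer to an overall feature distribution center, can be pictured with a small sketch. The per-sound means, the single overall centroid, and the interpolation weight below are hypothetical; this is only one simple reading of "moving the features closer", not the patented adaptation.

```python
import numpy as np

def adapt_features(segments, sound_means, overall_centroid, alpha=0.5):
    """Shift each segment's features part-way toward the overall centroid.

    segments: list of (sound_label, features[T, D]) obtained by segmenting the
    utterance, e.g. with the full size model.
    sound_means: per-speech-sound means from the reduced size model (one
    distribution per sound).
    alpha: interpolation weight; 0 leaves features unchanged, 1 maps each
    sound's mean onto the overall centroid.
    """
    adapted = []
    for label, feats in segments:
        shift = alpha * (overall_centroid - sound_means[label])
        adapted.append((label, feats + shift))
    return adapted

rng = np.random.default_rng(5)
D = 13
sound_means = {"AA": rng.normal(size=D), "IY": rng.normal(size=D)}
overall = np.mean(list(sound_means.values()), axis=0)
segments = [("AA", rng.normal(size=(20, D))), ("IY", rng.normal(size=(15, D)))]
adapted = adapt_features(segments, sound_means, overall)
print(adapted[0][1].shape)   # -> (20, 13)
```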
-
Patent number: 8504357
Abstract: A related word presentation device includes a program information storage unit that stores program information of each program; and an information dividing unit that generates, for each of the attributes of the words included in the program information, at least one group which includes a reference word belonging to the attribute and a set of words which co-occur with the reference word in a program. A degree-of-relevance calculating unit stores attribute-based association dictionaries, each of which indicates, for the corresponding attribute of words, (i) the words and (ii) the degrees of relevance between the words calculated based on the frequency of co-occurrence in each of the groups. A search condition obtaining unit obtains the search word and the attribute; a substitute word obtaining unit selects substitute words from the attribute-based association dictionary for the obtained attribute; and an output unit presents the selected substitute words.
Type: Grant. Filed: July 30, 2008. Date of Patent: August 6, 2013. Assignee: Panasonic Corporation. Inventors: Takashi Tsuzuki, Satoshi Matsuura, Kazutoyo Takata
-
Patent number: 8484035
Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more controllable parameters of the input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments, and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
Type: Grant. Filed: September 6, 2007. Date of Patent: July 9, 2013. Assignee: Massachusetts Institute of Technology. Inventor: Alex Paul Pentland
-
Publication number: 20130151254
Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
Type: Application. Filed: January 31, 2013. Publication date: June 13, 2013. Applicant: BROADCOM CORPORATION. Inventor: Nambirajan Seshadri
-
Publication number: 20130132085
Abstract: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.
Type: Application. Filed: February 21, 2011. Publication date: May 23, 2013. Inventors: Gautham J. Mysore, Paris Smaragdis
-
Patent number: 8392190
Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
Type: Grant. Filed: December 1, 2009. Date of Patent: March 5, 2013. Assignee: Educational Testing Service. Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
-
Patent number: 8364487
Abstract: A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input.
Type: Grant. Filed: October 21, 2008. Date of Patent: January 29, 2013. Assignee: Microsoft Corporation. Inventors: Yun-Cheng Ju, Julian J. Odell
-
Patent number: 8352265
Abstract: A hardware implemented backend search stage, or engine, for a speech recognition system is provided. In one embodiment, the backend search engine includes a number of pipelined stages including a fetch stage, an updating stage which may be a Viterbi stage, a transition and prune stage, and a language model stage. Each active triphone of each active word is represented by a corresponding triphone model. By being pipelined, the stages of the backend search engine are enabled to simultaneously process different triphone models, thereby providing high-rate backend searching for the speech recognition system. In one embodiment, caches may be used to cache frequently and/or recently accessed triphone information utilized by the fetch stage, frequently and/or recently accessed triphone-to-senone mappings utilized by the updating stage, or both.
Type: Grant. Filed: December 22, 2008. Date of Patent: January 8, 2013. Inventors: Edward Lin, Rob A. Rutenbar
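A rough software analogue of the stage decomposition (fetch, Viterbi-style update, transition-and-prune) is sketched below with generator functions; the language model stage is omitted and true hardware pipelining is not reproduced. The triphone IDs, senone scores, and three-state models are hypothetical.

```python
def fetch_stage(active_triphones, model_store):
    """Fetch stage: pull each active triphone's model record."""
    for tri_id in active_triphones:
        yield tri_id, model_store[tri_id]

def update_stage(stream, senone_scores):
    """Updating stage: Viterbi-style update of a 3-state left-to-right model."""
    for tri_id, model in stream:
        states, senones = model["states"], model["senones"]
        new = list(states)
        for s in range(len(states)):
            prev = states[s - 1] if s > 0 else float("-inf")
            new[s] = max(states[s], prev) + senone_scores[senones[s]]
        model["states"] = new
        yield tri_id, model

def prune_stage(stream, beam):
    """Transition-and-prune stage: keep triphones within a beam of the best."""
    scored = list(stream)
    best = max(max(m["states"]) for _, m in scored)
    for tri_id, model in scored:
        if max(model["states"]) >= best - beam:
            yield tri_id, model

# Toy frame: two active triphones with hypothetical state scores and senone IDs.
store = {
    "k-ae+t": {"states": [0.0, -5.0, -9.0], "senones": [3, 7, 11]},
    "ae-t+s": {"states": [-2.0, -4.0, -8.0], "senones": [5, 9, 13]},
}
senone_scores = {i: -float(i % 4) for i in range(16)}
survivors = prune_stage(update_stage(fetch_stage(store, store), senone_scores), beam=6.0)
print([tri_id for tri_id, _ in survivors])   # both triphones survive this beam
```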
-
Publication number: 20130006631
Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
Type: Application. Filed: June 28, 2012. Publication date: January 3, 2013. Applicant: UTAH STATE UNIVERSITY. Inventors: Jacob Gunther, Todd Moon
-
Publication number: 20120330664
Abstract: The present invention relates to a method and apparatus for computing Gaussian likelihoods. One embodiment of a method for processing a speech sample includes generating a feature vector for each frame of the speech signal, evaluating the feature vector in accordance with a hierarchical Gaussian shortlist, and producing a hypothesis regarding a content of the speech signal, based on the evaluating.
Type: Application. Filed: June 24, 2011. Publication date: December 27, 2012. Inventors: XIN LEI, JING ZHENG
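A Gaussian shortlist generally means evaluating only the mixture components that fall in clusters close to the incoming feature vector. The one-level sketch below (the application describes a hierarchical variant) uses made-up clusters and diagonal Gaussians to show the idea, not the disclosed method.

```python
import numpy as np

def shortlist_log_likelihood(x, means, variances, weights, cluster_of,
                             cluster_means, top_clusters=2):
    """Approximate GMM log-likelihood using a one-level Gaussian shortlist.

    The cluster centers are scored first; only Gaussians whose cluster is among
    the top_clusters closest centers are evaluated exactly.
    """
    d2 = np.sum((cluster_means - x) ** 2, axis=1)      # distance to each center
    keep = set(np.argsort(d2)[:top_clusters])
    logps = []
    for g, (mu, var, w) in enumerate(zip(means, variances, weights)):
        if cluster_of[g] not in keep:
            continue                                   # skipped by the shortlist
        ll = -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))
        logps.append(np.log(w) + ll)
    return np.logaddexp.reduce(logps) if logps else -np.inf

# Toy mixture: 64 diagonal Gaussians grouped into 8 clusters of 8.
rng = np.random.default_rng(6)
G, D, C = 64, 13, 8
means = rng.normal(size=(G, D))
variances = np.ones((G, D))
weights = np.full(G, 1.0 / G)
cluster_of = np.arange(G) % C
cluster_means = np.array([means[cluster_of == c].mean(axis=0) for c in range(C)])
x = rng.normal(size=D)
print(shortlist_log_likelihood(x, means, variances, weights, cluster_of, cluster_means))
```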
-
Patent number: 8335332
Abstract: A method is provided for operating a hearing aid in a hearing aid system, where the hearing aid continuously learns for the particular user. A sound environment classification system is provided for tracking and defining sound environment classes relevant to the user. In an ongoing learning process, the classes are redefined based on new environments to which the hearing aid is subjected by the user.
Type: Grant. Filed: June 23, 2008. Date of Patent: December 18, 2012. Assignees: Siemens Audiologische Technik GmbH, University Of Ottawa. Inventors: Tyseer Aboulnasr, Eghart Fischer, Christian Giguère, Wail Gueaieb, Volkmar Hamacher, Luc Lamarche
-
Patent number: 8315857
Abstract: Systems and methods for modification of an audio input signal are provided. In exemplary embodiments, an adaptive multiple-model optimizer is configured to generate at least one source model parameter for facilitating modification of an analyzed signal. The adaptive multiple-model optimizer comprises a segment grouping engine and a source grouping engine. The segment grouping engine is configured to group simultaneous feature segments to generate at least one segment model. The at least one segment model is used by the source grouping engine to generate at least one source model, which comprises the at least one source model parameter. Control signals for modification of the analyzed signal may then be generated based on the at least one source model parameter.
Type: Grant. Filed: May 30, 2006. Date of Patent: November 20, 2012. Assignee: Audience, Inc. Inventors: David Klein, Stephen Malinowski, Lloyd Watts, Bernard Mont-Reynaud
-
Patent number: 8307459
Abstract: A botnet detection system is provided. A bursty feature extractor receives an Internet Relay Chat (IRC) packet value from a detection object network, and determines a bursty feature accordingly. A Hybrid Hidden Markov Model (HHMM) parameter estimator determines probability parameters for a Hybrid Hidden Markov Model according to the bursty feature. A traffic profile generator establishes a probability sequential model for the Hybrid Hidden Markov Model according to the probability parameters and pre-defined network traffic categories. A dubious state detector determines a traffic state corresponding to a network relaying the IRC packet in response to reception of a new IRC packet, determines whether the IRC packet flow of the object network is dubious by applying the bursty feature to the probability sequential model for the Hybrid Hidden Markov Model, and generates a warning signal when the IRC packet flow is regarded as having a dubious traffic state.
Type: Grant. Filed: March 17, 2010. Date of Patent: November 6, 2012. Assignee: National Taiwan University of Science and Technology. Inventors: Hahn-Ming Lee, Ching-Hao Mao, Yu-Jie Chen, Yi-Hsun Wang, Jerome Yeh, Tsu-Han Chen
-
Patent number: 8301449
Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
Type: Grant. Filed: October 16, 2006. Date of Patent: October 30, 2012. Assignee: Microsoft Corporation. Inventors: Xiaodong He, Li Deng