Hidden Markov Model (HMM) (EPO) Patents (Class 704/256.1)
  • Patent number: 11961514
    Abstract: An acoustic event detection system may employ one or more recurrent neural networks (RNNs) to extract features from audio data, and use the extracted features to determine the presence of an acoustic event. The system may use self-attention to emphasize features extracted from portions of audio data that may include features more useful for detecting acoustic events. The system may perform self-attention in an iterative manner to reduce the amount of memory used to store hidden states of the RNN while processing successive portions of the audio data. The system may process the portions of the audio data using the RNN to generate a hidden state for each portion. The system may calculate an interim embedding for each hidden state. An interim embedding calculated for the last hidden state may be normalized to determine a final embedding representing features extracted from the input data by the RNN. An illustrative sketch of this iterative attention follows this entry.
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: April 16, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Chia-Jung Chang, Qingming Tang, Ming Sun, Chao Wang
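    For readers who want a concrete picture of the memory-saving trick described in this abstract, the following sketch keeps only a running (interim) weighted sum of hidden states and a running normalizer rather than all hidden states. It is a loose NumPy interpretation; the toy recurrent cell, the attention query vector q, and all dimensions are assumptions, not the patented design.
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D = 16                                   # hidden-state size (illustrative)
    W = rng.standard_normal((D, D)) * 0.1    # toy recurrent weights
    U = rng.standard_normal((D, 8)) * 0.1    # toy input weights
    q = rng.standard_normal(D)               # attention query vector (assumption)

    def rnn_step(h, x):
        """One toy recurrent update; stands in for the RNN of the abstract."""
        return np.tanh(W @ h + U @ x)

    def iterative_attention(portions):
        """Accumulate an attention-weighted embedding one portion at a time,
        so only the running (interim) embedding and normalizer are stored."""
        h = np.zeros(D)
        interim = np.zeros(D)   # unnormalized weighted sum of hidden states
        z = 0.0                 # running sum of attention weights
        for x in portions:      # each x is the feature vector of one audio portion
            h = rnn_step(h, x)
            w = np.exp(q @ h)   # unnormalized self-attention weight for this state
            interim += w * h
            z += w
        return interim / z      # final embedding: normalized interim embedding

    portions = [rng.standard_normal(8) for _ in range(10)]
    embedding = iterative_attention(portions)
    print(embedding.shape)      # (16,)
    ```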
  • Patent number: 11942072
    Abstract: Disclosed is a wireless communication device including a voice recognition portion configured to convert a voice signal input through a microphone into a syllable information stream using voice recognition, an encoding portion configured to encode the syllable information stream to generate digital transmission data, a transmission portion configured to modulate the digital transmission data into a transmission signal and transmit the transmission signal through an antenna, a reception portion configured to demodulate a reception signal received through the antenna into digital reception data and output the digital reception data, a decoding portion configured to decode the digital reception data to generate the syllable information stream, and a voice synthesis portion configured to convert the syllable information stream into the voice signal using voice synthesis and output the voice signal through a speaker.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: March 26, 2024
    Inventor: Sang Rae Park
  • Patent number: 11837253
    Abstract: A device, system, and method whereby a speech-driven system can distinguish speech obtained from users of the system from other speech spoken by background persons, as well as from background speech from public address systems. In one aspect, the present system and method prepare, in advance of field use, a voice-data file which is created in a training environment. The training environment exhibits both desired user speech and unwanted background speech, including unwanted speech from persons other than a user and also speech from a PA system. The speech recognition system is trained or otherwise programmed to identify wanted user speech which may be spoken concurrently with the background sounds. In an embodiment, during the pre-field-use phase the training or programming may be accomplished by having training listeners audit the pre-recorded sounds to identify the desired user speech. A processor-based learning system is then trained to duplicate the assessments made by the human listeners.
    Type: Grant
    Filed: September 28, 2021
    Date of Patent: December 5, 2023
    Assignee: VOCOLLECT, INC.
    Inventor: David D. Hardek
  • Patent number: 11817013
    Abstract: A display apparatus and a method for questions and answers are provided. The display apparatus includes an input unit configured to receive a user's speech voice; a communication unit configured to perform data communication with an answer server; and a processor configured to create and display one or more question sentences using the speech voice in response to the speech voice being a word speech, create a question language corresponding to the question sentence selected from among the displayed one or more question sentences, transmit the created question language to the answer server via the communication unit, and, in response to one or more answer results related to the question language being received from the answer server, display the received one or more answer results. Accordingly, the display apparatus may provide an answer result appropriate to a user's question intention even when a non-sentence speech is input.
    Type: Grant
    Filed: November 13, 2020
    Date of Patent: November 14, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Eun-sang Bak
  • Patent number: 11783197
    Abstract: Systems and methods for neural machine translation are provided. In one example, a neural machine translation system translates text and comprises processors and a memory storing instructions that, when executed by at least one processor among the processors, cause the system to perform operations comprising, at least, obtaining a text as an input to a neural network system, supplementing the input text with meta information as an extra input to the neural network system, and delivering an output of the neural network system to a user as a translation of the input text, leveraging the meta information for translation.
    Type: Grant
    Filed: November 17, 2021
    Date of Patent: October 10, 2023
    Assignee: EBAY INC.
    Inventors: Evgeny Matusov, Wenhu Chen, Shahram Khadivi
  • Patent number: 11776530
    Abstract: An apparatus for speech model personalization via ambient context harvesting is described herein. The apparatus includes a microphone, a context harvesting module, a confidence module, and a training module. The context harvesting module is to determine a context associated with the captured audio signals. The confidence module is to determine a confidence of the context as applied to the audio signals. The training module is to train a neural network in response to the confidence being above a predetermined threshold.
    Type: Grant
    Filed: November 15, 2017
    Date of Patent: October 3, 2023
    Assignee: INTEL CORPORATION
    Inventors: Gabriel Amores, Guillermo Perez, Moshe Wasserblat, Michael Deisher, Loic Dufresne de Virel
  • Patent number: 11694685
    Abstract: A method includes receiving audio data corresponding to an utterance spoken by the user and captured by the user device. The utterance includes a command for a digital assistant to perform an operation. The method also includes determining, using a hotphrase detector configured to detect each trigger word in a set of trigger words associated with a hotphrase, whether any of the trigger words in the set of trigger words are detected in the audio data during the corresponding fixed-duration time window. The method also includes identifying, in the audio data corresponding to the utterance, the hotphrase when each other trigger word in the set of trigger words was also detected in the audio data. The method also includes triggering an automated speech recognizer to perform speech recognition on the audio data when the hotphrase is identified in the audio data corresponding to the utterance.
    Type: Grant
    Filed: December 10, 2020
    Date of Patent: July 4, 2023
    Assignee: Google LLC
    Inventors: Victor Carbune, Matthew Sharifi
  • Patent number: 11631399
    Abstract: According to some embodiments, a machine learning model may include an input layer to receive an input signal as a series of frames representing handwriting data, speech data, audio data, and/or textual data. A plurality of time layers may be provided, and each time layer may comprise a uni-directional recurrent neural network processing block. A depth processing block may scan hidden states of the recurrent neural network processing block of each time layer, and the depth processing block may be associated with a first frame and receive context frame information of a sequence of one or more future frames relative to the first frame. An output layer may output a final classification as a classified posterior vector of the input signal. For example, the depth processing block may receive the context frame information from an output of a time layer processing block or another depth processing block of the future frame.
    Type: Grant
    Filed: May 13, 2019
    Date of Patent: April 18, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Vadim Mazalov, Changliang Liu, Liang Lu, Yifan Gong
  • Patent number: 11620104
    Abstract: Characteristics of a speaker are estimated using speech processing and machine learning. The characteristics of the speaker are used to automatically customize a user interface of a client device for the speaker.
    Type: Grant
    Filed: July 11, 2022
    Date of Patent: April 4, 2023
    Assignee: Google LLC
    Inventors: Eugene Weinstein, Ignacio L. Moreno
  • Patent number: 11574646
    Abstract: A method of extracting a fundamental frequency of an input sound includes: generating a DJ transform spectrogram indicating estimated pure-tone amplitudes for respective natural frequencies of a plurality of springs and a plurality of time points, by calculating the estimated pure-tone amplitudes for the respective natural frequencies by modeling an oscillation motion of the plurality of springs having different natural frequencies with respect to an input sound; calculating degrees of fundamental frequency suitability based on a moving average of the estimated pure-tone amplitudes or on a moving standard deviation of the estimated pure-tone amplitudes with respect to each natural frequency of the DJ transform spectrogram; and extracting a fundamental frequency based on local maximum values of the degrees of fundamental frequency suitability for the respective natural frequencies at each of the plurality of time points.
    Type: Grant
    Filed: November 12, 2020
    Date of Patent: February 7, 2023
    Assignee: BRAINSOFT INC.
    Inventors: Dong Jin Kim, Ju Yong Shin
  • Patent number: 11514904
    Abstract: Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: receiving, from a user, voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system, wherein the candidate directive invoking vocal utterance includes at least one word or phrase of the text based command, wherein the computer system is configured to perform the first computer function in response to the first text based command and wherein the computer system is configured to perform a second computer function in response to a second text based command; and determining, based on machine logic, whether a word or phrase of the candidate vocal utterance sounds confusingly similar to a speech rendering of a word or phrase defining the second text based command.
    Type: Grant
    Filed: November 20, 2019
    Date of Patent: November 29, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeremy A. Greenberger, Nicholas R. Sandonato
  • Patent number: 10937444
    Abstract: A system for end-to-end automated scoring is disclosed. The system includes a word embedding layer for converting a plurality of ASR outputs into input tensors; a neural network lexical model encoder receiving the input tensors; a neural network acoustic model encoder implementing AM posterior probability, word duration, mean value of pitch, and mean value of intensity based on a plurality of cues; and a linear regression module for receiving concatenated encoded features from the neural network lexical model encoder and the neural network acoustic model encoder. An illustrative sketch of the feature fusion and scoring follows this entry.
    Type: Grant
    Filed: November 20, 2018
    Date of Patent: March 2, 2021
    Assignee: Educational Testing Service
    Inventors: David Suendermann-Oeft, Lei Chen, Jidong Tao, Shabnam Ghaffarzadegan, Yao Qian
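    The fusion step can be pictured as concatenating a lexical encoding with a few acoustic cues (AM posterior probability, word duration, mean pitch, mean intensity) and passing the result to a linear regressor. The sketch below uses toy encoders and made-up values purely for illustration; it is not the patented encoder architecture.
    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def lexical_encoder(token_ids, emb):
        """Toy lexical encoding: mean of word embeddings (stand-in for an RNN encoder)."""
        return emb[token_ids].mean(axis=0)

    def acoustic_encoder(am_posterior, duration, mean_pitch, mean_intensity):
        """Toy acoustic encoding: stack the per-response cues into one vector."""
        return np.array([am_posterior, duration, mean_pitch, mean_intensity])

    emb = rng.standard_normal((1000, 32))           # word-embedding table (illustrative)
    lex = lexical_encoder([12, 7, 433, 12], emb)    # ASR output tokens (assumed)
    aco = acoustic_encoder(0.82, 1.4, 180.0, 65.0)  # cue values (assumed)

    features = np.concatenate([lex, aco])           # concatenated encoded features
    w = rng.standard_normal(features.shape[0]) * 0.01
    b = 2.5
    score = float(features @ w + b)                 # linear-regression score
    print(round(score, 3))
    ```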
  • Patent number: 10930267
    Abstract: Provided is a speech recognition method for a recognition target language. According to an embodiment of the inventive concept, a speech recognition method for a recognition target language performed by a speech recognition apparatus includes obtaining an original learning data set for the recognition target language, constructing a target label by dividing the text information included in each piece of original learning data into letter units, and building an acoustic model based on a deep neural network by learning the learning speech data included in each piece of original learning data and the target label corresponding to the learning speech data.
    Type: Grant
    Filed: June 13, 2018
    Date of Patent: February 23, 2021
    Assignee: SAMSUNG SDS CO., LTD.
    Inventors: Min Soo Kim, Ji Hyeon Seo, Kyung Jun An, Seung Kyung Kim
  • Patent number: 10891573
    Abstract: A method can include receiving state information for a wellsite system; receiving contextual information for a role associated with a workflow; generating a natural language report based at least in part on the state information and based at least in part on the contextual information; and transmitting the natural language report via a network interface based at least in part on an identifier associated with the role.
    Type: Grant
    Filed: April 18, 2016
    Date of Patent: January 12, 2021
    Assignee: Schlumberger Technology Corporation
    Inventors: Benoit Foubert, Richard John Meehan, Jean-Pierre Poyet, Sandra Reyes, Raymond Lin, Sylvain Chambon
  • Patent number: 10283142
    Abstract: Systems and methods are provided for a processor-implemented method of analyzing quality of sound acquired via a microphone. An input metric is extracted from a sound recording at each of a plurality of time intervals. The input metric is provided at each of the time intervals to a neural network that includes a memory component, where the neural network provides an output metric at each of the time intervals, where the output metric at a particular time interval is based on the input metric at a plurality of time intervals other than the particular time interval using the memory component of the neural network. The output metrics from each of the time intervals are aggregated to generate a score indicative of the quality of the sound acquired via the microphone. An illustrative sketch of this memory-based scoring follows this entry.
    Type: Grant
    Filed: July 21, 2016
    Date of Patent: May 7, 2019
    Assignee: Educational Testing Service
    Inventors: Zhou Yu, Vikram Ramanarayanan, David Suendermann-Oeft, Xinhao Wang, Klaus Zechner, Lei Chen, Jidong Tao, Yao Qian
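    A network with a memory component lets the output at one interval depend on input metrics from other intervals, and the per-interval outputs are then aggregated into one quality score. The sketch below uses a single hand-rolled recurrent unit and mean aggregation as stand-ins; the actual network, metrics, and aggregation are not specified by this summary.
    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    a, b, c = 0.9, 0.5, 0.3     # toy recurrent weights (illustrative)

    def score_recording(input_metrics):
        """Run a one-unit recurrent 'memory' over per-interval input metrics and
        aggregate the per-interval outputs into one quality score."""
        memory = 0.0
        outputs = []
        for m in input_metrics:                    # one metric value per time interval
            memory = np.tanh(a * memory + b * m)   # memory carries context across intervals
            outputs.append(c * memory)             # per-interval output metric
        return float(np.mean(outputs))             # aggregate (mean) -> quality score

    metrics = rng.uniform(0.0, 1.0, size=50)       # e.g. per-interval SNR-like values (assumed)
    print(round(score_recording(metrics), 4))
    ```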
  • Patent number: 10229676
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Grant
    Filed: October 5, 2012
    Date of Patent: March 12, 2019
    Assignee: Avaya Inc.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
  • Patent number: 10008197
    Abstract: A keyword detector includes a processor configured to calculate a feature vector for each frame from a speech signal; input the feature vector for each frame to a DNN to calculate, for each of at least one state of an HMM, a first output probability for each triphone according to a sequence of phonemes contained in a predetermined keyword and a second output probability for each monophone; calculate a first likelihood representing the probability that the predetermined keyword is uttered in the speech signal by applying the first output probability to the HMM; calculate a second likelihood for the most probable phoneme string in the speech signal by applying the second output probability to the HMM; and determine whether the keyword is to be detected on the basis of the first likelihood and the second likelihood. An illustrative sketch of the likelihood comparison follows this entry.
    Type: Grant
    Filed: October 24, 2016
    Date of Patent: June 26, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Shoji Hayakawa
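    The detection decision can be read as comparing a keyword-constrained likelihood against the likelihood of the best free phoneme string, e.g. by thresholding their log-ratio. In the sketch below, the Viterbi pass over the keyword HMM is collapsed to a simple sum of per-frame keyword-path log-probabilities, and the threshold and all inputs are assumptions.
    ```python
    import numpy as np

    def keyword_detected(kw_frame_logp, mono_frame_logp, threshold=-2.0):
        """Compare the keyword-constrained likelihood with the best free-phoneme
        likelihood and report a detection when the log-ratio clears a threshold.

        kw_frame_logp:   per-frame log-probabilities under the keyword phoneme
                         sequence (what a Viterbi pass over the keyword HMM yields).
        mono_frame_logp: per-frame arrays of monophone log-probabilities; the best
                         one per frame approximates the most probable phoneme string.
        threshold:       tuning constant (assumed, not from the patent).
        """
        first = float(np.sum(kw_frame_logp))                        # first likelihood
        second = float(sum(np.max(f) for f in mono_frame_logp))     # second likelihood
        return (first - second) > threshold

    rng = np.random.default_rng(3)
    T = 40
    kw = rng.uniform(-3.0, -0.5, size=T)                            # toy keyword-path scores
    mono = [rng.uniform(-3.0, -0.5, size=40) for _ in range(T)]     # toy monophone scores
    print(keyword_detected(kw, mono))
    ```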
  • Patent number: 9881616
    Abstract: A method for improving speech recognition by a speech recognition system includes obtaining a voice sample from a speaker; storing the voice sample of the speaker as a voice model in a voice model database; identifying an area from which sound matching the voice model for the speaker is coming; and providing one or more audio signals corresponding to sound received from the identified area to the speech recognition system for processing.
    Type: Grant
    Filed: June 6, 2012
    Date of Patent: January 30, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Jeffrey D. Beckley, Pooja Aggarwal, Shivakumar Balasubramanyam
  • Patent number: 9418334
    Abstract: Pretraining for a DBN (Deep Belief Network) initializes the weights of the DBN using a hybrid pre-training methodology. Hybrid pre-training employs a generative component that allows the hybrid PT method to achieve better performance in WER (Word Error Rate) than the discriminative PT method. Hybrid pre-training learns weights which are more closely linked to the final objective function, allowing for a much larger batch size than generative PT, which improves speed; the larger batch size also allows for parallelization of the gradient computation, speeding up training further.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: August 16, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran
  • Patent number: 9177558
    Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: November 3, 2015
    Assignee: Educational Testing Service
    Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
  • Patent number: 9037462
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention. An illustrative sketch of this belief accumulation follows this entry.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: May 19, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
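    One simple way to read the belief distribution is as confidence mass accumulated per underlying intention across the current N-best list and contextually similar N-best lists, with the highest-belief intention selected. The sketch below is a deliberately simplified interpretation; the hypothesis-to-intention mapping and the weighting are assumptions, and the user-action model mentioned in the abstract is omitted.
    ```python
    from collections import defaultdict

    def select_intention(nbest_lists, hypothesis_to_intent):
        """Accumulate confidence mass per intention over several N-best lists
        (the current turn plus contextually similar ones) and return the argmax."""
        belief = defaultdict(float)
        for nbest in nbest_lists:
            total = sum(conf for _, conf in nbest) or 1.0
            for hyp, conf in nbest:
                intent = hypothesis_to_intent.get(hyp, "unknown")
                belief[intent] += conf / total        # normalized confidence contribution
        return max(belief.items(), key=lambda kv: kv[1])

    # Toy example: two N-best lists with (hypothesis, confidence-score) pairs.
    current = [("pay my bill", 0.6), ("play my bill", 0.3), ("say my bill", 0.1)]
    similar = [("pay the bill", 0.7), ("play bill", 0.3)]
    mapping = {"pay my bill": "make_payment", "pay the bill": "make_payment",
               "play my bill": "play_media", "play bill": "play_media",
               "say my bill": "unknown"}
    print(select_intention([current, similar], mapping))   # ('make_payment', 1.3)
    ```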
  • Patent number: 9009039
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Patent number: 8972254
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: March 3, 2015
    Assignee: Utah State University
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20140365221
    Abstract: A computer-implemented method performed by a computerized device, a computerized apparatus and a computer program product for recognizing speech, the method comprising: receiving a signal; extracting audio features from the signal; performing acoustic level processing on the audio features; receiving additional data; extracting additional features from the additional data; fusing the audio features and the additional features into a unified structure; receiving a Hidden Markov Model (HMM); and performing a quantum search over the features using the HMM and the unified structure.
    Type: Application
    Filed: August 27, 2014
    Publication date: December 11, 2014
    Inventor: Yossef Ben-Ezra
  • Publication number: 20140324426
    Abstract: The present invention, pertaining to the field of speech recognition, discloses a reminder setting method and apparatus. The method includes: acquiring speech signals; acquiring time information in speech signals by using keyword recognition, and determining reminder time for reminder setting according to the time information; acquiring text sequence corresponding to the speech signals by using continuous speech recognition, and determining reminder content for reminder setting according to the time information and the text sequence; and setting a reminder according to the reminder time and the reminder content.
    Type: Application
    Filed: May 28, 2013
    Publication date: October 30, 2014
    Inventors: Li LU, Feng RAO, Song LIU, Zongyao TANG, Xiang ZHANG, Shuai YUE, Bo CHEN
  • Patent number: 8793124
    Abstract: A scheme to judge emphasized speech portions, wherein the judgment is executed by statistical processing in terms of a set of speech parameters including a fundamental frequency, power, and a temporal variation of a dynamic measure and/or their derivatives. The emphasized speech portions are used as clues to summarize an audio content or a video content with speech.
    Type: Grant
    Filed: April 5, 2006
    Date of Patent: July 29, 2014
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Kota Hidaka, Shinya Nakajima, Osamu Mizuno, Hidetaka Kuwano, Haruhiko Kojima
  • Patent number: 8774261
    Abstract: A two-stage interference cancellation (IC) process includes a linear IC stage that suppresses co-channel interference (CCI) and adjacent channel interference (ACI). The linear IC stage disambiguates otherwise super-trellis data for non-linear cancellation. Soft linear IC processing is driven by a-posteriori probability (Apop) information. A second stage performs expectation maximization/Baum-Welch (EM-BW) processing that reduces residual ISI left over from the first stage and also generates the Apop which drives the soft linear IC in an iterative manner.
    Type: Grant
    Filed: June 11, 2012
    Date of Patent: July 8, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Farrokh Abrishamkar, Divaydeep Sikri, Ken Delgado
  • Publication number: 20140180693
    Abstract: Embodiments of the present invention include an acoustic processing device, a method for acoustic signal processing, and a speech recognition system. The acoustic processing device can include a processing unit, a histogram pruning unit, and a pre-pruning unit. The processing unit is configured to calculate one or more Hidden Markov Model (HMM) pruning thresholds. The histogram pruning unit is configured to prune one or more HMM states to generate one or more active HMM states. The pruning is based on the one or more pruning thresholds. The pre-pruning unit is configured to prune the one or more active HMM states based on an adjustable pre-pruning threshold. Further, the adjustable pre-pruning threshold is based on the one or more pruning thresholds.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Spansion LLC
    Inventor: Ojas Ashok BAPAT
  • Publication number: 20140180694
    Abstract: Embodiments of the present invention include an acoustic processing device and a method for traversing a Hidden Markov Model (HMM). The acoustic processing device can include a senone scoring unit (SSU), a memory device, an HMM module, and an interface module. The SSU is configured to receive feature vectors from an external computing device and to calculate senone scores. The memory device is configured to store the senone scores and HMM information, where the HMM information includes HMM IDs and HMM state scores. The HMM module is configured to traverse the HMM based on the senone scores and the HMM information. Further, the interface module is configured to transfer one or more HMM scoring requests from the external computing device to the HMM module and to transfer the HMM state scores to the external computing device.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Applicant: Spansion LLC
    Inventors: Richard M. FASTOW, Ojas A. Bapat, Jens Olson
  • Patent number: 8719023
    Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
    Type: Grant
    Filed: May 21, 2010
    Date of Patent: May 6, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Xavier Menendez-Pidal, Ruxin Chen
  • Patent number: 8700403
    Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 15, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Lin Zhao
  • Patent number: 8700400
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
  • Patent number: 8639510
    Abstract: A hardware acoustic scoring unit for a speech recognition system and a method of operation thereof are provided. Rather than scoring all senones in an acoustic model used for the speech recognition system, acoustic scoring logic first scores a set of ciphones based on acoustic features for one frame of sampled speech. The acoustic scoring logic then scores senones associated with the N highest scored ciphones. In one embodiment, the number (N) is three. While the acoustic scoring logic scores the senones associated with the N highest scored ciphones, high score ciphone identification logic operates in parallel with the acoustic scoring unit to identify one or more additional ciphones that have scores greater than a threshold. Once the acoustic scoring unit finishes scoring the senones for the N highest scored ciphones, the acoustic scoring unit then scores senones associated with the one or more additional ciphones. An illustrative sketch of the two-stage scoring follows this entry.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: January 28, 2014
    Inventors: Kai Yu, Rob A. Rutenbar
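    The two-pass idea, score every context-independent phone first and then spend senone scoring only on the top-N phones plus any other phone above a threshold, can be sketched as follows. The scoring functions and the phone-to-senone mapping are placeholders, not the hardware design.
    ```python
    import numpy as np

    def score_frame(features, ci_phones, phone_to_senones,
                    score_ci, score_senone, n_best=3, extra_threshold=0.0):
        """Two-stage acoustic scoring for one frame:
        1) score every context-independent (CI) phone;
        2) score senones only for the N best CI phones and for any additional
           CI phone whose score exceeds a threshold."""
        ci_scores = {p: score_ci(p, features) for p in ci_phones}
        ranked = sorted(ci_scores, key=ci_scores.get, reverse=True)
        selected = set(ranked[:n_best])
        selected |= {p for p in ranked[n_best:] if ci_scores[p] > extra_threshold}

        senone_scores = {}
        for p in selected:
            for s in phone_to_senones[p]:
                senone_scores[s] = score_senone(s, features)
        return ci_scores, senone_scores

    # Toy setup with random stand-in scorers (illustrative only).
    rng = np.random.default_rng(4)
    phones = [f"p{i}" for i in range(8)]
    p2s = {p: [f"{p}_s{j}" for j in range(3)] for p in phones}
    feat = rng.standard_normal(13)
    ci_fn = lambda p, x: float(rng.normal())       # stand-in CI-phone scorer
    sen_fn = lambda s, x: float(rng.normal())      # stand-in senone scorer
    ci, sen = score_frame(feat, phones, p2s, ci_fn, sen_fn)
    print(len(ci), len(sen))
    ```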
  • Patent number: 8635067
    Abstract: Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components Nc, less than N, to be used in a restructured acoustic model derived from the reference acoustic model, is identified. The desired number of components Nc is selected based on a computing environment in which the restructured acoustic model is to be deployed. The restructured acoustic model also has L states. For each given one of the L mixture models in the reference acoustic model, a merge sequence is built which records, for a given cost function, sequential mergers of pairs of the components associated with the given one of the mixture models. A portion of the Nc components is assigned to each of the L states in the restructured acoustic model. An illustrative sketch of pairwise component merging follows this entry.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen
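    Building a merge sequence comes down to two ingredients: a moment-matched merge of two weighted Gaussian components and a cost for each candidate merge. The sketch below uses a likelihood/entropy-based cost, which is one common choice and an assumption here since the abstract leaves the cost function open, and applies it greedily to diagonal-covariance components of a single mixture.
    ```python
    import numpy as np

    def merge_pair(w1, m1, v1, w2, m2, v2):
        """Moment-matched merge of two weighted diagonal Gaussians."""
        w = w1 + w2
        m = (w1 * m1 + w2 * m2) / w
        v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
        return w, m, v

    def merge_cost(w1, m1, v1, w2, m2, v2):
        """Entropy-based cost of merging (an assumed cost function)."""
        w, _, v = merge_pair(w1, m1, v1, w2, m2, v2)
        return 0.5 * (w * np.sum(np.log(v))
                      - w1 * np.sum(np.log(v1)) - w2 * np.sum(np.log(v2)))

    def build_merge_sequence(comps, n_target):
        """Greedily record the cheapest pairwise mergers until n_target components remain."""
        comps = list(comps)
        sequence = []
        while len(comps) > n_target:
            best = None
            for i in range(len(comps)):
                for j in range(i + 1, len(comps)):
                    c = merge_cost(*comps[i], *comps[j])
                    if best is None or c < best[0]:
                        best = (c, i, j)
            c, i, j = best
            sequence.append((i, j, c))
            merged = merge_pair(*comps[i], *comps[j])
            comps = [x for k, x in enumerate(comps) if k not in (i, j)] + [merged]
        return comps, sequence

    rng = np.random.default_rng(5)
    mixture = [(1.0 / 6, rng.standard_normal(4), rng.uniform(0.5, 2.0, 4)) for _ in range(6)]
    reduced, seq = build_merge_sequence(mixture, n_target=3)
    print(len(reduced), [round(c, 3) for _, _, c in seq])
    ```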
  • Patent number: 8600749
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8504357
    Abstract: A related word presentation device includes a program information storage unit that stores program information of each program; and an information dividing unit that generates, for each of the attributes of the words included in the program information, at least one group which includes a reference word belonging to the attribute and a set of words which co-occur with the reference word in a program. A degree-of-relevance calculating unit stores attribute-based association dictionaries each of which indicates, for the corresponding attribute of words, (i) the words and (ii) the degrees of relevance between the words calculated based on the frequency of co-occurrence in each of groups. A search condition obtaining unit obtains the search word and the attribute; a substitute word obtaining unit selects substitute words from the attribute-based association dictionary for the obtained attribute; and an output unit presents the selected substitute word.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: August 6, 2013
    Assignee: Panasonic Corporation
    Inventors: Takashi Tsuzuki, Satoshi Matsuura, Kazutoyo Takata
  • Patent number: 8484035
    Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more of controllable parameters of input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: July 9, 2013
    Assignee: Massachusetts Institute of Technology
    Inventor: Alex Paul Pentland
  • Publication number: 20130151254
    Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
    Type: Application
    Filed: January 31, 2013
    Publication date: June 13, 2013
    Applicant: BROADCOM CORPORATION
    Inventor: Nambirajan Seshadri
  • Publication number: 20130132085
    Abstract: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.
    Type: Application
    Filed: February 21, 2011
    Publication date: May 23, 2013
    Inventors: Gautham J. Mysore, Paris Smaragdis
  • Patent number: 8392190
    Abstract: Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory.
    Type: Grant
    Filed: December 1, 2009
    Date of Patent: March 5, 2013
    Assignee: Educational Testing Service
    Inventors: Lei Chen, Klaus Zechner, Xiaoming Xi
  • Patent number: 8364487
    Abstract: A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input.
    Type: Grant
    Filed: October 21, 2008
    Date of Patent: January 29, 2013
    Assignee: Microsoft Corporation
    Inventors: Yun-Cheng Ju, Julian J. Odell
  • Patent number: 8352265
    Abstract: A hardware-implemented backend search stage, or engine, for a speech recognition system is provided. In one embodiment, the backend search engine includes a number of pipelined stages including a fetch stage, an updating stage which may be a Viterbi stage, a transition and prune stage, and a language model stage. Each active triphone of each active word is represented by a corresponding triphone model. By being pipelined, the stages of the backend search engine are enabled to simultaneously process different triphone models, thereby providing high-rate backend searching for the speech recognition system. In one embodiment, caches may be used to cache frequently and/or recently accessed triphone information utilized by the fetch stage, frequently and/or recently accessed triphone-to-senone mappings utilized by the updating stage, or both. An illustrative sketch of the per-frame Viterbi update follows this entry.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: January 8, 2013
    Inventors: Edward Lin, Rob A. Rutenbar
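    The updating (Viterbi) stage amounts to a per-frame dynamic-programming step over each active triphone's HMM states. The sketch below shows that inner update for one left-to-right three-state triphone model; the fetch, transition-and-prune, and language-model stages, and all caching, are omitted.
    ```python
    import numpy as np

    NEG_INF = -np.inf

    def viterbi_update(state_scores, transition_logp, senone_logp):
        """One Viterbi time step for a left-to-right triphone HMM.

        state_scores:    current best log-scores per state (length S).
        transition_logp: (S, S) log transition probabilities.
        senone_logp:     per-state senone log-likelihoods for this frame.
        Returns the updated per-state scores.
        """
        S = len(state_scores)
        new_scores = np.full(S, NEG_INF)
        for j in range(S):
            best_prev = np.max(state_scores + transition_logp[:, j])
            new_scores[j] = best_prev + senone_logp[j]
        return new_scores

    # Toy 3-state left-to-right triphone model.
    trans = np.log(np.array([[0.6, 0.4, 0.0],
                             [0.0, 0.7, 0.3],
                             [0.0, 0.0, 1.0]]) + 1e-12)
    scores = np.array([0.0, NEG_INF, NEG_INF])     # start in state 0
    for frame_logp in np.random.default_rng(6).uniform(-3, -1, size=(5, 3)):
        scores = viterbi_update(scores, trans, frame_logp)
    print(np.round(scores, 2))
    ```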
  • Publication number: 20130006631
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 3, 2013
    Applicant: UTAH STATE UNIVERSITY
    Inventors: Jacob Gunther, Todd Moon
  • Publication number: 20120330664
    Abstract: The present invention relates to a method and apparatus for computing Gaussian likelihoods. One embodiment of a method for processing a speech signal includes generating a feature vector for each frame of the speech signal, evaluating the feature vector in accordance with a hierarchical Gaussian shortlist, and producing a hypothesis regarding a content of the speech signal, based on the evaluating. An illustrative sketch of a Gaussian shortlist follows this entry.
    Type: Application
    Filed: June 24, 2011
    Publication date: December 27, 2012
    Inventors: XIN LEI, JING ZHENG
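    The shortlist idea is to evaluate a small set of coarse cluster Gaussians first and compute exact likelihoods only for the detailed Gaussians attached to the best-matching clusters. Below is a two-level sketch with diagonal Gaussians; the hierarchy depth, the clustering, and the shortlist size are assumptions.
    ```python
    import numpy as np

    def diag_gauss_logpdf(x, mean, var):
        """Log-density of a diagonal-covariance Gaussian."""
        return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

    def shortlist_loglike(x, coarse, detailed, top_k=2):
        """Two-level Gaussian shortlist evaluation.

        coarse:   list of (mean, var) cluster Gaussians.
        detailed: dict mapping cluster index -> list of (weight, mean, var) members.
        Only members of the top_k best-scoring clusters are evaluated exactly.
        """
        coarse_scores = [diag_gauss_logpdf(x, m, v) for m, v in coarse]
        best_clusters = np.argsort(coarse_scores)[-top_k:]
        log_terms = []
        for c in best_clusters:
            for w, m, v in detailed[c]:
                log_terms.append(np.log(w) + diag_gauss_logpdf(x, m, v))
        return float(np.logaddexp.reduce(log_terms))   # approximate frame log-likelihood

    rng = np.random.default_rng(7)
    D, C, M = 13, 4, 8
    coarse = [(rng.standard_normal(D), rng.uniform(0.5, 2.0, D)) for _ in range(C)]
    detailed = {c: [(1.0 / M, coarse[c][0] + 0.1 * rng.standard_normal(D),
                     rng.uniform(0.5, 2.0, D)) for _ in range(M)] for c in range(C)}
    x = rng.standard_normal(D)
    print(round(shortlist_loglike(x, coarse, detailed), 2))
    ```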
  • Patent number: 8335332
    Abstract: A method is provided for operating a hearing aid in a hearing aid system where the hearing aid is continuously learnable for the particular user. A sound environment classification system is provided for tracking and defining sound environment classes relevant to the user. In an ongoing learning process, the classes are redefined based on new environments to which the hearing aid is subjected by the user.
    Type: Grant
    Filed: June 23, 2008
    Date of Patent: December 18, 2012
    Assignees: Siemens Audiologische Technik GmbH, University Of Ottawa
    Inventors: Tyseer Aboulnasr, Eghart Fischer, Christian Giguère, Wail Gueaieb, Volkmar Hamacher, Luc Lamarche
  • Patent number: 8315857
    Abstract: Systems and methods for modification of an audio input signal are provided. In exemplary embodiments, an adaptive multiple-model optimizer is configured to generate at least one source model parameter for facilitating modification of an analyzed signal. The adaptive multiple-model optimizer comprises a segment grouping engine and a source grouping engine. The segment grouping engine is configured to group simultaneous feature segments to generate at least one segment model. The at least one segment model is used by the source grouping engine to generate at least one source model, which comprises the at least one source model parameter. Control signals for modification of the analyzed signal may then be generated based on the at least one source model parameter.
    Type: Grant
    Filed: May 30, 2006
    Date of Patent: November 20, 2012
    Assignee: Audience, Inc.
    Inventors: David Klein, Stephen Malinowski, Lloyd Watts, Bernard Mont-Reynaud
  • Patent number: 8307459
    Abstract: A botnet detection system is provided. A bursty feature extractor receives an Internet Relay Chat (IRC) packet value from a detection object network, and determines a bursty feature accordingly. A Hybrid Hidden Markov Model (HHMM) parameter estimator determines probability parameters for a Hybrid Hidden Markov Model according to the bursty feature. A traffic profile generator establishes a probability sequential model for the Hybrid Hidden Markov Model according to the probability parameters and pre-defined network traffic categories. A dubious state detector determines a traffic state corresponding to a network relaying the IRC packet in response to reception of a new IRC packet, determines whether the IRC packet flow of the object network is dubious by applying the bursty feature to the probability sequential model for the Hybrid Hidden Markov Model, and generates a warning signal when the IRC packet flow is regarded as having a dubious traffic state.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: November 6, 2012
    Assignee: National Taiwan University of Science and Technology
    Inventors: Hahn-Ming Lee, Ching-Hao Mao, Yu-Jie Chen, Yi-Hsun Wang, Jerome Yeh, Tsu-Han Chen
  • Patent number: 8301449
    Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time. An illustrative sketch of the generic growth-transformation update follows this entry.
    Type: Grant
    Filed: October 16, 2006
    Date of Patent: October 30, 2012
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Li Deng
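    Updates of this family usually take the growth-transformation (extended Baum-Welch) form, combining numerator (reference) occupancies, weighted denominator (competitor) occupancies, and a smoothing constant D. The one-dimensional sketch below shows only that generic form; the patent's specific handling of weighted N-best lists and word-bound lattice spans is not reproduced.
    ```python
    import numpy as np

    def gt_update_gaussian(mu, var, gamma_num, gamma_den, x, D):
        """Growth-transformation (extended Baum-Welch style) update of one Gaussian.

        gamma_num / gamma_den: per-frame occupancies from the reference transcription
        and from the weighted competitor word sequences (N-best or lattice).
        D: positive smoothing constant, chosen large enough to keep variances positive.
        """
        dg = gamma_num - gamma_den                      # signed occupancy per frame
        denom = dg.sum() + D
        new_mu = (dg @ x + D * mu) / denom
        new_var = (dg @ (x ** 2) + D * (var + mu ** 2)) / denom - new_mu ** 2
        return new_mu, np.maximum(new_var, 1e-6)        # floor keeps the variance positive

    rng = np.random.default_rng(8)
    T = 100
    x = rng.standard_normal(T) * 1.5 + 0.3              # 1-D features for simplicity
    g_num = rng.uniform(0, 1, T)                        # toy reference occupancies
    g_den = rng.uniform(0, 1, T) * 0.5                  # toy competitor occupancies
    print(gt_update_gaussian(0.0, 1.0, g_num, g_den, x, D=50.0))
    ```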
  • Patent number: 8265930
    Abstract: The present invention relates to recording voice data using a voice communication device connected to a communication network and converting the voice data into a text file for delivery to a text communication device. In accordance with the present invention, the voice communication device may transfer the voice data in real-time or store the voice data on the device to be transmitted at a later time. Transcribing the voice data into a text file may be accomplished by automated computer software, either speaker-independent or speaker-dependent or by a human who transcribes the voice data into a text file. After transcribing the voice data into a text file, the text file may be delivered to a text communication device in a number of ways, such as email, file transfer protocol (FTP), or hypertext transfer protocol (HTTP).
    Type: Grant
    Filed: April 13, 2005
    Date of Patent: September 11, 2012
    Assignee: Sprint Communications Company L.P.
    Inventors: Bryce A. Jones, Raymond Edward Dickensheets
  • Patent number: 8234116
    Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMMs) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state-pair-by-state-pair comparisons. An illustrative sketch of the unscented KLD approximation follows this entry.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
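    The unscented approximation replaces the intractable expectation in the KLD between Gaussian mixtures with an average of log-density ratios evaluated at sigma points drawn around each component of the first mixture. The sketch below implements that approximation for diagonal-covariance mixtures; the sigma-point scaling is a simplifying assumption, and the state-matching dynamic programming is omitted.
    ```python
    import numpy as np

    def gmm_logpdf(x, weights, means, variances):
        """Log-density of a diagonal-covariance Gaussian mixture at point x."""
        log_comp = [np.log(w) - 0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
                    for w, m, v in zip(weights, means, variances)]
        return np.logaddexp.reduce(log_comp)

    def unscented_kld(f, g):
        """Approximate KL(f || g) between two diagonal GMMs with the unscented transform:
        for each component of f, evaluate log f - log g at 2D sigma points and average."""
        wf, mf, vf = f
        D = mf.shape[1]
        kld = 0.0
        for w, m, v in zip(wf, mf, vf):
            offsets = np.sqrt(D * v)                 # sigma-point spread per dimension
            sigma_pts = np.concatenate([m + np.diag(offsets), m - np.diag(offsets)])
            vals = [gmm_logpdf(x, *f) - gmm_logpdf(x, *g) for x in sigma_pts]
            kld += w * np.mean(vals)
        return float(kld)

    rng = np.random.default_rng(9)
    def random_gmm(k, d):
        w = np.ones(k) / k
        return w, rng.standard_normal((k, d)), rng.uniform(0.5, 2.0, (k, d))

    f, g = random_gmm(3, 4), random_gmm(3, 4)
    print(round(unscented_kld(f, g), 3))             # approximation; may be slightly negative
    ```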