Duration Modeling in HMM, e.g., Semi-HMM, Segmental Models, Transition Probabilities (EPO) Patents (Class 704/256.4)
  • Patent number: 12249319
    Abstract: Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant without requiring a user to explicitly designate a language for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
    Type: Grant
    Filed: November 13, 2023
    Date of Patent: March 11, 2025
    Assignee: GOOGLE LLC
    Inventors: Pu-Sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
  • Patent number: 12073821
    Abstract: A system capable of speech gap modulation is configured to: receive at least one composite speech portion, which comprises at least one speech portion and at least one dynamic-gap portion, wherein the speech portion(s) comprise at least one variable-value speech portion and the dynamic-gap portion(s) are associated with a pause in speech; receive at least one synchronization point, wherein each synchronization point associates a point in time in the composite speech portion(s) with a point in time in other media portion(s); and modulate the dynamic-gap portion(s), based at least partially on the variable-value speech portion(s) and on the synchronization point(s), thereby generating at least one modulated composite speech portion. This facilitates improved synchronization of the modulated composite speech portion(s) and the other media portion(s) at the synchronization point(s) when combining the other media portion(s) and the audio-format modulated composite speech portion(s) into a synchronized multimedia output.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: August 27, 2024
    Assignee: IGENTIFY LTD.
    Inventors: Zohar Sherman, Ori Inbar
  • Patent number: 11900943
    Abstract: A method of zoning a transcription of audio data includes separating the transcription of audio data into a plurality of utterances. A probability that each word in an utterance is a meaning unit boundary is calculated. The utterance is split into two new utterances at the word with the maximum calculated probability. At least one of the two new utterances that is shorter than a maximum utterance threshold is identified as a meaning unit.
    Type: Grant
    Filed: January 3, 2022
    Date of Patent: February 13, 2024
    Assignee: Verint Systems Ltd.
    Inventors: Roni Romano, Yair Horesh, Jeremie Dreyfuss
  • Patent number: 11817085
    Abstract: Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant without requiring a user to explicitly designate a language for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
    Type: Grant
    Filed: December 14, 2020
    Date of Patent: November 14, 2023
    Assignee: GOOGLE LLC
    Inventors: Pu-Sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
  • Patent number: 11817084
    Abstract: The present disclosure relates generally to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. The system can enable multilingual interaction with the automated assistant without requiring a user to explicitly designate a language for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: November 14, 2023
    Assignee: GOOGLE LLC
    Inventors: Pu-sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
  • Patent number: 11715464
    Abstract: Systems and processes for creating and updating natural language models are provided. An example process of creating a natural language model includes, at an electronic device with one or more processors and memory, receiving an utterance, associating an action structure with the utterance, determining a plurality of augmented utterances based on the received utterance, creating a natural language model including the received utterance and the plurality of augmented utterances by mapping the plurality of augmented utterances to the associated action structure, and providing the natural language model including the received utterance and the plurality of augmented utterances to a second electronic device.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: August 1, 2023
    Assignee: Apple Inc.
    Inventors: Thomas Robert Nickson, Keith Scott Brisson, Eric Gregory, Thomas B. Gunter, Arthur A. Van Hoff
  • Patent number: 11670289
    Abstract: Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
    Type: Grant
    Filed: December 18, 2020
    Date of Patent: June 6, 2023
    Assignee: Apple Inc.
    Inventors: Thomas R. Gruber, Harry J. Saddler, Jerome Rene Bellegarda, Bryce H. Nyeggen, Alessandro Sabatelli
  • Patent number: 11574641
    Abstract: A processor-implemented method with data recognition includes: extracting input feature data from input data; calculating a matching score between the extracted input feature data and enrolled feature data of an enrolled user, based on the extracted input feature data, common component data of a plurality of enrolled feature data corresponding to the enrolled user, and distribution component data of the plurality of enrolled feature data corresponding to the enrolled user; and recognizing the input data based on the matching score.
    Type: Grant
    Filed: April 10, 2020
    Date of Patent: February 7, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sung-Un Park, Kyuhong Kim
  • Patent number: 11423904
    Abstract: A computer-implemented method of false keyphrase rejection comprises receiving a captured audio signal of human speech including one or more keyphrases that trigger an action. It also comprises detecting whether or not at least part of the speech is spoken by at least one computer originated voice. The method also includes omitting the triggering of the action at least partly due to the computer originated voice being recognized in the speech.
    Type: Grant
    Filed: November 9, 2020
    Date of Patent: August 23, 2022
    Assignee: Intel Corporation
    Inventors: Jacek Ossowski, Tobias Bocklet, Kuba Lopatka
  • Patent number: 11417314
    Abstract: A speech synthesis method, a speech synthesis device, and an electronic apparatus are provided, which relate to the field of speech synthesis. A specific implementation is as follows: inputting text information into an encoder of an acoustic model to output a text feature of a current time step; splicing the text feature of the current time step with a spectral feature of a previous time step to obtain a spliced feature of the current time step, and inputting the spliced feature of the current time step into a decoder of the acoustic model to obtain a spectral feature of the current time step; and inputting the spectral feature of the current time step into a neural network vocoder to output speech.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: August 16, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Chenxi Sun, Tao Sun, Xiaolin Zhu, Wenfu Wang
  • Patent number: 11182431
    Abstract: Systems and methods for voice searching media content based on metadata or subtitles are provided. Metadata associated with media content can be pre-processed at a media server. Upon receiving a vocal command representative of a search for an aspect of the media content, the media server performs a search for one or more portions of the media content relevant to the aspect of the media content being searched for. The media server performs the search by matching the aspect of the media content being searched for with the pre-processed metadata.
    Type: Grant
    Filed: December 11, 2014
    Date of Patent: November 23, 2021
    Assignee: Disney Enterprises, Inc.
    Inventors: Jing X. Wang, Mark Arana, Edward Drake, Alexander C. Chen
  • Patent number: 10971140
    Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.
    Type: Grant
    Filed: February 4, 2019
    Date of Patent: April 6, 2021
    Assignee: Zentian Limited
    Inventor: Mark Catchpole
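The lexical tree structure this abstract describes can be illustrated with a minimal sketch (an illustration only, not the patented circuit; letters stand in for phone labels): words sharing a common prefix share the same initial nodes, so each prefix component is stored, and can be scored, once.

```python
class LexicalTreeNode:
    """One node of a lexical tree; all words passing through it share a prefix."""
    def __init__(self):
        self.children = {}   # symbol (standing in for a phone label) -> child node
        self.word = None     # set when a complete word ends at this node

def build_lexical_tree(words):
    """Insert each word symbol-by-symbol, reusing nodes for shared prefixes."""
    root = LexicalTreeNode()
    for word in words:
        node = root
        for symbol in word:
            node = node.children.setdefault(symbol, LexicalTreeNode())
        node.word = word
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())
```

For the vocabulary ["cat", "car", "card"] the tree holds only 6 nodes (root, c, a, t, r, d), whereas storing each word as its own chain would need 11; the shared "ca" prefix is modeled once, which is what makes parallel per-tree processing attractive.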
  • Patent number: 10217456
    Abstract: A method and system for generating training data for a target domain using speech data of a source domain. The training data generation method includes: reading out a Gaussian mixture model (GMM) of a target domain trained with a clean speech data set of the target domain; mapping, by referring to the GMM of the target domain, a set of source domain speech data received as an input to a set of target domain speech data on the basis of a channel characteristic of the target domain speech data; and adding a noise of the target domain to the mapped set of source domain speech data to output a set of pseudo target domain speech data.
    Type: Grant
    Filed: April 14, 2014
    Date of Patent: February 26, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Osamu Ichikawa, Steven J Rennie
  • Patent number: 10192116
    Abstract: The disclosure relates to recognizing data such as items or entities in content. In some aspects, content may be received and feature information, such as face recognition data and voice recognition data may be generated. Scene segmentation may also be performed on the content, grouping the various shots of the video content into one or more shot collections, such as scenes. For example, a decision lattice representative of possible scene segmentations may be determined and the most probable path through the decision lattice may be selected as the scene segmentation. Upon generating the feature information and performing the scene segmentation, one or more items or entities that are present in the scene may be identified.
    Type: Grant
    Filed: May 27, 2016
    Date of Patent: January 29, 2019
    Assignee: Comcast Cable Communications, LLC
    Inventors: Jan Neumann, Evelyne Tzoukermann, Amit Bagga, Oliver Jojic, Bageshree Shevade, David F. Houghton, Corey Farrell
  • Patent number: 9959870
    Abstract: A system and method of speech recognition involving a mobile device. Speech input is received (202) on a mobile device (102) and converted (204) to a set of phonetic symbols. Data relating to the phonetic symbols is transferred (206) from the mobile device over a communications network (104) to a remote processing device (106) where it is used (208) to identify at least one matching data item from a set of data items (114). Data relating to the at least one matching data item is transferred (210) from the remote processing device to the mobile device and presented (214) thereon.
    Type: Grant
    Filed: December 10, 2009
    Date of Patent: May 1, 2018
    Assignee: Apple Inc.
    Inventors: Melvyn Hunt, John Bridle
  • Patent number: 9959260
    Abstract: The invention provides for a system, method, and computer readable medium storing instructions related to controlling a presentation in a multimodal system. The method embodiment of the invention is a method for the retrieval of information on the basis of its content for incorporation into an electronic presentation. The method comprises receiving from a user a content-based request for at least one segment from a first plurality of segments within a media presentation preprocessed to enable natural language content searchability; in response to the request, presenting a subset of the first plurality of segments to the user; receiving a selection indication from the user associated with at least one segment of the subset of the first plurality of segments and adding the selected at least one segment to a deck for use in a presentation.
    Type: Grant
    Filed: May 4, 2015
    Date of Patent: May 1, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Patrick Ehlen, David Crawford Gibbon, Mazin Gilbert, Michael Johnston, Zhu Liu, Behzad Shahraray
  • Patent number: 9785613
    Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), and a data bus that couples the processing unit and the APU. The APU includes a local, non-volatile memory that stores a plurality of senones, a memory buffer coupled to the memory, the acoustic processing unit being configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer, and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a corresponding score to the processing unit.
    Type: Grant
    Filed: June 6, 2012
    Date of Patent: October 10, 2017
    Assignee: Cypress Semiconductor Corporation
    Inventors: Venkataraman Natarajan, Stephan Rosner
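The per-dimension comparison the scoring unit performs in hardware can be sketched in software (a hypothetical illustration assuming diagonal covariance, which is the standard choice that makes each dimension an independent term and thus scorable in parallel):

```python
import math

def diag_gaussian_logpdf(frame, means, variances):
    """Log-likelihood of a frame vector under a diagonal-covariance Gaussian.
    Each dimension contributes an independent additive term, so all
    dimensions of a distribution vector can be evaluated simultaneously."""
    logp = 0.0
    for x, mu, var in zip(frame, means, variances):
        logp += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return logp
```

A senone score is then typically a (log-)sum over several such mixture components; the frame vector closest to a component's mean receives the highest score.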
  • Patent number: 9753890
    Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), and a data bus that couples the processing unit and the APU. The APU includes a local, non-volatile memory that stores a plurality of senones, a memory buffer coupled to the memory, the acoustic processing unit being configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer, and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a corresponding score to the processing unit.
    Type: Grant
    Filed: June 6, 2012
    Date of Patent: September 5, 2017
    Assignee: Cypress Semiconductor Corporation
    Inventors: Venkataraman Natarajan, Stephan Rosner
  • Patent number: 9378423
    Abstract: The disclosure relates to recognizing data such as items or entities in content. In some aspects, content may be received and feature information, such as face recognition data and voice recognition data may be generated. Scene segmentation may also be performed on the content, grouping the various shots of the video content into one or more shot collections, such as scenes. For example, a decision lattice representative of possible scene segmentations may be determined and the most probable path through the decision lattice may be selected as the scene segmentation. Upon generating the feature information and performing the scene segmentation, one or more items or entities that are present in the scene may be identified.
    Type: Grant
    Filed: September 3, 2014
    Date of Patent: June 28, 2016
    Assignee: Comcast Cable Communications, LLC
    Inventors: Jan Neumann, Evelyne Tzoukermann, Amit Bagga, Oliver Jojic, Bageshree Shevade, David F. Houghton, Corey Farrell
  • Patent number: 9286414
    Abstract: The subject disclosure relates to one or more computer-implemented processes for collecting, analyzing, and employing annotations of data sources. In particular, an annotation component is configured to receive annotations of data for a data source, wherein the respective annotations comprise different associations of global terms with the data of the data source, a data store is configured to store the annotations, and an interface component is configured to render the data based on the annotations in response to a request for the data. In an aspect, the data store also stores descriptions of the data sources and definitions of the global terms, and the interface component determines a subset of the information in the data store based on the annotations. A method is further provided comprising receiving a global term and determining data sources that have the global term associated with the data thereof based on the information in the data store.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: March 15, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Alex James, Michael Pizzo, Pablo Castro, Michael Justin Flasko, Lance Olson, Jason Clark, Siddharth Jayadevan
  • Patent number: 9230548
    Abstract: Embodiments of the present invention include a data storage device and a method for storing data in a hash table. The data storage device can include a first memory device, a second memory device, and a processing device. The first memory device is configured to store one or more data elements. The second memory device is configured to store one or more status bits at one or more respective table indices. In addition, each of the table indices is mapped to a corresponding table index in the first memory device. The processing device is configured to calculate one or more hash values based on the one or more data elements.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: January 5, 2016
    Assignee: Cypress Semiconductor Corporation
    Inventors: Richard M. Fastow, Ojas A. Bapat
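The two-memory arrangement above can be sketched as follows (an illustrative model, not the patented device; names are hypothetical): a small, fast array of status bits answers "is this slot occupied?" so that lookups probing an empty slot never need to touch the larger data memory.

```python
class TwoTierHashTable:
    """Open-addressing hash table with status bits kept in a separate,
    notionally faster memory than the data elements themselves."""
    def __init__(self, size=16):
        self.size = size
        self.status = [False] * size  # "second memory": one status bit per index
        self.data = [None] * size     # "first memory": the data elements

    def _index(self, key, probe):
        return (hash(key) + probe) % self.size  # linear probing

    def insert(self, key):
        for probe in range(self.size):
            i = self._index(key, probe)
            if not self.status[i] or self.data[i] == key:
                self.status[i] = True
                self.data[i] = key
                return i
        raise RuntimeError("table full")

    def contains(self, key):
        for probe in range(self.size):
            i = self._index(key, probe)
            if not self.status[i]:
                return False  # status bit says empty: no data-memory read needed
            if self.data[i] == key:
                return True
        return False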
  • Patent number: 9208778
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Grant
    Filed: November 10, 2014
    Date of Patent: December 8, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
  • Patent number: 9009039
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying "pseudo-clean" model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Patent number: 8959014
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models. Speech data and data identifying a transcription for the speech data are received. A phonetic representation for the transcription is accessed. Training sequences are identified for a particular phone in the phonetic representation. Each of the training sequences includes a different set of contextual phones surrounding the particular phone. A partitioning key is identified based on a sequence of phones that occurs in each of the training sequences. A processing module to which the identified partitioning key is assigned is selected. Data identifying the training sequences and a portion of the speech data are transmitted to the selected processing module.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: February 17, 2015
    Assignee: Google Inc.
    Inventors: Peng Xu, Fernando Pereira, Ciprian I. Chelba
  • Patent number: 8935170
    Abstract: A speech recognition system, according to an example embodiment, includes a data storage to store speech training data. A training engine determines consecutive breakout periods in the speech training data, calculates forward and backward probabilities for the breakout periods, and generates a speech recognition Hidden Markov Model (HMM) from the forward and backward probabilities calculated for the breakout periods.
    Type: Grant
    Filed: November 27, 2012
    Date of Patent: January 13, 2015
    Assignee: Longsand Limited
    Inventor: Maha Kadirkamanathan
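The forward and backward probabilities this abstract mentions are the standard alpha/beta recursions of HMM (Baum-Welch) training. A minimal discrete-emission sketch (illustrative only; it does not model the breakout-period partitioning of the patent):

```python
def forward_backward(obs, init, trans, emit):
    """Compute alpha[t][s] and beta[t][s] for a discrete-emission HMM.
    init[s]: P(state s at t=0); trans[i][j]: P(j|i); emit[s][o]: P(o|s)."""
    n, T = len(init), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]
    # Forward pass: probability of the prefix ending in state s at time t.
    for s in range(n):
        alpha[0][s] = init[s] * emit[s][obs[0]]
    for t in range(1, T):
        for s in range(n):
            alpha[t][s] = emit[s][obs[t]] * sum(
                alpha[t - 1][i] * trans[i][s] for i in range(n))
    # Backward pass: probability of the suffix given state s at time t.
    for s in range(n):
        beta[T - 1][s] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(n):
            beta[t][s] = sum(
                trans[s][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n))
    return alpha, beta
```

A useful sanity check: at every time step, the sum over states of alpha times beta equals the total likelihood of the observation sequence.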
  • Patent number: 8868423
    Abstract: Systems and methods for controlling access to resources using spoken Completely Automatic Public Turing Tests To Tell Humans And Computers Apart (CAPTCHA) tests are disclosed. In these systems and methods, entities seeking access to resources are required to produce an input utterance that contains at least some audio. That utterance is compared with voice reference data for human and machine entities, and a determination is made as to whether the entity requesting access is a human or a machine. Access is then permitted or refused based on that determination.
    Type: Grant
    Filed: July 11, 2013
    Date of Patent: October 21, 2014
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8818802
    Abstract: A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: August 26, 2014
    Assignee: Spansion LLC
    Inventors: Richard Fastow, Qamrul Hasan
  • Patent number: 8725508
    Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
    Type: Grant
    Filed: March 27, 2012
    Date of Patent: May 13, 2014
    Assignee: Novospeech
    Inventor: Yossef Ben-Ezra
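The final searching step above can be sketched with a brute-force illustration, under the assumption that the common element subsequence is a contiguous run of elements (function names and the contiguity assumption are mine, not the patent's):

```python
def contains_run(seq, run):
    """True if `run` occurs as a contiguous slice of `seq`."""
    m = len(run)
    return any(seq[k:k + m] == run for k in range(len(seq) - m + 1))

def common_run(sequences, min_count):
    """Longest contiguous run from the first sequence that occurs in at
    least `min_count` of the given sequences (the first one included)."""
    first = sequences[0]
    best = []
    for i in range(len(first)):
        for j in range(len(first), i, -1):
            run = first[i:j]
            if len(run) <= len(best):
                break  # shorter runs starting here cannot beat the best
            if sum(1 for seq in sequences if contains_run(seq, run)) >= min_count:
                best = run
                break
    return best
```

With the first element sequence from the initial segment and the others from overlapping segments, requiring agreement across several decodings filters out elements that depend on an arbitrary segmentation boundary.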
  • Patent number: 8700403
    Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 15, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Lin Zhao
  • Patent number: 8543402
    Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
    Type: Grant
    Filed: April 29, 2011
    Date of Patent: September 24, 2013
    Assignee: The Intellisis Corporation
    Inventor: Jiyong Ma
  • Patent number: 8510111
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 13, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
  • Patent number: 8494854
    Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance using optimized challenge items selected for their discrimination capability to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: July 23, 2013
    Assignee: John Nicholas and Kristin Gross
    Inventor: John Nicholas Gross
  • Patent number: 8489399
    Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: July 16, 2013
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8442828
    Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: May 14, 2013
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
  • Patent number: 8374869
    Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
    Type: Grant
    Filed: August 4, 2009
    Date of Patent: February 12, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
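    The accept/reject test this abstract describes can be sketched as follows. The confidence formula (a frame-averaged log-likelihood ratio) and all threshold values are illustrative, not the patent's exact definitions:

```python
def confidence_score(phoneme_ll, anti_ll):
    """Frame-averaged log-likelihood ratio between the context-dependent
    phoneme model and the anti-phoneme model (an illustrative confidence
    measure, not the patent's exact formula)."""
    assert len(phoneme_ll) == len(anti_ll)
    return sum(p - a for p, a in zip(phoneme_ll, anti_ll)) / len(phoneme_ll)

def verify(conf, dist, conf_threshold, mean_dist):
    """Accept the N-best word only when the confidence score clears its
    threshold AND the inter-phoneme distance clears the mean distance."""
    return conf >= conf_threshold and dist >= mean_dist

conf = confidence_score([-1.0, -2.0], [-3.0, -5.0])
accepted = verify(conf, 1.0, conf_threshold=2.0, mean_dist=0.5)
```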
  • Patent number: 8301449
    Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
    Type: Grant
    Filed: October 16, 2006
    Date of Patent: October 30, 2012
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Li Deng
  • Patent number: 8229729
    Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
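    The co-occurrence-to-continuous-space step above can be sketched with a truncated SVD. The factorization choice and the toy corpus are illustrative; the patent does not commit to a specific dimensionality-reduction method:

```python
import numpy as np

# Toy aligned (source word, target word) pairs; real training data would
# come from word-aligned bilingual text.
pairs = [("chat", "cat"), ("chat", "cat"), ("chien", "dog"),
         ("chien", "dog"), ("chat", "dog")]
src_vocab = sorted({s for s, _ in pairs})   # ['chat', 'chien']
tgt_vocab = sorted({t for _, t in pairs})   # ['cat', 'dog']

# Source-versus-target co-occurrence count matrix defining word pairs.
C = np.zeros((len(src_vocab), len(tgt_vocab)))
for s, t in pairs:
    C[src_vocab.index(s), tgt_vocab.index(t)] += 1

# Reduce dimensionality with a truncated SVD, mapping each word into a
# continuous space where word pairs become real-valued vectors rather
# than discrete entities.
U, S, Vt = np.linalg.svd(C, full_matrices=False)
k = 1                                        # kept dimensions
src_vecs = U[:, :k] * S[:k]                  # one row per source word
tgt_vecs = Vt[:k, :].T * S[:k]               # one row per target word
```

    A parametric translation model would then be trained on these vectors with acoustic-model-style training, as the abstract states.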
  • Patent number: 8078462
    Abstract: A transformation-parameter calculating unit calculates a first model parameter of a speaker model that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. The transformation parameter transforms, for each speaker, the distribution of the clean feature corresponding to that speaker's identification information into the distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to the identification information of each speaker by using the transformation parameter, and calculates a second model parameter of the speaker model that maximizes a second likelihood for the transformed noisy feature.
    Type: Grant
    Filed: October 2, 2008
    Date of Patent: December 13, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Shinohara, Masami Akamine
  • Patent number: 7941317
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: May 10, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
  • Patent number: 7930181
    Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
    Type: Grant
    Filed: November 21, 2002
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
  • Patent number: 7895040
    Abstract: According to an embodiment, a voice recognition apparatus includes acoustic processing, voice interval detecting, dictionary, collating, search-target selecting, storing, and determining units, and the voice recognition method includes: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing the output probability of a certain transition path, and determining whether or not the output probability of a certain path is stored. The number of output-probability calculations is reduced by selecting the search range on the basis of the beam search, calculating the output probability of a certain transition path only once in the interval from when the standard frame is set to when it is renewed, and storing the calculated value for use as an approximation of the output probability in subsequent frames.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: February 22, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Shinichi Tanaka
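    The caching idea above (evaluate an output probability once per standard-frame interval, then reuse the stored value as an approximation until the standard frame is renewed) can be sketched as follows. The fixed renewal period and the single-Gaussian density are illustrative simplifications of the patent's scheme:

```python
import math

class OutputProbCache:
    """Caches per-state log output probabilities between standard-frame
    renewals, reusing the cached value as an approximation in subsequent
    frames (illustrative of the idea, not the patent's exact criteria)."""

    def __init__(self, renew_every):
        self.renew_every = renew_every
        self.cache = {}
        self.calls = 0   # counts actual density evaluations

    def log_output_prob(self, frame_index, state, x, mean, var):
        if frame_index % self.renew_every == 0:
            self.cache.clear()               # standard frame renewed
        if state not in self.cache:
            self.calls += 1                  # compute only once per interval
            self.cache[state] = -0.5 * (math.log(2 * math.pi * var)
                                        + (x - mean) ** 2 / var)
        return self.cache[state]

cache = OutputProbCache(renew_every=3)
vals = [cache.log_output_prob(t, "s1", 0.0, 0.0, 1.0) for t in range(6)]
# Six frames, but only two actual density evaluations (frames 0 and 3).
```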
  • Patent number: 7856356
    Abstract: A speech recognition system for a mobile terminal includes an acoustic variation channel unit and a pronunciation channel unit. The acoustic variation channel unit transforms a speech signal into feature parameters and Viterbi-decodes the speech signal to produce a varied phoneme sequence by using the feature parameters and predetermined models. Further, the pronunciation variation channel unit Viterbi-decodes the varied phoneme sequence to produce a word phoneme sequence by using the varied phoneme sequence and a preset DHMM (Discrete Hidden Markov Model) based context-dependent error model.
    Type: Grant
    Filed: December 20, 2006
    Date of Patent: December 21, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Hoon Chung, Yunkeun Lee
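    The Viterbi decoding that both channels rely on can be sketched with a minimal discrete-HMM decoder. The toy two-state model below is invented for illustration; the patent's phoneme and context-dependent error models are far larger:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Minimal Viterbi decoder: returns the most likely state sequence
    for an observation sequence under a discrete HMM."""
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: V[-1][p] + log_trans[p][s])
            col[s] = V[-1][best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            ptr[s] = best_prev
        V.append(col)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

states = ("a", "b")
log_start = {"a": math.log(0.9), "b": math.log(0.1)}
log_trans = {"a": {"a": math.log(0.9), "b": math.log(0.1)},
             "b": {"a": math.log(0.1), "b": math.log(0.9)}}
log_emit = {"a": {"x": math.log(0.9), "y": math.log(0.1)},
            "b": {"x": math.log(0.1), "y": math.log(0.9)}}
path = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
```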
  • Patent number: 7711561
    Abstract: The present invention relates to speech recognition systems, particularly speech-to-text systems and software and decoders for the same.
    Type: Grant
    Filed: April 15, 2004
    Date of Patent: May 4, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Wide Hogenhout, Kean Kheong Chin
  • Patent number: 7672847
    Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: March 2, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
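    The thresholding rule in this abstract is a clamp on the gradient adjustment of a mixture component's standard deviation; the threshold values below are illustrative, not the patent's:

```python
def clamp_gradient(grad, lower, upper):
    """Use the upper threshold when the calculated gradient adjustment
    exceeds it, the lower threshold when the adjustment falls below it,
    and the calculated adjustment itself otherwise."""
    if grad > upper:
        return upper
    if grad < lower:
        return lower
    return grad

def update_std(sigma, grad, lower=-0.05, upper=0.05):
    """Adjust a mixture component's standard deviation by the clamped
    gradient (threshold values here are illustrative)."""
    return sigma + clamp_gradient(grad, lower, upper)
```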
  • Patent number: 7664643
    Abstract: A method, and a system to execute this method is being presented for the identification and separation of sources of an acoustic signal, which signal contains a mixture of multiple simultaneous component signals. The method represents the signal with multiple discrete state-variable sequences and combines acoustic and context level dynamics to achieve the source separation. The method identifies sources by discovering those frames of the signal whose features are dominated by single sources. The signal may be the simultaneous speech of multiple speakers.
    Type: Grant
    Filed: August 25, 2006
    Date of Patent: February 16, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Ramesh Ambat Gopinath, John Randall Hershey, Trausti Thor Kristjansson, Peder Andreas Olsen, Steven John Rennie
  • Patent number: 7587321
    Abstract: According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.
    Type: Grant
    Filed: May 8, 2001
    Date of Patent: September 8, 2009
    Assignee: Intel Corporation
    Inventors: Xiaoxing Liu, Baosheng Yuan, Yonghong Yan
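    The data-dependent MAP step can be illustrated with the standard MAP mean update, which interpolates the context-independent prior mean with the sample mean of the tied state's training data. The prior weight `tau` is an illustrative parameter, and the patent's full scheme adapts more than means:

```python
def map_adapt_mean(prior_mean, data, tau=10.0):
    """MAP update of a tied state's Gaussian mean: a count-weighted
    interpolation between the context-independent prior mean and the
    sample mean of the state's training data."""
    n = len(data)
    sample_mean = sum(data) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)

# With 10 observations at 1.0 and prior mean 0.0, tau=10 gives an
# adapted mean halfway between prior and data.
adapted = map_adapt_mean(0.0, [1.0] * 10, tau=10.0)
```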
  • Patent number: 7454341
    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: November 18, 2008
    Assignee: Intel Corporation
    Inventors: Jielin Pan, Baosheng Yuan
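    The sub-vector split this abstract describes can be sketched as follows; the dimension grouping is illustrative, and variance vectors are split the same way:

```python
import numpy as np

def split_subvectors(means, widths):
    """Split an (N, D) matrix of Gaussian mean vectors into sub-vector
    sets, each holding a contiguous subset of the D dimensions."""
    out, start = [], 0
    for w in widths:
        out.append(means[:, start:start + w])
        start += w
    return out

# 4 Gaussians with 6-dimensional means, split into three 2-dim subsets.
means = np.arange(24, dtype=float).reshape(4, 6)
subsets = split_subvectors(means, [2, 2, 2])
# Each sub-vector set would then be clustered into its own codebook
# (the patent's modified K-means with dynamic merge/split is omitted).
```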
  • Patent number: 7454336
    Abstract: A system and method that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of segmental switching state space model that employs model parameters including those describing the unobserved speech dynamics and those describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector is provided. The model parameters are modified based, at least in part, upon, a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production, are provided. For example, modification of model parameters can be based upon an approximate mixture of Gaussian (MOG) posterior and/or based upon an approximate hidden Markov model (HMM) posterior using a variational technique.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: November 18, 2008
    Assignee: Microsoft Corporation
    Inventors: Hagai Attias, Li Deng, Leo J. Lee
  • Patent number: 7437288
    Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.
    Type: Grant
    Filed: March 11, 2002
    Date of Patent: October 14, 2008
    Assignee: NEC Corporation
    Inventor: Koichi Shinoda
  • Publication number: 20080243506
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
    Type: Application
    Filed: February 8, 2008
    Publication date: October 2, 2008
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka