Duration Modeling in HMM (e.g., Semi-HMM, Segmental Models, Transition Probabilities) (EPO) Patents (Class 704/256.4)
-
Patent number: 12249319
Abstract: Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without necessitating that a user explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
Type: Grant
Filed: November 13, 2023
Date of Patent: March 11, 2025
Assignee: GOOGLE LLC
Inventors: Pu-Sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
-
Patent number: 12073821
Abstract: A system capable of speech gap modulation is configured to: receive at least one composite speech portion, which comprises at least one speech portion and at least one dynamic-gap portion, wherein the speech portion(s) comprise at least one variable-value speech portion, and wherein the dynamic-gap portion(s) are associated with a pause in speech; receive at least one synchronization point, wherein the synchronization point(s) associate a point in time in the composite speech portion(s) with a point in time in other media portion(s); and modulate the dynamic-gap portion(s), based at least partially on the variable-value speech portion(s) and on the synchronization point(s), thereby generating at least one modulated composite speech portion. This facilitates improved synchronization of the modulated composite speech portion(s) and the other media portion(s) at the synchronization point(s), when combining the other media portion(s) and the audio-format modulated composite speech portion(s) into a synchronized multimedia output.
Type: Grant
Filed: January 30, 2020
Date of Patent: August 27, 2024
Assignee: IGENTIFY LTD.
Inventors: Zohar Sherman, Ori Inbar
-
Patent number: 11900943
Abstract: A method of zoning a transcription of audio data includes separating the transcription of audio data into a plurality of utterances. A probability that each word in an utterance is a meaning unit boundary is calculated. The utterance is split into two new utterances at a word with the maximum calculated probability. At least one of the two new utterances that is shorter than a maximum utterance threshold is identified as a meaning unit.
Type: Grant
Filed: January 3, 2022
Date of Patent: February 13, 2024
Assignee: Verint Systems Ltd.
Inventors: Roni Romano, Yair Horesh, Jeremie Dreyfuss
-
Patent number: 11817085
Abstract: Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without necessitating that a user explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
Type: Grant
Filed: December 14, 2020
Date of Patent: November 14, 2023
Assignee: GOOGLE LLC
Inventors: Pu-Sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
-
Patent number: 11817084
Abstract: The present disclosure relates generally to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. The system can enable multilingual interaction with the automated assistant, without necessitating that a user explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
Type: Grant
Filed: May 21, 2020
Date of Patent: November 14, 2023
Assignee: GOOGLE LLC
Inventors: Pu-sen Chao, Diego Melendo Casado, Ignacio Lopez Moreno
-
Patent number: 11715464
Abstract: Systems and processes for creating and updating natural language models are provided. An example process of creating a natural language model includes, at an electronic device with one or more processors and memory, receiving an utterance, associating an action structure with the utterance, determining a plurality of augmented utterances based on the received utterance, creating a natural language model including the received utterance and the plurality of augmented utterances by mapping the plurality of augmented utterances to the associated action structure, and providing the natural language model including the received utterance and the plurality of augmented utterances to a second electronic device.
Type: Grant
Filed: July 2, 2021
Date of Patent: August 1, 2023
Assignee: Apple Inc.
Inventors: Thomas Robert Nickson, Keith Scott Brisson, Eric Gregory, Thomas B. Gunter, Arthur A. Van Hoff
-
Patent number: 11670289
Abstract: Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
Type: Grant
Filed: December 18, 2020
Date of Patent: June 6, 2023
Assignee: Apple Inc.
Inventors: Thomas R. Gruber, Harry J. Saddler, Jerome Rene Bellegarda, Bryce H. Nyeggen, Alessandro Sabatelli
-
Patent number: 11574641
Abstract: A processor-implemented method with data recognition includes: extracting input feature data from input data; calculating a matching score between the extracted input feature data and enrolled feature data of an enrolled user, based on the extracted input feature data, common component data of a plurality of enrolled feature data corresponding to the enrolled user, and distribution component data of the plurality of enrolled feature data corresponding to the enrolled user; and recognizing the input data based on the matching score.
Type: Grant
Filed: April 10, 2020
Date of Patent: February 7, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Sung-Un Park, Kyuhong Kim
-
Patent number: 11423904
Abstract: A computer-implemented method of false keyphrase rejection comprises receiving a captured audio signal of human speech including one or more keyphrases that trigger an action. It also comprises detecting whether or not at least part of the speech is spoken by at least one computer-originated voice. The method also has an operation of omitting the triggering of the action at least partly due to the computer-originated voice being recognized in the speech.
Type: Grant
Filed: November 9, 2020
Date of Patent: August 23, 2022
Assignee: Intel Corporation
Inventors: Jacek Ossowski, Tobias Bocklet, Kuba Lopatka
-
Patent number: 11417314
Abstract: A speech synthesis method, a speech synthesis device, and an electronic apparatus are provided, which relate to the field of speech synthesis. A specific implementation solution is the following: inputting text information into an encoder of an acoustic model, to output a text feature of a current time step; splicing the text feature of the current time step with a spectral feature of a previous time step to obtain a spliced feature of the current time step, and inputting the spliced feature of the current time step into a decoder of the acoustic model to obtain a spectral feature of the current time step; and inputting the spectral feature of the current time step into a neural network vocoder, to output speech.
Type: Grant
Filed: February 21, 2020
Date of Patent: August 16, 2022
Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
Inventors: Chenxi Sun, Tao Sun, Xiaolin Zhu, Wenfu Wang
-
Patent number: 11182431
Abstract: Systems and methods for voice searching media content based on metadata or subtitles are provided. Metadata associated with media content can be pre-processed at a media server. Upon receiving a vocal command representative of a search for an aspect of the media content, the media server performs a search for one or more portions of the media content relevant to the aspect of the media content being searched for. The media server performs the search by matching the aspect of the media content being searched for with the pre-processed metadata.
Type: Grant
Filed: December 11, 2014
Date of Patent: November 23, 2021
Assignee: Disney Enterprises, Inc.
Inventors: Jing X. Wang, Mark Arana, Edward Drake, Alexander C. Chen
-
Patent number: 10971140
Abstract: A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors.
Type: Grant
Filed: February 4, 2019
Date of Patent: April 6, 2021
Assignee: Zentian Limited
Inventor: Mark Catchpole
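The lexical tree structure described above — words sharing common prefix components sharing tree nodes — is essentially a prefix trie over phone sequences. The following is a minimal sketch under that reading; the phone strings and class names are illustrative assumptions, not the patent's actual data layout.

```python
# Sketch of a lexical tree: words with a common prefix share nodes,
# so the shared initial components are scored only once per tree.

class LexTreeNode:
    def __init__(self):
        self.children = {}   # phone -> LexTreeNode
        self.word = None     # word label where a word ends

def build_lexical_tree(lexicon):
    """lexicon: dict mapping word -> list of phones."""
    root = LexTreeNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, LexTreeNode())
        node.word = word
    return root

lexicon = {
    "start": ["s", "t", "aa", "r", "t"],
    "star":  ["s", "t", "aa", "r"],
    "stop":  ["s", "t", "aa", "p"],
}
root = build_lexical_tree(lexicon)
# All three words share the unique initial component "s".
```

Each tree in the patent's design would hold words with a distinct initial component, so the parallel processors can work on disjoint trees.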
-
Patent number: 10217456
Abstract: A method and system for generating training data for a target domain using speech data of a source domain. The training data generation method includes: reading out a Gaussian mixture model (GMM) of a target domain trained with a clean speech data set of the target domain; mapping, by referring to the GMM of the target domain, a set of source domain speech data received as an input to the set of target domain speech data on the basis of a channel characteristic of the target domain speech data; and adding a noise of the target domain to the mapped set of source domain speech data to output a set of pseudo target domain speech data.
Type: Grant
Filed: April 14, 2014
Date of Patent: February 26, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Osamu Ichikawa, Steven J Rennie
-
Patent number: 10192116
Abstract: The disclosure relates to recognizing data such as items or entities in content. In some aspects, content may be received and feature information, such as face recognition data and voice recognition data, may be generated. Scene segmentation may also be performed on the content, grouping the various shots of the video content into one or more shot collections, such as scenes. For example, a decision lattice representative of possible scene segmentations may be determined and the most probable path through the decision lattice may be selected as the scene segmentation. Upon generating the feature information and performing the scene segmentation, one or more items or entities that are present in the scene may be identified.
Type: Grant
Filed: May 27, 2016
Date of Patent: January 29, 2019
Assignee: Comcast Cable Communications, LLC
Inventors: Jan Neumann, Evelyne Tzoukermann, Amit Bagga, Oliver Jojic, Bageshree Shevade, David F. Houghton, Corey Farrell
-
Patent number: 9959870
Abstract: A system and method of speech recognition involving a mobile device. Speech input is received (202) on a mobile device (102) and converted (204) to a set of phonetic symbols. Data relating to the phonetic symbols is transferred (206) from the mobile device over a communications network (104) to a remote processing device (106) where it is used (208) to identify at least one matching data item from a set of data items (114). Data relating to the at least one matching data item is transferred (210) from the remote processing device to the mobile device and presented (214) thereon.
Type: Grant
Filed: December 10, 2009
Date of Patent: May 1, 2018
Assignee: Apple Inc.
Inventors: Melvyn Hunt, John Bridle
-
Patent number: 9959260
Abstract: The invention provides for a system, method, and computer readable medium storing instructions related to controlling a presentation in a multimodal system. The method embodiment of the invention is a method for the retrieval of information on the basis of its content for incorporation into an electronic presentation. The method comprises receiving from a user a content-based request for at least one segment from a first plurality of segments within a media presentation preprocessed to enable natural language content searchability; in response to the request, presenting a subset of the first plurality of segments to the user; receiving a selection indication from the user associated with at least one segment of the subset of the first plurality of segments; and adding the selected at least one segment to a deck for use in a presentation.
Type: Grant
Filed: May 4, 2015
Date of Patent: May 1, 2018
Assignee: Nuance Communications, Inc.
Inventors: Patrick Ehlen, David Crawford Gibbon, Mazin Gilbert, Michael Johnston, Zhu Liu, Behzad Shahraray
-
Patent number: 9785613
Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), and a data bus that couples the processing unit and the APU. The APU includes a local, non-volatile memory that stores a plurality of senones, a memory buffer coupled to the memory, the acoustic processing unit being configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer, and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a corresponding score to the processing unit.
Type: Grant
Filed: June 6, 2012
Date of Patent: October 10, 2017
Assignee: Cypress Semiconductor Corporation
Inventors: Venkataraman Natarajan, Stephan Rosner
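The scoring described above — comparing every dimension of a frame vector against a stored Gaussian distribution vector — corresponds to evaluating a diagonal-covariance Gaussian (or mixture) log-likelihood per senone. A software sketch of that computation, assuming diagonal covariances and log-domain mixture weights (the hardware layout itself is not reproduced):

```python
import math

def diag_gaussian_logpdf(frame, mean, var):
    """Log-density of a frame vector under a diagonal-covariance Gaussian.
    Every dimension of the frame is compared against the corresponding
    dimension of the stored distribution vector."""
    s = 0.0
    for x, m, v in zip(frame, mean, var):
        s += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return s

def score_senone(frame, components):
    """components: list of (weight, mean, var) Gaussians for one senone.
    Combines component log-densities with log-sum-exp for stability."""
    logs = [math.log(w) + diag_gaussian_logpdf(frame, m, v)
            for w, m, v in components]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

In the patented design this per-dimension loop is what the APU's scoring unit performs in parallel across dimensions.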
-
Patent number: 9753890
Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), and a data bus that couples the processing unit and the APU. The APU includes a local, non-volatile memory that stores a plurality of senones, a memory buffer coupled to the memory, the acoustic processing unit being configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer, and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a corresponding score to the processing unit.
Type: Grant
Filed: June 6, 2012
Date of Patent: September 5, 2017
Assignee: Cypress Semiconductor Corporation
Inventors: Venkataraman Natarajan, Stephan Rosner
-
Patent number: 9378423
Abstract: The disclosure relates to recognizing data such as items or entities in content. In some aspects, content may be received and feature information, such as face recognition data and voice recognition data, may be generated. Scene segmentation may also be performed on the content, grouping the various shots of the video content into one or more shot collections, such as scenes. For example, a decision lattice representative of possible scene segmentations may be determined and the most probable path through the decision lattice may be selected as the scene segmentation. Upon generating the feature information and performing the scene segmentation, one or more items or entities that are present in the scene may be identified.
Type: Grant
Filed: September 3, 2014
Date of Patent: June 28, 2016
Assignee: Comcast Cable Communications, LLC
Inventors: Jan Neumann, Evelyne Tzoukermann, Amit Bagga, Oliver Jojic, Bageshree Shevade, David F. Houghton, Corey Farrell
-
Patent number: 9286414
Abstract: The subject disclosure relates to one or more computer-implemented processes for collecting, analyzing, and employing annotations of data sources. In particular, an annotation component is configured to receive annotations of data for a data source, wherein the respective annotations comprise different associations of global terms with the data of the data source; a data store is configured to store the annotations; and an interface component is configured to render the data based on the annotations in response to a request for the data. In an aspect, the data store also stores descriptions of the data sources and definitions of the global terms, and the interface component determines a subset of the information in the data store based on the annotations. A method is further provided comprising receiving a global term and determining data sources that have the global term associated with their data based on the information in the data store.
Type: Grant
Filed: December 2, 2011
Date of Patent: March 15, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Alex James, Michael Pizzo, Pablo Castro, Michael Justin Flasko, Lance Olson, Jason Clark, Siddharth Jayadevan
-
Patent number: 9230548
Abstract: Embodiments of the present invention include a data storage device and a method for storing data in a hash table. The data storage device can include a first memory device, a second memory device, and a processing device. The first memory device is configured to store one or more data elements. The second memory device is configured to store one or more status bits at one or more respective table indices. In addition, each of the table indices is mapped to a corresponding table index in the first memory device. The processing device is configured to calculate one or more hash values based on the one or more data elements.
Type: Grant
Filed: December 21, 2012
Date of Patent: January 5, 2016
Assignee: Cypress Semiconductor Corporation
Inventors: Richard M. Fastow, Ojas A. Bapat
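One plausible reading of the two-memory design above is that the compact status-bit array lets the device decide slot occupancy without reading the larger element store. A sketch under that assumption (the probing scheme and class name are illustrative, not taken from the patent):

```python
# Two-level hash table: `status` stands in for the second memory device
# (one bit per slot), `elements` for the first memory device. Misses can
# be rejected by consulting only the status bits.

class TwoLevelHashTable:
    def __init__(self, size):
        self.size = size
        self.elements = [None] * size   # first memory: (key, value) slots
        self.status = [0] * size        # second memory: occupancy bits

    def _index(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        i = self._index(key)
        # Linear probing on collision; an occupied slot with the same
        # key is overwritten in place.
        while self.status[i] and self.elements[i][0] != key:
            i = (i + 1) % self.size
        self.elements[i] = (key, value)
        self.status[i] = 1

    def lookup(self, key):
        i = self._index(key)
        while self.status[i]:
            if self.elements[i][0] == key:
                return self.elements[i][1]
            i = (i + 1) % self.size
        return None   # status bit 0: miss without touching element store
```

Because each status index maps to a corresponding element index, a cleared status bit terminates the probe sequence early.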
-
Patent number: 9208778
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
Type: Grant
Filed: November 10, 2014
Date of Patent: December 8, 2015
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
-
Patent number: 9009039
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying "pseudo-clean" model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
Type: Grant
Filed: June 12, 2009
Date of Patent: April 14, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
-
Patent number: 8959014
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models. Speech data and data identifying a transcription for the speech data are received. A phonetic representation for the transcription is accessed. Training sequences are identified for a particular phone in the phonetic representation. Each of the training sequences includes a different set of contextual phones surrounding the particular phone. A partitioning key is identified based on a sequence of phones that occurs in each of the training sequences. A processing module to which the identified partitioning key is assigned is selected. Data identifying the training sequences and a portion of the speech data are transmitted to the selected processing module.
Type: Grant
Filed: June 29, 2012
Date of Patent: February 17, 2015
Assignee: Google Inc.
Inventors: Peng Xu, Fernando Pereira, Ciprian I. Chelba
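The steps above — context windows of varying width around a central phone, plus a key shared by all of them — can be sketched as follows. The choice of the narrowest window as the partitioning key is an assumption for illustration; it works because every wider window contains the narrower one.

```python
# Training sequences: symmetric context windows of growing width around
# the phone at index `center`, plus a partitioning key that occurs in
# every window (here: the narrowest window, contained in all wider ones).

def training_sequences(phones, center, max_width):
    """Return context windows of width 1..max_width around phones[center]."""
    seqs = []
    for w in range(1, max_width + 1):
        lo = max(0, center - w)
        hi = min(len(phones), center + w + 1)
        seqs.append(tuple(phones[lo:hi]))
    return seqs

def partitioning_key(seqs):
    """A phone sequence occurring in each training sequence."""
    return min(seqs, key=len)

phones = ["sil", "k", "ae", "t", "sil"]
seqs = training_sequences(phones, center=2, max_width=2)
```

All training sequences sharing a key would then be routed to the same processing module, keeping related statistics together.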
-
Patent number: 8935170
Abstract: A speech recognition system, according to an example embodiment, includes a data storage to store speech training data. A training engine determines consecutive breakout periods in the speech training data, calculates forward and backward probabilities for the breakout periods, and generates a speech recognition Hidden Markov Model (HMM) from the forward and backward probabilities calculated for the breakout periods.
Type: Grant
Filed: November 27, 2012
Date of Patent: January 13, 2015
Assignee: Longsand Limited
Inventor: Maha Kadirkamanathan
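The forward and backward probabilities referenced above are the standard alpha/beta tables of HMM training. A minimal sketch for a discrete-observation HMM (the breakout-period partitioning itself, which is the patent's contribution, is not reproduced here):

```python
# Forward-backward pass for a discrete-observation HMM.
# pi: initial state probabilities; A: state transition matrix;
# B: emission matrix; obs: sequence of observation indices.

def forward_backward(pi, A, B, obs):
    n = len(pi)
    T = len(obs)
    alpha = [[0.0] * n for _ in range(T)]  # forward probabilities
    beta = [[0.0] * n for _ in range(T)]   # backward probabilities
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))
    for i in range(n):
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta
```

Both directions yield the same total observation probability, which is a useful sanity check: summing the last alpha row equals summing `pi[i] * B[i][obs[0]] * beta[0][i]` over states.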
-
Patent number: 8868423
Abstract: Systems and methods for controlling access to resources using spoken Completely Automatic Public Turing Tests To Tell Humans And Computers Apart (CAPTCHA) tests are disclosed. In these systems and methods, entities seeking access to resources are required to produce an input utterance that contains at least some audio. That utterance is compared with voice reference data for human and machine entities, and a determination is made as to whether the entity requesting access is a human or a machine. Access is then permitted or refused based on that determination.
Type: Grant
Filed: July 11, 2013
Date of Patent: October 21, 2014
Assignee: John Nicholas and Kristin Gross Trust
Inventor: John Nicholas Gross
-
Patent number: 8818802
Abstract: A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
Type: Grant
Filed: August 9, 2010
Date of Patent: August 26, 2014
Assignee: Spansion LLC
Inventors: Richard Fastow, Qamrul Hasan
-
Patent number: 8725508
Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
Type: Grant
Filed: March 27, 2012
Date of Patent: May 13, 2014
Assignee: Novospeech
Inventor: Yossef Ben-Ezra
-
Patent number: 8700403
Abstract: A method of statistical modeling is provided which includes constructing a statistical model and incorporating Gaussian priors during feature selection and during parameter optimization for the construction of the statistical model.
Type: Grant
Filed: November 3, 2005
Date of Patent: April 15, 2014
Assignee: Robert Bosch GmbH
Inventors: Fuliang Weng, Lin Zhao
-
Patent number: 8543402
Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
Type: Grant
Filed: April 29, 2011
Date of Patent: September 24, 2013
Assignee: The Intellisis Corporation
Inventor: Jiyong Ma
-
Patent number: 8510111
Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and select a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the s
Type: Grant
Filed: February 8, 2008
Date of Patent: August 13, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
-
Patent number: 8494854
Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance using optimized challenge items selected for their discrimination capability to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
Type: Grant
Filed: June 15, 2009
Date of Patent: July 23, 2013
Assignee: John Nicholas and Kristin Gross
Inventor: John Nicholas Gross
-
Patent number: 8489399
Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
Type: Grant
Filed: June 15, 2009
Date of Patent: July 16, 2013
Assignee: John Nicholas and Kristin Gross Trust
Inventor: John Nicholas Gross
-
Patent number: 8442828
Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
Type: Grant
Filed: March 17, 2006
Date of Patent: May 14, 2013
Assignee: Microsoft Corporation
Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
-
Patent number: 8374869
Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
Type: Grant
Filed: August 4, 2009
Date of Patent: February 12, 2013
Assignee: Electronics and Telecommunications Research Institute
Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
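The verification steps above can be sketched as a log-likelihood-ratio confidence score against the anti-phoneme model, followed by joint thresholding on the score and the inter-phoneme distance. The averaging of per-phoneme ratios and the "both tests must pass" acceptance rule are illustrative assumptions about how the comparisons combine:

```python
# Utterance verification sketch: confidence from phoneme vs. anti-phoneme
# log likelihoods, then acceptance only when both comparisons pass.

def confidence_score(ll_phoneme, ll_anti):
    """Mean per-phoneme log-likelihood ratio for a recognized word."""
    return sum(p - a for p, a in zip(ll_phoneme, ll_anti)) / len(ll_phoneme)

def accept(score, distance, score_threshold, mean_distance):
    """Accept the N-best recognized word only when the confidence score
    and the inter-phoneme distance both pass their thresholds."""
    return score >= score_threshold and distance >= mean_distance
```

A hypothesis with a high confidence score but an atypically small phoneme distance would still be rejected under this rule.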
-
Patent number: 8301449
Abstract: Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
Type: Grant
Filed: October 16, 2006
Date of Patent: October 30, 2012
Assignee: Microsoft Corporation
Inventors: Xiaodong He, Li Deng
-
Patent number: 8229729
Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.
Type: Grant
Filed: March 25, 2008
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
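The first step of the pipeline above, building a source-word versus target-word co-occurrence matrix from aligned sentence pairs, can be sketched as follows. This is a generic illustration, not the patent's implementation; the dimensionality-reduction step (e.g., a truncated SVD) that would follow is omitted:

```python
from collections import Counter

def cooccurrence_matrix(sentence_pairs):
    """Build a source-word x target-word co-occurrence count matrix from
    aligned (source_tokens, target_tokens) sentence pairs. Each cell
    counts how often a source/target word pair appears together."""
    counts = Counter()
    src_vocab, tgt_vocab = set(), set()
    for src_sent, tgt_sent in sentence_pairs:
        for s in src_sent:
            src_vocab.add(s)
            for t in tgt_sent:
                tgt_vocab.add(t)
                counts[(s, t)] += 1
    src_index = {w: i for i, w in enumerate(sorted(src_vocab))}
    tgt_index = {w: j for j, w in enumerate(sorted(tgt_vocab))}
    matrix = [[0.0] * len(tgt_index) for _ in src_index]
    for (s, t), c in counts.items():
        matrix[src_index[s]][tgt_index[t]] = float(c)
    return matrix, src_index, tgt_index
```

Rows (or columns) of this matrix, after dimensionality reduction, give the continuous real-valued vectors on which the parametric model is then trained.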
-
Patent number: 8078462
Abstract: A transformation-parameter calculating unit calculates a first model parameter indicating a parameter of a speaker model that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. The transformation parameter transforms, for each of the speakers, a distribution of the clean feature corresponding to the identification information of the speaker to a distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to identification information for each of the speakers by using the transformation parameter, and calculates a second model parameter indicating a parameter of the speaker model that maximizes a second likelihood for the transformed noisy feature.
Type: Grant
Filed: October 2, 2008
Date of Patent: December 13, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventors: Yusuke Shinohara, Masami Akamine
-
Patent number: 7941317
Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
Type: Grant
Filed: June 5, 2007
Date of Patent: May 10, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
-
Patent number: 7930181
Abstract: Systems and methods for low-latency real-time speech recognition/transcription. A discriminative feature extraction, such as a heteroscedastic discriminant analysis transform, in combination with a maximum likelihood linear transform is applied during front-end processing of a digital speech signal. The extracted features reduce the word error rate. A discriminative acoustic model is applied by generating state-level lattices using Maximum Mutual Information Estimation. Recognition networks of language models are replaced by their closure. Latency is reduced by eliminating segmentation such that a number of words/sentences can be recognized as a single utterance. Latency is further reduced by performing front-end normalization in a causal fashion.
Type: Grant
Filed: November 21, 2002
Date of Patent: April 19, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Michael Dennis Riley, Murat Saraclar
-
Patent number: 7895040
Abstract: According to an embodiment, a voice recognition apparatus includes units for: acoustic processing, voice interval detecting, dictionary lookup, collating, search target selecting, storing, and determining; and a voice recognition method includes processes of: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, and determining whether or not the output probability of a certain path is stored. The number of output-probability calculations is reduced by selecting the search range on the basis of the beam search, calculating the output probability of the certain transition path only once in the interval from when the standard frame is set to when the standard frame is renewed, and storing and using the calculated value as an approximate value of the output probability in subsequent frames.
Type: Grant
Filed: March 30, 2007
Date of Patent: February 22, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masaru Sakai, Shinichi Tanaka
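The caching idea in the abstract above, compute an output probability once per standard-frame interval and reuse the stored value as an approximation until the standard frame is renewed, can be sketched as follows. The class name and the single-Gaussian density are illustrative assumptions, not the patent's implementation:

```python
import math

class OutputProbCache:
    """Cache per-state output log-probabilities between 'standard frame'
    renewals: each state's probability is computed once per interval and
    the stored value is reused as an approximation in subsequent frames."""

    def __init__(self):
        self._cache = {}

    def renew_standard_frame(self):
        # Renewing the standard frame invalidates all stored approximations.
        self._cache.clear()

    def log_output_prob(self, state, mean, var, observation):
        if state not in self._cache:
            # Single-Gaussian log density, computed only once per interval.
            self._cache[state] = (
                -0.5 * math.log(2.0 * math.pi * var)
                - (observation - mean) ** 2 / (2.0 * var)
            )
        return self._cache[state]
```

Within an interval the cached value is returned even for a new observation, which is exactly the approximation that trades a small accuracy loss for fewer probability evaluations.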
-
Patent number: 7856356
Abstract: A speech recognition system for a mobile terminal includes an acoustic variation channel unit and a pronunciation variation channel unit. The acoustic variation channel unit transforms a speech signal into feature parameters and Viterbi-decodes the speech signal to produce a varied phoneme sequence by using the feature parameters and predetermined models. Further, the pronunciation variation channel unit Viterbi-decodes the varied phoneme sequence to produce a word phoneme sequence by using the varied phoneme sequence and a preset DHMM (Discrete Hidden Markov Model) based context-dependent error model.
Type: Grant
Filed: December 20, 2006
Date of Patent: December 21, 2010
Assignee: Electronics and Telecommunications Research Institute
Inventors: Hoon Chung, Yunkeun Lee
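Both channel units above rely on Viterbi decoding of a discrete observation sequence. The standard algorithm, which the patent builds on rather than invents, can be written compactly; the toy state names and probability tables below are purely illustrative:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Standard Viterbi decoding: return the most likely hidden state
    sequence for a discrete observation sequence, given start,
    transition, and emission probability tables."""
    # Initialize with start probabilities times first-frame emissions.
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    return list(reversed(path))
```

In the patent's cascade, the first decode maps acoustic features to a varied phoneme sequence, and the second decode maps that sequence to a word phoneme sequence through the DHMM-based error model.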
-
Patent number: 7711561
Abstract: The present invention relates to speech recognition systems, particularly speech-to-text systems and software and decoders for the same.
Type: Grant
Filed: April 15, 2004
Date of Patent: May 4, 2010
Assignee: Kabushiki Kaisha Toshiba
Inventors: Wide Hogenhout, Kean Kheong Chin
-
Patent number: 7672847
Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment.
Type: Grant
Filed: September 30, 2008
Date of Patent: March 2, 2010
Assignee: Nuance Communications, Inc.
Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
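The three-case rule above is a clamped update: the gradient adjustment is bounded between the two thresholds before being applied. A minimal sketch follows; treating the adjustment as additive is an assumption for illustration, since the abstract does not state how the adjustment is applied:

```python
def update_std_dev(sigma, gradient_adjustment, lower, upper):
    """Clamp the calculated gradient adjustment of a mixture component's
    standard deviation to [lower, upper], then apply it (additively,
    as an illustrative assumption)."""
    if gradient_adjustment > upper:
        step = upper          # too large: use the first threshold
    elif gradient_adjustment < lower:
        step = lower          # too small: use the second threshold
    else:
        step = gradient_adjustment  # within bounds: use as calculated
    return sigma + step
```

Clamping like this keeps a single noisy discriminative-training gradient from collapsing or blowing up a component's variance.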
-
Patent number: 7664643
Abstract: A method, and a system for executing the method, are presented for identifying and separating the sources of an acoustic signal that contains a mixture of multiple simultaneous component signals. The method represents the signal with multiple discrete state-variable sequences and combines acoustic and context level dynamics to achieve the source separation. The method identifies sources by discovering those frames of the signal whose features are dominated by single sources. The signal may be the simultaneous speech of multiple speakers.
Type: Grant
Filed: August 25, 2006
Date of Patent: February 16, 2010
Assignee: Nuance Communications, Inc.
Inventors: Ramesh Ambat Gopinath, John Randall Hershey, Trausti Thor Kristjansson, Peder Andreas Olsen, Steven John Rennie
-
Patent number: 7587321
Abstract: According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.
Type: Grant
Filed: May 8, 2001
Date of Patent: September 8, 2009
Assignee: Intel Corporation
Inventors: Xiaoxing Liu, Baosheng Yuan, Yonghong Yan
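The core of data-dependent MAP adaptation is interpolating a prior (context-independent) parameter toward the statistics of the tied state's training data, weighted by how much data that state has. A sketch of the textbook update for a Gaussian mean follows; the relevance factor `tau` and the one-dimensional setting are illustrative simplifications, not the patent's exact formulation:

```python
def map_adapt_mean(prior_mean, data, tau):
    """Data-dependent MAP adaptation of a Gaussian mean: interpolate
    between the prior (context-independent) mean and the sample mean of
    the adaptation data, weighted by the data count n against the
    relevance factor tau."""
    n = len(data)
    sample_mean = sum(data) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

With little data the estimate stays near the robust prior; as the tied state accumulates more frames, the estimate moves toward the data's own mean.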
-
Patent number: 7454341
Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
Type: Grant
Filed: September 30, 2000
Date of Patent: November 18, 2008
Assignee: Intel Corporation
Inventors: Jielin Pan, Baosheng Yuan
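The first step above, partitioning each full vector's dimensions into sub-vectors so that a separate codebook can be clustered per sub-vector set, can be sketched simply. This shows only the partitioning; the modified merge/split K-means that then clusters each set is omitted, and the even split into fixed-width chunks is an illustrative assumption:

```python
def split_into_subvectors(vectors, dims_per_subvector):
    """Split each D-dimensional vector into consecutive sub-vectors of
    width dims_per_subvector, returning one sub-vector set per chunk of
    dimensions. Each set would then be clustered into its own codebook."""
    dims = len(vectors[0])
    subsets = []
    for start in range(0, dims, dims_per_subvector):
        subsets.append(
            [tuple(v[start:start + dims_per_subvector]) for v in vectors]
        )
    return subsets
```

Quantizing each low-dimensional sub-vector set separately yields much smaller codebooks than quantizing the full vectors jointly, which is the motivation for the division.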
-
Patent number: 7454336
Abstract: A system and method that facilitate modeling unobserved speech dynamics based upon a hidden dynamic speech model in the form of a segmental switching state space model that employs model parameters including those describing the unobserved speech dynamics and those describing the relationship between the unobserved speech dynamic vector and the observed acoustic feature vector is provided. The model parameters are modified based, at least in part, upon a variational learning technique. In accordance with an aspect of the present invention, novel and powerful variational expectation maximization (EM) algorithm(s) for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production, are provided. For example, modification of model parameters can be based upon an approximate mixture of Gaussians (MOG) posterior and/or based upon an approximate hidden Markov model (HMM) posterior using a variational technique.
Type: Grant
Filed: June 20, 2003
Date of Patent: November 18, 2008
Assignee: Microsoft Corporation
Inventors: Hagai Attias, Li Deng, Leo J. Lee
-
Patent number: 7437288
Abstract: A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.
Type: Grant
Filed: March 11, 2002
Date of Patent: October 14, 2008
Assignee: NEC Corporation
Inventor: Koichi Shinoda
-
Publication number: 20080243506
Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of the frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and selecting a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
Type: Application
Filed: February 8, 2008
Publication date: October 2, 2008
Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka