Patents in Class 704/244 (Update Patterns)
-
Patent number: 10438593
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting notifications in an enterprise system. In one aspect, a method includes actions of obtaining enrollment acoustic data representing an enrollment utterance spoken by a user, obtaining a set of candidate acoustic data representing utterances spoken by other users, determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, selecting a subset of candidate acoustic data from the set of candidate acoustic data based at least on the similarity scores, generating a detection model based on the subset of candidate acoustic data, and providing the detection model for use in detecting an utterance spoken by the user.
Type: Grant
Filed: July 22, 2015
Date of Patent: October 8, 2019
Assignee: Google LLC
Inventor: Raziel Alvarez Guevara
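The candidate-selection step described above can be sketched in a few lines: score every candidate utterance against the enrollment utterance and keep the top-k most similar for building the detection model. This is a hypothetical illustration, not the patent's method — the feature vectors below are plain lists of floats standing in for real acoustic embeddings, and cosine similarity is an assumed scoring function.

```python
# Hypothetical sketch: rank candidate utterances by similarity to the
# enrollment utterance and keep the top-k for detection-model training.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_candidates(enrollment, candidates, k):
    """Return the k candidate vectors most similar to the enrollment vector."""
    ranked = sorted(candidates,
                    key=lambda c: cosine_similarity(enrollment, c),
                    reverse=True)
    return ranked[:k]

# Toy vectors: the first and third candidates resemble the enrollment.
enroll = [1.0, 0.0, 1.0]
cands = [[1.0, 0.1, 0.9], [0.0, 1.0, 0.0], [0.9, 0.0, 1.1]]
top = select_candidates(enroll, cands, 2)
```

A real system would use speaker embeddings from an acoustic front end in place of the toy vectors.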
-
Patent number: 10438585
Abstract: A voice recording device that connects/is connected to a network, comprising a voice recording circuit that acquires voice and records the acquired voice as a voice file, a transmission circuit that transmits the voice file to a network, and a control circuit, the control circuit including an information extraction section that extracts associated information that has been associated with the voice file, and a display that displays the associated information associated with a voice data file.
Type: Grant
Filed: April 29, 2017
Date of Patent: October 8, 2019
Assignee: Olympus Corporation
Inventors: Kenta Yumoto, Takafumi Onishi, Kazushi Fujitani, Ryusuke Hamakawa
-
Patent number: 10410628
Abstract: Techniques are disclosed for adjusting a ranking of information content of a software application based on feedback from a user. One embodiment presented herein includes a method comprising receiving, at a computing device, an audio stream comprising audio of the user, the audio being indicative of feedback related to information content. The method further comprises analyzing the audio stream for paralinguistic information to determine an attribute of the user. The method further comprises adjusting a ranking of the information content based on at least one of the feedback and additional feedback and the determined attribute of the user.
Type: Grant
Filed: June 11, 2018
Date of Patent: September 10, 2019
Assignee: INTUIT, INC.
Inventors: Raymond Chan, Igor A. Podgorny, Benjamin Indyk
-
Patent number: 10410627
Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
Type: Grant
Filed: March 15, 2018
Date of Patent: September 10, 2019
Assignee: Google LLC
Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno Mengibar
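The "revise probabilities based on recent language usage" idea above can be sketched as interpolating a baseline unigram model with a distribution estimated from fresh query logs. This is a minimal illustration under stated assumptions: the interpolation weight `alpha`, the toy counts, and the unigram framing are all invented for the example, not taken from the patent.

```python
# Hypothetical sketch: blend baseline word probabilities with word
# frequencies observed in recent search queries.
from collections import Counter

def update_unigrams(baseline, queries, alpha=0.3):
    """Interpolate baseline probabilities with recent query frequencies."""
    counts = Counter(w for q in queries for w in q.split())
    total = sum(counts.values())
    updated = {}
    for word, p in baseline.items():
        recent = counts[word] / total if total else 0.0
        updated[word] = (1 - alpha) * p + alpha * recent
    return updated

# A word that has surged in queries gets its probability boosted.
baseline = {"weather": 0.01, "selfie": 0.0001}
queries = ["selfie stick", "best selfie app"]
model = update_unigrams(baseline, queries)
```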
-
Patent number: 10403288
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker diarization are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The actions further include determining that the audio data includes an utterance of a predefined hotword spoken by a first speaker. The actions further include identifying a first portion of the audio data that includes speech from the first speaker. The actions further include identifying a second portion of the audio data that includes speech from a second, different speaker. The actions further include transmitting the first portion of the audio data that includes speech from the first speaker and suppressing transmission of the second portion of the audio data that includes speech from the second, different speaker.
Type: Grant
Filed: October 17, 2017
Date of Patent: September 3, 2019
Assignee: Google LLC
Inventors: Aleksandar Kracun, Richard Cameron Rose
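The transmit/suppress step above can be sketched as a filter: segments whose speaker representation is close to that of the hotword speaker are kept, the rest are suppressed. Everything concrete here is an assumption for illustration — the toy 2-D embeddings, the Euclidean distance metric, and the threshold are not from the patent.

```python
# Hypothetical sketch: keep only audio segments attributed to the speaker
# who uttered the hotword, based on embedding distance.
import math

def filter_segments(hotword_emb, segments, threshold=0.5):
    """Keep segments whose embedding is within `threshold` of the hotword speaker."""
    kept = []
    for emb, audio in segments:
        if math.dist(hotword_emb, emb) <= threshold:
            kept.append(audio)  # transmit this portion
        # otherwise: suppress transmission of this portion
    return kept

hotword = (0.0, 1.0)
segments = [((0.1, 0.9), "hey computer, set a timer"),
            ((2.0, -1.0), "background talker")]
transmitted = filter_segments(hotword, segments)
```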
-
Patent number: 10295965
Abstract: A vector of actual sensor values is received. A maturity of a model is determined and the maturity is defined for sensors. A function that translates model maturity to model range inhibition measure is determined. A model range inhibition (MRI) measure is determined. An MRI limit based upon the MRI measure is determined. The received vector is compared to the MRI limit and the model is selectively changed based upon the comparing. In other aspects, vectors are received having actual values of driver and response sensors. A function that provides a set of boundaries between acceptable observations and unacceptable observations is also determined. Measures of similarity between vectors are determined. The measures of similarity and the function are compared and the model is selectively changed based upon the comparing.
Type: Grant
Filed: September 6, 2013
Date of Patent: May 21, 2019
Assignee: GE INTELLIGENT PLATFORMS, INC.
Inventors: Devang Jagdish Gandhi, James Paul Herzog
-
Patent number: 10204620
Abstract: A computer-implemented method according to one embodiment includes estimating a speaker dependent acoustic model utilizing test speech data and maximum likelihood linear regression (MLLR), transforming labeled speech data to create transformed speech data, utilizing the speaker dependent acoustic model and a linear transformation, and adjusting a deep neural network (DNN) acoustic model, utilizing the transformed speech data.
Type: Grant
Filed: September 7, 2016
Date of Patent: February 12, 2019
Assignee: International Business Machines Corporation
Inventors: Xiaodong Cui, Vaibhava Goel
-
Patent number: 10204621
Abstract: A computer-implemented method according to one embodiment includes estimating a speaker dependent acoustic model utilizing test speech data and a hybrid estimation technique, transforming labeled speech data to create transformed speech data, utilizing the speaker dependent acoustic model and a nonlinear transformation, and adjusting a deep neural network (DNN) acoustic model, utilizing the transformed speech data.
Type: Grant
Filed: September 7, 2016
Date of Patent: February 12, 2019
Assignee: International Business Machines Corporation
Inventors: Xiaodong Cui, Vaibhava Goel
-
Patent number: 10192548
Abstract: An electronic device includes a microphone that receives an audio signal that includes a spoken trigger phrase, and a processor that is electrically coupled to the microphone. The processor measures characteristics of the audio signal, and determines, based on the measured characteristics, whether the spoken trigger phrase is acceptable for trigger phrase model training. If the spoken trigger phrase is determined not to be acceptable for trigger phrase model training, the processor rejects the trigger phrase for trigger phrase model training.
Type: Grant
Filed: June 2, 2017
Date of Patent: January 29, 2019
Assignee: Google Technology Holdings LLC
Inventors: Joel A. Clark, Tenkasi V. Ramabadran, Mark A. Jasiuk
-
Patent number: 10192554
Abstract: A method may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to a first speech recognition system to generate a first transcript based on the audio data and directing the first transcript to the second device. The method may also include in response to obtaining a quality indication regarding a quality of the first transcript, multiplexing the audio data to provide the audio data to a second speech recognition system to generate a second transcript based on the audio data while continuing to provide the audio data to the first speech recognition system and direct the first transcript to the second device, and in response to obtaining a transfer indication that occurs after multiplexing of the audio data, directing the second transcript to the second device instead of the first transcript.
Type: Grant
Filed: February 26, 2018
Date of Patent: January 29, 2019
Assignee: Sorenson IP Holdings, LLC
Inventors: Kenneth Boehme, Michael Holm, Shane Roylance
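The switch-over logic above can be sketched as a small state machine: audio always reaches the primary recognizer; a quality indication starts multiplexing to a secondary recognizer; and a later transfer indication makes the secondary transcript the one delivered. The class and recognizer stand-ins below are hypothetical, simplified to synchronous calls.

```python
# Hypothetical sketch of quality-triggered multiplexing and transfer
# between two speech recognition systems.
class TranscriptRouter:
    def __init__(self, primary, secondary):
        self.primary = primary        # callable: audio -> transcript
        self.secondary = secondary    # callable: audio -> transcript
        self.multiplexing = False
        self.transferred = False

    def on_quality_indication(self):
        # Start sending audio to the secondary recognizer as well.
        self.multiplexing = True

    def on_transfer_indication(self):
        # Valid only after multiplexing has begun.
        if self.multiplexing:
            self.transferred = True

    def transcribe(self, audio):
        first = self.primary(audio)   # primary keeps receiving audio throughout
        second = self.secondary(audio) if self.multiplexing else None
        return second if self.transferred else first

router = TranscriptRouter(lambda a: "ok-ish " + a, lambda a: "better " + a)
before = router.transcribe("hello")
router.on_quality_indication()
router.on_transfer_indication()
after = router.transcribe("hello")
```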
-
Patent number: 10152975
Abstract: A method, device, system, and computer medium for providing interactive advertising are provided. For example, a device may request an advertisement from a remote server, receive the advertisement, receive a response from a user who is listening and/or watching the advertisement, and transmit the response to the server for further action. The user may input a response by speaking. A server may receive an advertisement request from the device, select an advertisement based on one or more pre-defined criteria, transmit the selected advertisement to the device for play, receive from the device a response to the selected advertisement, and then perform an action corresponding to the received response.
Type: Grant
Filed: January 5, 2015
Date of Patent: December 11, 2018
Assignee: XAPPMEDIA, INC.
Inventors: Patrick B. Higbie, John P. Kelvie, Michael M. Myers, Franklin D. Raines
-
Patent number: 10152973
Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that it may be used to update the models and data as it becomes available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
Type: Grant
Filed: November 16, 2015
Date of Patent: December 11, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Bjorn Hoffmeister, Hugh Evan Secker-Walker, Jeffrey Cornelius O'Neill
-
Patent number: 10146765
Abstract: A text prediction engine, a system comprising a text prediction engine, and a method for generating sequence predictions. The text prediction engine, system and method generate multiple sequence predictions based on evidence sources and models, with each sequence prediction having a sequence and associated probability estimate.
Type: Grant
Filed: June 3, 2016
Date of Patent: December 4, 2018
Assignee: Touchtype Ltd.
Inventors: Benjamin Medlock, Douglas Alexander Harper Orr
-
Patent number: 10140976
Abstract: Methods and systems for language processing include training one or more automatic speech recognition models using an automatic speech recognition dictionary. A set of N automatic speech recognition hypotheses for an input is determined, based on the one or more automatic speech recognition models, using a processor. A best hypothesis is selected using a discriminative language model and a list of relevant words. Natural language processing is performed on the best hypothesis.
Type: Grant
Filed: December 14, 2015
Date of Patent: November 27, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata, Tohru Nagano
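The hypothesis-selection step above can be sketched as N-best rescoring: each hypothesis gets its recognizer score plus a bonus per word that appears in a list of domain-relevant words. The scores, bonus weight, and word list below are invented for illustration; a real system would use a trained discriminative language model rather than a flat per-word bonus.

```python
# Hypothetical sketch: pick the best hypothesis from an N-best list by
# combining the ASR score with a relevant-word bonus.
def rescore(nbest, relevant_words, bonus=0.5):
    """Return the hypothesis maximizing asr_score + bonus * relevant-word hits."""
    def total(item):
        text, asr_score = item
        hits = sum(1 for w in text.split() if w in relevant_words)
        return asr_score + bonus * hits
    return max(nbest, key=total)[0]

# The second hypothesis has a slightly higher ASR score, but the first
# contains two relevant words and wins after rescoring.
nbest = [("recognize speech", 1.0), ("wreck a nice beach", 1.1)]
best = rescore(nbest, {"recognize", "speech"})
```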
-
Patent number: 10127927
Abstract: A method for emotion or speaking style recognition and/or clustering comprises receiving one or more speech samples, generating a set of training data by extracting one or more acoustic features from every frame of the one or more speech samples, and generating a model from the set of training data, wherein the model identifies emotion or speaking style dependent information in the set of training data. The method may further comprise receiving one or more test speech samples, generating a set of test data by extracting one or more acoustic features from every frame of the one or more test speech samples, transforming the set of test data using the model to better represent emotion/speaking style dependent information, and using the transformed data for clustering and/or classification to discover speech with similar emotion or speaking style.
Type: Grant
Filed: June 18, 2015
Date of Patent: November 13, 2018
Assignee: Sony Interactive Entertainment Inc.
Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
-
Patent number: 10083696
Abstract: A method for determining user liveness is provided that includes calculating, by a computing device, a spectral property difference between voice biometric data captured from a user and user record voice biometric data. The user and the computing device constitute a user-computing device pair, and the voice biometric data is captured by the computing device during a verification transaction. Moreover, the method includes inputting the spectral property difference into a machine learning algorithm, calculating an output score with the machine learning algorithm, and determining the voice biometric data was captured from a live user when the output score satisfies a threshold score.
Type: Grant
Filed: March 7, 2017
Date of Patent: September 25, 2018
Assignee: DAON HOLDINGS LIMITED
Inventor: Raphael Blouet
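The score-versus-threshold decision above can be illustrated with a toy pipeline: compute a spectral-property difference between captured and enrolled voice data, map it to a score, and compare against a threshold. The logistic mapping and the threshold below are assumptions made for the example; the patent feeds the difference to a trained machine learning algorithm, which this sketch does not implement.

```python
# Toy sketch: spectral difference -> score -> liveness decision.
import math

def liveness_score(captured_spectrum, enrolled_spectrum):
    """Map the mean absolute spectral difference to a (0, 1) score."""
    diff = sum(abs(c - e) for c, e in zip(captured_spectrum, enrolled_spectrum))
    diff /= len(captured_spectrum)
    # Larger difference -> lower score (assumed logistic mapping).
    return 1.0 / (1.0 + math.exp(diff - 1.0))

def is_live(captured, enrolled, threshold=0.5):
    """Accept as live when the score satisfies the threshold."""
    return liveness_score(captured, enrolled) >= threshold

# A close spectral match passes; a badly mismatched (e.g. replayed) one fails.
live = is_live([0.2, 0.4, 0.1], [0.25, 0.38, 0.12])
spoof = is_live([0.9, 3.0, 2.5], [0.25, 0.38, 0.12])
```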
-
Patent number: 10079687
Abstract: The embodiments herein provide a method and system for password recovery using fuzzy logic. The system includes a receiving module, a validation module, an authentication module, a display module, a memory module, and a network interface. The system uses a phonetic algorithm such as the Soundex algorithm for enabling the password recovery process. The user credentials received through the receiving module are validated with the validation module at the time of accessing the application. The authentication module is configured to authenticate the user using a fuzzy logic derived from a phonetic algorithm, by matching the answers of the user with the stored answers to compute a score which is compared with a threshold score. The user is enabled to unlock the user device when the computed validation score is greater than the threshold score.
Type: Grant
Filed: April 12, 2016
Date of Patent: September 18, 2018
Assignee: ILANTUS TECHNOLOGIES PVT. LTD.
Inventors: Ashutosh Kumar Mishra, Saurav Sharma, Deepika Kuntar
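The Soundex algorithm named above is well defined and small enough to show in full: it reduces a word to its first letter plus three digits, so answers that sound alike ("Smith" and "Smyth") map to the same code. The implementation below is the classic American Soundex; using it for fuzzy answer matching, as in the abstract, is illustrated in the last line.

```python
# Classic American Soundex: first letter + three digits, with the
# H/W rule (h and w do not break runs of equal digits).
def soundex(word):
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
             "l": "4", "mn": "5", "r": "6"}

    def digit(ch):
        for letters, d in codes.items():
            if ch in letters:
                return d
        return ""  # vowels, h, w, y carry no code

    word = word.lower()
    result = word[0].upper()
    prev = digit(word[0])
    for ch in word[1:]:
        d = digit(ch)
        if d and d != prev:
            result += d
        if ch not in "hw":
            prev = d
    return (result + "000")[:4]

# Fuzzy answer check: phonetically equal answers produce equal codes.
match = soundex("Smith") == soundex("Smyth")
```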
-
Patent number: 9997157
Abstract: Systems and methods are provided for improving language models for speech recognition by personalizing knowledge sources utilized by the language models to specific users or user-population characteristics. A knowledge source, such as a knowledge graph, is personalized for a particular user by mapping entities or user actions from usage history for the user, such as query logs, to the knowledge source. The personalized knowledge source may be used to build a personal language model by training a language model with queries corresponding to entities or entity pairs that appear in usage history. In some embodiments, a personalized knowledge source for a specific user can be extended based on personalized knowledge sources of similar users.
Type: Grant
Filed: May 16, 2014
Date of Patent: June 12, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Murat Akbacak, Dilek Z. Hakkani-Tur, Gokhan Tur, Larry P. Heck, Benoit Dumoulin
-
Patent number: 9984678
Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
Type: Grant
Filed: March 23, 2012
Date of Patent: May 29, 2018
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Michael Lewis Seltzer, Alejandro Acero
-
Patent number: 9959862
Abstract: A speech recognition apparatus based on a deep-neural-network (DNN) sound model includes a memory and a processor. As the processor executes a program stored in the memory, the processor generates sound-model state sets corresponding to a plurality of pieces of set training speech data included in multi-set training speech data, generates a multi-set state cluster from the sound-model state sets, and sets the multi-set training speech data as an input node and the multi-set state cluster as output nodes so as to learn a DNN structured parameter.
Type: Grant
Filed: June 20, 2016
Date of Patent: May 1, 2018
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Byung Ok Kang, Jeon Gue Park, Hwa Jeon Song, Yun Keun Lee, Eui Sok Chung
-
Patent number: 9953636
Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
Type: Grant
Filed: October 9, 2015
Date of Patent: April 24, 2018
Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno Mengibar
-
Patent number: 9928831
Abstract: A speech data recognition method, apparatus, and server are for distinguishing regional accent. The speech data recognition method includes: calculating a speech recognition confidence and/or a signal-to-noise ratio of the speech data, and screening a regional speech data from the speech data based on the speech recognition confidence and/or the signal-to-noise ratio of the speech data; and determining a region to which the regional speech data belongs based on a regional attribute of the regional speech data. The regional speech data are automatically recognized from the mass speech data by calculating the speech recognition confidence, the signal-to-noise ratio of the speech data or the combination thereof, thereby avoiding manual labeling of the speech data and enhancing the efficiency of the speech data processing.
Type: Grant
Filed: December 18, 2014
Date of Patent: March 27, 2018
Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
Inventors: Dan Su, Zhao Yin
-
Patent number: 9892726
Abstract: Features are disclosed for modifying a statistical model to more accurately discriminate between classes of input data. A subspace of the total model parameter space can be learned such that individual points in the subspace, corresponding to the various classes, are discriminative with respect to the classes. The subspace can be learned using an iterative process whereby an initial subspace is used to generate data and maximize an objective function. The objective function can correspond to maximizing the posterior probability of the correct class for a given input. The initial subspace, data, and objective function can be used to generate a new subspace that better discriminates between classes. The process may be repeated as desired. A model modified using such a subspace can be used to classify input data.
Type: Grant
Filed: December 17, 2014
Date of Patent: February 13, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Spyridon Matsoukas, Ariya Rastrow, Bjorn Hoffmeister
-
Patent number: 9852732
Abstract: A method of operating a communication system includes generating a transcript of at least a portion of a conversation between a plurality of users. The transcript includes a plurality of subsets of characters. The method further includes displaying the transcript on a plurality of communication devices, identifying an occurrence of at least one selected subset of characters from the plurality of subsets of characters, and querying a definition source for at least one definition for the selected subset of characters. The definition for the selected subset of characters is displayed on the plurality of communication devices.
Type: Grant
Filed: October 7, 2010
Date of Patent: December 26, 2017
Assignee: Avaya Inc.
Inventors: David L. Chavez, Larry J. Hardouin
-
Patent number: 9799350
Abstract: An apparatus and method for verifying an utterance based on multi-event detection information in a natural language speech recognition system. The apparatus includes a noise processor configured to process noise of an input speech signal, a feature extractor configured to extract features of speech data obtained through the noise processing, an event detector configured to detect events of the plurality of speech features occurring in the speech data using the noise-processed data and data of the extracted features, a decoder configured to perform speech recognition using a plurality of preset speech recognition models for the extracted feature data, and an utterance verifier configured to calculate confidence measurement values in units of words and sentences using information on the plurality of events detected by the event detector and a preset utterance verification model and perform utterance verification according to the calculated confidence measurement values.
Type: Grant
Filed: June 17, 2016
Date of Patent: October 24, 2017
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Jeom Ja Kang, Hwa Jeon Song, Jeon Gue Park, Hoon Chung
-
Patent number: 9779722
Abstract: A system and method for adapting a speech recognition and generation system. The system and method include providing a speech recognition and generation engine that processes speech received from a user and providing a dictionary adaptation module that adds out of vocabulary words to a baseline dictionary of the speech recognition and generation system. Words are added by extracting words that are encountered and adding out of vocabulary words to the baseline dictionary of the speech recognition and generation system.
Type: Grant
Filed: November 5, 2013
Date of Patent: October 3, 2017
Assignee: GM Global Technology Operations LLC
Inventors: Ron M. Hecht, Omer Tsimhoni, Timothy J. Grost
-
Patent number: 9767792
Abstract: A system and method for learning alternate pronunciations for speech recognition is disclosed. Alternative name pronunciations may be covered, through pronunciation learning, that have not been previously covered in a general pronunciation dictionary. In an embodiment, the detection of phone-level and syllable-level mispronunciations in words and sentences may be based on acoustic models trained by Hidden Markov Models. Mispronunciations may be detected by comparing the likelihood of the potential state of the targeting pronunciation unit with a pre-determined threshold through a series of tests. It is also within the scope of an embodiment to detect accents.
Type: Grant
Filed: October 12, 2016
Date of Patent: September 19, 2017
Assignee: Interactive Intelligence Group, Inc.
Inventors: Zhenhao Ge, Vivek Tyagi, Aravind Ganapathiraju, Ananth Nagaraja Iyer, Scott Allen Randal, Felix Immanuel Wyss
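The threshold test above can be sketched simply: score each spoken phone under its expected acoustic model and flag phones whose likelihood falls below a preset threshold. The log-likelihood values and the threshold below are invented stand-ins; a real implementation would obtain these scores from HMM-based forced alignment rather than a hand-written list.

```python
# Hypothetical sketch: flag phones whose (log-)likelihood under the
# expected acoustic model falls below a preset threshold.
def flag_mispronunciations(phone_loglikes, threshold=-5.0):
    """Return the phones scored below the threshold (likely mispronounced)."""
    return [phone for phone, ll in phone_loglikes if ll < threshold]

# Toy scores for the word "good": the final phone scores poorly.
utterance = [("g", -2.1), ("uh", -1.4), ("d", -7.8)]
flagged = flag_mispronunciations(utterance)
```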
-
Patent number: 9754586
Abstract: The present invention concerns methods and apparatus for identifying and assigning meaning to words not recognized by a vocabulary or grammar of a speech recognition system. In an embodiment of the invention, the word may be in an acoustic vocabulary of the speech recognition system, but may be unrecognized by an embedded grammar of a language model of the speech recognition system. In another embodiment of the invention, the word may not be recognized by any vocabulary associated with the speech recognition system. In embodiments of the invention, at least one hypothesis is generated for an utterance not recognized by the speech recognition system. If the at least one hypothesis meets at least one predetermined criterion, a word or more corresponding to the at least one hypothesis is added to the vocabulary of the speech recognition system.
Type: Grant
Filed: June 5, 2008
Date of Patent: September 5, 2017
Assignee: Nuance Communications, Inc.
Inventors: Sabine Deligne, Ramesh A. Gopinath, Dimitri Kanevsky, Mahesh Viswanathan
-
Patent number: 9721560
Abstract: A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sound parameter information is extracted from the analog signal and compared to a sound parameter reference stored locally with the sound recognition sensor to detect when the signature sound is received in the analog signal. A trigger signal is generated when a signature sound is detected. A portion of the extracted sound parameter information is sent to a remote training location for adaptive training when a signature sound detection error occurs. An updated sound parameter reference from the remote training location is received in response to the adaptive training.
Type: Grant
Filed: November 2, 2015
Date of Patent: August 1, 2017
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Lin Sun, Wei Ma
-
Patent number: 9684648
Abstract: Determining a subject type for an entity in a text segment. A text segment is selected, which includes one or more single-word or multi-word entities. Natural language processing is performed on the selected text segment to identify entities that constitute subjects of the selected text segment. One entity is selected. A variant annotation is associated with the selected entity. The variant annotation reflects multiple subject types for the selected entity and a value for each subject type. The most probable subject type is determined for the selected entity, based on a combination of natural language processing rules and dictionary listings. The value of the annotation is incremented for the subject type corresponding to the most probable subject type for the selected entity, so that the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment.
Type: Grant
Filed: May 31, 2012
Date of Patent: June 20, 2017
Assignee: International Business Machines Corporation
Inventors: Patrick W. Fink, Kristin E. McNeil, Philip E. Parker
-
Patent number: 9666181
Abstract: A tuning system for tuning a speech recognition system includes a transmitter for sending a user response to a speech recognition system. The user response is based at least in part on a test stimulus that may be generated by the control system. A receiver receives a recognized response from the speech recognition system; this recognized response is based at least in part on the associated user response. An adjustment module adjusts at least one parameter of the speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response.
Type: Grant
Filed: February 7, 2013
Date of Patent: May 30, 2017
Assignees: University of Florida Research Foundation, Inc., Cochlear Limited
Inventors: Lee S. Krause, Bonny Banerjee, Mark Skowronski, Rahul Shrivastav, Alice E. Holmes
-
Patent number: 9659559
Abstract: Phonetic distances are empirically measured as a function of speech recognition engine recognition error rates. The error rates are determined by comparing a recognized speech file with a reference file. The phonetic distances can be normalized to earlier measurements. The phonetic distances/error rates can also be used to improve speech recognition engine grammar selection, as an aid in language training and evaluation, and in other applications.
Type: Grant
Filed: June 25, 2009
Date of Patent: May 23, 2017
Assignee: ADACEL SYSTEMS, INC.
Inventor: Chang-Qing Shu
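The empirical measurement above can be sketched by aligning a recognized transcript against its reference and turning per-phone substitution rates into a phonetic "distance" (here simply the substitution rate itself). The position-by-position comparison below is a deliberate simplification; a real measurement would use edit-distance alignment to handle insertions and deletions.

```python
# Toy sketch: derive per-phone substitution rates by comparing a
# recognized phone sequence with a reference phone sequence.
from collections import Counter

def phonetic_distances(reference, recognized):
    """Substitution rate for each reference phone, from paired transcripts."""
    subs = Counter()
    seen = Counter()
    for ref, hyp in zip(reference, recognized):
        seen[ref] += 1
        if ref != hyp:
            subs[ref] += 1
    return {phone: subs[phone] / seen[phone] for phone in seen}

# "bat bad" recognized as "pat bat": /b/ confused once, /d/ always.
ref = ["b", "a", "t", "b", "a", "d"]
hyp = ["p", "a", "t", "b", "a", "t"]
dist = phonetic_distances(ref, hyp)
```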
-
Patent number: 9633001
Abstract: Content is received and compared against rules for identifying a type of content. Each rule has both segmented and unsegmented patterns. The content is matched against the patterns and assigned a confidence score that is higher if the content matches a segmented pattern and lower if the content matches an unsegmented pattern.
Type: Grant
Filed: June 23, 2015
Date of Patent: April 25, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Mauktik Gandhi, Charles Lamanna, Vidyaraman Sankaranarayanan, Raimundo Pontes Filho
-
Patent number: 9626971
Abstract: Method for text-dependent Speaker Recognition using a speaker adapted Universal Background Model, wherein the speaker adapted Universal Background Model is a speaker adapted Hidden Markov Model comprising channel correction.
Type: Grant
Filed: September 20, 2013
Date of Patent: April 18, 2017
Assignee: Cirrus Logic International Semiconductor Ltd.
Inventors: Luis Buera Rodriguez, Carlos Vaquero Aviles-Casco, Marta Garcia Gomar, Antonio Miguel Artiaga
-
Patent number: 9538114
Abstract: A system includes a voice converter converting a first voice command into a first electrical command and a command library having library contents. A language responsiveness module (LRM) stores the first electrical command in a temporary set when a first control command cannot be determined from the library contents. A voice prompt module receives a second voice command when the first control command cannot be determined from the library contents. The voice converter converts a second voice command into a second electrical command corresponding to the second voice command. The LRM compares the second electrical command to the command library. The LRM determines a second control command corresponding to the second electrical command in response to comparing the second voice command to the command library and stores the first voice command in the command library after determining the control command corresponding to the second voice command.
Type: Grant
Filed: March 15, 2013
Date of Patent: January 3, 2017
Assignee: The DIRECTV Group, Inc.
Inventors: John M. Han, Vlad Zarney, Vladislav Kolegayev, Becky P. Chen
-
Patent number: 9524718
Abstract: The speech recognition result through the general-purpose server and that through the specialized speech recognition server are integrated in an optimum manner, thereby providing a speech recognition function with the fewest errors. The specialized speech recognition server 108 is constructed with the words contained in the user dictionary data in use, and the performance of the general-purpose speech recognition server 106 is preliminarily evaluated with such user dictionary data. Based on such evaluation result, information related to which recognition results through the specialized and general-purpose speech recognition servers are adopted, and to how the adopted recognition results are weighted to obtain an optimum recognition result, is preliminarily retained in the form of a database.
Type: Grant
Filed: April 3, 2013
Date of Patent: December 20, 2016
Assignee: Clarion Co., Ltd.
Inventors: Yasunari Obuchi, Takeshi Homma
-
Patent number: 9489940
Abstract: The technology of the present application provides a method and apparatus for dynamically updating a language model across a large number of similarly situated users. The system identifies individual changes to user profiles and evaluates each change for broader application, such as a dialect correction for a speech recognition engine. An administrator for the system identifies similarly situated user profiles and downloads the profile change to effect a dynamic change to the language model of similarly situated users.
Type: Grant
Filed: June 11, 2012
Date of Patent: November 8, 2016
Assignee: NVOQ INCORPORATED
Inventor: Charles Corfield
-
Patent number: 9478216
Abstract: A method for speech recognition is implemented in the specific form of computer processes that function in a computer processor. That is, one or more computer processes: process a speech input to produce a sequence of representative speech vectors and perform multiple recognition passes to determine a recognition output corresponding to the speech input. At least one generic recognition pass is based on a generic speech recognition arrangement using generic modeling of a broad general class of input speech. And at least one adapted recognition pass is based on a speech adapted arrangement using pre-adapted modeling of a specific sub-class of the general class of input speech.
Type: Grant
Filed: December 8, 2009
Date of Patent: October 25, 2016
Assignee: Nuance Communications, Inc.
Inventors: Daniel Willett, Lambert Mathias, Chuang He, Jianxiong Wu
-
Patent number: 9472187
Abstract: The present disclosure relates to training a speech recognition system. One example method includes receiving a collection of speech data items, wherein each speech data item corresponds to an utterance that was previously submitted for transcription by a production speech recognizer. The production speech recognizer uses initial production speech recognizer components in generating transcriptions of speech data items. A transcription for each speech data item is generated using an offline speech recognizer, and the offline speech recognizer components are configured to improve speech recognition accuracy in comparison with the initial production speech recognizer components. The updated production speech recognizer components are trained for the production speech recognizer using a selected subset of the transcriptions of the speech data items generated by the offline speech recognizer.
Type: Grant
Filed: May 25, 2016
Date of Patent: October 18, 2016
Assignee: Google Inc.
Inventors: Olga Kapralova, John Paul Alex, Eugene Weinstein, Pedro J. Moreno Mengibar, Olivier Siohan, Ignacio Lopez Moreno
-
Patent number: 9406299
Abstract: A computer-implemented method is described for speaker adaptation in automatic speech recognition. Speech recognition data from a particular speaker is used for adaptation of an initial speech recognition acoustic model to produce a speaker adapted acoustic model. A speaker dependent differential acoustic model is determined that represents differences between the initial speech recognition acoustic model and the speaker adapted acoustic model. In addition, an approach is also disclosed to estimate speaker-specific feature or model transforms over multiple sessions. This is achieved by updating the previously estimated transform using only adaptation statistics of the current session.
Type: Grant
Filed: March 8, 2012
Date of Patent: August 2, 2016
Assignee: Nuance Communications, Inc.
Inventors: Christian Gollan, Daniel Willett
-
Patent number: 9378735
Abstract: Features are disclosed for estimating affine transforms in Log Filter-Bank Energy Space ("LFBE" space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least squares error between corresponding linear and bias transform parts for the resultant neural network feature vector and some standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression ("cMLLR") techniques. Alternatively, the affine transform may be estimated by minimizing the least squares error between the resultant transformed neural network feature and some standard speaker-specific feature obtained for a GMM-based acoustic model.
Type: Grant
Filed: December 19, 2013
Date of Patent: June 28, 2016
Assignee: Amazon Technologies, Inc.
Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Bjorn Hoffmeister, Nikko Strom
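The least-squares criterion at the heart of this abstract can be shown with a toy fit. Real cMLLR-style transforms use a full matrix plus bias over the whole feature vector; the sketch below fits a scalar affine transform y ≈ a·x + b in closed form, purely to illustrate the least-squares idea, with made-up feature values.

```python
# Toy sketch: closed-form least-squares fit of an affine transform
# y = a*x + b that maps one feature stream onto a target speaker-specific
# feature stream (per-dimension, i.e. diagonal linear part plus bias).

def fit_affine(xs, ys):
    """Least-squares fit of y = a*x + b for paired scalar features."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Synthetic log filter-bank energies: targets are exactly 2*x + 1,
# so the fit recovers a = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_affine(xs, ys)
print(a, b)  # 2.0 1.0
```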
-
Patent number: 9330663
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data; determining that an initial portion of the audio data corresponds to an initial portion of a hotword; in response to determining that the initial portion of the audio data corresponds to the initial portion of the hotword, selecting, from among a set of one or more actions that are performed when the entire hotword is detected, a subset of the one or more actions; and causing one or more actions of the subset to be performed.
Type: Grant
Filed: March 20, 2014
Date of Patent: May 3, 2016
Assignee: Google Inc.
Inventor: Matthew Sharifi
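The partial-hotword idea can be sketched in a few lines. The hotword, action names, and the prefix-matching rule below are illustrative assumptions, not the patented detection method (which operates on audio, not text).

```python
# Toy sketch: when the start of the input matches the start of the hotword,
# perform a subset of the actions normally triggered by the full hotword
# (e.g. waking the audio pipeline early), committing the rest only on a
# full match.

HOTWORD = "ok google"
FULL_ACTIONS = ["wake_audio_pipeline", "open_mic_session", "launch_assistant"]
# Actions safe to run speculatively on a partial match.
EARLY_SUBSET = ["wake_audio_pipeline"]

def actions_for(audio_prefix):
    if audio_prefix == HOTWORD:
        return FULL_ACTIONS
    if HOTWORD.startswith(audio_prefix):
        return EARLY_SUBSET  # partial match: prepare, but don't commit
    return []

print(actions_for("ok go"))      # ['wake_audio_pipeline']
print(actions_for("ok google"))  # all three actions
print(actions_for("hello"))      # []
```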
-
Patent number: 9305554
Abstract: A method and device for recognizing an utterance. The method includes transmitting context data associated with a first device to a second device. A first speech recognition model is received from the second device. The first speech recognition model is a subset of a second speech recognition model present at the second device. The first speech recognition model is based on the context data. It is determined whether the utterance can be recognized at the first device based on the first speech recognition model. If the utterance cannot be recognized at the first device, then at least a portion of the utterance is sent to the second device. If the utterance can be recognized at the first device, then an action associated with the recognized utterance is performed.
Type: Grant
Filed: July 16, 2014
Date of Patent: April 5, 2016
Assignee: Samsung Electronics Co., Ltd.
Inventors: Arun Jagatheesan, Juhan Lee, Jong Hoon Ahnn
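The two-device flow reads as a local-first lookup with a server fallback, sketched below. The local grammar contents and the server's behavior are illustrative assumptions; real models operate on audio, not text.

```python
# Sketch: the first device holds a context-dependent subset model received
# from the second device, tries local recognition first, and forwards the
# utterance only when the subset model cannot recognize it.

# Subset model built from context data (say, the user is in a "music" context).
LOCAL_MODEL = {"play", "pause", "next track"}

def recognize(utterance, send_to_server):
    """Try the on-device subset model first; fall back to the second device."""
    if utterance in LOCAL_MODEL:
        return ("local", utterance)
    return ("server", send_to_server(utterance))

# Stand-in for the second device's full model, which can recognize anything here.
full_model_lookup = lambda u: u.upper()

print(recognize("pause", full_model_lookup))     # handled locally
print(recognize("call mom", full_model_lookup))  # sent to the second device
```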
-
Patent number: 9208780
Abstract: The processing efficiency and estimation accuracy of a voice activity detection apparatus are improved. An acoustic signal analyzer receives a digital acoustic signal containing a speech signal and a noise signal, generates a non-speech GMM and a speech GMM adapted to a noise environment, by using a silence GMM and a clean-speech GMM in each frame of the digital acoustic signal, and calculates the output probabilities of dominant Gaussian distributions of the GMMs. A speech state probability to non-speech state probability ratio calculator calculates a speech state probability to non-speech state probability ratio based on a state transition model of a speech state and a non-speech state, by using the output probabilities; and a voice activity detection unit judges, from the speech state probability to non-speech state probability ratio, whether the acoustic signal in the frame is in the speech state or in the non-speech state and outputs only the acoustic signal in the speech state.
Type: Grant
Filed: July 15, 2010
Date of Patent: December 8, 2015
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Masakiyo Fujimoto, Tomohiro Nakatani
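The frame-wise decision this abstract describes can be sketched as a two-state model whose speech-to-non-speech probability ratio labels each frame. The transition probabilities and per-frame model likelihoods below are made up for illustration; the actual GMM adaptation and probability computation are far richer.

```python
# Minimal sketch: per-frame output log-probabilities from speech and
# non-speech models are combined with a two-state transition model, and the
# speech-to-non-speech probability ratio decides each frame's label.

import math

# Transition log-probabilities between speech (S) and non-speech (N) states.
LOG_TRANS = {("S", "S"): math.log(0.9), ("S", "N"): math.log(0.1),
             ("N", "N"): math.log(0.9), ("N", "S"): math.log(0.1)}

def detect(frames):
    """frames: list of (log p(frame|speech GMM), log p(frame|non-speech GMM))."""
    log_s = log_n = math.log(0.5)  # uniform initial state probabilities
    labels = []
    for ls, ln in frames:
        new_s = ls + max(log_s + LOG_TRANS[("S", "S")], log_n + LOG_TRANS[("N", "S")])
        new_n = ln + max(log_s + LOG_TRANS[("S", "N")], log_n + LOG_TRANS[("N", "N")])
        log_s, log_n = new_s, new_n
        # Ratio > 1 (log ratio > 0) means the frame is judged to be speech.
        labels.append("speech" if log_s - log_n > 0 else "non-speech")
    return labels

# Three noise-like frames followed by three speech-like frames.
frames = [(-8.0, -2.0), (-7.5, -2.5), (-8.2, -2.1),
          (-2.0, -8.0), (-1.5, -7.0), (-2.2, -8.5)]
print(detect(frames))  # 3x non-speech, then 3x speech
```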
-
Patent number: 9171547
Abstract: Included are embodiments for multi-pass analytics. At least one embodiment of a method includes receiving data associated with a communication, performing first tier analytics on the received data, and performing second tier analytics on the received data, where the second tier analytics determines different characteristics of the received data than the first tier analytics.
Type: Grant
Filed: October 12, 2011
Date of Patent: October 27, 2015
Assignee: VERINT AMERICAS INC.
Inventor: Joseph Watson
-
Patent number: 9165028
Abstract: Methods and apparatus related to ranking modifications of a previous query. For example, modifications of a previous query may be generated based on a current query issued subsequent to the previous query by substituting one or more n-grams of the previous query with one or more n-grams of the current query. One or more measures of each of the modifications may be identified and, based on such measures, a ranking of each of the modifications may be determined. One of the modifications may be selected as a submission query based on the rankings of the modifications. The submission query may be selected for submission in lieu of, or in addition to, the current query.
Type: Grant
Filed: July 24, 2013
Date of Patent: October 20, 2015
Assignee: GOOGLE INC.
Inventors: Bruce Christensen, Pravir Kumar Gupta, Jan Kuipers
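The generate-then-rank step can be sketched with unigram substitution. The scoring measure below (a hypothetical popularity table) is an assumption; the patent describes using one or more unspecified measures of each modification.

```python
# Illustrative sketch: candidate rewrites of the previous query are generated
# by substituting each of its unigrams with unigrams from the current query,
# then ranked by a toy scoring measure.

from itertools import product

def generate_modifications(previous, current):
    """Substitute each term of the previous query with each term of the current one."""
    prev_terms = previous.split()
    cur_terms = current.split()
    mods = set()
    for i, new_term in product(range(len(prev_terms)), cur_terms):
        candidate = prev_terms[:]
        candidate[i] = new_term
        mods.add(" ".join(candidate))
    return mods

def rank(mods, popularity):
    """Rank candidates by a made-up popularity table (higher = better)."""
    return sorted(mods, key=lambda m: popularity.get(m, 0), reverse=True)

mods = generate_modifications("weather in paris", "what about london")
popularity = {"weather in london": 42}  # hypothetical query-log counts
best = rank(mods, popularity)[0]
print(best)  # weather in london
```

A follow-up query like "what about london" thus rewrites the previous query rather than being searched verbatim.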
-
Patent number: 9153230
Abstract: A method for executing a mobile speech recognition software application based on a multi-layer neural network model includes providing one or more frames of an audio signal to a hardware accelerator in the mobile device for classification. The hardware accelerator includes a multiplier-accumulator (MAC) unit to perform matrix multiplication operations involved in computing the neural network output.
Type: Grant
Filed: October 23, 2012
Date of Patent: October 6, 2015
Assignee: Google Inc.
Inventor: Juha-Pekka Maaninen
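The multiply-accumulate operation the abstract attributes to the hardware unit can be shown in software. This is a sketch only: the weights, bias, and activation are made-up values, and a real MAC unit performs these loops in fixed-function hardware.

```python
# Sketch: a dense neural-network layer's output is a matrix-vector product
# computed as repeated multiply-accumulates, followed by a bias add (folded
# into the accumulator's initial value here) and a nonlinearity.

import math

def mac_layer(weights, bias, inputs):
    """Compute one dense layer output via explicit multiply-accumulate loops."""
    outputs = []
    for row, b in zip(weights, bias):
        acc = b
        for w, x in zip(row, inputs):
            acc += w * x  # the MAC step: multiply, then accumulate
        outputs.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid activation
    return outputs

weights = [[0.5, -0.2], [0.1, 0.4]]
bias = [0.0, 0.1]
frame = [1.0, 2.0]  # e.g. two features of one audio frame
out = mac_layer(weights, bias, frame)
print(out)
```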
-
Patent number: 9099092
Abstract: Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results.
Type: Grant
Filed: January 10, 2014
Date of Patent: August 4, 2015
Assignee: Nuance Communications, Inc.
Inventors: Shilei Zhang, Shenghua Bao, Wen Liu, Yong Qin, Zhiwei Shuang, Jian Chen, Zhong Su, Qin Shi, William F. Ganong, III
-
Patent number: 9082403
Abstract: The subject disclosure is directed towards training a classifier for spoken utterances without relying on human assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels and updates a classification model associated with the classifier using the pseudo-semantic label.
Type: Grant
Filed: December 15, 2011
Date of Patent: July 14, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yun-Cheng Ju, James Garnet Droppo, III
-
Publication number: 20150142436
Abstract: The present invention allows feedback from operator workstations to be used to update databases used for providing automated information services. When an automated process fails, recorded speech of the caller is passed on to the operator for decision making. Based on the selections made by the operator in light of the speech or other interactions with the caller, a comparison is made between the speech and the selections made by the operator to arrive at information to update the databases in the information services automation system. Thus, when the operator inputs the words corresponding to the speech provided at the information services automation system, the speech may be associated with those words. The association between the speech and the words may be used to update different databases in the information services automation system.
Type: Application
Filed: January 30, 2015
Publication date: May 21, 2015
Inventors: Bruce Bokish, Michael Craig Presnell