Specialized Models Patents (Class 704/255)
-
Patent number: 9842592
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using non-linguistic context. In some implementations, context data indicating non-linguistic context for the utterance is received. Based on the context data, feature scores for one or more non-linguistic features are generated. The feature scores for the non-linguistic features are provided to a language model trained to process scores for non-linguistic features. The output from the language model is received, and a transcription for the utterance is determined using the output of the language model.
Type: Grant
Filed: February 12, 2014
Date of Patent: December 12, 2017
Assignee: Google Inc.
Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
-
Patent number: 9842593
Abstract: Predetermined services are provided using preset instructions. A transcript of audible content provided over an electronic network and received at a communications device is analyzed to determine whether a trigger is present in the audible content. When the trigger is present in the audible content, preset instructions correlated with the trigger and instructing how to provide a predetermined service are identified. The predetermined service is provided by following the preset instructions.
Type: Grant
Filed: November 14, 2014
Date of Patent: December 12, 2017
Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
Inventors: Shiv Kumar, Paritosh Bajpay, Anand K. Singh, Satyendra Tripathi
-
Patent number: 9836452
Abstract: Methods and systems are provided for discriminating ambiguous expressions to enhance user experience. For example, a natural language expression may be received by a speech recognition component. The natural language expression may include at least one of words, terms, and phrases of text. A dialog hypothesis set from the natural language expression may be created by using contextual information. In some cases, the dialog hypothesis set has at least two dialog hypotheses. A plurality of dialog responses may be generated for the dialog hypothesis set. The dialog hypothesis set may be ranked based on an analysis of the plurality of the dialog responses. An action may be performed based on ranking the dialog hypothesis set.
Type: Grant
Filed: December 30, 2014
Date of Patent: December 5, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jean-Philippe Robichaud, Ruhi Sarikaya
-
Patent number: 9830909
Abstract: A speech recognition method and system enables user-configurable speech commands. For a given speech command, the speech recognition engine provides a mechanism for the end-user to select speech command terms to use in substitution for the given speech command. The speech recognition engine, or module thereof, forms a replacement command for the given speech command from the user-selected speech command terms. The speech recognition engine thereafter is synonymously responsive to user utterance of the replacement command as though the user issued/voiced the associated counterpart given speech command.
Type: Grant
Filed: March 31, 2017
Date of Patent: November 28, 2017
Assignee: Kopin Corporation
Inventor: Christopher Parkinson
-
Patent number: 9817881
Abstract: A method, apparatus, and tangible computer readable medium for processing a Hidden Markov Model (HMM) structure are disclosed herein. For example, the method includes receiving Hidden Markov Model (HMM) information from an external system. The method also includes processing back pointer data and first HMM state scores for one or more NULL states in the HMM information. Second HMM state scores are processed for one or more non-NULL states in the HMM information based on at least one predecessor state. Further, the method includes transferring the second HMM state scores to the external system.
Type: Grant
Filed: October 16, 2013
Date of Patent: November 14, 2017
Assignee: Cypress Semiconductor Corporation
Inventors: Ojas A. Bapat, Richard M. Fastow, Jens Olson, Kenichi Kumatani
-
Patent number: 9805371
Abstract: A user may respond to a request of another user by entering text, such as a customer service representative responding to a customer. Suggestions of responses may be provided to the responding user so that the responding user may use a suggested response instead of entering text to provide a response. Previous messages between the two users and other information may be used to determine an appropriate suggested response. A conversation feature vector may be determined from previous messages, and response feature vectors may be determined from the conversation feature vector. The response feature vectors may be used to determine an appropriate suggested response.
Type: Grant
Filed: September 1, 2016
Date of Patent: October 31, 2017
Assignee: ASAPP, INC.
Inventors: Gustavo Sapoznik, Shawn Henry
-
Patent number: 9792907
Abstract: Techniques related to key phrase detection for applications such as wake on voice are discussed. Such techniques may include updating a start state based rejection model and a key phrase model based on scores of sub-phonetic units from an acoustic model to generate a rejection likelihood score and a key phrase likelihood score and determining whether received audio input is associated with a predetermined key phrase based on the rejection likelihood score and the key phrase likelihood score.
Type: Grant
Filed: November 24, 2015
Date of Patent: October 17, 2017
Assignee: Intel IP Corporation
Inventors: Tobias Bocklet, Joachim Hofer
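The abstract above describes accumulating a rejection likelihood score and a key phrase likelihood score and comparing them. A minimal illustrative sketch of that decision rule follows; the function name `detect_keyphrase`, the per-frame score pairs, and the threshold value are all assumptions for illustration, not the patented method itself.

```python
def detect_keyphrase(frame_log_scores, threshold=2.0):
    """Toy wake-word decision: accumulate a rejection score (log score of
    the best competing sub-phonetic unit per frame) and a key phrase score
    (log score of the expected unit per frame), then accept the key phrase
    when its accumulated score beats rejection by `threshold`."""
    rejection, keyphrase = 0.0, 0.0
    for rej_score, key_score in frame_log_scores:
        rejection += rej_score
        keyphrase += key_score
    return (keyphrase - rejection) >= threshold
```

In a real wake-on-voice system the two scores would come from Viterbi updates over the rejection and key phrase models rather than a simple sum.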
-
Patent number: 9753918
Abstract: A speech translation system and methods for cross-lingual communication that enable users to easily improve and customize content and usage of the system. The methods include, in response to receiving an utterance including a first term associated with a field, translating the utterance into a second language. In response to receiving an indication to add the first term associated with the field to a first recognition lexicon, adding the first term associated with the field and the determined translation to a first machine translation module and to a shared database for a community associated with the field of the first term associated with the field, wherein the first term associated with the field added to the shared database is accessible by the community.
Type: Grant
Filed: January 5, 2015
Date of Patent: September 5, 2017
Assignee: Facebook, Inc.
Inventors: Alexander Waibel, Ian R. Lane
-
Patent number: 9747893
Abstract: A computer-based, unsupervised training method for an N-gram language model includes reading, by a computer, recognition results obtained as a result of speech recognition of speech data; acquiring, by the computer, a reliability for each of the read recognition results; referring, by the computer, to the recognition result and the acquired reliability to select an N-gram entry; and training, by the computer, the N-gram language model on the selected one or more N-gram entries using all recognition results.
Type: Grant
Filed: October 6, 2016
Date of Patent: August 29, 2017
Assignee: International Business Machines Corporation
Inventors: Nobuyasu Itoh, Gakuto Kurata, Masafumi Nishimura
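The selection step described above, picking N-gram entries only from recognition hypotheses whose reliability is high, can be sketched as follows. This is an illustrative simplification: the tuple format `(text, confidence)`, the `min_confidence` cutoff, and counting with `Counter` are assumptions, and the patent's final training pass over all recognition results is not shown.

```python
from collections import Counter

def select_ngram_entries(recognition_results, min_confidence=0.8, n=2):
    """Count n-gram entries only from hypotheses whose confidence meets
    the cutoff, mimicking reliability-based entry selection.

    recognition_results: iterable of (hypothesis_text, confidence) pairs.
    Returns a Counter keyed by n-gram tuples, with <s>/</s> padding.
    """
    counts = Counter()
    for text, confidence in recognition_results:
        if confidence < min_confidence:
            continue  # unreliable hypothesis: contributes no entries
        tokens = ["<s>"] * (n - 1) + text.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```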
-
Patent number: 9721561
Abstract: In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.
Type: Grant
Filed: December 5, 2013
Date of Patent: August 1, 2017
Assignee: Nuance Communications, Inc.
Inventors: Yun Tang, Venkatesh Nagesha, Xing Fan
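One common way to feed both speech data and speaker data to a DNN, as the abstract describes, is to append a fixed per-speaker vector to every acoustic frame. The sketch below shows only that input construction; the function name and dimensions are assumptions, and the actual patent may combine the inputs differently.

```python
import numpy as np

def speaker_aware_input(acoustic_frames, speaker_vector):
    """Append a per-speaker embedding (e.g., an i-vector-like vector
    derived from the same speech) to every acoustic frame, producing
    the augmented input matrix fed to the phoneme-classifying DNN.

    acoustic_frames: (T, D) array of frame features.
    speaker_vector:  (S,) array, constant across the utterance.
    Returns a (T, D + S) array.
    """
    num_frames = acoustic_frames.shape[0]
    tiled = np.tile(speaker_vector, (num_frames, 1))  # repeat for each frame
    return np.concatenate([acoustic_frames, tiled], axis=1)
```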
-
Patent number: 9699298
Abstract: An electronic communication device and method for communication including automatically selecting, composing, and/or presenting a number of proposed responses to a message is disclosed. The device can present the incoming message content to the user together with an alert to the user that one or more proposed responses are available for review and selection. The device can calculate a confidence metric for each proposed response. The user alert and/or presentation of the messages can be made to vary in intensity in accordance with the confidence metric of the one or more proposed responses. The presentation of the proposed responses can be alternated and/or combined with a method for the device to receive user input to the content of the proposed response. The user can edit a proposed response rather than composing a complete response.
Type: Grant
Filed: August 10, 2015
Date of Patent: July 4, 2017
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Mark Rubinstein, Jason Nelson
-
Patent number: 9684650
Abstract: A penalized loss is optimized using a corpus of language samples respective to a set of parameters of a language model. The penalized loss includes a function measuring predictive accuracy of the language model respective to the corpus of language samples and a penalty comprising a tree-structured norm. The trained language model with optimized values for the parameters generated by the optimizing is applied to predict a symbol following a sequence of symbols of the language modeled by the language model. In some embodiments the penalty comprises a tree-structured ℓp-norm, such as a tree-structured ℓ2-norm or a tree-structured ℓ∞-norm. In some embodiments a tree-structured ℓ∞-norm operates on a collapsed suffix trie in which any series of suffixes of increasing lengths which are always observed in the same context are collapsed into a single node. The optimizing may be performed using a proximal step algorithm.
Type: Grant
Filed: September 10, 2014
Date of Patent: June 20, 2017
Assignee: XEROX CORPORATION
Inventors: Anil Kumar Nelakanti, Guillaume M. Bouchard, Cedric Archambeau, Francis Bach, Julien Mairal
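The penalized loss described above has a standard general form, sketched here in notation of my own choosing (the loss L, regularization weight λ, suffix-trie node groups g, and group subvectors θ_g are assumed labels, not the patent's notation):

```latex
\hat{\theta}
  = \arg\min_{\theta}\;
    \underbrace{-\sum_{i} \log p_{\theta}(w_i \mid h_i)}_{\text{predictive loss on the corpus}}
    \;+\;
    \lambda\,\Omega_{\mathrm{tree}}(\theta),
\qquad
\Omega_{\mathrm{tree}}(\theta) = \sum_{g \in \mathcal{T}} \bigl\lVert \theta_g \bigr\rVert_p
```

Here each group g is a node of the (possibly collapsed) suffix trie T, and taking p = 2 or p = ∞ gives the tree-structured ℓ2- and ℓ∞-norms mentioned in the abstract; a proximal step algorithm alternates gradient steps on the loss with the proximal operator of the penalty.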
-
Patent number: 9640178
Abstract: A speech recognition method and system enables user-configurable speech commands. For a given speech command, the speech recognition engine provides a mechanism for the end-user to select speech command terms to use in substitution for the given speech command. The speech recognition engine, or module thereof, forms a replacement command for the given speech command from the user-selected speech command terms. The speech recognition engine thereafter is synonymously responsive to user utterance of the replacement command as though the user issued/voiced the associated counterpart given speech command.
Type: Grant
Filed: December 17, 2014
Date of Patent: May 2, 2017
Assignee: Kopin Corporation
Inventor: Christopher Parkinson
-
Patent number: 9626354
Abstract: An apparatus for using a tone indicator to determine a language in text recognition is disclosed. The apparatus includes a text input module, a tone indicator module, and a language identification module. A method and computer program product also perform the functions of the apparatus.
Type: Grant
Filed: January 21, 2014
Date of Patent: April 18, 2017
Assignee: Lenovo (Singapore) PTE. LTD.
Inventor: Xin Feng
-
Patent number: 9620111
Abstract: Described herein are systems and methods for the generation and maintenance of language models. Language models are developed based at least in part on transaction data from one or more users. These transactions may include purchases and other interactions between one or more users and one or more online merchants. The language models may be associated with a particular user or group of users. Ongoing transactions may modify the language models. The language models may be used to recognize spoken input from the one or more users.
Type: Grant
Filed: May 1, 2012
Date of Patent: April 11, 2017
Assignee: Amazon Technologies, Inc.
Inventors: Matthew P. Bell, Stephen Polansky, Yuzo Watanabe
-
Patent number: 9594744
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. The actions further include generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text, by composing the lexicon model, the inverse of the transducer, and the language model.
Type: Grant
Filed: March 14, 2013
Date of Patent: March 14, 2017
Assignee: Google Inc.
Inventors: Hasim Sak, Francoise Beaufays
-
Patent number: 9564125
Abstract: Methods and systems are provided for adapting a speech system. In one example a method includes: logging speech data from the speech system; detecting a user characteristic from the speech data; and selectively updating a language model based on the user characteristic.
Type: Grant
Filed: October 22, 2013
Date of Patent: February 7, 2017
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Ron M. Hecht, Timothy J. Grost, Robert D. Sims, III, Ute Winter
-
Patent number: 9542604
Abstract: A method and apparatus of providing a combined summary by receiving monitored audio and video are provided. The method includes: receiving audio and video captured by at least one network camera; generating a video summary by detecting at least one video event from at least one of the audio and the video; generating an audio summary by detecting at least one audio event from at least one of the audio and the video; extracting at least one section of the video summary corresponding to the at least one audio event, and storing the extracted at least one section of the video summary with the audio summary; and providing a display of the video reproducing apparatus with a video summary control interface for controlling the video summary and an audio summary control interface for controlling the audio summary.
Type: Grant
Filed: May 13, 2015
Date of Patent: January 10, 2017
Assignee: Hanwha Techwin Co., Ltd.
Inventor: Sungbong Cho
-
Patent number: 9530402
Abstract: Methods and systems are provided for adapting a speech system. In one example a method includes: logging speech data from the speech system; detecting a user characteristic from the speech data; and selectively updating a language model based on the user characteristic.
Type: Grant
Filed: October 22, 2013
Date of Patent: December 27, 2016
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Ron M. Hecht, Timothy J. Grost, Robert D. Sims, III, Ute Winter
-
Patent number: 9484025
Abstract: The disclosure includes a system and method for configuring custom vocabularies for personalized speech recognition. The system includes a processor and a memory storing instructions that when executed cause the system to: detect a provisioning trigger event; determine a state of a journey associated with a user based on the provisioning trigger event; determine one or more interest places based on the state of the journey; populate a place vocabulary associated with the user using the one or more interest places; filter the place vocabulary based on one or more place filtering parameters; and register the filtered place vocabulary for the user.
Type: Grant
Filed: October 15, 2013
Date of Patent: November 1, 2016
Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
Inventors: Divya Sai Toopran, Vinuth Rai, Rahul Parundekar
-
Patent number: 9471064
Abstract: A method for controlling a drone includes receiving a natural language request for information about a spatial location, parsing the natural language request into data requests, configuring a flight plan and controlling one or more drones to fly over the spatial location to obtain data types based on the data requests, and extracting and analyzing data to answer the request. The method can include extracting data points from the data types, obtaining labels from a user for one or more of the data points, predicting labels for unlabeled data points from a learning algorithm using the labels obtained from the user, determining the predicted labels are true labels for the unlabeled data points and combining the extracted data, the user labeled data points and the true labeled data points to answer the request for information. The learning algorithm may be active learning using a support vector machine.
Type: Grant
Filed: December 8, 2015
Date of Patent: October 18, 2016
Assignee: International Business Machines Corporation
Inventors: Gregory F. Boland, James R. Kozloski, Yu Ma, Justin G. Manweiler, Kevin E. Siemonsen, Umut Topkara, Katherine Vogt, Justin D. Weisz
-
Patent number: 9449599
Abstract: Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question, e.g., words that are not part of the proper name entities, may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input.
Type: Grant
Filed: May 30, 2014
Date of Patent: September 20, 2016
Assignee: PROMPTU SYSTEMS CORPORATION
Inventor: Harry William Printz
-
Patent number: 9406298
Abstract: Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of a linear operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition.
Type: Grant
Filed: April 4, 2013
Date of Patent: August 2, 2016
Assignee: Nuance Communications, Inc.
Inventors: Sandro Cumani, Pietro Laface
-
Patent number: 9367526
Abstract: A language processing application employs a classing function optimized for the underlying production application context for which it is expected to process speech. A combination of class based and word based features generates a classing function optimized for a particular production application, meaning that a language model employing the classing function uses word classes having a high likelihood of accurately predicting word sequences encountered by a language model invoked by the production application. The classing function optimizes word classes by aligning the objective of word classing with the underlying language processing task to be performed by the production application. The classing function is optimized to correspond to usage in the production application context using class-based and word-based features by computing a likelihood of a word in an n-gram and a frequency of a word within a class of the n-gram.
Type: Grant
Filed: July 26, 2011
Date of Patent: June 14, 2016
Assignee: Nuance Communications, Inc.
Inventors: Paul Vozila, Maximilian Bisani, Yi Su, Stephen M. Chu, Stanley F. Chen, Ruhi Sarikaya, Bhuvana Ramabhadran
-
Patent number: 9367612
Abstract: A system identifies a set of initial segments of a time-based data item, such as audio. The segments can be defined at regular time intervals within the time-based data item. The initial segments are short segments. The system computes a short-timescale vectorial representation for each initial segment and compares the short-timescale vectorial representation for each initial segment with other short-timescale vectorial representations of the segments in a time duration within the time-based data item (e.g., audio) immediately preceding or immediately following the initial segment. The system generates a representation of long-timescale information for the time-based data item based on a comparison of the short-timescale vectorial representations of the initial segments and the short-timescale vectorial representations of immediate segments. The representation of long-timescale information identifies an underlying repetition structure of the time-based data item, such as rhythm or phrasing in an audio item.
Type: Grant
Filed: November 18, 2011
Date of Patent: June 14, 2016
Assignee: GOOGLE INC.
Inventors: Douglas Eck, Jay Yagnik
-
Patent number: 9288327
Abstract: An apparatus and method for routing an incoming call includes the apparatus configured to perform a routing decision resulting in the incoming call being routed to the voicemail of a called user. The apparatus further creates a transcription of a voicemail message being recorded on the voicemail of the called user. This is done by applying a speech analysis process to the voicemail message as the voicemail message is being recorded. Information about a context of the incoming call is determined from the transcription and a second routing decision is performed based on the information.
Type: Grant
Filed: May 14, 2014
Date of Patent: March 15, 2016
Assignee: Mitel Networks Corporation
Inventors: Trung Tim Trinh, Tom Gray
-
Patent number: 9275039
Abstract: According to an embodiment, an information processing device includes a first set generating unit that generates a first set of states reachable by following ε-transitions in a forward direction and a reverse direction among states included in a finite state automaton; a second set generating unit that generates a second set representing a set of states included in each of a plurality of first sets connected by one transition that is not an ε-transition and the first set that is not connected to another first set by one transition that is not an ε-transition; a combination generating unit that obtains, among combinations of ε-removal patterns to be respectively applied to the ε-transitions in the second set, a combination with which the number of transitions after removing ε-transitions is the smallest; and a removing unit that removes ε-transitions in the second set by the removal patterns included in the obtained combination.
Type: Grant
Filed: September 14, 2012
Date of Patent: March 1, 2016
Assignee: Kabushiki Kaisha Toshiba
Inventor: Manabu Nagao
-
Patent number: 9275128
Abstract: Generating a document index comprises: obtaining a document to be indexed; determining whether each monadic partition obtained from the document is a filter character and if so, forming a polynary partition with the monadic partition and at least one adjacent monadic partition and indexing the polynary partition, otherwise, indexing the monadic partition. Querying data comprises: receiving a data query; determining whether each monadic partition obtained from the data query is a filter character and if so, forming a polynary partition with the monadic partition and at least one adjacent monadic partition and using the polynary partition to obtain search results, otherwise, using the monadic partition to obtain search results; and combining search results to form a final query search result.
Type: Grant
Filed: July 20, 2010
Date of Patent: March 1, 2016
Assignee: Alibaba Group Holding Limited
Inventors: Lei Wei, Jiaxiang Shen
-
Patent number: 9256457
Abstract: A system for providing an interactive response system for hosted services may include a processor and a memory. The processor may facilitate steps of receiving data streams from participant devices, such as devices participating in a multimedia conference, and transmitting, in response to receiving the data streams from the participant devices, the data streams to the other of the participant devices. The steps may further include receiving, from one of the participant devices, a request to initiate communication with an invitee device, processing the data streams to determine a language associated with the data streams, and providing, to the invitee device, a communication in the determined language. The communication provided to the invitee device may be an initial communication of an interactive response system, such as an interactive voice response system or an interactive messaging response system.
Type: Grant
Filed: March 28, 2012
Date of Patent: February 9, 2016
Assignee: Google Inc.
Inventors: Matthew John Leske, Jarkko Oikarinen, Tomas Lundqvist
-
Patent number: 9224384
Abstract: Embodiments of the present invention include an acoustic processing device, a method for acoustic signal processing, and a speech recognition system. The speech processing device can include a processing unit, a histogram pruning unit, and a pre-pruning unit. The processing unit is configured to calculate one or more Hidden Markov Model (HMM) pruning thresholds. The histogram pruning unit is configured to prune one or more HMM states to generate one or more active HMM states. The pruning is based on the one or more pruning thresholds. The pre-pruning unit is configured to prune the one or more active HMM states based on an adjustable pre-pruning threshold. Further, the adjustable pre-pruning threshold is based on the one or more pruning thresholds.
Type: Grant
Filed: December 21, 2012
Date of Patent: December 29, 2015
Assignee: Cypress Semiconductor Corporation
Inventor: Ojas Ashok Bapat
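The two-stage pruning of HMM states described above can be illustrated with a generic beam-plus-count scheme. This is a textbook-style sketch, not the patented circuit: the function name, the `beam` width, and the `max_active` cap are assumed parameters standing in for the patent's computed thresholds.

```python
def prune_states(state_scores, beam=10.0, max_active=3):
    """Generic two-stage HMM state pruning.

    Stage 1 (threshold pruning): keep states whose log score is within
    `beam` of the best state, yielding the active set.
    Stage 2 (count cap, histogram-style): if too many states survive,
    keep only the `max_active` best-scoring ones.
    """
    best = max(state_scores.values())
    threshold = best - beam
    active = {s: v for s, v in state_scores.items() if v >= threshold}
    if len(active) > max_active:
        keep = sorted(active, key=active.get, reverse=True)[:max_active]
        active = {s: active[s] for s in keep}
    return active
```

A real histogram pruning unit would pick the count threshold from a score histogram rather than a full sort, which is cheaper in hardware.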
-
Patent number: 9225879
Abstract: A method for video sequential alignment is provided. The method includes inputting a first video sequence and a second video sequence, and extracting features of the first video sequence and the second video sequence. The method also includes generating an expanded path using a hierarchical sequential alignment algorithm and generating masks containing search space using keyframe matching. Further, the method includes calculating the lowest path cost within search space to find a minimal alignment path using a block A* algorithm and outputting a final alignment result after applying the block A* algorithm to search space.
Type: Grant
Filed: December 27, 2013
Date of Patent: December 29, 2015
Assignee: TCL RESEARCH AMERICA INC.
Inventors: Armin Kappeler, Haiguang Li, Michail Iliadis, Haohong Wang
-
Patent number: 9202459
Abstract: Methods and systems are provided for managing speech dialog of a speech system. In one embodiment, a method includes: receiving a first utterance from a user of the speech system; determining a first list of possible results from the first utterance, wherein the first list includes at least two elements that each represent a possible result; analyzing the at least two elements of the first list to determine an ambiguity of the elements; and generating a speech prompt to the user based on partial orthography and the ambiguity.
Type: Grant
Filed: April 19, 2013
Date of Patent: December 1, 2015
Assignee: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Eli Tzirkel-Hancock, Gaurav Talwar, Xufang Zhao, Greg T. Lindemann
-
Patent number: 9190058
Abstract: Embodiments related to recognizing speech inputs are disclosed. One disclosed embodiment provides a method for recognizing a speech input including receiving depth information of a physical space from a depth camera, determining an identity of a user in the physical space based on the depth information, receiving audio information from one or more microphones, and determining a speech input from the audio input. If the speech input comprises an ambiguous term, the ambiguous term in the speech input is compared to one or more of depth image data received from the depth image sensor and digital content consumption information for the user to identify an unambiguous term corresponding to the ambiguous term. After identifying the unambiguous term, an action is taken on the computing device based on the speech input and the unambiguous term.
Type: Grant
Filed: January 25, 2013
Date of Patent: November 17, 2015
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Christian Klein
-
Patent number: 9110888
Abstract: A service server apparatus is provided which can realize a service as if a speaker has a conversation in a face-to-face manner using existing applications. The service server apparatus includes a service activating unit receiving an instruction for performing a task during a communication, a telephone/call control unit recording a call speech of the speaker during the communication between speech communication terminal devices, a speech recognizing enabler, a text translating enabler, and a speech synthesizing enabler performing the task based on the recorded call speech when the instruction is received, and a communication control enabler providing task data obtained through a performance of the task to the speech communication terminal device. The task data is text data obtained by converting the call speech into a text, translated text data obtained by translating the text data into another language, and speech data obtained by converting the translated text data into a speech.
Type: Grant
Filed: May 24, 2012
Date of Patent: August 18, 2015
Assignee: NTT DOCOMO, Inc.
Inventors: Kazunori Nasu, Seiji Shibata, Mari Akuzawa, Ken Uchiyama
-
Patent number: 9104768
Abstract: Techniques are described for aggregating person-based information on a computing device. The computing device includes a display device and one or more processors. The processors are configured to determine a context of the computing device, and identify a contact associated with the currently-executing application and a plurality of contact identifiers associated with the contact. The processors are configured to receive, from two or more different sources and based on at least one contact, information associated with the contact that includes a respective portion of information received from each of the different sources, determine, based on the context of the computing device, a respective priority value for each of the portions of the information, select, based on the respective priority value, one or more portions of the information, and output, for display, a graphical user interface including at least the selected one or more portions of the information.
Type: Grant
Filed: April 2, 2014
Date of Patent: August 11, 2015
Assignee: Google Inc.
Inventors: Marco Paglia, Alexander Faaborg
-
Patent number: 9093076
Abstract: A multipass processing system includes a first grammar-based speech recognition system that compares a spoken utterance to a sub-grammar. The sub-grammar includes keywords or key phrases from active grammars that each uniquely identifies one of many application engines. The first grammar-based speech recognition system generates a first grammar-based speech recognition result and a first grammar-based confidence score. A demultiplexer receives the spoken utterance through an input. The demultiplexer transmits the spoken utterance to one of many other grammar-based speech recognition systems based on the first grammar-based speech recognition result.
Type: Grant
Filed: April 30, 2012
Date of Patent: July 28, 2015
Assignee: 2236008 Ontario Inc.
Inventor: Darrin Kenneth Fry
-
Patent number: 9093075Abstract: A method is disclosed herein for recognizing a repeated utterance in a mobile computing device via a processor. A first utterance is detected being spoken into a first mobile computing device. Likewise, a second utterance is detected being spoken into a second mobile computing device within a predetermined time period. The second utterance substantially matches the first spoken utterance and the first and second mobile computing devices are communicatively coupled to each other. The processor enables capturing, at least temporarily, a matching utterance for performing a subsequent processing function. The performed subsequent processing function is based on a type of captured utterance.Type: GrantFiled: April 20, 2012Date of Patent: July 28, 2015Assignee: Google Technology Holdings LLCInventors: Rachid M Alameh, Jiri Slaby, Hirsashi D Watanabe
-
Patent number: 9076451Abstract: An operation determination processing section of a center extracts words included in the utterances of a driver and an operator, reads an attribute associated with each word from a synonym and related word store in which an attribute is stored in association with each word, and reads, for each read attribute, the domains of candidates for the task or the domains of the task to be actually performed from the synonym and related word store in which those domains are stored. It totals the domains read for the words included in the utterance of the driver or the like, and estimates the domain with the highest total score as the candidate for the task and the task to be actually performed. In this manner, it is possible to estimate the task with high accuracy.Type: GrantFiled: April 12, 2011Date of Patent: July 7, 2015Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHAInventors: Toshiyuki Namba, Hiroaki Sekiyama, Tatsuyuki Oka, Minako Fujishiro, Katsutoshi Okawa, Emi Otani, Atsushi Sasaki, Yasuhiko Fujita
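The word-to-attribute-to-domain voting above can be sketched as follows; the toy mappings are illustrative assumptions, not the patent's actual vocabulary.

```python
# Hypothetical word -> attribute and attribute -> domain tables.
WORD_TO_ATTRS = {"hungry": ["food"], "restaurant": ["food", "place"],
                 "gas": ["fuel"]}
ATTR_TO_DOMAINS = {"food": ["restaurant_search"], "place": ["navigation"],
                   "fuel": ["gas_station_search", "navigation"]}

def estimate_task(words):
    """Total domain votes over all extracted words; highest score wins."""
    totals = {}
    for w in words:
        for attr in WORD_TO_ATTRS.get(w, []):
            for dom in ATTR_TO_DOMAINS.get(attr, []):
                totals[dom] = totals.get(dom, 0) + 1
    return max(totals, key=totals.get) if totals else None
```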
-
Patent number: 9069755Abstract: Described is a technology by which a probability is estimated for a token in a sequence of tokens based upon a number of zero or more times (actual counts) that the sequence was observed in training data. The token may be a word in a word sequence, and the estimated probability may be used in a statistical language model. A discount parameter is set independently of interpolation parameters. If the sequence was observed at least once in the training data, a discount probability and an interpolation probability are computed and summed to provide the estimated probability. If the sequence was not observed, the probability is estimated by computing a backoff probability. Also described are various ways to obtain the discount parameter and interpolation parameters.Type: GrantFiled: March 11, 2010Date of Patent: June 30, 2015Assignee: Microsoft Technology Licensing, LLCInventor: Robert Carter Moore
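A minimal sketch of the estimation scheme this abstract describes, for bigrams: a fixed discount D is set independently of the interpolation weight, an observed sequence gets a discounted probability plus an interpolated unigram term, and an unseen history backs off to the unigram distribution. The particular lambda formula (leftover discounted mass) is one standard choice, assumed here for concreteness.

```python
from collections import Counter

def estimate(bigrams, unigrams, history, word, D=0.75):
    """P(word | history) with discounting, interpolation, and backoff."""
    h_count = sum(c for (h, _), c in bigrams.items() if h == history)
    total = sum(unigrams.values())
    p_uni = unigrams.get(word, 0) / total          # backoff distribution
    if h_count == 0:
        return p_uni                               # history unseen: pure backoff
    c_hw = bigrams.get((history, word), 0)
    p_disc = max(c_hw - D, 0) / h_count            # discounted probability
    n_types = len({w for (h, w) in bigrams if h == history})
    lam = D * n_types / h_count                    # interpolation weight from leftover mass
    return p_disc + lam * p_uni
```

With this choice of lambda the estimates over the vocabulary sum to one, which is the usual sanity check for a discounted, interpolated model.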
-
Patent number: 9064495Abstract: In some embodiments, a user device receives a voice signal corresponding to a user utterance. The user device may set a time marker corresponding to a point in time in the voice signal. The voice signal and the time marker may be transmitted to a server device. The server device may perform speech recognition using the voice signal. The server device may determine a time offset corresponding to a difference in time between an end point of the user utterance and a time associated with the time marker. The server device may determine a response to the user utterance. The server device may transmit the time offset and the response to the user device. The user device may use the time offset to determine a user-perceived latency between the end of the user utterance and a beginning of the response.Type: GrantFiled: May 7, 2013Date of Patent: June 23, 2015Assignee: Amazon Technologies, Inc.Inventors: Fred Torok, Peter Spalding VanLund
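The latency bookkeeping above is plain arithmetic, sketched here under the assumption that device and server clocks are unsynchronized, which is exactly why the marker/offset indirection helps: the server only ever reports a time difference, never an absolute time.

```python
def server_offset(marker_time, utterance_end_time):
    """Server side: offset from the marker to the utterance end point."""
    return utterance_end_time - marker_time

def perceived_latency(marker_local_time, offset, response_arrival_local_time):
    """Device side: latency from end of speech to start of the response,
    using only the device's own clock plus the server-reported offset."""
    utterance_end_local = marker_local_time + offset
    return response_arrival_local_time - utterance_end_local
```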
-
Patent number: 9047868Abstract: A specific language model for speech recognition may be built. In some embodiments, the specific language model is associated with a user and built using a corpus of text obtained from a user computing device. In some embodiments, a sequence of words is constructed from the corpus of text. The sequence of words may be obfuscated, and the obfuscated sequence of words may be stored in the specific language model. A server or a user device may use the specific language model in conjunction with a general language model to perform speech recognition on an utterance made by the user.Type: GrantFiled: July 31, 2012Date of Patent: June 2, 2015Assignee: Amazon Technologies, Inc.Inventors: Jeffrey C. O'Neill, Ryan P. Thomas
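One plausible reading of the obfuscation step is sketched below: word sequences from the user's corpus are stored only as salted hashes, so the specific model keeps counts without keeping readable text, and a candidate n-gram is hashed the same way at lookup time. The salted-SHA-256 scheme is an assumption for illustration, not the patent's stated method.

```python
import hashlib

def obfuscate(ngram, salt=b"per-user-salt"):  # salt value is illustrative
    text = " ".join(ngram).encode("utf-8")
    return hashlib.sha256(salt + text).hexdigest()

def build_specific_model(corpus_tokens, n=2):
    """Count n-grams from the user's text, storing only obfuscated keys."""
    model = {}
    for i in range(len(corpus_tokens) - n + 1):
        key = obfuscate(tuple(corpus_tokens[i:i + n]))
        model[key] = model.get(key, 0) + 1
    return model

def score(model, candidate_ngram):
    """Look up a candidate n-gram's count without exposing plain text."""
    return model.get(obfuscate(candidate_ngram), 0)
```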
-
Patent number: 9043208Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.Type: GrantFiled: August 10, 2012Date of Patent: May 26, 2015Assignee: International Business Machines CorporationInventors: Fernando Luiz Koch, Julio Nogima
-
Patent number: 9043213Abstract: A speech recognition method including the steps of receiving a speech input from a known speaker of a sequence of observations and determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model. The acoustic model has a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation and has been trained using first training data and adapted using second training data to said speaker. The speech recognition method also determines the likelihood of the sequence of words occurring in a given language using a language model and combines the likelihoods determined by the acoustic model and the language model and outputs a sequence of words identified from said speech input signal. The acoustic model is context based for the speaker, the context based information being contained in the model using a plurality of decision trees and the structure of the decision trees is based on the second training data.Type: GrantFiled: January 26, 2011Date of Patent: May 26, 2015Assignee: Kabushiki Kaisha ToshibaInventor: Byung Ha Chun
-
Patent number: 9037464Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing numeric representations of words. One of the methods includes obtaining a set of training data, wherein the set of training data comprises sequences of words; training a classifier and an embedding function on the set of training data, wherein training the embedding function comprises obtaining trained values of the embedding function parameters; processing each word in the vocabulary using the embedding function in accordance with the trained values of the embedding function parameters to generate a respective numeric representation of each word in the vocabulary in the high-dimensional space; and associating each word in the vocabulary with the respective numeric representation of the word in the high-dimensional space.Type: GrantFiled: March 15, 2013Date of Patent: May 19, 2015Assignee: Google Inc.Inventors: Tomas Mikolov, Kai Chen, Gregory S. Corrado, Jeffrey A. Dean
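The final steps of that pipeline can be sketched as below: apply an embedding function to every vocabulary word to get its high-dimensional numeric representation, then compare words by cosine similarity in that space. The random matrix here merely stands in for parameters that would actually be learned jointly with the classifier.

```python
import math
import random

random.seed(0)

def embed(vocab, dim=16):
    """Map each word to a dim-dimensional vector.
    (Random stand-in for trained embedding-function parameters.)"""
    return {w: [random.gauss(0, 1) for _ in range(dim)] for w in vocab}

def cosine(u, v):
    """Similarity of two word representations in the high-dimensional space."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```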
-
Patent number: 9037465Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.Type: GrantFiled: February 21, 2013Date of Patent: May 19, 2015Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.Inventors: I. Dan Melamed, Andrej Ljolje, Bernard Renger, Yeon-Jun Kim, David J. Smith
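A toy sketch of the comparison-and-rating idea: scan the communication for pre-determined phrases tied to events or precursor events, then rate the recipient on whether the phrases required in response are present. All phrase lists are invented for the example.

```python
# Hypothetical pre-determined phrase lists.
EVENT_PHRASES = {"cancel my account"}
PRECURSOR_PHRASES = {"not happy", "speak to a manager"}
REQUIRED_RESPONSES = {"i apologize", "let me help"}

def detect(transcript):
    """Return (event_occurred, precursor_occurred) for a communication."""
    t = transcript.lower()
    event = any(p in t for p in EVENT_PHRASES)
    precursor = any(p in t for p in PRECURSOR_PHRASES)
    return event, precursor

def rate_recipient(recipient_turns):
    """Fraction of required response phrases the recipient actually used."""
    joined = " ".join(recipient_turns).lower()
    hits = sum(1 for p in REQUIRED_RESPONSES if p in joined)
    return hits / len(REQUIRED_RESPONSES)
```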
-
Patent number: 9026442Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.Type: GrantFiled: August 14, 2014Date of Patent: May 5, 2015Assignee: AT&T Intellectual Property I, L.P.Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
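The restructuring step above can be sketched as a weighted sum: the dictionary entry for each phoneme keeps its pronunciation, but its acoustic score becomes a mixture of the native acoustic models of all plausible phonemes. The one-dimensional Gaussian scorer and the weight values here are illustrative stand-ins for the real acoustic models and lattice-derived weights.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Toy one-dimensional acoustic model for a single phoneme."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def restructured_score(x, native_models, weights):
    """Score a frame as a weighted sum over plausible phonemes' native models."""
    return sum(w * gaussian_likelihood(x, *native_models[p])
               for p, w in weights.items())
```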
-
Patent number: 9026429Abstract: In one embodiment, a method includes receiving one or more characters of a character string as a user enters the character string into a graphical user interface (GUI) of a computing device. The method also includes determining a degree of difficulty of the user entering the character string into the GUI of the computing device. The method further includes, if the degree of difficulty is at least approximately equal to or exceeds a pre-determined threshold, providing for display to the user an auto-suggestion for completing the character string for the user.Type: GrantFiled: December 5, 2012Date of Patent: May 5, 2015Assignee: Facebook, Inc.Inventors: Jenny Yuen, Luke St. Clair
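The thresholding logic can be sketched as follows; the difficulty heuristic (string length plus rare characters) and the threshold value are assumptions standing in for whatever measure the device actually uses.

```python
RARE_CHARS = set("zqxj'-")  # illustrative "hard to type" characters

def difficulty(prefix):
    """Hypothetical difficulty measure for a partially entered string."""
    return len(prefix) + 2 * sum(1 for c in prefix.lower() if c in RARE_CHARS)

def maybe_suggest(prefix, dictionary, threshold=6):
    """Offer an auto-suggestion only once difficulty meets the threshold."""
    if difficulty(prefix) < threshold:
        return None  # typing is easy enough; don't interrupt the user
    matches = [w for w in dictionary if w.startswith(prefix)]
    return matches[0] if matches else None
```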
-
Patent number: 9020818Abstract: Implementations of systems, method and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.Type: GrantFiled: August 20, 2012Date of Patent: April 28, 2015Assignee: Malaspina Labs (Barbados) Inc.Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
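The codebook-update decision can be sketched as below: a candidate formant tuple is added only when it is far enough from every existing tuple to count as new information; otherwise the nearest existing tuple is nudged toward it. The distance metric, threshold, and update rate are all assumptions for illustration.

```python
def update_codebook(codebook, candidate, threshold=100.0, rate=0.1):
    """codebook: list of formant-frequency tuples, e.g. (F1, F2, F3) in Hz."""
    def dist(a, b):
        return max(abs(x - y) for x, y in zip(a, b))

    if not codebook or all(dist(candidate, t) > threshold for t in codebook):
        codebook.append(tuple(candidate))          # enough new information: add
    else:
        nearest = min(codebook, key=lambda t: dist(candidate, t))
        i = codebook.index(nearest)
        codebook[i] = tuple(x + rate * (y - x)     # otherwise refine nearest
                            for x, y in zip(nearest, candidate))
    return codebook
```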
-
Patent number: 9015044Abstract: Implementations of systems, method and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.Type: GrantFiled: August 20, 2012Date of Patent: April 21, 2015Assignee: Malaspina Labs (Barbados) Inc.Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
-
Patent number: 9015047Abstract: A method, a computer readable medium and a system for reporting automatic speech recognition that comprises collecting an utterance, analyzing the utterance, receiving a translation of the utterance, and determining a difference between the analyzed utterance and the translated utterance. An embodiment of the disclosure includes updating the utterance analysis based upon the determined difference, correlating the analyzed utterance to the translated utterance, and tracking the determined difference by translator. In another embodiment the disclosure includes reporting, categorizing, sorting, and grouping the determined difference.Type: GrantFiled: February 12, 2007Date of Patent: April 21, 2015Assignee: West CorporationInventors: Aaron Scott Fisher, Prashanta Pradhan
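The "determined difference" between the machine-analyzed utterance and the human-translated one can be sketched as a word-level edit distance, the usual basis for word-error-rate style reporting; whether the patent uses exactly this metric is an assumption.

```python
def word_edit_distance(hyp, ref):
    """Minimum word-level substitutions, insertions, and deletions
    to turn the ASR hypothesis into the reference translation."""
    h, r = hyp.split(), ref.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i                       # deleting all hypothesis words
    for j in range(len(r) + 1):
        d[0][j] = j                       # inserting all reference words
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match or substitution
    return d[len(h)][len(r)]
```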