Patents by Inventor Ariya Rastrow

Ariya Rastrow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dynamic arc weights in speech recognition models

Patent number: 10140981

Abstract: Features are disclosed for performing speech recognition on utterances using dynamic weights with speech recognition models. An automatic speech recognition system may use a general speech recognition model, such as a large finite state transducer-based language model, to generate speech recognition results for various utterances. The general speech recognition model may include sub-models or other portions that are customized for particular tasks, such as speech recognition on utterances regarding particular topics. Individual weights within the general speech recognition model can be dynamically replaced based on the context in which an utterance is made or received, thereby providing a further degree of customization without requiring additional speech recognition models to be generated, maintained, or loaded.

Type: Grant

Filed: June 10, 2014

Date of Patent: November 27, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Denis Sergeyevich Filimonov, Ariya Rastrow
Language model speech endpointing

Patent number: 10121471

Abstract: An automatic speech recognition (ASR) system detects an endpoint of an utterance using the active hypotheses under consideration by a decoder. The ASR system calculates the amount of non-speech detected by a plurality of hypotheses and weights the non-speech duration by the probability of each hypothesis. When the aggregate weighted non-speech exceeds a threshold, an endpoint may be declared.

Type: Grant

Filed: June 29, 2015

Date of Patent: November 6, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Bjorn Hoffmeister, Ariya Rastrow, Baiyang Liu
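
The endpointing rule described in the abstract above (weight each active hypothesis's non-speech duration by its probability, then compare the aggregate to a threshold) can be illustrated with a minimal sketch. This is not the patented implementation: the `Hypothesis` fields, the normalization by total probability, and the 500 ms default threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    # Hypothetical fields: a real decoder would expose its own structures.
    probability: float            # relative probability of this active hypothesis
    trailing_nonspeech_ms: float  # non-speech duration at the end of this hypothesis


def weighted_nonspeech_ms(hypotheses: Sequence[Hypothesis]) -> float:
    """Probability-weighted average of trailing non-speech over active hypotheses."""
    total = sum(h.probability for h in hypotheses)
    if total <= 0.0:
        return 0.0
    weighted = sum(h.probability * h.trailing_nonspeech_ms for h in hypotheses)
    return weighted / total


def endpoint_detected(hypotheses: Sequence[Hypothesis],
                      threshold_ms: float = 500.0) -> bool:
    """Declare an endpoint when the aggregate weighted non-speech exceeds the threshold.

    The default threshold is an assumed value, not one taken from the patent.
    """
    return weighted_nonspeech_ms(hypotheses) > threshold_ms
```

With this sketch, a likely hypothesis that has seen a long pause outweighs an unlikely one that is still hearing speech, so the decision tracks what the decoder currently believes rather than a single best path.
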
Automatic speech recognition incorporating word usage information

Patent number: 10121467

Abstract: A language model for automatic speech processing, such as a finite state transducer (FST), may be configured to incorporate information about how a particular word sequence (N-gram) may be used in a similar manner to another N-gram. A score of a component of the FST (such as an arc or state) relating to the first N-gram may be based on information of the second N-gram. Further, the FST may be configured to have an arc between a state of the first N-gram and a state of the second N-gram to allow for cross N-gram backoff, rather than backoff from a larger N-gram to a smaller N-gram, during traversal of the FST during speech processing.

Type: Grant

Filed: June 30, 2016

Date of Patent: November 6, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Ankur Gandhe, Denis Sergeyevich Filimonov, Ariya Rastrow, Björn Hoffmeister
Generation of predictive natural language processing models

Patent number: 10049656

Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.

Type: Grant

Filed: September 20, 2013

Date of Patent: August 14, 2018

Assignee: Amazon Technologies, Inc.

Inventors: William Folwell Barton, Rohit Prasad, Stephen Frederick Potter, Nikko Strom, Yuzo Watanabe, Madan Mohan Rao Jampani, Ariya Rastrow, Arushan Rajasekaram
Speech processing with learned representation of user interaction history

Patent number: 10032463

Abstract: An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded representation of a user's interactions with the system based on, e.g., transcriptions of prior utterances made by the user. This user-specific encoded representation of interaction history is then used by the language model to customize ASR processing for the user.

Type: Grant

Filed: December 29, 2015

Date of Patent: July 24, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Nikko Ström, Spyridon Matsoukas, Markus Dreyer, Ankur Gandhe, Denis Sergeyevich Filimonov, Julian Chan, Rohit Prasad
Compact HCLG FST

Patent number: 10013974

Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.

Type: Grant

Filed: June 20, 2016

Date of Patent: July 3, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Denis Sergeyevich Filimonov, Gautam Tiwari, Shaun Nidhiri Joseph, Ariya Rastrow
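
The weight-binning idea in the abstract above (storing a small bin index per weight instead of a full-precision value) can be sketched as follows. The abstract does not say how bins are chosen, so this example assumes uniform bins between the smallest and largest weight; the function names and the `bits` parameter are hypothetical.

```python
from typing import Sequence, Tuple

# (lowest weight, bin width, number of levels)
Codebook = Tuple[float, float, int]


def make_codebook(weights: Sequence[float], bits: int) -> Codebook:
    """Build a uniform codebook with 2**bits levels spanning the weight range."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Guard against a zero-width range, where every weight maps to index 0.
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    return lo, step, levels


def quantize(weight: float, codebook: Codebook) -> int:
    """Map a weight to the index of its nearest level (fits in `bits` bits)."""
    lo, step, levels = codebook
    index = round((weight - lo) / step)
    return max(0, min(levels - 1, index))


def dequantize(index: int, codebook: Codebook) -> float:
    """Recover the approximate weight for a stored index at runtime."""
    lo, step, _ = codebook
    return lo + index * step
```

Under these assumptions each arc weight costs `bits` bits instead of a 32-bit float, at the price of an approximation error of at most half a bin width.
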
Customized speech processing language models

Patent number: 9934777

Abstract: User-specific language models (LMs) that include internal word indexes to a word table specific to the user-specific LM rather than a word table specific to a system-wide LM. When the system-wide LM is updated, the word table of the user-specific LM may be updated to translate the user-specific indices to system-wide indices. This prevents having to update the internal indices of the user-specific LM every time the system-wide LM is updated.

Type: Grant

Filed: August 26, 2016

Date of Patent: April 3, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Shaun Nidhiri Joseph, Sonal Pareek, Ariya Rastrow, Gautam Tiwari, Alexander David Rosen
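
The indirection described in the abstract above (a user-specific LM that indexes into its own word table, which is then translated to system-wide indices) can be sketched in a few lines. The class name, the dictionary-based vocabulary, and the `-1` marker for words missing from the system vocabulary are assumptions made for this illustration only.

```python
from typing import Dict, List, Sequence, Tuple


class UserLanguageModel:
    """Toy user-specific LM whose n-grams are keyed by user-local word indices."""

    def __init__(self, user_words: Sequence[str],
                 ngram_scores: Dict[Tuple[int, ...], float]) -> None:
        self.user_words: List[str] = list(user_words)  # local index -> word
        self.ngram_scores = ngram_scores               # keyed by local indices
        self.local_to_system: List[int] = []           # rebuilt on each system update

    def rebuild_word_table(self, system_vocab: Dict[str, int]) -> None:
        """Refresh only the translation table when the system-wide LM changes.

        Words absent from the system vocabulary map to -1 (an assumed convention).
        """
        self.local_to_system = [system_vocab.get(w, -1) for w in self.user_words]

    def to_system_ids(self, local_ngram: Tuple[int, ...]) -> Tuple[int, ...]:
        """Translate an n-gram of local indices into system-wide indices."""
        return tuple(self.local_to_system[i] for i in local_ngram)
```

In this sketch, a system-wide vocabulary update touches one small list per user, while the n-gram entries themselves keep their original local indices.
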
Intent-specific automatic speech recognition result generation

Patent number: 9922650

Abstract: Features are disclosed for generating intent-specific results in an automatic speech recognition system. The results can be generated by utilizing a decoding graph containing tags that identify portions of the graph corresponding to a given intent. The tags can also identify high-information content slots and low-information carrier phrases for a given intent. The automatic speech recognition system may utilize these tags to provide a semantic representation based on a plurality of different tokens for the content slot portions and low information for the carrier portions. A user can be presented with a user interface containing top intent results with corresponding intent-specific top content slot values.

Type: Grant

Filed: December 20, 2013

Date of Patent: March 20, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Hugh Evan Secker-Walker, Aaron Lee Mathers Challenner, Ariya Rastrow
Class-based discriminative training of speech models

Patent number: 9892726

Abstract: Features are disclosed for modifying a statistical model to more accurately discriminate between classes of input data. A subspace of the total model parameter space can be learned such that individual points in the subspace, corresponding to the various classes, are discriminative with respect to the classes. The subspace can be learned using an iterative process whereby an initial subspace is used to generate data and maximize an objective function. The objective function can correspond to maximizing the posterior probability of the correct class for a given input. The initial subspace, data, and objective function can be used to generate a new subspace that better discriminates between classes. The process may be repeated as desired. A model modified using such a subspace can be used to classify input data.

Type: Grant

Filed: December 17, 2014

Date of Patent: February 13, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Spyridon Matsoukas, Ariya Rastrow, Bjorn Hoffmeister
Compressed finite state transducers for automatic speech recognition

Patent number: 9865254

Abstract: Compact finite state transducers (FSTs) for automatic speech recognition (ASR). An HCLG FST and/or G FST may be compacted at training time to reduce the size of the FST to be used at runtime. The compact FSTs may be significantly smaller (e.g., 50% smaller) in terms of memory size, thus reducing the use of computing resources at runtime to operate the FSTs. The individual arcs and states of each FST may be compacted by binning individual weights, thus reducing the number of bits needed for each weight. Further, certain fields such as a next state ID may be left out of a compact FST if an estimation technique can be used to reproduce the next state at runtime. During runtime, portions of the FSTs may be decompressed for processing by an ASR engine.

Type: Grant

Filed: June 20, 2016

Date of Patent: January 9, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Denis Sergeyevich Filimonov, Gautam Tiwari, Shaun Nidhiri Joseph, Ariya Rastrow
Generative modeling of speech using neural networks

Patent number: 9653093

Abstract: Features are disclosed for using an artificial neural network to generate customized speech recognition models during the speech recognition process. By dynamically generating the speech recognition models during the speech recognition process, the models can be customized based on the specific context of individual frames within the audio data currently being processed. In this way, dependencies between frames in the current sequence can form the basis of the models used to score individual frames of the current sequence. Thus, each frame of the current sequence (or some subset thereof) may be scored using one or more models customized for the particular frame in context.

Type: Grant

Filed: August 19, 2014

Date of Patent: May 16, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Spyridon Matsoukas, Nikko Ström, Ariya Rastrow, Sri Venkata Surya Siva Rama Krishna Garimella
Markov-based sequence tagging using neural networks

Patent number: 9600764

Abstract: Features are disclosed for using a neural network to tag sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence. A predicted or determined label (e.g., the highest scoring or otherwise most probable label) for input at a given position in the sequence can be used when scoring input corresponding to the next position in the sequence. Additional features are disclosed for training a neural network for use in tagging sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence.

Type: Grant

Filed: June 17, 2014

Date of Patent: March 21, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Spyros Matsoukas, Sri Venkata Surya Siva Rama Krishna Garimella, Nikko Ström, Bjorn Hoffmeister
LANGUAGE MODEL SPEECH ENDPOINTING

Publication number: 20160379632

Abstract: An automatic speech recognition (ASR) system detects an endpoint of an utterance using the active hypotheses under consideration by a decoder. The ASR system calculates the amount of non-speech detected by a plurality of hypotheses and weights the non-speech duration by the probability of each hypothesis. When the aggregate weighted non-speech exceeds a threshold, an endpoint may be declared.

Type: Application

Filed: June 29, 2015

Publication date: December 29, 2016

Inventors: Bjorn Hoffmeister, Ariya Rastrow, Baiyang Liu
Speech recognition with combined grammar and statistical language models

Patent number: 9449598

Abstract: Features are disclosed for performing speech recognition on utterances using a grammar and a statistical language model, such as an n-gram model. States of the grammar may correspond to states of the statistical language model. Speech recognition may be initiated using the grammar. At a given state of the grammar, speech recognition may continue at a corresponding state of the statistical language model. Speech recognition may continue using the grammar in parallel with the statistical language model, or it may continue using the statistical language model exclusively. Scores associated with the correspondences between states (e.g., backoff arcs) may be determined heuristically or based on test data.

Type: Grant

Filed: September 26, 2013

Date of Patent: September 20, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Bjorn Hoffmeister, Sri Venkata Surya Siva Rama Krishna Garimella, Rohit Krishna Prasad
PREDICTING PRONUNCIATION IN SPEECH RECOGNITION

Publication number: 20150255069

Abstract: An automatic speech recognition (ASR) device may be configured to predict pronunciations of textual identifiers (for example, song names, etc.) based on predicting one or more languages of origin of the textual identifier. The one or more languages of origin may be determined based on the textual identifier. The pronunciations may include a pronunciation in one language, a pronunciation in a second language, and a hybrid pronunciation that combines multiple languages. The pronunciations may be added to a lexicon and matched to the content item (e.g., song) and/or textual identifier. The ASR device may receive a spoken utterance from a user requesting the ASR device to access the content item. The ASR device determines whether the spoken utterance matches one of the pronunciations of the content item in the lexicon. The ASR device then accesses the content when the spoken utterance matches one of the potential textual identifier pronunciations.

Type: Application

Filed: March 4, 2014

Publication date: September 10, 2015

Applicant: Amazon Technologies, Inc.

Inventors: Jeffrey Penrod Adams, Alok Ulhas Parlikar, Jeffrey Paul Lilly, Ariya Rastrow
