Patents by Inventor Ariya Rastrow

Ariya Rastrow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11908468
    Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: February 20, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
  • Patent number: 11887583
    Abstract: Some devices may perform processing using machine learning models trained at a centralized system and distributed to the device. The centralized system may update the machine learning model and distribute the update to the device (or devices). To reduce the size of an update, the centralized system may train a model update object, which may be smaller in size than the model itself and thus more suitable for sending to the device(s). A device may receive the model update object and use it to update the on-device machine learning model; for example, by changing some parameters of the model. Parameters left unchanged during the update may retain their previous value. Thus, using the model update object to update the on-device model may result in a more accurate updated model when compared to sending an updated model compressed to a size similar to that of the model update object.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: January 30, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Grant Strimel, Jonathan Jenner Macoskey, Ariya Rastrow
  • Publication number: 20240029743
    Abstract: Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
    Type: Application
    Filed: June 6, 2023
    Publication date: January 25, 2024
    Inventors: Stanislaw Ignacy Pasko, Pawel Zelazko, Cagdas Bak, Eli Joshua Fidler, Michal Kowalczuk, Andrew Oberlin, Ariya Rastrow
  • Publication number: 20230360633
    Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user to re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
    Type: Application
    Filed: March 13, 2023
    Publication date: November 9, 2023
    Inventors: Kevin Crews, Prasanna H. Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
  • Patent number: 11721347
    Abstract: Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: August 8, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Stanislaw Ignacy Pasko, Pawel Zelazko, Cagdas Bak, Eli Joshua Fidler, Michal Kowalczuk, Andrew Oberlin, Ariya Rastrow
  • Patent number: 11705116
    Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models and grammar models, the system only modifies one grammar model and ensures its compatibility with the existing models in the ASR system.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: July 18, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ankur Gandhe, Ariya Rastrow, Gautam Tiwari, Ashish Vishwanath Shenoy, Chun Chen
  • Publication number: 20230223023
    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
    Type: Application
    Filed: January 3, 2023
    Publication date: July 13, 2023
    Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
  • Patent number: 11688394
    Abstract: This disclosure proposes systems and methods for leveraging entity-related language models in speech processing. A system can receive audio data corresponding to an utterance and perform automatic speech recognition (ASR) on a first portion of the audio data using a general language model. Based on the results, the system may identify a specific language model for processing a second portion of the audio data. The specific language model may include entities belonging to a common subject or class. The specific language model may, in some cases, provide better results than the general language model. While the general language model may describe a whole sentence, the specific language model may describe only a portion of a sentence. Thus, a top-level model may “activate” the specific language model when it may provide useful results. The resulting data may include results from both the general language model and the specific language model.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: June 27, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Denis Filimonov, Ravi Teja Gadde, Ariya Rastrow
  • Patent number: 11676575
    Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
    Type: Grant
    Filed: July 27, 2021
    Date of Patent: June 13, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
  • Patent number: 11670285
    Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user to re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: June 6, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kevin Crews, Prasanna H Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
  • Publication number: 20230012984
    Abstract: Systems, methods, and devices for computer-generating responses and sending responses to communications when the recipient of the communication is unavailable are disclosed. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes it difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message originating individual.
    Type: Application
    Filed: September 16, 2022
    Publication date: January 19, 2023
    Inventors: Ariya Rastrow, Tony Hardie, Rohit Prasad
  • Patent number: 11551685
    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
    Type: Grant
    Filed: March 18, 2020
    Date of Patent: January 10, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
  • Publication number: 20220358908
    Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
    Type: Application
    Filed: March 28, 2022
    Publication date: November 10, 2022
    Inventors: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister
  • Patent number: 11496582
    Abstract: Systems, methods, and devices for computer-generating responses and sending responses to communications when the recipient of the communication is unavailable are disclosed. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes it difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message originating individual.
    Type: Grant
    Filed: June 27, 2019
    Date of Patent: November 8, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Tony Hardie, Rohit Prasad
  • Patent number: 11398236
    Abstract: Features are disclosed for generating intent-specific results in an automatic speech recognition system. The results can be generated by utilizing a decoding graph containing tags that identify portions of the graph corresponding to a given intent. The tags can also identify high-information content slots and low-information carrier phrases for a given intent. The automatic speech recognition system may utilize these tags to provide a semantic representation based on a plurality of different tokens for the content slot portions and low information for the carrier portions. A user can be presented with a user interface containing top intent results with corresponding intent-specific top content slot values.
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: July 26, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Hugh Evan Secker-Walker, Aaron Lee Mathers Challenner, Ariya Rastrow
  • Patent number: 11302310
    Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: April 12, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister
  • Publication number: 20220093093
    Abstract: A system can operate a speech-controlled device in a mode where the speech-controlled device determines that an utterance is directed at the speech-controlled device using image data showing the user speaking the utterance. If the user is directing the user's gaze at the speech-controlled device while speaking, the system may determine the utterance is system directed and thus may perform further speech processing based on the utterance. If the user's gaze is directed elsewhere, the system may determine the utterance is not system directed (for example directed at another user) and thus the system may not perform further speech processing based on the utterance and may take other actions, for example discarding audio data of the utterance.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 24, 2022
    Inventors: Prakash Krishnan, Arindam Mandal, Nikko Strom, Pradeep Natarajan, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, David Chi-Wai Tang, Aaron Challenner, Xu Zhang, Krishna Anisetty, Josey Diego Sandoval, Rohit Prasad, Premkumar Natarajan
  • Publication number: 20220093101
    Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 24, 2022
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
  • Publication number: 20220093094
    Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 24, 2022
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, Angeliki Metallinou, Vincent Auvray, Minmin Shen, Josey Diego Sandoval, Rohit Prasad, Thomas Taylor, Amotz Maimon
  • Publication number: 20220036893
    Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models and grammar models, the system only modifies one grammar model and ensures its compatibility with the existing models in the ASR system.
    Type: Application
    Filed: August 18, 2021
    Publication date: February 3, 2022
    Inventors: Ankur Gandhe, Ariya Rastrow, Gautam Tiwari, Ashish Vishwanath Shenoy, Chun Chen