Patents by Inventor Ariya Rastrow

Ariya Rastrow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250174231
    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
    Type: Application
    Filed: January 21, 2025
    Publication date: May 29, 2025
    Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
  • Patent number: 12236950
    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
    Type: Grant
    Filed: January 3, 2023
    Date of Patent: February 25, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
  • Patent number: 12211517
    Abstract: A speech-processing system may determine potential endpoints in a user's speech. Such endpoint prediction may include determining a potential endpoint in a stream of audio data, and may additionally including determining an endpoint score representing a likelihood that the potential endpoint represents an end of speech representing a complete user input. When the potential endpoint has been determined, the system may publish a transcript of speech that preceded the potential endpoint, and send it to downstream components. The system may continue to transcribe audio data and determine additional potential endpoints while the downstream components process the transcript. The downstream components may determine whether the transcript is complete; e.g., represents the entirety of the user input. Final endpoint determinations may be made based on the results of the downstream processing including automatic speech recognition, natural language understanding, etc.
    Type: Grant
    Filed: September 15, 2021
    Date of Patent: January 28, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Roland Maximilian Rolf Maas, Bjorn Hoffmeister, Ariya Rastrow, James Garnet Droppo, Veerdhawal Pande, Maarten Van Segbroeck, Gautam Tiwari, Andrew Smith, Eli Joshua Fidler
  • Patent number: 12205574
    Abstract: Techniques for using multiple machine learning (ML) models, with varying compute costs, for ASR processing is described. The system may include an arbitrator component configured to determine which ML model is to be used to process an audio frame from a sequence of audio frames representing a spoken natural language input. The arbitrator component may switch between the ML models, on a frame-by-frame basis, to reduce an overall compute cost for the entire spoken natural language input. The outputs of the different ML models may be combined to determine the final output for the entire spoken natural language input.
    Type: Grant
    Filed: March 22, 2021
    Date of Patent: January 21, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Grant Strimel, Ariya Rastrow, Jonathan Jenner Macoskey
  • Patent number: 12039975
    Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may processing input data related the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: July 16, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, Angeliki Metallinou, Vincent Auvray, Minmin Shen, Josey Diego Sandoval, Rohit Prasad, Thomas Taylor, Amotz Maimon
  • Patent number: 12014726
    Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis, to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
    Type: Grant
    Filed: March 28, 2022
    Date of Patent: June 18, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister
  • Patent number: 11908468
    Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred to item.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: February 20, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
  • Patent number: 11887583
    Abstract: Some devices may perform processing using machine learning models trained at a centralized system and distributed to the device. The centralized system may update the machine learning model and distribute the update to the device (or devices). To reduce the size of an update, the centralized system may train a model update object, which may be smaller in size than the model itself and thus more suitable for sending to the device(s). A device may receive the model update object and use it to update the on-device machine learning model; for example, by changing some parameters of the model. Parameters left unchanged during the update may retain their previous value. Thus, using the model update object to update the on-device model may result in a more accurate updated model when compared to sending an updated model compressed to a size similar to that of the model update object.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: January 30, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Grant Strimel, Jonathan Jenner Macoskey, Ariya Rastrow
  • Publication number: 20240029743
    Abstract: Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
    Type: Application
    Filed: June 6, 2023
    Publication date: January 25, 2024
    Inventors: Stanislaw Ignacy Pasko, Pawel Zelazko, Cagdas Bak, Eli Joshua Fidler, Michal Kowalczuk, Andrew Oberlin, Ariya Rastrow
  • Publication number: 20230360633
    Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
    Type: Application
    Filed: March 13, 2023
    Publication date: November 9, 2023
    Inventors: Kevin Crews, Prasanna H. Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
  • Patent number: 11721347
    Abstract: Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: August 8, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Stanislaw Ignacy Pasko, Pawel Zelazko, Cagdas Bak, Eli Joshua Fidler, Michal Kowalczuk, Andrew Oberlin, Ariya Rastrow
  • Patent number: 11705116
    Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models and grammar models, the system only modifies one grammar model and ensures its compatibility with the existing models in the ASR system.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: July 18, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ankur Gandhe, Ariya Rastrow, Gautam Tiwari, Ashish Vishwanath Shenoy, Chun Chen
  • Publication number: 20230223023
    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
    Type: Application
    Filed: January 3, 2023
    Publication date: July 13, 2023
    Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
  • Patent number: 11688394
    Abstract: This disclosure proposes systems and methods for leveraging entity-related language models in speech processing. A system can receive audio data corresponding to an utterance and perform automatic speech recognition (ASR) on a first portion of the audio data using a general language model. Based on the results, the system may identify a specific language model for processing a second portion of the audio data. The specific language model may include entities belonging to a common subject or class. The specific language model may, in some cases, provide better results than the general language model. While the general language model may describe a whole sentence, the specific language model may describe only a portion of a sentence. Thus, a top-level model may “activate” the specific language model when it may provide useful results. The resulting data may include results from both the general language model and the specific language model.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: June 27, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Denis Filimonov, Ravi Teja Gadde, Ariya Rastrow
  • Patent number: 11676575
    Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
    Type: Grant
    Filed: July 27, 2021
    Date of Patent: June 13, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
  • Patent number: 11670285
    Abstract: Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: June 6, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kevin Crews, Prasanna H Sridhar, Ariya Rastrow, Nicholas Matthew Jutila, Andrew Oberlin, Samarth Batra, Paul Anthony Bernhardt, Veerdhawal Pande, Roland Maximilian Rolf Maas
  • Publication number: 20230012984
    Abstract: Systems, methods, and devices for computer-generating responses and sending responses to communications when the recipient of the communication is unavailable are disclosed. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes is difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message originating individual.
    Type: Application
    Filed: September 16, 2022
    Publication date: January 19, 2023
    Inventors: Ariya Rastrow, Tony Hardie, Rohit Prasad
  • Patent number: 11551685
    Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
    Type: Grant
    Filed: March 18, 2020
    Date of Patent: January 10, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
  • Publication number: 20220358908
    Abstract: Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis, to determine if the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. The system processes, using a trained classifier, the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. The system determines whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
    Type: Application
    Filed: March 28, 2022
    Publication date: November 10, 2022
    Inventors: Ankur Gandhe, Ariya Rastrow, Roland Maximilian Rolf Maas, Bjorn Hoffmeister
  • Patent number: 11496582
    Abstract: Systems, methods, and devices for computer-generating responses and sending responses to communications when the recipient of the communication is unavailable are disclosed. An individual may send a message (either audio or text) to a recipient. The recipient may be unavailable to contemporaneously respond to the message (e.g., the recipient may be performing an action that makes is difficult or impractical for the recipient to contemporaneously respond to the audio message). When the recipient is unavailable, a response to the message is generated and sent without receiving an instruction from the recipient to do so. The response may be sent to the message originating individual, and content of the response may thereafter be sent to the recipient to receive feedback regarding the correctness of the response. Alternatively, the response content may first be sent to the recipient to receive the feedback, and thereafter the response may be sent to the message originating individual.
    Type: Grant
    Filed: June 27, 2019
    Date of Patent: November 8, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Tony Hardie, Rohit Prasad