Patents by Inventor Nikko Strom
Nikko Strom has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11908468
Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system, which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred-to item.
Type: Grant
Filed: December 4, 2020
Date of Patent: February 20, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
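The offset-time lookup this abstract describes can be sketched in a few lines: given when each list entry began playing and how far into playback the user barged in, pick the entry that was most recently output. A minimal sketch only; `PlaybackEntry`, its field names, and the example playlist are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class PlaybackEntry:
    name: str
    start_offset_ms: int  # offset from playback start when this entry began

def resolve_anaphora(entries: list[PlaybackEntry], interrupt_offset_ms: int) -> str:
    """Return the entry most recently output when the user interrupted.

    The device reports the offset between playback start and the barge-in;
    we pick the last entry whose start offset precedes that moment.
    """
    current = entries[0].name
    for entry in entries:
        if entry.start_offset_ms <= interrupt_offset_ms:
            current = entry.name
        else:
            break
    return current

playlist = [
    PlaybackEntry("Song A", 0),
    PlaybackEntry("Song B", 2400),
    PlaybackEntry("Song C", 5100),
]
# User said "that one" 3 seconds into playback, while Song B was playing.
print(resolve_anaphora(playlist, 3000))  # prints "Song B"
```

In the patented system the offset travels to a remote speech processing system alongside the audio; here both sides are collapsed into one function for illustration.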
-
Patent number: 11853391
Abstract: Exemplary embodiments provide distributed parallel training of a machine learning model. Multiple processors may be used to train a machine learning model to reduce training time. To synchronize trained model data between the processors, data is communicated between the processors after some number of training cycles. To improve the communication efficiency, exemplary embodiments synchronize data among a set of processors after a predetermined number of training cycles, and synchronize data between one or more processors of each set of the processors after a predetermined number of training cycles. During the first synchronization among a set of processors, compressed model gradient data generated after performing the training cycles may be communicated. During the second synchronization between the set of processors, trained models or full model gradient data generated after performing the training cycles may be communicated.
Type: Grant
Filed: September 24, 2018
Date of Patent: December 26, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Pranav Prashant Ladkat, Oleg Rybakov, Nikko Strom, Sri Venkata Surya Siva Rama Krishna Garimella, Sree Hari Krishnan Parthasarathi
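The two-level synchronization in this abstract can be illustrated with a toy simulation: within each group of workers, a sync exchanges compressed gradients; between groups, the group leaders exchange full vectors. The group sizes, the top-k sparsification, and plain averaging are all illustrative assumptions, not the patented scheme.

```python
import numpy as np

def topk_compress(g, k):
    """Keep only the k largest-magnitude components; a crude stand-in
    for the compressed model-gradient data mentioned in the abstract."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def sync(grads):
    """Average a list of gradient vectors (one all-reduce level)."""
    mean = np.mean(grads, axis=0)
    return [mean.copy() for _ in grads]

# Two groups of two workers, each holding its own local gradient.
rng = np.random.default_rng(0)
groups = [[rng.normal(size=8) for _ in range(2)] for _ in range(2)]

# First synchronization: within each group, exchange *compressed* gradients.
groups = [sync([topk_compress(g, k=4) for g in grp]) for grp in groups]

# Second synchronization: between groups, leaders exchange *full* gradients.
leaders = sync([grp[0] for grp in groups])
for grp, lead in zip(groups, leaders):
    grp[0] = lead
```

After the second level, the group leaders hold identical vectors, mirroring how the hierarchical scheme trades bandwidth (compressed intra-group traffic) against exactness (full inter-group traffic).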
-
Publication number: 20230223023
Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio, or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
Type: Application
Filed: January 3, 2023
Publication date: July 13, 2023
Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
-
Patent number: 11676575
Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
Type: Grant
Filed: July 27, 2021
Date of Patent: June 13, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
-
Publication number: 20230119954
Abstract: Described herein is a system for responding to a frustrated user with a response determined based on spoken language understanding (SLU) processing of a user input. The system detects user frustration and responds to a repeated user input by confirming an action to be performed or presenting an alternative action, instead of performing the action responsive to the user input. The system also detects poor audio quality of the captured user input, and responds by requesting the user to repeat the user input. The system processes sentiment data and signal quality data to respond to user inputs.
Type: Application
Filed: October 27, 2022
Publication date: April 20, 2023
Inventors: Isaac Joseph Madwed, Julia Kennedy Nemer, Joo-Kyung Kim, Nikko Strom, Steven Mack Saunders, Laura Maggia Panfili, Anna Caitlin Jentoft, Sungjin Lee, David Thomas, Young-Bum Kim, Pablo Cesar Ganga, Chenlei Guo, Shuting Tang, Zhenyu Yao
-
Patent number: 11574628
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
Type: Grant
Filed: March 28, 2019
Date of Patent: February 7, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
-
Patent number: 11551685
Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio, or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
Type: Grant
Filed: March 18, 2020
Date of Patent: January 10, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
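The two-stage barge-in handling this abstract describes can be sketched as a small state machine: a fast detector immediately ducks the output audio, then a slower, more accurate classifier either accepts the interruption (stop output, forward audio for speech processing) or rejects it (restore the volume). The callables, class name, and volume values below are hypothetical placeholders for the trained models, not the patented implementation.

```python
class BargeInPipeline:
    """Sketch of the on-device interrupt architecture from the abstract:
    a low-latency interrupt detector plus a high-accuracy device-directed
    classifier, both supplied as plain callables for illustration."""

    def __init__(self, detector, classifier):
        self.detector = detector      # per-frame check, low latency
        self.classifier = classifier  # full-utterance check, high accuracy
        self.volume = 1.0             # current output-audio volume

    def on_audio_frame(self, frame):
        """Fast path: duck the output as soon as speech looks device-directed."""
        if self.detector(frame):
            self.volume = 0.2

    def on_utterance_end(self, utterance):
        """Slow path: accept or reject the interrupt on the whole utterance."""
        if self.classifier(utterance):
            self.volume = 0.0         # accept: end output audio
            return "send_to_speech_processing"
        self.volume = 1.0             # reject: restore output volume
        return "resume_playback"

# Toy stand-ins for the trained models.
pipeline = BargeInPipeline(
    detector=lambda frame: frame == "speech",
    classifier=lambda utt: "stop" in utt,
)
pipeline.on_audio_frame("speech")              # ducks volume to 0.2
result = pipeline.on_utterance_end("stop it")  # accepted -> volume 0.0
```

The split mirrors the latency/accuracy trade-off in the abstract: the cheap detector acts within a frame, while the expensive classifier gets the entire utterance and its semantic information before committing.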
-
Patent number: 11508361
Abstract: Described herein is a system for responding to a frustrated user with a response determined based on spoken language understanding (SLU) processing of a user input. The system detects user frustration and responds to a repeated user input by confirming an action to be performed or presenting an alternative action, instead of performing the action responsive to the user input. The system also detects poor audio quality of the captured user input, and responds by requesting the user to repeat the user input. The system processes sentiment data and signal quality data to respond to user inputs.
Type: Grant
Filed: June 1, 2020
Date of Patent: November 22, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Isaac Joseph Madwed, Julia Kennedy Nemer, Joo-Kyung Kim, Nikko Strom, Steven Mack Saunders, Laura Maggia Panfili, Anna Caitlin Jentoft, Sungjin Lee, David Thomas, Young-Bum Kim, Pablo Cesar Ganga, Chenlei Guo, Shuting Tang, Zhenyu Yao
-
Patent number: 11475881
Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
Type: Grant
Filed: July 17, 2020
Date of Patent: October 18, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
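The three-model chain in this abstract — raw multi-channel signals to beamformed-like features, then to a lower-dimensional representation, then to acoustic-unit scores — can be illustrated with single dense layers standing in for each DNN. Every shape and weight below is an illustrative assumption; the patented models are jointly trained networks, not random matrices.

```python
import numpy as np

rng = np.random.default_rng(1)

def dense_relu(x, w):
    """One ReLU layer stands in for each full DNN in this sketch."""
    return np.maximum(0.0, x @ w)

# Raw input: 4 microphone channels, 160 samples each (hypothetical sizes).
num_mics, samples = 4, 160
multichannel_audio = rng.normal(size=(num_mics, samples))

# Model 1: multi-channel DNN maps raw signals to beamformed-like features.
w1 = rng.normal(size=(num_mics * samples, 64))
feat1 = dense_relu(multichannel_audio.reshape(-1), w1)  # shape (64,)

# Model 2: feature-extraction DNN reduces to a lower-dimensional vector.
w2 = rng.normal(size=(64, 16))
feat2 = dense_relu(feat1, w2)                           # shape (16,)

# Model 3: classification DNN scores acoustic units (softmax over classes).
w3 = rng.normal(size=(16, 40))
logits = feat2 @ w3
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                    # shape (40,)
```

The point of the sketch is the data flow, not the layers: because all three stages are differentiable, they can be optimized jointly for the final classification objective rather than tuning the front-end separately for signal enhancement.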
-
Publication number: 20220093093
Abstract: A system can operate a speech-controlled device in a mode where the speech-controlled device determines that an utterance is directed at the speech-controlled device using image data showing the user speaking the utterance. If the user is directing the user's gaze at the speech-controlled device while speaking, the system may determine the utterance is system directed and thus may perform further speech processing based on the utterance. If the user's gaze is directed elsewhere, the system may determine the utterance is not system directed (for example directed at another user) and thus the system may not perform further speech processing based on the utterance and may take other actions, for example discarding audio data of the utterance.
Type: Application
Filed: December 4, 2020
Publication date: March 24, 2022
Inventors: Prakash Krishnan, Arindam Mandal, Nikko Strom, Pradeep Natarajan, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, David Chi-Wai Tang, Aaron Challenner, Xu Zhang, Krishna Anisetty, Josey Diego Sandoval, Rohit Prasad, Premkumar Natarajan
-
Publication number: 20220093094
Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
Type: Application
Filed: December 4, 2020
Publication date: March 24, 2022
Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, Angeliki Metallinou, Vincent Auvray, Minmin Shen, Josey Diego Sandoval, Rohit Prasad, Thomas Taylor, Amotz Maimon
-
Publication number: 20220093101
Abstract: A system that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system, which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve anaphora to match that entry and can perform additional processing based on the referred-to item.
Type: Application
Filed: December 4, 2020
Publication date: March 24, 2022
Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
-
Patent number: 11276403
Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.
Type: Grant
Filed: November 25, 2019
Date of Patent: March 15, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
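The idea in this abstract — shortlist the applications likely able to execute a command, then run named entity recognition and intent classification only for those — can be sketched with a naive keyword shortlister. The function names, the keyword table, and the per-application NLU callables are all hypothetical; the patented system uses trained components, not keyword matching.

```python
def shortlist(utterance: str, app_keywords: dict) -> list:
    """Return applications likely configured to execute the command
    (a toy keyword match standing in for a trained shortlister)."""
    return [app for app, words in app_keywords.items()
            if any(w in utterance for w in words)]

def process(utterance: str, app_keywords: dict, nlu_models: dict) -> dict:
    """Run NER and intent classification only for shortlisted applications,
    skipping the NLU cost of every other application."""
    results = {}
    for app in shortlist(utterance, app_keywords):
        results[app] = nlu_models[app](utterance)
    return results

# Toy domain table and per-app NLU stubs.
keywords = {"music": ["play", "song"], "weather": ["weather", "rain"]}
models = {
    "music": lambda t: {"intent": "PlayIntent", "text": t},
    "weather": lambda t: {"intent": "WeatherIntent", "text": t},
}
print(process("play a song", keywords, models))  # only "music" is processed
```

The saving is the whole point: NLU work scales with the size of the shortlist rather than the number of installed applications.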
-
Publication number: 20220020357
Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
Type: Application
Filed: July 27, 2021
Publication date: January 20, 2022
Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
-
Patent number: 11200885
Abstract: A dialog manager receives text data corresponding to a dialog with a user. Entities represented in the text data are identified. Context data relating to the dialog is maintained, which may include prior dialog, prior API calls, user profile information, or other data. Using the text data and the context data, an N-best list of one or more dialog models is selected to process the text data. After processing the text data, the outputs of the N-best models are ranked and a top-scoring output is selected. The top-scoring output may be an API call and/or an audio prompt.
Type: Grant
Filed: December 13, 2018
Date of Patent: December 14, 2021
Assignee: Amazon Technologies, Inc.
Inventors: Arindam Mandal, Nikko Strom, Angeliki Metallinou, Tagyoung Chung, Dilek Hakkani-Tur, Suranjit Adhikari, Sridhar Yadav Manoharan, Ankita De, Qing Liu, Raefer Christopher Gabriel, Rohit Prasad
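The select-run-rank loop in this abstract can be reduced to a few lines: every candidate dialog model produces an output, a ranker scores the outputs, and the top-scoring one (an API call or an audio prompt) wins. `run_dialog_turn`, the model and ranker callables, and the `score_hint` field are illustrative assumptions, not the patented dialog manager.

```python
def run_dialog_turn(text, context, models, ranker):
    """Run one dialog turn: each N-best dialog model processes the text
    and context, the ranker scores every output, and the top-scoring
    output is returned."""
    outputs = [model(text, context) for model in models]
    return max(outputs, key=ranker)

# Toy dialog models; each returns a candidate output with a ranking hint.
models = [
    lambda t, c: {"score_hint": 1, "action": "api_call:get_weather"},
    lambda t, c: {"score_hint": 3, "action": "prompt:which_city"},
]
ranker = lambda out: out["score_hint"]

best = run_dialog_turn("what's the weather", {"prior_turns": []}, models, ranker)
# best["action"] == "prompt:which_city" -- the higher-scoring candidate wins
```

In the patent the N-best model list is itself selected using the text and context; that selection step is folded into the fixed `models` list here for brevity.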
-
Publication number: 20210375272
Abstract: Described herein is a system for responding to a frustrated user with a response determined based on spoken language understanding (SLU) processing of a user input. The system detects user frustration and responds to a repeated user input by confirming an action to be performed or presenting an alternative action, instead of performing the action responsive to the user input. The system also detects poor audio quality of the captured user input, and responds by requesting the user to repeat the user input. The system processes sentiment data and signal quality data to respond to user inputs.
Type: Application
Filed: June 1, 2020
Publication date: December 2, 2021
Inventors: Isaac Joseph Madwed, Julia Kennedy Nemer, Joo-Kyung Kim, Nikko Strom, Steven Mack Saunders, Laura Maggia Panfili, Anna Caitlin Jentoft, Sungjin Lee, David Thomas, Young-Bum Kim, Pablo Cesar Ganga, Chenlei Guo, Shuting Tang, Zhenyu Yao
-
Publication number: 20210295833
Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio, or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
Type: Application
Filed: March 18, 2020
Publication date: September 23, 2021
Inventors: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra
-
Patent number: 11087739
Abstract: A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
Type: Grant
Filed: November 13, 2018
Date of Patent: August 10, 2021
Assignee: Amazon Technologies, Inc.
Inventors: Ariya Rastrow, Rohit Prasad, Nikko Strom
-
Patent number: 10964312
Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.
Type: Grant
Filed: August 13, 2018
Date of Patent: March 30, 2021
Assignee: Amazon Technologies, Inc.
Inventors: William Folwell Barton, Rohit Prasad, Stephen Frederick Potter, Nikko Strom, Yuzo Watanabe, Madan Mohan Rao Jampani, Ariya Rastrow, Arushan Rajasekaram
-
Patent number: 10847137
Abstract: An approach to speech recognition, and in particular trigger word detection, replaces fixed feature extraction from waveform samples with a neural network (NN). For example, rather than computing Log Frequency Band Energies (LFBEs), a convolutional neural network is used. In some implementations, this NN waveform processing is combined with a trained secondary classification that makes use of phonetic segmentation of a possible trigger word occurrence.
Type: Grant
Filed: December 12, 2017
Date of Patent: November 24, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Arindam Mandal, Nikko Strom, Kenichi Kumatani, Sankaran Panchapagesan