Patents by Inventor Nikko Strom

Nikko Strom has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEEP MULTI-CHANNEL ACOUSTIC MODELING

Publication number: 20200349928

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

Type: Application

Filed: July 17, 2020

Publication date: November 5, 2020

Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

Deep multi-channel acoustic modeling

Patent number: 10726830

Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.

Type: Grant

Filed: September 27, 2018

Date of Patent: July 28, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte

NATURAL LANGUAGE SPEECH PROCESSING APPLICATION SELECTION

Publication number: 20200152195

Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.

Type: Application

Filed: November 25, 2019

Publication date: May 14, 2020

Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim

System command processing

Patent number: 10600419

Abstract: Techniques for performing command processing are described. A system receives, from a device, input data corresponding to a command. The system determines NLU processing results associated with multiple applications. The system also determines NLU confidences for the NLU processing results for each application. The system sends NLU processing results to a portion of the multiple applications, and receives output data or instructions from the portion of the applications. The system ranks the portion of the applications based at least in part on the NLU processing results associated with the portion of the applications as well as the output data or instructions provided by the portion of the applications. The system may also rank the portion of the applications using other data. The system causes content corresponding to output data or instructions provided by the highest ranked application to be output to a user.

Type: Grant

Filed: September 22, 2017

Date of Patent: March 24, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim

Resolution enhancement of speech signals for speech synthesis

Patent number: 10510358

Abstract: An approach to speech synthesis uses two phases in which a relatively low quality waveform is computed, and that waveform is passed through an enhancement phase which generates the waveform that is ultimately used to produce the acoustic signal provided to the user. For example, the first phase and the second phase are each implemented using a separate artificial neural network. The two phases may be computationally preferable to using a direct approach to yield a synthesized waveform of comparable quality.

Type: Grant

Filed: September 29, 2017

Date of Patent: December 17, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra-Chicote, Alexis Moinet, Nikko Strom

Natural language speech processing application selection

Patent number: 10504512

Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.

Type: Grant

Filed: September 22, 2017

Date of Patent: December 10, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim

Acoustic trigger detection

Patent number: 10460722

Abstract: A method for selective transmission of audio data to a speech processing server uses detection of an acoustic trigger in the audio data in determining the data to transmit. Detection of the acoustic trigger makes use of an efficient computation approach that reduces the amount of run-time computation required, or equivalently improves accuracy for a given amount of computation, by combining a “time delay” structure in which intermediate results of computations are reused at various time delays, thereby avoiding computing new results, and decomposition of certain transformations to require fewer arithmetic operations without sacrificing significant performance. For a given amount of computation capacity the combination of these two techniques provides improved accuracy as compared to current approaches.

Type: Grant

Filed: June 30, 2017

Date of Patent: October 29, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Ming Sun, David Snyder, Yixin Gao, Nikko Strom, Spyros Matsoukas, Shiv Naga Prasad Vitaladevuni

Joint modeling of user behavior

Patent number: 10354184

Abstract: A system and method is disclosed for predicting user behavior in response to various tasks and/or applications. This system can be a neural network-based joint model. The neural network can include a base neural network portion and one or more task-specific neural network portions. The artificial neural network can be initialized and trained using data from multiple users for multiple tasks and/or applications. This user data can be related to characteristics and behavior, including age, gender, geographic location, purchases, past search history, and customer reviews. Additional task-specific neural network portions can be added to the neural network and may be trained using a task-specific subset of the training data. The joint model can be used to predict user behavior in response to an identified task and/or application. The tasks and/or applications can relate to use of a website by users.

Type: Grant

Filed: June 24, 2014

Date of Patent: July 16, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Shiv Naga Prasad Vitaladevuni, Nikko Ström, Rohit Prasad

GENERATION OF PREDICTIVE NATURAL LANGUAGE PROCESSING MODELS

Publication number: 20190180736

Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.

Type: Application

Filed: August 13, 2018

Publication date: June 13, 2019

Inventors: William Folwell Barton, Rohit Prasad, Stephen Frederick Potter, Nikko Strom, Yuzo Watanabe, Madan Mohan Rao Jampani, Ariya Rastrow, Arushan Rajasekaram

Asynchronous transfer of audio data

Patent number: 10297250

Abstract: The systems, devices, and processes described herein may asynchronously transfer audio signals from a voice-controlled device to a remote device. The audio signals may correspond to sound that is captured by multiple microphones of the voice-controlled device, which may then process the audio signals. The audio signals may also be transferred to the remote device for processing. Moreover, a determination of whether the voice-controlled device or the remote device is to process the audio signals may be based at least in part on the bandwidth of a network communicatively coupled to the voice-controlled device. The voice-controlled device may also cache and log the audio signals, and then asynchronously stream the audio signals to the remote device after the audio signals are initially processed, which may be based on the bandwidth of the network. The remote device may utilize the unprocessed audio signals to improve subsequent processing of audio signals.

Type: Grant

Filed: March 11, 2013

Date of Patent: May 21, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Scott Ian Blanksteen, Nikko Strom, Kavitha Velusamy, Tony David, Edward Dietz Crump

Training models using voice tags

Patent number: 10163437

Abstract: Techniques for training machine-learning algorithms with the aid of voice tags are described herein. An environment may include sensors configured to generate sensor data and devices configured to perform operations. Sensor data as well as indications of actions performed by devices within the environment may be collected over time and analyzed to identify one or more patterns. Over time, a model that includes an association between this sensor data and device actions may be created and trained such that one or more device actions may be automatically initiated in response to identifying sensor data matching the sensor data of the model. To aid in the training, a user may utter a predefined voice tag each time she performs a particular sequence of actions, with the voice tag indicating to the system that temporally proximate sensor data and device-activity data should be used to train a particular model.

Type: Grant

Filed: June 2, 2016

Date of Patent: December 25, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Lindo St. Angel, Nikko Strom, Rohan Mutagi

Distributed training of models using stochastic gradient descent

Patent number: 10152676

Abstract: Features are disclosed for distributing the training of models over multiple computing nodes (e.g., servers or other computing devices). Each computing device may include a separate copy of the model to be trained, and a subset of the training data to be used. A computing device may determine updates for parameters of the model based on processing of a portion of the training data. A portion of those updates may be selected for application to the model and synchronization with other computing devices. In some embodiments, the portion of the updates is selected based on a threshold value. Other computing devices can apply the received portion of the updates such that the copy of the model being trained in each individual computing device may be substantially synchronized, even though each computing device may be using a different subset of training data to train the model.

Type: Grant

Filed: November 22, 2013

Date of Patent: December 11, 2018

Assignee: Amazon Technologies, Inc.

Inventor: Nikko Strom

Efficient generation of personalized spoken language understanding models

Patent number: 10109273

Abstract: Features are disclosed for maintaining data that can be used to personalize spoken language understanding models, such as speech recognition or natural language understanding models. The personalization data can be used to update the models based on some or all of the data. The data may be obtained from various data sources, such as applications or services used by the user. Personalized spoken language understanding models may be generated or updated based on updates to the personalization data or some other portion of the stored personalization data. Generation of personalized spoken language understanding models may be prioritized such that the generation process accommodates multiple users.

Type: Grant

Filed: August 29, 2013

Date of Patent: October 23, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Arushan Rajasekaram, Nikko Strom, Madan Mohan Rao Jampani

Generation of predictive natural language processing models

Patent number: 10049656

Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.

Type: Grant

Filed: September 20, 2013

Date of Patent: August 14, 2018

Assignee: Amazon Technologies, Inc.

Inventors: William Folwell Barton, Rohit Prasad, Stephen Frederick Potter, Nikko Strom, Yuzo Watanabe, Madan Mohan Rao Jampani, Ariya Rastrow, Arushan Rajasekaram

Speech processing with learned representation of user interaction history

Patent number: 10032463

Abstract: An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded representation of a user's interactions with the system based on, e.g., transcriptions of prior utterances made by the user. This user-specific encoded representation of interaction history is then used by the language model to customize ASR processing for the user.

Type: Grant

Filed: December 29, 2015

Date of Patent: July 24, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Nikko Ström, Spyridon Matsoukas, Markus Dreyer, Ankur Gandhe, Denis Sergeyevich Filimonov, Julian Chan, Rohit Prasad

Distributed endpointing for speech recognition

Patent number: 9818407

Abstract: An efficient audio streaming method and apparatus includes a client process implemented on a client or local device and a server process implemented on a remote server or server(s). The client process and server process each have speech recognition components and communicate over a network, and together efficiently manage the detection of speech in an audio signal streamed by the local device to the server for speech recognition and potentially further processing at the server. The client process monitors audio input and in a first detection stage, implements endpointing on the local device to determine when speech is detected. The client process may further determine if a “wakeword” is detected, and then the client process opens a connection and begins streaming audio to the server process via the network.

Type: Grant

Filed: February 7, 2013

Date of Patent: November 14, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Hugh Evan Secker-Walker, Kenneth John Basye, Nikko Strom, Ryan Paul Thomas

Generative modeling of speech using neural networks

Patent number: 9653093

Abstract: Features are disclosed for using an artificial neural network to generate customized speech recognition models during the speech recognition process. By dynamically generating the speech recognition models during the speech recognition process, the models can be customized based on the specific context of individual frames within the audio data currently being processed. In this way, dependencies between frames in the current sequence can form the basis of the models used to score individual frames of the current sequence. Thus, each frame of the current sequence (or some subset thereof) may be scored using one or more models customized for the particular frame in context.

Type: Grant

Filed: August 19, 2014

Date of Patent: May 16, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Spyridon Matsoukas, Nikko Ström, Ariya Rastrow, Sri Venkata Surya Siva Rama Krishna Garimella

Markov-based sequence tagging using neural networks

Patent number: 9600764

Abstract: Features are disclosed for using a neural network to tag sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence. A predicted or determined label (e.g., the highest scoring or otherwise most probable label) for input at a given position in the sequence can be used when scoring input corresponding to the next position in the sequence. Additional features are disclosed for training a neural network for use in tagging sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence.

Type: Grant

Filed: June 17, 2014

Date of Patent: March 21, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Ariya Rastrow, Spyros Matsoukas, Sri Venkata Surya Siva Rama Krishna Garimella, Nikko Ström, Bjorn Hoffmeister

Estimating speaker-specific affine transforms for neural network based speech recognition systems

Patent number: 9378735

Abstract: Features are disclosed for estimating affine transforms in Log Filter-Bank Energy Space (“LFBE” space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least squares error between corresponding linear and bias transform parts for the resultant neural network feature vector and some standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression (“cMLLR”) techniques. Alternatively, the affine transform may be estimated by minimizing the least squares error between the resultant transformed neural network feature and some standard speaker-specific feature obtained for a GMM-based acoustic model.

Type: Grant

Filed: December 19, 2013

Date of Patent: June 28, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Bjorn Hoffmeister, Nikko Strom

Retrieval and management of spoken language understanding personalization data

Patent number: 9361289

Abstract: Features are disclosed for maintaining data that can be used to personalize spoken language processing, such as automatic speech recognition (“ASR”), natural language understanding (“NLU”), natural language processing (“NLP”), etc. The data may be obtained from various data sources, such as applications or services used by the user. User-specific data maintained by the data sources can be retrieved and stored for use in generating personal models. Updates to data at the data sources may be reflected by separate data sets in the personalization data, such that other processes can obtain the update data sets separate from other data.

Type: Grant

Filed: August 30, 2013

Date of Patent: June 7, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Madan Mohan Rao Jampani, Arushan Rajasekaram, Nikko Strom, Yuzo Watanabe, Stan Weidner Salvador
