Patents by Inventor Nikko Strom
Nikko Strom has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20200349928Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.Type: ApplicationFiled: July 17, 2020Publication date: November 5, 2020Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
-
Patent number: 10726830Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.Type: GrantFiled: September 27, 2018Date of Patent: July 28, 2020Assignee: Amazon Technologies, Inc.Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
-
Publication number: 20200152195Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.Type: ApplicationFiled: November 25, 2019Publication date: May 14, 2020Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
-
Patent number: 10600419Abstract: Techniques for performing command processing are described. A system receives, from a device, input data corresponding to a command. The system determines NLU processing results associated with multiple applications. The system also determines NLU confidences for the NLU processing results for each application. The system sends NLU processing results to a portion of the multiple applications, and receives output data or instructions from the portion of the applications. The system ranks the portion of the applications based at least in part on the NLU processing results associated with the portion of the applications as well as the output data or instructions provided by the portion of the applications. The system may also rank the portion of the applications using other data. The system causes content corresponding to output data or instructions provided by the highest ranked application to be output to a user.Type: GrantFiled: September 22, 2017Date of Patent: March 24, 2020Assignee: Amazon Technologies, Inc.Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
-
Patent number: 10510358Abstract: An approach to speech synthesis uses two phases in which a relatively low quality waveform is computed, and that waveform is passed through an enhancement phase which generates the waveform that is ultimately used to produce the acoustic signal provided to the user. For example, the first phase and the second phase are each implemented using a separate artificial neural network. The two phases may be computationally preferable to using a direct approach to yield a synthesized waveform of comparable quality.Type: GrantFiled: September 29, 2017Date of Patent: December 17, 2019Assignee: Amazon Technologies, Inc.Inventors: Roberto Barra-Chicote, Alexis Moinet, Nikko Strom
-
Patent number: 10504512Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.Type: GrantFiled: September 22, 2017Date of Patent: December 10, 2019Assignee: AMAZON TECHNOLOGIES, INC.Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
-
Patent number: 10460722Abstract: A method for selective transmission of audio data to a speech processing server uses detection of an acoustic trigger in the audio data in determining the data to transmit. Detection of the acoustic trigger makes use of an efficient computation approach that reduces the amount of run-time computation required, or equivalently improves accuracy for a given amount of computation, by combining a “time delay” structure in which intermediate results of computations are reused at various time delays, thereby avoiding computation of computing new results, and decomposition of certain transformations to require fewer arithmetic operations without sacrificing significant performance. For a given amount of computation capacity the combination of these two techniques provides improved accuracy as compared to current approaches.Type: GrantFiled: June 30, 2017Date of Patent: October 29, 2019Assignee: Amazon Technologies, Inc.Inventors: Ming Sun, David Snyder, Yixin Gao, Nikko Strom, Spyros Matsoukas, Shiv Naga Prasad Vitaladevuni
-
Patent number: 10354184Abstract: A system and method is disclosed for predicting user behavior in response to various tasks and or/applications. This system can be a neural network-based joint model. The neural network can include a base neural network portion and one or more task-specific neural network portions. The artificial neural network can be initialized and trained using data from multiple users for multiple tasks and/or applications. This user data can be related to characteristics and behavior, including age, gender, geographic location, purchases, past search history, and customer reviews. Additional task-specific neural network portions can be added to the neural network and may be trained using a task-specific subset of the training data. The joint model can be used to predict user behavior in response to an identified task and/or application. The tasks and/or applications can relate to use of a website by users.Type: GrantFiled: June 24, 2014Date of Patent: July 16, 2019Assignee: Amazon Technologies, Inc.Inventors: Shiv Naga Prasad Vitaladevuni, Nikko Ström, Rohit Prasad
-
Publication number: 20190180736Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.Type: ApplicationFiled: August 13, 2018Publication date: June 13, 2019Inventors: William Folwell Barton, Rohit Prasad, Stephen Frederick Potter, Nikko Strom, Yuzo Watanabe, Madan Mohan Rao Jampani, Ariya Rastrow, Arushan Rajasekaram
-
Patent number: 10297250Abstract: The systems, devices, and processes described herein may asynchronously transfer audio signals from a voice-controlled device to a remote device. The audio signals may correspond to sound that is captured by multiple microphones of the voice-controlled device, which may then process the audio signals. The audio signals may also be transferred to the remote device for processing. Moreover, a determination of whether the voice-controlled device or the remote device is to process the audio signals may be based at least in part on the bandwidth of a network communicatively coupled to the voice-controlled device. The voice-controlled device may also cache and log the audio signals, and then asynchronously stream the audio signals to the remote device after the audio signals are initially processed, which may be based on the bandwidth of the network. The remote device may utilize the unprocessed audio signals to improve subsequent processing of audio signals.Type: GrantFiled: March 11, 2013Date of Patent: May 21, 2019Assignee: Amazon Technologies, Inc.Inventors: Scott Ian Blanksteen, Nikko Strom, Kavitha Velusamy, Tony David, Edward Dietz Crump
-
Patent number: 10163437Abstract: Techniques for training machine-learning algorithms with the aid of voice tags are described herein. An environment may include sensors configured to generate sensor data and devices configured to perform operations. Sensor data as well as indications of actions performed by devices within the environment may be collected over time and analyzed to identify one or more patterns. Over time, a model that includes an association between this sensor data and device actions may be created and trained such that one or more device actions may be automatically initiated in response to identifying sensor data matching the sensor data of the model. To aid in the training, a user may utter a predefined voice tag each time she performs a particular sequence of actions, with the voice tag indicating to the system that temporally proximate sensor data and device-activity data should be used to train a particular model.Type: GrantFiled: June 2, 2016Date of Patent: December 25, 2018Assignee: Amazon Technologies, Inc.Inventors: Lindo St. Angel, Nikko Strom, Rohan Mutagi
-
Patent number: 10152676Abstract: Features are disclosed for distributing the training of models over multiple computing nodes (e.g., servers or other computing devices). Each computing device may include a separate copy of the model to be trained, and a subset of the training data to be used. A computing device may determine updates for parameters of the model based on processing of a portion of the training data. A portion of those updates may be selected for application to the model and synchronization with other computing devices. In some embodiments, the portion of the updates is selected based on a threshold value. Other computing devices can apply the received portion of the updates such that the copy of the model being trained in each individual computing device may be substantially synchronized, even though each computing device may be using a different subset of training data to train the model.Type: GrantFiled: November 22, 2013Date of Patent: December 11, 2018Assignee: Amazon Technologies, Inc.Inventor: Nikko Strom
-
Patent number: 10109273Abstract: Features are disclosed for maintaining data that can be used to personalize spoken language understanding models, such as speech recognition or natural language understanding models. The personalization data can be used to update the models based on some or all of the data. The data may be obtained from various data sources, such as applications or services used by the user. Personalized spoken language understanding models may be generated or updated based on updates to the personalization data or some other portion of the stored personalization data. Generation of personalized spoken language understanding models may be prioritized such that the generation process accommodates multiple users.Type: GrantFiled: August 29, 2013Date of Patent: October 23, 2018Assignee: Amazon Technologies, Inc.Inventors: Arushan Rajasekaram, Nikko Strom, Madan Mohan Rao Jampani
-
Patent number: 10049656Abstract: Features are disclosed for generating predictive personal natural language processing models based on user-specific profile information. The predictive personal models can provide broader coverage of the various terms, named entities, and/or intents of an utterance by the user than a personal model, while providing better accuracy than a general model. Profile information may be obtained from various data sources. Predictions regarding the content or subject of future user utterances may be made from the profile information. Predictive personal models may be generated based on the predictions. Future user utterances may be processed using the predictive personal models.Type: GrantFiled: September 20, 2013Date of Patent: August 14, 2018Assignee: Amazon Technologies, Inc.Inventors: William Folwell Barton, Rohit Prasad, Stephen Frederick Potter, Nikko Strom, Yuzo Watanabe, Madan Mohan Rao Jampani, Ariya Rastrow, Arushan Rajasekaram
-
Patent number: 10032463Abstract: An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded representation of a user's interactions with the system based on, e.g., transcriptions of prior utterances made by the user. This user-specific encoded representation of interaction history is then used by the language model to customize ASR processing for the user.Type: GrantFiled: December 29, 2015Date of Patent: July 24, 2018Assignee: Amazon Technologies, Inc.Inventors: Ariya Rastrow, Nikko Ström, Spyridon Matsoukas, Markus Dreyer, Ankur Gandhe, Denis Sergeyevich Filimonov, Julian Chan, Rohit Prasad
-
Patent number: 9818407Abstract: An efficient audio streaming method and apparatus includes a client process implemented on a client or local device and a server process implemented on a remote server or server(s). The client process and server process each have speech recognition components and communicate over a network, and together efficiently manage the detection of speech in an audio signal streamed by the local device to the server for speech recognition and potentially further processing at the server. The client process monitors audio input and in a first detection stage, implements endpointing on the local device to determine when speech is detected. The client process may further determine if a “wakeword” is detected, and then the client process opens a connection and begins streaming audio to the server process via the network.Type: GrantFiled: February 7, 2013Date of Patent: November 14, 2017Assignee: AMAZON TECHNOLOGIES, INC.Inventors: Hugh Evan Secker-Walker, Kenneth John Basye, Nikko Strom, Ryan Paul Thomas
-
Patent number: 9653093Abstract: Features are disclosed for using an artificial neural network to generate customized speech recognition models during the speech recognition process. By dynamically generating the speech recognition models during the speech recognition process, the models can be customized based on the specific context of individual frames within the audio data currently being processed. In this way, dependencies between frames in the current sequence can form the basis of the models used to score individual frames of the current sequence. Thus, each frame of the current sequence (or some subset thereof) may be scored using one or more models customized for the particular frame in context.Type: GrantFiled: August 19, 2014Date of Patent: May 16, 2017Assignee: Amazon Technologies, Inc.Inventors: Spyridon Matsoukas, Nikko Ström, Ariya Rastrow, Sri Venkata Surya Siva Rama Krishna Garimella
-
Patent number: 9600764Abstract: Features are disclosed for using a neural network to tag sequential input without using an internal representation of the neural network generated when scoring previous positions in the sequence. A predicted or determined label (e.g., the highest scoring or otherwise most probable label) for input at a given position in the sequence can be used when scoring input corresponding to the next position the sequence. Additional features are disclosed for training a neural network for use in tagging sequential input without using an internal representation of the neural network generated when scoring previous positions the sequence.Type: GrantFiled: June 17, 2014Date of Patent: March 21, 2017Assignee: Amazon Technologies, Inc.Inventors: Ariya Rastrow, Spyros Matsoukas, Sri Venkata Surya Siva Rama Krishna Garimella, Nikko Ström, Bjorn Hoffmeister
-
Patent number: 9378735Abstract: Features are disclosed for estimating affine transforms in Log Filter-Bank Energy Space (“LFBE” space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least squares error between corresponding linear and bias transform parts for the resultant neural network feature vector and some standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression (“cMLLR”) techniques. Alternatively, the affine transform may be estimated by minimizing the least squares error between the resultant transformed neural network feature and some standard speaker-specific feature obtained for a GMM-based acoustic model.Type: GrantFiled: December 19, 2013Date of Patent: June 28, 2016Assignee: Amazon Technologies, Inc.Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Bjorn Hoffmeister, Nikko Strom
-
Patent number: 9361289Abstract: Features are disclosed for maintaining data that can be used to personalize spoken language processing, such as automatic speech recognition (“ASR”), natural language understanding (“NLU”), natural language processing (“NLP”), etc. The data may be obtained from various data sources, such as applications or services used by the user. User-specific data maintained by the data sources can be retrieved and stored for use in generating personal models. Updates to data at the data sources may be reflected by separate data sets in the personalization data, such that other processes can obtain the update data sets separate from other data.Type: GrantFiled: August 30, 2013Date of Patent: June 7, 2016Assignee: Amazon Technologies, Inc.Inventors: Madan Mohan Rao Jampani, Arushan Rajasekaram, Nikko Strom, Yuzo Watanabe, Stan Weidner Salvador