Patents by Inventor Shiv Naga Prasad Vitaladevuni

Shiv Naga Prasad Vitaladevuni has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

USER PRESENCE DETECTION

Publication number: 20210027798

Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.

Type: Application

Filed: September 16, 2020

Publication date: January 28, 2021

Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
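
The scoring, smoothing, and heartbeat steps in this abstract can be pictured with a minimal Python sketch. It is not the patented method: the trailing moving average, the window size, the 0.5 threshold, and the function names `smooth_presence` and `heartbeat` are all assumptions chosen for illustration, and the per-frame scores are taken as given.

```python
from collections import deque


def smooth_presence(frame_scores, window=5, threshold=0.5):
    """Return one boolean presence decision per frame.

    Assumption: smoothing is a trailing moving average over the current
    score and up to `window - 1` preceding scores.
    """
    recent = deque(maxlen=window)
    decisions = []
    for score in frame_scores:
        recent.append(score)
        decisions.append(sum(recent) / len(recent) >= threshold)
    return decisions


def heartbeat(decisions, period):
    """Collapse frame decisions into one presence flag per `period` frames."""
    return [any(decisions[i:i + period]) for i in range(0, len(decisions), period)]


# Hypothetical per-frame presence scores in [0, 1].
scores = [0.9, 0.2, 0.9, 0.8, 0.1, 0.0, 0.1, 0.0]
decisions = smooth_presence(scores, window=3)
beats = heartbeat(decisions, period=5)
```

With these made-up scores, the single low score at the second frame does not flip the decision, because its neighbours keep the average above the threshold.
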
Wakeword training

Patent number: 10872599

Abstract: A device monitors audio data for a predetermined and/or user-defined wakeword. The device detects an error in detecting the wakeword in the audio data, such as a false-positive detection of the wakeword or a false-negative detection of the wakeword. Upon detecting the error, the device updates a model trained to detect the wakeword to create an updated trained model; the updated trained model reduces or eliminates further errors in detecting the wakeword. Data corresponding to the updated trained model may be collected by a server from a plurality of devices and used to create an updated trained model aggregating the data; this updated trained model may be sent to some or all of the devices.

Type: Grant

Filed: June 28, 2018

Date of Patent: December 22, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Shuang Wu, Thibaud Senechal, Gengshen Fu, Shiv Naga Prasad Vitaladevuni
DYNAMIC WAKEWORD DETECTION

Publication number: 20200388273

Abstract: Techniques for using a dynamic wakeword detection threshold are described. A device detects a wakeword in audio data using a first wakeword detection threshold value. Thereafter, the device receives audio including speech. If the device receives the audio within a predetermined duration of time after detecting the previous wakeword, the device attempts to detect a wakeword in second audio data, corresponding to the audio including the speech, using a second, lower wakeword detection threshold value.

Type: Application

Filed: July 23, 2020

Publication date: December 10, 2020

Inventors: Gengshen Fu, Shiv Naga Prasad Vitaladevuni, Paul McIntyre, Shuang Wu
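
As a rough illustration of the two-threshold behavior described in this abstract, the sketch below lowers the detection threshold for a fixed time after a successful detection. The class name, the threshold values, and the 10-second window are hypothetical; the abstract does not specify them, and the wakeword score is assumed to come from some upstream model.

```python
class DynamicWakewordDetector:
    """Toy detector that relaxes its threshold shortly after a detection."""

    def __init__(self, base_threshold=0.8, lowered_threshold=0.6, window_seconds=10.0):
        self.base_threshold = base_threshold
        self.lowered_threshold = lowered_threshold
        self.window_seconds = window_seconds
        self.last_detection_time = None  # no wakeword detected yet

    def current_threshold(self, now):
        """Use the lower threshold only inside the window after a detection."""
        if (self.last_detection_time is not None
                and now - self.last_detection_time <= self.window_seconds):
            return self.lowered_threshold
        return self.base_threshold

    def detect(self, score, now):
        """Compare a wakeword score against the threshold in force at `now`."""
        detected = score >= self.current_threshold(now)
        if detected:
            self.last_detection_time = now
        return detected


detector = DynamicWakewordDetector()
results = [
    detector.detect(0.7, now=0.0),   # below the base threshold
    detector.detect(0.9, now=1.0),   # clear detection, opens the window
    detector.detect(0.7, now=5.0),   # accepted under the lowered threshold
    detector.detect(0.7, now=30.0),  # window expired, base threshold again
]
```

The same score of 0.7 is rejected, accepted, and rejected again depending only on how recently a wakeword was detected.
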
Keyword detection modeling using contextual information

Patent number: 10832662

Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.

Type: Grant

Filed: July 3, 2017

Date of Patent: November 10, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
User presence detection

Patent number: 10796716

Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.

Type: Grant

Filed: October 11, 2018

Date of Patent: October 6, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
Dynamic wakeword detection

Patent number: 10777189

Abstract: Techniques for using a dynamic wakeword detection threshold are described. A device detects a wakeword in audio data using a first wakeword detection threshold value. Thereafter, the device receives audio including speech. If the device receives the audio within a predetermined duration of time after detecting the previous wakeword, the device attempts to detect a wakeword in second audio data, corresponding to the audio including the speech, using a second, lower wakeword detection threshold value.

Type: Grant

Filed: December 5, 2017

Date of Patent: September 15, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Gengshen Fu, Shiv Naga Prasad Vitaladevuni, Paul McIntyre, Shuang Wu
Dynamic wakeword detection

Patent number: 10510340

Abstract: Techniques for using a dynamic wakeword detection threshold are described. A server(s) may receive audio data corresponding to an utterance from a device in response to the device detecting a wakeword using a wakeword detection threshold. The server(s) may then determine the device should use a lower wakeword detection threshold for a duration of time. In addition to sending the device output data responsive to the utterance, the server(s) may send the device an instruction to use the lower wakeword detection threshold for the duration of time. Alternatively, the server(s) may train a machine learning model to determine when the device should use a lower wakeword detection threshold. The server(s) may send the trained machine learned model to the device for use at runtime.

Type: Grant

Filed: December 5, 2017

Date of Patent: December 17, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Gengshen Fu, Shiv Naga Prasad Vitaladevuni, Paul McIntyre, Shuang Wu
Binary target acoustic trigger detecton

Patent number: 10460729

Abstract: A method for selective transmission of audio data to a speech processing server uses detection of an acoustic trigger in the audio data in determining the data to transmit. Detection of the acoustic trigger makes use of an efficient computation approach that reduces the amount of run-time computation required, or equivalently improves accuracy for a given amount of computation, by using a neural network to determine an indicator of presence of the acoustic trigger. In some examples, the neural network combines a “time delay” structure in which intermediate results of computations are reused at various time delays, thereby avoiding the computation of new results, and decomposition of certain transformations to require fewer arithmetic operations without sacrificing significant performance.

Type: Grant

Filed: June 30, 2017

Date of Patent: October 29, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Ming Sun, Aaron Lee Mathers Challenner, Yixin Gao, Shiv Naga Prasad Vitaladevuni
Acoustic trigger detection

Patent number: 10460722

Abstract: A method for selective transmission of audio data to a speech processing server uses detection of an acoustic trigger in the audio data in determining the data to transmit. Detection of the acoustic trigger makes use of an efficient computation approach that reduces the amount of run-time computation required, or equivalently improves accuracy for a given amount of computation, by combining a “time delay” structure in which intermediate results of computations are reused at various time delays, thereby avoiding the computation of new results, and decomposition of certain transformations to require fewer arithmetic operations without sacrificing significant performance. For a given amount of computation capacity, the combination of these two techniques provides improved accuracy as compared to current approaches.

Type: Grant

Filed: June 30, 2017

Date of Patent: October 29, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Ming Sun, David Snyder, Yixin Gao, Nikko Strom, Spyros Matsoukas, Shiv Naga Prasad Vitaladevuni
Joint modeling of user behavior

Patent number: 10354184

Abstract: A system and method are disclosed for predicting user behavior in response to various tasks and/or applications. This system can be a neural network-based joint model. The neural network can include a base neural network portion and one or more task-specific neural network portions. The artificial neural network can be initialized and trained using data from multiple users for multiple tasks and/or applications. This user data can be related to characteristics and behavior, including age, gender, geographic location, purchases, past search history, and customer reviews. Additional task-specific neural network portions can be added to the neural network and may be trained using a task-specific subset of the training data. The joint model can be used to predict user behavior in response to an identified task and/or application. The tasks and/or applications can relate to use of a website by users.

Type: Grant

Filed: June 24, 2014

Date of Patent: July 16, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Shiv Naga Prasad Vitaladevuni, Nikko Ström, Rohit Prasad
Keyword spotting using multi-task configuration

Patent number: 10304440

Abstract: An approach to keyword spotting makes use of acoustic parameters that are trained on a keyword spotting task as well as on a second speech recognition task, for example, a large vocabulary continuous speech recognition task. The parameters may be optimized according to a weighted measure that weighs the keyword spotting task more highly than the other task, and that weighs utterances of a keyword more highly than utterances of other speech. In some applications, a keyword spotter configured with the acoustic parameters is used for trigger or wake word detection.

Type: Grant

Filed: June 30, 2016

Date of Patent: May 28, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Sankaran Panchapagesan, Bjorn Hoffmeister, Arindam Mandal, Aparna Khare, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Ming Sun
Disambiguation in speech recognition

Patent number: 10283111

Abstract: Automatic speech recognition (ASR) processing including a feedback configuration to allow for improved disambiguation between ASR hypotheses. After ASR processing of an incoming utterance where the ASR outputs an N-best list including multiple hypotheses, the multiple hypotheses are passed downstream for further processing. The downstream further processing may include natural language understanding (NLU) or other processing to determine a command result for each hypothesis. The command results are compared to determine if any hypotheses of the N-best list would yield similar command results. If so, the hypothesis(es) with similar results are removed from the N-best list so that only one hypothesis of the similar results remains in the N-best list. The remaining non-similar hypotheses are sent for disambiguation, or, if only one hypothesis remains, it is sent for execution.

Type: Grant

Filed: December 19, 2016

Date of Patent: May 7, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Francois Mairesse, Paul Frederick Raccuglia, Shiv Naga Prasad Vitaladevuni, Simon Peter Reavely
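
The pruning step in this abstract can be sketched in a few lines of Python. This is a simplification under stated assumptions: "similar" command results are treated as exactly equal, the `interpret` callable stands in for NLU or other downstream processing, and the names `dedupe_by_command`, `route`, and `toy_nlu` are hypothetical.

```python
def dedupe_by_command(n_best, interpret):
    """Keep the highest-ranked hypothesis for each distinct command result.

    `n_best` is ordered best-first; `interpret` maps a hypothesis to a
    hashable command result (an assumed stand-in for NLU processing).
    """
    seen = set()
    remaining = []
    for hypothesis in n_best:
        result = interpret(hypothesis)
        if result not in seen:
            seen.add(result)
            remaining.append(hypothesis)
    return remaining


def route(n_best, interpret):
    """Execute a sole survivor, otherwise hand the survivors to disambiguation."""
    remaining = dedupe_by_command(n_best, interpret)
    if len(remaining) == 1:
        return ("execute", remaining[0])
    return ("disambiguate", remaining)


def toy_nlu(text):
    """Hypothetical NLU: ignore case and a leading politeness word."""
    return text.lower().replace("please ", "").strip()


ambiguous = route(["play jazz", "Please play jazz", "play chess"], toy_nlu)
unambiguous = route(["play jazz", "Please play jazz"], toy_nlu)
```

In the second call both hypotheses lead to the same command, so only one remains and it goes straight to execution.
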
User presence detection

Patent number: 10121494

Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.

Type: Grant

Filed: March 30, 2017

Date of Patent: November 6, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
Dynamic adjustment of expression detection criteria

Patent number: 9940949

Abstract: In a speech-based system, a wake word or other trigger expression is used to preface user speech that is intended as a command. The system receives multiple directional audio signals, each of which emphasizes sound from a different direction. The trigger expression is detected in an individual directional audio signal by comparing a confidence score with a confidence threshold. An individual confidence threshold is specified for each directional audio signal. The confidence thresholds are adjusted during operation of the system based on performance information that is generated during operation of the system. As an example, performance information may include the number of times that the trigger expression has been detected in each of the directional audio signals.

Type: Grant

Filed: December 19, 2014

Date of Patent: April 10, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Shiv Naga Prasad Vitaladevuni, Philip Ryan Hilmes
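
One way to picture per-direction thresholds that adapt to detection counts is the sketch below. The adjustment rule is an assumption made for illustration only: the abstract says thresholds are adjusted from performance information but does not state how, so the choice to lower the threshold for directions with more past detections, along with the names and numeric defaults, is hypothetical.

```python
def adjust_thresholds(detection_counts, base_threshold=0.7, max_reduction=0.1):
    """Return one confidence threshold per directional audio signal.

    Assumed rule: a direction's threshold drops in proportion to its share
    of all past trigger detections, by at most `max_reduction`.
    """
    total = sum(detection_counts)
    if total == 0:
        return [base_threshold] * len(detection_counts)
    return [base_threshold - max_reduction * count / total
            for count in detection_counts]


def detect_trigger(confidence_scores, thresholds):
    """Return indices of directional signals whose score meets its threshold."""
    return [i for i, (score, threshold)
            in enumerate(zip(confidence_scores, thresholds))
            if score >= threshold]


# Hypothetical history: direction 1 has produced most detections so far.
thresholds = adjust_thresholds([0, 30, 10])
hits = detect_trigger([0.65, 0.65, 0.65], thresholds)
```

Here an identical confidence score of 0.65 passes only in the direction that has historically produced the most detections.
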
Stochastic modeling of user interactions with a detection system

Patent number: 9899021

Abstract: Features are disclosed for modeling user interaction with a detection system using a stochastic dynamical model in order to determine or adjust detection thresholds. The model may incorporate numerous features, such as the probability of false rejection and false acceptance of a user utterance and the cost associated with each potential action. The model may determine or adjust detection thresholds so as to minimize the occurrence of false acceptances and false rejections while preserving other desirable characteristics. The model may further incorporate background and speaker statistics. Adjustments to the model or other operation parameters can be implemented based on the model, user statistics, and/or additional data.

Type: Grant

Filed: December 20, 2013

Date of Patent: February 20, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Rohit Prasad
KEYWORD DETECTION MODELING USING CONTEXTUAL INFORMATION

Publication number: 20180012593

Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.

Type: Application

Filed: July 3, 2017

Publication date: January 11, 2018

Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
Audio output masking for improved automatic speech recognition

Patent number: 9704478

Abstract: Features are disclosed for filtering portions of an output audio signal in order to improve automatic speech recognition on an input signal which may include a representation of the output signal. A signal that includes audio content can be received, and a frequency or band of frequencies can be selected to be filtered from the signal. The frequency band may correspond to a desired frequency band for speech recognition. An input signal can be obtained comprising audio data corresponding to a user utterance and presentation of the output signal. Automatic speech recognition can be performed on the input signal. In some cases, an acoustic model trained for use with such frequency band filtering may be used to perform speech recognition.

Type: Grant

Filed: December 2, 2013

Date of Patent: July 11, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Shiv Naga Prasad Vitaladevuni, Amit Singh Chhetri, Phillip Ryan Hilmes, Rohit Prasad
Keyword detection modeling using contextual and environmental information

Patent number: 9697828

Abstract: Features are disclosed for detecting words in audio using environmental information and/or contextual information in addition to acoustic features associated with the words to be detected. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.

Type: Grant

Filed: June 20, 2014

Date of Patent: July 4, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
Model shrinking for embedded keyword spotting

Patent number: 9600231

Abstract: A revised support vector machine (SVM) classifier is offered to distinguish between true keywords and false positives based on output from a keyword spotting component of a speech recognition system. The SVM operates on a reduced set of feature dimensions, where the feature dimensions are selected based on their ability to distinguish between true keywords and false positives. Further, support vector pairs are merged to create a reduced set of re-weighted support vectors. These techniques result in an SVM that may be operated using reduced computing resources, thus improving system performance.

Type: Grant

Filed: June 26, 2015

Date of Patent: March 21, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Ming Sun, Björn Hoffmeister, Shiv Naga Prasad Vitaladevuni, Varun Kumar Nagaraja
Estimating false rejection rate in a detection system

Patent number: 9589560

Abstract: Features are disclosed for estimating a false rejection rate in a detection system. The false rejection rate can be estimated by fitting a model to a distribution of detection confidence scores. An estimated false rejection rate can then be computed for confidence scores that fall below a threshold. The false rejection rate and model can be verified once the detection system has been deployed by obtaining additional data with confidence scores falling below the threshold. Adjustments to the model or other operational parameters can be implemented based on the verified false rejection rate, model, or additional data.

Type: Grant

Filed: December 19, 2013

Date of Patent: March 7, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Rohit Prasad
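
The idea of fitting a model to confidence scores and reading off the mass below a threshold can be sketched with the Python standard library. The Gaussian model is an assumption (the abstract does not name a model family), the scores are assumed to come from genuine keyword utterances, and the function name and sample data are hypothetical.

```python
from statistics import NormalDist


def estimate_false_rejection_rate(confidence_scores, threshold):
    """Estimate the fraction of genuine utterances scoring below `threshold`.

    Assumption: scores follow a normal distribution, fitted here from the
    sample mean and standard deviation (needs at least two distinct scores).
    """
    model = NormalDist.from_samples(confidence_scores)
    return model.cdf(threshold)


# Hypothetical confidence scores for true keyword utterances.
scores = [0.6, 0.7, 0.8, 0.9, 1.0]
frr_at_mean = estimate_false_rejection_rate(scores, threshold=0.8)
frr_lower = estimate_false_rejection_rate(scores, threshold=0.5)
```

Lowering the threshold reduces the estimated false rejection rate, which is the trade-off such an estimate is meant to expose. A deployed system that only observes scores above the threshold would need a fit that accounts for the truncation, which this sketch does not attempt.
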