Patents by Inventor Spyridon Matsoukas

Spyridon Matsoukas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11043218
    Abstract: A system processes audio data to detect when it includes a representation of a wakeword or of an acoustic event. The system may receive or determine acoustic features for the audio data, such as log-filterbank energy (LFBE). The acoustic features may be used by a first, wakeword-detection model to detect the wakeword; the output of this model may be further processed using a softmax function, to smooth it, and to detect spikes. The same acoustic features may be also be used by a second, acoustic-event-detection model to detect the acoustic event; the output of this model may be further processed using a sigmoid function and a classifier. Another model may be used to extract additional features from the LFBE data; these additional features may be used by the other models.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: June 22, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ming Sun, Thibaud Senechal, Yixin Gao, Anish N. Shah, Spyridon Matsoukas, Chao Wang, Shiv Naga Prasad Vitaladevuni
  • Patent number: 11043205
    Abstract: A natural language processing system that can determine an overall score for a natural language hypothesis using hypothesis-specific component scores from different aspects of NLU processing. The individual component scores may be weighted by weights trained to optimize the overall scores relative to each other. Each domain of the system may be configured with a separate component that determines the overall score with respect to the domain. Natural language hypotheses can be ranked using the overall score either within a specific domain or on a cross-domain basis.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: June 22, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Chengwei Su, Sankaranarayanan Ananthakrishnan, Spyridon Matsoukas, Rahul Gupta, Kelly James Vanee
  • Patent number: 11004454
    Abstract: Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
    Type: Grant
    Filed: November 6, 2018
    Date of Patent: May 11, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
  • Publication number: 20210134276
    Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based usage patterns associated with the users.
    Type: Application
    Filed: November 5, 2020
    Publication date: May 6, 2021
    Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
  • Publication number: 20210027798
    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
    Type: Application
    Filed: September 16, 2020
    Publication date: January 28, 2021
    Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
  • Patent number: 10832662
    Abstract: Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based usage patterns associated with the users.
    Type: Grant
    Filed: July 3, 2017
    Date of Patent: November 10, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Rohit Prasad, Kenneth John Basye, Spyridon Matsoukas, Rajiv Ramachandran, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister
  • Patent number: 10796716
    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
    Type: Grant
    Filed: October 11, 2018
    Date of Patent: October 6, 2020
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
  • Publication number: 20200279555
    Abstract: Methods and systems for determining an intent of an utterance using contextual information associated with a requesting device are described herein. Voice activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Entity data representing the content rendered by the display screen may describe entities having similar attributes as an identified intent from natural language understanding processing. Natural language understanding processing may attempt to resolve one or more declared slots for a particular intent and may generate an initial list of intent hypotheses ranked to indicate which are most likely to correspond to the utterance. The entity data may be compared with the declared slots for the intent hypotheses, and the list of intent hypothesis may be re-ranked to account for matching slots from the contextual metadata. The top ranked intent hypothesis after re-ranking may then be selected as the utterance's intent.
    Type: Application
    Filed: March 11, 2020
    Publication date: September 3, 2020
    Inventors: Alexandra R. Shapiro, Melanie Chie Bomke Gens, Spyridon Matsoukas, Kellen Gillespie, Rahul Goel
  • Publication number: 20200193967
    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
    Type: Application
    Filed: December 23, 2019
    Publication date: June 18, 2020
    Inventors: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
  • Patent number: 10679621
    Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
    Type: Grant
    Filed: March 21, 2018
    Date of Patent: June 9, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani
  • Publication number: 20200152195
    Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.
    Type: Application
    Filed: November 25, 2019
    Publication date: May 14, 2020
    Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
  • Patent number: 10600419
    Abstract: Techniques for performing command processing are described. A system receives, from a device, input data corresponding to a command. The system determines NLU processing results associated with multiple applications. The system also determines NLU confidences for the NLU processing results for each application. The system sends NLU processing results to a portion of the multiple applications, and receives output data or instructions from the portion of the applications. The system ranks the portion of the applications based at least in part on the NLU processing results associated with the portion of the applications as well as the output data or instructions provided by the portion of the applications. The system may also rank the portion of the applications using other data. The system causes content corresponding to output data or instructions provided by the highest ranked application to be output to a user.
    Type: Grant
    Filed: September 22, 2017
    Date of Patent: March 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
  • Patent number: 10600406
    Abstract: Methods and systems for determining an intent of an utterance using contextual information associated with a requesting device are described herein. Voice activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Entity data representing the content rendered by the display screen may describe entities having similar attributes as an identified intent from natural language understanding processing. Natural language understanding processing may attempt to resolve one or more declared slots for a particular intent and may generate an initial list of intent hypotheses ranked to indicate which are most likely to correspond to the utterance. The entity data may be compared with the declared slots for the intent hypotheses, and the list of intent hypothesis may be re-ranked to account for matching slots from the contextual metadata. The top ranked intent hypothesis after re-ranking may then be selected as the utterance's intent.
    Type: Grant
    Filed: March 20, 2017
    Date of Patent: March 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Alexandra R. Shapiro, Melanie Chie Bomke Gens, Spyridon Matsoukas, Kellen Gillespie, Rahul Goel
  • Patent number: 10522134
    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: December 31, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
  • Patent number: 10504512
    Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.
    Type: Grant
    Filed: September 22, 2017
    Date of Patent: December 10, 2019
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Ruhi Sarikaya, Rohit Prasad, Kerry Hammil, Spyridon Matsoukas, Nikko Strom, Frédéric Johan Georges Deramat, Stephen Frederick Potter, Young-Bum Kim
  • Patent number: 10490195
    Abstract: Systems, methods, and devices related to establishing voice identity profiles for use with voice-controlled devices are provided. The embodiments disclosed enhance user experience by customizing the enrollment process to utilize voice recognition for each user based on historical information which can be used in the selection process of phrases a user speaks during enrollment of a voice recognition function or skill. The selection process can utilize phrases that have already been spoken to the electronic device; it can utilize phrases, contacts, or other personalized information it can obtain from the user account of the person enrolling; it can use any of the information just described to select specific words to enhance the probably of achieving higher phonetic matches based on words the individual user is more likely to speak to the device.
    Type: Grant
    Filed: September 26, 2017
    Date of Patent: November 26, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Vishwanathan Krishnamoorthy, Sundararajan Srinivasan, Spyridon Matsoukas, Aparna Khare, Arindam Mandal, Krishna Subramanian, Gregory Michael Hart
  • Patent number: 10304440
    Abstract: An approach to keyword spotting makes use of acoustic parameters that are trained on a keyword spotting task as well as on a second speech recognition task, for example, a large vocabulary continuous speech recognition task. The parameters may be optimized according to a weighted measure that weighs the keyword spotting task more highly than the other task, and that weighs utterances of a keyword more highly than utterances of other speech. In some applications, a keyword spotter configured with the acoustic parameters is used for trigger or wake word detection.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: May 28, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Sankaran Panchapagesan, Bjorn Hoffmeister, Arindam Mandal, Aparna Khare, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Ming Sun
  • Patent number: 10121494
    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
    Type: Grant
    Filed: March 30, 2017
    Date of Patent: November 6, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
  • Patent number: 10032463
    Abstract: An automatic speech recognition (“ASR”) system produces, for particular users, customized speech recognition results by using data regarding prior interactions of the users with the system. A portion of the ASR system (e.g., a neural-network-based language model) can be trained to produce an encoded representation of a user's interactions with the system based on, e.g., transcriptions of prior utterances made by the user. This user-specific encoded representation of interaction history is then used by the language model to customize ASR processing for the user.
    Type: Grant
    Filed: December 29, 2015
    Date of Patent: July 24, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Ariya Rastrow, Nikko Ström, Spyridon Matsoukas, Markus Dreyer, Ankur Gandhe, Denis Sergeyevich Filimonov, Julian Chan, Rohit Prasad
  • Patent number: 9892726
    Abstract: Features are disclosed for modifying a statistical model to more accurately discriminate between classes of input data. A subspace of the total model parameter space can be learned such that individual points in the subspace, corresponding to the various classes, are discriminative with respect to the classes. The subspace can be learned using an iterative process whereby an initial subspace is used to generate data and maximize an objective function. The objective function can correspond to maximizing the posterior probability of the correct class for a given input. The initial subspace, data, and objective function can be used to generate a new subspace that better discriminates between classes. The process may be repeated as desired. A model modified using such a subspace can be used to classify input data.
    Type: Grant
    Filed: December 17, 2014
    Date of Patent: February 13, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Spyridon Matsoukas, Ariya Rastrow, Bjorn Hoffmeister