Patents by Inventor Shiva Sundaram

Shiva Sundaram has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11574628
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: February 7, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
  • Patent number: 11495215
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that includes a frequency aligned network (FAN) architecture. Thus, the first model may perform spatial filtering to generate a first feature vector by processing individual frequency bins separately, such that multiple frequency bins are not combined. The first feature vector may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
    Type: Grant
    Filed: December 11, 2019
    Date of Patent: November 8, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Minhua Wu, Shiva Sundaram, Tae Jin Park, Kenichi Kumatani
  • Patent number: 11475881
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Grant
    Filed: July 17, 2020
    Date of Patent: October 18, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
  • Publication number: 20200349928
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Application
    Filed: July 17, 2020
    Publication date: November 5, 2020
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
  • Patent number: 10726830
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: July 28, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
  • Patent number: 9837099
    Abstract: Embodiments of systems and methods are described for determining which of a plurality of beamformed audio signals to select for signal processing. In some embodiments, a plurality of audio input signals are received from a microphone array comprising a plurality of microphones. A plurality of beamformed audio signals are determined based on the plurality of input audio signals, the beamformed audio signals comprising a direction. A plurality of signal features may be determined for each beamformed audio signal. Smoothed features may be determined for each beamformed audio signal based on at least a portion of the plurality of signal features. The beamformed audio signal corresponding to the maximum smoothed feature may be selected for further processing.
    Type: Grant
    Filed: August 29, 2016
    Date of Patent: December 5, 2017
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Sundaram, Amit Singh Chhetri, Ramya Gopalan, Philip Ryan Hilmes
  • Patent number: 9734822
    Abstract: Features are disclosed for improving the accuracy and stability of beamformed signal selection. The selection may consider processing feedback information to identify when the current beam selection may need to be re-evaluated. The feedback information may further be used to select a beamformed signal for processing. For example, beams which detect wake-words or yield high confidence speech recognition may be favored over beams which fail to detect or recognize at a lower confidence level.
    Type: Grant
    Filed: June 1, 2015
    Date of Patent: August 15, 2017
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Sundaram, Ramya Gopalan
  • Patent number: 9432769
    Abstract: Embodiments of systems and methods are described for determining which of a plurality of beamformed audio signals to select for signal processing. In some embodiments, a plurality of audio input signals are received from a microphone array comprising a plurality of microphones. A plurality of beamformed audio signals are determined based on the plurality of input audio signals, the beamformed audio signals comprising a direction. A plurality of signal features may be determined for each beamformed audio signal. Smoothed features may be determined for each beamformed audio signal based on at least a portion of the plurality of signal features. The beamformed audio signal corresponding to the maximum smoothed feature may be selected for further processing.
    Type: Grant
    Filed: July 30, 2014
    Date of Patent: August 30, 2016
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Sundaram, Amit Singh Chhetri, Ramya Gopalan, Philip Ryan Hilmes
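
Illustrative sketch (not taken from the patents): the beam selection abstracts above (patents 9837099 and 9432769) describe computing signal features for each beamformed audio signal, smoothing them, and selecting the beam with the maximum smoothed feature. The abstracts do not say which feature or smoothing method is used, so the Python sketch below makes its own assumptions: mean frame energy as the signal feature and an exponential moving average as the smoother. The names frame_energy, smooth, select_beam, and the parameter alpha are hypothetical.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed signal feature)."""
    return sum(x * x for x in frame) / len(frame)


def smooth(features: Sequence[float], alpha: float = 0.3) -> float:
    """Exponential moving average over per-frame features (assumed smoother).

    Returns the smoothed value after the last frame. Assumes a non-empty input.
    """
    smoothed = features[0]
    for value in features[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


def select_beam(beams: List[List[Sequence[float]]], alpha: float = 0.3) -> int:
    """Return the index of the beam with the largest smoothed feature.

    `beams[i]` is the list of frames for beamformed signal i.
    """
    scores = []
    for frames in beams:
        features = [frame_energy(frame) for frame in frames]
        scores.append(smooth(features, alpha))
    return max(range(len(scores)), key=scores.__getitem__)


# Toy input: three beams, three frames each, four samples per frame.
quiet = [[0.1, -0.1, 0.1, -0.1]] * 3
steady = [[0.5, -0.5, 0.5, -0.5]] * 3
transient = [
    [0.1, -0.1, 0.1, -0.1],
    [1.0, -1.0, 1.0, -1.0],  # a single loud frame
    [0.1, -0.1, 0.1, -0.1],
]
beams = [quiet, steady, transient]
chosen = select_beam(beams)
```

With this toy input the steady beam (index 1) is chosen: the single loud frame in the third beam is damped by the smoothing, which is one plausible reason to select on smoothed rather than raw features.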