Patents by Inventor Arindam Mandal

Arindam Mandal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10726830
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: July 28, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
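    The abstract above describes three jointly optimized DNNs. As a rough illustration only, the sketch below wires a hypothetical multi-channel front-end, feature extractor, and acoustic-unit classifier into a single joint training step in PyTorch; all layer sizes, dimensions, and the dummy data are invented and are not taken from the patent.

    ```python
    # Hypothetical sketch: three stacked models trained with one joint loss.
    import torch
    import torch.nn as nn

    class MultiChannelFrontEnd(nn.Module):
        """Maps raw multi-channel frames to a beamformer-like feature vector."""
        def __init__(self, num_channels=4, frame_size=400, feat_dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_channels * frame_size, 512), nn.ReLU(),
                nn.Linear(512, feat_dim),
            )
        def forward(self, x):                 # x: (batch, channels * frame_size)
            return self.net(x)

    class FeatureExtractor(nn.Module):
        """Reduces the front-end output to a lower-dimensional representation."""
        def __init__(self, feat_dim=256, low_dim=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, low_dim))
        def forward(self, x):
            return self.net(x)

    class AcousticClassifier(nn.Module):
        """Classifies acoustic units from the compact features."""
        def __init__(self, low_dim=64, num_units=100):
            super().__init__()
            self.net = nn.Linear(low_dim, num_units)
        def forward(self, x):
            return self.net(x)

    front_end, extractor, classifier = MultiChannelFrontEnd(), FeatureExtractor(), AcousticClassifier()
    params = list(front_end.parameters()) + list(extractor.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)

    # One joint optimization step on dummy data: the loss flows through all three
    # models at once, rather than tuning the front-end for signal enhancement alone.
    raw_audio = torch.randn(8, 4 * 400)          # 8 frames, 4 mics, 400 samples each
    targets = torch.randint(0, 100, (8,))
    logits = classifier(extractor(front_end(raw_audio)))
    loss = nn.functional.cross_entropy(logits, targets)
    loss.backward()
    optimizer.step()
    ```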
  • Publication number: 20200193967
    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. Additionally or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
    Type: Application
    Filed: December 23, 2019
    Publication date: June 18, 2020
    Inventors: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
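    As a loose illustration of the confidence-combination idea in the abstract above, the sketch below adjusts a hypothetical user-verification score using an ASR confidence score and an optional device factor; the weighting formula, score ranges, and values are assumptions, not the patented method.

    ```python
    # Hypothetical sketch of combining verification and ASR confidence scores.
    def adjust_verification_confidence(uv_confidence: float,
                                       asr_confidence: float,
                                       device_factor: float = 1.0) -> float:
        """Modify the user-verification confidence using ASR confidence and,
        optionally, a device/location factor, then clamp to [0, 1]."""
        # Low ASR confidence suggests noisy or ambiguous audio, so the
        # verification score is discounted; high ASR confidence leaves it intact.
        adjusted = uv_confidence * (0.5 + 0.5 * asr_confidence) * device_factor
        return max(0.0, min(1.0, adjusted))

    # Example: a borderline verification score supported by a confident ASR pass.
    print(adjust_verification_confidence(uv_confidence=0.72, asr_confidence=0.95))
    ```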
  • Patent number: 10679621
    Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.
    Type: Grant
    Filed: March 21, 2018
    Date of Patent: June 9, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani
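    The following sketch illustrates, under invented feature sizes and an invented array-geometry encoding, how a microphone-configuration vector might be concatenated with audio features before an acoustic model, as the abstract above describes; it is not the claimed implementation.

    ```python
    # Hypothetical sketch: microphone-array configuration as an extra input vector.
    import torch
    import torch.nn as nn

    class ConfigAwareAcousticModel(nn.Module):
        def __init__(self, audio_dim=40, config_dim=8, num_phonemes=50):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(audio_dim + config_dim, 128), nn.ReLU(),
                nn.Linear(128, num_phonemes),
            )
        def forward(self, audio_feats, mic_config):
            # The microphone configuration vector (e.g. number of mics, spacing,
            # geometry flags) is concatenated with each audio feature frame.
            return self.net(torch.cat([audio_feats, mic_config], dim=-1))

    model = ConfigAwareAcousticModel()
    audio_feats = torch.randn(16, 40)                    # 16 frames of log-mel features
    mic_config = torch.tensor([[6., 0.035, 1., 0., 0., 0., 0., 0.]]).repeat(16, 1)
    phoneme_logits = model(audio_feats, mic_config)      # (16, 50)
    ```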
  • Patent number: 10522134
    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. Additionally or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: December 31, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
  • Patent number: 10490195
    Abstract: Systems, methods, and devices related to establishing voice identity profiles for use with voice-controlled devices are provided. The disclosed embodiments enhance user experience by customizing the enrollment process for each user based on historical information, which can be used to select the phrases a user speaks while enrolling in a voice recognition function or skill. The selection process can use phrases that have already been spoken to the electronic device; it can use phrases, contacts, or other personalized information obtained from the user account of the person enrolling; and it can use any of this information to select specific words that raise the probability of achieving higher phonetic matches, based on words the individual user is more likely to speak to the device.
    Type: Grant
    Filed: September 26, 2017
    Date of Patent: November 26, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Vishwanathan Krishnamoorthy, Sundararajan Srinivasan, Spyridon Matsoukas, Aparna Khare, Arindam Mandal, Krishna Subramanian, Gregory Michael Hart
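    As a hypothetical illustration of selecting enrollment phrases from a user's history, the sketch below ranks candidate phrases by overlap with words the user has already spoken to the device; the scoring heuristic, function names, and example data are invented.

    ```python
    # Hypothetical sketch: prefer enrollment phrases built from familiar words.
    def select_enrollment_phrases(candidate_phrases, user_history_words, k=3):
        """Rank candidate phrases by overlap with words the user has spoken before."""
        def score(phrase):
            words = phrase.lower().split()
            return sum(1 for w in words if w in user_history_words) / len(words)
        return sorted(candidate_phrases, key=score, reverse=True)[:k]

    history = {"play", "music", "weather", "timer", "kitchen"}
    candidates = [
        "play my favorite music",
        "set a timer in the kitchen",
        "purchase a new gift card",
        "what is the weather today",
    ]
    print(select_enrollment_phrases(candidates, history, k=2))
    ```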
  • Patent number: 10304444
    Abstract: A system capable of performing natural language understanding (NLU) without the concept of a domain that influences NLU results. The present system uses hierarchical organizations of intents/commands and entity types, and trained models associated with those hierarchies, so that commands and entity types may be determined for incoming text queries without necessarily determining a domain for the incoming text. The system thus operates in a domain agnostic manner, in a departure from multi-domain architecture NLU processing where a system determines NLU results for multiple domains simultaneously and then ranks them to determine which to select as the result.
    Type: Grant
    Filed: June 29, 2016
    Date of Patent: May 28, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Lambert Mathias, Thomas Kollar, Arindam Mandal, Angeliki Metallinou
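    The sketch below gives a toy, domain-agnostic flavor of the idea in the abstract above: leaf intents from a small hierarchy are scored directly, without first choosing a domain. The hierarchy, the stand-in scorer, and the keywords are all invented.

    ```python
    # Hypothetical sketch: score leaf intents across the whole hierarchy at once.
    INTENT_HIERARCHY = {
        "PlayCommand": ["PlayMusic", "PlayVideo"],
        "GetInfoCommand": ["GetWeather", "GetTraffic"],
    }

    def dummy_scorer(text, intent):
        """Stand-in for a trained model scoring (text, intent) pairs."""
        keywords = {"PlayMusic": "song", "PlayVideo": "movie",
                    "GetWeather": "weather", "GetTraffic": "traffic"}
        return 1.0 if keywords[intent] in text.lower() else 0.1

    def classify(text):
        # Every leaf intent is scored directly, never committing to a domain first.
        leaves = [leaf for children in INTENT_HIERARCHY.values() for leaf in children]
        return max(leaves, key=lambda intent: dummy_scorer(text, intent))

    print(classify("play that song again"))   # -> PlayMusic
    ```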
  • Patent number: 10304440
    Abstract: An approach to keyword spotting makes use of acoustic parameters that are trained on a keyword spotting task as well as on a second speech recognition task, for example, a large vocabulary continuous speech recognition task. The parameters may be optimized according to a weighted measure that weighs the keyword spotting task more highly than the other task, and that weighs utterances of a keyword more highly than utterances of other speech. In some applications, a keyword spotter configured with the acoustic parameters is used for trigger or wake word detection.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: May 28, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Sankaran Panchapagesan, Bjorn Hoffmeister, Arindam Mandal, Aparna Khare, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Ming Sun
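    To make the weighted multi-task objective concrete, the sketch below combines a keyword-spotting loss and a large-vocabulary recognition loss, up-weighting the keyword task and keyword utterances; all weights and loss values are invented assumptions rather than the patented formulation.

    ```python
    # Hypothetical sketch of a weighted multi-task training objective.
    def combined_loss(kws_losses, lvcsr_loss, is_keyword_utterance,
                      task_weight=0.8, keyword_utterance_weight=2.0):
        """kws_losses: per-utterance keyword-spotting losses.
        is_keyword_utterance: parallel list of booleans."""
        weighted_kws = sum(
            (keyword_utterance_weight if is_kw else 1.0) * loss
            for loss, is_kw in zip(kws_losses, is_keyword_utterance)
        ) / len(kws_losses)
        # The keyword-spotting task dominates the objective, while the LVCSR
        # task still shapes the shared acoustic parameters.
        return task_weight * weighted_kws + (1.0 - task_weight) * lvcsr_loss

    print(combined_loss([0.4, 1.2, 0.3], lvcsr_loss=0.9,
                        is_keyword_utterance=[True, False, True]))
    ```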
  • Patent number: 10147442
    Abstract: A neural network acoustic model is trained to be robust and produce accurate output when used to process speech signals having acoustic interference. The neural network acoustic model can be trained using a source-separation process by which, in addition to producing the main acoustic model output for a given input, the neural network generates predictions of the separate speech and interference portions of the input. The parameters of the neural network can be adjusted to jointly optimize all three outputs (e.g., the main acoustic model output, the speech signal prediction, and the interference signal prediction), rather than only optimizing the main acoustic model output. Once trained, output layers for the speech and interference signal predictions can be removed from the neural network or otherwise disabled.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: December 4, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Sankaran Panchapagesan, Shiva Kumar Sundaram, Arindam Mandal
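    The following sketch shows, with invented layer sizes and dummy data, how a shared network with a main acoustic head plus auxiliary clean-speech and interference heads might be trained jointly and then used with only the main head at inference, in the spirit of the abstract above.

    ```python
    # Hypothetical sketch: joint training of main and auxiliary output heads.
    import torch
    import torch.nn as nn

    class RobustAcousticModel(nn.Module):
        def __init__(self, feat_dim=40, hidden=128, num_units=50):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
            self.acoustic_head = nn.Linear(hidden, num_units)    # main output
            self.speech_head = nn.Linear(hidden, feat_dim)       # auxiliary
            self.noise_head = nn.Linear(hidden, feat_dim)        # auxiliary
        def forward(self, x):
            h = self.shared(x)
            return self.acoustic_head(h), self.speech_head(h), self.noise_head(h)

    model = RobustAcousticModel()
    noisy, clean, noise = torch.randn(8, 40), torch.randn(8, 40), torch.randn(8, 40)
    labels = torch.randint(0, 50, (8,))

    logits, speech_pred, noise_pred = model(noisy)
    # Jointly optimize all three outputs rather than the acoustic output alone.
    loss = (nn.functional.cross_entropy(logits, labels)
            + nn.functional.mse_loss(speech_pred, clean)
            + nn.functional.mse_loss(noise_pred, noise))
    loss.backward()

    # After training, only the main acoustic head is used for inference;
    # the auxiliary heads can simply be ignored or removed from the module.
    with torch.no_grad():
        inference_logits, _, _ = model(noisy)
    ```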
  • Patent number: 10121494
    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present near the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect presence. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
    Type: Grant
    Filed: March 30, 2017
    Date of Patent: November 6, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
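    As a small illustration of per-frame scoring and smoothing, the sketch below applies a moving-average filter to hypothetical presence scores and thresholds the result; the window size, threshold, and scores are invented.

    ```python
    # Hypothetical sketch: smooth per-frame presence scores, then decide per frame.
    def smooth_scores(frame_scores, window=5):
        """Moving-average smoothing of per-frame presence scores."""
        half = window // 2
        smoothed = []
        for i in range(len(frame_scores)):
            lo, hi = max(0, i - half), min(len(frame_scores), i + half + 1)
            smoothed.append(sum(frame_scores[lo:hi]) / (hi - lo))
        return smoothed

    def presence_decisions(frame_scores, threshold=0.5):
        return [s >= threshold for s in smooth_scores(frame_scores)]

    # A single noisy spike does not trigger presence, but a sustained run does.
    scores = [0.1, 0.1, 0.9, 0.1, 0.1, 0.7, 0.8, 0.9, 0.8, 0.7]
    decisions = presence_decisions(scores)
    heartbeat = any(decisions)     # value reported to the remote device each interval
    print(decisions, heartbeat)
    ```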
  • Publication number: 20180210703
    Abstract: A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.
    Type: Application
    Filed: January 22, 2018
    Publication date: July 26, 2018
    Inventors: James David Meyers, Shah Samir Pravinchandra, Yue Liu, Arlen Dean, Daniel Miller, Arindam Mandal
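    The sketch below is a toy arbitration step in the spirit of the abstract above: among several device reports, the one whose metadata suggests the nearest user (here, an invented wakeword-energy field) is selected to respond. Field names and values are assumptions for illustration.

    ```python
    # Hypothetical sketch: pick the device whose metadata implies it is closest.
    def arbitrate(device_reports):
        """Select the device most likely to be nearest the speaker."""
        return max(device_reports, key=lambda r: r["wakeword_energy"])

    reports = [
        {"device_id": "kitchen", "wakeword_energy": 0.42},
        {"device_id": "living_room", "wakeword_energy": 0.81},
        {"device_id": "bedroom", "wakeword_energy": 0.12},
    ]
    winner = arbitrate(reports)
    print(winner["device_id"])   # only this device is chosen to respond
    ```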
  • Patent number: 9875081
    Abstract: A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.
    Type: Grant
    Filed: September 21, 2015
    Date of Patent: January 23, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: James David Meyers, Shah Samir Pravinchandra, Yue Liu, Arlen Dean, Daniel Miller, Arindam Mandal
  • Patent number: 9805718
    Abstract: A dialog assistant embodied in a computing system can present a clarification question based on a machine-readable version of human-generated conversational natural language input. Some versions of the dialog assistant identify a clarification target in the machine-readable version, determine a clarification type relating to the clarification target, present the clarification question in a conversational natural language manner, and process a human-generated conversational natural language response to the clarification question.
    Type: Grant
    Filed: April 19, 2013
    Date of Patent: October 31, 2017
    Assignee: SRI International
    Inventors: Necip Fazil Ayan, Arindam Mandal, Jing Zheng
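    As a loose sketch of the clarification flow in the abstract above, the code below picks a low-confidence token as the clarification target, chooses a clarification type, and fills a question template; the threshold, templates, and example data are invented.

    ```python
    # Hypothetical sketch: find a clarification target and phrase a question.
    def find_clarification_target(tokens, confidences, threshold=0.6):
        """Return the lowest-confidence token if it falls below the threshold."""
        worst = min(range(len(tokens)), key=lambda i: confidences[i])
        return tokens[worst] if confidences[worst] < threshold else None

    def clarification_question(target, clarification_type="word"):
        templates = {
            "word": "Sorry, did you say '{}'?",
            "entity": "Which {} did you mean?",
        }
        return templates[clarification_type].format(target)

    tokens = ["book", "a", "flight", "to", "Balton"]
    confidences = [0.95, 0.99, 0.97, 0.98, 0.41]
    target = find_clarification_target(tokens, confidences)
    if target:
        print(clarification_question(target))   # "Sorry, did you say 'Balton'?"
    ```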
  • Publication number: 20170278514
    Abstract: A system capable of performing natural language understanding (NLU) without the concept of a domain that influences NLU results. The present system uses hierarchical organizations of intents/commands and entity types, and trained models associated with those hierarchies, so that commands and entity types may be determined for incoming text queries without necessarily determining a domain for the incoming text. The system thus operates in a domain agnostic manner, in a departure from multi-domain architecture NLU processing where a system determines NLU results for multiple domains simultaneously and then ranks them to determine which to select as the result.
    Type: Application
    Filed: June 29, 2016
    Publication date: September 28, 2017
    Inventors: Lambert Mathias, Thomas Kollar, Arindam Mandal, Angeliki Metallinou
  • Publication number: 20170083285
    Abstract: A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.
    Type: Application
    Filed: September 21, 2015
    Publication date: March 23, 2017
    Inventors: James David Meyers, Shah Samir Pravinchandra, Yue Liu, Arlen Dean, Daniel Miller, Arindam Mandal
  • Publication number: 20140316764
    Abstract: A dialog assistant embodied in a computing system can present a clarification question based on a machine-readable version of human-generated conversational natural language input. Some versions of the dialog assistant identify a clarification target in the machine-readable version, determine a clarification type relating to the clarification target, present the clarification question in a conversational natural language manner, and process a human-generated conversational natural language response to the clarification question.
    Type: Application
    Filed: April 19, 2013
    Publication date: October 23, 2014
    Inventors: Necip Fazil Ayan, Arindam Mandal, Jing Zheng
  • Publication number: 20130304752
    Abstract: Embodiments of the invention provide systems and methods for recording and handling an absence of an individual according to a multi-tier employment model. According to one embodiment, the method can comprise receiving absence information indicating a period during which an individual will be unavailable. A list of assignments affected by the absence can be generated based on the received absence information, work schedule information, and work assignment information defined in a work assignment tier of the multi-tier employment model. The generated list of assignments can be filtered based on the received absence information and work schedule information for each assignment of the list of assignments. A list of supervisors corresponding to the assignments of the filtered list of assignments can be generated and a notification of the absence can be provided to each of the supervisors in the list of supervisors.
    Type: Application
    Filed: May 8, 2012
    Publication date: November 14, 2013
    Applicant: Oracle International Corporation
    Inventors: Surendra Nath Nukala, Arindam Mandal
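    The sketch below illustrates the workflow in the abstract above with an invented data model: find the assignments whose schedules overlap the absence period, then collect the supervisors to notify. The overlap test, field names, and example records are assumptions for illustration.

    ```python
    # Hypothetical sketch: from an absence period to a list of supervisors to notify.
    from datetime import date

    def overlaps(period_a, period_b):
        """True if two (start, end) date ranges intersect."""
        return period_a[0] <= period_b[1] and period_b[0] <= period_a[1]

    def affected_supervisors(assignments, absence_period):
        affected = [a for a in assignments if overlaps(a["schedule"], absence_period)]
        return sorted({a["supervisor"] for a in affected})

    assignments = [
        {"name": "Payroll project", "supervisor": "Kim",
         "schedule": (date(2012, 5, 1), date(2012, 5, 31))},
        {"name": "Audit support", "supervisor": "Lee",
         "schedule": (date(2012, 7, 1), date(2012, 7, 15))},
    ]
    absence = (date(2012, 5, 10), date(2012, 5, 14))
    for supervisor in affected_supervisors(assignments, absence):
        print(f"Notify {supervisor} of the absence")   # stand-in for a notification
    ```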