Patents by Inventor Arindam Mandal

Arindam Mandal has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11935525
    Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, to select an acoustic model with which to perform speech recognition, and/or to improve trigger sound detection.
    Type: Grant
    Filed: June 8, 2020
    Date of Patent: March 19, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Kumar Sundaram, Minhua Wu, Anirudh Raju, Spyridon Matsoukas, Arindam Mandal, Kenichi Kumatani
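A minimal sketch of the idea in the abstract above: a microphone-array configuration vector is appended to each frame of acoustic features before the combined vector is passed to an acoustic model. The feature dimensions and the encoding of the configuration vector are illustrative assumptions, not details from the patent.

```python
import numpy as np

def add_mic_config(features: np.ndarray, mic_config: np.ndarray) -> np.ndarray:
    """Append a microphone-array configuration vector to every acoustic
    feature frame before acoustic-model inference."""
    config_tiled = np.tile(mic_config, (features.shape[0], 1))
    return np.concatenate([features, config_tiled], axis=1)

# 100 frames of 40-dimensional log-mel features (illustrative values)
frames = np.random.randn(100, 40)
# Hypothetical configuration encoding: [number of mics, spacing in meters, circular array?]
mic_config = np.array([7.0, 0.035, 1.0])
model_input = add_mic_config(frames, mic_config)
print(model_input.shape)  # (100, 43) -- fed to the acoustic model alongside the audio features
```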
  • Patent number: 11922095
    Abstract: A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.
    Type: Grant
    Filed: January 22, 2018
    Date of Patent: March 5, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: James David Meyers, Shah Samir Pravinchandra, Yue Liu, Arlen Dean, Daniel Miller, Arindam Mandal
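As a rough illustration of the arbitration step described above, the sketch below picks the device presumed nearest the user from per-device metadata. Using signal-to-noise ratio and wakeword confidence as the proximity proxy is an assumption for illustration; the patent only says metadata directly or indirectly indicating proximity may be used.

```python
from dataclasses import dataclass

@dataclass
class DeviceReport:
    device_id: str
    signal_to_noise_db: float    # proxy for how close the user is (assumption)
    wakeword_confidence: float

def arbitrate(reports: list[DeviceReport]) -> str:
    """Select the single device that should respond to the utterance."""
    best = max(reports, key=lambda r: (r.signal_to_noise_db, r.wakeword_confidence))
    return best.device_id

reports = [
    DeviceReport("kitchen", signal_to_noise_db=18.2, wakeword_confidence=0.91),
    DeviceReport("living_room", signal_to_noise_db=9.7, wakeword_confidence=0.84),
]
print(arbitrate(reports))  # -> "kitchen" responds; the other device stays silent
```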
  • Patent number: 11908468
    Abstract: A system is described that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system, which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve the anaphora to match that entry and can perform additional processing based on the referred-to item.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: February 20, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
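A toy sketch of the offset-time lookup the abstract describes: given how long after playback began the user interrupted, find the entry that was most recently read out. The per-entry durations stand in for real TTS timing data; all names and values below are illustrative assumptions.

```python
def entry_at_offset(entries, per_entry_seconds, offset_seconds):
    """Return the list entry being read out at the moment the user interrupted
    (e.g., by saying "that one") offset_seconds after playback began."""
    elapsed = 0.0
    current = entries[0]
    for entry, duration in zip(entries, per_entry_seconds):
        if elapsed > offset_seconds:
            break
        current = entry
        elapsed += duration
    return current

entries = ["Thai Palace", "Pasta House", "Burger Barn"]
durations = [2.1, 1.8, 2.0]   # seconds of synthesized speech per entry (illustrative)
print(entry_at_offset(entries, durations, offset_seconds=3.2))  # -> "Pasta House"
```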
  • Patent number: 11908463
    Abstract: Techniques for storing and using multi-session context are described. A system may store context data corresponding to a first interaction, where the context data may include action data, entity data, and a profile identifier for a user. Later, the stored context data may be retrieved during a second interaction corresponding to the same entity. The second interaction may take place at a system different from the first interaction. The system may generate a response during the second interaction using the stored context data of the prior interaction.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: February 20, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Arjit Biswas, Shishir Bharathi, Anushree Venkatesh, Yun Lei, Ashish Kumar Agrawal, Siddhartha Reddy Jonnalagadda, Prakash Krishnan, Arindam Mandal, Raefer Christopher Gabriel, Abhay Kumar Jha, David Chi-Wai Tang, Savas Parastatidis
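A minimal sketch, under assumptions about the storage layout, of a multi-session context store keyed by profile identifier: context saved during one interaction can be recalled later when a second interaction involves the same entity. Field names are illustrative.

```python
from collections import defaultdict

class ContextStore:
    """Toy multi-session context store keyed by a user's profile identifier."""

    def __init__(self):
        self._by_profile = defaultdict(list)

    def save(self, profile_id, action, entity):
        # First interaction: remember what was done and which entity it involved.
        self._by_profile[profile_id].append({"action": action, "entity": entity})

    def recall(self, profile_id, entity):
        # Second interaction (possibly on a different system): fetch prior
        # context entries that mention the same entity.
        return [c for c in self._by_profile[profile_id] if c["entity"] == entity]

store = ContextStore()
store.save("profile-123", action="book_flight", entity="Seattle")
print(store.recall("profile-123", entity="Seattle"))
```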
  • Patent number: 11893999
    Abstract: Techniques for enrolling a user in a system's user recognition functionality without requiring the user to speak particular speech are described. The system may determine characteristics unique to a user input. The system may generate an implicit voice profile from user inputs having similar characteristics. After an implicit voice profile is generated, the system may receive a user input having speech characteristics similar to those of the implicit voice profile. The system may ask the user if the user wants the system to associate the implicit voice profile with a particular user identifier. If the user responds affirmatively, the system may request an identifier of a user profile (e.g., a user name). In response to receiving the user's name, the system may identify a user profile associated with the name and associate the implicit voice profile with the user profile, thereby converting the implicit voice profile into an explicit voice profile.
    Type: Grant
    Filed: August 6, 2018
    Date of Patent: February 6, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Sai Sailesh Kopuri, John Moore, Sundararajan Srinivasan, Aparna Khare, Arindam Mandal, Spyridon Matsoukas, Rohit Prasad
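The sketch below illustrates, under simplifying assumptions, how utterances with similar speaker embeddings could accumulate into an implicit voice profile until the system has enough evidence to ask the user whether to attach a user profile. The cosine-similarity threshold, embedding size, and minimum utterance count are illustrative, not from the patent.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class ImplicitProfile:
    """Accumulates speaker embeddings with similar characteristics; once enough
    similar utterances are seen, the system could prompt the user to attach a
    user profile, turning the implicit profile into an explicit one."""

    def __init__(self, threshold=0.8, min_utterances=5):
        self.embeddings = []
        self.threshold = threshold
        self.min_utterances = min_utterances
        self.user_profile = None   # set once the user confirms and gives a name

    def matches(self, embedding):
        if not self.embeddings:
            return True
        centroid = np.mean(self.embeddings, axis=0)
        return cosine(centroid, embedding) >= self.threshold

    def add(self, embedding):
        self.embeddings.append(embedding)

    def ready_to_prompt(self):
        return self.user_profile is None and len(self.embeddings) >= self.min_utterances

profile = ImplicitProfile()
for _ in range(5):
    embedding = np.ones(8)          # stand-in for a real speaker embedding
    if profile.matches(embedding):
        profile.add(embedding)
print(profile.ready_to_prompt())    # True -> ask the user whether to attach a name
```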
  • Publication number: 20230410833
    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present near the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect presence. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to the speech-capture device.
    Type: Application
    Filed: April 6, 2023
    Publication date: December 21, 2023
    Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
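A small sketch of the per-frame scoring and smoothing flow in the abstract: frame scores are smoothed against neighboring frames, a presence decision is made, and the result is packaged as a periodic "heartbeat". The moving-average kernel, threshold, and message format are assumptions for illustration.

```python
import numpy as np

def smooth_scores(frame_scores, window=5):
    """Smooth per-frame presence scores relative to nearby frames
    (simple moving average as an illustrative choice)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(frame_scores, dtype=float), kernel, mode="same")

def presence_decision(frame_scores, threshold=0.5):
    return bool(np.any(smooth_scores(frame_scores) > threshold))

# Illustrative presence-classifier scores over one heartbeat interval
scores = [0.1, 0.2, 0.7, 0.8, 0.9, 0.3, 0.1]
heartbeat = {"device_id": "speech-capture-1", "user_present": presence_decision(scores)}
print(heartbeat)   # sent to the remote device on a periodic schedule
```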
  • Patent number: 11804225
    Abstract: Techniques for conversation recovery in a dialog management system are described. A system may determine, using dialog models, that a predicted action to be performed by a skill component is likely to result in an undesired response or that the skill component is unable to respond to a user input of a dialog session. Rather than informing the user that the skill component is unable to respond, the system may send data to the skill component to enable the skill component to determine a correct action responsive to the user input. The data may include an indication of the predicted action and/or entity data corresponding to the user input. The system may receive, from the skill component, response data corresponding to the user input, and may use the response data to update a dialog context for the dialog session and an inference engine of the dialog management system.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: October 31, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ashish Kumar Agrawal, Kemal Oral Cansizlar, Suranjit Adhikari, Shucheng Zhu, Raefer Christopher Gabriel, Arindam Mandal
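A minimal sketch of the recovery hand-off the abstract describes: when the predicted action looks likely to produce an undesired response, the system passes the predicted action and entity data to the skill component and lets it return the response, which then updates the dialog context. The message shapes, confidence test, and placeholder skill are illustrative assumptions.

```python
def handle_turn(predicted_action, confidence, entities, skill, dialog_context,
                low_confidence=0.4):
    """If the predicted action is risky, hand off to the skill; otherwise run it directly."""
    if confidence < low_confidence:
        # Conversation recovery: give the skill what the system knows instead of giving up.
        response = skill({"predicted_action": predicted_action, "entities": entities})
    else:
        response = {"action": predicted_action, "entities": entities}
    dialog_context.append(response)   # keep the dialog context in sync with the response
    return response

# Placeholder skill component (assumption): it chooses a corrected action from the entity data.
skill = lambda payload: {"action": "SearchRecipes", "entities": payload["entities"]}

context = []
print(handle_turn("Unsupported", confidence=0.2,
                  entities={"dish": "ramen"}, skill=skill, dialog_context=context))
```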
  • Patent number: 11749282
    Abstract: A dialog system receives a user request corresponding to a dialog with a user. The dialog system processes the user request to determine multiple service providers capable of responding to the user request. The dialog system selects one service provider based on a request-to-handle score, and selects another service provider based on a satisfaction rating. The dialog system updates the dialog state based on further input provided by the user to determine an output responsive to the user request.
    Type: Grant
    Filed: May 5, 2020
    Date of Patent: September 5, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Devesh Mohan Pandey, Kjel Larsen, Prakash Krishnan, Raefer Christopher Gabriel
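A compact sketch of the two selection criteria named in the abstract: one provider chosen by a request-to-handle score and another by satisfaction rating. The field names and values are illustrative assumptions.

```python
def pick_providers(candidates):
    """From providers capable of handling the request, select one by
    request-to-handle score and another by satisfaction rating."""
    by_request_to_handle = max(candidates, key=lambda p: p["request_to_handle"])
    by_satisfaction = max(candidates, key=lambda p: p["satisfaction"])
    return by_request_to_handle["name"], by_satisfaction["name"]

candidates = [
    {"name": "RideServiceA", "request_to_handle": 0.92, "satisfaction": 4.1},
    {"name": "RideServiceB", "request_to_handle": 0.75, "satisfaction": 4.7},
]
print(pick_providers(candidates))  # -> ('RideServiceA', 'RideServiceB')
```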
  • Patent number: 11657832
    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present near the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect presence. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to the speech-capture device.
    Type: Grant
    Filed: September 16, 2020
    Date of Patent: May 23, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
  • Patent number: 11475881
    Abstract: Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-channel DNN) that takes in raw signals and produces a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. These three models may be jointly optimized for speech processing (as opposed to individually optimized for signal enhancement), enabling improved performance despite a reduction in microphones and a reduction in bandwidth consumption during real-time processing.
    Type: Grant
    Filed: July 17, 2020
    Date of Patent: October 18, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Kenichi Kumatani, Nikko Strom, Minhua Wu, Shiva Sundaram, Bjorn Hoffmeister, Jeremie Lecomte
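The sketch below mirrors the three-stage structure the abstract describes (multi-channel front-end, feature extraction, acoustic-unit classification) as one jointly trainable model. Layer types, sizes, and the two-channel framing are assumptions; the patent's models are DNNs whose exact architectures are not reproduced here.

```python
import torch
import torch.nn as nn

class MultiChannelFrontEnd(nn.Module):
    """Three jointly trainable stages echoing the abstract: a multi-channel stage
    standing in for beamforming, a feature-extraction stage that lowers the
    dimensionality, and a classification stage over acoustic units."""

    def __init__(self, channels=2, samples=512, feat_dim=64, num_units=40):
        super().__init__()
        self.multi_channel = nn.Linear(channels * samples, 256)   # multi-channel DNN stand-in
        self.feature_extract = nn.Linear(256, feat_dim)           # feature-extraction DNN stand-in
        self.classifier = nn.Linear(feat_dim, num_units)          # classification DNN stand-in

    def forward(self, raw_frames):                                # (batch, channels * samples)
        x = torch.relu(self.multi_channel(raw_frames))
        x = torch.relu(self.feature_extract(x))
        return self.classifier(x)                                 # acoustic unit scores

model = MultiChannelFrontEnd()
logits = model(torch.randn(8, 2 * 512))
print(logits.shape)   # torch.Size([8, 40]); all three stages can be optimized jointly
```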
  • Patent number: 11393454
    Abstract: A dialog generator receives data corresponding to desired dialog, such as application programming interface (API) information and sample dialog. A first model corresponding to an agent simulator and a second model corresponding to a user simulator take turns creating a plurality of dialog outlines of the desired dialog. The dialog generator may determine that one or more additional APIs are relevant to the dialog and may create further dialog outlines related thereto. The dialog outlines are converted to natural dialog to generate the dialog.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: July 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Anish Acharya, Angeliki Metallinou, Tagyoung Chung, Shachi Paul, Shubhra Chandra, Chien-wei Lin, Dilek Hakkani-Tur, Arindam Mandal
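The sketch below shows the turn-taking loop from the abstract in miniature: an agent simulator and a user simulator alternate to produce a dialog outline, which would then be converted to natural dialog. Both simulators are trivial placeholders here; in the patent they are models seeded with API information and sample dialog.

```python
def generate_outline(agent_sim, user_sim, num_turns=2):
    """Alternate between a user simulator and an agent simulator to build a dialog outline."""
    outline = []
    for _ in range(num_turns):
        outline.append(("user", user_sim(outline)))
        outline.append(("agent", agent_sim(outline)))
    return outline

# Placeholder simulators (assumptions): each looks at the outline so far and emits the next act.
user_sim = lambda outline: {"act": "inform", "slot": "city", "value": "Seattle"}
agent_sim = lambda outline: {"act": "call_api", "api": "FindRestaurants"}

for turn in generate_outline(agent_sim, user_sim):
    print(turn)   # outline turns, later rendered into natural-language dialog
```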
  • Publication number: 20220189458
    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
    Type: Application
    Filed: January 26, 2022
    Publication date: June 16, 2022
    Inventors: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
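A minimal sketch of combining user-verification confidence with ASR confidence (and, optionally, device-context information) into an adjusted score. The weighted blend and its weights are illustrative assumptions; the abstract only states that ASR confidence and device location/type may modify the verification confidence.

```python
def adjusted_verification_confidence(verification_conf, asr_conf, device_context_weight=1.0):
    """Blend user-verification confidence with ASR confidence, then scale by an
    optional device-context factor (location/type of the speech-controlled device)."""
    blended = 0.7 * verification_conf + 0.3 * asr_conf
    return min(1.0, blended * device_context_weight)

print(adjusted_verification_confidence(verification_conf=0.82, asr_conf=0.95))  # ~0.86
```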
  • Publication number: 20220093101
    Abstract: A system is described that is capable of resolving anaphora using timing data received by a local device. A local device outputs audio representing a list of entries. The audio may represent synthesized speech of the list of entries. A user can interrupt the device to select an entry in the list, such as by saying “that one.” The local device can determine an offset time representing the time between when audio playback began and when the user interrupted. The local device sends the offset time and audio data representing the utterance to a speech processing system, which can then use the offset time and stored data to identify which entry on the list was most recently output by the local device when the user interrupted. The system can then resolve the anaphora to match that entry and can perform additional processing based on the referred-to item.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 24, 2022
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Ying Shi, David Chi-Wai Tang, Nishtha Gupta, Aaron Challenner, Bonan Zheng, Angeliki Metallinou, Vincent Auvray, Minmin Shen
  • Publication number: 20220093093
    Abstract: A system can operate a speech-controlled device in a mode where the speech-controlled device determines that an utterance is directed at the speech-controlled device using image data showing the user speaking the utterance. If the user is directing the user's gaze at the speech-controlled device while speaking, the system may determine the utterance is system directed and thus may perform further speech processing based on the utterance. If the user's gaze is directed elsewhere, the system may determine the utterance is not system directed (for example directed at another user) and thus the system may not perform further speech processing based on the utterance and may take other actions, for example discarding audio data of the utterance.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 24, 2022
    Inventors: Prakash Krishnan, Arindam Mandal, Nikko Strom, Pradeep Natarajan, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, David Chi-Wai Tang, Aaron Challenner, Xu Zhang, Krishna Anisetty, Josey Diego Sandoval, Rohit Prasad, Premkumar Natarajan
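A toy sketch of the gaze-based check described above: an utterance is treated as device-directed only if the speaker's estimated gaze is within some tolerance of the device while speaking. The angle threshold and the idea of reducing gaze to a single offset angle are simplifying assumptions; a real system would derive this from image data.

```python
def is_device_directed(gaze_offset_deg, speaking, max_offset_deg=15.0):
    """Decide whether further speech processing should run for this utterance."""
    return speaking and abs(gaze_offset_deg) <= max_offset_deg

# Gaze roughly at the device while speaking -> process the utterance
print(is_device_directed(gaze_offset_deg=8.0, speaking=True))    # True
# Looking away (e.g., at another person) -> discard the audio data instead
print(is_device_directed(gaze_offset_deg=62.0, speaking=True))   # False
```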
  • Publication number: 20220093094
    Abstract: A natural language system may be configured to act as a participant in a conversation between two users. The system may determine when a user expression such as speech, a gesture, or the like is directed from one user to the other. The system may process input data related to the expression (such as audio data, input data, language processing result data, conversation context data, etc.) to determine if the system should interject a response to the user-to-user expression. If so, the system may process the input data to determine a response and output it. The system may track that response as part of the data related to the ongoing conversation.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 24, 2022
    Inventors: Prakash Krishnan, Arindam Mandal, Siddhartha Reddy Jonnalagadda, Nikko Strom, Ariya Rastrow, Shiv Naga Prasad Vitaladevuni, Angeliki Metallinou, Vincent Auvray, Minmin Shen, Josey Diego Sandoval, Rohit Prasad, Thomas Taylor, Amotz Maimon
  • Patent number: 11270685
    Abstract: Systems, methods, and devices for verifying a user are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server. The server performs ASR on the audio data to determine ASR confidence data. The server, in parallel, performs user verification on the audio data to determine user verification confidence data. The server may modify the user verification confidence data using the ASR confidence data. In addition or alternatively, the server may modify the user verification confidence data using at least one of a location of the speech-controlled device within a building, a type of the speech-controlled device, or a geographic location of the speech-controlled device.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: March 8, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Spyridon Matsoukas, Aparna Khare, Vishwanathan Krishnamoorthy, Shamitha Somashekar, Arindam Mandal
  • Patent number: 11200885
    Abstract: A dialog manager receives text data corresponding to a dialog with a user. Entities represented in the text data are identified. Context data relating to the dialog is maintained, which may include prior dialog, prior API calls, user profile information, or other data. Using the text data and the context data, an N-best list of one or more dialog models is selected to process the text data. After processing the text data, the outputs of the N-best models are ranked and a top-scoring output is selected. The top-scoring output may be an API call and/or an audio prompt.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: December 14, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Arindam Mandal, Nikko Strom, Angeliki Metallinou, Tagyoung Chung, Dilek Hakkani-Tur, Suranjit Adhikari, Sridhar Yadav Manoharan, Ankita De, Qing Liu, Raefer Christopher Gabriel, Rohit Prasad
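A toy version of the N-best flow in the abstract: candidate dialog models are scored for the current turn and context, the top N are run, their outputs are ranked, and the top-scoring output (an API call or a prompt) is returned. The dictionary-of-lambdas models and the scoring heuristics are stand-ins for the patent's learned components.

```python
def respond(text, context, models, top_n=2):
    """Select an N-best list of dialog models, run them, rank their outputs,
    and return the top-scoring output for the turn."""
    scored_models = sorted(models, key=lambda m: m["fit"](text, context), reverse=True)
    n_best = scored_models[:top_n]
    outputs = [(m["run"](text, context), m["fit"](text, context)) for m in n_best]
    outputs.sort(key=lambda pair: pair[1], reverse=True)
    return outputs[0][0]   # top-scoring API call or audio prompt

models = [
    {"name": "weather",
     "fit": lambda t, c: 0.9 if "weather" in t else 0.1,
     "run": lambda t, c: {"api_call": "GetWeather", "city": c.get("city", "unknown")}},
    {"name": "smalltalk",
     "fit": lambda t, c: 0.4,
     "run": lambda t, c: {"prompt": "Tell me more."}},
]
print(respond("what's the weather like", {"city": "Seattle"}, models))
```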
  • Patent number: 11200884
    Abstract: Techniques for labeling user inputs for updating user recognition voice profiles are described. A system may leverage various signals, generated during or after processing of a user input, to retroactively determine which user spoke the user input. For example, after the system receives the user input, the user may provide the system with non-spoken user verification information. Based on such user verification information, the system may label the previously spoken user input as originating from the particular user. The system may also or alternatively use system usage history to retroactively label user inputs.
    Type: Grant
    Filed: November 6, 2018
    Date of Patent: December 14, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
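A small sketch of the retroactive-labeling idea: an unlabeled spoken input is stored, and when later non-spoken verification (or usage history) identifies the speaker, the earlier input is labeled with that user so it can update the voice profile. The data structures and the simplification of attributing all pending inputs to the verified user are illustrative assumptions.

```python
pending_inputs = []        # spoken inputs whose speaker is not yet known
labeled_inputs = []        # inputs retroactively attributed to a user

def record_utterance(audio_id):
    pending_inputs.append(audio_id)

def on_user_verified(user_id):
    """Called when a later signal (e.g., non-spoken verification) identifies the user.
    Simplification: attribute all currently pending inputs to this user."""
    while pending_inputs:
        labeled_inputs.append({"audio_id": pending_inputs.pop(0), "user_id": user_id})

record_utterance("utt-001")     # spoken input, speaker unknown at the time
on_user_verified("user-42")     # later verification -> label the earlier input
print(labeled_inputs)           # now usable for updating user-42's voice profile
```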
  • Publication number: 20210312914
    Abstract: Described herein is a system for rescoring automatic speech recognition hypotheses for conversational devices that have multi-turn dialogs with a user. The system leverages dialog context by incorporating data related to past user utterances and data related to the system-generated response corresponding to the past user utterance. Incorporation of this data improves recognition of a particular user utterance within the dialog.
    Type: Application
    Filed: June 7, 2021
    Publication date: October 7, 2021
    Inventors: Behnam Hedayatnia, Anirudh Raju, Ankur Gandhe, Chandra Prakash Khatri, Ariya Rastrow, Anushree Venkatesh, Arindam Mandal, Raefer Christopher Gabriel, Ahmad Shikib Mehri
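A deliberately simple sketch of dialog-context rescoring: ASR hypotheses get a bonus for overlapping words from recent turns (previous user utterances and the system's responses), which can flip the ranking toward the contextually plausible hypothesis. The word-overlap bonus is a toy stand-in for the learned rescoring model the publication describes.

```python
def rescore_hypotheses(hypotheses, dialog_context, context_bonus=0.5):
    """Re-rank (text, score) ASR hypotheses using words seen in recent dialog turns."""
    context_words = set()
    for turn in dialog_context:
        context_words.update(turn.lower().split())

    rescored = []
    for text, score in hypotheses:
        overlap = len(set(text.lower().split()) & context_words)
        rescored.append((text, score + context_bonus * overlap))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

context = ["play the previous track again", "Playing the previous track"]
hypotheses = [("play the next crack", -4.2), ("play the next track", -4.4)]
print(rescore_hypotheses(hypotheses, context)[0][0])  # -> "play the next track"
```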
  • Publication number: 20210304774
    Abstract: Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
    Type: Application
    Filed: April 13, 2021
    Publication date: September 30, 2021
    Inventors: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
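A rough sketch of the profile-update step described above: stored utterance embeddings are reduced to a cluster centroid and, if that cluster is sufficiently similar to an existing voice profile, the two are blended into an updated profile. The cosine-similarity threshold, the 50/50 blend, and representing a voice profile as a single vector are all illustrative assumptions.

```python
import numpy as np

def update_profile(profile_centroid, stored_embeddings, similarity_threshold=0.85):
    """Blend a cluster of stored utterance embeddings into an existing voice
    profile when the cluster matches that profile closely enough."""
    cluster_centroid = np.mean(stored_embeddings, axis=0)
    sim = float(np.dot(profile_centroid, cluster_centroid) /
                (np.linalg.norm(profile_centroid) * np.linalg.norm(cluster_centroid)))
    if sim >= similarity_threshold:
        return 0.5 * profile_centroid + 0.5 * cluster_centroid   # updated voice profile
    return profile_centroid                                       # cluster likely belongs to someone else

profile = np.array([1.0, 0.0, 0.2])                    # existing voice profile (illustrative)
recent = np.array([[0.9, 0.1, 0.25], [1.1, -0.05, 0.15]])   # embeddings recalled from storage
print(update_profile(profile, recent))
```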