Patents by Inventor Arvindh Krishnaswamy

Arvindh Krishnaswamy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12340281
    Abstract: Systems, methods, and apparatuses for selecting a model are described. A method includes receiving a request to perform model selection and evaluating multiple models by generating various metrics for each trained model. These metrics include a forewarning time (how much in advance of a failure an alert can be raised), an event recall metric (how many failure events were alerted to in advance), an event precision metric (ratio of true and false positives), and an area under a receiver operating characteristic (ROC) curve. For each trained model, a weighted harmonic mean is calculated from these metrics. A model is then selected based on the calculated means and a report on the selected model is generated and provided.
    Type: Grant
    Filed: September 30, 2020
    Date of Patent: June 24, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Karim Helwani, Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Ritwik Giri, Mehmet Umut Isik, Aparna Pandey, Fangzhou Cheng
  • Patent number: 12284048
    Abstract: Implementations for compositing two input signals to form a higher quality signal are described. A first input signal is received from a first input device and a second input signal is received from a second input device. The first input signal and the second input signal are combined into a composite input signal. It is then determined that the composite input signal has a higher quality than either the first input signal or the second input signal individually. Based on that determination, the composite input signal is selected for use by the media conferencing service as part of a media conference.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: April 22, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Siddhartha Shankara Rao, Michael Klingbeil, Arvindh Krishnaswamy, John Joseph Dunne
  • Patent number: 12272371
    Abstract: Real-time audio enhancement for a target speaker may be performed. An embedding of a sample of speaker audio is created using a trained neural network that performs voice identification. The embedding is then concatenated with the input features of a trained machine learning model for audio enhancement. The audio enhancement model can recognize and enhance a target speaker's speech in a real-time implementation, as the embedding is in the same feature space of the audio enhancement model.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: April 8, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Mehmet Umut Isik, Arvindh Krishnaswamy
  • Patent number: 12205039
    Abstract: A group masked autoencoder may be implemented for anomaly detection. An autoencoder network model may be trained without supervision and applied to output an estimated joint probability distribution of normality for a group of frames of time series data. The estimated joint probability distribution may be used to determine an anomaly score for the time series data. An anomaly may be detected according to the anomaly score and a result that indicates a detected anomaly may be provided.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: January 21, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Ritwik Giri, Srikanth Venkata Tenneti, Karim Helwani, Fangzhou Cheng, Mehmet Umut Isik, Arvindh Krishnaswamy
  • Patent number: 12175434
    Abstract: Systems, methods, and apparatuses for detecting anomalies using clusters are described. In some examples, a method includes receiving a request to perform anomaly detection using a plurality of clusters; receiving a data point; determining when the received data point is a part of one of the plurality of clusters utilizing a distance to centers of the one or more clusters, wherein: when the received data point is determined to belong to a normal cluster, assigning the received data point to the determined cluster, updating the cluster, and updating a history for the cluster, when the received data point is determined to belong to an anomalous cluster, raising an anomaly, updating the cluster, and updating a history for the cluster, and when the received data point is determined to not belong to any cluster, raising an anomaly.
    Type: Grant
    Filed: September 30, 2020
    Date of Patent: December 24, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Karim Helwani, Mehmet Umut Isik, Ritwik Giri, Fangzhou Cheng, Aparna Pandey
  • Patent number: 12047536
    Abstract: Implementations for selecting an input device based on characteristics of the input signals from those input devices are described. A first input signal is received from a first input device of a participant device participating in a media conference and a second input signal is received from a second input device of the participant device. A first characteristic of the first input signal and a second characteristic of the second input signal are determined. The first characteristic is compared to the second characteristic. It is determined that a quality of the second input signal is greater than a quality of the first input signal based on comparing the first characteristic to the second characteristic. The second input device is selected based on determining that the quality of the second input signal is greater than the quality of the first input signal.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: July 23, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Siddhartha Shankara Rao, Michael Klingbeil, Arvindh Krishnaswamy, John Joseph Dunne
  • Patent number: 12014748
    Abstract: Techniques for training and using a machine learning model for estimation of reverberation in a multi-task learning framework are described. According to some embodiments, the multi-task learning framework improves the performance of the machine learning model by estimating the amount of reverberation present in an input audio recording as a secondary task to the primary task of generating a clean speech portion of the input audio recording. In one embodiment, a model architecture is selected that takes a noisy reverberant recording as an input and outputs an estimate of a clean (e.g., de-reverberated) signal, an estimate of noise (e.g., background noise), and an estimate of the reverb only portion, with the secondary task of estimating the reverb only portion acting as a regularizer that improves the machine learning model's performance in enhancing the reverberant (e.g., and noisy) input speech.
    Type: Grant
    Filed: August 7, 2020
    Date of Patent: June 18, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Ritwik Giri, Mehmet Umut Isik, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
  • Patent number: 12008457
    Abstract: Audio processing may be performed with a convolutional neural network that includes positional embeddings. Audio data may be received at an audio processing system. A convolutional neural network that concatenates frequency-positional embeddings at an input layer may be used to process the audio data. A result of processing the audio data through the convolutional neural network may be used to perform an audio processing task.
    Type: Grant
    Filed: September 29, 2020
    Date of Patent: June 11, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Mehmet Umut Isik, Ritwik Giri, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
  • Publication number: 20240096346
    Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.
    Type: Application
    Filed: June 27, 2022
    Publication date: March 21, 2024
    Inventors: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Siddhartha Shankara Rao
  • Patent number: 11924367
    Abstract: Joint noise and echo suppression may be performed for enhancing two-way audio communications. Audio data is captured at a communication device and audio data transmitted to the communication device from another communication device are used as input features to a trained machine learning model that uses the transmitted audio data as a reference signal to eliminate residual echo in the captured audio data when also suppressing noise in the captured audio data.
    Type: Grant
    Filed: February 9, 2022
    Date of Patent: March 5, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Jean-Marc Valin, Karim Helwani, Srikanth Venkata Tenneti, Erfan Soltanmohammadi, Mehmet Umut Isik, Richard Newman, Michael Mark Goodwin, Arvindh Krishnaswamy
  • Patent number: 11775145
    Abstract: An electronic device displays a messaging interface that allows a participant in a message conversation to capture, send, and/or play media content. The media content includes images, video, and/or audio. The media content is captured, sent, and/or played based on the electronic device detecting one or more conditions.
    Type: Grant
    Filed: November 17, 2022
    Date of Patent: October 3, 2023
    Assignee: Apple Inc.
    Inventors: Roberto Garcia, Anil K. Kandangath, Arvindh Krishnaswamy, Xiaoyuan Tu, Justin Wood
  • Patent number: 11545134
    Abstract: Techniques for the generation of dubbed audio for an audio/video are described.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: January 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Marcello Federico, Robert Enyedi, Yaser Al-Onaizan, Roberto Barra-Chicote, Andrew Paul Breen, Ritwik Giri, Mehmet Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf
  • Patent number: 11521637
    Abstract: Post-filtering may be performed for ratio masks as part of audio enhancement. Audio data may be received. A machine learning model may be applied to generate gain values for different spectrum bands of the audio data. The gain values may then be modified using an envelope post-filter according to a monotonically increasing function applied to the gain values to produce modified gain values used to generate an enhanced version of the audio data.
    Type: Grant
    Filed: September 29, 2020
    Date of Patent: December 6, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jean-Marc Valin, Mehmet Umut Isik, Neerad Dilip Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy
  • Patent number: 11513661
    Abstract: An electronic device includes a touch-sensitive surface, a display, and a camera sensor. The device displays a message region for displaying a message conversation and receives a request to add media to the message conversation. Responsive to receiving the request, the device displays a media selection interface concurrently with at least a portion of the message conversation. The media selection interface includes a plurality of affordances for selecting media for addition to the message conversation, the plurality of affordances includes a live preview affordance, at least a subset of the plurality of affordances includes thumbnail representations of media available for adding to the message conversation, and the live preview affordance is associated with a live camera preview. Responsive to detecting selection of the live preview affordance, the device captures a new image based on the live camera preview and selects the new image for addition to the message conversation.
    Type: Grant
    Filed: July 22, 2020
    Date of Patent: November 29, 2022
    Assignee: Apple Inc.
    Inventors: Roberto Garcia, Anil K. Kandangath, Arvindh Krishnaswamy, Xiaoyuan Tu, Justin Wood
  • Publication number: 20220101270
    Abstract: Systems, methods, and apparatuses for detecting anomalies using clusters are described. In some examples, a method includes receiving a request to perform anomaly detection using a plurality of clusters; receiving a data point; determining when the received data point is a part of one of the plurality of clusters utilizing a distance to centers of the one or more clusters, wherein: when the received data point is determined to belong to a normal cluster, assigning the received data point to the determined cluster, updating the cluster, and updating a history for the cluster, when the received data point is determined to belong to an anomalous cluster, raising an anomaly, updating the cluster, and updating a history for the cluster, and when the received data point is determined to not belong to any cluster, raising an anomaly.
    Type: Application
    Filed: September 30, 2020
    Publication date: March 31, 2022
    Inventors: Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Karim Helwani, Mehmet Umut Isik, Ritwik Giri, Fangzhou Cheng, Aparna Pandey
  • Patent number: 10614812
    Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: April 7, 2020
    Assignee: Apple Inc.
    Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl
  • Patent number: 10540984
    Abstract: Method for echo control using adaptive polynomial filters in sub-band domain starts with loudspeaker that is configured to be driven by a reference signal outputting a loudspeaker signal. Microphone receives at least one of: a near-end speaker signal, ambient noise signal, or the loudspeaker signal and generates a microphone signal. Adaptive polynomial filters in sub-band domain included in adaptive echo canceller (AEC) are configured to adaptively filter representation of the reference signal in a plurality of channels in a sub-band domain based on a clean signal to generate the echo estimate. Echo suppressor is configured to remove an echo estimate from the microphone signal to generate the clean signal. Other embodiments are described.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: January 21, 2020
    Assignee: Apple Inc.
    Inventors: Sarmad Aziz Malik, Arvindh Krishnaswamy
  • Publication number: 20190251974
    Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
    Type: Application
    Filed: April 19, 2019
    Publication date: August 15, 2019
    Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl
  • Patent number: 10304462
    Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
    Type: Grant
    Filed: January 15, 2018
    Date of Patent: May 28, 2019
    Assignee: Apple Inc.
    Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl
  • Patent number: 10013981
    Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
    Type: Grant
    Filed: June 6, 2015
    Date of Patent: July 3, 2018
    Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl
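
As an illustration of the model selection described in the abstract of patent 12340281 (a weighted harmonic mean computed over per-model metrics, followed by selection of a model based on those means), the following is a minimal sketch and not the patented implementation. The function names, metric keys, weights, and example values are all hypothetical, and the sketch assumes every metric, including forewarning time, has already been normalized to the range (0, 1].

```python
from typing import Dict, Tuple


def weighted_harmonic_mean(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return sum(w) / sum(w / x) over the metrics named in `weights`.

    Assumption: metrics are normalized to (0, 1]. A non-positive metric
    yields a score of 0.0, since a harmonic mean is dominated by its
    smallest term.
    """
    if any(metrics[name] <= 0 for name in weights):
        return 0.0
    total_weight = sum(weights.values())
    return total_weight / sum(w / metrics[name] for name, w in weights.items())


def select_model(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the best model name plus all scores."""
    scores = {
        name: weighted_harmonic_mean(metrics, weights)
        for name, metrics in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical weights and metric values, chosen only for illustration.
example_weights = {"forewarning": 1.0, "recall": 2.0, "precision": 2.0, "auc": 1.0}
example_candidates = {
    "model_a": {"forewarning": 0.6, "recall": 0.9, "precision": 0.8, "auc": 0.85},
    "model_b": {"forewarning": 0.9, "recall": 0.5, "precision": 0.7, "auc": 0.8},
}

best_model, all_scores = select_model(example_candidates, example_weights)
```

A harmonic mean penalizes a model that does poorly on any single metric more strongly than an arithmetic mean would, which is one plausible reason to combine recall, precision, and forewarning time this way. The abstract itself does not state the weights or the normalization.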