Patents by Inventor Arvindh Krishnaswamy
Arvindh Krishnaswamy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12340281
Abstract: Systems, methods, and apparatuses for selecting a model are described. A method includes receiving a request to perform model selection and evaluating multiple models by generating various metrics for each trained model. These metrics include a forewarning time (how much in advance of a failure an alert can be raised), an event recall metric (how many failure events were alerted to in advance), an event precision metric (ratio of true and false positives), and an area under a receiver operating characteristic (ROC) curve. For each trained model, a weighted harmonic mean is calculated from these metrics. A model is then selected based on the calculated means and a report on the selected model is generated and provided.
Type: Grant
Filed: September 30, 2020
Date of Patent: June 24, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Karim Helwani, Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Ritwik Giri, Mehmet Umut Isik, Aparna Pandey, Fangzhou Cheng
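The selection step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the metric keys, the equal weights, and the idea that every metric (including forewarning time) has already been normalized to the range 0 to 1 are all assumptions made here for illustration.

```python
def weighted_harmonic_mean(values, weights):
    """Return sum(w) / sum(w / v) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # A zero metric drives the harmonic mean to zero; return that limit
    # directly instead of dividing by zero.
    if any(v <= 0 for v in values):
        return 0.0
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


def select_model(metrics_by_model, weights):
    """Score each model by its weighted harmonic mean and pick the best.

    metrics_by_model: {model_name: {metric_name: value in (0, 1]}}
    weights:          {metric_name: weight}  (hypothetical weighting)
    """
    names = list(weights)
    scores = {
        model: weighted_harmonic_mean(
            [metrics[n] for n in names], [weights[n] for n in names]
        )
        for model, metrics in metrics_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical, already-normalized metrics for two trained models.
candidates = {
    "model_a": {"forewarning": 0.5, "recall": 0.8, "precision": 0.6, "auc": 0.9},
    "model_b": {"forewarning": 0.9, "recall": 0.4, "precision": 0.7, "auc": 0.8},
}
equal_weights = {"forewarning": 1.0, "recall": 1.0, "precision": 1.0, "auc": 1.0}
best, scores = select_model(candidates, equal_weights)
```

A harmonic mean penalizes a model that is weak on any single metric more than an arithmetic mean would, which is why model_b's low recall outweighs its long forewarning time in this example.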
-
Patent number: 12284048
Abstract: Implementations for compositing two input signals to form a higher quality signal are described. A first input signal is received from a first input device and a second input signal is received from a second input device. The first input signal and the second input signal are combined into a composite input signal. It is then determined that the composite input signal has a higher quality than either the first input signal or the second input signal individually. Based on that determination, the composite input signal is selected for use by the media conferencing service as part of a media conference.
Type: Grant
Filed: June 30, 2021
Date of Patent: April 22, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Siddhartha Shankara Rao, Michael Klingbeil, Arvindh Krishnaswamy, John Joseph Dunne
-
Patent number: 12272371
Abstract: Real-time audio enhancement for a target speaker may be performed. An embedding of a sample of speaker audio is created using a trained neural network that performs voice identification. The embedding is then concatenated with the input features of a trained machine learning model for audio enhancement. The audio enhancement model can recognize and enhance a target speaker's speech in a real-time implementation, as the embedding is in the same feature space of the audio enhancement model.
Type: Grant
Filed: June 30, 2021
Date of Patent: April 8, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Mehmet Umut Isik, Arvindh Krishnaswamy
-
Patent number: 12205039
Abstract: A group masked autoencoder may be implemented for anomaly detection. An autoencoder network model may be trained without supervision and applied to output an estimated joint probability distribution of normality for a group of frames of time series data. The estimated joint probability distribution may be used to determine an anomaly score for the time series data. An anomaly may be detected according to the anomaly score and a result that indicates a detected anomaly may be provided.
Type: Grant
Filed: November 2, 2020
Date of Patent: January 21, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Ritwik Giri, Srikanth Venkata Tenneti, Karim Helwani, Fangzhou Cheng, Mehmet Umut Isik, Arvindh Krishnaswamy
-
Patent number: 12175434
Abstract: Systems, methods, and apparatuses for detecting anomalies using clusters are described. In some examples, a method includes receiving a request to perform anomaly detection using a plurality of clusters; receiving a data point; determining when the received data point is a part of one of the plurality of clusters utilizing a distance to centers of the one or more clusters, wherein: when the received data point is determined to belong to a normal cluster, assigning the received data point to the determined cluster, updating the cluster, and updating a history for the cluster, when the received data point is determined to belong to an anomalous cluster, raising an anomaly, updating the cluster, and updating a history for the cluster, and when the received data point is determined to not belong to any cluster, raising an anomaly.
Type: Grant
Filed: September 30, 2020
Date of Patent: December 24, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Karim Helwani, Mehmet Umut Isik, Ritwik Giri, Fangzhou Cheng, Aparna Pandey
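The three branches described in this abstract can be sketched as follows. This is a simplified illustration, not the claimed method: the `Cluster` class, the fixed membership radius, the Euclidean distance, and the running-mean center update are all assumptions introduced here.

```python
import math


class Cluster:
    """Hypothetical cluster with a center, a membership radius, and a label."""

    def __init__(self, center, radius, anomalous=False):
        self.center = list(center)
        self.radius = radius
        self.anomalous = anomalous
        self.count = 1      # number of points folded into the center so far
        self.history = []   # points assigned to this cluster, in arrival order

    def update(self, point):
        # Running-mean update of the center, then record the point.
        self.count += 1
        self.center = [
            c + (p - c) / self.count for c, p in zip(self.center, point)
        ]
        self.history.append(tuple(point))


def process_point(point, clusters):
    """Classify one data point by its distance to the nearest cluster center."""
    nearest = min(
        clusters, key=lambda c: math.dist(c.center, point), default=None
    )
    # Branch 3: the point does not belong to any cluster.
    if nearest is None or math.dist(nearest.center, point) > nearest.radius:
        return "anomaly: no matching cluster"
    # Branches 1 and 2 both update the cluster and its history.
    nearest.update(point)
    if nearest.anomalous:
        return "anomaly: anomalous cluster"
    return "normal"


clusters = [
    Cluster([0.0, 0.0], radius=1.0),
    Cluster([10.0, 10.0], radius=1.0, anomalous=True),
]
results = [
    process_point([0.5, 0.0], clusters),    # near the normal cluster
    process_point([10.0, 10.5], clusters),  # near the anomalous cluster
    process_point([5.0, 5.0], clusters),    # far from both clusters
]
```

Keeping a per-cluster history, as the abstract describes, leaves room for later decisions such as relabeling a cluster, which this sketch does not attempt.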
-
Patent number: 12047536
Abstract: Implementations for selecting an input device based on characteristics of the input signals from those input devices are described. A first input signal is received from a first input device of a participant device participating in a media conference and a second input signal is received from a second input device of the participant device. A first characteristic of the first input signal and a second characteristic of the second input signal are determined. The first characteristic is compared to the second characteristic. It is determined that a quality of the second input signal is greater than a quality of the first input signal based on comparing the first characteristic to the second characteristic. The second input device is selected based on determining that the quality of the second input signal is greater than the quality of the first input signal.
Type: Grant
Filed: June 30, 2021
Date of Patent: July 23, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Siddhartha Shankara Rao, Michael Klingbeil, Arvindh Krishnaswamy, John Joseph Dunne
-
Patent number: 12014748
Abstract: Techniques for training and using a machine learning model for estimation of reverberation in a multi-task learning framework are described. According to some embodiments, the multi-task learning framework improves the performance of the machine learning model by estimating the amount of reverberation present in an input audio recording as a secondary task to the primary task of generating a clean speech portion of the input audio recording. In one embodiment, a model architecture is selected that takes a noisy reverberant recording as an input and outputs an estimate of a clean (e.g., de-reverberated) signal, an estimate of noise (e.g., background noise), and an estimate of the reverb only portion, with the secondary task of estimating the reverb only portion acting as a regularizer that improves the machine learning model's performance in enhancing the reverberant (e.g., and noisy) input speech.
Type: Grant
Filed: August 7, 2020
Date of Patent: June 18, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Ritwik Giri, Mehmet Umut Isik, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
-
Patent number: 12008457
Abstract: Audio processing may be performed with a convolutional neural network that includes positional embeddings. Audio data may be received at an audio processing system. A convolutional neural network that concatenates frequency-positional embeddings at an input layer may be used to process the audio data. A result of processing the audio data through the convolutional neural network may be used to perform an audio processing task.
Type: Grant
Filed: September 29, 2020
Date of Patent: June 11, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Mehmet Umut Isik, Ritwik Giri, Neerad Dilip Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy
-
Publication number: 20240096346
Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.
Type: Application
Filed: June 27, 2022
Publication date: March 21, 2024
Inventors: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Siddhartha Shankara Rao
-
Patent number: 11924367
Abstract: Joint noise and echo suppression may be performed for enhancing two-way audio communications. Audio data is captured at a communication device and audio data transmitted to the communication device from another communication device are used as input features to a trained machine learning model that uses the transmitted audio data as a reference signal to eliminate residual echo in the captured audio data when also suppressing noise in the captured audio data.
Type: Grant
Filed: February 9, 2022
Date of Patent: March 5, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jean-Marc Valin, Karim Helwani, Srikanth Venkata Tenneti, Erfan Soltanmohammadi, Mehmet Umut Isik, Richard Newman, Michael Mark Goodwin, Arvindh Krishnaswamy
-
Patent number: 11775145
Abstract: An electronic device displays a messaging interface that allows a participant in a message conversation to capture, send, and/or play media content. The media content includes images, video, and/or audio. The media content is captured, sent, and/or played based on the electronic device detecting one or more conditions.
Type: Grant
Filed: November 17, 2022
Date of Patent: October 3, 2023
Assignee: Apple Inc.
Inventors: Roberto Garcia, Anil K. Kandangath, Arvindh Krishnaswamy, Xiaoyuan Tu, Justin Wood
-
Patent number: 11545134
Abstract: Techniques for the generation of dubbed audio for an audio/video are described.
Type: Grant
Filed: December 10, 2019
Date of Patent: January 3, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Marcello Federico, Robert Enyedi, Yaser Al-Onaizan, Roberto Barra-Chicote, Andrew Paul Breen, Ritwik Giri, Mehmet Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf
-
Patent number: 11521637
Abstract: Post-filtering may be performed for ratio masks as part of audio enhancement. Audio data may be received. A machine learning model may be applied to generate gain values for different spectrum bands of the audio data. The gain values may then be modified using an envelope post-filter according to a monotonically increasing function applied to the gain values to produce modified gain values used to generate an enhanced version of the audio data.
Type: Grant
Filed: September 29, 2020
Date of Patent: December 6, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Jean-Marc Valin, Mehmet Umut Isik, Neerad Dilip Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy
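The gain-modification step can be illustrated with a short sketch. The abstract does not say which monotonically increasing function is used, so the mapping g * sin(pi * g / 2) below is only an assumed example chosen for illustration, as are the function name and the clamping of gains to the range 0 to 1.

```python
import math


def post_filter_gains(gains):
    """Apply an assumed monotonically increasing mapping to per-band gains.

    The mapping g -> g * sin(pi * g / 2) is an illustrative choice: it keeps
    0 at 0 and 1 at 1, and pushes intermediate gains lower, so bands the
    model already attenuated are attenuated further.
    """
    modified = []
    for g in gains:
        g = min(max(g, 0.0), 1.0)  # clamp to the valid ratio-mask range
        modified.append(g * math.sin(0.5 * math.pi * g))
    return modified


# Hypothetical per-band gains produced by an enhancement model.
band_gains = [0.0, 0.25, 0.5, 0.75, 1.0]
filtered = post_filter_gains(band_gains)
```

Because the mapping is monotonic, the ordering of bands by gain is preserved; only the contrast between strongly and weakly attenuated bands changes.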
-
Patent number: 11513661
Abstract: An electronic device includes a touch-sensitive surface, a display, and a camera sensor. The device displays a message region for displaying a message conversation and receives a request to add media to the message conversation. Responsive to receiving the request, the device displays a media selection interface concurrently with at least a portion of the message conversation. The media selection interface includes a plurality of affordances for selecting media for addition to the message conversation, the plurality of affordances includes a live preview affordance, at least a subset of the plurality of affordances includes thumbnail representations of media available for adding to the message conversation, and the live preview affordance is associated with a live camera preview. Responsive to detecting selection of the live preview affordance, the device captures a new image based on the live camera preview and selects the new image for addition to the message conversation.
Type: Grant
Filed: July 22, 2020
Date of Patent: November 29, 2022
Assignee: Apple Inc.
Inventors: Roberto Garcia, Anil K. Kandangath, Arvindh Krishnaswamy, Xiaoyuan Tu, Justin Wood
-
Publication number: 20220101270
Abstract: Systems, methods, and apparatuses for detecting anomalies using clusters are described. In some examples, a method includes receiving a request to perform anomaly detection using a plurality of clusters; receiving a data point; determining when the received data point is a part of one of the plurality of clusters utilizing a distance to centers of the one or more clusters, wherein: when the received data point is determined to belong to a normal cluster, assigning the received data point to the determined cluster, updating the cluster, and updating a history for the cluster, when the received data point is determined to belong to an anomalous cluster, raising an anomaly, updating the cluster, and updating a history for the cluster, and when the received data point is determined to not belong to any cluster, raising an anomaly.
Type: Application
Filed: September 30, 2020
Publication date: March 31, 2022
Inventors: Srikanth Venkata Tenneti, Arvindh Krishnaswamy, Karim Helwani, Mehmet Umut Isik, Ritwik Giri, Fangzhou Cheng, Aparna Pandey
-
Patent number: 10614812
Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
Type: Grant
Filed: April 19, 2019
Date of Patent: April 7, 2020
Assignee: Apple Inc.
Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl
-
Patent number: 10540984
Abstract: Method for echo control using adaptive polynomial filters in sub-band domain starts with loudspeaker that is configured to be driven by a reference signal outputting a loudspeaker signal. Microphone receives at least one of: a near-end speaker signal, ambient noise signal, or the loudspeaker signal and generates a microphone signal. Adaptive polynomial filters in sub-band domain included in adaptive echo canceller (AEC) are configured to adaptively filter representation of the reference signal in a plurality of channels in a sub-band domain based on a clean signal to generate the echo estimate. Echo suppressor is configured to remove an echo estimate from the microphone signal to generate the clean signal. Other embodiments are described.
Type: Grant
Filed: September 22, 2016
Date of Patent: January 21, 2020
Assignee: APPLE INC.
Inventors: Sarmad Aziz Malik, Arvindh Krishnaswamy
-
Publication number: 20190251974
Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
Type: Application
Filed: April 19, 2019
Publication date: August 15, 2019
Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl
-
Patent number: 10304462
Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
Type: Grant
Filed: January 15, 2018
Date of Patent: May 28, 2019
Assignee: Apple Inc.
Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl
-
Patent number: 10013981
Abstract: A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
Type: Grant
Filed: June 6, 2015
Date of Patent: July 3, 2018
Inventors: Sean A. Ramprashad, Harvey D. Thornburg, Arvindh Krishnaswamy, Aram M. Lindahl