Patents by Inventor Michael Mark Goodwin

Michael Mark Goodwin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Clock skew robust acoustic echo cancellation

Patent number: 12634397

Abstract: Far-end audio samples may be received corresponding to far-end audio that is output from one or more audio output components. Near-end audio samples may be received corresponding to near-end audio that is captured by one or more audio input components. A plurality of acoustic path estimates and a plurality of clock skew estimates may be calculated in an alternating order, using a state-space model, based at least in part on the far-end audio samples and the near-end audio samples. A first acoustic path estimate and a first clock skew estimate may be used to calculate a second acoustic path estimate. A first portion of the far-end audio may be filtered with the second acoustic path estimate to generate a replica of echo in the first portion of the far-end audio. The replica of the echo may be removed from a corresponding second portion of the near-end audio.

Type: Grant

Filed: December 5, 2022

Date of Patent: May 19, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Karim Helwani, Erfan Soltanmohammadi, Michael Mark Goodwin, Arvindh Krishnaswamy
Real-time summary evaluation large language model

Patent number: 12626051

Abstract: The present disclosure generally relates to systems and methods for generating and evaluating a transcript summary based on a transcript of an active meeting. A summary evaluation system may evaluate a summary based on both completeness and accuracy. The summary evaluation system may generate, via a machine learning model, questions and correct answers based on a first representation of the transcript (e.g., the transcript summary). The summary evaluation system may determine possible answers to the questions based on a second representation of the transcript (e.g., transcript). Depending on a comparison between the answers, a completeness score of the summary is generated. In some embodiments, the first representation of the transcript may be the transcript, the second representation of the transcript may be the generated transcript summary, and the score may be an accuracy score. Regeneration of the summary may be prompted if the score is below a threshold.

Type: Grant

Filed: March 14, 2024

Date of Patent: May 12, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Devansh Shah, Michael Mark Goodwin, Srikanth Venkata Tenneti, Mehmet Umut Isik
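
The abstract for "Real-time summary evaluation large language model" above describes comparing answers derived from two representations of a transcript and regenerating the summary when the resulting score falls below a threshold. A minimal Python sketch of that comparison step follows. The function names, the normalized exact-match comparison, and the 0.8 threshold are illustrative assumptions, not details taken from the patent; question generation and answering by a machine learning model are out of scope here.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def agreement_score(reference_answers: Sequence[str],
                    candidate_answers: Sequence[str]) -> float:
    """Fraction of questions whose two answers agree (hypothetical metric)."""
    if len(reference_answers) != len(candidate_answers):
        raise ValueError("each question needs one answer from each representation")
    if not reference_answers:
        return 0.0
    matches = sum(
        normalize(ref) == normalize(cand)
        for ref, cand in zip(reference_answers, candidate_answers)
    )
    return matches / len(reference_answers)


def needs_regeneration(score: float, threshold: float = 0.8) -> bool:
    """Assumed policy: regenerate the summary when the score is below threshold."""
    return score < threshold
```

Swapping which representation supplies the reference answers would turn the same score into either the completeness or the accuracy measure the abstract mentions.
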
Face-aware relighting of live video content

Patent number: 12621417

Abstract: Systems and methods are provided for modifying video content to improve lighting of a person's face depicted within the video content. Pixels depicting skin may be detected in a first frame of the video content. Transformation parameters may then be determined based on intensity values of the pixels depicting skin, where the transformation parameters represent adjusted pixel intensity values determined to improve at least one of brightness or contrast of the pixels depicting skin. Based on the transformation parameters, intensity association data may be generated and stored that associates each possible input pixel intensity value in at least one channel with a corresponding adjusted pixel intensity value. The stored intensity association data as determined with respect to the first frame may then be reused to modify intensity values for a series of frames of the video content.

Type: Grant

Filed: March 31, 2023

Date of Patent: May 5, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Prerit Jaiswal, Mehmet Umut Isik, Srikanth Venkata Tenneti, Samuel J Wilson, Amritpal Singh Saini, Parisa Rahimzadeh, Amalavoyal Chari, Michael Mark Goodwin
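
The abstract for "Face-aware relighting of live video content" above describes storing a mapping from every possible input intensity to an adjusted intensity, then reusing it across frames. One way to picture this is a 256-entry lookup table for 8-bit pixels. The sketch below is an illustration only: the single gain parameter, the target mean of 140, and the function names are assumptions, and skin detection is taken as already done.

```python
from typing import List, Sequence


def estimate_gain(skin_intensities: Sequence[int], target_mean: float = 140.0) -> float:
    """Assumed transformation parameter: a gain that moves mean skin brightness
    toward a target value."""
    mean = sum(skin_intensities) / len(skin_intensities)
    return target_mean / max(mean, 1.0)


def build_lut(gain: float) -> List[int]:
    """Map each possible 8-bit input intensity to an adjusted, clamped value."""
    return [min(255, max(0, round(value * gain))) for value in range(256)]


def apply_lut(frame: Sequence[Sequence[int]], lut: Sequence[int]) -> List[List[int]]:
    """Adjust one single-channel frame with a table lookup per pixel."""
    return [[lut[pixel] for pixel in row] for row in frame]


# The table is computed once from the first frame's skin pixels and then reused.
first_frame_skin = [60, 70, 80]
lut = build_lut(estimate_gain(first_frame_skin))
later_frames = [[[0, 100], [200, 255]], [[10, 20], [30, 40]]]
relit = [apply_lut(frame, lut) for frame in later_frames]
```

Reusing the table means later frames cost one lookup per pixel rather than a fresh parameter estimate per frame.
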
Sentiment-based conversation hotspot detection

Patent number: 12586573

Abstract: A system may include machine learning models. A system may receive audio information representing an utterance of a conversation session. A system may divide the audio information into a plurality of audio portions. A system may evaluate a first audio portion using a tone-based sentiment analysis model to generate sentiment probabilities. A system may determine a first positive sentiment probability exceeds a threshold. A system may generate a textual representation of the first audio portion. A system may evaluate the textual representation using a topic identification model to generate a topic result indicating a topic. A system may evaluate a second audio portion using the tone-based sentiment analysis model to generate second sentiment probabilities. A system may determine a second positive sentiment probability does not exceed the threshold.

Type: Grant

Filed: September 29, 2023

Date of Patent: March 24, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Gizem Tabak, Masahito Togami, Michael Mark Goodwin, Amalavoyal Chari, Siddhartha Shankara Rao
Multi-talker audio stream separation, transcription and diaraization

Patent number: 12579993

Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.

Type: Grant

Filed: June 27, 2022

Date of Patent: March 17, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Siddhartha Shankara Rao
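
The last step in the abstract above, merging single-talker transcriptions into one multi-talker transcription, can be sketched in Python as an ordering problem. This assumes each single-talker transcription is a list of (start time, text) segments, which is a hypothetical data shape chosen for illustration; the separation and transcription models themselves are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical segment shape: (start time in seconds, transcribed text).
Segment = Tuple[float, str]


def merge_transcriptions(per_talker: Dict[str, List[Segment]]) -> List[str]:
    """Interleave per-talker segments by start time and label each line."""
    tagged = [
        (start, talker, text)
        for talker, segments in per_talker.items()
        for start, text in segments
    ]
    tagged.sort(key=lambda item: item[0])  # stable sort keeps ties in input order
    return [f"[{start:.1f}s] {talker}: {text}" for start, talker, text in tagged]


transcript = merge_transcriptions({
    "talker_1": [(0.0, "Hello"), (4.0, "Sounds good")],
    "talker_2": [(2.5, "Hi there")],
})
```
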
MANIFOLD LEARNING FOR SOUND FIELD ESTIMATION

Publication number: 20260038475

Abstract: Systems and methods are provided for estimating the sound field from partial observations. Estimating an acoustic environment for virtual reality and augmented reality applications is a step in the creation of simulated acoustic sound scenes. In particular, the impulse responses of a room can be estimated with a generative model. In a teleconferencing scenario with remote participants and a group of participants in a common physical space, giving the remote participants the impression that all other participants are sitting in the same room acoustically requires filtering the speech of the remote participants with impulse responses estimated at the desired rendering position in the conference room.

Type: Application

Filed: October 13, 2025

Publication date: February 5, 2026

Inventors: Karim Helwani, Michael Mark Goodwin, Paris Smaragdis
Semi-supervised training of a machine learning model for target speaker audio enhancement

Patent number: 12531067

Abstract: Training a machine learning model for application to an audio enhancement system for a target speaker may be performed. When at least one clean audio speech sample of a target speaker is captured, the machine learning model may then be trained using noisy audio speech samples in which the voice of the target speaker is present in addition to the voices of other speakers and/or background noise. Once the machine learning model is sufficiently trained, it may be deployed for use in audio enhancement and voice processing for an audio transmission service.

Type: Grant

Filed: June 29, 2022

Date of Patent: January 20, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Mehmet Umut Isik, Jean-Marc Valin, Zhepei Wang, Shrikant Venkataramani, Paris Smaragdis
Set-based active speaker detection

Patent number: 12532141

Abstract: A system may receive sound information and generate an inference embedding using the sound information. The system may additionally receive a set of speaker embeddings, which may represent voice information for a set of speakers. The system may compare the inference embedding to the set of speaker embeddings to generate a result. The system may determine, based on the result, a speaker identity match rating for each speaker embedding in the set of speaker embeddings. The system may identify a speaker associated with a speaker embedding of the set of speaker embeddings having the highest speaker identity match rating as an active speaker.

Type: Grant

Filed: March 21, 2023

Date of Patent: January 20, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Ritwik Giri, Michael Mark Goodwin, Devansh Shah
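
The comparison described in the "Set-based active speaker detection" abstract above can be sketched in Python as a nearest-embedding search. Cosine similarity is used here as the match rating, which is an assumption; the abstract does not name a specific measure. The function names and the dictionary of enrolled speakers are likewise hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_active_speaker(
    inference_embedding: Sequence[float],
    speaker_embeddings: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Rate every enrolled speaker and return the one with the highest rating."""
    ratings = {
        name: cosine_similarity(inference_embedding, embedding)
        for name, embedding in speaker_embeddings.items()
    }
    best = max(ratings, key=ratings.get)
    return best, ratings


speaker, ratings = identify_active_speaker(
    [0.9, 0.1],
    {"alice": [1.0, 0.0], "bob": [0.0, 1.0]},
)
```
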
Machine learning model that estimates human audio quality assessments

Patent number: 12499904

Abstract: An audio assessment machine learning model, for example including one or more neural networks, may be trained for human audio quality assessment estimation, wherein the training comprises performing comparisons of a plurality of training machine learning audio quality assessments of training audio content to a plurality of human audio quality assessments of the training audio content and adjusting the audio assessment machine learning model based on the comparisons. After the training, a first audio analysis of first audio content may be performed by the audio assessment machine learning model. A first machine learning audio quality assessment of the first audio content may be provided, by the audio assessment machine learning model, based on the first audio analysis. The first machine learning audio quality assessment may include a quality score for the first audio content and a quality degradation reason for the first audio content.

Type: Grant

Filed: March 31, 2022

Date of Patent: December 16, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Sid Shankara Rao, Michael Mark Goodwin, Arvindh Krishnaswamy, Michael Klingbeil, Karim Helwani, Erfan Soltanmohammadi
Manifold learning for sound field estimation

Patent number: 12444398

Abstract: Systems and methods are provided for estimating the sound field from partial observations. Estimating an acoustic environment for virtual reality and augmented reality applications is a step in the creation of simulated acoustic sound scenes. In particular, the impulse responses of a room can be estimated with a generative model. In a teleconferencing scenario with remote participants and a group of participants in a common physical space, giving the remote participants the impression that all other participants are sitting in the same room acoustically requires filtering the speech of the remote participants with impulse responses estimated at the desired rendering position in the conference room.

Type: Grant

Filed: September 27, 2023

Date of Patent: October 14, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Karim Helwani, Michael Mark Goodwin, Paris Smaragdis
PROMPT MANAGEMENT FOR LARGE LANGUAGE MODEL

Publication number: 20250307639

Abstract: Systems and methods for a prompt generation and analysis service for generating and identifying a preferred prompt for performing a function of a large language model (LLM) are provided. The prompt generation and analysis service may generate a set of training prompts for performing a function of an LLM. The prompt generation and analysis service may then query the LLM with the generated set of prompts and characterize the output of the LLM for each prompt. Using the characterization of the output and corresponding prompt, the prompt generation and analysis service can train a classifier model to classify the prompts. The prompt generation and analysis service may generate a set of target prompts for performing a function of an LLM, characterize the target prompts using the training classifier model, and identify a preferred prompt for performing the function based on the classifier model's classification.

Type: Application

Filed: March 28, 2024

Publication date: October 2, 2025

Inventors: Minho Jin, Erfan Soltanmohammadi, Masahito Togami, Gizem Tabak, Turab Iqbal, Karim Helwani, Michael Mark Goodwin
EFFICIENT VOICE SYNTHESIS USING FRAME-BASED PROCESSING

Publication number: 20250308509

Abstract: Efficient voice synthesis using frame-based processing may be performed. An audio processing system converts an input speech waveform to an acoustic feature representation, which includes a sequence of frames at a lower resolution than the sampling resolution of the input waveform. The system propagates the acoustic feature representation through GRUs and fully-connected layers, while maintaining the lower resolution. At the end, the system performs a flattening operation on the frames of the final acoustic feature representation to generate an output waveform at a target sampling resolution.

Type: Application

Filed: June 10, 2025

Publication date: October 2, 2025

Applicant: Amazon Technologies, Inc.

Inventors: Ahmed Mustafa, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Michael Mark Goodwin
Voice identification assisted end-to-end cryptography

Patent number: 12425379

Abstract: A computing device may generate first cryptographic key data for communicating with a second computing device. A computing device may receive, by a microphone of the computing device, sound information, wherein the sound information represents speech occurring during a multi-participant conversation. A computing device may generate a sound embedding using the sound information. A computing device may compare the sound embedding to each speaker embedding of a set of speaker embeddings stored in a memory of the computing device to generate a result. A computing device may identify, based on the result, a current active speaker. A computing device may generate second cryptographic key data, based in part on a speaker embedding associated with the current active speaker.

Type: Grant

Filed: March 31, 2023

Date of Patent: September 23, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Curtis Gill Hartmann, Thushara Paul, Amalavoyal Chari, Michael Mark Goodwin, John Joseph Dunne
Efficient voice synthesis using frame-based processing

Patent number: 12354593

Abstract: Efficient voice synthesis using frame-based processing may be performed. An audio processing system converts an input speech waveform to an acoustic feature representation, which includes a sequence of frames at a lower resolution than the sampling resolution of the input waveform. The system propagates the acoustic feature representation through GRUs and fully-connected layers, while maintaining the lower resolution. At the end, the system performs a flattening operation on the frames of the final acoustic feature representation to generate an output waveform at a target sampling resolution.

Type: Grant

Filed: March 31, 2023

Date of Patent: July 8, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Ahmed Mustafa, Jean-Marc Valin, Jan Buethe, Paris Smaragdis, Michael Mark Goodwin
Bidirectional videoconference-related messaging for public switched telephone network participants

Patent number: 12348677

Abstract: A videoconference among a plurality of participants may be hosted, wherein the plurality of participants comprise Internet Protocol (IP)-connected participants and a Public Switched Telephone Network (PSTN)-connected participant. The IP-connected participants may send and receive audio content and video content of the videoconference via IP-based connections. The PSTN-connected participant may send and receive the audio content of the videoconference via a PSTN connection. Additional content from the videoconference may also be transmitted to the PSTN-connected participant, for example as text messages via the PSTN connection. The additional content may include, for example, images of a videoconference screen share, chat posts, polls, and the like. Images may be transmitted in the additional content based on video status change events, such as switching slides or pages in a screen share.

Type: Grant

Filed: March 31, 2022

Date of Patent: July 1, 2025

Assignee: Amazon Technologies, Inc.

Inventors: John Joseph Dunne, Siddhartha Shankara Rao, Michael Mark Goodwin
UNIFIED AUDIO SUPPRESSION MODEL

Publication number: 20250111857

Abstract: Examples herein provide an approach to enhance an audio mixture of a teleconference application by switching between noise suppression modes using a single model. Specifically, a machine learning (ML) model may be configured to, in response to receiving an audio mixture representation as input, suppress either a background noise of the audio mixture or suppress all noise of the audio mixture except a user's voice. In some examples, the ML model may be trained on speech and background noise training data during a training phase. In addition, the ML model may be trained on a user's voice during an enrollment phase. In addition, during an inference phase, the ML model may enhance the audio mixture by suppressing a portion of the audio mixture.

Type: Application

Filed: September 29, 2023

Publication date: April 3, 2025

Inventors: Ritwik Giri, Zhepei Wang, Devansh Shah, Jean-Marc Valin, Michael Mark Goodwin
Real-time low-complexity stereo speech enhancement with spatial cue preservation

Patent number: 12167223

Abstract: Real-time low-complexity stereo speech enhancement with spatial cue preservation may be performed. A stereo speech enhancement system receives a stereo input signal (e.g., a left and right input signal). The stereo speech enhancement system estimates spatial cues for a target speaker and downmixes the stereo input signal into a monaural signal. A low-complexity model may then process the monaural signal to generate an enhanced monaural signal. The stereo speech enhancement system upmixes the enhanced monaural signal based on the estimated spatial cues for the target speaker, to generate an enhanced stereo output signal.

Type: Grant

Filed: June 30, 2022

Date of Patent: December 10, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Masahito Togami, Karim Helwani, Jean-Marc Valin, Michael Mark Goodwin
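
The downmix, enhance, and upmix flow in the stereo speech enhancement abstract above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the spatial cues are reduced to per-channel level gains measured on the whole input rather than on a target speaker, time differences between channels are ignored, and the enhancement model is a placeholder callable that defaults to passing the signal through unchanged.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square level of a signal; 0.0 for an empty signal."""
    if not signal:
        return 0.0
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def enhance_stereo(
    left: Sequence[float],
    right: Sequence[float],
    enhance_mono: Callable[[List[float]], List[float]] = lambda mono: mono,
) -> Tuple[List[float], List[float]]:
    """Downmix to mono, enhance once, then restore per-channel levels."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    mono_level = rms(mono)
    if mono_level == 0.0:
        # Silent (or fully cancelling) downmix: nothing to upmix.
        return [0.0] * len(left), [0.0] * len(right)
    # Assumed spatial cue: level of each channel relative to the downmix.
    gain_left = rms(left) / mono_level
    gain_right = rms(right) / mono_level
    enhanced = enhance_mono(mono)
    return ([gain_left * s for s in enhanced],
            [gain_right * s for s in enhanced])


out_left, out_right = enhance_stereo([2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, -1.0])
```

Running the model once on the monaural signal, rather than once per channel, is what keeps the per-frame cost low in this arrangement.
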
Separate representations of videoconference participants that use a shared device

Patent number: 12010459

Abstract: A plurality of device-sharing participants may be detected that are participating in a videoconference via a shared computing device. The detecting of the plurality of device-sharing participants may be performed based, at least in part, on at least one of an audio analysis of captured audio from one or more microphones or a video analysis of captured video from one or more cameras. A plurality of participant connections corresponding to the plurality of device-sharing participants may be joined to the videoconference. Each of the plurality of participant connections may be identified within the videoconference using a respective name. A plurality of video streams and a plurality of audio streams corresponding to the plurality of participant connections may be transmitted, and the plurality of video streams and the plurality of audio streams may be presented to at least one other conference participant.

Type: Grant

Filed: March 31, 2022

Date of Patent: June 11, 2024

Assignee: Amazon Technologies, Inc.

Inventors: John Joseph Dunne, Michael Klingbeil, Michael Mark Goodwin, Siddhartha Shankara Rao
Multi-Talker Audio Stream Separation, Transcription and Diaraization

Publication number: 20240096346

Abstract: A plurality of talker embedding vectors may be derived that correspond to a plurality of talkers in an input audio stream. Each talker embedding vector may represent respective voice characteristics of a respective talker. The talker embedding vectors may be generated based on, for example, a pre-enrollment process or a cluster-based embedding vector derivation process. A plurality of instances of a personalized noise suppression model may be executed on the input audio stream. Each instance of the personalized noise suppression model may employ a respective talker embedding vector. A plurality of single-talker audio streams may be generated by the plurality of instances of the personalized noise suppression model. A plurality of single-talker transcriptions may be generated based on the plurality of single-talker audio streams. The plurality of single-talker transcriptions may be merged into a multi-talker output transcription.

Type: Application

Filed: June 27, 2022

Publication date: March 21, 2024

Inventors: Masahito Togami, Ritwik Giri, Michael Mark Goodwin, Arvindh Krishnaswamy, Siddhartha Shankara Rao
Joint noise and echo suppression for two-way audio communication enhancement

Patent number: 11924367

Abstract: Joint noise and echo suppression may be performed for enhancing two-way audio communications. Audio data is captured at a communication device and audio data transmitted to the communication device from another communication device are used as input features to a trained machine learning model that uses the transmitted audio data as a reference signal to eliminate residual echo in the captured audio data when also suppressing noise in the captured audio data.

Type: Grant

Filed: February 9, 2022

Date of Patent: March 5, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Jean-Marc Valin, Karim Helwani, Srikanth Venkata Tenneti, Erfan Soltanmohammadi, Mehmet Umut Isik, Richard Newman, Michael Mark Goodwin, Arvindh Krishnaswamy
