Patents by Inventor Sunkuk MOON

Sunkuk MOON has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Controllable diffusion-based speech generative model

Patent number: 12658175

Abstract: Systems and techniques described herein relate to a diffusion-based model for generating converted speech from a source speech based on target speech. For example, a device may extract first prosody data from input data and may generate a content embedding based on the input data. The device may extract second prosody data from target speech, generate a speaker embedding from the target speech, and generate a prosody embedding from the second prosody data. The device may generate, based on the first prosody data and the prosody embedding, converted prosody data. The device may then generate a converted spectrogram based on the converted prosody data, the speaker embedding, and the content embedding.

Type: Grant

Filed: October 25, 2023

Date of Patent: June 16, 2026

Assignee: QUALCOMM Incorporated

Inventors: Kyungguen Byun, Sunkuk Moon, Erik Visser

SPEECH PROFILE MANAGEMENT

Publication number: 20250372098

Abstract: A device includes a memory configured to store enrolled speech profiles. The device also includes one or more processors configured to obtain multiple audio embeddings representing speech that is identified as associated with a single talker in an audio stream. The one or more processors are also configured to determine a first speech profile based on the multiple audio embeddings. The one or more processors are further configured to determine a similarity metric based on a comparison of the first speech profile to a second speech profile of the enrolled speech profiles. The one or more processors are also configured to, based on the similarity metric, determine whether to combine the first speech profile and the second speech profile.

Type: Application

Filed: March 17, 2025

Publication date: December 4, 2025

Inventors: Soo Jin PARK, Sunkuk MOON, Hyeon-Kyeong SHIN, Phuong Lam TON, Erik VISSER, Ye JIANG, Dinesh RAMAKRISHNAN, Shaun VAN DYKEN
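The profile comparison described in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how a profile is derived, which similarity metric is used, or how profiles are combined. The sketch assumes a profile is the mean of the talker's embeddings, the metric is cosine similarity, and combining means an unweighted average. The function names and the 0.8 threshold are hypothetical.

```python
import math


def mean_embedding(embeddings):
    """Average equal-length embedding vectors into a single profile vector."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combine_if_similar(embeddings, enrolled_profiles, threshold=0.8):
    """Return the index of the enrolled profile that was combined, or None."""
    # Assumption: the "first speech profile" is the mean of the embeddings.
    first_profile = mean_embedding(embeddings)
    best_index, best_score = None, threshold
    for index, enrolled in enumerate(enrolled_profiles):
        score = cosine_similarity(first_profile, enrolled)
        if score >= best_score:
            best_index, best_score = index, score
    if best_index is not None:
        # Assumed combination rule: unweighted average of the two profiles.
        enrolled_profiles[best_index] = mean_embedding(
            [enrolled_profiles[best_index], first_profile]
        )
    return best_index
```

Calling `combine_if_similar(embeddings, enrolled_profiles)` updates the best-matching enrolled profile in place when its similarity reaches the threshold, and leaves the list unchanged otherwise.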
Context-data based speech enhancement

Patent number: 12380909

Abstract: A device to perform speech enhancement includes one or more processors configured to process image data to detect at least one of an emotion, a speaker characteristic, or a noise type. The one or more processors are also configured to generate context data based at least in part on the at least one of the emotion, the speaker characteristic, or the noise type. The one or more processors are further configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and the context data to generate output spectral data that represents a speech enhanced version of the input signal.

Type: Grant

Filed: June 14, 2023

Date of Patent: August 5, 2025

Assignee: QUALCOMM Incorporated

Inventors: Kyungguen Byun, Shuhua Zhang, Lae-Hoon Kim, Erik Visser, Sunkuk Moon, Vahid Montazeri

CONTROLLABLE DIFFUSION-BASED SPEECH GENERATIVE MODEL

Publication number: 20250078810

Abstract: Systems and techniques described herein relate to a diffusion-based model for generating converted speech from a source speech based on target speech. For example, a device may extract first prosody data from input data and may generate a content embedding based on the input data. The device may extract second prosody data from target speech, generate a speaker embedding from the target speech, and generate a prosody embedding from the second prosody data. The device may generate, based on the first prosody data and the prosody embedding, converted prosody data. The device may then generate a converted spectrogram based on the converted prosody data, the speaker embedding, and the content embedding.

Type: Application

Filed: October 25, 2023

Publication date: March 6, 2025

Inventors: Kyungguen BYUN, Sunkuk MOON, Erik VISSER

Shared speech processing network for multiple speech applications

Patent number: 12200450

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.

Type: Grant

Filed: May 26, 2023

Date of Patent: January 14, 2025

Assignee: QUALCOMM Incorporated

Inventors: Lae-Hoon Kim, Sunkuk Moon, Erik Visser, Prajakt Kulkarni

SOURCE SPEECH MODIFICATION BASED ON AN INPUT SPEECH CHARACTERISTIC

Publication number: 20240087597

Abstract: A device includes one or more processors configured to process an input audio spectrum of input speech to detect a first characteristic associated with the input speech. The one or more processors are also configured to select, based at least in part on the first characteristic, one or more reference embeddings from among multiple reference embeddings. The one or more processors are further configured to process a representation of source speech, using the one or more reference embeddings, to generate an output audio spectrum of output speech.

Type: Application

Filed: September 13, 2022

Publication date: March 14, 2024

Inventors: Kyungguen BYUN, Sunkuk MOON, Erik VISSER

Audio processing using sound source representations

Patent number: 11869478

Abstract: A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal. The one or more processors are further configured to provide the output audio signal to a second device.

Type: Grant

Filed: March 18, 2022

Date of Patent: January 9, 2024

Assignee: QUALCOMM Incorporated

Inventors: Siddhartha Goutham Swaminathan, Sunkuk Moon, Shuhua Zhang, Erik Visser

CONTEXT-BASED SPEECH ENHANCEMENT

Publication number: 20230326477

Abstract: A device to perform speech enhancement includes one or more processors configured to process image data to detect at least one of an emotion, a speaker characteristic, or a noise type. The one or more processors are also configured to generate context data based at least in part on the at least one of the emotion, the speaker characteristic, or the noise type. The one or more processors are further configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and the context data to generate output spectral data that represents a speech enhanced version of the input signal.

Type: Application

Filed: June 14, 2023

Publication date: October 12, 2023

Inventors: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI

SHARED SPEECH PROCESSING NETWORK FOR MULTIPLE SPEECH APPLICATIONS

Publication number: 20230300527

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.

Type: Application

Filed: May 26, 2023

Publication date: September 21, 2023

Inventors: Lae-Hoon KIM, Sunkuk MOON, Erik VISSER, Prajakt KULKARNI

AUDIO PROCESSING USING SOUND SOURCE REPRESENTATIONS

Publication number: 20230298561

Abstract: A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal. The one or more processors are further configured to provide the output audio signal to a second device.

Type: Application

Filed: March 18, 2022

Publication date: September 21, 2023

Inventors: Siddhartha Goutham SWAMINATHAN, Sunkuk MOON, Shuhua ZHANG, Erik VISSER

Context-based speech enhancement

Patent number: 11715480

Abstract: A device to perform speech enhancement includes one or more processors configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and context data to generate output spectral data that represents a speech enhanced version of the input signal.

Type: Grant

Filed: March 23, 2021

Date of Patent: August 1, 2023

Assignee: QUALCOMM Incorporated

Inventors: Kyungguen Byun, Shuhua Zhang, Lae-Hoon Kim, Erik Visser, Sunkuk Moon, Vahid Montazeri

Shared speech processing network for multiple speech applications

Patent number: 11700484

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data corresponding to audio captured by one or more microphones. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules. A first speech application module corresponds to a speaker verifier, and a second speech application module corresponds to a speech recognition network.

Type: Grant

Filed: February 10, 2022

Date of Patent: July 11, 2023

Assignee: QUALCOMM Incorporated

Inventors: Lae-Hoon Kim, Sunkuk Moon, Erik Visser, Prajakt Kulkarni

Synthesized speech generation

Patent number: 11676571

Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.

Type: Grant

Filed: January 21, 2021

Date of Patent: June 13, 2023

Assignee: QUALCOMM Incorporated

Inventors: Kyungguen Byun, Sunkuk Moon, Shuhua Zhang, Vahid Montazeri, Lae-Hoon Kim, Erik Visser

User speech profile management

Patent number: 11626104

Abstract: A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are configured to perform a comparison of a plurality of user speech profiles to an audio feature data set of a plurality of audio feature data sets of a talker-homogenous audio segment to determine whether the audio feature data set matches any of the user speech profiles. The processors are configured to, based on determining that the audio feature data set does not match any of the plurality of user speech profiles, generate a user speech profile based on the plurality of audio feature data sets.

Type: Grant

Filed: December 8, 2020

Date of Patent: April 11, 2023

Assignee: QUALCOMM Incorporated

Inventors: Soo Jin Park, Sunkuk Moon, Lae-Hoon Kim, Erik Visser

CONTEXT-BASED SPEECH ENHANCEMENT

Publication number: 20220310108

Abstract: A device to perform speech enhancement includes one or more processors configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and context data to generate output spectral data that represents a speech enhanced version of the input signal.

Type: Application

Filed: March 23, 2021

Publication date: September 29, 2022

Inventors: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI

SYNTHESIZED SPEECH GENERATION

Publication number: 20220230623

Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.

Type: Application

Filed: January 21, 2021

Publication date: July 21, 2022

Applicant: QUALCOMM Incorporated

Inventors: Kyungguen BYUN, Sunkuk MOON, Shuhua ZHANG, Vahid MONTAZERI, Lae-Hoon KIM, Erik VISSER

USER SPEECH PROFILE MANAGEMENT

Publication number: 20220180859

Abstract: A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are configured to perform a comparison of a plurality of user speech profiles to an audio feature data set of a plurality of audio feature data sets of a talker-homogenous audio segment to determine whether the audio feature data set matches any of the user speech profiles. The processors are configured to, based on determining that the audio feature data set does not match any of the plurality of user speech profiles, generate a user speech profile based on the plurality of audio feature data sets.

Type: Application

Filed: December 8, 2020

Publication date: June 9, 2022

Inventors: Soo Jin PARK, Sunkuk MOON, Lae-Hoon KIM, Erik VISSER

Multi-modal user interface

Patent number: 11348581

Abstract: A device for multi-modal user input includes a processor configured to process first data received from a first input device. The first data indicates a first input from a user based on a first input mode. The first input corresponds to a command. The processor is configured to send a feedback message to an output device based on processing the first data. The feedback message instructs the user to provide, based on a second input mode that is different from the first input mode, a second input that identifies a command associated with the first input. The processor is configured to receive second data from a second input device, the second data indicating the second input, and to update a mapping to associate the first input to the command identified by the second input.

Type: Grant

Filed: November 15, 2019

Date of Patent: May 31, 2022

Assignee: Qualcomm Incorporated

Inventors: Ravi Choudhary, Lae-Hoon Kim, Sunkuk Moon, Yinyi Guo, Fatemeh Saki, Erik Visser
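The mapping update described in the abstract above can be sketched as a small state machine. This is only an illustration under stated assumptions: the abstract does not say when the feedback message is sent, so the sketch assumes it is sent when the first input has no known mapping. The class, method names, mode labels, and message text are all hypothetical.

```python
class MultiModalMapper:
    """Toy model: learn what an input means by asking in a different mode."""

    def __init__(self):
        self.mapping = {}    # (mode, input value) -> command
        self.pending = None  # first input waiting to be identified

    def handle_first_input(self, mode, value):
        """Return the mapped command, or a feedback message asking for help."""
        key = (mode, value)
        if key in self.mapping:
            return {"command": self.mapping[key]}
        # Assumption: feedback is requested only for unmapped inputs.
        self.pending = key
        return {
            "feedback": "Please identify the command using an input mode "
                        f"other than {mode}."
        }

    def handle_second_input(self, mode, command):
        """Associate the pending first input with the identified command."""
        if self.pending is None:
            raise ValueError("no first input is awaiting identification")
        if mode == self.pending[0]:
            raise ValueError("second input must use a different input mode")
        self.mapping[self.pending] = command
        self.pending = None
        return command
```

For example, an unrecognized gesture would first produce a feedback message; after the user names the command by speech, the same gesture resolves directly to that command.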
SHARED SPEECH PROCESSING NETWORK FOR MULTIPLE SPEECH APPLICATIONS

Publication number: 20220165285

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data corresponding to audio captured by one or more microphones. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules. A first speech application module corresponds to a speaker verifier, and a second speech application module corresponds to a speech recognition network.

Type: Application

Filed: February 10, 2022

Publication date: May 26, 2022

Inventors: Lae-Hoon KIM, Sunkuk MOON, Erik VISSER, Prajakt KULKARNI

Shared speech processing network for multiple speech applications

Patent number: 11276415

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data corresponding to audio captured by one or more microphones. The speech processing network also includes one or more network layers configured to process the audio data to generate an output representation of the audio data. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the output representation to be provided as a common input to each of the multiple speech application modules.

Type: Grant

Filed: April 9, 2020

Date of Patent: March 15, 2022

Assignee: QUALCOMM Incorporated

Inventors: Lae-Hoon Kim, Sunkuk Moon, Erik Visser, Prajakt Kulkarni
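The idea in the abstract above, one shared stage whose output is provided as a common input to several application modules, can be sketched as follows. This is a toy illustration, not the patented network: the "shared network" here is a stand-in that computes per-frame mean energy, and both application modules are hypothetical placeholders chosen only to show the data flow.

```python
def shared_network(audio_frames):
    """Stand-in for the shared layers: mean energy of each audio frame."""
    return [sum(s * s for s in frame) / len(frame) for frame in audio_frames]


def voice_activity_module(features, threshold=0.01):
    """Hypothetical application module: flag frames above an energy threshold."""
    return [value > threshold for value in features]


def loudness_module(features):
    """Hypothetical application module: report the peak frame energy."""
    return max(features) if features else 0.0


def run_pipeline(audio_frames, modules):
    """Compute the shared representation once and give it to every module."""
    features = shared_network(audio_frames)
    return {name: module(features) for name, module in modules.items()}


results = run_pipeline(
    [[0.0, 0.0], [0.5, -0.5]],
    {"voice_activity": voice_activity_module, "loudness": loudness_module},
)
```

The point of the design is that `shared_network` runs once per input regardless of how many modules consume its output, so adding a module does not repeat the front-end computation.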
