Patents by Inventor Erik Visser
Erik Visser has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250118318
Abstract: A device includes a memory configured to store audio data. The device also includes one or more processors configured to use a first machine-learning model to process first audio data to generate first spatial sector audio data. The first spatial sector audio data is associated with a first spatial sector. The one or more processors are also configured to use a second machine-learning model to process second audio data to generate second spatial sector audio data. The second spatial sector audio data is associated with a second spatial sector. The one or more processors are further configured to generate output data based on the first spatial sector audio data, the second spatial sector audio data, or both.
Type: Application
Filed: October 4, 2024
Publication date: April 10, 2025
Inventors: Vahid MONTAZERI, Rogerio Guedes ALVES, Erik VISSER
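A minimal Python sketch of the general idea described above, not the patented implementation: two hypothetical per-sector models each process their own input audio, and the two sector outputs are combined into output data. The model functions and the summing combination are illustrative assumptions.

```python
import numpy as np

def sector_model_a(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the first machine-learning model (hypothetical)."""
    return frame * 0.9  # placeholder processing for the first spatial sector

def sector_model_b(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the second machine-learning model (hypothetical)."""
    return frame * 0.8  # placeholder processing for the second spatial sector

def process_frame(first_audio: np.ndarray, second_audio: np.ndarray) -> np.ndarray:
    # Each model produces audio data associated with one spatial sector.
    sector_1 = sector_model_a(first_audio)
    sector_2 = sector_model_b(second_audio)
    # Output data is generated from one or both sector signals; here, a simple sum.
    return sector_1 + sector_2

frame = np.random.randn(256).astype(np.float32)
print(process_frame(frame, frame).shape)  # (256,)
```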
-
Publication number: 20250119704
Abstract: A device includes a memory configured to store audio data. The device also includes one or more processors configured to obtain, from first audio data, first subband audio data and second subband audio data. The first subband audio data is associated with a first frequency subband and the second subband audio data is associated with a second frequency subband. The one or more processors are also configured to use a first machine-learning model to process the first subband audio data to generate first subband noise suppressed audio data. The one or more processors are further configured to use a second machine-learning model to process the second subband audio data to generate second subband noise suppressed audio data. The one or more processors are also configured to generate output data based on the first subband noise suppressed audio data and the second subband noise suppressed audio data.
Type: Application
Filed: October 4, 2024
Publication date: April 10, 2025
Inventors: Vahid MONTAZERI, Rogerio Guedes ALVES, Erik VISSER
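A rough sketch of the subband flow, assuming a simple FFT-based band split and placeholder functions standing in for the two per-band noise-suppression models; the split point and scaling factors are arbitrary illustrations.

```python
import numpy as np

def subband_model_low(band: np.ndarray) -> np.ndarray:
    """Stand-in for the first (low-band) noise-suppression model (hypothetical)."""
    return band * 0.9

def subband_model_high(band: np.ndarray) -> np.ndarray:
    """Stand-in for the second (high-band) noise-suppression model (hypothetical)."""
    return band * 0.5

def suppress_noise(frame: np.ndarray, split_bin: int = 64) -> np.ndarray:
    spectrum = np.fft.rfft(frame)
    low, high = spectrum[:split_bin], spectrum[split_bin:]
    # Each frequency subband is denoised by its own model, then recombined.
    low_out = subband_model_low(low)
    high_out = subband_model_high(high)
    return np.fft.irfft(np.concatenate([low_out, high_out]), n=len(frame))

frame = np.random.randn(512).astype(np.float32)
print(suppress_noise(frame).shape)  # (512,)
```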
-
Publication number: 20250103888
Abstract: A device includes one or more processors configured to receive sensor data from one or more sensor devices. The one or more processors are also configured to determine a context of the device based on the sensor data. The one or more processors are further configured to select a model based on the context. The one or more processors are also configured to process an input signal using the model to generate a context-specific output.Type: Application
Filed: December 5, 2024
Publication date: March 27, 2025
Inventors: Fatemeh SAKI, Yinyi GUO, Erik VISSER
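A toy Python sketch of context-based model selection, assuming a rule-based context classifier and a small dictionary of placeholder models; none of this reflects the actual classifier or models in the filing.

```python
import numpy as np

MODELS = {
    # Hypothetical context-specific models keyed by the inferred context.
    "in_car": lambda x: x * 0.7,
    "outdoors": lambda x: x * 0.9,
    "indoors": lambda x: x,
}

def infer_context(sensor_data: dict) -> str:
    """Toy rule-based context classifier (a placeholder, not the patented method)."""
    if sensor_data.get("speed_mps", 0.0) > 5.0:
        return "in_car"
    if sensor_data.get("ambient_light", 0.0) > 500.0:
        return "outdoors"
    return "indoors"

def process(input_signal: np.ndarray, sensor_data: dict) -> np.ndarray:
    context = infer_context(sensor_data)   # determine context from sensor data
    model = MODELS[context]                # select a model based on the context
    return model(input_signal)             # generate a context-specific output

signal = np.random.randn(160)
print(process(signal, {"speed_mps": 12.0}).shape)  # uses the "in_car" model
```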
-
Publication number: 20250078818
Abstract: Systems and techniques are described for generating and using unimodal/multimodal generative models that mitigate hallucinations. For example, a computing device can encode input data to generate encoded representations of the input data. The computing device can obtain intermediate data including a plurality of partial sentences associated with the input data and can generate, based on the intermediate data, at least one complete sentence associated with the input data. The computing device can encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence. The computing device can generate a faithfulness score based on a comparison of the encoded representations of the input data and the at least one encoded representation of the at least one complete sentence. The computing device can re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data.
Type: Application
Filed: February 28, 2024
Publication date: March 6, 2025
Inventors: Arvind Krishna SRIDHAR, Rehana MAHFUZ, Erik VISSER, Yinyi GUO
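A loose sketch of the re-ranking mechanism under strong assumptions: a toy deterministic encoder stands in for the learned encoder, sentences are "completed" trivially, and the faithfulness score is a cosine similarity between encoded representations. Scores from this stand-in encoder are arbitrary; only the data flow is illustrated.

```python
import numpy as np

def encode(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic encoder standing in for a learned text encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def faithfulness(input_text: str, sentence: str) -> float:
    # Compare the encoded input data against the encoded complete sentence.
    return float(encode(input_text) @ encode(sentence))

def rerank(input_text: str, partial_sentences: list[str]) -> list[str]:
    # Complete each partial sentence (trivially here), score it, and re-rank.
    completed = [p + "." for p in partial_sentences]
    scores = [faithfulness(input_text, c) for c in completed]
    order = np.argsort(scores)[::-1]
    return [partial_sentences[i] for i in order]

print(rerank("a dog barks in the park",
             ["A dog is barking", "A cat is sleeping"]))
```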
-
Publication number: 20250077177
Abstract: In general, techniques are described that enable voice activation for computing devices. A computing device configured to support an audible interface that comprises a memory and one or more processors may be configured to perform the techniques. The memory may store a first audio signal representative of an environment external to a user associated with the computing device and a second audio signal sensed by a microphone coupled to a housing of the computing device. The one or more processors may verify, based on the first audio signal and the second audio signal, that the user activated the audible interface of the computing device, and obtain, based on the verification, additional audio signals representative of one or more audible commands.
Type: Application
Filed: November 20, 2024
Publication date: March 6, 2025
Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Lae-Hoon Kim, Erik Visser, Dongmei Wang, Fatemeh Saki
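A simplified sketch of the verification gate, assuming that checking a normalized correlation between the two stored audio signals is an acceptable toy stand-in for the actual verification logic; the threshold and the `fetch_more_audio` callback are illustrative.

```python
import numpy as np

def verify_activation(external_sig: np.ndarray, device_mic_sig: np.ndarray,
                      threshold: float = 0.5) -> bool:
    """Toy check: the two signals should be strongly correlated if the same
    user utterance reached both the external and the device microphone."""
    x = external_sig - external_sig.mean()
    y = device_mic_sig - device_mic_sig.mean()
    corr = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-9))
    return corr > threshold

def handle_wake(external_sig, device_mic_sig, fetch_more_audio):
    if verify_activation(external_sig, device_mic_sig):
        # Only after verification are additional audio signals (the spoken
        # commands) obtained and handed off for recognition.
        return fetch_more_audio()
    return None

t = np.linspace(0, 1, 16000)
voice = np.sin(2 * np.pi * 220 * t)
print(verify_activation(voice + 0.1 * np.random.randn(t.size), voice))  # True
```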
-
Publication number: 20250078810
Abstract: Systems and techniques described herein relate to a diffusion-based model for generating converted speech from a source speech based on target speech. For example, a device may extract first prosody data from input data and may generate a content embedding based on the input data. The device may extract second prosody data from target speech, generate a speaker embedding from the target speech, and generate a prosody embedding from the second prosody data. The device may generate, based on the first prosody data and the prosody embedding, converted prosody data. The device may then generate a converted spectrogram based on the converted prosody data, the speaker embedding, and the content embedding.
Type: Application
Filed: October 25, 2023
Publication date: March 6, 2025
Inventors: Kyungguen BYUN, Sunkuk MOON, Erik VISSER
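A sketch of the conditioning data flow only, under heavy assumptions: the feature extractors are crude numpy placeholders and the diffusion generator is replaced by a trivial combination step, so this illustrates which quantities feed into which stage rather than how the diffusion model works.

```python
import numpy as np

def extract_prosody(audio: np.ndarray) -> np.ndarray:
    """Toy prosody feature: frame energy contour (a stand-in for pitch/energy)."""
    frames = audio[: len(audio) // 256 * 256].reshape(-1, 256)
    return np.log1p((frames ** 2).mean(axis=1))

def content_embedding(audio: np.ndarray, dim: int = 32) -> np.ndarray:
    return np.fft.rfft(audio, n=2 * dim - 2).real[:dim]   # crude spectral summary

def speaker_embedding(audio: np.ndarray, dim: int = 16) -> np.ndarray:
    return np.resize(np.abs(np.fft.rfft(audio)), dim)

def convert(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    src_prosody = extract_prosody(source)                  # first prosody data
    content = content_embedding(source)                    # content embedding
    tgt_prosody = extract_prosody(target)                  # second prosody data
    spk = speaker_embedding(target)                        # speaker embedding
    prosody_emb = tgt_prosody.mean() * np.ones_like(src_prosody)  # prosody embedding
    converted_prosody = 0.5 * (src_prosody + prosody_emb)  # converted prosody data
    # Placeholder "generator": an outer product stands in for the diffusion
    # model that would predict a converted spectrogram from these inputs.
    return np.outer(converted_prosody, np.concatenate([content, spk]))

print(convert(np.random.randn(16000), np.random.randn(16000)).shape)
```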
-
Publication number: 20250078828
Abstract: Systems and techniques are provided for natural language processing. A system generates a plurality of tokens (e.g., words or portions thereof) based on input content (e.g., text and/or speech). The system searches through the plurality of tokens to generate a first ranking of the plurality of tokens based on probability. The system generates natural language inference (NLI) scores for the plurality of tokens to generate a second ranking of the plurality of tokens based on faithfulness to the input content (e.g., whether the tokens produce statements that are true based on the input content). The system generates output text that includes at least one token selected from the plurality of tokens based on the first ranking and the second ranking.
Type: Application
Filed: August 21, 2024
Publication date: March 6, 2025
Inventors: Rehana MAHFUZ, Yinyi GUO, Arvind Krishna SRIDHAR, Erik VISSER
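A small sketch of combining the two rankings, assuming the caller already has per-token log-probabilities from a language model and NLI-based faithfulness scores; the rank-blending rule and the weight `alpha` are illustrative choices, not the method claimed in the filing.

```python
import numpy as np

def rerank_tokens(tokens: list[str], lm_logprobs: np.ndarray,
                  nli_scores: np.ndarray, alpha: float = 0.5) -> list[str]:
    """Blend a probability-based ranking with an NLI faithfulness ranking."""
    # Rank position of each token under each criterion (0 = best).
    lm_rank = np.argsort(np.argsort(-lm_logprobs))
    nli_rank = np.argsort(np.argsort(-nli_scores))
    combined = alpha * lm_rank + (1.0 - alpha) * nli_rank
    return [tokens[i] for i in np.argsort(combined)]

tokens = ["Paris", "London", "Rome"]
print(rerank_tokens(tokens,
                    lm_logprobs=np.array([-0.2, -0.5, -1.0]),
                    nli_scores=np.array([0.3, 0.9, 0.1])))
```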
-
Patent number: 12244994
Abstract: A first device includes a memory configured to store instructions and one or more processors configured to receive audio signals from multiple microphones. The one or more processors are configured to process the audio signals to generate direction-of-arrival information corresponding to one or more sources of sound represented in one or more of the audio signals. The one or more processors are also configured to send, to a second device, data based on the direction-of-arrival information and a class or embedding associated with the direction-of-arrival information.
Type: Grant
Filed: July 25, 2022
Date of Patent: March 4, 2025
Assignee: QUALCOMM Incorporated
Inventors: Erik Visser, Fatemeh Saki, Yinyi Guo, Lae-Hoon Kim, Rogerio Guedes Alves, Hannes Pessentheiner
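A minimal sketch of the kind of processing involved, assuming a simple cross-correlation delay estimate between two microphones and a far-field geometry; the payload format and the class label are hypothetical, and the patented processing is not limited to this approach.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def estimate_doa(mic_a: np.ndarray, mic_b: np.ndarray,
                 fs: int, mic_spacing_m: float) -> float:
    """Estimate a direction of arrival (degrees) from the inter-mic delay."""
    corr = np.correlate(mic_a, mic_b, mode="full")
    delay_samples = np.argmax(corr) - (len(mic_b) - 1)
    cos_theta = np.clip(delay_samples / fs * SPEED_OF_SOUND / mic_spacing_m,
                        -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def build_payload(doa_deg: float, sound_class: str) -> dict:
    # Data sent to the second device: direction-of-arrival information plus a
    # class label (an embedding vector could be sent instead).
    return {"doa_deg": round(doa_deg, 1), "class": sound_class}

fs, spacing = 16000, 0.08
sig = np.random.randn(1024)
print(build_payload(estimate_doa(sig, np.roll(sig, 2), fs, spacing), "speech"))
```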
-
Patent number: 12238497
Abstract: Gesture-responsive modification of a generated sound field is described.
Type: Grant
Filed: November 13, 2023
Date of Patent: February 25, 2025
Assignee: QUALCOMM Incorporated
Inventors: Pei Xiang, Erik Visser
-
Patent number: 12198057
Abstract: A device includes one or more processors configured to receive sensor data from one or more sensor devices. The one or more processors are also configured to determine a context of the device based on the sensor data. The one or more processors are further configured to select a model based on the context. The one or more processors are also configured to process an input signal using the model to generate a context-specific output.
Type: Grant
Filed: November 24, 2020
Date of Patent: January 14, 2025
Assignee: QUALCOMM Incorporated
Inventors: Fatemeh Saki, Yinyi Guo, Erik Visser
-
Patent number: 12200450
Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.
Type: Grant
Filed: May 26, 2023
Date of Patent: January 14, 2025
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Sunkuk Moon, Erik Visser, Prajakt Kulkarni
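A brief sketch of the "one shared network output feeding several application modules" structure, with a toy feature extractor standing in for the speech processing network and two hypothetical downstream modules.

```python
import numpy as np

def speech_network(audio: np.ndarray, dim: int = 32) -> np.ndarray:
    """Stand-in for the shared speech-processing network: one network output."""
    frames = audio[: len(audio) // 256 * 256].reshape(-1, 256)
    return np.abs(np.fft.rfft(frames, axis=1))[:, :dim]   # (frames, dim) features

# Hypothetical speech application modules that all consume the same output.
def keyword_spotter(features: np.ndarray) -> bool:
    return bool(features.mean() > 1.0)

def speaker_verifier(features: np.ndarray) -> float:
    return float(features.std())

audio = np.random.randn(16000)
common_output = speech_network(audio)   # computed once, shared by all modules
print({"keyword": keyword_spotter(common_output),
       "speaker_score": speaker_verifier(common_output)})
```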
-
Publication number: 20240419731
Abstract: A device includes a processor configured to obtain a first audio embedding of a first audio segment and obtain a first text embedding of a first tag assigned to the first audio segment. The first audio segment corresponds to a first audio event of audio events. The processor is configured to obtain a first event representation based on a combination of the first audio embedding and the first text embedding. The processor is configured to obtain a second event representation of a second audio event of the audio events. The processor is also configured to determine, based on knowledge data, relations between the audio events. The processor is configured to construct an audio scene graph based on a temporal order of the audio events. The audio scene graph is constructed to include a first node corresponding to the first audio event and a second node corresponding to the second audio event.
Type: Application
Filed: June 10, 2024
Publication date: December 19, 2024
Inventors: Arvind Krishna SRIDHAR, Yinyi GUO, Erik VISSER
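A minimal sketch of constructing such a graph, assuming events already carry tags and start times, and that knowledge data is a simple lookup of relations between tag pairs; the data structures and relation labels are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class EventNode:
    tag: str                     # text tag assigned to the audio event
    start_s: float               # used to order events in time
    representation: tuple = ()   # e.g. a combined audio + text embedding

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (src_idx, dst_idx, relation)

def build_scene_graph(events: list, knowledge: dict) -> SceneGraph:
    graph = SceneGraph(nodes=sorted(events, key=lambda e: e.start_s))
    for i in range(len(graph.nodes) - 1):
        a, b = graph.nodes[i].tag, graph.nodes[i + 1].tag
        # Relation comes from knowledge data when available, else temporal order.
        relation = knowledge.get((a, b), "followed_by")
        graph.edges.append((i, i + 1, relation))
    return graph

events = [EventNode("dog_bark", 2.0), EventNode("door_open", 0.5)]
knowledge = {("door_open", "dog_bark"): "causes"}
print(build_scene_graph(events, knowledge).edges)  # [(0, 1, 'causes')]
```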
-
Patent number: 12153858
Abstract: In general, techniques are described that enable voice activation for computing devices. A computing device configured to support an audible interface that comprises a memory and one or more processors may be configured to perform the techniques. The memory may store a first audio signal representative of an environment external to a user associated with the computing device and a second audio signal sensed by a microphone coupled to a housing of the computing device. The one or more processors may verify, based on the first audio signal and the second audio signal, that the user activated the audible interface of the computing device, and obtain, based on the verification, additional audio signals representative of one or more audible commands.
Type: Grant
Filed: February 25, 2020
Date of Patent: November 26, 2024
Assignee: QUALCOMM Incorporated
Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Lae-Hoon Kim, Erik Visser, Dongmei Wang, Fatemeh Saki
-
Publication number: 20240334125
Abstract: A device includes one or more processors configured to obtain data specifying a target signal-to-noise ratio based on a hearing condition of a person and to obtain audio data representing one or more audio signals. The one or more processors are configured to determine, based on the target signal-to-noise ratio, a first gain to apply to first components of the audio data and a second gain to apply to second components of the audio data. The one or more processors are configured to apply the first gain to the first components of the audio data to generate a target signal and to apply the second gain to the second components of the audio data to generate a noise signal. The one or more processors are further configured to combine the target signal and the noise signal to generate an output audio signal.
Type: Application
Filed: May 24, 2023
Publication date: October 3, 2024
Inventors: Rogerio Guedes ALVES, Jacob Jon BEAN, Erik VISSER
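A small worked sketch of the gain arithmetic, assuming the target and noise components have already been separated and that all of the SNR adjustment is placed on the target-component gain (one of many possible gain splits, not necessarily the one claimed).

```python
import numpy as np

def mix_to_target_snr(target_comp: np.ndarray, noise_comp: np.ndarray,
                      target_snr_db: float) -> np.ndarray:
    """Scale the separated components so their mix hits the requested SNR."""
    sig_pow = np.mean(target_comp ** 2) + 1e-12
    noise_pow = np.mean(noise_comp ** 2) + 1e-12
    current_snr_db = 10.0 * np.log10(sig_pow / noise_pow)
    # First gain adjusts the target components; the second leaves the noise
    # components unchanged in this illustrative split.
    first_gain = 10.0 ** ((target_snr_db - current_snr_db) / 20.0)
    second_gain = 1.0
    target_signal = first_gain * target_comp
    noise_signal = second_gain * noise_comp
    return target_signal + noise_signal        # combined output audio signal

speech = np.sin(2 * np.pi * 200 * np.linspace(0, 1, 16000))
noise = 0.3 * np.random.randn(16000)
print(mix_to_target_snr(speech, noise, target_snr_db=12.0).shape)
```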
-
Publication number: 20240331716
Abstract: A device includes one or more processors configured to obtain audio data representing one or more audio signals. The audio data includes a first segment and a second segment subsequent to the first segment. The one or more processors are configured to perform one or more transform operations on the first segment to generate frequency-domain audio data. The one or more processors are configured to provide input data based on the frequency-domain audio data as input to one or more machine-learning models to generate a noise-suppression output. The one or more processors are configured to perform one or more reverse transform operations on the noise-suppression output to generate time-domain filter coefficients. The one or more processors are configured to perform time-domain filtering of the second segment using the time-domain filter coefficients to generate a noise-suppressed output signal.
Type: Application
Filed: March 20, 2024
Publication date: October 3, 2024
Inventors: Jacob Jon BEAN, Rogerio Guedes ALVES, Vahid MONTAZERI, Erik VISSER
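A sketch of the transform / reverse-transform flow, assuming the machine-learning model can be stood in for by a magnitude-based soft mask; the tap count and the `convolve` filtering are illustrative choices rather than the claimed design.

```python
import numpy as np

def ml_mask(spectrum_mag: np.ndarray) -> np.ndarray:
    """Stand-in for the machine-learning model: a magnitude-based soft mask."""
    return spectrum_mag / (spectrum_mag + np.median(spectrum_mag) + 1e-9)

def denoise_stream(first_segment: np.ndarray, second_segment: np.ndarray,
                   num_taps: int = 64) -> np.ndarray:
    # 1) Transform the first segment to the frequency domain.
    spectrum = np.fft.rfft(first_segment)
    # 2) The noise-suppression output here is a mask over frequency bins.
    mask = ml_mask(np.abs(spectrum))
    # 3) Reverse-transform the mask into time-domain filter coefficients.
    coeffs = np.fft.irfft(mask)[:num_taps]
    # 4) Filter the *next* segment in the time domain with those coefficients.
    return np.convolve(second_segment, coeffs, mode="same")

seg1, seg2 = np.random.randn(512), np.random.randn(512)
print(denoise_stream(seg1, seg2).shape)  # (512,)
```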
-
Publication number: 20240331679
Abstract: This disclosure provides systems, methods, and devices for audio signal processing that support feedback cancellation in a personal audio amplification system. In a first aspect, a method of signal processing includes receiving an input audio signal, wherein the input audio signal includes a desired audio component and a feedback component; and reducing the feedback component by applying a machine learning model to the input audio signal to determine an output audio signal. Other aspects and features are also claimed and described.
Type: Application
Filed: March 20, 2024
Publication date: October 3, 2024
Inventors: Vahid Montazeri, Rogerio Guedes Alves, You Wang, Jacob Jon Bean, Erik Visser
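To make the problem concrete, here is a classical normalized-LMS feedback canceller used as a stand-in for the machine-learning model in the filing: it estimates the loudspeaker-to-microphone feedback path and subtracts it, leaving the desired component. This is a different technique from the claimed ML approach and is shown only to illustrate what "reducing the feedback component" means.

```python
import numpy as np

def cancel_feedback(mic_signal: np.ndarray, speaker_signal: np.ndarray,
                    num_taps: int = 32, mu: float = 0.01) -> np.ndarray:
    """Normalized LMS feedback canceller (classical stand-in, not the ML model)."""
    w = np.zeros(num_taps)
    out = np.zeros_like(mic_signal)
    for n in range(num_taps, len(mic_signal)):
        x = speaker_signal[n - num_taps:n][::-1]   # recent loudspeaker samples
        feedback_est = w @ x
        e = mic_signal[n] - feedback_est           # estimate of the desired audio
        w += mu * e * x / (x @ x + 1e-9)           # normalized LMS update
        out[n] = e
    return out

spk = np.random.randn(4000)
mic = 0.5 * np.roll(spk, 5) + 0.1 * np.random.randn(4000)  # feedback + desired
print(cancel_feedback(mic, spk).shape)
```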
-
Patent number: 12069425
Abstract: A wearable device may include a processor configured to detect a self-voice signal, based on one or more transducers. The processor may be configured to separate the self-voice signal from a background signal in an external audio signal based on using a multi-microphone speech generative network. The processor may also be configured to apply a first filter to an external audio signal, detected by at least one external microphone on the wearable device, during a listen through operation based on an activation of the audio zoom feature to generate a first listen-through signal that includes the external audio signal. The processor may be configured to produce an output audio signal that is based on at least the first listen-through signal that includes the external signal, and is based on the detected self-voice signal.
Type: Grant
Filed: July 10, 2023
Date of Patent: August 20, 2024
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Dongmei Wang, Fatemeh Saki, Taher Shahbazi Mirzahasanloo, Erik Visser, Rogerio Guedes Alves
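A simplified sketch of the output stage only, assuming the self-voice signal has already been separated (the multi-microphone generative network is not shown) and that the "first filter" and audio-zoom boost can be illustrated with a smoothing kernel and a gain.

```python
import numpy as np

def listen_through_mix(external_sig: np.ndarray, self_voice: np.ndarray,
                       zoom_active: bool, zoom_gain: float = 1.5) -> np.ndarray:
    """Combine a listen-through version of the external signal with the
    separated self-voice signal (the separation itself is not shown)."""
    # Illustrative "first filter" for listen-through: a crude smoothing filter;
    # the audio zoom simply boosts the external path when active.
    kernel = np.ones(8) / 8.0
    listen_through = np.convolve(external_sig, kernel, mode="same")
    if zoom_active:
        listen_through *= zoom_gain
    # Output audio is based on the listen-through signal and the self-voice.
    return listen_through + self_voice

ext = np.random.randn(16000)
voice = 0.2 * np.sin(2 * np.pi * 180 * np.linspace(0, 1, 16000))
print(listen_through_mix(ext, voice, zoom_active=True).shape)
```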
-
Patent number: 12063490
Abstract: Methods, systems, and devices for signal processing are described. Generally, as provided for by the described techniques, a wearable device may receive an input audio signal from one or more outer microphones, an input audio signal from one or more inner microphones, and a bone conduction signal from a bone conduction sensor based on the input audio signals. The wearable device may filter the bone conduction signal based on a set of frequencies of the input audio signals, such as a low frequency portion of the input audio signals. For example, the wearable device may apply a filter to the bone conduction signal that accounts for an error in the input audio signals. The wearable device may add a gain to the filtered bone conduction signal and may equalize the filtered bone conduction signal based on the gain. The wearable device may output an audio signal to a speaker.
Type: Grant
Filed: February 10, 2023
Date of Patent: August 13, 2024
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Rogerio Guedes Alves, Jacob Jon Bean, Erik Visser
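A rough sketch of the bone-conduction path, assuming an FFT-based low-pass filter, a fixed gain, and a level-matching step as a crude stand-in for the equalization described; cutoff, gain, and the blending with the microphone signal are illustrative values.

```python
import numpy as np

def enhance_with_bone_conduction(outer_mic: np.ndarray, bc_signal: np.ndarray,
                                 fs: int = 16000, cutoff_hz: float = 800.0,
                                 bc_gain: float = 2.0) -> np.ndarray:
    """Keep the low-frequency band of the bone-conduction signal, apply a gain
    and a crude equalization, and blend it with the microphone signal."""
    spectrum = np.fft.rfft(bc_signal)
    freqs = np.fft.rfftfreq(len(bc_signal), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0                        # low-pass the BC signal
    bc_low = np.fft.irfft(spectrum, n=len(bc_signal))
    bc_low *= bc_gain                                        # add gain
    bc_low *= np.std(outer_mic) / (np.std(bc_low) + 1e-9)    # crude equalization
    return outer_mic + bc_low                                # signal sent to speaker

mic = np.random.randn(16000)
bc = np.random.randn(16000)
print(enhance_with_bone_conduction(mic, bc).shape)
```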
-
Patent number: 12051429
Abstract: A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients at the different time segments represent a soundfield at the different time segments. The one or more processors are configured to apply one adaptive network, based on a constraint that includes preservation of a spatial direction of one or more audio sources in the soundfield at the different time segments, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients at the different time segments represent a modified soundfield at the different time segments, that was modified based on the constraint.
Type: Grant
Filed: April 24, 2023
Date of Patent: July 30, 2024
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Shankar Thagadur Shivappa, S M Akramus Salehin, Shuhua Zhang, Erik Visser
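A toy check of what the direction-preservation constraint means for first-order ambisonics (W, X, Y, Z): the "adaptive network" is replaced here by a trivial uniform gain, which changes the soundfield level while leaving the estimated source direction unchanged. The adaptive network in the patent is of course far more general; this only illustrates the constraint being measured.

```python
import numpy as np

def direction(foa: np.ndarray) -> np.ndarray:
    """Unit direction vector estimated from first-order ambisonic channels (W, X, Y, Z)."""
    w, x, y, z = foa
    v = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    return v / (np.linalg.norm(v) + 1e-12)

def transform_preserving_direction(foa: np.ndarray, gain: float) -> np.ndarray:
    """Toy 'adaptive network': a uniform gain over all ambisonic coefficients,
    which modifies the soundfield but preserves each source's spatial direction."""
    return gain * foa

# Fake FOA coefficients for a single source roughly toward +X.
t = np.linspace(0, 1, 8000)
s = np.sin(2 * np.pi * 300 * t)
foa = np.stack([s, s, 0.1 * s, 0.0 * s])             # W, X, Y, Z
out = transform_preserving_direction(foa, gain=0.5)  # transformed coefficients
print(np.allclose(direction(foa), direction(out)))   # True: direction preserved
```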
-
Publication number: 20240232258
Abstract: A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).
Type: Application
Filed: May 31, 2023
Publication date: July 11, 2024
Inventors: Rehana MAHFUZ, Yinyi GUO, Erik VISSER
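A minimal sketch of the similarity search, assuming a toy deterministic embedder stands in for the learned caption encoder (so the returned ordering is arbitrary); the file names, captions, and cosine-similarity metric are illustrative.

```python
import numpy as np

def embed_caption(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedder standing in for a learned caption encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def search(query: str, media_captions: dict, top_k: int = 2) -> list:
    query_emb = embed_caption(query)                       # query caption embedding
    scored = []
    for media_file, caption in media_captions.items():
        sim = float(query_emb @ embed_caption(caption))    # similarity metric
        scored.append((sim, media_file))
    scored.sort(reverse=True)
    return [f for _, f in scored[:top_k]]                  # search results

captions = {"clip1.wav": "a dog barking near a busy street",
            "clip2.wav": "rain falling on a metal roof",
            "clip3.wav": "a dog barking in a quiet park"}
print(search("dog barking outdoors", captions))
```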