Patents by Inventor Erik Visser

Erik Visser has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12051429
    Abstract: A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients at the different time segments represent a soundfield at the different time segments. The one or more processors are configured to apply one adaptive network, based on a constraint that includes preservation of a spatial direction of one or more audio sources in the soundfield at the different time segments, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients at the different time segments represent a modified soundfield at the different time segments that was modified based on the constraint.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: July 30, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Shankar Thagadur Shivappa, S M Akramus Salehin, Shuhua Zhang, Erik Visser
  • Publication number: 20240232258
    Abstract: A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).
    Type: Application
    Filed: May 31, 2023
    Publication date: July 11, 2024
    Inventors: Rehana MAHFUZ, Yinyi GUO, Erik VISSER
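
A minimal sketch of the retrieval flow described in the entry above (publication 20240232258): a text query is embedded, compared against precomputed sound-caption embeddings, and the media files associated with the best-matching embeddings are returned as search results. The encoder function, the cosine similarity metric, and the repository layout are assumptions for illustration, not details taken from the filing.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity metric between a query embedding and one caption embedding."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def search_media_files(query, embed_fn, caption_embeddings, media_files, top_k=5):
    """Return media files whose sound-caption embeddings best match the query.

    query              -- natural-language text query
    embed_fn           -- maps text to a fixed-size vector (hypothetical encoder)
    caption_embeddings -- dict: media file id -> caption embedding (np.ndarray)
    media_files        -- dict: media file id -> file metadata
    """
    query_embedding = embed_fn(query)
    scored = [
        (cosine_similarity(query_embedding, emb), file_id)
        for file_id, emb in caption_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Search results identify the media files associated with the selected embeddings.
    return [media_files[file_id] for _, file_id in scored[:top_k]]
```
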
  • Publication number: 20240221752
    Abstract: In an aspect, a user equipment receives, via a microphone, an utterance from a user and determines, using radio frequency sensing, that the user performed a gesture while making the utterance. The user equipment determines an object associated with the gesture and transmits an enhanced directive to an application programming interface (API) of a smart assistant device. The enhanced directive is determined based on the object, the gesture, and the utterance. The enhanced directive causes the smart assistant device to perform an action.
    Type: Application
    Filed: May 5, 2022
    Publication date: July 4, 2024
    Inventors: Jason FILOS, Xiaoxin ZHANG, Lae-Hoon KIM, Erik VISSER
  • Publication number: 20240184988
    Abstract: Systems and techniques are provided for natural language processing. A system generates a plurality of tokens (e.g., words or portions thereof) based on input content (e.g., text and/or speech). The system searches through the plurality of tokens to generate a first ranking of the plurality of tokens based on probability. The system generates natural language inference (NLI) scores for the plurality of tokens to generate a second ranking of the plurality of tokens based on faithfulness to the input content (e.g., whether the tokens produce statements that are true based on the input content). The system generates output text that includes at least one token selected from the plurality of tokens based on the first ranking and the second ranking.
    Type: Application
    Filed: March 30, 2023
    Publication date: June 6, 2024
    Inventors: Arvind Krishna SRIDHAR, Erik VISSER
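
Publication 20240184988 above describes choosing output tokens from two rankings, one based on language-model probability and one based on NLI faithfulness scores. The sketch below shows one plausible way to fuse the two rankings; the rank-averaging scheme and the alpha weight are assumptions for illustration.

```python
def select_token(candidates, lm_probs, nli_scores, alpha=0.5):
    """Pick a token using both rankings described in the abstract.

    candidates -- candidate tokens (e.g., from a beam or top-k search)
    lm_probs   -- dict: token -> language-model probability (first ranking)
    nli_scores -- dict: token -> NLI entailment score against the input
                  content (second ranking, higher = more faithful)
    alpha      -- illustrative weight balancing fluency vs. faithfulness
    """
    def rank(scores):
        # Map each token to its rank position (0 = best) under one criterion.
        ordered = sorted(candidates, key=lambda tok: scores[tok], reverse=True)
        return {tok: i for i, tok in enumerate(ordered)}

    prob_rank = rank(lm_probs)
    nli_rank = rank(nli_scores)
    # Lower combined rank wins: a token must score well on both criteria.
    combined = {t: alpha * prob_rank[t] + (1 - alpha) * nli_rank[t] for t in candidates}
    return min(candidates, key=lambda t: combined[t])
```
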
  • Patent number: 12002455
    Abstract: A device includes a memory configured to store instructions. The device also includes one or more processors configured to execute the instructions to provide context and one or more items of interest corresponding to the context to a dependency network encoder to generate a semantic-based representation of the context. The one or more processors are also configured to provide the context to a data dependent encoder to generate a context-based representation. The one or more processors are further configured to combine the semantic-based representation and the context-based representation to generate a semantically-augmented representation of the context.
    Type: Grant
    Filed: July 22, 2021
    Date of Patent: June 4, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Arvind Krishna Sridhar, Ravi Choudhary, Lae-Hoon Kim, Erik Visser
  • Publication number: 20240155303
    Abstract: Disclosed are systems and techniques for detecting audio sources and configuring acoustic device settings. For instance, a wireless device can obtain a first set of radio frequency (RF) sensing data associated with a first plurality of received waveforms corresponding to a first transmitted waveform reflected off of a plurality of reflectors. Based on the first set of RF sensing data, the wireless device can determine a classification of a first reflector from the plurality of reflectors. The wireless device can determine at least one acoustic setting based on the classification of the first reflector.
    Type: Application
    Filed: May 2, 2022
    Publication date: May 9, 2024
    Inventors: Erik VISSER, Lae-Hoon KIM, Jason FILOS, Xiaoxin ZHANG
  • Publication number: 20240134908
    Abstract: A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).
    Type: Application
    Filed: May 30, 2023
    Publication date: April 25, 2024
    Inventors: Rehana MAHFUZ, Yinyi GUO, Erik VISSER
  • Publication number: 20240135959
    Abstract: A device to perform target sound detection includes a memory including a buffer configured to store audio data. The device includes one or more processors coupled to the memory. The one or more processors are configured to receive the audio data from the buffer. The one or more processors are configured to detect the presence or absence of one or more target non-speech sounds in the audio data. The one or more processors are further configured to generate a user interface signal indicating that one of the one or more target non-speech sounds has been detected, and to provide the user interface signal to an output device.
    Type: Application
    Filed: December 18, 2023
    Publication date: April 25, 2024
    Inventors: Prajakt KULKARNI, Yinyi GUO, Erik VISSER
  • Publication number: 20240098420
    Abstract: Gesture-responsive modification of a generated sound field is described.
    Type: Application
    Filed: November 13, 2023
    Publication date: March 21, 2024
    Inventors: Pei Xiang, Erik Visser
  • Publication number: 20240087597
    Abstract: A device includes one or more processors configured to process an input audio spectrum of input speech to detect a first characteristic associated with the input speech. The one or more processors are also configured to select, based at least in part on the first characteristic, one or more reference embeddings from among multiple reference embeddings. The one or more processors are further configured to process a representation of source speech, using the one or more reference embeddings, to generate an output audio spectrum of output speech.
    Type: Application
    Filed: September 13, 2022
    Publication date: March 14, 2024
    Inventors: Kyungguen BYUN, Sunkuk MOON, Erik VISSER
  • Patent number: 11869478
    Abstract: A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal. The one or more processors are further configured to provide the output audio signal to a second device.
    Type: Grant
    Filed: March 18, 2022
    Date of Patent: January 9, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Siddhartha Goutham Swaminathan, Sunkuk Moon, Shuhua Zhang, Erik Visser
  • Patent number: 11862189
    Abstract: A device to perform target sound detection includes one or more processors. The one or more processors include a buffer configured to store audio data and a target sound detector. The target sound detector includes a first stage and a second stage. The first stage includes a binary target sound classifier configured to process the audio data. The first stage is configured to activate the second stage in response to detection of a target sound. The second stage is configured to receive the audio data from the buffer in response to the detection of the target sound.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: January 2, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Prajakt Kulkarni, Yinyi Guo, Erik Visser
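
Patent 11862189 above describes a two-stage target sound detector in which a lightweight binary classifier gates a heavier second stage that receives buffered audio. The sketch below is an assumed arrangement of that control flow; the classifier objects and buffer size are illustrative placeholders.

```python
from collections import deque

class TwoStageTargetSoundDetector:
    """Illustrative two-stage detector: a cheap always-on first stage gates a
    heavier second stage that re-examines buffered audio."""

    def __init__(self, first_stage, second_stage, buffer_frames=50):
        self.first_stage = first_stage     # binary classifier: frame -> bool
        self.second_stage = second_stage   # heavier model: list of frames -> label
        self.buffer = deque(maxlen=buffer_frames)  # stores recent audio frames

    def process_frame(self, frame):
        self.buffer.append(frame)
        # Stage 1: binary target sound classifier runs on every frame.
        if not self.first_stage(frame):
            return None
        # Stage 2 is activated only when stage 1 detects a target sound,
        # and receives the buffered audio for a more detailed decision.
        return self.second_stage(list(self.buffer))
```
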
  • Patent number: 11818560
    Abstract: Gesture-responsive modification of a generated sound field is described.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: November 14, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Pei Xiang, Erik Visser
  • Publication number: 20230353929
    Abstract: A wearable device may include a processor configured to detect a self-voice signal, based on one or more transducers. The processor may be configured to separate the self-voice signal from a background signal in an external audio signal using a multi-microphone speech generative network. The processor may also be configured to apply a first filter to an external audio signal, detected by at least one external microphone on the wearable device, during a listen-through operation based on an activation of an audio zoom feature to generate a first listen-through signal that includes the external audio signal. The processor may be configured to produce an output audio signal that is based on at least the first listen-through signal that includes the external audio signal, and is based on the detected self-voice signal.
    Type: Application
    Filed: July 10, 2023
    Publication date: November 2, 2023
    Inventors: Lae-Hoon KIM, Dongmei WANG, Fatemeh SAKI, Taher SHAHBAZI MIRZAHASANLOO, Erik VISSER, Rogerio Guedes ALVES
  • Patent number: 11805360
    Abstract: A device includes a memory configured to store instructions and one or more processors configured to execute the instructions. The one or more processors are configured to execute the instructions to receive audio data including a first audio frame corresponding to a first output of a first microphone and a second audio frame corresponding to a second output of a second microphone. The one or more processors are also configured to execute the instructions to provide the audio data to a first noise-suppression network and a second noise-suppression network. The first noise-suppression network is configured to generate a first noise-suppressed audio frame and the second noise-suppression network is configured to generate a second noise-suppressed audio frame. The one or more processors are further configured to execute the instructions to provide the noise-suppressed audio frames to an attention-pooling network. The attention-pooling network is configured to generate an output noise-suppressed audio frame.
    Type: Grant
    Filed: July 21, 2021
    Date of Patent: October 31, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Vahid Montazeri, Van Nguyen, Hannes Pessentheiner, Lae-Hoon Kim, Erik Visser, Rogerio Guedes Alves
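
Patent 11805360 above feeds two noise-suppressed audio frames into an attention-pooling network that produces a single output frame. A minimal sketch of one plausible form of such pooling, a learned softmax weighting over the two branch outputs, is shown below using PyTorch; the scoring layer and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Illustrative attention-pooling over two noise-suppressed audio frames.

    Each branch's frame is scored, the scores are softmax-normalized across the
    two branches, and the output frame is the weighted sum of the branch frames.
    """

    def __init__(self, frame_dim):
        super().__init__()
        self.score = nn.Linear(frame_dim, 1)  # assumed scoring layer

    def forward(self, frame_a, frame_b):
        # frames: (batch, frame_dim) outputs of the two noise-suppression networks
        frames = torch.stack([frame_a, frame_b], dim=1)     # (batch, 2, frame_dim)
        weights = torch.softmax(self.score(frames), dim=1)  # (batch, 2, 1)
        return (weights * frames).sum(dim=1)                # (batch, frame_dim)
```
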
  • Patent number: 11804233
    Abstract: A device includes one or more processors configured to perform signal processing including a linear transformation and a non-linear transformation of an input signal to generate a reference target signal. The reference target signal has a linear component associated with the linear transformation and a non-linear component associated with the non-linear transformation. The one or more processors are also configured to perform linear filtering of the input signal by controlling adaptation of the linear filtering to generate an output signal that substantially matches the linear component of the reference target signal.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: October 31, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Dongmei Wang, Cheng-Yu Hung, Erik Visser
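
Patent 11804233 above adapts a linear filter so that its output matches the linear component of a reference target signal that also contains a non-linear component. A normalized LMS update is one standard way to realize such controlled adaptation; the sketch below uses it purely as an illustrative stand-in for the adaptation scheme in the filing.

```python
import numpy as np

def nlms_adapt(input_signal, reference_target, num_taps=32, step=0.1, eps=1e-6):
    """Adapt a linear FIR filter toward the (linear component of the) reference.

    input_signal     -- 1-D array of input samples
    reference_target -- 1-D array produced by the reference path (linear plus
                        non-linear components); a linear filter can only track
                        its linear component
    Returns the filter output and the final tap weights.
    """
    weights = np.zeros(num_taps)
    output = np.zeros(len(input_signal))
    for n in range(num_taps - 1, len(input_signal)):
        x = input_signal[n - num_taps + 1:n + 1][::-1]  # most recent sample first
        output[n] = weights @ x
        error = reference_target[n] - output[n]
        # Normalized LMS update: step size scaled by the input energy.
        weights += step * error * x / (x @ x + eps)
    return output, weights
```
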
  • Patent number: 11805380
    Abstract: A device includes one or more processors configured to determine, based on data descriptive of two or more audio environments, a geometry of a mutual audio environment. The one or more processors are also configured to process audio data, based on the geometry of the mutual audio environment, for output at an audio device disposed in a first audio environment of the two or more audio environments.
    Type: Grant
    Filed: August 31, 2021
    Date of Patent: October 31, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: S M Akramus Salehin, Lae-Hoon Kim, Xiaoxin Zhang, Erik Visser
  • Publication number: 20230326477
    Abstract: A device to perform speech enhancement includes one or more processors configured to process image data to detect at least one of an emotion, a speaker characteristic, or a noise type. The one or more processors are also configured to generate context data based at least in part on the at least one of the emotion, the speaker characteristic, or the noise type. The one or more processors are further configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and the context data to generate output spectral data that represents a speech enhanced version of the input signal.
    Type: Application
    Filed: June 14, 2023
    Publication date: October 12, 2023
    Inventors: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI
  • Patent number: 11783809
    Abstract: A device includes a memory configured to store instructions and one or more processors configured to execute the instructions. The one or more processors are configured to execute the instructions to receive audio data including first audio data corresponding to a first output of a first microphone and second audio data corresponding to a second output of a second microphone. The one or more processors are also configured to execute the instructions to provide the audio data to a dynamic classifier. The dynamic classifier is configured to generate a classification output corresponding to the audio data. The one or more processors are further configured to execute the instructions to determine, at least partially based on the classification output, whether the audio data corresponds to user voice activity.
    Type: Grant
    Filed: May 5, 2021
    Date of Patent: October 10, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Erik Visser, Lae-Hoon Kim
  • Publication number: 20230300527
    Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.
    Type: Application
    Filed: May 26, 2023
    Publication date: September 21, 2023
    Inventors: Lae-Hoon KIM, Sunkuk MOON, Erik VISSER, Prajakt KULKARNI