Patents by Inventor Erik Visser
Erik Visser has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250118318
Abstract: A device includes a memory configured to store audio data. The device also includes one or more processors configured to use a first machine-learning model to process first audio data to generate first spatial sector audio data. The first spatial sector audio data is associated with a first spatial sector. The one or more processors are also configured to use a second machine-learning model to process second audio data to generate second spatial sector audio data. The second spatial sector audio data is associated with a second spatial sector. The one or more processors are further configured to generate output data based on the first spatial sector audio data, the second spatial sector audio data, or both.
Type: Application
Filed: October 4, 2024
Publication date: April 10, 2025
Inventors: Vahid MONTAZERI, Rogerio Guedes ALVES, Erik VISSER
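A minimal Python sketch of the general idea described above, not the patented implementation: two hypothetical per-sector models each process their own input audio, and the two sector outputs are combined into output data. The model functions and the summing combination are illustrative assumptions.

```python
import numpy as np

def sector_model_a(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the first machine-learning model (hypothetical)."""
    return frame * 0.9  # placeholder processing for the first spatial sector

def sector_model_b(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the second machine-learning model (hypothetical)."""
    return frame * 0.8  # placeholder processing for the second spatial sector

def process_frame(first_audio: np.ndarray, second_audio: np.ndarray) -> np.ndarray:
    # Each model produces audio data associated with one spatial sector.
    sector_1 = sector_model_a(first_audio)
    sector_2 = sector_model_b(second_audio)
    # Output data is generated from one or both sector signals; here, a simple sum.
    return sector_1 + sector_2

frame = np.random.randn(256).astype(np.float32)
print(process_frame(frame, frame).shape)  # (256,)
```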
-
Publication number: 20250119704
Abstract: A device includes a memory configured to store audio data. The device also includes one or more processors configured to obtain, from first audio data, first subband audio data and second subband audio data. The first subband audio data is associated with a first frequency subband and the second subband audio data is associated with a second frequency subband. The one or more processors are also configured to use a first machine-learning model to process the first subband audio data to generate first subband noise suppressed audio data. The one or more processors are further configured to use a second machine-learning model to process the second subband audio data to generate second subband noise suppressed audio data. The one or more processors are also configured to generate output data based on the first subband noise suppressed audio data and the second subband noise suppressed audio data.
Type: Application
Filed: October 4, 2024
Publication date: April 10, 2025
Inventors: Vahid MONTAZERI, Rogerio Guedes ALVES, Erik VISSER
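A rough sketch of the subband flow, assuming a simple FFT-based band split and placeholder functions standing in for the two per-band noise-suppression models; the split point and scaling factors are arbitrary illustrations.

```python
import numpy as np

def subband_model_low(band: np.ndarray) -> np.ndarray:
    """Stand-in for the first (low-band) noise-suppression model (hypothetical)."""
    return band * 0.9

def subband_model_high(band: np.ndarray) -> np.ndarray:
    """Stand-in for the second (high-band) noise-suppression model (hypothetical)."""
    return band * 0.5

def suppress_noise(frame: np.ndarray, split_bin: int = 64) -> np.ndarray:
    spectrum = np.fft.rfft(frame)
    low, high = spectrum[:split_bin], spectrum[split_bin:]
    # Each frequency subband is denoised by its own model, then recombined.
    low_out = subband_model_low(low)
    high_out = subband_model_high(high)
    return np.fft.irfft(np.concatenate([low_out, high_out]), n=len(frame))

frame = np.random.randn(512).astype(np.float32)
print(suppress_noise(frame).shape)  # (512,)
```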
-
Publication number: 20250103888
Abstract: A device includes one or more processors configured to receive sensor data from one or more sensor devices. The one or more processors are also configured to determine a context of the device based on the sensor data. The one or more processors are further configured to select a model based on the context. The one or more processors are also configured to process an input signal using the model to generate a context-specific output.Type: Application
Filed: December 5, 2024
Publication date: March 27, 2025
Inventors: Fatemeh SAKI, Yinyi GUO, Erik VISSER
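A toy Python sketch of context-based model selection, assuming a rule-based context classifier and a small dictionary of placeholder models; none of this reflects the actual classifier or models in the filing.

```python
import numpy as np

MODELS = {
    # Hypothetical context-specific models keyed by the inferred context.
    "in_car": lambda x: x * 0.7,
    "outdoors": lambda x: x * 0.9,
    "indoors": lambda x: x,
}

def infer_context(sensor_data: dict) -> str:
    """Toy rule-based context classifier (a placeholder, not the patented method)."""
    if sensor_data.get("speed_mps", 0.0) > 5.0:
        return "in_car"
    if sensor_data.get("ambient_light", 0.0) > 500.0:
        return "outdoors"
    return "indoors"

def process(input_signal: np.ndarray, sensor_data: dict) -> np.ndarray:
    context = infer_context(sensor_data)   # determine context from sensor data
    model = MODELS[context]                # select a model based on the context
    return model(input_signal)             # generate a context-specific output

signal = np.random.randn(160)
print(process(signal, {"speed_mps": 12.0}).shape)  # uses the "in_car" model
```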
-
Publication number: 20250078818
Abstract: Systems and techniques are described for generating and using unimodal/multimodal generative models that mitigate hallucinations. For example, a computing device can encode input data to generate encoded representations of the input data. The computing device can obtain intermediate data including a plurality of partial sentences associated with the input data and can generate, based on the intermediate data, at least one complete sentence associated with the input data. The computing device can encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence. The computing device can generate a faithfulness score based on a comparison of the encoded representations of the input data and the at least one encoded representation of the at least one complete sentence. The computing device can re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data.
Type: Application
Filed: February 28, 2024
Publication date: March 6, 2025
Inventors: Arvind Krishna SRIDHAR, Rehana MAHFUZ, Erik VISSER, Yinyi GUO
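A loose sketch of the re-ranking mechanism under strong assumptions: a toy deterministic encoder stands in for the learned encoder, sentences are "completed" trivially, and the faithfulness score is a cosine similarity between encoded representations. Scores from this stand-in encoder are arbitrary; only the data flow is illustrated.

```python
import numpy as np

def encode(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic encoder standing in for a learned text encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def faithfulness(input_text: str, sentence: str) -> float:
    # Compare the encoded input data against the encoded complete sentence.
    return float(encode(input_text) @ encode(sentence))

def rerank(input_text: str, partial_sentences: list[str]) -> list[str]:
    # Complete each partial sentence (trivially here), score it, and re-rank.
    completed = [p + "." for p in partial_sentences]
    scores = [faithfulness(input_text, c) for c in completed]
    order = np.argsort(scores)[::-1]
    return [partial_sentences[i] for i in order]

print(rerank("a dog barks in the park",
             ["A dog is barking", "A cat is sleeping"]))
```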
-
Publication number: 20250077177
Abstract: In general, techniques are described that enable voice activation for computing devices. A computing device configured to support an audible interface that comprises a memory and one or more processors may be configured to perform the techniques. The memory may store a first audio signal representative of an environment external to a user associated with the computing device and a second audio signal sensed by a microphone coupled to a housing of the computing device. The one or more processors may verify, based on the first audio signal and the second audio signal, that the user activated the audible interface of the computing device, and obtain, based on the verification, additional audio signals representative of one or more audible commands.
Type: Application
Filed: November 20, 2024
Publication date: March 6, 2025
Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Lae-Hoon Kim, Erik Visser, Dongmei Wang, Fatemeh Saki
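A simplified sketch of the verification gate, assuming that checking a normalized correlation between the two stored audio signals is an acceptable toy stand-in for the actual verification logic; the threshold and the `fetch_more_audio` callback are illustrative.

```python
import numpy as np

def verify_activation(external_sig: np.ndarray, device_mic_sig: np.ndarray,
                      threshold: float = 0.5) -> bool:
    """Toy check: the two signals should be strongly correlated if the same
    user utterance reached both the external and the device microphone."""
    x = external_sig - external_sig.mean()
    y = device_mic_sig - device_mic_sig.mean()
    corr = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-9))
    return corr > threshold

def handle_wake(external_sig, device_mic_sig, fetch_more_audio):
    if verify_activation(external_sig, device_mic_sig):
        # Only after verification are additional audio signals (the spoken
        # commands) obtained and handed off for recognition.
        return fetch_more_audio()
    return None

t = np.linspace(0, 1, 16000)
voice = np.sin(2 * np.pi * 220 * t)
print(verify_activation(voice + 0.1 * np.random.randn(t.size), voice))  # True
```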
-
Publication number: 20250078810
Abstract: Systems and techniques described herein relate to a diffusion-based model for generating converted speech from a source speech based on target speech. For example, a device may extract first prosody data from input data and may generate a content embedding based on the input data. The device may extract second prosody data from target speech, generate a speaker embedding from the target speech, and generate a prosody embedding from the second prosody data. The device may generate, based on the first prosody data and the prosody embedding, converted prosody data. The device may then generate a converted spectrogram based on the converted prosody data, the speaker embedding, and the content embedding.
Type: Application
Filed: October 25, 2023
Publication date: March 6, 2025
Inventors: Kyungguen BYUN, Sunkuk MOON, Erik VISSER
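A sketch of the conditioning data flow only, under heavy assumptions: the feature extractors are crude numpy placeholders and the diffusion generator is replaced by a trivial combination step, so this illustrates which quantities feed into which stage rather than how the diffusion model works.

```python
import numpy as np

def extract_prosody(audio: np.ndarray) -> np.ndarray:
    """Toy prosody feature: frame energy contour (a stand-in for pitch/energy)."""
    frames = audio[: len(audio) // 256 * 256].reshape(-1, 256)
    return np.log1p((frames ** 2).mean(axis=1))

def content_embedding(audio: np.ndarray, dim: int = 32) -> np.ndarray:
    return np.fft.rfft(audio, n=2 * dim - 2).real[:dim]   # crude spectral summary

def speaker_embedding(audio: np.ndarray, dim: int = 16) -> np.ndarray:
    return np.resize(np.abs(np.fft.rfft(audio)), dim)

def convert(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    src_prosody = extract_prosody(source)                  # first prosody data
    content = content_embedding(source)                    # content embedding
    tgt_prosody = extract_prosody(target)                  # second prosody data
    spk = speaker_embedding(target)                        # speaker embedding
    prosody_emb = tgt_prosody.mean() * np.ones_like(src_prosody)  # prosody embedding
    converted_prosody = 0.5 * (src_prosody + prosody_emb)  # converted prosody data
    # Placeholder "generator": an outer product stands in for the diffusion
    # model that would predict a converted spectrogram from these inputs.
    return np.outer(converted_prosody, np.concatenate([content, spk]))

print(convert(np.random.randn(16000), np.random.randn(16000)).shape)
```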
-
Publication number: 20250078828
Abstract: Systems and techniques are provided for natural language processing. A system generates a plurality of tokens (e.g., words or portions thereof) based on input content (e.g., text and/or speech). The system searches through the plurality of tokens to generate a first ranking of the plurality of tokens based on probability. The system generates natural language inference (NLI) scores for the plurality of tokens to generate a second ranking of the plurality of tokens based on faithfulness to the input content (e.g., whether the tokens produce statements that are true based on the input content). The system generates output text that includes at least one token selected from the plurality of tokens based on the first ranking and the second ranking.
Type: Application
Filed: August 21, 2024
Publication date: March 6, 2025
Inventors: Rehana MAHFUZ, Yinyi GUO, Arvind Krishna SRIDHAR, Erik VISSER
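A small sketch of combining the two rankings, assuming the caller already has per-token log-probabilities from a language model and NLI-based faithfulness scores; the rank-blending rule and the weight `alpha` are illustrative choices, not the method claimed in the filing.

```python
import numpy as np

def rerank_tokens(tokens: list[str], lm_logprobs: np.ndarray,
                  nli_scores: np.ndarray, alpha: float = 0.5) -> list[str]:
    """Blend a probability-based ranking with an NLI faithfulness ranking."""
    # Rank position of each token under each criterion (0 = best).
    lm_rank = np.argsort(np.argsort(-lm_logprobs))
    nli_rank = np.argsort(np.argsort(-nli_scores))
    combined = alpha * lm_rank + (1.0 - alpha) * nli_rank
    return [tokens[i] for i in np.argsort(combined)]

tokens = ["Paris", "London", "Rome"]
print(rerank_tokens(tokens,
                    lm_logprobs=np.array([-0.2, -0.5, -1.0]),
                    nli_scores=np.array([0.3, 0.9, 0.1])))
```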
-
Patent number: 12244994
Abstract: A first device includes a memory configured to store instructions and one or more processors configured to receive audio signals from multiple microphones. The one or more processors are configured to process the audio signals to generate direction-of-arrival information corresponding to one or more sources of sound represented in one or more of the audio signals. The one or more processors are also configured to send, to a second device, data based on the direction-of-arrival information and a class or embedding associated with the direction-of-arrival information.
Type: Grant
Filed: July 25, 2022
Date of Patent: March 4, 2025
Assignee: QUALCOMM Incorporated
Inventors: Erik Visser, Fatemeh Saki, Yinyi Guo, Lae-Hoon Kim, Rogerio Guedes Alves, Hannes Pessentheiner
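A minimal sketch of the kind of processing involved, assuming a simple cross-correlation delay estimate between two microphones and a far-field geometry; the payload format and the class label are hypothetical, and the patented processing is not limited to this approach.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def estimate_doa(mic_a: np.ndarray, mic_b: np.ndarray,
                 fs: int, mic_spacing_m: float) -> float:
    """Estimate a direction of arrival (degrees) from the inter-mic delay."""
    corr = np.correlate(mic_a, mic_b, mode="full")
    delay_samples = np.argmax(corr) - (len(mic_b) - 1)
    cos_theta = np.clip(delay_samples / fs * SPEED_OF_SOUND / mic_spacing_m,
                        -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def build_payload(doa_deg: float, sound_class: str) -> dict:
    # Data sent to the second device: direction-of-arrival information plus a
    # class label (an embedding vector could be sent instead).
    return {"doa_deg": round(doa_deg, 1), "class": sound_class}

fs, spacing = 16000, 0.08
sig = np.random.randn(1024)
print(build_payload(estimate_doa(sig, np.roll(sig, 2), fs, spacing), "speech"))
```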
-
Patent number: 12238497
Abstract: Gesture-responsive modification of a generated sound field is described.
Type: Grant
Filed: November 13, 2023
Date of Patent: February 25, 2025
Assignee: QUALCOMM Incorporated
Inventors: Pei Xiang, Erik Visser
-
Patent number: 12198057
Abstract: A device includes one or more processors configured to receive sensor data from one or more sensor devices. The one or more processors are also configured to determine a context of the device based on the sensor data. The one or more processors are further configured to select a model based on the context. The one or more processors are also configured to process an input signal using the model to generate a context-specific output.
Type: Grant
Filed: November 24, 2020
Date of Patent: January 14, 2025
Assignee: QUALCOMM Incorporated
Inventors: Fatemeh Saki, Yinyi Guo, Erik Visser
-
Patent number: 12200450
Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.
Type: Grant
Filed: May 26, 2023
Date of Patent: January 14, 2025
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Sunkuk Moon, Erik Visser, Prajakt Kulkarni
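A brief sketch of the "one shared network output feeding several application modules" structure, with a toy feature extractor standing in for the speech processing network and two hypothetical downstream modules.

```python
import numpy as np

def speech_network(audio: np.ndarray, dim: int = 32) -> np.ndarray:
    """Stand-in for the shared speech-processing network: one network output."""
    frames = audio[: len(audio) // 256 * 256].reshape(-1, 256)
    return np.abs(np.fft.rfft(frames, axis=1))[:, :dim]   # (frames, dim) features

# Hypothetical speech application modules that all consume the same output.
def keyword_spotter(features: np.ndarray) -> bool:
    return bool(features.mean() > 1.0)

def speaker_verifier(features: np.ndarray) -> float:
    return float(features.std())

audio = np.random.randn(16000)
common_output = speech_network(audio)   # computed once, shared by all modules
print({"keyword": keyword_spotter(common_output),
       "speaker_score": speaker_verifier(common_output)})
```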
-
Publication number: 20240419731
Abstract: A device includes a processor configured to obtain a first audio embedding of a first audio segment and obtain a first text embedding of a first tag assigned to the first audio segment. The first audio segment corresponds to a first audio event of audio events. The processor is configured to obtain a first event representation based on a combination of the first audio embedding and the first text embedding. The processor is configured to obtain a second event representation of a second audio event of the audio events. The processor is also configured to determine, based on knowledge data, relations between the audio events. The processor is configured to construct an audio scene graph based on a temporal order of the audio events. The audio scene graph is constructed to include a first node corresponding to the first audio event and a second node corresponding to the second audio event.
Type: Application
Filed: June 10, 2024
Publication date: December 19, 2024
Inventors: Arvind Krishna SRIDHAR, Yinyi GUO, Erik VISSER
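A minimal sketch of constructing such a graph, assuming events already carry tags and start times, and that knowledge data is a simple lookup of relations between tag pairs; the data structures and relation labels are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class EventNode:
    tag: str                     # text tag assigned to the audio event
    start_s: float               # used to order events in time
    representation: tuple = ()   # e.g. a combined audio + text embedding

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (src_idx, dst_idx, relation)

def build_scene_graph(events: list, knowledge: dict) -> SceneGraph:
    graph = SceneGraph(nodes=sorted(events, key=lambda e: e.start_s))
    for i in range(len(graph.nodes) - 1):
        a, b = graph.nodes[i].tag, graph.nodes[i + 1].tag
        # Relation comes from knowledge data when available, else temporal order.
        relation = knowledge.get((a, b), "followed_by")
        graph.edges.append((i, i + 1, relation))
    return graph

events = [EventNode("dog_bark", 2.0), EventNode("door_open", 0.5)]
knowledge = {("door_open", "dog_bark"): "causes"}
print(build_scene_graph(events, knowledge).edges)  # [(0, 1, 'causes')]
```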
-
Patent number: 12153858
Abstract: In general, techniques are described that enable voice activation for computing devices. A computing device configured to support an audible interface that comprises a memory and one or more processors may be configured to perform the techniques. The memory may store a first audio signal representative of an environment external to a user associated with the computing device and a second audio signal sensed by a microphone coupled to a housing of the computing device. The one or more processors may verify, based on the first audio signal and the second audio signal, that the user activated the audible interface of the computing device, and obtain, based on the verification, additional audio signals representative of one or more audible commands.
Type: Grant
Filed: February 25, 2020
Date of Patent: November 26, 2024
Assignee: QUALCOMM Incorporated
Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Lae-Hoon Kim, Erik Visser, Dongmei Wang, Fatemeh Saki
-
Publication number: 20240334125
Abstract: A device includes one or more processors configured to obtain data specifying a target signal-to-noise ratio based on a hearing condition of a person and to obtain audio data representing one or more audio signals. The one or more processors are configured to determine, based on the target signal-to-noise ratio, a first gain to apply to first components of the audio data and a second gain to apply to second components of the audio data. The one or more processors are configured to apply the first gain to the first components of the audio data to generate a target signal and to apply the second gain to the second components of the audio data to generate a noise signal. The one or more processors are further configured to combine the target signal and the noise signal to generate an output audio signal.
Type: Application
Filed: May 24, 2023
Publication date: October 3, 2024
Inventors: Rogerio Guedes ALVES, Jacob Jon BEAN, Erik VISSER
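A small worked sketch of the gain arithmetic, assuming the target and noise components have already been separated and that all of the SNR adjustment is placed on the target-component gain (one of many possible gain splits, not necessarily the one claimed).

```python
import numpy as np

def mix_to_target_snr(target_comp: np.ndarray, noise_comp: np.ndarray,
                      target_snr_db: float) -> np.ndarray:
    """Scale the separated components so their mix hits the requested SNR."""
    sig_pow = np.mean(target_comp ** 2) + 1e-12
    noise_pow = np.mean(noise_comp ** 2) + 1e-12
    current_snr_db = 10.0 * np.log10(sig_pow / noise_pow)
    # First gain adjusts the target components; the second leaves the noise
    # components unchanged in this illustrative split.
    first_gain = 10.0 ** ((target_snr_db - current_snr_db) / 20.0)
    second_gain = 1.0
    target_signal = first_gain * target_comp
    noise_signal = second_gain * noise_comp
    return target_signal + noise_signal        # combined output audio signal

speech = np.sin(2 * np.pi * 200 * np.linspace(0, 1, 16000))
noise = 0.3 * np.random.randn(16000)
print(mix_to_target_snr(speech, noise, target_snr_db=12.0).shape)
```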
-
Publication number: 20240331716
Abstract: A device includes one or more processors configured to obtain audio data representing one or more audio signals. The audio data includes a first segment and a second segment subsequent to the first segment. The one or more processors are configured to perform one or more transform operations on the first segment to generate frequency-domain audio data. The one or more processors are configured to provide input data based on the frequency-domain audio data as input to one or more machine-learning models to generate a noise-suppression output. The one or more processors are configured to perform one or more reverse transform operations on the noise-suppression output to generate time-domain filter coefficients. The one or more processors are configured to perform time-domain filtering of the second segment using the time-domain filter coefficients to generate a noise-suppressed output signal.
Type: Application
Filed: March 20, 2024
Publication date: October 3, 2024
Inventors: Jacob Jon BEAN, Rogerio Guedes ALVES, Vahid MONTAZERI, Erik VISSER
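A sketch of the transform / reverse-transform flow, assuming the machine-learning model can be stood in for by a magnitude-based soft mask; the tap count and the `convolve` filtering are illustrative choices rather than the claimed design.

```python
import numpy as np

def ml_mask(spectrum_mag: np.ndarray) -> np.ndarray:
    """Stand-in for the machine-learning model: a magnitude-based soft mask."""
    return spectrum_mag / (spectrum_mag + np.median(spectrum_mag) + 1e-9)

def denoise_stream(first_segment: np.ndarray, second_segment: np.ndarray,
                   num_taps: int = 64) -> np.ndarray:
    # 1) Transform the first segment to the frequency domain.
    spectrum = np.fft.rfft(first_segment)
    # 2) The noise-suppression output here is a mask over frequency bins.
    mask = ml_mask(np.abs(spectrum))
    # 3) Reverse-transform the mask into time-domain filter coefficients.
    coeffs = np.fft.irfft(mask)[:num_taps]
    # 4) Filter the *next* segment in the time domain with those coefficients.
    return np.convolve(second_segment, coeffs, mode="same")

seg1, seg2 = np.random.randn(512), np.random.randn(512)
print(denoise_stream(seg1, seg2).shape)  # (512,)
```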
-
Publication number: 20240331679
Abstract: This disclosure provides systems, methods, and devices for audio signal processing that support feedback cancellation in a personal audio amplification system. In a first aspect, a method of signal processing includes receiving an input audio signal, wherein the input audio signal includes a desired audio component and a feedback component; and reducing the feedback component by applying a machine learning model to the input audio signal to determine an output audio signal. Other aspects and features are also claimed and described.
Type: Application
Filed: March 20, 2024
Publication date: October 3, 2024
Inventors: Vahid Montazeri, Rogerio Guedes Alves, You Wang, Jacob Jon Bean, Erik Visser
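To make the problem concrete, here is a classical normalized-LMS feedback canceller used as a stand-in for the machine-learning model in the filing: it estimates the loudspeaker-to-microphone feedback path and subtracts it, leaving the desired component. This is a different technique from the claimed ML approach and is shown only to illustrate what "reducing the feedback component" means.

```python
import numpy as np

def cancel_feedback(mic_signal: np.ndarray, speaker_signal: np.ndarray,
                    num_taps: int = 32, mu: float = 0.01) -> np.ndarray:
    """Normalized LMS feedback canceller (classical stand-in, not the ML model)."""
    w = np.zeros(num_taps)
    out = np.zeros_like(mic_signal)
    for n in range(num_taps, len(mic_signal)):
        x = speaker_signal[n - num_taps:n][::-1]   # recent loudspeaker samples
        feedback_est = w @ x
        e = mic_signal[n] - feedback_est           # estimate of the desired audio
        w += mu * e * x / (x @ x + 1e-9)           # normalized LMS update
        out[n] = e
    return out

spk = np.random.randn(4000)
mic = 0.5 * np.roll(spk, 5) + 0.1 * np.random.randn(4000)  # feedback + desired
print(cancel_feedback(mic, spk).shape)
```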
-
Patent number: 12069425
Abstract: A wearable device may include a processor configured to detect a self-voice signal, based on one or more transducers. The processor may be configured to separate the self-voice signal from a background signal in an external audio signal based on using a multi-microphone speech generative network. The processor may also be configured to apply a first filter to an external audio signal, detected by at least one external microphone on the wearable device, during a listen through operation based on an activation of the audio zoom feature to generate a first listen-through signal that includes the external audio signal. The processor may be configured to produce an output audio signal that is based on at least the first listen-through signal that includes the external signal, and is based on the detected self-voice signal.
Type: Grant
Filed: July 10, 2023
Date of Patent: August 20, 2024
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Dongmei Wang, Fatemeh Saki, Taher Shahbazi Mirzahasanloo, Erik Visser, Rogerio Guedes Alves
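A simplified sketch of the output stage only, assuming the self-voice signal has already been separated (the multi-microphone generative network is not shown) and that the "first filter" and audio-zoom boost can be illustrated with a smoothing kernel and a gain.

```python
import numpy as np

def listen_through_mix(external_sig: np.ndarray, self_voice: np.ndarray,
                       zoom_active: bool, zoom_gain: float = 1.5) -> np.ndarray:
    """Combine a listen-through version of the external signal with the
    separated self-voice signal (the separation itself is not shown)."""
    # Illustrative "first filter" for listen-through: a crude smoothing filter;
    # the audio zoom simply boosts the external path when active.
    kernel = np.ones(8) / 8.0
    listen_through = np.convolve(external_sig, kernel, mode="same")
    if zoom_active:
        listen_through *= zoom_gain
    # Output audio is based on the listen-through signal and the self-voice.
    return listen_through + self_voice

ext = np.random.randn(16000)
voice = 0.2 * np.sin(2 * np.pi * 180 * np.linspace(0, 1, 16000))
print(listen_through_mix(ext, voice, zoom_active=True).shape)
```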
-
Patent number: 12063490
Abstract: Methods, systems, and devices for signal processing are described. Generally, as provided for by the described techniques, a wearable device may receive an input audio signal from one or more outer microphones, an input audio signal from one or more inner microphones, and a bone conduction signal from a bone conduction sensor based on the input audio signals. The wearable device may filter the bone conduction signal based on a set of frequencies of the input audio signals, such as a low frequency portion of the input audio signals. For example, the wearable device may apply a filter to the bone conduction signal that accounts for an error in the input audio signals. The wearable device may add a gain to the filtered bone conduction signal and may equalize the filtered bone conduction signal based on the gain. The wearable device may output an audio signal to a speaker.
Type: Grant
Filed: February 10, 2023
Date of Patent: August 13, 2024
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Rogerio Guedes Alves, Jacob Jon Bean, Erik Visser
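A rough sketch of the bone-conduction path, assuming an FFT-based low-pass filter, a fixed gain, and a level-matching step as a crude stand-in for the equalization described; cutoff, gain, and the blending with the microphone signal are illustrative values.

```python
import numpy as np

def enhance_with_bone_conduction(outer_mic: np.ndarray, bc_signal: np.ndarray,
                                 fs: int = 16000, cutoff_hz: float = 800.0,
                                 bc_gain: float = 2.0) -> np.ndarray:
    """Keep the low-frequency band of the bone-conduction signal, apply a gain
    and a crude equalization, and blend it with the microphone signal."""
    spectrum = np.fft.rfft(bc_signal)
    freqs = np.fft.rfftfreq(len(bc_signal), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0                        # low-pass the BC signal
    bc_low = np.fft.irfft(spectrum, n=len(bc_signal))
    bc_low *= bc_gain                                        # add gain
    bc_low *= np.std(outer_mic) / (np.std(bc_low) + 1e-9)    # crude equalization
    return outer_mic + bc_low                                # signal sent to speaker

mic = np.random.randn(16000)
bc = np.random.randn(16000)
print(enhance_with_bone_conduction(mic, bc).shape)
```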
-
Patent number: 12051429
Abstract: A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients at the different time segments represent a soundfield at the different time segments. The one or more processors are configured to apply one adaptive network, based on a constraint that includes preservation of a spatial direction of one or more audio sources in the soundfield at the different time segments, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients at the different time segments represent a modified soundfield at the different time segments, that was modified based on the constraint.
Type: Grant
Filed: April 24, 2023
Date of Patent: July 30, 2024
Assignee: QUALCOMM Incorporated
Inventors: Lae-Hoon Kim, Shankar Thagadur Shivappa, S M Akramus Salehin, Shuhua Zhang, Erik Visser
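A toy check of what the direction-preservation constraint means for first-order ambisonics (W, X, Y, Z): the "adaptive network" is replaced here by a trivial uniform gain, which changes the soundfield level while leaving the estimated source direction unchanged. The adaptive network in the patent is of course far more general; this only illustrates the constraint being measured.

```python
import numpy as np

def direction(foa: np.ndarray) -> np.ndarray:
    """Unit direction vector estimated from first-order ambisonic channels (W, X, Y, Z)."""
    w, x, y, z = foa
    v = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    return v / (np.linalg.norm(v) + 1e-12)

def transform_preserving_direction(foa: np.ndarray, gain: float) -> np.ndarray:
    """Toy 'adaptive network': a uniform gain over all ambisonic coefficients,
    which modifies the soundfield but preserves each source's spatial direction."""
    return gain * foa

# Fake FOA coefficients for a single source roughly toward +X.
t = np.linspace(0, 1, 8000)
s = np.sin(2 * np.pi * 300 * t)
foa = np.stack([s, s, 0.1 * s, 0.0 * s])             # W, X, Y, Z
out = transform_preserving_direction(foa, gain=0.5)  # transformed coefficients
print(np.allclose(direction(foa), direction(out)))   # True: direction preserved
```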
-
Publication number: 20240232258
Abstract: A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).
Type: Application
Filed: May 31, 2023
Publication date: July 11, 2024
Inventors: Rehana MAHFUZ, Yinyi GUO, Erik VISSER
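A minimal sketch of the similarity search, assuming a toy deterministic embedder stands in for the learned caption encoder (so the returned ordering is arbitrary); the file names, captions, and cosine-similarity metric are illustrative.

```python
import numpy as np

def embed_caption(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedder standing in for a learned caption encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def search(query: str, media_captions: dict, top_k: int = 2) -> list:
    query_emb = embed_caption(query)                       # query caption embedding
    scored = []
    for media_file, caption in media_captions.items():
        sim = float(query_emb @ embed_caption(caption))    # similarity metric
        scored.append((sim, media_file))
    scored.sort(reverse=True)
    return [f for _, f in scored[:top_k]]                  # search results

captions = {"clip1.wav": "a dog barking near a busy street",
            "clip2.wav": "rain falling on a metal roof",
            "clip3.wav": "a dog barking in a quiet park"}
print(search("dog barking outdoors", captions))
```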