Patents by Inventor Erik Visser

Erik Visser has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250118318
    Abstract: A device includes a memory configured to store audio data. The device also includes one or more processors configured to use a first machine-learning model to process first audio data to generate first spatial sector audio data. The first spatial sector audio data is associated with a first spatial sector. The one or more processors are also configured to use a second machine-learning model to process second audio data to generate second spatial sector audio data. The second spatial sector audio data is associated with a second spatial sector. The one or more processors are further configured to generate output data based on the first spatial sector audio data, the second spatial sector audio data, or both.
    Type: Application
    Filed: October 4, 2024
    Publication date: April 10, 2025
    Inventors: Vahid MONTAZERI, Rogerio Guedes ALVES, Erik VISSER
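The per-sector flow in this abstract can be sketched as a toy pipeline. The "sector models" below are illustrative gain functions standing in for the machine-learning models in the filing; the combination step is a plain sample-wise sum chosen for clarity:

```python
def sector_model(audio, gain):
    # stand-in for a learned spatial-sector model: plain attenuation
    return [s * gain for s in audio]

def process_sectors(audio):
    front = sector_model(audio, 1.0)    # first spatial sector
    rear = sector_model(audio, 0.25)    # second spatial sector
    # output data is generated from both sector streams (here: a sum)
    return [f + r for f, r in zip(front, rear)]

out = process_sectors([0.1, -0.2, 0.3])
```

A real system would route beamformed or sector-isolated audio into each model; the structure (one model per sector, then a combining stage) is the point of the sketch.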
  • Publication number: 20250119704
    Abstract: A device includes a memory configured to store audio data. The device also includes one or more processors configured to obtain, from first audio data, first subband audio data and second subband audio data. The first subband audio data is associated with a first frequency subband and the second subband audio data is associated with a second frequency subband. The one or more processors are also configured to use a first machine-learning model to process the first subband audio data to generate first subband noise suppressed audio data. The one or more processors are further configured to use a second machine-learning model to process the second subband audio data to generate second subband noise suppressed audio data. The one or more processors are also configured to generate output data based on the first subband noise suppressed audio data and the second subband noise suppressed audio data.
    Type: Application
    Filed: October 4, 2024
    Publication date: April 10, 2025
    Inventors: Vahid MONTAZERI, Rogerio Guedes ALVES, Erik VISSER
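The subband pipeline in this abstract (split, per-band model, recombine) might look like the following sketch. The two-tap averaging split and the fixed attenuation are toy stand-ins, not the analysis filter bank or trained models of the filing:

```python
def split_bands(x):
    # crude two-band split: 2-tap average as the low band, residual as
    # the high band (a stand-in for a real analysis filter bank)
    low = [x[0]] + [(x[i] + x[i - 1]) / 2 for i in range(1, len(x))]
    high = [xi - li for xi, li in zip(x, low)]
    return low, high

def suppress(band, keep):
    # placeholder for a per-subband noise-suppression model
    return [s * keep for s in band]

def denoise(x):
    low, high = split_bands(x)
    # a separate (hypothetical) model per subband, then recombine
    return [l + h for l, h in zip(suppress(low, 1.0), suppress(high, 0.5))]

y = denoise([0.2, 0.6, 0.4, 0.0])
```

Note the split is exactly invertible (low plus high reconstructs the input), so all signal alteration comes from the per-band suppression stage.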
  • Publication number: 20250103888
    Abstract: A device includes one or more processors configured to receive sensor data from one or more sensor devices. The one or more processors are also configured to determine a context of the device based on the sensor data. The one or more processors are further configured to select a model based on the context. The one or more processors are also configured to process an input signal using the model to generate a context-specific output.
    Type: Application
    Filed: December 5, 2024
    Publication date: March 27, 2025
    Inventors: Fatemeh SAKI, Yinyi GUO, Erik VISSER
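The select-a-model-by-context idea reduces to a dispatch table. The threshold rule and the two lambda "models" below are invented placeholders for the classifier and models described in the filing:

```python
def infer_context(sensor_data):
    # toy rule standing in for context classification from sensor data
    return "noisy" if sensor_data["ambient_db"] > 60 else "quiet"

MODELS = {
    "noisy": lambda x: [s * 0.5 for s in x],  # aggressive suppression
    "quiet": lambda x: list(x),               # pass-through
}

def process(sensor_data, signal):
    model = MODELS[infer_context(sensor_data)]  # model selected by context
    return model(signal)
```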
  • Publication number: 20250078818
    Abstract: Systems and techniques are described for generating and using unimodal/multimodal generative models that mitigate hallucinations. For example, a computing device can encode input data to generate encoded representations of the input data. The computing device can obtain intermediate data including a plurality of partial sentences associated with the input data and can generate, based on the intermediate data, at least one complete sentence associated with the input data. The computing device can encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence. The computing device can generate a faithfulness score based on a comparison of the encoded representations of the input data and the at least one encoded representation of the at least one complete sentence. The computing device can re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data.
    Type: Application
    Filed: February 28, 2024
    Publication date: March 6, 2025
    Inventors: Arvind Krishna SRIDHAR, Rehana MAHFUZ, Erik VISSER, Yinyi GUO
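The faithfulness-scoring loop can be sketched with a toy encoder: encode the input, encode each completed sentence, and re-rank by similarity. The bag-of-words encoder and cosine metric here are simplifying assumptions, not the learned encoders of the filing:

```python
import math
from collections import Counter

def encode(text):
    # toy bag-of-words encoder standing in for the learned encoders
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def rerank_by_faithfulness(input_text, completions):
    # score each completed sentence against the input encoding and
    # re-rank, most faithful first
    src = encode(input_text)
    return sorted(completions, key=lambda c: cosine(encode(c), src),
                  reverse=True)
```

Completions that share little content with the input score low and sink in the ranking, which is the hallucination-mitigation mechanism the abstract describes.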
  • Publication number: 20250077177
    Abstract: In general, techniques are described that enable voice activation for computing devices. A computing device configured to support an audible interface that comprises a memory and one or more processors may be configured to perform the techniques. The memory may store a first audio signal representative of an environment external to a user associated with the computing device and a second audio signal sensed by a microphone coupled to a housing of the computing device. The one or more processors may verify, based on the first audio signal and the second audio signal, that the user activated the audible interface of the computing device, and obtain, based on the verification, additional audio signals representative of one or more audible commands.
    Type: Application
    Filed: November 20, 2024
    Publication date: March 6, 2025
    Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Lae-Hoon Kim, Erik Visser, Dongmei Wang, Fatemeh Saki
  • Publication number: 20250078810
    Abstract: Systems and techniques described herein relate to a diffusion-based model for generating converted speech from a source speech based on target speech. For example, a device may extract first prosody data from input data and may generate a content embedding based on the input data. The device may extract second prosody data from target speech, generate a speaker embedding from the target speech, and generate a prosody embedding from the second prosody data. The device may generate, based on the first prosody data and the prosody embedding, converted prosody data. The device may then generate a converted spectrogram based on the converted prosody data, the speaker embedding, and the content embedding.
    Type: Application
    Filed: October 25, 2023
    Publication date: March 6, 2025
    Inventors: Kyungguen BYUN, Sunkuk MOON, Erik VISSER
  • Publication number: 20250078828
    Abstract: Systems and techniques are provided for natural language processing. A system generates a plurality of tokens (e.g., words or portions thereof) based on input content (e.g., text and/or speech). The system searches through the plurality of tokens to generate a first ranking of the plurality of tokens based on probability. The system generates natural language inference (NLI) scores for the plurality of tokens to generate a second ranking of the plurality of tokens based on faithfulness to the input content (e.g., whether the tokens produce statements that are true based on the input content). The system generates output text that includes at least one token selected from the plurality of tokens based on the first ranking and the second ranking.
    Type: Application
    Filed: August 21, 2024
    Publication date: March 6, 2025
    Inventors: Rehana MAHFUZ, Yinyi GUO, Arvind Krishna SRIDHAR, Erik VISSER
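Combining the two rankings can be sketched as a weighted blend of score tables. The linear mix and the `alpha` weight are assumptions for illustration; the filing does not specify how the rankings are combined:

```python
def select_token(lm_scores, nli_scores, alpha=0.5):
    # blend the probability ranking with the faithfulness (NLI) ranking;
    # alpha = 1.0 falls back to pure probability-based selection
    combined = {t: alpha * lm_scores[t] + (1 - alpha) * nli_scores.get(t, 0.0)
                for t in lm_scores}
    return max(combined, key=combined.get)
```

With this blend, a token the language model prefers can still lose to a more faithful alternative, which is the behavior the abstract aims for.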
  • Patent number: 12244994
    Abstract: A first device includes a memory configured to store instructions and one or more processors configured to receive audio signals from multiple microphones. The one or more processors are configured to process the audio signals to generate direction-of-arrival information corresponding to one or more sources of sound represented in one or more of the audio signals. The one or more processors are also configured to send, to a second device, data based on the direction-of-arrival information and a class or embedding associated with the direction-of-arrival information.
    Type: Grant
    Filed: July 25, 2022
    Date of Patent: March 4, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Erik Visser, Fatemeh Saki, Yinyi Guo, Lae-Hoon Kim, Rogerio Guedes Alves, Hannes Pessentheiner
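Direction-of-arrival estimation from multiple microphones typically starts with a time-difference-of-arrival measurement. This brute-force cross-correlation is a generic textbook sketch, not the method claimed in the patent:

```python
def tdoa(sig_a, sig_b, max_lag):
    # brute-force cross-correlation: the peak lag is the sample delay
    # between the two microphone signals; combined with mic spacing and
    # sample rate, that delay yields a direction of arrival
    best, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        c = sum(sig_a[i] * sig_b[i - lag]
                for i in range(len(sig_a)) if 0 <= i - lag < len(sig_b))
        if c > best:
            best, best_lag = c, lag
    return best_lag
```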
  • Patent number: 12238497
    Abstract: Gesture-responsive modification of a generated sound field is described.
    Type: Grant
    Filed: November 13, 2023
    Date of Patent: February 25, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Pei Xiang, Erik Visser
  • Patent number: 12198057
    Abstract: A device includes one or more processors configured to receive sensor data from one or more sensor devices. The one or more processors are also configured to determine a context of the device based on the sensor data. The one or more processors are further configured to select a model based on the context. The one or more processors are also configured to process an input signal using the model to generate a context-specific output.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: January 14, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Fatemeh Saki, Yinyi Guo, Erik Visser
  • Patent number: 12200450
    Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.
    Type: Grant
    Filed: May 26, 2023
    Date of Patent: January 14, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Sunkuk Moon, Erik Visser, Prajakt Kulkarni
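The shared-frontend architecture (one network output fanned out to several speech application modules) can be sketched as follows. The feature vector and the two toy "applications" are invented placeholders:

```python
def shared_frontend(audio):
    # stand-in for the shared speech network: one feature vector,
    # computed once for all downstream speech application modules
    return [sum(audio) / len(audio), max(audio)]

APPS = {
    "keyword": lambda feats: feats[1] > 0.5,     # toy keyword detector
    "vad": lambda feats: abs(feats[0]) > 0.01,   # toy voice-activity check
}

def run_all(audio):
    feats = shared_frontend(audio)   # the common input to every module
    return {name: app(feats) for name, app in APPS.items()}
```

The design point is that the expensive network runs once, and each application module consumes the same network output rather than re-processing the raw audio.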
  • Publication number: 20240419731
    Abstract: A device includes a processor configured to obtain a first audio embedding of a first audio segment and obtain a first text embedding of a first tag assigned to the first audio segment. The first audio segment corresponds to a first audio event of audio events. The processor is configured to obtain a first event representation based on a combination of the first audio embedding and the first text embedding. The processor is configured to obtain a second event representation of a second audio event of the audio events. The processor is also configured to determine, based on knowledge data, relations between the audio events. The processor is configured to construct an audio scene graph based on a temporal order of the audio events. The audio scene graph is constructed to include a first node corresponding to the first audio event and a second node corresponding to the second audio event.
    Type: Application
    Filed: June 10, 2024
    Publication date: December 19, 2024
    Inventors: Arvind Krishna SRIDHAR, Yinyi GUO, Erik VISSER
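The temporal-ordering part of the scene-graph construction can be sketched minimally. The knowledge-data relations of the filing are replaced here by a single assumed "precedes" relation between consecutive events:

```python
def build_scene_graph(events):
    # events: (tag, start_time) pairs; nodes are ordered temporally and
    # each event is linked to its successor
    ordered = sorted(events, key=lambda e: e[1])
    nodes = [tag for tag, _ in ordered]
    edges = [(nodes[i], "precedes", nodes[i + 1])
             for i in range(len(nodes) - 1)]
    return nodes, edges
```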
  • Patent number: 12153858
    Abstract: In general, techniques are described that enable voice activation for computing devices. A computing device configured to support an audible interface that comprises a memory and one or more processors may be configured to perform the techniques. The memory may store a first audio signal representative of an environment external to a user associated with the computing device and a second audio signal sensed by a microphone coupled to a housing of the computing device. The one or more processors may verify, based on the first audio signal and the second audio signal, that the user activated the audible interface of the computing device, and obtain, based on the verification, additional audio signals representative of one or more audible commands.
    Type: Grant
    Filed: February 25, 2020
    Date of Patent: November 26, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Lae-Hoon Kim, Erik Visser, Dongmei Wang, Fatemeh Saki
  • Publication number: 20240334125
    Abstract: A device includes one or more processors configured to obtain data specifying a target signal-to-noise ratio based on a hearing condition of a person and to obtain audio data representing one or more audio signals. The one or more processors are configured to determine, based on the target signal-to-noise ratio, a first gain to apply to first components of the audio data and a second gain to apply to second components of the audio data. The one or more processors are configured to apply the first gain to the first components of the audio data to generate a target signal and to apply the second gain to the second components of the audio data to generate a noise signal. The one or more processors are further configured to combine the target signal and the noise signal to generate an output audio signal.
    Type: Application
    Filed: May 24, 2023
    Publication date: October 3, 2024
    Inventors: Rogerio Guedes ALVES, Jacob Jon BEAN, Erik VISSER
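The gain arithmetic behind mixing target and noise components to a prescribed SNR can be worked through in a few lines. Holding the target at unit gain and scaling only the noise is a simplifying assumption; the filing computes a gain for each component:

```python
import math

def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))

def mix_to_target_snr(target, noise, target_snr_db):
    # keep the target component at unit gain and scale the noise
    # component so the mixture lands at the requested SNR
    current_snr_db = 20 * math.log10(rms(target) / rms(noise))
    noise_gain = 10 ** ((current_snr_db - target_snr_db) / 20)
    return [t + noise_gain * n for t, n in zip(target, noise)]
```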
  • Publication number: 20240331716
    Abstract: A device includes one or more processors configured to obtain audio data representing one or more audio signals. The audio data includes a first segment and a second segment subsequent to the first segment. The one or more processors are configured to perform one or more transform operations on the first segment to generate frequency-domain audio data. The one or more processors are configured to provide input data based on the frequency-domain audio data as input to one or more machine-learning models to generate a noise-suppression output. The one or more processors are configured to perform one or more reverse transform operations on the noise-suppression output to generate time-domain filter coefficients. The one or more processors are configured to perform time-domain filtering of the second segment using the time-domain filter coefficients to generate a noise-suppressed output signal.
    Type: Application
    Filed: March 20, 2024
    Publication date: October 3, 2024
    Inventors: Jacob Jon BEAN, Rogerio Guedes ALVES, Vahid MONTAZERI, Erik VISSER
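The reverse-transform step (frequency-domain model output turned into time-domain filter taps, then applied to the next segment) can be sketched with a plain inverse DFT. The all-pass response below is a stand-in for the machine-learning output; it should reduce to a unit-impulse filter:

```python
import cmath

def idft(spectrum):
    # reverse transform: frequency response -> time-domain filter taps
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def fir_filter(x, taps):
    # time-domain filtering of the next segment with those taps
    return [sum(taps[k] * x[i - k] for k in range(len(taps)) if i - k >= 0)
            for i in range(len(x))]

response = [1.0 + 0j] * 4          # hypothetical model output (all-pass)
taps = idft(response)              # -> [1, 0, 0, 0], a unit impulse
filtered = fir_filter([0.5, -0.25, 0.1, 0.0], taps)
```

Filtering the time-domain segment with transformed coefficients, rather than masking in the frequency domain, is what lets the second segment be processed with low latency.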
  • Publication number: 20240331679
    Abstract: This disclosure provides systems, methods, and devices for audio signal processing that support feedback cancellation in a personal audio amplification system. In a first aspect, a method of signal processing includes receiving an input audio signal, wherein the input audio signal includes a desired audio component and a feedback component; and reducing the feedback component by applying a machine learning model to the input audio signal to determine an output audio signal. Other aspects and features are also claimed and described.
    Type: Application
    Filed: March 20, 2024
    Publication date: October 3, 2024
    Inventors: Vahid Montazeri, Rogerio Guedes Alves, You Wang, Jacob Jon Bean, Erik Visser
  • Patent number: 12069425
    Abstract: A wearable device may include a processor configured to detect a self-voice signal, based on one or more transducers. The processor may be configured to separate the self-voice signal from a background signal in an external audio signal based on using a multi-microphone speech generative network. The processor may also be configured to apply a first filter to an external audio signal, detected by at least one external microphone on the wearable device, during a listen through operation based on an activation of the audio zoom feature to generate a first listen-through signal that includes the external audio signal. The processor may be configured to produce an output audio signal that is based on at least the first listen-through signal that includes the external audio signal, and is based on the detected self-voice signal.
    Type: Grant
    Filed: July 10, 2023
    Date of Patent: August 20, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Dongmei Wang, Fatemeh Saki, Taher Shahbazi Mirzahasanloo, Erik Visser, Rogerio Guedes Alves
  • Patent number: 12063490
    Abstract: Methods, systems, and devices for signal processing are described. Generally, as provided for by the described techniques, a wearable device may receive an input audio signal from one or more outer microphones, an input audio signal from one or more inner microphones, and a bone conduction signal from a bone conduction sensor based on the input audio signals. The wearable device may filter the bone conduction signal based on a set of frequencies of the input audio signals, such as a low frequency portion of the input audio signals. For example, the wearable device may apply a filter to the bone conduction signal that accounts for an error in the input audio signals. The wearable device may add a gain to the filtered bone conduction signal and may equalize the filtered bone conduction signal based on the gain. The wearable device may output an audio signal to a speaker.
    Type: Grant
    Filed: February 10, 2023
    Date of Patent: August 13, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Rogerio Guedes Alves, Jacob Jon Bean, Erik Visser
  • Patent number: 12051429
    Abstract: A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients at the different time segments represent a soundfield at the different time segments. The one or more processors are configured to apply one adaptive network, based on a constraint that includes preservation of a spatial direction of one or more audio sources in the soundfield at the different time segments, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients at the different time segments represent a modified soundfield at the different time segments, that was modified based on the constraint.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: July 30, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Shankar Thagadur Shivappa, S M Akramus Salehin, Shuhua Zhang, Erik Visser
  • Publication number: 20240232258
    Abstract: A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).
    Type: Application
    Filed: May 31, 2023
    Publication date: July 11, 2024
    Inventors: Rehana MAHFUZ, Yinyi GUO, Erik VISSER
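The caption-embedding search can be sketched as similarity-ranked retrieval over a file repository. The bag-of-words embedding and cosine similarity below are toy substitutes for the learned caption embeddings and similarity metric of the filing:

```python
import math
from collections import Counter

def embed(caption):
    # toy bag-of-words embedding standing in for a caption encoder
    return Counter(caption.lower().split())

def similarity(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def search(query, repository):
    # repository maps media file names to their sound captions; files are
    # ranked by similarity between caption and query embeddings
    q = embed(query)
    return sorted(repository,
                  key=lambda f: similarity(embed(repository[f]), q),
                  reverse=True)
```

In the described system the embeddings would be precomputed for the whole repository, so each query reduces to a nearest-neighbor lookup over stored vectors.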