Patents by Inventor Erik Visser

Erik Visser has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12051429
    Abstract: A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients at the different time segments represent a soundfield at the different time segments. The one or more processors are configured to apply one adaptive network, based on a constraint that includes preservation of a spatial direction of one or more audio sources in the soundfield at the different time segments, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients at the different time segments represent a modified soundfield at the different time segments that was modified based on the constraint.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: July 30, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Shankar Thagadur Shivappa, S M Akramus Salehin, Shuhua Zhang, Erik Visser
  • Publication number: 20240232258
    Abstract: A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).
    Type: Application
    Filed: May 31, 2023
    Publication date: July 11, 2024
    Inventors: Rehana MAHFUZ, Yinyi GUO, Erik VISSER
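
A minimal sketch of the retrieval flow described in the entry above (publication 20240232258): a text query is embedded, compared against precomputed sound-caption embeddings, and the media files associated with the best-matching embeddings are returned as search results. The encoder function, the cosine similarity metric, and the repository layout are assumptions for illustration, not details taken from the filing.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity metric between a query embedding and one caption embedding."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def search_media_files(query, embed_fn, caption_embeddings, media_files, top_k=5):
    """Return media files whose sound-caption embeddings best match the query.

    query              -- natural-language text query
    embed_fn           -- maps text to a fixed-size vector (hypothetical encoder)
    caption_embeddings -- dict: media file id -> caption embedding (np.ndarray)
    media_files        -- dict: media file id -> file metadata
    """
    query_embedding = embed_fn(query)
    scored = [
        (cosine_similarity(query_embedding, emb), file_id)
        for file_id, emb in caption_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Search results identify the media files associated with the selected embeddings.
    return [media_files[file_id] for _, file_id in scored[:top_k]]
```
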
  • Publication number: 20240221752
    Abstract: In an aspect, a user equipment receives, via a microphone, an utterance from a user and determines, using radio frequency sensing, that the user performed a gesture while making the utterance. The user equipment determines an object associated with the gesture and transmits an enhanced directive to an application programming interface (API) of a smart assistant device. The enhanced directive is determined based on the object, the gesture, and the utterance. The enhanced directive causes the smart assistant device to perform an action.
    Type: Application
    Filed: May 5, 2022
    Publication date: July 4, 2024
    Inventors: Jason FILOS, Xiaoxin ZHANG, Lae-Hoon KIM, Erik VISSER
  • Publication number: 20240184988
    Abstract: Systems and techniques are provided for natural language processing. A system generates a plurality of tokens (e.g., words or portions thereof) based on input content (e.g., text and/or speech). The system searches through the plurality of tokens to generate a first ranking of the plurality of tokens based on probability. The system generates natural language inference (NLI) scores for the plurality of tokens to generate a second ranking of the plurality of tokens based on faithfulness to the input content (e.g., whether the tokens produce statements that are true based on the input content). The system generates output text that includes at least one token selected from the plurality of tokens based on the first ranking and the second ranking.
    Type: Application
    Filed: March 30, 2023
    Publication date: June 6, 2024
    Inventors: Arvind Krishna SRIDHAR, Erik VISSER
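
Publication 20240184988 above describes choosing output tokens from two rankings, one based on language-model probability and one based on NLI faithfulness scores. The sketch below shows one plausible way to fuse the two rankings; the rank-averaging scheme and the alpha weight are assumptions for illustration.

```python
def select_token(candidates, lm_probs, nli_scores, alpha=0.5):
    """Pick a token using both rankings described in the abstract.

    candidates -- candidate tokens (e.g., from a beam or top-k search)
    lm_probs   -- dict: token -> language-model probability (first ranking)
    nli_scores -- dict: token -> NLI entailment score against the input
                  content (second ranking, higher = more faithful)
    alpha      -- illustrative weight balancing fluency vs. faithfulness
    """
    def rank(scores):
        # Map each token to its rank position (0 = best) under one criterion.
        ordered = sorted(candidates, key=lambda tok: scores[tok], reverse=True)
        return {tok: i for i, tok in enumerate(ordered)}

    prob_rank = rank(lm_probs)
    nli_rank = rank(nli_scores)
    # Lower combined rank wins: a token must score well on both criteria.
    combined = {t: alpha * prob_rank[t] + (1 - alpha) * nli_rank[t] for t in candidates}
    return min(candidates, key=lambda t: combined[t])
```
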
  • Patent number: 12002455
    Abstract: A device includes a memory configured to store instructions. The device also includes one or more processors configured to execute the instructions to provide context and one or more items of interest corresponding to the context to a dependency network encoder to generate a semantic-based representation of the context. The one or more processors are also configured to provide the context to a data dependent encoder to generate a context-based representation. The one or more processors are further configured to combine the semantic-based representation and the context-based representation to generate a semantically-augmented representation of the context.
    Type: Grant
    Filed: July 22, 2021
    Date of Patent: June 4, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Arvind Krishna Sridhar, Ravi Choudhary, Lae-Hoon Kim, Erik Visser
  • Publication number: 20240155303
    Abstract: Disclosed are systems and techniques for detecting audio sources and configuring acoustic device settings. For instance, a wireless device can obtain a first set of radio frequency (RF) sensing data associated with a first plurality of received waveforms corresponding to a first transmitted waveform reflected off of a plurality of reflectors. Based on the first set of RF sensing data, the wireless device can determine a classification of a first reflector from the plurality of reflectors. The wireless device can determine at least one acoustic setting based on the classification of the first reflector.
    Type: Application
    Filed: May 2, 2022
    Publication date: May 9, 2024
    Inventors: Erik VISSER, Lae-Hoon KIM, Jason FILOS, Xiaoxin ZHANG
  • Publication number: 20240134908
    Abstract: A device includes one or more processors configured to generate one or more query caption embeddings based on a query. The processor(s) are further configured to select one or more caption embeddings from among a set of embeddings associated with a set of media files of a file repository. Each caption embedding represents a corresponding sound caption, and each sound caption includes a natural-language text description of a sound. The caption embedding(s) are selected based on a similarity metric indicative of similarity between the caption embedding(s) and the query caption embedding(s). The processor(s) are further configured to generate search results identifying one or more first media files of the set of media files. Each of the first media file(s) is associated with at least one of the caption embedding(s).
    Type: Application
    Filed: May 30, 2023
    Publication date: April 25, 2024
    Inventors: Rehana MAHFUZ, Yinyi GUO, Erik VISSER
  • Publication number: 20240135959
    Abstract: A device to perform target sound detection includes a memory including a buffer configured to store audio data. The device includes one or more processors coupled to the memory. The one or more processors are configured to receive the audio data from the buffer. The one or more processors are configured to detect the presence or absence of one or more target non-speech sounds in the audio data. The one or more processors are further configured to generate a user interface signal indicating that one of the one or more target non-speech sounds has been detected, and to provide the user interface signal to an output device.
    Type: Application
    Filed: December 18, 2023
    Publication date: April 25, 2024
    Inventors: Prajakt KULKARNI, Yinyi GUO, Erik VISSER
  • Publication number: 20240098420
    Abstract: Gesture-responsive modification of a generated sound field is described.
    Type: Application
    Filed: November 13, 2023
    Publication date: March 21, 2024
    Inventors: Pei Xiang, Erik Visser
  • Publication number: 20240087597
    Abstract: A device includes one or more processors configured to process an input audio spectrum of input speech to detect a first characteristic associated with the input speech. The one or more processors are also configured to select, based at least in part on the first characteristic, one or more reference embeddings from among multiple reference embeddings. The one or more processors are further configured to process a representation of source speech, using the one or more reference embeddings, to generate an output audio spectrum of output speech.
    Type: Application
    Filed: September 13, 2022
    Publication date: March 14, 2024
    Inventors: Kyungguen BYUN, Sunkuk MOON, Erik VISSER
  • Patent number: 11869478
    Abstract: A device includes one or more processors configured to receive an input audio signal. The one or more processors are also configured to process the input audio signal based on a combined representation of multiple sound sources to generate an output audio signal. The combined representation is used to selectively retain or remove sounds of the multiple sound sources from the input audio signal. The one or more processors are further configured to provide the output audio signal to a second device.
    Type: Grant
    Filed: March 18, 2022
    Date of Patent: January 9, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Siddhartha Goutham Swaminathan, Sunkuk Moon, Shuhua Zhang, Erik Visser
  • Patent number: 11862189
    Abstract: A device to perform target sound detection includes one or more processors. The one or more processors include a buffer configured to store audio data and a target sound detector. The target sound detector includes a first stage and a second stage. The first stage includes a binary target sound classifier configured to process the audio data. The first stage is configured to activate the second stage in response to detection of a target sound. The second stage is configured to receive the audio data from the buffer in response to the detection of the target sound.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: January 2, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Prajakt Kulkarni, Yinyi Guo, Erik Visser
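
Patent 11862189 above describes a two-stage target sound detector in which a lightweight binary classifier gates a heavier second stage that receives buffered audio. The sketch below is an assumed arrangement of that control flow; the classifier objects and buffer size are illustrative placeholders.

```python
from collections import deque

class TwoStageTargetSoundDetector:
    """Illustrative two-stage detector: a cheap always-on first stage gates a
    heavier second stage that re-examines buffered audio."""

    def __init__(self, first_stage, second_stage, buffer_frames=50):
        self.first_stage = first_stage     # binary classifier: frame -> bool
        self.second_stage = second_stage   # heavier model: list of frames -> label
        self.buffer = deque(maxlen=buffer_frames)  # stores recent audio frames

    def process_frame(self, frame):
        self.buffer.append(frame)
        # Stage 1: binary target sound classifier runs on every frame.
        if not self.first_stage(frame):
            return None
        # Stage 2 is activated only when stage 1 detects a target sound,
        # and receives the buffered audio for a more detailed decision.
        return self.second_stage(list(self.buffer))
```
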
  • Patent number: 11818560
    Abstract: Gesture-responsive modification of a generated sound field is described.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: November 14, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Pei Xiang, Erik Visser
  • Publication number: 20230353929
    Abstract: A wearable device may include a processor configured to detect a self-voice signal, based on one or more transducers. The processor may be configured to separate the self-voice signal from a background signal in an external audio signal using a multi-microphone speech generative network. The processor may also be configured to apply a first filter to an external audio signal, detected by at least one external microphone on the wearable device, during a listen-through operation based on an activation of an audio zoom feature to generate a first listen-through signal that includes the external audio signal. The processor may be configured to produce an output audio signal that is based on at least the first listen-through signal that includes the external audio signal, and is based on the detected self-voice signal.
    Type: Application
    Filed: July 10, 2023
    Publication date: November 2, 2023
    Inventors: Lae-Hoon KIM, Dongmei WANG, Fatemeh SAKI, Taher SHAHBAZI MIRZAHASANLOO, Erik VISSER, Rogerio Guedes ALVES
  • Patent number: 11805360
    Abstract: A device includes a memory configured to store instructions and one or more processors configured to execute the instructions. The one or more processors are configured to execute the instructions to receive audio data including a first audio frame corresponding to a first output of a first microphone and a second audio frame corresponding to a second output of a second microphone. The one or more processors are also configured to execute the instructions to provide the audio data to a first noise-suppression network and a second noise-suppression network. The first noise-suppression network is configured to generate a first noise-suppressed audio frame and the second noise-suppression network is configured to generate a second noise-suppressed audio frame. The one or more processors are further configured to execute the instructions to provide the noise-suppressed audio frames to an attention-pooling network. The attention-pooling network is configured to generate an output noise-suppressed audio frame.
    Type: Grant
    Filed: July 21, 2021
    Date of Patent: October 31, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Vahid Montazeri, Van Nguyen, Hannes Pessentheiner, Lae-Hoon Kim, Erik Visser, Rogerio Guedes Alves
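
Patent 11805360 above feeds two noise-suppressed audio frames into an attention-pooling network that produces a single output frame. A minimal sketch of one plausible form of such pooling, a learned softmax weighting over the two branch outputs, is shown below using PyTorch; the scoring layer and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Illustrative attention-pooling over two noise-suppressed audio frames.

    Each branch's frame is scored, the scores are softmax-normalized across the
    two branches, and the output frame is the weighted sum of the branch frames.
    """

    def __init__(self, frame_dim):
        super().__init__()
        self.score = nn.Linear(frame_dim, 1)  # assumed scoring layer

    def forward(self, frame_a, frame_b):
        # frames: (batch, frame_dim) outputs of the two noise-suppression networks
        frames = torch.stack([frame_a, frame_b], dim=1)     # (batch, 2, frame_dim)
        weights = torch.softmax(self.score(frames), dim=1)  # (batch, 2, 1)
        return (weights * frames).sum(dim=1)                # (batch, frame_dim)
```
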
  • Patent number: 11804233
    Abstract: A device includes one or more processors configured to perform signal processing including a linear transformation and a non-linear transformation of an input signal to generate a reference target signal. The reference target signal has a linear component associated with the linear transformation and a non-linear component associated with the non-linear transformation. The one or more processors are also configured to perform linear filtering of the input signal by controlling adaptation of the linear filtering to generate an output signal that substantially matches the linear component of the reference target signal.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: October 31, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Lae-Hoon Kim, Dongmei Wang, Cheng-Yu Hung, Erik Visser
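
Patent 11804233 above adapts a linear filter so that its output matches the linear component of a reference target signal that also contains a non-linear component. A normalized LMS update is one standard way to realize such controlled adaptation; the sketch below uses it purely as an illustrative stand-in for the adaptation scheme in the filing.

```python
import numpy as np

def nlms_adapt(input_signal, reference_target, num_taps=32, step=0.1, eps=1e-6):
    """Adapt a linear FIR filter toward the (linear component of the) reference.

    input_signal     -- 1-D array of input samples
    reference_target -- 1-D array produced by the reference path (linear plus
                        non-linear components); a linear filter can only track
                        its linear component
    Returns the filter output and the final tap weights.
    """
    weights = np.zeros(num_taps)
    output = np.zeros(len(input_signal))
    for n in range(num_taps - 1, len(input_signal)):
        x = input_signal[n - num_taps + 1:n + 1][::-1]  # most recent sample first
        output[n] = weights @ x
        error = reference_target[n] - output[n]
        # Normalized LMS update: step size scaled by the input energy.
        weights += step * error * x / (x @ x + eps)
    return output, weights
```
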
  • Patent number: 11805380
    Abstract: A device includes one or more processors configured to determine, based on data descriptive of two or more audio environments, a geometry of a mutual audio environment. The one or more processors are also configured to process audio data, based on the geometry of the mutual audio environment, for output at an audio device disposed in a first audio environment of the two or more audio environments.
    Type: Grant
    Filed: August 31, 2021
    Date of Patent: October 31, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: S M Akramus Salehin, Lae-Hoon Kim, Xiaoxin Zhang, Erik Visser
  • Publication number: 20230326477
    Abstract: A device to perform speech enhancement includes one or more processors configured to process image data to detect at least one of an emotion, a speaker characteristic, or a noise type. The one or more processors are also configured to generate context data based at least in part on the at least one of the emotion, the speaker characteristic, or the noise type. The one or more processors are further configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and the context data to generate output spectral data that represents a speech enhanced version of the input signal.
    Type: Application
    Filed: June 14, 2023
    Publication date: October 12, 2023
    Inventors: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI
  • Patent number: 11783809
    Abstract: A device includes a memory configured to store instructions and one or more processors configured to execute the instructions. The one or more processors are configured to execute the instructions to receive audio data including first audio data corresponding to a first output of a first microphone and second audio data corresponding to a second output of a second microphone. The one or more processors are also configured to execute the instructions to provide the audio data to a dynamic classifier. The dynamic classifier is configured to generate a classification output corresponding to the audio data. The one or more processors are further configured to execute the instructions to determine, at least partially based on the classification output, whether the audio data corresponds to user voice activity.
    Type: Grant
    Filed: May 5, 2021
    Date of Patent: October 10, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Taher Shahbazi Mirzahasanloo, Rogerio Guedes Alves, Erik Visser, Lae-Hoon Kim
  • Publication number: 20230300527
    Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.
    Type: Application
    Filed: May 26, 2023
    Publication date: September 21, 2023
    Inventors: Lae-Hoon KIM, Sunkuk MOON, Erik VISSER, Prajakt KULKARNI