Patents Assigned to SoundHound, Inc.

Method and system for acoustic model conditioning on non-phoneme information features

Patent number: 11741943

Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition are provided. The method includes using an encoder model to encode a sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

Type: Grant

Filed: April 7, 2021

Date of Patent: August 29, 2023

Assignee: SoundHound, Inc.

Inventors: Zizu Gowayyed, Keyvan Mohajer
Content filtering in media playing devices

Patent number: 11736769

Abstract: Various approaches relate to user-defined filtering, in media playing devices, of undesirable content in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify the content included in the data using various classification models and object and text recognition approaches. The identification and classification can then be used to control presentation of, and/or access to, the content or portions of it. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, or degraded) using one or more techniques (e.g., media replacement, media removal, or media degradation) and then presented.

Type: Grant

Filed: April 12, 2021

Date of Patent: August 22, 2023

Assignee: SoundHound, Inc.

Inventors: Thor S. Khov, Terry Kong
VIDEO CONFERENCE CAPTIONING

Publication number: 20230245661

Abstract: A video conferencing system, such as one implemented with a cloud server, receives audio streams from a plurality of endpoints. The system uses automatic speech recognition to transcribe speech in the audio streams. The system multiplexes the transcriptions into individual caption streams and sends them to the endpoints, but the caption stream to each endpoint omits the transcription of audio from the endpoint. Some systems allow muting of audio through an indication to the system. The system then omits sending the muted audio to other endpoints and also omits sending a transcription of the muted audio to other endpoints.

Type: Application

Filed: April 10, 2023

Publication date: August 3, 2023

Applicant: SoundHound, Inc.

Inventor: Ethan COEYTAUX
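
The routing rule in the VIDEO CONFERENCE CAPTIONING abstract (each endpoint receives every transcription except its own, and muted audio is captioned for nobody) can be sketched as below. This is a minimal illustration, not the applicant's implementation; the `Transcript` type, the `build_caption_streams` function, and the "speaker: text" caption format are hypothetical, and the ASR step is assumed to have already produced the transcriptions.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    endpoint: str  # endpoint whose audio produced this text
    text: str


def build_caption_streams(endpoints, transcripts, muted=frozenset()):
    """Map each endpoint to the list of captions it should receive.

    An endpoint's stream omits transcriptions of its own audio, and
    transcriptions of muted endpoints are sent to no one.
    """
    streams = {ep: [] for ep in endpoints}
    for t in transcripts:
        if t.endpoint in muted:
            continue  # muted audio is neither forwarded nor captioned
        for ep in endpoints:
            if ep != t.endpoint:
                streams[ep].append(f"{t.endpoint}: {t.text}")
    return streams
```

For example, with endpoints A, B, and C, where C is muted, A receives only B's captions and C still receives the captions of A and B.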
TOKEN CONFIDENCE SCORES FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20230245649

Abstract: Methods and systems for correcting a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a word that has a low confidence score with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphical user interface for labeling and dictionary enhancement.

Type: Application

Filed: February 3, 2022

Publication date: August 3, 2023

Applicant: SoundHound, Inc.

Inventors: Pranav SINGH, Saraswati MISHRA, Eunjee NA
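
The substitution step in the TOKEN CONFIDENCE SCORES abstract can be illustrated with a simple threshold rule. This sketch assumes the confidence scores are already available (the abstract mentions neural network models for producing them, which are not shown), and the `threshold` value and `substitutes` lookup table are hypothetical stand-ins for however a real system proposes replacements.

```python
def correct_transcription(tokens, threshold=0.5, substitutes=None):
    """Replace tokens whose confidence falls below a threshold.

    tokens: list of (word, confidence) pairs, assumed to come from an ASR decoder.
    substitutes: hypothetical mapping from a low-confidence word to a replacement;
    low-confidence words without an entry are kept unchanged.
    """
    substitutes = substitutes or {}
    corrected = []
    for word, confidence in tokens:
        if confidence < threshold and word in substitutes:
            corrected.append(substitutes[word])
        else:
            corrected.append(word)
    return " ".join(corrected)
```

For example, `correct_transcription([("turn", 0.96), ("on", 0.93), ("the", 0.9), ("lice", 0.21)], substitutes={"lice": "lights"})` yields `"turn on the lights"`, while a high-confidence "lice" would be left alone.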
METHOD AND APPARATUS FOR INTELLIGENT VOICE QUERY

Publication number: 20230237056

Abstract: A method and an apparatus for processing an intelligent voice query are disclosed. A voice query input is received from a user. Automatic speech recognition and natural language understanding generate structured query data, which is modified based on an input adaptation rule to obtain modified structured query data appropriate for a content providing server. The server provides a query result output corresponding to the modified structured query data. Input adaptation rules may comprise rule sets based on behavior patterns of the user and/or business recommendations. The query result output can be used for natural language generation, which may apply similar adaptation rules to the output.

Type: Application

Filed: March 14, 2022

Publication date: July 27, 2023

Applicant: SoundHound, Inc.

Inventor: Chong WANG
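
One way to picture the input adaptation step in the INTELLIGENT VOICE QUERY abstract is as a chain of functions applied to a structured query before it reaches the content server. The dict-based query, the `adapt_query` helper, and the `prefer_usual_cuisine` behavior-pattern rule are all hypothetical examples chosen for illustration.

```python
def adapt_query(query, rules):
    """Apply input adaptation rules, in order, to a structured query (a dict)."""
    adapted = dict(query)  # leave the caller's query untouched
    for rule in rules:
        adapted = rule(adapted)
    return adapted


def prefer_usual_cuisine(history):
    """Hypothetical behavior-pattern rule: fill in a missing cuisine from past queries."""
    def rule(query):
        if query.get("intent") == "find_restaurant" and "cuisine" not in query and history:
            most_common = max(set(history), key=history.count)
            return {**query, "cuisine": most_common}
        return query
    return rule
```

A business-recommendation rule would be another function with the same shape, appended to the `rules` list.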
METHOD AND SYSTEM FOR ASSISTING A USER

Publication number: 20230206915

Abstract: A method of assisting a user is disclosed. The method includes obtaining a plurality of rules having condition components and action components, the action components specifying conversation schemas; detecting, by a sensor, a fact related to an environment of the user; identifying a rule, of the plurality of rules, having a condition component that is satisfied by the detected fact; initiating a conversation with the user according to a conversation schema of the action component of the identified rule; and performing an action in response to a positive statement by the user.

Type: Application

Filed: December 23, 2021

Publication date: June 29, 2023

Applicant: SoundHound, Inc.

Inventors: Keyvan MOHAJER, Kaishin KAM, Christophe PIERRET
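
The condition/action rule matching in METHOD AND SYSTEM FOR ASSISTING A USER can be sketched as follows. Here a conversation schema is reduced to a single prompt string, `ask` is a hypothetical callback standing in for the spoken exchange, and the set of words treated as a positive statement is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

POSITIVE_REPLIES = {"yes", "sure", "ok"}  # assumed stand-in for a positive statement


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # condition component, tested on a sensed fact
    schema: str                        # conversation schema (here just a prompt)
    action: Callable[[], str]          # action performed after a positive reply


def assist(rules, fact, ask):
    """Find the first rule satisfied by the fact, converse, and act if the user agrees."""
    for rule in rules:
        if rule.condition(fact):
            reply = ask(rule.schema)
            if reply.strip().lower() in POSITIVE_REPLIES:
                return rule.action()
            return None
    return None
```

For instance, a rule whose condition is "temperature above 28 °C" could prompt "Should I turn on the air conditioner?" and switch it on only after the user says yes.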
SYSTEM AND METHOD FOR ANALYSIS OF SPOKEN NATURAL LANGUAGE TO DETECT PROMOTION PHRASES FOR PROVIDING FOLLOW-UP CONTENT

Publication number: 20230126052

Abstract: Systems and methods are disclosed that enable a user to speak a promoted phrase in response to voice content or a voice advertisement that includes the promoted phrase. When the promoted phrase is spoken, additional content, such as an additional advertisement, is provided. According to various examples, detection of the user speaking the promoted phrase is enabled once the voice advertisement ends, and the additional content is related to the promoted phrase. According to various examples, detection of the user speaking the promoted phrase is performed within a time frame; once the time frame is exceeded, detection of the promoted phrase is disabled.

Type: Application

Filed: October 27, 2021

Publication date: April 27, 2023

Applicant: SoundHound, Inc.

Inventors: Keyvan MOHAJER, Michael Zagorsek
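
The timing behavior described for promoted phrases (detection switches on when the ad ends and off once a time frame has passed) can be sketched as a small state object. The class name, the seconds-based clock, and the substring match on a transcript are illustrative assumptions; a real system would match against ASR output in a more robust way.

```python
class PromotedPhraseDetector:
    """Detect a promoted phrase only within a time window after the ad ends."""

    def __init__(self, phrase, window_s):
        self.phrase = phrase.lower()
        self.window_s = window_s
        self.ad_end_time = None  # detection stays disabled until the ad ends

    def on_ad_end(self, t):
        self.ad_end_time = t

    def is_enabled(self, t):
        return (self.ad_end_time is not None
                and 0 <= t - self.ad_end_time <= self.window_s)

    def check(self, transcript, t):
        """Return True if the phrase is heard while detection is enabled."""
        return self.is_enabled(t) and self.phrase in transcript.lower()
```

With a 10-second window and an ad ending at t = 30 s, the phrase is accepted at t = 35 s but ignored before the ad ends and after t = 40 s.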
Natural language grammar improvement

Patent number: 11636853

Abstract: A method for configuring natural language grammars is provided. The method includes identifying a first transcription having a first automatic speech recognition (ASR) score and a first natural language understanding (NLU) score, and identifying a second transcription having a second ASR score and a second NLU score. The method includes detecting that the difference between the first and second ASR scores has the opposite sign to the difference between the first and second NLU scores and, responsive to detecting the opposite sign, providing the audio query and the first and second transcriptions to an evaluator, receiving from the evaluator an indication of which of the first and second transcriptions is the correct transcription, and adjusting a value used to calculate the first NLU score or a value used to calculate the second NLU score.

Type: Grant

Filed: August 20, 2019

Date of Patent: April 25, 2023

Assignee: SoundHound, Inc.

Inventor: Angela Rose Howard
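
The opposite-sign test in the Natural language grammar improvement abstract amounts to checking whether ASR and NLU prefer different transcriptions, which is the case exactly when the product of the two score differences is negative. The sketch below shows that test, plus a hypothetical per-grammar weight adjustment standing in for "adjusting a value implemented to calculate the NLU score". The `grammar` field, the weight table, and the step size are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float
    nlu_score: float
    grammar: str  # hypothetical: the grammar that produced the NLU parse


def scores_disagree(first, second):
    """True if the ASR and NLU score differences have opposite signs."""
    asr_diff = first.asr_score - second.asr_score
    nlu_diff = first.nlu_score - second.nlu_score
    return asr_diff * nlu_diff < 0  # a zero difference counts as no disagreement


def adjust_weights(weights, correct, wrong, step=0.1):
    """Nudge hypothetical per-grammar weights toward the evaluator's choice."""
    weights[correct.grammar] = weights.get(correct.grammar, 1.0) + step
    weights[wrong.grammar] = weights.get(wrong.grammar, 1.0) - step
    return weights
```

Only disagreeing pairs are sent to the evaluator, so human review is spent on the cases where the grammar weights might be miscalibrated.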
MULTIPLE SERVICE LEVELS FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20230082955

Abstract: A system for performing automated speech recognition (ASR) on audio data includes a queue manager to receive a request to perform ASR on audio data, add the request to a queue of incoming requests, and determine a queue depth representing a number of requests in the queue at a given time. The system also includes a load supervisor to receive the request and the queue depth from the queue manager and assign a service level for the request based on the queue depth. In addition, the system includes a speech-to-text converter to receive the assigned service level for the request from the load supervisor, select an ASR model for the request based on the received service level, receive the audio data associated with the request, and perform ASR on the audio data using the selected ASR model.

Type: Application

Filed: September 16, 2021

Publication date: March 16, 2023

Applicant: SoundHound, Inc.

Inventors: Timothy P. STONEHOCKER, Zizu GOWAYYED, Matthias EICHSTAEDT, Seyed Majid EMAMI, Evelyn JIANG, Ryan BERRYHILL, Mathieu RAMONA, Neil VEIRA
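
The three components in MULTIPLE SERVICE LEVELS FOR AUTOMATIC SPEECH RECOGNITION (queue manager, load supervisor, speech-to-text converter) can be sketched as follows. The depth thresholds, service-level names, and model names are hypothetical; the abstract does not specify how many levels exist or where the cutoffs lie.

```python
from collections import deque

# Hypothetical thresholds: queue depth at or above which each service level applies.
SERVICE_LEVELS = [(0, "full"), (10, "reduced"), (50, "minimal")]
# Hypothetical mapping from service level to ASR model name.
MODELS = {"full": "large-asr", "reduced": "medium-asr", "minimal": "small-asr"}


class QueueManager:
    def __init__(self):
        self.queue = deque()

    def submit(self, request):
        """Add a request and return the queue depth right after adding it."""
        self.queue.append(request)
        return len(self.queue)


def assign_service_level(depth):
    """Load supervisor: the deepest threshold the queue depth reaches wins."""
    level = SERVICE_LEVELS[0][1]
    for threshold, name in SERVICE_LEVELS:  # thresholds are in ascending order
        if depth >= threshold:
            level = name
    return level


def select_model(level):
    """Speech-to-text converter: pick the ASR model for a service level."""
    return MODELS[level]
```

The design trades accuracy for throughput: as the backlog grows, new requests are routed to cheaper models rather than waiting longer.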
Voice morphing apparatus having adjustable parameters

Patent number: 11600284

Abstract: A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adversarially on speaker identification and positively on audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.

Type: Grant

Filed: January 11, 2020

Date of Patent: March 7, 2023

Assignee: SoundHound, Inc.

Inventor: Steve Pearson
SPEECH-ENABLED AUGMENTED REALITY

Publication number: 20230055477

Abstract: Methods and systems for implementing an intuitive interaction between a user and the virtual content of augmented reality applications are disclosed. By implementing an augmented reality inquiry mode of a device, the system can enable a user to interact with relevant virtual objects via a speech-enabled interface. The speech-enabled augmented reality system can identify visual objects in images, recognize virtual objects corresponding to the visual objects, and determine one or more relevant objects from the virtual objects based on relevance factors. Once an interaction session is established, the user can further interact with the relevant virtual objects, notably through voice commands addressed to those objects. Accordingly, the present subject matter can enable natural, hands-free interaction between the user and any virtual object the user is interested in.

Type: Application

Filed: August 23, 2021

Publication date: February 23, 2023

Applicant: SoundHound, Inc.

Inventors: Keyvan MOHAJER, Morris MICHAEL, Bernard MONT-REYNAUD
CONTROLLING A GRAPHICAL USER INTERFACE BY TELEPHONE

Publication number: 20230059765

Abstract: A method and system for controlling a GUI on a user's network-connected device, the control being provided by a telephone call between the user and a speech recognition and speech synthesis system. An example of a restaurant ordering system is provided. The user calls a phone number and is guided through a verbal ordering process that includes one or more of: adding an item, deleting an item, changing quantities, changing sizes, and changing details of an item. The user's choices are added to a display so that a current status of the order is visible to the user. The GUI is updated as changes are made to the order. The GUI can also request additional information, upsell items, and show menus. The GUI aids the user in confirming that the order is correct. The system provides the final order to a restaurant for fulfillment.

Type: Application

Filed: August 22, 2021

Publication date: February 23, 2023

Applicant: SoundHound, Inc.

Inventors: Kamyar MOHAJER, Keyvan MOHAJER, James HOM, Evelyn JIANG
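
The order operations listed in CONTROLLING A GRAPHICAL USER INTERFACE BY TELEPHONE (adding and deleting items, changing quantities and sizes) suggest an order-state object whose current contents are re-rendered after every spoken change. This is a hypothetical sketch: the `Order` class is illustrative, and `render` returns text lines as a stand-in for the GUI display.

```python
class Order:
    """Order state mirrored to the caller's screen after each spoken change."""

    def __init__(self):
        self.items = {}  # item name -> {"qty": int, "size": str}

    def add(self, name, qty=1, size="regular"):
        item = self.items.setdefault(name, {"qty": 0, "size": size})
        item["qty"] += qty

    def delete(self, name):
        self.items.pop(name, None)

    def change_quantity(self, name, qty):
        if qty <= 0:
            self.delete(name)  # a quantity of zero removes the item
        else:
            self.items[name]["qty"] = qty

    def change_size(self, name, size):
        self.items[name]["size"] = size

    def render(self):
        """Text lines standing in for the GUI view of the current order."""
        return [f'{v["qty"]} x {v["size"]} {k}' for k, v in sorted(self.items.items())]
```

Each call would be triggered by an interpreted utterance such as "make that two burgers", after which the updated `render()` output is pushed to the caller's device.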
Differential spatial rendering of audio sources

Patent number: 11589184

Abstract: Methods and systems for intuitive spatial audio rendering with improved intelligibility are disclosed. By establishing a virtual association between an audio source and a location in the listener's virtual audio space, a spatial audio rendering system can generate spatial audio signals that create a natural and immersive audio field for a listener. The system can receive the virtual location of the source as a parameter and map the source audio signal to a source-specific multi-channel audio signal. In addition, the spatial audio rendering system can be interactive and dynamically modify the rendering of the spatial audio in response to a user's active control or tracked movement.

Type: Grant

Filed: March 21, 2022

Date of Patent: February 21, 2023

Assignee: SoundHound, Inc.

Inventor: Bernard Mont-Reynaud
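
Mapping a source signal to a source-specific multi-channel signal from its virtual location, as in the Differential spatial rendering abstract, can be illustrated in its simplest form with constant-power stereo panning. This is a swapped-in textbook technique, not the patented method: the azimuth convention (-90° left to +90° right) and the two-channel output are assumptions, and real spatial renderers typically use HRTFs or more channels.

```python
import math


def pan_to_stereo(samples, azimuth_deg):
    """Map a mono source to two channels using constant-power panning.

    azimuth_deg: virtual source location, from -90 (full left) to +90 (full right).
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)  # 0 .. pi/2
    left_gain, right_gain = math.cos(theta), math.sin(theta)
    return [(s * left_gain, s * right_gain) for s in samples]


def mix(rendered_sources):
    """Sum several equal-length source-specific stereo signals frame by frame."""
    return [tuple(map(sum, zip(*frames))) for frames in zip(*rendered_sources)]
```

Because cos²θ + sin²θ = 1, a source keeps the same total power wherever it is placed, and giving each source its own location is what makes simultaneous talkers easier to tell apart.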
USING A SMARTPHONE TO CONTROL ANOTHER DEVICE BY VOICE

Publication number: 20230010815

Abstract: A method and system for implementing a speech-enabled interface of a host device via an electronic mobile device in a network are provided. The method includes establishing a communication session between the host device and the mobile device via a session service provider. According to some embodiments, a barcode can be used to pair the host device and the mobile device. Furthermore, the method and system use the voice interface in conjunction with speech recognition and natural language processing to interpret voice input for the host device, which can be used to perform one or more actions related to the host device.

Type: Application

Filed: July 9, 2021

Publication date: January 12, 2023

Applicant: SoundHound, Inc.

Inventor: Keisuke Tsuchida
Neural network training from private data

Patent number: 11551083

Abstract: Training and enhancement of neural network models, such as from private data, are described. A slave device receives a version of a neural network model from a master. The slave accesses a local and/or private data source and uses the data to optimize the neural network model, for example by computing gradients or by performing knowledge distillation to locally train an enhanced second version of the model. The slave sends the gradients or the enhanced neural network model to the master. The master may use the gradients or the second version of the model to improve a master model.

Type: Grant

Filed: December 17, 2019

Date of Patent: January 10, 2023

Assignee: SoundHound, Inc.

Inventors: Zili Li, Asif Amirguliyev, Jonah Probell
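
The gradient-sharing variant in Neural network training from private data can be illustrated with a one-parameter linear model: each slave computes a gradient on data that never leaves it, and the master averages the gradients to update its model. Gradient averaging is a common federated-learning scheme used here as an assumed example; the knowledge-distillation variant mentioned in the abstract is not shown, and the model, loss, and learning rate are illustrative.

```python
def local_gradient(w, data):
    """Slave: gradient of the mean squared error for y ≈ w * x on private data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def master_update(w, gradients, lr=0.1):
    """Master: average the slaves' gradients and take one descent step."""
    return w - lr * sum(gradients) / len(gradients)


# Two slaves hold private samples of y = 3x; only gradients reach the master.
slaves = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
w = 0.0
for _ in range(50):
    w = master_update(w, [local_gradient(w, d) for d in slaves])
```

After 50 rounds `w` is very close to 3.0, even though the master never sees a single (x, y) pair.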
Sidebar conversations

Patent number: 11539920

Abstract: A system and a method are disclosed that enable sidebar conversations between two or more attendees who are participating in a primary or main meeting. The sidebar conversation occurs in conjunction with, or concurrently with, the primary meeting. A first attendee provides commands that indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine whether a trigger phrase is included and whether they identify a second (targeted) attendee who is currently participating in the main meeting. If the second attendee is available, the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation.

Type: Grant

Filed: June 21, 2021

Date of Patent: December 27, 2022

Assignee: SoundHound, Inc.

Inventor: Timothy P. Stonehocker
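
The command analysis in the Sidebar conversations abstract (check for a trigger phrase, identify a targeted attendee in the main meeting, confirm availability) can be sketched with plain string matching. The trigger phrases, the `parse_sidebar_command` name, and the substring matching are hypothetical simplifications of whatever natural language processing a real system would use.

```python
TRIGGER_PHRASES = ("start a sidebar with", "sidebar with")  # hypothetical triggers


def parse_sidebar_command(command, attendees, available):
    """Return the targeted attendee if a sidebar can be started, else None.

    attendees: names of people currently in the main meeting.
    available: subset of attendees who can accept a sidebar right now.
    """
    text = command.lower()
    for trigger in TRIGGER_PHRASES:
        if trigger in text:
            rest = text.split(trigger, 1)[1]  # the part naming the target
            for name in attendees:
                if name.lower() in rest and name in available:
                    return name
            return None
    return None  # no trigger phrase: not a sidebar request
```

A returned name would then be used to open the side channel, to which further attendees could be added.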
AD GENERATION WITH NEURAL NETWORKS

Publication number: 20220405797

Abstract: Ads are generated based on product info and consumer profiles. A discriminator evaluates probabilities of ads being effective at causing consumer engagement. A decoder extracts product info from generated ads. Based on the probabilities of ads being effective and similarity of extracted and source product info, generated ads are labeled as examples. The examples are used in training an improved ad generator. Ads may be visual and/or audio containing speech. Ads may even contain humor, as recognized by mismatches between source and decoded product info.

Type: Application

Filed: August 18, 2022

Publication date: December 22, 2022

Applicant: SoundHound, Inc.

Inventor: Jonah PROBELL
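
The labeling step in AD GENERATION WITH NEURAL NETWORKS keeps generated ads that the discriminator rates as likely effective and whose decoded product info matches the source info. Below is a sketch of that filter only; the generator, discriminator, and decoder are assumed to exist upstream, and the field-match similarity and both thresholds are hypothetical choices.

```python
def info_similarity(source, decoded):
    """Fraction of source product-info fields that the decoder recovered exactly."""
    if not source:
        return 0.0
    return sum(decoded.get(k) == v for k, v in source.items()) / len(source)


def label_examples(generated, min_effect=0.7, min_sim=0.8):
    """Select training examples from generated ads.

    generated: list of (ad_text, p_effective, source_info, decoded_info) tuples,
    where p_effective is the discriminator's probability of consumer engagement.
    """
    return [ad for ad, p, src, dec in generated
            if p >= min_effect and info_similarity(src, dec) >= min_sim]
```

The selected ads become examples for training an improved generator; per the abstract, a deliberate mismatch between source and decoded info can also signal humor, which this simple filter does not attempt to detect.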
SIDEBAR CONVERSATIONS

Publication number: 20220408059

Abstract: A system and a method are disclosed that enable sidebar conversations between two or more attendees who are participating in a primary or main meeting. The sidebar conversation occurs in conjunction with, or concurrently with, the primary meeting. A first attendee provides commands that indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine whether a trigger phrase is included and whether they identify a second (targeted) attendee who is currently participating in the main meeting. If the second attendee is available, the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation.

Type: Application

Filed: June 21, 2021

Publication date: December 22, 2022

Applicant: SoundHound, Inc.

Inventor: Timothy P. STONEHOCKER
Text-to-speech adapted by machine learning

Patent number: 11531819

Abstract: Machine-learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include the age of people present, music that is playing, location, noise, and mood. Semantic attributes may include the presence of proper nouns, the number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word so as to enable contrastive emphasis.

Type: Grant

Filed: January 14, 2020

Date of Patent: December 20, 2022

Assignee: SoundHound, Inc.

Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
METHOD FOR PROVIDING INFORMATION, METHOD FOR GENERATING DATABASE, AND PROGRAM

Publication number: 20220382823

Abstract: As audio (1) is input to an extension of a browser, the extension transmits the audio (1) to a language processing server. A speech recognition unit obtains a text (1) corresponding to the audio (1), and transmits the text (1) to a natural language understanding unit. In the natural language understanding unit, an information processing unit identifies a URL (1) corresponding to the text (1), and transmits the URL (1) to the browser. The extension passes the URL (1) to a browsing function. The browsing function uses the URL (1) to access a web server. The web server transmits a web page (1) corresponding to the URL (1) to the browser. The browsing function shows a screen corresponding to the web page (1) on a display.

Type: Application

Filed: January 26, 2022

Publication date: December 1, 2022

Applicant: SoundHound, Inc.

Inventors: Masaki NAITO, Keisuke TSUCHIDA, Jun YONEYAMA, Kaku SAWADA
