Patents Assigned to SoundHound, Inc.

Token confidence scores for automatic speech recognition

Patent number: 12223948

Abstract: Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a word with a low confidence score with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphical user interface for labeling and dictionary enhancement.

Type: Grant

Filed: February 3, 2022

Date of Patent: February 11, 2025

Assignee: SoundHound, Inc.

Inventors: Pranav Singh, Saraswati Mishra, Eunjee Na
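
The substitution idea in the abstract of patent 12223948 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Token` type, the `correct_transcript` function, the 0.5 threshold, and the substitution table are all hypothetical and are not taken from the patent, which mentions neural network models as one way to produce the scores.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    confidence: float  # assumed to lie in [0, 1]; the scoring model is out of scope here


def correct_transcript(tokens, substitutes, threshold=0.5):
    """Replace low-confidence tokens that have a known substitute.

    `substitutes` is a hypothetical lookup table; a real system would
    propose candidates from its own models.
    """
    corrected = []
    for tok in tokens:
        if tok.confidence < threshold and tok.text in substitutes:
            corrected.append(substitutes[tok.text])
        else:
            corrected.append(tok.text)
    return " ".join(corrected)


tokens = [Token("play", 0.97), Token("sum", 0.31), Token("music", 0.92)]
print(correct_transcript(tokens, {"sum": "some"}))
```

Only the token scored below the threshold is replaced; high-confidence tokens pass through unchanged.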

SEMANTICALLY CONDITIONED VOICE ACTIVITY DETECTION

Publication number: 20240233718

Abstract: A method includes recognizing words comprised by a first utterance; interpreting the recognized words according to a grammar comprised by a domain; from the interpreting of the recognized words, determining a timeout period for the first utterance based on the domain of the first utterance; detecting end of voice activity in the first utterance; executing an instruction following an amount of time after detecting end of voice activity of the first utterance in response to the amount of time exceeding the timeout period, the executed instruction based at least in part on interpreting the recognized words.

Type: Application

Filed: October 19, 2022

Publication date: July 11, 2024

Applicant: SoundHound, Inc.

Inventor: Victor LEITMAN

Method for providing information, method for generating database, and program

Patent number: 11995143

Abstract: As audio (1) is input to an extension of a browser, the extension transmits the audio (1) to a language processing server. A speech recognition unit obtains a text (1) corresponding to the audio (1), and transmits the text (1) to a natural language understanding unit. In the natural language understanding unit, an information processing unit identifies a URL (1) corresponding to the text (1), and transmits the URL (1) to the browser. The extension passes the URL (1) to a browsing function. The browsing function uses the URL (1) to access a web server. The web server transmits a web page (1) corresponding to the URL (1) to the browser. The browsing function shows a screen corresponding to the web page (1) on a display.

Type: Grant

Filed: January 26, 2022

Date of Patent: May 28, 2024

Assignee: SoundHound, Inc.

Inventors: Masaki Naito, Keisuke Tsuchida, Jun Yoneyama, Kaku Sawada

REAL-TIME NATURAL LANGUAGE PROCESSING AND FULFILLMENT

Publication number: 20240161737

Abstract: A system and method of real-time feedback confirmation to solicit a virtual assistant response from an evolving semantic state of at least a portion of an utterance. A user accesses a virtual assistant on an electronic device having the system and/or method configured to capture a command, a question, and/or a fulfillment request from audio, such as the speech emitted from the speaking user. The speech may be intercepted by a speech engine configured to transcribe the speech into text that is matched with the fragment pattern's regular expression to generate a fragment and/or the speech may be processed with a machine learning model to identify fragments. The fragments are identified by a domain handler configured to update a data structure of the current semantic state of the utterance in real-time on an interface of an electronic device.

Type: Application

Filed: November 15, 2022

Publication date: May 16, 2024

Applicant: SoundHound, Inc.

Inventors: Jon GROSSMANN, Robert MACRAE, Scott HALSTVEDT, Keyvan MOHAJER

DOMAIN SPECIFIC NEURAL SENTENCE GENERATOR FOR MULTI-DOMAIN VIRTUAL ASSISTANTS

Publication number: 20240144921

Abstract: Techniques for automatically generating sentences that a user can say to invoke a set of defined actions performed by a virtual assistant are disclosed. A sentence is received and keywords are extracted from the sentence. Based on the keywords, additional sentences are generated. A classifier model is applied to the generated sentences to determine a sentence that satisfies a threshold. If a sentence satisfies the threshold, an intent associated with the classifier model can be invoked. If the sentences fail to satisfy the classifier model, the virtual assistant can attempt to interpret the received sentence according to the most likely intent by invoking a sentence generation model fine-tuned for a particular domain, generate additional sentences with a high probability of having the same intent, and fulfill the specific action defined by the intent.

Type: Application

Filed: October 27, 2022

Publication date: May 2, 2024

Applicant: SoundHound, Inc.

Inventors: Pranav SINGH, Yilun ZHANG, Eunjee NA, Olivia BETTAGLIO

TEXT-TO-SPEECH SYSTEM WITH VARIABLE FRAME RATE

Publication number: 20240144910

Abstract: A neural TTS system is trained to generate key acoustic frames at variable rates while omitting other frames. The frame skipping depends on the acoustic features to be generated for the input text. The TTS system can interpolate frames between the key frames at a target rate for a vocoder to synthesize audio samples.

Type: Application

Filed: October 31, 2022

Publication date: May 2, 2024

Applicant: SoundHound, Inc.

Inventors: Steve PEARSON, Jon GROSSMAN

SEMANTICALLY CONDITIONED VOICE ACTIVITY DETECTION

Publication number: 20240135922

Abstract: A method includes recognizing words comprised by a first utterance; interpreting the recognized words according to a grammar comprised by a domain; from the interpreting of the recognized words, determining a timeout period for the first utterance based on the domain of the first utterance; detecting end of voice activity in the first utterance; executing an instruction following an amount of time after detecting end of voice activity of the first utterance in response to the amount of time exceeding the timeout period, the executed instruction based at least in part on interpreting the recognized words.

Type: Application

Filed: October 18, 2022

Publication date: April 25, 2024

Applicant: SoundHound, Inc.

Inventor: Victor LEITMAN

ADAPTING AN UTTERANCE CUT-OFF PERIOD WITH USER SPECIFIC PROFILE DATA

Publication number: 20240135927

Abstract: A system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech of a user that is stored on a user's device or the system, which detects the voice activity, to determine according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech, which is based on the user profile, has a high probability of being a prefix of a longer utterance.

Type: Application

Filed: January 2, 2024

Publication date: April 25, 2024

Applicant: SoundHound, Inc.

Inventors: Patricia Pozon AGUAYO, Jennifer Hee Young ZHANG, Jonah PROBELL

Automatic Speech Recognition with Voice Personalization and Generalization

Publication number: 20240127803

Abstract: A voice morphing model can transform diverse voices to one or a small number of target voices. An acoustic model can be trained for high accuracy on the target voices. Speech recognition on diverse voices can be performed by morphing it to a target voice and then performing recognition on audio with the target voice. The morphing model and an acoustic model for speech recognition can be trained separately or jointly. A source of requests for speech recognition can pass audio and a voiceprint with requests. Speech recognition can run with improved accuracy by biasing an acoustic model for the voice in the audio using the voiceprint. The audio can be used to calculate a new voiceprint, which can be used to update the voiceprint included with the audio. The updated voiceprint can be sent back to the source and then used with future speech recognition requests.

Type: Application

Filed: October 12, 2022

Publication date: April 18, 2024

Applicant: SoundHound, Inc.

Inventor: Keyvan MOHAJER

Classification by natural language grammar slots across domains

Patent number: 11935029

Abstract: A virtual assistant processes natural language expressions according to grammar rules created by domain providers. The virtual assistant uniquely identifies each of a multiplicity of users and stores values of grammar slots filled by natural language expressions from each user. The virtual assistant stores histories of slot values and computes statistics from the history. The virtual assistant provider, or a classification client, provides values of attributes of users as labels for a machine learning classification algorithm. The algorithm processes the grammar slot values and labels to compute probability distributions for unknown attribute values of users. A network effect of users and domain grammars makes the virtual assistant useful and provides increasing amounts of data that improve classification accuracy and usefulness.

Type: Grant

Filed: September 5, 2018

Date of Patent: March 19, 2024

Assignee: SoundHound, Inc.

Inventors: Joe Aung, Jonah Probell

Authorization of Action by Voice Identification

Publication number: 20240054195

Abstract: Actions are authorized by computing a confidence score that exceeds a threshold. The confidence score is based on a match between metadata about requests and fields in corresponding database records. The confidence score weights matches by the dependability of the metadata for authentication. The confidence score is further based on the closeness of a sample of speech audio to a stored voiceprint. Additional identification may be required for authorization. The confidence score requirement may be relaxed based on identification in a buffer of recent action requests.

Type: Application

Filed: August 9, 2022

Publication date: February 15, 2024

Applicant: SoundHound, Inc.

Inventors: Ahmadul HASSAN, James HOM
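
One way to picture the scoring described in the abstract of publication 20240054195 above is a weighted metadata match blended with a voiceprint similarity. The sketch below is an assumption-laden illustration, not the claimed method: the function names, the field weights, the equal blend (`voice_weight=0.5`), and the 0.8 threshold are all hypothetical.

```python
def authorization_score(request, record, weights, voice_similarity, voice_weight=0.5):
    """Blend a weighted metadata match with a voiceprint similarity in [0, 1].

    `weights` maps a metadata field to its assumed dependability for
    authentication; more dependable fields contribute more to the score.
    """
    total = sum(weights.values())
    matched = sum(
        w for field, w in weights.items()
        if field in request and request[field] == record.get(field)
    )
    metadata_score = matched / total if total else 0.0
    return (1.0 - voice_weight) * metadata_score + voice_weight * voice_similarity


def is_authorized(score, threshold=0.8):
    # The abstract authorizes an action when the score exceeds a threshold.
    return score > threshold


weights = {"phone": 3.0, "zip": 1.0}             # hypothetical dependability weights
request = {"phone": "555-0100", "zip": "94000"}  # metadata accompanying the request
record = {"phone": "555-0100", "zip": "95000"}   # stored database record
score = authorization_score(request, record, weights, voice_similarity=0.9)
print(is_authorized(score))
```

Here the phone number matches but the ZIP code does not, so the metadata part contributes 0.75 before blending with the voice similarity.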

METHOD AND SYSTEM FOR PROACTIVE INTERACTION

Publication number: 20240046923

Abstract: In an interaction system, a server can obtain a setting expression including a query and a condition for functioning as a virtual assistant, store the query and the condition in a memory, and deliver an inquiry expression including the query in response to occurrence of a situation specified by the condition. The setting expression can be by voice or natural language. Processes can be different for different users and can be based on domain. The inquiry expression includes a question asking the user for an affirmative response before performing the inquiry. Implementations can be adopted in or near a vehicle.

Type: Application

Filed: July 28, 2023

Publication date: February 8, 2024

Applicant: SoundHound, Inc.

Inventor: Masaki NAITO

CONFIGURABLE NEURAL SPEECH SYNTHESIS

Publication number: 20240021189

Abstract: A discriminator trained on labeled samples of speech can compute probabilities of voice properties. A speech synthesis generative neural network that takes in text and continuous scale values of voice properties is trained to synthesize speech audio that the discriminator will infer as matching the values of the input voice properties. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. Training can be done by transfer learning from an existing neural speech synthesis model or such a model can be trained with a loss function that considers speech and parameter values. A graphical user interface can allow voice designers for products to synthesize speech with a desired voice or generate a speech synthesis engine with frozen voice parameters. A vector of parameters can be used for comparison to previously registered voices in databases such as ones for trademark registration.

Type: Application

Filed: July 14, 2023

Publication date: January 18, 2024

Applicant: SoundHound, Inc.

Inventor: Andrew RICHARDS

Adapting an utterance cut-off period based on parse prefix detection

Patent number: 11862162

Abstract: A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.

Type: Grant

Filed: March 18, 2022

Date of Patent: January 2, 2024

Assignee: SoundHound, Inc.

Inventors: Patricia Pozon Aguayo, Jennifer Hee Young Zhang, Jonah Probell
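
The adaptive cutoff described in the abstract of patent 11862162 above can be illustrated with a small sketch. It assumes the prefix probability is already available from some model, and it uses a linear mapping from probability to cutoff length; that mapping, the function names, and the 0.5 s and 1.5 s constants are hypothetical choices for illustration only.

```python
def cutoff_period(prefix_probability, base_s=0.5, max_extra_s=1.5):
    """Return a cutoff in seconds that grows with the prefix probability.

    `prefix_probability` is the assumed model output: the probability that
    the speech recognized so far is a prefix of a longer utterance.
    """
    if not 0.0 <= prefix_probability <= 1.0:
        raise ValueError("prefix_probability must be in [0, 1]")
    return base_s + max_extra_s * prefix_probability


def should_end_utterance(silence_s, prefix_probability):
    # Compare the duration of non-voice activity to the adapted cutoff.
    return silence_s >= cutoff_period(prefix_probability)


# One second of silence ends a likely complete utterance...
print(should_end_utterance(1.0, prefix_probability=0.1))
# ...but not one that is probably a prefix of something longer.
print(should_end_utterance(1.0, prefix_probability=0.9))
```

The same silence duration leads to different decisions because the cutoff is longer when the recognized speech is more likely to be incomplete.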

Virtual assistant domain functionality

Patent number: 11836453

Abstract: Aspects include methods, systems, and computer-program products providing virtual assistant domain functionality. A natural language query including one or more words is received. A collection of natural language modules is accessed. The natural language modules in the collection are configured to process sets of natural language queries. A natural language module, from the collection of natural language modules, is identified to interpret the natural language query. An interpretation of the natural language query is computed using the identified natural language module. A response to the natural language query is returned using the computed interpretation.

Type: Grant

Filed: July 22, 2021

Date of Patent: December 5, 2023

Assignee: SoundHound, Inc.

Inventors: Kamyar Mohajer, Keyvan Mohajer, Bernard Mont-Reynaud, Pranav Singh

APPARATUS, PLATFORM, METHOD AND MEDIUM FOR INTENTION IMPORTANCE INFERENCE

Publication number: 20230386459

Abstract: The application provides an apparatus, platform, method and medium for intention importance inference. The apparatus includes an interface configured to receive user-related information; and a processor coupled to the interface and configured to: extract data related to different aspects of a user from the user-related information; generate a plurality of intention probes based on the data related to different aspects of the user, each intention probe comprising an intention and associated data items; infer an importance of each intention probe by calculating a score of each associated data item of the intention probe based on the data related to different aspects of the user; and provide information associated with an intention probe with a highest importance.

Type: Application

Filed: August 18, 2022

Publication date: November 30, 2023

Applicant: SoundHound, Inc.

Inventor: Chong Wang

PRE-WAKEWORD SPEECH PROCESSING

Publication number: 20230386458

Abstract: Methods and systems for pre-wakeword speech processing are disclosed. Speech audio, comprising command speech spoken before a wakeword, may be stored in a buffer in oldest to newest order. Upon detection of the wakeword, reverse acoustic models and language models, such as reverse automatic speech recognition (R-ASR) can be applied to the buffered audio, in newest to oldest order, starting from before the wakeword. The speech is converted into a sequence of words. Natural language grammar models, such as natural language understanding (NLU), can be applied to match the sequence of words to a complete command, the complete command being associated with invoking a computer operation.

Type: Application

Filed: May 27, 2022

Publication date: November 30, 2023

Applicant: SoundHound, Inc.

Inventors: Karl STAHL, Bernard MONT-REYNAUD

METHOD AND SYSTEM FOR ACOUSTIC MODEL CONDITIONING ON NON-PHONEME INFORMATION FEATURES

Publication number: 20230352000

Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.

Type: Application

Filed: July 6, 2023

Publication date: November 2, 2023

Applicant: SoundHound, Inc.

Inventors: Zizu GOWAYYED, Keyvan MOHAJER

CONTENT FILTERING IN MEDIA PLAYING DEVICES

Publication number: 20230353826

Abstract: Various approaches relate to user defined content filtering in media playing devices of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.

Type: Application

Filed: July 6, 2023

Publication date: November 2, 2023

Applicant: SoundHound, Inc.

Inventors: Thor S. KHOV, Terry KONG

SYSTEMS AND METHODS FOR GENERATING AND USING SHARED NATURAL LANGUAGE LIBRARIES

Publication number: 20230325358

Abstract: Systems and methods for searching databases by sound data input are provided herein. A service provider may have a need to make their database(s) searchable through search technology. However, the service provider may not have the resources to implement such search technology. The search technology may allow for search queries using sound data input. The technology described herein provides a solution addressing the service provider’s need, by giving a search technology that furnishes search results in a fast, accurate manner. In further embodiments, systems and methods to monetize those search results are also described herein.

Type: Application

Filed: June 6, 2023

Publication date: October 12, 2023

Applicant: SoundHound, Inc.

Inventor: Keyvan Mohajer
