Patents Assigned to SoundHound, Inc.

CONTROLLING AN ENGAGEMENT STATE OF AN AGENT DURING A HUMAN-MACHINE DIALOG

Publication number: 20220122607

Abstract: A method of controlling an engagement state of an agent during a human-machine dialog is provided. The method can include: receiving a spoken request that is a conditional locking request, wherein the conditional locking request uses a natural language expression to explicitly specify a locking condition, which is a predicate; storing the predicate in a format that can be evaluated when needed by the agent; entering a conditionally locked state in response to the conditional locking request; in the conditionally locked state, receiving a multiplicity of requests without a need for a wakeup indicator; and, for a request from the multiplicity of requests, evaluating the predicate upon receiving the request and processing the request if the predicate is true.

Type: Application

Filed: December 27, 2021

Publication date: April 21, 2022

Applicant: SoundHound, Inc.

Inventors: Scott Halstvedt, Keyvan Mohajer, Bernard Mont-Reynaud
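
As a rough illustration of the conditional locking described in the abstract above (publication 20220122607), the sketch below stores a predicate and evaluates it on every incoming request. The `Agent` class, the `Context` dictionary, and the choice to ignore requests while the predicate is false are assumptions made for this example, not details taken from the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical per-request facts, e.g. {"speaker": "alice"}.
Context = Dict[str, str]


@dataclass
class Agent:
    # Stored predicate; None means the agent is not conditionally locked.
    locking_predicate: Optional[Callable[[Context], bool]] = None
    processed: List[str] = field(default_factory=list)

    def lock_if(self, predicate: Callable[[Context], bool]) -> None:
        """Enter the conditionally locked state with a stored predicate."""
        self.locking_predicate = predicate

    def receive(self, text: str, context: Context, has_wakeword: bool = False) -> bool:
        """Return True if the request was processed."""
        if self.locking_predicate is not None:
            # Locked: no wakeword needed, but the predicate must hold.
            # Assumption: requests are ignored while the predicate is false.
            accepted = self.locking_predicate(context)
        else:
            # Not locked: fall back to requiring a wakeword.
            accepted = has_wakeword
        if accepted:
            self.processed.append(text)
        return accepted
```

A request such as "listen only to me" could, under these assumptions, translate into `agent.lock_if(lambda ctx: ctx.get("speaker") == "alice")`.
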
Adapting an utterance cut-off period based on parse prefix detection

Patent number: 11308960

Abstract: A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.

Type: Grant

Filed: March 19, 2020

Date of Patent: April 19, 2022

Assignee: SoundHound, Inc.

Inventors: Patricia Pozon Aguayo, Jennifer Hee Young Zhang, Jonah Probell
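
The cutoff adaptation in the abstract above (patent 11308960) can be sketched as a function from prefix probability to allowed silence. The linear mapping, the default bounds of 0.5 and 2.0 seconds, and the function names are illustrative assumptions; the patent only states that the cutoff is longer when the prefix probability is high, and the probability itself would come from a parser or a learned model that is not shown here.

```python
def cutoff_period(prefix_probability: float,
                  base_s: float = 0.5,
                  max_s: float = 2.0) -> float:
    """Map the probability that the speech so far is a prefix of a longer
    utterance to a silence cutoff in seconds (assumed linear mapping)."""
    if not 0.0 <= prefix_probability <= 1.0:
        raise ValueError("prefix_probability must be in [0, 1]")
    return base_s + (max_s - base_s) * prefix_probability


def end_of_utterance(silence_s: float, prefix_probability: float) -> bool:
    """Declare the utterance finished once silence reaches the adapted cutoff."""
    return silence_s >= cutoff_period(prefix_probability)
```

With these defaults, one second of silence ends an utterance that looks complete (probability 0.1) but not one that looks like an unfinished prefix (probability 0.9).
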
Synthesizing speech recognition training data

Patent number: 11308938

Abstract: To train a speech recognizer, such as for recognizing variables in a neural speech-to-meaning system, compute, within an embedding space, a range of vectors of features of natural speech. Generate parameter sets for speech synthesis and synthesize speech according to the parameters. Analyze the synthesized speech to compute vectors in the embedding space. Using a cost function that favors an even spread (minimal clustering), generate a multiplicity of speech synthesis parameter sets. Using the multiplicity of parameter sets, generate a multiplicity of speech of known words that can be used as training data for speech recognition.

Type: Grant

Filed: December 5, 2019

Date of Patent: April 19, 2022

Assignee: SoundHound, Inc.

Inventors: Maisy Wieman, Jonah Probell, Sudharsan Krishnaswamy
METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

Publication number: 20220115019

Abstract: Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed and multiuser-editable transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript by one or more editors. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

Type: Application

Filed: October 11, 2021

Publication date: April 14, 2022

Applicant: SoundHound, Inc.

Inventors: Kiersten L. BRADLEY, Ethan COEYTAUX, Ziming YIN
METHOD AND SYSTEM FOR CONVERSATION TRANSCRIPTION WITH METADATA

Publication number: 20220115020

Abstract: Methods and systems for enabling an efficient review of meeting content via a metadata-enriched, speaker-attributed transcript are disclosed. By incorporating speaker diarization and other metadata, the system can provide a structured and effective way to review and/or edit the transcript. One type of metadata can be image or video data to represent the meeting content. Furthermore, the present subject matter utilizes a multimodal diarization model to identify and label different speakers. The system can synchronize various sources of data, e.g., audio channel data, voice feature vectors, acoustic beamforming, image identification, and extrinsic data, to implement speaker diarization.

Type: Application

Filed: October 11, 2021

Publication date: April 14, 2022

Applicant: SoundHound, Inc.

Inventors: Kiersten L. BRADLEY, Ethan COEYTAUX, Ziming YIN
Using phonetic variants in a local context to improve natural language understanding

Patent number: 11295730

Abstract: A method is described that includes processing text and speech from an input utterance using local overrides of default dictionary pronunciations. Applying this method, a word-level grammar used to process the tokens specifies at least one local word phonetic variant that applies within a specific production rule and, within a local context of the specific production rule, the local word phonetic variant overrides one or more default dictionary phonetic versions of the word. This method can be applied to parsing utterances where the pronunciation of some words depends on their syntactic or semantic context.

Type: Grant

Filed: August 1, 2019

Date of Patent: April 5, 2022

Assignee: SoundHound, Inc.

Inventors: Keyvan Mohajer, Christopher Wilson, Bernard Mont-Reynaud
Dynamic interpolation for hybrid language models

Patent number: 11295732

Abstract: In order to improve the accuracy of ASR, an utterance is transcribed using a plurality of language models, such as, for example, an N-gram language model and a neural language model. The language models are trained separately. They each output a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.

Type: Grant

Filed: August 1, 2019

Date of Patent: April 5, 2022

Assignee: SoundHound, Inc.

Inventors: Steffen Holm, Terry Kong, Kiran Garaga Lokeswarappa
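
To make the interpolation in the abstract above (patent 11295732) concrete, the sketch below combines two model probabilities with a weight picked from the context. Linear interpolation, the use of hypothesis length as the dynamic variable, and the specific weights are assumptions for illustration only; the patent leaves the choice of dynamic variables open.

```python
def interpolation_weight(num_words: int,
                         short_weight: float = 0.7,
                         long_weight: float = 0.3,
                         pivot: int = 5) -> float:
    """Weight given to the N-gram model (hypothetical rule: trust it more
    on short partial hypotheses, less on long ones)."""
    return short_weight if num_words < pivot else long_weight


def hybrid_score(ngram_prob: float, neural_prob: float, num_words: int) -> float:
    """Linearly interpolate the two model scores for a partial hypothesis."""
    w = interpolation_weight(num_words)
    return w * ngram_prob + (1.0 - w) * neural_prob
```

The same two model scores therefore yield different hybrid scores depending on how long the partial transcription hypothesis is.
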
Dynamic wakewords for speech-enabled devices

Patent number: 11295741

Abstract: A system and method are disclosed capable of parsing a spoken utterance into a natural language request and a speech audio segment, where the natural language request directs the system to use the speech audio segment as a new wakeword. In response to this wakeword assignment directive, the system and method are further capable of immediately building a new wakeword spotter to activate the device upon matching the new wakeword in the input audio. Different approaches to promptly building a new wakeword spotter are described. Variations of wakeword assignment directives can make the new wakeword public or private. They can also add the new wakeword to earlier wakewords, or replace earlier wakewords.

Type: Grant

Filed: December 5, 2019

Date of Patent: April 5, 2022

Assignee: SoundHound, Inc.

Inventor: Bernard Mont-Reynaud
SYSTEM AND METHOD FOR VOICE MORPHING IN A DATA ANNOTATOR TOOL

Publication number: 20220092273

Abstract: A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises at least one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.

Type: Application

Filed: November 30, 2021

Publication date: March 24, 2022

Applicant: SoundHound, Inc.

Inventor: Dylan H. Ross
System and method for providing natural language recommendations

Patent number: 11276398

Abstract: A system includes a stand-alone device or a server-connected client device that is in communication with a server and provides recommendations. The device includes an input component, a storage component, a processor and an output component. The server-connected client device includes an input component that receives the user's request, a communication component that communicates the request to the server and receives the recommendation from the server, and an output component that provides the recommendation to the user.

Type: Grant

Filed: June 20, 2019

Date of Patent: March 15, 2022

Assignee: SoundHound, Inc.

Inventors: Robert MacRAE, Kamyar Mohajer
RECEIVING A NATURAL LANGUAGE REQUEST AND RETRIEVING A PERSONAL VOICE MEMO

Publication number: 20220076678

Abstract: A computer-implemented method is provided. The method includes: receiving commands to store memos; identifying subjects related to the memos; storing, in a database, the memos, their related subjects, and associated time information; receiving a natural language request to retrieve a memo, the request having query information; identifying a subject related to the request; responsive to the request, querying the database for memos related to the subject; identifying multiple memos in response to the database query; identifying a memo, from the multiple identified memos, that has the most recent associated time information; and providing a response in dependence on the identified memo.

Type: Application

Filed: November 19, 2021

Publication date: March 10, 2022

Applicant: SoundHound, Inc.

Inventors: Irina A. SPIRIDONOVA, Karl STAHL, Mara SELVAGGI
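
The retrieval step in the abstract above (publication 20220076678) amounts to filtering memos by subject and keeping the most recent one. The sketch below uses an in-memory list in place of a database and exact string matching on the subject; the `Memo` type and `latest_memo` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Memo:
    subject: str         # subject identified when the memo was stored
    text: str            # memo content
    stored_at: datetime  # associated time information


def latest_memo(memos: List[Memo], subject: str) -> Optional[Memo]:
    """Return the most recently stored memo on the subject, or None."""
    matches = [m for m in memos if m.subject == subject]
    return max(matches, key=lambda m: m.stored_at, default=None)
```

If two memos about "parking" exist, a request such as "where did I park?" would, under these assumptions, be answered from the later one.
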
PROVIDING RELEVANT MESSAGES TO AN AUTOMOTIVE VIRTUAL ASSISTANT

Publication number: 20220075956

Abstract: A method of providing relevant messages to an automotive virtual assistant is provided. The method includes receiving a spoken utterance and corresponding first geolocation information detected by a subsystem of a first automobile, parsing the spoken utterance to determine concepts and storing the concepts in a concept database indexed by the corresponding first geolocation information. The method further includes receiving second geolocation information detected by a subsystem of a second automobile, searching the concept database for an index based on the second geolocation information to find a stored concept of the stored concepts, searching a natural language expression database using the stored concept as an index to find an assistive natural language expression, wherein the assistive natural language expression includes a constituent part, and sending the assistive natural language expression to the second automobile with the stored concept in place of the constituent part.

Type: Application

Filed: November 15, 2021

Publication date: March 10, 2022

Applicant: SoundHound, Inc.

Inventors: Bernard MONT-REYNAUD, Jonah PROBELL, Pranav SINGH, Kheng KHOV
System and method for detection and correction of a query

Patent number: 11263198

Abstract: Systems and methods are provided for systematically finding and fixing automatic speech recognition (ASR) mistranscriptions and natural language understanding (NLU) misinterpretations and labeling data for machine learning. High similarity of non-identical consecutive queries indicates ASR mistranscriptions. Consecutive queries with close vectors in a semantic embedding space indicate NLU misinterpretations. Key phrases and barge-in also indicate errors. Only queries within a short amount of time are considered.

Type: Grant

Filed: September 5, 2019

Date of Patent: March 1, 2022

Assignee: SOUNDHOUND, INC.

Inventors: Olivia Bettaglio, Pranav Singh
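
The ASR heuristic in the abstract above (patent 11263198) can be sketched as a check on consecutive queries: close in time, not identical, but highly similar. Character-level similarity from Python's `difflib`, the 10-second window, and the 0.8 threshold are assumptions chosen for illustration; the patent does not specify a similarity measure or thresholds in the abstract.

```python
from difflib import SequenceMatcher


def likely_asr_retry(prev_query: str,
                     next_query: str,
                     seconds_apart: float,
                     max_gap_s: float = 10.0,
                     min_similarity: float = 0.8) -> bool:
    """Flag a pair of consecutive queries as a probable mistranscription
    followed by a retry."""
    if seconds_apart > max_gap_s:
        return False  # only queries within a short time are considered
    if prev_query == next_query:
        return False  # identical queries are not evidence of an ASR error
    similarity = SequenceMatcher(None, prev_query, next_query).ratio()
    return similarity >= min_similarity
```

Flagged pairs could then be labeled as training data, with the second query taken as a candidate correction of the first.
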
Vision-assisted speech processing

Patent number: 11257493

Abstract: Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture.

Type: Grant

Filed: July 11, 2019

Date of Patent: February 22, 2022

Assignee: SoundHound, Inc.

Inventors: Cristina Vasconcelos, Zili Li
Managing agent engagement in a man-machine dialog

Patent number: 11250844

Abstract: Agents engage and disengage with users intelligently. Users can tell agents to remain engaged without requiring a wakeword. Engaged states can support modal dialogs and barge-in. Users can cause disengagement explicitly. Disengagement can be conditional based on timeout, change of user, or environmental conditions. Engagement can be one-time or recurrent. Recurrent states can be attentive or locked. Locked states can be unconditional or conditional, including being reserved to support user continuity. User continuity can be tested by matching parameters or tracking user by many modalities including microphone arrays, cameras, and other sensors.

Type: Grant

Filed: January 26, 2018

Date of Patent: February 15, 2022

Assignee: SoundHound, Inc.

Inventors: Bernard Mont-Reynaud, Scott Halstvedt, Keyvan Mohajer
Conditional responses to application commands in a client-server system

Patent number: 11250217

Abstract: A client device receives a user request (e.g., in natural language form) to execute a command of an application. The client device delegates interpretation of the request to a response-processing server. Using domain knowledge previously provided by a developer of the application, the response-processing server determines the various possible responses that client devices could make in response to the request based on circumstances such as the capabilities of the client devices and the state of the application data. The response-processing server accordingly generates a response package that describes a number of different conditional responses that client devices could have to the request and provides the response package to the client device. The client device selects the appropriate response from the response package based on the circumstances as determined by the client device, executes the command (if possible), and provides the user with some representation of the response.

Type: Grant

Filed: February 14, 2020

Date of Patent: February 15, 2022

Assignee: SoundHound, Inc.

Inventors: Keyvan Mohajer, Christopher S. Wilson, Kheng Khov, Ian Graves
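
The client-side selection in the abstract above (patent 11250217) can be sketched as picking the first conditional response whose requirements match the client's circumstances. The package layout (a list of dictionaries with `requires` and `response` keys) and the circumstance names are hypothetical; the patent does not describe the response package format in the abstract.

```python
from typing import Dict, List, Optional

Circumstances = Dict[str, bool]


def select_response(package: List[dict],
                    circumstances: Circumstances) -> Optional[str]:
    """Return the first response whose requirements all hold on this client."""
    for entry in package:
        requires: Dict[str, bool] = entry["requires"]
        if all(circumstances.get(key) == value for key, value in requires.items()):
            return entry["response"]
    return None


# Example package a server might send for "call Mom" (illustrative only).
CALL_PACKAGE = [
    {"requires": {"can_dial": True, "contact_found": True}, "response": "Calling Mom."},
    {"requires": {"can_dial": True, "contact_found": False}, "response": "No contact named Mom."},
    {"requires": {"can_dial": False}, "response": "This device cannot place calls."},
]
```

Ordering the entries from most to least specific keeps the selection logic on the client trivial, which is one plausible reason to ship all conditional responses in a single package.
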
System and method for interpreting natural language commands with compound criteria

Patent number: 11238101

Abstract: A command-processing server receives a natural language command from a user. The command-processing server has a set of domain command interpreters corresponding to different domains in which commands can be expressed, such as the domain of entertainment, or the domain of travel. Some or all of the domain command interpreters recognize user commands having a verbal prefix, an optional pre-filter, an object, and an optional post-filter; the pre- and post-filters may be compounded expressions involving multiple atomic filters. Different developers may independently specify the domain command interpreters and the sub-structure interpreters on which they are based.

Type: Grant

Filed: October 27, 2020

Date of Patent: February 1, 2022

Assignee: SOUNDHOUND, INC.

Inventor: Keyvan Mohajer
Using a virtual assistant to store a personal voice memo and to obtain a response based on a stored personal voice memo that is retrieved according to a received query

Patent number: 11211064

Abstract: The technology disclosed relates to retrieving a personal memo from a database. The method includes receiving, by a virtual assistant, a natural language utterance that expresses a request, interpreting the natural language utterance according to a natural language grammar rule for retrieving memo data from the natural language utterance, the natural language grammar rule recognizing query information, responsive to interpreting the natural language utterance, using the query information to query the database for a memo related to the query information, and providing, to a user, a response generated in dependence upon the memo related to the query information.

Type: Grant

Filed: January 23, 2019

Date of Patent: December 28, 2021

Assignee: SoundHound, Inc.

Inventors: Mara Selvaggi, Irina A Spiridonova, Karl Stahl
MACHINE LEARNING SYSTEM FOR DIGITAL ASSISTANTS

Publication number: 20210397610

Abstract: A machine learning system for a digital assistant is described, together with a method of training such a system. The machine learning system is based on an encoder-decoder sequence-to-sequence neural network architecture trained to map input sequence data to output sequence data, where the input sequence data relates to an initial query and the output sequence data represents canonical data representation for the query. The method of training involves generating a training dataset for the machine learning system. The method involves clustering vector representations of the query data samples to generate canonical-query original-query pairs in training the machine learning system.

Type: Application

Filed: June 17, 2021

Publication date: December 23, 2021

Applicant: SoundHound, Inc.

Inventors: Pranav SINGH, Yilun ZHANG, Keyvan MOHAJER, Mohammadreza FAZELI
System and method for voice morphing

Patent number: 11205056

Abstract: A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift.

Type: Grant

Filed: September 22, 2019

Date of Patent: December 21, 2021

Assignee: SoundHound, Inc.

Inventor: Dylan H Ross
