Patents Assigned to SoundHound, Inc.
  • Publication number: 20200234698
    Abstract: The technology disclosed relates to retrieving a personal memo from a database. The method includes receiving, by a virtual assistant, a natural language utterance that expresses a request, interpreting the natural language utterance according to a natural language grammar rule for retrieving memo data from the natural language utterance, the natural language grammar rule recognizing query information, responsive to interpreting the natural language utterance, using the query information to query the database for a memo related to the query information, and providing, to a user, a response generated in dependence upon the memo related to the query information.
    Type: Application
    Filed: January 23, 2019
    Publication date: July 23, 2020
    Applicant: SoundHound, Inc.
    Inventors: Mara Selvaggi, Irina A. Spiridonova, Karl Stahl
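A minimal sketch of the memo-retrieval idea described in this abstract, not the patented implementation: a grammar rule (represented here by a regex) recognizes the query information in an utterance, that information is used to look up a memo, and a response is generated from the result. The memo store, rule, and function names are all hypothetical.

```python
import re

# Toy memo store standing in for the database of saved memos.
MEMOS = {
    "parking spot": "You parked on level 3, row B.",
    "wifi password": "The wifi password is on the router label.",
}

# A regex standing in for a natural language grammar rule that
# recognizes the query information ("what/where ... my <topic>").
RETRIEVE_RULE = re.compile(r"(?:what|where)(?: is|'s) my (?P<topic>.+?)\??$", re.I)

def answer(utterance: str) -> str:
    match = RETRIEVE_RULE.search(utterance.strip())
    if not match:
        return "Sorry, I didn't understand that."
    topic = match.group("topic").lower()   # query information
    memo = MEMOS.get(topic)                # query the store for a related memo
    return memo or f"I don't have a memo about your {topic}."

print(answer("Where's my parking spot?"))
```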
  • Publication number: 20200219490
    Abstract: Systems and methods are provided for providing relevant information in response to natural language expressions. The expressions may be part of a spoken conversation between people either together or remotely. The information may be provided visually. Whether a piece of information is relevant to display can be conditioned by a model of the interest of the speaker. The interest model can be based on a history of expressions by the speaker and information from a user profile. The display of information can also be conditioned on a current conversation topic and on whether the same information has been displayed recently.
    Type: Application
    Filed: March 18, 2020
    Publication date: July 9, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Jonah Probell
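A rough sketch of the display-gating idea from this abstract, under simplifying assumptions: an interest model seeded from a user profile and updated from the speaker's expression history, combined with a topic check and a recency check, decides whether a piece of information is shown. The class, thresholds, and scoring are hypothetical.

```python
from collections import Counter
import time

class DisplayGate:
    """Decides whether a piece of information is worth showing."""
    def __init__(self, profile_interests, recency_window=300.0):
        self.interest = Counter(profile_interests)   # seeded from the user profile
        self.recency_window = recency_window
        self.last_shown = {}                         # info id -> timestamp

    def observe(self, expression_terms):
        # The speaker's expression history strengthens the interest model.
        self.interest.update(expression_terms)

    def should_show(self, info_id, info_terms, current_topic, now=None):
        now = now or time.time()
        if current_topic not in info_terms:                            # off-topic
            return False
        if now - self.last_shown.get(info_id, -1e9) < self.recency_window:
            return False                                               # shown too recently
        if sum(self.interest[t] for t in info_terms) < 2:              # not interesting enough
            return False
        self.last_shown[info_id] = now
        return True

gate = DisplayGate(profile_interests=["baseball", "giants"])
gate.observe(["giants", "score", "tonight"])
print(gate.should_show("giants_score", {"giants", "score", "baseball"}, "giants"))
```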
  • Publication number: 20200219513
    Abstract: A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.
    Type: Application
    Filed: March 19, 2020
    Publication date: July 9, 2020
    Applicant: SoundHound, Inc.
    Inventors: Patricia Pozon AGUAYO, Jennifer Hee Young ZHANG, Jonah PROBELL
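A minimal sketch of the adaptive-cutoff idea in this abstract: the probability that the recognized text is a prefix of a longer utterance stretches the non-voice-activity cutoff. The lookup table stands in for a learned model, and all values are hypothetical.

```python
# Toy stand-in for a learned model of P(recognized text is a prefix
# of a longer complete utterance).
PREFIX_PROB = {
    "set an alarm": 0.9,            # likely continues: "... for 7 am"
    "set an alarm for 7 am": 0.1,
}

BASE_CUTOFF_S = 0.7   # silence needed to end the utterance normally
MAX_CUTOFF_S = 2.0    # silence allowed when a continuation seems likely

def cutoff_for(recognized_text: str) -> float:
    p_prefix = PREFIX_PROB.get(recognized_text, 0.5)
    # The more likely the text is only a prefix, the longer we wait.
    return BASE_CUTOFF_S + p_prefix * (MAX_CUTOFF_S - BASE_CUTOFF_S)

print(cutoff_for("set an alarm"))           # ~1.87 s: keep listening
print(cutoff_for("set an alarm for 7 am"))  # ~0.83 s: wrap up quickly
```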
  • Publication number: 20200210529
    Abstract: A method of training word embeddings is provided. The method includes determining anchors, each comprising a first word in a first domain and a second word in a second domain, training word embeddings for the first and second domains, and training a transform for transforming word embedding vectors in the first domain to word embedding vectors in the second domain, wherein the training minimizes a loss function that includes an anchor loss for each anchor, such that for each anchor, the anchor loss is based on a distance between the anchor's second word's embedding vector and the transform of the anchor's first word's embedding vector, and for each anchor, the anchor loss for the respective anchor is zero when the distance between the respective anchor's second word's embedding vector and the transform of the respective anchor's first word's embedding vector is less than a specific tolerance.
    Type: Application
    Filed: December 26, 2018
    Publication date: July 2, 2020
    Applicant: SoundHound, Inc.
    Inventor: Terry KONG
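One way to read the anchor loss described in this abstract is as a hinge on the distance between the transformed source-domain embedding and the target-domain embedding, with no penalty inside the tolerance. A small sketch of that reading follows; the transform, vectors, and tolerance are illustrative only and the patent's exact formulation may differ.

```python
import math

def anchor_loss(src_vec, tgt_vec, transform, tolerance=0.1):
    """Loss for one anchor: zero when the transformed first-domain embedding
    lands within `tolerance` of the second-domain embedding."""
    mapped = transform(src_vec)
    dist = math.dist(mapped, tgt_vec)
    return max(0.0, dist - tolerance)     # hinge: no penalty inside the tolerance

# Hypothetical transform (identity here) and toy 2-d embedding vectors.
identity = lambda v: v
print(anchor_loss([1.0, 0.0], [1.05, 0.0], identity))   # 0.0 (within tolerance)
print(anchor_loss([1.0, 0.0], [1.50, 0.0], identity))   # 0.4 (penalized)
```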
  • Patent number: 10699713
    Abstract: A server receives a user audio stream, the stream comprising multiple utterances. A query-processing module of the server continuously listens to and processes the utterances. The processing includes parsing successive utterances and recognizing corresponding queries, taking appropriate actions while the utterances are being received. In some embodiments, a query may be parsed and executed before the previous query's execution is complete.
    Type: Grant
    Filed: April 18, 2019
    Date of Patent: June 30, 2020
    Assignee: SoundHound, Inc.
    Inventors: Scott Halstvedt, Bernard Mont-Reynaud, Kazi Asif Wadud
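A simplified sketch of the continuous-listening behavior this abstract describes: utterances keep arriving, each is parsed into a query and dispatched for execution without waiting for earlier queries to finish. The parse and execute functions are stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def execute(query: str) -> str:
    time.sleep(1.0)                       # pretend this is a slow backend action
    return f"done: {query}"

def parse(utterance: str) -> str:
    return utterance.strip().lower()      # stand-in for real query recognition

utterance_stream = ["What's the weather?", "Play some jazz", "Set a timer"]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = []
    for utterance in utterance_stream:    # keep processing; don't wait for results
        futures.append(pool.submit(execute, parse(utterance)))
    for f in futures:
        print(f.result())
```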
  • Publication number: 20200184958
    Abstract: A system and method are disclosed for capturing a segment of speech audio, performing phoneme recognition on the segment of speech audio to produce a segmented phoneme sequence, comparing the segmented phoneme sequence to stored phoneme sequences that represent incorrect pronunciations of words to determine if there is a match, and identifying an incorrect pronunciation for a word in the segment of speech audio. The system builds a library based on the data collected for the incorrect pronunciations.
    Type: Application
    Filed: December 7, 2018
    Publication date: June 11, 2020
    Applicant: SoundHound, Inc.
    Inventors: Katayoun NOROUZI, Karl STAHL
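A small sketch of the matching step this abstract describes: a recognized phoneme sequence is compared against stored sequences for known incorrect pronunciations, and matches are collected into a growing library. The phoneme entries, threshold, and similarity measure are hypothetical stand-ins.

```python
from difflib import SequenceMatcher

# Stored phoneme sequences for known incorrect pronunciations
# (ARPAbet-style, hypothetical entries).
MISPRONUNCIATIONS = {
    ("N", "UW", "K", "Y", "AH", "L", "ER"): "nuclear",         # "nucular"
    ("EH", "K", "S", "P", "R", "EH", "S", "OW"): "espresso",   # "expresso"
}

library = []   # grows into a library of observed mispronunciations

def check(recognized_phonemes, threshold=0.85):
    for wrong_seq, word in MISPRONUNCIATIONS.items():
        score = SequenceMatcher(None, recognized_phonemes, wrong_seq).ratio()
        if score >= threshold:
            library.append((word, tuple(recognized_phonemes)))
            return f"'{word}' appears to be mispronounced"
    return None

print(check(["N", "UW", "K", "Y", "AH", "L", "ER"]))
```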
  • Publication number: 20200183815
    Abstract: A virtual assistant platform provides a user interface for app developers to configure the enablement of domains for virtual assistants. Sets of test queries can be uploaded and statistical analyses displayed for the numbers of test queries served by each selected domain and costs for usage of each domain. Costs can vary according to complex pricing models. The user interface provides display views of tables, cost stack charts, and histograms to inform decisions that trade off costs with benefits to the virtual assistant user experience. The platform interface shows, for individual queries, responses possible from different domains. Platform providers promote certain chosen domains.
    Type: Application
    Filed: December 7, 2018
    Publication date: June 11, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Jonah Probell
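A toy sketch of the statistics this abstract mentions: for a batch of test queries, count how many each enabled domain served and total the cost under a per-domain price. The pricing, domains, and test results are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-query pricing for each enabled domain.
PRICE_PER_QUERY = {"weather": 0.002, "music": 0.005, "navigation": 0.004}

# (test query, domain that served it) pairs produced by a test run.
test_results = [
    ("what's the forecast", "weather"),
    ("play some jazz", "music"),
    ("is it raining", "weather"),
    ("directions home", "navigation"),
]

served = defaultdict(int)
cost = defaultdict(float)
for _query, domain in test_results:
    served[domain] += 1
    cost[domain] += PRICE_PER_QUERY[domain]

for domain in served:
    print(f"{domain}: {served[domain]} queries, ${cost[domain]:.3f}")
```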
  • Patent number: 10657174
    Abstract: The present invention relates to providing identification information in response to an audio segment using a first mode of operation including receiving an audio segment and sending the audio segment to a remote server and receiving, from the remote server, identification information relating to the audio segment, and a second mode of operation of receiving an audio segment and using stored information to obtain identification information relating to the received audio segment, without sending the audio segment to the remote server. The present invention further includes using identification information from the remote server and using local identification information and selecting either identification information from the remote server or local identification information based on selection criteria, and generating an output based on the selected identification information.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: May 19, 2020
    Assignee: SoundHound, Inc.
    Inventors: Aaron Master, Bernard Mont-Reynaud, Keyvan Mohajer, Timothy Stonehocker
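A simplified sketch of the two modes and the selection criterion described in this abstract: identify locally when offline or when the local result is confident enough, otherwise also query the server and keep the more confident answer. The identification functions and confidence values are stand-ins.

```python
def identify_locally(audio):
    return ("Song A - local match", 0.62)      # (label, confidence), stand-in

def identify_remotely(audio):
    return ("Song A (Remastered)", 0.95)       # stand-in for a server round trip

def identify(audio, connected: bool, min_confidence: float = 0.8) -> str:
    local_label, local_conf = identify_locally(audio)
    if not connected or local_conf >= min_confidence:
        return local_label                     # second mode: stay on-device
    remote_label, remote_conf = identify_remotely(audio)
    # Selection criterion: prefer whichever result is more confident.
    return remote_label if remote_conf >= local_conf else local_label

print(identify(audio=b"...", connected=True))
print(identify(audio=b"...", connected=False))
```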
  • Publication number: 20200151394
    Abstract: Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include age of people present, music that is playing, location, noise, and mood. Semantic attributes may include presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word so as to enable contrastive emphasis.
    Type: Application
    Filed: January 14, 2020
    Publication date: May 14, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
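A toy sketch of the mapping this abstract describes: a learned model takes a behavior vector and produces a voice vector of TTS parameters. Here the model is a hand-written linear layer with made-up weights; the attribute and parameter names are hypothetical.

```python
# Behavior vector: [children_present, ambient_noise, emotional_charge], each in [0, 1].
# Voice vector:    [pitch_shift, speech_rate, volume] fed to the TTS engine.

# Hypothetical learned weights and biases of a linear model.
W = [
    [0.20, -0.10, 0.30],   # pitch_shift
    [-0.30, 0.00, 0.10],   # speech_rate
    [0.00, 0.50, 0.10],    # volume
]
B = [0.0, 1.0, 0.7]

def voice_vector(behavior):
    return [sum(w * x for w, x in zip(row, behavior)) + b for row, b in zip(W, B)]

# Kids in the room, noisy, mildly emotional content:
print(voice_vector([1.0, 0.8, 0.3]))   # higher pitch, slower rate, louder
```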
  • Publication number: 20200142890
    Abstract: The technology disclosed relates to performing a cross-lingual search. The cross-lingual search may include receiving a first query in a first language, translating the first query from the first language to a second language, to obtain a second query in the second language, performing a first search based on the first query to obtain first language results, performing a second search based on the second query to obtain second language results, translating the second language results to the first language, to obtain translated second results and outputting overall results including at least some of the first language results and some of the translated second results.
    Type: Application
    Filed: November 2, 2018
    Publication date: May 7, 2020
    Applicant: SoundHound, Inc.
    Inventors: Qindi ZHANG, Qiaozhi SONG
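A minimal sketch of the cross-lingual search pipeline in this abstract: translate the query, search in both languages, translate the second-language results back, and merge. The translate and search functions are stand-ins for real services.

```python
def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for a machine translation call.
    table = {"best ramen": "best ramen (ja)", "ramen result (ja)": "ramen result"}
    return table.get(text, text)

def search(query: str, lang: str):
    # Stand-in for a per-language search backend.
    return [f"{query} result"] if lang == "en" else ["ramen result (ja)"]

def cross_lingual_search(query: str, first: str = "en", second: str = "ja"):
    second_query = translate(query, first, second)
    first_results = search(query, first)
    second_results = search(second_query, second)
    translated = [translate(r, second, first) for r in second_results]
    return first_results + translated          # overall results from both languages

print(cross_lingual_search("best ramen"))
```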
  • Patent number: 10636421
    Abstract: A speech-based human-machine interface that parses words spoken to detect a complete parse and, responsive to so detecting, computes a hypothesis as to whether the words are a prefix to another complete parse. The duration of the no-voice-activity period used to determine the end of a sentence depends on the prefix hypothesis. The user's typical speech speed profile and a short-term measure of speech speed also scale the period. Speech speed is measured by the time between words, and the period scaling uses a continuously adaptive algorithm. The system uses a longer cut-off period after a system wake-up event but before it detects any voice activity.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: April 28, 2020
    Assignee: SoundHound, Inc.
    Inventors: Jennifer Hee Young Zhang, Patricia Pozon Aguayo, Jonah Probell
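A rough sketch of the speed-scaled cutoff this abstract describes: the time between words feeds a continuously adapting speed estimate that scales the silence cutoff, the prefix hypothesis stretches it further, and a longer window applies after wake-up before any speech is heard. All constants and names are hypothetical.

```python
class CutoffEstimator:
    """Scales the end-of-sentence silence cutoff by the speaker's pace."""
    BASE_CUTOFF_S = 0.8
    WAKE_CUTOFF_S = 3.0            # longer window after wake-up, before any speech

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.avg_gap = 0.3          # running estimate of time between words (s)
        self.heard_speech = False

    def word_gap(self, gap_s):
        # Continuously adapt the speed estimate (exponential moving average).
        self.avg_gap = (1 - self.alpha) * self.avg_gap + self.alpha * gap_s
        self.heard_speech = True

    def cutoff(self, prefix_probability=0.0):
        if not self.heard_speech:
            return self.WAKE_CUTOFF_S
        scale = self.avg_gap / 0.3                    # slow talkers get more time
        return self.BASE_CUTOFF_S * scale * (1 + prefix_probability)

est = CutoffEstimator()
print(est.cutoff())                # 3.0 before any voice activity
est.word_gap(0.5)
est.word_gap(0.6)
print(round(est.cutoff(0.4), 2))   # longer cutoff for a slow talker mid-sentence
```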
  • Patent number: 10599645
    Abstract: A speech recognition and natural language understanding system performs insertion, deletion, and replacement edits of tokens at positions with low probabilities according to both a forward and a backward statistical language model (SLM) to produce rewritten token sequences. Multiple rewrites can be produced with scores depending on the probabilities of tokens according to the SLMs. The rewritten token sequences can be parsed according to natural language grammars to produce further weighted scores. Token sequences can be rewritten iteratively using a graph-based search algorithm to find the best rewrite. Mappings of input token sequences to rewritten token sequences can be stored in a cache, and searching for a best rewrite can be bypassed by using cached rewrites when present. Analysis of various initial token sequences that produce the same new rewritten token sequence can be useful to improve natural language grammars.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: March 24, 2020
    Assignee: SoundHound, Inc.
    Inventors: Luke Lefebure, Pranav Singh
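A toy sketch of the rewriting idea in this abstract: forward and backward language models score each token position, the lowest-probability position is edited, and input-to-rewrite mappings are cached. Here bigram lookup tables stand in for the SLMs, only replacement edits are shown, and the vocabulary is hypothetical.

```python
# Toy bigram probabilities standing in for forward and backward SLMs.
FORWARD = {("play", "some"): 0.4, ("some", "muzic"): 0.001, ("some", "music"): 0.3}
BACKWARD = {("some", "play"): 0.4, ("muzic", "some"): 0.001, ("music", "some"): 0.3}

rewrite_cache = {}     # input token sequence -> best rewrite

def position_scores(tokens):
    scores = []
    for i in range(1, len(tokens)):
        fwd = FORWARD.get((tokens[i - 1], tokens[i]), 1e-6)
        bwd = BACKWARD.get((tokens[i], tokens[i - 1]), 1e-6)
        scores.append((fwd * bwd, i))
    return scores

def rewrite(tokens, vocab=("music", "movie")):
    key = tuple(tokens)
    if key in rewrite_cache:                       # cached rewrites skip the search
        return rewrite_cache[key]
    _, worst = min(position_scores(tokens))        # lowest-probability position
    best = max(vocab, key=lambda w: FORWARD.get((tokens[worst - 1], w), 1e-6))
    result = tokens[:worst] + [best] + tokens[worst + 1:]
    rewrite_cache[key] = result
    return result

print(rewrite(["play", "some", "muzic"]))          # ['play', 'some', 'music']
```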
  • Patent number: 10585891
    Abstract: A virtual assistant receives natural language interpretation hypotheses for user queries, determines entities and attributes from the interpretations, and requests data from appropriate data sources. A cost function estimates the cost of each data source request. Cost functions include factors such as contract pricing, access latency, and data quality. Based on the estimated cost, the virtual assistant sends requests to a plurality of data sources, each of which might be able to provide data necessary to answer the user query. By including user credits in the cost function, the virtual assistant provides better quality of results and answer latency for paying users. The virtual assistant minimizes latency by answering using data from the first responding data source or provides a latency guarantee by answering with the most accurate data received by a deadline. The virtual assistant measures data source response latency and caches responses for expensive requests.
    Type: Grant
    Filed: November 3, 2016
    Date of Patent: March 10, 2020
    Assignee: SoundHound, Inc.
    Inventor: Scott Halstvedt
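A small sketch of the cost-function idea in this abstract: each data source request is scored on contract price, latency, and data quality, with user credits discounting the price so paying users get higher-quality sources. The weights, sources, and numbers are all hypothetical.

```python
def request_cost(source, user_credits=0.0):
    # Lower is better. Credits discount the monetary price for paying users,
    # so latency and data quality dominate the choice for them.
    price_weight = max(0.0, 1.0 - user_credits)
    return (price_weight * source["price"]
            + 0.0001 * source["latency_ms"]
            - 0.1 * source["quality"])

sources = [
    {"name": "premium_weather", "price": 0.10, "latency_ms": 80, "quality": 0.9},
    {"name": "free_weather", "price": 0.00, "latency_ms": 400, "quality": 0.6},
]

for credits in (0.0, 1.0):
    best = min(sources, key=lambda s: request_cost(s, credits))
    print(f"credits={credits}: query {best['name']}")
```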
  • Patent number: 10586079
    Abstract: Software-based systems perform parametric speech synthesis. TTS voice parameters determine the generated speech audio. Voice parameters include gender, age, dialect, donor, arousal, authoritativeness, pitch, range, speech rate, volume, flutter, roughness, breath, frequencies, bandwidths, and relative amplitudes of formants and nasal sounds. The system chooses TTS parameters based on one or more of: user profile attributes including gender, age, and dialect; situational attributes such as location, noise level, and mood; natural language semantic attributes such as domain of conversation, expression type, dimensions of affect, word emphasis and sentence structure; and analysis of target speaker voices. The system chooses TTS parameters to improve listener satisfaction or other desired listener behavior. Choices may be made by specified algorithms defined by code developers, or by machine learning algorithms trained on labeled samples of system performance.
    Type: Grant
    Filed: January 13, 2017
    Date of Patent: March 10, 2020
    Assignee: SoundHound, Inc.
    Inventors: Monika Almudafar-Depeyrot, Bernard Mont-Reynaud
  • Publication number: 20200043479
    Abstract: The present invention extends to methods, systems, and computer program products for automatically visually presenting information relevant to an utterance. Natural language expressions from conversation participants are received and processed to determine a topic and concepts; a search finds relevant information, which is visually displayed to the assisted user. Applications can include video conferencing, wearable devices, augmented reality, and heads-up vehicle displays. Topics, concepts, and information search results are analyzed for relevance and non-repetition. Relevance can depend on a user profile, conversation history, and environmental information. Further information can be requested through non-verbal modes. Searched and displayed information can be in languages other than that spoken in the conversation. Many-party conversations can be processed.
    Type: Application
    Filed: August 2, 2018
    Publication date: February 6, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Jonah Probell
  • Publication number: 20200013094
    Abstract: Original concepts obtained from a query may be augmented with additional concepts connected to the original concepts in a concept graph in response to determining that the original concepts did not match a sufficient number of bid functions. The augmented set of concepts may then be evaluated with respect to the bid functions to identify matching ad functions. This process may be repeated until a sufficient number of matching ad functions are found. A bid amount of the matching bid functions may be calculated, such as based on semantic information obtained as a result of the query. The bid amounts may further be based on environmental information. A bid function is selected based on the bid amounts and the content associated with the bid function is provided to the source of the query. The content may be selected based on the semantic information.
    Type: Application
    Filed: September 16, 2019
    Publication date: January 9, 2020
    Applicant: SoundHound, Inc.
    Inventors: Keyvan Mohajer, Scott Halstvedt
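A simplified sketch of the expansion loop this abstract describes: if too few bid functions match the query's concepts, the concept set is augmented with connected concepts from a concept graph and re-evaluated; bid amounts then depend on semantic information and the highest bid's content is returned. The graph, bids, and semantic fields are hypothetical.

```python
CONCEPT_GRAPH = {                 # hypothetical concept connections
    "coffee": ["espresso", "cafe"],
    "espresso": ["coffee"],
    "cafe": ["restaurant"],
}

# Each bid: (required concept, function of semantic info -> bid amount, content).
BIDS = [
    ("espresso", lambda info: 0.50 + 0.10 * info.get("near_store", 0), "Espresso bar ad"),
    ("restaurant", lambda info: 0.30, "Restaurant ad"),
]

def select_content(query_concepts, semantic_info, min_matches=1):
    concepts = set(query_concepts)
    while True:
        matches = [(bid(semantic_info), content) for concept, bid, content in BIDS
                   if concept in concepts]
        if len(matches) >= min_matches:
            return max(matches)[1]                  # highest bid amount wins
        # Too few matches: augment with connected concepts and retry.
        expanded = set().union(*(CONCEPT_GRAPH.get(c, []) for c in concepts)) | concepts
        if expanded == concepts:
            return None
        concepts = expanded

print(select_content({"coffee"}, {"near_store": 1}))   # Espresso bar ad
```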
  • Publication number: 20190371311
    Abstract: The technology disclosed relates to performing speech recognition for a plurality of different devices or devices in a plurality of conditions. This includes storing a plurality of acoustic models associated with different devices or device conditions, receiving speech audio including natural language utterances, receiving metadata indicative of a device type or device condition, selecting an acoustic model from the plurality in dependence upon the received metadata, and employing the selected acoustic model to recognize speech from the natural language utterances included in the received speech audio. Each of speech recognition and the storage of acoustic models can be performed locally by devices or on a network-connected server. Also provided is a platform and interface, used by device developers to select, configure, and/or train acoustic models for particular devices and/or conditions.
    Type: Application
    Filed: June 1, 2018
    Publication date: December 5, 2019
    Applicant: SoundHound, Inc.
    Inventors: Mehul Patel, Keyvan Mohajer
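A minimal sketch of the selection step this abstract describes: device metadata indicating device type and condition selects one of several stored acoustic models, with a fallback when no exact match exists. The model names and metadata keys are hypothetical.

```python
# Hypothetical acoustic models keyed by (device type, condition).
ACOUSTIC_MODELS = {
    ("car", "highway"): "am_car_noisy_v3",
    ("car", "parked"): "am_car_quiet_v2",
    ("speaker", None): "am_far_field_v5",
}
DEFAULT_MODEL = "am_generic_v1"

def select_acoustic_model(metadata: dict) -> str:
    device = metadata.get("device_type")
    condition = metadata.get("condition")
    # Prefer an exact (device, condition) model, then a device-only model.
    return (ACOUSTIC_MODELS.get((device, condition))
            or ACOUSTIC_MODELS.get((device, None))
            or DEFAULT_MODEL)

print(select_acoustic_model({"device_type": "car", "condition": "highway"}))
print(select_acoustic_model({"device_type": "speaker"}))
```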
  • Publication number: 20190325898
    Abstract: Real-time speech recognition systems extend an end-of-utterance timeout period in response to the presence of a disfluency at the end of speech, and by so doing avoid cutting off speakers mid-sentence. Approaches to detecting disfluencies include the application of disfluency n-gram language models, acoustic models, prosody models, and phrase spotting. Explicit pause phrases can also be detected to extend sentence parsing until relevant semantic information is gathered from the speaker or another voice. Disfluency models can be trained such as by searching by successive deletion of tokens, phonemes, or acoustic segments to convert sentences that cannot be parsed into ones that can. Disfluency-based timeout adaptation is applicable to safety-critical systems.
    Type: Application
    Filed: April 23, 2018
    Publication date: October 24, 2019
    Applicant: SoundHound, Inc.
    Inventors: Liam O'Hart Kinney, Joel McKenzie, Anitha Kandasamy
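A toy sketch of the timeout extension this abstract describes, using only the simplest of the listed detection approaches (a filled-pause word list and phrase spotting); the real system also mentions n-gram, acoustic, and prosody models. Word lists and timeouts are hypothetical.

```python
BASE_TIMEOUT_S = 0.8
EXTENDED_TIMEOUT_S = 2.5

# Hypothetical filled pauses and explicit pause phrases caught by phrase spotting.
DISFLUENCIES = {"uh", "um", "er", "hmm"}
PAUSE_PHRASES = ("hold on", "let me think", "wait a second")

def end_of_utterance_timeout(recognized_text: str) -> float:
    words = recognized_text.lower().split()
    if words and words[-1] in DISFLUENCIES:
        return EXTENDED_TIMEOUT_S          # trailing filled pause: keep listening
    if any(recognized_text.lower().endswith(p) for p in PAUSE_PHRASES):
        return EXTENDED_TIMEOUT_S          # explicit pause phrase
    return BASE_TIMEOUT_S

print(end_of_utterance_timeout("call mom"))           # 0.8
print(end_of_utterance_timeout("navigate to um"))     # 2.5
print(end_of_utterance_timeout("play, uh, hold on"))  # 2.5
```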
  • Patent number: 10453101
    Abstract: An ad processor evaluates bid functions that are based on concepts that might be generated from interpretations of natural language expressions. Ad buyers provide the functions with corresponding ads to ad processors. Bid functions are further based on the values of semantic information referenced by expressions. Bid functions are further based on environmental information. Ad buyers are able to modify bid functions. Ads may be provided in the form of questions, and may be indicated by an identifying sound. Upon finding no expression concepts within a bid function, the set of expression concepts is expanded according to strengths of connections between concepts in a concept graph.
    Type: Grant
    Filed: October 14, 2016
    Date of Patent: October 22, 2019
    Assignee: SoundHound, Inc.
    Inventors: Scott Halstvedt, Keyvan Mohajer
  • Publication number: 20190303438
    Abstract: The present invention extends to methods, systems, and computer program products for interpreting expressions having potentially ambiguous meanings in different domains. Multi-domain natural language understanding systems can support a variety of different types of clients. Expressions can be interpreted across multiple domains. Weights can be assigned to domains. Weights can be client specific or expression specific so that a chosen interpretation is more likely correct for the type of client or for its context. Stored weight sets can be chosen according to identifying information carried as metadata with expressions, or weight sets can be carried directly as metadata. Domains can additionally or alternatively be ranked in ordered lists or comparative domain pairs to favor some domains over others, as appropriate for client type or client context.
    Type: Application
    Filed: April 2, 2018
    Publication date: October 3, 2019
    Applicant: SoundHound, Inc.
    Inventors: Christopher S. Wilson, Keyvan Mohajer, Bernard Mont-Reynaud
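A minimal sketch of the domain-weighting idea in this abstract: each domain proposes an interpretation with a raw score, and client-specific weights re-rank them so the same expression resolves differently on different client types. The scores, weights, and client names are hypothetical.

```python
# Hypothetical raw interpretation scores for one ambiguous expression:
# "play hamilton" could mean music or navigation (a place named Hamilton).
interpretations = {
    "music": {"score": 0.7, "meaning": "play the Hamilton soundtrack"},
    "navigation": {"score": 0.6, "meaning": "navigate to Hamilton"},
}

# Client-specific domain weights, looked up from request metadata.
CLIENT_WEIGHTS = {
    "smart_speaker": {"music": 1.2, "navigation": 0.5},
    "car_head_unit": {"music": 0.8, "navigation": 1.5},
}

def choose(interps, client_type):
    weights = CLIENT_WEIGHTS[client_type]
    best = max(interps, key=lambda d: interps[d]["score"] * weights.get(d, 1.0))
    return interps[best]["meaning"]

print(choose(interpretations, "smart_speaker"))   # play the Hamilton soundtrack
print(choose(interpretations, "car_head_unit"))   # navigate to Hamilton
```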