Patents Assigned to SoundHound, Inc.
-
Publication number: 20200234698
Abstract: The technology disclosed relates to retrieving a personal memo from a database. The method includes receiving, by a virtual assistant, a natural language utterance that expresses a request, interpreting the natural language utterance according to a natural language grammar rule for retrieving memo data from the natural language utterance, the natural language grammar rule recognizing query information, responsive to interpreting the natural language utterance, using the query information to query the database for a memo related to the query information, and providing, to a user, a response generated in dependence upon the memo related to the query information.
Type: Application
Filed: January 23, 2019
Publication date: July 23, 2020
Applicant: SoundHound, Inc.
Inventors: Mara Selvaggi, Irina A. Spiridonova, Karl Stahl
-
Publication number: 20200219490
Abstract: Systems and methods are provided for providing relevant information in response to natural language expressions. The expressions may be part of a spoken conversation between people either together or remotely. The information may be provided visually. Whether a piece of information is relevant to display can be conditioned by a model of the interest of the speaker. The interest model can be based on a history of expressions by the speaker and information from a user profile. The display of information can also be conditioned on a current conversation topic and on whether the same information has been displayed recently.
Type: Application
Filed: March 18, 2020
Publication date: July 9, 2020
Applicant: SoundHound, Inc.
Inventors: Bernard Mont-Reynaud, Jonah Probell
-
Publication number: 20200219513
Abstract: A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.
Type: Application
Filed: March 19, 2020
Publication date: July 9, 2020
Applicant: SoundHound, Inc.
Inventors: Patricia Pozon AGUAYO, Jennifer Hee Young ZHANG, Jonah PROBELL
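Illustration (not taken from the application): a minimal sketch of a cutoff period that grows with the prefix probability. The linear interpolation, the function names, and the 0.5 s and 2.0 s bounds are all assumptions made for this example; the abstract only says that the cutoff is longer when the prefix probability is high.

```python
def adaptive_cutoff(prefix_probability: float,
                    min_cutoff_s: float = 0.5,
                    max_cutoff_s: float = 2.0) -> float:
    """Return a silence cutoff that lengthens as the prefix probability rises.

    Linear interpolation is an assumed rule, chosen only for illustration.
    """
    if not 0.0 <= prefix_probability <= 1.0:
        raise ValueError("prefix_probability must be in [0, 1]")
    return min_cutoff_s + prefix_probability * (max_cutoff_s - min_cutoff_s)


def utterance_ended(silence_s: float, prefix_probability: float) -> bool:
    """Compare the observed non-voice duration to the adapted cutoff."""
    return silence_s >= adaptive_cutoff(prefix_probability)
```

With these assumed bounds, one second of silence ends the utterance when the recognized speech is unlikely to be a prefix, but not when it very likely is one.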
-
Publication number: 20200210529
Abstract: A method of training word embeddings is provided. The method includes determining anchors, each comprising a first word in a first domain and a second word in a second domain, training word embeddings for the first and second domains, and training a transform for transforming word embedding vectors in the first domain to word embedding vectors in the second domain, wherein the training minimizes a loss function that includes an anchor loss for each anchor, such that for each anchor, the anchor loss is based on a distance between the anchor's second word's embedding vector and the transform of the anchor's first word's embedding vector, and for each anchor, the anchor loss for the respective anchor is zero when the distance between the respective anchor's second word's embedding vector and the transform of the respective anchor's first word's embedding vector is less than a specific tolerance.
Type: Application
Filed: December 26, 2018
Publication date: July 2, 2020
Applicant: SoundHound, Inc.
Inventor: Terry KONG
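Illustration (not taken from the application): a minimal sketch of an anchor loss that is zero inside a tolerance. The abstract states only that the loss is based on a distance and vanishes below the tolerance; the linear transform, the Euclidean distance, and the hinge form (distance minus tolerance) are assumptions made for this example.

```python
import math
from typing import List, Sequence


def apply_transform(matrix: Sequence[Sequence[float]],
                    vector: Sequence[float]) -> List[float]:
    """Multiply a matrix by a vector (an assumed linear transform)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def anchor_loss(transform: Sequence[Sequence[float]],
                first_vec: Sequence[float],
                second_vec: Sequence[float],
                tolerance: float) -> float:
    """Zero when the transformed first-domain vector lies within the
    tolerance of the second-domain vector; otherwise the excess distance."""
    mapped = apply_transform(transform, first_vec)
    distance = math.dist(mapped, second_vec)  # Euclidean distance (assumed)
    return 0.0 if distance < tolerance else distance - tolerance
```

The total training loss in such a scheme would sum this term over all anchors alongside the usual embedding losses.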
-
Patent number: 10699713
Abstract: A server receives a user audio stream, the stream comprising multiple utterances. A query-processing module of the server continuously listens to and processes the utterances. The processing includes parsing successive utterances and recognizing corresponding queries, taking appropriate actions while the utterances are being received. In some embodiments, a query may be parsed and executed before the previous query's execution is complete.
Type: Grant
Filed: April 18, 2019
Date of Patent: June 30, 2020
Assignee: SoundHound, Inc.
Inventors: Scott Halstvedt, Bernard Mont-Reynaud, Kazi Asif Wadud
-
Publication number: 20200184958
Abstract: A system and method are disclosed for capturing a segment of speech audio, performing phoneme recognition on the segment of speech audio to produce a segmented phoneme sequence, comparing the segmented phoneme sequence to stored phoneme sequences that represent incorrect pronunciations of words to determine if there is a match, and identifying an incorrect pronunciation for a word in the segment of speech audio. The system builds a library based on the data collected for the incorrect pronunciations.
Type: Application
Filed: December 7, 2018
Publication date: June 11, 2020
Applicant: SoundHound, Inc.
Inventors: Katayoun NOROUZI, Karl STAHL
-
Publication number: 20200183815
Abstract: A virtual assistant platform provides a user interface for app developers to configure the enablement of domains for virtual assistants. Sets of test queries can be uploaded and statistical analyses displayed for the numbers of test queries served by each selected domain and costs for usage of each domain. Costs can vary according to complex pricing models. The user interface provides display views of tables, cost stack charts, and histograms to inform decisions that trade-off costs with benefits to the virtual assistant user experience. The platform interface shows, for individual queries, responses possible from different domains. Platform providers promote certain chosen domains.
Type: Application
Filed: December 7, 2018
Publication date: June 11, 2020
Applicant: SoundHound, Inc.
Inventors: Bernard Mont-Reynaud, Jonah Probell
-
Patent number: 10657174
Abstract: The present invention relates to providing identification information in response to an audio segment using a first mode of operation including receiving an audio segment and sending the audio segment to a remote server and receiving, from the remote server, identification information relating to the audio segment, and a second mode of operation of receiving an audio segment and using stored information to obtain identification information relating to the received audio segment, without sending the audio segment to the remote server. The present invention further includes using identification information from the remote server and using local identification information and selecting either identification information from the remote server or local identification information based on selection criteria, and generating an output based on the selected identification information.
Type: Grant
Filed: July 24, 2018
Date of Patent: May 19, 2020
Assignee: SoundHound, Inc.
Inventors: Aaron Master, Bernard Mont-Reynaud, Keyvan Mohajer, Timothy Stonehocker
-
Publication number: 20200151394
Abstract: Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include age of people present, music that is playing, location, noise, and mood. Semantic attributes may include presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word so as to enable contrastive emphasis.
Type: Application
Filed: January 14, 2020
Publication date: May 14, 2020
Applicant: SoundHound, Inc.
Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
-
Publication number: 20200142890
Abstract: The technology disclosed relates to performing a cross-lingual search. The cross-lingual search may include receiving a first query in a first language, translating the first query from the first language to a second language, to obtain a second query in the second language, performing a first search based on the first query to obtain first language results, performing a second search based on the second query to obtain second language results, translating the second language results to the first language, to obtain translated second results and outputting overall results including at least some of the first language results and some of the translated second results.
Type: Application
Filed: November 2, 2018
Publication date: May 7, 2020
Applicant: SoundHound, Inc.
Inventors: Qindi ZHANG, Qiaozhi SONG
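Illustration (not taken from the application): the sequence of steps in the abstract can be sketched as a small pipeline. The translators and search back ends are passed in as callables because the abstract does not name any; every function and parameter name here, and the rule of taking the first `per_language` results from each side, is an assumption made for this example.

```python
from typing import Callable, List


def cross_lingual_search(
    first_query: str,
    translate_query: Callable[[str], str],      # first -> second language
    translate_result: Callable[[str], str],     # second -> first language
    search_first: Callable[[str], List[str]],   # first-language search
    search_second: Callable[[str], List[str]],  # second-language search
    per_language: int = 2,
) -> List[str]:
    """Search in both languages and return results in the first language."""
    second_query = translate_query(first_query)
    first_results = search_first(first_query)
    second_results = search_second(second_query)
    translated = [translate_result(r) for r in second_results]
    # Output some first-language results and some translated results.
    return first_results[:per_language] + translated[:per_language]
```

A real system would presumably rank or interleave the two result lists; simple concatenation is used here only to keep the sketch short.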
-
Patent number: 10636421
Abstract: A speech-based human-machine interface that parses words spoken to detect a complete parse and, responsive to so detecting, computes a hypothesis as to whether the words are a prefix to another complete parse. The duration of no voice activity period to determine an end of a sentence depends on the prefix hypothesis. The user's typical speech speed profile and a short-term measure of speech speed also scale the period. Speech speed is measured by the time between words, and the period scaling uses a continuously adaptive algorithm. The system uses a longer cut-off period after a system wake-up event but before it detects any voice activity.
Type: Grant
Filed: December 27, 2017
Date of Patent: April 28, 2020
Assignee: SOUNDHOUND, INC.
Inventors: Jennifer Hee Young Zhang, Patricia Pozon Aguayo, Jonah Probell
-
Patent number: 10599645
Abstract: A speech recognition and natural language understanding system performs insertion, deletion, and replacement edits of tokens at positions with low probabilities according to both a forward and a backward statistical language model (SLM) to produce rewritten token sequences. Multiple rewrites can be produced with scores depending on the probabilities of tokens according to the SLMs. The rewritten token sequences can be parsed according to natural language grammars to produce further weighted scores. Token sequences can be rewritten iteratively using a graph-based search algorithm to find the best rewrite. Mappings of input token sequences to rewritten token sequences can be stored in a cache, and searching for a best rewrite can be bypassed by using cached rewrites when present. Analysis of various initial token sequences that produce the same new rewritten token sequence can be useful to improve natural language grammars.
Type: Grant
Filed: October 6, 2017
Date of Patent: March 24, 2020
Assignee: SoundHound, Inc.
Inventors: Luke Lefebure, Pranav Singh
-
Patent number: 10585891
Abstract: A virtual assistant receives natural language interpretation hypotheses for user queries, determines entities and attributes from the interpretations, and requests data from appropriate data sources. A cost function estimates the cost of each data source request. Cost functions include factors such as contract pricing, access latency, and data quality. Based on the estimated cost, the virtual assistant sends requests to a plurality of data sources, each of which might be able to provide data necessary to answer the user query. By including user credits in the cost function, the virtual assistant provides better quality of results and answer latency for paying users. The virtual assistant minimizes latency by answering using data from the first responding data source or provides a latency guarantee by answering with the most accurate data received by a deadline. The virtual assistant measures data source response latency and caches responses for expensive requests.
Type: Grant
Filed: November 3, 2016
Date of Patent: March 10, 2020
Assignee: SOUNDHOUND, INC.
Inventor: Scott Halstvedt
-
Patent number: 10586079
Abstract: Software-based systems perform parametric speech synthesis. TTS voice parameters determine the generated speech audio. Voice parameters include gender, age, dialect, donor, arousal, authoritativeness, pitch, range, speech rate, volume, flutter, roughness, breath, frequencies, bandwidths, and relative amplitudes of formants and nasal sounds. The system chooses TTS parameters based on one or more of: user profile attributes including gender, age, and dialect; situational attributes such as location, noise level, and mood; natural language semantic attributes such as domain of conversation, expression type, dimensions of affect, word emphasis and sentence structure; and analysis of target speaker voices. The system chooses TTS parameters to improve listener satisfaction or other desired listener behavior. Choices may be made by specified algorithms defined by code developers, or by machine learning algorithms trained on labeled samples of system performance.
Type: Grant
Filed: January 13, 2017
Date of Patent: March 10, 2020
Assignee: SOUNDHOUND, INC.
Inventors: Monika Almudafar-Depeyrot, Bernard Mont-Reynaud
-
Publication number: 20200043479
Abstract: The present invention extends to methods, systems, and computer program products for automatically visually presenting information relevant to an utterance. Natural language expressions from conversation participants are received and processed to determine a topic and concepts; a search finds relevant information, and it is visually displayed to an assisted user. Applications can include video conferencing, wearable devices, augmented reality, and heads-up vehicle displays. Topics, concepts, and information search results are analyzed for relevance and non-repetition. Relevance can depend on a user profile, conversation history, and environmental information. Further information can be requested through non-verbal modes. Searched and displayed information can be in languages other than that spoken in the conversation. Many-party conversations can be processed.
Type: Application
Filed: August 2, 2018
Publication date: February 6, 2020
Applicant: SoundHound, Inc.
Inventors: Bernard Mont-Reynaud, Jonah Probell
-
Publication number: 20200013094
Abstract: Original concepts obtained from a query may be augmented with additional concepts connected to the original concepts in a concept graph in response to determining that the original concepts did not match a sufficient number of bid functions. The augmented set of concepts may then be evaluated with respect to the bid functions to identify matching ad functions. This process may be repeated until a sufficient number of matching ad functions are found. A bid amount of the matching bid functions may be calculated, such as based on semantic information obtained as a result of the query. The bid amounts may further be based on environmental information. A bid function is selected based on the bid amounts and the content associated with the bid function is provided to the source of the query. The content may be selected based on the semantic information.
Type: Application
Filed: September 16, 2019
Publication date: January 9, 2020
Applicant: SoundHound, Inc.
Inventors: Keyvan Mohajer, Scott Halstvedt
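Illustration (not taken from the application): a minimal sketch of repeatedly augmenting query concepts from a concept graph until enough bid functions match. The data shapes are assumptions made for this example: each bid function is reduced to the set of concepts it reacts to, a bid "matches" when it shares at least one concept with the current set, and the graph is a plain adjacency mapping without connection strengths.

```python
from typing import Dict, Iterable, List, Set


def find_matching_bids(
    concepts: Iterable[str],
    bid_concepts: Dict[str, Set[str]],   # bid name -> concepts it reacts to
    concept_graph: Dict[str, List[str]], # concept -> connected concepts
    min_matches: int = 1,
    max_rounds: int = 3,
) -> List[str]:
    """Expand the concept set along graph edges until enough bids match."""
    current = set(concepts)
    matches: List[str] = []
    for _ in range(max_rounds + 1):
        matches = sorted(
            name for name, wanted in bid_concepts.items() if wanted & current
        )
        if len(matches) >= min_matches:
            break
        # Augment with every concept directly connected to the current set.
        expanded = current | {
            n for c in current for n in concept_graph.get(c, [])
        }
        if expanded == current:  # nothing new to add, stop early
            break
        current = expanded
    return matches
```

For example, a query that yields only the concept "espresso" can still reach a bid keyed on "coffee" if the graph connects the two. Computing bid amounts from semantic and environmental information, as the abstract describes, would follow this matching step.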
-
Publication number: 20190371311
Abstract: The technology disclosed relates to performing speech recognition for a plurality of different devices or devices in a plurality of conditions. This includes storing a plurality of acoustic models associated with different devices or device conditions, receiving speech audio including natural language utterances, receiving metadata indicative of a device type or device condition, selecting an acoustic model from the plurality in dependence upon the received metadata, and employing the selected acoustic model to recognize speech from the natural language utterances included in the received speech audio. Each of speech recognition and the storage of acoustic models can be performed locally by devices or on a network-connected server. Also provided is a platform and interface, used by device developers to select, configure, and/or train acoustic models for particular devices and/or conditions.
Type: Application
Filed: June 1, 2018
Publication date: December 5, 2019
Applicant: SOUNDHOUND, INC.
Inventors: Mehul PATEL, Keyvan MOHAJER
-
Publication number: 20190325898
Abstract: Real-time speech recognition systems extend an end-of-utterance timeout period in response to the presence of a disfluency at the end of speech, and by so doing avoid cutting off speakers mid-sentence. Approaches to detecting disfluencies include the application of disfluency n-gram language models, acoustic models, prosody models, and phrase spotting. Explicit pause phrases can also be detected to extend sentence parsing until relevant semantic information is gathered from the speaker or another voice. Disfluency models can be trained such as by searching by successive deletion of tokens, phonemes, or acoustic segments to convert sentences that cannot be parsed into ones that can. Disfluency-based timeout adaptation is applicable to safety-critical systems.
Type: Application
Filed: April 23, 2018
Publication date: October 24, 2019
Applicant: SoundHound, Inc.
Inventors: Liam O'Hart Kinney, Joel McKenzie, Anitha Kandasamy
-
Patent number: 10453101
Abstract: An ad processor evaluates bid functions that are based on concepts that might be generated from interpretations of natural language expressions. Ad buyers provide the functions with corresponding ads to ad processors. Bid functions are further based on the values of semantic information referenced by expressions. Bid functions are further based on environmental information. Ad buyers are able to modify bid functions. Ads may be provided in the form of questions, and may be indicated by an identifying sound. Upon finding no expression concepts within a bid function, the set of expression concepts is expanded according to strengths of connections between concepts in a concept graph.
Type: Grant
Filed: October 14, 2016
Date of Patent: October 22, 2019
Assignee: SOUNDHOUND INC.
Inventors: Scott Halstvedt, Keyvan Mohajer
-
Publication number: 20190303438
Abstract: The present invention extends to methods, systems, and computer program products for interpreting expressions having potentially ambiguous meanings in different domains. Multi-domain natural language understanding systems can support a variety of different types of clients. Expressions can be interpreted across multiple domains. Weights can be assigned to domains. Weights can be client specific or expression specific so that a chosen interpretation is more likely correct for the type of client or for its context. Stored weight sets can be chosen according to identifying information carried as metadata with expressions or weight sets carried directly as metadata. Domains can additionally or alternatively be ranked in ordered lists or comparative domain pairs to favor some domains over others as appropriate for client type or client context.
Type: Application
Filed: April 2, 2018
Publication date: October 3, 2019
Applicant: SoundHound, Inc.
Inventors: Christopher S. Wilson, Keyvan Mohajer, Bernard Mont-Reynaud
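Illustration (not taken from the application): a minimal sketch of choosing among interpretations from several domains using a client-specific weight set. The `Interpretation` type, the multiplicative combination of score and weight, and the default weight of 1.0 are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interpretation:
    domain: str   # domain that produced this interpretation
    score: float  # domain-independent interpretation score (assumed)


def choose_interpretation(
    candidates: List[Interpretation],
    domain_weights: Dict[str, float],
    default_weight: float = 1.0,
) -> Interpretation:
    """Pick the candidate with the highest weighted score."""
    if not candidates:
        raise ValueError("no candidate interpretations")
    return max(
        candidates,
        key=lambda c: c.score * domain_weights.get(c.domain, default_weight),
    )
```

Under these assumptions, an ambiguous expression that scores slightly higher in a movie domain can still resolve to a music domain for a client whose weight set favors music.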