Patents Assigned to SoundHound, Inc.
  • Publication number: 20200234698
    Abstract: The technology disclosed relates to retrieving a personal memo from a database. The method includes receiving, by a virtual assistant, a natural language utterance that expresses a request, interpreting the natural language utterance according to a natural language grammar rule for retrieving memo data from the natural language utterance, the natural language grammar rule recognizing query information, responsive to interpreting the natural language utterance, using the query information to query the database for a memo related to the query information, and providing, to a user, a response generated in dependence upon the memo related to the query information.
    Type: Application
    Filed: January 23, 2019
    Publication date: July 23, 2020
    Applicant: SoundHound, Inc.
    Inventors: Mara Selvaggi, Irina A. Spiridonova, Karl Stahl
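A minimal sketch of the memo-retrieval idea described in this abstract, not the patented implementation: a grammar rule (represented here by a regex) recognizes the query information in an utterance, that information is used to look up a memo, and a response is generated from the result. The memo store, rule, and function names are all hypothetical.

```python
import re

# Toy memo store standing in for the database of saved memos.
MEMOS = {
    "parking spot": "You parked on level 3, row B.",
    "wifi password": "The wifi password is on the router label.",
}

# A regex standing in for a natural language grammar rule that
# recognizes the query information ("what/where ... my <topic>").
RETRIEVE_RULE = re.compile(r"(?:what|where)(?: is|'s) my (?P<topic>.+?)\??$", re.I)

def answer(utterance: str) -> str:
    match = RETRIEVE_RULE.search(utterance.strip())
    if not match:
        return "Sorry, I didn't understand that."
    topic = match.group("topic").lower()   # query information
    memo = MEMOS.get(topic)                # query the store for a related memo
    return memo or f"I don't have a memo about your {topic}."

print(answer("Where's my parking spot?"))
```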
  • Publication number: 20200219490
    Abstract: Systems and methods are provided for providing relevant information in response to natural language expressions. The expressions may be part of a spoken conversation between people either together or remotely. The information may be provided visually. Whether a piece of information is relevant to display can be conditioned by a model of the interest of the speaker. The interest model can be based on a history of expressions by the speaker and information from a user profile. The display of information can also be conditioned on a current conversation topic and on whether the same information has been displayed recently.
    Type: Application
    Filed: March 18, 2020
    Publication date: July 9, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Jonah Probell
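A rough sketch of the display-gating idea from this abstract, under simplifying assumptions: an interest model seeded from a user profile and updated from the speaker's expression history, combined with a topic check and a recency check, decides whether a piece of information is shown. The class, thresholds, and scoring are hypothetical.

```python
from collections import Counter
import time

class DisplayGate:
    """Decides whether a piece of information is worth showing."""
    def __init__(self, profile_interests, recency_window=300.0):
        self.interest = Counter(profile_interests)   # seeded from the user profile
        self.recency_window = recency_window
        self.last_shown = {}                         # info id -> timestamp

    def observe(self, expression_terms):
        # The speaker's expression history strengthens the interest model.
        self.interest.update(expression_terms)

    def should_show(self, info_id, info_terms, current_topic, now=None):
        now = now or time.time()
        if current_topic not in info_terms:                            # off-topic
            return False
        if now - self.last_shown.get(info_id, -1e9) < self.recency_window:
            return False                                               # shown too recently
        if sum(self.interest[t] for t in info_terms) < 2:              # not interesting enough
            return False
        self.last_shown[info_id] = now
        return True

gate = DisplayGate(profile_interests=["baseball", "giants"])
gate.observe(["giants", "score", "tonight"])
print(gate.should_show("giants_score", {"giants", "score", "baseball"}, "giants"))
```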
  • Publication number: 20200219513
    Abstract: A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.
    Type: Application
    Filed: March 19, 2020
    Publication date: July 9, 2020
    Applicant: SoundHound, Inc.
    Inventors: Patricia Pozon AGUAYO, Jennifer Hee Young ZHANG, Jonah PROBELL
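A minimal sketch of the adaptive-cutoff idea in this abstract: the probability that the recognized text is a prefix of a longer utterance stretches the non-voice-activity cutoff. The lookup table stands in for a learned model, and all values are hypothetical.

```python
# Toy stand-in for a learned model of P(recognized text is a prefix
# of a longer complete utterance).
PREFIX_PROB = {
    "set an alarm": 0.9,            # likely continues: "... for 7 am"
    "set an alarm for 7 am": 0.1,
}

BASE_CUTOFF_S = 0.7   # silence needed to end the utterance normally
MAX_CUTOFF_S = 2.0    # silence allowed when a continuation seems likely

def cutoff_for(recognized_text: str) -> float:
    p_prefix = PREFIX_PROB.get(recognized_text, 0.5)
    # The more likely the text is only a prefix, the longer we wait.
    return BASE_CUTOFF_S + p_prefix * (MAX_CUTOFF_S - BASE_CUTOFF_S)

print(cutoff_for("set an alarm"))           # ~1.87 s: keep listening
print(cutoff_for("set an alarm for 7 am"))  # ~0.83 s: wrap up quickly
```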
  • Publication number: 20200210529
    Abstract: A method of training word embeddings is provided. The method includes determining anchors, each comprising a first word in a first domain and a second word in a second domain, training word embeddings for the first and second domains, and training a transform for transforming word embedding vectors in the first domain to word embedding vectors in the second domain, wherein the training minimizes a loss function that includes an anchor loss for each anchor, such that for each anchor, the anchor loss is based on a distance between the anchor's second word's embedding vector and the transform of the anchor's first word's embedding vector, and for each anchor, the anchor loss for the respective anchor is zero when the distance between the respective anchor's second word's embedding vector and the transform of the respective anchor's first word's embedding vector is less than a specific tolerance.
    Type: Application
    Filed: December 26, 2018
    Publication date: July 2, 2020
    Applicant: SoundHound, Inc.
    Inventor: Terry KONG
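One way to read the anchor loss described in this abstract is as a hinge on the distance between the transformed source-domain embedding and the target-domain embedding, with no penalty inside the tolerance. A small sketch of that reading follows; the transform, vectors, and tolerance are illustrative only and the patent's exact formulation may differ.

```python
import math

def anchor_loss(src_vec, tgt_vec, transform, tolerance=0.1):
    """Loss for one anchor: zero when the transformed first-domain embedding
    lands within `tolerance` of the second-domain embedding."""
    mapped = transform(src_vec)
    dist = math.dist(mapped, tgt_vec)
    return max(0.0, dist - tolerance)     # hinge: no penalty inside the tolerance

# Hypothetical transform (identity here) and toy 2-d embedding vectors.
identity = lambda v: v
print(anchor_loss([1.0, 0.0], [1.05, 0.0], identity))   # 0.0 (within tolerance)
print(anchor_loss([1.0, 0.0], [1.50, 0.0], identity))   # 0.4 (penalized)
```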
  • Patent number: 10699713
    Abstract: A server receives a user audio stream, the stream comprising multiple utterances. A query-processing module of the server continuously listens to and processes the utterances. The processing includes parsing successive utterances and recognizing corresponding queries, taking appropriate actions while the utterances are being received. In some embodiments, a query may be parsed and executed before the previous query's execution is complete.
    Type: Grant
    Filed: April 18, 2019
    Date of Patent: June 30, 2020
    Assignee: SoundHound, Inc.
    Inventors: Scott Halstvedt, Bernard Mont-Reynaud, Kazi Asif Wadud
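A simplified sketch of the continuous-listening behavior this abstract describes: utterances keep arriving, each is parsed into a query and dispatched for execution without waiting for earlier queries to finish. The parse and execute functions are stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def execute(query: str) -> str:
    time.sleep(1.0)                       # pretend this is a slow backend action
    return f"done: {query}"

def parse(utterance: str) -> str:
    return utterance.strip().lower()      # stand-in for real query recognition

utterance_stream = ["What's the weather?", "Play some jazz", "Set a timer"]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = []
    for utterance in utterance_stream:    # keep processing; don't wait for results
        futures.append(pool.submit(execute, parse(utterance)))
    for f in futures:
        print(f.result())
```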
  • Publication number: 20200184958
    Abstract: A system and method are disclosed for capturing a segment of speech audio, performing phoneme recognition on the segment of speech audio to produce a segmented phoneme sequence, comparing the segmented phoneme sequence to stored phoneme sequences that represent incorrect pronunciations of words to determine if there is a match, and identifying an incorrect pronunciation for a word in the segment of speech audio. The system builds a library based on the data collected for the incorrect pronunciations.
    Type: Application
    Filed: December 7, 2018
    Publication date: June 11, 2020
    Applicant: SoundHound, Inc.
    Inventors: Katayoun NOROUZI, Karl STAHL
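A small sketch of the matching step this abstract describes: a recognized phoneme sequence is compared against stored sequences for known incorrect pronunciations, and matches are collected into a growing library. The phoneme entries, threshold, and similarity measure are hypothetical stand-ins.

```python
from difflib import SequenceMatcher

# Stored phoneme sequences for known incorrect pronunciations
# (ARPAbet-style, hypothetical entries).
MISPRONUNCIATIONS = {
    ("N", "UW", "K", "Y", "AH", "L", "ER"): "nuclear",         # "nucular"
    ("EH", "K", "S", "P", "R", "EH", "S", "OW"): "espresso",   # "expresso"
}

library = []   # grows into a library of observed mispronunciations

def check(recognized_phonemes, threshold=0.85):
    for wrong_seq, word in MISPRONUNCIATIONS.items():
        score = SequenceMatcher(None, recognized_phonemes, wrong_seq).ratio()
        if score >= threshold:
            library.append((word, tuple(recognized_phonemes)))
            return f"'{word}' appears to be mispronounced"
    return None

print(check(["N", "UW", "K", "Y", "AH", "L", "ER"]))
```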
  • Publication number: 20200183815
    Abstract: A virtual assistant platform provides a user interface for app developers to configure the enablement of domains for virtual assistants. Sets of test queries can be uploaded and statistical analyses displayed for the numbers of test queries served by each selected domain and costs for usage of each domain. Costs can vary according to complex pricing models. The user interface provides display views of tables, cost stack charts, and histograms to inform decisions that trade off costs with benefits to the virtual assistant user experience. The platform interface shows, for individual queries, responses possible from different domains. Platform providers promote certain chosen domains.
    Type: Application
    Filed: December 7, 2018
    Publication date: June 11, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Jonah Probell
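A toy sketch of the statistics this abstract mentions: for a batch of test queries, count how many each enabled domain served and total the cost under a per-domain price. The pricing, domains, and test results are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-query pricing for each enabled domain.
PRICE_PER_QUERY = {"weather": 0.002, "music": 0.005, "navigation": 0.004}

# (test query, domain that served it) pairs produced by a test run.
test_results = [
    ("what's the forecast", "weather"),
    ("play some jazz", "music"),
    ("is it raining", "weather"),
    ("directions home", "navigation"),
]

served = defaultdict(int)
cost = defaultdict(float)
for _query, domain in test_results:
    served[domain] += 1
    cost[domain] += PRICE_PER_QUERY[domain]

for domain in served:
    print(f"{domain}: {served[domain]} queries, ${cost[domain]:.3f}")
```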
  • Patent number: 10657174
    Abstract: The present invention relates to providing identification information in response to an audio segment using a first mode of operation including receiving an audio segment and sending the audio segment to a remote server and receiving, from the remote server, identification information relating to the audio segment, and a second mode of operation of receiving an audio segment and using stored information to obtain identification information relating to the received audio segment, without sending the audio segment to the remote server. The present invention further includes using identification information from the remote server and using local identification information and selecting either identification information from the remote server or local identification information based on selection criteria, and generating an output based on the selected identification information.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: May 19, 2020
    Assignee: SoundHound, Inc.
    Inventors: Aaron Master, Bernard Mont-Reynaud, Keyvan Mohajer, Timothy Stonehocker
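A simplified sketch of the two modes and the selection criterion described in this abstract: identify locally when offline or when the local result is confident enough, otherwise also query the server and keep the more confident answer. The identification functions and confidence values are stand-ins.

```python
def identify_locally(audio):
    return ("Song A - local match", 0.62)      # (label, confidence), stand-in

def identify_remotely(audio):
    return ("Song A (Remastered)", 0.95)       # stand-in for a server round trip

def identify(audio, connected: bool, min_confidence: float = 0.8) -> str:
    local_label, local_conf = identify_locally(audio)
    if not connected or local_conf >= min_confidence:
        return local_label                     # second mode: stay on-device
    remote_label, remote_conf = identify_remotely(audio)
    # Selection criterion: prefer whichever result is more confident.
    return remote_label if remote_conf >= local_conf else local_label

print(identify(audio=b"...", connected=True))
print(identify(audio=b"...", connected=False))
```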
  • Publication number: 20200151394
    Abstract: Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include age of people present, music that is playing, location, noise, and mood. Semantic attributes may include presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word so as to enable contrastive emphasis.
    Type: Application
    Filed: January 14, 2020
    Publication date: May 14, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
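A toy sketch of the mapping this abstract describes: a learned model takes a behavior vector and produces a voice vector of TTS parameters. Here the model is a hand-written linear layer with made-up weights; the attribute and parameter names are hypothetical.

```python
# Behavior vector: [children_present, ambient_noise, emotional_charge], each in [0, 1].
# Voice vector:    [pitch_shift, speech_rate, volume] fed to the TTS engine.

# Hypothetical learned weights and biases of a linear model.
W = [
    [0.20, -0.10, 0.30],   # pitch_shift
    [-0.30, 0.00, 0.10],   # speech_rate
    [0.00, 0.50, 0.10],    # volume
]
B = [0.0, 1.0, 0.7]

def voice_vector(behavior):
    return [sum(w * x for w, x in zip(row, behavior)) + b for row, b in zip(W, B)]

# Kids in the room, noisy, mildly emotional content:
print(voice_vector([1.0, 0.8, 0.3]))   # higher pitch, slower rate, louder
```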
  • Publication number: 20200142890
    Abstract: The technology disclosed relates to performing a cross-lingual search. The cross-lingual search may include receiving a first query in a first language, translating the first query from the first language to a second language, to obtain a second query in the second language, performing a first search based on the first query to obtain first language results, performing a second search based on the second query to obtain second language results, translating the second language results to the first language, to obtain translated second results and outputting overall results including at least some of the first language results and some of the translated second results.
    Type: Application
    Filed: November 2, 2018
    Publication date: May 7, 2020
    Applicant: SoundHound, Inc.
    Inventors: Qindi ZHANG, Qiaozhi SONG
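A minimal sketch of the cross-lingual search pipeline in this abstract: translate the query, search in both languages, translate the second-language results back, and merge. The translate and search functions are stand-ins for real services.

```python
def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for a machine translation call.
    table = {"best ramen": "best ramen (ja)", "ramen result (ja)": "ramen result"}
    return table.get(text, text)

def search(query: str, lang: str):
    # Stand-in for a per-language search backend.
    return [f"{query} result"] if lang == "en" else ["ramen result (ja)"]

def cross_lingual_search(query: str, first: str = "en", second: str = "ja"):
    second_query = translate(query, first, second)
    first_results = search(query, first)
    second_results = search(second_query, second)
    translated = [translate(r, second, first) for r in second_results]
    return first_results + translated          # overall results from both languages

print(cross_lingual_search("best ramen"))
```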
  • Patent number: 10636421
    Abstract: A speech-based human-machine interface that parses words spoken to detect a complete parse and, responsive to so detecting, computes a hypothesis as to whether the words are a prefix to another complete parse. The duration of the no-voice-activity period used to determine the end of a sentence depends on the prefix hypothesis. The user's typical speech speed profile and a short-term measure of speech speed also scale the period. Speech speed is measured by the time between words, and the period scaling uses a continuously adaptive algorithm. The system uses a longer cut-off period after a system wake-up event but before it detects any voice activity.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: April 28, 2020
    Assignee: SoundHound, Inc.
    Inventors: Jennifer Hee Young Zhang, Patricia Pozon Aguayo, Jonah Probell
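A rough sketch of the speed-scaled cutoff this abstract describes: the time between words feeds a continuously adapting speed estimate that scales the silence cutoff, the prefix hypothesis stretches it further, and a longer window applies after wake-up before any speech is heard. All constants and names are hypothetical.

```python
class CutoffEstimator:
    """Scales the end-of-sentence silence cutoff by the speaker's pace."""
    BASE_CUTOFF_S = 0.8
    WAKE_CUTOFF_S = 3.0            # longer window after wake-up, before any speech

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.avg_gap = 0.3          # running estimate of time between words (s)
        self.heard_speech = False

    def word_gap(self, gap_s):
        # Continuously adapt the speed estimate (exponential moving average).
        self.avg_gap = (1 - self.alpha) * self.avg_gap + self.alpha * gap_s
        self.heard_speech = True

    def cutoff(self, prefix_probability=0.0):
        if not self.heard_speech:
            return self.WAKE_CUTOFF_S
        scale = self.avg_gap / 0.3                    # slow talkers get more time
        return self.BASE_CUTOFF_S * scale * (1 + prefix_probability)

est = CutoffEstimator()
print(est.cutoff())                # 3.0 before any voice activity
est.word_gap(0.5)
est.word_gap(0.6)
print(round(est.cutoff(0.4), 2))   # longer cutoff for a slow talker mid-sentence
```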
  • Patent number: 10599645
    Abstract: A speech recognition and natural language understanding system performs insertion, deletion, and replacement edits of tokens at positions with low probabilities according to both a forward and a backward statistical language model (SLM) to produce rewritten token sequences. Multiple rewrites can be produced with scores depending on the probabilities of tokens according to the SLMs. The rewritten token sequences can be parsed according to natural language grammars to produce further weighted scores. Token sequences can be rewritten iteratively using a graph-based search algorithm to find the best rewrite. Mappings of input token sequences to rewritten token sequences can be stored in a cache, and searching for a best rewrite can be bypassed by using cached rewrites when present. Analysis of various initial token sequences that produce the same new rewritten token sequence can be useful to improve natural language grammars.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: March 24, 2020
    Assignee: SoundHound, Inc.
    Inventors: Luke Lefebure, Pranav Singh
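A toy sketch of the rewriting idea in this abstract: forward and backward language models score each token position, the lowest-probability position is edited, and input-to-rewrite mappings are cached. Here bigram lookup tables stand in for the SLMs, only replacement edits are shown, and the vocabulary is hypothetical.

```python
# Toy bigram probabilities standing in for forward and backward SLMs.
FORWARD = {("play", "some"): 0.4, ("some", "muzic"): 0.001, ("some", "music"): 0.3}
BACKWARD = {("some", "play"): 0.4, ("muzic", "some"): 0.001, ("music", "some"): 0.3}

rewrite_cache = {}     # input token sequence -> best rewrite

def position_scores(tokens):
    scores = []
    for i in range(1, len(tokens)):
        fwd = FORWARD.get((tokens[i - 1], tokens[i]), 1e-6)
        bwd = BACKWARD.get((tokens[i], tokens[i - 1]), 1e-6)
        scores.append((fwd * bwd, i))
    return scores

def rewrite(tokens, vocab=("music", "movie")):
    key = tuple(tokens)
    if key in rewrite_cache:                       # cached rewrites skip the search
        return rewrite_cache[key]
    _, worst = min(position_scores(tokens))        # lowest-probability position
    best = max(vocab, key=lambda w: FORWARD.get((tokens[worst - 1], w), 1e-6))
    result = tokens[:worst] + [best] + tokens[worst + 1:]
    rewrite_cache[key] = result
    return result

print(rewrite(["play", "some", "muzic"]))          # ['play', 'some', 'music']
```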
  • Patent number: 10585891
    Abstract: A virtual assistant receives natural language interpretation hypotheses for user queries, determines entities and attributes from the interpretations, and requests data from appropriate data sources. A cost function estimates the cost of each data source request. Cost functions include factors such as contract pricing, access latency, and data quality. Based on the estimated cost, the virtual assistant sends requests to a plurality of data sources, each of which might be able to provide data necessary to answer the user query. By including user credits in the cost function, the virtual assistant provides better quality of results and answer latency for paying users. The virtual assistant minimizes latency by answering using data from the first responding data source or provides a latency guarantee by answering with the most accurate data received by a deadline. The virtual assistant measures data source response latency and caches responses for expensive requests.
    Type: Grant
    Filed: November 3, 2016
    Date of Patent: March 10, 2020
    Assignee: SoundHound, Inc.
    Inventor: Scott Halstvedt
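A small sketch of the cost-function idea in this abstract: each data source request is scored on contract price, latency, and data quality, with user credits discounting the price so paying users get higher-quality sources. The weights, sources, and numbers are all hypothetical.

```python
def request_cost(source, user_credits=0.0):
    # Lower is better. Credits discount the monetary price for paying users,
    # so latency and data quality dominate the choice for them.
    price_weight = max(0.0, 1.0 - user_credits)
    return (price_weight * source["price"]
            + 0.0001 * source["latency_ms"]
            - 0.1 * source["quality"])

sources = [
    {"name": "premium_weather", "price": 0.10, "latency_ms": 80, "quality": 0.9},
    {"name": "free_weather", "price": 0.00, "latency_ms": 400, "quality": 0.6},
]

for credits in (0.0, 1.0):
    best = min(sources, key=lambda s: request_cost(s, credits))
    print(f"credits={credits}: query {best['name']}")
```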
  • Patent number: 10586079
    Abstract: Software-based systems perform parametric speech synthesis. TTS voice parameters determine the generated speech audio. Voice parameters include gender, age, dialect, donor, arousal, authoritativeness, pitch, range, speech rate, volume, flutter, roughness, breath, frequencies, bandwidths, and relative amplitudes of formants and nasal sounds. The system chooses TTS parameters based on one or more of: user profile attributes including gender, age, and dialect; situational attributes such as location, noise level, and mood; natural language semantic attributes such as domain of conversation, expression type, dimensions of affect, word emphasis and sentence structure; and analysis of target speaker voices. The system chooses TTS parameters to improve listener satisfaction or other desired listener behavior. Choices may be made by specified algorithms defined by code developers, or by machine learning algorithms trained on labeled samples of system performance.
    Type: Grant
    Filed: January 13, 2017
    Date of Patent: March 10, 2020
    Assignee: SoundHound, Inc.
    Inventors: Monika Almudafar-Depeyrot, Bernard Mont-Reynaud
  • Publication number: 20200043479
    Abstract: The present invention extends to methods, systems, and computer program products for automatically visually presenting information relevant to an utterance. Natural language expressions from conversation participants are received and processed to determine a topic and concepts; a search finds relevant information, which is visually displayed to the assisted user. Applications can include video conferencing, wearable devices, augmented reality, and heads-up vehicle displays. Topics, concepts, and information search results are analyzed for relevance and non-repetition. Relevance can depend on a user profile, conversation history, and environmental information. Further information can be requested through non-verbal modes. Searched and displayed information can be in languages other than that spoken in the conversation. Many-party conversations can be processed.
    Type: Application
    Filed: August 2, 2018
    Publication date: February 6, 2020
    Applicant: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Jonah Probell
  • Publication number: 20200013094
    Abstract: Original concepts obtained from a query may be augmented with additional concepts connected to the original concepts in a concept graph in response to determining that the original concepts did not match a sufficient number of bid functions. The augmented set of concepts may then be evaluated with respect to the bid functions to identify matching ad functions. This process may be repeated until a sufficient number of matching ad functions are found. A bid amount of the matching bid functions may be calculated, such as based on semantic information obtained as a result of the query. The bid amounts may further be based on environmental information. A bid function is selected based on the bid amounts and the content associated with the bid function is provided to the source of the query. The content may be selected based on the semantic information.
    Type: Application
    Filed: September 16, 2019
    Publication date: January 9, 2020
    Applicant: SoundHound, Inc.
    Inventors: Keyvan Mohajer, Scott Halstvedt
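A simplified sketch of the expansion loop this abstract describes: if too few bid functions match the query's concepts, the concept set is augmented with connected concepts from a concept graph and re-evaluated; bid amounts then depend on semantic information and the highest bid's content is returned. The graph, bids, and semantic fields are hypothetical.

```python
CONCEPT_GRAPH = {                 # hypothetical concept connections
    "coffee": ["espresso", "cafe"],
    "espresso": ["coffee"],
    "cafe": ["restaurant"],
}

# Each bid: (required concept, function of semantic info -> bid amount, content).
BIDS = [
    ("espresso", lambda info: 0.50 + 0.10 * info.get("near_store", 0), "Espresso bar ad"),
    ("restaurant", lambda info: 0.30, "Restaurant ad"),
]

def select_content(query_concepts, semantic_info, min_matches=1):
    concepts = set(query_concepts)
    while True:
        matches = [(bid(semantic_info), content) for concept, bid, content in BIDS
                   if concept in concepts]
        if len(matches) >= min_matches:
            return max(matches)[1]                  # highest bid amount wins
        # Too few matches: augment with connected concepts and retry.
        expanded = set().union(*(CONCEPT_GRAPH.get(c, []) for c in concepts)) | concepts
        if expanded == concepts:
            return None
        concepts = expanded

print(select_content({"coffee"}, {"near_store": 1}))   # Espresso bar ad
```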
  • Publication number: 20190371311
    Abstract: The technology disclosed relates to performing speech recognition for a plurality of different devices or devices in a plurality of conditions. This includes storing a plurality of acoustic models associated with different devices or device conditions, receiving speech audio including natural language utterances, receiving metadata indicative of a device type or device condition, selecting an acoustic model from the plurality in dependence upon the received metadata, and employing the selected acoustic model to recognize speech from the natural language utterances included in the received speech audio. Each of speech recognition and the storage of acoustic models can be performed locally by devices or on a network-connected server. Also provided is a platform and interface, used by device developers to select, configure, and/or train acoustic models for particular devices and/or conditions.
    Type: Application
    Filed: June 1, 2018
    Publication date: December 5, 2019
    Applicant: SoundHound, Inc.
    Inventors: Mehul Patel, Keyvan Mohajer
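A minimal sketch of the selection step this abstract describes: device metadata indicating device type and condition selects one of several stored acoustic models, with a fallback when no exact match exists. The model names and metadata keys are hypothetical.

```python
# Hypothetical acoustic models keyed by (device type, condition).
ACOUSTIC_MODELS = {
    ("car", "highway"): "am_car_noisy_v3",
    ("car", "parked"): "am_car_quiet_v2",
    ("speaker", None): "am_far_field_v5",
}
DEFAULT_MODEL = "am_generic_v1"

def select_acoustic_model(metadata: dict) -> str:
    device = metadata.get("device_type")
    condition = metadata.get("condition")
    # Prefer an exact (device, condition) model, then a device-only model.
    return (ACOUSTIC_MODELS.get((device, condition))
            or ACOUSTIC_MODELS.get((device, None))
            or DEFAULT_MODEL)

print(select_acoustic_model({"device_type": "car", "condition": "highway"}))
print(select_acoustic_model({"device_type": "speaker"}))
```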
  • Publication number: 20190325898
    Abstract: Real-time speech recognition systems extend an end-of-utterance timeout period in response to the presence of a disfluency at the end of speech, and by so doing avoid cutting off speakers mid-sentence. Approaches to detecting disfluencies include the application of disfluency n-gram language models, acoustic models, prosody models, and phrase spotting. Explicit pause phrases can also be detected to extend sentence parsing until relevant semantic information is gathered from the speaker or another voice. Disfluency models can be trained such as by searching by successive deletion of tokens, phonemes, or acoustic segments to convert sentences that cannot be parsed into ones that can. Disfluency-based timeout adaptation is applicable to safety-critical systems.
    Type: Application
    Filed: April 23, 2018
    Publication date: October 24, 2019
    Applicant: SoundHound, Inc.
    Inventors: Liam O'Hart Kinney, Joel McKenzie, Anitha Kandasamy
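A toy sketch of the timeout extension this abstract describes, using only the simplest of the listed detection approaches (a filled-pause word list and phrase spotting); the real system also mentions n-gram, acoustic, and prosody models. Word lists and timeouts are hypothetical.

```python
BASE_TIMEOUT_S = 0.8
EXTENDED_TIMEOUT_S = 2.5

# Hypothetical filled pauses and explicit pause phrases caught by phrase spotting.
DISFLUENCIES = {"uh", "um", "er", "hmm"}
PAUSE_PHRASES = ("hold on", "let me think", "wait a second")

def end_of_utterance_timeout(recognized_text: str) -> float:
    words = recognized_text.lower().split()
    if words and words[-1] in DISFLUENCIES:
        return EXTENDED_TIMEOUT_S          # trailing filled pause: keep listening
    if any(recognized_text.lower().endswith(p) for p in PAUSE_PHRASES):
        return EXTENDED_TIMEOUT_S          # explicit pause phrase
    return BASE_TIMEOUT_S

print(end_of_utterance_timeout("call mom"))           # 0.8
print(end_of_utterance_timeout("navigate to um"))     # 2.5
print(end_of_utterance_timeout("play, uh, hold on"))  # 2.5
```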
  • Patent number: 10453101
    Abstract: An ad processor evaluates bid functions that are based on concepts that might be generated from interpretations of natural language expressions. Ad buyers provide the functions with corresponding ads to ad processors. Bid functions are further based on the values of semantic information referenced by expressions. Bid functions are further based on environmental information. Ad buyers are able to modify bid functions. Ads may be provided in the form of questions, and may be indicated by an identifying sound. Upon finding no expression concepts within a bid function, the set of expression concepts is expanded according to strengths of connections between concepts in a concept graph.
    Type: Grant
    Filed: October 14, 2016
    Date of Patent: October 22, 2019
    Assignee: SoundHound, Inc.
    Inventors: Scott Halstvedt, Keyvan Mohajer
  • Publication number: 20190303438
    Abstract: The present invention extends to methods, systems, and computer program products for interpreting expressions having potentially ambiguous meanings in different domains. Multi-domain natural language understanding systems can support a variety of different types of clients. Expressions can be interpreted across multiple domains. Weights can be assigned to domains. Weights can be client specific or expression specific so that a chosen interpretation is more likely correct for the type of client or for its context. Stored weight sets can be chosen according to identifying information carried as metadata with expressions, or weight sets can be carried directly as metadata. Domains can additionally or alternatively be ranked in ordered lists or comparative domain pairs to favor some domains over others, as appropriate for client type or client context.
    Type: Application
    Filed: April 2, 2018
    Publication date: October 3, 2019
    Applicant: SoundHound, Inc.
    Inventors: Christopher S. Wilson, Keyvan Mohajer, Bernard Mont-Reynaud
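A minimal sketch of the domain-weighting idea in this abstract: each domain proposes an interpretation with a raw score, and client-specific weights re-rank them so the same expression resolves differently on different client types. The scores, weights, and client names are hypothetical.

```python
# Hypothetical raw interpretation scores for one ambiguous expression:
# "play hamilton" could mean music or navigation (a place named Hamilton).
interpretations = {
    "music": {"score": 0.7, "meaning": "play the Hamilton soundtrack"},
    "navigation": {"score": 0.6, "meaning": "navigate to Hamilton"},
}

# Client-specific domain weights, looked up from request metadata.
CLIENT_WEIGHTS = {
    "smart_speaker": {"music": 1.2, "navigation": 0.5},
    "car_head_unit": {"music": 0.8, "navigation": 1.5},
}

def choose(interps, client_type):
    weights = CLIENT_WEIGHTS[client_type]
    best = max(interps, key=lambda d: interps[d]["score"] * weights.get(d, 1.0))
    return interps[best]["meaning"]

print(choose(interpretations, "smart_speaker"))   # play the Hamilton soundtrack
print(choose(interpretations, "car_head_unit"))   # navigate to Hamilton
```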