Patents Assigned to SoundHound, Inc.
  • Patent number: 11741943
    Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.
    Type: Grant
    Filed: April 7, 2021
    Date of Patent: August 29, 2023
    Assignee: SoundHound, Inc.
    Inventors: Zizu Gowayyed, Keyvan Mohajer
  • Patent number: 11736769
    Abstract: Various approaches relate to user defined content filtering in media playing devices of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.
    Type: Grant
    Filed: April 12, 2021
    Date of Patent: August 22, 2023
    Assignee: SoundHound, Inc.
    Inventors: Thor S. Khov, Terry Kong
  • Publication number: 20230245661
    Abstract: A video conferencing system, such as one implemented with a cloud server, receives audio streams from a plurality of endpoints. The system uses automatic speech recognition to transcribe speech in the audio streams. The system multiplexes the transcriptions into individual caption streams and sends them to the endpoints, but the caption stream to each endpoint omits the transcription of audio from the endpoint. Some systems allow muting of audio through an indication to the system. The system then omits sending the muted audio to other endpoints and also omits sending a transcription of the muted audio to other endpoints.
    Type: Application
    Filed: April 10, 2023
    Publication date: August 3, 2023
    Applicant: SoundHound, Inc.
    Inventor: Ethan COEYTAUX
  • Publication number: 20230245649
    Abstract: Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a low-confidence score word with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphic user interface for labeling and dictionary enhancement.
    Type: Application
    Filed: February 3, 2022
    Publication date: August 3, 2023
    Applicant: SoundHound, Inc.
    Inventors: Pranav SINGH, Saraswati MISHRA, Eunjee NA
  • Publication number: 20230237056
    Abstract: A method and an apparatus for processing an intelligent voice query. A voice query input is received from a user. Automatic speech recognition and natural language understanding generate structured query data. It is modified based on an input adaptation rule to obtain modified structured query data appropriate for a content providing server, which provides a query result output corresponding to the modified structured query data. Input adaptation rules may comprise rule sets based on behavior patterns of the user and/or business recommendations. The query result output can be used for natural language generation, which may have similar adaptation rules for output.
    Type: Application
    Filed: March 14, 2022
    Publication date: July 27, 2023
    Applicant: SoundHound, Inc.
    Inventor: Chong WANG
  • Publication number: 20230206915
    Abstract: A method of assisting a user. The method including obtaining a plurality of rules having condition components and action components, the action components specifying conversation schemas, detecting, by a sensor, a fact related to an environment of the user, identifying a rule, of the plurality of rules, having a condition component that is satisfied by the detected fact, initiating a conversation with the user according to a conversation schema of the action component of the rule of the plurality of rules, and performing an action in response to a positive statement by the user.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Applicant: SoundHound, Inc.
    Inventors: Keyvan MOHAJER, Kaishin KAM, Christophe PIERRET
  • Publication number: 20230126052
    Abstract: Systems and methods are disclosed that enable a user to speak a promoted phrase in response to a voice content or voice advertisement, which includes the promoted phrase. When the promoted phrase is spoken, then additional content is provided, such as additional advertisement. According to various examples, detection of the user speaking the promoted phrase is enabled once the voice advertisement ends. According to various examples, the additional content is related to the promoted phrase. According to various examples, detection of the user speaking the promoted phrase is done within a time frame; once the time frame is exceeded, detection of the user speaking the promoted phrase is disabled.
    Type: Application
    Filed: October 27, 2021
    Publication date: April 27, 2023
    Applicant: SoundHound, Inc.
    Inventors: Keyvan MOHAJER, Michael Zagorsek
  • Patent number: 11636853
    Abstract: A method for configuring natural language grammars is provided to include identifying a first transcription having a first automatic speech recognition (ASR) score and a first natural language understanding (NLU) score and identifying a second transcription having a second ASR score and a second NLU score. The method includes detecting that a difference between the first and second ASR scores has a signed value with an opposite sign than a sign of a signed value of a difference between the first and second NLU scores, and responsive to detecting the opposite sign providing, to an evaluator, the audio query and the first and second transcriptions, receiving, from the evaluator, an indication of which of the first and second transcriptions is a correct transcription, and adjusting a value implemented to calculate the first NLU score or a value implemented to calculate the second NLU score.
    Type: Grant
    Filed: August 20, 2019
    Date of Patent: April 25, 2023
    Assignee: SoundHound, Inc.
    Inventor: Angela Rose Howard
  • Publication number: 20230082955
    Abstract: A system for performing automated speech recognition (ASR) on audio data includes a queue manager to receive a request to perform ASR on audio data, add the request to a queue of incoming requests, and determine a queue depth representing a number of requests in the queue at a given time. The system also includes a load supervisor to receive the request and the queue depth from the queue manager and assign a service level for the request based on the queue depth. In addition, the system includes a speech-to-text converter to receive the assigned service level for the request from the load supervisor, select an ASR model for the request based on the received service level, receive the audio data associated with the request, and perform ASR on the audio data using the selected ASR model.
    Type: Application
    Filed: September 16, 2021
    Publication date: March 16, 2023
    Applicant: SoundHound, Inc.
    Inventors: Timothy P. STONEHOCKER, Zizu GOWAYYED, Matthias EICHSTAEDT, Seyed Majid EMAMI, Evelyn JIANG, Ryan BERRYHILL, Mathieu RAMONA, Neil VEIRA
  • Patent number: 11600284
    Abstract: A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adversarially on speaker identification and positively on audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.
    Type: Grant
    Filed: January 11, 2020
    Date of Patent: March 7, 2023
    Assignee: SoundHound, Inc.
    Inventor: Steve Pearson
  • Publication number: 20230055477
    Abstract: Methods and systems for implementing an intuitive interaction between the user and the virtual content of augmented reality applications are disclosed. By implementing an augmented reality inquiry mode of a device, the system can enable a user to interact with relevant virtual objects via a speech-enabled interface. The speech-enabled augmented reality system can identify visual objects in images and recognize virtual objects corresponding to the visual objects, determine one or more relevant objects from the virtual objects based on relevance factors. Once the interaction session is established, a user can further interact with the relevant virtual objects, notably through voice commands addressed to the object. Accordingly, the present subject matter can enable a natural and hands-free interaction between the user and any virtual object that the user is interested in.
    Type: Application
    Filed: August 23, 2021
    Publication date: February 23, 2023
    Applicant: SoundHound, Inc.
    Inventors: Keyvan MOHAJER, Morris MICHAEL, Bernard MONT-REYNAUD
  • Publication number: 20230059765
    Abstract: A method and system for controlling a GUI on a user's network-connected device, the control being provided by a telephone call between the user and a speech recognition and speech synthesis system. An example of a restaurant ordering system is provided. The user calls a phone number and is guided through a verbal ordering process that includes one or more of: adding an item, deleting an item, changing quantities, changing sizes, and changing details of an item. The user's choices are added to a display so that a current status of the order is visible to the user. The GUI is updated as changes are made to the order. The GUI can also request additional information, upsell items, and show menus. The GUI aids the user in confirming that the order is correct. The system provides the final order to a restaurant for fulfillment.
    Type: Application
    Filed: August 22, 2021
    Publication date: February 23, 2023
    Applicant: SoundHound, Inc.
    Inventors: Kamyar MOHAJER, Keyvan MOHAJER, James HOM, Evelyn JIANG
  • Patent number: 11589184
    Abstract: Methods and systems for intuitive spatial audio rendering with improved intelligibility are disclosed. By establishing a virtual association between an audio source and a location in the listener's virtual audio space, a spatial audio rendering system can generate spatial audio signals that create a natural and immersive audio field for a listener. The system can receive the virtual location of the source as a parameter and map the source audio signal to a source-specific multi-channel audio signal. In addition, the spatial audio rendering system can be interactive and dynamically modify the rendering of the spatial audio in response to a user's active control or tracked movement.
    Type: Grant
    Filed: March 21, 2022
    Date of Patent: February 21, 2023
    Assignee: SoundHound, Inc.
    Inventor: Bernard Mont-Reynaud
  • Publication number: 20230010815
    Abstract: A method and system for implementing a speech-enabled interface of a host device via an electronic mobile device in a network are provided. The method includes establishing a communication session between the host device and the mobile device via a session service provider. According to some embodiments, a barcode can be adopted to enable the pairing of the host device and mobile device. Furthermore, the present method and system employ the voice interface in conjunction with speech recognition systems and natural language processing to interpret voice input for the hosting device, which can be used to perform one or more actions related to the hosting device.
    Type: Application
    Filed: July 9, 2021
    Publication date: January 12, 2023
    Applicant: SoundHound, Inc.
    Inventor: Keisuke Tsuchida
  • Patent number: 11551083
    Abstract: Training and enhancement of neural network models, such as from private data, are described. A slave device receives a version of a neural network model from a master. The slave accesses a local and/or private data source and uses the data to perform optimization of the neural network model. This can be done such as by computing gradients or performing knowledge distillation to locally train an enhanced second version of the model. The slave sends the gradients or enhanced neural network model to a master. The master may use the gradient or second version of the model to improve a master model.
    Type: Grant
    Filed: December 17, 2019
    Date of Patent: January 10, 2023
    Assignee: SoundHound, Inc.
    Inventors: Zili Li, Asif Amirguliyev, Jonah Probell
  • Patent number: 11539920
    Abstract: A system and a method are disclosed that enable sidebar conversations between two or more attendees that are participating in a primary or main meeting. The sidebar conversation occurs in conjunction or concurrently with the primary meeting. A first attendee provides commands to indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine if a trigger phrase is included. The commands are analyzed to determine if there is an identification of a second (targeted) attendee, who is currently participating in the main meeting. If the second attendee is available, then the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: December 27, 2022
    Assignee: SoundHound, Inc.
    Inventor: Timothy P. Stonehocker
  • Publication number: 20220405797
    Abstract: Ads are generated based on product info and consumer profiles. A discriminator evaluates probabilities of ads being effective at causing consumer engagement. A decoder extracts product info from generated ads. Based on the probabilities of ads being effective and similarity of extracted and source product info, generated ads are labeled as examples. The examples are used in training an improved ad generator. Ads may be visual and/or audio containing speech. Ads may even contain humor, as recognized by mismatches between source and decoded product info.
    Type: Application
    Filed: August 18, 2022
    Publication date: December 22, 2022
    Applicant: SoundHound, Inc.
    Inventor: Jonah PROBELL
  • Publication number: 20220408059
    Abstract: A system and a method are disclosed that enable sidebar conversations between two or more attendees that are participating in a primary or main meeting. The sidebar conversation occurs in conjunction or concurrently with the primary meeting. A first attendee provides commands to indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine if a trigger phrase is included. The commands are analyzed to determine if there is an identification of a second (targeted) attendee, who is currently participating in the main meeting. If the second attendee is available, then the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation.
    Type: Application
    Filed: June 21, 2021
    Publication date: December 22, 2022
    Applicant: SoundHound, Inc.
    Inventor: Timothy P STONEHOCKER
  • Patent number: 11531819
    Abstract: Machine learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include age of people present, music that is playing, location, noise, and mood. Semantic attributes may include presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word as to enable contrastive emphasis.
    Type: Grant
    Filed: January 14, 2020
    Date of Patent: December 20, 2022
    Assignee: SoundHound, Inc.
    Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
  • Publication number: 20220382823
    Abstract: As audio (1) is input to an extension of a browser, the extension transmits the audio (1) to a language processing server. A speech recognition unit obtains a text (1) corresponding to the audio (1), and transmits the text (1) to a natural language understanding unit. In the natural language understanding unit, an information processing unit identifies a URL (1) corresponding to the text (1), and transmits the URL (1) to the browser. The extension passes the URL (1) to a browsing function. The browsing function uses the URL (1) to access a web server. The web server transmits a web page (1) corresponding to the URL (1) to the browser. The browsing function shows a screen corresponding to the web page (1) on a display.
    Type: Application
    Filed: January 26, 2022
    Publication date: December 1, 2022
    Applicant: SoundHound, Inc.
    Inventors: Masaki NAITO, Keisuke TSUCHIDA, Jun YONEYAMA, Kaku SAWADA
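
Illustrative sketch for the last entry above (publication 20220382823). The abstract describes a flow in which a browser extension sends audio to a language processing server, a speech recognition unit produces text, a natural language understanding unit identifies a URL for that text, and the browsing function opens the URL. The Python below is only a rough outline of that flow under stated assumptions: every function name, the phrase-to-URL table, and the example.com URLs are hypothetical, and the "audio" is plain UTF-8 text standing in for real speech. It is not taken from the application itself.

```python
from typing import Callable, Optional

# Hypothetical phrase-to-URL table standing in for the natural language
# understanding unit; the abstract does not say how URLs are identified.
URL_TABLE = {
    "open the news": "https://example.com/news",
    "show the weather": "https://example.com/weather",
}


def recognize_speech(audio: bytes) -> str:
    """Stand-in for the speech recognition unit.

    Assumption: the "audio" is UTF-8 text, so decoding it plays the
    role of transcription in this sketch.
    """
    return audio.decode("utf-8").strip().lower()


def identify_url(text: str) -> Optional[str]:
    """Stand-in for the information processing unit: map text to a URL."""
    return URL_TABLE.get(text)


def language_processing_server(audio: bytes) -> Optional[str]:
    """Server side: audio -> text -> URL (None if no URL matches)."""
    text = recognize_speech(audio)
    return identify_url(text)


def extension_handle_audio(
    audio: bytes, browse: Callable[[str], None]
) -> Optional[str]:
    """Extension side: send audio to the server, then pass any URL
    it returns to the browsing function."""
    url = language_processing_server(audio)
    if url is not None:
        browse(url)
    return url


if __name__ == "__main__":
    visited = []  # records the URLs the "browsing function" was asked to open
    extension_handle_audio(b"Open the news", visited.append)
    print(visited)
```

In this sketch the browsing function is passed in as a callable, so the same extension logic can be exercised with a simple list in place of a real browser.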