Patents Assigned to SoundHound, Inc.
-
Patent number: 11741943
Abstract: A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode a sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.
Type: Grant
Filed: April 7, 2021
Date of Patent: August 29, 2023
Assignee: SoundHound, Inc.
Inventors: Zizu Gowayyed, Keyvan Mohajer
-
Patent number: 11736769
Abstract: Various approaches relate to user-defined filtering, in media-playing devices, of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. The identification and classification can then be used to control presentation of, and/or access to, the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded) using one or more techniques (e.g., media replacement, media removal, media degradation) and then presented.
Type: Grant
Filed: April 12, 2021
Date of Patent: August 22, 2023
Assignee: SoundHound, Inc.
Inventors: Thor S. Khov, Terry Kong
-
Publication number: 20230245661
Abstract: A video conferencing system, such as one implemented with a cloud server, receives audio streams from a plurality of endpoints. The system uses automatic speech recognition to transcribe speech in the audio streams. The system multiplexes the transcriptions into individual caption streams and sends them to the endpoints, but the caption stream to each endpoint omits the transcription of audio from that endpoint. Some systems allow muting of audio through an indication to the system. The system then omits sending the muted audio to other endpoints and also omits sending a transcription of the muted audio to other endpoints.
Type: Application
Filed: April 10, 2023
Publication date: August 3, 2023
Applicant: SoundHound, Inc.
Inventor: Ethan COEYTAUX
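A minimal Python sketch of the caption-multiplexing rule in this abstract, not SoundHound's implementation: each endpoint receives every other endpoint's transcription, never its own, and nothing from a muted endpoint. The `Transcript` type and `build_caption_streams` function are hypothetical names, and speech recognition is assumed to have already produced the text.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    endpoint_id: str  # endpoint whose audio was transcribed
    text: str

def build_caption_streams(transcripts, endpoints, muted=frozenset()):
    """Map each endpoint to the caption lines it should receive (hypothetical helper)."""
    streams = {ep: [] for ep in endpoints}
    for t in transcripts:
        if t.endpoint_id in muted:
            continue  # muted audio: its transcription is sent to no one
        for ep in endpoints:
            if ep != t.endpoint_id:  # omit an endpoint's own speech from its stream
                streams[ep].append(f"{t.endpoint_id}: {t.text}")
    return streams

streams = build_caption_streams(
    [Transcript("A", "hello"), Transcript("B", "hi there"), Transcript("C", "private")],
    endpoints=["A", "B", "C"],
    muted={"C"},
)
print(streams["A"])  # ['B: hi there']
```

A muted endpoint still receives captions of the others in this sketch; the abstract only says its own audio and transcription are withheld.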
-
Publication number: 20230245649
Abstract: Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a word with a low confidence score with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphical user interface for labeling and dictionary enhancement.
Type: Application
Filed: February 3, 2022
Publication date: August 3, 2023
Applicant: SoundHound, Inc.
Inventors: Pranav SINGH, Saraswati MISHRA, Eunjee NA
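The substitution step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: confidence scores are taken as given (the abstract says a neural network may produce them), and the `substitutes` lookup table and `threshold` value are hypothetical stand-ins for however the system chooses a replacement.

```python
def correct_transcript(tokens, substitutes, threshold=0.5):
    """tokens: list of (word, confidence) pairs from a hypothetical confidence model.

    Returns the corrected words plus the indices of low-confidence tokens,
    which a GUI could highlight for labeling."""
    corrected, low_confidence = [], []
    for i, (word, confidence) in enumerate(tokens):
        if confidence < threshold:
            low_confidence.append(i)
            word = substitutes.get(word, word)  # keep the word if no substitute is known
        corrected.append(word)
    return corrected, low_confidence

words, flagged = correct_transcript(
    [("play", 0.98), ("some", 0.91), ("jaz", 0.31), ("music", 0.95)],
    substitutes={"jaz": "jazz"},
)
print(" ".join(words), flagged)  # play some jazz music [2]
```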
-
Publication number: 20230237056
Abstract: A method and an apparatus for processing an intelligent voice query. A voice query input is received from a user. Automatic speech recognition and natural language understanding generate structured query data. The structured query data is modified based on an input adaptation rule to obtain modified structured query data appropriate for a content providing server, which provides a query result output corresponding to the modified structured query data. Input adaptation rules may comprise rule sets based on behavior patterns of the user and/or business recommendations. The query result output can be used for natural language generation, which may have similar adaptation rules for output.
Type: Application
Filed: March 14, 2022
Publication date: July 27, 2023
Applicant: SoundHound, Inc.
Inventor: Chong WANG
-
Publication number: 20230206915
Abstract: A method of assisting a user. The method includes obtaining a plurality of rules having condition components and action components, the action components specifying conversation schemas; detecting, by a sensor, a fact related to an environment of the user; identifying a rule, of the plurality of rules, having a condition component that is satisfied by the detected fact; initiating a conversation with the user according to a conversation schema of the action component of the identified rule; and performing an action in response to a positive statement by the user.
Type: Application
Filed: December 23, 2021
Publication date: June 29, 2023
Applicant: SoundHound, Inc.
Inventors: Keyvan MOHAJER, Kaishin KAM, Christophe PIERRET
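A toy Python sketch of the rule flow in this abstract: a sensed fact is matched against rule conditions, the matching rule's conversation is started, and the action runs only on a positive reply. It is not SoundHound's design; the single `prompt` string stands in for a full conversation schema, and the `AFFIRMATIVE` word set and the cabin-temperature rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

AFFIRMATIVE = {"yes", "yeah", "sure", "ok", "okay"}  # hypothetical positive replies

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # condition component, tested against a sensed fact
    prompt: str                        # stand-in for the action component's conversation schema
    action: Callable[[], str]          # performed only after a positive reply

def assist(rules, fact, ask_user) -> Optional[str]:
    """Start a conversation for the first satisfied rule; act on a positive reply."""
    for rule in rules:
        if rule.condition(fact):
            reply = ask_user(rule.prompt)
            if reply.strip().lower().rstrip("!.") in AFFIRMATIVE:
                return rule.action()
            return None
    return None

rules = [Rule(lambda f: f.get("cabin_temp_c", 20) > 30,
              "It is getting hot. Should I turn on the air conditioning?",
              lambda: "air conditioning on")]
print(assist(rules, {"cabin_temp_c": 34}, lambda prompt: "Yes!"))  # air conditioning on
```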
-
Publication number: 20230126052
Abstract: Systems and methods are disclosed that enable a user to speak a promoted phrase in response to a voice content or voice advertisement, which includes the promoted phrase. When the promoted phrase is spoken, additional content is provided, such as an additional advertisement. According to various examples, detection of the user speaking the promoted phrase is enabled once the voice advertisement ends. According to various examples, the additional content is related to the promoted phrase. According to various examples, detection of the user speaking the promoted phrase is done within a time frame; once the time frame is exceeded, detection of the user speaking the promoted phrase is disabled.
Type: Application
Filed: October 27, 2021
Publication date: April 27, 2023
Applicant: SoundHound, Inc.
Inventors: Keyvan MOHAJER, Michael Zagorsek
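The detection window described above (enabled when the ad ends, disabled once a time frame passes) can be sketched in Python as follows. It is a simplification under assumptions: `PromotedPhraseDetector` is a hypothetical class, timestamps are plain seconds, and matching is a case-insensitive substring check on an already transcribed utterance rather than real keyword spotting.

```python
class PromotedPhraseDetector:
    """Hypothetical detector that listens for a promoted phrase after an ad."""

    def __init__(self, phrase, window_s):
        self.phrase = phrase.lower()
        self.window_s = window_s   # length of the detection time frame
        self.ad_end_time = None    # None means the ad has not finished yet

    def on_ad_end(self, t):
        self.ad_end_time = t       # detection becomes enabled here

    def check(self, transcript, t):
        if self.ad_end_time is None:
            return False           # still playing the ad: detection disabled
        if t - self.ad_end_time > self.window_s:
            return False           # time frame exceeded: detection disabled
        return self.phrase in transcript.lower()

d = PromotedPhraseDetector("Tell me more", window_s=10.0)
d.on_ad_end(5.0)
print(d.check("tell me more please", 8.0))  # True
```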
-
Patent number: 11636853
Abstract: A method for configuring natural language grammars is provided that includes identifying a first transcription having a first automatic speech recognition (ASR) score and a first natural language understanding (NLU) score, and identifying a second transcription having a second ASR score and a second NLU score. The method includes detecting that a difference between the first and second ASR scores has a signed value with a sign opposite to the sign of the signed value of a difference between the first and second NLU scores and, responsive to detecting the opposite sign, providing the audio query and the first and second transcriptions to an evaluator, receiving from the evaluator an indication of which of the first and second transcriptions is the correct transcription, and adjusting a value used to calculate the first NLU score or a value used to calculate the second NLU score.
Type: Grant
Filed: August 20, 2019
Date of Patent: April 25, 2023
Assignee: SoundHound, Inc.
Inventor: Angela Rose Howard
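The opposite-sign test in this abstract reduces to checking whether the product of the two score differences is negative. The Python sketch below shows that check and one possible adjustment. The `grammar` field, the `grammar_weights` table, and the fixed `step` increase are hypothetical illustrations of "adjusting a value used to calculate the NLU score", not the claimed adjustment.

```python
from dataclasses import dataclass

@dataclass
class Transcription:
    text: str
    asr_score: float
    nlu_score: float
    grammar: str  # hypothetical: the NLU grammar that produced nlu_score

def scores_disagree(t1, t2):
    """True when the ASR-score difference and the NLU-score difference have opposite signs."""
    return (t1.asr_score - t2.asr_score) * (t1.nlu_score - t2.nlu_score) < 0

def review_pair(audio, t1, t2, ask_evaluator, grammar_weights, step=0.1):
    """Send a disagreeing pair to an evaluator and boost the correct grammar's weight."""
    if not scores_disagree(t1, t2):
        return None  # ASR and NLU agree on the ranking: nothing to review
    # The evaluator hears the audio and returns 0 or 1 for the correct transcription.
    correct = (t1, t2)[ask_evaluator(audio, t1.text, t2.text)]
    grammar_weights[correct.grammar] += step
    return correct.text

weights = {"music": 1.0, "payments": 1.0}
t1 = Transcription("play jazz", asr_score=0.9, nlu_score=0.4, grammar="music")
t2 = Transcription("pay chess", asr_score=0.7, nlu_score=0.8, grammar="payments")
print(review_pair(b"<audio>", t1, t2, lambda audio, a, b: 0, weights))  # play jazz
```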
-
Publication number: 20230082955
Abstract: A system for performing automated speech recognition (ASR) on audio data includes a queue manager to receive a request to perform ASR on audio data, add the request to a queue of incoming requests, and determine a queue depth representing a number of requests in the queue at a given time. The system also includes a load supervisor to receive the request and the queue depth from the queue manager and assign a service level for the request based on the queue depth. In addition, the system includes a speech-to-text converter to receive the assigned service level for the request from the load supervisor, select an ASR model for the request based on the received service level, receive the audio data associated with the request, and perform ASR on the audio data using the selected ASR model.
Type: Application
Filed: September 16, 2021
Publication date: March 16, 2023
Applicant: SoundHound, Inc.
Inventors: Timothy P. STONEHOCKER, Zizu GOWAYYED, Matthias EICHSTAEDT, Seyed Majid EMAMI, Evelyn JIANG, Ryan BERRYHILL, Mathieu RAMONA, Neil VEIRA
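The three components in this abstract (queue manager, load supervisor, speech-to-text converter) can be sketched as a small Python pipeline. The depth thresholds, service-level names, and model names are hypothetical placeholders, and the actual ASR call is omitted; the sketch only shows how queue depth drives model selection.

```python
from collections import deque

# Hypothetical thresholds: (maximum queue depth, service level).
SERVICE_LEVELS = [(4, "full"), (16, "reduced"), (float("inf"), "minimal")]
# Hypothetical model names for each service level.
MODEL_FOR_LEVEL = {"full": "large-asr", "reduced": "medium-asr", "minimal": "small-asr"}

class QueueManager:
    def __init__(self):
        self.queue = deque()

    def add(self, request):
        """Enqueue a request and return the current queue depth."""
        self.queue.append(request)
        return len(self.queue)

def assign_service_level(queue_depth):
    """Load supervisor: a deeper queue gets a cheaper service level."""
    for max_depth, level in SERVICE_LEVELS:
        if queue_depth <= max_depth:
            return level

def select_model(level):
    """Speech-to-text converter: pick the ASR model for the service level."""
    return MODEL_FOR_LEVEL[level]

qm = QueueManager()
for i in range(6):
    depth = qm.add(f"request-{i}")
level = assign_service_level(depth)
print(depth, level, select_model(level))  # 6 reduced medium-asr
```

Trading accuracy for throughput under load is the point of the design: when the queue is deep, a smaller model keeps latency bounded.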
-
Patent number: 11600284
Abstract: A voice morphing apparatus having adjustable parameters is described. The disclosed system and method include a voice morphing apparatus that morphs input audio to mask a speaker's identity. Parameter adjustment uses evaluation of an objective function that is based on the input audio and output of the voice morphing apparatus. The voice morphing apparatus includes objectives that are based adversarially on speaker identification and positively on audio fidelity. Thus, the voice morphing apparatus is adjusted to reduce identifiability of speakers while maintaining fidelity of the morphed audio. The voice morphing apparatus may be used as part of an automatic speech recognition system.
Type: Grant
Filed: January 11, 2020
Date of Patent: March 7, 2023
Assignee: SOUNDHOUND, INC.
Inventor: Steve Pearson
-
Publication number: 20230055477
Abstract: Methods and systems for implementing an intuitive interaction between the user and the virtual content of augmented reality applications are disclosed. By implementing an augmented reality inquiry mode of a device, the system can enable a user to interact with relevant virtual objects via a speech-enabled interface. The speech-enabled augmented reality system can identify visual objects in images, recognize virtual objects corresponding to the visual objects, and determine one or more relevant objects from the virtual objects based on relevance factors. Once the interaction session is established, a user can further interact with the relevant virtual objects, notably through voice commands addressed to the object. Accordingly, the present subject matter can enable a natural and hands-free interaction between the user and any virtual object that the user is interested in.
Type: Application
Filed: August 23, 2021
Publication date: February 23, 2023
Applicant: SoundHound, Inc.
Inventors: Keyvan MOHAJER, Morris MICHAEL, Bernard MONT-REYNAUD
-
Publication number: 20230059765
Abstract: A method and system for controlling a GUI on a user's network-connected device, the control being provided by a telephone call between the user and a speech recognition and speech synthesis system. An example of a restaurant ordering system is provided. The user calls a phone number and is guided through a verbal ordering process that includes one or more of: adding an item, deleting an item, changing quantities, changing sizes, and changing details of an item. The user's choices are added to a display so that the current status of the order is visible to the user. The GUI is updated as changes are made to the order. The GUI can also request additional information, upsell items, and show menus. The GUI aids the user in confirming that the order is correct. The system provides the final order to a restaurant for fulfillment.
Type: Application
Filed: August 22, 2021
Publication date: February 23, 2023
Applicant: SoundHound, Inc.
Inventors: Kamyar MOHAJER, Keyvan MOHAJER, James HOM, Evelyn JIANG
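A minimal Python sketch of the order state that the voice dialog would edit and the GUI would redisplay after each change. The `Order` class and its methods are hypothetical. It covers adding, deleting, and changing quantities only; sizes, item details, and the telephony and speech layers from the abstract are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    items: dict = field(default_factory=dict)  # item name -> quantity

    def add(self, item, qty=1):
        self.items[item] = self.items.get(item, 0) + qty

    def remove(self, item):
        self.items.pop(item, None)

    def set_quantity(self, item, qty):
        if qty <= 0:
            self.remove(item)  # a zero quantity deletes the item
        else:
            self.items[item] = qty

    def render(self):
        """Lines the GUI would show as the current order status."""
        return [f"{qty} x {item}" for item, qty in self.items.items()]

order = Order()
order.add("burger")
order.add("fries", 2)
order.set_quantity("burger", 3)
order.remove("fries")
order.add("soda")
print(order.render())  # ['3 x burger', '1 x soda']
```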
-
Patent number: 11589184
Abstract: Methods and systems for intuitive spatial audio rendering with improved intelligibility are disclosed. By establishing a virtual association between an audio source and a location in the listener's virtual audio space, a spatial audio rendering system can generate spatial audio signals that create a natural and immersive audio field for a listener. The system can receive the virtual location of the source as a parameter and map the source audio signal to a source-specific multi-channel audio signal. In addition, the spatial audio rendering system can be interactive and dynamically modify the rendering of the spatial audio in response to a user's active control or tracked movement.
Type: Grant
Filed: March 21, 2022
Date of Patent: February 21, 2023
Assignee: SoundHound, Inc.
Inventor: Bernard Mont-Reynaud
-
Publication number: 20230010815
Abstract: A method and system for implementing a speech-enabled interface of a host device via an electronic mobile device in a network are provided. The method includes establishing a communication session between the host device and the mobile device via a session service provider. According to some embodiments, a barcode can be adopted to enable the pairing of the host device and the mobile device. Furthermore, the present method and system employ the voice interface in conjunction with speech recognition systems and natural language processing to interpret voice input for the host device, which can be used to perform one or more actions related to the host device.
Type: Application
Filed: July 9, 2021
Publication date: January 12, 2023
Applicant: SoundHound, Inc.
Inventor: Keisuke Tsuchida
-
Patent number: 11551083
Abstract: Training and enhancement of neural network models, such as from private data, are described. A slave device receives a version of a neural network model from a master. The slave accesses a local and/or private data source and uses the data to perform optimization of the neural network model. This can be done, for example, by computing gradients or by performing knowledge distillation to locally train an enhanced second version of the model. The slave sends the gradients or the enhanced neural network model to the master. The master may use the gradients or the second version of the model to improve a master model.
Type: Grant
Filed: December 17, 2019
Date of Patent: January 10, 2023
Assignee: SoundHound, Inc.
Inventors: Zili Li, Asif Amirguliyev, Jonah Probell
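The gradient variant of this master/slave scheme can be illustrated in Python with a one-parameter model. Each slave computes a gradient on data that never leaves it, and the master averages the gradients and updates its model. The linear model `y = w * x`, the averaging rule, and the learning rate are assumptions chosen for illustration. The knowledge-distillation alternative in the abstract is not shown.

```python
def local_gradient(w, data):
    """Slave: gradient of mean squared error for the model y = w * x on private data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def master_update(w, gradients, lr=0.1):
    """Master: average the slaves' gradients and take one gradient-descent step."""
    return w - lr * sum(gradients) / len(gradients)

# Each slave's data stays local; only gradients are sent to the master.
private_data = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (3.0, 9.0)],
]
w = 0.0
for _ in range(50):
    grads = [local_gradient(w, data) for data in private_data]
    w = master_update(w, grads)
print(round(w, 4))  # 3.0
```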
-
Patent number: 11539920
Abstract: A system and a method are disclosed that enable sidebar conversations between two or more attendees who are participating in a primary or main meeting. The sidebar conversation occurs in conjunction or concurrently with the primary meeting. A first attendee provides commands to indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine if a trigger phrase is included. The commands are analyzed to determine if there is an identification of a second (targeted) attendee, who is currently participating in the main meeting. If the second attendee is available, then the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation.
Type: Grant
Filed: June 21, 2021
Date of Patent: December 27, 2022
Assignee: SoundHound, Inc.
Inventor: Timothy P Stonehocker
-
Publication number: 20220405797
Abstract: Ads are generated based on product info and consumer profiles. A discriminator evaluates probabilities of ads being effective at causing consumer engagement. A decoder extracts product info from generated ads. Based on the probabilities of ads being effective and the similarity of extracted and source product info, generated ads are labeled as examples. The examples are used in training an improved ad generator. Ads may be visual and/or audio containing speech. Ads may even contain humor, as recognized by mismatches between source and decoded product info.
Type: Application
Filed: August 18, 2022
Publication date: December 22, 2022
Applicant: SoundHound, Inc.
Inventor: Jonah PROBELL
-
Publication number: 20220408059
Abstract: A system and a method are disclosed that enable sidebar conversations between two or more attendees who are participating in a primary or main meeting. The sidebar conversation occurs in conjunction or concurrently with the primary meeting. A first attendee provides commands to indicate a desire to initiate a sidebar conversation and information about a targeted attendee. The commands are analyzed to determine if a trigger phrase is included. The commands are analyzed to determine if there is an identification of a second (targeted) attendee, who is currently participating in the main meeting. If the second attendee is available, then the sidebar conversation is initiated. Additional attendees can be added to the sidebar conversation.
Type: Application
Filed: June 21, 2021
Publication date: December 22, 2022
Applicant: SoundHound, Inc.
Inventor: Timothy P STONEHOCKER
-
Patent number: 11531819
Abstract: Machine-learned models take in vectors representing desired behaviors and generate voice vectors that provide the parameters for text-to-speech (TTS) synthesis. Models may be trained on behavior vectors that include user profile attributes, situational attributes, or semantic attributes. Situational attributes may include the age of people present, music that is playing, location, noise, and mood. Semantic attributes may include the presence of proper nouns, number of modifiers, emotional charge, and domain of discourse. TTS voice parameters may apply per utterance and per word so as to enable contrastive emphasis.
Type: Grant
Filed: January 14, 2020
Date of Patent: December 20, 2022
Assignee: SoundHound, Inc.
Inventors: Bernard Mont-Reynaud, Monika Almudafar-Depeyrot
-
Publication number: 20220382823
Abstract: As audio (1) is input to an extension of a browser, the extension transmits the audio (1) to a language processing server. A speech recognition unit obtains a text (1) corresponding to the audio (1) and transmits the text (1) to a natural language understanding unit. In the natural language understanding unit, an information processing unit identifies a URL (1) corresponding to the text (1) and transmits the URL (1) to the browser. The extension passes the URL (1) to a browsing function. The browsing function uses the URL (1) to access a web server. The web server transmits a web page (1) corresponding to the URL (1) to the browser. The browsing function shows a screen corresponding to the web page (1) on a display.
Type: Application
Filed: January 26, 2022
Publication date: December 1, 2022
Applicant: SoundHound, Inc.
Inventors: Masaki NAITO, Keisuke TSUCHIDA, Jun YONEYAMA, Kaku SAWADA