Patents Examined by Bharatkumar S Shah

Method and apparatus of synthesizing speech, method and apparatus of training speech synthesis model, electronic device, and storage medium

Patent number: 11769482

Abstract: The present disclosure provides a method and apparatus of synthesizing a speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing a speech includes acquiring a style information of a speech to be synthesized, a tone information of the speech to be synthesized, and a content information of a text to be processed; generating an acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information of the text to be processed; and synthesizing the speech for the text to be processed, based on the acoustic feature information of the text to be processed.

Type: Grant

Filed: September 29, 2021

Date of Patent: September 26, 2023

Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventors: Wenfu Wang, Tao Sun, Xilei Wang, Junteng Zhang, Zhengkun Gao, Lei Jia

Speaker recognition method and apparatus

Patent number: 11763805

Abstract: A speaker recognition method and apparatus receives a first voice signal of a speaker, generates a second voice signal by enhancing the first voice signal through speech enhancement, generates a multi-channel voice signal by associating the first voice signal with the second voice signal, and recognizes the speaker based on the multi-channel voice signal.

Type: Grant

Filed: May 27, 2022

Date of Patent: September 19, 2023

Assignee: Samsung Electronics Co., Ltd.

Inventors: Sung-Jae Cho, Kyuhong Kim, Jaejoon Han

Methods and apparatus for intent recognition

Patent number: 11741956

Abstract: A system for generating a response to a customer query includes a computing device configured to obtain a first dataset, including a plurality of first phrase-intent pairs associated with a first domain. Each first phrase-intent pair includes a first phrase and a corresponding first intent. The computing device is configured to retrieve a set of configuration rules to configure a plurality of environments. The computing device is also configured to configure a first environment using the first dataset and the set of configuration rules to determine a result user intent based on a requested query associated with the first domain. The first environment embeds the plurality of first phrase-intent pairs in a vector space based on the set of configuration rules. The computing device is configured to perform operations based on the first environment.

Type: Grant

Filed: February 26, 2021

Date of Patent: August 29, 2023

Assignee: Walmart Apollo, LLC

Inventors: Simral Chaudhary, Deepa Mohan, Haoxuan Chen, Lakshmi Manasa Velaga, Snehasish Mukherjee, John Brian Moss, Jason Charles Benesch, Don Bambico

Machine-learned differentiable digital signal processing

Patent number: 11735197

Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.

Type: Grant

Filed: July 7, 2020

Date of Patent: August 22, 2023

Assignee: GOOGLE LLC

Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul

Hybrid live captioning systems and methods

Patent number: 11735186

Abstract: A computer system configured to generate captions is provided. The computer system includes a memory and a processor coupled to the memory. The processor is configured to access a first buffer configured to store text generated by an automated speech recognition (ASR) process; access a second buffer configured to store text generated by a captioning client process; identify either the first buffer or the second buffer as a source buffer of caption text; generate caption text from the source buffer; and communicate the caption text to a target process.
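As a rough illustration of the buffer selection this abstract describes, the Python sketch below keeps one queue for ASR text and one for captioning-client text and draws each caption from whichever is identified as the source. The selection rule (prefer the captioning client whenever it has text) and all function names are assumptions made for illustration; the abstract does not say how the source buffer is identified.

```python
from collections import deque
from typing import Deque, Optional


def identify_source(asr_buffer: Deque[str], client_buffer: Deque[str]) -> Deque[str]:
    """Pick the buffer that caption text will be drawn from.

    Assumed rule (not from the abstract): use captioning-client text when
    any is available, otherwise fall back to the ASR buffer.
    """
    return client_buffer if client_buffer else asr_buffer


def next_caption(asr_buffer: Deque[str], client_buffer: Deque[str]) -> Optional[str]:
    """Return the next caption from the chosen source, or None if both are empty."""
    source = identify_source(asr_buffer, client_buffer)
    return source.popleft() if source else None
```

A caller would pass the returned text on to whatever target process displays the captions.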

Type: Grant

Filed: September 7, 2021

Date of Patent: August 22, 2023

Assignee: 3Play Media, Inc.

Inventors: Roger S. Zimmerman, Christopher S. Antunes, Stephanie A. Laing, John W. Slocum, Nicholas R. Moutis, Theresa M. Kettelberger

Voice output device and voice output method

Patent number: 11735159

Abstract: A voice output device includes a voice output controller configured to determine, when a message reception unit receives a message, whether a start condition to be satisfied when a person intended to receive the message normally listens to voice in the predetermined space is satisfied, and cause a voice output unit to start voice output of the message when the start condition is satisfied and suspend voice output of the message when the start condition is not satisfied. The voice output is not immediately performed in response to a reception of a message but is performed only when the person intended to receive the message normally listens to the message, and the voice output of the message is suspended in other cases.

Type: Grant

Filed: May 25, 2021

Date of Patent: August 22, 2023

Assignee: ALPS ALPINE CO., LTD.

Inventors: Hongda Zheng, Xiao Liu

Flexible-format voice command

Patent number: 11735172

Abstract: A voice-based system is configured to process commands in a flexible format, for example, in which a wake word does not necessarily have to occur at the beginning of an utterance. As in natural speech, the system being addressed may be named within or at the end of a spoken utterance rather than at the beginning, or depending on the context, may not be named at all.

Type: Grant

Filed: April 26, 2021

Date of Patent: August 22, 2023

Assignee: Cerence Operating Company

Inventors: Bart D'hoore, Christoph Halboth, Holger Quast, Dino Seppi, Markus Funk, Tom Claes, Christophe Ris

Automated social agent interaction quality monitoring and improvement

Patent number: 11727916

Abstract: A system for monitoring and improving social agent interaction quality includes a computing platform having processing hardware and a system memory storing a software code. The processing hardware is configured to execute the software code to receive, from a social agent, interaction data describing an interaction of the social agent with a user, and to perform an assessment of the interaction, using the interaction data, as one of successful or including a flaw. When the assessment indicates that the interaction includes the flaw, the processing hardware is further configured to execute the software code to identify an interaction strategy for correcting the flaw, and to deliver, to the social agent, one or both of the assessment and the interaction strategy to correct the flaw in the interaction.

Type: Grant

Filed: May 20, 2021

Date of Patent: August 15, 2023

Assignee: Disney Enterprises, Inc.

Inventors: James R. Kennedy, Raymond J. Scanlon, Komath Naveen Kumar, Douglas A. Fidaleo

Self-improving intent classification

Patent number: 11727921

Abstract: A method, a system, and a computer program product for executing intent classification based on user feedback in a digital assistant environment. Using a natural language processor, an audio input received from a user is processed. At least one implicit feedback parameter is extracted from the processed audio input. The feedback parameter classifies an intent derived from the audio input received from the user. The extracted feedback parameter is stored in a replay memory. The replay memory stores labeled data associated with the audio input received from the user. Based on the processed audio input and the labeled data, an initial response to the received audio input is determined. Modeling of the extracted implicit feedback parameter and the stored labeled data is executed. An updated response to the received audio input is generated.

Type: Grant

Filed: March 29, 2021

Date of Patent: August 15, 2023

Assignee: SAP SE

Inventors: Sebastian Schuetz, Christian Pretzsch, Gil Katz

Systems and methods for identifying content corresponding to a language spoken in a household

Patent number: 11721321

Abstract: Systems and methods for identifying content corresponding to a language are provided. Language spoken by a first user based on verbal input received from the first user is automatically determined with voice recognition circuitry. A database of content sources is cross-referenced to identify a content source associated with a language field value that corresponds to the determined language spoken by the first user. The language field in the database identifies the language in which the associated content source transmits content to a plurality of users. A representation of the identified content source is generated for display to the first user.

Type: Grant

Filed: August 23, 2021

Date of Patent: August 8, 2023

Assignee: Rovi Guides, Inc.

Inventor: Shuchita Mehra

Acoustic environment simulation

Patent number: 11721348

Abstract: Encoding/decoding an audio signal having one or more audio components, wherein each audio component is associated with a spatial location. A first audio signal presentation (z) of the audio components, a first set of transform parameters (w(f)), and signal level data (β2) are encoded and transmitted to the decoder. The decoder uses the first set of transform parameters (w(f)) to form a reconstructed simulation input signal intended for an acoustic environment simulation, and applies a signal level modification (α) to the reconstructed simulation input signal. The signal level modification is based on the signal level data (β2) and data (p2) related to the acoustic environment simulation. The attenuated reconstructed simulation input signal is then processed in an acoustic environment simulator. With this process, the decoder does not need to determine the signal level of the simulation input signal, thereby reducing processing load.

Type: Grant

Filed: October 25, 2021

Date of Patent: August 8, 2023

Assignee: Dolby Laboratories Licensing Corporation

Inventor: Dirk Jeroen Breebaart

Providing high quality speech recognition

Patent number: 11721324

Abstract: A computer-implemented method, system and computer program product for providing high quality speech recognition. A first speech-to-text model is selected to perform speech recognition of a customer's spoken words and a second speech-to-text model is selected to perform speech recognition of the agent's spoken words during a call. The combined results of the speech-to-text models used to process the customer's and agent's spoken words are then analyzed to generate a reference speech-to-text result. The customer speech data that was processed by the first speech-to-text model is reprocessed by multiple other speech-to-text models. A similarity analysis is performed on the results of these speech-to-text models with respect to the reference speech-to-text result resulting in similarity scores being assigned to these speech-to-text models.
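The similarity analysis mentioned in this abstract can be pictured with a small Python sketch: each candidate speech-to-text output is compared against the reference result and given a score. The metric used here (a word-level `difflib.SequenceMatcher` ratio), the model names, and the sample transcripts are all assumptions for illustration; the abstract does not specify how similarity is computed.

```python
from difflib import SequenceMatcher
from typing import Dict


def similarity(reference: str, candidate: str) -> float:
    """Word-level similarity in [0, 1]; a stand-in metric, not the patented one."""
    return SequenceMatcher(None, reference.split(), candidate.split()).ratio()


def score_models(reference: str, outputs: Dict[str, str]) -> Dict[str, float]:
    """Assign a similarity score to each model's transcript of the same audio."""
    return {name: similarity(reference, text) for name, text in outputs.items()}


# Hypothetical reference result and candidate transcripts.
reference = "please reset my password"
outputs = {
    "model_a": "please reset my password",
    "model_b": "please reset the password",
}
scores = score_models(reference, outputs)
best_model = max(scores, key=scores.get)
```

Under these assumptions, the model whose transcript is closest to the reference receives the highest score.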

Type: Grant

Filed: June 9, 2021

Date of Patent: August 8, 2023

Assignee: International Business Machines Corporation

Inventors: Yuan Jin, Xi Xi Liu, Li ping Wang, Fan Xiao Xin, Zheng Ping Chu

Abnormality degree calculation system and method

Patent number: 11710500

Abstract: An abnormality degree calculation system includes: a feature amount vector extraction unit configured to generate and output a feature amount vector from an input signal originating from vibration of a target device; an encoding unit configured to receive as an input a set composed of the feature amount vector and a device type vector representing a type of the target device and output an encoding vector; a decoding unit configured to receive as an input the encoding vector and the device type vector and output a decoding vector; a learning unit configured to learn parameters of the neural networks of the encoding unit and the decoding unit; and an abnormality degree calculation unit configured to calculate a degree of abnormality defined as a function of the feature amount vector from the feature amount vector extraction unit, the encoding vector from the encoding unit, and the decoding vector from the decoding unit.

Type: Grant

Filed: March 18, 2021

Date of Patent: July 25, 2023

Assignee: HITACHI, LTD.

Inventor: Yohei Kawaguchi

Mixed adaptive and fixed coefficient neural networks for speech enhancement

Patent number: 11705147

Abstract: Systems, methods and computer-readable media are provided for speech enhancement using a hybrid neural network. An example process can include receiving, by a first neural network portion of the hybrid neural network, audio data and reference data, the audio data including speech data, noise data, and echo data; filtering, by the first neural network portion, a portion of the audio data based on adapted coefficients of the first neural network portion, the portion of the audio data including the noise data and/or echo data; based on the filtering, generating, by the first neural network portion, filtered audio data including the speech data and an unfiltered portion of the noise data and/or echo data; and based on the filtered audio data and the reference data, extracting, by a second neural network portion of the hybrid neural network, the speech data from the filtered audio data.

Type: Grant

Filed: April 28, 2021

Date of Patent: July 18, 2023

Assignee: QUALCOMM Incorporated

Inventors: Erik Visser, Vahid Montazeri, Shuhua Zhang, Lae-Hoon Kim

Server side hotwording

Patent number: 11699443

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
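The two-stage check in this abstract can be sketched in Python as a permissive client-side test followed by a stricter server-side test. The threshold values, the function names, and the idea of passing precomputed detector scores as plain floats are assumptions for illustration; the abstract only states that the second threshold is more restrictive than the first.

```python
from typing import Optional

# Hypothetical thresholds; the only constraint taken from the abstract is
# that the server threshold is stricter than the client threshold.
CLIENT_THRESHOLD = 0.5
SERVER_THRESHOLD = 0.9


def server_detect(server_score: float, transcript: str) -> Optional[str]:
    """Stand-in for the server system: return tagged text only if the
    stricter threshold is satisfied."""
    if server_score >= SERVER_THRESHOLD:
        return f"[hotword] {transcript}"
    return None


def client_detect(client_score: float, server_score: float, transcript: str) -> Optional[str]:
    """Forward the audio to the server only when the permissive client check passes."""
    if client_score < CLIENT_THRESHOLD:
        return None  # audio is never sent to the server
    return server_detect(server_score, transcript)
```

The cheap first stage filters out most audio locally, so the more demanding server check runs only on likely candidates.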

Type: Grant

Filed: June 2, 2021

Date of Patent: July 11, 2023

Assignee: GOOGLE LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar

Determination of transcription accuracy

Patent number: 11699043

Abstract: A method may include obtaining audio of a communication session between a first device of a first user and a second device of a second user. The method may further include obtaining a transcription of second speech of the second user. The method may also include identifying one or more first sound characteristics of first speech of the first user. The method may also include identifying one or more first words indicating a lack of understanding in the first speech. The method may further include determining an experienced emotion of the first user based on the one or more first sound characteristics. The method may also include determining an accuracy of the transcription of the second speech based on the experienced emotion and the one or more first words.

Type: Grant

Filed: September 29, 2021

Date of Patent: July 11, 2023

Assignee: Sorenson IP Holdings, LLC

Inventor: Scott Boekweg

Multi-scale spectrogram text-to-speech

Patent number: 11694674

Abstract: Techniques for performing text-to-speech are described. An exemplary method includes receiving a request to generate audio from input text; generating audio from the input text by: generating a first number of vectors from phoneme embeddings representing the input text, predicting one or more spectrograms having the first number of frames using multiple scales wherein a coarser scale influences a finer scale, concatenating the first number of vectors and the predicted one or more spectrograms, generating at least one mel spectrogram from the concatenated vectors and the predicted one or more spectrograms, and converting, with a vocoder, the at least one mel spectrogram frames to audio; and outputting the generated audio according to the request.

Type: Grant

Filed: May 26, 2021

Date of Patent: July 4, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Syed Ammar Abbas, Bajibabu Bollepalli, Alexis Pierre Moinet, Thomas Renaud Drugman, Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Simon Slangen, Petr Makarov

Method, software, and device for training an alarm system to classify audio of an event

Patent number: 11682384

Abstract: A method for training an alarm system to classify audio of an event, wherein the alarm system is connected to a neural network trained to classify audio as an event type, the method comprising the steps of: receiving audio recorded during a first period of time; transmitting the audio to an external unit; receiving data from the external unit indicating a sub-period of time of the audio and data indicating an event type of the indicated sub-period of time of the audio; and re-training the neural network by inputting a sub-period of the audio corresponding to the indicated sub-period of time of the audio and using the indicated event type as a correct classification of the sub-period of the audio.

Type: Grant

Filed: February 23, 2021

Date of Patent: June 20, 2023

Assignee: Axis AB

Inventors: Ingemar Larsson, Daniel Andersson

Electronic device for processing user utterance and controlling method thereof

Patent number: 11676580

Abstract: An electronic device is provided. The electronic device includes a microphone, and at least one processor operatively connected to the microphone, wherein the at least one processor may include a buffer memory configured to store a first feature vector for a first voice signal obtained from the microphone as an inverse value, and an operation circuit configured to perform a norm operation for a first feature vector and a second feature vector, based on the second feature vector, based on a second voice signal streamed from the microphone and an inverse value of the first feature vector stored in the buffer memory, or calculate a similarity between the first feature vector and the second feature vector. In addition, various embodiments identified through the specification are possible.

Type: Grant

Filed: April 30, 2021

Date of Patent: June 13, 2023

Assignee: Samsung Electronics Co., Ltd.

Inventors: Hyunbin Park, Jin Choi

Information processing device, reception device, and information processing method

Patent number: 11676595

Abstract: A reception apparatus, including processing circuitry that is configured to receive a voice command related to content from a user during presentation of the content to the user. The processing circuitry is configured to transmit the voice command to a server system for processing. The processing circuitry is configured to receive a response to the voice command from the server system. The response to the voice command is generated based on the voice command and content information for identifying the content related to the voice command.

Type: Grant

Filed: December 29, 2020

Date of Patent: June 13, 2023

Assignee: SATURN LICENSING LLC

Inventor: Tatsuya Igarashi
