Patents Examined by Bharatkumar S Shah
-
Patent number: 11769482
Abstract: The present disclosure provides a method and apparatus of synthesizing a speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing a speech includes acquiring a style information of a speech to be synthesized, a tone information of the speech to be synthesized, and a content information of a text to be processed; generating an acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information of the text to be processed; and synthesizing the speech for the text to be processed, based on the acoustic feature information of the text to be processed.
Type: Grant
Filed: September 29, 2021
Date of Patent: September 26, 2023
Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
Inventors: Wenfu Wang, Tao Sun, Xilei Wang, Junteng Zhang, Zhengkun Gao, Lei Jia
-
Patent number: 11763805
Abstract: A speaker recognition method and apparatus receives a first voice signal of a speaker, generates a second voice signal by enhancing the first voice signal through speech enhancement, generates a multi-channel voice signal by associating the first voice signal with the second voice signal, and recognizes the speaker based on the multi-channel voice signal.
Type: Grant
Filed: May 27, 2022
Date of Patent: September 19, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Sung-Jae Cho, Kyuhong Kim, Jaejoon Han
-
Patent number: 11741956
Abstract: A system for generating a response to a customer query includes a computing device configured to obtain a first dataset, including a plurality of first phrase-intent pairs associated with a first domain. Each first phrase-intent pair includes a first phrase and a corresponding first intent. The computing device is configured to retrieve a set of configuration rules to configure a plurality of environments. The computing device is also configured to configure a first environment using the first dataset and the set of configuration rules to determine a result user intent based on a requested query associated with the first domain. The first environment embeds the plurality of first phrase-intent pairs in a vector space based on the set of configuration rules. The computing device is configured to perform operations based on the first environment.
Type: Grant
Filed: February 26, 2021
Date of Patent: August 29, 2023
Assignee: Walmart Apollo, LLC
Inventors: Simral Chaudhary, Deepa Mohan, Haoxuan Chen, Lakshmi Manasa Velaga, Snehasish Mukherjee, John Brian Moss, Jason Charles Benesch, Don Bambico
-
Patent number: 11735197
Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.
Type: Grant
Filed: July 7, 2020
Date of Patent: August 22, 2023
Assignee: GOOGLE LLC
Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul
-
Patent number: 11735186
Abstract: A computer system configured to generate captions is provided. The computer system includes a memory and a processor coupled to the memory. The processor is configured to access a first buffer configured to store text generated by an automated speech recognition (ASR) process; access a second buffer configured to store text generated by a captioning client process; identify either the first buffer or the second buffer as a source buffer of caption text; generate caption text from the source buffer; and communicate the caption text to a target process.
Type: Grant
Filed: September 7, 2021
Date of Patent: August 22, 2023
Assignee: 3Play Media, Inc.
Inventors: Roger S. Zimmerman, Christopher S. Antunes, Stephanie A. Laing, John W. Slocum, Nicholas R. Moutis, Theresa M. Kettelberger
-
Patent number: 11735159
Abstract: A voice output device includes a voice output controller configured to determine, when a message reception unit receives a message, whether a start condition to be satisfied when a person intended to receive the message normally listens to voice in the predetermined space is satisfied, and cause a voice output unit to start voice output of the message when the start condition is satisfied and suspend voice output of the message when the start condition is not satisfied. The voice output is not immediately performed in response to a reception of a message but is performed only when the person intended to receive the message normally listens to the message, and the voice output of the message is suspended in other cases.
Type: Grant
Filed: May 25, 2021
Date of Patent: August 22, 2023
Assignee: ALPS ALPINE CO., LTD.
Inventors: Hongda Zheng, Xiao Liu
-
Patent number: 11735172
Abstract: A voice-based system is configured to process commands in a flexible format, for example, in which a wake word does not necessarily have to occur at the beginning of an utterance. As in natural speech, the system being addressed may be named within or at the end of a spoken utterance rather than at the beginning, or depending on the context, may not be named at all.
Type: Grant
Filed: April 26, 2021
Date of Patent: August 22, 2023
Assignee: Cerence Operating Company
Inventors: Bart D'hoore, Christoph Halboth, Holger Quast, Dino Seppi, Markus Funk, Tom Claes, Christophe Ris
-
Patent number: 11727916
Abstract: A system for monitoring and improving social agent interaction quality includes a computing platform having processing hardware and a system memory storing a software code. The processing hardware is configured to execute the software code to receive, from a social agent, interaction data describing an interaction of the social agent with a user, and to perform an assessment of the interaction, using the interaction data, as one of successful or including a flaw. When the assessment indicates that the interaction includes the flaw, the processing hardware is further configured to execute the software code to identify an interaction strategy for correcting the flaw, and to deliver, to the social agent, one or both of the assessment and the interaction strategy to correct the flaw in the interaction.
Type: Grant
Filed: May 20, 2021
Date of Patent: August 15, 2023
Assignee: Disney Enterprises, Inc.
Inventors: James R. Kennedy, Raymond J. Scanlon, Komath Naveen Kumar, Douglas A. Fidaleo
-
Patent number: 11727921
Abstract: A method, a system, and a computer program product for executing intent classification based on user feedback in a digital assistant environment. Using a natural language processor, an audio input received from a user is processed. At least one implicit feedback parameter is extracted from the processed audio input. The feedback parameter classifies an intent derived from the audio input received from the user. The extracted feedback parameter is stored in a replay memory. The replay memory stores labeled data associated with the audio input received from the user. Based on the processed audio input and the labeled data, an initial response to the received audio input is determined. Modeling of the extracted implicit feedback parameter and the stored labeled data is executed. An updated response to the received audio input is generated.
Type: Grant
Filed: March 29, 2021
Date of Patent: August 15, 2023
Assignee: SAP SE
Inventors: Sebastian Schuetz, Christian Pretzsch, Gil Katz
-
Patent number: 11721321
Abstract: Systems and methods for identifying content corresponding to a language are provided. Language spoken by a first user based on verbal input received from the first user is automatically determined with voice recognition circuitry. A database of content sources is cross-referenced to identify a content source associated with a language field value that corresponds to the determined language spoken by the first user. The language field in the database identifies the language that the associated content source transmits content to a plurality of users. A representation of the identified content source is generated for display to the first user.
Type: Grant
Filed: August 23, 2021
Date of Patent: August 8, 2023
Assignee: Rovi Guides, Inc.
Inventor: Shuchita Mehra
-
Patent number: 11721348
Abstract: Encoding/decoding an audio signal having one or more audio components, wherein each audio component is associated with a spatial location. A first audio signal presentation (z) of the audio components, a first set of transform parameters (w(f)), and signal level data (?2) are encoded and transmitted to the decoder. The decoder uses the first set of transform parameters (w(f)) to form a reconstructed simulation input signal intended for an acoustic environment simulation, and applies a signal level modification (?) to the reconstructed simulation input signal. The signal level modification is based on the signal level data (?2) and data (p2) related to the acoustic environment simulation. The attenuated reconstructed simulation input signal is then processed in an acoustic environment simulator. With this process, the decoder does not need to determine the signal level of the simulation input signal, thereby reducing processing load.
Type: Grant
Filed: October 25, 2021
Date of Patent: August 8, 2023
Assignee: Dolby Laboratories Licensing Corporation
Inventor: Dirk Jeroen Breebaart
-
Patent number: 11721324
Abstract: A computer-implemented method, system and computer program product for providing high quality speech recognition. A first speech-to-text model is selected to perform speech recognition of a customer's spoken words and a second speech-to-text model is selected to perform speech recognition of the agent's spoken words during a call. The combined results of the speech-to-text models used to process the customer's and agent's spoken words are then analyzed to generate a reference speech-to-text result. The customer speech data that was processed by the first speech-to-text model is reprocessed by multiple other speech-to-text models. A similarity analysis is performed on the results of these speech-to-text models with respect to the reference speech-to-text result resulting in similarity scores being assigned to these speech-to-text models.
Type: Grant
Filed: June 9, 2021
Date of Patent: August 8, 2023
Assignee: International Business Machines Corporation
Inventors: Yuan Jin, Xi Xi Liu, Li ping Wang, Fan Xiao Xin, Zheng Ping Chu
-
Patent number: 11710500
Abstract: An abnormality degree calculation system includes: a feature amount vector extraction unit configured to generate and output a feature amount vector from an input signal originating from vibration of a target device; an encoding unit configured to receive as an input a set composed of the feature amount vector and a device type vector representing a type of the target device and output an encoding vector; a decoding unit configured to receive as an input the encoding vector and the device type vector and output a decoding vector; a learning unit configured to learn parameters of the neural networks of the encoding unit and the decoding unit; and an abnormality degree calculation unit configured to calculate a degree of abnormality defined as a function of the feature amount vector from the feature amount vector extraction unit, the encoding vector from the encoding unit, and the decoding vector from the decoding unit.
Type: Grant
Filed: March 18, 2021
Date of Patent: July 25, 2023
Assignee: HITACHI, LTD.
Inventor: Yohei Kawaguchi
-
Patent number: 11705147
Abstract: Systems, methods and computer-readable media are provided for speech enhancement using a hybrid neural network. An example process can include receiving, by a first neural network portion of the hybrid neural network, audio data and reference data, the audio data including speech data, noise data, and echo data; filtering, by the first neural network portion, a portion of the audio data based on adapted coefficients of the first neural network portion, the portion of the audio data including the noise data and/or echo data; based on the filtering, generating, by the first neural network portion, filtered audio data including the speech data and an unfiltered portion of the noise data and/or echo data; and based on the filtered audio data and the reference data, extracting, by a second neural network portion of the hybrid neural network, the speech data from the filtered audio data.
Type: Grant
Filed: April 28, 2021
Date of Patent: July 18, 2023
Assignee: QUALCOMM Incorporated
Inventors: Erik Visser, Vahid Montazeri, Shuhua Zhang, Lae-Hoon Kim
-
Patent number: 11699443
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.
Type: Grant
Filed: June 2, 2021
Date of Patent: July 11, 2023
Assignee: GOOGLE LLC
Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
-
Patent number: 11699043
Abstract: A method may include obtaining audio of a communication session between a first device of a first user and a second device of a second user. The method may further include obtaining a transcription of second speech of the second user. The method may also include identifying one or more first sound characteristics of first speech of the first user. The method may also include identifying one or more first words indicating a lack of understanding in the first speech. The method may further include determining an experienced emotion of the first user based on the one or more first sound characteristics. The method may also include determining an accuracy of the transcription of the second speech based on the experienced emotion and the one or more first words.
Type: Grant
Filed: September 29, 2021
Date of Patent: July 11, 2023
Assignee: Sorenson IP Holdings, LLC
Inventor: Scott Boekweg
-
Patent number: 11694674
Abstract: Techniques for performing text-to-speech are described. An exemplary method includes receiving a request to generate audio from input text; generating audio from the input text by: generating a first number of vectors from phoneme embeddings representing the input text, predicting one or more spectrograms having the first number of frames using multiple scales wherein a coarser scale influences a finer scale, concatenating the first number of vectors and the predicted one or more spectrograms, generating at least one mel spectrogram from the concatenated vectors and the predicted one or more spectrograms, and converting, with a vocoder, the at least one mel spectrogram frames to audio; and outputting the generated audio according to the request.
Type: Grant
Filed: May 26, 2021
Date of Patent: July 4, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Syed Ammar Abbas, Bajibabu Bollepalli, Alexis Pierre Moinet, Thomas Renaud Drugman, Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Simon Slangen, Petr Makarov
-
Patent number: 11682384
Abstract: A method for training an alarm system to classify audio of an event, wherein the alarm system is connected to a neural network trained to classify audio as an event type, the method comprising the steps of: receiving audio recorded during a first period of time; transmitting the audio to an external unit; receiving data from the external unit indicating a sub-period of time of the audio and data indicating an event type of the indicated sub-period of time of the audio; and re-training the neural network by inputting a sub-period of the audio corresponding to the indicated sub-period of time of the audio and using the indicated event type as a correct classification of the sub-period of the audio.
Type: Grant
Filed: February 23, 2021
Date of Patent: June 20, 2023
Assignee: Axis AB
Inventors: Ingemar Larsson, Daniel Andersson
-
Patent number: 11676580
Abstract: An electronic device is provided. The electronic device includes a microphone, and at least one processor operatively connected to the microphone, wherein the at least one processor may include a buffer memory configured to store a first feature vector for a first voice signal obtained from the microphone as an inverse value, and an operation circuit configured to perform a norm operation for a first feature vector and a second feature vector, based on the second feature vector, based on a second voice signal streamed from the microphone and an inverse value of the first feature vector stored in the buffer memory, or calculate a similarity between the first feature vector and the second feature vector. In addition, various embodiments identified through the specification are possible.
Type: Grant
Filed: April 30, 2021
Date of Patent: June 13, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyunbin Park, Jin Choi
-
Patent number: 11676595
Abstract: A reception apparatus, including processing circuitry that is configured to receive a voice command related to content from a user during presentation of the content to the user. The processing circuitry is configured to transmit the voice command to a server system for processing. The processing circuitry is configured to receive a response to the voice command from the server system. The response to the voice command is generated based on the voice command and content information for identifying the content related to the voice command.
Type: Grant
Filed: December 29, 2020
Date of Patent: June 13, 2023
Assignee: SATURN LICENSING LLC
Inventor: Tatsuya Igarashi