Patents Examined by Bharatkumar S Shah

Speech synthesis method and system

Patent number: 11842722

Abstract: Disclosed is a speech synthesis method including: acquiring fundamental frequency information and acoustic feature information from original speech; generating an impulse train from the fundamental frequency information, and inputting it to a harmonic time-varying filter; inputting the acoustic feature information into a neural network filter estimator to obtain corresponding impulse response information; generating a noise signal by a noise generator; determining, by the harmonic time-varying filter, harmonic component information through filtering processing on the impulse train and the impulse response information; determining, by a noise time-varying filter, noise component information based on the impulse response information and the noise signal; and generating a synthesized speech from the harmonic component information and the noise component information.

Type: Grant

Filed: June 9, 2021

Date of Patent: December 12, 2023

Assignee: AI SPEECH CO., LTD.

Inventors: Kai Yu, Zhijun Liu, Kuan Chen

Distinguishing user speech from background speech in speech-dense environments

Patent number: 11837253

Abstract: A device, system, and method whereby a speech-driven system can distinguish speech obtained from users of the system from other speech spoken by background persons, as well as from background speech from public address systems. In one aspect, the present system and method prepares, in advance of field-use, a voice-data file which is created in a training environment. The training environment exhibits both desired user speech and unwanted background speech, including unwanted speech from persons other than a user and also speech from a PA system. The speech recognition system is trained or otherwise programmed to identify wanted user speech which may be spoken concurrently with the background sounds. In an embodiment, during the pre-field-use phase the training or programming may be accomplished by having persons who are training listeners audit the pre-recorded sounds to identify the desired user speech. A processor-based learning system is trained to duplicate the assessments made by the human listeners.

Type: Grant

Filed: September 28, 2021

Date of Patent: December 5, 2023

Assignee: VOCOLLECT, INC.

Inventor: David D. Hardek

Method for enhancing quality of audio data, and device using the same

Patent number: 11830513

Abstract: Provided is a method of enhancing quality of audio data, which comprises obtaining a spectrum of mixed audio data including noise, inputting two-dimensional (2D) input data corresponding to the spectrum to a convolutional network including a downsampling process and an upsampling process to obtain output data of the convolutional network, generating a mask for removing noise included in the audio data based on the obtained output data, and removing noise from the mixed audio data using the generated mask, wherein, in the convolutional network, the downsampling process and the upsampling process are performed on a first axis of the 2D input data, and remaining processes other than the downsampling process and the upsampling process are performed on the first axis and a second axis.

Type: Grant

Filed: November 20, 2020

Date of Patent: November 28, 2023

Assignees: DEEPHEARING INC., The Industry & Academic Cooperation in Chungnam National University (IAC)

Inventors: Kanghun Ahn, Sungwon Kim

Computer implemented method and an apparatus for silence detection in speech recognition

Patent number: 11830522

Abstract: A computer implemented method for speech recognition from an audio signal includes: obtaining initial values for silence detection parameters including a lead period, a threshold amplitude, and a terminal period; detecting an amplitude of the audio signal at a first time T1 of the audio signal; optionally adjusting the threshold amplitude based on the detected amplitude; starting the speech recognition from a second time T2 of the audio signal; and starting silence detection from the audio signal when the lead period has elapsed after the second time T2, including: responsive to detecting an amplitude below the threshold amplitude for a duration of the terminal period, terminating the speech recognition and the silence detection at a third time T3 of the audio signal and adjusting the silence detection parameters based on the detected amplitude changes of the audio signal between the first time T1 and the third time T3.

Type: Grant

Filed: December 14, 2021

Date of Patent: November 28, 2023

Assignee: Elisa Oyj

Inventors: Ville Ruutu, Jussi Ruutu
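
As a rough illustration of the silence-detection step described in the abstract of patent 11830522 above, the following Python sketch waits for the lead period after T2 and then reports the sample index (T3) at which the amplitude has stayed below the threshold for the whole terminal period. It is not the patented implementation. The names SilenceParams and find_end_of_speech, and the choice to measure all periods in samples, are assumptions made for this example. The optional threshold adjustment at T1 and the parameter update after T3 are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SilenceParams:
    # All names and units here are hypothetical; periods are counted in samples.
    lead_period: int      # samples to skip after T2 before silence detection starts
    threshold: float      # amplitudes below this value count as silence
    terminal_period: int  # consecutive silent samples needed to stop recognition


def find_end_of_speech(
    amplitudes: Sequence[float], t2: int, params: SilenceParams
) -> Optional[int]:
    """Return the index T3 at which recognition would stop, or None if it never does."""
    silent_run = 0
    start = t2 + params.lead_period  # silence detection begins after the lead period
    for i in range(start, len(amplitudes)):
        if abs(amplitudes[i]) < params.threshold:
            silent_run += 1
            if silent_run >= params.terminal_period:
                return i
        else:
            # Any sample at or above the threshold resets the silent run.
            silent_run = 0
    return None
```

Counting consecutive samples keeps the sketch simple; a real system would more likely work on frame energies and wall-clock durations.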
Voice recognition method and apparatus, and air conditioner

Patent number: 11830479

Abstract: Provided is a voice recognition method and a voice recognition apparatus, and an air conditioner. The method includes: acquiring first voice data; adjusting, according to the first voice data, a collection state of second voice data to obtain an adjusted collection state, and acquiring the second voice data based on the adjusted collection state; and performing far-field voice recognition on the second voice data using a preset far-field voice recognition model so as to obtain semantic information corresponding to the acquired second voice data. The application can solve the problem in which far-field voice recognition performance is poor when a deep learning method or a microphone array method is used to remove reverberation and noise from far-field voice data, thereby enhancing far-field voice recognition performance.

Type: Grant

Filed: August 20, 2021

Date of Patent: November 28, 2023

Assignee: GREE ELECTRIC APPLIANCES, INC. OF ZHUHAI

Inventors: Mingjie Li, Dechao Song, Jutao Jia, Wei Wu, Junjie Xie

Improving speech recognition with speech synthesis-based model adapation

Patent number: 11823697

Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.

Type: Grant

Filed: August 20, 2021

Date of Patent: November 21, 2023

Assignee: Google LLC

Inventors: Andrew Rosenberg, Bhuvana Ramabhadran

GAN-based speech synthesis model and training method

Patent number: 11817079

Abstract: The present disclosure provides a GAN-based speech synthesis model, a training method, and a speech synthesis method. According to the speech synthesis method, to-be-converted text is obtained and is converted into a text phoneme, the text phoneme is further digitized to obtain text data, and the text data is converted into a text vector to be input into a speech synthesis model. In this way, target audio corresponding to the to-be-converted text is obtained. When a target Mel-frequency spectrum is generated by using a trained generator, accuracy of the generated target Mel-frequency spectrum can reach that of a standard Mel-frequency spectrum. Through constant adversarial training of the generator and a discriminator, acoustic losses of the target Mel-frequency spectrum are reduced, and acoustic losses of the target audio generated based on the target Mel-frequency spectrum are also reduced, thereby improving accuracy of audio synthesized from speech.

Type: Grant

Filed: June 16, 2023

Date of Patent: November 14, 2023

Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.

Inventors: Huapeng Sima, Zhiqiang Mao

Providing emotional care in a session

Patent number: 11810337

Abstract: A method and apparatus for providing emotional care in a session between a user and conversational agent. A first group of images comprising one or more images associated with the user may be received in the session. A user profile may be obtained. A first group of textual descriptions may be generated from the first group of images based at least on emotion information in the user profile. A first memory record may be created based at least on the first group of images and the first group of textual descriptions. A second group of images may be received in the session to generate a second group of textual descriptions from the second group of images based at least on the emotion information in the user profile. A second memory record may be created based at least on the second group of images and the second group of textual descriptions.

Type: Grant

Filed: May 24, 2022

Date of Patent: November 7, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Xianchao Wu, Daigo Hamura, Yongdong Wang

Generating training datasets for a supervised learning topic model from outputs of a discovery topic model

Patent number: 11804216

Abstract: Systems and methods for generating training data for a supervised topic modeling system from outputs of a topic discovery model are described herein. In an embodiment, a system receives a plurality of digitally stored call transcripts and, using a topic model, generates an output which identifies a plurality of topics represented in the plurality of digitally stored call transcripts. Using the output of the topic model, the system generates an input dataset for a supervised learning model by identifying a first subset of the plurality of digitally stored call transcripts that include a particular topic, storing a positive value for the first subset, identifying a second subset that do not include the particular topic, and storing a negative value for the second subset. The input training dataset is then used to train a supervised learning model.

Type: Grant

Filed: August 3, 2022

Date of Patent: October 31, 2023

Assignee: Invoca, Inc.

Inventors: Michael McCourt, Anoop Praturu
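
The labeling step in the abstract of patent 11804216 above can be pictured with a short Python sketch: given the topics a discovery model found in each transcript, it stores a positive value for transcripts that include a chosen topic and a negative value for the rest. This is only an illustration under stated assumptions, not the patented system. The function name build_training_set, the dictionary input, the transcript IDs, and the use of 1 and 0 as the positive and negative values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_training_set(
    topics_by_transcript: Dict[str, Set[str]], topic: str
) -> List[Tuple[str, int]]:
    """Label each transcript 1 if the discovery model found `topic` in it, else 0."""
    # Sorting by transcript ID only makes the output order deterministic.
    return [
        (transcript_id, 1 if topic in topics else 0)
        for transcript_id, topics in sorted(topics_by_transcript.items())
    ]


# Hypothetical output of a topic discovery model: transcript ID -> topics found.
discovered = {
    "call-1": {"billing", "refund"},
    "call-2": {"shipping"},
    "call-3": {"refund"},
}
labels = build_training_set(discovered, "refund")
```

The resulting (transcript, label) pairs are the kind of input a supervised classifier for that one topic could be trained on.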
Devices and methods for a speech-based user interface

Patent number: 11798526

Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.

Type: Grant

Filed: March 1, 2022

Date of Patent: October 24, 2023

Assignee: Google LLC

Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson

Communicating apparatus, communication method, and storage medium storing program

Patent number: 11792632

Abstract: A communication method includes establishing a wireless connection between an image forming apparatus and an external access point by a wireless communication unit, establishing a direct wireless connection between the image forming apparatus and a communication partner apparatus by the wireless communication unit without the external access point, concurrently maintaining the wireless connection and the direct wireless connection with each other, and performing print processing based on data received by the wireless communication unit. The external access point is searched in a case where the wireless connection between the image forming apparatus and the external access point is disconnected, the direct wireless connection between the image forming apparatus and the communication partner apparatus is maintained, while the external access point is searched, and the external access point is external to the image forming apparatus and is external to the communication partner apparatus.

Type: Grant

Filed: May 10, 2022

Date of Patent: October 17, 2023

Assignee: CANON KABUSHIKI KAISHA

Inventor: Atsushi Shimazaki

Multimodality in digital assistant systems

Patent number: 11783815

Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example process for determining user intent includes receiving a natural language input and detecting an event. The process further includes determining, at a first time, based on the natural language input, a first value for a first node of a parsing structure; and determining, at a second time, based on the detected event, a second value for a second node of the parsing structure. The process further includes, in accordance with a determination that the first time and the second time are within a predetermined time: determining, using the parsing structure, the first value, and the second value, a user intent associated with the natural language input; initiating a task based on the determined intent; and providing an output indicative of the task.

Type: Grant

Filed: April 28, 2022

Date of Patent: October 10, 2023

Assignee: Apple Inc.

Inventors: Pierre P. Greborio, Didier Rene Guzzoni, Philippe P. Piernot

Audio processing apparatus and method for audio scene classification

Patent number: 11776532

Abstract: The disclosure relates to an audio processing apparatus (200) configured to classify an audio signal into one or more audio scene classes, the audio signal comprising a component signal. The apparatus (200) comprises: processing circuitry configured to classify the component signal of the audio signal as a foreground layer component signal or a background layer component signal; obtain an audio signal feature on the basis of the audio signal; select, depending on the classification of the component signal, a first set of weights or a second set of weights; and to classify the audio signal on the basis of the audio signal features, the foreground layer component signal or the background layer component signal and the selected set of weights.

Type: Grant

Filed: June 15, 2021

Date of Patent: October 3, 2023

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Yesenia Lacouture Parodi, Florian Eyben, Andrea Crespi, Jun Deng

Chaos testing for voice enabled devices

Patent number: 11769484

Abstract: Computer-implemented methods, computer program products, and computer systems for testing a voice assistant device may include one or more processors configured for receiving test data from a database, wherein the test data may include a first set of coding parameters and a first user utterance having an expected device response. Further, the one or more processors may be configured for generating a first modified user utterance by applying the first set of coding parameters to the first user utterance, wherein the first modified user utterance is acoustically different than the first user utterance. The one or more processors may be configured for audibly presenting the first modified user utterance to a voice assistant device, receiving a first device response from the voice assistant device, and determining whether the first device response is substantially similar to the expected device response.

Type: Grant

Filed: September 11, 2020

Date of Patent: September 26, 2023

Assignee: International Business Machines Corporation

Inventors: Vijay Kumar Ananthapur Bache, Pradeep Raj Jayarathanasamy, Srithar Rajan Thangaraj, Arvind Rangarajan

Method and apparatus of synthesizing speech, method and apparatus of training speech synthesis model, electronic device, and storage medium

Patent number: 11769482

Abstract: The present disclosure provides a method and apparatus of synthesizing speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing speech includes acquiring style information of a speech to be synthesized, tone information of the speech to be synthesized, and content information of a text to be processed; generating acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information of the text to be processed; and synthesizing the speech for the text to be processed, based on the acoustic feature information of the text to be processed.

Type: Grant

Filed: September 29, 2021

Date of Patent: September 26, 2023

Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventors: Wenfu Wang, Tao Sun, Xilei Wang, Junteng Zhang, Zhengkun Gao, Lei Jia

Speaker recognition method and apparatus

Patent number: 11763805

Abstract: A speaker recognition method and apparatus receives a first voice signal of a speaker, generates a second voice signal by enhancing the first voice signal through speech enhancement, generates a multi-channel voice signal by associating the first voice signal with the second voice signal, and recognizes the speaker based on the multi-channel voice signal.

Type: Grant

Filed: May 27, 2022

Date of Patent: September 19, 2023

Assignee: Samsung Electronics Co., Ltd.

Inventors: Sung-Jae Cho, Kyuhong Kim, Jaejoon Han

Methods and apparatus for intent recognition

Patent number: 11741956

Abstract: A system for generating a response to a customer query includes a computing device configured to obtain a first dataset, including a plurality of first phrase-intent pairs associated with a first domain. Each first phrase-intent pair includes a first phrase and a corresponding first intent. The computing device is configured to retrieve a set of configuration rules to configure a plurality of environments. The computing device is also configured to configure a first environment using the first dataset and the set of configuration rules to determine a result user intent based on a requested query associated with the first domain. The first environment embeds the plurality of first phrase-intent pairs in a vector space based on the set of configuration rules. The computing device is configured to perform operations based on the first environment.

Type: Grant

Filed: February 26, 2021

Date of Patent: August 29, 2023

Assignee: Walmart Apollo, LLC

Inventors: Simral Chaudhary, Deepa Mohan, Haoxuan Chen, Lakshmi Manasa Velaga, Snehasish Mukherjee, John Brian Moss, Jason Charles Benesch, Don Bambico

Machine-learned differentiable digital signal processing

Patent number: 11735197

Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.

Type: Grant

Filed: July 7, 2020

Date of Patent: August 22, 2023

Assignee: GOOGLE LLC

Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul

Hybrid live captioning systems and methods

Patent number: 11735186

Abstract: A computer system configured to generate captions is provided. The computer system includes a memory and a processor coupled to the memory. The processor is configured to access a first buffer configured to store text generated by an automated speech recognition (ASR) process; access a second buffer configured to store text generated by a captioning client process; identify either the first buffer or the second buffer as a source buffer of caption text; generate caption text from the source buffer; and communicate the caption text to a target process.

Type: Grant

Filed: September 7, 2021

Date of Patent: August 22, 2023

Assignee: 3Play Media, Inc.

Inventors: Roger S. Zimmerman, Christopher S. Antunes, Stephanie A. Laing, John W. Slocum, Nicholas R. Moutis, Theresa M. Kettelberger

Voice output device and voice output method

Patent number: 11735159

Abstract: A voice output device includes a voice output controller configured to determine, when a message reception unit receives a message, whether a start condition to be satisfied when a person intended to receive the message normally listens to voice in the predetermined space is satisfied, and cause a voice output unit to start voice output of the message when the start condition is satisfied and suspend voice output of the message when the start condition is not satisfied. The voice output is not immediately performed in response to a reception of a message but is performed only when the person intended to receive the message normally listens to the message, and the voice output of the message is suspended in other cases.

Type: Grant

Filed: May 25, 2021

Date of Patent: August 22, 2023

Assignee: ALPS ALPINE CO., LTD.

Inventors: Hongda Zheng, Xiao Liu
