Patents Examined by Bharatkumar S Shah
  • Patent number: 11842722
    Abstract: Disclosed is a speech synthesis method including: acquiring fundamental frequency information and acoustic feature information from original speech; generating an impulse train from the fundamental frequency information, and inputting it to a harmonic time-varying filter; inputting the acoustic feature information into a neural network filter estimator to obtain corresponding impulse response information; generating a noise signal by a noise generator; determining, by the harmonic time-varying filter, harmonic component information through filtering processing on the impulse train and the impulse response information; determining, by a noise time-varying filter, noise component information based on the impulse response information and the noise signal; and generating a synthesized speech from the harmonic component information and the noise component information.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: December 12, 2023
    Assignee: AI SPEECH CO., LTD.
    Inventors: Kai Yu, Zhijun Liu, Kuan Chen
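The signal flow in the abstract above (impulse train, time-varying filtering, added filtered noise) can be illustrated with a minimal sketch. It is not the patented implementation: the per-frame impulse responses, which the abstract obtains from a neural network filter estimator, are simply passed in as lists here, and the function names, the per-sample `f0` input, and the uniform noise source are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def impulse_train(f0: Sequence[float], sample_rate: int) -> List[float]:
    """Emit one unit impulse per pitch period.

    `f0` holds one fundamental-frequency value in Hz per sample (assumed layout).
    """
    out: List[float] = []
    phase = 0.0
    for hz in f0:
        phase += hz / sample_rate
        if phase >= 1.0:
            phase -= 1.0
            out.append(1.0)
        else:
            out.append(0.0)
    return out


def time_varying_fir(
    signal: Sequence[float],
    responses: Sequence[Sequence[float]],
    frame_len: int,
) -> List[float]:
    """Filter `signal` with a per-frame FIR impulse response.

    The sample at index n is spread using the response of frame n // frame_len.
    """
    longest = max(len(r) for r in responses)
    out = [0.0] * (len(signal) + longest - 1)
    for n, x in enumerate(signal):
        if x == 0.0:
            continue  # nothing to add for silent samples
        h = responses[min(n // frame_len, len(responses) - 1)]
        for k, hk in enumerate(h):
            out[n + k] += x * hk
    return out[: len(signal)]


def synthesize(
    f0: Sequence[float],
    harmonic_responses: Sequence[Sequence[float]],
    noise_responses: Sequence[Sequence[float]],
    sample_rate: int,
    frame_len: int,
    seed: int = 0,
) -> List[float]:
    """Sum the filtered impulse train (harmonic part) and filtered noise."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in f0]
    harmonic = time_varying_fir(
        impulse_train(f0, sample_rate), harmonic_responses, frame_len
    )
    aperiodic = time_varying_fir(noise, noise_responses, frame_len)
    return [h + a for h, a in zip(harmonic, aperiodic)]
```

Splitting the excitation into a pitched impulse train and a noise source lets each part be shaped by its own filter before the two are summed.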
  • Patent number: 11837253
    Abstract: A device, system, and method whereby a speech-driven system can distinguish speech obtained from users of the system from other speech spoken by background persons, as well as from background speech from public address systems. In one aspect, the present system and method prepares, in advance of field-use, a voice-data file which is created in a training environment. The training environment exhibits both desired user speech and unwanted background speech, including unwanted speech from persons other than a user and also speech from a PA system. The speech recognition system is trained or otherwise programmed to identify wanted user speech which may be spoken concurrently with the background sounds. In an embodiment, during the pre-field-use phase the training or programming may be accomplished by having persons who are training listeners audit the pre-recorded sounds to identify the desired user speech. A processor-based learning system is trained to duplicate the assessments made by the human listeners.
    Type: Grant
    Filed: September 28, 2021
    Date of Patent: December 5, 2023
    Assignee: VOCOLLECT, INC.
    Inventor: David D. Hardek
  • Patent number: 11830513
    Abstract: Provided is a method of enhancing quality of audio data which comprises obtaining a spectrum of mixed audio data including noise, inputting two-dimensional (2D) input data corresponding to the spectrum to a convolutional network including a downsampling process and an upsampling process to obtain output data of the convolutional network, generating a mask for removing noise included in the audio data based on the obtained output data, and removing noise from the mixed audio data using the generated mask, wherein, in the convolutional network, the downsampling process and the upsampling process are performed on a first axis of the 2D input data, and remaining processes other than the downsampling process and the upsampling process are performed on the first axis and a second axis.
    Type: Grant
    Filed: November 20, 2020
    Date of Patent: November 28, 2023
    Assignees: DEEPHEARING INC., The Industry & Academic Cooperation in Chungnam National University (IAC)
    Inventors: Kanghun Ahn, Sungwon Kim
  • Patent number: 11830522
    Abstract: A computer implemented method for speech recognition from an audio signal includes: obtaining initial values for silence detection parameters including a lead period, a threshold amplitude, and a terminal period; detecting an amplitude of the audio signal at a first time T1 of the audio signal; optionally adjusting the threshold amplitude based on the detected amplitude; starting the speech recognition from a second time T2 of the audio signal; and starting silence detection from the audio signal when the lead period has elapsed after the second time T2, including: responsive to detecting an amplitude below the threshold amplitude for a duration of the terminal period, terminating the speech recognition and the silence detection at a third time T3 of the audio signal and adjusting the silence detection parameters based on the detected amplitude changes of the audio signal between the first time T1 and the third time T3.
    Type: Grant
    Filed: December 14, 2021
    Date of Patent: November 28, 2023
    Assignee: Elisa Oyj
    Inventors: Ville Ruutu, Jussi Ruutu
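The role of the three silence detection parameters in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: durations are counted in frames, the amplitude is given as one value per frame, and the function name is hypothetical. The adaptive adjustment of the parameters described in the abstract is not modeled.

```python
from typing import Optional, Sequence


def find_end_of_speech(
    amplitudes: Sequence[float],
    start: int,
    lead_period: int,
    threshold: float,
    terminal_period: int,
) -> Optional[int]:
    """Return the frame index (T3) at which recognition would stop, or None.

    Silence detection begins only after `lead_period` frames have elapsed
    from `start` (T2). Recognition ends once the amplitude has stayed below
    `threshold` for `terminal_period` consecutive frames.
    """
    quiet_run = 0
    for i in range(start + lead_period, len(amplitudes)):
        if amplitudes[i] < threshold:
            quiet_run += 1
            if quiet_run >= terminal_period:
                return i
        else:
            quiet_run = 0  # speech resumed, restart the silence count
    return None
```

The lead period keeps an initial pause, before the speaker has started, from ending recognition prematurely.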
  • Patent number: 11830479
    Abstract: Provided is a voice recognition method and a voice recognition apparatus, and an air conditioner. The method includes: acquiring first voice data; adjusting, according to the first voice data, a collection state of second voice data to obtain an adjusted collection state, and acquiring the second voice data based on the adjusted collection state; and performing far-field voice recognition on the second voice data using a preset far-field voice recognition model so as to obtain semantic information corresponding to the acquired second voice data. The application can solve the problem in which far-field voice recognition performance is poor when a deep learning method or a microphone array method is used to remove reverberation and noise from far-field voice data, thereby enhancing far-field voice recognition performance.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: November 28, 2023
    Assignee: GREE ELECTRIC APPLIANCES, INC. OF ZHUHAI
    Inventors: Mingjie Li, Dechao Song, Jutao Jia, Wei Wu, Junjie Xie
  • Patent number: 11823697
    Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: November 21, 2023
    Assignee: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran
  • Patent number: 11817079
    Abstract: The present disclosure provides a GAN-based speech synthesis model, a training method, and a speech synthesis method. According to the speech synthesis method, to-be-converted text is obtained and is converted into a text phoneme, the text phoneme is further digitized to obtain text data, and the text data is converted into a text vector to be input into a speech synthesis model. In this way, target audio corresponding to the to-be-converted text is obtained. When a target Mel-frequency spectrum is generated by using a trained generator, accuracy of the generated target Mel-frequency spectrum can reach that of a standard Mel-frequency spectrum. Through continual adversarial training of the generator and a discriminator, acoustic losses of the target Mel-frequency spectrum are reduced, and acoustic losses of the target audio generated based on the target Mel-frequency spectrum are also reduced, thereby improving accuracy of the synthesized audio.
    Type: Grant
    Filed: June 16, 2023
    Date of Patent: November 14, 2023
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Zhiqiang Mao
  • Patent number: 11810337
    Abstract: A method and apparatus for providing emotional care in a session between a user and conversational agent. A first group of images comprising one or more images associated with the user may be received in the session. A user profile may be obtained. A first group of textual descriptions may be generated from the first group of images based at least on emotion information in the user profile. A first memory record may be created based at least on the first group of images and the first group of textual descriptions. A second group of images may be received in the session to generate a second group of textual descriptions from the second group of images based at least on the emotion information in the user profile. A second memory record may be created based at least on the second group of images and the second group of textual descriptions.
    Type: Grant
    Filed: May 24, 2022
    Date of Patent: November 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Xianchao Wu, Daigo Hamura, Yongdong Wang
  • Patent number: 11804216
    Abstract: Systems and methods for generating training data for a supervised topic modeling system from outputs of a topic discovery model are described herein. In an embodiment, a system receives a plurality of digitally stored call transcripts and, using a topic model, generates an output which identifies a plurality of topics represented in the plurality of digitally stored call transcripts. Using the output of the topic model, the system generates an input dataset for a supervised learning model by identifying a first subset of the plurality of digitally stored call transcripts that include a particular topic, storing a positive value for the first subset, identifying a second subset that do not include the particular topic, and storing a negative value for the second subset. The input training dataset is then used to train a supervised learning model.
    Type: Grant
    Filed: August 3, 2022
    Date of Patent: October 31, 2023
    Assignee: Invoca, Inc.
    Inventors: Michael McCourt, Anoop Praturu
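The labeling step in the abstract above, turning topic model output into positive and negative examples for one topic, can be sketched in a few lines. The data layout (a mapping from transcript ID to the set of topics the topic model found) and the function name are assumptions for illustration, not details taken from the patent.

```python
from typing import Dict, List, Set, Tuple


def build_training_set(
    transcript_topics: Dict[str, Set[str]],
    topic: str,
) -> List[Tuple[str, int]]:
    """Label each transcript 1 if it contains `topic`, otherwise 0."""
    return [
        (transcript_id, 1 if topic in topics else 0)
        for transcript_id, topics in transcript_topics.items()
    ]


# Hypothetical topic model output for three call transcripts.
example = {
    "call-1": {"billing", "refund"},
    "call-2": {"scheduling"},
    "call-3": {"refund"},
}
labels = build_training_set(example, "refund")
```

The resulting pairs could then be joined with the transcript text and passed to any supervised classifier.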
  • Patent number: 11798526
    Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
    Type: Grant
    Filed: March 1, 2022
    Date of Patent: October 24, 2023
    Assignee: Google LLC
    Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
  • Patent number: 11792632
    Abstract: A communication method includes establishing a wireless connection between an image forming apparatus and an external access point by a wireless communication unit, establishing a direct wireless connection between the image forming apparatus and a communication partner apparatus by the wireless communication unit without the external access point, concurrently maintaining the wireless connection and the direct wireless connection with each other, and performing print processing based on data received by the wireless communication unit. The external access point is searched in a case where the wireless connection between the image forming apparatus and the external access point is disconnected, the direct wireless connection between the image forming apparatus and the communication partner apparatus is maintained, while the external access point is searched, and the external access point is external to the image forming apparatus and is external to the communication partner apparatus.
    Type: Grant
    Filed: May 10, 2022
    Date of Patent: October 17, 2023
    Assignee: CANON KABUSHIKI KAISHA
    Inventor: Atsushi Shimazaki
  • Patent number: 11783815
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. An example process for determining user intent includes receiving a natural language input and detecting an event. The process further includes determining, at a first time, based on the natural language input, a first value for a first node of a parsing structure; and determining, at a second time, based on the detected event, a second value for a second node of the parsing structure. The process further includes, in accordance with a determination that the first time and the second time are within a predetermined time: determining, using the parsing structure, the first value, and the second value, a user intent associated with the natural language input; initiating a task based on the determined intent; and providing an output indicative of the task.
    Type: Grant
    Filed: April 28, 2022
    Date of Patent: October 10, 2023
    Assignee: Apple Inc.
    Inventors: Pierre P. Greborio, Didier Rene Guzzoni, Philippe P. Piernot
  • Patent number: 11776532
    Abstract: The disclosure relates to an audio processing apparatus (200) configured to classify an audio signal into one or more audio scene classes, the audio signal comprising a component signal. The apparatus (200) comprises processing circuitry configured to: classify the component signal of the audio signal as a foreground layer component signal or a background layer component signal; obtain an audio signal feature on the basis of the audio signal; select, depending on the classification of the component signal, a first set of weights or a second set of weights; and classify the audio signal on the basis of the audio signal feature, the foreground layer component signal or the background layer component signal, and the selected set of weights.
    Type: Grant
    Filed: June 15, 2021
    Date of Patent: October 3, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Yesenia Lacouture Parodi, Florian Eyben, Andrea Crespi, Jun Deng
  • Patent number: 11769484
    Abstract: Computer-implemented methods, computer program products, and computer systems for testing a voice assistant device may include one or more processors configured for receiving test data from a database, wherein the test data may include a first set of coding parameters and a first user utterance having an expected device response. Further, the one or more processors may be configured for generating a first modified user utterance by applying the first set of coding parameters to the first user utterance, wherein the first modified user utterance is acoustically different than the first user utterance. The one or more processors may be configured for audibly presenting the first modified user utterance to a voice assistant device, receiving a first device response from the voice assistant device, and determining whether the first voice assistant response is substantially similar to the expected device response.
    Type: Grant
    Filed: September 11, 2020
    Date of Patent: September 26, 2023
    Assignee: International Business Machines Corporation
    Inventors: Vijay Kumar Ananthapur Bache, Pradeep Raj Jayarathanasamy, Srithar Rajan Thangaraj, Arvind Rangarajan
  • Patent number: 11769482
    Abstract: The present disclosure provides a method and apparatus of synthesizing a speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing a speech includes acquiring a style information of a speech to be synthesized, a tone information of the speech to be synthesized, and a content information of a text to be processed; generating an acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information of the text to be processed; and synthesizing the speech for the text to be processed, based on the acoustic feature information of the text to be processed.
    Type: Grant
    Filed: September 29, 2021
    Date of Patent: September 26, 2023
    Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Wenfu Wang, Tao Sun, Xilei Wang, Junteng Zhang, Zhengkun Gao, Lei Jia
  • Patent number: 11763805
    Abstract: A speaker recognition method and apparatus receives a first voice signal of a speaker, generates a second voice signal by enhancing the first voice signal through speech enhancement, generates a multi-channel voice signal by associating the first voice signal with the second voice signal, and recognizes the speaker based on the multi-channel voice signal.
    Type: Grant
    Filed: May 27, 2022
    Date of Patent: September 19, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sung-Jae Cho, Kyuhong Kim, Jaejoon Han
  • Patent number: 11741956
    Abstract: A system for generating a response to a customer query includes a computing device configured to obtain a first dataset, including a plurality of first phrase-intent pairs associated with a first domain. Each first phrase-intent pair includes a first phrase and a corresponding first intent. The computing device is configured to retrieve a set of configuration rules to configure a plurality of environments. The computing device is also configured to configure a first environment using the first dataset and the set of configuration rules to determine a result user intent based on a requested query associated with the first domain. The first environment embeds the plurality of first phrase-intent pairs in a vector space based on the set of configuration rules. The computing device is configured to perform operations based on the first environment.
    Type: Grant
    Filed: February 26, 2021
    Date of Patent: August 29, 2023
    Assignee: Walmart Apollo, LLC
    Inventors: Simral Chaudhary, Deepa Mohan, Haoxuan Chen, Lakshmi Manasa Velaga, Snehasish Mukherjee, John Brian Moss, Jason Charles Benesch, Don Bambico
  • Patent number: 11735197
    Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: August 22, 2023
    Assignee: GOOGLE LLC
    Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul
  • Patent number: 11735186
    Abstract: A computer system configured to generate captions is provided. The computer system includes a memory and a processor coupled to the memory. The processor is configured to access a first buffer configured to store text generated by an automated speech recognition (ASR) process; access a second buffer configured to store text generated by a captioning client process; identify either the first buffer or the second buffer as a source buffer of caption text; generate caption text from the source buffer; and communicate the caption text to a target process.
    Type: Grant
    Filed: September 7, 2021
    Date of Patent: August 22, 2023
    Assignee: 3Play Media, Inc.
    Inventors: Roger S. Zimmerman, Christopher S. Antunes, Stephanie A. Laing, John W. Slocum, Nicholas R. Moutis, Theresa M. Kettelberger
  • Patent number: 11735159
    Abstract: A voice output device includes a voice output controller configured to determine, when a message reception unit receives a message, whether a start condition to be satisfied when a person intended to receive the message normally listens to voice in the predetermined space is satisfied, and cause a voice output unit to start voice output of the message when the start condition is satisfied and suspend voice output of the message when the start condition is not satisfied. The voice output is not immediately performed in response to a reception of a message but is performed only when the person intended to receive the message normally listens to the message, and the voice output of the message is suspended in other cases.
    Type: Grant
    Filed: May 25, 2021
    Date of Patent: August 22, 2023
    Assignee: ALPS ALPINE CO., LTD.
    Inventors: Hongda Zheng, Xiao Liu