Patents Examined by Shreyans A Patel
  • Patent number: 11790930
    Abstract: A system and method for reverberation reduction is disclosed. A first Deep Neural Network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that include the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is estimated. The filter when applied to the first estimate of the target direct-path signal generates a result closest to a residual between the mixture of the acoustic signals and the first estimate of the target direct-path signal according to a distance function. A mixture with reduced reverberation of the target direct-path signal is obtained by removing the result of applying the filter to the first estimate of the target direct-path signal from the received mixture. A second DNN produces a second estimate of the target direct-path signal from the mixture with reduced reverberation.
    Type: Grant
    Filed: March 10, 2022
    Date of Patent: October 17, 2023
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux
  • Patent number: 11783844
    Abstract: Disclosed are methods of encoding and decoding an audio signal using side information, and an encoder and a decoder for performing the methods. The method of encoding an audio signal using side information includes identifying an input signal, the input signal being an original audio signal, extracting side information from the input signal using a learning model trained to extract side information from a feature vector of the input signal, encoding the input signal, and generating a bitstream by combining the encoded input signal and the side information.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: October 10, 2023
    Assignees: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Woo-taek Lim, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Inseon Jang, Jong Won Shin, Soojoong Hwang, Youngju Cheon, Sangwook Han
  • Patent number: 11776557
    Abstract: Provided is a zero user interface (UI)-based automatic interpretation method including receiving a plurality of speech signals uttered by a plurality of users from a plurality of terminal devices, acquiring a plurality of speech energies from the plurality of received speech signals, determining a main speech signal uttered in a current utterance turn among the plurality of speech signals by comparing the plurality of acquired speech energies, and transmitting an automatic interpretation result acquired by performing automatic interpretation on the determined main speech signal to the plurality of terminal devices.
    Type: Grant
    Filed: April 2, 2021
    Date of Patent: October 3, 2023
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Seung Yun, Sang Hun Kim, Min Kyu Lee
  • Patent number: 11769481
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: September 26, 2023
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Patent number: 11769483
    Abstract: A multilingual text-to-speech synthesis method and system are disclosed. The method includes receiving an articulatory feature of a speaker regarding a first language, receiving an input text of a second language, and generating output speech data for the input text of the second language that simulates the speaker's speech by inputting the input text of the second language and the articulatory feature of the speaker regarding the first language to a single artificial neural network multilingual text-to-speech synthesis model. The single artificial neural network multilingual text-to-speech synthesis model is generated by learning similarity information between phonemes of the first language and phonemes of the second language based on a first learning data of the first language and a second learning data of the second language.
    Type: Grant
    Filed: November 23, 2021
    Date of Patent: September 26, 2023
    Assignee: NEOSAPIENCE, INC.
    Inventors: Taesu Kim, Younggun Lee
  • Patent number: 11763799
    Abstract: An electronic apparatus and a controlling method thereof are provided. The electronic apparatus includes a microphone; a memory configured to store a text-to-speech (TTS) model and a plurality of evaluation texts; and a processor configured to: obtain a first reference vector of a user speech spoken by a user based on the user speech being received through the microphone, generate a plurality of candidate reference vectors based on the first reference vector, obtain a plurality of synthesized sounds by inputting the plurality of candidate reference vectors and the plurality of evaluation texts to the TTS model, identify at least one synthesized sound of the plurality of synthesized sounds based on a similarity between characteristics of the plurality of synthesized sounds and the user speech, and store a second reference vector of the at least one synthesized sound in the memory as a reference vector corresponding to the user for the TTS model.
    Type: Grant
    Filed: December 17, 2021
    Date of Patent: September 19, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sangjun Park, Kyoungbo Min, Kihyun Choo, Seungdo Choi
  • Patent number: 11763832
    Abstract: Systems and methods for generating an enhanced audio signal comprise a trained neural network configured to receive an input audio signal and generate an enhanced target signal, the trained neural network comprising a pre-processing neural network configured to receive a segment of the input audio signal and output an audio classification, the pre-processing neural network including at least one hidden layer comprising an embedding vector, and a noise reduction neural network configured to receive the segment of the input audio signal and the embedding vector, and generate the enhanced target signal. The pre-processing neural network may comprise a target signal pre-processing neural network configured to output a target signal classification and comprising at least one hidden layer comprising a target embedding vector.
    Type: Grant
    Filed: May 1, 2020
    Date of Patent: September 19, 2023
    Assignees: Synaptics Incorporated, The Trustees of Indiana University
    Inventors: Francesco Nesta, Minje Kim, Sanna Wager
  • Patent number: 11749295
    Abstract: Provided is pitch enhancement processing having little unnaturalness even in time segments for consonants, and having little unnaturalness to listeners caused by discontinuities even when time segments for consonants and other time segments switch frequently. A pitch emphasis apparatus carries out the following as the pitch enhancement processing: for a time segment in which a spectral envelope of a signal has been determined to be flat, obtaining an output signal for each of times in the time segment, the output signal being a signal including a signal obtained by adding (1) a signal obtained by multiplying the signal of a time, further in the past than the time by a number of samples T0 corresponding to a pitch period of the time segment, a pitch gain σ0 of the time segment, a predetermined constant B0, and a value greater than 0 and less than 1, to (2) the signal of the time.
    Type: Grant
    Filed: August 31, 2022
    Date of Patent: September 5, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Yutaka Kamamoto, Ryosuke Sugiura, Takehiro Moriya
  • Patent number: 11741942
    Abstract: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based on, at least in part, the at least one feature extracted. A speech output may be generated based on, at least in part, the speech gap filling model.
    Type: Grant
    Filed: August 3, 2022
    Date of Patent: August 29, 2023
    Assignee: Telepathy Labs, Inc
    Inventors: Piero Perucci, Martin Reber, Vijeta Avijeet
  • Patent number: 11735158
    Abstract: This specification describes systems and methods for aging voice audio, in particular voice audio in computer games. According to one aspect of this specification, there is described a method for aging speech audio data. The method comprises: inputting an initial audio signal and an age embedding into a machine-learned age convertor model, wherein: the initial audio signal comprises speech audio; and the age embedding is based on an age classification of a plurality of speech audio samples of subjects in a target age category; processing, by the machine-learned age convertor model, the initial audio signal and the age embedding to generate an age-altered audio signal, wherein the age-altered audio signal corresponds to a version of the initial audio signal in the target age category; and outputting, from the machine-learned age convertor model, the age-altered audio signal.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: August 22, 2023
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Kilol Gupta, Zahra Shakeri, Ping Zhong, Siddharth Gururani, Mohsen Sardari
  • Patent number: 11735164
    Abstract: A system, article, and method of automatic speech recognition with highly efficient decoding is accomplished by frequent beam width adjustment.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: August 22, 2023
    Assignee: Intel Corporation
    Inventors: Piotr Rozen, Joachim Hofer
  • Patent number: 11727915
    Abstract: Disclosed are a method and a terminal for generating simulated voices of virtual teachers. Real voice samples of teachers are collected and converted into text sequences, and a text emotion polarity training set and a text tone training set are constructed according to the text sequences; a lexical item emotion model is constructed based on lexical items in the text sequences and is trained by using the emotion polarity training set, and word vectors, an emotion polarity vector, and a weight parameter are obtained by training; the similarity between the word vector and the emotion polarity vector is calculated, and emotion features are extracted according to a similarity calculation result; and a conditional vocoder is constructed according to the voice styles and emotion features to generate new voices with emotion changes. The method and the terminal contribute to satisfying the application requirements of high-quality virtual teachers.
    Type: Grant
    Filed: January 18, 2023
    Date of Patent: August 15, 2023
    Assignees: Fujian TQ Digital Inc., Central China Normal University
    Inventors: Dejian Liu, Zhenhua Fang, Zheng Zhong, Jian Xu
  • Patent number: 11687724
    Abstract: Word sense disambiguation using a glossary layer embedded in a deep neural network includes receiving, by one or more processors, input sentences including a plurality of words. At least two words in the plurality of words are homonyms. The one or more processors convert the plurality of words associated with each input sentence into a first vector including possible senses for the at least two words. The first vector is then combined with a second vector including a domain-specific contextual vector associated with the at least two words. The combination of the first vector with the second vector is fed into a recurrent deep logico-neural network model to generate a third vector that includes word senses for the at least two words. A threshold is set for the third vector to generate a fourth vector including a final word sense vector for the at least two words.
    Type: Grant
    Filed: September 30, 2020
    Date of Patent: June 27, 2023
    Assignee: International Business Machines Corporation
    Inventors: Ismail Yunus Akhalwaya, Naweed Aghmad Khan, Francois Pierre Luus, Ndivhuwo Makondo, Ryan Nelson Riegel, Alexander Gray
  • Patent number: 11682379
    Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
    Type: Grant
    Filed: February 24, 2022
    Date of Patent: June 20, 2023
    Assignee: TENCENT AMERICA LLC
    Inventors: Chengzhu Yu, Dong Yu
  • Patent number: 11682388
    Abstract: An AI apparatus includes a microphone to acquire speech data including multiple languages, and a processor to acquire text data corresponding to the speech data, determine a main language from languages included in the text data, acquire translated text data obtained by translating a text data portion, which is in a language other than the main language, into the main language, acquire a morpheme analysis result for the translated text data, extract a keyword for intention analysis from the morpheme analysis result, acquire an intention pattern matched to the keyword, and perform an operation corresponding to the intention pattern.
    Type: Grant
    Filed: June 2, 2022
    Date of Patent: June 20, 2023
    Assignee: LG ELECTRONICS INC
    Inventors: Yejin Kim, Hyun Yu, Jonghoon Chae
  • Patent number: 11676571
    Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.
    Type: Grant
    Filed: January 21, 2021
    Date of Patent: June 13, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Kyungguen Byun, Sunkuk Moon, Shuhua Zhang, Vahid Montazeri, Lae-Hoon Kim, Erik Visser
  • Patent number: 11676577
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for adapting a language model are disclosed. In one aspect, a method includes the actions of receiving transcriptions of utterances that were received by computing devices operating in a domain and that are in a source language. The actions further include generating translated transcriptions of the transcriptions of the utterances in a target language. The actions further include receiving a language model for the target language. The actions further include biasing the language model for the target language by increasing the likelihood of the language model selecting terms included in the translated transcriptions. The actions further include generating a transcription of an utterance in the target language using the biased language model and while operating in the domain.
    Type: Grant
    Filed: September 9, 2021
    Date of Patent: June 13, 2023
    Assignee: Google LLC
    Inventors: Petar Aleksic, Benjamin Paul Hillson Haynor
  • Patent number: 11675977
    Abstract: Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating from a pool of documents, a set of statistical models comprising one or more entries each indicating a likelihood of appearance of a character/letter sequence in the pool of documents; receiving a set of rules comprising rules that identify character/letter sequences as valid tokens; transforming one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood; receiving a document to be processed; dividing the document to be processed into tokens based on the set of statistical models and the set of rules, wherein the statistical models are applied where the rules fail to unambiguously tokenize the document; and outputting the divided tokens for natural language processing.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: June 13, 2023
    Assignee: Daash Intelligence, Inc.
    Inventors: Robert J. Munro, Rob Voigt, Schuyler D. Erle, Brendan D. Callahan, Gary C. King, Jessica D. Long, Jason Brenier, Tripti Saxena, Stefan Krawczyk
  • Patent number: 11669688
    Abstract: A system and a corresponding computer-implemented method identify and classify community-sourced documents as true documents. The community-sourced documents include one or more data objects such as data items, including text, strings, phrases, and words; image items, including still image items, video image items, and icons; and drawing items. The system and corresponding method then report the analysis results.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: June 6, 2023
    Assignee: Architecture Technology Corporation
    Inventors: Eric R. Chartier, Andrew Murphy, William Colligan, Paul C. Davis
  • Patent number: 11670311
    Abstract: A wireless audio system for encoding and decoding an audio signal using spectral bandwidth replication is provided. Bandwidth extension is performed in the time-domain, enabling low-latency audio coding.
    Type: Grant
    Filed: April 12, 2021
    Date of Patent: June 6, 2023
    Assignee: Shure Acquisition Holdings, Inc.
    Inventors: Wenshun Tian, Michael Ryan Lester
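
As a reading aid for the pitch enhancement abstract of patent 11749295 listed above, the addition it describes can be read as y[n] = x[n] + B0 * (pitch gain) * (a factor between 0 and 1) * x[n - T0], applied within a time segment whose spectral envelope was judged flat. The following is a minimal sketch of that reading only, not the patented implementation. The function name `enhance_pitch`, the parameter names `sigma0` and `gamma`, and the choice to treat samples before the start of the segment as zero are all assumptions made for illustration.

```python
def enhance_pitch(x, t0, sigma0, b0, gamma):
    """Sketch of the abstract's addition for one flat-envelope segment.

    x      : list of samples for the segment
    t0     : pitch period of the segment, in samples (T0 in the abstract)
    sigma0 : pitch gain of the segment (hypothetical parameter name)
    b0     : predetermined constant (B0 in the abstract)
    gamma  : value greater than 0 and less than 1 (hypothetical name)
    """
    if t0 <= 0:
        raise ValueError("t0 must be a positive number of samples")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be greater than 0 and less than 1")

    weight = b0 * sigma0 * gamma
    y = []
    for n, sample in enumerate(x):
        # Assumption: history before the segment start is unavailable,
        # so the delayed sample is taken as zero there.
        past = x[n - t0] if n >= t0 else 0.0
        y.append(sample + weight * past)
    return y


if __name__ == "__main__":
    # An impulse at n = 0 reappears, scaled by the weight, T0 samples later.
    print(enhance_pitch([1.0, 0.0, 0.0, 0.0], t0=2, sigma0=0.5, b0=1.0, gamma=0.5))
```

Because the extra factor is below 1, the delayed component is added more weakly than the product of B0 and the pitch gain alone would give, which is consistent with the abstract's stated aim of limiting unnaturalness in consonant segments.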