Patents Examined by Shreyans A Patel
-
Patent number: 11790930
Abstract: A system and method for reverberation reduction is disclosed. A first Deep Neural Network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that include the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is estimated. The filter, when applied to the first estimate of the target direct-path signal, generates the result closest, according to a distance function, to the residual between the mixture of acoustic signals and the first estimate. A mixture with reduced reverberation of the target direct-path signal is obtained by subtracting this filtered result from the received mixture. A second DNN then produces a second estimate of the target direct-path signal from the mixture with reduced reverberation.
Type: Grant
Filed: March 10, 2022
Date of Patent: October 17, 2023
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux
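The filter-estimation step above amounts to a least-squares fit: find the FIR filter that, applied to the first estimate, comes closest to the residual between the mixture and that estimate. A minimal numpy sketch, with a toy signal standing in for the first DNN output and a hypothetical 4-tap reverberation filter (none of the names or values come from the patent):

```python
import numpy as np

def estimate_rir_filter(estimate, residual, taps=8):
    """Least-squares fit of a FIR filter g so that (g * estimate) ≈ residual."""
    n = len(residual)
    X = np.zeros((n, taps))
    for k in range(taps):          # column k is the estimate delayed by k samples
        X[k:, k] = estimate[:n - k]
    g, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return g, X @ g                # filter taps and the modeled reverberation

rng = np.random.default_rng(0)
direct = rng.standard_normal(400)           # stand-in for the first DNN estimate
true_rir = np.array([0.0, 0.5, 0.25, 0.1])  # hypothetical late-reverberation taps
mixture = direct + np.convolve(direct, true_rir)[:400]

residual = mixture - direct                 # mixture minus the first estimate
g, modeled_reverb = estimate_rir_filter(direct, residual)
dereverbed = mixture - modeled_reverb       # reduced-reverberation mixture
```

In the patented pipeline, this reduced-reverberation mixture would then be fed to the second DNN for a refined estimate.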
-
Patent number: 11783844
Abstract: Disclosed are methods of encoding and decoding an audio signal using side information, and an encoder and a decoder for performing the methods. The method of encoding an audio signal using side information includes identifying an input signal, the input signal being an original audio signal; extracting side information from the input signal using a learning model trained to extract side information from a feature vector of the input signal; encoding the input signal; and generating a bitstream by combining the encoded input signal and the side information.
Type: Grant
Filed: November 16, 2021
Date of Patent: October 10, 2023
Assignees: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, GWANGJU INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Woo-taek Lim, Seung Kwon Beack, Jongmo Sung, Tae Jin Lee, Inseon Jang, Jong Won Shin, Soojoong Hwang, Youngju Cheon, Sangwook Han
-
Patent number: 11776557
Abstract: Provided is a zero user interface (UI)-based automatic interpretation method including receiving a plurality of speech signals uttered by a plurality of users from a plurality of terminal devices, acquiring a plurality of speech energies from the plurality of received speech signals, determining a main speech signal uttered in a current utterance turn among the plurality of speech signals by comparing the plurality of acquired speech energies, and transmitting an automatic interpretation result, acquired by performing automatic interpretation on the determined main speech signal, to the plurality of terminal devices.
Type: Grant
Filed: April 2, 2021
Date of Patent: October 3, 2023
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Seung Yun, Sang Hun Kim, Min Kyu Lee
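The utterance-turn decision above reduces to comparing per-signal energies and picking the maximum. A minimal sketch, assuming the energy measure is the sum of squared samples (the patent does not specify the measure):

```python
import numpy as np

def pick_main_speaker(signals):
    """Return the index of the highest-energy signal plus all energies.
    Energy here is the sum of squared samples (an assumed measure)."""
    energies = [float(np.sum(np.square(x))) for x in signals]
    return int(np.argmax(energies)), energies

# Two terminals: a quiet bystander and the active speaker in this turn.
quiet = 0.1 * np.ones(100)
loud = 0.9 * np.ones(100)
main_idx, energies = pick_main_speaker([quiet, loud])  # main_idx == 1
```

Only the signal at `main_idx` would then be sent through automatic interpretation and distributed back to the terminals.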
-
Patent number: 11769481
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
Type: Grant
Filed: October 7, 2021
Date of Patent: September 26, 2023
Assignee: Nvidia Corporation
Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
-
Patent number: 11769483
Abstract: A multilingual text-to-speech synthesis method and system are disclosed. The method includes receiving an articulatory feature of a speaker regarding a first language, receiving an input text of a second language, and generating output speech data for the input text of the second language that simulates the speaker's speech by inputting the input text of the second language and the articulatory feature of the speaker regarding the first language to a single artificial neural network multilingual text-to-speech synthesis model. The single artificial neural network multilingual text-to-speech synthesis model is generated by learning similarity information between phonemes of the first language and phonemes of the second language based on first learning data of the first language and second learning data of the second language.
Type: Grant
Filed: November 23, 2021
Date of Patent: September 26, 2023
Assignee: NEOSAPIENCE, INC.
Inventors: Taesu Kim, Younggun Lee
-
Patent number: 11763799
Abstract: An electronic apparatus and a controlling method thereof are provided. The electronic apparatus includes a microphone; a memory configured to store a text-to-speech (TTS) model and a plurality of evaluation texts; and a processor configured to: obtain a first reference vector of a user speech spoken by a user based on the user speech being received through the microphone, generate a plurality of candidate reference vectors based on the first reference vector, obtain a plurality of synthesized sounds by inputting the plurality of candidate reference vectors and the plurality of evaluation texts to the TTS model, identify at least one synthesized sound of the plurality of synthesized sounds based on a similarity between characteristics of the plurality of synthesized sounds and the user speech, and store a second reference vector of the at least one synthesized sound in the memory as a reference vector corresponding to the user for the TTS model.
Type: Grant
Filed: December 17, 2021
Date of Patent: September 19, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Sangjun Park, Kyoungbo Min, Kihyun Choo, Seungdo Choi
-
Patent number: 11763832
Abstract: Systems and methods for generating an enhanced audio signal comprise a trained neural network configured to receive an input audio signal and generate an enhanced target signal. The trained neural network comprises a pre-processing neural network configured to receive a segment of the input audio signal and output an audio classification, the pre-processing neural network including at least one hidden layer comprising an embedding vector, and a noise reduction neural network configured to receive the segment of the input audio signal and the embedding vector and generate the enhanced target signal. The pre-processing neural network may comprise a target signal pre-processing neural network configured to output a target signal classification and comprising at least one hidden layer comprising a target embedding vector.
Type: Grant
Filed: May 1, 2020
Date of Patent: September 19, 2023
Assignees: Synaptics Incorporated, The Trustees of Indiana University
Inventors: Francesco Nesta, Minje Kim, Sanna Wager
-
Patent number: 11749295
Abstract: Provided is pitch enhancement processing that introduces little unnaturalness even in time segments for consonants, and little unnaturalness to listeners from discontinuities even when time segments for consonants and other time segments switch frequently. A pitch emphasis apparatus carries out the following as the pitch enhancement processing: for a time segment in which the spectral envelope of a signal has been determined to be flat, it obtains an output signal for each time in the time segment, the output signal including the sum of (1) the signal from a time T0 samples in the past, where T0 corresponds to the pitch period of the time segment, multiplied by the pitch gain of the time segment, a predetermined constant B0, and a value greater than 0 and less than 1, and (2) the signal of the current time.
Type: Grant
Filed: August 31, 2022
Date of Patent: September 5, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Yutaka Kamamoto, Ryosuke Sugiura, Takehiro Moriya
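The per-time update described above can be sketched as a scaled pitch-period echo added to each sample. The constant B0, the (0, 1)-valued weight (called alpha here), and all numeric values are hypothetical, not taken from the patent:

```python
import numpy as np

def pitch_enhance(x, T0, pitch_gain, B0=0.5, alpha=0.8):
    """For t >= T0: y[t] = x[t] + alpha * B0 * pitch_gain * x[t - T0]."""
    y = x.astype(float).copy()
    y[T0:] += alpha * B0 * pitch_gain * x[:-T0]  # add the scaled pitch-period echo
    return y

x = np.ones(10)                             # toy flat-envelope segment
y = pitch_enhance(x, T0=2, pitch_gain=0.5)  # adds 0.8 * 0.5 * 0.5 = 0.2 for t >= 2
```

Because the added term is a fraction strictly between 0 and 1 of the delayed sample, the enhancement stays bounded and switches between segment types without large discontinuities.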
-
Patent number: 11741942
Abstract: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based at least in part on the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based at least in part on the at least one feature extracted. A speech output may be generated based at least in part on the speech gap filling model.
Type: Grant
Filed: August 3, 2022
Date of Patent: August 29, 2023
Assignee: Telepathy Labs, Inc.
Inventors: Piero Perucci, Martin Reber, Vijeta Avijeet
-
Patent number: 11735158
Abstract: This specification describes systems and methods for aging voice audio, in particular voice audio in computer games. According to one aspect of this specification, there is described a method for aging speech audio data. The method comprises: inputting an initial audio signal and an age embedding into a machine-learned age convertor model, wherein the initial audio signal comprises speech audio and the age embedding is based on an age classification of a plurality of speech audio samples of subjects in a target age category; processing, by the machine-learned age convertor model, the initial audio signal and the age embedding to generate an age-altered audio signal, wherein the age-altered audio signal corresponds to a version of the initial audio signal in the target age category; and outputting, from the machine-learned age convertor model, the age-altered audio signal.
Type: Grant
Filed: August 11, 2021
Date of Patent: August 22, 2023
Assignee: ELECTRONIC ARTS INC.
Inventors: Kilol Gupta, Zahra Shakeri, Ping Zhong, Siddharth Gururani, Mohsen Sardari
-
Patent number: 11735164
Abstract: A system, article, and method of automatic speech recognition with highly efficient decoding is accomplished by frequent beam width adjustment.
Type: Grant
Filed: August 9, 2021
Date of Patent: August 22, 2023
Assignee: Intel Corporation
Inventors: Piotr Rozen, Joachim Hofer
-
Patent number: 11727915
Abstract: Disclosed are a method and a terminal for generating simulated voices of virtual teachers. Real voice samples of teachers are collected and converted into text sequences, and a text emotion polarity training set and a text tone training set are constructed according to the text sequences; a lexical item emotion model is constructed based on lexical items in the text sequences and is trained using the emotion polarity training set, yielding word vectors, an emotion polarity vector, and a weight parameter; the similarity between the word vectors and the emotion polarity vector is calculated, emotion features are extracted according to the similarity calculation result, and a conditional vocoder is constructed according to the voice styles and emotion features to generate new voices with emotion changes. The method and the terminal help satisfy the application requirements of high-quality virtual teachers.
Type: Grant
Filed: January 18, 2023
Date of Patent: August 15, 2023
Assignees: Fujian TQ Digital Inc., Central China Normal University
Inventors: Dejian Liu, Zhenhua Fang, Zheng Zhong, Jian Xu
-
Patent number: 11687724
Abstract: Word sense disambiguation using a glossary layer embedded in a deep neural network includes receiving, by one or more processors, input sentences including a plurality of words. At least two words in the plurality of words are homonyms. The one or more processors convert the plurality of words associated with each input sentence into a first vector including possible senses for the at least two words. The first vector is then combined with a second vector including a domain-specific contextual vector associated with the at least two words. The combination of the first vector with the second vector is fed into a recurrent deep logico-neural network model to generate a third vector that includes word senses for the at least two words. A threshold is set for the third vector to generate a fourth vector including a final word sense vector for the at least two words.
Type: Grant
Filed: September 30, 2020
Date of Patent: June 27, 2023
Assignee: International Business Machines Corporation
Inventors: Ismail Yunus Akhalwaya, Naweed Aghmad Khan, Francois Pierre Luus, Ndivhuwo Makondo, Ryan Nelson Riegel, Alexander Gray
-
Patent number: 11682379
Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
Type: Grant
Filed: February 24, 2022
Date of Patent: June 20, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Dong Yu
-
Patent number: 11682388
Abstract: An AI apparatus includes a microphone to acquire speech data including multiple languages, and a processor to acquire text data corresponding to the speech data, determine a main language from the languages included in the text data, acquire translated text data obtained by translating, into the main language, any portion of the text data that is in a language other than the main language, acquire a morpheme analysis result for the translated text data, extract a keyword for intention analysis from the morpheme analysis result, acquire an intention pattern matched to the keyword, and perform an operation corresponding to the intention pattern.
Type: Grant
Filed: June 2, 2022
Date of Patent: June 20, 2023
Assignee: LG ELECTRONICS INC.
Inventors: Yejin Kim, Hyun Yu, Jonghoon Chae
-
Patent number: 11676571
Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.
Type: Grant
Filed: January 21, 2021
Date of Patent: June 13, 2023
Assignee: QUALCOMM Incorporated
Inventors: Kyungguen Byun, Sunkuk Moon, Shuhua Zhang, Vahid Montazeri, Lae-Hoon Kim, Erik Visser
-
Patent number: 11676577
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for adapting a language model are disclosed. In one aspect, a method includes the actions of receiving transcriptions of utterances that were received by computing devices operating in a domain and that are in a source language. The actions further include generating translated transcriptions of the transcriptions of the utterances in a target language. The actions further include receiving a language model for the target language. The actions further include biasing the language model for the target language by increasing the likelihood of the language model selecting terms included in the translated transcriptions. The actions further include generating a transcription of an utterance in the target language using the biased language model and while operating in the domain.
Type: Grant
Filed: September 9, 2021
Date of Patent: June 13, 2023
Assignee: Google LLC
Inventors: Petar Aleksic, Benjamin Paul Hillson Haynor
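The biasing step can be illustrated with a toy unigram model: raise the probability of terms that appear in the translated transcriptions, then renormalize. The multiplicative boost is an assumption for illustration; the patent does not commit to this particular mechanism:

```python
def bias_language_model(probs, boost_terms, boost=3.0):
    """Raise the probability of terms seen in the translated transcriptions,
    then renormalize so the distribution still sums to 1."""
    biased = {w: p * (boost if w in boost_terms else 1.0) for w, p in probs.items()}
    total = sum(biased.values())
    return {w: p / total for w, p in biased.items()}

# Toy unigram model for the target language; "hola" appears in the translations.
lm = {"hola": 0.2, "adios": 0.2, "gracias": 0.6}
biased = bias_language_model(lm, {"hola"}, boost=3.0)
```

After biasing, a decoder using this model is more likely to select "hola" in a transcription hypothesis, which is the intended effect of transferring domain terms across languages.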
-
Patent number: 11675977
Abstract: Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating, from a pool of documents, a set of statistical models comprising one or more entries, each indicating the likelihood of appearance of a character/letter sequence in the pool of documents; receiving a set of rules that identify character/letter sequences as valid tokens; transforming one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood; receiving a document to be processed; dividing the document to be processed into tokens based on the set of statistical models and the set of rules, wherein the statistical models are applied where the rules fail to unambiguously tokenize the document; and outputting the divided tokens for natural language processing.
Type: Grant
Filed: March 27, 2020
Date of Patent: June 13, 2023
Assignee: Daash Intelligence, Inc.
Inventors: Robert J. Munro, Rob Voigt, Schuyler D. Erle, Brendan D. Callahan, Gary C. King, Jessica D. Long, Jason Brenier, Tripti Saxena, Stefan Krawczyk
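The rule stage of such a tokenizer can be sketched as a few regular-expression rules tried in order at each position; the statistical fallback for sequences the rules cannot tokenize unambiguously is stubbed out here, and the rules themselves are illustrative, not taken from the patent:

```python
import re

# Illustrative rules: numbers (with optional decimal), words, single punctuation.
RULES = [re.compile(p) for p in (r"\d+(?:\.\d+)?", r"[A-Za-z]+", r"[^\sA-Za-z\d]")]

def tokenize(text):
    """Rule-first tokenizer: try each rule at the current position; where no
    rule applies, a statistical model would score candidate splits (stubbed)."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for rule in RULES:
            m = rule.match(text, i)
            if m:
                tokens.append(m.group())
                i = m.end()
                break
        else:
            # Statistical fallback would be consulted here; skip the character.
            i += 1
    return tokens

toks = tokenize("Version 2.5 ships!")
```

The patent's extra twist is that high-likelihood entries from the statistical models are promoted into new rules, so the fast rule path grows over time.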
-
Patent number: 11669688
Abstract: A system and a corresponding computer-implemented method identify and classify community-sourced documents as true documents. The community-sourced documents include one or more data objects such as data items, including text, strings, phrases, and words; image items, including still image items, video image items, and icons; and drawing items. The system and corresponding method then report the analysis results.
Type: Grant
Filed: June 7, 2021
Date of Patent: June 6, 2023
Assignee: Architecture Technology Corporation
Inventors: Eric R. Chartier, Andrew Murphy, William Colligan, Paul C. Davis
-
Patent number: 11670311
Abstract: A wireless audio system for encoding and decoding an audio signal using spectral bandwidth replication is provided. Bandwidth extension is performed in the time domain, enabling low-latency audio coding.
Type: Grant
Filed: April 12, 2021
Date of Patent: June 6, 2023
Assignee: Shure Acquisition Holdings, Inc.
Inventors: Wenshun Tian, Michael Ryan Lester