Patents Examined by Shreyans A Patel
-
Patent number: 12106767
Abstract: Provided is pitch enhancement processing that introduces little unnaturalness even in time segments for consonants, and little listener-perceptible discontinuity even when time segments for consonants and other time segments switch frequently. A pitch emphasis apparatus carries out the following as the pitch enhancement processing: for a time segment in which a spectral envelope of a signal has been determined to be flat, obtaining an output signal for each of the times in the time segment, the output signal being obtained by adding (1) a signal obtained by multiplying the signal of a time that is further in the past than the current time by T0 samples, where T0 corresponds to a pitch period of the time segment, by a pitch gain σ0 of the time segment, a predetermined constant B0, and a value greater than 0 and less than 1, to (2) the signal of the current time.
Type: Grant
Filed: July 7, 2023
Date of Patent: October 1, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Yutaka Kamamoto, Ryosuke Sugiura, Takehiro Moriya
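The per-sample update described in the abstract can be sketched as follows; the default values for B0 and the (0, 1) factor `c` are hypothetical, chosen only for illustration, since the abstract does not specify them.

```python
import numpy as np

def pitch_enhance(x, T0, pitch_gain, B0=0.5, c=0.8):
    """Add to each sample a scaled copy of the signal from one pitch
    period (T0 samples) in the past. B0 and c are illustrative values."""
    y = x.astype(float).copy()
    # y[t] = x[t] + B0 * pitch_gain * c * x[t - T0], for t >= T0
    y[T0:] += B0 * pitch_gain * c * x[:-T0]
    return y
```

With `pitch_gain=1.0` and the defaults above, a sample one pitch period after an impulse gains a scaled echo of it.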
-
Patent number: 12087270
Abstract: Techniques for generating customized synthetic voices personalized to a user, based on user-provided feedback, are described. A system may determine embedding data representing a user-provided description of a desired synthetic voice and profile data associated with the user, and generate synthetic voice embedding data using synthetic voice embedding data corresponding to a profile associated with a user determined to be similar to the current user. Based on user-provided feedback with respect to a customized synthetic voice, generated using synthetic voice characteristics corresponding to the synthetic voice embedding data and presented to the user, and the synthetic voice embedding data, the system may generate new synthetic voice embedding data, corresponding to a new customized synthetic voice. The system may be configured to assign the customized synthetic voice to the user, such that a subsequent user may not be presented with the same customized synthetic voice.
Type: Grant
Filed: September 29, 2022
Date of Patent: September 10, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Sebastian Dariusz Cygert, Daniel Korzekwa, Kamil Pokora, Piotr Tadeusz Bilinski, Kayoko Yanagisawa, Abdelhamid Ezzerg, Thomas Edward Merritt, Raghu Ram Sreepada Srinivas, Nikhil Sharma
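The feedback-driven embedding update could be sketched as a simple interpolation in embedding space; the linear update rule, the step size, and the array shapes are illustrative assumptions, not the patent's actual method.

```python
import numpy as np

def refine_embedding(current, feedback_target, step=0.3):
    """Move the current synthetic-voice embedding part of the way toward
    an embedding implied by user feedback, yielding new embedding data
    (and hence a new customized synthetic voice). The linear update and
    step size are illustrative assumptions."""
    return current + step * (feedback_target - current)
```

Each round of feedback would produce new embedding data from the previous embedding, so the voice drifts toward what the user asked for.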
-
Patent number: 12086715
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing sequence modeling tasks using insertions. One of the methods includes receiving a system input that includes one or more source elements from a source sequence and zero or more target elements from a target sequence, wherein each source element is selected from a vocabulary of source elements and wherein each target element is selected from a vocabulary of target elements; generating a partial concatenated sequence that includes the one or more source elements from the source sequence and the zero or more target elements from the target sequence, wherein the source and target elements are arranged in the partial concatenated sequence according to a combined order; and generating a final concatenated sequence that includes a finalized source sequence and a finalized target sequence, wherein the finalized target sequence includes one or more target elements.
Type: Grant
Filed: May 22, 2023
Date of Patent: September 10, 2024
Assignee: Google LLC
Inventors: William Chan, Mitchell Thomas Stern, Nikita Kitaev, Kelvin Gu, Jakob D. Uszkoreit
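The partial concatenated sequence, and the insertion steps that grow it into a final sequence, can be illustrated with a small sketch; the `<sep>` separator token and function names are hypothetical choices, not taken from the patent.

```python
def partial_concatenation(source, target_prefix):
    """Combine the source elements with the target elements generated so
    far (possibly none) into one sequence, in a fixed combined order.
    The separator token is a hypothetical convention."""
    SEP = "<sep>"
    return list(source) + [SEP] + list(target_prefix)

def insert_element(sequence, position, element):
    """One decoding step of an insertion-based model: place a new target
    element at a chosen position in the concatenated sequence."""
    return sequence[:position] + [element] + sequence[position:]
```

Repeated calls to `insert_element` would grow the target side of the concatenated sequence until it is finalized.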
-
Patent number: 12087273
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Grant
Filed: January 30, 2023
Date of Patent: September 10, 2024
Assignee: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Patent number: 12080269
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Grant
Filed: May 10, 2022
Date of Patent: September 3, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
-
Patent number: 12067972
Abstract: An electronic device is provided. The electronic device includes a processor and a memory operatively connected to the processor. The memory may store instructions that, when executed, cause the processor to receive a voice input of a user, to extract a feature from the voice input of the user, to select an acoustic model through comparison with the extracted feature, and to learn the feature of the voice input by performing fine-tuning on the selected acoustic model.
Type: Grant
Filed: November 22, 2021
Date of Patent: August 20, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Junkwang Oh, Hosik Cho
-
Patent number: 12067969
Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding with the second measured reconstruction loss for the variational embedding having the specified capacity.
Type: Grant
Filed: April 18, 2023
Date of Patent: August 20, 2024
Assignee: Google LLC
Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
-
Patent number: 12057134
Abstract: Methods and systems for providing changes to a live voice stream of a person are disclosed. A change to be made to the live voice stream based on user information can be identified. The live voice stream can be changed based on the user information.
Type: Grant
Filed: February 14, 2023
Date of Patent: August 6, 2024
Assignee: CAPITAL ONE SERVICES, LLC
Inventors: Ahn Truong, Mark Watson, Jeremy Goodsitt, Vincent Pham, Fardin Abdi Taghi Abad, Kate Key, Austin Walters, Reza Farivar
-
Patent number: 12046226
Abstract: A text-to-speech synthesis method comprising: receiving text; inputting the received text in a prediction network; and generating speech data, wherein the prediction network comprises a neural network, and wherein the neural network is trained by: receiving a first training dataset comprising audio data and corresponding text data; acquiring an expressivity score for each audio sample of the audio data, wherein the expressivity score is a quantitative representation of how well an audio sample conveys emotional information and sounds natural, realistic and human-like; training the neural network using a first sub-dataset, and further training the neural network using a second sub-dataset, wherein the first sub-dataset and the second sub-dataset comprise audio samples and corresponding text from the first training dataset and wherein the average expressivity score of the audio data in the second sub-dataset is higher than the average expressivity score of the audio data in the first sub-dataset.
Type: Grant
Filed: December 17, 2020
Date of Patent: July 23, 2024
Assignee: Spotify AB
Inventors: John Flynn, Zeenat Qureshi
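The two-stage training schedule implied by the abstract, first on lower-expressivity data, then on higher, can be sketched by splitting a scored dataset; the 50/50 split fraction and the tuple layout `(audio, text, score)` are illustrative assumptions.

```python
def curriculum_splits(samples, frac=0.5):
    """Split (audio, text, expressivity_score) samples into two
    sub-datasets so that the second sub-dataset has a higher average
    expressivity score than the first. The split fraction is an
    illustrative assumption."""
    ordered = sorted(samples, key=lambda s: s[2])   # ascending score
    cut = int(len(ordered) * frac)
    return ordered[:cut], ordered[cut:]
```

The network would then be trained on the first sub-dataset and further trained (fine-tuned) on the second, higher-scoring one.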
-
Patent number: 12039272
Abstract: The invention provides a method and system for training a machine learning-based patent search or novelty evaluation system. The method comprises providing a plurality of patent documents each having a computer-identifiable claim block and specification block, the specification block including at least part of the description of the patent document. The method also comprises providing a machine learning model and training the machine learning model using a training data set comprising data from said patent documents for forming a trained machine learning model. According to the invention, the training comprises using pairs of claim blocks and specification blocks originating from the same patent document as training cases of said training data set.
Type: Grant
Filed: October 13, 2019
Date of Patent: July 16, 2024
Assignee: IPRally Technologies Oy
Inventor: Sakari Arvela
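Assembling the training cases the abstract describes is straightforward to sketch; the dictionary field names `"claims"` and `"specification"` are hypothetical, not taken from the patent.

```python
def build_training_pairs(patents):
    """Pair the claim block and specification block originating from the
    same patent document, forming one training case per document.
    Field names are hypothetical."""
    return [(p["claims"], p["specification"]) for p in patents]
```

A search or novelty model could then be trained so that a claim block and the specification block of the same document score as related.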
-
Patent number: 12020687
Abstract: Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice in any language in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
Type: Grant
Filed: February 6, 2023
Date of Patent: June 25, 2024
Assignee: Georgetown University
Inventors: Joe Garman, Ophir Frieder
-
Patent number: 12020682
Abstract: A method for parametric resynthesis (PR) that produces an audible signal. A degraded audio signal is received which includes a distorted target audio signal. A prediction model predicts parameters of the audible signal from the degraded signal. The prediction model was trained to minimize a loss function between the target audio signal and the predicted audible signal. The predicted parameters are provided to a waveform generator which synthesizes the audible signal.
Type: Grant
Filed: March 20, 2020
Date of Patent: June 25, 2024
Assignee: Research Foundation of the City University of New York
Inventors: Michael Mandel, Soumi Maiti
-
Patent number: 12008993
Abstract: A method can include: generating a first awakening request and transmitting the first awakening request to a server responsive to a voice awakening event; receiving a second awakening request transmitted by a second intelligent device, where the second awakening request is generated by the second intelligent device, which transmits the second awakening request responsive to the voice awakening event; if a decision-making condition is met, generating a first awakening result according to a preset decision-making rule and transmitting the first awakening result to each intelligent device; if the first awakening result is generated, performing awakening or inhibiting awakening according to the second awakening result after receiving a second awakening result returned by the server according to the first awakening request and the second awakening request; and if the first awakening result is generated, performing awakening or inhibiting awakening according to the first awakening result before receiving the second awakening result.
Type: Grant
Filed: December 30, 2021
Date of Patent: June 11, 2024
Assignees: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD., BEIJING XIAOMI PINECONE ELECTRONICS CO., LTD.
Inventor: Zhuliang Huang
-
Patent number: 12009004
Abstract: Embodiments of this disclosure provide a speech enhancement method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: obtaining a clean speech sample; decomposing the clean speech sample to obtain a first sparse matrix and m base matrices, values in the first sparse matrix being all positive numbers, and m being a positive integer greater than 1; obtaining, according to the first sparse matrix and a weight matrix of a target neural network, state vectors of neurons in a visible layer of the target neural network; and updating the weight matrix according to the state vectors of the neurons in the visible layer and the clean speech sample, to obtain a deep dictionary used for speech enhancement.
Type: Grant
Filed: April 11, 2022
Date of Patent: June 11, 2024
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Xuefei Fang, Xiaochun Cui, Congbing Li, Xiaoyu Liu, Muyong Cao, Tao Yu, Dong Yang, Rongxin Zhou, Wenyan Li
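Decomposing a speech sample into an all-positive sparse matrix and base matrices resembles non-negative matrix factorization; the multiplicative-update NMF below is a stand-in sketch under that assumption, not the patent's actual decomposition or the neural-network weight update it feeds.

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9):
    """Factor a non-negative matrix V into W @ H with all entries
    non-negative, via standard multiplicative updates. A stand-in for
    the sparse/base decomposition; the patent's method is unspecified
    here."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H
```

The non-negative factors could then serve as the positive-valued input driving the visible-layer state vectors described in the abstract.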
-
Patent number: 11996084
Abstract: The present disclosure discloses a speech synthesis method and apparatus, a device and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence technologies. A specific implementation solution involves: acquiring to-be-synthesized text; acquiring a prosody feature extracted from the text; inputting the text and the prosody feature into a speech synthesis model to obtain a vocoder feature; and inputting the vocoder feature into a vocoder to obtain synthesized speech.
Type: Grant
Filed: May 6, 2022
Date of Patent: May 28, 2024
Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Liqiang Zhang, Jiankang Hou, Tao Sun, Lei Jia
-
Patent number: 11995411
Abstract: Relevance scores may be determined based on text included in a document. The text may be divided into a plurality of text portions, with the relevance scores being determined based on a comparison of a text portion of the plurality of text portions with a criterion specified in natural language. A subset of the plurality of text portions may be selected based on the plurality of relevance scores, with each of the subset of the plurality of text portions having a relevance score surpassing a threshold. A criteria evaluation prompt may be sent to a remote text generation modeling system via a communication interface. The criteria evaluation prompt may include an instruction to evaluate one or more of the subset of text portions against the criterion.
Type: Grant
Filed: June 5, 2023
Date of Patent: May 28, 2024
Assignee: Casetext, Inc.
Inventors: Javed Qadrud-Din, Brian O'Kelly, Alan deLevie, Ethan Blake, Walter DeFoor, Ryan Walker, Pablo Arredondo
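The select-then-prompt flow in the abstract can be sketched as follows; the scoring function, threshold, and prompt format are illustrative assumptions rather than the patent's actual implementation.

```python
def select_portions(portions, score_fn, threshold):
    """Keep only text portions whose relevance score against the
    criterion surpasses the threshold."""
    return [p for p in portions if score_fn(p) > threshold]

def build_prompt(criterion, selected):
    """Assemble a criteria-evaluation prompt for a remote text
    generation system (format is a hypothetical choice)."""
    joined = "\n".join(f"- {p}" for p in selected)
    return (f"Evaluate the following passages against the criterion: "
            f"{criterion}\n{joined}")
```

Only the above-threshold portions reach the remote model, keeping the prompt short.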
-
Patent number: 11996103
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
Type: Grant
Filed: July 11, 2022
Date of Patent: May 28, 2024
Assignee: Google LLC
Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
-
Patent number: 11996107
Abstract: Provided is a technique according to which it is possible to obtain a decoded sound signal of high sound quality without significantly increasing the delay time, compared to a configuration in which only a decoded sound signal of the minimum necessary sound quality is obtained. In a terminal apparatus connected to a first communication line and a second communication line with a lower priority level than the first, sound signals of multiple channels are obtained and output based on a monaural code included in a first code string input from the first communication line and the extended code, among those included in a second code string input from the second communication line, whose frame number is closest to that of the monaural code.
Type: Grant
Filed: December 27, 2019
Date of Patent: May 28, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takehiro Moriya, Yutaka Kamamoto, Ryosuke Sugiura
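Pairing each monaural code with the extended code whose frame number is closest can be sketched directly; the dictionary layout used to represent codes is a hypothetical representation, not the patent's bitstream format.

```python
def pick_extended_code(monaural_frame_no, extended_codes):
    """From the extended codes received on the lower-priority line,
    choose the one whose frame number is closest to the frame number of
    the monaural code. Code layout is hypothetical."""
    return min(extended_codes,
               key=lambda c: abs(c["frame"] - monaural_frame_no))
```

If the lower-priority line drops or delays frames, the decoder still pairs the monaural code with the best-matching extended code available.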
-
Patent number: 11990124
Abstract: A method includes obtaining an utterance from a user including a user query directed toward a digital assistant. The method includes generating, using a language model, a first prediction string based on the utterance and determining whether the first prediction string includes an application programming interface (API) call to invoke a program via an API. When the first prediction string includes the API call to invoke the program, the method includes calling, using the API call, the program via the API to retrieve a program result; receiving, via the API, the program result; updating a conversational context, which includes the utterance, with the program result; and generating, using the language model, a second prediction string based on the updated conversational context. When the first prediction string does not include the API call, the method includes providing an utterance response to the utterance based on the first prediction string.
Type: Grant
Filed: December 22, 2021
Date of Patent: May 21, 2024
Assignee: Google LLC
Inventors: William J. Byrne, Karthik Krishnamoorthi, Saravanan Ganesh
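The method's control flow reads naturally as a two-pass generation loop; the `generate`, `parse_api_call`, and `call_api` callables below are hypothetical stand-ins for the language model and the API, not anything named in the patent.

```python
def respond(utterance, generate, call_api, parse_api_call):
    """Sketch of the described control flow: generate a first prediction
    string; if it contains an API call, invoke the program, fold the
    result back into the conversational context, and generate a second
    prediction string. All callables are hypothetical stand-ins."""
    context = [utterance]
    first_prediction = generate(context)
    call = parse_api_call(first_prediction)
    if call is None:
        return first_prediction       # plain utterance response
    result = call_api(call)           # invoke the program via the API
    context.append(result)            # update conversational context
    return generate(context)          # second prediction string
```

With stub functions, the same `respond` handles both the API-call path and the direct-response path.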
-
Patent number: 11978478
Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
Type: Grant
Filed: March 13, 2023
Date of Patent: May 7, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Kenneth John Basye, Jeffrey Penrod Adams
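A delay-and-sum beamformer is the simplest instance of the beamforming the abstract mentions; this sketch assumes the per-microphone steering delays (in samples) are already known, which in practice they would be estimated from the array geometry and target direction.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel by its steering delay (in samples)
    and average the aligned channels, reinforcing sound arriving from
    the target direction. Delays are assumed known; np.roll wraps at
    the edges, which is acceptable for this illustration."""
    shifted = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(shifted, axis=0)
```

The energy in the beamformed output could then feed an end-pointing detector for the beginning and end of speech.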