Patents Examined by Shreyans A Patel
  • Patent number: 12106767
Abstract: Provided is pitch enhancement processing having little unnaturalness even in time segments for consonants, and having little unnaturalness to listeners caused by discontinuities even when time segments for consonants and other time segments switch frequently. A pitch emphasis apparatus carries out the following as the pitch enhancement processing: for a time segment in which a spectral envelope of a signal has been determined to be flat, obtaining an output signal for each of times in the time segment, the output signal being a signal including a signal obtained by adding (1) a signal obtained by multiplying the signal of a time, further in the past than the time by a number of samples T0 corresponding to a pitch period of the time segment, a pitch gain σ0 of the time segment, a predetermined constant B0, and a value greater than 0 and less than 1, to (2) the signal of the time.
    Type: Grant
    Filed: July 7, 2023
    Date of Patent: October 1, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Yutaka Kamamoto, Ryosuke Sugiura, Takehiro Moriya
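The rule in the abstract amounts to adding a scaled copy of the signal, delayed by the pitch period, back onto itself. A minimal sketch, assuming a 1-D sample array and treating the function and parameter names as hypothetical:

```python
import numpy as np

def pitch_enhance(x, T0, sigma0, B0=0.5, alpha=0.8):
    """Sketch of the abstract's rule: y[t] = x[t] + alpha * B0 * sigma0 * x[t - T0].

    x      : input samples for one time segment (1-D array)
    T0     : delay in samples corresponding to the pitch period
    sigma0 : pitch gain of the segment
    B0     : predetermined constant
    alpha  : the "value greater than 0 and less than 1"
    """
    assert 0.0 < alpha < 1.0
    y = x.astype(float).copy()
    # Add the delayed, scaled copy; the first T0 samples have no past sample.
    y[T0:] += alpha * B0 * sigma0 * x[:-T0]
    return y

x = np.ones(8)
y = pitch_enhance(x, T0=2, sigma0=1.0, B0=0.5, alpha=0.5)
# samples from index T0 onward gain 0.5 * 0.5 * 1.0 = 0.25
```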
  • Patent number: 12087270
Abstract: Techniques for generating customized synthetic voices personalized to a user, based on user-provided feedback, are described. A system may determine embedding data representing a user-provided description of a desired synthetic voice and profile data associated with the user, and generate synthetic voice embedding data using synthetic voice embedding data corresponding to a profile associated with a user determined to be similar to the current user. Based on user-provided feedback with respect to a customized synthetic voice, generated using synthetic voice characteristics corresponding to the synthetic voice embedding data and presented to the user, and the synthetic voice embedding data, the system may generate new synthetic voice embedding data, corresponding to a new customized synthetic voice. The system may be configured to assign the customized synthetic voice to the user, such that a subsequent user may not be presented with the same customized synthetic voice.
    Type: Grant
    Filed: September 29, 2022
    Date of Patent: September 10, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Sebastian Dariusz Cygert, Daniel Korzekwa, Kamil Pokora, Piotr Tadeusz Bilinski, Kayoko Yanagisawa, Abdelhamid Ezzerg, Thomas Edward Merritt, Raghu Ram Sreepada Srinivas, Nikhil Sharma
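The abstract does not specify how feedback maps to a new embedding; one plausible sketch is a simple interpolation that moves the candidate embedding toward or away from a reference (e.g. a similar profile's) embedding depending on the feedback sign. The function name, update rule, and step size are all assumptions:

```python
import numpy as np

def update_embedding(candidate, reference, feedback, step=0.5):
    """Hypothetical feedback update: positive feedback (feedback in [-1, 1])
    moves the synthetic-voice embedding toward the reference embedding,
    negative feedback moves it away."""
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return candidate + step * feedback * (reference - candidate)

new = update_embedding([0.0, 0.0], [1.0, 1.0], feedback=1.0)
# halfway toward the reference: [0.5, 0.5]
```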
  • Patent number: 12086715
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing sequence modeling tasks using insertions. One of the methods includes receiving a system input that includes one or more source elements from a source sequence and zero or more target elements from a target sequence, wherein each source element is selected from a vocabulary of source elements and wherein each target element is selected from a vocabulary of target elements; generating a partial concatenated sequence that includes the one or more source elements from the source sequence and the zero or more target elements from the target sequence, wherein the source and target elements are arranged in the partial concatenated sequence according to a combined order; and generating a final concatenated sequence that includes a finalized source sequence and a finalized target sequence, wherein the finalized target sequence includes one or more target elements.
    Type: Grant
    Filed: May 22, 2023
    Date of Patent: September 10, 2024
    Assignee: Google LLC
    Inventors: William Chan, Mitchell Thomas Stern, Nikita Kitaev, Kelvin Gu, Jakob D. Uszkoreit
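The core operation in insertion-based sequence modeling is growing a partial sequence by placing new elements between existing ones, rather than only appending. A minimal sketch of one such step (the function name and list representation are assumptions, not the patent's notation):

```python
def insertion_step(sequence, position, element):
    """One step of insertion-based generation: place `element` into the
    partial (concatenated) sequence at `position`, leaving the rest intact."""
    return sequence[:position] + [element] + sequence[position:]

# A partial target grows by inserting in the middle, not just at the end.
partial = ["the", "cat"]
partial = insertion_step(partial, 1, "black")
# partial is now ["the", "black", "cat"]
```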
  • Patent number: 12087273
    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
    Type: Grant
    Filed: January 30, 2023
    Date of Patent: September 10, 2024
    Assignee: Google LLC
    Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
  • Patent number: 12080269
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
    Type: Grant
    Filed: May 10, 2022
    Date of Patent: September 3, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
  • Patent number: 12067972
    Abstract: An electronic device is provided. The electronic device includes a processor and a memory operatively connected to the processor. The memory may store instructions that, when executed, cause the processor to receive a voice input of a user, to extract a feature from the voice input of the user, to select an acoustic model through comparison with the extracted feature, and to learn the feature of the voice input by performing fine-tuning on the selected acoustic model.
    Type: Grant
    Filed: November 22, 2021
    Date of Patent: August 20, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Junkwang Oh, Hosik Cho
  • Patent number: 12067969
    Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.
    Type: Grant
    Filed: April 18, 2023
    Date of Patent: August 20, 2024
    Assignee: Google LLC
    Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
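The capacity estimate in this abstract comes from comparing the reference embedding's reconstruction loss against the losses of variational embeddings whose capacities are known. A minimal sketch of that comparison, assuming the losses are precomputed and that "closest loss" is the matching criterion (the function and argument names are hypothetical):

```python
def estimate_capacity(ref_loss, var_losses, var_capacities):
    """Estimate the reference embedding's capacity as the specified capacity
    of the variational embedding whose measured reconstruction loss is
    closest to the reference embedding's measured reconstruction loss."""
    _, capacity = min(
        zip(var_losses, var_capacities),
        key=lambda lc: abs(lc[0] - ref_loss),
    )
    return capacity

# Reference loss 0.5 sits closest to the variational embedding with loss 0.52,
# whose specified capacity is 50.
cap = estimate_capacity(0.5, [0.9, 0.52, 0.1], [10, 50, 200])
```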
  • Patent number: 12057134
    Abstract: Methods and systems for providing changes to a live voice stream of a person are disclosed. A change to be made to the live voice stream based on user information can be identified. The live voice stream can be changed based on the user information.
    Type: Grant
    Filed: February 14, 2023
    Date of Patent: August 6, 2024
    Assignee: CAPITAL ONE SERVICES, LLC
    Inventors: Ahn Truong, Mark Watson, Jeremy Goodsitt, Vincent Pham, Fardin Abdi Taghi Abad, Kate Key, Austin Walters, Reza Farivar
  • Patent number: 12046226
    Abstract: A text-to-speech synthesis method comprising: receiving text; inputting the received text in a prediction network; and generating speech data, wherein the prediction network comprises a neural network, and wherein the neural network is trained by: receiving a first training dataset comprising audio data and corresponding text data; acquiring an expressivity score for each audio sample of the audio data, wherein the expressivity score is a quantitative representation of how well an audio sample conveys emotional information and sounds natural, realistic and human-like; training the neural network using a first sub-dataset, and further training the neural network using a second sub-dataset, wherein the first sub-dataset and the second sub-dataset comprise audio samples and corresponding text from the first training dataset and wherein the average expressivity score of the audio data in the second sub-dataset is higher than the average expressivity score of the audio data in the first sub-dataset.
    Type: Grant
    Filed: December 17, 2020
    Date of Patent: July 23, 2024
    Assignee: Spotify AB
    Inventors: John Flynn, Zeenat Qureshi
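The two-stage training described above is a curriculum in which the second sub-dataset has a higher average expressivity score than the first. One simple way to realize that property — a sketch, not the patent's method; the thresholding rule and names are assumptions — is to train first on the full pool and then on only the samples above a score threshold:

```python
def split_by_expressivity(samples, scores, threshold):
    """Build two sub-datasets for the curriculum: the first keeps every
    sample, the second keeps only samples whose expressivity score exceeds
    `threshold`, so its average score is strictly higher."""
    first = list(samples)
    second = [s for s, sc in zip(samples, scores) if sc > threshold]
    return first, second

first, second = split_by_expressivity(
    samples=["clip_a", "clip_b", "clip_c"],
    scores=[0.2, 0.9, 0.7],
    threshold=0.5,
)
# second contains only the more expressive clips: clip_b and clip_c
```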
  • Patent number: 12039272
    Abstract: The invention provides a method and system for training a machine learning-based patent search or novelty evaluation system. The method comprises providing a plurality of patent documents each having a computer-identifiable claim block and specification block, the specification block including at least part of the description of the patent document. The method also comprises providing a machine learning model and training the machine learning model using a training data set comprising data from said patent documents for forming a trained machine learning model. According to the invention, the training comprises using pairs of claim blocks and specification blocks originating from the same patent document as training cases of said training data set.
    Type: Grant
    Filed: October 13, 2019
    Date of Patent: July 16, 2024
    Assignee: IPRally Technologies Oy
    Inventor: Sakari Arvela
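The training-set construction described above pairs the claim block and specification block of the same patent document as one training case. A minimal sketch, assuming each document is a dict with hypothetical `"claims"` and `"specification"` keys:

```python
def make_training_pairs(patent_documents):
    """One training case per document: (claim block, specification block)
    drawn from the same patent document, as the abstract describes."""
    return [(doc["claims"], doc["specification"]) for doc in patent_documents]

pairs = make_training_pairs([
    {"claims": "1. A method comprising...",
     "specification": "The invention relates to..."},
])
```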
  • Patent number: 12020687
    Abstract: Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice in any language in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
    Type: Grant
    Filed: February 6, 2023
    Date of Patent: June 25, 2024
    Assignee: Georgetown University
    Inventors: Joe Garman, Ophir Frieder
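The three-stage pipeline in this abstract — text converter, speaker-conditioned acoustic model, decoder — can be sketched end to end with toy stand-ins for each stage. The phoneme table, the per-character lookup, and all names here are illustrative assumptions, not the patent's components:

```python
def synthesize(text, speaker_id, phoneme_table, acoustic_model, decoder):
    """Sketch of the abstract's pipeline: text -> phonemes -> per-phoneme
    acoustic features conditioned on the speaker -> speech signal."""
    phonemes = [phoneme_table[ch] for ch in text if ch in phoneme_table]
    features = [acoustic_model(p, speaker_id) for p in phonemes]
    return decoder(features)

# Toy stand-ins for the three stages.
table = {"h": "HH", "i": "IY"}
speech = synthesize(
    "hi", "alice",
    phoneme_table=table,
    acoustic_model=lambda phoneme, speaker: (phoneme, speaker),
    decoder=lambda feats: list(feats),
)
```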
  • Patent number: 12020682
Abstract: A method for parametric resynthesis (PR) produces an audible signal. A degraded audio signal is received which includes a distorted target audio signal. A prediction model predicts parameters of the audible signal from the degraded signal. The prediction model was trained to minimize a loss function between the target audio signal and the predicted audible signal. The predicted parameters are provided to a waveform generator which synthesizes the audible signal.
    Type: Grant
    Filed: March 20, 2020
    Date of Patent: June 25, 2024
    Assignee: Research Foundation of the City University of New York
    Inventors: Michael Mandel, Soumi Maiti
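The training objective above is a loss between the clean target and the prediction made from the degraded input. As a sketch, assuming the parameters are numeric vectors and using mean squared error as the loss (the abstract does not name a specific loss function):

```python
import numpy as np

def pr_loss(predicted_params, target_params):
    """Sketch of the parametric-resynthesis training objective: a loss
    between the clean target's parameters and the parameters predicted
    from the degraded signal. MSE is an assumption."""
    p = np.asarray(predicted_params, dtype=float)
    t = np.asarray(target_params, dtype=float)
    return float(np.mean((p - t) ** 2))

loss = pr_loss([1.0, 2.0], [1.0, 4.0])
# mean of [0, 4] = 2.0
```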
  • Patent number: 12008993
    Abstract: A method can include: generating a first awakening request and transmitting the first awakening request to a server responsive to a voice awakening event; receiving a second awakening request transmitted by a second intelligent device, where the second awakening request is generated by the second intelligent device which transmits the second awakening request responsive to the voice awakening event; if a decision-making condition is met, generating a first awakening result according to a preset decision-making rule and transmitting the first awakening result to each intelligent device; if the first awakening result is generated, performing awakening or inhibiting awakening according to the second awakening result after receiving a second awakening result returned by the server according to the first awakening request and the second awakening request; and if the first awakening result is generated, performing awakening or inhibiting awakening according to the first awakening result before receiving the second
    Type: Grant
    Filed: December 30, 2021
    Date of Patent: June 11, 2024
    Assignees: BEIJING XIAOMI MOBILE SOFTWARE CO., LTD., BEIJING XIAOMI PINECONE ELECTRONICS CO., LTD.
    Inventor: Zhuliang Huang
  • Patent number: 12009004
    Abstract: Embodiments of this disclosure provide a speech enhancement method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: obtaining a clean speech sample; decomposing the clean speech sample to obtain a first sparse matrix and m base matrices, values in the first sparse matrix being all positive numbers, and m being a positive integer greater than 1; obtaining, according to the first sparse matrix and a weight matrix of a target neural network, state vectors of neurons in a visible layer of the target neural network; and updating the weight matrix according to the state vectors of the neurons in the visible layer and the clean speech sample, to obtain a deep dictionary used for speech enhancement.
    Type: Grant
    Filed: April 11, 2022
    Date of Patent: June 11, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Xuefei Fang, Xiaochun Cui, Congbing Li, Xiaoyu Liu, Muyong Cao, Tao Yu, Dong Yang, Rongxin Zhou, Wenyan Li
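The decomposition step above factors the clean speech sample into a nonnegative sparse matrix and base matrices. As a stand-in illustration — not the patent's deep, multi-matrix method — a single-level nonnegative factorization with standard multiplicative updates shows the shape of such a decomposition:

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Minimal nonnegative factorization V ~= W @ H via multiplicative
    updates. The patent's deep dictionary (m base matrices plus a neural
    network weight update) is not reproduced; this shows only the basic
    sparse/base decomposition idea."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 0.1
    H = rng.random((k, V.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

V = np.array([[1.0, 2.0], [2.0, 4.0]])  # rank-1 "spectrogram"
W, H = nmf(V, k=1)
```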
  • Patent number: 11996084
    Abstract: The present disclosure discloses a speech synthesis method and apparatus, a device and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence technologies. A specific implementation solution involves: acquiring to-be-synthesized text; acquiring a prosody feature extracted from the text; inputting the text and the prosody feature into a speech synthesis model to obtain a vocoder feature; and inputting the vocoder feature into a vocoder to obtain synthesized speech.
    Type: Grant
    Filed: May 6, 2022
    Date of Patent: May 28, 2024
    Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Liqiang Zhang, Jiankang Hou, Tao Sun, Lei Jia
  • Patent number: 11995411
Abstract: Relevance scores may be determined based on text included in a document. The text may be divided into text portions, with the relevance scores being determined based on a comparison of a text portion of the plurality of text portions with a criterion specified in natural language. A subset of the plurality of text portions may be selected based on the plurality of relevance scores, with each of the subset of the plurality of text portions having a relevance score surpassing a threshold. A criteria evaluation prompt may be sent to a remote text generation modeling system via a communication interface. The criteria evaluation prompt may include an instruction to evaluate one or more of the subset of text portions against the criterion.
    Type: Grant
    Filed: June 5, 2023
    Date of Patent: May 28, 2024
    Assignee: Casetext, Inc.
    Inventors: Javed Qadrud-Din, Brian O'Kelly, Alan deLevie, Ethan Blake, Walter DeFoor, Ryan Walker, Pablo Arredondo
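The selection and prompt-building steps above can be sketched in a few lines. The threshold rule follows the abstract ("relevance score surpassing a threshold"); the prompt template and function names are assumptions:

```python
def select_portions(portions, scores, threshold):
    """Keep only the text portions whose relevance score surpasses the
    threshold, as described in the abstract."""
    return [p for p, s in zip(portions, scores) if s > threshold]

def build_criteria_prompt(portions, criterion):
    """Hypothetical prompt for the remote text-generation system: evaluate
    the selected portions against the natural-language criterion."""
    joined = "\n---\n".join(portions)
    return (f"Evaluate the following passages against this criterion:\n"
            f"{criterion}\n\n{joined}")

kept = select_portions(["portion A", "portion B"], [0.9, 0.1], threshold=0.5)
prompt = build_criteria_prompt(kept, "Does the contract permit assignment?")
```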
  • Patent number: 11996103
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
    Type: Grant
    Filed: July 11, 2022
    Date of Patent: May 28, 2024
    Assignee: Google LLC
    Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
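The weight-adjustment step above — boosting contexts associated with the first candidate transcription before decoding the next segment — can be sketched as a simple reweighting. The dictionary representation and the multiplicative boost are assumptions:

```python
def adjust_context_weights(weights, active_contexts, boost=1.5):
    """Sketch: contexts associated with the first candidate transcription
    get their weights boosted; the adjusted weights then bias the second
    segment's candidate transcription."""
    return {c: w * (boost if c in active_contexts else 1.0)
            for c, w in weights.items()}

new_weights = adjust_context_weights(
    {"navigation": 1.0, "music": 1.0},
    active_contexts={"navigation"},
)
# navigation context is boosted, music is unchanged
```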
  • Patent number: 11996107
Abstract: Provided is a technique according to which it is possible to obtain a decoded sound signal of high sound quality without significantly increasing the delay time compared to a configuration in which only a decoded sound signal of the minimum necessary sound quality is obtained. In a terminal apparatus connected to a first communication line and a second communication line with a lower priority level than the first communication line, sound signals of multiple channels are obtained and output based on a monaural code included in a first code string input from the first communication line and an extended code included in a second code string with the closest frame number to that of the monaural code among extended codes included in the second code string input from the second communication line.
    Type: Grant
    Filed: December 27, 2019
    Date of Patent: May 28, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takehiro Moriya, Yutaka Kamamoto, Ryosuke Sugiura
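The pairing rule above selects, among the extended codes arriving on the lower-priority line, the one whose frame number is closest to that of the monaural code. A minimal sketch, with the dict representation and names as assumptions:

```python
def pick_extended_code(mono_frame_number, extended_codes):
    """Among extended codes from the second (lower-priority) line, pick
    the one whose frame number is closest to the monaural code's frame
    number, as the abstract describes."""
    return min(extended_codes,
               key=lambda code: abs(code["frame"] - mono_frame_number))

chosen = pick_extended_code(6, [{"frame": 3}, {"frame": 7}])
# frame 7 is closer to 6 than frame 3 is
```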
  • Patent number: 11990124
    Abstract: A method includes obtaining an utterance from a user including a user query directed toward a digital assistant. The method includes generating, using a language model, a first prediction string based on the utterance and determining whether the first prediction string includes an application programming interface (API) call to invoke a program via an API. When the first prediction string includes the API call to invoke the program, the method includes calling, using the API call, the program via the API to retrieve a program result; receiving, via the API, the program result; updating a conversational context with the program result that includes the utterance; and generating, using the language model, a second prediction string based on the updated conversational context. When the first prediction string does not include the API call, the method includes providing an utterance response to the utterance based on the first prediction string.
    Type: Grant
    Filed: December 22, 2021
    Date of Patent: May 21, 2024
    Assignee: Google LLC
    Inventors: William J. Byrne, Karthik Krishnamoorthi, Saravanan Ganesh
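The control flow in this abstract — detect an API call in the first prediction string, invoke the program, fold the result back into the conversational context, and predict again — can be sketched as a short loop body. The `API:` marker convention and all names are assumptions:

```python
def handle_utterance(utterance, language_model, call_program, api_marker="API:"):
    """Sketch of the abstract's flow: if the first prediction string
    contains an API call, invoke the program, update the conversational
    context with the program result, and generate a second prediction
    string; otherwise respond directly from the first prediction string."""
    context = [utterance]
    first = language_model(context)
    if first.startswith(api_marker):
        result = call_program(first[len(api_marker):].strip())
        context.append(result)          # update the conversational context
        return language_model(context)  # second prediction string
    return first                        # direct utterance response

# Toy model: asks for the weather program on the first pass,
# then answers using the program result placed in the context.
lm = lambda ctx: "API: get_weather" if len(ctx) == 1 else f"It is {ctx[-1]}."
response = handle_utterance("What's the weather?", lm, lambda query: "sunny")
```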
  • Patent number: 11978478
    Abstract: A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
    Type: Grant
    Filed: March 13, 2023
    Date of Patent: May 7, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Kenneth John Basye, Jeffrey Penrod Adams