Patents Assigned to OBEN, INC.
  • Patent number: 11276389
    Abstract: A personalized text-to-speech (TTS) system configured to perform speaker adaptation is disclosed. The TTS system includes an acoustic model comprising a base neural network and a differential neural network. The base neural network is configured to generate acoustic parameters corresponding to a base speaker or voice actor, while the differential neural network is configured to generate acoustic parameters corresponding to differences between acoustic parameters of the base speaker and a particular target speaker. The output of the acoustic model is then a weighted linear combination of the output from the base neural network and differential neural network. The base neural network and differential neural network share a first input layer and first plurality of hidden layers. Thereafter, the base neural network further comprises a second plurality of hidden layers and an output layer. In parallel, the differential neural network further comprises a third plurality of hidden layers and a separate output layer.
    Type: Grant
    Filed: December 2, 2019
    Date of Patent: March 15, 2022
    Assignee: OBEN, INC.
    Inventor: Sandesh Aryal
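The shared-trunk, two-branch layout described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the layer sizes, the tanh activations, the random weights, and the names `make_layer`, `run_stack`, and `acoustic_model` are all assumptions made for the example.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, bias) for one dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def run_stack(h, layers, linear_output=False):
    """Apply dense layers in order; tanh everywhere except an optional linear last layer."""
    for index, (weights, bias) in enumerate(layers):
        pre = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
        is_last = index == len(layers) - 1
        h = pre if (linear_output and is_last) else [math.tanh(p) for p in pre]
    return h

def acoustic_model(x, shared, base_branch, diff_branch, w_base=1.0, w_diff=1.0):
    """Weighted linear combination of the base and differential branch outputs."""
    h = run_stack(x, shared)                                  # shared input + hidden layers
    base_out = run_stack(h, base_branch, linear_output=True)  # base speaker parameters
    diff_out = run_stack(h, diff_branch, linear_output=True)  # offset toward target speaker
    return [w_base * b + w_diff * d for b, d in zip(base_out, diff_out)]

rng = random.Random(0)
shared = [make_layer(4, 8, rng), make_layer(8, 8, rng)]
base_branch = [make_layer(8, 8, rng), make_layer(8, 3, rng)]
diff_branch = [make_layer(8, 8, rng), make_layer(8, 3, rng)]
params = acoustic_model([0.1, -0.2, 0.3, 0.4], shared, base_branch, diff_branch)
```

Setting `w_diff=0.0` in this sketch reproduces the base speaker's parameters, which is one way to read the role of the weights in the abstract.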
  • Patent number: 11183169
    Abstract: A technique to enhance the quality of Text-to-Speech (TTS) based Singing Voice generation is disclosed. The present invention efficiently preserves the speaker identity and improves sound quality by incorporating speaker-independent natural singing information into TTS-based Speech-to-Singing (STS). The Template-based Text-to-Singing (TTTS) system merges qualities of a singing voice generated from a TTS system with qualities of a singing voice generated from an actual voice singing the song. The qualities are represented in terms of Mel-generalized cepstrum (MGC) coefficients. In particular, low-order MGC coefficients from the TTS-based singing voice are combined with high-order MGC coefficients from the voice of an actual singer.
    Type: Grant
    Filed: November 8, 2019
    Date of Patent: November 23, 2021
    Assignee: OBEN, INC.
    Inventors: Kantapon Kaewtip, Fernando Villavicencio
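The coefficient merge in the abstract above amounts to splicing two MGC vectors per frame. The sketch below assumes the two frame sequences are already time-aligned and of equal order; the function names and the `split_order` parameter are hypothetical, and the abstract does not say where the low/high boundary lies.

```python
def merge_mgc(tts_frame, singer_frame, split_order):
    """Keep coefficients below split_order from the TTS frame, the rest from the singer frame."""
    if len(tts_frame) != len(singer_frame):
        raise ValueError("frames must have the same MGC order")
    return list(tts_frame[:split_order]) + list(singer_frame[split_order:])

def merge_mgc_sequence(tts_frames, singer_frames, split_order):
    """Merge two time-aligned sequences of MGC frames, frame by frame."""
    return [merge_mgc(t, s, split_order) for t, s in zip(tts_frames, singer_frames)]

merged = merge_mgc([1.0, 2.0, 3.0, 4.0], [9.0, 8.0, 7.0, 6.0], split_order=2)
# merged == [1.0, 2.0, 7.0, 6.0]
```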
  • Patent number: 10706867
    Abstract: A method and system for converting a source voice to a target voice is disclosed. The method comprises: recording source voice data and target voice data; extracting spectral envelope features from the source voice data and target voice data; time-aligning pairs of frames based on the extracted spectral envelope features; converting each pair of frames into a frequency domain; generating a plurality of frequency-warping factor candidates, wherein each of the plurality of frequency-warping factor candidates is associated with one of the pairs of frames; generating a single global frequency-warping factor based on the candidates; acquiring source speech; converting the source speech to target speech based on the global frequency-warping factor; generating a waveform comprising the target speech; and playing the waveform comprising the target speech to a user.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: July 7, 2020
    Assignee: OBEN, INC.
    Inventors: Fernando Villavicencio, Mark Harvilla
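One way to picture the step from per-frame candidates to a single global factor is a grid search per frame pair followed by a robust average. The abstract does not state how candidates are scored or reduced, so the linear warp, the squared-error score, the candidate grid, and the use of the median below are all assumptions for illustration.

```python
import statistics

def warp_spectrum(spectrum, alpha):
    """Linearly warp the frequency axis: output bin k reads the input at bin k / alpha."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k / alpha, n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped

def best_factor(source, target, candidates):
    """Pick the candidate factor whose warped source frame is closest to the target frame."""
    def error(alpha):
        return sum((w - t) ** 2 for w, t in zip(warp_spectrum(source, alpha), target))
    return min(candidates, key=error)

def global_factor(frame_pairs, candidates):
    """Reduce the per-frame-pair candidates to one global factor (median is an assumption)."""
    return statistics.median(best_factor(s, t, candidates) for s, t in frame_pairs)

# Toy frame: a single triangular peak, and the same frame warped by 1.2.
source = [max(0.0, 5.0 - abs(k - 10)) for k in range(32)]
target = warp_spectrum(source, 1.2)
factor = global_factor([(source, target)] * 3, [0.8, 0.9, 1.0, 1.1, 1.2])
```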
  • Patent number: 10706856
    Abstract: A speaker identification/verification system comprises at least one feature extractor for extracting a plurality of audio features from speaker voice data, a plurality of speaker-specific subsystems, and a decision module. Each of the speaker-specific subsystems comprises: a neural network configured to generate an estimate of the plurality of extracted audio features based on the plurality of extracted audio features, and an error module. Each of the plurality of neural networks is associated with one of a plurality of speakers, and the one speaker associated with each of the plurality of neural networks is different for all neural networks. The error module is configured to estimate an error based on the plurality of extracted audio features and the estimate of the plurality of extracted audio features generated by the associated neural network. The neural networks are speaker-specific auto-encoders trained for one user and therefore calibrated on that particular user's speech.
    Type: Grant
    Filed: September 12, 2017
    Date of Patent: July 7, 2020
    Assignee: OBEN, INC.
    Inventor: Mohammad Mehdi Korjani
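The decision logic in the abstract above can be sketched as "reconstruct with every speaker's auto-encoder, then compare errors". The code below is a rough illustration under stated assumptions: `make_toy_autoencoder` is a hypothetical stand-in for a trained network, mean squared error is an assumed error measure, and the threshold rule for verification is not taken from the patent.

```python
def reconstruction_error(features, estimate):
    """Mean squared error between the extracted features and their reconstruction."""
    return sum((f - e) ** 2 for f, e in zip(features, estimate)) / len(features)

def identify(features, autoencoders):
    """Return the speaker whose auto-encoder reconstructs the features best, plus all errors."""
    errors = {name: reconstruction_error(features, net(features))
              for name, net in autoencoders.items()}
    return min(errors, key=errors.get), errors

def verify(features, autoencoder, threshold):
    """Accept the claimed identity when the reconstruction error is small enough."""
    return reconstruction_error(features, autoencoder(features)) <= threshold

def make_toy_autoencoder(center):
    """Stand-in for a trained network: pulls any input halfway toward one speaker's typical features."""
    return lambda feats: [0.5 * f + 0.5 * c for f, c in zip(feats, center)]

autoencoders = {
    "alice": make_toy_autoencoder([1.0, 1.0, 1.0]),
    "bob": make_toy_autoencoder([-1.0, -1.0, -1.0]),
}
speaker, errors = identify([0.9, 1.1, 1.0], autoencoders)
```

Features close to a speaker's own data reconstruct with low error, which is why the smallest error picks out the speaker in this sketch.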
  • Patent number: 10643600
    Abstract: A method and system for personalizing synthetic speech from a text-to-speech (TTS) system is disclosed. The method uses linguistic feature vectors to correct/modify the synthetic speech, particularly Chinese Mandarin speech. The linguistic feature vectors are used to generate or retrieve onset and rime scaling factors encoding differences between the synthetic speech and a user's natural speech. Together, the onset and rime scaling factors are used to modify every word/syllable of the synthetic speech from a TTS system, for example. In particular, segments of synthetic speech are either compressed or stretched in time for each part of each syllable of the synthetic speech. After modification, the synthetic speech more closely resembles the speech patterns of a speaker for which the scaling factors were generated. The modified synthetic speech may then be transmitted to a user and played to the user via a mobile phone, for example.
    Type: Grant
    Filed: March 9, 2018
    Date of Patent: May 5, 2020
    Assignee: OBEN, INC.
    Inventor: Sandesh Aryal
  • Patent number: 10535336
    Abstract: A system and method of converting source speech to target speech using intermediate speech data is disclosed. The method comprises identifying intermediate speech data that match target voice training data based on acoustic features; performing dynamic time warping to match the second set of acoustic features of intermediate speech data and the first set of acoustic features of target voice training data; training a neural network to convert the intermediate speech data to target voice training data; receiving source speech data; converting the source speech data to an intermediate speech; converting the intermediate speech to a target speech sequence using the neural network; and converting the target speech sequence to target speech using the pitch from the target voice training data.
    Type: Grant
    Filed: January 22, 2019
    Date of Patent: January 14, 2020
    Assignee: OBEN, INC.
    Inventor: Seyed Hamidreza Mohammadi
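The dynamic time warping step named in the abstract above aligns two feature sequences of different lengths. A textbook version is sketched below; the Euclidean frame distance and the three-way step pattern are common defaults chosen for illustration, not details confirmed by the patent.

```python
import math

def dtw(seq_a, seq_b):
    """Return (total cost, alignment path) between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover which frames were paired.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

intermediate = [[0.0], [1.0], [2.0]]
target = [[0.0], [1.0], [1.0], [2.0]]
total, path = dtw(intermediate, target)
# path pairs intermediate frame 1 with both target frames 1 and 2
```

The resulting frame pairs are the kind of aligned training examples a conversion network could be fitted on.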
  • Patent number: 10453476
    Abstract: A voice conversion system suitable for encoding small and large corpuses is disclosed. The voice conversion system comprises hardware including a neural network for generating estimated target speech data based on source speech data. The neural network includes an input layer, an output layer, and a novel split-model hidden layer. The input layer comprises a first portion and a second portion. The output layer comprises a third portion and a fourth portion. The hidden layer comprises a first subnet and a second subnet, wherein the first subnet is directly connected to the first portion of the input layer and the third portion of the output layer, and wherein the second subnet is directly connected to the second portion of the input layer and the fourth portion of the output layer. The first subnet and second subnet operate in parallel, and link to different but overlapping nodes of the input layer.
    Type: Grant
    Filed: July 21, 2017
    Date of Patent: October 22, 2019
    Assignee: OBEN, INC.
    Inventor: Sandesh Aryal
  • Patent number: 10354671
    Abstract: A voice coder configured to resolve periodic and aperiodic components of spectra is disclosed. The method of voice coding includes parsing the speech signal into a plurality of speech frames; for each of the plurality of speech frames: (a) generating the spectra for the speech frame, (b) parsing the spectra of the speech frame into a plurality of sub-bands, (c) transforming each of the plurality of sub-bands into a time-domain envelope signal, and (d) generating a plurality of sub-band voicing factors, wherein each sub-band voicing factor indicates the harmonicity of one of the plurality of sub-bands, and each sub-band voicing factor is based on the periodicity of one of said time-domain envelope signals associated with one of the plurality of sub-bands.
    Type: Grant
    Filed: February 21, 2018
    Date of Patent: July 16, 2019
    Assignee: OBEN, INC.
    Inventors: Kantapon Kaewtip, Fernando Villavicencio, Mark Harvilla
  • Patent number: 10249314
    Abstract: A voice conversion system for generating realistic, natural-sounding target speech is disclosed. The voice conversion system preferably comprises a neural network for converting the source speech data to estimated target speech data; a global variance correction module; a modulation spectrum correction module; and a waveform generator. The global variance correction module is configured to scale and shift (or normalize and de-normalize) the estimated target speech based on (i) a mean and standard deviation of the source speech data, and further based on (ii) a mean and standard deviation of the estimated target speech data. The modulation spectrum correction module is configured to apply a plurality of filters to the estimated target speech data after it has been scaled and shifted by the global variance correction module. Each filter is designed to correct the trajectory representing the curve of one MCEP coefficient over time.
    Type: Grant
    Filed: July 21, 2017
    Date of Patent: April 2, 2019
    Assignee: OBEN, INC.
    Inventor: Sandesh Aryal
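The scale-and-shift operation of the global variance correction module can be illustrated on a single coefficient trajectory. In this sketch the trajectory is normalized with its own mean and standard deviation and then de-normalized with reference statistics; which data the reference statistics come from follows the abstract, and the function name and the zero-variance fallback are assumptions.

```python
import statistics

def global_variance_correction(estimated, ref_mean, ref_std):
    """Normalize a trajectory by its own statistics, then de-normalize with reference statistics."""
    mean = statistics.fmean(estimated)
    std = statistics.pstdev(estimated)
    if std == 0.0:
        return [ref_mean] * len(estimated)  # flat trajectory: only the shift applies
    return [(value - mean) / std * ref_std + ref_mean for value in estimated]

# An over-smoothed trajectory regains the reference spread after correction.
corrected = global_variance_correction([1.0, 1.1, 0.9, 1.0], ref_mean=2.0, ref_std=0.5)
```

Neural network outputs tend to be over-smoothed, so restoring the variance in this way is a common remedy in statistical voice conversion.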
  • Patent number: 10186251
    Abstract: A system and method of converting source speech to target speech using intermediate speech data is disclosed. The method comprises identifying intermediate speech data that match target voice training data based on acoustic features; performing dynamic time warping to match the second set of acoustic features of intermediate speech data and the first set of acoustic features of target voice training data; training a neural network to convert the intermediate speech data to target voice training data; receiving source speech data; converting the source speech data to an intermediate speech; converting the intermediate speech to a target speech sequence using the neural network; and converting the target speech sequence to target speech using the pitch from the target voice training data.
    Type: Grant
    Filed: August 4, 2016
    Date of Patent: January 22, 2019
    Assignee: OBEN, INC.
    Inventor: Seyed Hamidreza Mohammadi
  • Patent number: 10186252
    Abstract: A system and method for converting text to speech is disclosed. The text is decomposed into a sequence of phonemes, and a text feature matrix is constructed to define the manner in which the phonemes are pronounced and accented. A spectrum generator then queries a neural network to produce normalized spectrograms based on the input of the sequence of phonemes and features. Normalized spectrograms are fixed-length spectrograms with uniform temporal length (i.e., data size), which enables them to be effectively encoded into a neural network representation. A duration generator outputs a plurality of durations that are associated with phonemes. A speech synthesizer modifies the temporal length (i.e., de-normalizes) of each normalized spectrogram based on the associated duration, and then combines the plurality of modified spectrograms into speech.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: January 22, 2019
    Assignee: OBEN, INC.
    Inventor: Seyed Hamidreza Mohammadi
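The de-normalization step in the abstract above can be pictured as resampling each fixed-length spectrogram along time to its predicted duration and concatenating the results. The sketch below uses linear interpolation between frames and measures durations in frames; both choices, and the function names, are assumptions rather than details from the patent.

```python
def stretch_frames(frames, n_out):
    """Resample a list of spectral frames to n_out frames by linear interpolation in time."""
    if n_out <= 0:
        return []
    if len(frames) == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    step = (len(frames) - 1) / (n_out - 1)
    stretched = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(frames) - 1)
        hi = min(lo + 1, len(frames) - 1)
        frac = pos - lo
        stretched.append([(1 - frac) * a + frac * b for a, b in zip(frames[lo], frames[hi])])
    return stretched

def denormalize_and_join(normalized_spectrograms, durations):
    """Stretch each per-phoneme spectrogram to its duration (in frames) and concatenate."""
    joined = []
    for spectrogram, n_frames in zip(normalized_spectrograms, durations):
        joined.extend(stretch_frames(spectrogram, n_frames))
    return joined

# Two phonemes, each stored as 4 frames x 2 bins, rendered at 2 and 6 frames.
phoneme_a = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
phoneme_b = [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0], [8.0, 8.0]]
utterance = denormalize_and_join([phoneme_a, phoneme_b], [2, 6])
```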
  • Patent number: 10008193
    Abstract: A singing voice conversion system configured to generate a song in the voice of a target singer based on a song in the voice of a source singer is disclosed. The embodiment utilizes two complementary approaches to voice timbre conversion. Both combine the natural prosody of a source singer with the pitch of the target singer—typically the user of the system—to achieve realistic sounding synthetic singing. The system is able to transpose the key of any song to match the automatically determined or desired pitch range of the target singer, thus allowing the system to generalize to any target singer, irrespective of their gender, natural pitch range, and the original pitch range of the song to be sung.
    Type: Grant
    Filed: August 18, 2017
    Date of Patent: June 26, 2018
    Assignee: OBEN, INC.
    Inventor: Mark J. Harvilla
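The key transposition mentioned in the abstract above can be illustrated with a small calculation. The abstract does not say how the shift is chosen, so this sketch assumes the song is moved by a whole number of semitones so that its median pitch lands near the singer's median pitch; the function names are hypothetical.

```python
import math
import statistics

def transposition_semitones(song_pitches_hz, singer_pitches_hz):
    """Whole-semitone shift that moves the song's median pitch toward the singer's."""
    song_center = statistics.median(song_pitches_hz)
    singer_center = statistics.median(singer_pitches_hz)
    return round(12 * math.log2(singer_center / song_center))

def transpose(pitches_hz, semitones):
    """Shift every pitch by the given number of equal-tempered semitones."""
    ratio = 2 ** (semitones / 12)
    return [f * ratio for f in pitches_hz]

# A song centred on A4 (440 Hz) sung by someone centred on A3 (220 Hz).
shift = transposition_semitones([392.0, 440.0, 494.0], [196.0, 220.0, 247.0])
new_melody = transpose([392.0, 440.0, 494.0], shift)
```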