Patents Assigned to OBEN, INC.

Personalizing a DNN-based text-to-speech system using small target speech corpus

Patent number: 11276389

Abstract: A personalized text-to-speech (TTS) system configured to perform speaker adaptation is disclosed. The TTS system includes an acoustic model comprising a base neural network and a differential neural network. The base neural network is configured to generate acoustic parameters corresponding to a base speaker or voice actor, while the differential neural network is configured to generate acoustic parameters corresponding to differences between acoustic parameters of the base speaker and a particular target speaker. The output of the acoustic model is then a weighted linear combination of the outputs of the base neural network and the differential neural network. The base neural network and differential neural network share a first input layer and a first plurality of hidden layers. Thereafter, the base neural network further comprises a second plurality of hidden layers and an output layer. In parallel, the differential neural network further comprises a third plurality of hidden layers and a separate output layer.

Type: Grant

Filed: December 2, 2019

Date of Patent: March 15, 2022

Assignee: OBEN, INC.

Inventor: Sandesh Aryal
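The shared-trunk, two-branch layout described in the abstract of patent 11276389 can be illustrated with a minimal sketch. The layer sizes, the tanh activation, the random weights, and the names `acoustic_model`, `base_weight`, and `diff_weight` are assumptions made for illustration; none of them comes from the patent.

```python
import math
import random

def layer(inputs, weights, activation=math.tanh):
    """Fully connected layer: one weight row per output unit, no bias for brevity."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def random_weights(n_in, n_out, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def identity(value):
    return value

rng = random.Random(0)  # fixed seed; real weights would come from training
shared_w = random_weights(4, 8, rng)       # shared input-side hidden layer
base_hidden_w = random_weights(8, 8, rng)  # base-speaker branch
base_out_w = random_weights(8, 3, rng)
diff_hidden_w = random_weights(8, 8, rng)  # differential (target minus base) branch
diff_out_w = random_weights(8, 3, rng)

def acoustic_model(features, base_weight=1.0, diff_weight=1.0):
    """Weighted linear combination of the base and differential branch outputs."""
    shared = layer(features, shared_w)
    base = layer(layer(shared, base_hidden_w), base_out_w, identity)
    diff = layer(layer(shared, diff_hidden_w), diff_out_w, identity)
    return [base_weight * b + diff_weight * d for b, d in zip(base, diff)]
```

In this sketch, setting `diff_weight` to 0 reproduces the base speaker's parameters, and raising it moves the output toward the target speaker.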
Enhanced virtual singers generation by incorporating singing dynamics to personalized text-to-speech-to-singing

Patent number: 11183169

Abstract: A technique to enhance the quality of Text-to-Speech (TTS) based Singing Voice generation is disclosed. The present invention efficiently preserves the speaker identity and improves sound quality by incorporating speaker-independent natural singing information into TTS-based Speech-to-Singing (STS). The Template-based Text-to-Singing (TTTS) system merges qualities of a singing voice generated from a TTS system with qualities of a singing voice generated from an actual voice singing the song. The qualities are represented in terms of Mel-generalized cepstrum (MGC) coefficients. In particular, the system combines low-order MGC coefficients from the TTS-based singing voice with high-order MGC coefficients from the voice of an actual singer.

Type: Grant

Filed: November 8, 2019

Date of Patent: November 23, 2021

Assignee: OBEN, INC.

Inventors: Kantapon Kaewtip, Fernando Villavicencio
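The coefficient merge described in the abstract of patent 11183169 amounts to splicing two MGC vectors at a chosen order. The sketch below assumes the two frames are already time-aligned and have the same MGC order; `merge_mgc` and `split_order` are hypothetical names, and the abstract does not say where the split falls.

```python
def merge_mgc(tts_frame, singer_frame, split_order):
    """Keep coefficients below split_order from the TTS-based voice
    and the remaining higher-order coefficients from the real singer."""
    if len(tts_frame) != len(singer_frame):
        raise ValueError("frames must have the same MGC order")
    return tts_frame[:split_order] + singer_frame[split_order:]

# Example with made-up coefficient values.
merged = merge_mgc([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], split_order=2)
```

In practice the same splice would be applied to every aligned frame pair before waveform synthesis.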
Speaker recognition using deep learning neural network

Patent number: 10706856

Abstract: A speaker identification/verification system comprises at least one feature extractor for extracting a plurality of audio features from speaker voice data, a plurality of speaker-specific subsystems, and a decision module. Each of the speaker-specific subsystems comprises: a neural network configured to generate an estimate of the plurality of extracted audio features based on the plurality of extracted audio features, and an error module. Each of the plurality of neural networks is associated with one of a plurality of speakers, and the one speaker associated with each of the plurality of neural networks is different for all neural networks. The error module is configured to estimate an error based on the plurality of extracted audio features and the estimate of the plurality of extracted audio features generated by the associated neural network. The neural networks are speaker-specific auto-encoders trained for one user and therefore calibrated on that particular user's speech.

Type: Grant

Filed: September 12, 2017

Date of Patent: July 7, 2020

Assignee: OBEN, INC.

Inventor: Mohammad Mehdi Korjani
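The idea in the abstract of patent 10706856, one auto-encoder per speaker plus a reconstruction error, can be sketched with a toy stand-in. Here each "auto-encoder" is a rank-1 linear projection rather than a trained neural network, and the decision module is assumed to pick the speaker with the lowest error; the names and feature values are hypothetical.

```python
def make_linear_autoencoder(direction):
    """Toy stand-in for a trained auto-encoder: a 1-D linear bottleneck."""
    norm = sum(d * d for d in direction) ** 0.5
    unit = [d / norm for d in direction]

    def reconstruct(features):
        code = sum(u * f for u, f in zip(unit, features))  # encoder
        return [code * u for u in unit]                    # decoder

    return reconstruct

def reconstruction_error(features, reconstruct):
    """Error module: mean squared error between features and their estimate."""
    estimate = reconstruct(features)
    return sum((f - e) ** 2 for f, e in zip(features, estimate)) / len(features)

def identify(features, models):
    """Assumed decision rule: the speaker whose model reconstructs best."""
    errors = {name: reconstruction_error(features, m) for name, m in models.items()}
    return min(errors, key=errors.get), errors

models = {
    "alice": make_linear_autoencoder([1.0, 0.0, 1.0]),
    "bob": make_linear_autoencoder([0.0, 1.0, -1.0]),
}
speaker, errors = identify([2.0, 0.1, 1.9], models)
```

For verification rather than identification, the claimed speaker's error could instead be compared against a threshold.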
Global frequency-warping transformation estimation for voice timbre approximation

Patent number: 10706867

Abstract: A method and system for converting a source voice to a target voice is disclosed. The method comprises: recording source voice data and target voice data; extracting spectral envelope features from the source voice data and target voice data; time-aligning pairs of frames based on the extracted spectral envelope features; converting each pair of frames into a frequency domain; generating a plurality of frequency-warping factor candidates, wherein each of the plurality of frequency-warping factor candidates is associated with one of the pairs of frames; generating a single global frequency-warping factor based on the candidates; acquiring source speech; converting the source speech to target speech based on the global frequency-warping factor; generating a waveform comprising the target speech; and playing the waveform comprising the target speech to a user.

Type: Grant

Filed: March 5, 2018

Date of Patent: July 7, 2020

Assignee: OBEN, INC.

Inventors: Fernando Villavicencio, Mark Harvilla
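The step in the abstract of patent 10706867 that reduces per-frame candidates to one global frequency-warping factor can be sketched as follows. The grid search over candidate factors, the squared-error distance, and the median used to combine candidates are assumptions for illustration; the abstract does not specify how candidates are generated or combined.

```python
from statistics import median

def warp_spectrum(spectrum, alpha):
    """Linearly resample a magnitude spectrum so bin k reads from bin k / alpha."""
    last = len(spectrum) - 1
    warped = []
    for k in range(len(spectrum)):
        pos = min(max(k / alpha, 0.0), float(last))
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped

def best_factor(source, target, candidates):
    """Per-frame candidate: the factor whose warped source is closest to the target."""
    def distance(alpha):
        return sum((w - t) ** 2 for w, t in zip(warp_spectrum(source, alpha), target))
    return min(candidates, key=distance)

def global_factor(frame_pairs, candidates):
    """Single global factor: here, the median of the per-frame candidates."""
    return median(best_factor(s, t, candidates) for s, t in frame_pairs)
```

Once the global factor is known, `warp_spectrum` would be applied to every frame of newly acquired source speech.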
Modifying syllable durations for personalizing Chinese Mandarin TTS using small corpus

Patent number: 10643600

Abstract: A method and system for personalizing synthetic speech from a text-to-speech (TTS) system is disclosed. The method uses linguistic feature vectors to correct/modify the synthetic speech, particularly Chinese Mandarin speech. The linguistic feature vectors are used to generate or retrieve onset and rime scaling factors encoding differences between the synthetic speech and a user's natural speech. Together, the onset and rime scaling factors are used to modify every word/syllable of the synthetic speech from a TTS system, for example. In particular, segments of synthetic speech are either compressed or stretched in time for each part of each syllable of the synthetic speech. After modification, the synthetic speech more closely resembles the speech patterns of a speaker for which the scaling factors were generated. The modified synthetic speech may then be transmitted to a user and played to the user via a mobile phone, for example.

Type: Grant

Filed: March 9, 2018

Date of Patent: May 5, 2020

Assignee: OBEN, INC.

Inventor: Sandesh Aryal
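The per-syllable stretching and compression described in the abstract of patent 10643600 can be sketched as separate time scaling of the onset and rime. The sketch assumes each segment is a track of frame-level values (not raw audio, where naive resampling would also shift pitch) and uses linear interpolation; `resample` and `scale_syllable` are hypothetical names.

```python
def resample(samples, factor):
    """Stretch (factor > 1) or compress (factor < 1) a segment in time."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * factor))
    if len(samples) == 1 or n_out == 1:
        return [samples[0]] * n_out
    step = (len(samples) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(samples) - 2)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[lo + 1])
    return out

def scale_syllable(onset, rime, onset_scale, rime_scale):
    """Apply independent scaling factors to the two parts of one syllable."""
    return resample(onset, onset_scale) + resample(rime, rime_scale)
```

For example, `scale_syllable(onset, rime, 2.0, 0.5)` doubles the onset duration and halves the rime duration.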
Voice conversion using deep neural network with intermediate voice training

Patent number: 10535336

Abstract: A system and method of converting source speech to target speech using intermediate speech data is disclosed. The method comprises identifying intermediate speech data that match target voice training data based on acoustic features; performing dynamic time warping to match the second set of acoustic features of intermediate speech data and the first set of acoustic features of target voice training data; training a neural network to convert the intermediate speech data to target voice training data; receiving source speech data; converting the source speech data to an intermediate speech; converting the intermediate speech to a target speech sequence using the neural network; and converting the target speech sequence to target speech using the pitch from the target voice training data.

Type: Grant

Filed: January 22, 2019

Date of Patent: January 14, 2020

Assignee: OBEN, INC.

Inventor: Seyed Hamidreza Mohammadi
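The dynamic time warping step named in the abstract of patent 10535336 aligns two feature sequences of different lengths. The sketch below is the textbook algorithm with a Euclidean frame distance, not necessarily the variant used in the patent; `dtw_path` is a hypothetical name.

```python
def dtw_path(seq_a, seq_b):
    """Return the list of (i, j) frame pairs on the lowest-cost alignment path."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = dist(seq_a[i - 1], seq_b[j - 1]) + step

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path
```

The resulting frame pairs would serve as input/output training examples for the conversion network.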
Split-model architecture for DNN-based small corpus voice conversion

Patent number: 10453476

Abstract: A voice conversion system suitable for encoding small and large corpuses is disclosed. The voice conversion system comprises hardware including a neural network for generating estimated target speech data based on source speech data. The neural network includes an input layer, an output layer, and a novel split-model hidden layer. The input layer comprises a first portion and a second portion. The output layer comprises a third portion and a fourth portion. The hidden layer comprises a first subnet and a second subnet, wherein the first subnet is directly connected to the first portion of the input layer and the third portion of the output layer, and wherein the second subnet is directly connected to the second portion of the input layer and the fourth portion of the output layer. The first subnet and second subnet operate in parallel, and link to different but overlapping nodes of the input layer.

Type: Grant

Filed: July 21, 2017

Date of Patent: October 22, 2019

Assignee: OBEN, INC.

Inventor: Sandesh Aryal
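The split-model hidden layer described in the abstract of patent 10453476 can be sketched as two independent subnets that read overlapping slices of the input and write disjoint slices of the output. The layer sizes, slice boundaries, tanh activation, and random weights are assumptions for illustration only.

```python
import math
import random

def subnet(inputs, w_hidden, w_out):
    """One tanh hidden layer followed by a linear output layer."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def random_weights(n_in, n_out, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(0)  # fixed seed; real weights would come from training
FIRST_IN, SECOND_IN = slice(0, 4), slice(2, 6)  # the portions overlap on nodes 2 and 3
first_net = (random_weights(4, 5, rng), random_weights(5, 2, rng))
second_net = (random_weights(4, 5, rng), random_weights(5, 3, rng))

def split_model(features):
    """Run both subnets in parallel and concatenate their output portions."""
    first_out = subnet(features[FIRST_IN], *first_net)     # third portion (output)
    second_out = subnet(features[SECOND_IN], *second_net)  # fourth portion (output)
    return first_out + second_out
```

Because neither subnet sees the whole input, each has fewer weights to fit than a fully connected layer of the same total width, which is the usual motivation for such a split when training data is scarce.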
System and method for the analysis and synthesis of periodic and non-periodic components of speech signals

Patent number: 10354671

Abstract: A voice coder configured to resolve periodic and aperiodic components of spectra is disclosed. The method of voice coding includes parsing the speech signal into a plurality of speech frames; for each of the plurality of speech frames: (a) generating the spectra for the speech frame, (b) parsing the spectra of the speech frame into a plurality of sub-bands, (c) transforming each of the plurality of sub-bands into a time-domain envelope signal, and (d) generating a plurality of sub-band voicing factors, wherein each sub-band voicing factor indicates the harmonicity of one of the plurality of sub-bands, and each sub-band voicing factor is based on the periodicity of one of said time-domain envelope signals associated with one of the plurality of sub-bands.

Type: Grant

Filed: February 21, 2018

Date of Patent: July 16, 2019

Assignee: OBEN, INC.

Inventors: Kantapon Kaewtip, Fernando Villavicencio, Mark Harvilla
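Step (d) in the abstract of patent 10354671 derives a voicing factor from the periodicity of a sub-band's time-domain envelope. The sketch below covers only that step and assumes a normalized autocorrelation peak as the periodicity measure, which the abstract does not specify; `voicing_factor` and the lag bounds are hypothetical.

```python
import math
import random

def voicing_factor(envelope, min_lag, max_lag):
    """Peak normalized autocorrelation over a lag range, floored at 0."""
    mean = sum(envelope) / len(envelope)
    centered = [v - mean for v in envelope]
    energy = sum(v * v for v in centered)
    if energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag))
        best = max(best, corr / energy)
    return best

# Synthetic envelopes: one with a 20-sample period, one noise-like.
periodic = [1.0 + math.sin(2 * math.pi * n / 20) for n in range(200)]
noise_rng = random.Random(1)
noisy = [noise_rng.random() for _ in range(200)]
```

A strongly harmonic sub-band yields a factor near 1, while a noise-like sub-band yields a factor near 0.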
Voice conversion system and method with variance and spectrum compensation

Patent number: 10249314

Abstract: A voice conversion system for generating realistic, natural-sounding target speech is disclosed. The voice conversion system preferably comprises a neural network for converting the source speech data to estimated target speech data; a global variance correction module; a modulation spectrum correction module; and a waveform generator. The global variance correction module is configured to scale and shift (or normalize and de-normalize) the estimated target speech based on (i) a mean and standard deviation of the source speech data, and further based on (ii) a mean and standard deviation of the estimated target speech data. The modulation spectrum correction module is configured to apply a plurality of filters to the estimated target speech data after it has been scaled and shifted by the global variance correction module. Each filter is designed to correct the trajectory representing the curve of one MCEP coefficient over time.

Type: Grant

Filed: July 21, 2017

Date of Patent: April 2, 2019

Assignee: OBEN, INC.

Inventor: Sandesh Aryal
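The scale-and-shift operation of the global variance correction module in the abstract of patent 10249314 can be sketched for a single coefficient trajectory. The sketch normalizes with the trajectory's own statistics and de-normalizes with reference statistics passed in by the caller; which statistics play which role is taken loosely from the abstract, and `match_mean_std` is a hypothetical name.

```python
from statistics import mean, pstdev

def match_mean_std(trajectory, ref_mean, ref_std):
    """Normalize a trajectory, then rescale it to the reference mean and std."""
    own_mean, own_std = mean(trajectory), pstdev(trajectory)
    if own_std == 0:
        return [ref_mean] * len(trajectory)
    return [(v - own_mean) / own_std * ref_std + ref_mean for v in trajectory]

# Example with made-up values: widen an over-smoothed trajectory.
corrected = match_mean_std([1.0, 2.0, 3.0, 4.0], ref_mean=10.0, ref_std=2.0)
```

The modulation spectrum filters mentioned in the abstract would then run on the corrected trajectories, one filter per coefficient.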
Text to speech synthesis using deep neural network with constant unit length spectrogram

Patent number: 10186252

Abstract: A system and method for converting text to speech is disclosed. The text is decomposed into a sequence of phonemes, and a text feature matrix is constructed to define the manner in which the phonemes are pronounced and accented. A spectrum generator then queries a neural network to produce normalized spectrograms based on the input of the sequence of phonemes and features. Normalized spectrograms are fixed-length spectrograms with uniform temporal length (i.e., data size), which enables them to be effectively encoded into a neural network representation. A duration generator outputs a plurality of durations that are associated with phonemes. A speech synthesizer modifies the temporal length (i.e., de-normalizes) of each normalized spectrogram based on the associated duration, and then combines the plurality of modified spectrograms into speech.

Type: Grant

Filed: August 12, 2016

Date of Patent: January 22, 2019

Assignee: OBEN, INC.

Inventor: Seyed Hamidreza Mohammadi
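The de-normalization step in the abstract of patent 10186252 resizes each fixed-length spectrogram to its predicted duration before concatenation. The sketch below assumes durations are given in frames and uses nearest-frame lookup, a deliberately simple resampling choice; `denormalize` and `synthesize_frames` are hypothetical names.

```python
def denormalize(spectrogram, n_frames):
    """Resample a fixed-length spectrogram (a list of frames) to n_frames."""
    fixed = len(spectrogram)
    return [
        spectrogram[min(fixed - 1, int((i + 0.5) * fixed / n_frames))]
        for i in range(n_frames)
    ]

def synthesize_frames(normalized_spectrograms, durations):
    """De-normalize each phoneme's spectrogram and concatenate the results."""
    frames = []
    for spectrogram, n_frames in zip(normalized_spectrograms, durations):
        frames.extend(denormalize(spectrogram, n_frames))
    return frames

# Two phonemes, each normalized to 4 one-bin frames, with durations of 2 and 6 frames.
phonemes = [[[1], [2], [3], [4]], [[5], [6], [7], [8]]]
utterance = synthesize_frames(phonemes, [2, 6])
```

A waveform generator would then turn the concatenated frames into audio.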
Voice conversion using deep neural network with intermediate voice training

Patent number: 10186251

Abstract: A system and method of converting source speech to target speech using intermediate speech data is disclosed. The method comprises identifying intermediate speech data that match target voice training data based on acoustic features; performing dynamic time warping to match the second set of acoustic features of intermediate speech data and the first set of acoustic features of target voice training data; training a neural network to convert the intermediate speech data to target voice training data; receiving source speech data; converting the source speech data to an intermediate speech; converting the intermediate speech to a target speech sequence using the neural network; and converting the target speech sequence to target speech using the pitch from the target voice training data.

Type: Grant

Filed: August 4, 2016

Date of Patent: January 22, 2019

Assignee: OBEN, INC.

Inventor: Seyed Hamidreza Mohammadi
Method and system for speech-to-singing voice conversion

Patent number: 10008193

Abstract: A singing voice conversion system configured to generate a song in the voice of a target singer based on a song in the voice of a source singer is disclosed. The embodiment utilizes two complementary approaches to voice timbre conversion. Both combine the natural prosody of a source singer with the pitch of the target singer (typically the user of the system) to achieve realistic-sounding synthetic singing. The system is able to transpose the key of any song to match the automatically determined or desired pitch range of the target singer, thus allowing the system to generalize to any target singer, irrespective of their gender, natural pitch range, and the original pitch range of the song to be sung.

Type: Grant

Filed: August 18, 2017

Date of Patent: June 26, 2018

Assignee: OBEN, INC.

Inventor: Mark J. Harvilla
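The key transposition described in the abstract of patent 10008193 can be sketched as a whole-semitone shift between two pitch ranges. The sketch assumes both inputs are lists of voiced-frame fundamental frequencies in Hz and matches median pitch, which is an assumption; the abstract does not say how the target range is determined.

```python
import math
from statistics import median

def semitone_shift(song_f0, singer_f0):
    """Whole-semitone shift moving the song's median pitch to the singer's."""
    ratio = median(singer_f0) / median(song_f0)
    return round(12 * math.log2(ratio))

def transpose(f0, semitones):
    """Shift every pitch value by the given number of semitones."""
    return [f * 2 ** (semitones / 12) for f in f0]

# Made-up contours: a song centered on 220 Hz, a singer centered on 110 Hz.
shift = semitone_shift([196.0, 220.0, 247.0], [98.0, 110.0, 123.0])
transposed = transpose([220.0], shift)
```

Rounding to whole semitones keeps the transposed melody in a standard key.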