Patents by Inventor Javier Latorre

Javier Latorre has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8494856
    Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: July 23, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre, Masami Akamine
  • Patent number: 8438014
    Abstract: According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.
    Type: Grant
    Filed: January 26, 2012
    Date of Patent: May 7, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
  • Patent number: 8407053
    Abstract: A speech processing apparatus, including a segmenting unit to divide a fundamental frequency signal of a speech signal corresponding to an input text into pitch segments, based on an alignment between samples of at least one given linguistic level included in the input text and the speech signal. Character strings of the input text are divided into the samples based on each linguistic level. A parameterizing unit generates a parametric representation of the pitch segments using a predetermined invertible operator and generates a group of first parameters in correspondence with each linguistic level. A descriptor generating unit generates, for each linguistic level, a descriptor that includes a set of features describing each sample in the input text and a model learning unit classifies the first parameters of each linguistic level of all speech signals in a memory into clusters based on the descriptor corresponding to the linguistic level.
    Type: Grant
    Filed: March 17, 2009
    Date of Patent: March 26, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20120185244
    Abstract: According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.
    Type: Application
    Filed: January 26, 2012
    Publication date: July 19, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masahiro Morita, Javier Latorre, Takehiko Kagoshima
  • Publication number: 20120089402
    Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
    Type: Application
    Filed: October 12, 2011
    Publication date: April 12, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20120065961
    Abstract: According to one embodiment, a speech model generating apparatus includes a spectrum analyzer, a chunker, a parameterizer, a clustering unit, and a model training unit. The spectrum analyzer acquires a speech signal corresponding to text information and calculates a set of spectral coefficients. The chunker acquires boundary information indicating a beginning and an end of linguistic units and chunks the speech signal into linguistic units. The parameterizer calculates a set of spectral trajectory parameters for a trajectory of the spectral trajectory parameters of the linguistic unit on the basis of the spectral coefficients. The clustering unit clusters the spectral trajectory parameters calculated for each of the linguistic units into clusters on the basis of linguistic information. The model training unit obtains a trained spectral trajectory model indicating a characteristic of a cluster based on the spectral trajectory parameters belonging to the same cluster.
    Type: Application
    Filed: September 21, 2011
    Publication date: March 15, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20090248417
    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech, by means of a decision tree that for each linguistic level clusters the parametric representation of the pitch segments extracted from the spoken speech data with some features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way, the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level.
    Type: Application
    Filed: March 17, 2009
    Publication date: October 1, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine