Patents by Inventor Javier Gonzalvo

Javier Gonzalvo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210019599
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining neural network architectures. One of the methods includes selecting a candidate architecture; selecting a neural network block from the set of neural network blocks; determining whether to (i) add the selected neural network block as a new neural network block in the candidate architecture or (ii) replace one of the neural network blocks in the selected candidate architecture with the selected neural network block; based on the determining, generating a mutated architecture; training a neural network having the mutated architecture on the training data; determining a performance measure for the trained neural network that measures the performance of the trained neural network on the particular machine learning task; and adding, to the maintained data, data specifying the mutated architecture and data associating the mutated architecture with the determined performance measure.
    Type: Application
    Filed: July 20, 2020
    Publication date: January 21, 2021
    Inventors: Hanna Mazzawi, Javier Gonzalvo Fructuoso, Eugen Surri Hotaj
  • Patent number: 10249289
    Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.
    Type: Grant
    Filed: July 13, 2017
    Date of Patent: April 2, 2019
    Assignee: Google LLC
    Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
  • Publication number: 20180268806
    Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.
    Type: Application
    Filed: July 13, 2017
    Publication date: September 20, 2018
    Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
  • Patent number: 9905220
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Grant
    Filed: November 16, 2015
    Date of Patent: February 27, 2018
    Assignee: Google LLC
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
  • Publication number: 20160343366
    Abstract: In some implementations, a text-to-speech system may perform a mapping of acoustic frames to linguistic model clusters in a pre-selection process for unit selection synthesis. An architecture may leverage data-driven models, such as neural networks that are trained using recorded speech samples, to effectively map acoustic frames to linguistic model clusters during synthesis. This architecture may allow for improved handling and synthesis of combinations of unseen linguistic features.
    Type: Application
    Filed: May 19, 2015
    Publication date: November 24, 2016
    Inventors: Javier Gonzalvo Fructuoso, Byungha Chun
  • Patent number: 9460704
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: October 4, 2016
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
  • Patent number: 9424835
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding to the acoustic unit; obtaining stored data associating each acoustic unit with (i) a corresponding acoustic fingerprint and (ii) a probability of the linguistic data corresponding to the acoustic unit occurring in a text corpus; determining that the unit database of acoustic units has been updated to include one or more new acoustic units; for each new acoustic unit in the updated unit database: generating an acoustic fingerprint for the new acoustic unit; identifying an acoustic unit that (i) has an acoustic fingerprint that is indicated as similar to the fingerprint of the new acoustic unit, and (ii) has a stored associated probability.
    Type: Grant
    Filed: September 10, 2015
    Date of Patent: August 23, 2016
    Assignee: Google Inc.
    Inventors: Alexander Gutkin, Javier Gonzalvo Fructuoso, Cyril Georges Luc Allauzen
  • Patent number: 9318104
    Abstract: Methods and systems for sharing of adapted voice profiles are provided. The method may comprise receiving, at a computing system, one or more speech samples, and the one or more speech samples may include a plurality of spoken utterances. The method may further comprise determining, at the computing system, a voice profile associated with a speaker of the plurality of spoken utterances, and including an adapted voice of the speaker. Still further, the method may comprise receiving, at the computing system, an authorization profile associated with the determined voice profile, and the authorization profile may include one or more user identifiers associated with one or more respective users. Yet still further, the method may comprise the computing system providing the voice profile to at least one computing device associated with the one or more respective users, based at least in part on the authorization profile.
    Type: Grant
    Filed: July 10, 2015
    Date of Patent: April 19, 2016
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Johan Schalkwyk
  • Publication number: 20160093295
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding to the acoustic unit; obtaining stored data associating each acoustic unit with (i) a corresponding acoustic fingerprint and (ii) a probability of the linguistic data corresponding to the acoustic unit occurring in a text corpus; determining that the unit database of acoustic units has been updated to include one or more new acoustic units; for each new acoustic unit in the updated unit database: generating an acoustic fingerprint for the new acoustic unit; identifying an acoustic unit that (i) has an acoustic fingerprint that is indicated as similar to the fingerprint of the new acoustic unit, and (ii) has a stored associated probability.
    Type: Application
    Filed: September 10, 2015
    Publication date: March 31, 2016
    Inventors: Alexander Gutkin, Javier Gonzalvo Fructuoso, Cyril Georges Luc Allauzen
  • Publication number: 20160071512
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Application
    Filed: November 16, 2015
    Publication date: March 10, 2016
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
  • Patent number: 9263028
    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
    Type: Grant
    Filed: May 21, 2014
    Date of Patent: February 16, 2016
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
  • Patent number: 9195656
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: November 24, 2015
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
  • Patent number: 9117451
    Abstract: Methods and systems for sharing of adapted voice profiles are provided. The method may comprise receiving, at a computing system, one or more speech samples, and the one or more speech samples may include a plurality of spoken utterances. The method may further comprise determining, at the computing system, a voice profile associated with a speaker of the plurality of spoken utterances, and including an adapted voice of the speaker. Still further, the method may comprise receiving, at the computing system, an authorization profile associated with the determined voice profile, and the authorization profile may include one or more user identifiers associated with one or more respective users. Yet still further, the method may comprise the computing system providing the voice profile to at least one computing device associated with the one or more respective users, based at least in part on the authorization profile.
    Type: Grant
    Filed: April 29, 2013
    Date of Patent: August 25, 2015
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Johan Schalkwyk
  • Patent number: 9111534
    Abstract: Implementations related to system and techniques for providing audio news reports are discussed. A computer-implemented method includes identifying, with a computer system, one or more news preferences for a first user, selecting a plurality of news stories, wherein particular ones of the new stories are determined to be responsive to the news preferences for the first user and comprise audio versions of stories converted automatically from textual news stories, assembling, with the computer system and for the first user, an audio news report that includes the audio versions of the selected news stories, and delivering, to a computing device, the assembled audio news report.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: August 18, 2015
    Assignee: Google Inc.
    Inventors: Michael A. Sylvester, Javier Gonzalvo Fructuoso
  • Patent number: 9082401
    Abstract: The present disclosure describes example systems, methods, and devices for generating a synthetic speech signal. An example method may include determining a phonemic representation of text. The example method may also include identifying one or more finite-state machines (“FSMs”) corresponding to one or more phonemes included in the phonemic representation of the text. A given FSM may be a compressed unit of recorded speech that simulates a Hidden Markov Model. The example method may further include determining a selected sequence of models that minimizes a cost function that represents a likelihood that a possible sequence of models substantially matches a phonemic representation of text. Each possible sequence of models may include at least one FSM. The method may additionally include generating a synthetic speech signal based on the selected sequence that includes one or more spectral features generated from at least one FSM included in the selected sequence.
    Type: Grant
    Filed: January 9, 2013
    Date of Patent: July 14, 2015
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin
  • Publication number: 20150186359
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Application
    Filed: December 30, 2013
    Publication date: July 2, 2015
    Applicant: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
  • Publication number: 20150095018
    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
    Type: Application
    Filed: May 21, 2014
    Publication date: April 2, 2015
    Applicant: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
  • Publication number: 20150073804
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
    Type: Application
    Filed: September 6, 2013
    Publication date: March 12, 2015
    Applicant: Google Inc.
    Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
  • Patent number: 8768704
    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. Based on an acoustic feature comparison between a plurality of first-language speech sounds and a plurality of second-language speech sounds, the computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
    Type: Grant
    Filed: October 14, 2013
    Date of Patent: July 1, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
  • Patent number: 8751236
    Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the first speech sound and the second speech sound are concatenated. The first speech sound may be included in the plurality of speech sounds and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features. The device may provide a representative speech sound of the given cluster as the first speech sound when the first speech sound and the second speech sound are concatenated.
    Type: Grant
    Filed: October 23, 2013
    Date of Patent: June 10, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis