Patents by Inventor Javier Gonzalvo
Javier Gonzalvo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210019599
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining neural network architectures. One of the methods includes selecting a candidate architecture; selecting a neural network block from the set of neural network blocks; determining whether to (i) add the selected neural network block as a new neural network block in the candidate architecture or (ii) replace one of the neural network blocks in the selected candidate architecture with the selected neural network block; based on the determining, generating a mutated architecture; training a neural network having the mutated architecture on the training data; determining a performance measure for the trained neural network that measures the performance of the trained neural network on the particular machine learning task; and adding, to the maintained data, data specifying the mutated architecture and data associating the mutated architecture with the determined performance measure.
Type: Application
Filed: July 20, 2020
Publication date: January 21, 2021
Inventors: Hanna Mazzawi, Javier Gonzalvo Fructuoso, Eugen Surri Hotaj
-
Patent number: 10249289
Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.
Type: Grant
Filed: July 13, 2017
Date of Patent: April 2, 2019
Assignee: Google LLC
Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
-
Publication number: 20180268806
Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.
Type: Application
Filed: July 13, 2017
Publication date: September 20, 2018
Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
-
Patent number: 9905220
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
Type: Grant
Filed: November 16, 2015
Date of Patent: February 27, 2018
Assignee: Google LLC
Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
-
Publication number: 20160343366
Abstract: In some implementations, a text-to-speech system may perform a mapping of acoustic frames to linguistic model clusters in a pre-selection process for unit selection synthesis. An architecture may leverage data-driven models, such as neural networks that are trained using recorded speech samples, to effectively map acoustic frames to linguistic model clusters during synthesis. This architecture may allow for improved handling and synthesis of combinations of unseen linguistic features.
Type: Application
Filed: May 19, 2015
Publication date: November 24, 2016
Inventors: Javier Gonzalvo Fructuoso, Byungha Chun
-
Patent number: 9460704
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
Type: Grant
Filed: September 6, 2013
Date of Patent: October 4, 2016
Assignee: Google Inc.
Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
-
Patent number: 9424835
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding to the acoustic unit; obtaining stored data associating each acoustic unit with (i) a corresponding acoustic fingerprint and (ii) a probability of the linguistic data corresponding to the acoustic unit occurring in a text corpus; determining that the unit database of acoustic units has been updated to include one or more new acoustic units; for each new acoustic unit in the updated unit database: generating an acoustic fingerprint for the new acoustic unit; identifying an acoustic unit that (i) has an acoustic fingerprint that is indicated as similar to the fingerprint of the new acoustic unit, and (ii) has a stored associated probability.
Type: Grant
Filed: September 10, 2015
Date of Patent: August 23, 2016
Assignee: Google Inc.
Inventors: Alexander Gutkin, Javier Gonzalvo Fructuoso, Cyril Georges Luc Allauzen
-
Patent number: 9318104
Abstract: Methods and systems for sharing of adapted voice profiles are provided. The method may comprise receiving, at a computing system, one or more speech samples, and the one or more speech samples may include a plurality of spoken utterances. The method may further comprise determining, at the computing system, a voice profile associated with a speaker of the plurality of spoken utterances, and including an adapted voice of the speaker. Still further, the method may comprise receiving, at the computing system, an authorization profile associated with the determined voice profile, and the authorization profile may include one or more user identifiers associated with one or more respective users. Yet still further, the method may comprise the computing system providing the voice profile to at least one computing device associated with the one or more respective users, based at least in part on the authorization profile.
Type: Grant
Filed: July 10, 2015
Date of Patent: April 19, 2016
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Johan Schalkwyk
-
Publication number: 20160093295
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding to the acoustic unit; obtaining stored data associating each acoustic unit with (i) a corresponding acoustic fingerprint and (ii) a probability of the linguistic data corresponding to the acoustic unit occurring in a text corpus; determining that the unit database of acoustic units has been updated to include one or more new acoustic units; for each new acoustic unit in the updated unit database: generating an acoustic fingerprint for the new acoustic unit; identifying an acoustic unit that (i) has an acoustic fingerprint that is indicated as similar to the fingerprint of the new acoustic unit, and (ii) has a stored associated probability.
Type: Application
Filed: September 10, 2015
Publication date: March 31, 2016
Inventors: Alexander Gutkin, Javier Gonzalvo Fructuoso, Cyril Georges Luc Allauzen
-
Publication number: 20160071512
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
Type: Application
Filed: November 16, 2015
Publication date: March 10, 2016
Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
-
Patent number: 9263028
Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
Type: Grant
Filed: May 21, 2014
Date of Patent: February 16, 2016
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
-
Patent number: 9195656
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
Type: Grant
Filed: December 30, 2013
Date of Patent: November 24, 2015
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
-
Patent number: 9117451
Abstract: Methods and systems for sharing of adapted voice profiles are provided. The method may comprise receiving, at a computing system, one or more speech samples, and the one or more speech samples may include a plurality of spoken utterances. The method may further comprise determining, at the computing system, a voice profile associated with a speaker of the plurality of spoken utterances, and including an adapted voice of the speaker. Still further, the method may comprise receiving, at the computing system, an authorization profile associated with the determined voice profile, and the authorization profile may include one or more user identifiers associated with one or more respective users. Yet still further, the method may comprise the computing system providing the voice profile to at least one computing device associated with the one or more respective users, based at least in part on the authorization profile.
Type: Grant
Filed: April 29, 2013
Date of Patent: August 25, 2015
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Johan Schalkwyk
-
Patent number: 9111534
Abstract: Implementations related to system and techniques for providing audio news reports are discussed. A computer-implemented method includes identifying, with a computer system, one or more news preferences for a first user, selecting a plurality of news stories, wherein particular ones of the new stories are determined to be responsive to the news preferences for the first user and comprise audio versions of stories converted automatically from textual news stories, assembling, with the computer system and for the first user, an audio news report that includes the audio versions of the selected news stories, and delivering, to a computing device, the assembled audio news report.
Type: Grant
Filed: March 14, 2013
Date of Patent: August 18, 2015
Assignee: Google Inc.
Inventors: Michael A. Sylvester, Javier Gonzalvo Fructuoso
-
Patent number: 9082401
Abstract: The present disclosure describes example systems, methods, and devices for generating a synthetic speech signal. An example method may include determining a phonemic representation of text. The example method may also include identifying one or more finite-state machines (“FSMs”) corresponding to one or more phonemes included in the phonemic representation of the text. A given FSM may be a compressed unit of recorded speech that simulates a Hidden Markov Model. The example method may further include determining a selected sequence of models that minimizes a cost function that represents a likelihood that a possible sequence of models substantially matches a phonemic representation of text. Each possible sequence of models may include at least one FSM. The method may additionally include generating a synthetic speech signal based on the selected sequence that includes one or more spectral features generated from at least one FSM included in the selected sequence.
Type: Grant
Filed: January 9, 2013
Date of Patent: July 14, 2015
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin
-
Publication number: 20150186359
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
Type: Application
Filed: December 30, 2013
Publication date: July 2, 2015
Applicant: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
-
Publication number: 20150095018
Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
Type: Application
Filed: May 21, 2014
Publication date: April 2, 2015
Applicant: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
-
Publication number: 20150073804
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
Type: Application
Filed: September 6, 2013
Publication date: March 12, 2015
Applicant: Google Inc.
Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
-
Patent number: 8768704
Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. Based on an acoustic feature comparison between a plurality of first-language speech sounds and a plurality of second-language speech sounds, the computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
Type: Grant
Filed: October 14, 2013
Date of Patent: July 1, 2014
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
-
Patent number: 8751236
Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the first speech sound and the second speech sound are concatenated. The first speech sound may be included in the plurality of speech sounds and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features. The device may provide a representative speech sound of the given cluster as the first speech sound when the first speech sound and the second speech sound are concatenated.
Type: Grant
Filed: October 23, 2013
Date of Patent: June 10, 2014
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis