Patents by Inventor Javier Gonzalvo

Javier Gonzalvo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SCALABLE FOUNDATION MODELS FOR PROCESSING STRUCTURED DATA

Publication number: 20250110940

Abstract: Methods, systems, and apparatuses, including computer programs encoded on computer storage media, for implementing a neural network that can perform one or more machine learning tasks on an input that includes data that represents a given data structure. In particular, implementing a language model to encode the data and a foundation neural network with an attention-based architecture to generate the task output. Because of how language model generated embeddings are defined and cached, the described techniques demonstrate significant improvements in required computational resources for training and inference while also exceeding prediction performance on a variety of prediction tasks over conventional approaches.

Type: Application

Filed: October 2, 2024

Publication date: April 3, 2025

Inventors: Xin Yang Yak, Sercan Omer Arik, Yihe Dong, Javier Gonzalvo Fructuoso

ADAPTIVE NEURAL ARCHITECTURE SEARCH

Publication number: 20210019599

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining neural network architectures. One of the methods includes selecting a candidate architecture; selecting a neural network block from the set of neural network blocks; determining whether to (i) add the selected neural network block as a new neural network block in the candidate architecture or (ii) replace one of the neural network blocks in the selected candidate architecture with the selected neural network block; based on the determining, generating a mutated architecture; training a neural network having the mutated architecture on the training data; determining a performance measure for the trained neural network that measures the performance of the trained neural network on the particular machine learning task; and adding, to the maintained data, data specifying the mutated architecture and data associating the mutated architecture with the determined performance measure.

Type: Application

Filed: July 20, 2020

Publication date: January 21, 2021

Inventors: Hanna Mazzawi, Javier Gonzalvo Fructuoso, Eugen Surri Hotaj
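
The search loop in this abstract has six steps: select a candidate, pick a block, add or replace, train, score, and record. It can be sketched as follows. The sketch makes several assumptions. An architecture is an ordered tuple of block names. The candidate is the best-scoring architecture recorded so far. The add-versus-replace decision is a fair coin flip. `train_and_evaluate` is a hypothetical callback standing in for actually training the network on the task.

```python
import random
from typing import Callable, List, Sequence, Tuple

Architecture = Tuple[str, ...]  # ordered sequence of block names (assumed encoding)


def mutate(candidate: Architecture, blocks: Sequence[str], rng: random.Random) -> Architecture:
    """Add the selected block as a new block, or replace an existing block with it."""
    block = rng.choice(blocks)
    if not candidate or rng.random() < 0.5:
        # (i) insert the selected block as a new block at a random position
        position = rng.randint(0, len(candidate))
        return candidate[:position] + (block,) + candidate[position:]
    # (ii) replace one existing block with the selected block
    position = rng.randrange(len(candidate))
    return candidate[:position] + (block,) + candidate[position + 1:]


def search(
    blocks: Sequence[str],
    train_and_evaluate: Callable[[Architecture], float],
    steps: int,
    seed: int = 0,
) -> Tuple[Architecture, float]:
    """Return the best (architecture, performance measure) found after `steps` mutations."""
    rng = random.Random(seed)
    # Maintained data: every evaluated architecture paired with its performance measure.
    maintained: List[Tuple[Architecture, float]] = [((), train_and_evaluate(()))]
    for _ in range(steps):
        candidate, _score = max(maintained, key=lambda item: item[1])  # assumed selection rule
        mutated = mutate(candidate, blocks, rng)
        maintained.append((mutated, train_and_evaluate(mutated)))
    return max(maintained, key=lambda item: item[1])
```

In practice, `train_and_evaluate` is the expensive part. Keeping every scored architecture in `maintained` lets later selection rules, such as tournament sampling, be swapped in without retraining anything.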

Text-to-speech synthesis using an autoencoder

Patent number: 10249289

Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.

Type: Grant

Filed: July 13, 2017

Date of Patent: April 2, 2019

Assignee: Google LLC

Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
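
The unit-selection step described here can be pictured as a nearest-neighbour search in the encoder's output space. The minimal sketch below assumes that each speech unit in the collection already carries a representation in that same space, for example one computed offline by the autoencoder from the unit's audio. It uses a hypothetical one-layer `encode_linguistic` in place of the trained encoder, and Euclidean distance is likewise an assumed similarity measure.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SpeechUnit:
    unit_id: str
    representation: Sequence[float]  # assumed precomputed in the encoder's output space


def encode_linguistic(features: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for the trained encoder: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def select_unit(representation: Sequence[float], inventory: Sequence[SpeechUnit]) -> SpeechUnit:
    """Pick the stored unit whose representation is closest to the encoder output."""
    return min(inventory, key=lambda unit: math.dist(representation, unit.representation))
```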

TEXT-TO-SPEECH SYNTHESIS USING AN AUTOENCODER

Publication number: 20180268806

Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.

Type: Application

Filed: July 13, 2017

Publication date: September 20, 2018

Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit

Multilingual prosody generation

Patent number: 9905220

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.

Type: Grant

Filed: November 16, 2015

Date of Patent: February 27, 2018

Assignee: Google LLC

Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
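
The key input detail in this abstract is that the text's language is fed to the network alongside the linguistic features, so one model can serve several languages. The minimal sketch below assumes a one-hot language code, a two-layer network, and prosody expressed as F0 and duration. `LANGUAGES` and `EXAMPLE_PARAMS` are hypothetical, untrained values chosen only so the code runs.

```python
import math
from typing import Dict, List, Sequence

LANGUAGES = ["en", "es", "fr"]  # hypothetical language inventory

# Hypothetical, untrained weights:
# 2 linguistic features + 3 language bits -> 2 hidden units -> (log F0, duration).
EXAMPLE_PARAMS = {
    "w1": [[0.5, -0.2, 0.3, -0.3, 0.1], [0.1, 0.4, -0.2, 0.2, 0.5]],
    "b1": [0.0, 0.0],
    "w2": [[0.2, -0.1], [10.0, 20.0]],
    "b2": [math.log(120.0), 80.0],
}


def build_input(linguistic: Sequence[float], language: str) -> List[float]:
    """Concatenate linguistic features with a one-hot code for the text's language."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return list(linguistic) + [1.0 if lang == language else 0.0 for lang in LANGUAGES]


def dense(x, weights, biases, activation=None):
    """Fully connected layer with an optional element-wise activation."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def predict_prosody(linguistic, language, params=EXAMPLE_PARAMS) -> Dict[str, float]:
    """Predict F0 (Hz) and duration (ms) for one linguistic unit in the given language."""
    hidden = dense(build_input(linguistic, language), params["w1"], params["b1"], math.tanh)
    log_f0, duration_ms = dense(hidden, params["w2"], params["b2"])
    return {"f0_hz": math.exp(log_f0), "duration_ms": duration_ms}
```

Identical linguistic features produce different prosody for "en" and "es" because only the language bits differ. The network itself is shared across languages.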

SPEECH SYNTHESIS MODEL SELECTION

Publication number: 20160343366

Abstract: In some implementations, a text-to-speech system may perform a mapping of acoustic frames to linguistic model clusters in a pre-selection process for unit selection synthesis. An architecture may leverage data-driven models, such as neural networks that are trained using recorded speech samples, to effectively map acoustic frames to linguistic model clusters during synthesis. This architecture may allow for improved handling and synthesis of combinations of unseen linguistic features.

Type: Application

Filed: May 19, 2015

Publication date: November 24, 2016

Inventors: Javier Gonzalvo Fructuoso, Byungha Chun
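
The pre-selection idea can be sketched in two steps. First, index each stored unit by the linguistic model cluster its acoustic frames map to. Then pre-select only the units indexed under the target's cluster. The sketch makes two assumptions. `classify_frame` is a nearest-centroid stand-in for the trained neural network. Each unit is assigned to the cluster that most of its frames vote for. Cluster ids and centroids are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Frame = Tuple[float, ...]


def classify_frame(frame: Frame, centroids: Dict[str, Frame]) -> str:
    """Stand-in for the trained network: id of the nearest cluster centroid."""
    return min(centroids, key=lambda cid: math.dist(frame, centroids[cid]))


def build_preselection_index(
    units: Dict[str, Sequence[Frame]], centroids: Dict[str, Frame]
) -> Dict[str, List[str]]:
    """Index each stored unit (assumed to have at least one frame) under its majority cluster."""
    index: Dict[str, List[str]] = defaultdict(list)
    for unit_id, frames in units.items():
        votes: Dict[str, int] = defaultdict(int)
        for frame in frames:
            votes[classify_frame(frame, centroids)] += 1
        index[max(votes, key=votes.get)].append(unit_id)
    return dict(index)


def preselect(target_cluster: str, index: Dict[str, List[str]]) -> List[str]:
    """Candidate units for a target whose linguistic features fall in target_cluster."""
    return index.get(target_cluster, [])
```

Here, units are indexed by how their audio maps into the clusters rather than by the labels they were recorded under. As a result, a target with a previously unseen combination of linguistic features may still retrieve candidates through its cluster.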

Deep networks for unit selection speech synthesis

Patent number: 9460704

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.

Type: Grant

Filed: September 6, 2013

Date of Patent: October 4, 2016

Assignee: Google Inc.

Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
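
The distance-based selection can be sketched as ranking stored samples by a weighted Euclidean distance to the target acoustic features predicted by the network. The per-dimension weights and feature vectors are hypothetical. The abstract says only that selection is based "at least" on the determined distance, so other costs may also enter the decision.

```python
import math
from typing import Dict, List, Sequence, Tuple


def weighted_distance(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    """Euclidean distance with a per-dimension weight on each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def rank_samples(
    target: Sequence[float],
    samples: Dict[str, Sequence[float]],
    weights: Sequence[float],
) -> List[Tuple[float, str]]:
    """Stored samples sorted by distance to the network's target features."""
    return sorted(
        (weighted_distance(target, feats, weights), sample_id)
        for sample_id, feats in samples.items()
    )


def select_sample(target, samples, weights) -> str:
    """Id of the stored sample closest to the target features."""
    return rank_samples(target, samples, weights)[0][1]
```

Changing the weights changes which sample wins. This is one reason a learned or tuned weighting matters when the feature dimensions have different scales.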

Statistical unit selection language models based on acoustic fingerprinting

Patent number: 9424835

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding to the acoustic unit; obtaining stored data associating each acoustic unit with (i) a corresponding acoustic fingerprint and (ii) a probability of the linguistic data corresponding to the acoustic unit occurring in a text corpus; determining that the unit database of acoustic units has been updated to include one or more new acoustic units; for each new acoustic unit in the updated unit database: generating an acoustic fingerprint for the new acoustic unit; identifying an acoustic unit that (i) has an acoustic fingerprint that is indicated as similar to the fingerprint of the new acoustic unit, and (ii) has a stored associated probability.

Type: Grant

Filed: September 10, 2015

Date of Patent: August 23, 2016

Assignee: Google Inc.

Inventors: Alexander Gutkin, Javier Gonzalvo Fructuoso, Cyril Georges Luc Allauzen
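
The update for new units can be sketched in three steps. Fingerprint each new unit, find the most similar existing unit that has a stored probability, and reuse that probability. The sketch rests on several assumptions. The fingerprint is a hypothetical binary contour, with a 1 wherever a coefficient rises. Similarity is Hamming distance with a `max_distance` threshold. Copying the matched probability is an assumed final step, since the abstract stops at identifying the similar unit.

```python
from typing import Dict, Sequence, Tuple

Fingerprint = Tuple[int, ...]


def fingerprint(features: Sequence[float]) -> Fingerprint:
    """Hypothetical fingerprint: 1 where a coefficient rises relative to the previous one."""
    return tuple(1 if b > a else 0 for a, b in zip(features, features[1:]))


def hamming(a: Fingerprint, b: Fingerprint) -> int:
    """Number of positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def update_probabilities(
    unit_db: Dict[str, Sequence[float]],
    probabilities: Dict[str, float],
    new_units: Dict[str, Sequence[float]],
    max_distance: int = 1,
) -> Dict[str, float]:
    """Give each new unit the probability of its most similar fingerprinted unit."""
    known = {uid: fingerprint(f) for uid, f in unit_db.items() if uid in probabilities}
    updated = dict(probabilities)
    for uid, features in new_units.items():
        fp = fingerprint(features)
        match = min(known, key=lambda k: hamming(fp, known[k]), default=None)
        if match is not None and hamming(fp, known[match]) <= max_distance:
            updated[uid] = probabilities[match]  # assumed step: reuse the match's probability
    return updated
```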

Methods and systems for sharing of adapted voice profiles

Patent number: 9318104

Abstract: Methods and systems for sharing of adapted voice profiles are provided. The method may comprise receiving, at a computing system, one or more speech samples, and the one or more speech samples may include a plurality of spoken utterances. The method may further comprise determining, at the computing system, a voice profile associated with a speaker of the plurality of spoken utterances, and including an adapted voice of the speaker. Still further, the method may comprise receiving, at the computing system, an authorization profile associated with the determined voice profile, and the authorization profile may include one or more user identifiers associated with one or more respective users. Yet still further, the method may comprise the computing system providing the voice profile to at least one computing device associated with the one or more respective users, based at least in part on the authorization profile.

Type: Grant

Filed: July 10, 2015

Date of Patent: April 19, 2016

Assignee: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Johan Schalkwyk
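
The sharing rule reduces to a lookup: a voice profile goes only to devices of users listed in its authorization profile. The minimal sketch below uses hypothetical record types and field names, and the `devices_by_user` registry is an assumption. The abstract does not describe how devices are associated with users.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping, Sequence


@dataclass(frozen=True)
class VoiceProfile:
    profile_id: str
    speaker_id: str
    adapted_voice: bytes  # hypothetical opaque parameters of the speaker's adapted voice


@dataclass(frozen=True)
class AuthorizationProfile:
    profile_id: str
    authorized_user_ids: FrozenSet[str]


def devices_to_receive(
    profile: VoiceProfile,
    authorization: AuthorizationProfile,
    devices_by_user: Mapping[str, Sequence[str]],
) -> List[str]:
    """Return the devices of authorized users that should receive the voice profile."""
    if authorization.profile_id != profile.profile_id:
        raise ValueError("authorization profile does not belong to this voice profile")
    return sorted(
        device
        for user in authorization.authorized_user_ids
        for device in devices_by_user.get(user, ())
    )
```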

STATISTICAL UNIT SELECTION LANGUAGE MODELS BASED ON ACOUSTIC FINGERPRINTING

Publication number: 20160093295

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding to the acoustic unit; obtaining stored data associating each acoustic unit with (i) a corresponding acoustic fingerprint and (ii) a probability of the linguistic data corresponding to the acoustic unit occurring in a text corpus; determining that the unit database of acoustic units has been updated to include one or more new acoustic units; for each new acoustic unit in the updated unit database: generating an acoustic fingerprint for the new acoustic unit; identifying an acoustic unit that (i) has an acoustic fingerprint that is indicated as similar to the fingerprint of the new acoustic unit, and (ii) has a stored associated probability.

Type: Application

Filed: September 10, 2015

Publication date: March 31, 2016

Inventors: Alexander Gutkin, Javier Gonzalvo Fructuoso, Cyril Georges Luc Allauzen

MULTILINGUAL PROSODY GENERATION

Publication number: 20160071512

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.

Type: Application

Filed: November 16, 2015

Publication date: March 10, 2016

Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun

Methods and systems for automated generation of nativized multi-lingual lexicons

Patent number: 9263028

Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.

Type: Grant

Filed: May 21, 2014

Date of Patent: February 16, 2016

Assignee: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis

Multilingual prosody generation

Patent number: 9195656

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.

Type: Grant

Filed: December 30, 2013

Date of Patent: November 24, 2015

Assignee: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun

Methods and systems for sharing of adapted voice profiles

Patent number: 9117451

Abstract: Methods and systems for sharing of adapted voice profiles are provided. The method may comprise receiving, at a computing system, one or more speech samples, and the one or more speech samples may include a plurality of spoken utterances. The method may further comprise determining, at the computing system, a voice profile associated with a speaker of the plurality of spoken utterances, and including an adapted voice of the speaker. Still further, the method may comprise receiving, at the computing system, an authorization profile associated with the determined voice profile, and the authorization profile may include one or more user identifiers associated with one or more respective users. Yet still further, the method may comprise the computing system providing the voice profile to at least one computing device associated with the one or more respective users, based at least in part on the authorization profile.

Type: Grant

Filed: April 29, 2013

Date of Patent: August 25, 2015

Assignee: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Johan Schalkwyk

Creation of spoken news programs

Patent number: 9111534

Abstract: Implementations related to system and techniques for providing audio news reports are discussed. A computer-implemented method includes identifying, with a computer system, one or more news preferences for a first user, selecting a plurality of news stories, wherein particular ones of the news stories are determined to be responsive to the news preferences for the first user and comprise audio versions of stories converted automatically from textual news stories, assembling, with the computer system and for the first user, an audio news report that includes the audio versions of the selected news stories, and delivering, to a computing device, the assembled audio news report.

Type: Grant

Filed: March 14, 2013

Date of Patent: August 18, 2015

Assignee: Google Inc.

Inventors: Michael A. Sylvester, Javier Gonzalvo Fructuoso
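
The report-assembly step can be sketched as filtering stories by the user's preferences and converting each selected story to audio. The sketch makes three assumptions. Preferences are modeled as topic sets, with a match meaning any topic overlap. `synthesize` is a hypothetical text-to-speech callback. Delivery to the user's device is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class NewsStory:
    story_id: str
    topics: FrozenSet[str]
    text: str


def assemble_report(
    stories: Iterable[NewsStory],
    preferences: FrozenSet[str],
    synthesize: Callable[[str], bytes],
    max_stories: int = 5,
) -> List[Tuple[str, bytes]]:
    """Select stories matching the user's topic preferences and convert each to audio."""
    selected = [story for story in stories if story.topics & preferences][:max_stories]
    return [(story.story_id, synthesize(story.text)) for story in selected]
```

Passing `synthesize` in as a parameter keeps the selection logic independent of any particular speech engine. It also makes the function easy to test with a trivial stand-in.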

Text-to-speech synthesis

Patent number: 9082401

Abstract: The present disclosure describes example systems, methods, and devices for generating a synthetic speech signal. An example method may include determining a phonemic representation of text. The example method may also include identifying one or more finite-state machines (“FSMs”) corresponding to one or more phonemes included in the phonemic representation of the text. A given FSM may be a compressed unit of recorded speech that simulates a Hidden Markov Model. The example method may further include determining a selected sequence of models that minimizes a cost function that represents a likelihood that a possible sequence of models substantially matches a phonemic representation of text. Each possible sequence of models may include at least one FSM. The method may additionally include generating a synthetic speech signal based on the selected sequence that includes one or more spectral features generated from at least one FSM included in the selected sequence.

Type: Grant

Filed: January 9, 2013

Date of Patent: July 14, 2015

Assignee: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin
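
Choosing the sequence of models that minimizes a cost function is a shortest-path problem. When the cost decomposes into per-phoneme and pairwise terms, dynamic programming (Viterbi search) solves it exactly. The minimal sketch below assumes such a decomposition into a target cost and a join cost. The model ids and cost values are hypothetical, and the FSM internals are not modeled.

```python
from typing import Callable, List, Sequence, Tuple


def select_sequence(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    join_cost: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Viterbi search over candidate models per phoneme; returns (total cost, model path).

    candidates[i] lists the model ids that could realize phoneme i (assumed non-empty)."""
    # best[m] = cheapest (cost, path) ending in model m at the current phoneme
    best = {m: (target_cost(0, m), [m]) for m in candidates[0]}
    for i in range(1, len(candidates)):
        step = {}
        for m in candidates[i]:
            cost, path = min(
                ((c + join_cost(p[-1], m), p) for c, p in best.values()),
                key=lambda item: item[0],
            )
            step[m] = (cost + target_cost(i, m), path + [m])
        best = step
    return min(best.values(), key=lambda item: item[0])
```

With target costs a1=1.0, a2=1.5, b1=2.0, b2=1.0, a join penalty of 5.0 only between a1 and b2, and zero join cost otherwise, the cheapest path is a2 then b2 (total 2.5). A greedy choice would have picked a1 first.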

MULTILINGUAL PROSODY GENERATION

Publication number: 20150186359

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.

Type: Application

Filed: December 30, 2013

Publication date: July 2, 2015

Applicant: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun

Methods and Systems for Automated Generation of Nativized Multi-Lingual Lexicons

Publication number: 20150095018

Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.

Type: Application

Filed: May 21, 2014

Publication date: April 2, 2015

Applicant: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis

DEEP NETWORKS FOR UNIT SELECTION SPEECH SYNTHESIS

Publication number: 20150073804

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing a representation based on structured data in resources. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.

Type: Application

Filed: September 6, 2013

Publication date: March 12, 2015

Applicant: Google Inc.

Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso

Methods and systems for automated generation of nativized multi-lingual lexicons

Patent number: 8768704

Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. Based on an acoustic feature comparison between a plurality of first-language speech sounds and a plurality of second-language speech sounds, the computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.

Type: Grant

Filed: October 14, 2013

Date of Patent: July 1, 2014

Assignee: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
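
The acoustic feature comparison in this grant can be sketched as mapping each first-language phoneme to the acoustically closest second-language phoneme, then rewriting pronunciations through that map. The feature vectors here are hypothetical (F1, F2) formant values in Hz. The nearest-neighbour rule with Euclidean distance is an assumed form of the comparison, which the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]


def build_phoneme_map(first: Dict[str, Features], second: Dict[str, Features]) -> Dict[str, str]:
    """Map each first-language phoneme to the acoustically closest second-language phoneme."""
    return {
        p: min(second, key=lambda q: math.dist(feats, second[q]))
        for p, feats in first.items()
    }


def nativize(pronunciation: Sequence[str], phoneme_map: Dict[str, str]) -> List[str]:
    """Rewrite a first-language pronunciation using second-language phonemes."""
    return [phoneme_map[p] for p in pronunciation]
```

The map is built once per language pair. Every lexicon entry can then be nativized with a simple lookup, which suits the batch generation of lexicons this filing describes.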
