Patents by Inventor Ioannis Agiomyrgiannakis

Ioannis Agiomyrgiannakis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SPEECH SYNTHESIS UNIT SELECTION

Publication number: 20180268807

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting units for speech synthesis. One of the methods includes determining a sequence of text units that each represent a respective portion of text for speech synthesis; and determining multiple paths of speech units that each represent the sequence of text units by selecting a first speech unit that includes speech synthesis data representing a first text unit; selecting multiple second speech units including speech synthesis data representing a second text unit based on (i) a join cost to concatenate the second speech unit with a first speech unit and (ii) a target cost indicating a degree that the second speech unit corresponds to the second text unit; and defining paths from the selected first speech unit to each of the multiple second speech units to include in the multiple paths of speech units.

Type: Application

Filed: November 28, 2017

Publication date: September 20, 2018

Inventor: Ioannis Agiomyrgiannakis
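The path-building step in the abstract above (publication 20180268807) can be illustrated with a minimal sketch. It is not the filed method: the `SpeechUnit` fields, the pitch-difference join cost, the quality-based target cost, and the parameter `k` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechUnit:
    text_unit: str   # text unit this recorded unit represents
    pitch: float     # toy acoustic feature (assumption) used by the join cost
    quality: float   # toy score in [0, 1] (assumption); higher is a better match


def join_cost(prev: SpeechUnit, nxt: SpeechUnit) -> float:
    # Assumed join cost: pitch mismatch at the concatenation point.
    return abs(prev.pitch - nxt.pitch)


def target_cost(unit: SpeechUnit, text_unit: str) -> float:
    # Assumed target cost: infinite if the unit does not represent the text unit.
    if unit.text_unit != text_unit:
        return float("inf")
    return 1.0 - unit.quality


def extend_paths(first, candidates, second_text_unit, k=2):
    """Return up to k paths (first, second), cheapest combined cost first."""
    scored = [
        (join_cost(first, u) + target_cost(u, second_text_unit), u)
        for u in candidates
    ]
    usable = [(cost, u) for cost, u in scored if cost != float("inf")]
    usable.sort(key=lambda pair: pair[0])
    return [(first, u) for _, u in usable[:k]]


first = SpeechUnit("h", 100.0, 1.0)
candidates = [
    SpeechUnit("e", 130.0, 0.9),
    SpeechUnit("e", 102.0, 0.8),
    SpeechUnit("e", 101.0, 0.5),
    SpeechUnit("x", 100.0, 1.0),
]
paths = extend_paths(first, candidates, "e", k=2)
```

Keeping several second units per first unit, rather than only the cheapest one, is what yields multiple paths; a later stage would then have to choose among them.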
Devices and Methods for a Speech-Based User Interface

Publication number: 20180144737

Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.

Type: Application

Filed: January 18, 2018

Publication date: May 24, 2018

Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
Devices and methods for use of phase information in speech synthesis systems

Patent number: 9865247

Abstract: A device may receive a speech signal. The device may determine acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The device may determine circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The device may map the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content. The device may provide a synthetic audio pronunciation of the linguistic content based on the mapping.

Type: Grant

Filed: February 25, 2015

Date of Patent: January 9, 2018

Assignee: Google Inc.

Inventors: Ioannis Agiomyrgiannakis, Byung Ha Chun
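One common way to place phase data in a circular space is to project each angle onto two axes with cosine and sine. The sketch below assumes that representation; the abstract above (patent 9865247) does not state that it is the one used, and the function names are hypothetical.

```python
import math


def to_circular(phase: float) -> tuple[float, float]:
    # Project a phase angle (radians) onto the two axes of the unit circle.
    return (math.cos(phase), math.sin(phase))


def circular_distance(a: float, b: float) -> float:
    # Chord length between two phases on the unit circle.
    (ca, sa), (cb, sb) = to_circular(a), to_circular(b)
    return math.hypot(ca - cb, sa - sb)


# Two phases just either side of the wrap-around point at +/- pi.
p1 = math.pi - 0.01
p2 = -math.pi + 0.01
raw_gap = abs(p1 - p2)            # large, because raw phase wraps
circ_gap = circular_distance(p1, p2)  # small, because the points are adjacent
```

The benefit of such a representation is that phases near the wrap-around point stay close together, which a model fed raw angles would not see.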
Methods and systems for voice conversion

Patent number: 9613620

Abstract: A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.

Type: Grant

Filed: February 25, 2015

Date of Patent: April 4, 2017

Assignee: Google Inc.

Inventors: Ioannis Agiomyrgiannakis, Zoi Roupakia
Devices and methods for noise modulation in a universal vocoder synthesizer

Patent number: 9607610

Abstract: A device may receive an input indicative of acoustic feature parameters associated with speech. The device may determine a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech based on the acoustic feature parameters. The aspirate may be associated with a characteristic of an exhalation of at least a threshold amount of breath. The fricative may be associated with a characteristic of airflow between two or more vocal tract articulators. The device may also provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.

Type: Grant

Filed: February 26, 2015

Date of Patent: March 28, 2017

Assignee: Google Inc.

Inventor: Ioannis Agiomyrgiannakis
Melody recognition systems

Patent number: 9569532

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.

Type: Grant

Filed: June 10, 2014

Date of Patent: February 14, 2017

Assignee: Google Inc.

Inventors: Matthew Sharifi, Dominik Roblek, Vera Dron, Ioannis Agiomyrgiannakis
Method and system for building text-to-speech voice from diverse recordings

Patent number: 9542927

Abstract: A method and system are disclosed for building a speech database for a text-to-speech (TTS) synthesis system from multiple speakers recorded under diverse conditions. For a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. The colloquial-speaker vector may be replaced with the matched reference-speaker vector. The matching-and-replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors. The conditioned set of speaker vectors can be used to train the TTS system.

Type: Grant

Filed: November 13, 2014

Date of Patent: January 10, 2017

Assignee: Google Inc.

Inventors: Ioannis Agiomyrgiannakis, Alexander Gutkin
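The matching-and-replacing step in the abstract above (patent 9542927) can be sketched as a nearest-neighbour search. The abstract does not specify the transform or the distance measure; this sketch assumes per-speaker mean subtraction as the compensating transform and squared Euclidean distance for matching, and `match_and_replace` is a hypothetical name.

```python
def mean_vector(vectors):
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def match_and_replace(reference, colloquial):
    """Replace each colloquial vector with its best-matching reference vector.

    Assumed transform: subtract each speaker's own mean so that a constant
    offset between the two speakers does not affect the match.
    """
    ref_mean = mean_vector(reference)
    col_mean = mean_vector(colloquial)
    replaced = []
    for vec in colloquial:
        shifted = [x - m for x, m in zip(vec, col_mean)]
        best = min(
            reference,
            key=lambda r: sum(
                (rv - rm - s) ** 2 for rv, rm, s in zip(r, ref_mean, shifted)
            ),
        )
        replaced.append(best)
    return replaced


reference = [(0.0, 0.0), (10.0, 10.0)]
colloquial = [(100.0, 100.0), (104.0, 104.0)]
conditioned = match_and_replace(reference, colloquial)
```

In this toy data the colloquial speaker sits far from the reference speaker in absolute terms, yet each vector still maps to the reference vector in the same relative position.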
Devices and Methods for a Speech-Based User Interface

Publication number: 20160336003

Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.

Type: Application

Filed: May 13, 2015

Publication date: November 17, 2016

Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
Devices and methods for weighting of local costs for unit selection text-to-speech synthesis

Patent number: 9460705

Abstract: A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual costs. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.

Type: Grant

Filed: November 22, 2013

Date of Patent: October 4, 2016

Assignee: Google Inc.

Inventors: Ioannis Agiomyrgiannakis, Ibrahim Badr
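The variability-based weighting in the abstract above (patent 9460705) can be sketched as follows. The abstract says only that each individual cost is weighted based on its variability across the joins; using the inverse of the population standard deviation is an assumption made here, as are the function names. The final minimization over a whole sequence is not shown.

```python
from statistics import pstdev


def variability_weights(joins):
    """One weight per individual cost, from its spread across all joins.

    joins: list of equal-length tuples, one tuple of individual costs per join.
    Assumed rule: weight = 1 / standard deviation (0 if the cost never varies).
    """
    weights = []
    for column in zip(*joins):
        spread = pstdev(list(column))
        weights.append(1.0 / spread if spread > 0 else 0.0)
    return weights


def local_cost(join, weights):
    # Weighted sum of the individual costs of a single join.
    return sum(w * c for w, c in zip(weights, join))


# Two joins, each with two individual costs on very different scales.
joins = [(0.0, 10.0), (2.0, 30.0)]
weights = variability_weights(joins)
costs = [local_cost(j, weights) for j in joins]
```

Under this assumed rule, a cost that swings widely across joins is scaled down, so no single individual cost dominates the local cost merely because of its numeric range.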
Method and System for Building Text-to-Speech Voice from Diverse Recordings

Publication number: 20160140951

Abstract: A method and system are disclosed for building a speech database for a text-to-speech (TTS) synthesis system from multiple speakers recorded under diverse conditions. For a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. The colloquial-speaker vector may be replaced with the matched reference-speaker vector. The matching-and-replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors. The conditioned set of speaker vectors can be used to train the TTS system.

Type: Application

Filed: November 13, 2014

Publication date: May 19, 2016

Inventors: Ioannis Agiomyrgiannakis, Alexander Gutkin
Methods and systems for automated generation of nativized multi-lingual lexicons

Patent number: 9263028

Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.

Type: Grant

Filed: May 21, 2014

Date of Patent: February 16, 2016

Assignee: Google Inc.

Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
Devices and Methods for a Universal Vocoder Synthesizer

Publication number: 20160005392

Abstract: A device may receive an input indicative of acoustic feature parameters associated with speech. The device may determine a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech based on the acoustic feature parameters. The aspirate may be associated with a characteristic of an exhalation of at least a threshold amount of breath. The fricative may be associated with a characteristic of airflow between two or more vocal tract articulators. The device may also provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.

Type: Application

Filed: February 26, 2015

Publication date: January 7, 2016

Inventor: Ioannis Agiomyrgiannakis
Devices and Methods for Use of Phase Information in Speech Processing Systems

Publication number: 20160005391

Abstract: A device may receive a speech signal. The device may determine acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The device may determine circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The device may map the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content. The device may provide a synthetic audio pronunciation of the linguistic content based on the mapping.

Type: Application

Filed: February 25, 2015

Publication date: January 7, 2016

Inventors: Ioannis Agiomyrgiannakis, Byungha Chun
Methods and Systems for Voice Conversion

Publication number: 20160005403

Abstract: A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.

Type: Application

Filed: February 25, 2015

Publication date: January 7, 2016

Inventors: Ioannis Agiomyrgiannakis, Zoi Roupakia
Method and system for non-parametric voice conversion

Patent number: 9183830

Abstract: A method and system are disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) based speech modeling for synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extracted from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of the target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.

Type: Grant

Filed: November 1, 2013

Date of Patent: November 10, 2015

Assignee: Google Inc.

Inventor: Ioannis Agiomyrgiannakis
Method and system for cross-lingual voice conversion

Patent number: 9177549

Abstract: A method and system are disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to an HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.

Type: Grant

Filed: November 1, 2013

Date of Patent: November 3, 2015

Assignee: Google Inc.

Inventor: Ioannis Agiomyrgiannakis
Statistical post-filtering for hidden Markov modeling (HMM)-based speech synthesis

Patent number: 9159329

Abstract: A method and system for improving the quality of speech generated from Hidden Markov Model (HMM)-based Text-To-Speech Synthesizers using statistical post-filtering techniques. An example method involves: (a) determining a scale factor that, when applied to a synthesized reference spectral envelope, minimizes a statistical divergence between a natural reference spectral envelope and the synthesized reference spectral envelope, where the synthesized reference spectral envelope is generated by a state of an HMM; (b) for a given synthesized subject spectral envelope generated by the state of the HMM, determining an enhanced synthesized subject spectral envelope based on the determined scale factor; and (c) generating, by a computing device, a synthetic speech signal including the enhanced synthesized subject spectral envelope.

Type: Grant

Filed: December 5, 2012

Date of Patent: October 13, 2015

Assignee: Google Inc.

Inventors: Ioannis Agiomyrgiannakis, Florian Alexander Eyben
Devices and Methods for Weighting of Local Costs for Unit Selection Text-to-Speech Synthesis

Publication number: 20150134339

Abstract: A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual costs. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.

Type: Application

Filed: November 22, 2013

Publication date: May 14, 2015

Applicant: Google Inc.

Inventors: Ioannis Agiomyrgiannakis, Ibrahim Badr
Method and System for Cross-Lingual Voice Conversion

Publication number: 20150127349

Abstract: A method and system are disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to an HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.

Type: Application

Filed: November 1, 2013

Publication date: May 7, 2015

Applicant: Google Inc.

Inventor: Ioannis Agiomyrgiannakis
Method and System for Non-Parametric Voice Conversion

Publication number: 20150127350

Abstract: A method and system are disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) based speech modeling for synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extracted from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of the target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.

Type: Application

Filed: November 1, 2013

Publication date: May 7, 2015

Applicant: Google Inc.

Inventor: Ioannis Agiomyrgiannakis
