Patents by Inventor Ioannis Agiomyrgiannakis

Ioannis Agiomyrgiannakis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180268807
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for selecting units for speech synthesis. One of the methods includes determining a sequence of text units that each represent a respective portion of text for speech synthesis; and determining multiple paths of speech units that each represent the sequence of text units by selecting a first speech unit that includes speech synthesis data representing a first text unit; selecting multiple second speech units including speech synthesis data representing a second text unit based on (i) a join cost to concatenate the second speech unit with a first speech unit and (ii) a target cost indicating a degree that the second speech unit corresponds to the second text unit; and defining paths from the selected first speech unit to each of the multiple second speech units to include in the multiple paths of speech units.
    Type: Application
    Filed: November 28, 2017
    Publication date: September 20, 2018
    Inventor: Ioannis Agiomyrgiannakis
  • Publication number: 20180144737
    Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
    Type: Application
    Filed: January 18, 2018
    Publication date: May 24, 2018
    Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
  • Patent number: 9865247
    Abstract: A device may receive a speech signal. The device may determine acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The device may determine circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The device may map the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content. The device may provide a synthetic audio pronunciation of the linguistic content based on the mapping.
    Type: Grant
    Filed: February 25, 2015
    Date of Patent: January 9, 2018
    Assignee: Google Inc.
    Inventors: Ioannis Agiomyrgiannakis, Byung Ha Chun
  • Patent number: 9613620
    Abstract: A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.
    Type: Grant
    Filed: February 25, 2015
    Date of Patent: April 4, 2017
    Assignee: Google Inc.
    Inventors: Ioannis Agiomyrgiannakis, Zoi Roupakia
  • Patent number: 9607610
    Abstract: A device may receive an input indicative of acoustic feature parameters associated with speech. The device may determine a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech based on the acoustic feature parameters. The aspirate may be associated with a characteristic of an exhalation of at least a threshold amount of breath. The fricative may be associated with a characteristic of airflow between two or more vocal tract articulators. The device may also provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
    Type: Grant
    Filed: February 26, 2015
    Date of Patent: March 28, 2017
    Assignee: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
  • Patent number: 9569532
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.
    Type: Grant
    Filed: June 10, 2014
    Date of Patent: February 14, 2017
    Assignee: Google Inc.
    Inventors: Matthew Sharifi, Dominik Roblek, Vera Dron, Ioannis Agiomyrgiannakis
  • Patent number: 9542927
    Abstract: A method and system is disclosed for building a speech database for a text-to-speech (TTS) synthesis system from multiple speakers recorded under diverse conditions. For a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. The colloquial-speaker vector may be replaced with the matched reference-speaker vector. The matching-and-replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors. The condition set of speaker vectors can be used to train the TTS system.
    Type: Grant
    Filed: November 13, 2014
    Date of Patent: January 10, 2017
    Assignee: Google Inc.
    Inventors: Ioannis Agiomyrgiannakis, Alexander Gutkin
  • Publication number: 20160336003
    Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
    Type: Application
    Filed: May 13, 2015
    Publication date: November 17, 2016
    Inventors: Ioannis Agiomyrgiannakis, Fergus James Henderson
  • Patent number: 9460705
    Abstract: A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual cost. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.
    Type: Grant
    Filed: November 22, 2013
    Date of Patent: October 4, 2016
    Assignee: Google Inc.
    Inventors: Ioannis Agiomyrgiannakis, Ibrahim Badr
  • Publication number: 20160140951
    Abstract: A method and system is disclosed for building a speech database for a text-to-speech (TTS) synthesis system from multiple speakers recorded under diverse conditions. For a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. The colloquial-speaker vector may be replaced with the matched reference-speaker vector. The matching-and-replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors. The condition set of speaker vectors can be used to train the TTS system.
    Type: Application
    Filed: November 13, 2014
    Publication date: May 19, 2016
    Inventors: Ioannis Agiomyrgiannakis, Alexander Gutkin
  • Patent number: 9263028
    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
    Type: Grant
    Filed: May 21, 2014
    Date of Patent: February 16, 2016
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
  • Publication number: 20160005392
    Abstract: A device may receive an input indicative of acoustic feature parameters associated with speech. The device may determine a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech based on the acoustic feature parameters. The aspirate may be associated with a characteristic of an exhalation of at least a threshold amount of breath. The fricative may be associated with a characteristic of airflow between two or more vocal tract articulators. The device may also provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
    Type: Application
    Filed: February 26, 2015
    Publication date: January 7, 2016
    Inventor: Ioannis Agiomyrgiannakis
  • Publication number: 20160005391
    Abstract: A device may receive a speech signal. The device may determine acoustic feature parameters for the speech signal. The acoustic feature parameters may include phase data. The device may determine circular space representations for the phase data based on an alignment of the phase data with given axes of the circular space representations. The device may map the phase data to linguistic features based on the circular space representations. The linguistic features may be associated with linguistic content that includes phonemic content or text content. The device may provide a synthetic audio pronunciation of the linguistic content based on the mapping.
    Type: Application
    Filed: February 25, 2015
    Publication date: January 7, 2016
    Inventors: Ioannis Agiomyrgiannakis, Byungha Chun
  • Publication number: 20160005403
    Abstract: A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.
    Type: Application
    Filed: February 25, 2015
    Publication date: January 7, 2016
    Inventors: Ioannis Agiomyrgiannakis, Zoi Roupakia
  • Patent number: 9183830
    Abstract: A method and system is disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) HMM based speech modeling for both synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extract from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.
    Type: Grant
    Filed: November 1, 2013
    Date of Patent: November 10, 2015
    Assignee: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
  • Patent number: 9177549
    Abstract: A method and system for is disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) HMM based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to a HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.
    Type: Grant
    Filed: November 1, 2013
    Date of Patent: November 3, 2015
    Assignee: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
  • Patent number: 9159329
    Abstract: A method and system for improving the quality of speech generated from Hidden Markov Model (HMM)-based Text-To-Speech Synthesizers using statistical post-filtering techniques. An example method involves: (a) determining a scale factor that, when applied to a synthesized reference spectral envelope, minimizes a statistical divergence between a natural reference spectral envelope and the synthesized reference spectral envelope, where the synthesized reference spectral envelope is generated by a state of an HMM; (b) for a given synthesized subject spectral envelope generated by the state of the HMM, determining an enhanced synthesized subject spectral envelope based on the determined scale factor; and (c) generating, by a computing device, a synthetic speech signal including the enhanced synthesized subject spectral envelope.
    Type: Grant
    Filed: December 5, 2012
    Date of Patent: October 13, 2015
    Assignee: Google Inc.
    Inventors: Ioannis Agiomyrgiannakis, Florian Alexander Eyben
  • Publication number: 20150134339
    Abstract: A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual cost. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.
    Type: Application
    Filed: November 22, 2013
    Publication date: May 14, 2015
    Applicant: Google Inc
    Inventors: Ioannis Agiomyrgiannakis, Ibrahim Badr
  • Publication number: 20150127349
    Abstract: A method and system for is disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) HMM based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to a HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.
    Type: Application
    Filed: November 1, 2013
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
  • Publication number: 20150127350
    Abstract: A method and system is disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) HMM based speech modeling for both synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extract from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.
    Type: Application
    Filed: November 1, 2013
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis