Patents by Inventor Juergen Schroeter

Juergen Schroeter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7392190
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Grant
    Filed: August 24, 2006
    Date of Patent: June 24, 2008
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
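The abstract above describes a triphone-driven lookup: windows of three concatenated phonemes select stored mouth-image parameters, which in turn select image samples for concatenation. The Python sketch below illustrates only that idea; the data structure and names (TRIPHONE_LIBRARY, animate) are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Tuple

Triphone = Tuple[str, str, str]

# Toy library mapping a triphone (three concatenated phonemes) to
# mouth-image parameters (e.g. lip opening, lip width) that would index
# into a bank of stored image samples. Values are made up.
TRIPHONE_LIBRARY: Dict[Triphone, List[dict]] = {
    ("h", "eh", "l"): [{"lip_open": 0.4, "lip_width": 0.6}],
    ("eh", "l", "ow"): [{"lip_open": 0.3, "lip_width": 0.7}],
}

def animate(phonemes: List[str]) -> List[dict]:
    """Slide a 3-phoneme window over the input stimulus and concatenate
    the mouth-image parameters found for each triphone."""
    frames: List[dict] = []
    for i in range(len(phonemes) - 2):
        key = (phonemes[i], phonemes[i + 1], phonemes[i + 2])
        frames.extend(TRIPHONE_LIBRARY.get(key, []))  # skip unseen triphones
    return frames

print(animate(["h", "eh", "l", "ow"]))  # two frames of mouth parameters
```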
  • Publication number: 20080065383
    Abstract: A system, method, and computer-readable medium for training a text-to-speech synthesis system are disclosed. The method may include recording audio files of one or more live voices speaking language used in a specific domain, the audio files being recorded using various prosodies; storing the recorded audio files in a speech database; and training a text-to-speech synthesis system using the speech database, wherein the text-to-speech synthesis system selects audio segments having a prosody based on at least one dialog state and at least one speech act.
    Type: Application
    Filed: September 8, 2006
    Publication date: March 13, 2008
    Applicant: AT&T Corp.
    Inventor: Horst Juergen Schroeter
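A minimal sketch of the selection step this abstract describes: given a dialog state and a speech act, prefer a recorded segment whose metadata matches both. The dataclass fields and the fallback rule are assumptions for illustration, not the patented method.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AudioSegment:
    text: str
    prosody: str       # e.g. "apologetic", "neutral", "emphatic"
    dialog_state: str  # e.g. "error_recovery", "greeting"
    speech_act: str    # e.g. "apology", "confirmation"

def select_segment(db: List[AudioSegment], text: str,
                   state: str, act: str) -> Optional[AudioSegment]:
    """Prefer a segment recorded for this dialog state and speech act;
    otherwise fall back to any segment with the requested text."""
    candidates = [s for s in db if s.text == text]
    exact = [s for s in candidates
             if s.dialog_state == state and s.speech_act == act]
    return (exact or candidates or [None])[0]
```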
  • Patent number: 7117155
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: October 1, 2003
    Date of Patent: October 3, 2006
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
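The same abstract recurs below (publication number 20040064321 and patent numbers 6662161 and 6112177), so one illustrative sketch is given here for the whole family. It is a toy rendering of the two-library pipeline: a coarticulation library maps phonemes to mouth-shape parameters, and an animation library maps those parameters to stored image samples. All names and values are assumptions, not the patent's data.

```python
# Illustrative animation library: mouth-shape parameter -> image sample id.
animation_library = {
    "closed": "img_000", "half_open": "img_017", "open": "img_042",
}

# Illustrative coarticulation library: phoneme -> mouth-shape parameter.
coarticulation_library = {
    "m": "closed", "ah": "open", "p": "closed", "ao": "half_open",
}

def synthesize(phonemes):
    """Recall parameters for each input phoneme, then select and
    concatenate the matching image samples; in the patented system the
    corresponding sound would be output in parallel."""
    params = [coarticulation_library[p] for p in phonemes]
    return [animation_library[q] for q in params]

print(synthesize(["m", "ah", "p"]))  # ['img_000', 'img_042', 'img_000']
```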
  • Publication number: 20050256716
    Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises selecting a voice for a custom text-to-speech voice associated with a domain; collecting text data associated with the domain from a pre-existing text data source; and, using the collected text data, generating an in-domain inventory of synthesis speech units, either by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units or by recording the minimal inventory for a selected level of synthesis quality. The custom text-to-speech voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases, wherein only a few minutes of recorded data are necessary to deliver a high-quality custom TTS voice.
    Type: Application
    Filed: May 13, 2004
    Publication date: November 17, 2005
    Applicant: AT&T Corp.
    Inventors: Srinivas Bangalore, Junlan Feng, Mazin Rahim, Juergen Schroeter, David Schulz, Ann Syrdal
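A hedged sketch of the in-domain inventory step described above: search a pre-existing inventory for the units the domain text actually needs. Here words stand in for synthesis speech units, and the frequency heuristic is an assumption, not the patented selection criterion.

```python
from collections import Counter

def build_domain_inventory(domain_texts, full_inventory, top_n=1000):
    """Keep the units (words as a stand-in for synthesis units) that occur
    most often in the collected domain text and already exist in the
    pre-existing inventory (any container supporting `in`)."""
    counts = Counter(w for t in domain_texts for w in t.lower().split())
    return [w for w, _ in counts.most_common(top_n) if w in full_inventory]
```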
  • Publication number: 20050057570
    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).
    Type: Application
    Filed: September 15, 2003
    Publication date: March 17, 2005
    Inventors: Eric Cosatto, Hans Graf, Gerasimos Potamianos, Juergen Schroeter
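The cost structure this abstract describes (acoustic target costs plus visual concatenation costs) is the classic unit-selection dynamic program; patent number 6654018 below shares the same abstract. The sketch below shows a minimal Viterbi-style search over candidate video units. Both cost functions are placeholders supplied by the caller, not the patented metrics.

```python
def select_units(candidates, target_cost, concat_cost):
    """candidates: one list of candidate units per time step.
    target_cost(t, u): how well unit u fits the acoustics at step t.
    concat_cost(u, v): visual smoothness penalty for joining u -> v.
    Returns the minimum-total-cost sequence of one unit per step."""
    # best holds (accumulated_cost, path) pairs, one per current candidate.
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for t in range(1, len(candidates)):
        new_best = []
        for u in candidates[t]:
            # Cheapest way to extend any previous path with unit u.
            c, p = min(((c + concat_cost(p[-1], u), p) for c, p in best),
                       key=lambda cp: cp[0])
            new_best.append((c + target_cost(t, u), p + [u]))
        best = new_best
    return min(best, key=lambda cp: cp[0])[1]
```

This runs in O(T x K^2) for T steps and K candidates per step, which is the usual cost of exact unit-selection search.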
  • Publication number: 20040064321
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Application
    Filed: October 1, 2003
    Publication date: April 1, 2004
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 6662161
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: September 7, 1999
    Date of Patent: December 9, 2003
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 6654018
    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: November 25, 2003
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Gerasimos Potamianos, Juergen Schroeter
  • Patent number: 6535843
    Abstract: When it is necessary to time-scale a speech signal, it is advantageous to do so under the influence of a signal that measures the small-window non-stationarity of the speech signal. Three measures of stationarity are disclosed: one based on time-domain analysis, one based on frequency-domain analysis, and one based on both time- and frequency-domain analysis.
    Type: Grant
    Filed: August 18, 1999
    Date of Patent: March 18, 2003
    Assignee: AT&T Corp.
    Inventors: Ioannis G. Stylianou, David A. Kapilow, Juergen Schroeter
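Of the three stationarity measures the abstract mentions, the time-domain one is easiest to illustrate. The sketch below compares short-time energy in adjacent half-windows as a stand-in measure (a large relative change suggests the signal is locally non-stationary and should be time-scaled cautiously); the patent's actual formulas are not reproduced here. Patent number 6324501 below applies the same idea to interpolation and smoothing.

```python
import numpy as np

def time_domain_stationarity(x: np.ndarray, win: int = 256) -> np.ndarray:
    """Return one score per analysis window, hopped by win/2.
    Values near 0 suggest the signal is locally stationary."""
    scores = []
    for start in range(0, len(x) - win, win // 2):
        a = x[start:start + win // 2]          # first half-window
        b = x[start + win // 2:start + win]    # second half-window
        ea, eb = np.sum(a ** 2), np.sum(b ** 2)
        # Normalized energy difference between adjacent half-windows.
        scores.append(abs(ea - eb) / (ea + eb + 1e-12))
    return np.array(scores)
```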
  • Patent number: 6324501
    Abstract: Speech signals, and similar one-dimensional signals, are time-scaled, interpolated, and/or smoothed, when necessary, under the influence of a signal that is sensitive to the small-window stationarity of the signal being modified. Three measures of stationarity are disclosed: one based on time-domain analysis, one based on frequency-domain analysis, and one based on both time- and frequency-domain analysis.
    Type: Grant
    Filed: August 18, 1999
    Date of Patent: November 27, 2001
    Assignee: AT&T Corp.
    Inventors: Ioannis G. Stylianou, David A. Kapilow, Juergen Schroeter
  • Patent number: 6112177
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: November 7, 1997
    Date of Patent: August 29, 2000
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 4054427
    Abstract: A method of recovering krypton and xenon nuclides from the waste gases of nuclear power plants, which comprises conveying a stream of waste gas at atmospheric pressure through a bed of activated carbon in an adsorber until the nuclides begin to issue at the outlet of the adsorber. The bed is thereupon regenerated by reducing the pressure therein to 10-300 torr to obtain a desorption gas which can be admixed to the waste gas, by rinsing the bed with a fluid (preferably an inert gas) at a pressure of 10-400 torr to obtain a stream of product gas containing a high concentration of nuclides, and by thereupon raising the pressure in the adsorber back to atmospheric pressure with an inert gas.
    Type: Grant
    Filed: December 26, 1974
    Date of Patent: October 18, 1977
    Assignee: Bergwerksverband GmbH
    Inventors: Hans-Juergen Schroeter, Karl Knoblauch, Harald Juentgen, Peter Kronauer