Patents by Inventor Juergen Schroeter

Juergen Schroeter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for generating customized text-to-speech voices

Patent number: 10991360

Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

Type: Grant

Filed: July 31, 2017

Date of Patent: April 27, 2021

Assignee: Cerence Operating Company

Inventors: Srinivas Bangalore, Junlan Feng, Mazin Gilbert, Juergen Schroeter, Ann K. Syrdal, David Schulz
SYSTEM AND METHOD FOR GENERATING CUSTOMIZED TEXT-TO-SPEECH VOICES

Publication number: 20170330554

Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

Type: Application

Filed: July 31, 2017

Publication date: November 16, 2017

Inventors: Srinivas BANGALORE, Junlan FENG, Mazin GILBERT, Juergen SCHROETER, Ann K. SYRDAL, David SCHULZ
System and method for generating customized text-to-speech voices

Patent number: 9721558

Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

Type: Grant

Filed: December 10, 2015

Date of Patent: August 1, 2017

Assignee: Nuance Communications, Inc.

Inventors: Srinivas Bangalore, Junlan Feng, Mazin Gilbert, Juergen Schroeter, Ann K. Syrdal, David Schulz
SYSTEM AND METHOD FOR GENERATING CUSTOMIZED TEXT-TO-SPEECH VOICES

Publication number: 20160093287

Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

Type: Application

Filed: December 10, 2015

Publication date: March 31, 2016

Inventors: Srinivas BANGALORE, Junlan FENG, Mazin GILBERT, Juergen SCHROETER, Ann K. SYRDAL, David SCHULZ
System and method for generating customized text-to-speech voices

Patent number: 9240177

Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

Type: Grant

Filed: March 4, 2014

Date of Patent: January 19, 2016

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Srinivas Bangalore, Junlan Feng, Mazin G. Rahim, Juergen Schroeter, Ann K. Syrdal, David Schulz
SYSTEM AND METHOD FOR GENERATING CUSTOMIZED TEXT-TO-SPEECH VOICES

Publication number: 20140188480

Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

Type: Application

Filed: March 4, 2014

Publication date: July 3, 2014

Applicant: AT&T Intellectual Property II, L.P.

Inventors: Srinivas BANGALORE, Junlan Feng, Mazin G. Rahim, Juergen Schroeter, Ann K. Syrdal, David Schulz
Method of Handling Frequently Asked Questions in a Natural Language Dialog Service

Publication number: 20140149121

Abstract: A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer frequently asked questions.

Type: Application

Filed: February 3, 2014

Publication date: May 29, 2014

Applicant: AT&T Intellectual Property II, L.P.

Inventors: Giuseppe Di Fabbrizio, Dawn L. Dutton, Narendra K. Gupta, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Robert Elias Schapire, Juergen Schroeter
System and method for generating customized text-to-speech voices

Patent number: 8666746

Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.

Type: Grant

Filed: May 13, 2004

Date of Patent: March 4, 2014

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Srinivas Bangalore, Junlan Feng, Mazin G. Rahim, Juergen Schroeter, David Eugene Schulz, Ann K. Syrdal
Method of handling frequently asked questions in a natural language dialog service

Patent number: 8645122

Abstract: A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer frequently asked questions.

Type: Grant

Filed: December 19, 2002

Date of Patent: February 4, 2014

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Giuseppe Di Fabbrizio, Dawn L Dutton, Narendra K. Gupta, Barbara B. Hollister, Mazin G Rahim, Giuseppe Riccardi, Robert Elias Schapire, Juergen Schroeter
System and method of dynamically modifying a spoken dialog system to reduce hardware requirements

Patent number: 8600757

Abstract: A system and method for providing a scalable spoken dialog system are disclosed. The method comprises receiving information which may be internal to the system or external to the system and dynamically modifying at least one module within a spoken dialog system according to the received information. The modules may be one or more of an automatic speech recognition, natural language understanding, dialog management and text-to-speech module or engine. Dynamically modifying the module may improve hardware performance or improve a specific caller's speech processing accuracy, for example. The modification of the modules or hardware may also be based on an application or a task, or based on a current portion of a dialog.

Type: Grant

Filed: November 30, 2012

Date of Patent: December 3, 2013

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Rahim Mazin, Juergen Schroeter
System and method of dynamically modifying a spoken dialog system to reduce hardware requirements

Patent number: 8332226

Abstract: A system and method for providing a scalable spoken dialog system are disclosed. The method comprises receiving information which may be internal to the system or external to the system and dynamically modifying at least one module within a spoken dialog system according to the received information. The modules may be one or more of an automatic speech recognition, natural language understanding, dialog management and text-to-speech module or engine. Dynamically modifying the module may improve hardware performance or improve a specific caller's speech processing accuracy, for example. The modification of the modules or hardware may also be based on an application or a task, or based on a current portion of a dialog.

Type: Grant

Filed: January 7, 2005

Date of Patent: December 11, 2012

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Mazin G. Rahim, Juergen Schroeter
Coarticulation method for audio-visual text-to-speech synthesis

Patent number: 8078466

Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

Type: Grant

Filed: November 30, 2009

Date of Patent: December 13, 2011

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
Audio-visual selection process for the synthesis of photo-realistic talking-head animations

Patent number: 7990384

Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).

Type: Grant

Filed: September 15, 2003

Date of Patent: August 2, 2011

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Eric Cosatto, Hans Peter Graf, Gerasimos Potamianos, Juergen Schroeter
System and method for blending synthetic voices

Patent number: 7966186

Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.

Type: Grant

Filed: November 4, 2008

Date of Patent: June 21, 2011

Assignee: AT&T Intellectual Property II, L.P.

Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
Voice-enabled dialog system

Patent number: 7869998

Abstract: A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.

Type: Grant

Filed: December 19, 2002

Date of Patent: January 11, 2011

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Giuseppe Di Fabbrizio, Dawn L Dutton, Narendra K. Gupta, Barbara B. Hollister, Mazin G Rahim, Giuseppe Riccardi, Robert Elias Schapire, Juergen Schroeter
Coarticulation Method for Audio-Visual Text-to-Speech Synthesis

Publication number: 20100076762

Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data. second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

Type: Application

Filed: November 30, 2009

Publication date: March 25, 2010

Applicant: AT&T Corp.

Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
Coarticulation method for audio-visual text-to-speech synthesis

Patent number: 7630897

Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data. second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

Type: Grant

Filed: May 19, 2008

Date of Patent: December 8, 2009

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
SYSTEM AND METHOD FOR BLENDING SYNTHETIC VOICES

Publication number: 20090063153

Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.

Type: Application

Filed: November 4, 2008

Publication date: March 5, 2009

Applicant: AT&T Corp.

Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
System and method for blending synthetic voices

Patent number: 7454348

Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.

Type: Grant

Filed: January 8, 2004

Date of Patent: November 18, 2008

Assignee: AT&T Intellectual Property II, L.P.

Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
COARTICULATION METHOD FOR AUDIO-VISUAL TEXT-TO-SPEECH SYNTHESIS

Publication number: 20080221904

Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data. second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.

Type: Application

Filed: May 19, 2008

Publication date: September 11, 2008

Applicant: AT&T Corp.

Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter

1 2 next