Patents by Inventor Siegfried Kunzmann

Siegfried Kunzmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Automatic speech recognition

Patent number: 11915690

Abstract: A multi-channel transformer acoustic model that processes a plurality of audio signals output by microphones of a microphone array and outputs probabilities for acoustic units of an utterance represented in the audio signals. The audio signals represent the individual microphones' respective capturing of the utterance. The multi-channel model may perform self-attention on embeddings of the audio signals and then cross-channel attention across the attended audio signals. The cross-channel attention may involve processing of signals relative to each other to model the relationships across channels within and across time frames. The multi-channel model may include a transducer to perform processing frame-by-frame.

Type: Grant

Filed: September 29, 2021

Date of Patent: February 27, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann, Maurizio Omologo

Back-end database reorganization for application-specific concatenative text-to-speech systems

Patent number: 8412528

Abstract: The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer's speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.

Type: Grant

Filed: May 2, 2006

Date of Patent: April 2, 2013

Assignee: Nuance Communications, Inc.

Inventors: Volker Fischer, Siegfried Kunzmann

Sensor based approach recognizer selection, adaptation and combination

Patent number: 7302393

Abstract: A method and respective system for operating a speech recognition system, in which a plurality of recognizer programs are accessible to be activated for speech recognition, and are combined on a per need basis in order to efficiently improve the results of speech recognition done by a single recognizer. In order to adapt such system to the dynamically changing acoustic conditions of various operating environments and to the particular requirements of running in embedded systems having only a limited computing power available, it is proposed to a) collect selection base data characterizing speech recognition boundary conditions, e.g. the speaker person and the environmental noise, etc., with sensor means, and b) using program-controlled arbiter means for evaluating the collected data, e.g., a decision engine including software mechanism and a physical sensor, to select the best suited recognizer or a combination thereof out of the plurality of available recognizers.

Type: Grant

Filed: October 31, 2003

Date of Patent: November 27, 2007

Assignee: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann
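The arbiter step in the abstract above can be illustrated with a minimal sketch. It assumes that the sensor data reduces to a single noise level and that each recognizer advertises a noise tolerance and a compute cost. The `Recognizer` fields, the `select_recognizer` function, and the sample values are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognizer:
    name: str
    max_noise_db: float  # assumed: noisiest environment the recognizer copes with
    cost: int            # assumed: relative compute cost on the embedded device


def select_recognizer(recognizers, noise_db, budget):
    """Pick one recognizer for the measured noise level within a compute budget."""
    affordable = [r for r in recognizers if r.cost <= budget]
    if not affordable:
        raise ValueError("no recognizer fits the compute budget")
    suitable = [r for r in affordable if r.max_noise_db >= noise_db]
    if suitable:
        # Cheapest recognizer that is expected to handle the current noise.
        return min(suitable, key=lambda r: r.cost)
    # Nothing is rated for this noise level: fall back to the most tolerant one.
    return max(affordable, key=lambda r: r.max_noise_db)


RECOGNIZERS = [
    Recognizer("quiet-room", 40.0, 1),
    Recognizer("in-car", 75.0, 3),
    Recognizer("robust", 90.0, 6),
]
```

With these sample values, `select_recognizer(RECOGNIZERS, 60.0, 4)` returns the `in-car` recognizer. A fuller arbiter could also return a combination of recognizers, which this sketch leaves out.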
Method and computer system for encoding of information into a representation

Patent number: 7213151

Abstract: The present invention relates to a computer system and to a method for encoding of information into a representation comprising a plurality of segments, the order of the segments in the representation being irrelevant for a rendering of the representation, the method comprising the steps of: identification of the segments, permutation of the segments to encode the information.

Type: Grant

Filed: June 27, 2002

Date of Patent: May 1, 2007

Assignee: International Business Machines Corporation

Inventors: Carsten Guenther, Werner Kriechbaum, Siegfried Kunzmann, Bernhard Hubert Zeller
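One way to picture encoding information by permuting order-independent segments is the factorial number system, sketched below. This is an illustration under stated assumptions, not the method claimed in the patent: segments are assumed to be distinct and sortable, and the names `encode` and `decode` are hypothetical.

```python
from math import factorial


def encode(segments, value):
    """Reorder distinct segments so that their order encodes the integer `value`."""
    pool = sorted(segments)  # canonical order known to both encoder and decoder
    n = len(pool)
    if not 0 <= value < factorial(n):
        raise ValueError("value does not fit into a permutation of these segments")
    permuted = []
    for remaining in range(n, 0, -1):
        # Each digit of the factorial number system picks one remaining segment.
        index, value = divmod(value, factorial(remaining - 1))
        permuted.append(pool.pop(index))
    return permuted


def decode(permuted):
    """Recover the integer from the order of the segments."""
    pool = sorted(permuted)
    value = 0
    for position, segment in enumerate(permuted):
        index = pool.index(segment)
        value += index * factorial(len(permuted) - 1 - position)
        pool.pop(index)
    return value
```

A representation with n distinct segments can carry one of n! values this way, and a renderer that ignores segment order produces the same result for every permutation.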
Back-end database reorganization for application-specific concatenative text-to-speech systems

Publication number: 20060287861

Abstract: The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer's speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.

Type: Application

Filed: May 2, 2006

Publication date: December 21, 2006

Applicant: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann

Sensor based speech recognizer selection, adaptation and combination

Publication number: 20060173684

Abstract: A method and respective system for operating a speech recognition system, in which a plurality of recognizer programs are accessible to be activated for speech recognition, and are combined on a per need basis in order to efficiently improve the results of speech recognition done by a single recognizer. In order to adapt such system to the dynamically changing acoustic conditions of various operating environments and to the particular requirements of running in embedded systems having only a limited computing power available, it is proposed to a) collect selection base data characterizing speech recognition boundary conditions, e.g. the speaker person and the environmental noise, etc., with sensor means, and b) using program-controlled arbiter means for evaluating the collected data, e.g., a decision engine including software mechanism and a physical sensor, to select the best suited recognizer or a combination thereof out of the plurality of available recognizers.

Type: Application

Filed: October 31, 2003

Publication date: August 3, 2006

Applicant: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann

Method and apparatus for phonetic context adaptation for improved speech recognition

Patent number: 6999925

Abstract: The present invention provides a computerized method and apparatus for automatically generating from a first speech recognizer a second speech recognizer which can be adapted to a specific domain. The first speech recognizer can include a first acoustic model with a first decision network and corresponding first phonetic contexts. The first acoustic model can be used as a starting point for the adaptation process. A second acoustic model with a second decision network and corresponding second phonetic contexts for the second speech recognizer can be generated by re-estimating the first decision network and the corresponding first phonetic contexts based on domain-specific training data.

Type: Grant

Filed: November 13, 2001

Date of Patent: February 14, 2006

Assignee: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann, Eric-W. Janke, A. Jon Tyrrell

Method and system for generating squeezed acoustic models for specialized speech recognizer

Patent number: 6789061

Abstract: Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states s_i and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states s_i. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.

Type: Grant

Filed: August 14, 2000

Date of Patent: September 7, 2004

Assignee: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann, Claire Waast-Ricard
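The two reduction steps in the abstract above can be sketched roughly as follows. The selection criterion is an assumption made for illustration: a state counts as distinctive of the application if it is visited at least `min_count` times when application data is aligned against the first recognizer, and a pdf is kept if a kept state refers to it. The function name `squeeze` and the data layout are hypothetical.

```python
from collections import Counter


def squeeze(state_pdfs, aligned_states, min_count=1):
    """Reduce states and pdfs to those used on application-specific data.

    state_pdfs     -- maps each state of the first recognizer to its pdf ids
    aligned_states -- state sequence observed on application data (assumed input)
    """
    visits = Counter(aligned_states)
    # Step 1: keep only the states that the application actually exercises.
    kept_states = {
        state: pdfs for state, pdfs in state_pdfs.items() if visits[state] >= min_count
    }
    # Step 2: keep only the pdfs that the remaining states still refer to.
    kept_pdfs = {pdf for pdfs in kept_states.values() for pdf in pdfs}
    return kept_states, kept_pdfs
```

For example, with states `s1`, `s2`, `s3` and application data that visits only `s1` and `s3`, the state `s2` and any pdf used by `s2` alone are dropped.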
Segmentation technique increasing the active vocabulary of speech recognizers

Patent number: 6738741

Abstract: A speech recognition system and a method executed by a speech recognition system focusing on the vocabulary of the speech recognition system and its usage during the speech recognition process is provided. A segmented vocabulary and its exploitation is provided comprising a multitude of entries wherein an entry is either identical to a legal word or a constituent of a legal word of the language, and the constituent is an arbitrary sub-component of the legal word according to the orthography. A constituent can comprise any number of characters not limited to a syllable of a legal word or a recognition unit of the speech recognition system. The vocabulary is used to recognize constituents of the vocabulary for recombination of the constituents into legal words if a constituent combination table indicates that the recognized constituents are a legal concatenation in the language.

Type: Grant

Filed: November 18, 2002

Date of Patent: May 18, 2004

Assignee: International Business Machines Corporation

Inventors: Ossama Emam, Siegfried Kunzmann
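The recombination step in the abstract above can be illustrated with a small sketch. It assumes that the constituent combination table is a set of (previous constituent, next constituent) pairs. That layout, the function name `recombine`, and the sample entries are hypothetical.

```python
def recombine(constituents, vocabulary, legal_pairs):
    """Join recognized constituents into words where the table allows it."""
    words = []
    current = None   # word being assembled
    previous = None  # last constituent added to `current`
    for constituent in constituents:
        if constituent not in vocabulary:
            raise ValueError(f"{constituent!r} is not a vocabulary entry")
        if current is not None and (previous, constituent) in legal_pairs:
            current += constituent      # legal concatenation: extend the word
        else:
            if current is not None:
                words.append(current)   # close the word assembled so far
            current = constituent
        previous = constituent
    if current is not None:
        words.append(current)
    return words


VOCABULARY = {"speech", "recogn", "ition", "system"}
LEGAL_PAIRS = {("recogn", "ition")}
```

Here `recombine(["speech", "recogn", "ition", "system"], VOCABULARY, LEGAL_PAIRS)` yields `["speech", "recognition", "system"]`. Four vocabulary entries cover the three words, and further pairs in the table would let the same constituents form other words.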
Segmentation technique increasing the active vocabulary of speech recognizers

Publication number: 20030078778

Abstract: A speech recognition system and a method executed by a speech recognition system focusing on the vocabulary of the speech recognition system and its usage during the speech recognition process is provided. A segmented vocabulary and its exploitation is provided comprising a multitude of entries wherein an entry is either identical to a legal word or a constituent of a legal word of the language, and the constituent is an arbitrary sub-component of the legal word according to the orthography. A constituent can comprise any number of characters not limited to a syllable of a legal word or a recognition unit of the speech recognition system. The vocabulary is used to recognize constituents of the vocabulary for recombination of the constituents into legal words if a constituent combination table indicates that the recognized constituents are a legal concatenation in the language.

Type: Application

Filed: November 18, 2002

Publication date: April 24, 2003

Applicant: International Business Machines Corporation

Inventors: Ossama Emam, Siegfried Kunzmann

Method and computer system for encoding of information into a representation

Publication number: 20030074561

Abstract: The present invention relates to a computer system and to a method for encoding of information into a representation comprising a plurality of segments, the order of the segments in the representation being irrelevant for a rendering of the representation, the method comprising the steps of: identification of the segments, permutation of the segments to encode the information.

Type: Application

Filed: June 27, 2002

Publication date: April 17, 2003

Applicant: International Business Machines Corporation

Inventors: Carsten Guenther, Werner Kriechbaum, Siegfried Kunzmann, Bernhard Hubert Zeller

Method and apparatus for providing authentication of a rendered realization

Publication number: 20020168089

Abstract: Disclosed are a method, apparatus, and program for providing authentication of a rendered multimedia realization. A renderer and a watermark generator are integrated wherein the renderer receives a symbolic stream, e.g. in the case of a text-to-speech system a text, and generates a realization, e.g. an audio signal representing a spoken version of the text. An identification is embedded into the signal by the watermark generator using standard steganographic methods. Such a serial integration of renderer and watermark generator is applicable to all known renderers and watermarking techniques. The mechanism enables inheritance of originality of the original representation or realization to the rendered realization.

Type: Application

Filed: May 9, 2002

Publication date: November 14, 2002

Applicant: International Business Machines Corporation

Inventors: Carsten Guenther, Werner Kriechbaum, Siegfried Kunzmann, Bernhard Hubert Zeller

Segmentation technique increasing the active vocabulary of speech recognizers

Publication number: 20020099543

Abstract: A speech recognition system and a method executed by a speech recognition system focusing on the vocabulary of the speech recognition system and its usage during the speech recognition process is provided. A segmented vocabulary and its exploitation is provided comprising a multitude of entries wherein an entry is either identical to a legal word or a constituent of a legal word of the language, and the constituent is an arbitrary sub-component of the legal word according to the orthography. A constituent can comprise any number of characters not limited to a syllable of a legal word or a recognition unit of the speech recognition system. The vocabulary is used to recognize constituents of the vocabulary for recombination of the constituents into legal words if a constituent combination table indicates that the recognized constituents are a legal concatenation in the language.

Type: Application

Filed: August 25, 1999

Publication date: July 25, 2002

Inventors: Ossama Emam, Siegfried Kunzmann

Method and apparatus for phonetic context adaptation for improved speech recognition

Publication number: 20020087314

Abstract: The present invention provides a computerized method and apparatus for automatically generating from a first speech recognizer a second speech recognizer which can be adapted to a specific domain. The first speech recognizer can include a first acoustic model with a first decision network and corresponding first phonetic contexts. The first acoustic model can be used as a starting point for the adaptation process. A second acoustic model with a second decision network and corresponding second phonetic contexts for the second speech recognizer can be generated by re-estimating the first decision network and the corresponding first phonetic contexts based on domain-specific training data.

Type: Application

Filed: November 13, 2001

Publication date: July 4, 2002

Applicant: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann, Eric-W. Janke, A. Jon Tyrrell

Method and apparatus for adapting the language model's size in a speech recognition system

Patent number: 5899973

Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically during the running of the system or during the build or setup-time of the system. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase the accuracy of the word recognitions.

Type: Grant

Filed: September 25, 1997

Date of Patent: May 4, 1999

Assignee: International Business Machines Corporation

Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
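The pruning idea in the abstract above can be sketched as follows. The sketch assumes that each trigram has a score estimating how reliably the acoustic part recognizes it without language-model support. How such a score would be obtained is not stated here, so `acoustic_score`, the function name `prune`, and the size limit are hypothetical.

```python
def prune(trigram_counts, acoustic_score, max_size):
    """Keep the `max_size` trigrams that the acoustic part handles least reliably.

    trigram_counts -- maps trigram tuples to corpus counts
    acoustic_score -- maps trigram tuples to an assumed reliability in [0, 1];
                      trigrams without a score are treated as unreliable (0.0)
    """
    ranked = sorted(trigram_counts, key=lambda t: acoustic_score.get(t, 0.0))
    # Trigrams at the end of the ranking are recognized well acoustically,
    # so they are the ones discarded from the language model.
    return {trigram: trigram_counts[trigram] for trigram in ranked[:max_size]}
```

The space freed this way could then hold infrequent trigrams that would otherwise not fit, which is the substitution the abstract describes. That step is omitted from the sketch.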