Patents by Inventor Siegfried Kunzmann

Siegfried Kunzmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11915690
    Abstract: A multi-channel transformer acoustic model that processes a plurality of audio signals output by microphones of a microphone array and outputs probabilities for acoustic units of an utterance represented in the audio signals. The audio signals represent the individual microphones' respective capturing of the utterance. The multi-channel model may perform self-attention on embeddings of the audio signals and then cross-channel attention across the attended audio signals. The cross-channel attention may involve processing of signals relative to each other to model the relationships across channels within and across time frames. The multi-channel model may include a transducer to perform processing frame-by-frame.
    Type: Grant
    Filed: September 29, 2021
    Date of Patent: February 27, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann, Maurizio Omologo
  • Patent number: 8412528
    Abstract: The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer's speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.
    Type: Grant
    Filed: May 2, 2006
    Date of Patent: April 2, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Volker Fischer, Siegfried Kunzmann
  • Patent number: 7302393
    Abstract: A method and respective system for operating a speech recognition system, in which a plurality of recognizer programs are accessible to be activated for speech recognition, and are combined on a per need basis in order to efficiently improve the results of speech recognition done by a single recognizer. In order to adapt such system to the dynamically changing acoustic conditions of various operating environments and to the particular requirements of running in embedded systems having only a limited computing power available, it is proposed to a) collect selection base data characterizing speech recognition boundary conditions, e.g. the speaker person and the environmental noise, etc., with sensor means, and b) using program-controlled arbiter means for evaluating the collected data, e.g., a decision engine including software mechanism and a physical sensor, to select the best suited recognizer or a combination thereof out of the plurality of available recognizers.
    Type: Grant
    Filed: October 31, 2003
    Date of Patent: November 27, 2007
    Assignee: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann
  • Patent number: 7213151
    Abstract: The present invention relates to a computer system and to a method for encoding of information into a representation comprising a plurality of segments, the order of the segments in the representation being irrelevant for a rendering of the representation, the method comprising the steps of: identification of the segments, permutation of the segments to encode the information.
    Type: Grant
    Filed: June 27, 2002
    Date of Patent: May 1, 2007
    Assignee: International Business Machines Corporation
    Inventors: Carsten Guenther, Werner Kriechbaum, Siegfried Kunzmann, Bernhard Hubert Zeller
  • Publication number: 20060287861
    Abstract: The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer's speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.
    Type: Application
    Filed: May 2, 2006
    Publication date: December 21, 2006
    Applicant: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann
  • Publication number: 20060173684
    Abstract: A method and respective system for operating a speech recognition system, in which a plurality of recognizer programs are accessible to be activated for speech recognition, and are combined on a per need basis in order to efficiently improve the results of speech recognition done by a single recognizer. In order to adapt such system to the dynamically changing acoustic conditions of various operating environments and to the particular requirements of running in embedded systems having only a limited computing power available, it is proposed to a) collect selection base data characterizing speech recognition boundary conditions, e.g. the speaker person and the environmental noise, etc., with sensor means, and b) using program-controlled arbiter means for evaluating the collected data, e.g., a decision engine including software mechanism and a physical sensor, to select the best suited recognizer or a combination thereof out of the plurality of available recognizers.
    Type: Application
    Filed: October 31, 2003
    Publication date: August 3, 2006
    Applicant: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann
  • Patent number: 6999925
    Abstract: The present invention provides a computerized method and apparatus for automatically generating from a first speech recognizer a second speech recognizer which can be adapted to a specific domain. The first speech recognizer can include a first acoustic model with a first decision network and corresponding first phonetic contexts. The first acoustic model can be used as a starting point for the adaptation process. A second acoustic model with a second decision network and corresponding second phonetic contexts for the second speech recognizer can be generated by re-estimating the first decision network and the corresponding first phonetic contexts based on domain-specific training data.
    Type: Grant
    Filed: November 13, 2001
    Date of Patent: February 14, 2006
    Assignee: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann, Eric-W. Janke, A. Jon Tyrrell
  • Patent number: 6789061
    Abstract: Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states s_i and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states s_i. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.
    Type: Grant
    Filed: August 14, 2000
    Date of Patent: September 7, 2004
    Assignee: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann, Claire Waast-Ricard
  • Patent number: 6738741
    Abstract: A speech recognition system and a method executed by a speech recognition system focusing on the vocabulary of the speech recognition system and its usage during the speech recognition process is provided. A segmented vocabulary and its exploitation is provided comprising a multitude of entries wherein an entry is either identical to a legal word or a constituent of a legal word of the language, and the constituent is an arbitrary sub-component of the legal word according to the orthography. A constituent can comprise any number of characters not limited to a syllable of a legal word or a recognition unit of the speech recognition system. The vocabulary is used to recognize constituents of the vocabulary for recombination of the constituents into legal words if a constituent combination table indicates that the recognized constituents are a legal concatenation in the language.
    Type: Grant
    Filed: November 18, 2002
    Date of Patent: May 18, 2004
    Assignee: International Business Machines Corporation
    Inventors: Ossama Emam, Siegfried Kunzmann
  • Publication number: 20030078778
    Abstract: A speech recognition system and a method executed by a speech recognition system focusing on the vocabulary of the speech recognition system and its usage during the speech recognition process is provided. A segmented vocabulary and its exploitation is provided comprising a multitude of entries wherein an entry is either identical to a legal word or a constituent of a legal word of the language, and the constituent is an arbitrary sub-component of the legal word according to the orthography. A constituent can comprise any number of characters not limited to a syllable of a legal word or a recognition unit of the speech recognition system. The vocabulary is used to recognize constituents of the vocabulary for recombination of the constituents into legal words if a constituent combination table indicates that the recognized constituents are a legal concatenation in the language.
    Type: Application
    Filed: November 18, 2002
    Publication date: April 24, 2003
    Applicant: International Business Machines Corporation
    Inventors: Ossama Emam, Siegfried Kunzmann
  • Publication number: 20030074561
    Abstract: The present invention relates to a computer system and to a method for encoding of information into a representation comprising a plurality of segments, the order of the segments in the representation being irrelevant for a rendering of the representation, the method comprising the steps of: identification of the segments, permutation of the segments to encode the information.
    Type: Application
    Filed: June 27, 2002
    Publication date: April 17, 2003
    Applicant: International Business Machines Corporation
    Inventors: Carsten Guenther, Werner Kriechbaum, Siegfried Kunzmann, Bernhard Hubert Zeller
  • Publication number: 20020168089
    Abstract: Disclosed are a method, apparatus, and program for providing authentication of a rendered multimedia realization. A renderer and a watermark generator are integrated wherein the renderer receives a symbolic stream, e.g. in the case of a text-to-speech system a text, and generates a realization, e.g. an audio signal representing a spoken version of the text. An identification is embedded into the signal by the watermark generator using standard steganographic methods. Such a serial integration of renderer and watermark generator is applicable to all known renderers and watermarking techniques. The mechanism enables inheritance of originality of the original representation or realization to the rendered realization.
    Type: Application
    Filed: May 9, 2002
    Publication date: November 14, 2002
    Applicant: International Business Machines Corporation
    Inventors: Carsten Guenther, Werner Kriechbaum, Siegfried Kunzmann, Bernhard Hubert Zeller
  • Publication number: 20020099543
    Abstract: A speech recognition system and a method executed by a speech recognition system focusing on the vocabulary of the speech recognition system and its usage during the speech recognition process is provided. A segmented vocabulary and its exploitation is provided comprising a multitude of entries wherein an entry is either identical to a legal word or a constituent of a legal word of the language, and the constituent is an arbitrary sub-component of the legal word according to the orthography. A constituent can comprise any number of characters not limited to a syllable of a legal word or a recognition unit of the speech recognition system. The vocabulary is used to recognize constituents of the vocabulary for recombination of the constituents into legal words if a constituent combination table indicates that the recognized constituents are a legal concatenation in the language.
    Type: Application
    Filed: August 25, 1999
    Publication date: July 25, 2002
    Inventors: Ossama Emam, Siegfried Kunzmann
  • Publication number: 20020087314
    Abstract: The present invention provides a computerized method and apparatus for automatically generating from a first speech recognizer a second speech recognizer which can be adapted to a specific domain. The first speech recognizer can include a first acoustic model with a first decision network and corresponding first phonetic contexts. The first acoustic model can be used as a starting point for the adaptation process. A second acoustic model with a second decision network and corresponding second phonetic contexts for the second speech recognizer can be generated by re-estimating the first decision network and the corresponding first phonetic contexts based on domain-specific training data.
    Type: Application
    Filed: November 13, 2001
    Publication date: July 4, 2002
    Applicant: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann, Eric-W. Janke, A. Jon Tyrrell
  • Patent number: 5899973
    Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically during the running of the system or during the build or setup-time of the system. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase the accuracy of the word recognitions.
    Type: Grant
    Filed: September 25, 1997
    Date of Patent: May 4, 1999
    Assignee: International Business Machines Corporation
    Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
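
Illustration (not taken from the patent text): the abstract of patent 7213151 describes encoding information by permuting segments whose order does not affect rendering. The Python sketch below shows one way such a scheme could work, under stated assumptions: the segments are distinct and sortable, the hidden information is a single integer smaller than n! for n segments, and the sorted order serves as the reference arrangement. The function names encode_by_permutation and decode_from_permutation are hypothetical, and the factorial number system used here is a standard technique chosen for the example, not necessarily the method claimed in the patent.

```python
from math import factorial


def encode_by_permutation(segments, value):
    """Return the segments reordered so that their order represents `value`.

    Assumes the segments are distinct and sortable. The sorted order is the
    reference arrangement and corresponds to value 0.
    """
    pool = sorted(segments)
    n = len(pool)
    if not 0 <= value < factorial(n):
        raise ValueError("value must satisfy 0 <= value < n!")
    ordered = []
    for remaining in range(n, 0, -1):
        # Pick the next segment using one digit of the factorial number system.
        index, value = divmod(value, factorial(remaining - 1))
        ordered.append(pool.pop(index))
    return ordered


def decode_from_permutation(ordered):
    """Recover the integer hidden in the order of the segments."""
    pool = sorted(ordered)
    n = len(ordered)
    value = 0
    for position, segment in enumerate(ordered):
        index = pool.index(segment)
        value += index * factorial(n - 1 - position)
        pool.pop(index)
    return value


if __name__ == "__main__":
    parts = ["header", "body", "footer", "index"]
    hidden = encode_by_permutation(parts, 17)
    print(hidden, decode_from_permutation(hidden))
```

With n distinct segments this sketch can carry any integer from 0 to n! - 1, and the set of segments itself is unchanged, so a renderer that ignores segment order produces the same output.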