Patents by Inventor Petr Fousek
Petr Fousek has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11120802Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.Type: GrantFiled: November 21, 2017Date of Patent: September 14, 2021Assignee: International Business Machines CorporationInventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Patent number: 10614797Abstract: A diarization embodiment may include a system that clusters data up to a current point in time and consolidates it with the past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined based on a minimum edit or hamming distance. The distance may alternatively be a measure other than the minimum edit or hamming distance. The clustering may have a finite time window over which the analysis is performed.Type: GrantFiled: November 30, 2017Date of Patent: April 7, 2020Assignee: International Business Machines CorporationInventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Jason W. Pelecanos, Weizhong Zhu
-
Patent number: 10468031Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.Type: GrantFiled: November 21, 2017Date of Patent: November 5, 2019Assignee: International Business Machines CorporationInventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Publication number: 20190156835Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. The STD process analyzes a number of speaker segments using a language model that determines when speaker changes occur. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.Type: ApplicationFiled: November 21, 2017Publication date: May 23, 2019Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Publication number: 20190156832Abstract: An approach is provided that receives an audio stream and utilizes a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital stream with the ASR process resulting in the spoken words to which a speaker turn detection (STD) process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.Type: ApplicationFiled: November 21, 2017Publication date: May 23, 2019Inventors: Kenneth W. Church, Dimitrios B. Dimitriadis, Petr Fousek, Miroslav Novak, George A. Saon
-
Publication number: 20180158451Abstract: A diarization embodiment may include a system that clusters data up to a current point in time and consolidates it with the past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined based on a minimum edit or hamming distance. The distance may alternatively be a measure other than the minimum edit or hamming distance. The clustering may have a finite time window over which the analysis is performed.Type: ApplicationFiled: November 30, 2017Publication date: June 7, 2018Inventors: Kenneth W. CHURCH, Dimitrios B. DIMITRIADIS, Petr FOUSEK, Jason W. PELECANOS, Weizhong ZHU
-
Patent number: 9741341Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.Type: GrantFiled: January 20, 2015Date of Patent: August 22, 2017Assignee: Nuance Communications, Inc.Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
-
Patent number: 9640186Abstract: Deep scattering spectral features are extracted from an acoustic input signal to generate a deep scattering spectral feature representation of the acoustic input signal. The deep scattering spectral feature representation is input to a speech recognition engine. The acoustic input signal is decoded based on at least a portion of the deep scattering spectral feature representation input to a speech recognition engine.Type: GrantFiled: May 2, 2014Date of Patent: May 2, 2017Assignee: International Business Machines CorporationInventors: Petr Fousek, Vaibhava Goel, Brian E. D. Kingsbury, Etienne Marcheret, Shay Maymon, David Nahamoo, Vijayaditya Peddinti, Bhuvana Ramabhadran, Tara N. Sainath
-
Publication number: 20150317990Abstract: Deep scattering spectral features are extracted from an acoustic input signal to generate a deep scattering spectral feature representation of the acoustic input signal. The deep scattering spectral feature representation is input to a speech recognition engine. The acoustic input signal is decoded based on at least a portion of the deep scattering spectral feature representation input to a speech recognition engine.Type: ApplicationFiled: May 2, 2014Publication date: November 5, 2015Inventors: Petr Fousek, Vaibhava Goel, Brian E.D. Kingsbury, Etienne Marcheret, Shay Maymon, David Nahamoo, Tara N. Sainath, Bhuvana Ramabhadran
-
Publication number: 20150199964Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.Type: ApplicationFiled: January 20, 2015Publication date: July 16, 2015Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
-
Patent number: 8972256Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.Type: GrantFiled: October 17, 2011Date of Patent: March 3, 2015Assignee: Nuance Communications, Inc.Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
-
Publication number: 20130096915Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.Type: ApplicationFiled: October 17, 2011Publication date: April 18, 2013Applicant: NUANCE COMMUNICATIONS, INC.Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
-
Patent number: 8386249Abstract: Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments.Type: GrantFiled: December 11, 2009Date of Patent: February 26, 2013Assignee: International Business Machines CorporationInventors: Petr Fousek, Vaibhava Goel, Etienne Marcheret, Peder Andreas Olsen
-
Publication number: 20110144991Abstract: Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments.Type: ApplicationFiled: December 11, 2009Publication date: June 16, 2011Applicant: International Business Machines CorporationInventors: Petr Fousek, Vaibhava Goel, Etienne Marcheret, Peder Andreas Olsen