Patents by Inventor JOSEF VOPICKA

Josef Vopicka has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10181325
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: January 15, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
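The pipeline this abstract describes (scattering operations over visual features, fed to a neural network, combined with audio features) can be illustrated with a minimal sketch. This is not the patented implementation: the Morlet-like wavelets, the two scales, the pooling width, and the synthetic `visual_trace` are all illustrative assumptions; only the overall shape of the pipeline (wavelet convolution, modulus, local averaging, then concatenation with audio features into a combined vector) follows the abstract.

```python
import numpy as np

def scattering_coefficients(signal, wavelets, pool=8):
    """First-order scattering sketch: wavelet convolution, modulus,
    then local averaging. `signal` is a 1-D visual feature trace
    (e.g., a mouth-opening measurement per video frame)."""
    coeffs = []
    for w in wavelets:
        # modulus of the wavelet response removes phase
        u = np.abs(np.convolve(signal, w, mode="same"))
        # local averaging (low-pass pooling) adds translation stability
        pooled = u[: len(u) - len(u) % pool].reshape(-1, pool).mean(axis=1)
        coeffs.append(pooled)
    return np.concatenate(coeffs)

def morlet(scale, n=16):
    """Hypothetical Morlet-like wavelet; the real filter bank is unspecified."""
    t = np.linspace(-2, 2, n)
    return np.exp(-t**2 / (2 * scale**2)) * np.cos(5 * t / scale)

rng = np.random.default_rng(0)
visual_trace = rng.standard_normal(64)     # stand-in for extracted visual features
feats = scattering_coefficients(visual_trace, [morlet(0.5), morlet(1.0)])

# combined feature vector: visual scattering coefficients concatenated
# with audio features, as the abstract's last sentence describes
audio_feats = rng.standard_normal(16)
combined = np.concatenate([feats, audio_feats])
```

In a full system, `combined` would be the input to the neural network that predicts the subject's speech status; the network itself is omitted here.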
  • Patent number: 10147438
Abstract: Embodiments of the invention include methods, systems, and computer program products for role modeling. Aspects of the invention include receiving, by a processor, audio data, wherein the audio data includes a plurality of audio conversations for one or more speakers. Each of the plurality of audio conversations is partitioned into one or more segments. A speaker is associated with each of the one or more segments. The one or more segments of each audio conversation are labeled with roles utilizing a speaker recognition engine. Speakers are clustered based at least in part on the number of times the speakers are present in an audio conversation.
    Type: Grant
    Filed: March 2, 2017
    Date of Patent: December 4, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth W. Church, Jason W. Pelecanos, Josef Vopicka, Weizhong Zhu
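The clustering step in this abstract, grouping speakers by how often they appear across conversations, can be sketched in a few lines. This is a simplified stand-in, not the patented method: the two-role scheme (`agent` vs. `customer`), the majority threshold, and the toy data are assumptions; segmentation and speaker recognition are taken as already done upstream.

```python
from collections import Counter

def assign_roles(conversations):
    """conversations: list of conversations, each a list of
    (segment_text, speaker_id) pairs. Speakers present in more than
    half of the conversations are labeled 'agent'; the rest 'customer'
    (a hypothetical two-role scheme)."""
    appearances = Counter()
    for conv in conversations:
        for spk in {spk for _, spk in conv}:   # count once per conversation
            appearances[spk] += 1
    n = len(conversations)
    return {spk: ("agent" if cnt > n / 2 else "customer")
            for spk, cnt in appearances.items()}

calls = [
    [("hello, support desk", "s1"), ("my card is blocked", "s2")],
    [("hello, support desk", "s1"), ("I need a refund", "s3")],
    [("hello, support desk", "s1"), ("wrong charge", "s4")],
]
roles = assign_roles(calls)
```

Here `s1` recurs in every call and is clustered into the recurring role, while the one-off speakers fall into the other cluster.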
  • Patent number: 10109277
Abstract: Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames; determining, for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from a foreground speaker, wherein the determining is based, at least in part, on received visual information; and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.
    Type: Grant
    Filed: April 27, 2015
    Date of Patent: October 23, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
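The per-frame gating logic in this abstract, forwarding an audio frame to the ASR engine only when visual evidence says the foreground speaker is talking, reduces to a simple filter. A minimal sketch follows; the per-frame `visual_speech_prob` scores and the 0.5 threshold are assumptions standing in for whatever visual classifier the method actually uses.

```python
def frames_for_asr(audio_frames, visual_speech_prob, threshold=0.5):
    """Gate audio frames on visual evidence: keep a frame for the ASR
    engine only when the visual speech score meets the threshold.
    `visual_speech_prob[i]` is a hypothetical per-frame score from an
    upstream visual classifier."""
    return [frame for frame, p in zip(audio_frames, visual_speech_prob)
            if p >= threshold]

audio = ["f0", "f1", "f2", "f3"]
probs = [0.9, 0.2, 0.7, 0.1]
selected = frames_for_asr(audio, probs)   # only frames f0 and f2 pass the gate
```

Frames that fail the gate are simply never transmitted, so the ASR engine spends no work on background speech or silence.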
  • Publication number: 20180254051
Abstract: Embodiments of the invention include methods, systems, and computer program products for role modeling. Aspects of the invention include receiving, by a processor, audio data, wherein the audio data includes a plurality of audio conversations for one or more speakers. Each of the plurality of audio conversations is partitioned into one or more segments. A speaker is associated with each of the one or more segments. The one or more segments of each audio conversation are labeled with roles utilizing a speaker recognition engine. Speakers are clustered based at least in part on the number of times the speakers are present in an audio conversation.
    Type: Application
    Filed: March 2, 2017
    Publication date: September 6, 2018
    Inventors: Kenneth W. Church, Jason W. Pelecanos, Josef Vopicka, Weizhong Zhu
  • Publication number: 20180025729
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 25, 2018
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Patent number: 9697833
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Grant
    Filed: August 25, 2015
    Date of Patent: July 4, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20170061966
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: August 25, 2015
    Publication date: March 2, 2017
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20160314789
Abstract: Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames; determining, for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from a foreground speaker, wherein the determining is based, at least in part, on received visual information; and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.
    Type: Application
    Filed: April 27, 2015
    Publication date: October 27, 2016
    Applicant: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20090198490
Abstract: The present invention discloses a solution for a speech processing system to determine end-of-utterance (EOU) events. The solution is a modified dual-factor technique, where one factor is based upon the number of silence frames received and a second factor is based upon an end-of-path occurrence. The solution permits a set of configurable timeout delay values to be established, which can be configured on an application-specific basis by application developers. The solution can speed up EOU determinations made through a dual-factor technique by situationally making the finalization determination before the silence frame window is full.
    Type: Application
    Filed: February 6, 2008
    Publication date: August 6, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: John W. Eckhart, Jonathan Palgon, Josef Vopicka
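The dual-factor EOU logic this last abstract describes can be sketched as a small state loop: normally the system waits for a full window of silence frames, but when the recognizer reports an end-of-path (no further continuation in the grammar), it finalizes after a shorter configurable timeout. This is an illustrative reconstruction, not the patented implementation: the frame representation, the timeout values, and the `(is_silence, end_of_path)` pairs are all assumptions.

```python
def end_of_utterance(frames, silence_timeout=5, end_of_path_timeout=2):
    """Dual-factor EOU sketch. `frames` is a sequence of
    (is_silence, end_of_path) pairs, a simplified stand-in for decoder
    state. Both timeouts are configurable, matching the abstract's
    application-specific timeout delay values. Returns the index of the
    frame where EOU is declared, or None if the utterance never ends."""
    silence_run = 0
    for i, (is_silence, end_of_path) in enumerate(frames):
        silence_run = silence_run + 1 if is_silence else 0
        # end-of-path shortens the required silence window,
        # finalizing before the full silence window is consumed
        limit = end_of_path_timeout if end_of_path else silence_timeout
        if silence_run >= limit:
            return i
    return None

# speech, then silence with an end-of-path flag: EOU fires early
frames = [(False, False), (True, False), (True, True), (True, True)]
early = end_of_utterance(frames)          # declared at the shorter timeout

# pure silence with no end-of-path: the full silence window is required
late = end_of_utterance([(True, False)] * 5)
```

The contrast between the two calls is the point of the technique: the end-of-path factor lets the system commit in 2 silence frames instead of 5.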