Patents by Inventor Andreas Stolcke

Andreas Stolcke has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11468895
    Abstract: A computer implemented method includes receiving audio streams at a meeting server from two distributed devices that are streaming audio captured during an ad-hoc meeting between at least two users, comparing the received audio streams to determine that the received audio streams are representative of sound from the ad-hoc meeting, generating a meeting instance to process the audio streams in response to the comparing determining that the audio streams are representative of sound from the ad-hoc meeting, and processing the received audio streams to generate a transcript of the ad-hoc meeting.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 11, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
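    The comparison step this abstract describes can be illustrated with a toy sketch: correlate the frame-energy envelopes of two device streams and treat a high correlation as evidence that both devices are capturing the same ad-hoc meeting. The function names, frame size, and 0.8 threshold below are illustrative assumptions, not the patented method.

    ```python
    # Hypothetical sketch: decide whether two device streams capture the same
    # ad-hoc meeting by correlating their frame-energy envelopes.

    def energy_envelope(samples, frame=4):
        """Mean absolute energy per fixed-size frame."""
        return [sum(abs(s) for s in samples[i:i + frame]) / frame
                for i in range(0, len(samples) - frame + 1, frame)]

    def normalized_correlation(a, b):
        """Pearson correlation of two equal-length-truncated sequences."""
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        da = sum((x - ma) ** 2 for x in a) ** 0.5
        db = sum((y - mb) ** 2 for y in b) ** 0.5
        return num / (da * db) if da and db else 0.0

    def same_meeting(stream_a, stream_b, threshold=0.8):
        """True when the two streams' energy envelopes co-vary strongly."""
        return normalized_correlation(energy_envelope(stream_a),
                                      energy_envelope(stream_b)) >= threshold
    ```

    A production system would also have to handle clock offsets between devices; this sketch assumes the streams are already roughly aligned.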
  • Publication number: 20220230642
    Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.
    Type: Application
    Filed: April 4, 2022
    Publication date: July 21, 2022
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Patent number: 11322148
    Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: May 3, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Publication number: 20220050922
    Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
    Type: Application
    Filed: October 28, 2021
    Publication date: February 17, 2022
    Inventors: Yun-Cheng Ju, Ashwarya Poddar, Royi Ronen, Oron Nir, Ami Turgman, Andreas Stolcke, Edan Hauon
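    The text half of the scrubbing claim can be sketched as a key-phrase pass over the speech-recognition output: spans matching a set of patterns that signal identifying information are replaced with different text. The patterns and placeholder below are illustrative assumptions; the claims additionally replace the aligned audio segment with different audio, which is omitted here.

    ```python
    import re

    # Hypothetical key patterns signalling identifying information.
    KEY_PATTERNS = [
        r"\bmy name is\s+\w+(\s+\w+)?",   # self-introduction phrase
        r"\b\d{3}-\d{2}-\d{4}\b",          # SSN-like digit pattern
        r"\b\d{16}\b",                     # bare card-number-like digits
    ]

    def scrub_text(transcript, replacement="[REDACTED]"):
        """Replace any span matching a key pattern with placeholder text."""
        for pat in KEY_PATTERNS:
            transcript = re.sub(pat, replacement, transcript,
                                flags=re.IGNORECASE)
        return transcript
    ```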
  • Publication number: 20210407516
    Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.
    Type: Application
    Filed: September 13, 2021
    Publication date: December 30, 2021
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
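    The "fixed number of separate output audio channels" idea in this abstract can be sketched without the neural separation model itself: once speech segments have been separated, they are routed onto a fixed channel count so that no two overlapping segments share a channel. The greedy assignment below is an illustrative stand-in, not the patented routing.

    ```python
    # Hypothetical sketch: route separated (start, end) speech segments onto a
    # fixed number of output channels so no channel carries overlapped speech.

    def route_segments(segments, num_channels=2):
        """Greedy assignment of time segments to channels.

        Returns a list of (start, end, channel) triples."""
        channel_free_at = [0.0] * num_channels  # time each channel frees up
        routed = []
        for start, end in sorted(segments):
            for ch in range(num_channels):
                if channel_free_at[ch] <= start:
                    channel_free_at[ch] = end
                    routed.append((start, end, ch))
                    break
            else:
                raise ValueError("more simultaneous speakers than channels")
        return routed
    ```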
  • Patent number: 11182504
    Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: November 23, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yun-Cheng Ju, Ashwarya Poddar, Royi Ronen, Oron Nir, Ami Turgman, Andreas Stolcke, Edan Hauon
  • Patent number: 11138980
    Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 5, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Patent number: 11062706
    Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: July 13, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yun-Cheng Ju, Ashwarya Poddar, Royi Ronen, Oron Nir, Ami Turgman, Andreas Stolcke, Edan Hauon
  • Patent number: 11023690
    Abstract: Systems and methods for providing customized output based on a user preference in a distributed system are provided. In example embodiments, a meeting server or system receives audio streams from a plurality of distributed devices involved in an intelligent meeting. The meeting system identifies a user corresponding to a distributed device of the plurality of distributed devices and determines a preferred language of the user. A transcript from the received audio streams is generated. The meeting system translates the transcript into the preferred language of the user to form a translated transcript. The translated transcript is provided to the distributed device of the user.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: June 1, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
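    The per-user delivery flow in this abstract (device → user → preferred language → translated transcript) can be sketched with stubbed lookups. The mappings and the toy phrase table below are invented for illustration; a real system would call a translation service.

    ```python
    # Hypothetical device/user/language mappings (illustrative only).
    DEVICE_TO_USER = {"dev-1": "alice", "dev-2": "bob"}
    USER_LANGUAGE = {"alice": "en", "bob": "de"}

    # Toy phrase table standing in for a real translation service.
    PHRASES = {("hello", "de"): "hallo"}

    def transcript_for_device(device_id, transcript):
        """Translate the meeting transcript into the device owner's
        preferred language before delivery."""
        user = DEVICE_TO_USER[device_id]
        lang = USER_LANGUAGE[user]
        return " ".join(PHRASES.get((w, lang), w) for w in transcript.split())
    ```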
  • Publication number: 20200349953
    Abstract: A computer implemented method includes receiving information streams on a meeting server from a set of multiple distributed devices included in a meeting, receiving audio signals representative of speech by at least two users in at least two of the information streams, receiving at least one video signal of at least one user in the information streams, associating a specific user with speech in the received audio signals as a function of the received audio and video signals, and generating a transcript of the meeting with an indication of the specific user associated with the speech.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, William Isaac Hinthorn, Xuedong Huang
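    The audio-video association step here can be illustrated with a simple fusion rule: attribute a speech segment to the user whose visual speaking activity (from the video signal) overlaps it the most. The segment representation and rule below are an assumed simplification of the claimed method.

    ```python
    # Hypothetical sketch: attribute a speech segment to the user whose
    # video-derived speaking activity overlaps it most in time.

    def overlap(a, b):
        """Length of the intersection of two (start, end) intervals."""
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    def attribute_speaker(speech_segment, video_activity):
        """video_activity: dict user -> list of (start, end) spans where the
        video shows that user speaking. Returns the best-matching user."""
        best_user, best_total = None, 0.0
        for user, spans in video_activity.items():
            total = sum(overlap(speech_segment, s) for s in spans)
            if total > best_total:
                best_user, best_total = user, total
        return best_user
    ```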
  • Publication number: 20200349950
    Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Publication number: 20200349230
    Abstract: Systems and methods for providing customized output based on a user preference in a distributed system are provided. In example embodiments, a meeting server or system receives audio streams from a plurality of distributed devices involved in an intelligent meeting. The meeting system identifies a user corresponding to a distributed device of the plurality of distributed devices and determines a preferred language of the user. A transcript from the received audio streams is generated. The meeting system translates the transcript into the preferred language of the user to form a translated transcript. The translated transcript is provided to the distributed device of the user.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Publication number: 20200351603
    Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker attributed meeting transcript.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
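    The channel-selection step in this abstract can be sketched by ranking microphone channels and keeping the top two as primary and secondary. Here a simple frame-energy ranking stands in for the claimed speech unmixing model and direction-of-arrival localization; the function name and criterion are illustrative assumptions.

    ```python
    # Hypothetical sketch: pick a primary and secondary microphone channel by
    # ranking channels on frame energy (a stand-in for the unmixing model).

    def select_two_channels(channel_frames):
        """channel_frames: dict channel_id -> list of samples.

        Returns (primary, secondary): the two highest-energy channel ids."""
        energy = {ch: sum(s * s for s in frames)
                  for ch, frames in channel_frames.items()}
        ranked = sorted(energy, key=energy.get, reverse=True)
        return ranked[0], ranked[1]
    ```

    The two selected channels would then be sent to the meeting server for speaker-attributed transcription, as the abstract describes.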
  • Publication number: 20200349949
    Abstract: A computer implemented method includes receiving audio streams at a meeting server from two distributed devices that are streaming audio captured during an ad-hoc meeting between at least two users, comparing the received audio streams to determine that the received audio streams are representative of sound from the ad-hoc meeting, generating a meeting instance to process the audio streams in response to the comparing determining that the audio streams are representative of sound from the ad-hoc meeting, and processing the received audio streams to generate a transcript of the ad-hoc meeting.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Publication number: 20200349954
    Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Publication number: 20200342138
    Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
    Type: Application
    Filed: April 29, 2019
    Publication date: October 29, 2020
    Inventors: Yun-Cheng Ju, Ashwarya Poddar, Royi Ronen, Oron Nir, Ami Turgman, Andreas Stolcke, Edan Hauon
  • Publication number: 20200342860
    Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
    Type: Application
    Filed: April 29, 2019
    Publication date: October 29, 2020
    Inventors: Yun-Cheng Ju, Ashwarya Poddar, Royi Ronen, Oron Nir, Ami Turgman, Andreas Stolcke, Edan Hauon
  • Patent number: 10812921
    Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker attributed meeting transcript.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 20, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
  • Patent number: 10743107
    Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio channels transmitted from corresponding multiple distributed devices, designating one of the audio channels as a reference channel, determining, for each of the remaining audio channels, a difference in time from the reference channel, and correcting each remaining audio channel by compensating for the corresponding difference in time from the reference channel.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: August 11, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
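    The reference-channel alignment this abstract describes is essentially lag estimation plus compensation. A minimal sketch, assuming integer-sample lags and a small search window: find the lag that maximizes cross-correlation against the reference channel, then shift the channel by that lag.

    ```python
    # Hypothetical sketch: estimate each channel's time offset from the
    # reference channel by cross-correlation, then compensate for it.

    def best_lag(reference, channel, max_lag=8):
        """Lag (in samples) at which `channel` best matches `reference`."""
        def corr(lag):
            return sum(reference[i] * channel[i + lag]
                       for i in range(len(reference))
                       if 0 <= i + lag < len(channel))
        return max(range(-max_lag, max_lag + 1), key=corr)

    def compensate(channel, lag):
        """Shift `channel` by `lag` samples, zero-padding the tail/head."""
        if lag > 0:
            return channel[lag:] + [0] * lag
        return [0] * (-lag) + channel[:len(channel) + lag]
    ```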
  • Patent number: 10529321
    Abstract: Prosodic features are used for discriminating computer-directed speech from human-directed speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of preceding utterances or utterances following the current utterance. Outside data may be used for training lexical addressee detection systems for the human-human-computer (H-H-C) scenario. H-C training data can be obtained from a single-user H-C collection, and H-H speech can be modeled using general conversational speech. H-C and H-H language models may also be adapted using interpolation with small amounts of matched H-H-C data.
    Type: Grant
    Filed: August 7, 2017
    Date of Patent: January 7, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tur, Larry Heck, Heeyoung Lee
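    The scoring idea in this abstract can be illustrated with a tiny linear model over prosodic summary features, producing a score for "computer-directed" without looking at neighboring utterances. The feature names, weights, and bias below are invented for illustration and do not come from the patent's trained models.

    ```python
    import math

    # Illustrative weights over prosodic summary features (not from the patent).
    WEIGHTS = {"pause_ratio": -2.0, "pitch_range": 1.5, "speech_rate": -0.5}
    BIAS = 0.2

    def computer_directed_score(features):
        """Sigmoid score in (0, 1); higher means more likely the utterance
        was addressed to the computer rather than to another human."""
        z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
        return 1.0 / (1.0 + math.exp(-z))
    ```

    With the signs assumed above, more pausing pushes the score toward human-directed, while a wider pitch range (e.g. hyperarticulated commands) pushes it toward computer-directed.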