Patents by Inventor Xuedong Huang

Xuedong Huang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200351603
    Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker attributed meeting transcript.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
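
A rough sketch of the channel-selection idea in the entry above is shown below, assuming a simple per-channel energy heuristic in place of the trained speech unmixing model the patent describes; the function names, frame size, and channel layout are illustrative assumptions, not the patented method.

```python
# Hypothetical sketch: pick a primary and a secondary channel from a
# multi-microphone capture. The actual method uses a trained speech unmixing
# model; this stand-in simply ranks channels by mean short-term speech energy.
import numpy as np

def select_primary_secondary(channels: np.ndarray, frame: int = 1024):
    """channels: (num_mics, num_samples) array of time-domain audio."""
    num_mics, num_samples = channels.shape
    usable = (num_samples // frame) * frame
    frames = channels[:, :usable].reshape(num_mics, -1, frame)
    # Per-channel speech-activity proxy: mean frame RMS energy.
    energy = np.sqrt((frames ** 2).mean(axis=2)).mean(axis=1)
    order = np.argsort(energy)[::-1]
    primary, secondary = int(order[0]), int(order[1])
    # The two selected channels are what would be sent to the meeting server.
    return primary, secondary, channels[[primary, secondary]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mics = rng.normal(scale=[[0.1], [0.5], [0.2], [0.05]], size=(4, 16000))
    p, s, payload = select_primary_secondary(mics)
    print(f"primary=mic{p}, secondary=mic{s}, payload shape={payload.shape}")
```
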
  • Publication number: 20200349954
    Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
  • Publication number: 20200349949
    Abstract: A computer implemented method includes receiving audio streams at a meeting server from two distributed devices that are streaming audio captured during an ad-hoc meeting between at least two users, comparing the received audio streams to determine that the received audio streams are representative of sound from the ad-hoc meeting, generating a meeting instance to process the audio streams in response to the comparing determining that the audio streams are representative of sound from the ad-hoc meeting, and processing the received audio streams to generate a transcript of the ad-hoc meeting.
    Type: Application
    Filed: April 30, 2019
    Publication date: November 5, 2020
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
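
As a loose illustration of the stream-comparison step in the entry above, the sketch below correlates coarse energy envelopes from two device streams to decide whether they captured the same ad-hoc meeting; the envelope features and the 0.7 threshold are assumptions for illustration, not the patented comparison.

```python
# Hypothetical sketch: two streams that recorded the same room should have
# similar loudness envelopes; unrelated streams should not.
import numpy as np

def energy_envelope(audio: np.ndarray, hop: int = 800) -> np.ndarray:
    usable = (len(audio) // hop) * hop
    frames = audio[:usable].reshape(-1, hop)
    return np.sqrt((frames ** 2).mean(axis=1))

def same_meeting(stream_a: np.ndarray, stream_b: np.ndarray,
                 threshold: float = 0.7) -> bool:
    env_a, env_b = energy_envelope(stream_a), energy_envelope(stream_b)
    n = min(len(env_a), len(env_b))
    env_a = env_a[:n] - env_a[:n].mean()
    env_b = env_b[:n] - env_b[:n].mean()
    denom = float(np.linalg.norm(env_a) * np.linalg.norm(env_b))
    correlation = float(env_a @ env_b) / denom if denom else 0.0
    return correlation >= threshold

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(80000)
    room = np.sin(2 * np.pi * t / 8000) * rng.normal(size=80000)   # shared scene
    a = room + 0.2 * rng.normal(size=80000)                        # device 1
    b = 0.8 * room + 0.2 * rng.normal(size=80000)                  # device 2
    other = rng.normal(size=80000)                                 # unrelated room
    print("same meeting (a, b):    ", same_meeting(a, b))
    print("same meeting (a, other):", same_meeting(a, other))
```
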
  • Patent number: 10817678
    Abstract: Systems and methods may be used to provide transcription and translation services. A method may include initializing a plurality of user devices with respective language output selections in a translation group by receiving a shared identifier from the plurality of user devices and transcribing the audio stream to transcribed text. The method may include translating the transcribed text to one or more of the respective language output selections when an original language of the transcribed text differs from the one or more of the respective language output selections. The method may include sending, to a user device in the translation group, the transcribed text including translated text in a language corresponding to the respective language output selection for the user device. In an example, the method may include customizing the transcription or the translation, such as to a particular topic, location, user, or the like.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: October 27, 2020
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: William D. Lewis, Ivo José Garcia Dos Santos, Tanvi Surti, Arul A. Menezes, Olivier Nano, Christian Wendt, Xuedong Huang
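
The group-translation flow described in the entry above can be sketched roughly as below: devices join a translation group under a shared identifier with a preferred output language, and transcribed text is translated only when its language differs from a device's selection. The TranslationGroup class and the translate stub are hypothetical stand-ins for whatever services implement the method.

```python
# Hypothetical sketch of the translation-group routing step.
from dataclasses import dataclass, field

def translate(text: str, source: str, target: str) -> str:
    # Placeholder for a real machine-translation call.
    return f"[{source}->{target}] {text}"

@dataclass
class TranslationGroup:
    shared_id: str
    device_languages: dict[str, str] = field(default_factory=dict)

    def join(self, device_id: str, language: str) -> None:
        self.device_languages[device_id] = language

    def deliver(self, transcribed_text: str, original_language: str) -> dict[str, str]:
        out = {}
        for device_id, language in self.device_languages.items():
            if language == original_language:
                out[device_id] = transcribed_text    # no translation needed
            else:
                out[device_id] = translate(transcribed_text, original_language, language)
        return out

if __name__ == "__main__":
    group = TranslationGroup(shared_id="meeting-1234")
    group.join("phone-a", "en")
    group.join("phone-b", "de")
    print(group.deliver("Good morning, everyone.", original_language="en"))
```
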
  • Patent number: 10812921
    Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker attributed meeting transcript.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 20, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
  • Publication number: 20200327148
    Abstract: Systems and methods for enhanced content capture on a computing device are presented. In operation, a user interaction is detected on a computing device with the intent to capture content to a content store associated with the computer user operating the computing device. A content capture service is executed to capture content to the content store, comprising the following: applications executing on the computing device are notified to suspend output to display views corresponding to the applications; content to be captured to the content store is identified and obtained; the applications executing on the computing device are notified to resume output to display views; and the obtained content is automatically stored in the content store associated with the computer user.
    Type: Application
    Filed: June 26, 2020
    Publication date: October 15, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Madhur Dixit, Chinmay Vaishampayan, Justin Varacheril George, Nirav Ashwin Kamdar, Deepak Achuthan Menon, Srinivasa V. Thirumalai-Anandanpillai, Ramindar Singh Khatra, Xuedong Huang, Akshad Viswanathan
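
A minimal sketch of the suspend/capture/resume sequence from the entry above, assuming toy Application and ContentStore stand-ins for the platform services the abstract leaves unspecified:

```python
# Hypothetical sketch of the capture flow: notify apps to pause display output,
# obtain the content, resume the apps, then store the capture for the user.
class Application:
    """Toy stand-in for an application whose display output can be paused."""
    def __init__(self, name: str):
        self.name = name
        self.suspended = False

    def suspend_display(self) -> None:
        self.suspended = True

    def resume_display(self) -> None:
        self.suspended = False

class ContentStore:
    """Toy stand-in for the user's content store."""
    def __init__(self):
        self.items = []

    def save(self, content) -> None:
        self.items.append(content)

def capture_content(apps, store: ContentStore, identify_content) -> None:
    for app in apps:                  # 1. notify apps to suspend display views
        app.suspend_display()
    try:
        content = identify_content()  # 2. identify and obtain the content
    finally:
        for app in apps:              # 3. notify apps to resume display views
            app.resume_display()
    store.save(content)               # 4. automatically store for the user

if __name__ == "__main__":
    apps = [Application("editor"), Application("browser")]
    store = ContentStore()
    capture_content(apps, store, identify_content=lambda: b"captured-screen-bytes")
    print(store.items, [a.suspended for a in apps])
```
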
  • Patent number: 10743107
    Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio channels transmitted from corresponding multiple distributed devices, designating one of the audio channels as a reference channel, and, for each of the remaining audio channels, determining a difference in time from the reference channel and correcting each remaining audio channel by compensating for the corresponding difference in time from the reference channel.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: August 11, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
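
One plausible reading of the time-compensation step in the entry above is a cross-correlation delay estimate against the reference channel, sketched below; the correlation approach, sample counts, and circular shift are illustrative assumptions rather than the patented method.

```python
# Hypothetical sketch: estimate each distributed channel's delay relative to a
# reference channel and shift it to compensate.
import numpy as np

def estimate_delay(reference: np.ndarray, channel: np.ndarray) -> int:
    """Return the lag (in samples) by which `channel` trails `reference`."""
    corr = np.correlate(channel, reference, mode="full")
    return int(np.argmax(corr) - (len(reference) - 1))

def align_to_reference(channels: list[np.ndarray], ref_index: int = 0) -> list[np.ndarray]:
    reference = channels[ref_index]
    aligned = []
    for i, channel in enumerate(channels):
        delay = 0 if i == ref_index else estimate_delay(reference, channel)
        aligned.append(np.roll(channel, -delay))  # compensate the estimated offset
    return aligned

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    source = rng.normal(size=8000)
    late = np.roll(source, 120)                   # device audio arriving late
    aligned = align_to_reference([source, late])
    print("estimated delay:", estimate_delay(source, late), "samples")
```
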
  • Patent number: 10606989
    Abstract: Methods, apparatuses, computer program products, devices and systems are described that carry out accessing at least one persona that includes a unique identifier that is at least partly based on a first user's device-identifier data and the first user's network-participation data; verifying the persona by comparing the first user's device-identifier data and the first user's network-participation data of the unique identifier to a second user's device-identifier data and the second user's network-participation data; and presenting the persona in response to a request for personal information.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: March 31, 2020
    Assignee: Elwha LLC
    Inventors: Marc E. Davis, Matthew G. Dyor, William Gates, Xuedong Huang, Roderick A. Hyde, Edward K. Y. Jung, Jordin T. Kare, Royce A. Levien, Richard T. Lord, Robert W. Lord, Qi Lu, Mark A. Malamud, Nathan P. Myhrvold, Satya Nadella, Daniel Reed, Harry Shum, Clarence T. Tegreene, Lowell L. Wood, Jr.
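
A toy sketch of the verification step in the entry above, assuming the unique identifier is a hash over the device-identifier and network-participation fields; the hashing scheme and field names are illustrative, not taken from the patent.

```python
# Hypothetical sketch: a persona's unique identifier is derived from the first
# user's device-identifier and network-participation data; verification compares
# that identifier against identifiers built from a second user's data.
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    device_id: str            # device-identifier data
    network_handle: str       # network-participation data

    @property
    def unique_identifier(self) -> str:
        material = f"{self.device_id}|{self.network_handle}".encode()
        return hashlib.sha256(material).hexdigest()

def verify_persona(persona: Persona, claimed_device_id: str,
                   claimed_network_handle: str) -> bool:
    claimed = Persona(claimed_device_id, claimed_network_handle)
    return persona.unique_identifier == claimed.unique_identifier

if __name__ == "__main__":
    persona = Persona(device_id="IMEI-0001", network_handle="@alice")
    print(verify_persona(persona, "IMEI-0001", "@alice"))   # True
    print(verify_persona(persona, "IMEI-9999", "@alice"))   # False
```
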
  • Publication number: 20200082824
    Abstract: Systems, methods, and computer-readable storage devices are disclosed for generating smart notes for a meeting based on participant actions and machine learning. One method including: receiving meeting data from a plurality of participant devices participating in an online meeting; continuously generating text data based on the received audio data from each participant device of the plurality of participant devices; iteratively performing the following steps until receipt of meeting data for the meeting has ended, the steps including: receiving an indication that a predefined action has occurred on a first participant device; generating a participant segment of the meeting data for at least the first participant device from a first predetermined time before when the predefined action occurred to when the predefined action occurred; determining whether receipt of meeting data for the meeting has ended; and generating a summary of the meeting.
    Type: Application
    Filed: November 13, 2019
    Publication date: March 12, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Heiko RAHMEL, Li-Juan QIN, Xuedong HUANG, Wei XIONG
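
The participant-segment step in the entry above can be illustrated with a small sketch that keeps the window of meeting data from a fixed lead time before a predefined action up to the action itself; the 30-second window and record format are assumptions for illustration.

```python
# Hypothetical sketch: extract the participant segment that precedes a
# predefined action (e.g. a "take note" tap) during an online meeting.
from dataclasses import dataclass

@dataclass
class MeetingEvent:
    timestamp: float   # seconds from meeting start
    text: str          # continuously generated transcript text

def participant_segment(events: list[MeetingEvent], action_time: float,
                        lead_seconds: float = 30.0) -> list[MeetingEvent]:
    start = action_time - lead_seconds
    return [e for e in events if start <= e.timestamp <= action_time]

if __name__ == "__main__":
    transcript = [MeetingEvent(t, f"utterance at {t:.0f}s") for t in range(0, 300, 10)]
    segment = participant_segment(transcript, action_time=125.0)
    print([e.text for e in segment])
```
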
  • Publication number: 20200034437
    Abstract: Systems and methods may be used to provide transcription and translation services. A method may include initializing a plurality of user devices with respective language output selections in a translation group by receiving a shared identifier from the plurality of user devices and transcribing the audio stream to transcribed text. The method may include translating the transcribed text to one or more of the respective language output selections when an original language of the transcribed text differs from the one or more of the respective language output selections. The method may include sending, to a user device in the translation group, the transcribed text including translated text in a language corresponding to the respective language output selection for the user device. In an example, the method may include customizing the transcription or the translation, such as to a particular topic, location, user, or the like.
    Type: Application
    Filed: August 5, 2019
    Publication date: January 30, 2020
    Inventors: William D. Lewis, Ivo José Garcia Dos Santos, Tanvi Surti, Arul A. Menezes, Olivier Nano, Christian Wendt, Xuedong Huang
  • Patent number: 10546306
    Abstract: Methods, apparatuses, computer program products, devices and systems are described that carry out accepting at least one persona from a party to a transaction; evaluating the transaction; and negotiating receipt of at least one different persona from the party to the transaction at least partly based on an evaluation of the transaction.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: January 28, 2020
    Assignee: Elwha LLC
    Inventors: Marc E. Davis, Matthew G. Dyor, William Gates, Xuedong Huang, Roderick A. Hyde, Edward K. Y. Jung, Jordin T. Kare, Royce A. Levien, Richard T. Lord, Robert W. Lord, Qi Lu, Mark A. Malamud, Nathan P. Myhrvold, Satya Nadella, Daniel Reed, Harry Shum, Clarence T. Tegreene, Lowell L. Wood, Jr.
  • Patent number: 10546295
    Abstract: Methods, apparatuses, computer program products, devices and systems are described that carry out accepting at least one request for personal information from a party to a transaction; evaluating the transaction; and negotiating presentation of at least one persona to the party to the transaction at least partly based on an evaluation of the transaction.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: January 28, 2020
    Assignee: Elwha LLC
    Inventors: Marc E. Davis, Matthew G. Dyor, William Gates, Xuedong Huang, Roderick A. Hyde, Edward K. Y. Jung, Jordin T. Kare, Royce A. Levien, Richard T. Lord, Robert W. Lord, Qi Lu, Mark A. Malamud, Nathan P. Myhrvold, Satya Nadella, Daniel Reed, Harry Shum, Clarence T. Tegreene, Lowell L. Wood, Jr.
  • Patent number: 10523618
    Abstract: Methods, apparatuses, computer program products, devices and systems are described that carry out accepting at least one email communication from at least one member of a network; disambiguating the at least one search term including associating the at least one search term with at least one of network-participation identifier data or device-identifier data; and presenting the sender profile in association with the at least one email communication.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: December 31, 2019
    Assignee: Elwha LLC
    Inventors: Marc E. Davis, Matthew G. Dyor, William Gates, Xuedong Huang, Roderick A. Hyde, Edward K. Y. Jung, Jordin T. Kare, Royce A. Levien, Richard T. Lord, Robert W. Lord, Qi Lu, Mark A. Malamud, Nathan P. Myhrvold, Satya Nadella, Daniel Reed, Harry Shum, Clarence T. Tegreene, Lowell L. Wood, Jr.
  • Patent number: 10510346
    Abstract: Systems, methods, and computer-readable storage devices are disclosed for generating smart notes for a meeting based on participant actions and machine learning. One method including: receiving meeting data from a plurality of participant devices participating in an online meeting; continuously generating text data based on the received audio data from each participant device of the plurality of participant devices; iteratively performing the following steps until receipt of meeting data for the meeting has ended, the steps including: receiving an indication that a predefined action has occurred on a first participant device; generating a participant segment of the meeting data for at least the first participant device from a first predetermined time before when the predefined action occurred to when the predefined action occurred; determining whether receipt of meeting data for the meeting has ended; and generating a summary of the meeting.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: December 17, 2019
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Heiko Rahmel, Li-Juan Qin, Xuedong Huang, Wei Xiong
  • Publication number: 20190377733
    Abstract: Systems, methods, and computer-readable storage media are provided for conducting searches utilizing search navigation patterns. Search queries are received that include search terms that are of a particular type. It is recognized that at least one prior search session has been conducted that included a search query having search terms of an equivalent or similar type and followed a particular navigation pattern. Such prior search(es) may have been conducted by the user or by a different user and/or may have a navigation pattern that was affirmatively recorded by the requesting user or that was recorded by the system without explicit contemporaneous user instruction to do so. Upon identifying the navigation pattern associated with the prior search, the system effectively conducts a search session following the navigation pattern.
    Type: Application
    Filed: June 24, 2019
    Publication date: December 12, 2019
    Inventors: Anoop GUPTA, Xuedong HUANG
  • Publication number: 20190341050
    Abstract: A method for facilitating a remote conference includes receiving a digital video and a computer-readable audio signal. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant.
    Type: Application
    Filed: June 29, 2018
    Publication date: November 7, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Adi DIAMANT, Karen MASTER BEN-DOR, Eyal KRUPKA, Raz HALALY, Yoni SMOLIN, Ilya GURVICH, Aviv HURVITZ, Lijuan QIN, Wei XIONG, Shixiong ZHANG, Lingfeng WU, Xiong XIAO, Ido LEICHTER, Moshe DAVID, Xuedong HUANG, Amit Kumar AGARWAL
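
A simplified sketch of the attribution step in the entry above: the speech recognizer yields timestamped text, the face recognizer yields who was visible speaking at those times, and the transcript pairs them. The data structures and nearest-timestamp matching below are illustrative stand-ins for the face, speech, attribution, and transcription machines named in the abstract.

```python
# Hypothetical sketch: attribute recognized utterances to recognized participants
# and assemble a speaker-attributed transcript.
from dataclasses import dataclass

@dataclass
class RecognizedUtterance:
    start: float
    end: float
    text: str

def active_speaker(face_track: list[tuple[float, str]], t: float) -> str:
    """face_track: (timestamp, participant) samples from the face recognizer."""
    best = min(face_track, key=lambda sample: abs(sample[0] - t))
    return best[1]

def build_transcript(utterances: list[RecognizedUtterance],
                     face_track: list[tuple[float, str]]) -> list[str]:
    lines = []
    for u in utterances:
        speaker = active_speaker(face_track, (u.start + u.end) / 2)
        lines.append(f"{speaker}: {u.text}")
    return lines

if __name__ == "__main__":
    utterances = [RecognizedUtterance(0.0, 2.1, "Shall we start?"),
                  RecognizedUtterance(2.5, 5.0, "Yes, the agenda is ready.")]
    face_track = [(0.5, "Participant 1"), (3.0, "Participant 2")]
    print("\n".join(build_transcript(utterances, face_track)))
```
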
  • Patent number: 10460727
    Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
    Type: Grant
    Filed: May 23, 2017
    Date of Patent: October 29, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: James Droppo, Xuedong Huang, Dong Yu
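
An utterance-level permutation invariant training criterion, as described in the entry above, can be sketched as follows: the separated outputs are scored against the reference sources under every speaker permutation, and the lowest-error permutation defines the loss for the whole utterance. The mean-squared-error criterion and array shapes are illustrative assumptions.

```python
# Hypothetical sketch of an utterance-level permutation invariant training loss.
from itertools import permutations
import numpy as np

def pit_mse_loss(estimates: np.ndarray, references: np.ndarray) -> tuple[float, tuple]:
    """estimates, references: (num_speakers, num_samples) arrays."""
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_speakers)):
        # Utterance-level criterion: mean squared error under this assignment.
        loss = float(np.mean((estimates[list(perm)] - references) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    sources = rng.normal(size=(2, 16000))
    estimates = sources[::-1] + 0.05 * rng.normal(size=(2, 16000))  # swapped order
    loss, perm = pit_mse_loss(estimates, sources)
    print(f"best permutation {perm}, loss {loss:.4f}")
```
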
  • Patent number: 10417349
    Abstract: Systems and methods may be used to provide transcription and translation services. A method may include initializing a plurality of user devices with respective language output selections in a translation group by receiving a shared identifier from the plurality of user devices and transcribing the audio stream to transcribed text. The method may include translating the transcribed text to one or more of the respective language output selections when an original language of the transcribed text differs from the one or more of the respective language output selections. The method may include sending, to a user device in the translation group, the transcribed text including translated text in a language corresponding to the respective language output selection for the user device. In an example, the method may include customizing the transcription or the translation, such as to a particular topic, location, user, or the like.
    Type: Grant
    Filed: June 14, 2017
    Date of Patent: September 17, 2019
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: William D Lewis, Ivo José Garcia dos Santos, Tanvi Surti, Arul A Menezes, Olivier Nano, Christian Wendt, Xuedong Huang
  • Publication number: 20190236416
    Abstract: In some embodiments, the disclosed subject matter involves a system and method relating to using an ambient capture device including a fisheye camera and a microphone array to capture audio and video in an environment, for use in an artificial intelligence (AI) application. The device with fisheye camera may provide an approximately 360° audio and video view, at relatively low cost. An embodiment may utilize a speech and vision fusion model component. The speech and vision fusion model may be trained using deep learning to combine features from many different sources, including available sensor data from the capture device. A long short term memory (LSTM) model may infer or identify features such as, but not limited to: audio direction; vision detection and tracking; voice signature; facial signature; gesture recognition; and object identification. The fusion processing may be performed by a cloud server, enabling the capture device to remain less complex.
    Type: Application
    Filed: January 31, 2018
    Publication date: August 1, 2019
    Inventors: Zhenghao Wang, Xuedong Huang, Lijuan Qin, Kun Wu, Huaming Wang
  • Patent number: 10331686
    Abstract: Systems, methods, and computer-readable storage media are provided for conducting searches utilizing search navigation patterns. Search queries are received that include search terms that are of a particular type. It is recognized that at least one prior search session has been conducted that included a search query having search terms of an equivalent or similar type and followed a particular navigation pattern. Such prior search(es) may have been conducted by the user or by a different user and/or may have a navigation pattern that was affirmatively recorded by the requesting user or that was recorded by the system without explicit contemporaneous user instruction to do so. Upon identifying the navigation pattern associated with the prior search, the system effectively conducts a search session following the navigation pattern.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: June 25, 2019
    Assignee: MICROSOFT CORPORATION
    Inventors: Anoop Gupta, Xuedong Huang