Patents by Inventor Guoli YE

Guoli YE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11798535
    Abstract: Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word, a microphone to capture audio, and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
    Type: Grant
    Filed: September 14, 2021
    Date of Patent: October 24, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Emilian Stoimenov, Rui Zhao, Kaustubh Prakash Kalgaonkar, Ivaylo Andreanov Enchev, Khuram Shahid, Anthony Phillip Stark, Guoli Ye, Mahadevan Srinivasan, Yifan Gong, Hosam Adel Khalil
  • Patent number: 11790891
    Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: October 17, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
  • Publication number: 20230186919
    Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.
    Type: Application
    Filed: February 10, 2023
    Publication date: June 15, 2023
    Inventors: Guoli YE, Yan HUANG, Wenning WEI, Lei HE, Eva SHARMA, Jian WU, Yao TIAN, Edward C. LIN, Yifan GONG, Rui ZHAO, Jinyu LI, William Maxwell GALE
  • Patent number: 11587569
    Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.
    Type: Grant
    Filed: May 14, 2020
    Date of Patent: February 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Guoli Ye, Yan Huang, Wenning Wei, Lei He, Eva Sharma, Jian Wu, Yao Tian, Edward C. Lin, Yifan Gong, Rui Zhao, Jinyu Li, William Maxwell Gale
  • Publication number: 20220254334
    Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
    Type: Application
    Filed: December 1, 2021
    Publication date: August 11, 2022
    Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
  • Patent number: 11222622
    Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: January 11, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
  • Publication number: 20210407498
    Abstract: Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word, a microphone to capture audio, and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
    Type: Application
    Filed: September 14, 2021
    Publication date: December 30, 2021
    Inventors: Emilian Stoimenov, Rui Zhao, Kaustubh Prakash Kalgaonkar, Ivaylo Andreanov Enchev, Khuram Shahid, Anthony Phillip Stark, Guoli Ye, Mahadevan Srinivasan, Yifan Gong, Hosam Adel Khalil
  • Publication number: 20210304769
    Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.
    Type: Application
    Filed: May 14, 2020
    Publication date: September 30, 2021
    Inventors: Guoli Ye, Yan Huang, Wenning Wei, Lei He, Eva Sharma, Jian Wu, Yao Tian, Edward C. Lin, Yifan Gong, Rui Zhao, Jinyu Li, William Maxwell Gale
  • Patent number: 11132992
    Abstract: Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word, a microphone to capture audio, and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: September 28, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Emilian Stoimenov, Rui Zhao, Kaustubh Prakash Kalgaonkar, Ivaylo Andreanov Enchev, Khuram Shahid, Anthony Phillip Stark, Guoli Ye, Mahadevan Srinivasan, Yifan Gong, Hosam Adel Khalil
  • Patent number: 10964309
    Abstract: A CS CTC model may be initialed from a major language CTC model by keeping network hidden weights and replacing output tokens with a union of major and secondary language output tokens. The initialized model may be trained by updating parameters with training data from both languages, and a LID model may also be trained with the data. During a decoding process for each of a series of audio frames, if silence dominates a current frame then a silence output token may be emitted. If silence does not dominate the frame, then a major language output token posterior vector from the CS CTC model may be multiplied with the LID major language probability to create a probability vector from the major language. A similar step is performed for the secondary language, and the system may emit an output token associated with the highest probability across all tokens from both languages.
    Type: Grant
    Filed: May 13, 2019
    Date of Patent: March 30, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong, Ke Li
  • Publication number: 20200349924
    Abstract: Generally discussed herein are devices, systems, and methods for custom wake word selection assistance. A method can include receiving, at a device, data indicating a custom wake word provided by a user, determining one or more characteristics of the custom wake word, determining that use of the custom wake word will cause more than a threshold rate of false detections based on the characteristics, rejecting the custom wake word as the wake word for accessing a personal assistant in response to determining that use of the custom wake word will cause more than a threshold rate of false detections, and setting the custom wake word as the wake word in response to determining that use of the custom wake word will not cause more than the threshold rate of false detections.
    Type: Application
    Filed: July 25, 2019
    Publication date: November 5, 2020
    Inventors: Emilian Stoimenov, Khuram Shahid, Guoli Ye, Hosam Adel Khalil, Yifan Gong
  • Publication number: 20200349927
    Abstract: Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word, a microphone to capture audio, and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
    Type: Application
    Filed: July 25, 2019
    Publication date: November 5, 2020
    Inventors: Emilian Stoimenov, Rui Zhao, Kaustubh Prakash Kalgaonkar, Ivaylo Andreanov Enchev, Khuram Shahid, Anthony Phillip Stark, Guoli Ye, Mahadevan Srinivasan, Yifan Gong, Hosam Adel Khalil
  • Publication number: 20200335082
    Abstract: A CS CTC model may be initialed from a major language CTC model by keeping network hidden weights and replacing output tokens with a union of major and secondary language output tokens. The initialized model may be trained by updating parameters with training data from both languages, and a LID model may also be trained with the data. During a decoding process for each of a series of audio frames, if silence dominates a current frame then a silence output token may be emitted. If silence does not dominate the frame, then a major language output token posterior vector from the CS CTC model may be multiplied with the LID major language probability to create a probability vector from the major language. A similar step is performed for the secondary language, and the system may emit an output token associated with the highest probability across all tokens from both languages.
    Type: Application
    Filed: May 13, 2019
    Publication date: October 22, 2020
    Inventors: Jinyu LI, Guoli YE, Rui ZHAO, Yifan GONG, Ke LI
  • Patent number: 10629193
    Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and an output of character-based auxiliary model is consulted at a segment where the word-based model emits an OOV token. In another example, a mixed unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
    Type: Grant
    Filed: March 9, 2018
    Date of Patent: April 21, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Guoli Ye, James Droppo, Jinyu Li, Rui Zhao, Yifan Gong
  • Publication number: 20190279614
    Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and an output of character-based auxiliary model is consulted at a segment where the word-based model emits an OOV token. In another example, a mixed unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
    Type: Application
    Filed: March 9, 2018
    Publication date: September 12, 2019
    Inventors: Guoli YE, James DROPPO, Jinyu LI, Rui ZHAO, Yifan GONG