Patents by Inventor Kaisheng Yao

Kaisheng Yao has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11727914
    Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
    Type: Grant
    Filed: December 24, 2021
    Date of Patent: August 15, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
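The flow claimed above — transcribe speech, annotate its acoustics, and feed both to an intent model — can be sketched minimally as follows. This is an illustrative toy, not the patented implementation: the function names, the keyword-based "intent model", and the punctuation-based "acoustic" cue are all assumptions.

```python
def transcribe(speech_input: str) -> str:
    """Stand-in for speech-to-text; here the 'audio' is already words."""
    return speech_input.lower()

def annotate_acoustics(speech_input: str) -> dict:
    """Stand-in acoustic annotator: flags emphasis via an exclamation mark."""
    return {"emphatic": speech_input.endswith("!")}

def recognize_intent(speech_input: str) -> str:
    """Apply a toy intent model to both the text results and the
    acoustic feature annotations, as the abstract describes."""
    text = transcribe(speech_input)
    acoustics = annotate_acoustics(speech_input)
    if "lights" in text:
        # Acoustic emphasis disambiguates urgency for identical words.
        return "lights_on_urgent" if acoustics["emphatic"] else "lights_on"
    return "unknown"

print(recognize_intent("Turn on the lights!"))  # lights_on_urgent
print(recognize_intent("Turn on the lights"))   # lights_on
```

The point of the sketch is the claim's two parallel inputs: the same word sequence yields different intents once acoustic annotations are consulted.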
  • Patent number: 11386166
    Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: July 12, 2022
    Assignee: Advanced New Technologies Co., Ltd.
    Inventors: Kaisheng Yao, Peng Xu, Yuan Qi, Xiaofu Chang
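The storage-and-calling method above can be sketched as a small store that keeps the association between first motion data and business data, then answers second motion data with the business data of the best-matching stored motion. The Euclidean nearest-neighbour match, the tolerance threshold, and all names here are assumptions for illustration; the patent does not specify the matching rule.

```python
import math

class MotionStore:
    def __init__(self):
        self._records = []  # list of (motion_vector, business_data)

    def store(self, motion, business_data):
        """Establish and keep the association between motion and business data."""
        self._records.append((tuple(motion), business_data))

    def call(self, motion, tolerance=1.0):
        """Return the business data whose stored motion best matches `motion`,
        or None when nothing is close enough."""
        best, best_dist = None, float("inf")
        for stored, data in self._records:
            dist = math.dist(stored, motion)
            if dist < best_dist:
                best, best_dist = data, dist
        return best if best_dist <= tolerance else None

store = MotionStore()
store.store([0.0, 1.0, 0.0], {"action": "pay"})
store.store([1.0, 0.0, 0.0], {"action": "cancel"})
print(store.call([0.1, 0.9, 0.0]))  # {'action': 'pay'}
```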
  • Patent number: 11334632
    Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
    Type: Grant
    Filed: January 28, 2020
    Date of Patent: May 17, 2022
    Assignee: Advanced New Technologies Co., Ltd.
    Inventors: Kaisheng Yao, Peng Xu, Yuan Qi, Xiaofu Chang
  • Publication number: 20220122580
    Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
    Type: Application
    Filed: December 24, 2021
    Publication date: April 21, 2022
    Inventors: Pei ZHAO, Kaisheng YAO, Max LEUNG, Bo YAN, Jian LUAN, Yu SHI, Malone MA, Mei-Yuh HWANG
  • Patent number: 11244689
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
    Type: Grant
    Filed: March 22, 2021
    Date of Patent: February 8, 2022
    Assignee: ALIPAY (HANGZHOU) INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
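The joint objective in the abstract — a non-sampling classification loss plus a Gaussian mixture loss with a non-unit covariance — can be illustrated numerically. This sketch simplifies to a single Gaussian component with a diagonal (non-unit) covariance and weights the two terms equally; those simplifications, the shapes, and all names are assumptions, not the patented formulation.

```python
import numpy as np

def softmax_loss(logits, label):
    """Non-sampling cross-entropy: normalizes over all classes rather
    than a sampled subset of negatives."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def gm_loss(embedding, mean, var):
    """Negative log-density under one Gaussian component whose diagonal
    covariance `var` is not the identity (the non-unit-covariance part)."""
    return 0.5 * np.sum((embedding - mean) ** 2 / var + np.log(var))

embedding = np.array([0.5, -0.2])     # speaker embedding from the model
logits = np.array([2.0, 0.1, -1.0])   # speaker-classification scores
mean = np.array([0.4, 0.0])           # component mean
var = np.array([0.5, 2.0])            # non-unit covariance diagonal

# Jointly minimized objective: first loss + second loss.
joint = softmax_loss(logits, 0) + gm_loss(embedding, mean, var)
print(float(joint))
```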
  • Patent number: 11238842
    Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
    Type: Grant
    Filed: June 7, 2017
    Date of Patent: February 1, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
  • Patent number: 11100412
    Abstract: Implementations of the present specification provide a method and an apparatus for extending question and answer samples. According to the method, a random number is generated for each existing sample, and the question of any sample whose random number falls within the sample-extension range is blurred to generate an extended sample, so that the overall sample blurring extension rate can be effectively controlled. In addition, for a sample needing blurring extension, the question is extended by deleting words with a predetermined part of speech, and an extended sample is then generated from the extended question, so that more ways of expressing a question are accommodated. A question and answer model trained on the sample set augmented with these extended samples can thus provide answers to users more effectively.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: August 24, 2021
    Assignee: Advanced New Technologies Co., Ltd.
    Inventors: Kaisheng Yao, Jiaxing Zhang, Jia Liu, Xiaolong Li
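The extension scheme above can be sketched in a few lines: each sample draws a random number, only draws below the extension rate trigger blurring, and blurring deletes words of a predetermined part of speech from the question. The word-list stand-in for a POS tagger and the default 30% rate are illustrative assumptions.

```python
import random

# Stand-in for "words with a predetermined part of speech" (no real POS tagger).
BLUR_POS_WORDS = {"please", "exactly", "really"}

def extend_samples(samples, extension_rate=0.3, rng=None):
    """Return extended (question, answer) pairs for randomly selected samples."""
    rng = rng or random.Random(0)
    extended = []
    for question, answer in samples:
        if rng.random() < extension_rate:   # controls the overall blur rate
            kept = [w for w in question.split()
                    if w.lower() not in BLUR_POS_WORDS]
            blurred = " ".join(kept)
            if blurred != question:         # only keep genuinely new questions
                extended.append((blurred, answer))
    return extended

print(extend_samples([("please tell me the time", "noon")], extension_rate=1.0))
```

Raising or lowering `extension_rate` directly controls how many extended samples join the training set, which is the "effectively controlled" rate the abstract refers to.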
  • Publication number: 20210225357
    Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
    Type: Application
    Filed: June 7, 2017
    Publication date: July 22, 2021
    Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Pei ZHAO, Kaisheng YAO, Max LEUNG, Bo YAN, Jian LUAN, Yu SHI, Malone MA, Mei-Yuh HWANG
  • Publication number: 20210210101
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
    Type: Application
    Filed: March 22, 2021
    Publication date: July 8, 2021
    Inventors: Zhiming WANG, Kaisheng YAO, Xiaolong LI
  • Patent number: 11031018
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for personalized speaker verification are provided. One of the methods includes: obtaining first speech data of a speaker as a positive sample and second speech data of an entity different from the speaker as a negative sample; feeding the positive sample and the negative sample to a first model for determining voice characteristics to correspondingly output a positive voice characteristic and a negative voice characteristic of the speaker; obtaining a gradient based at least on the positive voice characteristic and the negative voice characteristic; and feeding the gradient to the first model to update one or more parameters of the first model to obtain a second model for personalized speaker verification.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: June 8, 2021
    Assignee: ALIPAY (HANGZHOU) INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
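The personalization loop above — score a positive sample (the speaker) and a negative sample (another entity) with a first model, compute a gradient from the two voice characteristics, and apply it to obtain a second, personalized model — can be sketched with a toy linear model. The linear map, the margin-free loss, and the step size are all assumptions for illustration.

```python
import numpy as np

def characteristics(w, x):
    """First model: a linear map from speech features to a voice score."""
    return w @ x

def personalize(w, positive, negative, lr=0.1):
    """One gradient step that raises the positive score and lowers the
    negative one; the updated parameters are the second model."""
    # Loss = -(w @ positive) + (w @ negative); its gradient wrt w is below.
    grad = -positive + negative
    return w - lr * grad

w0 = np.zeros(3)                   # first model's parameters
pos = np.array([1.0, 0.0, 1.0])    # speaker's speech features
neg = np.array([0.0, 1.0, 0.0])    # different entity's features
w1 = personalize(w0, pos, neg)     # second (personalized) model

print(characteristics(w1, pos) > characteristics(w1, neg))  # True
```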
  • Patent number: 10997980
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: May 4, 2021
    Assignee: ALIPAY (HANGZHOU) INFORMATION TECHNOLOGY CO., LTD.
    Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
  • Publication number: 20210110833
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for personalized speaker verification are provided. One of the methods includes: obtaining first speech data of a speaker as a positive sample and second speech data of an entity different from the speaker as a negative sample; feeding the positive sample and the negative sample to a first model for determining voice characteristics to correspondingly output a positive voice characteristic and a negative voice characteristic of the speaker; obtaining a gradient based at least on the positive voice characteristic and the negative voice characteristic; and feeding the gradient to the first model to update one or more parameters of the first model to obtain a second model for personalized speaker verification.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 15, 2021
    Inventors: Zhiming WANG, Kaisheng YAO, Xiaolong LI
  • Publication number: 20210043216
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
    Type: Application
    Filed: October 27, 2020
    Publication date: February 11, 2021
    Inventors: Zhiming WANG, Kaisheng YAO, Xiaolong LI
  • Publication number: 20210027177
    Abstract: Implementations of the present specification provide a method and an apparatus for extending question and answer samples. According to the method, a random number is generated for each existing sample, and the question of any sample whose random number falls within the sample-extension range is blurred to generate an extended sample, so that the overall sample blurring extension rate can be effectively controlled. In addition, for a sample needing blurring extension, the question is extended by deleting words with a predetermined part of speech, and an extended sample is then generated from the extended question, so that more ways of expressing a question are accommodated. A question and answer model trained on the sample set augmented with these extended samples can thus provide answers to users more effectively.
    Type: Application
    Filed: March 13, 2020
    Publication date: January 28, 2021
    Inventors: Kaisheng Yao, Jiaxing Zhang, Jia Liu, Xiaolong Li
  • Patent number: 10867597
    Abstract: Technologies pertaining to slot filling are described herein. A deep neural network, a recurrent neural network, and/or a spatio-temporally deep neural network are configured to assign labels to words in a word sequence set forth in natural language. At least one label is a semantic label that is assigned to at least one word in the word sequence.
    Type: Grant
    Filed: September 2, 2013
    Date of Patent: December 15, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Anoop Deoras, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, Gregoire Mesnil
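The slot-filling task described above assigns a label, at least one of them semantic, to every word in a natural-language sequence. The patent uses deep and recurrent networks for this; the dictionary-based tagger below, its IOB-style labels, and the slot names are purely illustrative stand-ins showing the input/output contract.

```python
# Hypothetical slot lexicon standing in for a trained network's predictions.
SLOT_LEXICON = {
    "boston": "B-from_city",
    "denver": "B-to_city",
    "tuesday": "B-depart_date",
}

def label_sequence(words):
    """Return one label per word: a semantic slot label or 'O' (outside)."""
    return [SLOT_LEXICON.get(w.lower(), "O") for w in words]

sentence = "flights from Boston to Denver on Tuesday".split()
print(list(zip(sentence, label_sequence(sentence))))
```

A trained RNN replaces the lexicon lookup and can use context (e.g. the preceding "from" or "to") to choose between slot labels for the same city name, which a lookup table cannot.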
  • Publication number: 20200167388
    Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
    Type: Application
    Filed: January 28, 2020
    Publication date: May 28, 2020
    Inventors: Kaisheng YAO, Peng Xu, Yuan Qi, Xiaofu Chang
  • Publication number: 20190294632
    Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
    Type: Application
    Filed: June 12, 2019
    Publication date: September 26, 2019
    Inventors: Kaisheng YAO, Peng Xu, Yuan Qi, Xiaofu Chang
  • Patent number: 10127901
    Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determine the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converting to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
    Type: Grant
    Filed: June 13, 2014
    Date of Patent: November 13, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
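The module arrangement above — several analysis RNNs each producing properties of the input text, combined by a hyper-structure module into a generation sequence — can be sketched structurally. Every module here is a trivial stand-in for an RNN, and all function names and the toy per-word rules are assumptions.

```python
def pos_module(text):
    """Stand-in part-of-speech module (capitalization as a fake POS cue)."""
    return [("NN" if w[0].isupper() else "XX") for w in text.split()]

def letter_to_sound_module(text):
    """Stand-in letter-to-sound module (lowercased words as fake phonemes)."""
    return [w.lower() for w in text.split()]

def prosody_module(text):
    """Stand-in linguistic prosody tagger."""
    return ["STRESS" if w.endswith("!") else "NEUTRAL" for w in text.split()]

def hyper_structure(pos, sounds, prosody):
    """Combine the per-word outputs of the other modules into one
    generation sequence for the speech synthesizer."""
    return list(zip(sounds, pos, prosody))

text = "Hello world"
seq = hyper_structure(pos_module(text), letter_to_sound_module(text),
                      prosody_module(text))
print(seq)
```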
  • Patent number: 10089974
    Abstract: An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: October 2, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan
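The training-pair comparison above — one pronunciation sequence derived from speech, one from text, and their difference driving the conversion model — can be sketched as follows. Representing pronunciations as phoneme lists and taking the per-position mismatch set as the "difference" is an illustrative assumption; the patent does not fix the representation.

```python
def sequence_difference(speech_seq, text_seq):
    """Positions where the speech-derived and text-derived phonemes disagree."""
    return [(i, s, t)
            for i, (s, t) in enumerate(zip(speech_seq, text_seq))
            if s != t]

speech_pron = ["T", "AH", "M", "EY", "T", "OW"]   # as actually spoken
text_pron   = ["T", "AH", "M", "AA", "T", "OW"]   # as predicted from text
print(sequence_difference(speech_pron, text_pron))  # [(3, 'EY', 'AA')]
```

Each mismatch like `(3, 'EY', 'AA')` is a data point a conversion model could learn from, mapping text-predicted pronunciations toward spoken ones.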
  • Publication number: 20170287465
    Abstract: An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
    Type: Application
    Filed: March 31, 2016
    Publication date: October 5, 2017
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan