Patents by Inventor Kaisheng Yao
Kaisheng Yao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11727914
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
Type: Grant
Filed: December 24, 2021
Date of Patent: August 15, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
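The first half of this abstract describes applying an intent model jointly to the text results and the acoustic feature annotations derived from the same speech input. A minimal sketch of that fusion idea follows; every feature, threshold, and rule here is invented for illustration, since the patent does not disclose a concrete implementation.

```python
# Hypothetical sketch: fuse text results with acoustic feature annotations
# before applying an intent model. All names and thresholds are invented.

def extract_text_features(text: str) -> dict:
    """Toy 'text results': a bag of lowercase words."""
    return {"words": set(text.lower().split())}

def extract_acoustic_annotations(pitch_hz: float, energy: float) -> dict:
    """Toy 'acoustic feature annotations': flag emphatic delivery."""
    return {"emphatic": pitch_hz > 220.0 and energy > 0.7}

def intent_model(text_feats: dict, acoustic_feats: dict) -> str:
    """Toy intent model applied to both feature sets jointly."""
    if "stop" in text_feats["words"]:
        # The same words yield a stronger intent when acoustics show emphasis.
        return "urgent_stop" if acoustic_feats["emphatic"] else "stop"
    return "unknown"

intent = intent_model(extract_text_features("please stop"),
                      extract_acoustic_annotations(pitch_hz=260.0, energy=0.9))
print(intent)  # urgent_stop
```

The point of the sketch is only the data flow: the acoustic channel can change the recognized intent even when the recognized words are identical.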
-
Patent number: 11386166
Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
Type: Grant
Filed: June 12, 2019
Date of Patent: July 12, 2022
Assignee: Advanced New Technologies Co., Ltd.
Inventors: Kaisheng Yao, Peng Xu, Yuan Qi, Xiaofu Chang
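The store-then-call flow in this abstract can be sketched in a few lines: associate "first motion data" with business data, then resolve incoming "second motion data" to the closest stored motion data and return its associated business data. The Euclidean nearest-neighbor matcher and the distance threshold below are assumptions, not the patent's matching method.

```python
# Hypothetical sketch of the store/call flow from the abstract.
import math

class MotionDataStore:
    def __init__(self):
        self._assoc = []  # (motion data, business data) association pairs

    def store(self, motion, business):
        """Establish and persist the motion-data -> business-data association."""
        self._assoc.append((motion, business))

    def call(self, query, max_dist=1.0):
        """Match second motion data to stored first motion data; return its business data."""
        best, best_dist = None, math.inf
        for motion, business in self._assoc:
            dist = math.dist(motion, query)  # Euclidean distance as a toy matcher
            if dist < best_dist:
                best, best_dist = business, dist
        return best if best_dist <= max_dist else None

store = MotionDataStore()
store.store((0.1, 0.9, 0.3), {"order_id": "A-1"})
store.store((0.8, 0.2, 0.5), {"order_id": "B-2"})
print(store.call((0.15, 0.85, 0.3)))  # {'order_id': 'A-1'}
```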
-
Patent number: 11334632
Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
Type: Grant
Filed: January 28, 2020
Date of Patent: May 17, 2022
Assignee: Advanced New Technologies Co., Ltd.
Inventors: Kaisheng Yao, Peng Xu, Yuan Qi, Xiaofu Chang
-
Publication number: 20220122580
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
Type: Application
Filed: December 24, 2021
Publication date: April 21, 2022
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
-
Patent number: 11244689
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
Type: Grant
Filed: March 22, 2021
Date of Patent: February 8, 2022
Assignee: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
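The joint objective in this abstract pairs a non-sampling-based loss (i.e., one computed over all classes rather than over sampled negatives, such as a full-softmax cross-entropy) with a Gaussian mixture loss whose covariance is not the identity. The numpy sketch below shows the shape of such a joint objective under heavy simplifications: a single shared diagonal covariance stands in for the "non-unit multi-variant covariance matrix", and the 0.1 mixing weight is invented.

```python
# Simplified sketch of jointly minimized losses: full-softmax cross-entropy
# (non-sampling-based) plus a Gaussian loss with non-unit diagonal covariance.
import numpy as np

def softmax_ce(logits, label):
    """Non-sampling-based loss: cross-entropy over ALL classes (no negative sampling)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def gaussian_mixture_loss(embedding, means, var, label):
    """Negative log-likelihood of the embedding under its speaker's Gaussian,
    with a non-unit diagonal covariance `var` (shared across speakers here)."""
    diff = embedding - means[label]
    return 0.5 * np.sum(diff * diff / var + np.log(var))

rng = np.random.default_rng(0)
emb = rng.normal(size=4)               # speaker embedding from the model
logits = rng.normal(size=3)            # classification logits over 3 speakers
means = rng.normal(size=(3, 4))        # per-speaker Gaussian means
var = np.array([0.5, 2.0, 1.5, 0.8])   # non-unit diagonal covariance

joint = softmax_ce(logits, label=1) + 0.1 * gaussian_mixture_loss(emb, means, var, label=1)
print(float(joint))
```

In training, both terms would be minimized together with respect to the model parameters; here the values are only evaluated once to show how the terms combine.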
-
Patent number: 11238842
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
Type: Grant
Filed: June 7, 2017
Date of Patent: February 1, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
-
Patent number: 11100412
Abstract: Implementations of the present specification provide a method and an apparatus for extending question-and-answer samples. According to the method, a random number is generated for each existing sample, and the question of any sample whose random number falls within the sample-extension range is blurred to generate an extended sample, so that the overall sample-blurring extension rate can be effectively controlled. In addition, for a sample that needs blurring extension, the question is extended by deleting a word with a predetermined part of speech from the question, and an extended sample is then generated based on the extended question, so that more ways of expressing a question are accommodated. A question-and-answer model trained on a sample set to which the extended samples are added can therefore provide answers to users more effectively.
Type: Grant
Filed: March 13, 2020
Date of Patent: August 24, 2021
Assignee: Advanced New Technologies Co., Ltd.
Inventors: Kaisheng Yao, Jiaxing Zhang, Jia Liu, Xiaolong Li
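The two mechanisms in this abstract, a per-sample random number that controls the overall extension rate, and question "blurring" by deleting words of a predetermined part of speech, can be sketched directly. The tiny adverb lexicon and the 0.5 default rate below are invented; a real system would use a POS tagger and a tuned rate.

```python
# Illustrative sketch of the sample-extension scheme described in the abstract.
import random

ADVERBS = {"really", "quickly", "exactly"}  # toy "predetermined part of speech"

def blur_question(question):
    """Blur a question by deleting words of the predetermined part of speech."""
    kept = [w for w in question.split() if w.lower() not in ADVERBS]
    return " ".join(kept)

def extend_samples(samples, extension_rate=0.5, seed=7):
    """Generate a random number per sample; blur only those within the rate."""
    rng = random.Random(seed)
    extended = list(samples)
    for question, answer in samples:
        if rng.random() < extension_rate:      # sample-extension random number
            blurred = blur_question(question)
            if blurred != question:            # only keep genuinely new phrasings
                extended.append((blurred, answer))
    return extended

samples = [("How exactly do I reset my password", "Use the reset link"),
           ("Where is my order", "Check the orders page")]
print(extend_samples(samples))
```

Because the random draw gates which samples are blurred, lowering `extension_rate` directly lowers the fraction of the set that gets extended, which is the control property the abstract emphasizes.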
-
Publication number: 20210225357
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
Type: Application
Filed: June 7, 2017
Publication date: July 22, 2021
Applicant: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan, Jian Luan, Yu Shi, Malone Ma, Mei-Yuh Hwang
-
Publication number: 20210210101
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
Type: Application
Filed: March 22, 2021
Publication date: July 8, 2021
Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
-
Patent number: 11031018
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for personalized speaker verification are provided. One of the methods includes: obtaining first speech data of a speaker as a positive sample and second speech data of an entity different from the speaker as a negative sample; feeding the positive sample and the negative sample to a first model for determining voice characteristics to correspondingly output a positive voice characteristic and a negative voice characteristic of the speaker; obtaining a gradient based at least on the positive voice characteristic and the negative voice characteristic; and feeding the gradient to the first model to update one or more parameters of the first model to obtain a second model for personalized speaker verification.
Type: Grant
Filed: December 22, 2020
Date of Patent: June 8, 2021
Assignee: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
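The personalization loop here (run a positive and a negative sample through a first model, derive a gradient from the two voice characteristics, and update the model's parameters to obtain a second model) can be illustrated numerically. The linear model and the hinge-margin objective below are substitutions of my own, not the patent's actual formulation.

```python
# Toy sketch: personalize a "first model" into a "second model" by feeding back
# a gradient computed from positive and negative voice characteristics.
import numpy as np

def first_model(speech, w):
    """'First model': map raw speech features to a scalar voice characteristic."""
    return w @ speech

def personalize(w, positive, negative, lr=0.1, steps=100):
    """Update parameters until the positive sample outscores the negative one."""
    w = w.copy()
    for _ in range(steps):
        pos_char = first_model(positive, w)   # positive voice characteristic
        neg_char = first_model(negative, w)   # negative voice characteristic
        if pos_char - neg_char < 1.0:         # desired margin not yet reached
            grad = negative - positive        # hinge-loss gradient w.r.t. w
            w -= lr * grad                    # feed the gradient back to the model
    return w                                  # the personalized "second model"

rng = np.random.default_rng(1)
w0 = rng.normal(size=5)
pos = rng.normal(size=5)   # speech data of the target speaker
neg = rng.normal(size=5)   # speech data of a different entity
w1 = personalize(w0, pos, neg)
print(first_model(pos, w1) - first_model(neg, w1))
```

Each update can only widen the gap between the two characteristics, which is the essential effect the abstract describes: the second model separates this particular speaker from other entities better than the generic first model does.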
-
Patent number: 10997980
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
Type: Grant
Filed: October 27, 2020
Date of Patent: May 4, 2021
Assignee: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
-
Publication number: 20210110833
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for personalized speaker verification are provided. One of the methods includes: obtaining first speech data of a speaker as a positive sample and second speech data of an entity different from the speaker as a negative sample; feeding the positive sample and the negative sample to a first model for determining voice characteristics to correspondingly output a positive voice characteristic and a negative voice characteristic of the speaker; obtaining a gradient based at least on the positive voice characteristic and the negative voice characteristic; and feeding the gradient to the first model to update one or more parameters of the first model to obtain a second model for personalized speaker verification.
Type: Application
Filed: December 22, 2020
Publication date: April 15, 2021
Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
-
Publication number: 20210043216
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining voice characteristics are provided. One of the methods includes: obtaining speech data of a speaker; inputting the speech data into a model trained at least by jointly minimizing a first loss function and a second loss function, wherein the first loss function comprises a non-sampling-based loss function and the second loss function comprises a Gaussian mixture loss function with non-unit multi-variant covariance matrix; and obtaining from the trained model one or more voice characteristics of the speaker.
Type: Application
Filed: October 27, 2020
Publication date: February 11, 2021
Inventors: Zhiming Wang, Kaisheng Yao, Xiaolong Li
-
Publication number: 20210027177
Abstract: Implementations of the present specification provide a method and an apparatus for extending question-and-answer samples. According to the method, a random number is generated for each existing sample, and the question of any sample whose random number falls within the sample-extension range is blurred to generate an extended sample, so that the overall sample-blurring extension rate can be effectively controlled. In addition, for a sample that needs blurring extension, the question is extended by deleting a word with a predetermined part of speech from the question, and an extended sample is then generated based on the extended question, so that more ways of expressing a question are accommodated. A question-and-answer model trained on a sample set to which the extended samples are added can therefore provide answers to users more effectively.
Type: Application
Filed: March 13, 2020
Publication date: January 28, 2021
Inventors: Kaisheng Yao, Jiaxing Zhang, Jia Liu, Xiaolong Li
-
Patent number: 10867597
Abstract: Technologies pertaining to slot filling are described herein. A deep neural network, a recurrent neural network, and/or a spatio-temporally deep neural network are configured to assign labels to words in a word sequence set forth in natural language. At least one label is a semantic label that is assigned to at least one word in the word sequence.
Type: Grant
Filed: September 2, 2013
Date of Patent: December 15, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Anoop Deoras, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, Gregoire Mesnil
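Slot filling, as described here, assigns a semantic label to each word in a natural-language sequence. The sketch below shows the forward pass of a simple Elman-style recurrent tagger producing one label per time step; the vocabulary, label set, and random weights are toys standing in for a trained model.

```python
# Illustrative Elman-style RNN forward pass for slot filling: one semantic
# label per word. Weights are random, so the labels are arbitrary, but the
# per-time-step labeling structure matches the described approach.
import numpy as np

vocab = {"flights": 0, "to": 1, "boston": 2}
labels = ["O", "B-toloc", "I-toloc"]          # IOB-style slot labels

rng = np.random.default_rng(42)
E = rng.normal(scale=0.1, size=(len(vocab), 8))    # word embeddings
Wx = rng.normal(scale=0.1, size=(8, 8))            # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(8, 8))            # hidden-to-hidden weights
Wy = rng.normal(scale=0.1, size=(8, len(labels)))  # hidden-to-label weights

def tag(words):
    """Assign a label to every word, carrying hidden state across the sequence."""
    h = np.zeros(8)
    tags = []
    for w in words:
        h = np.tanh(E[vocab[w]] @ Wx + h @ Wh)     # recurrent state update
        tags.append(labels[int(np.argmax(h @ Wy))])
    return tags

print(tag(["flights", "to", "boston"]))  # one (untrained) label per word
```

Because the hidden state `h` carries over between words, the label for "boston" can depend on having just seen "to", which is what makes recurrent models a natural fit for slot filling.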
-
Publication number: 20200167388
Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
Type: Application
Filed: January 28, 2020
Publication date: May 28, 2020
Inventors: Kaisheng Yao, Peng Xu, Yuan Qi, Xiaofu Chang
-
Publication number: 20190294632
Abstract: Data storage and calling methods and devices are provided. One of the methods includes: receiving first motion data and business data; establishing an association relationship between the first motion data and the business data and storing the association relationship; receiving second motion data; and determining first motion data that matches the second motion data, and returning, to a sender of the second motion data, business data associated with the matched first motion data.
Type: Application
Filed: June 12, 2019
Publication date: September 26, 2019
Inventors: Kaisheng Yao, Peng Xu, Yuan Qi, Xiaofu Chang
-
Patent number: 10127901
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted into audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
Type: Grant
Filed: June 13, 2014
Date of Patent: November 13, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
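The module arrangement in this abstract, several per-aspect modules each annotating the text, with a hyper-structure step combining their outputs into one generation sequence, can be shown structurally. The hand-written rules below are stand-ins for the trained RNN modules; only the wiring is the point.

```python
# Structural sketch of the multi-module arrangement: per-aspect modules feed
# a combining "hyper-structure" step. Rules are toy stand-ins for trained RNNs.

def pos_module(words):
    return ["NOUN" if w[0].isupper() else "OTHER" for w in words]  # toy POS tags

def letter_to_sound_module(words):
    return [w.lower().replace("ph", "f") for w in words]           # toy phonetics

def prosody_module(words):
    return ["STRESS" if len(w) > 5 else "-" for w in words]        # toy prosody tags

def hyper_structure(words):
    """Combine all module outputs into one per-word generation sequence."""
    return list(zip(letter_to_sound_module(words),
                    pos_module(words),
                    prosody_module(words)))

print(hyper_structure(["Philip", "reads"]))
# [('filip', 'NOUN', 'STRESS'), ('reads', 'OTHER', '-')]
```

In the patented design each tuple position would come from a separate RNN, and a global optimization pass could still rewrite the combined sequence before synthesis.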
-
Patent number: 10089974
Abstract: An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
Type: Grant
Filed: March 31, 2016
Date of Patent: October 2, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan
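The comparison step described here, deriving one pronunciation sequence from speech and one from text, then building a conversion model from their difference, can be sketched in miniature. The position-by-position alignment and the phone strings below are invented simplifications; a real system would use a learned recognizer and proper edit alignment.

```python
# Minimal sketch: pronunciation sequence difference -> conversion model.
# Phones are shown in an ARPAbet-like toy notation; alignment is by position.

def sequence_difference(speech_seq, text_seq):
    """Positions where the speech-derived and text-derived phones disagree."""
    assert len(speech_seq) == len(text_seq), "sketch assumes equal-length sequences"
    return [(i, t, s) for i, (s, t) in enumerate(zip(speech_seq, text_seq)) if s != t]

def build_conversion_model(differences):
    """Toy 'conversion model': map each text phone to its spoken realization."""
    return {text_phone: speech_phone for _, text_phone, speech_phone in differences}

speech_seq = ["t", "ah", "m", "ey", "t", "ow"]   # as actually pronounced
text_seq   = ["t", "ah", "m", "aa", "t", "ow"]   # as predicted from the text
diff = sequence_difference(speech_seq, text_seq)
model = build_conversion_model(diff)
print(diff)    # [(3, 'aa', 'ey')]
print(model)   # {'aa': 'ey'}
```

Applied at synthesis time, such a model would rewrite the text-derived sequence toward the observed pronunciation, which is the role the abstract assigns to the pronunciation sequence conversion model.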
-
Publication number: 20170287465
Abstract: An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
Type: Application
Filed: March 31, 2016
Publication date: October 5, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Pei Zhao, Kaisheng Yao, Max Leung, Bo Yan