Patents by Inventor Xiaoyin FU
Xiaoyin FU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11900918
Abstract: The present disclosure provides a method for training a linguistic model, related to fields of speech, natural language processing, deep learning technologies. A method includes: obtaining grammars corresponding to a plurality of sample texts and a slot value of a slot in each grammar by using semantic analysis; generating a grammar graph corresponding to each grammar based on the corresponding grammar and the slot value of the slot in the corresponding grammar; obtaining a weight of each grammar, a weight of each slot, and a weight of each slot value in each grammar graph based on the sample texts; determining at least one grammar frequency of each order based on the weight of each grammar, the weight of each slot, and the weight of each slot value in each grammar graph; and training the linguistic model based on the at least one grammar frequency of each order.
Type: Grant
Filed: October 19, 2021
Date of Patent: February 13, 2024
Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Liao Zhang, Zhengxiang Jiang, Xiaoyin Fu
-
Publication number: 20230410794
Abstract: An audio recognition method, a method of training an audio recognition model, and an electronic device are provided, which relate to fields of artificial intelligence, speech recognition, deep learning and natural language processing technologies. The audio recognition method includes: truncating an audio feature of target audio data to obtain at least one first audio sequence feature corresponding to a predetermined duration; obtaining, according to a peak information of the audio feature, a peak sub-information corresponding to the first audio sequence feature; performing at least one decoding operation on the first audio sequence feature to obtain a recognition result for the first audio sequence feature, a number of times the decoding operation is performed being identical to a number of peaks corresponding to the first audio sequence feature; obtaining target text data for the target audio data according to the recognition result for the at least one first audio sequence feature.
Type: Application
Filed: August 25, 2023
Publication date: December 21, 2023
Inventors: Xiaoyin FU, Mingshun YANG, Qiguang ZANG, Zhijie CHEN, Yangkai XU, Guibin WANG, Lei JIA
-
Patent number: 11756529
Abstract: Proposed are a method and apparatus for speech recognition, and a storage medium. The specific solution includes: obtaining audio data to be recognized; decoding the audio data to obtain a first syllable of a to-be-converted word, in which the first syllable is a combination of at least one phoneme corresponding to the to-be-converted word; obtaining a sentence to which the to-be-converted word belongs and a converted word in the sentence, and obtaining a second syllable of the converted word; encoding the first syllable and the second syllable to generate first encoding information of the first syllable; and decoding the first encoding information to obtain a text corresponding to the to-be-converted word.
Type: Grant
Filed: December 16, 2020
Date of Patent: September 12, 2023
Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
Inventors: Liao Zhang, Xiaoyin Fu, Zhengxiang Jiang, Mingxin Liang, Junyao Shao, Qi Zhang, Zhijie Chen, Qiguang Zang
-
Publication number: 20230090590
Abstract: The present disclosure provides speech recognition and codec methods and apparatuses, an electronic device and a storage medium, and relates to the field of artificial intelligence such as intelligent speech, deep learning and natural language processing. The speech recognition method may include: acquiring an audio feature of to-be-recognized speech; encoding the audio feature to obtain an encoding feature; truncating the encoding feature to obtain continuous N feature fragments, N being a positive integer greater than one; and acquiring, for any one of the feature segments, corresponding historical feature abstraction information, encoding the feature segment in combination with the historical feature abstraction information, and decoding an encoding result to obtain a recognition result corresponding to the feature segment, wherein the historical feature abstraction information is information obtained by feature abstraction of recognized historical feature fragments.
Type: Application
Filed: May 6, 2022
Publication date: March 23, 2023
Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Xiaoyin FU, Zhijie CHEN, Mingxin LIANG, Mingshun YANG, Lei JIA, Haifeng WANG
-
Publication number: 20230058437
Abstract: The present disclosure provides a method for a human-computer interaction, an apparatus for a human-computer interaction, a device, and a storage medium, and the present disclosure relates to the field of artificial intelligence, such as deep learning and voice. A specific implementation includes: acquiring a voice command; performing voice recognition on the voice command to determine a corresponding voice text; sending, in response to satisfying a preset information sending condition, the voice text to a cloud; receiving a resource for the voice command returned from the cloud; and responding to the voice command based on the resource.
Type: Application
Filed: March 28, 2022
Publication date: February 23, 2023
Inventors: Zhen WU, Jiaxiang GE, Xiao WANG, Xianze SU, Bing LIU, Jiawei WANG, Dan WANG, Song YANG, Jinghao HAO, Yufang WU, Qin QU, Bingqi ZHANG, Xiaoyin FU, Siyuan WU, Chao LI, Cong GAO, Lei JIA
-
Publication number: 20220328040
Abstract: The present disclosure discloses a speech recognition method and apparatus, and relates to the field of speech and deep learning technologies. A specific implementation scheme involves: acquiring candidate recognition results with first N recognition scores outputted by a speech recognition model for to-be-recognized speech, N being a positive integer greater than 1; scoring the N candidate recognition results based on pronunciation similarities between candidate recognition results and pre-collected popular entities, to obtain similarity scores of the candidate recognition results; and integrating the recognition scores and the similarity scores of the candidate recognition results to determine a recognition result corresponding to the to-be-recognized speech from the N candidate recognition results. The present disclosure can improve recognition accuracy.
Type: Application
Filed: March 2, 2022
Publication date: October 13, 2022
Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Liao ZHANG, Yinlou ZHAO, Zhengxiang JIANG, Xiaoyin FU, Wei WEI
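Illustration for publication 20220328040: the rescoring idea in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method claimed in the application. The abstract does not say how pronunciation similarity is measured or how the two scores are integrated; here similarity is assumed to be a sequence-match ratio over syllable lists, and integration is assumed to be a weighted sum. The names `rescore`, `pronunciation_similarity`, and `weight` are hypothetical.

```python
from difflib import SequenceMatcher


def pronunciation_similarity(syllables_a, syllables_b):
    """Assumed similarity measure: match ratio between two syllable lists (0.0 to 1.0)."""
    return SequenceMatcher(None, syllables_a, syllables_b).ratio()


def rescore(candidates, popular_entities, weight=0.5):
    """Pick one of the N candidates by combining recognition and similarity scores.

    candidates: list of (text, syllables, recognition_score) tuples.
    popular_entities: list of syllable lists for pre-collected popular entities.
    weight: assumed interpolation weight given to the similarity score.
    """
    best_text, best_score = None, float("-inf")
    for text, syllables, recognition_score in candidates:
        # Similarity score: closest match against any popular entity.
        similarity = max(
            (pronunciation_similarity(syllables, e) for e in popular_entities),
            default=0.0,
        )
        combined = (1 - weight) * recognition_score + weight * similarity
        if combined > best_score:
            best_text, best_score = text, combined
    return best_text


# Toy data (made up): the second candidate has the higher recognition score,
# but the first matches a popular entity exactly.
candidates = [
    ("candidate A", ["zhou", "jie", "lun"], 0.40),
    ("candidate B", ["zhou", "jie", "long"], 0.45),
]
popular_entities = [["zhou", "jie", "lun"]]
chosen = rescore(candidates, popular_entities)
```

With `weight=0.0` the function falls back to the plain recognition score, which makes it easy to compare the result with and without the entity-based rescoring.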
-
Publication number: 20220310064
Abstract: A method for training a speech recognition model, a device and a storage medium, which relate to the field of computer technologies, and particularly to the fields of speech recognition technologies, deep learning technologies, or the like, are disclosed. The method for training a speech recognition model includes: obtaining a fusion probability of each of at least one candidate text corresponding to a speech based on an acoustic decoding model and a language model; selecting a preset number of one or more candidate texts based on the fusion probability of each of the at least one candidate text, and determining a predicted text based on the preset number of one or more candidate texts; and obtaining a loss function based on the predicted text and a standard text corresponding to the speech, and training the speech recognition model based on the loss function.
Type: Application
Filed: January 10, 2022
Publication date: September 29, 2022
Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Junyao SHAO, Xiaoyin FU, Qiguang ZANG, Zhijie CHEN, Mingxin LIANG, Huanxin ZHENG, Sheng QIAN
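Illustration for publication 20220310064: the candidate selection step in this abstract can be sketched as follows. The abstract does not give the fusion formula, so this sketch assumes a log-linear combination of the acoustic and language model log-probabilities (commonly called shallow fusion) and a simple top-k cut. The loss computation and training step are not shown. `fuse`, `select_candidates`, and `lm_weight` are hypothetical names.

```python
def fuse(acoustic_logp, lm_logp, lm_weight=0.3):
    """Assumed fusion rule: acoustic log-probability plus a weighted LM log-probability."""
    return acoustic_logp + lm_weight * lm_logp


def select_candidates(candidates, k, lm_weight=0.3):
    """Return the k candidates with the highest fused score, best first.

    candidates: list of (text, acoustic_logp, lm_logp) tuples.
    """
    scored = [(text, fuse(a, l, lm_weight)) for text, a, l in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy log-probabilities (made up for illustration).
candidates = [
    ("a", -1.0, -2.0),
    ("b", -1.2, -0.5),
    ("c", -3.0, -0.1),
]
top_two = select_candidates(candidates, k=2, lm_weight=0.5)
top_texts = [text for text, _ in top_two]
```

In this toy data candidate "a" has the best acoustic score alone, while the language model term moves "b" to the top once the two scores are fused.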
-
Publication number: 20220207427
Abstract: A method for training a data processing model includes: acquiring sample data; acquiring an initial data processing model, the initial data processing model including a plurality of forward nodes for outputting a plurality of intermediate results corresponding to the sample data; determining a plurality of time-dependent features corresponding to the plurality of forward nodes; acquiring a data processing model to be trained by processing the initial data processing model based on the plurality of time-dependent features; and training the data processing model to be trained using the sample data and the plurality of intermediate results.
Type: Application
Filed: March 17, 2022
Publication date: June 30, 2022
Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Yangkai Xu, Guibin Wang, Xiaoyin Fu, Zhijie Chen, Mingshun Yang, Shijun Cong, Ming Jia, Lei Jia
-
Publication number: 20220108684
Abstract: The present disclosure provides a method of recognizing speech offline, electronic device, and a storage medium, relating to a field of artificial intelligence such as speech recognition, natural language processing, and deep learning. The method may include: decoding speech data to be recognized into a syllable recognition result; transforming the syllable recognition result into a corresponding text as a speech recognition result of the speech data.
Type: Application
Filed: December 16, 2021
Publication date: April 7, 2022
Inventors: Xiaoyin FU, Mingxin LIANG, Zhijie CHEN, Qiguang ZANG, Zhengxiang JIANG, Liao ZHANG, Qi ZHANG, Lei JIA
-
Publication number: 20220036879
Abstract: A method for mining feature information, an apparatus for mining feature information and an electronic device are disclosed. The method includes: determining a usage scenario of a target device; obtaining raw audio data including real scenario data, speech synthesis data, recorded audio data and other media data; generating target audio data of the usage scenario by simulating the usage scenario based on the raw audio data; and obtaining feature information of the usage scenario by performing feature extraction on the target audio data.
Type: Application
Filed: October 13, 2021
Publication date: February 3, 2022
Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Jiaxiang GE, Zhen WU, Maoren ZHOU, Qiguang ZANG, Ming WEN, Xiaoyin FU
-
Publication number: 20220036880
Abstract: The present disclosure provides a method for training a linguistic model, related to fields of speech, natural language processing, deep learning technologies. A method includes: obtaining grammars corresponding to a plurality of sample texts and a slot value of a slot in each grammar by using semantic analysis; generating a grammar graph corresponding to each grammar based on the corresponding grammar and the slot value of the slot in the corresponding grammar; obtaining a weight of each grammar, a weight of each slot, and a weight of each slot value in each grammar graph based on the sample texts; determining at least one grammar frequency of each order based on the weight of each grammar, the weight of each slot, and the weight of each slot value in each grammar graph; and training the linguistic model based on the at least one grammar frequency of each order.
Type: Application
Filed: October 19, 2021
Publication date: February 3, 2022
Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.
Inventors: Liao Zhang, Zhengxiang Jiang, Xiaoyin Fu
-
Publication number: 20220028376
Abstract: The disclosure discloses a method for semantic recognition, an electronic device, and a storage medium. The detailed solution includes: obtaining a speech recognition result of a speech to be processed, in which the speech recognition result includes a newly added recognition result fragment and a historical recognition result fragment; obtaining a semantic vector of each historical object in the historical recognition result fragment, and obtaining a semantic vector of each newly added object by inputting the semantic vector of each historical object and each newly added object in the newly added recognition result fragment into a streaming semantic coding layer; and obtaining a semantic recognition result of the speech by inputting the semantic vector of each historical object and the semantic vector of each newly added object into a streaming semantic vector fusion layer and a semantic understanding multi-task layer sequentially arranged.
Type: Application
Filed: October 13, 2021
Publication date: January 27, 2022
Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
Inventors: Yufang WU, Qin QU, Qibo WANG, Chengjian MAN, Qiguang ZANG, Xiaoyin FU
-
Patent number: 11216615
Abstract: The disclosure provides a method, a device and a storage medium for predicting a punctuation in a text. The method includes: inputting a text to be predicted into a sequence tagging model to obtain at least one prediction result and a corresponding first score of each character in the text to be predicted; generating a text to be inputted corresponding to each of the at least one prediction result; obtaining a second score corresponding to each of the at least one prediction result; determining a punctuation existence situation of the corresponding character based on the first score and the second score corresponding to each of the at least one prediction result; and performing punctuation processing on the text to be predicted based on the punctuation existence situation of each character in the text to be predicted to obtain a punctuated text corresponding to the text to be predicted.
Type: Grant
Filed: September 29, 2020
Date of Patent: January 4, 2022
Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
Inventors: Mingxin Liang, Xiaoyin Fu
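Illustration for patent 11216615: the two-score decision described in this abstract can be sketched as below. This is a minimal sketch under several assumptions that the abstract does not confirm: the per-character predictions and first scores are taken as given input rather than produced by a real sequence tagging model, the second score comes from a caller-supplied scoring function over the candidate text, the two scores are combined by a weighted sum, and characters are decided greedily from left to right. `punctuate`, `second_scorer`, and `weight` are hypothetical names.

```python
def punctuate(text, predictions, second_scorer, weight=0.5):
    """Insert punctuation after each character using two scores per candidate.

    text: the unpunctuated input string.
    predictions: predictions[i] is a list of (mark, first_score) pairs for
        character i, where mark "" means "no punctuation here".
    second_scorer: callable mapping a candidate text to a score (assumed to
        stand in for whatever model produces the second score).
    weight: assumed interpolation weight given to the second score.
    """
    pieces = []
    for i, ch in enumerate(text):
        prefix = "".join(pieces)
        best_mark, best_score = "", float("-inf")
        for mark, first_score in predictions[i]:
            # Candidate text: decisions so far, this choice, then the raw remainder.
            candidate_text = prefix + ch + mark + text[i + 1:]
            score = (1 - weight) * first_score + weight * second_scorer(candidate_text)
            if score > best_score:
                best_mark, best_score = mark, score
        pieces.append(ch + best_mark)
    return "".join(pieces)


# Toy inputs (made up): the tagger slightly prefers no final period,
# while the second scorer rewards text that ends with one.
predictions = [
    [("", 0.9), (",", 0.1)],
    [("", 0.6), (".", 0.4)],
]
ends_with_period = lambda s: 1.0 if s.endswith(".") else 0.0
result = punctuate("ab", predictions, ends_with_period)
```

Setting `weight=0.0` reduces the function to the tagging model's own top choice per character, which shows what the second score contributes.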
-
Publication number: 20210375264
Abstract: Proposed are a method and apparatus for speech recognition, and a storage medium. The specific solution includes: obtaining audio data to be recognized; decoding the audio data to obtain a first syllable of a to-be-converted word, in which the first syllable is a combination of at least one phoneme corresponding to the to-be-converted word; obtaining a sentence to which the to-be-converted word belongs and a converted word in the sentence, and obtaining a second syllable of the converted word; encoding the first syllable and the second syllable to generate first encoding information of the first syllable; and decoding the first encoding information to obtain a text corresponding to the to-be-converted word.
Type: Application
Filed: December 16, 2020
Publication date: December 2, 2021
Inventors: Liao ZHANG, Xiaoyin FU, Zhengxiang JIANG, Mingxin LIANG, Junyao SHAO, Qi ZHANG, Zhijie CHEN, Qiguang ZANG
-
Publication number: 20210224480
Abstract: The disclosure provides a method, a device and a storage medium for predicting a punctuation in a text. The method includes: inputting a text to be predicted into a sequence tagging model to obtain at least one prediction result and a corresponding first score of each character in the text to be predicted; generating a text to be inputted corresponding to each of the at least one prediction result; obtaining a second score corresponding to each of the at least one prediction result; determining a punctuation existence situation of the corresponding character based on the first score and the second score corresponding to each of the at least one prediction result; and performing punctuation processing on the text to be predicted based on the punctuation existence situation of each character in the text to be predicted to obtain a punctuated text corresponding to the text to be predicted.
Type: Application
Filed: September 29, 2020
Publication date: July 22, 2021
Inventors: Mingxin Liang, Xiaoyin Fu
-
Patent number: 10854193
Abstract: Methods, apparatuses, devices and computer-readable storage media for real-time speech recognition are provided. The method includes: based on an input speech signal, obtaining truncating information for truncating a sequence of features of the speech signal; based on the truncating information, truncating the sequence of features into a plurality of subsequences; and for each subsequence in the plurality of subsequences, obtaining a real-time recognition result through attention mechanism.
Type: Grant
Filed: February 6, 2019
Date of Patent: December 1, 2020
Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
Inventors: Xiaoyin Fu, Jinfeng Bai, Zhijie Chen, Mingxin Liang, Xu Chen, Lei Jia
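Illustration for patent 10854193: the truncation step in this abstract can be sketched as below. This sketch assumes the truncating information has already been reduced to a sorted list of frame indices, and it leaves the attention model as a caller-supplied function; neither detail comes from the abstract. `truncate`, `recognize_streaming`, and `attend` are hypothetical names.

```python
def truncate(features, boundaries):
    """Split a feature sequence into subsequences at the given frame indices.

    boundaries is assumed to be sorted, with each index in (0, len(features)].
    """
    chunks, start = [], 0
    for end in boundaries:
        chunks.append(features[start:end])
        start = end
    if start < len(features):
        # Keep any frames after the last truncation point as a final chunk.
        chunks.append(features[start:])
    return chunks


def recognize_streaming(features, boundaries, attend):
    """Yield one result per subsequence as soon as that subsequence is processed.

    attend stands in for an attention-based recognizer applied to one chunk.
    """
    for chunk in truncate(features, boundaries):
        yield attend(chunk)


# Toy run: ten frames, cut after frames 3 and 7; len() stands in for the recognizer.
frames = list(range(10))
chunks = truncate(frames, [3, 7])
results = list(recognize_streaming(frames, [3, 7], len))
```

Because `recognize_streaming` is a generator, a caller can consume each partial result as it is produced instead of waiting for the whole utterance.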
-
Publication number: 20200219486
Abstract: Methods, apparatuses, devices and computer-readable storage media for real-time speech recognition are provided. The method includes: based on an input speech signal, obtaining truncating information for truncating a sequence of features of the speech signal; based on the truncating information, truncating the sequence of features into a plurality of subsequences; and for each subsequence in the plurality of subsequences, obtaining a real-time recognition result through attention mechanism.
Type: Application
Filed: February 6, 2019
Publication date: July 9, 2020
Inventors: Xiaoyin FU, Jinfeng BAI, Zhijie CHEN, Mingxin LIANG, Xu CHEN, Lei JIA