Search Patents
  • Patent number: 10482879
    Abstract: The present invention provides a wake-on-voice method and device. The method includes: obtaining a voice input from a user; processing data frames of the voice with a frame-skipping strategy and performing voice activity detection on the data frames with a time-domain energy algorithm; extracting an acoustic feature of the voice and performing voice recognition on the acoustic feature according to a preset recognition network and an acoustic model; and performing an operation corresponding to the voice if the voice matches a preset wake-up word in the preset recognition network.
    Type: Grant
    Filed: October 27, 2016
    Date of Patent: November 19, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventor: Liliang Tang
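A minimal sketch of the frame-skipping, time-domain-energy VAD described in the abstract above. The sample rate, frame/hop sizes, energy threshold, and skip factor are illustrative assumptions, not values from the patent, and a fixed threshold stands in for whatever decision rule the patented method uses.

```python
import numpy as np

def frame_energy_vad(samples, sample_rate=16000, frame_ms=25, hop_ms=10,
                     skip=2, energy_threshold=1e-3):
    """Flag frames as speech using short-time energy, evaluating only every
    `skip`-th frame (frame-skipping strategy); skipped frames inherit the
    decision of the last evaluated frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = max(0, 1 + (len(samples) - frame_len) // hop_len)
    decisions = np.zeros(n_frames, dtype=bool)
    last = False
    for i in range(n_frames):
        if i % skip == 0:  # only compute energy on the kept frames
            frame = samples[i * hop_len: i * hop_len + frame_len]
            last = float(np.mean(frame ** 2)) > energy_threshold
        decisions[i] = last
    return decisions

# Example: 1 s of faint noise with a louder tone burst in the middle.
rng = np.random.default_rng(0)
audio = 0.01 * rng.standard_normal(16000)
audio[6000:10000] += 0.2 * np.sin(2 * np.pi * 440 * np.arange(4000) / 16000)
vad = frame_energy_vad(audio)
print(int(vad.sum()), "of", len(vad), "frames flagged as speech")
```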
  • Patent number: 10621391
    Abstract: Provided are a method, an apparatus, and a terminal for acquiring a semantic fragment of a query based on artificial intelligence. The method includes: pre-processing the query and determining a first main word and a semantic fragment set included in the query; determining an association degree between each semantic fragment in the semantic fragment set and the first main word according to historical retrieval data; and filtering the semantic fragment set according to the association degree and determining an object semantic fragment set corresponding to the query.
    Type: Grant
    Filed: December 26, 2017
    Date of Patent: April 14, 2020
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventor: Yufang Wu
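A rough sketch of the fragment-filtering step described above, under two assumptions that are not stated in the patent: the association degree is approximated by the co-occurrence rate of a fragment with the main word in historical queries, and a fixed threshold decides which fragments survive.

```python
from collections import Counter

def filter_fragments(main_word, fragments, history, min_assoc=0.2):
    """Keep fragments whose association with the main word, estimated from
    historical retrieval data, exceeds a threshold.

    `history` is a list of past queries, each a set of terms; association is
    approximated here as P(fragment | main_word) over those queries."""
    with_main = [q for q in history if main_word in q]
    if not with_main:
        return []
    counts = Counter(f for q in with_main for f in q)
    scores = {f: counts[f] / len(with_main) for f in fragments}
    return [f for f in fragments if scores.get(f, 0.0) >= min_assoc]

history = [
    {"beijing", "weather", "today"},
    {"beijing", "weather", "tomorrow"},
    {"beijing", "roast", "duck"},
]
print(filter_fragments("beijing", ["weather", "duck", "hotel"], history))
# ['weather', 'duck']  (scores 2/3 and 1/3; 'hotel' never co-occurs)
```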
  • Patent number: 10854193
    Abstract: Methods, apparatuses, devices and computer-readable storage media for real-time speech recognition are provided. The method includes: based on an input speech signal, obtaining truncating information for truncating a sequence of features of the speech signal; based on the truncating information, truncating the sequence of features into a plurality of subsequences; and for each subsequence in the plurality of subsequences, obtaining a real-time recognition result through an attention mechanism.
    Type: Grant
    Filed: February 6, 2019
    Date of Patent: December 1, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Xiaoyin Fu, Jinfeng Bai, Zhijie Chen, Mingxin Liang, Xu Chen, Lei Jia
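The abstract above describes cutting the feature sequence at positions given by the "truncating information" and decoding each chunk with attention. The sketch below assumes that information is simply a list of frame indices, and uses mean-pooling as a runnable stand-in for the real attention decoder.

```python
import numpy as np

def truncate_features(features, cut_points):
    """Split a (T, D) feature sequence into subsequences at the given
    frame indices (the 'truncating information')."""
    return np.split(features, cut_points, axis=0)

def recognize_streaming(features, cut_points):
    """Emit one partial result per subsequence; mean-pooling stands in
    for the attention step so the example stays self-contained."""
    results = []
    for chunk in truncate_features(features, cut_points):
        if len(chunk) == 0:
            continue
        context = chunk.mean(axis=0)   # placeholder for attention pooling
        results.append(context)        # placeholder for a decoded token
    return results

feats = np.random.default_rng(1).standard_normal((30, 8))   # 30 frames, 8-dim
partials = recognize_streaming(feats, cut_points=[10, 22])
print(len(partials), partials[0].shape)   # 3 chunks, each pooled to (8,)
```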
  • Patent number: 10573294
    Abstract: Embodiments of the present disclosure provide a speech recognition method based on artificial intelligence, and a terminal. The method includes: obtaining speech data to be recognized; processing the speech data to be recognized using a trained sub-band energy normalized acoustic model, to determine a normalized energy feature corresponding to each time-frequency unit in the speech data to be recognized; and determining text data corresponding to the speech data to be recognized according to the normalized energy feature corresponding to each time-frequency unit.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: February 25, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Mingming Chen, Xiangang Li, Jue Sun
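A hedged illustration of per-band energy normalization of the kind the abstract above refers to. The patent's normalization is learned inside the acoustic model; the smoothing-based scheme below (in the spirit of per-channel energy normalization) and all constants are assumptions made only to show the per-time-frequency-unit idea.

```python
import numpy as np

def subband_energy_normalize(spec, smooth=0.05, alpha=0.98, eps=1e-6):
    """Normalize each time-frequency unit by a smoothed estimate of the
    energy in its frequency band (an IIR low-pass over time)."""
    # spec: (T, F) magnitude/energy spectrogram
    T, F = spec.shape
    out = np.empty_like(spec, dtype=np.float64)
    band_energy = spec[0].astype(np.float64)
    for t in range(T):
        band_energy = (1 - smooth) * band_energy + smooth * spec[t]
        out[t] = spec[t] / (band_energy + eps) ** alpha
    return out

spec = np.abs(np.random.default_rng(2).standard_normal((100, 40)))
norm = subband_energy_normalize(spec)
print(norm.shape, round(float(norm.mean()), 3))
```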
  • Patent number: 11355109
    Abstract: Embodiments of the present disclosure provide a method and apparatus for man-machine conversation, and an electronic device. The method includes: outputting question information to a user based on a first task of a first conversation scenario; judging, in response to receiving reply information returned by the user, whether to trigger a second conversation scenario based on the reply information; generating, in response to determining that the second conversation scenario is triggered based on the reply information, response information corresponding to the reply information based on the second conversation scenario; and outputting the response information to the user.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: June 7, 2022
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Xuefeng Lou, Qingwei Huang, Weiwei Wang, Cheng Peng, Xiaojun Zhao
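A toy sketch of the scenario-switching flow in the abstract above. Keyword matching stands in for whatever judging logic the patent uses, and the scenario names and trigger words are made up for illustration.

```python
def converse(question, get_reply, first_scenario, trigger_words, second_scenario):
    """Ask a question from the first scenario's task, then decide from the
    user's reply whether to switch to a second scenario and respond from it."""
    print(f"[{first_scenario}] {question}")
    reply = get_reply()
    if any(w in reply.lower() for w in trigger_words):   # judge whether to trigger
        response = f"[{second_scenario}] Handling '{reply}' in the new scenario."
    else:
        response = f"[{first_scenario}] Continuing: got '{reply}'."
    print(response)
    return response

converse(
    question="Which city are you flying to?",
    get_reply=lambda: "Actually, what's the weather in Shanghai?",
    first_scenario="flight-booking",
    trigger_words=["weather"],
    second_scenario="weather",
)
```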
  • Patent number: 11264034
    Abstract: A voice identification method, device, apparatus, and storage medium are provided. The method includes: receiving voice data; performing voice identification on the voice data to obtain first text data associated with the voice data; determining common text data in a preset fixed data table, wherein a similarity between a pronunciation of the determined common text data and a pronunciation of the first text data meets a preset condition, and wherein the determined common text data is a voice identification result with an occurrence number larger than a first preset threshold; and replacing the first text data with the determined common text data.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: March 1, 2022
    Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.
    Inventors: Ye Song, Long Zhang, Pengpeng Jie
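A sketch of the replacement step described above, under assumptions the patent does not specify: the "fixed data table" is a dictionary of frequent recognition results with occurrence counts and romanized pronunciations, pronunciation similarity is approximated with a string-similarity ratio, and the thresholds are invented.

```python
from difflib import SequenceMatcher

# Hypothetical "fixed data table": common recognition results, their
# occurrence counts, and a romanized pronunciation for each entry.
FIXED_TABLE = {
    "play some music": {"count": 5300, "pron": "pleI sVm mjuzIk"},
    "turn off the light": {"count": 4100, "pron": "t3n Qf D@ laIt"},
}

def correct_transcript(text, pron, min_count=1000, min_sim=0.8):
    """Replace the recognized text with a frequent ('common') entry whose
    pronunciation is close enough, per the assumed thresholds above."""
    best, best_sim = None, 0.0
    for common_text, info in FIXED_TABLE.items():
        if info["count"] < min_count:
            continue
        sim = SequenceMatcher(None, pron, info["pron"]).ratio()
        if sim > best_sim:
            best, best_sim = common_text, sim
    return best if best is not None and best_sim >= min_sim else text

print(correct_transcript("play sum music", pron="pleI sVm mjuzIk"))
# -> 'play some music' (matching pronunciation, count above threshold)
```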
  • Patent number: 11727927
    Abstract: Embodiments of the present disclosure disclose a view-based voice interaction method, an apparatus, a server, a terminal and a medium. The method includes: obtaining voice information of a user and voice-action description information of a voice-operable element in a currently displayed view on a terminal; obtaining an operational intention of the user by performing semantic recognition on the voice information of the user according to view description information of the voice-operable element; locating a sequence of actions matching the operational intention of the user in the voice-action list according to the voice-action description information; and delivering the sequence of actions to the terminal for execution.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: August 15, 2023
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Zhou Shen, Dai Tan, Sheng Lv, Kaifang Wu, Yudong Li
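A small sketch of the matching step above. Simple keyword overlap stands in for the semantic recognition, and the voice-action descriptions and action strings are hypothetical examples, not the patent's data format.

```python
def match_action_sequence(intent, voice_actions):
    """Return the sequence of UI actions whose description best matches the
    user's operational intention (keyword overlap as a stand-in for
    semantic recognition)."""
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    best = max(voice_actions, key=lambda va: overlap(intent, va["description"]))
    return best["actions"] if overlap(intent, best["description"]) > 0 else []

# Hypothetical voice-action description list for the currently displayed view.
voice_actions = [
    {"description": "play the first video", "actions": ["focus(list,0)", "click()"]},
    {"description": "open settings page",   "actions": ["click(settings)"]},
]
print(match_action_sequence("please play the first video", voice_actions))
# -> ['focus(list,0)', 'click()']
```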
  • Patent number: 11360737
    Abstract: The present disclosure discloses a method and apparatus for providing a speech service. A specific embodiment of the method comprises: receiving request information sent by a device, the request information comprising first event information and speech information, the first event information indicating a first event occurring on the device when the device sends the request information, wherein the first event information comprises speech input event information used for instructing a user to input the speech information; generating response information comprising an operation instruction for a targeted device on the basis of the first event information and the speech information; and sending the response information to the targeted device for the targeted device to perform an operation indicated by the operation instruction. The embodiment improves the efficiency of providing a speech service.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: June 14, 2022
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Jianliang Zhou, Guanghao Shen, Ruisheng Wu
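A minimal sketch of the request/response flow described above. The event names, the lamp device, and the instruction fields are all hypothetical; only the shape of the flow (event information plus speech in, operation instruction for a target device out) follows the abstract.

```python
from dataclasses import dataclass

@dataclass
class Request:
    event: str        # first event information, e.g. a speech-input event
    speech: str       # the accompanying speech information
    device_id: str

def handle_request(req):
    """Build response information (an operation instruction for a target
    device) from the event information plus the speech information."""
    if req.event == "speech_input" and "light" in req.speech:
        instruction = {"target": "living_room_lamp", "op": "turn_on"}   # hypothetical
    else:
        instruction = {"target": req.device_id, "op": "noop"}
    return {"device": instruction["target"], "instruction": instruction}

print(handle_request(Request("speech_input", "turn on the light", "speaker-01")))
```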
  • Patent number: 11348571
    Abstract: The present disclosure provides methods, computing devices, and storage media for generating a training corpus. The method includes: mining pieces of data from user behavior logs associated with a target application, each piece of data including a first behavior log and a second behavior log, the first behavior log including a user speech and a corresponding speech recognition result, the second behavior log belonging to the same user as the first behavior log and being temporally related to the first behavior log; and determining the user speech and the corresponding speech recognition result in each piece of data as a positive feedback sample or a negative feedback sample, based on the first behavior log and the second behavior log.
    Type: Grant
    Filed: March 5, 2020
    Date of Patent: May 31, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Shiqiang Ding, Jizhou Huang, Zhongwei Jiang, Wentao Ma
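A sketch of the labeling step in the abstract above. The heuristic used here (a quick follow-up query with different text is treated as negative feedback, anything else as positive) and the 30-second window are assumptions for illustration, not the patent's rule.

```python
def label_feedback(pairs, window_s=30):
    """Label each (speech, recognition result) as positive or negative
    feedback using the user's next behavior log."""
    samples = []
    for first, second in pairs:
        same_user = first["user"] == second["user"]
        soon = 0 <= second["ts"] - first["ts"] <= window_s
        retried = second.get("type") == "query" and second.get("text") != first["asr"]
        label = "negative" if (same_user and soon and retried) else "positive"
        samples.append((first["audio"], first["asr"], label))
    return samples

pairs = [
    ({"user": "u1", "ts": 100, "audio": "a1.wav", "asr": "play jazz"},
     {"user": "u1", "ts": 105, "type": "query", "text": "play jay chou"}),
    ({"user": "u2", "ts": 200, "audio": "a2.wav", "asr": "set an alarm"},
     {"user": "u2", "ts": 400, "type": "click", "text": ""}),
]
print(label_feedback(pairs))
# [('a1.wav', 'play jazz', 'negative'), ('a2.wav', 'set an alarm', 'positive')]
```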
  • Patent number: 10867595
    Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are "Cold Fusion" architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
    Type: Grant
    Filed: March 6, 2018
    Date of Patent: December 15, 2020
    Assignee: Baidu USA LLC
    Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
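A rough numpy sketch of the Cold Fusion idea at a single decoding step: project the language model's logits, gate them element-wise with the decoder state, concatenate, and predict the next token. The real system uses trained recurrent networks; the layer sizes and random weights here are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cold_fusion_step(dec_state, lm_logits, params):
    """One fused output step: LM feature + fine-grained gate + joint projection."""
    h_lm = np.tanh(lm_logits @ params["W_lm"])                     # LM feature
    gate = sigmoid(np.concatenate([dec_state, h_lm]) @ params["W_g"])
    fused = np.concatenate([dec_state, gate * h_lm])               # gated fusion
    logits = np.maximum(fused @ params["W_out"], 0.0) @ params["W_proj"]
    e = np.exp(logits - logits.max())
    return e / e.sum()                                             # next-token distribution

rng = np.random.default_rng(3)
d_dec, d_lm, vocab = 8, 6, 10
params = {
    "W_lm":   rng.standard_normal((vocab, d_lm)) * 0.1,
    "W_g":    rng.standard_normal((d_dec + d_lm, d_lm)) * 0.1,
    "W_out":  rng.standard_normal((d_dec + d_lm, 16)) * 0.1,
    "W_proj": rng.standard_normal((16, vocab)) * 0.1,
}
probs = cold_fusion_step(rng.standard_normal(d_dec), rng.standard_normal(vocab), params)
print(probs.shape, round(float(probs.sum()), 6))   # (10,) 1.0
```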
  • Patent number: 11087763
    Abstract: A voice recognition method is provided by embodiments of the present application. The method includes: obtaining a voice signal to be recognized; and recognizing a current frame in the voice signal using a pre-trained causal acoustic model, according to the current frame in the voice signal and a frame within a preset time period before the current frame, the causal acoustic model being derived from causal convolutional neural network training. In the method provided by the embodiments of the present application, only the information of the current frame and the frames before it is used when recognizing the current frame, thereby solving a problem in voice recognition technologies based on prior-art convolutional neural networks, where a hard delay is created by the need to wait for frames after the current frame, and improving the timeliness of the voice recognition.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: August 10, 2021
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Weixin Zhu, Ming Wen
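The key property claimed above is that the convolution never looks at future frames. The numpy toy below shows that property for a single 1-D filter (the real model is a trained causal convolutional network; the kernel length and weights are arbitrary).

```python
import numpy as np

def causal_conv1d(x, kernel):
    """1-D convolution that only uses the current and past samples:
    output[t] depends on x[t-k+1 .. t], never on future frames."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])   # left padding only
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])

x = np.arange(6, dtype=float)          # pretend per-frame features
kernel = np.array([0.5, 0.3, 0.2])     # taps over the current + 2 past frames
y = causal_conv1d(x, kernel)
print(y)   # each y[t] uses only x[<= t], so there is no look-ahead delay
```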
  • Patent number: 10616475
    Abstract: The present disclosure provides a photo-taking prompting method, an apparatus, and a non-volatile computer storage medium. A user's image information is collected while the user is framing a shot, the user's face posture information is then obtained from the image information, the face posture information of a preset photo-taking template is compared with the user's face posture information, and the user is prompted to adjust the face posture according to the comparison result. The technical solutions provided by embodiments of the present disclosure can prompt the user to adjust the face posture during viewfinding, thereby providing guidance for the user's face posture and solving the prior-art problem of being unable to provide photo-taking guidance while the user is framing a shot.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: April 7, 2020
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Fujian Wang, Fuguo Zhu, Errui Ding, Long Gong, Yafeng Deng
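A toy version of the template comparison described above, assuming the face posture is represented as yaw/pitch/roll angles in degrees and that a fixed tolerance decides when to prompt; the hint directions depend on the (assumed) sign convention.

```python
def posture_prompt(user_pose, template_pose, tol_deg=10.0):
    """Compare the user's face posture with a preset photo-taking template
    and return adjustment prompts for axes that differ too much."""
    hints = {"yaw": ("turn left", "turn right"),          # sign convention assumed
             "pitch": ("tilt up", "tilt down"),
             "roll": ("rotate counter-clockwise", "rotate clockwise")}
    prompts = []
    for axis, (pos_hint, neg_hint) in hints.items():
        diff = template_pose[axis] - user_pose[axis]
        if abs(diff) > tol_deg:
            prompts.append(pos_hint if diff > 0 else neg_hint)
    return prompts or ["posture matches the template"]

print(posture_prompt({"yaw": -25, "pitch": 5, "roll": 0},
                     {"yaw": 0, "pitch": 0, "roll": 0}))
# ['turn left']  (yaw differs by 25 degrees; pitch/roll are within tolerance)
```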
  • Patent number: 10332507
    Abstract: A method and a device for waking up via speech based on artificial intelligence are provided in the present disclosure. The method includes: acquiring pronunciation information of a customized wake-up word; acquiring approximate pronunciation information of the pronunciation information; constructing a network for identifying wake-up words according to a preset garbage word list, the pronunciation information and the approximate pronunciation information; identifying an input speech according to the network to acquire an identification result; and determining whether to perform a wake-up operation according to the identification result. With embodiments of the present disclosure, different networks for identifying wake-up words may be constructed dynamically for different customized wake-up words, thus effectively improving wake-up accuracy, reducing the false alarm rate, improving wake-up efficiency, occupying less memory, and consuming less power.
    Type: Grant
    Filed: June 15, 2017
    Date of Patent: June 25, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventor: Liliang Tang
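A much-simplified sketch of the idea above: build a candidate set from the wake-up word's pronunciation, its approximate pronunciations, and a garbage word list, then wake up only if the observed pronunciation is closer to the wake-up word than to any garbage word. The pinyin-like strings and the string-similarity "decoder" are stand-ins for the patent's decoding network.

```python
from difflib import SequenceMatcher

def build_wakeup_network(wake_pron, approx_prons, garbage_words):
    """Assemble the candidates: wake-up word pronunciation, its approximate
    pronunciations, and the preset garbage word list."""
    return {"wake": [wake_pron] + list(approx_prons), "garbage": list(garbage_words)}

def decide_wakeup(observed_pron, network):
    """Wake up only if the best-matching candidate is the wake-up word
    (or one of its approximations), not a garbage word."""
    def sim(a, b):
        return SequenceMatcher(None, a, b).ratio()
    best_wake = max(sim(observed_pron, p) for p in network["wake"])
    best_garbage = max(sim(observed_pron, p) for p in network["garbage"])
    return best_wake > best_garbage

net = build_wakeup_network("xiao3 du4 xiao3 du4",                 # hypothetical wake word
                           approx_prons=["xiao3 du2 xiao3 du2"],
                           garbage_words=["ni3 hao3", "da3 kai1 dian4 shi4"])
print(decide_wakeup("xiao3 du2 xiao3 du4", net))   # True: closest to the wake word
```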
  • Patent number: 10482903
    Abstract: A method for selectively interacting with multiple devices is provided. The method includes the following steps: receiving identical voice information transmitted by a plurality of terminal devices respectively; performing voice recognition on the received voice information; calculating the energy of a wake-up word in the respective voice information; and comparing the energies of the wake-up word across devices, and transmitting feedback information to the terminal devices according to the energy comparison result and the voice recognition result. By calculating the energy of the wake-up word in the voice information transmitted by each device, the distances between the respective devices and the user can be distinguished. A unique response is ensured by having the device closest to the user respond to the user's request, thus improving the user experience.
    Type: Grant
    Filed: December 26, 2017
    Date of Patent: November 19, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Sha Tao, Yonghui Zuo, Peng Wang, Guoguo Chen, Ji Zhou, Kaihua Zhu
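A minimal sketch of the energy comparison above, assuming (not from the patent) that the wake-up word occupies roughly the first second of each recording and that mean squared amplitude is the energy measure.

```python
import numpy as np

def pick_responding_device(recordings, sample_rate=16000, wake_window_s=1.0):
    """Given the same utterance captured by several devices, compute the
    energy of the wake-up word segment and let only the loudest device,
    i.e. the one assumed closest to the user, respond."""
    n = int(sample_rate * wake_window_s)
    energies = {dev: float(np.mean(audio[:n] ** 2)) for dev, audio in recordings.items()}
    winner = max(energies, key=energies.get)
    return {dev: ("respond" if dev == winner else "stay silent") for dev in recordings}

rng = np.random.default_rng(4)
base = rng.standard_normal(32000)
recordings = {"kitchen-speaker": 0.9 * base, "bedroom-speaker": 0.3 * base}
print(pick_responding_device(recordings))
# kitchen-speaker responds (higher wake-word energy => closer to the user)
```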
  • Patent number: 11670015
    Abstract: Embodiments of the present disclosure provide a method and apparatus for generating a video. The method may include: acquiring a cartoon face image sequence of a target cartoon character from a received cartoon-style video, and generating a cartoon face contour figure sequence based on the cartoon face image sequence; generating a face image sequence for a real face based on the cartoon face contour figure sequence and a received initial face image of the real face, a face expression in the face image sequence matching a face expression in the cartoon face image sequence; generating a cartoon-style face image sequence for the real face according to the face image sequence; and replacing a face image of the target cartoon character in the cartoon-style video with a cartoon-style face image in the cartoon-style face image sequence, to generate a cartoon-style video corresponding to the real face.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: June 6, 2023
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Yunfeng Liu, Chao Wang, Yuanhang Li, Ting Yun, Guoqing Chen
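The abstract above is a four-stage generation pipeline. The skeleton below only shows the data flow between stages on dummy arrays; every stage function is a placeholder for a learned model and is not an implementation of the patented method.

```python
import numpy as np

# Placeholder stages; each real stage in the abstract is a learned model.
def extract_face_contours(cartoon_frames):
    return [np.zeros_like(f[..., 0]) for f in cartoon_frames]        # contour maps

def generate_real_faces(contours, initial_face):
    return [initial_face.copy() for _ in contours]                   # expression transfer

def stylize_faces(real_faces):
    return [np.clip(f * 0.8 + 40, 0, 255) for f in real_faces]       # cartoon styling

def replace_faces(cartoon_frames, stylized_faces):
    return list(stylized_faces)                                      # paste-back step

cartoon_frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(3)]
initial_face = np.full((64, 64, 3), 128.0)
contours = extract_face_contours(cartoon_frames)
faces = generate_real_faces(contours, initial_face)
output_video = replace_faces(cartoon_frames, stylize_faces(faces))
print(len(output_video), output_video[0].shape)
```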
  • Patent number: 11081108
    Abstract: Embodiments of the present disclosure disclose an interaction method and apparatus. A specific embodiment of the method includes: generating, in response to determining that a request input by a user satisfies a guiding condition, guiding information, and feeding back the guiding information to the user, the guiding condition including one of the following: being associated with a plurality of query intents, or being associated with no query intent; and generating, based on the request and a feedback input by the user corresponding to the guiding information, an intent-clear request, and feeding back push information bound with the intent-clear request to the user. In this way, during interaction with the user, when the request input by the user is associated with a plurality of query intents or is incomplete, an intent-clear request associated with an explicit query intent is determined through interaction with the user.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: August 3, 2021
    Assignees: Baidu Online Network Technology (Beijing) Co., Ltd., Shanghai Xiaodu Technology Co. Ltd.
    Inventors: Mengmeng Zhang, Zhongji Fan, Lei Shi, Li Wan, Qiang Ju, Chao Yin, Wei Shen, Jian Xie, Ran Xu, Jingya Wang
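A small sketch of the guiding flow in the abstract above: when a request maps to several intents or none, generate guiding information, fold the user's feedback in, and return an intent-clear request. The intent classifier and prompts are hypothetical stand-ins.

```python
def resolve_intent(request, classify_intents, ask_user):
    """Return an intent-clear request and its query intent, generating
    guiding information when the request is ambiguous or unmatchable."""
    intents = classify_intents(request)
    if len(intents) == 1:
        return request, intents[0]
    if len(intents) > 1:
        choice = ask_user(f"Did you mean: {', '.join(intents)}?")
    else:
        choice = ask_user("Could you say a bit more about what you want?")
    return f"{request} ({choice})", choice

clear_request, intent = resolve_intent(
    "apple",
    classify_intents=lambda q: ["fruit price", "Apple products"],
    ask_user=lambda prompt: "Apple products",
)
print(clear_request, "->", intent)
```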
  • Patent number: 10325593
    Abstract: A method and a device for waking up via speech based on artificial intelligence are provided in the present disclosure. The method includes: clustering phones to select garbage phones for representing the phones; constructing an alternative wake-up word approximate to a preset wake-up word according to the preset wake-up word; constructing a decoding network according to the garbage phones, the alternative wake-up word and the preset wake-up word; and waking up via speech by using the decoding network. Because the data size for the garbage phones is significantly smaller than the data size for garbage words, the prior-art problem of a garbage word model occupying too much data is solved. Meanwhile, as a word is composed of several phones, the garbage phones are more likely than garbage words to cover all words. Thus, wake-up accuracy is improved and the probability of false wake-up is reduced.
    Type: Grant
    Filed: June 15, 2017
    Date of Patent: June 18, 2019
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventor: Liliang Tang
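A sketch of the first step above, selecting garbage phones by clustering. Plain k-means over random toy phone embeddings is used here; the patent does not specify the clustering algorithm or the phone representation, so both are assumptions.

```python
import numpy as np

def select_garbage_phones(phone_names, phone_vecs, k=3, iters=20, seed=0):
    """Cluster phone embeddings with plain k-means and keep the phone
    nearest each centroid as a 'garbage phone' representing its cluster."""
    rng = np.random.default_rng(seed)
    centers = phone_vecs[rng.choice(len(phone_vecs), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(phone_vecs[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = phone_vecs[assign == c].mean(axis=0)
    reps = [phone_names[int(np.linalg.norm(phone_vecs - c, axis=1).argmin())]
            for c in centers]
    return sorted(set(reps))

phones = ["a", "o", "e", "i", "u", "b", "p", "m", "f", "s", "sh", "zh"]
vecs = np.random.default_rng(5).standard_normal((len(phones), 4))  # toy embeddings
garbage = select_garbage_phones(phones, vecs, k=3)
print(garbage)   # decoding network = garbage phones + wake-up word + alternatives
```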
  • Patent number: 11462209
    Abstract: For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture based on transposed convolutions to achieve a high compute intensity and fast inference. In one or more embodiments, the convolutional vocoder architecture is trained with losses related to perceptual audio quality, as well as a GAN framework to guide training with a critic that discerns unrealistic waveforms. While yielding high-quality audio, embodiments of the model can synthesize audio more than 500 times faster than real time. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly-used iterative algorithms like Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis.
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: October 4, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan Arik, Hee Woo Jun, Eric Undersander, Gregory Diamos
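A toy PyTorch stack showing the transposed-convolution upsampling idea from the abstract above (spectrogram frames up to waveform samples). Channel counts, kernel sizes, the 256x hop assumption, and the omission of the multi-head structure, perceptual losses, and GAN critic are all simplifications for illustration.

```python
import torch
import torch.nn as nn

class TinyTransposedConvVocoder(nn.Module):
    """Toy stack of transposed 1-D convolutions that upsamples mel frames
    (assumed hop of 256 samples) to a waveform: 4x * 4x * 4x * 4x = 256x."""
    def __init__(self, n_mels=80):
        super().__init__()
        chans = [n_mels, 64, 32, 16, 1]
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.ConvTranspose1d(cin, cout, kernel_size=8, stride=4, padding=2),
                       nn.LeakyReLU(0.2)]
        layers[-1] = nn.Tanh()                    # bound the final waveform to [-1, 1]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):                       # mel: (batch, n_mels, frames)
        return self.net(mel).squeeze(1)           # -> (batch, frames * 256)

mel = torch.randn(1, 80, 50)                      # 50 spectrogram frames
wave = TinyTransposedConvVocoder()(mel)
print(wave.shape)                                 # torch.Size([1, 12800])
```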
  • Patent number: 10872598
    Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: December 22, 2020
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhabrata Sengupta, Mohammad Shoeybi
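The abstract above describes a five-model pipeline (segmentation is used only for training). The runnable skeleton below traces the inference path with deliberately naive stand-ins: a toy lexicon for grapheme-to-phoneme conversion, constant durations, a linear F0 contour, and per-phoneme sinusoids instead of the neural audio-synthesis model; none of these stand-ins reflect the patented models.

```python
import numpy as np

# Hypothetical stand-ins for the learned models (inference path only).
LEXICON = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}

def grapheme_to_phoneme(text):
    return [p for w in text.lower().split() for p in LEXICON.get(w, ["SIL"])]

def predict_durations(phonemes, default_ms=90):
    return [default_ms] * len(phonemes)                  # duration-model stand-in

def predict_f0(phonemes, base_hz=120.0):
    return [base_hz + 5.0 * i for i in range(len(phonemes))]   # F0-model stand-in

def synthesize(phonemes, durations_ms, f0_hz, sr=16000):
    """Audio-synthesis stand-in: one sinusoid per phoneme at its predicted F0."""
    pieces = []
    for dur, f0 in zip(durations_ms, f0_hz):
        t = np.arange(int(sr * dur / 1000)) / sr
        pieces.append(0.1 * np.sin(2 * np.pi * f0 * t))
    return np.concatenate(pieces)

phones = grapheme_to_phoneme("hello world")
audio = synthesize(phones, predict_durations(phones), predict_f0(phones))
print(len(phones), "phonemes ->", len(audio), "samples")
```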