Patents by Inventor Bolei HE

Bolei HE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250117668
    Abstract: A method for model training based on a large model includes: determining a first large model as a teacher model of a language model, and performing distillation learning on the language model based on the first large model; inputting a first prompt text into the language model, and obtaining a plurality of first response texts for the first prompt text output by the language model; determining a reference response text for the first prompt text from the plurality of first response texts; and training the language model based on the reference response text for the first prompt text.
    Type: Application
    Filed: December 19, 2024
    Publication date: April 10, 2025
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Xinran He, Xianwei Xue, Bolei He, Kunbin Chen, Jinchang Luo, Ruigao Li
  • Publication number: 20250117734
    Abstract: Method and apparatus for target business model generation and data processing based on large language model are disclosed, which relates to the field of artificial intelligence technology, specifically in the areas of intelligent office, big data, and large models. A method for generating a target business model based on large language model includes: performing knowledge distillation on at least two pre-trained large models to obtain a base model of a target scenario, wherein each pre-trained model corresponds to one of at least two business types included in the target scenario; performing knowledge distillation on the base model to obtain a target business model of a target business type among the at least two business types, wherein the target business model is used for processing data of the target business type.
    Type: Application
    Filed: December 17, 2024
    Publication date: April 10, 2025
    Applicant: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventor: Bolei HE
  • Publication number: 20250117714
    Abstract: A method for generating a text training sample based on a large model includes: obtaining at least two query clusters by clustering at least two queries; obtaining a first query from each query cluster; generating at least two second queries under a set theme through a first large model by taking the first query as an example; and generating a first text training sample for fine-tuning a second large model based on the second query.
    Type: Application
    Filed: December 19, 2024
    Publication date: April 10, 2025
    Applicant: BAIDU INTERNATIONAL TECHNOLOGY (SHENZHEN) CO., LTD.
    Inventors: Jinchang Luo, Bolei He, Kunbin Chen, Wei He
  • Publication number: 20250013676
    Abstract: A computer-implemented method for information processing based on a large language model is provided. The method includes obtaining query information provided by a user. The method further includes determining memory information related to the query information. The method further includes determining, based on the query information and the memory information, a tool for processing the query information. The method further includes invoking the tool to obtain auxiliary information. The method further includes generating, based on the query information and the auxiliary information, a result of processing the query information.
    Type: Application
    Filed: September 19, 2024
    Publication date: January 9, 2025
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Jinchang LUO, Bolei HE, Kunbin CHEN, Wei HE
  • Publication number: 20250013876
    Abstract: An apparatus for training a large language model includes: at least one sample text instruction is input into a target large language model to obtain at least one standard response text, and the at least one sample text instruction is input into a large language model to be trained to obtain at least one predicted response text. A first sample response text is determined from the at least one standard response text according to the score difference between a first quality score of a standard response text and a second quality score of a predicted response text. A first target training sample is generated according to the first sample response text and a sample text instruction corresponding to the first sample response text, and a training dataset is constructed according to the first target training sample.
    Type: Application
    Filed: September 19, 2024
    Publication date: January 9, 2025
    Inventors: Xianwei XUE, Qiutong PAN, Jinchang LUO, Bolei HE, Wei HE
  • Patent number: 12174824
    Abstract: A method for denoising click data includes: acquiring a set of click data including pieces of first click data and a real label corresponding to each piece of first click data; extracting feature vectors of each piece of first click data with a graph model; dividing the feature vectors into sets of feature vectors; obtaining trained binary classification models by training binary classification models with the sets of feature vectors; for each of the feature vectors, obtaining prediction values corresponding to the feature vector by predicting the feature vector with the trained binary classification models, and calculating a prediction label of the feature vector based on the prediction values of the feature vector; and removing noise data in the pieces of first click data, based on the pieces of first click data, the real label and the prediction label of each piece of first click data.
    Type: Grant
    Filed: December 29, 2022
    Date of Patent: December 24, 2024
    Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Wei Xu, Xiaoling Xia, Junxiang Jiang, Chengtai Cao, Bolei He, Kunbin Chen, Wei He
  • Patent number: 11797607
    Abstract: Embodiments of the present disclosure disclose a method and apparatus for constructing a quality evaluation model, an electronic device and a computer-readable storage medium. A specific implementation mode of the method comprises: acquiring samples of knowledge contents; extracting statistical features, semantic features, and image features respectively from the samples of knowledge contents; and constructing a quality evaluation model for knowledge according to the statistical features, the semantic features, and the image features. On the basis of the prior art, this implementation mode additionally uses semantic features and image features of knowledge contents to construct a more accurate quality evaluation model based on multi-dimensional features that characterize the actual quality of a knowledge, which may well discover some brief but very useful summary knowledge in an enterprise and may recommend high-quality knowledge more accurately for employees in the enterprise.
    Type: Grant
    Filed: March 24, 2021
    Date of Patent: October 24, 2023
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Huan Liu, Mingquan Cheng, Kunbin Chen, Zhun Liu, Bolei He, Wei He
  • Publication number: 20230132618
    Abstract: A method for denoising click data includes: acquiring a set of click data including pieces of first click data and a real label corresponding to each piece of first click data; extracting feature vectors of each piece of first click data with a graph model; dividing the feature vectors into sets of feature vectors; obtaining trained binary classification models by training binary classification models with the sets of feature vectors; for each of the feature vectors, obtaining prediction values corresponding to the feature vector by predicting the feature vector with the trained binary classification models, and calculating a prediction label of the feature vector based on the prediction values of the feature vector; and removing noise data in the pieces of first click data, based on the pieces of first click data, the real label and the prediction label of each piece of first click data.
    Type: Application
    Filed: December 29, 2022
    Publication date: May 4, 2023
    Inventors: Wei XU, Xiaoling XIA, Junxiang JIANG, Chengtai CAO, Bolei HE, Kunbin CHEN, Wei HE
  • Patent number: 11537792
    Abstract: The present disclosure provides a pre-training method for a sentiment analysis model and an electronic device, which relate to the field of artificial intelligence technologies. The method includes: based on a given seed sentiment dictionary, performing sentimental knowledge detection on a training corpus in a training corpus set, and determining a detection sentiment word and a detection word pair of the training corpus; according to preset mask processing rules, performing mask processing on the training corpus to generate a masked corpus; performing encoding and decoding on the masked corpus by using a preset encoder and decoder to determine the detection sentiment word and the detection word pair of the training corpus; and updating the preset encoder and decoder according to a difference between a prediction sentiment word and the detection sentiment word, and a difference between a prediction word pair and the detection word pair.
    Type: Grant
    Filed: July 21, 2020
    Date of Patent: December 27, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Can Gao, Hao Liu, Bolei He, Xinyan Xiao, Hao Tian
  • Patent number: 11507751
    Abstract: The present disclosure discloses a comment information processing method and apparatus, and a medium. The specific implementation solution is: in response to a user operation, determining an opinion category corresponding to each opinion phrase in a comment opinion dictionary; obtaining a target corpus matching each opinion phrase from a plurality of comment corpora; for each opinion phrase, using a corresponding opinion category to label the target corpus matching each opinion phrase to obtain a first training sample; and training a classification model with the first training sample to identify the opinion category of a comment by using a trained classification model.
    Type: Grant
    Filed: July 24, 2020
    Date of Patent: November 22, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Hao Liu, Bolei He, Xinyan Xiao
  • Patent number: 11508153
    Abstract: A method for generating a tag of a video, an electronic device, and a storage medium are related to a field of natural language processing and deep learning technologies. The detailed implementing solution includes: obtaining multiple candidate tags and video information of the video; determining first correlation information between the video information and each of the multiple candidate tags; sorting the multiple candidate tags based on the first correlation information to obtain a sort result; and generating the tag of the video based on the sort result.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: November 22, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Chengxiang Liu, Hao Liu, Bolei He
  • Publication number: 20220365941
    Abstract: The disclosure provides a method for searching an instant messaging object, an electronic device and a storage medium. The method includes: receiving a search request of a first object, and determining a type of the search request; obtaining at least one recall set of the first object based on a client-side search engine in an instant messaging system in response to the type of the search request being a first type; obtaining at least one candidate object corresponding to a search keyword in the search request based on the search keyword and the at least one recall set; obtaining feature information of each candidate object; and responding to the search request by sorting the at least one candidate object based on the feature information.
    Type: Application
    Filed: July 11, 2022
    Publication date: November 17, 2022
    Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Qiutong Pan, Ruigao Li, Yanan Li, Bolei He
  • Publication number: 20220286416
    Abstract: A method for generating an account intimacy includes: obtaining a set of accounts in an instant messaging (IM) group; obtaining a communication frequency between two accounts in the set of accounts within a preset time period; generating a communication network graph based on the communication frequency; obtaining an embedding vector of each account output by a graph model, in which the graph model is trained based on the communication network graph; and generating an intimacy between two accounts based on the embedding vectors of the two accounts.
    Type: Application
    Filed: May 25, 2022
    Publication date: September 8, 2022
    Inventors: Shijie CAO, Yanan LI, Bolei HE, Kunbin CHEN, Wei HE, Feng HE
  • Patent number: 11341366
    Abstract: A cross-modality processing method relates to the field of natural language processing technologies. The method includes: obtaining a sample set, wherein the sample set includes a plurality of corpora and a plurality of images; generating a plurality of training samples according to the sample set, wherein each of the plurality of training samples is a combination of at least one of the plurality of corpora and at least one of the plurality of images corresponding to the at least one of the plurality of corpora; and adopting the plurality of training samples to train a semantic model, so that the semantic model learns semantic vectors containing combinations of the corpora and the images.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: May 24, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Guocheng Niu, Bolei He, Xinyan Xiao
  • Publication number: 20220121668
    Abstract: The present disclosure provides a method of recommending a document, an electronic device, and a storage medium, relating to fields of intelligent recommendation, deep learning etc. The method of recommending a document includes: acquiring a document operated by a user, as a reference document; determining, from a plurality of initial documents, at least one candidate document for the reference document, wherein a document content of each candidate document is associated with a document content of the reference document, based on preset knowledge system data; and recommending a target document in the at least one candidate document to the user, the target document including a document that the user is currently interested in and a document that the user is interested in after a preset time period.
    Type: Application
    Filed: December 29, 2021
    Publication date: April 21, 2022
    Inventors: Wei XU, Xiaoling XIA, Bolei HE, Kunbin CHEN, Zhun LIU, Wei HE
  • Patent number: 11216504
    Abstract: A document recommendation method based on a semantic tag and a document recommendation device. The method includes: for each document, acquiring a first candidate tag set corresponding to the document, and processing each first candidate tag in the first candidate tag set corresponding to the document to obtain a second candidate tag set corresponding to the document; performing normalization processing on each second candidate tag in the second candidate tag set corresponding to the document to obtain a third candidate tag set corresponding to the document; performing expanding process on each third candidate tag in the third candidate tag set corresponding to the document, and acquiring a fourth candidate tag set corresponding to the document, to form a document library having semantic tags; and recommending a target document obtained from the document library having semantic tags to the user, according to historical semantic tag.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: January 4, 2022
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Guocheng Niu, Bolei He, Chengxiang Liu, Xinyan Xiao, Yajuan Lyu
  • Publication number: 20210383121
    Abstract: A method for generating a tag of a video, an electronic device, and a storage medium are related to a field of natural language processing and deep learning technologies. The detailed implementing solution includes: obtaining multiple candidate tags and video information of the video; determining first correlation information between the video information and each of the multiple candidate tags; sorting the multiple candidate tags based on the first correlation information to obtain a sort result; and generating the tag of the video based on the sort result.
    Type: Application
    Filed: December 8, 2020
    Publication date: December 9, 2021
    Inventors: Chengxiang LIU, Hao LIU, Bolei HE
  • Publication number: 20210374195
    Abstract: The present disclosure provides an information processing method, an electronic device and a computer storage medium, and relates to a field of information processing. The method includes: obtaining a first content based on a first search keyword indicating a first event and a second search keyword indicating an object related to the first event; obtaining information associated with an attribute of the object from the first content; obtaining a second content based on the first search keyword and a third search keyword indicating a result at least caused by the first event; and generating statistical data associated with the first event based on the information and the second content.
    Type: Application
    Filed: November 18, 2020
    Publication date: December 2, 2021
    Inventors: Lei CHEN, Bolei HE, Kai LIU, Lei HAN, Ke SUN
  • Publication number: 20210303921
    Abstract: A cross-modality processing method relates to the field of natural language processing technologies. The method includes: obtaining a sample set, wherein the sample set includes a plurality of corpora and a plurality of images; generating a plurality of training samples according to the sample set, wherein each of the plurality of training samples is a combination of at least one of the plurality of corpora and at least one of the plurality of images corresponding to the at least one of the plurality of corpora; and adopting the plurality of training samples to train a semantic model, so that the semantic model learns semantic vectors containing combinations of the corpora and the images.
    Type: Application
    Filed: August 10, 2020
    Publication date: September 30, 2021
    Inventors: Guocheng NIU, Bolei HE, Xinyan XIAO
  • Publication number: 20210209421
    Abstract: Embodiments of the present disclosure disclose a method and apparatus for constructing a quality evaluation model, an electronic device and a computer-readable storage medium. A specific implementation mode of the method comprises: acquiring samples of knowledge contents; extracting statistical features, semantic features, and image features respectively from the samples of knowledge contents; and constructing a quality evaluation model for knowledge according to the statistical features, the semantic features, and the image features. On the basis of the prior art, this implementation mode additionally uses semantic features and image features of knowledge contents to construct a more accurate quality evaluation model based on multi-dimensional features that characterize the actual quality of a knowledge, which may well discover some brief but very useful summary knowledge in an enterprise and may recommend high-quality knowledge more accurately for employees in the enterprise.
    Type: Application
    Filed: March 24, 2021
    Publication date: July 8, 2021
    Inventors: Huan LIU, Mingquan CHENG, Kunbin CHEN, Zhun LIU, Bolei HE, Wei HE
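
Illustrative sketch: the video tag generation abstract listed above (Patent number 11508153, Publication number 20210383121) describes three steps: scoring each candidate tag against the video information, sorting the candidates by that score, and generating the tag from the sorted result. The Python below is a minimal sketch of that flow and is not the patented implementation. The function names `token_overlap` and `rank_tags`, the `top_k` parameter, and the token-overlap score are all assumptions made for illustration; the abstract does not say how the correlation information is computed.

```python
from typing import Callable, List, Tuple


def token_overlap(video_info: str, tag: str) -> float:
    """Hypothetical correlation measure: Jaccard overlap of lowercase tokens.

    This stands in for the "first correlation information" in the abstract,
    which does not specify how the correlation is computed.
    """
    video_tokens = set(video_info.lower().split())
    tag_tokens = set(tag.lower().split())
    if not video_tokens or not tag_tokens:
        return 0.0
    return len(video_tokens & tag_tokens) / len(video_tokens | tag_tokens)


def rank_tags(
    video_info: str,
    candidate_tags: List[str],
    correlate: Callable[[str, str], float] = token_overlap,
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every candidate tag, sort by score (descending), keep the top_k."""
    scored = [(tag, correlate(video_info, tag)) for tag in candidate_tags]
    # sorted() is stable, so tags with equal scores keep their input order.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    info = "slow motion basketball dunk highlights"
    tags = ["basketball dunk", "cooking", "basketball"]
    for tag, score in rank_tags(info, tags):
        print(f"{tag}: {score:.2f}")
```

Passing the correlation measure in as a parameter keeps the sort-and-select step independent of how the score is produced, so a learned relevance model could replace the token-overlap stand-in without changing `rank_tags`.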