Patents by Inventor Junyuan SHANG

Junyuan SHANG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250094713
    Abstract: A multimodal data generation method is provided. The method includes: inputting a query data sequence into a multimodal model, to obtain a plurality of tokens in a response data sequence, where a current token is generated through the following operations: inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model generates the current token based on the query data sequence and the current response data sequence, in response to determining that the current token belongs to a first data modality; or inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model denoises an initial token sequence based on the query data sequence and the current response data sequence, to generate a result token sequence, in response to determining that the current token belongs to a second data modality.
    Type: Application
    Filed: December 3, 2024
    Publication date: March 20, 2025
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Shuohuan WANG, Yekun CHAI, Siyu DING, Junyuan SHANG, Zhenyu ZHANG, Yu SUN, Hao TIAN, Hua WU, Haifeng WANG
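The abstract above describes a decoding loop that switches per token between direct next-token generation for one modality and iterative denoising of a token block for another. Below is a minimal, self-contained sketch of that control flow; the `ToyMultimodalModel` class, its methods, and the `modality_plan` argument are illustrative assumptions, not the applicant's implementation.

```python
# Hypothetical sketch of the two-branch decoding loop described above.
import torch

class ToyMultimodalModel(torch.nn.Module):
    """Stand-in model: predicts a next token and can denoise a token block."""
    def __init__(self, vocab_size=1000, dim=64, denoise_steps=4):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.next_token_head = torch.nn.Linear(dim, vocab_size)
        self.denoiser = torch.nn.Linear(dim, dim)
        self.denoise_steps = denoise_steps

    def next_token(self, context_ids):
        h = self.embed(context_ids).mean(dim=0)        # crude context summary
        return int(self.next_token_head(h).argmax())   # greedy token (first modality)

    def denoise_block(self, context_ids, noisy_block):
        h = self.embed(context_ids).mean(dim=0)
        x = noisy_block
        for _ in range(self.denoise_steps):            # iterative refinement
            x = x + self.denoiser(x + h)               # conditioned on the context
        return x                                       # "result token sequence"

def generate(model, query_ids, modality_plan, block_dim=64, block_len=8):
    """modality_plan marks each response position as 'text' or 'image'."""
    response_ids, image_blocks = [], []
    for modality in modality_plan:
        context = torch.tensor(list(query_ids) + response_ids)
        if modality == "text":                          # first data modality
            response_ids.append(model.next_token(context))
        else:                                           # second data modality
            noise = torch.randn(block_len, block_dim)   # initial token sequence
            image_blocks.append(model.denoise_block(context, noise))
    return response_ids, image_blocks

model = ToyMultimodalModel()
ids, blocks = generate(model, query_ids=[1, 2, 3], modality_plan=["text", "text", "image"])
print(ids, [b.shape for b in blocks])
```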
  • Publication number: 20250094802
    Abstract: Provided is a model training method, a model reasoning method, an electronic device, and a storage medium, relating to the field of data processing, and especially to the technical fields of artificial intelligence, big data, deep learning and large models. The model training method includes: folding an initial token sequence for training a model based on a folding feature value for folding a token sequence to obtain at least a first token sequence subjected to the folding, wherein the initial token sequence represents a token sequence composed of T1 tokens, and the first token sequence has a sequence length less than that of the initial token sequence; and inputting at least the first token sequence into a preset model to train the preset model so as to obtain a target model.
    Type: Application
    Filed: December 2, 2024
    Publication date: March 20, 2025
    Inventors: Junyuan SHANG, Guoxia WANG, Yinqi YANG, Shuohuan WANG, Yu SUN
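Read literally, the folding step shortens a T1-token training sequence by a folding factor before it is fed to the model. Below is a minimal sketch assuming the folding groups every k consecutive token embeddings into one position; `fold_sequence` and the choice of k are illustrative, not the patented procedure.

```python
# A minimal sketch, assuming "folding" groups every k consecutive token
# embeddings into one position; k stands in for the folding feature value.
import torch

def fold_sequence(token_embeddings: torch.Tensor, k: int) -> torch.Tensor:
    """Fold a (T1, dim) sequence into (T1 // k, k * dim) by grouping k tokens."""
    t1, dim = token_embeddings.shape
    t1_trim = (t1 // k) * k                          # drop any ragged tail
    return token_embeddings[:t1_trim].reshape(t1 // k, k * dim)

embeddings = torch.randn(128, 32)                    # initial token sequence, T1 = 128
first_sequence = fold_sequence(embeddings, k=4)      # folded sequence, 4x shorter
print(first_sequence.shape)                          # torch.Size([32, 128])
```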
  • Publication number: 20250094534
    Abstract: A task execution method for a large model relates to fields of artificial intelligence, deep learning and large model technologies, and includes executing attention tasks in a task group to be fused using a target computing unit to obtain attention features, where each attention task corresponds to a weighted matrix to be fused, and the weighted matrix to be fused is obtained by weighting a matrix to be fused using a weight; obtaining a processing result according to the attention features; determining loss information according to the processing result; and, if the loss information converges, weighting and fusing matrices to be fused using the target computing unit according to weights for the task group to be fused, to obtain a fusion matrix for a target task group, where a target task in the target task group is executed by the target computing unit according to the fusion matrix.
    Type: Application
    Filed: December 4, 2024
    Publication date: March 20, 2025
    Inventors: Linhao ZHANG, Yilong CHEN, Junyuan SHANG, Yinqi YANG, Shuohuan WANG, Yu SUN
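The final step in this abstract combines per-task matrices into a single fusion matrix using per-task weights once the loss has converged. The sketch below shows one plausible form of that weighted fusion; `fuse_matrices`, the softmax normalization, and the shapes are assumptions for illustration.

```python
# Illustrative sketch of the fusion step: per-task matrices are weighted and
# summed into one fusion matrix after training converges.
import torch

def fuse_matrices(matrices: list, weights: torch.Tensor) -> torch.Tensor:
    """Weight each task's matrix and sum them into a single fusion matrix."""
    normalized = torch.softmax(weights, dim=0)              # keep weights comparable
    stacked = torch.stack(matrices)                          # (num_tasks, d, d)
    return (normalized.view(-1, 1, 1) * stacked).sum(dim=0)

task_matrices = [torch.randn(16, 16) for _ in range(3)]      # matrices to be fused
task_weights = torch.tensor([0.2, 1.0, 0.5])                 # learned per-task weights
fusion_matrix = fuse_matrices(task_matrices, task_weights)
print(fusion_matrix.shape)                                    # torch.Size([16, 16])
```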
  • Publication number: 20250094806
    Abstract: Provided is a large language model training method, an electronic device and a storage medium, relating to the field of artificial intelligence technologies, and in particular, to the fields of deep learning, natural language processing and large models. The method includes: performing dimension reduction parameter fusion on a two-dimensional parameter matrix on each channel in each network layer in a first large language model, respectively, to obtain a second large language model; performing layer reduction parameter fusion on network layers in the second large language model based on a three-dimensional parameter matrix of each network layer in the second large language model to obtain a third large language model; and training the third large language model to obtain a target large language model under the condition that a target loss function determined based on the first and third large language models meets a preset first function condition.
    Type: Application
    Filed: December 3, 2024
    Publication date: March 20, 2025
    Inventors: Junyuan SHANG, Yilong CHEN, Zhenyu ZHANG, Shuohuan WANG, Yu SUN, Hua WU
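The abstract outlines two compression passes followed by distillation-style training. The sketch below approximates the first pass with a truncated SVD of each two-dimensional weight matrix and the second with pairwise averaging of adjacent layers; both choices are stand-ins, not the patented fusion operations.

```python
# A rough sketch of the two fusion stages, under the assumption that
# "dimension reduction parameter fusion" can be approximated by a truncated
# SVD of each 2-D weight matrix and "layer reduction parameter fusion" by
# averaging adjacent layers' reduced weights. Illustration only.
import torch

def reduce_dimension(weight: torch.Tensor, rank: int) -> torch.Tensor:
    """Low-rank approximation of one 2-D parameter matrix."""
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vh[:rank, :]

def reduce_layers(layer_weights: list) -> list:
    """Fuse pairs of adjacent layers into one, halving the layer count."""
    return [0.5 * (layer_weights[i] + layer_weights[i + 1])
            for i in range(0, len(layer_weights) - 1, 2)]

first_model = [torch.randn(64, 64) for _ in range(8)]        # 8 layers of 64x64 weights
second_model = [reduce_dimension(w, rank=16) for w in first_model]
third_model = reduce_layers(second_model)                     # 4 fused layers
print(len(third_model), third_model[0].shape)
```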
  • Publication number: 20250061305
    Abstract: A training method, an inference method, a device, an apparatus, and a medium for a deep learning model are provided. A first model includes a plurality of first parameters, and a second model includes a plurality of second parameters, which are initialized to the parameter values of a plurality of target parameters selected from the plurality of first parameters. The training method includes: determining a target loss for both the first model and the second model; and adjusting parameter values, including: in response to determining that the target loss indicates that the parameter values of at least part of the target parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding second parameters; and in response to determining that the target loss indicates that the parameter values of at least part of the second parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding target parameters.
    Type: Application
    Filed: November 4, 2024
    Publication date: February 20, 2025
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Shuohuan WANG, Junyuan SHANG, Yinqi YANG, Guoxia WANG, Linhao ZHANG, Yu SUN, Hua WU, Haifeng WANG
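The key mechanism here is that selected parameters of the first model and the parameters of the second model are kept identical, so an adjustment to one is an adjustment to the other. In PyTorch this synchronization falls out naturally if the two models reuse the same Parameter objects, as in the hedged sketch below; the module sizes and the joint loss are illustrative.

```python
# A minimal sketch, assuming the "synchronous adjustment" can be realized by
# letting the second (smaller) model reuse the very same Parameter objects as
# the target parameters selected from the first model.
import torch

big = torch.nn.ModuleList([torch.nn.Linear(32, 32) for _ in range(4)])   # first model
target_layers = [big[0], big[2]]                    # target parameters selected from it
small = torch.nn.ModuleList(target_layers)          # second model shares those Parameters

x = torch.randn(8, 32)
big_out = x
for layer in big:
    big_out = torch.relu(layer(big_out))
small_out = x
for layer in small:
    small_out = torch.relu(layer(small_out))

# One target loss covering both models; shared parameters receive gradients
# from both paths and are adjusted together in a single optimizer step.
loss = big_out.pow(2).mean() + small_out.pow(2).mean()
optimizer = torch.optim.SGD(big.parameters(), lr=1e-2)   # small's params are the same objects
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(small[0].weight is big[0].weight)                   # True: adjustments stay in sync
```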
  • Patent number: 12131728
    Abstract: The present application provides a method of training a natural language processing model, which relates to a field of artificial intelligence, and in particular to a field of natural language processing. A specific implementation scheme includes: performing a semantic learning for multi-tasks on an input text, so as to obtain a semantic feature for the multi-tasks, wherein the multi-tasks include a plurality of branch tasks; performing a feature learning for each branch task based on the semantic feature, so as to obtain a first output result for each branch task; calculating a loss for each branch task according to the first output result for the branch task; and adjusting a parameter of the natural language processing model according to the loss for each branch task. The present application further provides a method of processing a natural language, an electronic device, and a storage medium.
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: October 29, 2024
    Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Siyu DING, Chao PANG, Shuohuan WANG, Yanbin ZHAO, Junyuan SHANG, Yu SUN, Shikun FENG, Hao TIAN, Hua WU, Haifeng WANG
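Structurally, the claimed training loop is a shared encoder producing one semantic feature, a head per branch task, a loss per branch task, and a joint parameter update. The toy example below mirrors that structure; the encoder, heads, and task definitions are placeholders rather than the patented model.

```python
# Illustrative multi-task setup: shared semantic encoding, one head per
# branch task, one loss per branch task, and a joint parameter update.
import torch

class MultiTaskModel(torch.nn.Module):
    def __init__(self, vocab=1000, dim=64, task_classes=(3, 5)):
        super().__init__()
        self.encoder = torch.nn.Sequential(                  # shared semantic learning
            torch.nn.Embedding(vocab, dim), torch.nn.Linear(dim, dim), torch.nn.ReLU())
        self.heads = torch.nn.ModuleList(                     # one head per branch task
            [torch.nn.Linear(dim, c) for c in task_classes])

    def forward(self, token_ids):
        semantic = self.encoder(token_ids).mean(dim=1)        # semantic feature for all tasks
        return [head(semantic) for head in self.heads]        # first output result per task

model = MultiTaskModel()
tokens = torch.randint(0, 1000, (4, 16))                      # batch of input texts
labels = [torch.randint(0, 3, (4,)), torch.randint(0, 5, (4,))]
outputs = model(tokens)
losses = [torch.nn.functional.cross_entropy(o, y) for o, y in zip(outputs, labels)]
total = sum(losses)                                            # loss for each branch task
total.backward()                                               # adjust parameters accordingly
print([float(l) for l in losses])
```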
  • Publication number: 20230252354
    Abstract: A method for pre-training a language model includes: constructing a pre-training language data set, in which the pre-training language data set comprises unsupervised language data and supervised language data; generating a hierarchical multi-template and multi-task language data set based on the pre-training language data set; and pre-training the language model based on the hierarchical multi-template and multi-task language data set.
    Type: Application
    Filed: March 7, 2023
    Publication date: August 10, 2023
    Inventors: Junyuan SHANG, Shuohuan WANG, Siyu DING, Yanbin ZHAO, Chao PANG, Yu SUN, Hao TIAN, Hua WU, Haifeng WANG
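One way to picture a "hierarchical multi-template and multi-task" data set is a mapping from tasks to several templates, each of which wraps the supervised examples, combined with plain unsupervised text. The snippet below sketches that construction; the template strings and task names are invented for illustration.

```python
# A sketch of assembling a hierarchical multi-template, multi-task set from
# unsupervised text plus supervised pairs. Templates and tasks are made up.
TEMPLATES = {
    "sentiment": ["Review: {x} Sentiment: {y}", "Is '{x}' positive or negative? {y}"],
    "nli":       ["Premise: {x} Label: {y}"],
}

def build_pretraining_set(unsupervised_texts, supervised_examples):
    """Combine raw text with templated supervised examples, grouped by task."""
    dataset = [{"task": "lm", "text": t} for t in unsupervised_texts]
    for task, x, y in supervised_examples:              # (task, input, label) triples
        for template in TEMPLATES.get(task, []):        # hierarchical: task -> templates
            dataset.append({"task": task, "text": template.format(x=x, y=y)})
    return dataset

data = build_pretraining_set(
    ["Unlabeled sentence one.", "Unlabeled sentence two."],
    [("sentiment", "great movie", "positive")])
print(len(data), data[-1])
```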
  • Publication number: 20230040095
    Abstract: A method and apparatus for pre-training a model, a device, a storage medium, and a program product are provided. An embodiment of the method includes: acquiring a sample natural language text; generating N types of prompt words based on the sample natural language text, where N is a positive integer; generating sample input data based on the sample natural language text and the N types of prompt words; and training an initial language model based on the sample input data, to obtain a pre-trained language model.
    Type: Application
    Filed: August 16, 2022
    Publication date: February 9, 2023
    Inventors: Junyuan SHANG, Shuohuan WANG, Siyu DING, Yanbin ZHAO, Chao PANG, Yu SUN
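The method generates several types of prompt words from each sample text and splices them into the model input. The sketch below shows that flow with three made-up prompt types (N = 3); the prompt-generation rules are placeholders, not the ones claimed in the application.

```python
# Illustrative sketch: derive N types of prompt words from a sample text and
# splice them into the training input. Prompt rules are placeholders.
def generate_prompts(text: str) -> dict:
    words = text.split()
    return {
        "keyword_prompt": " ".join(w for w in words if len(w) > 6),   # type 1
        "length_prompt": f"[{len(words)} words]",                     # type 2
        "task_prompt": "[summarize]",                                  # type 3, so N = 3
    }

def build_sample_input(text: str) -> str:
    prompts = generate_prompts(text)
    return " ".join(prompts.values()) + " " + text     # sample input data for training

print(build_sample_input("A sample natural language text about pretraining language models"))
```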
  • Publication number: 20220327290
    Abstract: There is provided a method of training a feature determination model, which relates to a field of deep learning and natural language processing. The method is implemented to include: determining, by a plurality of feature determination layers arranged in stages, a feature vector for each segment in a pre-training text; and pre-training the feature determination model according to the feature vector. A current stage feature vector is determined by a feature determination layer of a current stage according to a preceding segment feature vector determined for a preceding segment, and a preceding stage feature vector determined by a feature determination layer of a preceding stage. A method of training a feature determination model for a target task, a method of performing semantic analysis for a target task, an electronic device, and a computer storage medium are also provided.
    Type: Application
    Filed: June 29, 2022
    Publication date: October 13, 2022
    Inventors: Junyuan SHANG, Shuohuan WANG, Siyu DING
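The recurrence described here makes the feature for the current segment at a given stage depend on the preceding segment's feature at that stage and the preceding stage's feature for the current segment. The toy loop below implements that dependency pattern; the linear "cells", dimensions, and pooling are assumptions.

```python
# Toy implementation of the staged recurrence: feature(segment t, stage s)
# depends on feature(segment t-1, stage s) and feature(segment t, stage s-1).
import torch

dim, num_stages = 32, 3
cells = [torch.nn.Linear(2 * dim, dim) for _ in range(num_stages)]   # one cell per stage

def encode_segments(segment_embeddings):
    """segment_embeddings: list of (dim,) vectors, one per text segment."""
    prev_segment = [torch.zeros(dim) for _ in range(num_stages)]      # state from segment t-1
    features = []
    for seg in segment_embeddings:
        prev_stage = seg                                              # input to the first stage
        current = []
        for s in range(num_stages):
            h = torch.tanh(cells[s](torch.cat([prev_stage, prev_segment[s]])))
            current.append(h)
            prev_stage = h                                            # feeds the next stage
        prev_segment = current                                        # carried to segment t+1
        features.append(current[-1])                                  # final-stage feature
    return features

segments = [torch.randn(dim) for _ in range(4)]                       # pre-training text segments
features = encode_segments(segments)
print(len(features), features[0].shape)
```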
  • Publication number: 20220293092
    Abstract: The present application provides a method of training a natural language processing model, which relates to a field of artificial intelligence, and in particular to a field of natural language processing. A specific implementation scheme includes: performing a semantic learning for multi-tasks on an input text, so as to obtain a semantic feature for the multi-tasks, wherein the multi-tasks include a plurality of branch tasks; performing a feature learning for each branch task based on the semantic feature, so as to obtain a first output result for each branch task; calculating a loss for each branch task according to the first output result for the branch task; and adjusting a parameter of the natural language processing model according to the loss for each branch task. The present application further provides a method of processing a natural language, an electronic device, and a storage medium.
    Type: Application
    Filed: May 31, 2022
    Publication date: September 15, 2022
    Inventors: Siyu DING, Chao PANG, Shuohuan WANG, Yanbin ZHAO, Junyuan SHANG, Yu SUN, Shikun FENG, Hao TIAN, Hua WU, Haifeng WANG
  • Publication number: 20210312139
    Abstract: A method and apparatus of generating a semantic feature, a method and apparatus of training a model, an electronic device, and a storage medium are provided. The method of generating the semantic feature includes: segmenting a target document to obtain a segment sequence of the target document; generating a semantic feature of each document segment in the segment sequence of the target document by using a pre-trained bidirectional semantic encoding model; and acquiring the semantic feature of the target document based on the semantic feature of each document segment in the segment sequence of the target document. The present disclosure further provides a method of training a bidirectional semantic encoding model.
    Type: Application
    Filed: June 22, 2021
    Publication date: October 7, 2021
    Inventors: Shuohuan WANG, Siyu DING, Junyuan SHANG, Yu SUN
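The three steps in this abstract are: segment the document, encode each segment, and pool the segment features into a document-level feature. The sketch below follows that outline with a toy bidirectional GRU standing in for the pre-trained bidirectional semantic encoding model; the segment length and pooling are illustrative choices.

```python
# Sketch of the three steps: segment the document, encode each segment with a
# (stand-in) bidirectional encoder, pool segment features into one document feature.
import torch

encoder = torch.nn.GRU(input_size=32, hidden_size=32, bidirectional=True, batch_first=True)

def document_feature(token_embeddings: torch.Tensor, segment_len: int = 16) -> torch.Tensor:
    """token_embeddings: (num_tokens, 32). Returns a single (64,) document feature."""
    segments = token_embeddings.split(segment_len)           # segment sequence of the document
    segment_features = []
    for seg in segments:
        out, _ = encoder(seg.unsqueeze(0))                    # (1, seg_len, 64)
        segment_features.append(out.mean(dim=1).squeeze(0))   # one feature per segment
    return torch.stack(segment_features).mean(dim=0)          # aggregate to document level

doc = torch.randn(50, 32)                                      # a target document's embeddings
print(document_feature(doc).shape)                             # torch.Size([64])
```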