Patents by Inventor Xiameng QIN

Xiameng QIN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus for performing structured extraction on text, device and storage medium

Patent number: 12211304

Abstract: Embodiments of the present disclosure provide a method and apparatus for performing a structured extraction on a text, a device and a storage medium. The method may include: performing a text detection on an entity text image to obtain a position and content of a text line of the entity text image; extracting multivariate information of the text line based on the position and the content of the text line; performing a feature fusion on the multivariate information of the text line to obtain a multimodal fusion feature of the text line; performing category and relationship reasoning based on the multimodal fusion feature of the text line to obtain a category and a relationship probability matrix of the text line; and constructing structured information of the entity text image based on the category and the relationship probability matrix of the text line.

Type: Grant

Filed: March 12, 2021

Date of Patent: January 28, 2025

Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.

Inventors: Yulin Li, Xiameng Qin, Chengquan Zhang, Junyu Han, Errui Ding, Tian Wu, Haifeng Wang
METHOD OF TRAINING TEXT DETECTION MODEL, METHOD OF DETECTING TEXT, AND DEVICE

Publication number: 20240265718

Abstract: A method of training a text detection model and a method of detecting a text. The training method includes: inputting a sample image into a text feature extraction sub-model of a text detection model to obtain a text feature of a text in the sample image, the sample image having a label indicating an actual position information and an actual category; inputting a predetermined text vector into a text encoding sub-model of the text detection model to obtain a text reference feature; inputting the text feature and the text reference feature into a decoding sub-model of the text detection model to obtain a text sequence vector; inputting the text sequence vector into an output sub-model of the text detection model to obtain a predicted position information and a predicted category; and training the text detection model based on the predicted and actual categories, the predicted and actual position information.

Type: Application

Filed: April 22, 2022

Publication date: August 8, 2024

Inventors: Xiaoqiang ZHANG, Xiameng QIN, Chengquan ZHANG, Kun YAO
Method and apparatus for processing image, device and storage medium

Patent number: 11881044

Abstract: A method and apparatus for processing an image, a device and a storage medium are provided. An implementation of the method includes: acquiring a template image, the template image including at least one region of interest; determining a first feature map corresponding to each region of interest in the template image; acquiring a target image; determining a second feature map of the target image; and determining at least one region of interest in the target image according to the first feature map and the second feature map.

Type: Grant

Filed: June 21, 2021

Date of Patent: January 23, 2024

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Chengquan Zhang, Mengyi En, Ju Huang, Qunyi Xie, Xiameng Qin, Kun Yao, Junyu Han, Jingtuo Liu, Errui Ding
IMAGE-BASED INFORMATION EXTRACTION MODEL, METHOD, AND APPARATUS, DEVICE, AND STORAGE MEDIUM

Publication number: 20240021000

Abstract: There is provided an image-based information extraction model, method, and apparatus, a device, and a storage medium, which relates to the field of artificial intelligence (AI) technologies, specifically to fields of deep learning, image processing, computer vision technologies, and is applicable to optical character recognition (OCR) and other scenarios. A specific implementation solution involves: acquiring a to-be-extracted first image and a category of to-be-extracted information; and inputting the first image and the category into a pre-trained information extraction model to perform information extraction on the first image to obtain text information corresponding to the category.

Type: Application

Filed: February 23, 2023

Publication date: January 18, 2024

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Xiameng QIN, Yulin LI, Xiaoqiang ZHANG, Ju HUANG, Qunyi XIE, Kun YAO
Method, apparatus, device and storage medium for recognizing bill image

Patent number: 11854246

Abstract: A method, apparatus, device and storage medium for recognizing a bill image may include: performing text detection on a bill image, and determining an attribute information set and a relationship information set of each text box of at least two text boxes in the bill image; determining a type of the text box and an associated text box that has a structural relationship with the text box based on the attribute information set and the relationship information set of the text box; and extracting structured bill data of the bill image, based on the type of the text box and the associated text box that has the structural relationship with the text box.

Type: Grant

Filed: March 15, 2021

Date of Patent: December 26, 2023

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventors: Yulin Li, Ju Huang, Xiameng Qin, Junyu Han
METHOD FOR TRAINING IMAGE RECOGNITION MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM

Publication number: 20230401828

Abstract: A method for training an image recognition model includes: obtaining a training data set, in which the training data set includes first text images of each vertical category in a non-target scene and second text images of each vertical category in a target scene, and a type of text content involved in the first text images is the same as a type of text content involved in the second text image; training an initial recognition model by using the first text images, to obtain a basic recognition model; and modifying the basic recognition model by using the second text images, to obtain an image recognition model corresponding to the target scene.

Type: Application

Filed: April 8, 2022

Publication date: December 14, 2023

Inventors: Meina QIAO, Shanshan LIU, Xiameng QIN, Chengquan ZHANG, Kun YAO
Method and apparatus for visual question answering, computer device and medium

Patent number: 11775574

Abstract: A method for visual question answering, a computer device implementing the method and a medium for storing instructions on performing the method are provided. The method includes: acquiring an input image and an input question; constructing a visual graph based on the input image, wherein the visual graph comprises a first node feature and a first edge feature; constructing a question graph based on the input question, wherein the question graph comprises a second node feature and a second edge feature; performing a multimodal fusion on the visual graph and the question graph to obtain an updated visual graph and an updated question graph; determining a question feature based on the input question; determining a fusion feature based on the updated visual graph, the updated question graph and the question feature; and generating a predicted answer for the input image and the input question.

Type: Grant

Filed: February 23, 2021

Date of Patent: October 3, 2023

Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventors: Yulin Li, Xiameng Qin, Ju Huang, Qunyi Xie, Junyu Han
Method and device for visual question answering, computer apparatus and medium

Patent number: 11768876

Abstract: The present disclosure provides a method for visual question answering, which relates to a field of computer vision and natural language processing. The method includes: acquiring an input image and an input question; constructing a Visual Graph based on the input image, wherein the Visual Graph comprises a Node Feature and an Edge Feature; updating the Node Feature by using the Node Feature and the Edge Feature to obtain an updated Visual Graph; determining a question feature based on the input question; fusing the updated Visual Graph and the question feature to obtain a fused feature; and generating a predicted answer for the input image and the input question based on the fused feature. The present disclosure further provides an apparatus for visual question answering, a computer device and a non-transitory computer-readable storage medium.

Type: Grant

Filed: January 28, 2021

Date of Patent: September 26, 2023

Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventors: Xiameng Qin, Yulin Li, Qunyi Xie, Ju Huang, Junyu Han
Method and apparatus for correcting distorted document image

Patent number: 11756170

Abstract: Embodiments of the present disclosure provide a method and apparatus for correcting a distorted document image, where the method for correcting a distorted document image includes: obtaining a distorted document image; and inputting the distorted document image into a correction model, and obtaining a corrected image corresponding to the distorted document image; where the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted. By inputting the distorted document image to be corrected into the correction model, the corrected image corresponding to the distorted document image can be obtained through the correction model, which realizes document image correction end-to-end, improves accuracy of the document image correction, and extends application scenarios of the document image correction.

Type: Grant

Filed: January 19, 2021

Date of Patent: September 12, 2023

Inventors: Qunyi Xie, Xiameng Qin, Yulin Li, Junyu Han, Shengxian Zhu
Method, apparatus and electronic device for annotating information of structured document

Patent number: 11687704

Abstract: Disclosed are a method, apparatus and electronic device for annotating information of a structured document. A specific implementation is: obtaining a template image of a structured document and at least one piece of annotation information of a field to be filled in the template image, where the annotation information includes attribute value and historical content of the field to be filled, and historical position of the field to be filled in the template image; generating, according to the attribute value of the field to be filled, the historical content of the field to be filled and the historical position of the field to be filled in the template image, target filling information of the field to be filled; obtaining, according to the target filling information of the field to be filled, an image of an annotated structured document.

Type: Grant

Filed: March 19, 2021

Date of Patent: June 27, 2023

Assignee: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Qiaoyi Li, Xiangkai Huang, Yulin Li, Ju Huang, Xiameng Qin, Duohao Qin, Minghao Liu, Junyu Han
CHARACTER DETECTION METHOD AND APPARATUS, MODEL TRAINING METHOD AND APPARATUS, DEVICE AND STORAGE MEDIUM

Publication number: 20230196805

Abstract: The present disclosure provides a character detection method and apparatus, a model training method and apparatus, a device and a storage medium. The specific implementation is: acquiring a training sample, where the training sample includes a sample image and a marked image, and the marked image is an image obtained by marking a text instance in the sample image; inputting the sample image into a character detection model, to obtain segmented images and image types of the segmented images output by the character detection model, where the image type indicates that the segmented image includes a text instance, or the segmented image does not include a text instance; and adjusting a parameter of the character detection model according to the segmented images, the image types of the segmented images and the marked image.

Type: Application

Filed: February 13, 2023

Publication date: June 22, 2023

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Ju HUANG, Xiaoqiang ZHANG, Xiameng QIN, Chengquan ZHANG, Kun YAO
METHOD OF PROCESSING TASK, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Publication number: 20230134615

Abstract: A method of processing a task, an electronic device, and a storage medium are provided, which relate to a field of artificial intelligence, in particular to fields of deep learning and computer vision, and may be applied to optical character recognition (OCR) and other scenarios. The method includes: parsing labeled data to be processed according to a task type identification, to obtain task labeled data, where tag information of the task labeled data matches the task type identification, and the task labeled data includes first task labeled data and second task labeled data; training a model using the first task labeled data, to obtain candidate models, where the model is determined according to the task type identification; and determining a target model from the candidate models according to a performance evaluation result obtained by performing performance evaluation on the candidate models using the second task labeled data.

Type: Application

Filed: December 27, 2022

Publication date: May 4, 2023

Inventors: Qunyi XIE, Dongdong ZHANG, Xiameng QIN, Mengyi EN, Yangliu XU, Yi CHEN, Ju HUANG, Kun YAO
TEXT EXTRACTION METHOD, TEXT EXTRACTION MODEL TRAINING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

Publication number: 20230106873

Abstract: A text extraction method and a text extraction model training method are provided. The present disclosure relates to the technical field of artificial intelligence, in particular to the technical field of computer vision. An implementation of the method comprises: obtaining a visual encoding feature of a to-be-detected image; extracting a plurality of sets of multimodal features from the to-be-detected image, wherein each set of multimodal features includes position information of one detection frame extracted from the to-be-detected image, a detection feature in the detection frame and first text information in the detection frame; and obtaining second text information matched with a to-be-extracted attribute based on the visual encoding feature, the to-be-extracted attribute and the plurality of sets of multimodal features, wherein the to-be-extracted attribute is an attribute of text information needing to be extracted.

Type: Application

Filed: November 28, 2022

Publication date: April 6, 2023

Inventors: Xiameng QIN, Xiaoqiang ZHANG, Ju HUANG, Yulin LI, Qunyi XIE, Kun YAO, Junyu HAN
METHOD AND PLATFORM OF GENERATING DOCUMENT, ELECTRONIC DEVICE AND STORAGE MEDIUM

Publication number: 20230048495

Abstract: A method and a platform of generating a document, an electronic device, and a storage medium are provided, which relate to a field of an artificial intelligence technology, in particular to fields of computer vision and deep learning technologies, and may be applied to a text recognition scenario and other scenarios. The method includes: performing a category recognition on a document picture to obtain a target category result; determining a target structured model matched with the target category result; and performing, by using the target structured model, a structure recognition on the document picture to obtain a structure recognition result, so as to generate an electronic document based on the structure recognition result, wherein the structure recognition result includes a field attribute recognition result and a field position recognition result.

Type: Application

Filed: October 26, 2022

Publication date: February 16, 2023

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Qunyi XIE, Xiameng QIN, Mengyi EN, Dongdong ZHANG, Ju HUANG, Yangliu XU, Yi CHEN, Kun YAO
METHOD FOR TRAINING MODEL, DEVICE, AND STORAGE MEDIUM

Publication number: 20230042234

Abstract: A method for training a model includes: obtaining a scene image, second actual characters in the scene image and a second construct image; obtaining first features and first recognition characters of characters obtained by performing character recognition on the scene image using the model to be trained; obtaining second features of characters obtained by performing character recognition on the second construct image using the training auxiliary model; and obtaining a character recognition model by adjusting model parameters of the model to be trained based on the first recognition characters, the second actual characters, the first features and the second features.

Type: Application

Filed: October 24, 2022

Publication date: February 9, 2023

Inventors: Yangliu XU, Qunyi Xie, Yi Chen, Xiameng Qin, Chengquan Zhang, Kun Yao
METHOD FOR TRAINING TEXT POSITIONING MODEL AND METHOD FOR TEXT POSITIONING

Publication number: 20220392242

Abstract: A method for training a text positioning model includes: obtaining a sample image, where the sample image contains a sample text to be positioned and a text marking box for the sample text; inputting the sample image into a text positioning model to be trained to position the sample text, and outputting a prediction text box for the sample image; obtaining a sample prior anchor box corresponding to the sample image; and adjusting model parameters of the text positioning model based on the sample prior anchor box, the text marking box and the prediction text box, and continuing training the adjusted text positioning model based on a next sample image until model training is completed, to generate a target text positioning model.

Type: Application

Filed: August 15, 2022

Publication date: December 8, 2022

Inventors: Ju HUANG, Yulin LI, Peng WANG, Qunyi XIE, Xiameng QIN, Kun YAO
IMAGE PROCESSING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

Publication number: 20220253631

Abstract: The present disclosure discloses an image processing method, an electronic device and a storage medium, and relates to the field of artificial intelligence technologies, and particularly to the fields of computer vision technologies, deep learning technologies, or the like. The image processing method includes: acquiring a multi-modal feature of each of at least one text region in an image, the multi-modal feature including features in plural dimensions; performing a global attention processing operation on the multi-modal feature of each text region to obtain a global attention feature of each text region; determining a category of each text region based on the global attention feature of each text region; and constructing structured information based on text content and the category of each text region.

Type: Application

Filed: October 14, 2021

Publication date: August 11, 2022

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Yulin LI, Ju HUANG, Qunyi XIE, Xiameng QIN, Chengquan ZHANG, Jingtuo LIU
METHOD AND APPARATUS FOR EXTRACTING INFORMATION ABOUT A NEGOTIABLE INSTRUMENT, ELECTRONIC DEVICE AND STORAGE MEDIUM

Publication number: 20220148324

Abstract: Provided are a method and apparatus for extracting information about a negotiable instrument, an electronic device and a storage medium. The method includes inputting a to-be-recognized negotiable instrument into a pretrained deep learning network and obtaining a visual image corresponding to the to-be-recognized negotiable instrument through the deep learning network; matching the visual image corresponding to the to-be-recognized negotiable instrument with a visual image corresponding to each negotiable-instrument template in a preconstructed base template library; and in response to the visual image corresponding to the to-be-recognized negotiable instrument successfully matching a visual image corresponding to one negotiable-instrument template in the base template library, extracting structured information of the to-be-recognized negotiable instrument by using the negotiable-instrument template.

Type: Application

Filed: January 21, 2022

Publication date: May 12, 2022

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventors: Xiameng QIN, Yulin Li, Ju Huang, Qunyi Xie, Chengquan Zhang, Kun Yao, Jingtuo Liu, Junyu Han
METHOD FOR CHARACTER RECOGNITION, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Publication number: 20220101642

Abstract: The disclosure discloses a method for character recognition, an electronic device, and a storage medium. The technical solution includes: obtaining a test sample image and a test sample character both corresponding to a test task; performing fine-tuning on a trained meta-learning model based on the test sample image and the test sample character to obtain a test task model; obtaining a test image corresponding to the test task; and generating a test character corresponding to the test image by inputting the test image into the test task model.

Type: Application

Filed: December 8, 2021

Publication date: March 31, 2022

Inventors: Qunyi XIE, Yangliu XU, Xiameng QIN, Chengquan ZHANG
METHOD AND DEVICE FOR TRAINING IMAGE RECOGNITION MODEL, EQUIPMENT AND MEDIUM

Publication number: 20220092353

Abstract: A computer-implemented method includes: acquiring training data, where the training data includes training images for a preset vertical type, and the training images include a first training image containing real data of the preset vertical type and a second training image containing virtual data of the preset vertical type; building a basic model, where the basic model includes a deep learning network, and the deep learning network is configured to recognize the training images to extract text data in the training images; and training the basic model by using the training data to obtain the image recognition model.

Type: Application

Filed: December 1, 2021

Publication date: March 24, 2022

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Ruixue Liu, Xiameng Qin, Mengyi En, Kun Yao, Chengquan Zhang, Shengxian Zhu, Yunhao Li, Junyu Han, Hao Sun
