Patents by Inventor Junyu Han

Junyu Han has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230145443
    Abstract: Provided are a video stitching method and an apparatus, an electronic device, and a storage medium. In the video stitching method, an intermediate frame is inserted between a last image frame of a first video and a first image frame of a second video. L image frames are selected sequentially from back to front from the first video, and L image frames are selected sequentially from front to back from the second video, where L is a natural number greater than 1. The first video and the second video are stitched together to form a target video according to the intermediate frame, the L image frames in the first video, and the L image frames in the second video.
    Type: Application
    Filed: October 4, 2022
    Publication date: May 11, 2023
    Inventors: Tianshu HU, Hanqi GUO, Junyu HAN, Zhibin HONG
  • Publication number: 20230123327
    Abstract: A method for recognizing text includes: obtaining an image sequence feature of an image to be recognized; obtaining a full text string of the image to be recognized by decoding the image sequence feature; obtaining a text sequence feature by performing a semantic enhancement process on the full text string, in which the image sequence feature, the full text string and the text sequence feature are of the same length; and determining text content of the image to be recognized based on the full text string and the text sequence feature.
    Type: Application
    Filed: December 19, 2022
    Publication date: April 20, 2023
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Chengquan Zhang, Pengyuan Lv, Kun Yao, Junyu Han, Jingtuo Liu
  • Publication number: 20230124389
    Abstract: A model determination method and an electronic device are provided, which relate to the technical field of artificial intelligence, in particular to the fields of computer vision and deep learning, and can be applied to image processing, image identification and other scenarios. A specific implementation solution includes: an image sample and a text sample are acquired, wherein text data in the text sample is used for performing text description of target image data in the image sample; at least one image feature in the image sample is stored to a first queue, and at least one text feature in the text sample is stored to a second queue; the first queue and the second queue are trained to obtain a first target model; and the first target model is determined as an initialization model for a second target model.
    Type: Application
    Filed: August 15, 2022
    Publication date: April 20, 2023
    Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Longchao WANG, Yipeng SUN, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING
  • Publication number: 20230120985
    Abstract: A method for training a face recognition model includes: acquiring a plurality of first training images being uncovered face images, and acquiring a plurality of covering object images; generating a plurality of second training images by separately fusing the plurality of covering object images with the uncovered face images; and training the face recognition model by inputting the plurality of first training images and the plurality of second training images into the face recognition model.
    Type: Application
    Filed: December 16, 2022
    Publication date: April 20, 2023
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Yanwen Fan, Xiyu Yu, Gang Zhang, Jingtuo Liu, Haifeng Wang, Errui Ding, Junyu Han
  • Publication number: 20230106873
    Abstract: A text extraction method and a text extraction model training method are provided. The present disclosure relates to the technical field of artificial intelligence, in particular to the technical field of computer vision. An implementation of the method comprises: obtaining a visual encoding feature of a to-be-detected image; extracting a plurality of sets of multimodal features from the to-be-detected image, wherein each set of multimodal features includes position information of one detection frame extracted from the to-be-detected image, a detection feature in the detection frame and first text information in the detection frame; and obtaining second text information matched with a to-be-extracted attribute based on the visual encoding feature, the to-be-extracted attribute and the plurality of sets of multimodal features, wherein the to-be-extracted attribute is an attribute of text information needing to be extracted.
    Type: Application
    Filed: November 28, 2022
    Publication date: April 6, 2023
    Inventors: Xiameng QIN, Xiaoqiang ZHANG, Ju HUANG, Yulin LI, Qunyi XIE, Kun YAO, Junyu HAN
  • Publication number: 20230065675
    Abstract: A method of processing an image, a method of training a model, an electronic device and a medium are provided, which relate to the field of artificial intelligence technology, in particular to deep learning, computer vision and other technical fields. A solution includes: generating a first face image, wherein a definition difference and an authenticity difference between the first face image and a reference face image are within a set range; adjusting, according to a target voice used to drive the first face image, facial action information related to pronunciation in the first face image to generate a second face image with a facial tissue position conforming to a pronunciation rule of the target voice; and determining the second face image as a face image driven by the target voice.
    Type: Application
    Filed: November 8, 2022
    Publication date: March 2, 2023
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Tianshu Hu, Shengyi He, Junyu Han, Zhibin Hong
  • Publication number: 20230045715
    Abstract: The present disclosure provides a text detection method, a text recognition method and an apparatus, which relate to the field of artificial intelligence technology, in particular to the field of deep learning and computer vision technologies, and can be applied to scenarios such as optical character recognition. The text detection method includes: acquiring an image feature of a text strip in a to-be-recognized image; performing visual enhancement processing on the to-be-recognized image to obtain an enhanced feature map of the to-be-recognized image; and comparing the image feature of the text strip with the enhanced feature map for similarity to obtain a target bounding box of the text strip on the enhanced feature map.
    Type: Application
    Filed: October 14, 2022
    Publication date: February 9, 2023
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Chengquan ZHANG, Pengyuan LV, Sen FAN, Kun YAO, Junyu HAN, Jingtuo LIU
  • Publication number: 20230010031
    Abstract: A method for recognizing a text, an electronic device and a storage medium. An implementation of the method comprises: obtaining a multi-dimensional first feature map of a to-be-recognized image; performing, based on feature values in the first feature map, feature enhancement processing on each feature value in the first feature map; and performing a text recognition on the to-be-recognized image based on the first feature map after the enhancement processing.
    Type: Application
    Filed: September 16, 2022
    Publication date: January 12, 2023
    Inventors: Pengyuan LYU, Sen FAN, Xiaoyan WANG, Yuechen YU, Chengquan ZHANG, Kun YAO, Junyu HAN
  • Publication number: 20220415071
    Abstract: The present disclosure provides a training method of a text recognition model, a text recognition method, and an apparatus, relating to the technical field of artificial intelligence, and specifically, to the technical field of deep learning and computer vision, which can be applied in scenarios such as optical character recognition. The specific implementation solution is: performing mask prediction on visual features of an acquired sample image, to obtain a predicted visual feature; performing mask prediction on semantic features of acquired sample text, to obtain a predicted semantic feature, where the sample image includes text; determining a first loss value of the text of the sample image according to the predicted visual feature; determining a second loss value of the sample text according to the predicted semantic feature; and training, according to the first loss value and the second loss value, to obtain the text recognition model.
    Type: Application
    Filed: August 31, 2022
    Publication date: December 29, 2022
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Chengquan ZHANG, Pengyuan LV, Shanshan LIU, Meina QIAO, Yangliu XU, Liang WU, Jingtuo LIU, Junyu HAN, Errui DING, Jingdong WANG
  • Publication number: 20220392205
    Abstract: Embodiments of the present disclosure provide a method and apparatus for training an image recognition model based on a semantic enhancement, a method and apparatus for recognizing an image, an electronic device, and a computer readable storage medium. The method for training an image recognition model based on a semantic enhancement comprises: extracting, from an inputted first image being unannotated and having no textual description, a first feature representation of the first image; calculating a first loss function based on the first feature representation; extracting, from an inputted second image being unannotated and having an original textual description, a second feature representation of the second image; calculating a second loss function based on the second feature representation; and training an image recognition model based on a fusion of the first loss function and the second loss function.
    Type: Application
    Filed: August 22, 2022
    Publication date: December 8, 2022
    Inventors: Yipeng SUN, Rongqiao AN, Xiang WEI, Longchao WANG, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING
  • Publication number: 20220383626
    Abstract: An image processing method includes: obtaining a first categorical feature and M first image features corresponding to M first images respectively, each first image being associated with a task index, task indices associated with different first images being different from each other, M being a positive integer; fusing the M first image features with the first categorical feature respectively so as to obtain M first target features; performing feature extraction on the M first target features so as to obtain M second categorical features; selecting a second categorical feature corresponding to each task index from the M second categorical features, and performing regularization corresponding to the task index on the second categorical feature, to obtain a third categorical feature corresponding to the task index; and performing image processing in accordance with M third categorical features so as to obtain M first image processing results of the M first images.
    Type: Application
    Filed: August 8, 2022
    Publication date: December 1, 2022
    Inventors: Jian WANG, Junyu HAN, Jinwen CHEN, Lufei LIU
  • Patent number: 11482023
    Abstract: A method and apparatus for detecting text regions in an image, a device, and a medium are provided. The method may include: detecting, based on feature representation of an image, a first text region in the image, where the first text region covers a text in the image, a region occupied by the text being of a certain shape; determining, based on a feature block of the first text region, text geometry information associated with the text, where the text geometry information includes a text centerline of the text and distance information of the centerline from the upper and lower borders of the text; and adjusting, based on the text geometry information associated with the text, the first text region to a second text region, where the second text region also covers the text and is smaller than the first text region.
    Type: Grant
    Filed: December 11, 2019
    Date of Patent: October 25, 2022
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Chengquan Zhang, Zuming Huang, Mengyi En, Junyu Han, Errui Ding
  • Publication number: 20220292131
    Abstract: A method, apparatus and system for retrieving an image are provided. The method comprises: detecting, in response to receiving a query request comprising a target image, a target subject from the target image; extracting a subject feature from the target subject if a confidence level of a detection box of the detected target subject is greater than a first threshold, the subject feature comprising an identical feature, a similar feature and a category; performing matching on the subject feature of the target image and a subject feature of a candidate image pre-stored in a database, to obtain a similarity score and an identicalness score of the candidate image; and selecting, according to the similarity score and the identicalness score, a predetermined number of candidate images as a search result for output.
    Type: Application
    Filed: May 27, 2022
    Publication date: September 15, 2022
    Inventors: Ruibin BAI, Xiang WEI, Yipeng SUN, Kun YAO, Jingtuo LIU, Junyu HAN
  • Publication number: 20220148324
    Abstract: Provided are a method and apparatus for extracting information about a negotiable instrument, an electronic device and a storage medium. The method includes inputting a to-be-recognized negotiable instrument into a pretrained deep learning network and obtaining a visual image corresponding to the to-be-recognized negotiable instrument through the deep learning network; matching the visual image corresponding to the to-be-recognized negotiable instrument with a visual image corresponding to each negotiable-instrument template in a preconstructed base template library; and in response to the visual image corresponding to the to-be-recognized negotiable instrument successfully matching a visual image corresponding to one negotiable-instrument template in the base template library, extracting structured information of the to-be-recognized negotiable instrument by using the negotiable-instrument template.
    Type: Application
    Filed: January 21, 2022
    Publication date: May 12, 2022
    Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Xiameng QIN, Yulin Li, Ju Huang, Qunyi Xie, Chengquan Zhang, Kun Yao, Jingtuo Liu, Junyu Han
  • Publication number: 20220139096
    Abstract: A character recognition method, a model training method, a related apparatus and an electronic device are provided. The specific solution is: obtaining a target picture; performing feature encoding on the target picture to obtain a visual feature of the target picture; performing feature mapping on the visual feature to obtain a first target feature of the target picture, where the first target feature is a feature that has a matching space with a feature of character semantic information of the target picture; inputting the first target feature into a character recognition model for character recognition to obtain a first character recognition result of the target picture.
    Type: Application
    Filed: January 19, 2022
    Publication date: May 5, 2022
    Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Pengyuan Lv, Chengquan Zhang, Kun Yao, Junyu Han
  • Publication number: 20220092353
    Abstract: A computer-implemented method includes: acquiring training data, where the training data includes training images for a preset vertical type, and the training images include a first training image containing real data of the preset vertical type and a second training image containing virtual data of the preset vertical type; building a basic model, where the basic model includes a deep learning network, and the deep learning network is configured to recognize the training images to extract text data in the training images; and training the basic model by using the training data to obtain an image recognition model.
    Type: Application
    Filed: December 1, 2021
    Publication date: March 24, 2022
    Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.
    Inventors: Ruixue Liu, Xiameng Qin, Mengyi En, Kun Yao, Chengquan Zhang, Shengxian Zhu, Yunhao Li, Junyu Han, Hao Sun
  • Publication number: 20220027611
    Abstract: Provided are an image classification method and apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence and, in particular, to computer vision and deep learning. The method includes inputting a to-be-classified document image into a pretrained neural network and obtaining a feature submap of each text box of the to-be-classified document image by use of the neural network; inputting the feature submap of each text box, a semantic feature corresponding to preobtained text information of each text box and a position feature corresponding to preobtained position information of each text box into a pretrained multimodal feature fusion model and fusing, by use of the multimodal feature fusion model, the three into a multimodal feature corresponding to each text box; and classifying the to-be-classified document image based on the multimodal feature corresponding to each text box.
    Type: Application
    Filed: October 11, 2021
    Publication date: January 27, 2022
    Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Yuechen YU, Chengquan ZHANG, Yulin LI, Xiaoqiang ZHANG, Ju HUANG, Xiameng QIN, Kun YAO, Jingtuo LIU, Junyu HAN, Errui DING
  • Publication number: 20220005244
    Abstract: The present disclosure relates to a field of artificial intelligence technology, in particular to a field of computer vision and deep learning technology, and more particularly, a method and an apparatus for changing a hairstyle of a character, a device, and a storage medium are provided. The method includes: determining an original feature vector of an original image containing the character, wherein the character in the original image has an original hairstyle; acquiring a boundary vector associated with the original hairstyle and a target hairstyle based on a hairstyle classification model; determining a target feature vector corresponding to the target hairstyle based on the original feature vector and the boundary vector; and generating a target image containing the character based on the target feature vector, wherein the character in the target image has the target hairstyle.
    Type: Application
    Filed: September 20, 2021
    Publication date: January 6, 2022
    Inventors: Zhizhi GUO, Borong LIANG, Zhibin HONG, Junyu HAN
  • Publication number: 20210406592
    Abstract: The present disclosure provides a method for visual question answering. The method includes: acquiring an input image and an input question; constructing a visual graph based on the input image, wherein the visual graph comprises a first node feature and a first edge feature; constructing a question graph based on the input question, wherein the question graph comprises a second node feature and a second edge feature; performing a multimodal fusion on the visual graph and the question graph to obtain an updated visual graph and an updated question graph; determining a question feature based on the input question; determining a fusion feature based on the updated visual graph, the updated question graph and the question feature; and generating a predicted answer for the input image and the input question. The present disclosure further provides an apparatus for visual question answering, a computer device and a medium.
    Type: Application
    Filed: February 23, 2021
    Publication date: December 30, 2021
    Inventors: Yulin LI, Xiameng QIN, Ju HUANG, Qunyi XIE, Junyu HAN
  • Publication number: 20210406468
    Abstract: The present disclosure provides a method for visual question answering, which relates to a field of computer vision and natural language processing. The method includes: acquiring an input image and an input question; constructing a Visual Graph based on the input image, wherein the Visual Graph comprises a Node Feature and an Edge Feature; updating the Node Feature by using the Node Feature and the Edge Feature to obtain an updated Visual Graph; determining a question feature based on the input question; fusing the updated Visual Graph and the question feature to obtain a fused feature; and generating a predicted answer for the input image and the input question based on the fused feature. The present disclosure further provides an apparatus for visual question answering, a computer device and a non-transitory computer-readable storage medium.
    Type: Application
    Filed: January 28, 2021
    Publication date: December 30, 2021
    Inventors: Xiameng QIN, Yulin LI, Qunyi XIE, Ju HUANG, Junyu HAN
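
The steps named in the abstract of publication 20210406468 (update the node features from node and edge features, fuse the updated graph with the question feature, predict an answer) can be pictured with a small Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the application: the function names, the scalar edge weights, the residual weighted-sum update, the mean pooling, the element-wise fusion, and the dot-product scoring of candidate answers are all hypothetical stand-ins.

```python
from typing import List

Vector = List[float]


def update_nodes(nodes: List[Vector], edges: List[List[float]]) -> List[Vector]:
    """Update each node feature from the other nodes, weighted by edge features.

    Assumption: one scalar edge feature per ordered node pair and a residual
    weighted-sum rule; the abstract does not specify either.
    """
    dim = len(nodes[0])
    updated = []
    for i, node in enumerate(nodes):
        message = [0.0] * dim
        for j, neighbour in enumerate(nodes):
            if i == j:
                continue  # a node does not send a message to itself
            for k in range(dim):
                message[k] += edges[i][j] * neighbour[k]
        updated.append([node[k] + message[k] for k in range(dim)])
    return updated


def fuse(nodes: List[Vector], question: Vector) -> Vector:
    """Mean-pool the node features, then multiply element-wise by the question feature.

    Assumption: the abstract only says the graph and question feature are fused.
    """
    dim = len(question)
    pooled = [sum(node[k] for node in nodes) / len(nodes) for k in range(dim)]
    return [pooled[k] * question[k] for k in range(dim)]


def predict(fused: Vector, answers: List[Vector]) -> int:
    """Return the index of the candidate answer with the highest dot-product score."""
    scores = [sum(f * a for f, a in zip(fused, answer)) for answer in answers]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    nodes = [[1.0, 0.0], [0.0, 1.0]]      # toy node features
    edges = [[0.0, 0.5], [0.5, 0.0]]      # toy edge features
    question = [2.0, 0.0]                 # toy question feature
    answers = [[0.0, 1.0], [1.0, 0.0]]    # toy candidate answer embeddings
    fused = fuse(update_nodes(nodes, edges), question)
    print(predict(fused, answers))
```

In a trained system the features, update rule, and answer scoring would be learned; the sketch only shows how the three stages hand data to one another.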