Patents by Inventor Shengfu DONG

Shengfu DONG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11397890
    Abstract: The present application discloses a cross-media retrieval method based on a deep semantic space, which includes a feature generation stage and a semantic space learning stage. In the feature generation stage, a CNN visual feature vector and an LSTM language description vector of an image are generated by simulating a person's perception process for the image, and topic information about a text is explored using an LDA topic model to extract an LDA text topic vector. In the semantic space learning stage, training-set images are used to train a four-layer Multi-Sensory Fusion Deep Neural Network, and training-set texts are used to train a three-layer text semantic network. Finally, a test image and a test text are mapped by the two networks into an isomorphic semantic space, realizing cross-media retrieval. The disclosed method can significantly improve cross-media retrieval performance. (A minimal sketch of the two mapping networks follows this entry.)
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: July 26, 2022
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mengdi Fan, Peilei Dong, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
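
A minimal sketch of the two mapping networks described in the abstract above, written in PyTorch. The layer widths and the feature dimensions (a 4096-d fused CNN+LSTM image vector, a 100-d LDA topic vector, a 256-d shared space) are assumptions; the patent abstract specifies only the layer counts (four for images, three for text).

```python
# Hypothetical dimensions throughout; only the 4-layer / 3-layer structure
# comes from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageSemanticNet(nn.Module):
    """Four-layer network mapping fused CNN+LSTM image features to the space."""
    def __init__(self, in_dim=4096, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 2048), nn.ReLU(),
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, out_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class TextSemanticNet(nn.Module):
    """Three-layer network mapping LDA topic vectors to the same space."""
    def __init__(self, in_dim=100, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, out_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

# Retrieval: rank candidate texts for an image query by cosine similarity
# in the shared (isomorphic) semantic space.
img_vec = ImageSemanticNet()(torch.randn(1, 4096))
txt_vecs = TextSemanticNet()(torch.randn(10, 100))
ranking = (txt_vecs @ img_vec.T).squeeze(1).argsort(descending=True)
```
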
  • Patent number: 11379711
    Abstract: A video action detection method based on a convolutional neural network (CNN) is disclosed in the field of computer vision recognition technologies. A temporal-spatial pyramid pooling layer is added to the network structure, which removes the network's limitation on input size, speeds up training and detection, and improves the performance of video action classification and temporal localization. The disclosed convolutional neural network includes a convolutional layer, a common pooling layer, a temporal-spatial pyramid pooling layer and a fully connected layer. The outputs of the network include a category classification output layer and a temporal localization output layer. The disclosed method does not require down-sampling to obtain video clips of different durations; instead, the whole video is input at once, improving efficiency. (A minimal sketch of the pooling layer follows this entry.)
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: July 5, 2022
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Zhihao Li, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
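
A minimal PyTorch sketch of a temporal-spatial pyramid pooling layer. The pyramid levels (1, 2, 4) and the choice of max pooling are assumptions; the point made in the abstract is that videos of any duration map to a fixed-length vector, so no down-sampling into fixed-size clips is needed.

```python
import torch
import torch.nn.functional as F

def temporal_spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool an (N, C, T, H, W) feature map to a fixed-length vector,
    regardless of temporal length T or spatial size H x W."""
    n, c = x.shape[:2]
    outs = []
    for l in levels:
        # adaptive pooling yields an l x l x l grid for any input size
        pooled = F.adaptive_max_pool3d(x, output_size=(l, l, l))
        outs.append(pooled.reshape(n, -1))
    return torch.cat(outs, dim=1)  # N x (C * sum(l^3))

# Videos of different durations produce vectors of identical length:
short = temporal_spatial_pyramid_pool(torch.randn(1, 64, 8, 14, 14))
long_ = temporal_spatial_pyramid_pool(torch.randn(1, 64, 50, 14, 14))
assert short.shape == long_.shape
```
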
  • Patent number: 11347979
    Abstract: A method and a device for MCMC framework-based sub-hypergraph matching are provided. Matching of object features is performed by constructing sub-hypergraphs. In large numbers of real images and videos, objects vary constantly and contain noise points as well as other interference factors, which makes image object matching and searching very difficult. Performing object feature matching by representing the appearance and positions of objects with sub-hypergraphs allows for faster and more accurate image matching. Furthermore, a sub-hypergraph has several advantages over a graph or a hypergraph: on the one hand, it carries more geometric information (e.g. angle transformation, rotation, scale) than a graph and is less difficult to handle and more extensible than a hypergraph; on the other hand, the disclosed method and device are more resistant to interference, are robust, and adapt to more complex settings, especially with outliers. (A schematic MCMC loop follows this entry.)
    Type: Grant
    Filed: March 10, 2016
    Date of Patent: May 31, 2022
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Ruonan Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
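
The abstract does not spell out the sampling procedure, so the following is only a generic Metropolis-Hastings loop over candidate point correspondences; the random cost table is a placeholder for the patent's sub-hypergraph compatibility score, and the temperature is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_model, n_scene = 5, 8
cost = rng.random((n_model, n_scene))        # placeholder matching-cost table

def energy(assign):
    """Total cost of matching model point i to scene point assign[i]."""
    return sum(cost[i, j] for i, j in enumerate(assign))

assign = rng.permutation(n_scene)[:n_model]  # initial one-to-one assignment
best = assign.copy()
for _ in range(2000):
    prop = assign.copy()
    prop[rng.integers(n_model)] = rng.integers(n_scene)  # re-match one point
    if len(set(prop)) == n_model:            # keep the assignment one-to-one
        # Metropolis rule: always accept improvements, sometimes accept worse
        if rng.random() < np.exp(-(energy(prop) - energy(assign)) / 0.1):
            assign = prop
            if energy(assign) < energy(best):
                best = assign.copy()
```
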
  • Patent number: 11238274
    Abstract: An image feature extraction method for person re-identification includes performing person re-identification by means of aligned local descriptor extraction and graded global feature extraction; performing the aligned local descriptor extraction by processing an original image with an affine transformation and applying a summation pooling operation to image block features of the same regions to obtain an aligned local descriptor; preserving the spatial information between inner blocks of the image in the aligned local descriptor; and performing the graded global feature extraction by grading a localized pedestrian region block and computing the corresponding feature mean value to obtain a global feature. The method can resolve the feature misalignment caused by pedestrians' posture changes and the like, and eliminate the effect of unrelated backgrounds on re-identification, thus improving the precision and robustness of person re-identification. (A minimal sketch of the two feature paths follows this entry.)
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: February 1, 2022
    Assignee: Peking University
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
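
A minimal NumPy sketch of the two feature paths, starting from a hypothetical grid of image-block features; the affine transformation and the pedestrian localization that produce these aligned blocks are omitted.

```python
import numpy as np

def aligned_local_descriptor(block_feats):
    """Sum-pool block features of the same region.
    block_feats: (regions, blocks_per_region, dim) array of aligned blocks;
    the region ordering preserves the spatial layout of the image."""
    return block_feats.sum(axis=1)                   # (regions, dim)

def graded_global_feature(region_feats, grades=3):
    """Split the pedestrian region top-to-bottom into grades, average each."""
    parts = np.array_split(region_feats, grades, axis=0)
    return np.stack([p.mean(axis=0) for p in parts]) # (grades, dim)

blocks = np.random.rand(6, 4, 128)                   # 6 regions x 4 blocks, 128-d
local_desc = aligned_local_descriptor(blocks)        # (6, 128)
global_feat = graded_global_feature(local_desc)      # (3, 128)
descriptor = np.concatenate([local_desc.ravel(), global_feat.ravel()])
```
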
  • Patent number: 11106951
    Abstract: A bidirectional image-text retrieval method based on a multi-view joint embedding space performs retrieval with reference to semantic association relationships at both a global and a local level. The global-level relationship is obtained in a frame-sentence view and the local-level relationship in a region-phrase view: semantic association information is obtained in a global-level subspace of frames and sentences in the frame-sentence view, and in a local-level subspace of regions and phrases in the region-phrase view. In each of the two views, a dual-branch neural network processes the data to obtain isomorphic features and embeds them in a common space, with a constraint condition preserving the original semantic relationships of the data during training. The two semantic association relationships are then merged by multi-view merging and sorting to obtain a more accurate semantic similarity between data. (A compact sketch of one view follows this entry.)
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: August 31, 2021
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Lu Ran, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
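
A compact PyTorch sketch of one dual-branch view (the frame-sentence view); the region-phrase view would mirror it, and the two views' rankings are then merged. The input dimensions (2048-d visual, 300-d textual), embedding size, and margin-based constraint are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """One branch of the dual-branch network: embed into the common space."""
    def __init__(self, in_dim, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, out_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

img = Branch(2048)(torch.randn(4, 2048))   # frame embeddings
txt = Branch(300)(torch.randn(4, 300))     # sentence embeddings

# Constraint preserving the original semantic relationship: each matched
# pair (diagonal) must outscore mismatched pairs by a margin.
scores = img @ txt.T
loss = F.relu(0.2 + scores - scores.diag().unsqueeze(1)).fill_diagonal_(0).mean()

# Multi-view merging (schematic): combine the per-view rankings, e.g.
# final_rank = (rank_frame_sentence + rank_region_phrase) / 2
```
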
  • Patent number: 11100370
    Abstract: Disclosed is a deep discriminative network for person re-identification in an image or a video. Different input images are concatenated on the color channels of a constructed deep discriminative network, and the resulting spliced tensor is defined as the original difference space of the images. The original difference space is fed into a convolutional network, which outputs the similarity between the two input images by learning the difference information in that space, thereby realizing person re-identification. Features of individual images are not learned; instead, the input images are concatenated on the color channels at the start, and difference information is learned on the original space of the images by the designed network. By introducing an Inception module and embedding it into the model, the learning ability of the network can be improved and better discrimination achieved. (A minimal sketch of the channel concatenation follows this entry.)
    Type: Grant
    Filed: January 23, 2018
    Date of Patent: August 24, 2021
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
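
A minimal PyTorch sketch of the difference-space idea: the two RGB inputs are concatenated into a 6-channel tensor and a small CNN scores their similarity. The layer sizes are assumptions, and the Inception modules the patent embeds are omitted.

```python
import torch
import torch.nn as nn

class DiscriminativeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),  # 6 = 2 x RGB
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, 1)

    def forward(self, a, b):
        # Channel-wise concatenation is the "original difference space";
        # the network learns difference information directly from it.
        x = torch.cat([a, b], dim=1)
        return torch.sigmoid(self.head(self.features(x).flatten(1)))

net = DiscriminativeNet()
same_person_prob = net(torch.randn(1, 3, 128, 64), torch.randn(1, 3, 128, 64))
```
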
  • Publication number: 20210256365
    Abstract: The present application discloses a cross-media retrieval method based on a deep semantic space, which includes a feature generation stage and a semantic space learning stage. In the feature generation stage, a CNN visual feature vector and an LSTM language description vector of an image are generated by simulating a person's perception process for the image, and topic information about a text is explored using an LDA topic model to extract an LDA text topic vector. In the semantic space learning stage, training-set images are used to train a four-layer Multi-Sensory Fusion Deep Neural Network, and training-set texts are used to train a three-layer text semantic network. Finally, a test image and a test text are mapped by the two networks into an isomorphic semantic space, realizing cross-media retrieval. The disclosed method can significantly improve cross-media retrieval performance.
    Type: Application
    Filed: August 16, 2017
    Publication date: August 19, 2021
    Inventors: Wenmin Wang, Mengdi Fan, Peilei Dong, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 11030444
    Abstract: Disclosed is a method for detecting pedestrians in an image using a Gaussian penalty. Initial pedestrian boundary boxes are screened using a Gaussian penalty to improve pedestrian detection performance, especially for sheltered (occluded) pedestrians. The method includes acquiring a training data set, a test data set and pedestrian labels of a pedestrian detection image; training a detection model on the training data set with a pedestrian detection method, and acquiring initial pedestrian boundary boxes with their confidence degrees and coordinates; applying the Gaussian penalty to the confidence degrees of the boundary boxes to obtain penalized confidence degrees; and obtaining the final pedestrian boundary boxes by screening the boxes. Repeated boundary boxes of a single pedestrian are thus removed while boundary boxes of sheltered pedestrians are retained, realizing detection of the pedestrians in an image. (A sketch of the penalty step follows this entry.)
    Type: Grant
    Filed: November 24, 2017
    Date of Patent: June 8, 2021
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Peilei Dong, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
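
A sketch of the Gaussian penalty step, in the spirit of Soft-NMS: instead of deleting every box that overlaps a higher-scoring box, its confidence is decayed by a Gaussian of the overlap, so boxes on sheltered pedestrians can survive screening. The sigma and the keep threshold are assumed values.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def gaussian_penalty_nms(boxes, scores, sigma=0.5, keep_thresh=0.05):
    scores = list(scores)
    order = list(range(len(boxes)))
    kept = []
    while order:
        i = max(order, key=lambda k: scores[k])   # most confident remaining box
        order.remove(i)
        kept.append(i)
        for j in order:                           # Gaussian confidence decay
            scores[j] *= np.exp(-iou(boxes[i], boxes[j]) ** 2 / sigma)
        order = [j for j in order if scores[j] >= keep_thresh]
    return kept

boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 0, 40, 20)]
print(gaussian_penalty_nms(boxes, [0.9, 0.8, 0.7]))  # decayed, not deleted
```
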
  • Publication number: 20210150268
    Abstract: Disclosed is a deep discriminative network for person re-identification in an image or a video. Different input images are concatenated on the color channels of a constructed deep discriminative network, and the resulting spliced tensor is defined as the original difference space of the images. The original difference space is fed into a convolutional network, which outputs the similarity between the two input images by learning the difference information in that space, thereby realizing person re-identification. Features of individual images are not learned; instead, the input images are concatenated on the color channels at the start, and difference information is learned on the original space of the images by the designed network. By introducing an Inception module and embedding it into the model, the learning ability of the network can be improved and better discrimination achieved.
    Type: Application
    Filed: January 23, 2018
    Publication date: May 20, 2021
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Publication number: 20210150194
    Abstract: An image feature extraction method for person re-identification includes performing person re-identification by means of aligned local descriptor extraction and graded global feature extraction; performing the aligned local descriptor extraction by processing an original image with an affine transformation and applying a summation pooling operation to image block features of the same regions to obtain an aligned local descriptor; preserving the spatial information between inner blocks of the image in the aligned local descriptor; and performing the graded global feature extraction by grading a localized pedestrian region block and computing the corresponding feature mean value to obtain a global feature. The method can resolve the feature misalignment caused by pedestrians' posture changes and the like, and eliminate the effect of unrelated backgrounds on re-identification, thus improving the precision and robustness of person re-identification.
    Type: Application
    Filed: December 27, 2017
    Publication date: May 20, 2021
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
  • Publication number: 20210150255
    Abstract: A bidirectional image-text retrieval method based on a multi-view joint embedding space performs retrieval with reference to semantic association relationships at both a global and a local level. The global-level relationship is obtained in a frame-sentence view and the local-level relationship in a region-phrase view: semantic association information is obtained in a global-level subspace of frames and sentences in the frame-sentence view, and in a local-level subspace of regions and phrases in the region-phrase view. In each of the two views, a dual-branch neural network processes the data to obtain isomorphic features and embeds them in a common space, with a constraint condition preserving the original semantic relationships of the data during training. The two semantic association relationships are then merged by multi-view merging and sorting to obtain a more accurate semantic similarity between data.
    Type: Application
    Filed: January 29, 2018
    Publication date: May 20, 2021
    Inventors: Wenmin Wang, Lu Ran, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 10867167
    Abstract: A Collaborative Deep Network model method for pedestrian detection includes constructing a new collaborative multi-model learning framework to complete the classification process during pedestrian detection, and using an artificial neural network to integrate the judgment results of the sub-classifiers in the collaborative model, training that network by machine learning so that the information fed back by the sub-classifiers is synthesized more effectively. A re-sampling method based on the K-means clustering algorithm enhances the classification effect of each classifier in the collaborative model and thus improves the overall classification effect. (A schematic of both ideas follows this entry.)
    Type: Grant
    Filed: July 24, 2017
    Date of Patent: December 15, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Hongmeng Song, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
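
A schematic of the two ideas the abstract names, with scikit-learn models as stand-ins: K-means re-sampling assigns the training data to sub-classifiers, and a small neural network fuses their judgments. The cluster count and the choice of sub-classifier are assumptions, not the patent's models.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((600, 20))
y = (X[:, 0] + X[:, 1] > 1).astype(int)      # toy pedestrian/background labels

# Re-sampling: cluster the data, train one sub-classifier per cluster.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
subs = [LogisticRegression().fit(X[km.labels_ == k], y[km.labels_ == k])
        for k in range(3)]

# Fusion: a neural network learns to combine the sub-classifiers' outputs.
sub_scores = np.column_stack([s.predict_proba(X)[:, 1] for s in subs])
fuser = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000).fit(sub_scores, y)
prediction = fuser.predict(sub_scores)
```
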
  • Patent number: 10719664
    Abstract: A cross-media search method uses a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of the seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as the image feature. A Fisher Vector based on Word2vec is used to extract text features. Semantic matching is performed on the heterogeneous image and text features by means of logistic regression; the correlation between the two heterogeneous feature types, image and text, is found through this semantic matching, achieving cross-media search. The feature extraction method can effectively capture the deep semantics of image and text, improve cross-media search accuracy, and thus greatly improve the cross-media search effect. (A sketch of the image feature path follows this entry.)
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: July 21, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Liang Han, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
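
A sketch of the image side of the pipeline, using torchvision's VGG-19 as a stand-in for "VGG net": the output of the seventh fully-connected layer (fc7), after its ReLU, is taken as the 4096-dimensional image feature. The Fisher Vector text side is only summarized in the trailing comment.

```python
import torch
from torchvision.models import vgg19

model = vgg19(weights=None).eval()   # pretrained weights omitted in this sketch
# classifier = [fc6, ReLU, Dropout, fc7, ReLU, Dropout, fc8];
# cutting after index 4 keeps fc7 + ReLU as the feature extractor.
fc7 = torch.nn.Sequential(model.features, model.avgpool, torch.nn.Flatten(),
                          *list(model.classifier[:5]))
with torch.no_grad():
    image_feature = fc7(torch.randn(1, 3, 224, 224))   # (1, 4096), post-ReLU

# Text side (not shown): embed words with Word2vec, fit a GMM, encode each
# text as a Fisher Vector, then match image and text features per semantic
# category with logistic regression.
```
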
  • Patent number: 10699156
    Abstract: A method for image matching includes acquiring a template image and a target image; acquiring a group of template features from the template image; extracting a group of target features from the target image; and, from the template features and target features, calculating a degree of image similarity between the template image and each frame of target images, and using the target image with the maximum degree of image similarity as the matched image for the template image. Because the degree of image similarity between a template image and each target image is calculated from the degree of similarity between template features and target features, non-redundancy of features in the matching process and correct image matching can be guaranteed, and the image matching accuracy improved. (A minimal sketch of the matching rule follows this entry.)
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: June 30, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Ruonan Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
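
A minimal NumPy sketch of the matching rule: each target image is scored by the similarity of its features to the template features, and the target with the maximum score is taken as the match. Cosine similarity with best-match pairing is an assumed stand-in for the patent's own degree of image similarity.

```python
import numpy as np

def image_similarity(template_feats, target_feats):
    """Average best-match cosine similarity between two feature sets."""
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    g = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    return (t @ g.T).max(axis=1).mean()

rng = np.random.default_rng(0)
template = rng.random((20, 64))                     # template image features
targets = [rng.random((25, 64)) for _ in range(5)]  # features per target frame
best = max(range(len(targets)),
           key=lambda i: image_similarity(template, targets[i]))
```
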
  • Publication number: 20200160048
    Abstract: Disclosed is a method for detecting pedestrians in an image using a Gaussian penalty. Initial pedestrian boundary boxes are screened using a Gaussian penalty to improve pedestrian detection performance, especially for sheltered (occluded) pedestrians. The method includes acquiring a training data set, a test data set and pedestrian labels of a pedestrian detection image; training a detection model on the training data set with a pedestrian detection method, and acquiring initial pedestrian boundary boxes with their confidence degrees and coordinates; applying the Gaussian penalty to the confidence degrees of the boundary boxes to obtain penalized confidence degrees; and obtaining the final pedestrian boundary boxes by screening the boxes. Repeated boundary boxes of a single pedestrian are thus removed while boundary boxes of sheltered pedestrians are retained, realizing detection of the pedestrians in an image.
    Type: Application
    Filed: November 24, 2017
    Publication date: May 21, 2020
    Inventors: Wenmin Wang, Peilei Dong, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Publication number: 20200082165
    Abstract: A Collaborative Deep Network model method for pedestrian detection includes constructing a new collaborative multi-model learning framework to complete the classification process during pedestrian detection, and using an artificial neural network to integrate the judgment results of the sub-classifiers in the collaborative model, training that network by machine learning so that the information fed back by the sub-classifiers is synthesized more effectively. A re-sampling method based on the K-means clustering algorithm enhances the classification effect of each classifier in the collaborative model and thus improves the overall classification effect.
    Type: Application
    Filed: July 24, 2017
    Publication date: March 12, 2020
    Inventors: Wenmin Wang, Hongmeng Song, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Publication number: 20200057935
    Abstract: A video action detection method based on a convolutional neural network (CNN) is disclosed in the field of computer vision recognition technologies. A temporal-spatial pyramid pooling layer is added to the network structure, which removes the network's limitation on input size, speeds up training and detection, and improves the performance of video action classification and temporal localization. The disclosed convolutional neural network includes a convolutional layer, a common pooling layer, a temporal-spatial pyramid pooling layer and a fully connected layer. The outputs of the network include a category classification output layer and a temporal localization output layer. The disclosed method does not require down-sampling to obtain video clips of different durations; instead, the whole video is input at once, improving efficiency.
    Type: Application
    Filed: August 16, 2017
    Publication date: February 20, 2020
    Inventors: Wenmin Wang, Zhihao Li, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 10424052
    Abstract: An image representation method and processing device based on local PCA whitening. A first mapping module maps words and features to a high-dimensional space. A principal component analysis module conducts principal component analysis in each corresponding word space to obtain a projection matrix. A VLAD computation module computes a VLAD image representation vector, and a second mapping module maps the VLAD image representation vector to the high-dimensional space. A projection transformation module conducts projection transformation on the mapped VLAD image representation vector. A normalization processing module normalizes the features obtained by the projection transformation to yield the final image representation vector. (A sketch of the pipeline follows this entry.)
    Type: Grant
    Filed: September 15, 2015
    Date of Patent: September 24, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mingmin Zhen, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
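
A sketch of the pipeline with assumed sizes: local descriptors are assigned to visual words, one PCA-whitening transform is fit per word ("local PCA"), and the whitened per-word residuals are summed and concatenated into the VLAD vector. The patent's separate high-dimensional mapping modules are folded into the per-word PCA here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train = rng.random((500, 32))                        # training local descriptors
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(train)

# One PCA-whitening transform per visual word, fit on that word's residuals.
pcas = []
for k in range(8):
    res = train[codebook.labels_ == k] - codebook.cluster_centers_[k]
    pcas.append(PCA(whiten=True).fit(res))

def vlad_local_pca(feats):
    words = codebook.predict(feats)
    blocks = []
    for k in range(8):
        res = feats[words == k] - codebook.cluster_centers_[k]
        blocks.append(pcas[k].transform(res).sum(axis=0) if len(res)
                      else np.zeros(pcas[k].n_components_))
    v = np.concatenate(blocks)
    v = np.sign(v) * np.sqrt(np.abs(v))              # power normalization
    return v / (np.linalg.norm(v) + 1e-9)            # final L2 normalization

image_vector = vlad_local_pca(rng.random((100, 32)))
```
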
  • Publication number: 20190205393
    Abstract: A cross-media search method uses a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of the seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as the image feature. A Fisher Vector based on Word2vec is used to extract text features. Semantic matching is performed on the heterogeneous image and text features by means of logistic regression; the correlation between the two heterogeneous feature types, image and text, is found through this semantic matching, achieving cross-media search. The feature extraction method can effectively capture the deep semantics of image and text, improve cross-media search accuracy, and thus greatly improve the cross-media search effect.
    Type: Application
    Filed: December 1, 2016
    Publication date: July 4, 2019
    Applicant: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Liang Han, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 10339409
    Abstract: A method and a device for extracting local features of a 3D point cloud are disclosed. Angle information and concavo-convex information about a feature point to be extracted and a point of an adjacent body element are calculated in the local reference frame corresponding to the points of each body element, so the feature relation between the two points is calculated accurately and is invariant to translation and rotation. Because concavo-convex information about the local point cloud is included during extraction, the inaccurate extraction caused by ignoring concavo-convex ambiguity in previous 3D local feature descriptions is resolved. (A sketch of the point-pair quantities follows this entry.)
    Type: Grant
    Filed: June 18, 2015
    Date of Patent: July 2, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mingmin Zhen, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
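
A NumPy sketch of the two point-pair quantities the abstract names: the angle between the normals of the two points, and a concave/convex flag that removes the concavo-convex ambiguity. The sign test used here (projecting the normal difference onto the connecting direction) is a common convention, assumed rather than taken from the patent.

```python
import numpy as np

def pair_feature(p1, n1, p2, n2):
    """Angle and convexity flag for points p1, p2 with unit normals n1, n2."""
    angle = np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0))
    d = (p2 - p1) / (np.linalg.norm(p2 - p1) + 1e-9)
    # Convex if the normals "open away" from each other along the connection;
    # both quantities depend only on relative geometry, so the feature is
    # invariant to translation and rotation.
    convex = np.dot(n1 - n2, d) > 0
    return angle, convex

p1, n1 = np.array([0., 0., 0.]), np.array([0., 0., 1.])
p2, n2 = np.array([1., 0., 0.]), np.array([0.7071, 0., 0.7071])
angle, convex = pair_feature(p1, n1, p2, n2)   # ~45 degrees, concave here
```
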