Patents by Inventor Shengfu DONG

Shengfu DONG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11397890
    Abstract: The present application discloses a cross-media retrieval method based on a deep semantic space, which includes a feature generation stage and a semantic space learning stage. In the feature generation stage, a CNN visual feature vector and an LSTM language description vector of an image are generated by simulating a person's perception process for the image, and topic information about a text is explored using an LDA topic model to extract an LDA text topic vector. In the semantic space learning stage, training-set images are used to train a four-layer Multi-Sensory Fusion Deep Neural Network, and training-set texts are used to train a three-layer text semantic network. Finally, a test image and a test text are mapped by the two networks into an isomorphic semantic space, realizing cross-media retrieval. The disclosed method can significantly improve cross-media retrieval performance. (A minimal sketch of the two mapping networks follows this entry.)
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: July 26, 2022
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mengdi Fan, Peilei Dong, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
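
A minimal sketch of the two mapping networks described in the abstract above, written in PyTorch. The layer widths and the feature dimensions (a 4096-d fused CNN+LSTM image vector, a 100-d LDA topic vector, a 256-d shared space) are assumptions; the patent abstract specifies only the layer counts (four for images, three for text).

```python
# Hypothetical dimensions throughout; only the 4-layer / 3-layer structure
# comes from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageSemanticNet(nn.Module):
    """Four-layer network mapping fused CNN+LSTM image features to the space."""
    def __init__(self, in_dim=4096, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 2048), nn.ReLU(),
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, out_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class TextSemanticNet(nn.Module):
    """Three-layer network mapping LDA topic vectors to the same space."""
    def __init__(self, in_dim=100, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, out_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

# Retrieval: rank candidate texts for an image query by cosine similarity
# in the shared (isomorphic) semantic space.
img_vec = ImageSemanticNet()(torch.randn(1, 4096))
txt_vecs = TextSemanticNet()(torch.randn(10, 100))
ranking = (txt_vecs @ img_vec.T).squeeze(1).argsort(descending=True)
```
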
  • Patent number: 11379711
    Abstract: A video action detection method based on a convolutional neural network (CNN) is disclosed in the field of computer vision recognition technologies. A temporal-spatial pyramid pooling layer is added to the network structure, which removes the network's limitation on input size, speeds up training and detection, and improves the performance of video action classification and temporal localization. The disclosed convolutional neural network includes a convolutional layer, a common pooling layer, a temporal-spatial pyramid pooling layer and a fully connected layer. The outputs of the network include a category classification output layer and a temporal localization output layer. The disclosed method does not require down-sampling to obtain video clips of different durations; instead, the whole video is input at once, improving efficiency. (A minimal sketch of the pooling layer follows this entry.)
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: July 5, 2022
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Zhihao Li, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
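
A minimal PyTorch sketch of a temporal-spatial pyramid pooling layer. The pyramid levels (1, 2, 4) and the choice of max pooling are assumptions; the point made in the abstract is that videos of any duration map to a fixed-length vector, so no down-sampling into fixed-size clips is needed.

```python
import torch
import torch.nn.functional as F

def temporal_spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Pool an (N, C, T, H, W) feature map to a fixed-length vector,
    regardless of temporal length T or spatial size H x W."""
    n, c = x.shape[:2]
    outs = []
    for l in levels:
        # adaptive pooling yields an l x l x l grid for any input size
        pooled = F.adaptive_max_pool3d(x, output_size=(l, l, l))
        outs.append(pooled.reshape(n, -1))
    return torch.cat(outs, dim=1)  # N x (C * sum(l^3))

# Videos of different durations produce vectors of identical length:
short = temporal_spatial_pyramid_pool(torch.randn(1, 64, 8, 14, 14))
long_ = temporal_spatial_pyramid_pool(torch.randn(1, 64, 50, 14, 14))
assert short.shape == long_.shape
```
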
  • Patent number: 11347979
    Abstract: A method and a device for MCMC framework-based sub-hypergraph matching are provided. Matching of object features is performed by constructing sub-hypergraphs. In large numbers of real images and videos, objects vary constantly and contain noise points as well as other interference factors, which makes image object matching and searching very difficult. Performing object feature matching by representing the appearance and positions of objects with sub-hypergraphs allows for faster and more accurate image matching. Furthermore, a sub-hypergraph has several advantages over a graph or a hypergraph: on the one hand, it carries more geometric information (e.g. angle transformation, rotation, scale) than a graph and is less difficult to handle and more extensible than a hypergraph; on the other hand, the disclosed method and device are more resistant to interference, are robust, and adapt to more complex settings, especially with outliers. (A schematic MCMC loop follows this entry.)
    Type: Grant
    Filed: March 10, 2016
    Date of Patent: May 31, 2022
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Ruonan Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
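
The abstract does not spell out the sampling procedure, so the following is only a generic Metropolis-Hastings loop over candidate point correspondences; the random cost table is a placeholder for the patent's sub-hypergraph compatibility score, and the temperature is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_model, n_scene = 5, 8
cost = rng.random((n_model, n_scene))        # placeholder matching-cost table

def energy(assign):
    """Total cost of matching model point i to scene point assign[i]."""
    return sum(cost[i, j] for i, j in enumerate(assign))

assign = rng.permutation(n_scene)[:n_model]  # initial one-to-one assignment
best = assign.copy()
for _ in range(2000):
    prop = assign.copy()
    prop[rng.integers(n_model)] = rng.integers(n_scene)  # re-match one point
    if len(set(prop)) == n_model:            # keep the assignment one-to-one
        # Metropolis rule: always accept improvements, sometimes accept worse
        if rng.random() < np.exp(-(energy(prop) - energy(assign)) / 0.1):
            assign = prop
            if energy(assign) < energy(best):
                best = assign.copy()
```
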
  • Patent number: 11238274
    Abstract: An image feature extraction method for person re-identification includes performing person re-identification by means of aligned local descriptor extraction and graded global feature extraction; performing the aligned local descriptor extraction by processing an original image with an affine transformation and applying a summation pooling operation to image block features of the same regions to obtain an aligned local descriptor; preserving the spatial information between inner blocks of the image in the aligned local descriptor; and performing the graded global feature extraction by grading a localized pedestrian region block and computing the corresponding feature mean value to obtain a global feature. The method can resolve the feature misalignment caused by pedestrians' posture changes and the like, and eliminate the effect of unrelated backgrounds on re-identification, thus improving the precision and robustness of person re-identification. (A minimal sketch of the two feature paths follows this entry.)
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: February 1, 2022
    Assignee: Peking University
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
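
A minimal NumPy sketch of the two feature paths, starting from a hypothetical grid of image-block features; the affine transformation and the pedestrian localization that produce these aligned blocks are omitted.

```python
import numpy as np

def aligned_local_descriptor(block_feats):
    """Sum-pool block features of the same region.
    block_feats: (regions, blocks_per_region, dim) array of aligned blocks;
    the region ordering preserves the spatial layout of the image."""
    return block_feats.sum(axis=1)                   # (regions, dim)

def graded_global_feature(region_feats, grades=3):
    """Split the pedestrian region top-to-bottom into grades, average each."""
    parts = np.array_split(region_feats, grades, axis=0)
    return np.stack([p.mean(axis=0) for p in parts]) # (grades, dim)

blocks = np.random.rand(6, 4, 128)                   # 6 regions x 4 blocks, 128-d
local_desc = aligned_local_descriptor(blocks)        # (6, 128)
global_feat = graded_global_feature(local_desc)      # (3, 128)
descriptor = np.concatenate([local_desc.ravel(), global_feat.ravel()])
```
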
  • Patent number: 11106951
    Abstract: A bidirectional image-text retrieval method based on a multi-view joint embedding space performs retrieval with reference to semantic association relationships at both a global and a local level. The global-level relationship is obtained in a frame-sentence view and the local-level relationship in a region-phrase view: semantic association information is obtained in a global-level subspace of frames and sentences in the frame-sentence view, and in a local-level subspace of regions and phrases in the region-phrase view. In each of the two views, a dual-branch neural network processes the data to obtain isomorphic features and embeds them in a common space, with a constraint condition preserving the original semantic relationships of the data during training. The two semantic association relationships are then merged by multi-view merging and sorting to obtain a more accurate semantic similarity between data. (A compact sketch of one view follows this entry.)
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: August 31, 2021
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Lu Ran, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
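
A compact PyTorch sketch of one dual-branch view (the frame-sentence view); the region-phrase view would mirror it, and the two views' rankings are then merged. The input dimensions (2048-d visual, 300-d textual), embedding size, and margin-based constraint are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """One branch of the dual-branch network: embed into the common space."""
    def __init__(self, in_dim, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, out_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

img = Branch(2048)(torch.randn(4, 2048))   # frame embeddings
txt = Branch(300)(torch.randn(4, 300))     # sentence embeddings

# Constraint preserving the original semantic relationship: each matched
# pair (diagonal) must outscore mismatched pairs by a margin.
scores = img @ txt.T
loss = F.relu(0.2 + scores - scores.diag().unsqueeze(1)).fill_diagonal_(0).mean()

# Multi-view merging (schematic): combine the per-view rankings, e.g.
# final_rank = (rank_frame_sentence + rank_region_phrase) / 2
```
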
  • Patent number: 11100370
    Abstract: Disclosed is a deep discriminative network for person re-identification in an image or a video. Different input images are concatenated on the color channels of a constructed deep discriminative network, and the resulting spliced tensor is defined as the original difference space of the images. The original difference space is fed into a convolutional network, which outputs the similarity between the two input images by learning the difference information in that space, thereby realizing person re-identification. Features of individual images are not learned; instead, the input images are concatenated on the color channels at the start, and difference information is learned on the original space of the images by the designed network. By introducing an Inception module and embedding it into the model, the learning ability of the network can be improved and better discrimination achieved. (A minimal sketch of the channel concatenation follows this entry.)
    Type: Grant
    Filed: January 23, 2018
    Date of Patent: August 24, 2021
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
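
A minimal PyTorch sketch of the difference-space idea: the two RGB inputs are concatenated into a 6-channel tensor and a small CNN scores their similarity. The layer sizes are assumptions, and the Inception modules the patent embeds are omitted.

```python
import torch
import torch.nn as nn

class DiscriminativeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),  # 6 = 2 x RGB
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, 1)

    def forward(self, a, b):
        # Channel-wise concatenation is the "original difference space";
        # the network learns difference information directly from it.
        x = torch.cat([a, b], dim=1)
        return torch.sigmoid(self.head(self.features(x).flatten(1)))

net = DiscriminativeNet()
same_person_prob = net(torch.randn(1, 3, 128, 64), torch.randn(1, 3, 128, 64))
```
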
  • Publication number: 20210256365
    Abstract: The present application discloses a cross-media retrieval method based on a deep semantic space, which includes a feature generation stage and a semantic space learning stage. In the feature generation stage, a CNN visual feature vector and an LSTM language description vector of an image are generated by simulating a person's perception process for the image, and topic information about a text is explored using an LDA topic model to extract an LDA text topic vector. In the semantic space learning stage, training-set images are used to train a four-layer Multi-Sensory Fusion Deep Neural Network, and training-set texts are used to train a three-layer text semantic network. Finally, a test image and a test text are mapped by the two networks into an isomorphic semantic space, realizing cross-media retrieval. The disclosed method can significantly improve cross-media retrieval performance.
    Type: Application
    Filed: August 16, 2017
    Publication date: August 19, 2021
    Inventors: Wenmin Wang, Mengdi Fan, Peilei Dong, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 11030444
    Abstract: Disclosed is a method for detecting pedestrians in an image using a Gaussian penalty. Initial pedestrian boundary boxes are screened using a Gaussian penalty to improve pedestrian detection performance, especially for sheltered (occluded) pedestrians. The method includes acquiring a training data set, a test data set and pedestrian labels of a pedestrian detection image; training a detection model on the training data set with a pedestrian detection method, and acquiring initial pedestrian boundary boxes with their confidence degrees and coordinates; applying the Gaussian penalty to the confidence degrees of the boundary boxes to obtain penalized confidence degrees; and obtaining the final pedestrian boundary boxes by screening the boxes. Repeated boundary boxes of a single pedestrian are thus removed while boundary boxes of sheltered pedestrians are retained, realizing detection of the pedestrians in an image. (A sketch of the penalty step follows this entry.)
    Type: Grant
    Filed: November 24, 2017
    Date of Patent: June 8, 2021
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Peilei Dong, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
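
A sketch of the Gaussian penalty step, in the spirit of Soft-NMS: instead of deleting every box that overlaps a higher-scoring box, its confidence is decayed by a Gaussian of the overlap, so boxes on sheltered pedestrians can survive screening. The sigma and the keep threshold are assumed values.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def gaussian_penalty_nms(boxes, scores, sigma=0.5, keep_thresh=0.05):
    scores = list(scores)
    order = list(range(len(boxes)))
    kept = []
    while order:
        i = max(order, key=lambda k: scores[k])   # most confident remaining box
        order.remove(i)
        kept.append(i)
        for j in order:                           # Gaussian confidence decay
            scores[j] *= np.exp(-iou(boxes[i], boxes[j]) ** 2 / sigma)
        order = [j for j in order if scores[j] >= keep_thresh]
    return kept

boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 0, 40, 20)]
print(gaussian_penalty_nms(boxes, [0.9, 0.8, 0.7]))  # decayed, not deleted
```
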
  • Publication number: 20210150268
    Abstract: Disclosed is a deep discriminative network for person re-identification in an image or a video. Different input images are concatenated on the color channels of a constructed deep discriminative network, and the resulting spliced tensor is defined as the original difference space of the images. The original difference space is fed into a convolutional network, which outputs the similarity between the two input images by learning the difference information in that space, thereby realizing person re-identification. Features of individual images are not learned; instead, the input images are concatenated on the color channels at the start, and difference information is learned on the original space of the images by the designed network. By introducing an Inception module and embedding it into the model, the learning ability of the network can be improved and better discrimination achieved.
    Type: Application
    Filed: January 23, 2018
    Publication date: May 20, 2021
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Publication number: 20210150194
    Abstract: An image feature extraction method for person re-identification includes performing person re-identification by means of aligned local descriptor extraction and graded global feature extraction; performing the aligned local descriptor extraction by processing an original image with an affine transformation and applying a summation pooling operation to image block features of the same regions to obtain an aligned local descriptor; preserving the spatial information between inner blocks of the image in the aligned local descriptor; and performing the graded global feature extraction by grading a localized pedestrian region block and computing the corresponding feature mean value to obtain a global feature. The method can resolve the feature misalignment caused by pedestrians' posture changes and the like, and eliminate the effect of unrelated backgrounds on re-identification, thus improving the precision and robustness of person re-identification.
    Type: Application
    Filed: December 27, 2017
    Publication date: May 20, 2021
    Inventors: Wenmin Wang, Yihao Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
  • Publication number: 20210150255
    Abstract: A bidirectional image-text retrieval method based on a multi-view joint embedding space performs retrieval with reference to semantic association relationships at both a global and a local level. The global-level relationship is obtained in a frame-sentence view and the local-level relationship in a region-phrase view: semantic association information is obtained in a global-level subspace of frames and sentences in the frame-sentence view, and in a local-level subspace of regions and phrases in the region-phrase view. In each of the two views, a dual-branch neural network processes the data to obtain isomorphic features and embeds them in a common space, with a constraint condition preserving the original semantic relationships of the data during training. The two semantic association relationships are then merged by multi-view merging and sorting to obtain a more accurate semantic similarity between data.
    Type: Application
    Filed: January 29, 2018
    Publication date: May 20, 2021
    Inventors: Wenmin Wang, Lu Ran, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 10867167
    Abstract: A Collaborative Deep Network model method for pedestrian detection includes constructing a new collaborative multi-model learning framework to complete the classification process during pedestrian detection, and using an artificial neural network to integrate the judgment results of the sub-classifiers in the collaborative model, training that network by machine learning so that the information fed back by the sub-classifiers is synthesized more effectively. A re-sampling method based on the K-means clustering algorithm enhances the classification effect of each classifier in the collaborative model and thus improves the overall classification effect. (A schematic of both ideas follows this entry.)
    Type: Grant
    Filed: July 24, 2017
    Date of Patent: December 15, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Hongmeng Song, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
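
A schematic of the two ideas the abstract names, with scikit-learn models as stand-ins: K-means re-sampling assigns the training data to sub-classifiers, and a small neural network fuses their judgments. The cluster count and the choice of sub-classifier are assumptions, not the patent's models.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((600, 20))
y = (X[:, 0] + X[:, 1] > 1).astype(int)      # toy pedestrian/background labels

# Re-sampling: cluster the data, train one sub-classifier per cluster.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
subs = [LogisticRegression().fit(X[km.labels_ == k], y[km.labels_ == k])
        for k in range(3)]

# Fusion: a neural network learns to combine the sub-classifiers' outputs.
sub_scores = np.column_stack([s.predict_proba(X)[:, 1] for s in subs])
fuser = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000).fit(sub_scores, y)
prediction = fuser.predict(sub_scores)
```
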
  • Patent number: 10719664
    Abstract: A cross-media search method uses a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of the seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as the image feature. A Fisher Vector based on Word2vec is used to extract text features. Semantic matching is performed on the heterogeneous image and text features by means of logistic regression; the correlation between the two heterogeneous feature types, image and text, is found through this semantic matching, achieving cross-media search. The feature extraction method can effectively capture the deep semantics of image and text, improve cross-media search accuracy, and thus greatly improve the cross-media search effect. (A sketch of the image feature path follows this entry.)
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: July 21, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Liang Han, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
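
A sketch of the image side of the pipeline, using torchvision's VGG-19 as a stand-in for "VGG net": the output of the seventh fully-connected layer (fc7), after its ReLU, is taken as the 4096-dimensional image feature. The Fisher Vector text side is only summarized in the trailing comment.

```python
import torch
from torchvision.models import vgg19

model = vgg19(weights=None).eval()   # pretrained weights omitted in this sketch
# classifier = [fc6, ReLU, Dropout, fc7, ReLU, Dropout, fc8];
# cutting after index 4 keeps fc7 + ReLU as the feature extractor.
fc7 = torch.nn.Sequential(model.features, model.avgpool, torch.nn.Flatten(),
                          *list(model.classifier[:5]))
with torch.no_grad():
    image_feature = fc7(torch.randn(1, 3, 224, 224))   # (1, 4096), post-ReLU

# Text side (not shown): embed words with Word2vec, fit a GMM, encode each
# text as a Fisher Vector, then match image and text features per semantic
# category with logistic regression.
```
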
  • Patent number: 10699156
    Abstract: A method for image matching includes acquiring a template image and a target image; acquiring a group of template features from the template image; extracting a group of target features from the target image; and, from the template features and target features, calculating a degree of image similarity between the template image and each frame of target images, and using the target image with the maximum degree of image similarity as the matched image for the template image. Because the degree of image similarity between a template image and each target image is calculated from the degree of similarity between template features and target features, non-redundancy of features in the matching process and correct image matching can be guaranteed, and the image matching accuracy improved. (A minimal sketch of the matching rule follows this entry.)
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: June 30, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Ruonan Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
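
A minimal NumPy sketch of the matching rule: each target image is scored by the similarity of its features to the template features, and the target with the maximum score is taken as the match. Cosine similarity with best-match pairing is an assumed stand-in for the patent's own degree of image similarity.

```python
import numpy as np

def image_similarity(template_feats, target_feats):
    """Average best-match cosine similarity between two feature sets."""
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    g = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    return (t @ g.T).max(axis=1).mean()

rng = np.random.default_rng(0)
template = rng.random((20, 64))                     # template image features
targets = [rng.random((25, 64)) for _ in range(5)]  # features per target frame
best = max(range(len(targets)),
           key=lambda i: image_similarity(template, targets[i]))
```
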
  • Publication number: 20200160048
    Abstract: Disclosed is a method for detecting pedestrians in an image using a Gaussian penalty. Initial pedestrian boundary boxes are screened using a Gaussian penalty to improve pedestrian detection performance, especially for sheltered (occluded) pedestrians. The method includes acquiring a training data set, a test data set and pedestrian labels of a pedestrian detection image; training a detection model on the training data set with a pedestrian detection method, and acquiring initial pedestrian boundary boxes with their confidence degrees and coordinates; applying the Gaussian penalty to the confidence degrees of the boundary boxes to obtain penalized confidence degrees; and obtaining the final pedestrian boundary boxes by screening the boxes. Repeated boundary boxes of a single pedestrian are thus removed while boundary boxes of sheltered pedestrians are retained, realizing detection of the pedestrians in an image.
    Type: Application
    Filed: November 24, 2017
    Publication date: May 21, 2020
    Inventors: Wenmin Wang, Peilei Dong, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Publication number: 20200082165
    Abstract: A Collaborative Deep Network model method for pedestrian detection includes constructing a new collaborative multi-model learning framework to complete the classification process during pedestrian detection, and using an artificial neural network to integrate the judgment results of the sub-classifiers in the collaborative model, training that network by machine learning so that the information fed back by the sub-classifiers is synthesized more effectively. A re-sampling method based on the K-means clustering algorithm enhances the classification effect of each classifier in the collaborative model and thus improves the overall classification effect.
    Type: Application
    Filed: July 24, 2017
    Publication date: March 12, 2020
    Inventors: Wenmin Wang, Hongmeng Song, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Publication number: 20200057935
    Abstract: A video action detection method based on a convolutional neural network (CNN) is disclosed in the field of computer vision recognition technologies. A temporal-spatial pyramid pooling layer is added to the network structure, which removes the network's limitation on input size, speeds up training and detection, and improves the performance of video action classification and temporal localization. The disclosed convolutional neural network includes a convolutional layer, a common pooling layer, a temporal-spatial pyramid pooling layer and a fully connected layer. The outputs of the network include a category classification output layer and a temporal localization output layer. The disclosed method does not require down-sampling to obtain video clips of different durations; instead, the whole video is input at once, improving efficiency.
    Type: Application
    Filed: August 16, 2017
    Publication date: February 20, 2020
    Inventors: Wenmin Wang, Zhihao Li, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 10424052
    Abstract: An image representation method and processing device based on local PCA whitening. A first mapping module maps words and features to a high-dimensional space. A principal component analysis module conducts principal component analysis in each corresponding word space to obtain a projection matrix. A VLAD computation module computes a VLAD image representation vector, and a second mapping module maps the VLAD image representation vector to the high-dimensional space. A projection transformation module conducts projection transformation on the mapped VLAD image representation vector. A normalization processing module normalizes the features obtained by the projection transformation to yield the final image representation vector. (A sketch of the pipeline follows this entry.)
    Type: Grant
    Filed: September 15, 2015
    Date of Patent: September 24, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mingmin Zhen, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
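
A sketch of the pipeline with assumed sizes: local descriptors are assigned to visual words, one PCA-whitening transform is fit per word ("local PCA"), and the whitened per-word residuals are summed and concatenated into the VLAD vector. The patent's separate high-dimensional mapping modules are folded into the per-word PCA here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train = rng.random((500, 32))                        # training local descriptors
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(train)

# One PCA-whitening transform per visual word, fit on that word's residuals.
pcas = []
for k in range(8):
    res = train[codebook.labels_ == k] - codebook.cluster_centers_[k]
    pcas.append(PCA(whiten=True).fit(res))

def vlad_local_pca(feats):
    words = codebook.predict(feats)
    blocks = []
    for k in range(8):
        res = feats[words == k] - codebook.cluster_centers_[k]
        blocks.append(pcas[k].transform(res).sum(axis=0) if len(res)
                      else np.zeros(pcas[k].n_components_))
    v = np.concatenate(blocks)
    v = np.sign(v) * np.sqrt(np.abs(v))              # power normalization
    return v / (np.linalg.norm(v) + 1e-9)            # final L2 normalization

image_vector = vlad_local_pca(rng.random((100, 32)))
```
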
  • Publication number: 20190205393
    Abstract: A cross-media search method uses a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of the seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as the image feature. A Fisher Vector based on Word2vec is used to extract text features. Semantic matching is performed on the heterogeneous image and text features by means of logistic regression; the correlation between the two heterogeneous feature types, image and text, is found through this semantic matching, achieving cross-media search. The feature extraction method can effectively capture the deep semantics of image and text, improve cross-media search accuracy, and thus greatly improve the cross-media search effect.
    Type: Application
    Filed: December 1, 2016
    Publication date: July 4, 2019
    Applicant: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Liang Han, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 10339409
    Abstract: A method and a device for extracting local features of a 3D point cloud are disclosed. Angle information and concavo-convex information about a feature point to be extracted and a point of an adjacent body element are calculated in the local reference frame corresponding to the points of each body element, so the feature relation between the two points is calculated accurately and is invariant to translation and rotation. Because concavo-convex information about the local point cloud is included during extraction, the inaccurate extraction caused by ignoring concavo-convex ambiguity in previous 3D local feature descriptions is resolved. (A sketch of the point-pair quantities follows this entry.)
    Type: Grant
    Filed: June 18, 2015
    Date of Patent: July 2, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mingmin Zhen, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
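
A NumPy sketch of the two point-pair quantities the abstract names: the angle between the normals of the two points, and a concave/convex flag that removes the concavo-convex ambiguity. The sign test used here (projecting the normal difference onto the connecting direction) is a common convention, assumed rather than taken from the patent.

```python
import numpy as np

def pair_feature(p1, n1, p2, n2):
    """Angle and convexity flag for points p1, p2 with unit normals n1, n2."""
    angle = np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0))
    d = (p2 - p1) / (np.linalg.norm(p2 - p1) + 1e-9)
    # Convex if the normals "open away" from each other along the connection;
    # both quantities depend only on relative geometry, so the feature is
    # invariant to translation and rotation.
    convex = np.dot(n1 - n2, d) > 0
    return angle, convex

p1, n1 = np.array([0., 0., 0.]), np.array([0., 0., 1.])
p2, n2 = np.array([1., 0., 0.]), np.array([0.7071, 0., 0.7071])
angle, convex = pair_feature(p1, n1, p2, n2)   # ~45 degrees, concave here
```
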