Patents by Inventor Wenmin Wang

Wenmin Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10762901
    Abstract: Embodiments of the present disclosure disclose an artificial intelligence based method and apparatus for classifying a voice-recognized text. A specific embodiment of the method includes: acquiring a current interactive text of a voice query from a user; analyzing the current interactive text using a lexical analyzer to obtain a current lexical structure; determining whether the current lexical structure matches a template of a category in a classifier; and, if it does, classifying the current interactive text corresponding to that lexical structure into the category of the matched template. The embodiment can classify texts quickly, substantially reduce the volume of manually annotated text, and improve annotation efficiency in intelligent voice interaction services.
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: September 1, 2020
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Yichuan Liang, Guang Ling, Yingzhan Lin, Wenmin Wang, Zeying Xie, Yin Zhang, Wei Xu, Chao Zhou
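    A minimal sketch of the template-matching classification described in the abstract above. The tokenizer, tag set, templates, and category names are hypothetical stand-ins for the patented components.

    ```python
    import re

    # A "lexical structure" here is just the tuple of coarse token tags.
    def lexical_structure(text: str) -> tuple:
        tags = []
        for tok in text.split():
            if re.fullmatch(r"\d+", tok):
                tags.append("NUM")
            elif tok in {"play", "open", "call"}:
                tags.append("VERB")  # toy lexicon standing in for a real lexical analyzer
            else:
                tags.append("NOUN")
        return tuple(tags)

    # Classifier: category -> set of lexical-structure templates.
    TEMPLATES = {
        "media.play": {("VERB", "NOUN"), ("VERB", "NOUN", "NOUN")},
        "phone.call": {("VERB", "NUM")},
    }

    def classify(text: str):
        structure = lexical_structure(text)
        for category, templates in TEMPLATES.items():
            if structure in templates:
                return category  # text inherits the matched template's category
        return None              # unmatched texts would go to manual annotation

    print(classify("play jazz music"))  # media.play
    print(classify("call 911"))         # phone.call
    ```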
  • Patent number: 10719664
    Abstract: A cross-media search method uses a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of the seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as the image feature. A Fisher Vector based on Word2vec is used to extract text features. Semantic matching is performed on the heterogeneous image and text features by means of logistic regression. The correlation between the two heterogeneous feature types, image and text, is found through this logistic-regression-based semantic matching, and thus cross-media search is achieved. The feature extraction method can effectively capture the deep semantics of images and text, improve cross-media search accuracy, and thus greatly improve the cross-media search effect.
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: July 21, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Liang Han, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
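    The fc7-plus-ReLU feature the abstract relies on can be pulled from a torchvision VGG by running the classifier head only up to the ReLU after the second fully-connected layer. A sketch under assumptions (VGG19 and ImageNet weights are chosen arbitrarily; the patent only specifies "VGG net"):

    ```python
    import torch
    import torchvision.models as models

    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    vgg.eval()

    def fc7_features(images: torch.Tensor) -> torch.Tensor:
        """images: normalized (N, 3, 224, 224) batch -> (N, 4096) fc7+ReLU features."""
        with torch.no_grad():
            x = vgg.features(images)           # convolutional trunk
            x = torch.flatten(vgg.avgpool(x), 1)
            # classifier = [fc6, ReLU, Dropout, fc7, ReLU, Dropout, fc8]
            for layer in vgg.classifier[:5]:   # stop after the ReLU following fc7
                x = layer(x)
        return x

    print(fc7_features(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 4096])
    ```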
  • Patent number: 10699156
    Abstract: A method for image matching includes acquiring a template image and a target image; acquiring a group of template features according to the template image; extracting a group of target features according to the target image; and, according to the template features and target features, calculating a degree of image similarity between the template image and each target image, and using the target image with the maximum degree of image similarity as the matched image to the template image. In this image-matching method, image matching is performed by calculating a degree of image similarity between the template image and each target image according to the degree of similarity between template features and target features, so that non-redundancy of features in the matching process and correct image matching can be guaranteed, improving image matching accuracy.
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: June 30, 2020
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Ruonan Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
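    A hedged sketch of the matching rule in the abstract: score each target image by the similarity between its features and the template's features, then keep the highest-scoring target. The feature extractor is assumed; random vectors stand in for real features.

    ```python
    import numpy as np

    def image_similarity(template_feats: np.ndarray, target_feats: np.ndarray) -> float:
        a = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
        b = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
        sims = a @ b.T                      # pairwise cosine similarities
        # Each template feature votes with its single best target match,
        # discouraging redundant many-to-one feature correspondences.
        return float(sims.max(axis=1).mean())

    def best_match(template_feats, target_images_feats):
        scores = [image_similarity(template_feats, t) for t in target_images_feats]
        return int(np.argmax(scores)), scores

    rng = np.random.default_rng(0)
    template = rng.normal(size=(32, 128))                    # 32 template features
    targets = [rng.normal(size=(40, 128)) for _ in range(5)]
    targets[3][:32] = template                               # plant the true match
    print(best_match(template, targets)[0])                  # 3
    ```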
  • Publication number: 20200160048
    Abstract: Disclosed is a method for detecting pedestrians in an image by using a Gaussian penalty. Initial pedestrian boundary boxes are screened using a Gaussian penalty to improve pedestrian detection performance, especially for sheltered pedestrians in an image. The method includes: acquiring a training data set, a test data set and pedestrian labels of a pedestrian detection image; training a detection model on the training data set using a pedestrian detection method, and acquiring initial pedestrian boundary boxes together with their confidence degrees and coordinates; performing a Gaussian penalty on the confidence degrees of the pedestrian boundary boxes to obtain penalized confidence degrees; and obtaining final pedestrian boundary boxes by screening the penalized boxes. Thus, repeated boundary boxes of a single pedestrian are removed while boundary boxes of sheltered pedestrians are preserved, thereby realizing the detection of pedestrians in an image.
    Type: Application
    Filed: November 24, 2017
    Publication date: May 21, 2020
    Inventors: Wenmin Wang, Peilei Dong, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
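    The Gaussian penalty itself can be sketched in a few lines: instead of deleting every box that overlaps a kept detection (which also deletes sheltered pedestrians), overlapping boxes merely have their confidence decayed by exp(-IoU²/σ). Sigma and the score floor below are assumptions, not values from the publication.

    ```python
    import numpy as np

    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    def gaussian_penalty_filter(boxes, scores, sigma=0.5, min_score=0.05):
        boxes, scores, keep = list(boxes), list(scores), []
        while boxes:
            i = int(np.argmax(scores))
            best_box, best_score = boxes.pop(i), scores.pop(i)
            keep.append((best_box, best_score))
            # Decay rather than delete: near-duplicates fade out, while
            # partially overlapped (sheltered) pedestrians keep confidence.
            scores = [s * np.exp(-iou(best_box, b) ** 2 / sigma)
                      for b, s in zip(boxes, scores)]
            survivors = [(b, s) for b, s in zip(boxes, scores) if s >= min_score]
            boxes = [b for b, _ in survivors]
            scores = [s for _, s in survivors]
        return keep

    boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (6, 0, 16, 20)]
    print(gaussian_penalty_filter(boxes, [0.9, 0.8, 0.7]))
    ```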
  • Publication number: 20200082165
    Abstract: A Collaborative Deep Network model method for pedestrian detection includes constructing a new collaborative multi-model learning framework to complete the classification process during pedestrian detection, and using an artificial neural network to integrate the judgment results of the sub-classifiers in the collaborative model, training the network with machine-learning methods so that the information fed back by the sub-classifiers is synthesized more effectively. A re-sampling method based on a K-means clustering algorithm can enhance the classification effect of each classifier in the collaborative model and thus improve the overall classification effect.
    Type: Application
    Filed: July 24, 2017
    Publication date: March 12, 2020
    Inventors: Wenmin Wang, Hongmeng Song, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
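    A hedged scikit-learn sketch of the two ideas in the abstract: K-means based re-sampling builds a differently balanced training set for each sub-classifier, and a small neural network learns to integrate the sub-classifiers' outputs. All model choices and parameters here are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    rng = np.random.default_rng(0)

    # 1) Re-sampling: cluster the data, then draw each sub-classifier's
    #    training set evenly across clusters.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    subs = []
    for _ in range(3):
        idx = np.concatenate([
            rng.choice(np.where(km.labels_ == c)[0], size=150, replace=True)
            for c in range(3)
        ])
        subs.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))

    # 2) Integration: an artificial neural network synthesizes the
    #    sub-classifiers' probability outputs into the final judgment.
    meta_in = np.column_stack([m.predict_proba(X)[:, 1] for m in subs])
    meta = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                         random_state=0).fit(meta_in, y)
    print("ensemble accuracy:", meta.score(meta_in, y))
    ```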
  • Publication number: 20200057935
    Abstract: A video action detection method based on a convolutional neural network (CNN) is disclosed in the field of computer vision recognition technologies. A temporal-spatial pyramid pooling layer is added to the network structure, which eliminates the network's limitations on input size, speeds up training and detection, and improves the performance of video action classification and temporal localization. The disclosed convolutional neural network includes a convolutional layer, a common pooling layer, a temporal-spatial pyramid pooling layer and a fully connected layer. The outputs of the network include a category classification output layer and a temporal localization output layer. The disclosed method does not require down-sampling to obtain video clips of different durations; instead, the whole video is input directly at once, improving efficiency.
    Type: Application
    Filed: August 16, 2017
    Publication date: February 20, 2020
    Inventors: Wenmin Wang, Zhihao Li, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
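    The pooling layer's role is to map a feature volume of any duration and resolution to a fixed-length vector, which is what removes the input-size restriction. A minimal sketch (the pyramid levels are assumptions, not the publication's configuration):

    ```python
    import torch
    import torch.nn.functional as F

    def temporal_spatial_pyramid_pool(x, levels=((1, 1, 1), (2, 2, 2), (4, 2, 2))):
        """x: (N, C, T, H, W) -> (N, C * sum(t*h*w over levels)), for any T/H/W."""
        outs = []
        for t, h, w in levels:
            pooled = F.adaptive_max_pool3d(x, output_size=(t, h, w))
            outs.append(pooled.flatten(start_dim=1))
        return torch.cat(outs, dim=1)

    short_clip = torch.randn(1, 64, 8, 14, 14)      # 8 frames
    whole_video = torch.randn(1, 64, 300, 14, 14)   # whole video at once
    print(temporal_spatial_pyramid_pool(short_clip).shape,
          temporal_spatial_pyramid_pool(whole_video).shape)  # identical shapes
    ```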
  • Patent number: 10424052
    Abstract: An image representation method and processing device based on local PCA whitening. A first mapping module maps words and features to a high-dimensional space. A principal component analysis module conducts principal component analysis in each corresponding word space to obtain a projection matrix. A VLAD computation module computes a VLAD image representation vector; a second mapping module maps the VLAD image representation vector to the high-dimensional space. A projection transformation module conducts projection transformation on the mapped VLAD image representation vector. A normalization processing module normalizes the features obtained by the projection transformation to obtain the final image representation vector.
    Type: Grant
    Filed: September 15, 2015
    Date of Patent: September 24, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mingmin Zhen, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
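    A hedged sketch of the pipeline these modules describe: residuals of local descriptors against their visual words are aggregated into a VLAD vector, but each word's residuals are first projected with that word's own whitening PCA (the "local PCA whitening"). Vocabulary size, dimensions, and normalizations are assumptions.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    D, K = 64, 8                                   # descriptor dim, vocabulary size
    rng = np.random.default_rng(0)
    train = rng.normal(size=(5000, D))             # offline training descriptors
    vocab = KMeans(n_clusters=K, n_init=10, random_state=0).fit(train)

    # Offline: one whitening projection per word, fit on that word's residuals.
    pcas = []
    for k in range(K):
        residuals = train[vocab.labels_ == k] - vocab.cluster_centers_[k]
        pcas.append(PCA(n_components=D, whiten=True).fit(residuals))

    def vlad_locally_whitened(descs: np.ndarray) -> np.ndarray:
        words = vocab.predict(descs)
        v = np.zeros((K, D))
        for k in range(K):
            res = descs[words == k] - vocab.cluster_centers_[k]
            if len(res):
                v[k] = pcas[k].transform(res).sum(axis=0)  # whitened residual sum
        v = v.ravel()
        v = np.sign(v) * np.sqrt(np.abs(v))                # power normalization
        return v / (np.linalg.norm(v) + 1e-12)             # final L2 normalization

    print(vlad_locally_whitened(rng.normal(size=(200, D))).shape)  # (512,)
    ```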
  • Patent number: 10424075
    Abstract: A method and a device for post-processing depth/disparity maps adopt a strategy that combines edge information and segmentation information when detecting irregular edge regions. The method includes: dividing a color image into superpixels when performing image segmentation; partitioning the grayscale range into a preset number of intervals and, for each superpixel, computing a histogram of the pixels that fall within the intervals; determining, for the current superpixel, whether the ratio of the number of pixels in the most populated interval to the total number of pixels in the superpixel is less than a first threshold; and, if so, further dividing the current superpixel using a color-based segmentation method. The disclosed method and device improve the accuracy of color image division while maintaining image processing speed, thus improving the detection accuracy for irregular edge regions.
    Type: Grant
    Filed: May 6, 2015
    Date of Patent: September 24, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Jianbo Jiao, Ronggang Wang, Zhenyu Wang, Wenmin Wang, Wen Gao
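    The core histogram test is easy to sketch: a superpixel whose grayscale histogram has no dominant interval is likely to straddle an edge and gets re-segmented. The interval count and threshold below are assumptions.

    ```python
    import numpy as np

    def needs_subdivision(gray_values: np.ndarray,
                          n_intervals: int = 16,
                          ratio_threshold: float = 0.6) -> bool:
        hist, _ = np.histogram(gray_values, bins=n_intervals, range=(0, 256))
        dominant_ratio = hist.max() / max(len(gray_values), 1)
        # No dominant interval -> mixed content (likely an irregular edge
        # region), so split this superpixel with color-based segmentation.
        return dominant_ratio < ratio_threshold

    uniform_patch = np.full(400, 128)                                 # one flat region
    mixed_patch = np.concatenate([np.full(200, 40), np.full(200, 220)])
    print(needs_subdivision(uniform_patch), needs_subdivision(mixed_patch))  # False True
    ```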
  • Patent number: 10395374
    Abstract: Disclosed in the present invention is a video foreground extraction method for surveillance video, which adjusts the block size to adapt to different video resolutions based on an image block processing method, and then extracts foreground objects in a moving state by establishing a background block model. The method comprises: representing each frame of image I in the surveillance video as blocks; initializing; updating the block background weight, the block temporary background and the temporary background; updating the block background and the background; saving the foreground, and updating the foreground block weight and the foreground block; and performing binarization processing on the foreground to obtain the final foreground result.
    Type: Grant
    Filed: April 6, 2017
    Date of Patent: August 27, 2019
    Assignee: PEKING UNIVERSITY SHENZHEN GRADUATE SCHOOL
    Inventors: Ge Li, Xianghao Zang, Wenmin Wang, Ronggang Wang
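    A heavily simplified sketch of the block idea: each block keeps a running background estimate, blocks that deviate strongly are marked foreground, and static blocks are slowly absorbed into the background. Block size, learning rate, and threshold are assumptions; the patented method maintains several more weighted block models than this.

    ```python
    import numpy as np

    BLOCK, ALPHA, THRESH = 16, 0.05, 25.0

    def update(background: np.ndarray, frame: np.ndarray) -> np.ndarray:
        """Update the block background model in place; return a binary foreground mask."""
        fg = np.zeros_like(frame, dtype=np.uint8)
        h, w = frame.shape
        for r in range(0, h, BLOCK):
            for c in range(0, w, BLOCK):
                blk = frame[r:r + BLOCK, c:c + BLOCK]
                bg = background[r:r + BLOCK, c:c + BLOCK]
                if np.abs(blk - bg).mean() > THRESH:   # moving block -> foreground
                    fg[r:r + BLOCK, c:c + BLOCK] = 255
                else:                                  # static block: absorb slowly
                    background[r:r + BLOCK, c:c + BLOCK] = (1 - ALPHA) * bg + ALPHA * blk
        return fg

    bg = np.zeros((64, 64))
    frame = np.zeros((64, 64)); frame[16:32, 16:32] = 200   # a "moving object"
    print(int(update(bg, frame).sum() / 255), "foreground pixels")  # 256
    ```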
  • Publication number: 20190205393
    Abstract: A cross-media search method uses a VGG convolutional neural network (VGG net) to extract image features. The 4096-dimensional feature of the seventh fully-connected layer (fc7) in the VGG net, after processing by a ReLU activation function, serves as the image feature. A Fisher Vector based on Word2vec is used to extract text features. Semantic matching is performed on the heterogeneous image and text features by means of logistic regression. The correlation between the two heterogeneous feature types, image and text, is found through this logistic-regression-based semantic matching, and thus cross-media search is achieved. The feature extraction method can effectively capture the deep semantics of images and text, improve cross-media search accuracy, and thus greatly improve the cross-media search effect.
    Type: Application
    Filed: December 1, 2016
    Publication date: July 4, 2019
    Applicant: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Liang Han, Mengdi Fan, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
  • Patent number: 10339633
    Abstract: The present application provides a method and a device for super-resolution image reconstruction based on dictionary matching. The method includes: establishing a matching dictionary library; inputting an image to be reconstructed into a multi-layer linear filter network; extracting a local characteristic of the image to be reconstructed; searching the matching dictionary library for the local characteristic of the low-resolution image block with the highest similarity to that of the image to be reconstructed; retrieving from the matching dictionary library the residual of the combined sample to which that most similar low-resolution block belongs; performing interpolation amplification on the local characteristic of the most similar low-resolution image block; and adding the residual to the result of the interpolation amplification to obtain the reconstructed high-resolution image block.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: July 2, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Yang Zhao, Ronggang Wang, Wen Gao, Zhenyu Wang, Wenmin Wang
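    The reconstruction step described in the abstract reduces to: nearest-neighbor lookup in the dictionary, interpolation amplification of the input block, then adding the stored residual. The sketch below invents the dictionary contents, feature extractor, and sizes purely for illustration.

    ```python
    import numpy as np
    from scipy.ndimage import zoom

    rng = np.random.default_rng(0)
    N, FEAT, HR = 500, 32, 16
    dict_feats = rng.normal(size=(N, FEAT))        # low-res block local characteristics
    dict_residuals = rng.normal(size=(N, HR, HR))  # paired high-frequency residuals

    def reconstruct_block(lr_block: np.ndarray, lr_feat: np.ndarray) -> np.ndarray:
        # 1) most similar low-resolution entry in the matching dictionary library
        best = int(np.argmin(np.linalg.norm(dict_feats - lr_feat, axis=1)))
        # 2) interpolation amplification (cubic spline, 2x)
        upscaled = zoom(lr_block, HR / lr_block.shape[0], order=3)
        # 3) add the matched combined sample's residual
        return upscaled + dict_residuals[best]

    print(reconstruct_block(rng.normal(size=(8, 8)), rng.normal(size=FEAT)).shape)  # (16, 16)
    ```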
  • Patent number: 10339409
    Abstract: A method and a device for extracting local features of a 3D point cloud are disclosed. The angle information and the concavo-convex information between a feature point to be extracted and a point of an adjacent body element are calculated based on a local reference frame corresponding to the points of each body element, so the feature relation between the two points can be calculated accurately, with invariance under translation and rotation. Since concavo-convex information about the local point cloud is included during extraction, the inaccurate extraction caused by ignoring concavo-convex ambiguity in previous 3D local feature descriptions is resolved.
    Type: Grant
    Filed: June 18, 2015
    Date of Patent: July 2, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Wenmin Wang, Mingmin Zhen, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Wen Gao
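    For two oriented points, the pairwise cue the abstract describes has two parts: the angle between the normals, and a sign distinguishing convex from concave connections. A hedged sketch using the common (n1 − n2)·(p1 − p2) convexity test as a stand-in for the patented formulation:

    ```python
    import numpy as np

    def angle_and_convexity(p1, n1, p2, n2):
        n1, n2 = n1 / np.linalg.norm(n1), n2 / np.linalg.norm(n2)
        angle = float(np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0)))
        d = (p1 - p2) / (np.linalg.norm(p1 - p2) + 1e-12)
        # Positive on convex connections, negative on concave ones: this sign
        # resolves the concavo-convex ambiguity that unsigned angles leave.
        convexity = float(np.dot(n1 - n2, d))
        return angle, convexity

    p1, p2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
    ridge = angle_and_convexity(p1, np.array([-0.5, 0.0, 1.0]), p2, np.array([0.5, 0.0, 1.0]))
    valley = angle_and_convexity(p1, np.array([0.5, 0.0, 1.0]), p2, np.array([-0.5, 0.0, 1.0]))
    print(ridge, valley)   # same angle, opposite convexity signs
    ```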
  • Patent number: 10298950
    Abstract: A P frame-based multi-hypothesis motion compensation method includes: taking an encoded image block adjacent to a current image block as a reference image block and obtaining a first motion vector of the current image block by using a motion vector of the reference image block, the first motion vector pointing to a first prediction block; taking the first motion vector as a reference value and performing joint motion estimation on the current image block to obtain a second motion vector of the current image block, the second motion vector pointing to a second prediction block; and performing weighted averaging on the first prediction block and the second prediction block to obtain a final prediction block of the current image block. The method increases the accuracy of the obtained prediction block of the current image block without increasing the code rate.
    Type: Grant
    Filed: January 26, 2016
    Date of Patent: May 21, 2019
    Assignee: PEKING UNIVERSITY SHENZHEN GRADUATE SCHOOL
    Inventors: Ronggang Wang, Lei Chen, Zhenyu Wang, Siwei Ma, Wen Gao, Tiejun Huang, Wenmin Wang, Shengfu Dong
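    The final step of the method is a plain weighted average of the two prediction blocks. A sketch with equal weights (the patent specifies weighted averaging; the weight values here are an assumption):

    ```python
    import numpy as np

    def fetch_block(ref_frame, mv, top_left, size=8):
        r, c = top_left[0] + mv[0], top_left[1] + mv[1]
        return ref_frame[r:r + size, c:c + size]

    def multi_hypothesis_prediction(ref_frame, top_left, mv1, mv2, w1=0.5, w2=0.5):
        pred1 = fetch_block(ref_frame, mv1, top_left)  # hypothesis 1: MV derived from neighbor
        pred2 = fetch_block(ref_frame, mv2, top_left)  # hypothesis 2: joint motion estimation
        return w1 * pred1 + w2 * pred2                 # final prediction block

    ref = np.arange(64 * 64, dtype=float).reshape(64, 64)
    print(multi_hypothesis_prediction(ref, (16, 16), (0, 1), (1, 0)).shape)  # (8, 8)
    ```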
  • Patent number: 10297016
    Abstract: Disclosed is a video background removal method, which relates to the technical field of video analysis, and in particular to a background removal method based on image blocks, a Gaussian mixture model and a random process. Firstly, the concept of blocks is defined, and the foreground and background are determined by comparing differences between blocks; the threshold value is adjusted automatically using a Gaussian mixture model, while the background is updated following the idea of a random process. Experiments on the BMC dataset show that this method surpasses most current advanced algorithms with high accuracy. The method has wide applicability, can be applied to surveillance-video background subtraction, and is of considerable importance in the field of video analysis.
    Type: Grant
    Filed: January 5, 2017
    Date of Patent: May 21, 2019
    Assignee: Peking University Shenzhen Graduate School
    Inventors: Ge Li, Xianghao Zang, Wenmin Wang, Ronggang Wang
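    A loose sketch of the block-difference idea with an automatically adjusted threshold. For brevity, a single running Gaussian per block stands in for the abstract's Gaussian mixture model, and a coin flip stands in for the random-process background update; all parameters are assumptions.

    ```python
    import numpy as np

    BLOCK = 8

    class BlockBackground:
        def __init__(self, h, w, k=2.5, alpha=0.05):
            self.mean = np.zeros((h // BLOCK, w // BLOCK))
            self.var = np.full((h // BLOCK, w // BLOCK), 50.0)
            self.k, self.alpha = k, alpha
            self.rng = np.random.default_rng(0)

        def apply(self, frame: np.ndarray) -> np.ndarray:
            h, w = frame.shape
            m = frame.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK).mean(axis=(1, 3))
            diff = np.abs(m - self.mean)
            fg = diff > self.k * np.sqrt(self.var)          # per-block adaptive threshold
            upd = ~fg & (self.rng.random(fg.shape) < 0.5)   # random-process style update
            self.mean[upd] += self.alpha * (m - self.mean)[upd]
            self.var[upd] += self.alpha * (diff ** 2 - self.var)[upd]
            return np.kron(fg.astype(np.uint8), np.ones((BLOCK, BLOCK), np.uint8))

    model = BlockBackground(64, 64)
    frame = np.zeros((64, 64)); frame[:16, :16] = 255
    print(model.apply(frame).sum(), "foreground pixels")    # 256
    ```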
  • Publication number: 20190114753
    Abstract: Disclosed is a video background removal method, which relates to the technical field of video analysis, and in particular to a background removal method based on image blocks, a Gaussian mixture model and a random process. Firstly, the concept of blocks is defined, and the foreground and background are determined by comparing differences between blocks; the threshold value is adjusted automatically using a Gaussian mixture model, while the background is updated following the idea of a random process. Experiments on the BMC dataset show that this method surpasses most current advanced algorithms with high accuracy. The method has wide applicability, can be applied to surveillance-video background subtraction, and is of considerable importance in the field of video analysis.
    Type: Application
    Filed: January 5, 2017
    Publication date: April 18, 2019
    Applicant: Peking University Shenzhen Graduate School
    Inventors: Ge Li, Xianghao Zang, Wenmin Wang, Ronggang Wang
  • Publication number: 20190108642
    Abstract: Disclosed in the present invention is a video foreground extraction method for surveillance video, which adjusts the block size to adapt to different video resolutions based on an image block processing method, and then extracts foreground objects in a moving state by establishing a background block model. The method comprises: representing each frame of image I in the surveillance video as blocks; initializing; updating the block background weight, the block temporary background and the temporary background; updating the block background and the background; saving the foreground, and updating the foreground block weight and the foreground block; and performing binarization processing on the foreground to obtain the final foreground result.
    Type: Application
    Filed: April 6, 2017
    Publication date: April 11, 2019
    Inventors: Ge Li, Xianghao Zang, Wenmin Wang, Ronggang Wang
  • Publication number: 20190088256
    Abstract: The disclosure discloses a human-machine interaction method and apparatus based on artificial intelligence. A specific embodiment of the method comprises: receiving a user-entered interaction sentence and determining whether to generate an interaction result corresponding to the interaction sentence; and determining the interaction information to be presented to the user based on the determining result, the interaction information comprising at least one of the following items: the generated interaction result corresponding to the interaction sentence, or a search result corresponding to the interaction sentence from a search engine.
    Type: Application
    Filed: August 3, 2018
    Publication date: March 21, 2019
    Inventors: Yingzhan Lin, Zeying Xie, Yichuan Liang, Wenmin Wang, Yin Zhang, Guang Ling, Chao Zhou
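    The control flow the abstract describes is essentially a dispatcher: attempt generation, and fall back to (or combine with) a search result. A toy sketch with stub components, all hypothetical:

    ```python
    from typing import Optional

    def try_generate(sentence: str) -> Optional[str]:
        # Stand-in for the generation module: succeeds only for greetings here.
        return "Hello! How can I help?" if sentence.lower().startswith("hello") else None

    def search(sentence: str) -> str:
        return f"[top search-engine result for {sentence!r}]"  # stub search engine

    def interact(sentence: str) -> dict:
        generated = try_generate(sentence)
        if generated is not None:
            return {"interaction_result": generated}
        return {"search_result": search(sentence)}  # fall back to search

    print(interact("hello there"))
    print(interact("weather in Beijing tomorrow"))
    ```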
  • Publication number: 20190065912
    Abstract: A method and a device for MCMC framework-based sub-hypergraph matching are provided. Matching of object features is performed by constructing sub-hypergraphs. In large numbers of real images and videos, objects vary constantly and contain various noise points as well as other interference factors, which makes image object matching and searching very difficult. Performing object feature matching with sub-hypergraphs that represent the appearance and positions of objects allows faster and more accurate image matching. Furthermore, a sub-hypergraph has several advantages over a graph or a hypergraph: on the one hand, a sub-hypergraph carries more geometric information (e.g. angle transformation, rotation, scale) than a graph, and is less difficult to handle and more extensible than a hypergraph; on the other hand, the disclosed method and device are more resistant to interference, robust, and adaptable to complex settings, especially in the presence of outliers.
    Type: Application
    Filed: March 10, 2016
    Publication date: February 28, 2019
    Inventors: Wenmin Wang, Ruonan Zhang, Ronggang Wang, Ge Li, Shengfu Dong, Zhenyu Wang, Ying Li, Hui Zhao, Wen Gao
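    A toy sketch of the MCMC part only: Metropolis-Hastings sampling over one-to-one correspondences, scored by preservation of pairwise distances (a simple stand-in for the richer sub-hypergraph geometry such as angles and scale). Everything here is illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    model = rng.normal(size=(6, 2))
    scene = np.vstack([model + rng.normal(scale=0.01, size=model.shape),
                       rng.normal(size=(4, 2))])       # true matches plus outliers

    def score(assign):
        s = 0.0
        for i in range(len(assign)):
            for j in range(i + 1, len(assign)):
                s -= (np.linalg.norm(model[i] - model[j])
                      - np.linalg.norm(scene[assign[i]] - scene[assign[j]])) ** 2
        return s                                       # geometric consistency

    assign = rng.permutation(len(scene))[:len(model)]  # random initial matching
    cur_s = score(assign)
    best, best_s, T = assign.copy(), cur_s, 0.05
    for _ in range(5000):
        prop = assign.copy()
        prop[rng.integers(len(model))] = rng.integers(len(scene))  # local move
        if len(set(prop)) < len(prop):
            continue                                   # keep the matching one-to-one
        s = score(prop)
        if s > cur_s or rng.random() < np.exp((s - cur_s) / T):    # MH acceptance
            assign, cur_s = prop, s
            if s > best_s:
                best, best_s = prop.copy(), s
    print("recovered matches:", best)                  # ideally [0 1 2 3 4 5]
    ```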
  • Publication number: 20190066675
    Abstract: Embodiments of the present disclosure disclose an artificial intelligence based method and apparatus for classifying a voice-recognized text. A specific embodiment of the method includes: acquiring a current interactive text of a voice query from a user; analyzing the current interactive text using a lexical analyzer to obtain a current lexical structure; determining whether the current lexical structure matches a template of a category in a classifier; and, if it does, classifying the current interactive text corresponding to that lexical structure into the category of the matched template. The embodiment can classify texts quickly, substantially reduce the volume of manually annotated text, and improve annotation efficiency in intelligent voice interaction services.
    Type: Application
    Filed: August 3, 2018
    Publication date: February 28, 2019
    Inventors: Yichuan Liang, Guang Ling, Yingzhan Lin, Wenmin Wang, Zeying Xie, Yin Zhang, Wei Xu, Chao Zhou
  • Patent number: 10116968
    Abstract: An arithmetic encoding-decoding method for compression of a video image block. The method includes an encoding process and a decoding process. The encoding process includes: 1) acquiring information of an image block to be encoded; 2) extracting an encoding command of a weighted skip model; 3) acquiring an index of a reference frame according to the information of the image block to be encoded and the command of the weighted skip model, in which the reference frame includes a prediction block for reconstructing the image block to be encoded; 4) acquiring a context-based adaptive probability model for encoding; and 5) performing arithmetic encoding of the index of the reference frame and writing arithmetic codes into an arithmetically encoded bitstream according to the context-based adaptive probability model.
    Type: Grant
    Filed: March 4, 2016
    Date of Patent: October 30, 2018
    Assignee: PEKING UNIVERSITY SHENZHEN GRADUATE SCHOOL
    Inventors: Zhenyu Wang, Ronggang Wang, Shengfu Dong, Wenmin Wang, Tiejun Huang, Wen Gao
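    Step 4's "context-based adaptive probability model" can be illustrated on its own: per-context bit counts yield the probability estimate an arithmetic coder would consume, and the counts adapt after every coded bit. The context names are hypothetical and no actual bitstream is produced; the printed figure is the ideal arithmetic-code length.

    ```python
    import math
    from collections import defaultdict

    class ContextModel:
        def __init__(self):
            self.counts = defaultdict(lambda: [1, 1])  # Laplace-smoothed [n0, n1]

        def p_one(self, ctx) -> float:
            n0, n1 = self.counts[ctx]
            return n1 / (n0 + n1)

        def code_bit(self, ctx, bit: int) -> float:
            """Return the ideal code cost of `bit` in bits, then adapt the model."""
            p = self.p_one(ctx) if bit else 1.0 - self.p_one(ctx)
            self.counts[ctx][bit] += 1                 # adaptation step
            return -math.log2(p)

    # Coding reference-frame-index bits under a single (hypothetical) context:
    model = ContextModel()
    stream = [("ref_idx", 0)] * 40 + [("ref_idx", 1)] * 2  # heavily skewed toward 0
    total = sum(model.code_bit(ctx, bit) for ctx, bit in stream)
    print(f"{total:.1f} bits for {len(stream)} binary symbols")  # well under 42
    ```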