Patents by Inventor Kaiming He

Kaiming He has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Masked autoencoders for computer vision

Patent number: 12266160

Abstract: In particular embodiments, a computing system may access a plurality of images for pre-training a first machine-learning model that includes an encoder and a decoder. Using each image, the system may pre-train the model by dividing the image into a set of patches, selecting a first subset of the patches to be visible and a second subset of the patches to be masked during the pre-training, processing, using the encoder, the first subset of patches to generate corresponding first latent representations, processing, using the decoder, the first latent representations corresponding to the first subset of patches and mask tokens corresponding to the second subset of patches to generate reconstructed patches corresponding to the second subset of patches, the reconstructed patches and the first subset of patches being used to generate a reconstructed image, and updating the model based on comparisons between the image and the reconstructed image.

Type: Grant

Filed: July 27, 2022

Date of Patent: April 1, 2025

Assignee: Meta Platforms, Inc.

Inventors: Kaiming He, Piotr Dollar, Ross Girshick, Saining Xie, Xinlei Chen, Yanghao Li
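The first two steps of the abstract above (dividing an image into patches, then choosing a visible subset and a masked subset) can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the toy 8x8 image, the 2x2 patch size, and the 0.75 mask ratio are all assumptions made for illustration, and the encoder, decoder, and reconstruction loss are left out.

```python
import random


def split_into_patches(image, patch):
    """Split a 2D list (H x W) into non-overlapping patch x patch blocks, row-major."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def choose_visible_and_masked(num_patches, mask_ratio, rng):
    """Randomly partition patch indices into a visible subset and a masked subset."""
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)  # assumed ratio, not from the abstract
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Toy 8x8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = split_into_patches(image, patch=2)  # 16 patches of size 2x2
visible, masked = choose_visible_and_masked(len(patches), 0.75, random.Random(0))
```

In this sketch only the patches indexed by `visible` would be passed to an encoder; the indices in `masked` mark where a decoder would receive mask tokens instead.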
Masked Autoencoders for Computer Vision

Publication number: 20240096072

Abstract: In particular embodiments, a computing system may access a plurality of images for pre-training a first machine-learning model that includes an encoder and a decoder. Using each image, the system may pre-train the model by dividing the image into a set of patches, selecting a first subset of the patches to be visible and a second subset of the patches to be masked during the pre-training, processing, using the encoder, the first subset of patches to generate corresponding first latent representations, processing, using the decoder, the first latent representations corresponding to the first subset of patches and mask tokens corresponding to the second subset of patches to generate reconstructed patches corresponding to the second subset of patches, the reconstructed patches and the first subset of patches being used to generate a reconstructed image, and updating the model based on comparisons between the image and the reconstructed image.

Type: Application

Filed: July 27, 2022

Publication date: March 21, 2024

Inventors: Kaiming He, Piotr Dollar, Ross Girshick, Saining Xie, Xinlei Chen, Yanghao Li
Machine-learning models based on non-local neural networks

Patent number: 11562243

Abstract: In one embodiment, a method includes training a baseline machine-learning model based on a neural network comprising a plurality of stages, wherein each stage comprises a plurality of neural blocks, accessing a plurality of training samples comprising a plurality of content objects, respectively, determining one or more non-local operations, wherein each non-local operation is based on one or more pairwise functions and one or more unary functions, generating one or more non-local blocks based on the plurality of training samples and the one or more non-local operations, determining a stage from the plurality of stages of the neural network, and training a non-local machine-learning model by inserting each of the one or more non-local blocks in between at least two of the plurality of neural blocks in the determined stage of the neural network.

Type: Grant

Filed: November 15, 2018

Date of Patent: January 24, 2023

Assignee: Meta Platforms, Inc.

Inventors: Kaiming He, Ross Girshick, Xiaolong Wang
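The abstract above describes a non-local operation as built from pairwise functions and unary functions. A minimal sketch of one such operation follows, under stated assumptions: the pairwise function `f` is taken to be an exponentiated dot product, the unary function `g` is the identity, and the response is normalized by the sum of the pairwise weights. None of these choices, nor the name `non_local`, comes from the listing itself.

```python
import math


def non_local(features):
    """Compute y_i = (1 / C_i) * sum_j f(x_i, x_j) * g(x_j) over all positions j."""

    def f(a, b):
        # Pairwise function (assumed): exponentiated dot product.
        return math.exp(sum(p * q for p, q in zip(a, b)))

    def g(v):
        # Unary function (assumed): identity.
        return v

    out = []
    for x_i in features:
        weights = [f(x_i, x_j) for x_j in features]
        norm = sum(weights)  # C_i, the normalization factor (assumed form)
        dim = len(x_i)
        y_i = [
            sum(w * g(x_j)[d] for w, x_j in zip(weights, features)) / norm
            for d in range(dim)
        ]
        out.append(y_i)
    return out


# Two positions with orthogonal 2-D features.
response = non_local([[1.0, 0.0], [0.0, 1.0]])
```

Every output position is a weighted average over all input positions, which is what distinguishes the operation from a convolution with a local receptive field.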
Convolutional neural network based on groupwise convolution for efficient video analysis

Patent number: 10984245

Abstract: In one embodiment, a method includes receiving a request for information associated with a video, determining the information associated with the video by processing the video using a machine-learning model which is based on a convolutional neural network comprising a plurality of layers, wherein at least one of the plurality of layers comprises one or more building blocks, wherein at least one of the one or more building blocks comprises a first filter configured to perform a three-dimensional (3D) pointwise convolutional operation and a second filter configured to perform a three-dimensional (3D) groupwise convolutional operation, and outputting the information associated with the video in response to the request.

Type: Grant

Filed: February 26, 2019

Date of Patent: April 20, 2021

Assignee: Facebook, Inc.

Inventors: Du Le Hong Tran, Kaiming He, Heng Wang, Matthew Dan Feiszli, Lorenzo Torresani
Method and system for using machine-learning for object instance segmentation

Patent number: 10713794

Abstract: In one embodiment, a method includes a computing system accessing a training image. The system may generate a feature map for the training image using a first neural network. The system may identify a region of interest in the feature map and generate a regional feature map for the region of interest based on sampling locations defined by a sampling region. The sampling region and the region of interest may correspond to the same region in the feature map. The system may generate an instance segmentation mask associated with the region of interest by processing the regional feature map using a second neural network. The second neural network may be trained using the instance segmentation mask. Once trained, the second neural network is configured to generate instance segmentation masks for object instances depicted in images.

Type: Grant

Filed: March 15, 2018

Date of Patent: July 14, 2020

Assignee: Facebook, Inc.

Inventors: Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick
Machine-Learning Models Based on Non-local Neural Networks

Publication number: 20190156210

Abstract: In one embodiment, a method includes training a baseline machine-learning model based on a neural network comprising a plurality of stages, wherein each stage comprises a plurality of neural blocks, accessing a plurality of training samples comprising a plurality of content objects, respectively, determining one or more non-local operations, wherein each non-local operation is based on one or more pairwise functions and one or more unary functions, generating one or more non-local blocks based on the plurality of training samples and the one or more non-local operations, determining a stage from the plurality of stages of the neural network, and training a non-local machine-learning model by inserting each of the one or more non-local blocks in between at least two of the plurality of neural blocks in the determined stage of the neural network.

Type: Application

Filed: November 15, 2018

Publication date: May 23, 2019

Inventors: Kaiming He, Ross Girshick, Xiaolong Wang
Image semantic segmentation

Patent number: 9865042

Abstract: In implementations of the subject matter described herein, the feature maps are obtained by convolving an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.

Type: Grant

Filed: July 17, 2015

Date of Patent: January 9, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jifeng Dai, Kaiming He, Jian Sun
System for training networks for semantic segmentation

Patent number: 9858525

Abstract: Disclosed herein are technologies directed to training a neural network to perform semantic segmentation. A system receives a training image, and using the training image, candidate masks are generated. The candidate masks are ranked and a set of the ranked candidate masks are selected for further processing. One of the set of the ranked candidate masks is selected to train the neural network. The one of the set of the ranked candidate masks is also used as an input to train the neural network in a further training evolution. In some examples, the one of the set of the ranked candidate masks is selected randomly to reduce the likelihood of ending up in poor local optima that result in poor training inputs.

Type: Grant

Filed: October 14, 2015

Date of Patent: January 2, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jifeng Dai, Kaiming He, Jian Sun
Object detection and classification in images

Patent number: 9858496

Abstract: Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.

Type: Grant

Filed: January 20, 2016

Date of Patent: January 2, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jian Sun, Ross Girshick, Shaoqing Ren, Kaiming He
Object detection and classification in images

Publication number: 20170206431

Abstract: Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.

Type: Application

Filed: January 20, 2016

Publication date: July 20, 2017

Inventors: Jian Sun, Ross Girshick, Shaoqing Ren, Kaiming He
System for training networks for semantic segmentation

Publication number: 20170109625

Abstract: Disclosed herein are technologies directed to training a neural network to perform semantic segmentation. A system receives a training image, and using the training image, candidate masks are generated. The candidate masks are ranked and a set of the ranked candidate masks are selected for further processing. One of the set of the ranked candidate masks is selected to train the neural network. The one of the set of the ranked candidate masks is also used as an input to train the neural network in a further training evolution. In some examples, the one of the set of the ranked candidate masks is selected randomly to reduce the likelihood of ending up in poor local optima that result in poor training inputs.

Type: Application

Filed: October 14, 2015

Publication date: April 20, 2017

Inventors: Jifeng Dai, Kaiming He, Jian Sun
Spatial pyramid pooling networks for image processing

Patent number: 9542621

Abstract: Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP-layer-based system may pool features extracted at variable scales due to the flexibility of input scales, making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.

Type: Grant

Filed: February 10, 2015

Date of Patent: January 10, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kaiming He, Jian Sun, Xiangyu Zhang, Shaoqing Ren
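The claim in the abstract above that an SPP layer yields a fixed-length output regardless of input size can be illustrated with a small sketch. This is an assumption-laden illustration rather than the patented method: the function name, the single-channel feature map, the pyramid levels (4, 2, 1), the use of max pooling, and the bin-boundary rule are all chosen here for simplicity.

```python
def spatial_pyramid_pool(feature_map, levels=(4, 2, 1)):
    """Max-pool a 2D feature map into n x n bins per level, from finer to coarser.

    The output length is sum(n * n for n in levels), whatever the input size.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin i covers rows floor(i*h/n) .. ceil((i+1)*h/n) - 1 (never empty).
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(
                    max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


# Two feature maps of different sizes produce vectors of the same length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]    # 5 x 7
large = [[r * 9 + c for c in range(9)] for r in range(13)]   # 13 x 9
pooled_small = spatial_pyramid_pool(small)
pooled_large = spatial_pyramid_pool(large)
```

With levels (4, 2, 1) both outputs have 16 + 4 + 1 = 21 entries, so a downstream fully connected layer could accept either input without resizing.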
Image semantic segmentation

Publication number: 20160358337

Abstract: In implementations of the subject matter described herein, the feature maps are obtained by convolving an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.

Type: Application

Filed: July 17, 2015

Publication date: December 8, 2016

Inventors: Jifeng Dai, Kaiming He, Jian Sun
Content-aware image rotation

Patent number: 9466092

Abstract: According to implementations of this disclosure, image content is rotated in a content-aware fashion. In one implementation, a mesh is formed over an image and image lines in the image content are identified. The image is warped using an energy function that rotates a subset of the lines by a predetermined rotation angle, while rotating other lines by an angle other than the predetermined rotation angle. In one example, lines that are intended to be horizontal or vertical after correcting are rotated by a rotation angle that will make them horizontal or vertical, whereas oblique lines are rotated by an angle other than the rotation angle.

Type: Grant

Filed: November 27, 2013

Date of Patent: October 11, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kaiming He, Huiwen Chang, Jian Sun
Generic object detection in images

Patent number: 9424493

Abstract: Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. Thus, repeated computation of the convolutional features is avoided while accuracy is enhanced.

Type: Grant

Filed: February 9, 2015

Date of Patent: August 23, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kaiming He, Jian Sun, Xiangyu Zhang
Spatial pyramid pooling networks for image processing

Publication number: 20160104056

Abstract: Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP-layer-based system may pool features extracted at variable scales due to the flexibility of input scales, making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.

Type: Application

Filed: February 10, 2015

Publication date: April 14, 2016

Inventors: Kaiming He, Jian Sun, Xiangyu Zhang, Shaoqing Ren
Generic object detection in images

Publication number: 20160104058

Abstract: Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. Thus, repeated computation of the convolutional features is avoided while accuracy is enhanced.

Type: Application

Filed: February 9, 2015

Publication date: April 14, 2016

Inventors: Kaiming He, Jian Sun, Xiangyu Zhang
Content-Aware Image Rotation

Publication number: 20150147003

Abstract: According to implementations of this disclosure, image content is rotated in a content-aware fashion. In one implementation, a mesh is formed over an image and image lines in the image content are identified. The image is warped using an energy function that rotates a subset of the lines by a predetermined rotation angle, while rotating other lines by an angle other than the predetermined rotation angle. In one example, lines that are intended to be horizontal or vertical after correcting are rotated by a rotation angle that will make them horizontal or vertical, whereas oblique lines are rotated by an angle other than the rotation angle.

Type: Application

Filed: November 27, 2013

Publication date: May 28, 2015

Applicant: Microsoft Corporation

Inventors: Kaiming He, Huiwen Chang, Jian Sun
Creation of Rectangular Images from Input Images

Publication number: 20150131924

Abstract: Stitched images generated from combinations of multiple separate images mostly have irregular boundaries. Users generally prefer rectangular boundaries. Techniques for warping an image with irregular boundaries to give the image rectangular boundaries are disclosed herein. Preliminary warping of the image into the rectangle provides a rectangular shape on which to overlay a mesh. The image is reverted to its original shape with irregular boundaries and the mesh is warped accordingly. Global optimization is applied to the image by finding an energy minimum, or reduced energy below a threshold, for a function that gives the image a rectangular shape while preserving shapes and preserving straight lines. The mesh is warped according to the solution of the function and the image is stretched and/or compressed along with the mesh. This approach generates results that are qualitatively more visually attractive than other contemporary techniques.

Type: Application

Filed: November 13, 2013

Publication date: May 14, 2015

Applicant: Microsoft Corporation

Inventors: Kaiming He, Huiwen Chang, Jian Sun
Opacity Measurement Using A Global Pixel Set

Publication number: 20150016717

Abstract: A computing device is described herein that is configured to select a pixel pair including a foreground pixel of an image and a background pixel of the image from a global set of pixels based at least on spatial distances from an unknown pixel and color distances from the unknown pixel. The computing device is further configured to determine an opacity measure for the unknown pixel based at least on the selected pixel pair.

Type: Application

Filed: September 29, 2014

Publication date: January 15, 2015

Inventors: Kaiming He, Jian Sun, Carsten Curt Eckard Rother, Xiao-ou Tang
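The last step in the abstract above, determining an opacity measure for an unknown pixel from a selected foreground/background pair, can be sketched as follows. The sketch assumes the common compositing model I = alpha * F + (1 - alpha) * B and projects the unknown color onto the line from B to F; the abstract does not state this formula, and the function name and sample colors are hypothetical. How the pair is selected from the global pixel set is not shown.

```python
def estimate_alpha(pixel, foreground, background):
    """Estimate opacity in [0, 1] by projecting pixel onto the B -> F color line.

    Assumes the compositing model I = alpha * F + (1 - alpha) * B.
    """
    diff = [f - b for f, b in zip(foreground, background)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 0.0  # degenerate pair: foreground and background colors coincide
    num = sum((p - b) * d for p, b, d in zip(pixel, background, diff))
    return min(1.0, max(0.0, num / denom))


# Hypothetical RGB colors: the unknown pixel is 25% foreground, 75% background.
fg = (200, 100, 0)
bg = (0, 100, 200)
unknown = (50, 100, 150)
alpha = estimate_alpha(unknown, fg, bg)
```

Clamping keeps the estimate a valid opacity even when noise pushes the pixel color slightly outside the segment between the two sampled colors.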