Patents by Inventor Kaiming He

Kaiming He has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240096072
    Abstract: In particular embodiments, a computing system may access a plurality of images for pre-training a first machine-learning model that includes an encoder and a decoder. Using each image, the system may pre-train the model by dividing the image into a set of patches, selecting a first subset of the patches to be visible and a second subset of the patches to be masked during the pre-training, processing, using the encoder, the first subset of patches to generate corresponding first latent representations, processing, using the decoder, the first latent representations corresponding to the first subset of patches and mask tokens corresponding to the second subset of patches to generate reconstructed patches corresponding to the second subset of patches, the reconstructed patches and the first subset of patches being used to generate a reconstructed image, and updating the model based on comparisons between the image and the reconstructed image.
    Type: Application
    Filed: July 27, 2022
    Publication date: March 21, 2024
    Inventors: Kaiming He, Piotr Dollar, Ross Girshick, Saining Xie, Xinlei Chen, Yanghao Li
  • Patent number: 11562243
    Abstract: In one embodiment, a method includes training a baseline machine-learning model based on a neural network comprising a plurality of stages, wherein each stage comprises a plurality of neural blocks, accessing a plurality of training samples comprising a plurality of content objects, respectively, determining one or more non-local operations, wherein each non-local operation is based on one or more pairwise functions and one or more unary functions, generating one or more non-local blocks based on the plurality of training samples and the one or more non-local operations, determining a stage from the plurality of stages of the neural network, and training a non-local machine-learning model by inserting each of the one or more non-local blocks in between at least two of the plurality of neural blocks in the determined stage of the neural network.
    Type: Grant
    Filed: November 15, 2018
    Date of Patent: January 24, 2023
    Assignee: Meta Platforms, Inc.
    Inventors: Kaiming He, Ross Girshick, Xiaolong Wang
  • Patent number: 10984245
    Abstract: In one embodiment, a method includes receiving a request for information associated with a video, determining the information associated with the video by processing the video using a machine-learning model which is based on a convolutional neural network comprising a plurality of layers, wherein at least one of the plurality of layers comprises one or more building blocks, wherein at least one of the one or more building blocks comprises a first filter configured to perform a three-dimensional (3D) pointwise convolutional operation and a second filter configured to perform a three-dimensional (3D) groupwise convolutional operation, and outputting the information associated with the video in response to the request.
    Type: Grant
    Filed: February 26, 2019
    Date of Patent: April 20, 2021
    Assignee: Facebook, Inc.
    Inventors: Du Le Hong Tran, Kaiming He, Heng Wang, Matthew Dan Feiszli, Lorenzo Torresani
  • Patent number: 10713794
    Abstract: In one embodiment, a method includes a computing system accessing a training image. The system may generate a feature map for the training image using a first neural network. The system may identify a region of interest in the feature map and generate a regional feature map for the region of interest based on sampling locations defined by a sampling region. The sampling region and the region of interest may correspond to the same region in the feature map. The system may generate an instance segmentation mask associated with the region of interest by processing the regional feature map using a second neural network. The second neural network may be trained using the instance segmentation mask. Once trained, the second neural network is configured to generate instance segmentation masks for object instances depicted in images.
    Type: Grant
    Filed: March 15, 2018
    Date of Patent: July 14, 2020
    Assignee: Facebook, Inc.
    Inventors: Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick
  • Publication number: 20190156210
    Abstract: In one embodiment, a method includes training a baseline machine-learning model based on a neural network comprising a plurality of stages, wherein each stage comprises a plurality of neural blocks, accessing a plurality of training samples comprising a plurality of content objects, respectively, determining one or more non-local operations, wherein each non-local operation is based on one or more pairwise functions and one or more unary functions, generating one or more non-local blocks based on the plurality of training samples and the one or more non-local operations, determining a stage from the plurality of stages of the neural network, and training a non-local machine-learning model by inserting each of the one or more non-local blocks in between at least two of the plurality of neural blocks in the determined stage of the neural network.
    Type: Application
    Filed: November 15, 2018
    Publication date: May 23, 2019
    Inventors: Kaiming He, Ross Girshick, Xiaolong Wang
  • Patent number: 9865042
    Abstract: In implementations of the subject matter described herein, the feature maps are obtained by convolving an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.
    Type: Grant
    Filed: July 17, 2015
    Date of Patent: January 9, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jifeng Dai, Kaiming He, Jian Sun
  • Patent number: 9858496
    Abstract: Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.
    Type: Grant
    Filed: January 20, 2016
    Date of Patent: January 2, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jian Sun, Ross Girshick, Shaoqing Ren, Kaiming He
  • Patent number: 9858525
    Abstract: Disclosed herein are technologies directed to training a neural network to perform semantic segmentation. A system receives a training image, and using the training image, candidate masks are generated. The candidate masks are ranked and a set of the ranked candidate masks are selected for further processing. One of the set of the ranked candidate masks is selected to train the neural network. The one of the set of the ranked candidate masks is also used as an input to train the neural network in a further training evolution. In some examples, the one of the set of the ranked candidate masks is selected randomly to reduce the likelihood of ending up in poor local optima that result in poor training inputs.
    Type: Grant
    Filed: October 14, 2015
    Date of Patent: January 2, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jifeng Dai, Kaiming He, Jian Sun
  • Publication number: 20170206431
    Abstract: Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.
    Type: Application
    Filed: January 20, 2016
    Publication date: July 20, 2017
    Inventors: Jian Sun, Ross Girshick, Shaoqing Ren, Kaiming He
  • Publication number: 20170109625
    Abstract: Disclosed herein are technologies directed to training a neural network to perform semantic segmentation. A system receives a training image, and using the training image, candidate masks are generated. The candidate masks are ranked and a set of the ranked candidate masks are selected for further processing. One of the set of the ranked candidate masks is selected to train the neural network. The one of the set of the ranked candidate masks is also used as an input to train the neural network in a further training evolution. In some examples, the one of the set of the ranked candidate masks is selected randomly to reduce the likelihood of ending up in poor local optima that result in poor training inputs.
    Type: Application
    Filed: October 14, 2015
    Publication date: April 20, 2017
    Inventors: Jifeng Dai, Kaiming He, Jian Sun
  • Patent number: 9542621
    Abstract: Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP layer based system may pool features extracted at variable scales due to the flexibility of input scales making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.
    Type: Grant
    Filed: February 10, 2015
    Date of Patent: January 10, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kaiming He, Jian Sun, Xiangyu Zhang, Shaoqing Ren
  • Publication number: 20160358337
    Abstract: In implementations of the subject matter described herein, the feature maps are obtained by convolving an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.
    Type: Application
    Filed: July 17, 2015
    Publication date: December 8, 2016
    Inventors: Jifeng Dai, Kaiming He, Jian Sun
  • Patent number: 9466092
    Abstract: According to implementations of this disclosure, image content is rotated in a content-aware fashion. In one implementation, a mesh is formed over an image and image lines in the image content are identified. The image is warped using an energy function that rotates a subset of the lines a predetermined rotation angle, while rotating other lines by an angle other than the predetermined rotation angle. In one example, lines that are intended to be horizontal or vertical after correcting are rotated by a rotation angle that will make them horizontal or vertical, whereas oblique lines are rotated by an angle other than the rotation angle.
    Type: Grant
    Filed: November 27, 2013
    Date of Patent: October 11, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kaiming He, Huiwen Chang, Jian Sun
  • Patent number: 9424493
    Abstract: Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. Thus, repeated computation of the convolutional features is avoided while accuracy is enhanced.
    Type: Grant
    Filed: February 9, 2015
    Date of Patent: August 23, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kaiming He, Jian Sun, Xiangyu Zhang
  • Publication number: 20160104056
    Abstract: Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP layer based system may pool features extracted at variable scales due to the flexibility of input scales making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.
    Type: Application
    Filed: February 10, 2015
    Publication date: April 14, 2016
    Inventors: Kaiming He, Jian Sun, Xiangyu Zhang, Shaoqing Ren
  • Publication number: 20160104058
    Abstract: Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. Thus, repeated computation of the convolutional features is avoided while accuracy is enhanced.
    Type: Application
    Filed: February 9, 2015
    Publication date: April 14, 2016
    Inventors: Kaiming He, Jian Sun, Xiangyu Zhang
  • Publication number: 20150147003
    Abstract: According to implementations of this disclosure, image content is rotated in a content-aware fashion. In one implementation, a mesh is formed over an image and image lines in the image content are identified. The image is warped using an energy function that rotates a subset of the lines a predetermined rotation angle, while rotating other lines by an angle other than the predetermined rotation angle. In one example, lines that are intended to be horizontal or vertical after correcting are rotated by a rotation angle that will make them horizontal or vertical, whereas oblique lines are rotated by an angle other than the rotation angle.
    Type: Application
    Filed: November 27, 2013
    Publication date: May 28, 2015
    Applicant: Microsoft Corporation
    Inventors: Kaiming He, Huiwen Chang, Jian Sun
  • Publication number: 20150131924
    Abstract: Stitched images generated from combinations of multiple separate images mostly have irregular boundaries. Users generally prefer rectangular boundaries. Techniques for warping an image with irregular boundaries to give the image rectangular boundaries are disclosed herein. Preliminary warping of the image into the rectangle provides a rectangular shape on which to overlay a mesh. The image is reverted to its original shape with irregular boundaries and the mesh is warped accordingly. Global optimization is applied to the image by finding an energy minimum, or reduced energy below a threshold, for a function that gives the image a rectangular shape while preserving shapes and preserving straight lines. The mesh is warped according to the solution of the function and the image is stretched and/or compressed along with the mesh. This approach generates results that are qualitatively more visually attractive than other contemporary techniques.
    Type: Application
    Filed: November 13, 2013
    Publication date: May 14, 2015
    Applicant: Microsoft Corporation
    Inventors: Kaiming He, Huiwen Chang, Jian Sun
  • Publication number: 20150016717
    Abstract: A computing device is described herein that is configured to select a pixel pair including a foreground pixel of an image and a background pixel of the image from a global set of pixels based at least on spatial distances from an unknown pixel and color distances from the unknown pixel. The computing device is further configured to determine an opacity measure for the unknown pixel based at least on the selected pixel pair.
    Type: Application
    Filed: September 29, 2014
    Publication date: January 15, 2015
    Inventors: Kaiming He, Jian Sun, Carsten Curt Eckard Rother, Xiao-ou Tang
  • Publication number: 20140369622
    Abstract: An image completion system receives an input image that includes an unknown region to be filled. Upon receiving the image, the image completion system examines a known region of the image other than the unknown region and matches a plurality of patches that are obtained from the known region. The image completion system determines a plurality of offsets associated with the matching and computes statistics associated with these offsets. Based on a subset of the offsets, the image completion system locates features in the known region that are used to fill the unknown region and corresponding offsets based on an energy function and an optimization algorithm. Upon locating the features, the image completion system fills the unknown region based on the located features and the corresponding offsets.
    Type: Application
    Filed: June 5, 2014
    Publication date: December 18, 2014
    Inventors: Kaiming He, Jian Sun
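
The spatial pyramid pooling (SPP) abstracts listed above (patent numbers 9542621 and 9424493) state that an SPP layer produces a fixed-length output regardless of the input size. The following is a minimal sketch of that idea, not the patented implementation. It assumes a single-channel feature map stored as a list of rows, max pooling as the aggregation, and pyramid levels of 1, 2, and 4 bins per side; the function name `spp_max_pool` and all of these choices are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def spp_max_pool(
    feature_map: List[List[float]],
    levels: Sequence[int] = (1, 2, 4),
) -> List[float]:
    """Max-pool a 2D feature map into n x n bins for each pyramid level n.

    The result has sum(n * n for n in levels) values, whatever the input size.
    Max pooling and the default levels are illustrative assumptions.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled: List[float] = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end keeps every bin non-empty.
            top = (i * height) // n
            bottom = -((-(i + 1) * height) // n)  # ceiling division
            for j in range(n):
                left = (j * width) // n
                right = -((-(j + 1) * width) // n)  # ceiling division
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(top, bottom)
                        for c in range(left, right)
                    )
                )
    return pooled


# Two inputs of different sizes yield vectors of the same length.
small = [[1.0, 2.0], [3.0, 4.0]]
large = [[float(r * 5 + c) for c in range(5)] for r in range(3)]
small_vec = spp_max_pool(small, levels=(1, 2))
large_vec = spp_max_pool(large, levels=(1, 2))
```

With levels (1, 2), both calls return five values (one bin plus four bins), which is the property that lets a fixed-size classifier follow convolutional layers applied to images of varying size.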