Patents by Inventor Kaiming He
Kaiming He has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12266160
Abstract: In particular embodiments, a computing system may access a plurality of images for pre-training a first machine-learning model that includes an encoder and a decoder. Using each image, the system may pre-train the model by dividing the image into a set of patches, selecting a first subset of the patches to be visible and a second subset of the patches to be masked during the pre-training, processing, using the encoder, the first subset of patches to generate corresponding first latent representations, processing, using the decoder, the first latent representations corresponding to the first subset of patches and mask tokens corresponding to the second subset of patches to generate reconstructed patches corresponding to the second subset of patches, the reconstructed patches and the first subset of patches being used to generate a reconstructed image, and updating the model based on comparisons between the image and the reconstructed image.
Type: Grant
Filed: July 27, 2022
Date of Patent: April 1, 2025
Assignee: Meta Platforms, Inc.
Inventors: Kaiming He, Piotr Dollar, Ross Girshick, Saining Xie, Xinlei Chen, Yanghao Li
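The patch-selection step described in this abstract can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the toy 8x8 image, the patch size, and the 0.75 mask ratio are all hypothetical and are not taken from the patent, and the encoder and decoder are omitted entirely.

```python
import random

def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(tile)
    return patches

def choose_visible_and_masked(num_patches, mask_ratio, seed=0):
    """Randomly assign patch indices to a visible subset and a masked subset."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

# Toy 8x8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = split_into_patches(image, patch=4)

# mask_ratio=0.75 is an assumed value chosen for illustration.
visible, masked = choose_visible_and_masked(len(patches), mask_ratio=0.75)

# Only the visible patches would be passed to the encoder; the decoder would
# receive their latent representations plus one mask token per masked index.
encoder_input = [patches[i] for i in visible]
```

Every patch index lands in exactly one of the two subsets, so the visible patches and the reconstructed masked patches together cover the whole image.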
-
Patent number: 12230242
Abstract: Provided are a sound gathering device for voiceprint monitoring and a preparation method. The sound gathering device is a cone structure and includes a front end portion (1) and a rear end portion (2), the front end portion (1) is trumpet-shaped, an inner wall surface of the front end portion (1) presents a pattern array of a microstructure (3), and the microstructure (3) presenting the pattern array is formed through laser etching. A sound wave frequency monitored by the sound gathering device is 50 Hz to 10 kHz, and a sound wave enters the sound gathering device from an entrance of the front end portion (1), is reflected through the pattern array of the microstructure (3) on the inner wall surface of the front end portion (1), and is transmitted from an exit of the rear end portion (2), and a sound pressure of the exit of the rear end portion (2) is 4 to 8 times a sound pressure of the entrance of the front end portion (1).
Type: Grant
Filed: July 20, 2023
Date of Patent: February 18, 2025
Assignee: State Grid Jiangsu Taizhou Power Supply Company
Inventors: Yong Li, Ting Chen, Ling Ju, Jijing Yin, Ze Zhang, Xingchun Xu, Beibei Weng, Zhenguo Chuai, Yan Wu, Li Chen, Yang Cheng, Tianyu He, Le Yuan, Jie Qian, Debao Tang, Yanquan Zhu, Anqi Ding, Kaiming Bian, Wen Chen, Wanjian Hu, Hongbo Dai, Weijun Shi
-
Publication number: 20240096072
Abstract: In particular embodiments, a computing system may access a plurality of images for pre-training a first machine-learning model that includes an encoder and a decoder. Using each image, the system may pre-train the model by dividing the image into a set of patches, selecting a first subset of the patches to be visible and a second subset of the patches to be masked during the pre-training, processing, using the encoder, the first subset of patches to generate corresponding first latent representations, processing, using the decoder, the first latent representations corresponding to the first subset of patches and mask tokens corresponding to the second subset of patches to generate reconstructed patches corresponding to the second subset of patches, the reconstructed patches and the first subset of patches being used to generate a reconstructed image, and updating the model based on comparisons between the image and the reconstructed image.
Type: Application
Filed: July 27, 2022
Publication date: March 21, 2024
Inventors: Kaiming He, Piotr Dollar, Ross Girshick, Saining Xie, Xinlei Chen, Yanghao Li
-
Patent number: 11562243
Abstract: In one embodiment, a method includes training a baseline machine-learning model based on a neural network comprising a plurality of stages, wherein each stage comprises a plurality of neural blocks, accessing a plurality of training samples comprising a plurality of content objects, respectively, determining one or more non-local operations, wherein each non-local operation is based on one or more pairwise functions and one or more unary functions, generating one or more non-local blocks based on the plurality of training samples and the one or more non-local operations, determining a stage from the plurality of stages of the neural network, and training a non-local machine-learning model by inserting each of the one or more non-local blocks in between at least two of the plurality of neural blocks in the determined stage of the neural network.
Type: Grant
Filed: November 15, 2018
Date of Patent: January 24, 2023
Assignee: Meta Platforms, Inc.
Inventors: Kaiming He, Ross Girshick, Xiaolong Wang
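A non-local operation built from a pairwise function and a unary function can be sketched as a normalized weighted sum over all positions. The sketch below is one possible instance, not the claimed method: the exponential dot-product pairwise function, the identity unary function, the sum normalization, and the residual wrapper in `non_local_block` are all assumptions made for illustration.

```python
import math

def pairwise(xi, xj):
    """Hypothetical pairwise function: exponential of the dot product."""
    return math.exp(sum(a * b for a, b in zip(xi, xj)))

def unary(xj):
    """Hypothetical unary function: identity (a learned embedding in practice)."""
    return list(xj)

def non_local(xs):
    """For each position i, return sum_j f(x_i, x_j) * g(x_j) / sum_j f(x_i, x_j)."""
    outputs = []
    for xi in xs:
        weights = [pairwise(xi, xj) for xj in xs]
        norm = sum(weights)
        yi = [0.0] * len(xi)
        for weight, xj in zip(weights, xs):
            gj = unary(xj)
            for d in range(len(yi)):
                yi[d] += (weight / norm) * gj[d]
        outputs.append(yi)
    return outputs

def non_local_block(xs):
    """Assumed residual wrapper: add the non-local response back to the input."""
    ys = non_local(xs)
    return [[x + y for x, y in zip(xi, yi)] for xi, yi in zip(xs, ys)]

# Two positions with one-dimensional features.
features = [[0.0], [2.0]]
response = non_local(features)
block_out = non_local_block(features)
```

Because every output position aggregates over all input positions, the response at one location can depend on arbitrarily distant locations, which is what distinguishes it from a local convolution.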
-
Patent number: 10984245
Abstract: In one embodiment, a method includes receiving a request for information associated with a video, determining the information associated with the video by processing the video using a machine-learning model which is based on a convolutional neural network comprising a plurality of layers, wherein at least one of the plurality of layers comprises one or more building blocks, wherein at least one of the one or more building blocks comprises a first filter configured to perform a three-dimensional (3D) pointwise convolutional operation and a second filter configured to perform a three-dimensional (3D) groupwise convolutional operation, and outputting the information associated with the video in response to the request.
Type: Grant
Filed: February 26, 2019
Date of Patent: April 20, 2021
Assignee: Facebook, Inc.
Inventors: Du Le Hong Tran, Kaiming He, Heng Wang, Matthew Dan Feiszli, Lorenzo Torresani
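To see why a building block might pair a pointwise 3D convolution with a groupwise one, it helps to count weights. The sketch below does only that arithmetic. The channel count, kernel sizes, and the choice of one group per channel are assumptions for illustration and are not stated in the abstract; biases are ignored.

```python
def conv3d_params(c_in, c_out, kernel, groups=1):
    """Number of weights in a 3D convolution, ignoring biases."""
    kt, kh, kw = kernel
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kt * kh * kw

channels = 64  # assumed channel count

# Ordinary 3x3x3 convolution mixing all channels and all positions.
full = conv3d_params(channels, channels, (3, 3, 3))

# Pointwise (1x1x1) convolution: mixes channels, no spatiotemporal extent.
pointwise = conv3d_params(channels, channels, (1, 1, 1))

# Groupwise 3x3x3 convolution with one group per channel (assumed):
# spatiotemporal filtering with no channel mixing.
groupwise = conv3d_params(channels, channels, (3, 3, 3), groups=channels)

separated = pointwise + groupwise
print(full, pointwise, groupwise, separated)  # 110592 4096 1728 5824
```

Under these assumed sizes, splitting channel mixing from spatiotemporal filtering uses roughly one nineteenth of the weights of the full convolution.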
-
Patent number: 10713794
Abstract: In one embodiment, a method includes a computing system accessing a training image. The system may generate a feature map for the training image using a first neural network. The system may identify a region of interest in the feature map and generate a regional feature map for the region of interest based on sampling locations defined by a sampling region. The sampling region and the region of interest may correspond to the same region in the feature map. The system may generate an instance segmentation mask associated with the region of interest by processing the regional feature map using a second neural network. The second neural network may be trained using the instance segmentation mask. Once trained, the second neural network is configured to generate instance segmentation masks for object instances depicted in images.
Type: Grant
Filed: March 15, 2018
Date of Patent: July 14, 2020
Assignee: Facebook, Inc.
Inventors: Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick
-
Publication number: 20190156210
Abstract: In one embodiment, a method includes training a baseline machine-learning model based on a neural network comprising a plurality of stages, wherein each stage comprises a plurality of neural blocks, accessing a plurality of training samples comprising a plurality of content objects, respectively, determining one or more non-local operations, wherein each non-local operation is based on one or more pairwise functions and one or more unary functions, generating one or more non-local blocks based on the plurality of training samples and the one or more non-local operations, determining a stage from the plurality of stages of the neural network, and training a non-local machine-learning model by inserting each of the one or more non-local blocks in between at least two of the plurality of neural blocks in the determined stage of the neural network.
Type: Application
Filed: November 15, 2018
Publication date: May 23, 2019
Inventors: Kaiming He, Ross Girshick, Xiaolong Wang
-
Patent number: 9865042
Abstract: In implementations of the subject matter described herein, the feature maps are obtained by convolving an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.
Type: Grant
Filed: July 17, 2015
Date of Patent: January 9, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jifeng Dai, Kaiming He, Jian Sun
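Masking the feature maps rather than the raw image amounts to an element-wise product between each feature channel and a binary mask. The following is a minimal sketch of that idea only. It assumes the mask has already been brought to the feature-map resolution, and the function name and toy values are hypothetical.

```python
def mask_feature_maps(features, mask):
    """Zero out feature values outside a segment.

    features: list of channels, each an H x W list of lists.
    mask:     H x W list of lists holding 0 or 1 (assumed to be at
              feature-map resolution already).
    """
    return [
        [[value * keep for value, keep in zip(feature_row, mask_row)]
         for feature_row, mask_row in zip(channel, mask)]
        for channel in features
    ]

# Two 2x3 feature channels computed once for the whole image (toy values).
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]],
]

# Binary mask for one candidate segment.
segment_mask = [
    [1, 1, 0],
    [0, 1, 0],
]

segment_features = mask_feature_maps(features, segment_mask)
```

The same `features` can be reused for every candidate segment, so only the cheap masking step is repeated per segment.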
-
Patent number: 9858525
Abstract: Disclosed herein are technologies directed to training a neural network to perform semantic segmentation. A system receives a training image, and using the training image, candidate masks are generated. The candidate masks are ranked and a set of the ranked candidate masks is selected for further processing. One of the set of the ranked candidate masks is selected to train the neural network. The one of the set of the ranked candidate masks is also used as an input to train the neural network in a further training evolution. In some examples, the one of the set of the ranked candidate masks is selected randomly to reduce the likelihood of ending up in poor local optima that result in poor training inputs.
Type: Grant
Filed: October 14, 2015
Date of Patent: January 2, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jifeng Dai, Kaiming He, Jian Sun
-
Patent number: 9858496
Abstract: Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.
Type: Grant
Filed: January 20, 2016
Date of Patent: January 2, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jian Sun, Ross Girshick, Shaoqing Ren, Kaiming He
-
Publication number: 20170206431
Abstract: Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.
Type: Application
Filed: January 20, 2016
Publication date: July 20, 2017
Inventors: Jian Sun, Ross Girshick, Shaoqing Ren, Kaiming He
-
Publication number: 20170109625
Abstract: Disclosed herein are technologies directed to training a neural network to perform semantic segmentation. A system receives a training image, and using the training image, candidate masks are generated. The candidate masks are ranked and a set of the ranked candidate masks is selected for further processing. One of the set of the ranked candidate masks is selected to train the neural network. The one of the set of the ranked candidate masks is also used as an input to train the neural network in a further training evolution. In some examples, the one of the set of the ranked candidate masks is selected randomly to reduce the likelihood of ending up in poor local optima that result in poor training inputs.
Type: Application
Filed: October 14, 2015
Publication date: April 20, 2017
Inventors: Jifeng Dai, Kaiming He, Jian Sun
-
Patent number: 9542621
Abstract: Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP layer based system may pool features extracted at variable scales due to the flexibility of input scales, making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.
Type: Grant
Filed: February 10, 2015
Date of Patent: January 10, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaiming He, Jian Sun, Xiangyu Zhang, Shaoqing Ren
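The fixed-length property of a spatial pyramid pooling layer can be checked with a small sketch. This is an illustration under assumptions rather than the patented implementation: the pyramid levels (4, 2, 1), the use of max pooling, and the floor/ceiling bin boundaries are choices made here, and the function handles a single feature channel.

```python
def spatial_pyramid_pool(feature, levels=(4, 2, 1)):
    """Max-pool one H x W feature map into n x n bins per level and concatenate.

    The output length is sum(n * n for n in levels) for any H, W >= 1.
    """
    height, width = len(feature), len(feature[0])
    pooled = []
    for n in levels:  # finer to coarser, as assumed for this sketch
        for i in range(n):
            top = (i * height) // n                      # floor
            bottom = ((i + 1) * height + n - 1) // n     # ceiling
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled

# Two feature maps of different sizes (toy values encode position).
small = [[r * 8 + c for c in range(8)] for r in range(6)]      # 6 x 8
large = [[r * 5 + c for c in range(5)] for r in range(13)]     # 13 x 5

vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
print(len(vec_small), len(vec_large))  # 21 21
```

Both inputs yield a vector of 16 + 4 + 1 = 21 values, so a fully connected layer placed after the pooling step would see the same input size regardless of the image dimensions.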
-
Publication number: 20160358337
Abstract: In implementations of the subject matter described herein, the feature maps are obtained by convolving an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.
Type: Application
Filed: July 17, 2015
Publication date: December 8, 2016
Inventors: Jifeng Dai, Kaiming He, Jian Sun
-
Patent number: 9466092
Abstract: According to implementations of this disclosure, image content is rotated in a content-aware fashion. In one implementation, a mesh is formed over an image and image lines in the image content are identified. The image is warped using an energy function that rotates a subset of the lines by a predetermined rotation angle, while rotating other lines by an angle other than the predetermined rotation angle. In one example, lines that are intended to be horizontal or vertical after correcting are rotated by a rotation angle that will make them horizontal or vertical, whereas oblique lines are rotated by an angle other than the rotation angle.
Type: Grant
Filed: November 27, 2013
Date of Patent: October 11, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaiming He, Huiwen Chang, Jian Sun
-
Patent number: 9424493
Abstract: Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. Thus, repeated computation of the convolutional features is avoided while accuracy is enhanced.
Type: Grant
Filed: February 9, 2015
Date of Patent: August 23, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kaiming He, Jian Sun, Xiangyu Zhang
-
Publication number: 20160104058
Abstract: Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. Thus, repeated computation of the convolutional features is avoided while accuracy is enhanced.
Type: Application
Filed: February 9, 2015
Publication date: April 14, 2016
Inventors: Kaiming He, Jian Sun, Xiangyu Zhang
-
Publication number: 20160104056
Abstract: Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP layer based system may pool features extracted at variable scales due to the flexibility of input scales, making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.
Type: Application
Filed: February 10, 2015
Publication date: April 14, 2016
Inventors: Kaiming He, Jian Sun, Xiangyu Zhang, Shaoqing Ren
-
Publication number: 20150147003
Abstract: According to implementations of this disclosure, image content is rotated in a content-aware fashion. In one implementation, a mesh is formed over an image and image lines in the image content are identified. The image is warped using an energy function that rotates a subset of the lines by a predetermined rotation angle, while rotating other lines by an angle other than the predetermined rotation angle. In one example, lines that are intended to be horizontal or vertical after correcting are rotated by a rotation angle that will make them horizontal or vertical, whereas oblique lines are rotated by an angle other than the rotation angle.
Type: Application
Filed: November 27, 2013
Publication date: May 28, 2015
Applicant: Microsoft Corporation
Inventors: Kaiming He, Huiwen Chang, Jian Sun
-
Publication number: 20150131924
Abstract: Stitched images generated from combinations of multiple separate images mostly have irregular boundaries. Users generally prefer rectangular boundaries. Techniques for warping an image with irregular boundaries to give the image rectangular boundaries are disclosed herein. Preliminary warping of the image into the rectangle provides a rectangular shape on which to overlay a mesh. The image is reverted to its original shape with irregular boundaries and the mesh is warped accordingly. Global optimization is applied to the image by finding an energy minimum, or reduced energy below a threshold, for a function that gives the image a rectangular shape while preserving shapes and preserving straight lines. The mesh is warped according to the solution of the function and the image is stretched and/or compressed along with the mesh. This approach generates results that are qualitatively more visually attractive than other contemporary techniques.
Type: Application
Filed: November 13, 2013
Publication date: May 14, 2015
Applicant: Microsoft Corporation
Inventors: Kaiming He, Huiwen Chang, Jian Sun