Patents by Inventor Xiyang Dai
Xiyang Dai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12223412
Abstract: A computer device for automatic feature detection comprises a processor, a communication device, and a memory configured to hold instructions executable by the processor to instantiate a dynamic convolution neural network, receive input data via the communication network, and execute the dynamic convolution neural network to automatically detect features in the input data. The dynamic convolution neural network compresses the input data from an input space having a dimensionality equal to a predetermined number of channels into an intermediate space having a dimensionality less than the number of channels. The dynamic convolution neural network dynamically fuses the channels into an intermediate representation within the intermediate space and expands the intermediate representation from the intermediate space to an expanded representation in an output space having a higher dimensionality than the dimensionality of the intermediate space.
Type: Grant
Filed: December 16, 2020
Date of Patent: February 11, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu, Ye Yu, Mei Chen, Yunsheng Li
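The compress, fuse, and expand steps in the abstract above can be illustrated with a small pure-Python sketch. This is not the patented implementation: the function names, the random weights, the tanh squashing, and the way the fusion matrix is generated from the input are all assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dynamic_channel_fusion(x, compress, fusion_gen, expand):
    """Compress C channels to a smaller space, fuse them with an
    input-dependent matrix, then expand to the output space."""
    z = matvec(compress, x)                      # C -> latent (latent < C)
    size = len(z)
    flat = matvec(fusion_gen, x)                 # latent*latent values derived from x
    fusion = [
        [math.tanh(flat[i * size + j]) for j in range(size)]
        for i in range(size)
    ]                                            # assumed form of the dynamic fusion
    fused = matvec(fusion, z)                    # fuse within the intermediate space
    return matvec(expand, fused)                 # latent -> output channels


rng = random.Random(0)
channels, latent, out_channels = 6, 2, 6
compress = random_matrix(latent, channels, rng)
fusion_gen = random_matrix(latent * latent, channels, rng)
expand = random_matrix(out_channels, latent, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
y = dynamic_channel_fusion(x, compress, fusion_gen, expand)
```

Because the fusion matrix is recomputed from each input, two different inputs pass through different effective weights even though `compress` and `expand` stay fixed.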
-
Publication number: 20250037252
Abstract: The disclosure herein describes generating an inpainted image from a masked image using a patch-based encoder and an unquantized transformer. An image including a masked region and an unmasked region is received, and the received image is divided into a plurality of patches including masked patches. The plurality of patches is encoded into a plurality of feature vectors, wherein each patch is encoded to a feature vector. Using a transformer, a predicted token is generated for each masked patch using a feature vector encoded from the masked patch, and a quantized vector of the masked patch is determined using generated predicted token and a masked patch-specific codebook. The determined quantized vector of the masked patch is included into a set of quantized vectors associated with the plurality of patches, and an output image is generated from the set of quantized vectors using a decoder.
Type: Application
Filed: October 11, 2024
Publication date: January 30, 2025
Inventors: Dongdong CHEN, Xiyang DAI, Yinpeng CHEN, Mengchen LIU, Lu YUAN
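The data flow in the abstract above (patches, feature vectors, predicted tokens, codebook lookup, decoding) can be traced with a toy sketch. Everything here is an assumption for illustration: the encoder is the identity on raw pixels, the transformer is replaced by a nearest-codebook-entry lookup against the mean of the visible patches, a single tiny codebook is used, and the decoder only reassembles patches.

```python
def split_patches(grid, p):
    """Split an H x W grid into row-major p x p patches, each flattened."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[r + dr][c + dc] for dr in range(p) for dc in range(p)]
        for r in range(0, h, p)
        for c in range(0, w, p)
    ]


def nearest_token(vec, codebook):
    """Index of the codebook entry closest to vec (squared distance)."""
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(vec, codebook[k]))
    return min(range(len(codebook)), key=dist)


def inpaint(image, mask, codebook, p):
    patches = split_patches(image, p)
    mask_patches = split_patches(mask, p)
    # Encoder stand-in: the feature vector is the unquantized patch itself.
    features = [[float(v) for v in patch] for patch in patches]
    visible = [f for f, m in zip(features, mask_patches) if not any(m)]
    # Context stand-in for transformer attention: mean of visible features.
    context = [sum(col) / len(visible) for col in zip(*visible)]
    vectors = []
    for feat, m in zip(features, mask_patches):
        if any(m):
            # Keep visible pixels of the patch, fill masked ones from context.
            filled = [c if hidden else v for v, c, hidden in zip(feat, context, m)]
            token = nearest_token(filled, codebook)   # predicted token
            vectors.append(codebook[token])           # quantized vector
        else:
            vectors.append(feat)
    # Decoder stand-in: place each patch vector back into the image grid.
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    per_row = w // p
    for idx, vec in enumerate(vectors):
        r0, c0 = (idx // per_row) * p, (idx % per_row) * p
        for k, value in enumerate(vec):
            out[r0 + k // p][c0 + k % p] = value
    return out


image = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9]]
mask = [[False] * 4, [False] * 4,
        [False, False, True, True], [False, False, True, True]]
codebook = [[0, 0, 0, 0], [1, 1, 1, 1]]   # hypothetical two-entry codebook
result = inpaint(image, mask, codebook, p=2)
```

The sketch assumes at least one fully visible patch exists and that the image dimensions are multiples of the patch size.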
-
Patent number: 12148131
Abstract: The disclosure herein describes generating an inpainted image from a masked image using a patch-based encoder and an unquantized transformer. An image including a masked region and an unmasked region is received, and the received image is divided into a plurality of patches including masked patches. The plurality of patches is encoded into a plurality of feature vectors, wherein each patch is encoded to a feature vector. Using a transformer, a predicted token is generated for each masked patch using a feature vector encoded from the masked patch, and a quantized vector of the masked patch is determined using generated predicted token and a masked patch-specific codebook. The determined quantized vector of the masked patch is included into a set of quantized vectors associated with the plurality of patches, and an output image is generated from the set of quantized vectors using a decoder.
Type: Grant
Filed: April 29, 2022
Date of Patent: November 19, 2024
Assignee: Microsoft Technology Licensing, LLC.
Inventors: Dongdong Chen, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan
-
Patent number: 11989956
Abstract: Systems and methods for object detection generate a feature pyramid corresponding to image data, and rescaling the feature pyramid to a scale corresponding to a median level of the feature pyramid, wherein the rescaled feature pyramid is a four-dimensional (4D) tensor. The 4D tensor is reshaped into a three-dimensional (3D) tensor having individual perspectives including scale features, spatial features, and task features corresponding to different dimensions of the 3D tensor. The 3D tensor is used with a plurality of attention layers to update a plurality of feature maps associated with the image data. Object detection is performed on the image data using the updated plurality of feature maps.
Type: Grant
Filed: April 5, 2021
Date of Patent: May 21, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang
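The rescale-and-reshape step in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: nested lists stand in for tensors, nearest-neighbor sampling stands in for whatever resampling the patent uses, the median level is taken as the middle list index, and the attention layers are omitted. The helper names are hypothetical.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resize of an H x W x C feature map (nested lists)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def pyramid_to_3d(pyramid):
    """Rescale every level to the median level's size (L x H x W x C),
    then flatten the spatial axes to get an L x (H*W) x C structure."""
    median = pyramid[len(pyramid) // 2]          # middle level (assumed median)
    h, w = len(median), len(median[0])
    tensor_4d = [resize_nearest(level, h, w) for level in pyramid]
    return [
        [pixel for row in level for pixel in row]  # merge H and W into one axis
        for level in tensor_4d
    ]


def make_level(size, value, channels=2):
    """Hypothetical constant feature map of shape size x size x channels."""
    return [[[value] * channels for _ in range(size)] for _ in range(size)]


pyramid = [make_level(8, 1.0), make_level(4, 2.0), make_level(2, 3.0)]
tensor_3d = pyramid_to_3d(pyramid)
shape = (len(tensor_3d), len(tensor_3d[0]), len(tensor_3d[0][0]))
```

In the resulting structure the first axis indexes scale (pyramid level), the second indexes spatial position, and the third indexes channels, which is where separate attention layers could then operate.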
-
Publication number: 20230351558
Abstract: The disclosure herein describes generating an inpainted image from a masked image using a patch-based encoder and an unquantized transformer. An image including a masked region and an unmasked region is received, and the received image is divided into a plurality of patches including masked patches. The plurality of patches is encoded into a plurality of feature vectors, wherein each patch is encoded to a feature vector. Using a transformer, a predicted token is generated for each masked patch using a feature vector encoded from the masked patch, and a quantized vector of the masked patch is determined using generated predicted token and a masked patch-specific codebook. The determined quantized vector of the masked patch is included into a set of quantized vectors associated with the plurality of patches, and an output image is generated from the set of quantized vectors using a decoder.
Type: Application
Filed: April 29, 2022
Publication date: November 2, 2023
Inventors: Dongdong CHEN, Xiyang DAI, Yinpeng CHEN, Mengchen LIU, Lu YUAN
-
Publication number: 20220318541
Abstract: Systems and methods for object detection generate a feature pyramid corresponding to image data, and rescaling the feature pyramid to a scale corresponding to a median level of the feature pyramid, wherein the rescaled feature pyramid is a four-dimensional (4D) tensor. The 4D tensor is reshaped into a three-dimensional (3D) tensor having individual perspectives including scale features, spatial features, and task features corresponding to different dimensions of the 3D tensor. The 3D tensor is used with a plurality of attention layers to update a plurality of feature maps associated with the image data. Object detection is performed on the image data using the updated plurality of feature maps.
Type: Application
Filed: April 5, 2021
Publication date: October 6, 2022
Inventors: Xiyang DAI, Yinpeng CHEN, Bin XIAO, Dongdong CHEN, Mengchen LIU, Lu YUAN, Lei ZHANG
-
Publication number: 20220188599
Abstract: A neural architecture search (NAS) with a weak predictor comprises: receiving network architecture scoring information; iteratively sampling a search space, wherein the sampling comprises: generating a set of candidate architectures within the search space; learning a first predictor; evaluating performance of the candidate architectures; and based on at least the performance of the set of candidate architectures and the network architecture scoring information, refining the search space to a smaller search space; based on at least the network architecture scoring information, thresholding the performance of candidate architectures to determine scored output candidate architectures; and reporting the scored output candidate architectures. In some examples, the candidate architectures each comprise a machine learning (ML) model, for example a neural network (NN).
Type: Application
Filed: December 15, 2020
Publication date: June 16, 2022
Inventors: Xiyang DAI, Dongdong CHEN, Yinpeng CHEN, Mengchen LIU, Ye YU, Zicheng LIU, Mei CHEN, Lu YUAN, Junru WU
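The iterative loop in the abstract above (sample candidates, learn a predictor, evaluate, shrink the search space, then threshold) can be sketched in a few lines. The sketch makes several assumptions: architectures are hypothetical `(depth, width)` tuples, `true_score` is a toy function standing in for actually training a network, and the weak predictor is a 1-nearest-neighbor lookup chosen only for brevity.

```python
import random


def true_score(arch):
    """Toy stand-in for training and validating an architecture."""
    depth, width = arch
    return -((depth - 6) ** 2) - ((width - 12) ** 2) / 4.0


def fit_weak_predictor(evaluated):
    """Return a 1-nearest-neighbor predictor over evaluated architectures."""
    def predict(arch):
        nearest = min(
            evaluated,
            key=lambda a: (a[0] - arch[0]) ** 2 + (a[1] - arch[1]) ** 2,
        )
        return evaluated[nearest]
    return predict


def weak_predictor_search(space, score_threshold, rounds=4, samples=8,
                          keep=0.5, seed=0):
    rng = random.Random(seed)
    evaluated = {}
    for _ in range(rounds):
        # Generate candidates from the current space and evaluate them.
        for arch in rng.sample(space, min(samples, len(space))):
            evaluated[arch] = true_score(arch)
        # Learn a predictor from everything evaluated so far.
        predict = fit_weak_predictor(evaluated)
        # Refine: keep only the fraction the predictor ranks highest.
        ranked = sorted(space, key=predict, reverse=True)
        space = ranked[: max(1, int(len(ranked) * keep))]
    # Threshold the measured performance and report the survivors.
    passed = [(a, s) for a, s in evaluated.items() if s >= score_threshold]
    return sorted(passed, key=lambda item: item[1], reverse=True)


search_space = [(d, w) for d in range(1, 11) for w in range(4, 21, 2)]
results = weak_predictor_search(search_space, score_threshold=-20.0)
```

Each round spends the evaluation budget on a smaller region, so the predictor only needs to be accurate enough to rank candidates coarsely rather than to model the whole space.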
-
Publication number: 20220188595
Abstract: A computer device for automatic feature detection comprises a processor, a communication device, and a memory configured to hold instructions executable by the processor to instantiate a dynamic convolution neural network, receive input data via the communication network, and execute the dynamic convolution neural network to automatically detect features in the input data. The dynamic convolution neural network compresses the input data from an input space having a dimensionality equal to a predetermined number of channels into an intermediate space having a dimensionality less than the number of channels. The dynamic convolution neural network dynamically fuses the channels into an intermediate representation within the intermediate space and expands the intermediate representation from the intermediate space to an expanded representation in an output space having a higher dimensionality than the dimensionality of the intermediate space.
Type: Application
Filed: December 16, 2020
Publication date: June 16, 2022
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yinpeng CHEN, Xiyang DAI, Mengchen LIU, Dongdong CHEN, Lu YUAN, Zicheng LIU, Ye YU, Mei CHEN, Yunsheng LI
-
Patent number: 10769491
Abstract: Techniques are disclosed for identifying discriminative, fine-grained features of an object in an image. In one example, an input device receives an image. A machine learning system includes a model comprising a first set, a second set, and a third set of filters. The machine learning system applies the first set of filters to the received image to generate an intermediate representation of the received image. The machine learning system applies the second set of filters to the intermediate representation to generate part localization data identifying sub-parts of an object and one or more regions of the image in which the sub-parts are located. The machine learning system applies the third set of filters to the intermediate representation to generate classification data identifying a subordinate category to which the object belongs. The system uses the part localization and classification data to perform fine-grained classification of the object.
Type: Grant
Filed: August 31, 2018
Date of Patent: September 8, 2020
Assignee: SRI International
Inventors: Bogdan Calin Mihai Matei, Xiyang Dai, John Benjamin Southall, Nhon Hoc Trinh, Harpreet Sawhney
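The three filter sets in the abstract above form a shared trunk with two heads, which the following toy sketch illustrates. It is not the patented model: the hand-written kernels, the per-part weighted heatmaps, the global average pooling, and all function names are assumptions chosen to keep the example small.

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with no padding."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


def shared_features(image, first_filters):
    """First filter set: image -> intermediate representation."""
    return [conv2d_valid(image, k) for k in first_filters]


def localize_parts(features, second_filters):
    """Second filter set: one heatmap per part; return its peak (row, col)."""
    rows, cols = len(features[0]), len(features[0][0])
    locations = []
    for weights in second_filters:
        heat = {
            (r, c): sum(w * fm[r][c] for w, fm in zip(weights, features))
            for r in range(rows) for c in range(cols)
        }
        locations.append(max(heat, key=heat.get))
    return locations


def classify(features, third_filters):
    """Third filter set: pooled features -> index of the best category."""
    pooled = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in features]
    scores = [sum(w * p for w, p in zip(ws, pooled)) for ws in third_filters]
    return scores.index(max(scores))


image = [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
first_filters = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]   # hypothetical kernels
second_filters = [[1, 0], [0, 1]]                      # one weight row per part
third_filters = [[1, 1], [-1, -1]]                     # one weight row per class

features = shared_features(image, first_filters)
parts = localize_parts(features, second_filters)
label = classify(features, third_filters)
```

Both heads read the same intermediate representation, so the trunk is computed once per image and the part locations and category come from the same features.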
-
Publication number: 20190073560
Abstract: Techniques are disclosed for identifying discriminative, fine-grained features of an object in an image. In one example, an input device receives an image. A machine learning system includes a model comprising a first set, a second set, and a third set of filters. The machine learning system applies the first set of filters to the received image to generate an intermediate representation of the received image. The machine learning system applies the second set of filters to the intermediate representation to generate part localization data identifying sub-parts of an object and one or more regions of the image in which the sub-parts are located. The machine learning system applies the third set of filters to the intermediate representation to generate classification data identifying a subordinate category to which the object belongs. The system uses the part localization and classification data to perform fine-grained classification of the object.
Type: Application
Filed: August 31, 2018
Publication date: March 7, 2019
Inventors: Bogdan Calin Mihai Matei, Xiyang Dai, John Benjamin Southall, Nhon Hoc Trinh, Harpreet Sawhney