Patents by Inventor Yi Ke Wu

Yi Ke Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11763544
    Abstract: In an approach to augmenting a caption dataset by leveraging a denoising autoencoder to sample and generate additional captions from the ground truth captions, one or more computer processors generate a plurality of new captions utilizing an autoencoder fed with one or more noisy captions, wherein the autoencoder is trained with a dataset comprising a plurality of ground truth captions. The one or more computer processors calculate an importance weight for each new caption in the plurality of generated new captions as compared to a plurality of associated ground truth captions based on a consensus metric. The one or more computer processors train a caption model with the generated plurality of new captions and associated calculated weights.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: September 19, 2023
    Assignee: International Business Machines Corporation
    Inventors: Shiwan Zhao, Hao Kai Zhang, Yi Ke Wu, Zhong Su
  • Patent number: 11651522
    Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.
    Type: Grant
    Filed: July 8, 2020
    Date of Patent: May 16, 2023
    Assignee: International Business Machines Corporation
    Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
  • Patent number: 11334769
    Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair comprises an image and an associated caption. The one or more computer processors extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated new datapoints. The one or more computer processors identify one or more objects contained within a subsequent image utilizing an image model trained with the extended dataset. The one or more computer processors generate a subsequent caption for the one or more identified objects contained within the subsequent image utilizing a language-generating model trained with the extended dataset.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: May 17, 2022
    Assignee: International Business Machines Corporation
    Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
  • Publication number: 20220012544
    Abstract: In an approach to augmenting caption datasets, one or more computer processors sample a ratio lambda from a probability distribution based on a pair of datapoints contained in a dataset, wherein each datapoint in the pair comprises an image and an associated caption. The one or more computer processors extend the dataset by generating one or more new datapoints based on the sampled ratio lambda for each pair of datapoints in the dataset, wherein the sampled ratio lambda incorporates an interpolation of features associated with the pair of datapoints into the generated new datapoints. The one or more computer processors identify one or more objects contained within a subsequent image utilizing an image model trained with the extended dataset. The one or more computer processors generate a subsequent caption for the one or more identified objects contained within the subsequent image utilizing a language-generating model trained with the extended dataset.
    Type: Application
    Filed: July 7, 2020
    Publication date: January 13, 2022
    Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
  • Publication number: 20220012919
    Abstract: In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.
    Type: Application
    Filed: July 8, 2020
    Publication date: January 13, 2022
    Inventors: Shiwan Zhao, Yi Ke Wu, Hao Kai Zhang, Zhong Su
  • Publication number: 20220012534
    Abstract: In an approach to augmenting a caption dataset by leveraging a denoising autoencoder to sample and generate additional captions from the ground truth captions, one or more computer processors generate a plurality of new captions utilizing an autoencoder fed with one or more noisy captions, wherein the autoencoder is trained with a dataset comprising a plurality of ground truth captions. The one or more computer processors calculate an importance weight for each new caption in the plurality of generated new captions as compared to a plurality of associated ground truth captions based on a consensus metric. The one or more computer processors train a caption model with the generated plurality of new captions and associated calculated weights.
    Type: Application
    Filed: July 7, 2020
    Publication date: January 13, 2022
    Inventors: Shiwan Zhao, Hao Kai Zhang, Yi Ke Wu, Zhong Su
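
The caption-augmentation pipeline of patent 11763544 (noisy captions fed to a denoising autoencoder, then consensus-based importance weights) can be illustrated with a toy sketch. This is a hypothetical illustration, not the patented implementation: the function names are invented, the trained autoencoder is elided (the noisy strings stand in for its reconstructions), and Jaccard word overlap is assumed as one possible consensus metric.

```python
import random

def add_noise(caption, drop_prob=0.2, rng=None):
    """Corrupt a caption by randomly dropping words (word-dropout noise)."""
    rng = rng or random.Random(0)
    words = caption.split()
    kept = [w for w in words if rng.random() > drop_prob]
    return " ".join(kept) if kept else caption

def consensus_weight(candidate, ground_truths):
    """Importance weight for a generated caption: mean Jaccard word overlap
    with the associated ground-truth captions (an assumed consensus metric)."""
    cand = set(candidate.lower().split())
    scores = []
    for gt in ground_truths:
        ref = set(gt.lower().split())
        scores.append(len(cand & ref) / len(cand | ref))
    return sum(scores) / len(scores)

# Toy ground-truth captions for one image.
ground_truths = ["a dog runs on the beach", "a brown dog running near the sea"]
rng = random.Random(42)

# In the patent, a trained denoising autoencoder would reconstruct the noisy
# captions into new candidates; this sketch skips the model and treats the
# noisy strings as the generated captions directly.
candidates = [add_noise(c, rng=rng) for c in ground_truths]
weighted = [(c, consensus_weight(c, ground_truths)) for c in candidates]
```

Each (caption, weight) pair would then weight that caption's loss term when training the downstream caption model.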
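
The cycle-consistency idea in patent 11651522 constrains attention weights across the triplet. One plausible reading, sketched below as an assumption rather than the patented formulation: the attention path composed through the high-resource caption (low-resource tokens over high-resource tokens, times high-resource tokens over image regions) should agree with the low-resource decoder's direct attention over image regions, penalized by mean squared error.

```python
def matmul(A, B):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def cycle_consistency_loss(attn_lr_over_hr, attn_hr_over_img, attn_lr_over_img):
    """Mean squared error between the composed attention path
    (low-resource tokens -> high-resource tokens -> image regions)
    and the direct low-resource -> image attention."""
    composed = matmul(attn_lr_over_hr, attn_hr_over_img)
    n = len(composed) * len(composed[0])
    return sum(
        (c - d) ** 2
        for crow, drow in zip(composed, attn_lr_over_img)
        for c, d in zip(crow, drow)
    ) / n
```

When the composed path reproduces the direct attention exactly, the loss is zero; during training the term would be added (with an adaptive weight) to the captioning loss.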
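
The lambda-interpolation augmentation of patent 11334769 resembles mixup-style data augmentation. A minimal sketch under that assumption follows; the Beta distribution, the `mixup_pair` name, and interpolating only numeric feature vectors (captions would be handled analogously, e.g. in embedding space) are all illustrative choices, not details taken from the patent.

```python
import random

def mixup_pair(feat_a, feat_b, alpha=0.2, rng=None):
    """Sample a ratio lambda from Beta(alpha, alpha) and linearly interpolate
    two feature vectors: mixed = lam * a + (1 - lam) * b."""
    rng = rng or random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1 - lam) * b for a, b in zip(feat_a, feat_b)]
    return mixed, lam

# Interpolate the image features of one pair of datapoints to extend the dataset.
mixed, lam = mixup_pair([0.0, 1.0, 2.0], [2.0, 1.0, 0.0], rng=random.Random(7))
```

Repeating this over many sampled pairs yields the extended dataset on which the image model and the language-generating model are then trained.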