Patents by Inventor Burak Uzkent

Burak Uzkent has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for supervised contrastive learning for multi-modal tasks

Patent number: 12183062

Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.

Type: Grant

Filed: January 31, 2022

Date of Patent: December 31, 2024

Assignee: Samsung Electronics Co., Ltd.

Inventors: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin

APPARATUS AND METHOD FOR SHARING AND PRUNING WEIGHTS FOR VISION AND LANGUAGE MODELS

Publication number: 20240119077

Abstract: A method of performing a multimodal task by using a multimodal model that includes a text encoder and a vision encoder may include obtaining a text feature from a query via the text encoder; obtaining an image feature from one or more input images via the vision encoder; and outputting a response to the query based on similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.

Type: Application

Filed: September 14, 2023

Publication date: April 11, 2024

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Shangqian GAO, Burak UZKENT, Yilin SHEN, Hongxia JIN

FUSION TECHNIQUES FOR COMBINING MOST SIGNIFICANT BITS AND LEAST SIGNIFICANT BITS OF IMAGE DATA IN IMAGE PROCESSING OR OTHER APPLICATIONS

Publication number: 20240080423

Abstract: A method includes obtaining raw image data, where the raw image data includes data values each having most significant bits and least significant bits. The method also includes providing the raw image data to a trained machine learning model and generating processed image data using the trained machine learning model. The method further includes presenting an image based on the processed image data. The trained machine learning model is trained to modulate a feature map associated with the most significant bits of the data values of the raw image data based on the least significant bits of the data values of the raw image data in order to generate a fusion of the most significant bits and the least significant bits of the data values of the raw image data.

Type: Application

Filed: November 18, 2022

Publication date: March 7, 2024

Inventors: Wenbo Li, Zhipeng Mo, Yi Wei, Burak Uzkent, Qian Lou, Yilin Shen, Hongxia Jin
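
The abstract of the fusion-techniques application above refers to splitting each raw data value into most significant bits (MSBs) and least significant bits (LSBs). The following is a minimal Python sketch of that split only. The 12-bit sample width, the 4-bit LSB width, and the names `split_bits`, `merge_bits`, and `modulate` are assumptions made for illustration; the `modulate` function is a hypothetical fixed scaling that stands in for the trained model's learned modulation, which the abstract does not specify.

```python
def split_bits(value, total_bits=12, lsb_bits=4):
    """Split an unsigned raw sample into its (msb, lsb) parts."""
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    msb = value >> lsb_bits                 # upper total_bits - lsb_bits bits
    lsb = value & ((1 << lsb_bits) - 1)     # lower lsb_bits bits
    return msb, lsb


def merge_bits(msb, lsb, lsb_bits=4):
    """Inverse of split_bits: rebuild the original sample."""
    return (msb << lsb_bits) | lsb


def modulate(msb_feature, lsb, lsb_bits=4, gain=0.1):
    """Hypothetical stand-in for learned modulation (an assumption):
    scale an MSB-derived feature by a factor that depends on the LSBs."""
    lsb_norm = lsb / ((1 << lsb_bits) - 1)  # normalize LSBs to [0, 1]
    return msb_feature * (1.0 + gain * lsb_norm)


raw = [0, 37, 2048, 4095]                   # assumed 12-bit raw samples
parts = [split_bits(v) for v in raw]        # [(0, 0), (2, 5), (128, 0), (255, 15)]
restored = [merge_bits(m, l) for m, l in parts]
fused = [modulate(float(m), l) for m, l in parts]
```

Splitting and merging are lossless, so `restored` equals `raw`; only the `modulate` step is speculative.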

METHOD AND SYSTEM FOR LEARNING TO SHARE WEIGHTS ACROSS TRANSFORMER BACKBONES IN VISION AND LANGUAGE TASKS

Publication number: 20230289590

Abstract: A method of training a model includes configuring a first transformer for visual learning with a first set of weights, configuring a second transformer for textual learning with a second set of weights, adjusting at least the second set of weights based on minimizing a weight difference between the first set of weights and the second set of weights, replacing the first set of weights for the first transformer with the adjusted second set of weights, and updating the first transformer based on the adjusted second set of weights.

Type: Application

Filed: September 8, 2022

Publication date: September 14, 2023

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Burak UZKENT, Vasili Ramanishka, Yilin Shen, Hongxia Jin
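
The weight-sharing abstract above describes adjusting the second set of weights to reduce a weight difference and then reusing the adjusted weights in the first transformer. A toy Python sketch of just that difference term follows. It assumes a squared L2 distance and plain gradient descent, neither of which is stated in the abstract, and it omits the task losses that real training would combine with this term. The names `weight_difference` and `adjust_second` are hypothetical.

```python
def weight_difference(w1, w2):
    """Squared L2 distance between two flat weight lists (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(w1, w2))


def adjust_second(w1, w2, lr=0.25, steps=10):
    """Gradient descent on the weight difference with respect to w2 only."""
    w2 = list(w2)
    for _ in range(steps):
        # d/dw2 of sum((w2 - w1)^2) is 2 * (w2 - w1)
        w2 = [b - lr * 2.0 * (b - a) for a, b in zip(w1, w2)]
    return w2


vision_weights = [0.5, -1.0, 2.0]   # first transformer (visual), toy values
text_weights = [0.0, 0.0, 0.0]      # second transformer (textual), toy values

before = weight_difference(vision_weights, text_weights)
adjusted_text = adjust_second(vision_weights, text_weights)
after = weight_difference(vision_weights, adjusted_text)

# Replace the first set of weights with the adjusted second set.
shared_vision_weights = list(adjusted_text)
```

With `lr=0.25` each step halves the gap between the two weight sets, so `after` is far smaller than `before`.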

SYSTEM AND METHOD FOR SUPERVISED CONTRASTIVE LEARNING FOR MULTI-MODAL TASKS

Publication number: 20230245435

Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.

Type: Application

Filed: January 31, 2022

Publication date: August 3, 2023

Inventors: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin

SMALL AND FAST TRANSFORMER MODEL FOR MULTI-MODAL OR OTHER TASKS

Publication number: 20230177338

Abstract: A method includes obtaining, using a first electronic device, a weight matrix associated with a trained transformer model. The method also includes factorizing the weight matrix into a dictionary weight matrix and an intermediate matrix. The method further includes pruning the intermediate matrix to generate a sparse intermediate matrix. The method also includes fine-tuning the sparse intermediate matrix based on a training dataset to generate a fine-tuned sparse intermediate matrix. The method further includes determining an index matrix and a coefficient matrix based on the fine-tuned sparse intermediate matrix. In addition, the method includes deploying the dictionary weight matrix, the index matrix, and the coefficient matrix to a second electronic device without deploying the weight matrix to the second electronic device. A number of parameters in the dictionary weight matrix, the index matrix, and the coefficient matrix is smaller than a number of parameters in the weight matrix.

Type: Application

Filed: December 1, 2022

Publication date: June 8, 2023

Inventors: Qian Lou, Yen-Chang Hsu, Burak Uzkent, Ting Hua, Yilin Shen, Hongxia Jin
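
The small-and-fast-transformer abstract above ends with three deployed matrices: a dictionary, an index matrix, and a coefficient matrix. The Python sketch below illustrates one way such a representation could work, under stated assumptions: the factorization and fine-tuning steps are skipped (a dictionary `D` and an intermediate matrix `C` are simply given), pruning keeps the `k` largest-magnitude entries per column, and all function names are hypothetical. The abstract does not say which pruning rule or storage layout is used.

```python
def prune_columns(C, k):
    """Keep the k largest-magnitude entries in each column of C; zero the rest."""
    rows, cols = len(C), len(C[0])
    sparse = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        keep = sorted(range(rows), key=lambda i: abs(C[i][j]), reverse=True)[:k]
        for i in keep:
            sparse[i][j] = C[i][j]
    return sparse


def to_index_and_coeff(sparse, k):
    """Store each column's nonzeros as (row index, value) in two k x cols matrices."""
    rows, cols = len(sparse), len(sparse[0])
    index = [[0] * cols for _ in range(k)]
    coeff = [[0.0] * cols for _ in range(k)]
    for j in range(cols):
        nonzero_rows = [i for i in range(rows) if sparse[i][j] != 0.0][:k]
        for t, i in enumerate(nonzero_rows):
            index[t][j] = i
            coeff[t][j] = sparse[i][j]
    return index, coeff


def reconstruct(D, index, coeff):
    """Rebuild the approximate weight matrix from the three deployed matrices."""
    m, k, n = len(D), len(index), len(index[0])
    return [[sum(coeff[t][j] * D[i][index[t][j]] for t in range(k))
             for j in range(n)] for i in range(m)]


D = [[1, 0], [0, 1], [1, 1], [2, 0]]              # 4 x 2 dictionary (toy values)
C = [[0.9, 0.1, 0.0, 2.0, -0.2, 1.0],
     [0.1, -1.5, 0.5, 0.3, 0.7, -0.1]]            # 2 x 6 intermediate matrix
k = 1
index, coeff = to_index_and_coeff(prune_columns(C, k), k)
W_hat = reconstruct(D, index, coeff)              # 4 x 6 approximation

dense_params = len(D) * len(C[0])                 # 4 * 6 = 24
compressed_params = len(D) * len(D[0]) + 2 * k * len(C[0])  # 8 + 6 + 6 = 20
```

Even at this toy size the three deployed matrices hold fewer values (20) than the dense 4 x 6 matrix (24), which matches the parameter-count claim in the abstract.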

SUPERVISED CONTRASTIVE LEARNING FOR VISUAL GROUNDING

Publication number: 20230075862

Abstract: A method of training a neural network model includes generating a positive image based on an original image, generating a positive text corresponding to the positive image based on an original text corresponding to the original image, the positive text referring to an object in the positive image, constructing a positive image-text pair for the object based on the positive image and the positive text, constructing a negative image-text pair for the object based on the original image and a negative text, the negative text not referring to the object, training the neural network model based on the positive image-text pair and the negative image-text pair to output features representing an input image-text pair, and identifying the object in the original image based on the features representing the input image-text pair.

Type: Application

Filed: August 30, 2022

Publication date: March 9, 2023

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Burak UZKENT, Vasili Ramanishka, Yilin Shen, Hongxia Jin

Structured Pruning of Vision Transformer

Publication number: 20230073835

Abstract: In one embodiment, a method includes accessing a batch B of a plurality of images, wherein each image in the batch is part of a training set of images used to train a vision transformer comprising a plurality of attention heads. The method further includes determining, for each attention head A, a similarity between (1) the output of the attention head evaluated using each image in the batch and (2) the output of each attention head evaluated using each image in the batch. The method further includes determining, based on the determined similarities, an importance score for each attention head; and pruning, based on the importance scores, one or more attention heads from the vision transformer.

Type: Application

Filed: August 31, 2022

Publication date: March 9, 2023

Inventors: Miao Yin, Burak Uzkent, Yilin Shen, Hongxia Jin
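
The structured-pruning abstract above describes three steps: compare attention-head outputs over a batch, turn the similarities into importance scores, and prune heads by score. The Python sketch below walks through those steps on toy data. The abstract does not define the similarity measure or the scoring rule, so batch-averaged cosine similarity and "one minus mean similarity to the other heads" are assumptions chosen for illustration, as are all function names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def head_similarity(outputs):
    """outputs[h][b] is head h's output vector for image b.
    Returns an H x H matrix of cosine similarity averaged over the batch."""
    num_heads, batch = len(outputs), len(outputs[0])
    return [[sum(cosine(outputs[i][b], outputs[j][b]) for b in range(batch)) / batch
             for j in range(num_heads)] for i in range(num_heads)]


def importance_scores(sim):
    """Assumed rule: a head that resembles the others is less important."""
    num_heads = len(sim)
    return [1.0 - sum(sim[i][j] for j in range(num_heads) if j != i) / (num_heads - 1)
            for i in range(num_heads)]


def heads_to_prune(scores, k):
    """Indices of the k heads with the lowest importance scores."""
    return sorted(sorted(range(len(scores)), key=lambda h: scores[h])[:k])


# Three heads, two images, 2-dimensional outputs (toy values).
outputs = [
    [[1.0, 0.0], [0.0, 1.0]],   # head 0
    [[1.0, 1.0], [1.0, 1.0]],   # head 1: overlaps with both other heads
    [[0.0, 1.0], [1.0, 0.0]],   # head 2: orthogonal to head 0
]
scores = importance_scores(head_similarity(outputs))
pruned = heads_to_prune(scores, k=1)   # [1]
```

Head 1 is partly redundant with both other heads, so it receives the lowest score and is the one selected for pruning.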

METHOD AND APPARATUS FOR CLASSIFYING IMAGES USING AN ARTIFICIAL INTELLIGENCE MODEL

Publication number: 20220309774

Abstract: An apparatus for performing image processing may include at least one processor configured to: input an image to a vision transformer comprising a plurality of encoders that correspond to at least one fixed encoder and a plurality of adaptive encoders; process the image via the at least one fixed encoder to obtain image representations; determine one or more layers of the plurality of adaptive encoders to drop, by inputting the image representations to a policy network configured to determine layer dropout actions for the plurality of adaptive encoders; and obtain a class of the input image using remaining layers of the plurality of adaptive encoders other than the dropped one or more layers.

Type: Application

Filed: March 22, 2022

Publication date: September 29, 2022

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Burak Uzkent, Vasili Ramanishka, Yilin Shen, Hongxia Jin