Patents by Inventor Raviteja Vemulapalli

Raviteja Vemulapalli has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230359865
    Abstract: The present disclosure provides systems, methods, and computer program products for modeling dependencies throughout a network using a global self-attention model with a content attention layer and a positional attention layer that operate in parallel. The model receives input data comprising content values and context positions. The content attention layer generates one or more output features for each context position based on a global attention operation applied to the content values independent of the context positions. The positional attention layer generates an attention map for each of the context positions based on one or more content values of the respective context position and associated neighboring positions. Output is determined based on the output features generated by the content attention layer and the attention map generated for each context position by the positional attention layer. The model improves efficiency and can be used throughout a deep network.
    Type: Application
    Filed: September 16, 2020
    Publication date: November 9, 2023
    Inventors: Zhuoran Shen, Raviteja Vemulapalli, Irwan Bello, Xuhui Jia, Ching-Hui Chen
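The parallel content/positional split in this abstract can be illustrated in a few lines. Below is a minimal PyTorch sketch, assuming a factorized softmax(K)^T V form for the global content branch and a learned window-by-window neighborhood for the positional branch; all module and parameter names here are illustrative, not taken from the filing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalSelfAttention(nn.Module):
    """Content attention (global, independent of context positions) in
    parallel with positional attention (a per-position map over a local
    neighborhood), with the two branch outputs summed."""
    def __init__(self, dim, window=7):
        super().__init__()
        self.to_qkv = nn.Conv2d(dim, dim * 3, 1, bias=False)
        # Learned logits for each relative offset in a window x window neighborhood.
        self.rel_pos = nn.Parameter(torch.zeros(window * window))
        self.window = window

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).flatten(2).chunk(3, dim=1)  # each (B, C, N)
        # Content branch: global attention over content values, computed in
        # the O(N)-memory factorized order softmax(K)^T V first, then Q.
        context = torch.einsum('bcn,bdn->bcd', k.softmax(dim=-1), v)
        content = torch.einsum('bcd,bcn->bdn', context, q.softmax(dim=1))
        # Positional branch: an attention map per position, built from the
        # content values of that position's neighborhood.
        pad = self.window // 2
        nbrs = F.unfold(v.reshape(b, c, h, w), self.window, padding=pad)
        nbrs = nbrs.reshape(b, c, self.window ** 2, h * w)   # (B, C, K, N)
        attn = torch.einsum('bcn,bckn->bkn', q, nbrs) + self.rel_pos[None, :, None]
        attn = attn.softmax(dim=1)
        positional = torch.einsum('bkn,bckn->bcn', attn, nbrs)
        return (content + positional).reshape(b, c, h, w)
```

Because neither branch materializes an N-by-N attention matrix, the layer stays cheap enough to use throughout a deep network, which is the efficiency point the abstract makes.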
  • Publication number: 20230281979
    Abstract: Systems and methods of the present disclosure are directed to a method for training a machine-learned visual attention model. The method can include obtaining image data that depicts a head of a person and an additional entity. The method can include processing the image data with an encoder portion of the visual attention model to obtain latent head and entity encodings. The method can include processing the latent encodings with the visual attention model to obtain a visual attention value and processing the latent encodings with a machine-learned visual location model to obtain a visual location estimation. The method can include training the models with a loss function that evaluates the difference between the visual location estimation and a pseudo visual location label derived from the image data, and the difference between the visual attention value and a ground truth visual attention label.
    Type: Application
    Filed: August 3, 2020
    Publication date: September 7, 2023
    Inventors: Xuhui Jia, Raviteja Vemulapalli, Bradley Ray Green, Bardia Doosti, Ching-Hui Chen
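A minimal sketch of the training setup described above, assuming toy stand-in encoders and heads (all names and architectures here are illustrative; the filing does not specify them):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(feat_dim=64):
    # Tiny stand-in encoder; the filing's architecture is not specified here.
    return nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(32, feat_dim))

class VisualAttentionModel(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.head_enc = make_encoder(feat_dim)      # encodes the person's head
        self.entity_enc = make_encoder(feat_dim)    # encodes the additional entity
        self.attn_head = nn.Linear(2 * feat_dim, 1)
        self.loc_head = nn.Linear(2 * feat_dim, 2)  # visual location estimate (x, y)

    def forward(self, head_img, entity_img):
        z = torch.cat([self.head_enc(head_img), self.entity_enc(entity_img)], dim=-1)
        return torch.sigmoid(self.attn_head(z)).squeeze(-1), self.loc_head(z)

def training_step(model, opt, head_img, entity_img, pseudo_loc, attn_label):
    attn, loc = model(head_img, entity_img)
    # Joint loss: location estimate vs. pseudo label derived from the image
    # data, plus attention value vs. ground truth attention label.
    loss = F.mse_loss(loc, pseudo_loc) + F.binary_cross_entropy(attn, attn_label)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```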
  • Publication number: 20230214656
    Abstract: At training time, a base neural network can be trained to perform each of a plurality of basis subtasks included in a total set of basis subtasks (e.g., individually or in some combination). Next, a description of a desired combined subtask can be obtained. Based on the description of the combined subtask, a mask generator can produce a pruning mask that is used to prune the base neural network into a smaller combined-subtask-specific network that performs only the two or more basis subtasks included in the combined subtask.
    Type: Application
    Filed: June 10, 2020
    Publication date: July 6, 2023
    Inventors: Raviteja Vemulapalli, Jianrui Cai, Bradley Ray Green, Ching-Hui Chen, Lior Shapira
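A sketch of the mask-generator idea, assuming a multi-hot subtask description and a straight-through estimator for binarizing the channel mask (both are my assumptions, not details from the filing):

```python
import torch
import torch.nn as nn

class MaskedBaseNet(nn.Module):
    """Base network whose channels can be pruned by a task-conditioned
    mask generator; all names and sizes are illustrative."""
    def __init__(self, num_subtasks, hidden=64, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, hidden, 3, padding=1)
        self.head = nn.Linear(hidden, num_classes)
        # Mask generator: multi-hot subtask description -> per-channel keep logits.
        self.mask_gen = nn.Sequential(nn.Linear(num_subtasks, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))

    def forward(self, x, task_desc):
        logits = self.mask_gen(task_desc)             # (B, hidden)
        # Straight-through binarization: hard 0/1 mask in the forward pass,
        # sigmoid gradients in the backward pass.
        soft = logits.sigmoid()
        mask = (logits > 0).float() + soft - soft.detach()
        feat = torch.relu(self.conv(x)) * mask[:, :, None, None]
        return self.head(feat.mean(dim=(2, 3)))

# A combined subtask is described by switching on two or more basis subtasks:
desc = torch.zeros(1, 8); desc[0, [2, 5]] = 1.0       # basis subtasks 2 and 5
out = MaskedBaseNet(num_subtasks=8)(torch.randn(1, 3, 32, 32), desc)
```

After training, channels whose mask entries are zero for a given combined subtask can be physically removed, yielding the smaller combined-subtask-specific network.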
  • Patent number: 11163987
    Abstract: The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel and unique triplet training scheme which does not rely upon designation of a particular image as an anchor or reference image.
    Type: Grant
    Filed: January 15, 2020
    Date of Patent: November 2, 2021
    Assignee: Google LLC
    Inventors: Raviteja Vemulapalli, Aseem Agarwala
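One way to read the anchor-free triplet scheme is that the label only identifies the most-similar pair in a triplet, so both orderings of the remaining image are penalized symmetrically. A hedged sketch under that reading (my interpretation of the abstract, not the claimed formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def anchor_free_triplet_loss(emb1, emb2, emb3, margin=0.2):
    """Triplet loss with no designated anchor: the label only says images
    1 and 2 are the most similar pair, so d(1,2) must beat both d(1,3)
    and d(2,3) by a margin."""
    d12 = (emb1 - emb2).pow(2).sum(-1)
    d13 = (emb1 - emb3).pow(2).sum(-1)
    d23 = (emb2 - emb3).pow(2).sum(-1)
    return (F.relu(d12 - d13 + margin) + F.relu(d12 - d23 + margin)).mean()

class ExpressionEmbedder(nn.Module):
    """Toy CNN mapping face crops to L2-normalized expression embeddings."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)
```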
  • Patent number: 10783611
    Abstract: The present disclosure provides systems and methods to increase resolution of imagery. In one example embodiment, a computer-implemented method includes obtaining a current low-resolution image frame. The method includes obtaining a previous estimated high-resolution image frame, the previous estimated high-resolution frame being a high-resolution estimate of a previous low-resolution image frame. The method includes warping the previous estimated high-resolution image frame based on the current low-resolution image frame. The method includes inputting the warped previous estimated high-resolution image frame and the current low-resolution image frame into a machine-learned frame estimation model. The method includes receiving a current estimated high-resolution image frame as an output of the machine-learned frame estimation model, the current estimated high-resolution image frame being a high-resolution estimate of the current low-resolution image frame.
    Type: Grant
    Filed: January 2, 2018
    Date of Patent: September 22, 2020
    Assignee: Google LLC
    Inventors: Raviteja Vemulapalli, Matthew Brown, Seyed Mohammad Mehdi Sajjadi
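A minimal sketch of the recurrent warp-then-estimate loop, assuming hypothetical flow_net and sr_net models and bilinear warping (the abstract does not fix these choices):

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B, C, H, W) with a dense flow field (B, 2, H, W)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys)).float().to(img)          # (2, H, W), (x, y) order
    coords = base[None] + flow
    cx = 2 * coords[:, 0] / (w - 1) - 1                   # normalize to [-1, 1]
    cy = 2 * coords[:, 1] / (h - 1) - 1
    grid = torch.stack((cx, cy), dim=-1)                  # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def super_resolve(frames_lr, flow_net, sr_net, scale=4):
    """frames_lr: (B, T, C, H, W). flow_net and sr_net are hypothetical
    learned models: flow_net maps two LR frames to an LR flow field, and
    sr_net fuses the current LR frame with the warped previous HR estimate."""
    prev_hr = F.interpolate(frames_lr[:, 0], scale_factor=scale, mode='bilinear',
                            align_corners=False)
    outputs = [prev_hr]
    for t in range(1, frames_lr.shape[1]):
        lr_flow = flow_net(frames_lr[:, t - 1], frames_lr[:, t])  # (B, 2, H, W)
        hr_flow = scale * F.interpolate(lr_flow, scale_factor=scale,
                                        mode='bilinear', align_corners=False)
        warped = warp(prev_hr, hr_flow)             # warp previous HR estimate
        prev_hr = sr_net(frames_lr[:, t], warped)   # current HR estimate
        outputs.append(prev_hr)
    return torch.stack(outputs, dim=1)
```

Feeding the warped previous estimate back in lets detail recovered in earlier frames propagate forward instead of being re-estimated from each low-resolution frame alone.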
  • Publication number: 20200151438
    Abstract: The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel and unique triplet training scheme which does not rely upon designation of a particular image as an anchor or reference image.
    Type: Application
    Filed: January 15, 2020
    Publication date: May 14, 2020
    Inventors: Raviteja Vemulapalli, Aseem Agarwala
  • Patent number: 10565434
    Abstract: The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel and unique triplet training scheme which does not rely upon designation of a particular image as an anchor or reference image.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: February 18, 2020
    Assignee: Google LLC
    Inventors: Raviteja Vemulapalli, Aseem Agarwala
  • Publication number: 20190206026
    Abstract: The present disclosure provides systems and methods to increase resolution of imagery. In one example embodiment, a computer-implemented method includes obtaining a current low-resolution image frame. The method includes obtaining a previous estimated high-resolution image frame, the previous estimated high-resolution frame being a high-resolution estimate of a previous low-resolution image frame. The method includes warping the previous estimated high-resolution image frame based on the current low-resolution image frame. The method includes inputting the warped previous estimated high-resolution image frame and the current low-resolution image frame into a machine-learned frame estimation model. The method includes receiving a current estimated high-resolution image frame as an output of the machine-learned frame estimation model, the current estimated high-resolution image frame being a high-resolution estimate of the current low-resolution image frame.
    Type: Application
    Filed: January 2, 2018
    Publication date: July 4, 2019
    Inventors: Raviteja Vemulapalli, Matthew Brown, Seyed Mohammad Mehdi Sajjadi
  • Publication number: 20190005313
    Abstract: The present disclosure provides systems and methods that include or otherwise leverage use of a facial expression model that is configured to provide a facial expression embedding. In particular, the facial expression model can receive an input image that depicts a face and, in response, provide a facial expression embedding that encodes information descriptive of a facial expression made by the face depicted in the input image. As an example, the facial expression model can be or include a neural network such as a convolutional neural network. The present disclosure also provides a novel and unique triplet training scheme which does not rely upon designation of a particular image as an anchor or reference image.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 3, 2019
    Inventors: Raviteja Vemulapalli, Aseem Agarwala
  • Patent number: 9704257
    Abstract: A computer-implemented method for semantic segmentation of an image determines unary energy of each pixel in an image using a first subnetwork, determines pairwise energy of at least some pairs of pixels of the image using a second subnetwork, and determines, using a third subnetwork, an inference on a Gaussian random field (GRF) minimizing an energy function including a combination of the unary energy and the pairwise energy. The GRF inference defines probabilities of semantic labels for each pixel in the image, and the method converts the image into a semantically segmented image by assigning to a pixel in the semantically segmented image a semantic label having the highest probability for a corresponding pixel in the image among the probabilities determined by the third subnetwork. The first subnetwork, the second subnetwork, and the third subnetwork are parts of a neural network.
    Type: Grant
    Filed: March 25, 2016
    Date of Patent: July 11, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Oncel Tuzel, Raviteja Vemulapalli, Ming-Yu Liu
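A simplified sketch of the three-subnetwork layout, assuming local 3x3 pairwise connections and unrolled Jacobi iterations as the GRF inference (a stand-in for the patented inference network, not a reproduction of it):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRFSegmenter(nn.Module):
    def __init__(self, num_classes, iters=5):
        super().__init__()
        self.unary = nn.Conv2d(3, num_classes, 3, padding=1)  # first subnetwork
        self.pairwise = nn.Conv2d(3, 9, 3, padding=1)         # second: 3x3 neighbor weights
        self.iters = iters                                    # third: unrolled GRF inference

    def forward(self, img):
        u = self.unary(img)                        # (B, K, H, W) unary energies
        pw = F.softplus(self.pairwise(img))        # nonnegative pairwise weights (B, 9, H, W)
        b, k, h, w = u.shape
        y = u
        for _ in range(self.iters):
            # Jacobi step toward y_i = (u_i + sum_j w_ij y_j) / (1 + sum_j w_ij),
            # the minimizer of the quadratic (Gaussian) energy.
            nbrs = F.unfold(y, 3, padding=1).reshape(b, k, 9, h * w)
            agg = (pw.reshape(b, 1, 9, h * w) * nbrs).sum(2).reshape(b, k, h, w)
            y = (u + agg) / (1 + pw.sum(1, keepdim=True))
        probs = y.softmax(dim=1)       # per-pixel label probabilities
        return probs.argmax(dim=1)     # assign each pixel its most probable label
```

Because the energy is quadratic, the fixed-point iteration converges to the exact GRF minimizer, which is what makes unrolling it into a differentiable subnetwork attractive.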
  • Patent number: 9633274
    Abstract: A sensor acquires an input image X of a scene. The image includes noise with a variance σ². A deep Gaussian conditional random field (GCRF) network is applied to the input image to produce an output image Y, where the output image is denoised, and wherein the deep GCRF includes a prior generation network (PgNet) followed by an inference network (InfNet), wherein the PgNet produces patch covariance priors Σij for patches centered on every pixel (i,j) in the input image, and wherein the InfNet is applied to the patch covariance priors and the input image to solve the GCRF.
    Type: Grant
    Filed: September 15, 2015
    Date of Patent: April 25, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Oncel Tuzel, Ming-Yu Liu, Raviteja Vemulapalli
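A heavily simplified sketch of the PgNet-then-InfNet pipeline, under the assumption that the predicted patch covariances are diagonal, so the GCRF inference collapses to per-coefficient Wiener shrinkage of mean-subtracted patches (the actual network solves the full GCRF; names and sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepGCRFDenoiser(nn.Module):
    """With a diagonal patch covariance of variances v, the GCRF posterior
    mean per coefficient is the Wiener gain v / (v + sigma^2) applied to the
    mean-subtracted patch; overlapping patch estimates are then averaged."""
    def __init__(self, patch=5):
        super().__init__()
        self.patch = patch
        # PgNet stand-in: predicts log-variances for the patch coefficients.
        self.pgnet = nn.Conv2d(1, patch * patch, 3, padding=1)

    def forward(self, x, sigma):                        # x: (B, 1, H, W)
        b, _, h, w = x.shape
        pad = self.patch // 2
        var_p = self.pgnet(x).exp().flatten(2)          # (B, p*p, H*W) prior variances
        patches = F.unfold(x, self.patch, padding=pad)  # (B, p*p, H*W)
        mean = patches.mean(dim=1, keepdim=True)
        shrink = var_p / (var_p + sigma ** 2)           # Wiener gain per coefficient
        denoised = mean + shrink * (patches - mean)     # InfNet stand-in
        # Fold overlapping patch estimates back to the image grid and average.
        y = F.fold(denoised, (h, w), self.patch, padding=pad)
        n = F.fold(torch.ones_like(denoised), (h, w), self.patch, padding=pad)
        return y / n
```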
  • Publication number: 20170076170
    Abstract: A sensor acquires an input image X of a scene. The image includes noise with a variance σ². A deep Gaussian conditional random field (GCRF) network is applied to the input image to produce an output image Y, where the output image is denoised, and wherein the deep GCRF includes a prior generation network (PgNet) followed by an inference network (InfNet), wherein the PgNet produces patch covariance priors Σij for patches centered on every pixel (i,j) in the input image, and wherein the InfNet is applied to the patch covariance priors and the input image to solve the GCRF.
    Type: Application
    Filed: September 15, 2015
    Publication date: March 16, 2017
    Inventors: Oncel Tuzel, Ming-Yu Liu, Raviteja Vemulapalli
  • Patent number: 9582916
    Abstract: A method and apparatus for unsupervised cross-modal medical image synthesis is disclosed, which synthesizes a target modality medical image based on a source modality medical image without the need for paired source and target modality training data. A source modality medical image is received. Multiple candidate target modality intensity values are generated for each of a plurality of voxels of a target modality medical image based on corresponding voxels in the source modality medical image. A synthesized target modality medical image is generated by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels. The synthesized target modality medical image can be refined using coupled sparse representation.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: February 28, 2017
    Assignee: SIEMENS HEALTHCARE GMBH
    Inventors: Raviteja Vemulapalli, Hien Nguyen, Shaohua Kevin Zhou
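A one-dimensional toy sketch of the two stages: per-voxel candidate generation from a paired intensity database, then a joint selection that favors spatial smoothness. The database names are hypothetical and the iterative selection rule is a simple stand-in for the patent's joint optimization over all voxels.

```python
import numpy as np

def synthesize_target(source, db_source, db_target, k=5, iters=3):
    """source: (n,) source-modality intensities; db_source/db_target: (m,)
    paired training intensities in the two modalities (hypothetical)."""
    n = len(source)
    # Stage 1: for each source voxel, take the k nearest database source
    # intensities and keep their paired target intensities as candidates.
    idx = np.argsort(np.abs(db_source[None, :] - source[:, None]), axis=1)[:, :k]
    candidates = db_target[idx]                        # (n, k) candidate values
    # Stage 2: jointly select one candidate per voxel by iteratively
    # preferring the candidate closest to the local neighborhood estimate.
    est = candidates[:, 0].copy()
    for _ in range(iters):
        nbr = np.convolve(est, np.ones(3) / 3, mode='same')
        est = candidates[np.arange(n),
                         np.abs(candidates - nbr[:, None]).argmin(axis=1)]
    return est
```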
  • Publication number: 20160133037
    Abstract: A method and apparatus for unsupervised cross-modal medical image synthesis is disclosed, which synthesizes a target modality medical image based on a source modality medical image without the need for paired source and target modality training data. A source modality medical image is received. Multiple candidate target modality intensity values are generated for each of a plurality of voxels of a target modality medical image based on corresponding voxels in the source modality medical image. A synthesized target modality medical image is generated by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels. The synthesized target modality medical image can be refined using coupled sparse representation.
    Type: Application
    Filed: September 30, 2015
    Publication date: May 12, 2016
    Inventors: Raviteja Vemulapalli, Hien Nguyen, Shaohua Kevin Zhou