Patents by Inventor Rahul Sukthankar

Rahul Sukthankar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11908071
    Abstract: The present disclosure is generally directed to reconstructing representations of bodies from images. An example method of the present disclosure includes inputting, into a machine-learned reconstruction model, input data descriptive of an image depicting a body; predicting, using a machine-learned marker prediction component of the reconstruction model, a set of surface marker locations on the body; and outputting, using a machine-learned marker poser component of the reconstruction model, an output representation of the body that corresponds to the set of surface marker locations. In the example method, one or more parameters of the reconstruction model were learned at least in part based on a consistency loss corresponding to a distance between relaxed-constraint representations generated from a prior set of surface marker locations predicted according to the one or more parameters and parametric representations generated from the prior set using kinematic constraints associated with the body.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: February 20, 2024
    Assignee: GOOGLE LLC
    Inventors: Cristian Sminchisescu, Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William Tafel Freeman, Rahul Sukthankar
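The consistency loss described in the abstract above — a distance between a relaxed-constraint body representation and a kinematically constrained parametric one, both derived from the same predicted surface markers — can be sketched in a few lines. This is an illustrative toy, not the patented method; the function name and marker shapes are hypothetical.

```python
import numpy as np

def consistency_loss(relaxed_markers, parametric_markers):
    """Mean Euclidean distance between marker positions of the
    relaxed-constraint representation and those of the parametric
    (kinematically constrained) representation."""
    diff = relaxed_markers - parametric_markers
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

# Toy example: 5 surface markers in 3-D from each branch.
relaxed = np.zeros((5, 3))
parametric = np.ones((5, 3))
loss = consistency_loss(relaxed, parametric)  # sqrt(3) per marker
```

In the claimed training setup this scalar would be one term of the objective used to learn the reconstruction model's parameters.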
  • Patent number: 11836221
    Abstract: Systems and methods are directed to a method for estimation of an object state from image data. The method can include obtaining two-dimensional image data depicting an object. The method can include processing, with an estimation portion of a machine-learned object state estimation model, the two-dimensional image data to obtain an initial estimated state of the object. The method can include, for each of one or more refinement iterations, obtaining a previous loss value associated with a previous estimated state for the object, processing the previous loss value to obtain a current estimated state of the object, and evaluating a loss function to determine a loss value associated with the current estimated state of the object. The method can include providing a final estimated state for the object.
    Type: Grant
    Filed: March 12, 2021
    Date of Patent: December 5, 2023
    Assignee: GOOGLE LLC
    Inventors: Cristian Sminchisescu, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William Tafel Freeman, Rahul Sukthankar
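The refinement loop in the abstract above — each iteration consumes the loss of the previous state estimate to produce the next one — can be illustrated with a scalar toy. The update rule below is a made-up stand-in (moving toward a target in proportion to the previous loss), not the patented refinement network.

```python
def refine_state(initial_state, target, loss_fn, num_iters=3, step=0.5):
    """Iteratively refine a state estimate; each iteration processes the
    previous loss value to obtain the current estimated state."""
    state = initial_state
    prev_loss = loss_fn(state, target)
    for _ in range(num_iters):
        # Toy update driven by the previous loss value.
        direction = 1.0 if target > state else -1.0
        state = state + step * direction * prev_loss
        prev_loss = loss_fn(state, target)
    return state, prev_loss

loss_fn = lambda s, t: abs(t - s)
final, final_loss = refine_state(0.0, 1.0, loss_fn)  # 0.875, loss 0.125
```

Each pass halves the remaining error here; in the claimed method the "state" would be an object pose/shape estimate and the update would be learned.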
  • Patent number: 11763466
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: September 19, 2023
    Assignee: Google LLC
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
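The three-network layout in the abstract above (a shared encoder feeding a structure decoder and a motion decoder) can be sketched with plain linear maps standing in for the networks. Shapes and weight names are hypothetical; this only shows the data flow, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 4                                  # tiny images for illustration
W_enc = rng.normal(size=(8, 2 * H * W))    # encoder weights
W_struct = rng.normal(size=(H * W, 8))     # structure decoder weights
W_mot = rng.normal(size=(6, 8))            # motion decoder weights

def encoder(img_a, img_b):
    # Encode the image pair into a shared representation.
    x = np.concatenate([img_a.ravel(), img_b.ravel()])
    return np.tanh(W_enc @ x)

def structure_decoder(code):
    # Per-pixel structure output for the first image (e.g. depth-like).
    return W_struct @ code

def motion_decoder(code):
    # Global motion output between the two images (e.g. 6-DoF).
    return W_mot @ code

a, b = rng.normal(size=(H, W)), rng.normal(size=(H, W))
code = encoder(a, b)
structure, motion = structure_decoder(code), motion_decoder(code)
```

Both decoders read the same encoded representation, which is the structural point of the claim.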
  • Publication number: 20230169727
    Abstract: The present disclosure provides a statistical, articulated 3D human shape modeling pipeline within a fully trainable, modular, deep learning framework. In particular, aspects of the present disclosure are directed to a machine-learned 3D human shape model with at least facial and body shape components that are jointly trained end-to-end on a set of training data. Joint training of the model components (e.g., the facial, hand, and rest-of-body components) enables improved consistency of synthesis between the generated face and body shapes.
    Type: Application
    Filed: April 30, 2020
    Publication date: June 1, 2023
    Inventors: Cristian Sminchisescu, Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, William T. Freeman, Rahul Sukthankar
  • Publication number: 20230116884
    Abstract: The present disclosure is generally directed to reconstructing representations of bodies from images. An example method of the present disclosure includes inputting, into a machine-learned reconstruction model, input data descriptive of an image depicting a body; predicting, using a machine-learned marker prediction component of the reconstruction model, a set of surface marker locations on the body; and outputting, using a machine-learned marker poser component of the reconstruction model, an output representation of the body that corresponds to the set of surface marker locations. In the example method, one or more parameters of the reconstruction model were learned at least in part based on a consistency loss corresponding to a distance between relaxed-constraint representations generated from a prior set of surface marker locations predicted according to the one or more parameters and parametric representations generated from the prior set using kinematic constraints associated with the body.
    Type: Application
    Filed: October 7, 2021
    Publication date: April 13, 2023
    Inventors: Cristian Sminchisescu, Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William Tafel Freeman, Rahul Sukthankar
  • Publication number: 20220292314
    Abstract: Systems and methods are directed to a method for estimation of an object state from image data. The method can include obtaining two-dimensional image data depicting an object. The method can include processing, with an estimation portion of a machine-learned object state estimation model, the two-dimensional image data to obtain an initial estimated state of the object. The method can include, for each of one or more refinement iterations, obtaining a previous loss value associated with a previous estimated state for the object, processing the previous loss value to obtain a current estimated state of the object, and evaluating a loss function to determine a loss value associated with the current estimated state of the object. The method can include providing a final estimated state for the object.
    Type: Application
    Filed: March 12, 2021
    Publication date: September 15, 2022
    Inventors: Cristian Sminchisescu, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William Tafel Freeman, Rahul Sukthankar
  • Publication number: 20220237882
    Abstract: The present disclosure is directed to encoding images. In particular, one or more computing devices can receive data representing one or more machine learning (ML) models configured, at least in part, to encode images comprising objects of a particular type. The computing device(s) can receive data representing an image comprising one or more objects of the particular type. The computing device(s) can generate, based at least in part on the data representing the image and the data representing the ML model(s), data representing an encoded version of the image that alters at least a portion of the image comprising the object(s) such that when the encoded version of the image is decoded, the object(s) are unrecognizable as being of the particular type by one or more object-recognition ML models based at least in part upon which the ML model(s) configured to encode the images were trained.
    Type: Application
    Filed: May 28, 2019
    Publication date: July 28, 2022
    Inventors: Shumeet Baluja, Rahul Sukthankar
  • Patent number: 11163989
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization in images and videos. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform image processing and video processing operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
    Type: Grant
    Filed: August 6, 2019
    Date of Patent: November 2, 2021
    Assignee: Google LLC
    Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
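The context mechanism in the abstract above — pairing a person's feature representation with features of context positions to obtain relational features — can be sketched with a simple pooled pairing. The pairing-and-max-pool rule is an illustrative stand-in for the claimed context neural network; all names are hypothetical.

```python
import numpy as np

def relational_features(person_feat, context_feats):
    """Toy 'context network': concatenate the person feature with each
    context-position feature, then max-pool over positions to obtain
    relational features."""
    pairs = [np.concatenate([person_feat, c]) for c in context_feats]
    return np.max(np.stack(pairs), axis=0)

person = np.array([1.0, 0.0])
contexts = [np.array([0.2, 0.5]), np.array([0.9, 0.1])]
rel = relational_features(person, contexts)  # [1.0, 0.0, 0.9, 0.5]
```

A downstream classifier would then consume `rel` together with the person feature to predict the action, per the claim.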
  • Publication number: 20210166009
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
    Type: Application
    Filed: August 6, 2019
    Publication date: June 3, 2021
    Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
  • Patent number: 11010948
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for navigation using visual inputs. One of the systems includes a mapping subsystem configured to, at each time step of a plurality of time steps, generate a characterization of an environment from an image of the environment at the time step, wherein the characterization comprises an environment map identifying locations in the environment having a particular characteristic, and wherein generating the characterization comprises, for each time step: obtaining the image of the environment at the time step, processing the image to generate a first initial characterization for the time step, obtaining a final characterization for a previous time step, processing the characterization for the previous time step to generate a second initial characterization for the time step, and combining the first initial characterization and the second initial characterization to generate a final characterization for the time step.
    Type: Grant
    Filed: February 9, 2018
    Date of Patent: May 18, 2021
    Assignee: Google LLC
    Inventors: Rahul Sukthankar, Saurabh Gupta, James Christopher Davidson, Sergey Vladimir Levine, Jitendra Malik
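The per-time-step map update in the abstract above — combining an initial characterization from the current image with one carried over from the previous step — can be illustrated with a convex combination standing in for the learned fusion. This is a toy sketch under stated assumptions, not the patented mapping subsystem.

```python
import numpy as np

def update_map(image_obs, prev_map, mix=0.5):
    """Fuse the first initial characterization (from the current image)
    with the second (propagated from the previous time step).
    Toy rule: fixed-weight convex combination."""
    return mix * image_obs + (1.0 - mix) * prev_map

prev = np.zeros((2, 2))    # final characterization from step t-1
obs = np.ones((2, 2))      # characterization from the image at step t
final = update_map(obs, prev)  # 0.5 everywhere
```

Repeating this at every step accumulates evidence into the environment map used for navigation.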
  • Publication number: 20210118153
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
  • Patent number: 10878583
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Grant
    Filed: December 1, 2017
    Date of Patent: December 29, 2020
    Assignee: Google LLC
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
  • Publication number: 20200349722
    Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
    Type: Application
    Filed: December 1, 2017
    Publication date: November 5, 2020
    Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
  • Patent number: 10713818
    Abstract: Methods, and systems, including computer programs encoded on computer storage media for compressing data items with variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.
    Type: Grant
    Filed: January 28, 2019
    Date of Patent: July 14, 2020
    Assignee: Google LLC
    Inventors: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell
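The binarizing layer in the abstract above — which turns each stack output into a binary code, enabling a variable compression rate as more time steps emit more bits — can be sketched as follows. The squash-then-threshold rule is an inference-time simplification (training-time stochastic rounding is omitted), and the names are hypothetical.

```python
import numpy as np

def binarize(stack_output):
    """Toy binarizing layer: squash activations to (-1, 1) with tanh,
    then threshold to {-1, +1} codes."""
    return np.where(np.tanh(stack_output) >= 0.0, 1.0, -1.0)

# One time step's stack output becomes one chunk of the binary code;
# running more steps yields more bits, hence a variable rate.
codes = binarize(np.array([-2.0, -0.1, 0.0, 0.3, 5.0]))
```

A decoder would consume however many code chunks were produced, trading bit rate for reconstruction quality.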
  • Patent number: 10681388
    Abstract: Encoding and decoding occupancy information is disclosed. A method includes determining row sums for the region, determining column sums for the region, encoding, in a compressed bitstream, at least one of the row sums and the column sums, and encoding, in the compressed bitstream and based on a coding order, at least one of the rows and the columns of the region. The coding order is based on the encoded at least one of the row sums and the column sums. The row sums include, for each row of the region, a respective count of a number of locations in the row having a specified value. The column sums include, for each column of the region, a respective count of a number of locations in the column having the specified value. A location having the specified value is indicative of the occupancy information at the location.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: June 9, 2020
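The row-sum/column-sum bookkeeping in the abstract above is easy to make concrete. The sketch below computes the sums for a toy occupancy region and derives one plausible coding order (densest rows first); the ordering heuristic is an assumption for illustration, not the claimed coding order.

```python
import numpy as np

# Toy occupancy region: 1 marks a location having the specified value.
region = np.array([[1, 0, 1],
                   [0, 0, 0],
                   [1, 1, 0]])

row_sums = region.sum(axis=1)   # per-row count of the specified value
col_sums = region.sum(axis=0)   # per-column count of the specified value

# One possible coding order: visit rows by descending row sum, so the
# densest rows are coded first (stable sort keeps ties in index order).
row_order = np.argsort(-row_sums, kind="stable")
```

Encoding the sums first lets the decoder reconstruct the same order and, in many cases, skip rows or columns whose sum is zero.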
  • Patent number: 10635979
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a clustering of images into a plurality of semantic categories. In one aspect, a method comprises: training a categorization neural network, comprising, at each of a plurality of iterations: processing an image depicting an object using the categorization neural network to generate (i) a current prediction for whether the image depicts an object or a background region, and (ii) a current embedding of the image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; and determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the clustering loss depends on a similarity between the current embedding of the image and the current cluster centers.
    Type: Grant
    Filed: July 15, 2019
    Date of Patent: April 28, 2020
    Assignee: Google LLC
    Inventors: Steven Hickson, Anelia Angelova, Irfan Aziz Essa, Rahul Sukthankar
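The combined objective in the abstract above — a classification loss plus a clustering loss that depends on the similarity between an image's embedding and the current cluster centers — can be sketched with nearest-center distance standing in for the similarity term. The weighting and distance choice are illustrative assumptions.

```python
import numpy as np

def clustering_loss(embedding, centers):
    """Distance from an embedding to its nearest cluster center; each
    center represents one semantic category."""
    dists = np.linalg.norm(centers - embedding, axis=1)
    return float(dists.min())

def total_loss(class_loss, embedding, centers, weight=0.1):
    """Toy combined objective: classification loss + weighted clustering loss."""
    return class_loss + weight * clustering_loss(embedding, centers)

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
emb = np.array([3.0, 0.0])        # nearest center is [4, 0], distance 1
loss = total_loss(0.5, emb, centers)  # 0.5 + 0.1 * 1.0
```

Gradients of this combined objective would update both the network (via the embedding) and, implicitly, the cluster centers derived from its parameters.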
  • Publication number: 20200027002
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a clustering of images into a plurality of semantic categories. In one aspect, a method comprises: training a categorization neural network, comprising, at each of a plurality of iterations: processing an image depicting an object using the categorization neural network to generate (i) a current prediction for whether the image depicts an object or a background region, and (ii) a current embedding of the image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; and determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the clustering loss depends on a similarity between the current embedding of the image and the current cluster centers.
    Type: Application
    Filed: July 15, 2019
    Publication date: January 23, 2020
    Inventors: Steven Hickson, Anelia Angelova, Irfan Aziz Essa, Rahul Sukthankar
  • Publication number: 20190371025
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for navigation using visual inputs. One of the systems includes a mapping subsystem configured to, at each time step of a plurality of time steps, generate a characterization of an environment from an image of the environment at the time step, wherein the characterization comprises an environment map identifying locations in the environment having a particular characteristic, and wherein generating the characterization comprises, for each time step: obtaining the image of the environment at the time step, processing the image to generate a first initial characterization for the time step, obtaining a final characterization for a previous time step, processing the characterization for the previous time step to generate a second initial characterization for the time step, and combining the first initial characterization and the second initial characterization to generate a final characterization for the time step.
    Type: Application
    Filed: February 9, 2018
    Publication date: December 5, 2019
    Inventors: Rahul Sukthankar, Saurabh Gupta, James Christopher Davidson, Sergey Vladimir Levine, Jitendra Malik
  • Publication number: 20190238893
    Abstract: Encoding and decoding occupancy information is disclosed. A method includes determining row sums for the region, determining column sums for the region, encoding, in a compressed bitstream, at least one of the row sums and the column sums, and encoding, in the compressed bitstream and based on a coding order, at least one of the rows and the columns of the region. The coding order is based on the encoded at least one of the row sums and the column sums. The row sums include, for each row of the region, a respective count of a number of locations in the row having a specified value. The column sums include, for each column of the region, a respective count of a number of locations in the column having the specified value. A location having the specified value is indicative of the occupancy information at the location.
    Type: Application
    Filed: January 30, 2018
    Publication date: August 1, 2019
  • Patent number: 10192327
    Abstract: Methods, and systems, including computer programs encoded on computer storage media for compressing data items with variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.
    Type: Grant
    Filed: February 3, 2017
    Date of Patent: January 29, 2019
    Assignee: Google LLC
    Inventors: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell