Patents by Inventor Rahul Sukthankar
Rahul Sukthankar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11908071
Abstract: The present disclosure is generally directed to reconstructing representations of bodies from images. An example method of the present disclosure includes inputting, into a machine-learned reconstruction model, input data descriptive of an image depicting a body; predicting, using a machine-learned marker prediction component of the reconstruction model, a set of surface marker locations on the body; and outputting, using a machine-learned marker poser component of the reconstruction model, an output representation of the body that corresponds to the set of surface marker locations. In the example method, one or more parameters of the reconstruction model were learned at least in part based on a consistency loss corresponding to a distance between relaxed-constraint representations generated from a prior set of surface marker locations predicted according to the one or more parameters and parametric representations generated from the prior set using kinematic constraints associated with the body.
Type: Grant
Filed: October 7, 2021
Date of Patent: February 20, 2024
Assignee: GOOGLE LLC
Inventors: Cristian Sminchisescu, Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William Tafel Freeman, Rahul Sukthankar
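The consistency loss described in this abstract compares a relaxed-constraint reconstruction against a kinematically constrained one built from the same predicted markers. A minimal sketch of such a distance term, with hypothetical helper names and toy 3D point lists standing in for the two representations:

```python
import math

def consistency_loss(relaxed_points, parametric_points):
    """Mean Euclidean distance between corresponding 3D points of the
    relaxed-constraint reconstruction and the kinematically constrained
    parametric reconstruction."""
    assert len(relaxed_points) == len(parametric_points)
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(relaxed_points, parametric_points):
        total += math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    return total / len(relaxed_points)

# Two toy reconstructions that disagree on one point by one unit.
loss = consistency_loss([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 1)])
```

In training, a term like this would be added to the other losses and backpropagated through the marker predictor; here it simply measures agreement between the two reconstructions.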
-
Patent number: 11836221
Abstract: Systems and methods are directed to a method for estimation of an object state from image data. The method can include obtaining two-dimensional image data depicting an object. The method can include processing, with an estimation portion of a machine-learned object state estimation model, the two-dimensional image data to obtain an initial estimated state of the object. The method can include, for each of one or more refinement iterations, obtaining a previous loss value associated with a previous estimated state for the object, processing the previous loss value to obtain a current estimated state of the object, and evaluating a loss function to determine a loss value associated with the current estimated state of the object. The method can include providing a final estimated state for the object.
Type: Grant
Filed: March 12, 2021
Date of Patent: December 5, 2023
Assignee: GOOGLE LLC
Inventors: Cristian Sminchisescu, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William Tafel Freeman, Rahul Sukthankar
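The refinement loop in this abstract feeds the previous iteration's loss value back into the next state update. A toy sketch, assuming a scalar state and substituting a finite-difference gradient step for the learned update (all names hypothetical):

```python
def refine_state(initial_state, loss_fn, steps=5, lr=0.25, eps=1e-3):
    """Iteratively refine a scalar state estimate. Each iteration reuses the
    previous iteration's loss value; a finite-difference gradient stands in
    for the learned update described in the abstract."""
    state = initial_state
    prev_loss = loss_fn(state)             # loss of the initial estimate
    for _ in range(steps):
        # Use the previous loss value in the update (finite difference).
        grad = (loss_fn(state + eps) - prev_loss) / eps
        state -= lr * grad                 # current estimated state
        prev_loss = loss_fn(state)         # loss fed to the next iteration
    return state, prev_loss
```

With a simple quadratic loss the estimate moves toward the minimizer on every iteration, mirroring how each refinement pass reduces the loss of the current estimated state.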
-
Patent number: 11763466
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Grant
Filed: December 23, 2020
Date of Patent: September 19, 2023
Assignee: Google LLC
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
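The architecture described here is one shared encoder feeding two decoder heads. A toy sketch of that layout, with lists of per-pixel values standing in for images and simple arithmetic standing in for the three neural networks (all names hypothetical):

```python
class TwoFrameModel:
    """Shared encoder feeding a structure head and a motion head."""

    def encode(self, first_image, second_image):
        # One encoded representation of both frames: per-pixel value pairs.
        return list(zip(first_image, second_image))

    def decode_structure(self, encoded):
        # Structure output characterizing the scene in the first image.
        return [a for a, _ in encoded]

    def decode_motion(self, encoded):
        # Motion output: per-pixel change between the two frames.
        return [b - a for a, b in encoded]
```

The key design point is that both decoders consume the same encoded representation, so structure and motion estimates are derived from one joint view of the image pair.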
-
Publication number: 20230169727
Abstract: The present disclosure provides a statistical, articulated 3D human shape modeling pipeline within a fully trainable, modular, deep learning framework. In particular, aspects of the present disclosure are directed to a machine-learned 3D human shape model with at least facial and body shape components that are jointly trained end-to-end on a set of training data. Joint training of the model components (e.g., the facial, hand, and rest-of-body components) enables improved consistency of synthesis between the generated face and body shapes.
Type: Application
Filed: April 30, 2020
Publication date: June 1, 2023
Inventors: Cristian Sminchisescu, Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, William T. Freeman, Rahul Sukthankar
-
Publication number: 20230116884
Abstract: The present disclosure is generally directed to reconstructing representations of bodies from images. An example method of the present disclosure includes inputting, into a machine-learned reconstruction model, input data descriptive of an image depicting a body; predicting, using a machine-learned marker prediction component of the reconstruction model, a set of surface marker locations on the body; and outputting, using a machine-learned marker poser component of the reconstruction model, an output representation of the body that corresponds to the set of surface marker locations. In the example method, one or more parameters of the reconstruction model were learned at least in part based on a consistency loss corresponding to a distance between relaxed-constraint representations generated from a prior set of surface marker locations predicted according to the one or more parameters and parametric representations generated from the prior set using kinematic constraints associated with the body.
Type: Application
Filed: October 7, 2021
Publication date: April 13, 2023
Inventors: Cristian Sminchisescu, Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William Tafel Freeman, Rahul Sukthankar
-
Publication number: 20220292314
Abstract: Systems and methods are directed to a method for estimation of an object state from image data. The method can include obtaining two-dimensional image data depicting an object. The method can include processing, with an estimation portion of a machine-learned object state estimation model, the two-dimensional image data to obtain an initial estimated state of the object. The method can include, for each of one or more refinement iterations, obtaining a previous loss value associated with a previous estimated state for the object, processing the previous loss value to obtain a current estimated state of the object, and evaluating a loss function to determine a loss value associated with the current estimated state of the object. The method can include providing a final estimated state for the object.
Type: Application
Filed: March 12, 2021
Publication date: September 15, 2022
Inventors: Cristian Sminchisescu, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William Tafel Freeman, Rahul Sukthankar
-
Publication number: 20220237882
Abstract: The present disclosure is directed to encoding images. In particular, one or more computing devices can receive data representing one or more machine learning (ML) models configured, at least in part, to encode images comprising objects of a particular type. The computing device(s) can receive data representing an image comprising one or more objects of the particular type. The computing device(s) can generate, based at least in part on the data representing the image and the data representing the ML model(s), data representing an encoded version of the image that alters at least a portion of the image comprising the object(s) such that when the encoded version of the image is decoded, the object(s) are unrecognizable as being of the particular type by one or more object-recognition ML models based at least in part upon which the ML model(s) configured to encode the images were trained.
Type: Application
Filed: May 28, 2019
Publication date: July 28, 2022
Inventors: Shumeet Baluja, Rahul Sukthankar
-
Patent number: 11163989
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization in images and videos. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform image processing and video processing operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
Type: Grant
Filed: August 6, 2019
Date of Patent: November 2, 2021
Assignee: Google LLC
Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
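The relational features in this abstract pair the person's feature representation with each context position's features. A toy sketch, assuming an elementwise product followed by average pooling as a stand-in for the context neural network (names hypothetical):

```python
def relational_features(person_feature, context_features):
    """Pair the person's feature vector with each context position's
    features (elementwise product) and average-pool over positions."""
    pooled = [0.0] * len(person_feature)
    for context in context_features:
        for i, (p, c) in enumerate(zip(person_feature, context)):
            pooled[i] += p * c
    n = max(len(context_features), 1)
    return [value / n for value in pooled]
```

An action classifier would then take the person's own features together with this pooled relational vector, so the prediction depends on how the person relates to the surrounding context.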
-
Publication number: 20210166009
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
Type: Application
Filed: August 6, 2019
Publication date: June 3, 2021
Inventors: Chen Sun, Abhinav Shrivastava, Cordelia Luise Schmid, Rahul Sukthankar, Kevin Patrick Murphy, Carl Martin Vondrick
-
Patent number: 11010948
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for navigation using visual inputs. One of the systems includes a mapping subsystem configured to, at each time step of a plurality of time steps, generate a characterization of an environment from an image of the environment at the time step, wherein the characterization comprises an environment map identifying locations in the environment having a particular characteristic, and wherein generating the characterization comprises, for each time step: obtaining the image of the environment at the time step, processing the image to generate a first initial characterization for the time step, obtaining a final characterization for a previous time step, processing the characterization for the previous time step to generate a second initial characterization for the time step, and combining the first initial characterization and the second initial characterization to generate a final characterization for the time step.
Type: Grant
Filed: February 9, 2018
Date of Patent: May 18, 2021
Assignee: Google LLC
Inventors: Rahul Sukthankar, Saurabh Gupta, James Christopher Davidson, Sergey Vladimir Levine, Jitendra Malik
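Each time step in this abstract combines a characterization computed from the current image with one carried over from the previous step. A toy sketch, assuming per-cell confidence values fused by taking the maximum (a simple stand-in for the learned combination; all names hypothetical):

```python
def combine_characterizations(first_initial, second_initial):
    """Fuse the per-cell characterization from the current image with the
    one carried over from the previous time step (keep the larger value)."""
    return [max(a, b) for a, b in zip(first_initial, second_initial)]

def run_mapping(images, map_width):
    """Roll the final characterization forward across time steps."""
    final = [0.0] * map_width
    for image in images:
        first_initial = image        # stand-in for processing the image
        second_initial = final       # stand-in for processing the prior map
        final = combine_characterizations(first_initial, second_initial)
    return final
```

The effect is an environment map that accumulates evidence over time: cells observed confidently at any earlier step stay filled in even when later images do not observe them.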
-
Publication number: 20210118153
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Application
Filed: December 23, 2020
Publication date: April 22, 2021
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
-
Patent number: 10878583
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Grant
Filed: December 1, 2017
Date of Patent: December 29, 2020
Assignee: Google LLC
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
-
Publication number: 20200349722
Abstract: A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network is configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.
Type: Application
Filed: December 1, 2017
Publication date: November 5, 2020
Inventors: Cordelia Luise Schmid, Sudheendra Vijayanarasimhan, Susanna Maria Ricco, Bryan Andrew Seybold, Rahul Sukthankar, Aikaterini Fragkiadaki
-
Patent number: 10713818
Abstract: Methods and systems, including computer programs encoded on computer storage media, for compressing data items with a variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.
Type: Grant
Filed: January 28, 2019
Date of Patent: July 14, 2020
Assignee: Google LLC
Inventors: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell
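The variable compression rate in this abstract comes from running the encoder for more or fewer time steps, each emitting binarized output. A toy scalar sketch of progressive residual binarization, where running more steps emits more bits and lowers the reconstruction error (names hypothetical; the LSTM stack is replaced by simple arithmetic):

```python
def progressive_encode(value, num_steps, step_size=0.5):
    """At each time step, emit one sign bit of the current residual and add
    a shrinking correction to the reconstruction. More steps produce more
    bits and a smaller error, giving a variable compression rate."""
    bits, reconstruction = [], 0.0
    for t in range(num_steps):
        bit = 1.0 if value - reconstruction >= 0 else -1.0   # binarize
        bits.append(bit)
        reconstruction += bit * step_size * (0.5 ** t)
    return bits, reconstruction
```

A decoder holding the same schedule can rebuild the reconstruction from the bits alone, and transmission can stop after any prefix of the bit sequence.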
-
Patent number: 10681388
Abstract: Encoding and decoding occupancy information is disclosed. A method includes determining row sums for a region, determining column sums for the region, encoding, in a compressed bitstream, at least one of the row sums and the column sums, and encoding, in the compressed bitstream and based on a coding order, at least one of the rows and the columns of the region. The coding order is based on the encoded at least one of the row sums and the column sums. The row sums include, for each row of the region, a respective count of the number of locations in the row having a specified value. The column sums include, for each column of the region, a respective count of the number of locations in the column having the specified value. A location having the specified value is indicative of the occupancy information at the location.
Type: Grant
Filed: January 30, 2018
Date of Patent: June 9, 2020
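The row and column sums in this abstract are simple occupancy counts, and the coding order can skip rows that the sums already determine. A minimal sketch for a binary grid (helper names hypothetical):

```python
def row_col_sums(grid):
    """Row and column sums of a binary occupancy grid: counts of occupied
    (value 1) locations per row and per column."""
    row_sums = [sum(row) for row in grid]
    col_sums = [sum(row[j] for row in grid) for j in range(len(grid[0]))]
    return row_sums, col_sums

def coding_order(row_sums, width):
    """Rows whose sum is 0 or equal to the width are fully determined by
    the sum alone and need no explicit coding; code the remaining rows."""
    return [i for i, s in enumerate(row_sums) if 0 < s < width]
```

Encoding the sums first lets both encoder and decoder derive the same coding order, so only the ambiguous rows (or columns) need to be written to the bitstream.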
-
Patent number: 10635979
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a clustering of images into a plurality of semantic categories. In one aspect, a method comprises: training a categorization neural network, comprising, at each of a plurality of iterations: processing an image depicting an object using the categorization neural network to generate (i) a current prediction for whether the image depicts an object or a background region, and (ii) a current embedding of the image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; and determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the clustering loss depends on a similarity between the current embedding of the image and the current cluster centers.
Type: Grant
Filed: July 15, 2019
Date of Patent: April 28, 2020
Assignee: Google LLC
Inventors: Steven Hickson, Anelia Angelova, Irfan Aziz Essa, Rahul Sukthankar
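The objective in this abstract sums a classification loss and a clustering loss based on similarity between an image embedding and the cluster centers. A minimal sketch using Euclidean distance to the nearest center (names and weighting hypothetical):

```python
import math

def clustering_loss(embedding, cluster_centers):
    """Distance from an image embedding to its nearest cluster center."""
    return min(
        math.sqrt(sum((e - c) ** 2 for e, c in zip(embedding, center)))
        for center in cluster_centers
    )

def joint_objective(classification_loss, embedding, cluster_centers, weight=1.0):
    """Classification loss plus a weighted clustering term."""
    return classification_loss + weight * clustering_loss(embedding, cluster_centers)
```

Minimizing the clustering term pulls each embedding toward some center, which is what lets the jointly trained network discover the semantic categories.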
-
Publication number: 20200027002
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a clustering of images into a plurality of semantic categories. In one aspect, a method comprises: training a categorization neural network, comprising, at each of a plurality of iterations: processing an image depicting an object using the categorization neural network to generate (i) a current prediction for whether the image depicts an object or a background region, and (ii) a current embedding of the image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; and determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the clustering loss depends on a similarity between the current embedding of the image and the current cluster centers.
Type: Application
Filed: July 15, 2019
Publication date: January 23, 2020
Inventors: Steven Hickson, Anelia Angelova, Irfan Aziz Essa, Rahul Sukthankar
-
Publication number: 20190371025
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for navigation using visual inputs. One of the systems includes a mapping subsystem configured to, at each time step of a plurality of time steps, generate a characterization of an environment from an image of the environment at the time step, wherein the characterization comprises an environment map identifying locations in the environment having a particular characteristic, and wherein generating the characterization comprises, for each time step: obtaining the image of the environment at the time step, processing the image to generate a first initial characterization for the time step, obtaining a final characterization for a previous time step, processing the characterization for the previous time step to generate a second initial characterization for the time step, and combining the first initial characterization and the second initial characterization to generate a final characterization for the time step.
Type: Application
Filed: February 9, 2018
Publication date: December 5, 2019
Inventors: Rahul Sukthankar, Saurabh Gupta, James Christopher Davidson, Sergey Vladimir Levine, Jitendra Malik
-
Publication number: 20190238893
Abstract: Encoding and decoding occupancy information is disclosed. A method includes determining row sums for a region, determining column sums for the region, encoding, in a compressed bitstream, at least one of the row sums and the column sums, and encoding, in the compressed bitstream and based on a coding order, at least one of the rows and the columns of the region. The coding order is based on the encoded at least one of the row sums and the column sums. The row sums include, for each row of the region, a respective count of the number of locations in the row having a specified value. The column sums include, for each column of the region, a respective count of the number of locations in the column having the specified value. A location having the specified value is indicative of the occupancy information at the location.
Type: Application
Filed: January 30, 2018
Publication date: August 1, 2019
-
Patent number: 10192327
Abstract: Methods and systems, including computer programs encoded on computer storage media, for compressing data items with a variable compression rate. A system includes an encoder sub-network configured to receive a system input image and to generate an encoded representation of the system input image, the encoder sub-network including a first stack of neural network layers including one or more LSTM neural network layers and one or more non-LSTM neural network layers, the first stack configured to, at each of a plurality of time steps, receive an input image for the time step that is derived from the system input image and generate a corresponding first stack output, and a binarizing neural network layer configured to receive a first stack output as input and generate a corresponding binarized output.
Type: Grant
Filed: February 3, 2017
Date of Patent: January 29, 2019
Assignee: Google LLC
Inventors: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell