Patents by Inventor Mihir JAIN

Mihir JAIN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220318553
Abstract: Systems and techniques are provided for performing holistic video understanding. For example, a process can include obtaining a first video and determining, using a machine learning model decision engine, a first machine learning model from a set of machine learning models to use for processing at least a portion of the first video. The first machine learning model can be determined based on one or more characteristics of at least the portion of the first video. The process can include processing at least the portion of the first video using the first machine learning model.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Inventors: Haitam BEN YAHIA, Amir GHODRATI, Mihir JAIN, Amirhossein HABIBIAN
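    The abstract above describes a decision engine that routes a video to one model out of a set based on the video's characteristics. A minimal sketch of that routing idea, assuming a simple characteristic (mean frame-to-frame change) and two stand-in models — the model names, threshold, and motion measure are illustrative, not from the patent:

    ```python
    # Hypothetical decision engine: route a clip (T x H x W array) to a
    # lightweight or a heavyweight model based on a crude motion estimate.
    import numpy as np

    def light_model(clip):   # cheap classifier stand-in
        return "static_scene"

    def heavy_model(clip):   # expensive classifier stand-in
        return "dynamic_scene"

    def decision_engine(clip, threshold=0.1):
        """Pick a model from the set based on a clip characteristic."""
        motion = np.abs(np.diff(clip, axis=0)).mean()  # mean temporal change
        return heavy_model if motion > threshold else light_model

    clip = np.zeros((8, 4, 4))        # a clip with no motion at all
    model = decision_engine(clip)     # selects the lightweight model
    result = model(clip)
    ```

    The appeal of such a scheme is that expensive models only run on the portions of video whose characteristics warrant them.
    
    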
  • Publication number: 20220301310
    Abstract: Certain aspects of the present disclosure provide a method of processing video data. In one example, the method includes receiving input video data; sampling a first subset of clips from the input video data; providing the first subset of clips to a first component of a machine learning model to generate first output; sampling a second subset of clips from the input video data, wherein the second subset of clips comprises fewer clips than the first subset of clips; providing the second subset of clips to a second component of the machine learning model to generate a second output; aggregating the first output from the first component of the machine learning model with the second output from the second component of the machine learning model to generate aggregated output; and determining a characteristic of the input video data based on the aggregated output.
    Type: Application
    Filed: March 15, 2022
    Publication date: September 22, 2022
    Inventors: Hanul KIM, Mihir JAIN, Juntae LEE, Sungrack YUN, Fatih Murat PORIKLI
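    The method above samples a larger and a smaller subset of clips, feeds each to its own model component, and aggregates the two outputs. A toy sketch under stated assumptions — the uniform sampling strategy, the stand-in components, and the averaging aggregation are all illustrative choices, not the patented implementation:

    ```python
    # Two-subset clip sampling and output aggregation (illustrative only).
    import numpy as np

    def sample_clips(video, num_clips, clip_len=2):
        """Uniformly sample `num_clips` clips of `clip_len` frames each."""
        starts = np.linspace(0, len(video) - clip_len, num_clips).astype(int)
        return [video[s:s + clip_len] for s in starts]

    def component_a(clips):  # stand-in for the first model component
        return np.mean([c.mean() for c in clips])

    def component_b(clips):  # stand-in for the second model component
        return np.max([c.mean() for c in clips])

    video = np.arange(32, dtype=float).reshape(16, 2)  # 16 frames, 2 features
    dense = sample_clips(video, num_clips=8)           # first, larger subset
    sparse = sample_clips(video, num_clips=2)          # second, smaller subset
    aggregated = 0.5 * (component_a(dense) + component_b(sparse))
    ```

    The second subset being smaller lets its component look at fewer clips (e.g. more expensively per clip) while the first covers the video more densely.
    
    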
  • Publication number: 20220156514
    Abstract: Certain aspects of the present disclosure provide techniques for training a first model based on a first labeled video dataset; generating a plurality of action-words based on output generated by the first model processing motion data in videos of an unlabeled video dataset; defining labels for the videos in the unlabeled video dataset based on the generated action-words; and training a second model based on the labels for the videos in the unlabeled video dataset.
    Type: Application
    Filed: November 12, 2021
    Publication date: May 19, 2022
    Inventors: Kirill GAVRILYUK, Mihir JAIN, Cornelis Gerardus Maria SNOEK
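    The pseudo-labelling step described above — generating "action-words" from motion data of unlabeled videos and using them as labels — can be sketched with clustering: cluster per-video motion descriptors and use the cluster index as the label. The toy k-means below (deterministic initialization from the first k points) is a stand-in for whatever clustering the actual method uses:

    ```python
    # Cluster motion features into "action-words" usable as pseudo-labels.
    import numpy as np

    def kmeans_labels(features, k, iters=10):
        """Assign each feature vector to one of k clusters (toy Lloyd's
        k-means, initialized deterministically from the first k points)."""
        centers = features[:k].copy()
        for _ in range(iters):
            dists = np.linalg.norm(features[:, None] - centers[None], axis=-1)
            labels = dists.argmin(axis=1)
            for j in range(k):
                if (labels == j).any():
                    centers[j] = features[labels == j].mean(axis=0)
        return labels

    # Motion descriptors for 6 unlabeled videos: two obvious groups.
    feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                      [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    action_words = kmeans_labels(feats, k=2)   # pseudo-labels for the 2nd model
    ```

    A second model trained on these pseudo-labels never needs the manual annotations that the first, smaller labeled dataset required.
    
    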
  • Publication number: 20220101087
    Abstract: A method performed by an artificial neural network (ANN) includes determining, at a first stage of a multi-stage cross-attention model of the ANN, a first cross-correlation between a first representation of each modality of a number of modalities associated with a sequence of inputs. The method still further includes determining, at each second stage of one or more second stages of the multi-stage cross-attention model, a second cross-correlation between first attended representations of each modality. The method also includes generating a concatenated feature representation associated with a final second stage of the one or more second stages based on the second cross-correlation associated with the final second stage, the first attended representation of each modality, and the first representation of each modality. The method further includes determining a probability distribution between a set of background actions and a set of foreground actions from the concatenated feature representation.
    Type: Application
    Filed: August 18, 2021
    Publication date: March 31, 2022
    Inventors: Juntae LEE, Mihir JAIN, Sungrack YUN, Hyoungwoo PARK, Kyu Woong HWANG
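    A single cross-attention stage between two modalities, plus the concatenation of attended and raw representations, can be sketched as below. The softmax attention form, the feature dimensions, and the choice of RGB/audio as the two modalities are assumptions for illustration; the patented model stacks multiple such stages:

    ```python
    # One cross-attention stage between two modality representations.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def cross_attend(query_mod, key_mod):
        """Attend from one modality to the other via their cross-correlation."""
        corr = query_mod @ key_mod.T            # cross-correlation scores
        return softmax(corr) @ key_mod          # attended representation

    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((5, 8))           # 5 timesteps, 8-dim features
    audio = rng.standard_normal((5, 8))
    rgb_attended = cross_attend(rgb, audio)     # rgb attends to audio
    audio_attended = cross_attend(audio, rgb)   # audio attends to rgb
    concat = np.concatenate([rgb_attended, audio_attended, rgb, audio], axis=1)
    ```

    In the abstract's terms, a classifier head over such a concatenated feature would produce the probability distribution between background and foreground actions.
    
    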
  • Patent number: 11256964
    Abstract: A method for predicting a future action of agents in a scene includes assigning a fidelity level to agents observed in the scene. The method also includes recursively predicting future actions of the agents by traversing the scene. A different forward prediction model is used at each recursion level. The method further includes controlling an action of an ego agent based on the predicted future actions of the agents.
    Type: Grant
    Filed: October 10, 2019
    Date of Patent: February 22, 2022
    Assignee: Qualcomm Incorporated
    Inventors: Kyle Jordan Brown, Mihir Jain, Ahmed Kamel Sadek
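    The fidelity-level idea above — predicting nearby agents with a richer model and distant ones with a cheaper one — can be sketched as follows. The distance-based fidelity rule and the two toy "prediction models" are assumptions; the patent describes a recursive traversal with a different forward model per recursion level:

    ```python
    # Choose a per-agent prediction model from an assigned fidelity level.
    import math

    def fidelity_level(agent_pos, ego_pos, near=10.0):
        """0 = high fidelity (nearby agent), 1 = low fidelity (distant)."""
        return 0 if math.dist(agent_pos, ego_pos) < near else 1

    def predict_high(pos, vel):   # richer model: integrate full velocity
        return (pos[0] + vel[0], pos[1] + vel[1])

    def predict_low(pos, vel):    # cheaper model: assume the agent stays put
        return pos

    MODELS = [predict_high, predict_low]

    ego = (0.0, 0.0)
    agents = [((1.0, 2.0), (0.5, 0.0)),    # nearby agent, moving right
              ((50.0, 0.0), (1.0, 0.0))]   # distant agent
    futures = [MODELS[fidelity_level(p, ego)](p, v) for p, v in agents]
    ```

    Spending prediction effort in proportion to an agent's relevance to the ego vehicle keeps the overall cost of traversing a crowded scene bounded.
    
    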
  • Patent number: 10776628
    Abstract: A method for processing a sequence of frames includes receiving a sequence of frames and multiple action proposals for the sequence of frames. The method also includes generating a representation of the sequence of frames and pooling the representation around each of the action proposals. The method further includes classifying the action proposals based on the pooled representations and controlling a device based on the classifying.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: September 15, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Victor Augusto Escorcia, Mihir Jain, Amirhossein Habibian, Cornelis Gerardus Maria Snoek
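    The pool-then-classify pipeline above can be sketched minimally: average-pool a per-frame representation over each temporal action proposal, then classify each proposal from its pooled feature. The mean-pooling choice, the threshold classifier, and the label names are illustrative assumptions:

    ```python
    # Pool a frame-level representation around temporal action proposals.
    import numpy as np

    def pool_proposals(features, proposals):
        """Average-pool per-frame features over each [start, end) proposal."""
        return np.stack([features[s:e].mean(axis=0) for s, e in proposals])

    def classify(pooled, threshold=0.5):
        """Toy classifier: flag a proposal whose mean activation is high."""
        return ["action" if p.mean() > threshold else "background"
                for p in pooled]

    features = np.zeros((10, 4))      # 10 frames, 4-dim representation
    features[3:6] = 1.0               # an "action" spans frames 3..5
    proposals = [(0, 3), (3, 6)]      # two temporal proposals to score
    labels = classify(pool_proposals(features, proposals))
    ```

    Pooling once per proposal over a shared representation avoids re-running the feature extractor for every candidate interval.
    
    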
  • Publication number: 20200117958
    Abstract: A method for predicting a future action of agents in a scene includes assigning a fidelity level to agents observed in the scene. The method also includes recursively predicting future actions of the agents by traversing the scene. A different forward prediction model is used at each recursion level. The method further includes controlling an action of an ego agent based on the predicted future actions of the agents.
    Type: Application
    Filed: October 10, 2019
    Publication date: April 16, 2020
    Inventors: Kyle Jordan BROWN, Mihir JAIN, Ahmed Kamel SADEK
  • Publication number: 20190108400
    Abstract: A method for generating action proposals in a sequence of frames comprises determining, at each frame of the sequence of frames, at least one possible action location for a type of actor to be detected. The method also comprises expanding, for each frame of the sequence of frames, the at least one possible action location to neighboring regions in neighboring frames of a given frame to identify a similar location between the given frame and each of the neighboring frames. The method further comprises associating the most similar possible action locations over the sequence of frames to generate the action proposals. The method also comprises classifying an action in the sequence of frames based on the action proposals and controlling an action of a device based on the classifying.
    Type: Application
    Filed: October 5, 2018
    Publication date: April 11, 2019
    Inventors: Victor Augusto ESCORCIA, Mihir JAIN, Amirhossein HABIBIAN, Cornelis Gerardus Maria SNOEK
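    The linking step above — associating the most similar action location from frame to frame into a proposal — can be sketched greedily. Using intersection-over-union as the similarity measure and a greedy frame-by-frame argmax are assumptions for illustration:

    ```python
    # Greedily link per-frame candidate boxes into an action proposal (tube).
    def iou(a, b):
        """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
        ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
        ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    def link_proposal(per_frame_boxes):
        """Link the most similar location frame-to-frame into one tube."""
        tube = [per_frame_boxes[0][0]]
        for boxes in per_frame_boxes[1:]:
            tube.append(max(boxes, key=lambda b: iou(tube[-1], b)))
        return tube

    # Two candidate locations per frame; one drifts slowly to the right.
    frames = [
        [(0, 0, 4, 4), (10, 10, 14, 14)],
        [(1, 0, 5, 4), (10, 10, 14, 14)],
        [(2, 0, 6, 4), (10, 10, 14, 14)],
    ]
    tube = link_proposal(frames)   # follows the drifting candidate
    ```

    The resulting tube of linked locations is exactly the kind of action proposal that the companion classification method (20190108399, above and below) then scores.
    
    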
  • Publication number: 20190108399
    Abstract: A method for processing a sequence of frames includes receiving a sequence of frames and multiple action proposals for the sequence of frames. The method also includes generating a representation of the sequence of frames and pooling the representation around each of the action proposals. The method further includes classifying the action proposals based on the pooled representations and controlling a device based on the classifying.
    Type: Application
    Filed: October 4, 2018
    Publication date: April 11, 2019
    Inventors: Victor Augusto ESCORCIA, Mihir JAIN, Amirhossein HABIBIAN, Cornelis Gerardus Maria SNOEK
  • Patent number: 10049279
    Abstract: A method of predicting action labels for a video stream includes receiving the video stream and calculating an optical flow of consecutive frames of the video stream. An attention map is generated from the current frame of the video stream and the calculated optical flow. An action label is predicted for the current frame based on the optical flow, a previous hidden state and the attention map.
    Type: Grant
    Filed: September 16, 2016
    Date of Patent: August 14, 2018
    Inventors: Zhenyang Li, Efstratios Gavves, Mihir Jain, Cornelis Gerardus Maria Snoek
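    The flow-attention-hidden-state pipeline above can be sketched end to end with toy stand-ins. The temporal-difference flow proxy, the normalization used for the attention map, and the thresholded scorer are all illustrative assumptions for the learned modules:

    ```python
    # Predict an action label from flow, an attention map, and a hidden state.
    import numpy as np

    def optical_flow(prev_frame, frame):
        """Proxy for optical flow: per-pixel temporal difference magnitude."""
        return np.abs(frame - prev_frame)

    def attention_map(frame, flow):
        """Normalize motion-weighted intensities into an attention map."""
        weights = frame * flow + 1e-8
        return weights / weights.sum()

    def predict_label(flow, hidden, attn, labels=("wave", "idle")):
        """Score labels from attended motion plus the previous hidden state."""
        score = (attn * flow).sum() + hidden
        return labels[0] if score > 0.5 else labels[1]

    prev_f = np.zeros((4, 4))
    curr = np.ones((4, 4))            # every pixel changed: strong motion
    flow = optical_flow(prev_f, curr)
    attn = attention_map(curr, flow)
    label = predict_label(flow, hidden=0.0, attn=attn)
    ```

    The recurrent hidden state lets the prediction at the current frame benefit from what was seen in earlier frames, while the attention map focuses it on the moving regions.
    
    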
  • Patent number: 9830709
    Abstract: A method of processing data within a convolutional attention recurrent neural network (RNN) includes generating a current multi-dimensional attention map. The current multi-dimensional attention map indicates areas of interest in a first frame from a sequence of spatio-temporal data. The method further includes receiving a multi-dimensional feature map. The method also includes convolving the current multi-dimensional attention map and the multi-dimensional feature map to obtain a multi-dimensional hidden state and a next multi-dimensional attention map. The method identifies a class of interest in the first frame based on the multi-dimensional hidden state and training data.
    Type: Grant
    Filed: August 26, 2016
    Date of Patent: November 28, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Zhenyang Li, Efstratios Gavves, Mihir Jain, Cornelis Gerardus Maria Snoek
  • Publication number: 20170262996
    Abstract: A method generates bounding-boxes within frames of a sequence of frames. The bounding-boxes may be generated via a recurrent neural network (RNN) such as a long short-term memory (LSTM) network. The method includes receiving the sequence of frames and generating an attention feature map for each frame of the sequence of frames. Each attention feature map indicates at least one potential moving object. The method also includes up-sampling each attention feature map to determine an attention saliency for pixels in each frame of the sequence of frames. The method further includes generating a bounding-box within each frame based on the attention saliency and temporally smoothing multiple bounding-boxes along the sequence of frames to obtain a smooth sequence of bounding-boxes. The method still further includes localizing an action location within each frame based on the smooth sequence of bounding-boxes.
    Type: Application
    Filed: August 29, 2016
    Publication date: September 14, 2017
    Inventors: Mihir JAIN, Zhenyang LI, Efstratios GAVVES, Cornelis Gerardus Maria SNOEK
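    The temporal smoothing step above — turning jittery per-frame boxes into a smooth sequence — can be sketched with a moving average over box coordinates. Extracting the box as the tight bound around above-average saliency and the window size are illustrative assumptions:

    ```python
    # Per-frame boxes from saliency, then temporal smoothing of coordinates.
    import numpy as np

    def box_from_saliency(saliency):
        """Tight box (x0, y0, x1, y1) around above-average saliency pixels."""
        ys, xs = np.where(saliency > saliency.mean())
        return np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=float)

    def smooth_boxes(boxes, window=3):
        """Moving-average each box coordinate over a temporal window."""
        boxes = np.asarray(boxes)
        pad = window // 2
        padded = np.pad(boxes, ((pad, pad), (0, 0)), mode="edge")
        kernel = np.ones(window) / window
        return np.stack([np.convolve(padded[:, c], kernel, mode="valid")
                         for c in range(4)], axis=1)

    # Three frames whose salient blob jitters horizontally by one pixel.
    frames = []
    for x in (2, 3, 2):
        s = np.zeros((8, 8))
        s[3:5, x:x + 2] = 1.0
        frames.append(s)
    raw = [box_from_saliency(s) for s in frames]
    smoothed = smooth_boxes(raw)     # jitter averaged out across frames
    ```

    The smoothed sequence of boxes gives a stable localization of the action even when the per-frame saliency peak flickers.
    
    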
  • Publication number: 20170262705
    Abstract: A method of predicting action labels for a video stream includes receiving the video stream and calculating an optical flow of consecutive frames of the video stream. An attention map is generated from the current frame of the video stream and the calculated optical flow. An action label is predicted for the current frame based on the optical flow, a previous hidden state and the attention map.
    Type: Application
    Filed: September 16, 2016
    Publication date: September 14, 2017
    Inventors: Zhenyang LI, Efstratios GAVVES, Mihir JAIN, Cornelis Gerardus Maria SNOEK
  • Publication number: 20170262995
    Abstract: A method of processing data within a convolutional attention recurrent neural network (RNN) includes generating a current multi-dimensional attention map. The current multi-dimensional attention map indicates areas of interest in a first frame from a sequence of spatio-temporal data. The method further includes receiving a multi-dimensional feature map. The method also includes convolving the current multi-dimensional attention map and the multi-dimensional feature map to obtain a multi-dimensional hidden state and a next multi-dimensional attention map. The method identifies a class of interest in the first frame based on the multi-dimensional hidden state and training data.
    Type: Application
    Filed: August 26, 2016
    Publication date: September 14, 2017
    Inventors: Zhenyang LI, Efstratios GAVVES, Mihir JAIN, Cornelis Gerardus Maria SNOEK