Patents by Inventor Michail Raptis

Michail Raptis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240062560
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly performing text detection and layout analysis. In one aspect, a method comprises processing the image and a set of object queries to generate an encoded representation of the image and an encoded representation of the set of object queries; processing the encoded representation of the image and the encoded representation of the set of object queries to generate a set of text detection masks; processing the encoded representation of the set of object queries to generate layout relevance measures; processing the encoded representation of the set of object queries to generate textness scores for the text detection masks; generating a text detection output that defines respective areas of the image that include text items; and generating a layout analysis output that defines clusters of respective areas of the image identified by the text detection masks.
    Type: Application
    Filed: September 1, 2022
    Publication date: February 22, 2024
    Inventors: Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michail Raptis
  • Patent number: 10223580
    Abstract: Methods and systems for video action recognition using poselet keyframes are disclosed. An action recognition model may be implemented to spatially and temporally model discriminative action components as a set of discriminative keyframes. One method of action recognition may include the operations of selecting a plurality of poselets that are components of an action, encoding each of a plurality of video frames as a summary of the detection confidence of each of the plurality of poselets for the video frame, and encoding correlations between poselets in the encoded video frames.
    Type: Grant
    Filed: September 12, 2013
    Date of Patent: March 5, 2019
    Assignee: Disney Enterprises, Inc.
    Inventors: Michail Raptis, Leonid Sigal
  • Patent number: 9514363
    Abstract: The disclosure provides an approach for detecting and localizing action in video. In one embodiment, an action detection application receives training video sequences and associated eye gaze fixation data collected from a sample of human viewers. Using the training video sequences and eye gaze data, the action detection application learns a model which includes a latent regions potential term that measures the compatibility of latent spatio-temporal regions with the model, as well as a context potential term that accounts for contextual information that is not directly produced by the appearance and motion of the actor. The action detection application may train this model in, e.g., the latent structural SVM framework by minimizing a cost function which encodes the cost of an incorrect action label prediction and a mislocalization of the eye gaze. During training and thereafter, inferences using the model may be made using an efficient dynamic programming algorithm.
    Type: Grant
    Filed: April 8, 2014
    Date of Patent: December 6, 2016
    Assignee: Disney Enterprises, Inc.
    Inventors: Leonid Sigal, Nataliya Shapovalova, Michail Raptis
  • Patent number: 9477908
    Abstract: The disclosure provides an approach for detecting objects in images. An object detection application receives a set of training images with object annotations. Given these training images, the object detection application generates semantic labeling for object detections, where the labeling includes lower-level subcategories and higher-level visual composites. In one embodiment, the object detection application identifies subcategories using an exemplar support vector machine (SVM) based clustering approach. Identified subcategories are used to initialize mixture components in mixture models which the object detection application trains in a latent SVM framework, thereby learning a number of subcategory classifiers that produce, for any given image, a set of candidate windows and associated subcategory labels.
    Type: Grant
    Filed: April 10, 2014
    Date of Patent: October 25, 2016
    Assignee: Disney Enterprises, Inc.
    Inventors: Leonid Sigal, Michail Raptis, Tian Lan
  • Publication number: 20150294192
    Abstract: The disclosure provides an approach for detecting objects in images. An object detection application receives a set of training images with object annotations. Given these training images, the object detection application generates semantic labeling for object detections, where the labeling includes lower-level subcategories and higher-level visual composites. In one embodiment, the object detection application identifies subcategories using an exemplar support vector machine (SVM) based clustering approach. Identified subcategories are used to initialize mixture components in mixture models which the object detection application trains in a latent SVM framework, thereby learning a number of subcategory classifiers that produce, for any given image, a set of candidate windows and associated subcategory labels.
    Type: Application
    Filed: April 10, 2014
    Publication date: October 15, 2015
    Applicant: DISNEY ENTERPRISES, INC.
    Inventors: Tian LAN, Michail RAPTIS, Leonid SIGAL
  • Publication number: 20150286853
    Abstract: The disclosure provides an approach for detecting and localizing action in video. In one embodiment, an action detection application receives training video sequences and associated eye gaze fixation data collected from a sample of human viewers. Using the training video sequences and eye gaze data, the action detection application learns a model which includes a latent regions potential term that measures the compatibility of latent spatio-temporal regions with the model, as well as a context potential term that accounts for contextual information that is not directly produced by the appearance and motion of the actor. The action detection application may train this model in, e.g., the latent structural SVM framework by minimizing a cost function which encodes the cost of an incorrect action label prediction and a mislocalization of the eye gaze. During training and thereafter, inferences using the model may be made using an efficient dynamic programming algorithm.
    Type: Application
    Filed: April 8, 2014
    Publication date: October 8, 2015
    Applicant: DISNEY ENTERPRISES, INC.
    Inventors: Nataliya SHAPOVALOVA, Leonid SIGAL, Michail RAPTIS
  • Publication number: 20140294360
    Abstract: Methods and systems for video action recognition using poselet keyframes are disclosed. An action recognition model may be implemented to spatially and temporally model discriminative action components as a set of discriminative keyframes. One method of action recognition may include the operations of selecting a plurality of poselets that are components of an action, encoding each of a plurality of video frames as a summary of the detection confidence of each of the plurality of poselets for the video frame, and encoding correlations between poselets in the encoded video frames.
    Type: Application
    Filed: September 12, 2013
    Publication date: October 2, 2014
    Applicant: Disney Enterprises, Inc.
    Inventors: MICHAIL RAPTIS, LEONID SIGAL
  • Patent number: 8761437
    Abstract: Human body motion is represented by a skeletal model derived from image data of a user. Skeletal model data may be used to perform motion recognition and/or similarity analysis of body motion. An example method of motion recognition includes receiving skeletal motion data representative of a user data motion feature from a capture device relating to a position of a user within a scene. A cross-correlation of the received skeletal motion data relative to a plurality of prototype motion features from a prototype motion feature database is determined. Likelihoods that the skeletal motion data corresponds to each of the plurality of prototype motion features are ranked. The likelihoods are determined using the cross-correlation. A classifying operation is performed on a subset of the plurality of prototype motion features. The subset of the plurality of prototype motion features is chosen because its members have the relatively highest likelihoods of corresponding to the skeletal motion data.
    Type: Grant
    Filed: February 18, 2011
    Date of Patent: June 24, 2014
    Assignee: Microsoft Corporation
    Inventors: Darko Kirovski, Michail Raptis
  • Publication number: 20120214594
    Abstract: Human body motion is represented by a skeletal model derived from image data of a user. Skeletal model data may be used to perform motion recognition and/or similarity analysis of body motion. An example method of motion recognition includes receiving skeletal motion data representative of a user data motion feature from a capture device relating to a position of a user within a scene. A cross-correlation of the received skeletal motion data relative to a plurality of prototype motion features from a prototype motion feature database is determined. Likelihoods that the skeletal motion data corresponds to each of the plurality of prototype motion features are ranked. The likelihoods are determined using the cross-correlation. A classifying operation is performed on a subset of the plurality of prototype motion features. The subset of the plurality of prototype motion features is chosen because its members have the relatively highest likelihoods of corresponding to the skeletal motion data.
    Type: Application
    Filed: February 18, 2011
    Publication date: August 23, 2012
    Applicant: Microsoft Corporation
    Inventors: Darko Kirovski, Michail Raptis
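
The motion recognition abstract above (patent 8761437 and publication 20120214594) describes ranking prototype motion features by cross-correlation and passing only the highest-ranked subset to a classifier. The following is a minimal Python sketch of that ranking step, not the patented implementation. It assumes each motion feature is a single 1-D sequence of numbers and uses the peak normalized cross-correlation over all lags as a stand-in for the likelihood; the names normalized_cross_correlation, rank_prototypes, and top_k are hypothetical.

```python
import math


def normalized_cross_correlation(a, b):
    """Peak normalized cross-correlation of two 1-D sequences over all lags."""
    def center(x):
        mean = sum(x) / len(x)
        return [v - mean for v in x]

    a, b = center(a), center(b)
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0:
        return 0.0  # a constant sequence carries no motion information

    best = -1.0
    # Slide b across a; at each lag, sum products over the overlapping samples.
    for lag in range(-(len(b) - 1), len(a)):
        start = max(0, lag)
        stop = min(len(a), lag + len(b))
        s = sum(a[i] * b[i - lag] for i in range(start, stop))
        best = max(best, s / norm)
    return best


def rank_prototypes(motion, prototypes, top_k):
    """Score every prototype against the motion and keep the top_k best.

    prototypes: dict mapping a prototype name to its 1-D feature sequence.
    Returns a list of (name, score) pairs, highest score first. The score
    is an assumed proxy for the likelihood mentioned in the abstract.
    """
    scored = [
        (name, normalized_cross_correlation(motion, feature))
        for name, feature in prototypes.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    motion = [0, 1, 0, -1, 0, 1, 0, -1]
    prototypes = {
        "wave": [0, 1, 0, -1, 0, 1, 0, -1],
        "ramp": [0, 1, 2, 3, 4, 5, 6, 7],
        "inverse": [0, -1, 0, 1, 0, -1, 0, 1],
    }
    # Only this shortlist would be handed to the later classifying operation.
    for name, score in rank_prototypes(motion, prototypes, top_k=2):
        print(name, round(score, 3))
```

In this sketch the shortlist returned by rank_prototypes plays the role of the subset with the relatively highest likelihoods; the classifying operation itself is left out because the abstract does not specify it.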