Patents by Inventor Michail Raptis
Michail Raptis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240062560
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly performing text detection and layout analysis. In one aspect, a method comprises processing the image and a set of object queries to generate an encoded representation of the image and an encoded representation of the set of object queries; processing the encoded representation of the image and the encoded representation of the set of object queries to generate a set of text detection masks; processing the encoded representation of the set of object queries to generate layout relevance measures; processing the encoded representation of the set of object queries to generate textness scores for the text detection masks; generating a text detection output that defines respective areas of the image that include text items; and generating a layout analysis output that defines clusters of respective areas of the image identified by the text detection masks.
Type: Application
Filed: September 1, 2022
Publication date: February 22, 2024
Inventors: Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michail Raptis
-
Patent number: 10223580
Abstract: Methods and systems for video action recognition using poselet keyframes are disclosed. An action recognition model may be implemented to spatially and temporally model discriminative action components as a set of discriminative keyframes. One method of action recognition may include the operations of selecting a plurality of poselets that are components of an action, encoding each of a plurality of video frames as a summary of the detection confidence of each of the plurality of poselets for the video frame, and encoding correlations between poselets in the encoded video frames.
Type: Grant
Filed: September 12, 2013
Date of Patent: March 5, 2019
Assignee: Disney Enterprises, Inc.
Inventors: Michail Raptis, Leonid Sigal
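The frame-encoding step named in the abstract for patent 10223580 can be illustrated with a minimal sketch. The abstract does not say how the "summary of the detection confidence" or the poselet "correlations" are computed, so the choices here are assumptions: the summary is taken as the maximum confidence per poselet, and the correlation as the product of each pair of summaries. The function names and the poselet labels are hypothetical.

```python
def encode_frame(detections, poselets):
    """Summarize one video frame as one confidence value per poselet.

    detections: list of (poselet_name, confidence) pairs found in the frame.
    poselets:   ordered list of the poselet names selected for the action.
    Assumption: the summary is the maximum confidence per poselet, with 0.0
    for poselets that were not detected at all.
    """
    best = {name: 0.0 for name in poselets}
    for name, confidence in detections:
        if name in best:  # ignore detections of poselets that were not selected
            best[name] = max(best[name], confidence)
    return [best[name] for name in poselets]


def pairwise_correlations(frame_vector):
    """Encode co-occurrence of poselets within one encoded frame.

    Assumption: the correlation of two poselets is the product of their
    summarized confidences, listed for every unordered pair (i < j).
    """
    n = len(frame_vector)
    return [frame_vector[i] * frame_vector[j]
            for i in range(n) for j in range(i + 1, n)]
```

Under these assumptions, a frame with detections `[("arm_up", 0.2), ("arm_up", 0.9), ("leg_bent", 0.5)]` and selected poselets `["arm_up", "leg_bent", "torso"]` is encoded as one value per poselet, and `pairwise_correlations` then yields one value per poselet pair.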
-
Patent number: 9514363
Abstract: The disclosure provides an approach for detecting and localizing action in video. In one embodiment, an action detection application receives training video sequences and associated eye gaze fixation data collected from a sample of human viewers. Using the training video sequences and eye gaze data, the action detection application learns a model which includes a latent regions potential term that measures the compatibility of latent spatio-temporal regions with the model, as well as a context potential term that accounts for contextual information that is not directly produced by the appearance and motion of the actor. The action detection application may train this model in, e.g., the latent structural SVM framework by minimizing a cost function which encodes the cost of an incorrect action label prediction and a mislocalization of the eye gaze. During training and thereafter, inferences using the model may be made using an efficient dynamic programming algorithm.
Type: Grant
Filed: April 8, 2014
Date of Patent: December 6, 2016
Assignee: Disney Enterprises, Inc.
Inventors: Leonid Sigal, Nataliya Shapovalova, Michail Raptis
-
Patent number: 9477908
Abstract: The disclosure provides an approach for detecting objects in images. An object detection application receives a set of training images with object annotations. Given these training images, the object detection application generates semantic labeling for object detections, where the labeling includes lower-level subcategories and higher-level visual composites. In one embodiment, the object detection application identifies subcategories using an exemplar support vector machine (SVM) based clustering approach. Identified subcategories are used to initialize mixture components in mixture models which the object detection application trains in a latent SVM framework, thereby learning a number of subcategory classifiers that produce, for any given image, a set of candidate windows and associated subcategory labels.
Type: Grant
Filed: April 10, 2014
Date of Patent: October 25, 2016
Assignee: Disney Enterprises, Inc.
Inventors: Leonid Sigal, Michail Raptis, Tian Lan
-
Publication number: 20150294192
Abstract: The disclosure provides an approach for detecting objects in images. An object detection application receives a set of training images with object annotations. Given these training images, the object detection application generates semantic labeling for object detections, where the labeling includes lower-level subcategories and higher-level visual composites. In one embodiment, the object detection application identifies subcategories using an exemplar support vector machine (SVM) based clustering approach. Identified subcategories are used to initialize mixture components in mixture models which the object detection application trains in a latent SVM framework, thereby learning a number of subcategory classifiers that produce, for any given image, a set of candidate windows and associated subcategory labels.
Type: Application
Filed: April 10, 2014
Publication date: October 15, 2015
Applicant: Disney Enterprises, Inc.
Inventors: Tian Lan, Michail Raptis, Leonid Sigal
-
Publication number: 20150286853
Abstract: The disclosure provides an approach for detecting and localizing action in video. In one embodiment, an action detection application receives training video sequences and associated eye gaze fixation data collected from a sample of human viewers. Using the training video sequences and eye gaze data, the action detection application learns a model which includes a latent regions potential term that measures the compatibility of latent spatio-temporal regions with the model, as well as a context potential term that accounts for contextual information that is not directly produced by the appearance and motion of the actor. The action detection application may train this model in, e.g., the latent structural SVM framework by minimizing a cost function which encodes the cost of an incorrect action label prediction and a mislocalization of the eye gaze. During training and thereafter, inferences using the model may be made using an efficient dynamic programming algorithm.
Type: Application
Filed: April 8, 2014
Publication date: October 8, 2015
Applicant: Disney Enterprises, Inc.
Inventors: Nataliya Shapovalova, Leonid Sigal, Michail Raptis
-
Publication number: 20140294360
Abstract: Methods and systems for video action recognition using poselet keyframes are disclosed. An action recognition model may be implemented to spatially and temporally model discriminative action components as a set of discriminative keyframes. One method of action recognition may include the operations of selecting a plurality of poselets that are components of an action, encoding each of a plurality of video frames as a summary of the detection confidence of each of the plurality of poselets for the video frame, and encoding correlations between poselets in the encoded video frames.
Type: Application
Filed: September 12, 2013
Publication date: October 2, 2014
Applicant: Disney Enterprises, Inc.
Inventors: Michail Raptis, Leonid Sigal
-
Patent number: 8761437
Abstract: Human body motion is represented by a skeletal model derived from image data of a user. Skeletal model data may be used to perform motion recognition and/or similarity analysis of body motion. An example method of motion recognition includes receiving skeletal motion data representative of a user data motion feature from a capture device relating to a position of a user within a scene. A cross-correlation of the received skeletal motion data relative to a plurality of prototype motion features from a prototype motion feature database is determined. Likelihoods that the skeletal motion data corresponds to each of the plurality of prototype motion features are ranked. The likelihoods are determined using the cross-correlation. A classifying operation is performed on a subset of the plurality of prototype motion features. The subset of the plurality of prototype motion features is chosen because its members have the relatively highest likelihoods of corresponding to the skeletal motion data.
Type: Grant
Filed: February 18, 2011
Date of Patent: June 24, 2014
Assignee: Microsoft Corporation
Inventors: Darko Kirovski, Michail Raptis
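The rank-then-classify flow in the abstract for patent 8761437 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: each motion feature is simplified to a single 1-D sequence, the likelihood is taken to be the peak normalized cross-correlation over all lags, and the final classifying operation is replaced by a stand-in nearest-match rule. All function names and the prototype labels used with them are hypothetical.

```python
import math


def _center(seq):
    """Subtract the mean so the correlation ignores constant offsets."""
    mean = sum(seq) / len(seq)
    return [v - mean for v in seq]


def peak_cross_correlation(a, b):
    """Peak normalized cross-correlation of two 1-D sequences over all lags."""
    a, b = _center(a), _center(b)
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0.0:
        return 0.0  # a constant sequence carries no motion to correlate
    best = -1.0
    for lag in range(-(len(b) - 1), len(a)):
        overlap = sum(a[i] * b[i - lag]
                      for i in range(len(a)) if 0 <= i - lag < len(b))
        best = max(best, overlap / norm)
    return best


def rank_prototypes(motion, prototypes):
    """Return (name, score) pairs sorted from most to least likely."""
    scores = [(name, peak_cross_correlation(motion, proto))
              for name, proto in prototypes.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def shortlist(motion, prototypes, k):
    """Keep the k prototypes with the highest correlation-based scores."""
    return [name for name, _ in rank_prototypes(motion, prototypes)[:k]]


def classify(motion, prototypes, k):
    """Stand-in classifier, applied only to the shortlisted subset.

    Assumption: picks the shortlisted prototype with the smallest squared
    distance to the motion after centering both (zero lag, zip-truncated).
    """
    centered = _center(motion)

    def distance(name):
        proto = _center(prototypes[name])
        return sum((x - y) ** 2 for x, y in zip(centered, proto))

    return min(shortlist(motion, prototypes, k), key=distance)
```

Restricting the final classifier to the top `k` candidates means the more expensive decision only has to separate a few plausible prototypes instead of the whole database.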
-
Publication number: 20120214594
Abstract: Human body motion is represented by a skeletal model derived from image data of a user. Skeletal model data may be used to perform motion recognition and/or similarity analysis of body motion. An example method of motion recognition includes receiving skeletal motion data representative of a user data motion feature from a capture device relating to a position of a user within a scene. A cross-correlation of the received skeletal motion data relative to a plurality of prototype motion features from a prototype motion feature database is determined. Likelihoods that the skeletal motion data corresponds to each of the plurality of prototype motion features are ranked. The likelihoods are determined using the cross-correlation. A classifying operation is performed on a subset of the plurality of prototype motion features. The subset of the plurality of prototype motion features is chosen because its members have the relatively highest likelihoods of corresponding to the skeletal motion data.
Type: Application
Filed: February 18, 2011
Publication date: August 23, 2012
Applicant: Microsoft Corporation
Inventors: Darko Kirovski, Michail Raptis