Abstract: Methods and systems for online domain adaptation for multi-object tracking. Video of an area of interest can be captured with an image-capturing unit. The video (e.g., video images) can be analyzed with a pre-trained object detector utilizing online domain adaptation including convex multi-task learning and an associated self-tuning stochastic optimization procedure to jointly adapt online all trackers associated with the pre-trained object detector and a pre-trained category-level model from the trackers in order to efficiently track a plurality of objects in the video captured by the image-capturing unit.
Abstract: A method for generating a system for predicting saliency in an image and method of use of the prediction system are described. Attention maps for each of a set of training images are used to train the system. The training includes passing the training images through a neural network and optimizing an objective function over the training set which is based on a distance measure computed between a first probability distribution computed for a saliency map output by the neural network and a second probability distribution computed for the attention map for the respective training image. The trained neural network is suited to generation of saliency maps for new images.
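The training objective described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two probability distributions are formed or which distance measure is used, so the softmax normalization of the network output, the sum normalization of the attention map, and the Kullback-Leibler divergence below are all assumptions chosen for illustration, as are the function names.

```python
import math

def softmax(values):
    # Turn raw network outputs into a probability distribution (assumed normalization).
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(values):
    # Turn a non-negative attention map into a probability distribution.
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q); one possible distance measure, not necessarily the patented one.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def saliency_loss(predicted_map, attention_map):
    # Flatten both 2-D maps, convert each to a distribution, and compare them.
    predicted = softmax([v for row in predicted_map for v in row])
    target = normalize([v for row in attention_map for v in row])
    return kl_divergence(target, predicted)

# Hypothetical 2x2 example: the attention map puts all mass on the top-right pixel.
attention = [[0.0, 1.0], [0.0, 0.0]]
good_prediction = [[0.0, 5.0], [0.0, 0.0]]
bad_prediction = [[5.0, 0.0], [0.0, 0.0]]
print(saliency_loss(good_prediction, attention) < saliency_loss(bad_prediction, attention))
```

In a real training loop this loss would be averaged over the training set and minimized with respect to the network weights; the sketch only shows the per-image comparison.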
Abstract: A computer-implemented video classification method and system are disclosed. The method includes receiving an input video including a sequence of frames. At least one transformation of the input video is generated, each transformation including a sequence of frames. For the input video and each transformation, local descriptors are extracted from the respective sequence of frames. The local descriptors of the input video and each transformation are aggregated to form an aggregated feature vector with a first set of processing layers learned using unsupervised learning. An output classification value is generated for the input video, based on the aggregated feature vector with a second set of processing layers learned using supervised learning.
Type:
Application
Filed:
August 18, 2016
Publication date:
February 22, 2018
Applicant:
Xerox Corporation
Inventors:
César Roberto De Souza, Adrien Gaidon, Eleonora Vig, Antonio M. Lopez
Abstract: A computer-implemented video classification method and system are disclosed. The method includes receiving an input video including a sequence of frames. At least one transformation of the input video is generated, each transformation including a sequence of frames. For the input video and each transformation, local descriptors are extracted from the respective sequence of frames. The local descriptors of the input video and each transformation are aggregated to form an aggregated feature vector with a first set of processing layers learned using unsupervised learning. An output classification value is generated for the input video, based on the aggregated feature vector with a second set of processing layers learned using supervised learning.
Type:
Grant
Filed:
August 18, 2016
Date of Patent:
April 17, 2018
Assignee:
XEROX CORPORATION
Inventors:
César Roberto De Souza, Adrien Gaidon, Eleonora Vig, Antonio M. Lopez
Abstract: A system and method are suited for assessing video performance analysis. A computer graphics engine clones real-world data in a virtual world by decomposing the real-world data into visual components and objects in one or more object categories and populates the virtual world with virtual visual components and virtual objects. A scripting component controls the virtual visual components and the virtual objects in the virtual world based on the set of real-world data. A synthetic clone of the video sequence is generated based on the script controlling the virtual visual components and the virtual objects. The real-world data is compared with the synthetic clone of the video sequence and a transferability of conclusions from the virtual world to the real world is assessed based on this comparison.
Abstract: Systems and techniques are generally described for generating visually blended recommendation grids. In some examples, a selection of a first item and a second item displayed on a display may be received. In various examples, the first item may be displayed in a first element of a grid and the second item may be displayed in a second element of the grid. In some examples, a third element of the grid that is disposed between the first element and the second element along an axis of the grid may be determined. In various examples, a third item may be determined for display in the third element of the grid based at least in part on a blended representation of an embedding of the first item and an embedding of the second item. The third item may be displayed in the third element of the grid.
Type:
Grant
Filed:
September 28, 2020
Date of Patent:
August 16, 2022
Assignee:
AMAZON TECHNOLOGIES, INC.
Inventors:
Loris Bazzani, Filip Saina, Amaia Salvador Aguilera, Angel Noe Martinez Gonzalez, Eleonora Vig, Erhan Gundogdu, Michael Donoser
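The blended-representation step in the recommendation-grid abstract above can be sketched as follows. The abstract only states that the third item is chosen based on a blended representation of the two selected items' embeddings; the linear interpolation by grid position, the cosine-similarity lookup, the catalog layout, and the function names here are assumptions made for illustration.

```python
import math

def blend(a, b, t):
    # Linear interpolation between two embeddings (assumed blending rule).
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def fill_between(catalog, first_id, second_id, n_between):
    # Pick one catalog item for each grid element lying between the two selections.
    a, b = catalog[first_id], catalog[second_id]
    used = {first_id, second_id}
    chosen = []
    for k in range(1, n_between + 1):
        t = k / (n_between + 1)  # position of this element along the grid axis
        target = blend(a, b, t)
        candidates = [item for item in catalog if item not in used]
        if not candidates:
            break
        best = max(candidates, key=lambda item: cosine(catalog[item], target))
        chosen.append(best)
        used.add(best)
    return chosen

# Hypothetical 2-D embeddings, for illustration only.
catalog = {
    "red": [1.0, 0.0],
    "blue": [0.0, 1.0],
    "purple": [0.7, 0.7],
    "green": [-1.0, 0.2],
}
print(fill_between(catalog, "red", "blue", 1))
```

Weighting the blend by grid position means elements closer to the first selection resemble it more, which is one plausible reading of "disposed between the first element and the second element along an axis of the grid".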
Abstract: A graphical user interface (GUI) of a business process management (BPM) system is provided to construct a process model that is displayed on a graphical display device as a graphical representation comprising nodes representing process events, activities, or decision points and including computer vision (CV) nodes representing video stream processing, with flow connectors defining operational sequences of nodes and data flow between nodes of the process model. The process model is executed to perform a process represented by the process model including executing CV nodes of the process model by performing video stream processing represented by the CV nodes of the process model. The available CV nodes include a set of video pattern detection nodes, and a set of video pattern relation nodes defining a video grammar of relations between video patterns detectable by the video pattern detection nodes.
Type:
Application
Filed:
April 27, 2015
Publication date:
October 27, 2016
Inventors:
Adrian Corneliu Mos, Adrien Gaidon, Eleonora Vig
Abstract: Systems and techniques are generally described for attribute-based content selection and search. In some examples, a graphical user interface (GUI) may display an image of a first product comprising a plurality of visual attributes. In some further examples, the GUI may display at least a first control button with data identifying a first visual attribute of the plurality of visual attributes. In some cases, a first selection of the first control button may be received. In some examples, a first plurality of products may be determined based at least in part on the first selection of the first control button. The first plurality of products may be determined based on a visual similarity to the first product, and a visual dissimilarity to the first product with respect to the first visual attribute. In some examples, the first plurality of products may be displayed on the GUI.
Type:
Grant
Filed:
June 29, 2021
Date of Patent:
November 28, 2023
Assignee:
AMAZON TECHNOLOGIES, INC.
Inventors:
Loris Bazzani, Michael Donoser, Yuxin Hou, Eleonora Vig
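The selection rule in the attribute-based search abstract above (similar to the first product overall, dissimilar with respect to the selected attribute) can be sketched as a simple ranking. The abstract does not specify how similarity is measured or combined, so the per-attribute embeddings, the cosine similarity, the additive score, and all names below are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_attribute_change(query, products, attribute, top_k=3):
    # query and each product map attribute name -> embedding (assumed representation).
    scored = []
    for product_id, attrs in products.items():
        others = [cosine(query[name], attrs[name]) for name in query if name != attribute]
        keep = sum(others) / len(others) if others else 0.0    # similarity on the rest
        change = 1.0 - cosine(query[attribute], attrs[attribute])  # dissimilarity on the selected attribute
        scored.append((keep + change, product_id))
    scored.sort(reverse=True)
    return [product_id for _, product_id in scored[:top_k]]

# Hypothetical products described by two visual attributes.
query = {"color": [1.0, 0.0], "shape": [1.0, 0.0]}
products = {
    "same": {"color": [1.0, 0.0], "shape": [1.0, 0.0]},
    "recolored": {"color": [0.0, 1.0], "shape": [1.0, 0.0]},
    "different": {"color": [0.0, 1.0], "shape": [0.0, 1.0]},
}
print(rank_by_attribute_change(query, products, "color", top_k=1))
```

Here the product that keeps the shape but changes the color ranks first when the "color" control is selected.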
Abstract: A tracking system and method are suited to tracking multiple objects of different categories in a video sequence. A sequence of video frames is received and a set of windows is extracted from each frame in turn, based on a computed probability that the respective window contains an object, without reference to any specific category of object. For each of these windows, a feature representation is extracted. A trained detector for a selected category detects windows that constitute targets in that category, based on the respective feature representations. More than one detector can be used when there is more than one category of objects to be tracked. A target-specific appearance model is generated for each of the targets (e.g., learned or updated, if the target is present in a prior frame). The detected targets are tracked over one or more subsequent frames based on the target-specific appearance models of the targets.