Patents by Inventor Stefano Soatto

Stefano Soatto has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Hierarchical graph neural networks for visual clustering

Patent number: 11860977

Abstract: Techniques for performing visual clustering with a hierarchical graph neural network framework including a joint linkage prediction and density estimation graph model are described. Embodiments herein recurrently run the joint linkage prediction and density estimation graph model to generate intermediate clusters in multiple iterations (e.g., until convergence) to obtain a final clustering result. In certain embodiments, for each iteration, the input graph contains nodes that are merged from nodes assigned to intermediate clusters from the previous iteration. By using a small and fixed bandwidth k in each iteration, embodiments herein alleviate the sensitivity to the k selection for different clustering applications. Certain embodiments herein remove the tuning of a different k (e.g., k-bandwidth) for k-nearest neighbor graph construction over different clustering applications.
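As a rough illustration of the iterative scheme in the abstract above, the following sketch rebuilds a k-nearest-neighbor graph with a small fixed k at every level, links nodes, merges each connected component into one node, and repeats until nothing merges. It is not the patented model: the learned joint linkage prediction and density estimation step is replaced here by a plain distance threshold, and the names `knn_edges`, `connected_components`, `hierarchical_cluster`, and `link_threshold` are hypothetical.

```python
import math

def knn_edges(points, k):
    """Directed edges (i, j, distance) from each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        neighbors = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j, d) for d, j in neighbors[:k])
    return edges

def connected_components(n, edges):
    """Union-find over undirected edges; returns a root id for each of n nodes."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in edges:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def hierarchical_cluster(points, k=1, link_threshold=1.0):
    """Return one cluster label per input point.

    Assumption: an edge is kept when its length is below link_threshold.
    This stands in for the learned linkage/density model in the abstract.
    """
    labels = list(range(len(points)))  # current node index of each original point
    nodes = [tuple(p) for p in points]
    while True:
        kept = [(i, j) for i, j, d in knn_edges(nodes, k) if d < link_threshold]
        roots = connected_components(len(nodes), kept)
        ids = {root: c for c, root in enumerate(sorted(set(roots)))}
        if len(ids) == len(nodes):  # nothing merged: converged
            return labels
        labels = [ids[roots[node]] for node in labels]
        members = {}
        for node, root in zip(nodes, roots):
            members.setdefault(ids[root], []).append(node)
        # Each merged node becomes the (unweighted) mean of the nodes it absorbs.
        nodes = [tuple(sum(c) / len(c) for c in zip(*members[cid]))
                 for cid in range(len(ids))]

points = [(0.0,), (0.4,), (0.9,), (1.3,), (5.0,), (5.3,)]
labels = hierarchical_cluster(points, k=1, link_threshold=1.0)
```

With k = 1 the first pass only pairs up immediate neighbors; the first four points end up in one cluster because their merged nodes are linked in the second pass, which is the sense in which repeated merging reduces the dependence on a large k.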

Type: Grant

Filed: May 4, 2021

Date of Patent: January 2, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Yifan Xing, Tianjun Xiao, Tong He, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Paul Wipf, Zheng Zhang, Stefano Soatto

Automated model selection for network-based image recognition service

Patent number: 11429813

Abstract: This disclosure describes automatically selecting and training one or more models for image recognition based upon training and testing (validation) data provided by a user. A service provider network includes a recognition service that may use models to process images and videos to recognize objects in the images and videos, features on the objects in the images and videos, and/or locate objects in the images and videos. The service provider network also includes a model selection and training service that may select one or more modeling techniques based on the objectives of the user and/or the amount of data provided by the user. Based on the selected modeling technique, the model selection and training service selects and trains one or more models for use by the recognition service to process images and videos using the training data. The trained model may be tested and validated using the testing data.

Type: Grant

Filed: November 27, 2019

Date of Patent: August 30, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Avinash Aghoram Ravichandran, Rahul Bhotika, Stefano Soatto, Pietro Perona, Hao Yang

Auto-annotation techniques for text localization

Patent number: 11257006

Abstract: Techniques for auto-generation of annotated real-world training data are described. An electronic document is analyzed to determine text represented in the document and corresponding locations of the text. A representation of the electronic document is modified to include markers and printed. The printed document is photographed in real-world environments, and the markers within the digital photographs are analyzed to allow for the depiction of the document within the photographs to be rectified. The text and location data are used to annotate the rectified images.

Type: Grant

Filed: November 20, 2018

Date of Patent: February 22, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Oron Anschel, Amit Adam, Shahar Tsiper, Hadar Averbuch Elor, Shai Mazor, Rahul Bhotika, Stefano Soatto

Backward compatible and backfill-free image search system

Patent number: 11216697

Abstract: Techniques for building a backward compatible and backfill-free image search system are described. According to some embodiments, a backwards compatible training system trains a new embedding model to be backward compatible with the face embeddings (e.g., floating-point vectors) generated by a previous embedding model. In one embodiment, backwards compatible training uses a classifier of the previous embedding model as a form of constraint in the training of the new embedding model.
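One way to read "uses a classifier of the previous embedding model as a form of constraint" is as an extra loss term that scores the new embedding with the old classifier. The sketch below assumes linear classifiers, a softmax cross-entropy loss, and a simple weighted sum of the two terms; none of these details come from the abstract, and `backward_compatible_loss` and its `weight` parameter are hypothetical names.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of the softmax over logits against an integer label."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]

def linear_logits(weights, embedding):
    """One logit per class: dot product of each weight row with the embedding."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]

def backward_compatible_loss(new_embedding, label, new_classifier,
                             old_classifier, weight=1.0):
    """New-model loss plus a penalty from the frozen old classifier.

    Assumption: the old classifier is kept fixed and applied directly to the
    new embedding, so both embeddings must share a dimension.
    """
    new_term = softmax_cross_entropy(linear_logits(new_classifier, new_embedding), label)
    old_term = softmax_cross_entropy(linear_logits(old_classifier, new_embedding), label)
    return new_term + weight * old_term

old_classifier = [[1.0, 0.0], [0.0, 1.0]]
new_classifier = [[2.0, 0.0], [0.0, 2.0]]
total = backward_compatible_loss([1.0, 0.0], 0, new_classifier, old_classifier)
new_only = backward_compatible_loss([1.0, 0.0], 0, new_classifier, old_classifier, weight=0.0)
```

Minimizing the second term pushes new embeddings toward regions the old classifier already separates, which is what would let old and new embeddings be compared without re-indexing (backfilling) the gallery.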

Type: Grant

Filed: March 11, 2020

Date of Patent: January 4, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Yantao Shen, Yuanjun Xiong, Siqi Deng, Wei Xia, Shuo Yang, Yifan Xing, Wei Li, Stefano Soatto

Grammar-based automated generation of annotated synthetic form training data for machine learning

Patent number: 10970530

Abstract: Techniques for grammar-based automated generation of annotated synthetic form training data for machine learning are described. A training data generation engine utilizes a defined grammar to construct a layout for a form, select key-value units to place within the layout, and select attribute variants for the key-value units. The form is rendered and stored at a storage location, where it can be provided along with other similarly-generated forms to be used as training data for a machine learning model.
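The three steps named in the abstract above (build a layout from a grammar, place key-value units, choose attribute variants) can be sketched with a toy grammar. The grammar, field names, and variant options below are invented for illustration, and rendering and storage are left out entirely.

```python
import random

# Hypothetical toy grammar: each non-terminal maps to its possible productions.
GRAMMAR = {
    "form": [["title", "section"], ["title", "section", "section"]],
    "section": [["row"], ["row", "row"]],
    "row": [["field"], ["field", "field"]],
}
# Hypothetical key-value units and attribute variants.
FIELDS = {"Name": ["Ada Lovelace", "Alan Turing"], "Date": ["2018-11-13", "2021-04-06"]}
VARIANTS = {"font": ["serif", "sans"], "key_position": ["left", "above"]}

def expand(symbol, rng):
    """Recursively expand a grammar symbol into a nested layout tree."""
    if symbol not in GRAMMAR:
        return symbol  # terminal such as "title" or "field"
    production = rng.choice(GRAMMAR[symbol])
    return {symbol: [expand(s, rng) for s in production]}

def fill(node, rng, annotations):
    """Replace each "field" terminal with a key-value unit and record it."""
    if node == "field":
        key = rng.choice(sorted(FIELDS))
        unit = {"key": key, "value": rng.choice(FIELDS[key])}
        unit.update({name: rng.choice(options) for name, options in VARIANTS.items()})
        annotations.append(unit)
        return unit
    if isinstance(node, dict):
        return {name: [fill(child, rng, annotations) for child in children]
                for name, children in node.items()}
    return node

def generate_form(seed):
    """Return (layout, annotations) for one synthetic form."""
    rng = random.Random(seed)
    annotations = []
    layout = fill(expand("form", rng), rng, annotations)
    return layout, annotations

layout, annotations = generate_form(7)
```

Because the annotations are produced by the same process that places the content, every generated form comes with ground-truth labels at no extra cost.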

Type: Grant

Filed: November 13, 2018

Date of Patent: April 6, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Amit Adam, Oron Anschel, Or Perel, Gal Sabina Star, Omri Ben-Eliezer, Hadar Averbuch Elor, Shai Mazor, Wendy Tse, Andrea Olgiati, Rahul Bhotika, Stefano Soatto

Prototypical network algorithms for few-shot learning

Patent number: 10963754

Abstract: Techniques for training an embedding using a limited training set are described. In some examples, the embedding is trained by generating a plurality of vectors from a random sample of the limited set of training data classes using a layer of the particular machine learning classification model, randomly selecting samples from the plurality of vectors into a set of samples, computing at least one distance for each sampled class from a center parameter for the class using the set of samples, generating a discrete probability distribution over the classes for a query point based on distances to a center parameter for each of the classes in the embedding space, calculating a loss value for the modified prototypical network, the calculation of the loss value being for a fixed geometry of the embedding space and including a measure of the difference between distributions, and back propagating.
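The core computation in the abstract above, a discrete distribution over classes derived from distances to per-class centers, can be sketched as follows. This assumes the standard prototypical-network choices (class mean as the center, squared Euclidean distance, softmax, cross-entropy loss); the patent's "modified" network and its fixed-geometry loss may differ, and the function names are hypothetical.

```python
import math

def class_centers(support):
    """Mean embedding per class; `support` maps label -> list of vectors."""
    centers = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        centers[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return centers

def class_probabilities(query, centers):
    """Softmax over negative squared Euclidean distances to each class center."""
    logits = {label: -sum((q - c) ** 2 for q, c in zip(query, center))
              for label, center in centers.items()}
    top = max(logits.values())
    exps = {label: math.exp(z - top) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def cross_entropy(probabilities, true_label):
    """Loss that would be back-propagated through the embedding network."""
    return -math.log(probabilities[true_label])

support = {"cat": [[0.0, 0.0], [0.2, 0.0]], "dog": [[1.0, 1.0], [1.0, 0.8]]}
centers = class_centers(support)
probs = class_probabilities([0.1, 0.1], centers)
loss = cross_entropy(probs, "cat")
```

In training, the vectors would come from a layer of the classification model and the loss gradient would update that model; here they are fixed lists so the arithmetic is easy to follow.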

Type: Grant

Filed: September 27, 2018

Date of Patent: March 30, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Avinash Aghoram Ravichandran, Paulo Ricardo dos Santos Mendonca, Rahul Bhotika, Stefano Soatto

Automated form understanding via layout agnostic identification of keys and corresponding values

Patent number: 10878234

Abstract: Techniques for automated form understanding via layout-agnostic identification of keys and corresponding values are described. An embedding generator creates embeddings of pixels from an image including a representation of a form. The generated embeddings are similar for pixels within a same key-value unit, and far apart for pixels not in a same key-value unit. A weighted bipartite graph is constructed including a first set of nodes corresponding to keys of the form and a second set of nodes corresponding to values of the form. Weights for the edges are determined based on an analysis of distances between ones of the embeddings. The graph is partitioned according to a scheme to identify pairings between the first set of nodes and the second set of nodes that produces a minimum overall edge weight. The pairings indicate keys and values that are associated within the form.
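The pairing step in the abstract above amounts to a minimum-weight matching on a bipartite graph. The sketch below assumes each key and value has already been reduced to a single embedding vector (for example, an average of its pixel embeddings) and uses Euclidean distance as the edge weight; both are assumptions, as is the brute-force search, which is only practical for a handful of nodes.

```python
import math
from itertools import permutations

def pair_keys_with_values(key_embeddings, value_embeddings):
    """Return (pairs, total_weight) minimizing the summed key-value distance.

    Tries every assignment, so it assumes a small form with at least as many
    values as keys.
    """
    keys = list(key_embeddings)
    values = list(value_embeddings)
    best_pairs, best_weight = None, math.inf
    for ordering in permutations(values, len(keys)):
        weight = sum(math.dist(key_embeddings[k], value_embeddings[v])
                     for k, v in zip(keys, ordering))
        if weight < best_weight:
            best_pairs, best_weight = dict(zip(keys, ordering)), weight
    return best_pairs, best_weight

key_embeddings = {"Name": [0.0, 0.0], "Date": [5.0, 5.0]}
value_embeddings = {"Alice": [0.5, 0.0], "2018-11-20": [5.0, 4.5]}
pairs, weight = pair_keys_with_values(key_embeddings, value_embeddings)
```

For realistic form sizes, a polynomial-time assignment solver such as the Hungarian algorithm would replace the exhaustive loop.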

Type: Grant

Filed: November 20, 2018

Date of Patent: December 29, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Amit Adam, Oron Anschel, Hadar Averbuch Elor, Shai Mazor, Gal Sabina Star, Or Perel, Wendy Tse, Andrea Olgiati, Rahul Bhotika, Stefano Soatto

Layout-agnostic clustering-based classification of document keys and values

Patent number: 10872236

Abstract: Techniques for layout-agnostic clustering-based classification of document keys and values are described. A key-value differentiation unit generates feature vectors corresponding to text elements of a form represented within an electronic image using a machine learning (ML) model. The ML model was trained utilizing a loss function that separates keys from values. The feature vectors are clustered into at least two clusters, and a cluster is determined to include either keys of the form or values of the form via identifying neighbors between feature vectors of the cluster(s) with labeled feature vectors.

Type: Grant

Filed: September 28, 2018

Date of Patent: December 22, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Hadar Averbuch Elor, Oron Anschel, Or Perel, Amit Adam, Shai Mazor, Rahul Bhotika, Stefano Soatto

Structured document analyzer

Patent number: 10839245

Abstract: A structured document analyzer that associates keys and values in structured documents based on key, value, and key-value container bounding boxes. A trained machine learning model analyzes images of structured documents to determine bounding boxes for keys, values, and key-value containers in the images with confidence scores for the classifications. For each image, duplicate bounding boxes are removed, and then a set of key-value containers are selected and sorted based on the confidence scores. For each key-value container, a best key and value are determined for the container based on overlap of the key and value bounding boxes with the container bounding box and the confidence scores. Optical character recognition may be performed on the image to determine text for the keys and values.
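The post-processing described in the abstract above (remove duplicate boxes, sort containers by confidence, pick the best key and value per container) can be sketched as below. The detection format, the IoU threshold of 0.8, and the ranking rule (fraction of the box inside the container times its confidence) are assumptions made for illustration; the text fields stand in for OCR output.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def intersection(a, b):
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))

def iou(a, b):
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)

def dedupe(detections, threshold=0.8):
    """Keep the highest-scoring box among same-label boxes that overlap heavily."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(k["label"] != det["label"] or iou(k["box"], det["box"]) <= threshold
               for k in kept):
            kept.append(det)
    return kept

def best_match(container, candidates):
    """Candidate with the highest (fraction inside container) * confidence."""
    def rank(d):  # assumes every candidate box has positive area
        return intersection(d["box"], container["box"]) / area(d["box"]) * d["score"]
    best = max(candidates, key=rank, default=None)
    return best if best is not None and rank(best) > 0 else None

def associate(detections):
    """Return (key text, value text) pairs, highest-confidence containers first."""
    dets = dedupe(detections)
    by_label = lambda name: [d for d in dets if d["label"] == name]
    pairs = []
    for container in by_label("container"):  # dedupe already sorted by score
        key = best_match(container, by_label("key"))
        value = best_match(container, by_label("value"))
        if key and value:
            pairs.append((key["text"], value["text"]))
    return pairs

detections = [
    {"label": "container", "box": (0, 0, 100, 20), "score": 0.9},
    {"label": "container", "box": (1, 0, 100, 20), "score": 0.6},  # near duplicate
    {"label": "key", "box": (2, 2, 30, 18), "score": 0.8, "text": "Name"},
    {"label": "value", "box": (35, 2, 95, 18), "score": 0.7, "text": "Ada"},
    {"label": "container", "box": (0, 30, 100, 50), "score": 0.85},
    {"label": "key", "box": (2, 32, 30, 48), "score": 0.9, "text": "Date"},
    {"label": "value", "box": (35, 32, 95, 48), "score": 0.75, "text": "2019-03-25"},
]
pairs = associate(detections)
```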

Type: Grant

Filed: March 25, 2019

Date of Patent: November 17, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Guneet Singh Dhillon, Vijay Mahadevan, Yuting Zhang, Meng Wang, Gangadhar Payyavula, Viet Cuong Nguyen, Rahul Bhotika, Stefano Soatto

Multiple object tracking in video by combining neural networks within a Bayesian framework

Patent number: 10762644

Abstract: Techniques for multiple object tracking in video are described in which the outputs of neural networks are combined within a Bayesian framework. A motion model is applied to a probability distribution representing the estimated current state of a target object being tracked to predict the state of the target object in the next frame. A state of an object can include one or more features, such as the location of the object in the frame, a velocity and/or acceleration of the object across frames, a classification of the object, etc. The prediction of the state of the target object in the next frame is adjusted by a score based on the combined outputs of neural networks that process the next frame.
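The predict-then-adjust cycle in the abstract above follows the shape of a recursive Bayesian filter. The sketch below reduces the state to a position on a one-dimensional grid, uses a hand-written displacement distribution as the motion model, and combines the network outputs by multiplying their per-position scores; all three are simplifying assumptions, and the two score lists are made-up numbers rather than real network outputs.

```python
def predict(belief, motion_kernel):
    """Apply the motion model; motion_kernel maps displacement -> probability."""
    n = len(belief)
    predicted = [0.0] * n
    for position, mass in enumerate(belief):
        for step, p in motion_kernel.items():
            target = min(max(position + step, 0), n - 1)  # clamp at frame edges
            predicted[target] += mass * p
    return predicted

def combine_scores(*network_outputs):
    """Assumed combination rule: element-wise product of per-position scores."""
    combined = [1.0] * len(network_outputs[0])
    for scores in network_outputs:
        combined = [c * s for c, s in zip(combined, scores)]
    return combined

def update(predicted, scores):
    """Reweight the prediction by the combined scores and renormalize."""
    weighted = [p * s for p, s in zip(predicted, scores)]
    total = sum(weighted)
    return [w / total for w in weighted]

belief = [0.0, 1.0, 0.0, 0.0, 0.0]           # target known to be at cell 1
motion_kernel = {0: 0.2, 1: 0.6, 2: 0.2}     # target tends to move right
predicted = predict(belief, motion_kernel)
detector_scores = [0.1, 0.1, 0.3, 0.8, 0.1]    # hypothetical detection network
appearance_scores = [0.5, 0.5, 0.5, 0.9, 0.5]  # hypothetical appearance network
posterior = update(predicted, combine_scores(detector_scores, appearance_scores))
```

Here the motion model alone favors cell 2, but the combined scores for the next frame shift the estimate to cell 3, which is the kind of adjustment the abstract describes.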

Type: Grant

Filed: December 13, 2018

Date of Patent: September 1, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Vijay Mahadevan, Stefano Soatto

VISUAL-INERTIAL SENSOR FUSION FOR NAVIGATION, LOCALIZATION, MAPPING, AND 3D RECONSTRUCTION

Publication number: 20190236399

Abstract: A new method for improving the robustness of visual-inertial integration systems (VINS) based on derivation of optimal discriminants for outlier rejection, and the consequent approximations, that are both conceptually and empirically superior to other outlier detection schemes used in this context. It should be appreciated that VINS is central to a number of application areas including augmented reality (AR), virtual reality (VR), robotics, autonomous vehicles, autonomous flying robots, and so forth and their related hardware including mobile phones, such as for use in indoor localization (in GPS-denied areas), and the like.

Type: Application

Filed: August 9, 2018

Publication date: August 1, 2019

Applicant: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA

Inventors: Stefano Soatto, Konstantine Tsotsos

DSP-SIFT: DOMAIN-SIZE POOLING FOR IMAGE DESCRIPTORS FOR IMAGE MATCHING AND OTHER APPLICATIONS

Publication number: 20170243084

Abstract: A variation of scale-invariant feature transform (SIFT) based on pooling gradient orientations across different domain sizes, in addition to spatial locations. The resulting descriptor is called DSP-SIFT, and it outperforms other methods in wide-baseline matching benchmarks, including those based on convolutional neural networks, despite having the same dimension of SIFT and requiring no training. Problems of local representation of imaging data are also addressed as computation of minimal sufficient statistics that are invariant to nuisance variability induced by viewpoint and illumination. A sampling-based and a point-estimate based approximation of such representations are described.

Type: Application

Filed: November 7, 2016

Publication date: August 24, 2017

Applicant: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA

Inventors: Stefano Soatto, Jingming Dong

Spirometry system and disposable volume spirometry apparatus

Patent number: 9456768

Abstract: A disposable volume spirometry apparatus and non-contact measurement system is described. The spirometry system includes an expandable disposable volume spirometry apparatus, a remote non-contact sensor, memory, and a processor. The remote non-contact sensor captures images associated with the expandable disposable volume spirometry apparatus. The memory stores the captured images. The processor is operatively coupled to the memory and determines a volume for the expandable disposable volume spirometry apparatus by analyzing the captured images.

Type: Grant

Filed: August 18, 2011

Date of Patent: October 4, 2016

Inventors: Stefano Soatto, Giuseppe Torresin

End-to-end visual recognition system and methods

Patent number: 9418317

Abstract: We describe an end-to-end visual recognition system, where “end-to-end” refers to the ability of the system of performing all aspects of the system, from the construction of “maps” of scenes, or “models” of objects from training data, to the determination of the class, identity, location and other inferred parameters from test data. Our visual recognition system is capable of operating on a mobile hand-held device, such as a mobile phone, tablet or other portable device equipped with sensing and computing power. Our system employs a video based feature descriptor, and we characterize its invariance and discriminative properties. Feature selection and tracking are performed in real-time, and used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system scores objects in the field of view based on their ranking.

Type: Grant

Filed: April 4, 2014

Date of Patent: August 16, 2016

Assignee: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA

Inventors: Stefano Soatto, Taehee Lee

VISUAL-INERTIAL SENSOR FUSION FOR NAVIGATION, LOCALIZATION, MAPPING, AND 3D RECONSTRUCTION

Publication number: 20160140729

Abstract: A new method for improving the robustness of visual-inertial integration systems (VINS) based on derivation of optimal discriminants for outlier rejection, and the consequent approximations, that are both conceptually and empirically superior to other outlier detection schemes used in this context. It should be appreciated that VINS is central to a number of application areas including augmented reality (AR), virtual reality (VR), robotics, autonomous vehicles, autonomous flying robots, and so forth and their related hardware including mobile phones, such as for use in indoor localization (in GPS-denied areas), and the like.

Type: Application

Filed: November 4, 2015

Publication date: May 19, 2016

Applicant: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA

Inventors: Stefano Soatto, Konstantine Tsotsos

END-TO-END VISUAL RECOGNITION SYSTEM AND METHODS

Publication number: 20140301635

Abstract: We describe an end-to-end visual recognition system, where “end-to-end” refers to the ability of the system of performing all aspects of the system, from the construction of “maps” of scenes, or “models” of objects from training data, to the determination of the class, identity, location and other inferred parameters from test data. Our visual recognition system is capable of operating on a mobile hand-held device, such as a mobile phone, tablet or other portable device equipped with sensing and computing power. Our system employs a video based feature descriptor, and we characterize its invariance and discriminative properties. Feature selection and tracking are performed in real-time, and used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system scores objects in the field of view based on their ranking.

Type: Application

Filed: April 4, 2014

Publication date: October 9, 2014

Applicant: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA

Inventors: Stefano Soatto, Taehee Lee

End-to-end visual recognition system and methods

Patent number: 8717437

Abstract: We describe an end-to-end visual recognition system, where “end-to-end” refers to the ability of the system of performing all aspects of the system, from the construction of “maps” of scenes, or “models” of objects from training data, to the determination of the class, identity, location and other inferred parameters from test data. Our visual recognition system is capable of operating on a mobile hand-held device, such as a mobile phone, tablet or other portable device equipped with sensing and computing power. Our system employs a video based feature descriptor, and we characterize its invariance and discriminative properties. Feature selection and tracking are performed in real-time, and used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system scores objects in the field of view based on their ranking.

Type: Grant

Filed: January 7, 2013

Date of Patent: May 6, 2014

Assignee: The Regents of the University of California

Inventors: Stefano Soatto, Taehee Lee

SPIROMETRY SYSTEM AND DISPOSABLE VOLUME SPIROMETRY APPARATUS

Publication number: 20120046568

Abstract: A disposable volume spirometry apparatus and non-contact measurement system is described. The spirometry system includes an expandable disposable volume spirometry apparatus, a remote non-contact sensor, memory, and a processor. The remote non-contact sensor captures images associated with the expandable disposable volume spirometry apparatus. The memory stores the captured images. The processor is operatively coupled to the memory and determines a volume for the expandable disposable volume spirometry apparatus by analyzing the captured images.

Type: Application

Filed: August 18, 2011

Publication date: February 23, 2012

Inventors: Stefano Soatto, Giuseppe Torresin

Method and system for selecting and designing eyeglass frames

Patent number: 6944327

Abstract: The present invention provides a method and apparatus for designing and visualizing the shape of eyeglass lenses and of the front rims of eyeglass frames, and for allowing the customer to modify the design by changing the shape and style interactively to suit his/her preference and perceived character. The present invention comprises an interface for a lens grinding machine that allows the retailer to manufacture the selected frame at the store for certain specified styles, or allows the retailer to transmit the shape and style data to a manufacturer that can implement the selected design and deliver it directly to the customer. Another embodiment of the present invention is to provide a method for designing, visualizing, modifying the shape and style of eyeglass lenses and frames remotely through the Internet or other communication channel, and to transmit the design to a manufacturer, which enables the customer to select and purchase eyeglasses directly from manufacturers.

Type: Grant

Filed: November 3, 2000

Date of Patent: September 13, 2005

Inventor: Stefano Soatto