Patents Assigned to Scale AI, Inc.

Automatic data curation

Patent number: 12271443

Abstract: One embodiment of the present invention sets forth a technique for curating a data sample set. The technique includes determining one or more data sampling criteria based on a sampling objective for a data sample set associated with a machine learning model. The technique also includes selecting, from a set of unlabeled data samples, at least one data sample to be labeled and added to the data sample set based on the one or more data sampling criteria. The technique also includes, for each selected data sample, supplementing the data sample set with the selected data sample and at least one association with a label.

Type: Grant

Filed: September 23, 2021

Date of Patent: April 8, 2025

Assignee: SCALE AI, INC.

Inventors: Diego Ardila, Russell Kaplan, Vinjai Saraj Vale, Jihan Yin

Unique sampling of datasets

Patent number: 12259865

Abstract: One embodiment of the present invention sets forth a technique for sampling from a dataset. The technique includes determining a plurality of embeddings for a plurality of data points included in the dataset. The technique also includes populating a tree structure with the plurality of embeddings by generating a first node that stores a first set of embeddings included in the plurality of embeddings and generating a first plurality of nodes as children of the first node, where each node in the first plurality of nodes stores a different subset of embeddings in the first set of embeddings. The technique further includes sampling a subset of embeddings from the plurality of embeddings via a traversal of the tree structure, and generating a sampled dataset that includes a subset of data points corresponding to the subset of embeddings.

Type: Grant

Filed: December 14, 2022

Date of Patent: March 25, 2025

Assignee: Scale AI, Inc.

Inventors: Jihan Yin, Chiao-Lun Cheng
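
The tree-based sampling described in the abstract of patent 12259865 can be illustrated with a minimal Python sketch. It is not the patented implementation. The median split on the widest dimension, the even division of the sampling quota among children, and the names `Node`, `build_tree`, and `sample` are all assumptions made for illustration; the abstract only states that child nodes store different subsets of the parent's embeddings and that sampling is done via a traversal.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    indices: list                      # indices of the embeddings stored at this node
    children: list = field(default_factory=list)


def build_tree(embeddings, indices=None, leaf_size=2):
    """Populate a tree in which each child holds a different subset of its parent."""
    if indices is None:
        indices = list(range(len(embeddings)))
    node = Node(indices)
    if len(indices) <= leaf_size:
        return node

    # Assumption: split at the median of the dimension with the widest spread.
    def spread(d):
        vals = [embeddings[i][d] for i in indices]
        return max(vals) - min(vals)

    axis = max(range(len(embeddings[0])), key=spread)
    ordered = sorted(indices, key=lambda i: embeddings[i][axis])
    mid = len(ordered) // 2
    node.children = [
        build_tree(embeddings, ordered[:mid], leaf_size),
        build_tree(embeddings, ordered[mid:], leaf_size),
    ]
    return node


def sample(node, k, rng):
    """Pick k distinct indices by walking the tree and splitting k among children."""
    if k <= 0:
        return []
    if k >= len(node.indices):
        return list(node.indices)
    if not node.children:
        return rng.sample(node.indices, k)
    picked, remaining = [], k
    # Visit smaller children first so an unused share rolls over to larger ones.
    children = sorted(node.children, key=lambda c: len(c.indices))
    for pos, child in enumerate(children):
        share = min(len(child.indices), remaining // (len(children) - pos))
        picked.extend(sample(child, share, rng))
        remaining -= share
    return picked


# Two well-separated clusters of toy 2-D "embeddings".
embeddings = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
tree = build_tree(embeddings)
picked = sample(tree, 4, random.Random(0))
sampled_dataset = [embeddings[i] for i in picked]
```

Because the quota is divided at every level, the four sampled points in this toy example are drawn evenly from both clusters rather than concentrating in one region of the embedding space.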

Key-value extraction from documents

Patent number: 12205395

Abstract: One embodiment of the present invention sets forth a technique for extracting data from a document. The technique includes determining a first set of features associated with the document, wherein the first set of features comprises a set of region proposals that bound one or more portions of text within the document. The technique also includes applying a first machine learning model to the first set of features to generate a set of predictions associated with one or more key-value pairs included in the document. The technique further includes extracting the one or more key-value pairs from the document based on the set of predictions.

Type: Grant

Filed: August 18, 2021

Date of Patent: January 21, 2025

Assignee: Scale AI, Inc.

Inventors: Adrian Yunpfei Lam, Chiao-Lun Cheng, Alexandre Matton

Automatic benchmarking of labeling tasks

Patent number: 12189719

Abstract: One embodiment of the present invention sets forth a technique for evaluating labeled data. The technique includes selecting, from a set of labels for a data sample, a subset of the labels representing non-outliers in a distribution of values in the set of labels. The technique also includes aggregating the subset of the labels into a benchmark for the data sample. The technique further includes generating, based on a comparison between the benchmark and an additional label, a benchmark score associated with the data sample, and generating a set of performance metrics for labeling the data sample based on the benchmark score.

Type: Grant

Filed: January 6, 2022

Date of Patent: January 7, 2025

Assignee: Scale AI, Inc.

Inventors: Nathaniel John Herman, Akshat Bubna, Alexandr Wang, Shariq Shahab Hashme, Samuel J. Clearman, Liren Tu, Jeffrey Zhihong Li, James Lennon
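
The benchmarking steps in the abstract of patent 12189719 (select non-outlier labels, aggregate them into a benchmark, score an additional label, derive performance metrics) can be sketched as follows. This is only an illustration under stated assumptions: labels are treated as scalar numbers, outliers are filtered with a median-absolute-deviation rule, the benchmark is the mean of the remaining labels, and the score falls off linearly with a hypothetical `tolerance`. None of these choices, nor the function names, come from the patent.

```python
from statistics import mean, median


def non_outliers(labels, k=3.0):
    """Keep labels within k median absolute deviations of the median (assumed rule)."""
    med = median(labels)
    mad = median([abs(x - med) for x in labels])
    if mad == 0:
        # At least half of the labels equal the median; keep only those.
        return [x for x in labels if x == med]
    return [x for x in labels if abs(x - med) <= k * mad]


def build_benchmark(labels, k=3.0):
    """Aggregate the non-outlier labels into a single benchmark value."""
    return mean(non_outliers(labels, k))


def benchmark_score(label, benchmark, tolerance):
    """Score in [0, 1]: 1 for an exact match, 0 at or beyond `tolerance` away."""
    return max(0.0, 1.0 - abs(label - benchmark) / tolerance)


def performance_metrics(scores, pass_threshold=0.8):
    """Summarize benchmark scores; the threshold is a hypothetical parameter."""
    return {
        "mean_score": mean(scores),
        "pass_rate": sum(s >= pass_threshold for s in scores) / len(scores),
    }


labels = [10.0, 11.0, 9.0, 10.0, 42.0]   # 42.0 is an obvious outlier
benchmark = build_benchmark(labels)
score = benchmark_score(12.0, benchmark, tolerance=4.0)
metrics = performance_metrics([score, 1.0])
```

For structured labels such as bounding boxes, the distance and aggregation functions would need to be replaced with measures suited to that label type.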

Linking key-value pairs in documents

Patent number: 12182102

Abstract: One embodiment of the present invention sets forth a technique for extracting data from a document. The technique includes determining, via execution of one or more machine learning models, a first set of bounding boxes for a first set of values associated with a first key within the document and a second set of bounding boxes for a second set of values associated with a second key within the document. The technique also includes generating a first set of mappings between a list of items in the document and the first set of bounding boxes and a second set of mappings between the first and second sets of bounding boxes based on locations of the bounding boxes. The technique further includes determining, for a given item, one or more associated bounding boxes in the first and second sets of bounding boxes based on the mappings.

Type: Grant

Filed: August 18, 2021

Date of Patent: December 31, 2024

Assignee: Scale AI, Inc.

Inventors: Alexandre Matton, Chiao-Lun Cheng, Adrian Yunpfei Lam

Prelabeling of bounding boxes in video frames

Patent number: 11804042

Abstract: One embodiment of the present invention sets forth a technique for performing a labeling task. The technique includes determining one or more region proposals, wherein each region proposal included in the one or more region proposals includes estimates of one or more bounding boxes surrounding one or more objects in a plurality of video frames. The technique also includes performing one or more operations that execute a refinement stage of a machine learning model to produce one or more refined estimates of the one or more bounding boxes included in the one or more region proposals. The technique further includes outputting the one or more refined estimates as initial representations of the one or more bounding boxes for subsequent annotation of the one or more bounding boxes by one or more users.

Type: Grant

Filed: September 4, 2020

Date of Patent: October 31, 2023

Assignee: SCALE AI, INC.

Inventors: Anastasiia Alokhina, Chiao-Lun Cheng, Andrew Liu

Pre-labeling data with cuboid annotations

Patent number: 11776215

Abstract: One embodiment provides techniques for automatically pre-labeling point cloud data with cuboid annotations. Point cloud data is processed using ML models to detect, associate, and localize objects therein, in order to generate cuboid tracks that each include a series of cuboid annotations associated with an object. An object detection model that detects objects and performs coarse localization is trained using a loss function that separately evaluates the distances between corners of predicted cuboids and corners of ground truth cuboids for position, size, and yaw. A refinement model that performs more accurate localization takes as input 2D projections of regions surrounding cuboid tracks predicted by the object detection model and the cuboid tracks, and outputs refined cuboid tracks. The refined cuboid tracks are filtered to a set of keyframes, with in-between frames being interpolated. The cuboid tracks can then be presented to a user for viewing and editing.

Type: Grant

Filed: December 16, 2019

Date of Patent: October 3, 2023

Assignee: SCALE AI, INC.

Inventors: Chiao-Lun Cheng, Elliot Branson, Leigh Marie Braswell, Daniel Havíř, Jeffrey Alan Anders

Prelabeling for semantic segmentation tasks

Patent number: 11636602

Abstract: One embodiment of the present invention sets forth a technique for performing a labeling task. The technique includes generating a multi-scale representation of an image as input to a machine learning model. The technique also includes performing one or more operations that apply the machine learning model to the multi-scale representation of the image to produce a semantic segmentation comprising predictions of labels for regions of pixels in the image. The technique further includes outputting, in a user interface, the semantic segmentation for use in assisting a user in specifying the labels for the pixels in the image.

Type: Grant

Filed: February 12, 2020

Date of Patent: April 25, 2023

Assignee: SCALE AI, INC.

Inventors: Daniel Havír, Chiao-Lun Cheng, Elliot Branson, Nathan Herman, Nathan Hayflick, Simon Alexander Hewat, Suchir Balaji, Shariq Hashme

Visualization techniques for data labeling

Patent number: 11625892

Abstract: One embodiment provides a user interface (UI) that permits users to select how point cloud colorings determined from multiple data sources are blended together in a rendering of a point cloud. The data sources may include photographic, label, and/or LIDAR intensity data. To improve frame rates, an aggregated point cloud may be generated using a spatial hash of a large set of points and sampling of each hash bucket based on the number of points therein and a user-configurable density. Sizes of points in the point cloud may decrease proportionally to distance from a viewer, but increase based on an activation function that enlarges points greater than a threshold distance from the viewer. In addition, luminance statistics for sub-regions of photographic data and dominant colors determined from photographic data may be used to automatically determine color properties to apply to a point cloud coloring.

Type: Grant

Filed: August 12, 2021

Date of Patent: April 11, 2023

Assignee: SCALE AI, INC.

Inventors: Evan Moss, Steven Hao, Leigh Marie Braswell, Akshat Bubna, Chiao-Lun Cheng, Samuel Jacob Clearman, Nathaniel John Herman, Guido Leandro Maliandi
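
The point cloud aggregation step in the abstract of patent 11625892 (a spatial hash followed by per-bucket sampling) can be sketched as below. This is an illustrative guess at one possible reading, not the patented method: the grid-cell hash key, the rule "keep `ceil(density * n)` points per bucket, but at least one", and the names `downsample`, `cell_size`, and `density` are all assumptions.

```python
import math
import random
from collections import defaultdict


def downsample(points, cell_size, density, rng):
    """Hash points into grid cells, then keep a fraction of each cell's points.

    `density` is assumed to be a user-chosen fraction in (0, 1].
    """
    buckets = defaultdict(list)
    for p in points:
        # Spatial hash key: integer grid-cell coordinates of the point.
        key = tuple(math.floor(c / cell_size) for c in p)
        buckets[key].append(p)

    kept = []
    for bucket in buckets.values():
        # The number kept depends on the bucket's point count and the density.
        n_keep = min(len(bucket), max(1, math.ceil(density * len(bucket))))
        kept.extend(rng.sample(bucket, n_keep))
    return kept


# Ten points crowded into one cell plus one isolated point in another cell.
dense = [(0.1 * i, 0.0, 0.0) for i in range(10)]
isolated = (5.5, 5.5, 5.5)
kept = downsample(dense + [isolated], cell_size=1.0, density=0.5,
                  rng=random.Random(0))
```

Keeping at least one point per occupied cell means sparse regions stay visible while crowded regions are thinned, which is one way fewer points could be rendered per frame.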

Intensity data visualization

Patent number: 11488332

Abstract: Techniques for coloring a point cloud based on colors derived from LIDAR (light detection and ranging) intensity data are disclosed. In some embodiments, the coloring of the point cloud may employ an activation function that controls the colors assigned to different intensity values. Further, the activation function may be parameterized based on statistics computed for a distribution of intensities associated with a 3D scene and a user-selected sensitivity. Alternatively, a Fourier transform of the distribution of intensities or a clustering of the intensities may be used to estimate individual distributions associated with different materials, based on which the point cloud coloring may be determined from intensity data.

Type: Grant

Filed: February 26, 2021

Date of Patent: November 1, 2022

Assignee: SCALE AI, INC.

Inventors: Evan Moss, Steven Hao, Leigh Marie Braswell
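
The abstract of patent 11488332 describes an activation function, parameterized by statistics of the intensity distribution and a user-selected sensitivity, that controls the color assigned to each intensity. A minimal sketch of that idea follows. The logistic function, the use of mean and standard deviation as the statistics, the grayscale output, and the names `make_activation` and `to_gray` are assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math
from statistics import mean, pstdev


def make_activation(intensities, sensitivity=1.0):
    """Build a logistic activation centered on the mean intensity (assumed form)."""
    mu = mean(intensities)
    sigma = pstdev(intensities) or 1.0   # avoid division by zero for flat data

    def activation(x):
        z = sensitivity * (x - mu) / sigma
        z = max(-60.0, min(60.0, z))     # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    return activation


def to_gray(value):
    """Map an activation in [0, 1] to an (R, G, B) gray level."""
    level = round(255 * value)
    return (level, level, level)


intensities = [0.0, 1.0, 2.0, 3.0, 4.0]
activation = make_activation(intensities, sensitivity=2.0)
colors = [to_gray(activation(x)) for x in intensities]
```

In this sketch a higher sensitivity steepens the curve, so intensities near the mean are spread over a wider range of gray levels while very low and very high intensities saturate.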

Automatic benchmarking of labeling tasks

Patent number: 11308364

Abstract: One embodiment of the present invention sets forth a technique for evaluating labeled data. The technique includes selecting, from a set of labels for a data sample, a subset of the labels representing non-outliers in a distribution of values in the set of labels. The technique also includes aggregating the subset of the labels into a benchmark for the data sample. The technique further includes generating, based on a comparison between the benchmark and an additional label, a benchmark score associated with the data sample, and generating a set of performance metrics for labeling the data sample based on the benchmark score.

Type: Grant

Filed: December 30, 2019

Date of Patent: April 19, 2022

Assignee: SCALE AI, INC.

Inventors: Nathaniel John Herman, Akshat Bubna, Alexandr Wang, Shariq Shahab Hashme, Samuel J. Clearman, Liren Tu, Jeffrey Zhihong Li, James Lennon

Visualization techniques for data labeling

Patent number: 11222460

Abstract: One embodiment provides a user interface (UI) that permits users to select how point cloud colorings determined from multiple data sources are blended together in a rendering of a point cloud. The data sources may include photographic, label, and/or LIDAR intensity data. To improve frame rates, an aggregated point cloud may be generated using a spatial hash of a large set of points and sampling of each hash bucket based on the number of points therein and a user-configurable density. Sizes of points in the point cloud may decrease proportionally to distance from a viewer, but increase based on an activation function that enlarges points greater than a threshold distance from the viewer. In addition, luminance statistics for sub-regions of photographic data and dominant colors determined from photographic data may be used to automatically determine color properties to apply to a point cloud coloring.

Type: Grant

Filed: July 22, 2019

Date of Patent: January 11, 2022

Assignee: Scale AI, Inc.

Inventors: Evan Moss, Steven Hao, Leigh Marie Braswell, Akshat Bubna, Chiao-Lun Cheng, Samuel Jacob Clearman, Nathaniel John Herman, Guido Leandro Maliandi

Intensity data visualization

Patent number: 10937202

Abstract: Techniques for coloring a point cloud based on colors derived from LIDAR (light detection and ranging) intensity data are disclosed. In some embodiments, the coloring of the point cloud may employ an activation function that controls the colors assigned to different intensity values. Further, the activation function may be parameterized based on statistics computed for a distribution of intensities associated with a 3D scene and a user-selected sensitivity. Alternatively, a Fourier transform of the distribution of intensities or a clustering of the intensities may be used to estimate individual distributions associated with different materials, based on which the point cloud coloring may be determined from intensity data.

Type: Grant

Filed: July 22, 2019

Date of Patent: March 2, 2021

Assignee: Scale AI, Inc.

Inventors: Evan Moss, Steven Hao, Leigh Marie Braswell