Patents by Inventor Tianming Zheng

Tianming Zheng has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

Consistent filtering of machine learning data

Publication number: 20230126005

Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
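The idea in this abstract can be illustrated with a minimal sketch, assuming the consistency metadata reduces to a single PRNG seed and that chunks are identified by integers. The function name `split_chunks` and the `test_fraction` parameter are hypothetical and do not come from the patent text; this is not the claimed implementation.

```python
import random


def split_chunks(chunk_ids, seed, test_fraction=0.25):
    """Partition chunk ids into (train, test) using a seeded PRNG.

    The seed stands in for the "consistency metadata" (an assumption):
    reusing it reproduces the same partition in a later iteration.
    """
    rng = random.Random(seed)
    shuffled = list(chunk_ids)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


chunks = list(range(8))
train_a, test_a = split_chunks(chunks, seed=42)
train_b, test_b = split_chunks(chunks, seed=42)  # same seed, same split
```

Because the split depends only on the chunk ids and the seed, a later training-and-evaluation iteration that reuses the same seed sees the same training and test chunks.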

Type: Application

Filed: December 23, 2022

Publication date: April 27, 2023

Applicant: Amazon Technologies, Inc.

Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo

Consistent filtering of machine learning data

Patent number: 11544623

Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.

Type: Grant

Filed: October 2, 2019

Date of Patent: January 3, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo

Input processing for machine learning

Patent number: 11100420

Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.
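A rough single-process sketch of such a plan follows: a chunk-level shuffle, then a chunk-level sample, then a record-level feature-processing step on the result. All names (`make_chunks`, `run_plan`, `keep_fraction`, `transform`) are hypothetical, and the data transfers between servers described in the abstract are not modeled here.

```python
import random


def make_chunks(records, chunk_size):
    """Group records into consecutive fixed-size chunks."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]


def run_plan(records, chunk_size, seed, keep_fraction, transform):
    """Apply chunk-level operations first, then a record-level operation."""
    rng = random.Random(seed)
    chunks = make_chunks(records, chunk_size)
    rng.shuffle(chunks)                        # chunk-level shuffle
    n_keep = max(1, int(len(chunks) * keep_fraction))
    selected = chunks[:n_keep]                 # chunk-level sample
    # Second operation: feature processing on the chunk-level result set.
    return [transform(r) for chunk in selected for r in chunk]


out = run_plan(list(range(12)), chunk_size=3, seed=7,
               keep_fraction=0.5, transform=lambda r: r * 10)
```

Working on whole chunks first keeps the early steps cheap, since only the surviving chunks need record-level processing.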

Type: Grant

Filed: August 14, 2014

Date of Patent: August 24, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Leo Parker Dirac, Jin Li, Rakesh Ramakrishnan, Tianming Zheng, Donghui Zhuo

Consistent sort-based record-level shuffling of machine learning data

Patent number: 10713589

Abstract: A determination that a machine learning data set is to be shuffled is made. Tokens corresponding to the individual observation records are generated based on respective identifiers of the records' storage objects and record key values. Respective representative values are derived from the tokens. The observation records are rearranged based on a result of sorting the representative values and provided to a shuffle result destination.
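One way to picture the token-and-sort approach is the sketch below. It assumes that a token is the storage object identifier joined to the record key and that the representative value is the first 8 bytes of a SHA-256 digest. Both choices, along with the names `representative_value`, `shuffle_records`, and `salt`, are illustrative assumptions rather than details taken from the patent.

```python
import hashlib


def representative_value(storage_id, record_key, salt=""):
    """Derive a 64-bit integer from a token built out of the record's identity."""
    token = f"{salt}|{storage_id}|{record_key}"
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shuffle_records(records, salt=""):
    """Rearrange (storage_id, record_key, payload) tuples by sorting on the
    representative value; ties fall back to the record identity itself."""
    return sorted(
        records,
        key=lambda r: (representative_value(r[0], r[1], salt), r[0], r[1]),
    )


# Hypothetical records: (storage object id, record key, payload).
recs = [("obj-a", 1, "x"), ("obj-a", 2, "y"), ("obj-b", 1, "z"), ("obj-b", 2, "w")]
shuffled = shuffle_records(recs)
```

Since the ordering depends only on each record's identity and the salt, repeating the shuffle yields the same arrangement regardless of the order in which the records were read.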

Type: Grant

Filed: March 3, 2016

Date of Patent: July 14, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Saman Zarandioon, Nicolle M. Correa, Leo Parker Dirac, Aleksandr Mikhaylovich Ingerman, Steven Andrew Loeppky, Robert Matthias Steele, Tianming Zheng

Consistent filtering of machine learning data

Publication number: 20200034742

Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.

Type: Application

Filed: October 2, 2019

Publication date: January 30, 2020

Applicant: Amazon Technologies, Inc.

Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo

Consistent filtering of machine learning data

Patent number: 10540606

Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.

Type: Grant

Filed: August 14, 2014

Date of Patent: January 21, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo

Consistent randomized record-level splitting of machine learning data

Patent number: 10366053

Abstract: A request to split a data set comprising observation records located in a group of storage objects is received. With respect to a particular observation record, a token is generated based on an identifier of the record's storage object and a key value of the record. A numeric value is calculated using the token, and the observation record is assigned to a split subset using the numeric value. An indication of the assignment is provided to a destination associated with the split subset.
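The per-record assignment described here could look roughly like the following sketch. It assumes the token is hashed with SHA-256, the numeric value is a 64-bit integer taken from the digest, and the split is a two-way train/test split. The function name `assign_split` and the `train_fraction` parameter are hypothetical.

```python
import hashlib


def assign_split(storage_id, record_key, train_fraction=0.7):
    """Assign one record to "train" or "test" from its identity alone."""
    token = f"{storage_id}|{record_key}"
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")   # numeric value in [0, 2**64)
    threshold = int(train_fraction * 2**64)     # boundary between the subsets
    return "train" if value < threshold else "test"


# Hypothetical records identified by (storage object id, record key).
assignments = {
    (obj, key): assign_split(obj, key)
    for obj in ("obj-a", "obj-b")
    for key in range(5)
}
```

Each record is assigned independently, so no coordination between workers is needed and a record lands in the same subset every time the split is recomputed.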

Type: Grant

Filed: November 24, 2015

Date of Patent: July 30, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Tianming Zheng, Nicolle M. Correa, Leo Parker Dirac, James Joseph Jesensky, Robert Matthias Steele

Consistent filtering of machine learning data

Publication number: 20150379425

Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.

Type: Application

Filed: August 14, 2014

Publication date: December 31, 2015

Applicant: Amazon Technologies, Inc.

Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo

Input processing for machine learning

Publication number: 20150379072

Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.

Type: Application

Filed: August 14, 2014

Publication date: December 31, 2015

Applicant: Amazon Technologies, Inc.

Inventors: Leo Parker Dirac, Jin Li, Rakesh Ramakrishnan, Tianming Zheng, Donghui Zhuo