Patents by Inventor TIANMING ZHENG
TIANMING ZHENG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230126005
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
Type: Application
Filed: December 23, 2022
Publication date: April 27, 2023
Applicant: Amazon Technologies, Inc.
Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
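The idea in this abstract, that a stored pseudo-random parameter makes the training and test sets reproducible, can be illustrated with a minimal sketch. It is not the patented method: the function name `split_chunks`, the `metadata` dictionary, and the 80/20 ratio are all assumptions made for illustration, and a seed for Python's `random.Random` stands in for the "parameter for a pseudo-random number source".

```python
import random

def split_chunks(num_chunks, seed, train_fraction=0.8):
    """Return (train_chunk_ids, test_chunk_ids), determined only by the seed.

    Hypothetical helper: the seed stands in for the consistency metadata.
    """
    rng = random.Random(seed)          # private generator, no global state
    chunk_ids = list(range(num_chunks))
    rng.shuffle(chunk_ids)             # same seed -> same permutation
    cut = int(num_chunks * train_fraction)
    return sorted(chunk_ids[:cut]), sorted(chunk_ids[cut:])

# Assumed metadata record; only the seed is used here.
metadata = {"seed": 12345}

# Two separate calls with the same metadata select identical chunks,
# so an evaluation always sees records the model was not trained on.
train_a, test_a = split_chunks(10, metadata["seed"])
train_b, test_b = split_chunks(10, metadata["seed"])
```

Because the generator is constructed from the stored seed each time, repeating an iteration reproduces the same disjoint training and test chunks.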
-
Patent number: 11544623
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
Type: Grant
Filed: October 2, 2019
Date of Patent: January 3, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
-
Patent number: 11100420
Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.
Type: Grant
Filed: August 14, 2014
Date of Patent: August 24, 2021
Assignee: Amazon Technologies, Inc.
Inventors: Leo Parker Dirac, Jin Li, Rakesh Ramakrishnan, Tianming Zheng, Donghui Zhuo
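The two-stage flow in this abstract (a chunk-level operation followed by a second operation on its result set) can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `chunk_level_sample` and `record_level_filter` are hypothetical, chunks are plain in-memory lists, and the data transfers to a server's memory are not modeled.

```python
import random

def chunk_level_sample(chunks, fraction, seed):
    """Chunk-level step: pick whole chunks, never individual records."""
    rng = random.Random(seed)
    k = max(1, int(len(chunks) * fraction))
    return rng.sample(chunks, k)

def record_level_filter(chunks, predicate):
    """Second step: filter records inside the chunks kept by the first step."""
    return [rec for chunk in chunks for rec in chunk if predicate(rec)]

# Five chunks of four integer "records" each (toy data).
chunks = [list(range(i * 4, i * 4 + 4)) for i in range(5)]

sampled = chunk_level_sample(chunks, 0.4, seed=7)        # keeps 2 of 5 chunks
evens = record_level_filter(sampled, lambda r: r % 2 == 0)
```

Working at chunk granularity first keeps the first step cheap, since only the selected chunks need to be read before any per-record work begins.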
-
Patent number: 10713589
Abstract: A determination that a machine learning data set is to be shuffled is made. Tokens corresponding to the individual observation records are generated based on respective identifiers of the records' storage objects and record key values. Respective representative values are derived from the tokens. The observation records are rearranged based on a result of sorting the representative values and provided to a shuffle result destination.
Type: Grant
Filed: March 3, 2016
Date of Patent: July 14, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Saman Zarandioon, Nicolle M. Correa, Leo Parker Dirac, Aleksandr Mikhaylovich Ingerman, Steven Andrew Loeppky, Robert Matthias Steele, Tianming Zheng
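A shuffle of the kind this abstract describes (token per record, representative value per token, sort by that value) might look like the sketch below. The abstract does not say how tokens or representative values are computed, so the `:` delimiter, the optional `salt`, and the use of SHA-256 are assumptions chosen for illustration, and all function names are hypothetical.

```python
import hashlib

def make_token(storage_object_id, record_key):
    """Combine the storage object's identifier with the record's key."""
    return f"{storage_object_id}:{record_key}"

def representative_value(token, salt=""):
    """Map a token to an integer; SHA-256 is an assumed choice of hash."""
    digest = hashlib.sha256((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def shuffle_records(records, salt=""):
    """Rearrange (storage_object_id, record_key, payload) tuples by hash order."""
    return sorted(
        records,
        key=lambda r: representative_value(make_token(r[0], r[1]), salt),
    )

records = [
    ("obj-a", 1, "x"),
    ("obj-a", 2, "y"),
    ("obj-b", 1, "z"),
    ("obj-b", 2, "w"),
]
shuffled = shuffle_records(records)
```

The resulting order depends only on each record's token, not on the order in which records arrive, so the shuffle is repeatable; changing the assumed `salt` yields a different arrangement.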
-
Publication number: 20200034742
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
Type: Application
Filed: October 2, 2019
Publication date: January 30, 2020
Applicant: Amazon Technologies, Inc.
Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
-
Patent number: 10540606
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
Type: Grant
Filed: August 14, 2014
Date of Patent: January 21, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
-
Patent number: 10366053
Abstract: A request to split a data set comprising observation records located in a group of storage objects is received. With respect to a particular observation record, a token is generated based on an identifier of the record's storage object and a key value of the record. A numeric value is calculated using the token, and the observation record is assigned to a split subset using the numeric value. An indication of the assignment is provided to a destination associated with the split subset.
Type: Grant
Filed: November 24, 2015
Date of Patent: July 30, 2019
Assignee: Amazon Technologies, Inc.
Inventors: Tianming Zheng, Nicolle M. Correa, Leo Parker Dirac, James Joseph Jesensky, Robert Matthias Steele
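The token-based assignment in this abstract can be approximated with a short sketch. The abstract does not specify the token format or how the numeric value is calculated, so the `:` delimiter, SHA-256, the two subset names, and the function name `assign_split` are all assumptions for illustration only.

```python
import hashlib

def assign_split(storage_object_id, record_key, train_fraction=0.7):
    """Assign one record to "train" or "test" from its token alone."""
    # Token built from the storage object's identifier and the record's key.
    token = f"{storage_object_id}:{record_key}"
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # First 4 bytes scaled into [0, 1); exact in floating point.
    value = int.from_bytes(digest[:4], "big") / 2**32
    return "train" if value < train_fraction else "test"

# The same record always lands in the same subset.
first = assign_split("obj-a", 42)
second = assign_split("obj-a", 42)
```

Since the assignment needs only the record's own token, each record can be placed independently, without coordinating with the placement of any other record.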
-
Publication number: 20150379425
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
Type: Application
Filed: August 14, 2014
Publication date: December 31, 2015
Applicant: AMAZON TECHNOLOGIES, INC.
Inventors: LEO PARKER DIRAC, JIN LI, TIANMING ZHENG, DONGHUI ZHUO
-
Publication number: 20150379072
Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.
Type: Application
Filed: August 14, 2014
Publication date: December 31, 2015
Applicant: AMAZON TECHNOLOGIES, INC.
Inventors: LEO PARKER DIRAC, JIN LI, RAKESH RAMAKRISHNAN, TIANMING ZHENG, DONGHUI ZHUO