Patents by Inventor Tianming Zheng

Tianming Zheng has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230126005
    Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
    Type: Application
    Filed: December 23, 2022
    Publication date: April 27, 2023
    Applicant: Amazon Technologies, Inc.
    Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
  • Patent number: 11544623
    Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
    Type: Grant
    Filed: October 2, 2019
    Date of Patent: January 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
  • Patent number: 11100420
    Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: August 24, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Leo Parker Dirac, Jin Li, Rakesh Ramakrishnan, Tianming Zheng, Donghui Zhuo
  • Patent number: 10713589
    Abstract: A determination that a machine learning data set is to be shuffled is made. Tokens corresponding to the individual observation records are generated based on respective identifiers of the records' storage objects and record key values. Respective representative values are derived from the tokens. The observation records are rearranged based on a result of sorting the representative values and provided to a shuffle result destination.
    Type: Grant
    Filed: March 3, 2016
    Date of Patent: July 14, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Saman Zarandioon, Nicolle M. Correa, Leo Parker Dirac, Aleksandr Mikhaylovich Ingerman, Steven Andrew Loeppky, Robert Matthias Steele, Tianming Zheng
  • Publication number: 20200034742
    Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
    Type: Application
    Filed: October 2, 2019
    Publication date: January 30, 2020
    Applicant: Amazon Technologies, Inc.
    Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
  • Patent number: 10540606
    Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: January 21, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
  • Patent number: 10366053
    Abstract: A request to split a data set comprising observation records located in a group of storage objects is received. With respect to a particular observation record, a token is generated based on an identifier of the record's storage object and a key value of the record. A numeric value is calculated using the token, and the observation record is assigned to a split subset using the numeric value. An indication of the assignment is provided to a destination associated with the split subset.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: July 30, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Tianming Zheng, Nicolle M. Correa, Leo Parker Dirac, James Joseph Jesensky, Robert Matthias Steele
  • Publication number: 20150379425
    Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
    Type: Application
    Filed: August 14, 2014
    Publication date: December 31, 2015
    Applicant: Amazon Technologies, Inc.
    Inventors: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
  • Publication number: 20150379072
    Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.
    Type: Application
    Filed: August 14, 2014
    Publication date: December 31, 2015
    Applicant: Amazon Technologies, Inc.
    Inventors: Leo Parker Dirac, Jin Li, Rakesh Ramakrishnan, Tianming Zheng, Donghui Zhuo
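
The abstracts of patents 10366053 and 10713589 above both describe building a token from a record's storage-object identifier and its key value, deriving a numeric value from that token, and then using the value either to assign the record to a split subset or to sort records into a shuffled order. The Python sketch below illustrates that general idea only; it is not taken from the patents. The SHA-256 hash, the NUL separator, the 0.7 training fraction, and all function names are assumptions made for illustration.

```python
import hashlib


def record_token(storage_object_id: str, record_key: str) -> str:
    # Hypothetical token format: object identifier and record key joined
    # by a NUL byte so that ("ab", "c") and ("a", "bc") produce different tokens.
    return f"{storage_object_id}\x00{record_key}"


def token_value(token: str) -> float:
    # Assumed derivation of the numeric value: the first 8 bytes of a
    # SHA-256 digest, scaled to a float between 0 and 1.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def assign_split(storage_object_id: str, record_key: str,
                 train_fraction: float = 0.7) -> str:
    # The same record always maps to the same subset, because the value
    # depends only on the record's token and not on processing order.
    value = token_value(record_token(storage_object_id, record_key))
    return "train" if value < train_fraction else "test"


def shuffle_records(records):
    # Rearrange (storage_object_id, record_key) pairs by sorting on the
    # value derived from each record's token.
    return sorted(records, key=lambda r: token_value(record_token(r[0], r[1])))


if __name__ == "__main__":
    sample = [("objects/part-0.csv", str(i)) for i in range(5)]
    for obj, key in shuffle_records(sample):
        print(obj, key, assign_split(obj, key))
```

Because every value is computed from the record's own token, the resulting split and ordering do not depend on the order in which records are read, so repeated runs over the same data set give the same result.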