Patents by Inventor Haishan Zhu

Haishan Zhu has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents that have been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134564
    Abstract: Embodiments of the present disclosure include systems and methods for transposing matrices based on a multi-level crossbar. A system may include a memory configured to store a matrix comprising a plurality of elements arranged in a set of rows and a set of columns, and an input buffer configured to retrieve a subset of the plurality of elements from the memory, each element in the subset being retrieved from a different column of the matrix. The system may further include a multi-level crossbar configured to perform a transpose operation on the subset of elements, and an output buffer configured to receive the transposed subset and store each of its elements in the memory in a different column of the matrix.
    Type: Application
    Filed: October 19, 2022
    Publication date: April 25, 2024
    Inventors: Jinhang CHOI, Haishan ZHU, Yi LUO, Eric S. CHUNG
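    Illustrative sketch (not from the patent): one way a memory / input buffer / crossbar / output buffer pipeline can transpose a tile is to read one element from each column along a shifted diagonal, permute the lanes, and write each element back to a different column. The Python model below uses that diagonal scheme purely for illustration; the single permutation stands in for what hardware would build from multiple smaller crossbar levels, and all names are invented.
      import numpy as np

      def crossbar_route(lanes, perm):
          # Programmable crossbar: lane j is delivered to output port perm[j].
          out = [None] * len(lanes)
          for j, p in enumerate(perm):
              out[p] = lanes[j]
          return out

      def transpose_via_crossbar(mem):
          # mem[addr, bank]: one memory bank per column; one element is read from
          # each bank per step (input buffer), routed through the crossbar, then
          # written so every element lands in a different bank (output buffer).
          n = mem.shape[0]
          out = np.empty_like(mem)
          for s in range(n):
              lanes = [mem[(j + s) % n, j] for j in range(n)]                    # gather a diagonal
              routed = crossbar_route(lanes, [(j + s) % n for j in range(n)])    # rotate the lanes
              for bank in range(n):
                  out[(bank - s) % n, bank] = routed[bank]                       # scatter to banks
          return out

      a = np.arange(16).reshape(4, 4)
      assert (transpose_via_crossbar(a) == a.T).all()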
  • Publication number: 20240134683
    Abstract: A hardware retire circuit includes: one or more input queues, each queue corresponding to an input stream of tasks and being configured to store input task identifiers corresponding to tasks of the input stream; and processing logic configured to: receive a completed task event; determine whether a completed task queue identifier and a completed task identifier of the completed task event match an input task identifier of an input task at a head of an input queue having an input queue identifier corresponding to the completed task queue identifier; and in response to determining a match, pop the task at the head of the input queue and output a task retirement event corresponding to the input task.
    Type: Application
    Filed: October 20, 2022
    Publication date: April 25, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yi LUO, Jinwen XI, Xuan ZUO, Haishan ZHU, Eric Sen CHUNG
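    Illustrative sketch (not from the patent): a minimal software model of the retire logic described above, with one in-order queue of task identifiers per input stream; class and field names are invented for the example.
      from collections import deque

      class RetireUnit:
          def __init__(self, num_queues):
              # One in-order queue of input task identifiers per input stream.
              self.queues = [deque() for _ in range(num_queues)]

          def enqueue(self, queue_id, task_id):
              self.queues[queue_id].append(task_id)

          def on_completed(self, queue_id, task_id):
              # If the completed-task event matches the task at the head of the
              # named queue, pop it and emit a retirement event; otherwise the
              # (out-of-order) completion retires nothing for now.
              q = self.queues[queue_id]
              if q and q[0] == task_id:
                  return {"event": "retired", "queue": queue_id, "task": q.popleft()}
              return None

      ru = RetireUnit(num_queues=2)
      ru.enqueue(0, task_id=7)
      ru.enqueue(0, task_id=8)
      assert ru.on_completed(0, 8) is None        # not at the head yet
      assert ru.on_completed(0, 7)["task"] == 7   # head matches, retired in order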
  • Publication number: 20240127107
    Abstract: Embodiments of the present disclosure include techniques for machine language processing. In one embodiment, the present disclosure includes commands with data structures comprising fields describing multi-dimensional data and fields describing synchronization. Large volumes of data may be processed and automatically synchronized by execution of a single command.
    Type: Application
    Filed: October 14, 2022
    Publication date: April 18, 2024
    Inventors: Haishan ZHU, Eric S. CHUNG
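    Illustrative sketch (not from the patent): one way to picture a single command whose fields describe both a multi-dimensional block of data and its synchronization. The field names and the wait/signal semantics below are invented for the example.
      from dataclasses import dataclass, field

      @dataclass
      class TensorCommand:
          base: int        # starting address of the data
          shape: tuple     # extent of each dimension
          strides: tuple   # element stride of each dimension
          wait_on: list = field(default_factory=list)  # semaphores to wait for
          signal: list = field(default_factory=list)   # semaphores to set when done

      def execute(cmd, memory, semaphores):
          # Synchronization is described by the command itself: wait, work, signal.
          assert all(semaphores[s] for s in cmd.wait_on), "dependencies not ready"
          addrs = [cmd.base]
          for extent, stride in zip(cmd.shape, cmd.strides):
              addrs = [a + i * stride for a in addrs for i in range(extent)]
          data = [memory[a] for a in addrs]   # walk the whole N-D block
          for s in cmd.signal:
              semaphores[s] = True            # mark completion for dependents
          return data

      mem = list(range(100))
      sems = {"in_ready": True, "out_ready": False}
      cmd = TensorCommand(base=10, shape=(2, 3), strides=(3, 1),
                          wait_on=["in_ready"], signal=["out_ready"])
      print(execute(cmd, mem, sems))   # [10, 11, 12, 13, 14, 15]; sets out_ready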
  • Publication number: 20240126617
    Abstract: Embodiments of the present disclosure include techniques for machine language processing. In one embodiment, the present disclosure includes configuring functional modules on a machine learning processor to execute a plurality of machine learning (ML) operations during a plurality of time segments. During the time segments, a first portion of the ML operations executes serially, while at least one other ML operation executes during at least a majority of each time segment. Serial ML operations may be processed simultaneously with the at least one other ML operation.
    Type: Application
    Filed: October 14, 2022
    Publication date: April 18, 2024
    Inventors: Haishan ZHU, Preyas Janak SHAH, Tiyasa MITRA, Eric S. CHUNG
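    Illustrative sketch (not from the patent): a toy schedule in which each time segment runs a serial chain of ML operations while one long-running operation overlaps most of the same segment. Operation names and durations are invented.
      def build_schedule(num_segments):
          timeline, t = [], 0.0
          for seg in range(num_segments):
              start = t
              # Serial portion: these operations run back-to-back.
              for name, dur in [("matmul", 0.5), ("layernorm", 0.2), ("softmax", 0.3)]:
                  timeline.append((name, seg, t, t + dur))
                  t += dur
              # Overlapping operation: spans most of the same segment concurrently.
              timeline.append(("weight_prefetch", seg, start, t - 0.1))
          return timeline

      for op, seg, t0, t1 in build_schedule(2):
          print(f"segment {seg}: {op:16s} {t0:4.1f} -> {t1:4.1f}")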
  • Publication number: 20240086233
    Abstract: Embodiments of the present disclosure include systems and methods for providing a hierarchical programming model for AI hardware. A system includes a set of lower-level control threads. The system also includes a higher-level control thread configured to receive a command from a device, generate a set of commands based on the command, and provide the set of commands to a subset of the set of lower-level control threads. A lower-level control thread in the subset of the set of lower-level control threads is configured to instruct, based on a particular command in the set of commands, a subset of a plurality of processing threads to perform a set of operations.
    Type: Application
    Filed: September 9, 2022
    Publication date: March 14, 2024
    Inventors: Haishan ZHU, Eric S. CHUNG
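    Illustrative sketch (not from the patent): a minimal model of the hierarchy described above, in which a higher-level control thread expands one device command into per-worker commands for a subset of lower-level control threads. In real hardware each lower-level thread would in turn instruct its processing threads; all names are invented.
      import queue, threading

      def lower_level_control(worker_id, cmd_q, results):
          cmd = cmd_q.get()
          # Stand-in for instructing a subset of processing threads to run operations.
          results[worker_id] = f"worker {worker_id} runs {cmd}"

      def higher_level_control(device_command, num_workers):
          results, cmd_queues = {}, [queue.Queue() for _ in range(num_workers)]
          workers = [threading.Thread(target=lower_level_control,
                                      args=(i, cmd_queues[i], results))
                     for i in range(num_workers)]
          for w in workers:
              w.start()
          # Expand the single command into one sub-command per lower-level thread.
          for i, q in enumerate(cmd_queues):
              q.put(f"{device_command}[shard {i}/{num_workers}]")
          for w in workers:
              w.join()
          return results

      print(higher_level_control("matmul(A, B)", num_workers=4))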
  • Patent number: 11853897
    Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
    Type: Grant
    Filed: December 9, 2022
    Date of Patent: December 26, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Taesik Na, Daniel Lo, Haishan Zhu, Eric Sen Chung
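    Illustrative sketch (not from the patent): the two ideas in the abstract rendered as NumPy, a block of values sharing one scale derived from its largest magnitude (the "bounding box") and stochastic rounding that rounds up with probability equal to the fractional part. Bit widths and function names are invented.
      import numpy as np

      def bounding_box_quantize(block, bits=8, rng=np.random.default_rng(0)):
          scale = float(np.max(np.abs(block))) or 1.0   # shared scale for the block
          levels = 2 ** (bits - 1) - 1
          scaled = block / scale * levels               # map the block into +/- levels
          floor = np.floor(scaled)
          frac = scaled - floor
          q = floor + (rng.random(block.shape) < frac)  # stochastic rounding
          return q.astype(np.int32), scale / levels

      def dequantize(q, step):
          return q * step

      w = np.random.default_rng(1).normal(size=1000).astype(np.float32)
      q, step = bounding_box_quantize(w)
      print("max abs error:", np.max(np.abs(dequantize(q, step) - w)))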
  • Patent number: 11704158
    Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: July 18, 2023
    Assignee: Google LLC
    Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil
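    Illustrative sketch (not from the patent): a toy control loop over the split described above, sampling memory-bandwidth usage for the high- and low-priority domains and resizing the low-priority domain to protect the high-priority work. Thresholds and the adjustment policy are invented.
      def adjust_domains(sample, low_prio_cores, min_cores=1, max_cores=8, bw_limit=0.8):
          # sample: fraction of memory bandwidth used by each domain this interval.
          total = sample["high"] + sample["low"]
          if total > bw_limit and sample["high"] > sample["low"]:
              # High-priority work is memory-starved: shrink the low-priority domain.
              return max(min_cores, low_prio_cores - 1)
          if total < 0.5 * bw_limit:
              # Plenty of headroom: let low-priority work use more cores again.
              return min(max_cores, low_prio_cores + 1)
          return low_prio_cores

      cores = 4
      for sample in [{"high": 0.6, "low": 0.3}, {"high": 0.2, "low": 0.1}]:
          cores = adjust_domains(sample, cores)
          print("low-priority cores ->", cores)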
  • Publication number: 20230110219
    Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
    Type: Application
    Filed: December 9, 2022
    Publication date: April 13, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Taesik NA, Daniel LO, Haishan ZHU, Eric Sen CHUNG
  • Patent number: 11526761
    Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
    Type: Grant
    Filed: August 24, 2019
    Date of Patent: December 13, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Taesik Na, Daniel Lo, Haishan Zhu, Eric Sen Chung
  • Publication number: 20210224129
    Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
    Type: Application
    Filed: January 29, 2021
    Publication date: July 22, 2021
    Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil
  • Publication number: 20210056423
    Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
    Type: Application
    Filed: August 24, 2019
    Publication date: February 25, 2021
    Inventors: Taesik NA, Daniel LO, Haishan ZHU, Eric Sen CHUNG
  • Patent number: 10908964
    Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
    Type: Grant
    Filed: November 21, 2018
    Date of Patent: February 2, 2021
    Assignee: Google LLC
    Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil
  • Publication number: 20200302283
    Abstract: The use of mixed precision values when training an artificial neural network (ANN) can increase performance while reducing cost. Certain portions and/or steps of an ANN may be selected to use higher or lower precision values when training. Additionally, or alternatively, early phases of training are accurate enough with lower levels of precision to quickly refine an ANN model, while higher levels of precision may be used to increase accuracy for later steps and epochs. Similarly, different gates of a long short-term memory (LSTM) may be supplied with values having different precisions.
    Type: Application
    Filed: March 18, 2019
    Publication date: September 24, 2020
    Inventors: Haishan ZHU, Taesik NA, Daniel LO, Eric S. CHUNG
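    Illustrative sketch (not from the patent): phase-dependent precision, where early training steps keep weights in a bfloat16-like truncated format and later steps keep full float32. The cast and the switch-over step are invented for illustration.
      import numpy as np

      def to_bfloat16_like(x):
          # Truncate float32 mantissas to 7 bits by zeroing the low 16 bits.
          bits = x.astype(np.float32).view(np.uint32) & np.uint32(0xFFFF0000)
          return bits.view(np.float32)

      def train_step(w, grad, step, switch_step=1000, lr=0.01):
          w = w - lr * grad
          # Early phase: low precision is accurate enough to refine the model quickly.
          # Later phase: full precision for the final accuracy.
          return to_bfloat16_like(w) if step < switch_step else w.astype(np.float32)

      w = np.random.default_rng(0).normal(size=4).astype(np.float32)
      print(train_step(w, grad=np.ones(4, np.float32), step=10))    # low precision
      print(train_step(w, grad=np.ones(4, np.float32), step=2000))  # full precision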
  • Publication number: 20200218982
    Abstract: A machine learning tool uses dithered quantization of parameters during training of a machine learning model such as a neural network. The machine learning tool receives training data and initializes certain parameters of the machine learning model (e.g., weights for connections between nodes of a neural network, biases for nodes). The machine learning tool trains the parameters in one or more iterations based on the training data. In particular, in a given iteration, the machine learning tool applies the machine learning model to at least some of the training data and, based at least in part on the results, determines parameter updates to the parameters. The machine learning tool updates the parameters using the parameter updates and a dithered quantizer function, which can add random values before a rounding or truncation operation.
    Type: Application
    Filed: January 4, 2019
    Publication date: July 9, 2020
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Thomas M. ANNAU, Haishan ZHU, Daniel LO, Eric S. CHUNG
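    Illustrative sketch (not from the patent): a dithered quantizer that adds a random value before rounding so quantization error behaves like unbiased noise across parameter updates. The step size and update rule are invented for illustration.
      import numpy as np

      def dithered_quantize(x, step=0.01, rng=np.random.default_rng(0)):
          dither = rng.uniform(-0.5, 0.5, size=x.shape)   # random value added before rounding
          return np.round(x / step + dither) * step

      def update_parameters(w, grad, lr=0.1):
          # Apply the parameter update, then store the result in quantized form.
          return dithered_quantize(w - lr * grad)

      w = np.zeros(5)
      g = np.full(5, 0.033)
      for _ in range(3):
          w = update_parameters(w, g)
      print(w)   # quantized parameters after three update iterations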
  • Publication number: 20190155658
    Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
    Type: Application
    Filed: November 21, 2018
    Publication date: May 23, 2019
    Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil