Patents by Inventor Haishan Zhu
Haishan Zhu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240134564
Abstract: Embodiments of the present disclosure include systems and methods for transposing matrices based on a multi-level crossbar. A system may include a memory configured to store a matrix comprising a plurality of elements arranged in a set of rows and a set of columns. A system may include an input buffer configured to retrieve a subset of a plurality of elements from the memory. Each element in the subset of the plurality of elements is retrieved from a different column in the matrix. A system may include a multi-level crossbar configured to perform a transpose operation on the subset of the plurality of elements. A system may include an output buffer configured to receive the transposed subset of the plurality of elements and store, in the memory, each element in the transposed subset of the plurality of elements in a different column in the matrix.
Type: Application
Filed: October 19, 2022
Publication date: April 25, 2024
Inventors: Jinhang CHOI, Haishan ZHU, Yi LUO, Eric S. CHUNG
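The abstract above describes reading one element per column, routing the elements through a crossbar, and writing them back one per column. A minimal Python sketch of that access pattern is a diagonal-rotation transpose, in which each step reads a conflict-free diagonal (one element from every column) and the crossbar rotates it into place; the function name and the single-level rotation are illustrative assumptions, not the patented multi-level design.

```python
def crossbar_transpose(matrix):
    """Transpose an N x N tile one diagonal at a time.

    Each step reads one element from every column (a wrapped diagonal),
    then the modeled crossbar routes element j to output position
    (k + j) % n, writing one element into every column of the result.
    """
    n = len(matrix)
    out = [[None] * n for _ in range(n)]
    for k in range(n):
        # Read one element per column along wrapped diagonal k.
        diag = [matrix[(k + j) % n][j] for j in range(n)]
        # Crossbar routing: out[j][(k + j) % n] = matrix[(k + j) % n][j].
        for j in range(n):
            out[j][(k + j) % n] = diag[j]
    return out
```

Because each step touches every column exactly once on both read and write, the pattern stays conflict-free in banked memory, which is the point of routing through a crossbar rather than transposing element by element.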
-
Publication number: 20240134683
Abstract: A hardware retire circuit includes: one or more input queues, each queue corresponding to an input stream of tasks and being configured to store input task identifiers corresponding to tasks of the input stream; and processing logic configured to: receive a completed task event; determine whether a completed task queue identifier and a completed task identifier of the completed task event match an input task identifier of an input task at a head of an input queue having an input queue identifier corresponding to the completed task queue identifier; and in response to determining a match, pop the task at the head of the input queue and output a task retirement event corresponding to the input task.
Type: Application
Filed: October 20, 2022
Publication date: April 25, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yi LUO, Jinwen XI, Xuan ZUO, Haishan ZHU, Eric Sen CHUNG
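The matching logic in this abstract can be modeled in a few lines: a completed-task event retires only if it matches the head of its queue, which enforces in-order retirement per stream. The class and method names below are illustrative assumptions, and a real circuit would hold non-matching completions for retry rather than rely on the caller re-presenting them.

```python
from collections import deque

class RetireUnit:
    """Software model of the retire logic described in the abstract."""

    def __init__(self, num_queues):
        # One FIFO of input task identifiers per input stream.
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, queue_id, task_id):
        """Record a task identifier in program order for its stream."""
        self.queues[queue_id].append(task_id)

    def on_completed(self, queue_id, task_id):
        """Handle a completed-task event.

        If it matches the head of the identified queue, pop the head and
        emit a retirement event; otherwise return None, meaning the task
        completed out of order and is not yet retirable.
        """
        q = self.queues[queue_id]
        if q and q[0] == task_id:
            q.popleft()
            return ("retired", queue_id, task_id)
        return None
```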
-
Publication number: 20240127107
Abstract: Embodiments of the present disclosure include techniques for machine language processing. In one embodiment, the present disclosure includes commands with data structures comprising fields describing multi-dimensional data and fields describing synchronization. Large volumes of data may be processed and automatically synchronized by execution of a single command.
Type: Application
Filed: October 14, 2022
Publication date: April 18, 2024
Inventors: Haishan ZHU, Eric S. CHUNG
-
Publication number: 20240126617
Abstract: Embodiments of the present disclosure include techniques for machine language processing. In one embodiment, the present disclosure includes configuring functional modules on a machine learning processor to execute a plurality of machine learning (ML) operations during a plurality of time segments. During the time segments, a first portion of the ML operations execute serially and at least one other ML operation executes during at least a majority of the time of each of the time segments. Serial ML operations may be processed simultaneously with the at least one other ML operation.
Type: Application
Filed: October 14, 2022
Publication date: April 18, 2024
Inventors: Haishan ZHU, Preyas Janak SHAH, Tiyasa MITRA, Eric S. CHUNG
-
Publication number: 20240086233
Abstract: Embodiments of the present disclosure include systems and methods for providing a hierarchical programming model for AI hardware. A system includes a set of lower-level control threads. The system also includes a higher-level control thread configured to receive a command from a device, generate a set of commands based on the command, and provide the set of commands to a subset of the set of lower-level control threads. A lower-level control thread in the subset of the set of lower-level control threads is configured to instruct, based on a particular command in the set of commands, a subset of a plurality of processing threads to perform a set of operations.
Type: Application
Filed: September 9, 2022
Publication date: March 14, 2024
Inventors: Haishan ZHU, Eric S. CHUNG
-
Patent number: 11853897
Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
Type: Grant
Filed: December 9, 2022
Date of Patent: December 26, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Taesik Na, Daniel Lo, Haishan Zhu, Eric Sen Chung
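The two ideas this abstract combines, stochastic rounding and the brain floating-point (bfloat16) format, can be sketched together: bfloat16 keeps the top 16 bits of a float32, and the discarded low 16 bits give the probability of rounding up. This is a minimal illustration of the rounding technique named in the abstract, not the patented circuit; the function name is an assumption, and overflow and NaN handling are omitted.

```python
import random
import struct

def to_bfloat16_stochastic(x, rng=random.random):
    """Stochastically round a float32 value to the nearest bfloat16.

    bfloat16 is the upper 16 bits of the float32 encoding; the discarded
    lower 16 bits determine the probability of rounding up, so the
    rounding is unbiased in expectation.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    low = bits & 0xFFFF          # discarded fraction bits
    hi = bits >> 16              # surviving bfloat16 bits
    if rng() < low / 65536.0:    # round up with prob = discarded fraction
        hi += 1
    return struct.unpack("<f", struct.pack("<I", hi << 16))[0]
```

Averaged over many conversions, the expected value of the rounded number equals the original, which is why the abstract notes that stochastic rounding lets weights live in the reduced-precision format without a separately stored full-precision copy.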
-
Patent number: 11704158
Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
Type: Grant
Filed: January 29, 2021
Date of Patent: July 18, 2023
Assignee: Google LLC
Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil
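The runtime policy this abstract claims, measure memory usage per domain and resize a domain in response, can be illustrated with a toy control loop that shrinks the low-priority domain when combined bandwidth exceeds a budget and grows it back when there is headroom. The thresholds, the 80% hysteresis point, and the core counts are illustrative assumptions rather than values from the patent.

```python
def adjust_domains(high_bw, low_bw, bw_limit, low_cores,
                   min_cores=1, max_cores=8):
    """Return the new low-priority core count given measured bandwidth.

    If total memory bandwidth exceeds the budget, throttle the
    low-priority domain; if usage drops well below the budget, grow it.
    """
    total = high_bw + low_bw
    if total > bw_limit and low_cores > min_cores:
        return low_cores - 1   # protect the high-priority domain
    if total < 0.8 * bw_limit and low_cores < max_cores:
        return low_cores + 1   # reclaim headroom for low-priority work
    return low_cores           # inside the hysteresis band: no change
```

Adjusting only the low-priority domain is one of the three configurations the claim enumerates; the same loop could instead re-tune the high-priority domain or both.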
-
Publication number: 20230110219
Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
Type: Application
Filed: December 9, 2022
Publication date: April 13, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Taesik NA, Daniel LO, Haishan ZHU, Eric Sen CHUNG
-
Patent number: 11526761
Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
Type: Grant
Filed: August 24, 2019
Date of Patent: December 13, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Taesik Na, Daniel Lo, Haishan Zhu, Eric Sen Chung
-
Publication number: 20210224129
Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
Type: Application
Filed: January 29, 2021
Publication date: July 22, 2021
Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil
-
Publication number: 20210056423
Abstract: Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
Type: Application
Filed: August 24, 2019
Publication date: February 25, 2021
Inventors: Taesik NA, Daniel LO, Haishan ZHU, Eric Sen CHUNG
-
Patent number: 10908964
Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
Type: Grant
Filed: November 21, 2018
Date of Patent: February 2, 2021
Assignee: Google LLC
Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil
-
Publication number: 20200302283
Abstract: The use of mixed precision values when training an artificial neural network (ANN) can increase performance while reducing cost. Certain portions and/or steps of an ANN may be selected to use higher or lower precision values when training. Additionally, or alternatively, early phases of training are accurate enough with lower levels of precision to quickly refine an ANN model, while higher levels of precision may be used to increase accuracy for later steps and epochs. Similarly, different gates of a long short-term memory (LSTM) may be supplied with values having different precisions.Type: Application
Filed: March 18, 2019
Publication date: September 24, 2020
Inventors: Haishan ZHU, Taesik NA, Daniel LO, Eric S. CHUNG
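The schedule this abstract describes, coarse precision early in training and fine precision later, can be sketched as an epoch-indexed precision schedule plus a quantizer that rounds values to the chosen number of fraction bits. The schedule breakpoints and bit widths below are illustrative assumptions, not values from the application.

```python
def precision_bits(epoch, schedule=((0, 8), (10, 16), (20, 23))):
    """Fraction bits to use at a given epoch: coarse early, fine late.

    `schedule` is a sequence of (start_epoch, bits) pairs; the last pair
    whose start_epoch has been reached wins.
    """
    bits = schedule[0][1]
    for start, b in schedule:
        if epoch >= start:
            bits = b
    return bits

def quantize(x, bits):
    """Round x to the nearest multiple of 2**-bits."""
    scale = 1 << bits
    return round(x * scale) / scale
```

With such a schedule, early epochs trade accuracy for cheaper arithmetic while the model is still far from converged, and later epochs recover precision, matching the abstract's rationale.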
-
Publication number: 20200218982
Abstract: A machine learning tool uses dithered quantization of parameters during training of a machine learning model such as a neural network. The machine learning tool receives training data and initializes certain parameters of the machine learning model (e.g., weights for connections between nodes of a neural network, biases for nodes). The machine learning tool trains the parameters in one or more iterations based on the training data. In particular, in a given iteration, the machine learning tool applies the machine learning model to at least some of the training data and, based at least in part on the results, determines parameter updates to the parameters. The machine learning tool updates the parameters using the parameter updates and a dithered quantizer function, which can add random values before a rounding or truncation operation.
Type: Application
Filed: January 4, 2019
Publication date: July 9, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Thomas M. ANNAU, Haishan ZHU, Daniel LO, Eric S. CHUNG
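The dithered quantizer function the abstract describes, adding a random value before rounding, has a compact sketch: offset the input by a uniform random dither of up to half a quantization step, then round to the grid. The function name and the symmetric uniform dither are illustrative assumptions; the application covers other dither distributions and truncation as well.

```python
import random

def dithered_quantize(value, step, rng=random.random):
    """Quantize `value` to a grid of spacing `step` with dithering.

    A uniform random offset in [-step/2, step/2) is added before
    rounding, so the quantization error is unbiased on average instead
    of always rounding small updates the same way.
    """
    dither = (rng() - 0.5) * step
    return round((value + dither) / step) * step
```

During training this matters because deterministic rounding can swallow parameter updates smaller than half a step every iteration, whereas dithering lets such updates survive with proportional probability.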
-
Publication number: 20190155658
Abstract: Methods, systems, and computer storage media storing instructions for managing processing system efficiency. One of the methods includes obtaining data splitting a plurality of general-purpose processing units in a processing system into a high-priority domain and a low-priority domain, wherein the general-purpose processing units in the high-priority domain are assigned to perform one or more tasks comprising one or more high-priority tasks, and the general-purpose processing units in the low-priority domain are assigned to perform one or more low-priority tasks; and during runtime of the processing system, obtaining memory usage measurements that characterize usage of system memory by the high-priority domain and the low-priority domain; and adjusting, based on the memory usage measurements, a configuration of (i) the high-priority domain, (ii) the low-priority domain, or (iii) both to adjust utilization of the system memory by the general-purpose processing units.
Type: Application
Filed: November 21, 2018
Publication date: May 23, 2019
Inventors: Liqun Cheng, Rama Krishna Govindaraju, Haishan Zhu, David Lo, Parthasarathy Ranganathan, Nishant Patil