Patents by Inventor Liangzhen Lai
Liangzhen Lai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12265492
Abstract: In one embodiment, a method includes receiving, from a direct memory access component, a token indicating that a data chunk has become available in a first circular buffer of a pre-determined size; determining, based on the token, that a computation is to be performed with data including the data chunk; and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where the pre-determined size of the first circular buffer is subtracted from a generated address when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and the pre-determined size of the first circular buffer is added to the generated address when the generated address is less than a pre-determined minimum associated with the first circular buffer.
Type: Grant
Filed: February 21, 2023
Date of Patent: April 1, 2025
Assignee: Meta Platforms, Inc.
Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
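The wrap-around rule in this abstract amounts to modular addressing over the buffer's range. Below is a minimal Python sketch, assuming the buffer occupies addresses base through base + size - 1; the names base, size, and addr are illustrative stand-ins, not taken from the patent.

```python
def wrap_address(addr: int, base: int, size: int) -> int:
    """Wrap a generated address back into a circular buffer's address range.

    The buffer is assumed to span [base, base + size - 1]; base stands in
    for the pre-determined minimum and base + size - 1 for the maximum.
    """
    if addr > base + size - 1:   # ran past the end: fold back to the front
        addr -= size
    elif addr < base:            # ran before the start: fold to the back
        addr += size
    return addr
```

For example, with base = 0x100 and size = 64, a generated address of 0x142 wraps to 0x102, so sequential address generation keeps landing inside the buffer.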
-
Patent number: 12197362
Abstract: In one embodiment, a method includes determining that a BMM (batch matrix multiplication) operation between a first activation tensor and a second activation tensor needs to be performed; collecting the second activation tensor in two blocks from activation buffers of N tensor processor units; splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile; loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks; and generating a BMM result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
Type: Grant
Filed: January 26, 2023
Date of Patent: January 14, 2025
Assignee: Meta Platforms, Inc.
Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
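The MSB/LSB split lets wide activations pass through a narrower weight datapath. A Python sketch using NumPy follows, assuming 16-bit blocks split into 8-bit tiles; the function names, the bit widths, and the two-entry buffer modeled as a plain list are all illustrative assumptions, not details from the patent.

```python
import numpy as np

def split_msb_lsb(block: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a 16-bit block into most- and least-significant 8-bit tiles."""
    b = block.astype(np.uint16)
    return (b >> 8).astype(np.uint8), (b & 0xFF).astype(np.uint8)

def load_weight_buffers(block_a: np.ndarray, block_b: np.ndarray,
                        num_units: int) -> list[list[np.ndarray]]:
    """Fill entry 0 of every unit's weight buffer with the MSB tiles of the
    two blocks, and entry 1 with the LSB tiles, as the abstract describes."""
    msb_a, lsb_a = split_msb_lsb(block_a)
    msb_b, lsb_b = split_msb_lsb(block_b)
    entry0 = np.concatenate([msb_a, msb_b])  # MSB tiles of both blocks
    entry1 = np.concatenate([lsb_a, lsb_b])  # LSB tiles of both blocks
    return [[entry0, entry1] for _ in range(num_units)]
```

A downstream multiply can then recombine partial products as (msb_result << 8) + lsb_result, which is one plausible reason for keeping the tiles in separate buffer entries.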
-
Publication number: 20240281393
Abstract: In one embodiment, a method includes receiving, from a direct memory access component, a token indicating that a data chunk has become available in a first circular buffer of a pre-determined size; determining, based on the token, that a computation is to be performed with data including the data chunk; and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where the pre-determined size of the first circular buffer is subtracted from a generated address when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and the pre-determined size of the first circular buffer is added to the generated address when the generated address is less than a pre-determined minimum associated with the first circular buffer.
Type: Application
Filed: February 21, 2023
Publication date: August 22, 2024
Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
-
Publication number: 20240256475
Abstract: In one embodiment, a method includes determining that a BMM (batch matrix multiplication) operation between a first activation tensor and a second activation tensor needs to be performed; collecting the second activation tensor in two blocks from activation buffers of N tensor processor units; splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile; loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks; and generating a BMM result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
Type: Application
Filed: January 26, 2023
Publication date: August 1, 2024
Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
-
Patent number: 12001893
Abstract: A system including a machine-learning accelerator (MLA) hardware comprising computation-control units that each have a programmable dependency matrix; and a compiler computing module configured to generate, based on a machine-learning model, dependency instructions indicating dependencies between the computation-control units; wherein the computation-control units include at least: a first computation-control unit configured to generate, after completion of a first operation, a synchronization token representing the completion of the first operation, the synchronization token specifying a recipient identifier for an intended recipient computation-control unit of the synchronization token; a second computation-control unit configured to: configure the programmable dependency matrix of the second computation-control unit according to the dependency instructions to include dependency conditions for performing operations; receive the synchronization token based on the recipient identifier; update a dependency sta…
Type: Grant
Filed: December 28, 2020
Date of Patent: June 4, 2024
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Liangzhen Lai, Xu Chen, Miguel Angel Guerrero, Simon James Hollis
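The token-and-matrix scheme can be modeled in software: a dependency matrix becomes a per-operation set of producer IDs, and an arriving token clears its producer from those sets. In the Python sketch below, every class, field, and method name is an assumption for illustration, not the patent's interface.

```python
from dataclasses import dataclass

@dataclass
class SyncToken:
    producer_id: int    # unit whose operation just completed
    recipient_id: int   # intended recipient of the token

class ComputationControlUnit:
    def __init__(self, unit_id: int,
                 dependency_instructions: dict[str, set[int]]):
        self.unit_id = unit_id
        # Programmable dependency matrix: operation -> producer units that
        # must finish first (configured from the compiler's instructions).
        self.pending = {op: set(deps)
                        for op, deps in dependency_instructions.items()}

    def receive(self, token: SyncToken) -> None:
        if token.recipient_id != self.unit_id:
            return                            # token routed to another unit
        for deps in self.pending.values():
            deps.discard(token.producer_id)   # update dependency state

    def ready_operations(self) -> list[str]:
        return [op for op, deps in self.pending.items() if not deps]
```

For instance, a unit built with ComputationControlUnit(2, {"conv1": {1}}) reports "conv1" as ready only after it receives SyncToken(producer_id=1, recipient_id=2).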
-
Publication number: 20240143525
Abstract: In one embodiment, a method for iteratively transferring a plurality of non-contiguous blocks of data from a source memory to a destination memory through n-dimensional loops without being re-programmed by a direct memory access within a machine-learning accelerator includes reading a first block of data from a first address of the source memory, processing the first block of data with an ingress modification function, and storing the first block of data to a second address of a data buffer, by an ingress component of the direct memory access within the machine-learning accelerator, and reading a second block of data from a third address of the data buffer, processing the second block of data with an egress modification function, and storing the second block to a fourth address of the destination memory, by an egress component of the direct memory access within the machine-learning accelerator.
Type: Application
Filed: October 28, 2022
Publication date: May 2, 2024
Inventors: Xu Chen, Kyong Ho Lee, Harshit Khaitan, Liangzhen Lai
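The two-stage (ingress/egress) flow through a staging buffer is easy to sketch. In the Python below, blocks carries one (src_addr, buf_addr, dst_addr, length) tuple per block, standing in for the addresses the n-dimensional loops would generate; all names and the tuple layout are illustrative.

```python
def dma_transfer(src, staging, dst, blocks, ingress_fn, egress_fn):
    """Two-stage DMA sketch over byte-addressable sequences.

    Ingress: read each block from source memory, apply the ingress
    modification function, store it into the staging data buffer.
    Egress: read it back, apply the egress modification function,
    store it to destination memory.
    """
    for src_addr, buf_addr, _, n in blocks:      # ingress component
        staging[buf_addr:buf_addr + n] = ingress_fn(src[src_addr:src_addr + n])
    for _, buf_addr, dst_addr, n in blocks:      # egress component
        dst[dst_addr:dst_addr + n] = egress_fn(staging[buf_addr:buf_addr + n])
```

Because the block list is data rather than code, the same loop structure services arbitrarily strided, non-contiguous transfers, which matches the abstract's "without being re-programmed" claim.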
-
Patent number: 11972349
Abstract: In one embodiment, a method for machine learning acceleration includes receiving instructions to perform convolution on an input tensor using a filter tensor, determining that the size of a first dimension of the input tensor is less than a processing capacity of each of multiple subarrays of computation units in a tensor processor, selecting a second dimension of the input tensor along which to perform the convolution, selecting, based on the second dimension, one or more dimensions of the filter tensor, generating (1) first instructions for reading, using vector read operations, activation elements in the input tensor organized such that elements with different values in the second dimension are stored contiguously in memory, and (2) second instructions for reading weights of the filter tensor along the selected one or more dimensions, and using the first and second instructions to provide the activation elements and the weights to the subarrays.
Type: Grant
Filed: November 12, 2020
Date of Patent: April 30, 2024
Assignee: Meta Platforms, Inc.
Inventors: Liangzhen Lai, Yu Hsin Chen, Vikas Chandra
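The key decision here is which tensor dimension to vectorize over when the first one is too small to fill a subarray. Below is a sketch of one plausible selection policy; the "pick the largest remaining dimension" rule and all names are assumptions, since the abstract does not say how the second dimension is chosen.

```python
def choose_vector_dim(input_shape: tuple[int, ...],
                      subarray_capacity: int) -> int:
    """Return the index of the dimension to vectorize reads along.

    If the first dimension already fills a subarray, use it; otherwise
    fall back to the largest remaining dimension (an assumed policy).
    """
    if input_shape[0] >= subarray_capacity:
        return 0
    rest = input_shape[1:]
    return 1 + max(range(len(rest)), key=lambda i: rest[i])
```

With subarray_capacity = 32 and an input of shape (4, 56, 56) (few channels, large spatial extents), the function returns 1, so activations would be laid out contiguously along that spatial dimension for vector reads.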
-
Patent number: 11954025
Abstract: Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A mask identifying byte positions within a data word having non-zero values in memory can be accessed. Each bit of the mask can have a first value or a second value, the first value indicating that a byte of the data word corresponds to a non-zero byte value, the second value indicating that the byte of the data word corresponds to a zero byte value. The data word can be modified to have non-zero byte values stored at an end of a first side of the data word in the memory, and any zero byte values stored in a remainder of the data word. The modified data word can be written to the memory via at least a first slice of a plurality of slices that is configured to access the first side of the data word in the memory.
Type: Grant
Filed: March 24, 2023
Date of Patent: April 9, 2024
Assignee: Meta Platforms Technologies, LLC
Inventors: Ganesh Venkatesh, Liangzhen Lai, Pierce I-Jen Chuang, Meng Li
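The mask-plus-compaction idea fits in a few lines of Python. The bit ordering (bit i for byte i) and the packing direction are assumptions, since the abstract only fixes that non-zero bytes gather at one side of the word.

```python
def compact_word(word: bytes) -> tuple[int, bytes]:
    """Return (mask, packed) for a data word: bit i of the mask is set when
    byte i is non-zero, and packed holds the non-zero bytes at the front
    with zeros filling the remainder."""
    mask = 0
    nonzero = bytearray()
    for i, b in enumerate(word):
        if b:
            mask |= 1 << i       # record the original byte position
            nonzero.append(b)    # gather non-zero bytes to one side
    return mask, bytes(nonzero).ljust(len(word), b"\x00")

def expand_word(mask: int, packed: bytes, width: int) -> bytes:
    """Invert compact_word: scatter packed bytes back to masked positions."""
    out, it = bytearray(width), iter(packed)
    for i in range(width):
        if mask & (1 << i):
            out[i] = next(it)
    return bytes(out)
```

For example, compact_word(b"\x00\x05\x00\x09") yields mask 0b1010 and packed b"\x05\x09\x00\x00", and expanding with width 4 restores the original word.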
-
Patent number: 11709783
Abstract: In one embodiment, a method for tensor data distribution using a direct-memory access agent includes generating, by a first controller, source addresses indicating locations in a source memory where portions of a source tensor are stored. A second controller may generate destination addresses indicating locations in a destination memory where portions of a destination tensor are to be stored. The direct-memory access agent receives a source address generated by the first controller and a destination address generated by the second controller and determines a burst size. The direct-memory access agent may issue a read request comprising the source address and the burst size to read tensor data from the source memory and may store the tensor data into an alignment buffer. The direct-memory access agent then issues a write request comprising the destination address and the burst size to write data from the alignment buffer into the destination memory.
Type: Grant
Filed: November 11, 2020
Date of Patent: July 25, 2023
Assignee: Meta Platforms, Inc.
Inventors: Xu Chen, Harshit Khaitan, Yu Hsin Chen, Liangzhen Lai
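Each agent transaction pairs one source address with one destination address and moves one burst through an alignment buffer. A Python sketch under those assumptions, with all names illustrative:

```python
def distribute_tensor(src_mem, dst_mem, src_addrs, dst_addrs, burst_size):
    """Pair the two controllers' address streams and move one burst per
    (source, destination) pair through an alignment buffer."""
    for src_addr, dst_addr in zip(src_addrs, dst_addrs):
        alignment_buffer = src_mem[src_addr:src_addr + burst_size]  # read request
        dst_mem[dst_addr:dst_addr + burst_size] = alignment_buffer  # write request
```

Decoupling address generation (the two controllers) from data movement (the agent) is what lets the same agent shuttle arbitrarily tiled tensors without knowing their layout.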
-
Publication number: 20230229591
Abstract: Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A mask identifying byte positions within a data word having non-zero values in memory can be accessed. Each bit of the mask can have a first value or a second value, the first value indicating that a byte of the data word corresponds to a non-zero byte value, the second value indicating that the byte of the data word corresponds to a zero byte value. The data word can be modified to have non-zero byte values stored at an end of a first side of the data word in the memory, and any zero byte values stored in a remainder of the data word. The modified data word can be written to the memory via at least a first slice of a plurality of slices that is configured to access the first side of the data word in the memory.
Type: Application
Filed: March 24, 2023
Publication date: July 20, 2023
Inventors: Ganesh Venkatesh, Liangzhen Lai, Pierce I-Jen Chuang, Meng Li
-
Patent number: 11704562
Abstract: A system including a machine learning accelerator (MLA) hardware configured to perform machine-learning operations according to native instructions; an interpreter computing module configured to: generate, based on virtual instructions, machine language instructions configured to be processed by a processing hardware implementing the interpreter computing module; and cause the processing hardware to perform machine-learning operations according to the machine language instructions; and a compiler computing module associated with the MLA hardware, the compiler computing module configured to: receive instructions for performing an inference using a machine-learning model; based on the received instructions: generate the native instructions configured to be processed by the MLA hardware, the native instructions specifying first machine-learning operations associated with performing the inference; and generate the virtual instructions configured to be processed by the interpreter computing module, the virtual ins…
Type: Grant
Filed: November 4, 2020
Date of Patent: July 18, 2023
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Miguel Angel Guerrero, Liangzhen Lai, Simon James Hollis
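One way to picture the compiler's dual output is a pass that partitions a model's operations into those the MLA executes natively and those the host interpreter handles. The partitioning rule below (split by accelerator support) is purely an assumption for illustration; the abstract only says that both instruction streams are generated.

```python
def compile_inference(model_ops: list[str],
                      mla_supported: set[str]) -> tuple[list[str], list[str]]:
    """Split an inference's operations into native instructions (for the
    MLA hardware) and virtual instructions (for the interpreter).

    Assumed policy: an op runs natively iff the accelerator supports it.
    """
    native = [op for op in model_ops if op in mla_supported]
    virtual = [op for op in model_ops if op not in mla_supported]
    return native, virtual
```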
-
Patent number: 11698529
Abstract: Disclosed herein is a method for using a neural network across multiple devices. The method can include receiving, by a first device configured with a first one or more layers of a neural network, input data for processing via the neural network implemented across the first device and a second device. The method can include outputting, by the first one or more layers of the neural network implemented on the first device, a data set that is reduced in size relative to the input data while identifying one or more features of the input data for processing by a second one or more layers of the neural network. The method can include communicating, by the first device, the data set to the second device for processing via the second one or more layers of the neural network implemented on the second device.
Type: Grant
Filed: July 9, 2019
Date of Patent: July 11, 2023
Assignee: Meta Platforms Technologies, LLC
Inventors: Liangzhen Lai, Pierce I-Jen Chuang, Vikas Chandra, Ganesh Venkatesh
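The point of the split is that the intermediate features crossing the device boundary are smaller than the raw input. A minimal Python sketch, treating the two layer groups and the link as callables; all names are illustrative.

```python
def split_inference(input_data, head_layers, tail_layers, send):
    """Run the first layer group on device 1, ship the reduced feature
    set over the link, and let device 2 finish the inference.

    head_layers / tail_layers: callables standing in for the two layer
    groups; send: whatever transport connects the two devices.
    """
    features = head_layers(input_data)  # reduced in size vs. input_data
    send(features)                      # cheaper than sending the raw input
    return tail_layers(features)       # would run on the second device
```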
-
Patent number: 11675998
Abstract: Disclosed herein are a system, a method, and a device for receiving input data to generate a plurality of outputs for a layer of a neural network. The plurality of outputs are arranged in a first array. Dimensions of the first array may be compared with dimensions of a processing element (PE) array including a plurality of PEs. According to a result of the comparing, the first array is partitioned into subarrays by the processor. Each of the subarrays has dimensions less than or equal to the dimensions of the PE array. A first group of PEs in the PE array is assigned to a first one of the subarrays. A corresponding output of the plurality of outputs is generated using a portion of the input data by each PE of the first group of PEs assigned to the first one of the subarrays.
Type: Grant
Filed: July 15, 2019
Date of Patent: June 13, 2023
Assignee: Meta Platforms Technologies, LLC
Inventors: Ganesh Venkatesh, Liangzhen Lai, Pierce I-Jen Chuang, Meng Li
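The partitioning step is ordinary output tiling: cut the layer's output array into tiles no larger than the PE array. A Python sketch, with illustrative names and a simple row-major tiling that the abstract does not mandate:

```python
def partition_outputs(out_rows: int, out_cols: int,
                      pe_rows: int, pe_cols: int) -> list[tuple[int, int, int, int]]:
    """Tile an out_rows x out_cols output array into subarrays that each
    fit the pe_rows x pe_cols PE array; returns (row, col, height, width)."""
    return [(r, c,
             min(pe_rows, out_rows - r),   # clip the last tile's height
             min(pe_cols, out_cols - c))   # clip the last tile's width
            for r in range(0, out_rows, pe_rows)
            for c in range(0, out_cols, pe_cols)]
```

For instance, partition_outputs(100, 70, 32, 32) yields twelve tiles, each at most 32 x 32, and each tile would then be assigned to its own group of PEs.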
-
Patent number: 11630770
Abstract: Disclosed herein are a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A plurality of slices can be established to access a memory having an access size of a data word. A first slice can be configured to access a first side of the data word in memory. Circuitry can access a mask identifying byte positions within the data word having non-zero values. The circuitry can modify the data word to have non-zero byte values stored starting at an end of the first side, and any zero byte values stored in a remainder of the data word. A determination can be made whether a number of non-zero byte values is less than or equal to a first access size of the first slice. The circuitry can write the modified data word to the memory via at least the first slice.
Type: Grant
Filed: July 11, 2019
Date of Patent: April 18, 2023
Assignee: Meta Platforms Technologies, LLC
Inventors: Ganesh Venkatesh, Liangzhen Lai, Pierce I-Jen Chuang, Meng Li
-
Publication number: 20220308835
Abstract: Disclosed herein are a system, a method, and a device for improving the computation efficiency of a neural network. In one aspect, adder circuitry is configured to add input data from processing of the neural network and a first number of bits of accumulated data for the neural network to generate summation data. In one aspect, according to a carry value of the adding from the adder circuitry, a multiplexer is configured to select between i) a second number of bits of the accumulated data and ii) incremented data comprising the second number of bits of the accumulated data incremented by a predetermined value. The summation data appended with the selected one of the second number of bits of the accumulated data or the incremented data may form appended data.
Type: Application
Filed: June 15, 2022
Publication date: September 29, 2022
Inventors: Liangzhen Lai, Pierce I-Jen Chuang
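This is a carry-select accumulator: a narrow adder handles the low bits, and the high bits are chosen between "unchanged" and "incremented" based on the adder's carry-out. A Python model follows, assuming a 16/16 split between low and high bits (the widths are not given in the abstract) and an input that fits in the low bits:

```python
def carry_select_accumulate(acc: int, x: int,
                            low_bits: int = 16, high_bits: int = 16) -> int:
    """Add x (assumed to fit in low_bits) into a (low_bits + high_bits)-wide
    accumulator using a narrow adder plus a carry-driven multiplexer."""
    low_mask = (1 << low_bits) - 1
    summation = (acc & low_mask) + (x & low_mask)   # narrow adder
    carry = summation >> low_bits                   # adder's carry-out
    high = (acc >> low_bits) & ((1 << high_bits) - 1)
    selected = (high + 1) if carry else high        # mux: high vs. high + 1
    return (selected << low_bits) | (summation & low_mask)  # appended data
```

In hardware, the appeal is presumably that the wide carry chain disappears: only the narrow adder sits on the critical path, while the incremented high word can be prepared in parallel.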
-
Publication number: 20220237262
Abstract: Disclosed herein are a system, a method, and a device for multiply-accumulate operations. In one aspect, an input operand is received by control circuitry. In one aspect, the control circuitry determines a sparsity of the input operand, where the sparsity may indicate whether a value of the input operand has a predetermined value or not. In one aspect, the control circuitry determines a stationarity of the input operand, where the stationarity may indicate whether the value of the input operand changes over one or more clock cycles. In one aspect, the input operand is provided to multiply-accumulate circuitry as an input, according to the determined sparsity and stationarity of the input operand.
Type: Application
Filed: April 11, 2022
Publication date: July 28, 2022
Inventor: Liangzhen Lai
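A software model of the gating: skip the multiply when an operand is sparse (equals the predetermined value, here taken to be zero) and skip re-latching when it is stationary (unchanged across cycles). The names and the zero-as-predetermined-value choice are assumptions.

```python
class GatedMAC:
    """Multiply-accumulate with sparsity and stationarity gating (sketch)."""

    def __init__(self):
        self.acc = 0
        self.held = None   # operand register; untouched when input is stationary

    def step(self, operand: int, weight: int) -> int:
        if operand != self.held:            # not stationary: latch new value
            self.held = operand
        if self.held == 0 or weight == 0:   # sparse: skip the multiply
            return self.acc
        self.acc += self.held * weight
        return self.acc
```

In hardware, both gates translate to saved switching activity: a skipped multiply and an un-toggled operand register both burn less power.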
-
Patent number: 11385864
Abstract: Disclosed herein are a system, a method, and a device for improving the computation efficiency of a neural network. In one aspect, adder circuitry is configured to add input data from processing of the neural network and a first number of bits of accumulated data for the neural network to generate summation data. In one aspect, according to a carry value of the adding from the adder circuitry, a multiplexer is configured to select between i) a second number of bits of the accumulated data and ii) incremented data comprising the second number of bits of the accumulated data incremented by a predetermined value. The summation data appended with the selected one of the second number of bits of the accumulated data or the incremented data may form appended data.
Type: Grant
Filed: July 2, 2019
Date of Patent: July 12, 2022
Assignee: Facebook Technologies, LLC
Inventors: Liangzhen Lai, Pierce I-Jen Chuang
-
Patent number: 11301545
Abstract: Disclosed herein are a system, a method, and a device for multiply-accumulate operations. In one aspect, an input operand is received by control circuitry. In one aspect, the control circuitry determines a sparsity of the input operand, where the sparsity may indicate whether a value of the input operand has a predetermined value or not. In one aspect, the control circuitry determines a stationarity of the input operand, where the stationarity may indicate whether the value of the input operand changes over one or more clock cycles. In one aspect, the input operand is provided to multiply-accumulate circuitry as an input, according to the determined sparsity and stationarity of the input operand.
Type: Grant
Filed: July 11, 2019
Date of Patent: April 12, 2022
Assignee: Facebook Technologies, LLC
Inventor: Liangzhen Lai
-
Patent number: 10977002
Abstract: Disclosed herein are a system, a method, and a device including shift circuitry and add circuitry for performing multiplication of a first value and a second value for a neural network. The first value has a predetermined format including a first bit, and two or more second bits to represent a value of zero or 2n where n is an integer greater than or equal to 0. The device shifts, when the two or more second bits represent the value of 2n, the second value by (n+1) bits via the shift circuitry to provide a first result, selectively outputs zero or the second value, based on a value of the first bit of the first value, to provide a second result, and adds the first result and the second result via the add circuitry to provide a result of the multiplication of the first and second values.
Type: Grant
Filed: July 15, 2019
Date of Patent: April 13, 2021
Assignee: Facebook Technologies, LLC
Inventors: Ganesh Venkatesh, Liangzhen Lai, Pierce I-Jen Chuang, Meng Li, Vikas Chandra
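Decoding the format described above, the first value acts as a multiplier of the form 2^(n+1) + b, where b is the first bit and the second bits contribute either nothing or the power-of-two term. A Python sketch of the shift-and-add product, with the encoding details inferred from the abstract rather than stated in it:

```python
def shift_add_multiply(first_bit: int, n: int | None, y: int) -> int:
    """Multiply y by a first value in the patent's format.

    n is None when the second bits represent zero; otherwise they
    represent 2**n and contribute y shifted left by (n + 1) bits.
    """
    first_result = (y << (n + 1)) if n is not None else 0  # shift circuitry
    second_result = y if first_bit else 0                  # select zero or y
    return first_result + second_result                    # add circuitry
```

With n = 1 and first_bit = 1 the multiplier is 2**2 + 1 = 5, so shift_add_multiply(1, 1, 3) returns 15; the full multiplication costs one shift and one add.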
-
Publication number: 20210019591
Abstract: Disclosed herein are a system, a method, and a device for receiving input data to generate a plurality of outputs for a layer of a neural network. The plurality of outputs are arranged in a first array. Dimensions of the first array may be compared with dimensions of a processing element (PE) array including a plurality of PEs. According to a result of the comparing, the first array is partitioned into subarrays by the processor. Each of the subarrays has dimensions less than or equal to the dimensions of the PE array. A first group of PEs in the PE array is assigned to a first one of the subarrays. A corresponding output of the plurality of outputs is generated using a portion of the input data by each PE of the first group of PEs assigned to the first one of the subarrays.
Type: Application
Filed: July 15, 2019
Publication date: January 21, 2021
Applicant: Facebook Technologies, LLC
Inventors: Ganesh Venkatesh, Liangzhen Lai, Pierce I-Jen Chuang, Meng Li