Patents by Inventor Harshit Khaitan
Harshit Khaitan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12265492
Abstract: In one embodiment, a method includes receiving, from a direct memory access component, a token indicating that a data chunk has become available in a first circular buffer of a pre-determined size; determining, based on the token, that a computation is to be performed with data including the data chunk; and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where the pre-determined size of the first circular buffer is subtracted from a generated address when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and where the pre-determined size of the first circular buffer is added to a generated address when the generated address is less than a pre-determined minimum associated with the first circular buffer.
Type: Grant
Filed: February 21, 2023
Date of Patent: April 1, 2025
Assignee: Meta Platforms, Inc.
Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
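The wraparound arithmetic in this abstract is simple to illustrate. Below is a minimal Python sketch of the address-folding rule, assuming a flat address space and invented base, size, and maximum values; it is not the patented hardware logic.

```python
# Hypothetical sketch of the wraparound address generation described in the
# abstract; the constants and flat address space are illustrative assumptions.

BUFFER_BASE = 0x1000        # assumed base (pre-determined minimum) address
BUFFER_SIZE = 256           # the pre-determined size of the circular buffer
BUFFER_MAX = BUFFER_BASE + BUFFER_SIZE - 1  # pre-determined maximum address

def wrap_address(generated: int) -> int:
    """Fold a generated address back into the circular buffer's range."""
    if generated > BUFFER_MAX:
        # Address ran past the end: subtract the buffer size to wrap around.
        return generated - BUFFER_SIZE
    if generated < BUFFER_BASE:
        # Address ran below the start: add the buffer size to wrap around.
        return generated + BUFFER_SIZE
    return generated

# Example: stepping through 64-byte chunks wraps past the top of the buffer.
for i in range(8):
    print(hex(wrap_address(BUFFER_BASE + i * 64)))
```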
-
Publication number: 20250045559
Abstract: A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
Type: Application
Filed: July 9, 2024
Publication date: February 6, 2025
Inventors: Ravi Narayanaswami, Dong Hyuk Woo, Olivier Temam, Harshit Khaitan
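As a rough illustration of a loop nest whose structure is defined by an instruction's data values, here is a hedged Python sketch; the Instruction fields and the fully-connected interpretation are assumptions for illustration, not the actual instruction encoding.

```python
# A minimal software sketch of executing a tensor computation as a loop nest
# whose shape comes from fields of a single instruction. All names here are
# hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Instruction:
    layer_type: str   # e.g. "fully_connected"; defines the loop-nest structure
    bounds: tuple     # loop bounds, one per loop in the nest

def execute(instr, x, w):
    if instr.layer_type == "fully_connected":
        n_out, n_in = instr.bounds
        y = [0.0] * n_out
        # Two-deep loop nest; its structure is set by the instruction's fields.
        for o in range(n_out):
            for i in range(n_in):
                y[o] += x[i] * w[o][i]
        return y
    raise NotImplementedError(instr.layer_type)

# Usage: a 2x3 fully-connected layer driven entirely by instruction fields.
instr = Instruction("fully_connected", (2, 3))
print(execute(instr, [1.0, 2.0, 3.0], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))
```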
-
Patent number: 12197362
Abstract: In one embodiment, a method includes determining that a bmm operation between a first activation tensor and a second activation tensor needs to be performed, collecting the second activation tensor in two blocks from activation buffers of N tensor processor units, splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile, loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks, and generating a bmm result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
Type: Grant
Filed: January 26, 2023
Date of Patent: January 14, 2025
Assignee: Meta Platforms, Inc.
Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
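The MSB/LSB tile split lends itself to a short sketch. Assuming "bmm" means batch matrix multiplication and that the split separates 16-bit values into 8-bit high-byte and low-byte tiles (an assumption; the abstract does not state bit widths), a Python model of the weight-buffer loading might look like this:

```python
# Hedged sketch of splitting activation blocks into MSB and LSB tiles and
# staging them into two weight-buffer entries. Tile widths and the numpy
# representation are assumptions, not the patented layout.
import numpy as np

def split_msb_lsb(block: np.ndarray):
    """Split a uint16 block into its high-byte and low-byte tiles."""
    block = block.astype(np.uint16)
    msb = (block >> 8).astype(np.uint8)    # most-significant byte tile
    lsb = (block & 0xFF).astype(np.uint8)  # least-significant byte tile
    return msb, lsb

# Two blocks of the second activation tensor, as in the abstract.
blocks = [np.array([[0x1234, 0xABCD]], dtype=np.uint16),
          np.array([[0x00FF, 0x8000]], dtype=np.uint16)]

# Weight buffer with two entries: entry 0 holds the MSB tiles of both blocks,
# entry 1 holds the LSB tiles, mirroring the loading scheme described above.
tiles = [split_msb_lsb(b) for b in blocks]
weight_buffer = [np.concatenate([t[0] for t in tiles], axis=1),
                 np.concatenate([t[1] for t in tiles], axis=1)]
print(weight_buffer[0], weight_buffer[1])
```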
-
Publication number: 20240281393
Abstract: In one embodiment, a method includes receiving, from a direct memory access component, a token indicating that a data chunk has become available in a first circular buffer of a pre-determined size; determining, based on the token, that a computation is to be performed with data including the data chunk; and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where the pre-determined size of the first circular buffer is subtracted from a generated address when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and where the pre-determined size of the first circular buffer is added to a generated address when the generated address is less than a pre-determined minimum associated with the first circular buffer.
Type: Application
Filed: February 21, 2023
Publication date: August 22, 2024
Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
-
Patent number: 12061968
Abstract: A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
Type: Grant
Filed: June 21, 2022
Date of Patent: August 13, 2024
Assignee: Google LLC
Inventors: Ravi Narayanaswami, Dong Hyuk Woo, Olivier Temam, Harshit Khaitan
-
Publication number: 20240256475
Abstract: In one embodiment, a method includes determining that a bmm operation between a first activation tensor and a second activation tensor needs to be performed, collecting the second activation tensor in two blocks from activation buffers of N tensor processor units, splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile, loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks, and generating a bmm result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
Type: Application
Filed: January 26, 2023
Publication date: August 1, 2024
Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
-
Publication number: 20240231819
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type: Application
Filed: November 9, 2023
Publication date: July 11, 2024
Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
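To make the dataflow concrete, here is a hypothetical software model of the computing unit: two memory banks feeding a MAC operator, with the traversal unit's control signal reduced to an index that selects which activation reaches the data bus. Real hardware would use registers and control signals; everything named below is an illustrative assumption.

```python
# Illustrative software model of the computing unit described in the abstract.

class ComputingUnit:
    def __init__(self, activations, parameters):
        self.bank1 = activations   # first memory bank: input activations
        self.bank2 = parameters    # second memory bank: parameters
        self.accumulator = 0.0

    def mac(self, act_index: int, param_index: int):
        # The traversal unit's control signal is modeled as the index that
        # selects which activation is placed on the data bus.
        data_bus = self.bank1[act_index]
        self.accumulator += data_bus * self.bank2[param_index]
        return self.accumulator

unit = ComputingUnit([1.0, 2.0, 3.0], [0.5, 0.25, 0.125])
for i in range(3):
    unit.mac(i, i)        # multiply-accumulate over the data array
print(unit.accumulator)   # 1*0.5 + 2*0.25 + 3*0.125 = 1.375
```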
-
Patent number: 12001893
Abstract: A system including a machine-learning accelerator (MLA) hardware comprising computation-control units that each have a programmable dependency matrix; and a compiler computing module configured to generate, based on a machine-learning model, dependency instructions indicating dependencies between the computation-control units; wherein the computation-control units include at least: a first computation-control unit configured to generate, after completion of a first operation, a synchronization token representing the completion of the first operation, the synchronization token specifying a recipient identifier for an intended recipient computation-control unit of the synchronization token; a second computation-control unit configured to: configure the programmable dependency matrix of the second computation-control unit according to the dependency instructions to include dependency conditions for performing operations; receive the synchronization token based on the recipient identifier; update a dependency status…
Type: Grant
Filed: December 28, 2020
Date of Patent: June 4, 2024
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Liangzhen Lai, Xu Chen, Miguel Angel Guerrero, Simon James Hollis
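The token-based synchronization can be sketched in a few lines. The following toy Python model, with invented class and field names, shows a recipient unit clearing a dependency condition when a token carrying its identifier arrives; it is not the patented design.

```python
# Toy sketch of token-based synchronization between computation-control units:
# a producer emits a token tagged with a recipient ID; the recipient clears
# the matching dependency condition and may run once all conditions hold.

class ControlUnit:
    def __init__(self, unit_id, dependencies):
        self.unit_id = unit_id
        # Programmable dependency "matrix": conditions still outstanding.
        self.pending = set(dependencies)

    def receive_token(self, token):
        if token["recipient"] == self.unit_id:
            self.pending.discard(token["op"])   # update dependency status

    def ready(self):
        return not self.pending

# Unit B may not run until unit A signals completion of "op_A".
unit_b = ControlUnit("B", dependencies={"op_A"})
token = {"op": "op_A", "recipient": "B"}  # emitted by unit A on completion
unit_b.receive_token(token)
print(unit_b.ready())   # True: dependency satisfied, unit B can proceed
```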
-
Publication number: 20240143525
Abstract: In one embodiment, a method for a direct memory access component within a machine-learning accelerator to iteratively transfer a plurality of non-contiguous blocks of data from a source memory to a destination memory through n-dimensional loops, without being re-programmed, includes reading a first block of data from a first address of the source memory, processing the first block of data with an ingress modification function, and storing the first block of data to a second address of a data buffer, by an ingress component of the direct memory access component, and reading a second block of data from a third address of the data buffer, processing the second block of data with an egress modification function, and storing the second block to a fourth address of the destination memory, by an egress component of the direct memory access component.
Type: Application
Filed: October 28, 2022
Publication date: May 2, 2024
Inventors: Xu Chen, Kyong Ho Lee, Harshit Khaitan, Liangzhen Lai
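A minimal two-stage model of this ingress/egress DMA follows, with list indices standing in for addresses and placeholder modification functions; all names are illustrative assumptions.

```python
# Minimal model of the two-stage DMA described in the abstract: an ingress
# stage reads a block from source memory, applies a modification function,
# and stores it in an intermediate data buffer; an egress stage reads it
# back, applies its own modification, and writes it to destination memory.

def dma_transfer(src, dst, buf, blocks, ingress_fn, egress_fn):
    for (src_addr, buf_addr, dst_addr) in blocks:   # non-contiguous blocks
        # Ingress: source memory -> data buffer, with ingress modification.
        buf[buf_addr] = ingress_fn(src[src_addr])
        # Egress: data buffer -> destination memory, with egress modification.
        dst[dst_addr] = egress_fn(buf[buf_addr])

src = [10, 20, 30, 40]
dst = [0] * 4
buf = [0] * 2
# Transfer two non-contiguous blocks (indices 0 and 2) without reprogramming.
dma_transfer(src, dst, buf,
             blocks=[(0, 0, 3), (2, 1, 1)],
             ingress_fn=lambda x: x + 1,   # e.g. format conversion on ingress
             egress_fn=lambda x: x * 2)    # e.g. layout change on egress
print(dst)   # [0, 62, 0, 22]
```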
-
Patent number: 11954580
Abstract: In one embodiment, a method for machine learning acceleration includes receiving, by a shared controller of a tensor processor cluster that includes multiple tensor processors, a multi-cycle instruction, determining, based on the instruction, a sequence of vector operations to be executed by the tensor processors and address information usable to determine a respective spatial partition of an input tensor on which each tensor processor is to operate when performing each vector operation. The method also includes, for each vector operation in the sequence, generating, based on the address information, a common address offset, relative to a respective base address associated with each tensor processor, at which each tensor processor is to retrieve the respective spatial partition on which the tensor processor is to operate, multicasting the common address offset to the tensor processors, and controlling the tensor processors to execute the vector operation in parallel and in lock step.
Type: Grant
Filed: September 16, 2020
Date of Patent: April 9, 2024
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Ganesh Venkatesh, Vikas Chandra
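The common-offset multicast is easy to model. In the sketch below, each mock tensor processor adds the multicast offset to its own base address to locate its spatial partition; the data layout and the reduction used as the "vector operation" are illustrative assumptions.

```python
# Sketch of the cluster-controller behavior: one common address offset is
# computed per vector operation and multicast to every tensor processor,
# each of which adds it to its own base address.

memory = list(range(100))            # shared backing store for all partitions
base_addresses = [0, 25, 50, 75]     # one base address per tensor processor

def run_vector_op(offset, length):
    # Multicast: every processor receives the same offset and executes the
    # same vector operation in parallel and in lock step.
    return [sum(memory[base + offset : base + offset + length])
            for base in base_addresses]

# Each processor reduces its own spatial partition at the common offset.
print(run_vector_op(offset=5, length=4))   # [26, 126, 226, 326]
```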
-
Publication number: 20240078417
Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
Type: Application
Filed: June 30, 2023
Publication date: March 7, 2024
Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
-
Patent number: 11922306
Abstract: A machine-learning accelerator system, comprising: a plurality of controllers each configured to traverse a feature map with n-dimensions according to instructions that specify, for each of the n-dimensions, a respective traversal size, wherein each controller comprises: a counter stack comprising counters each associated with a respective dimension of the n-dimensions of the feature map, wherein each counter is configured to increment a respective count from a respective initial value to the respective traversal size associated with the respective dimension associated with that counter; a plurality of address generators each configured to use the respective counts of the counters to generate at least one memory address at which a portion of the feature map is stored; and a dependency controller computing module configured to (1) track conditional statuses for incrementing the counters and (2) allow or disallow each of the counters to increment based on the conditional statuses.
Type: Grant
Filed: December 28, 2020
Date of Patent: March 5, 2024
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Ganesh Venkatesh, Simon James Hollis
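The counter stack behaves like an odometer. Here is a simplified Python model of the traversal paired with a row-major address generator; the dependency-gating logic is omitted, and the structure shown is an illustrative assumption rather than the patented circuit.

```python
# Simplified model of the counter stack: one counter per feature-map
# dimension, incremented odometer-style from innermost to outermost.

def traverse(sizes):
    """Yield (counts, address) for every element of an n-D feature map."""
    counts = [0] * len(sizes)
    while True:
        # Address generator: row-major linearization of the current counts.
        addr = 0
        for c, s in zip(counts, sizes):
            addr = addr * s + c
        yield tuple(counts), addr
        # Increment the innermost counter; carry outward on wrap, like an
        # odometer, until the outermost counter wraps and traversal ends.
        for dim in reversed(range(len(sizes))):
            counts[dim] += 1
            if counts[dim] < sizes[dim]:
                break
            counts[dim] = 0
        else:
            return

for counts, addr in traverse([2, 3]):
    print(counts, addr)
```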
-
Patent number: 11893159
Abstract: This disclosure describes techniques for recognizing gestures performed by a user, including techniques for conserving power when performing finger or hand gesture recognition operations that involve processing electromyography (EMG) data. In one example, a wearable device capable of being worn by a user comprises: a motion detector configured to detect motion of the wearable device; a tissue movement sensor configured to collect tissue movement information associated with motion of muscles or tissues beneath the user's skin; and a gesture detection module.
Type: Grant
Filed: October 7, 2022
Date of Patent: February 6, 2024
Assignee: META PLATFORMS TECHNOLOGIES, LLC
Inventors: Rodney Hooker, Maurizio Paganini, Harshit Khaitan
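One plausible reading of the power-saving technique, gating the costly EMG pipeline behind the low-power motion detector, can be sketched as follows; the threshold classifier and sensor interfaces are invented for illustration and are not drawn from the patent.

```python
# Hedged sketch: keep the power-hungry EMG processing off until the
# low-power motion detector reports movement. All values are toy stand-ins.

def gesture_step(motion_detected: bool, emg_samples):
    if not motion_detected:
        return None              # EMG pipeline stays powered down
    # Motion seen: wake the EMG path and run a (toy) gesture classifier.
    energy = sum(abs(s) for s in emg_samples) / max(len(emg_samples), 1)
    return "pinch" if energy > 0.5 else "no_gesture"

print(gesture_step(False, []))                    # None: nothing processed
print(gesture_step(True, [0.9, -0.8, 0.7, 0.6]))  # "pinch"
```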
-
Patent number: 11816480
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type: Grant
Filed: August 22, 2022
Date of Patent: November 14, 2023
Assignee: Google LLC
Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
-
Patent number: 11727259
Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
Type: Grant
Filed: November 10, 2022
Date of Patent: August 15, 2023
Assignee: Google LLC
Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
-
Patent number: 11709783
Abstract: In one embodiment, a method for tensor data distribution using a direct-memory access agent includes generating, by a first controller, source addresses indicating locations in a source memory where portions of a source tensor are stored. A second controller may generate destination addresses indicating locations in a destination memory where portions of a destination tensor are to be stored. The direct-memory access agent receives a source address generated by the first controller and a destination address generated by the second controller and determines a burst size. The direct-memory access agent may issue a read request comprising the source address and the burst size to read tensor data from the source memory and may store the tensor data into an alignment buffer. The direct-memory access agent then issues a write request comprising the destination address and the burst size to write data from the alignment buffer into the destination memory.
Type: Grant
Filed: November 11, 2020
Date of Patent: July 25, 2023
Assignee: Meta Platforms, Inc.
Inventors: Xu Chen, Harshit Khaitan, Yu Hsin Chen, Liangzhen Lai
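The agent's read-buffer-write sequence can be modeled in a few lines. In this sketch the burst size is chosen as the largest power of two that fits (an assumption; the abstract does not disclose the policy), and lists stand in for the memories and alignment buffer.

```python
# Toy model of the direct-memory-access agent: take one source address from
# the first controller and one destination address from the second, pick a
# burst size, read that many elements into an alignment buffer, then write
# them out to destination memory.

def dma_agent(src_mem, dst_mem, src_addr, dst_addr, remaining, max_burst=8):
    burst = min(max_burst, remaining)
    while burst & (burst - 1):        # round down to a power of two
        burst &= burst - 1
    # Read request: copy `burst` elements from source into the alignment buffer.
    alignment_buffer = src_mem[src_addr : src_addr + burst]
    # Write request: drain the alignment buffer into destination memory.
    dst_mem[dst_addr : dst_addr + burst] = alignment_buffer
    return burst

src = list(range(16))
dst = [0] * 16
moved = dma_agent(src, dst, src_addr=4, dst_addr=0, remaining=6)
print(moved, dst[:8])   # 4 elements this burst: [4, 5, 6, 7, 0, 0, 0, 0]
```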
-
Patent number: 11704562
Abstract: A system including a machine learning accelerator (MLA) hardware configured to perform machine-learning operations according to native instructions; an interpreter computing module configured to: generate, based on virtual instructions, machine language instructions configured to be processed by a processing hardware implementing the interpreter computing module; and cause the processing hardware to perform machine-learning operations according to the machine language instructions; and a compiler computing module associated with the MLA hardware, the compiler computing module configured to: receive instructions for performing an inference using a machine-learning model; based on the received instructions: generate the native instructions configured to be processed by the MLA hardware, the native instructions specifying first machine-learning operations associated with performing the inference; and generate the virtual instructions configured to be processed by the interpreter computing module, the virtual instructions…
Type: Grant
Filed: November 4, 2020
Date of Patent: July 18, 2023
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Miguel Angel Guerrero, Liangzhen Lai, Simon James Hollis
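A speculative sketch of the native/virtual split: operations a mock accelerator supports compile to native instructions, and the rest compile to virtual instructions executed by an interpreter on the host. The op lists and dispatch below are hypothetical stand-ins for the real toolchain.

```python
# Illustrative compile-time split between MLA-native and interpreter-handled
# (virtual) instructions. NATIVE_OPS is an invented capability list.

NATIVE_OPS = {"matmul", "conv2d"}   # assumed MLA-supported operations

def compile_model(ops):
    native = [op for op in ops if op in NATIVE_OPS]       # run on MLA hardware
    virtual = [op for op in ops if op not in NATIVE_OPS]  # run via interpreter
    return native, virtual

def run_inference(ops):
    native, virtual = compile_model(ops)
    for op in native:
        print(f"MLA executes native instruction: {op}")
    for op in virtual:
        # Interpreter lowers virtual instructions to host machine code.
        print(f"interpreter executes virtual instruction: {op}")

run_inference(["conv2d", "topk", "matmul"])
```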
-
Publication number: 20230031432
Abstract: This disclosure describes techniques for recognizing gestures performed by a user, including techniques for conserving power when performing finger or hand gesture recognition operations that involve processing electromyography (EMG) data. In one example, a wearable device capable of being worn by a user comprises: a motion detector configured to detect motion of the wearable device; a tissue movement sensor configured to collect tissue movement information associated with motion of muscles or tissues beneath the user's skin; and a gesture detection module.
Type: Application
Filed: October 7, 2022
Publication date: February 2, 2023
Inventors: Rodney Hooker, Maurizio Paganini, Harshit Khaitan
-
Publication number: 20230004386
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type: Application
Filed: August 22, 2022
Publication date: January 5, 2023
Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
-
Patent number: 11501144
Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
Type: Grant
Filed: September 12, 2019
Date of Patent: November 15, 2022
Assignee: Google LLC
Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo