Patents by Inventor Harshit Khaitan
Harshit Khaitan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12265492
Abstract: In one embodiment, a method includes receiving, from a direct memory access component, a token indicating that a data chunk has become available in a first circular buffer of a pre-determined size; determining, based on the token, that a computation is to be performed with data including the data chunk; and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where the pre-determined size of the first circular buffer is subtracted from a generated address when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and where the pre-determined size of the first circular buffer is added to a generated address when the generated address is less than a pre-determined minimum associated with the first circular buffer.
Type: Grant
Filed: February 21, 2023
Date of Patent: April 1, 2025
Assignee: Meta Platforms, Inc.
Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
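The wraparound arithmetic in this abstract is simple to illustrate. Below is a minimal Python sketch of the address-folding rule, assuming a flat address space and invented base, size, and maximum values; it is not the patented hardware logic.

```python
# Hypothetical sketch of the wraparound address generation described in the
# abstract; the constants and flat address space are illustrative assumptions.

BUFFER_BASE = 0x1000        # assumed base (pre-determined minimum) address
BUFFER_SIZE = 256           # the pre-determined size of the circular buffer
BUFFER_MAX = BUFFER_BASE + BUFFER_SIZE - 1  # pre-determined maximum address

def wrap_address(generated: int) -> int:
    """Fold a generated address back into the circular buffer's range."""
    if generated > BUFFER_MAX:
        # Address ran past the end: subtract the buffer size to wrap around.
        return generated - BUFFER_SIZE
    if generated < BUFFER_BASE:
        # Address ran below the start: add the buffer size to wrap around.
        return generated + BUFFER_SIZE
    return generated

# Example: stepping through 64-byte chunks wraps past the top of the buffer.
for i in range(8):
    print(hex(wrap_address(BUFFER_BASE + i * 64)))
```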
-
Publication number: 20250045559
Abstract: A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
Type: Application
Filed: July 9, 2024
Publication date: February 6, 2025
Inventors: Ravi Narayanaswami, Dong Hyuk Woo, Olivier Temam, Harshit Khaitan
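As a rough illustration of a loop nest whose structure is defined by an instruction's data values, here is a hedged Python sketch; the Instruction fields and the fully-connected interpretation are assumptions for illustration, not the actual instruction encoding.

```python
# A minimal software sketch of executing a tensor computation as a loop nest
# whose shape comes from fields of a single instruction. All names here are
# hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Instruction:
    layer_type: str   # e.g. "fully_connected"; defines the loop-nest structure
    bounds: tuple     # loop bounds, one per loop in the nest

def execute(instr, x, w):
    if instr.layer_type == "fully_connected":
        n_out, n_in = instr.bounds
        y = [0.0] * n_out
        # Two-deep loop nest; its structure is set by the instruction's fields.
        for o in range(n_out):
            for i in range(n_in):
                y[o] += x[i] * w[o][i]
        return y
    raise NotImplementedError(instr.layer_type)

# Usage: a 2x3 fully-connected layer driven entirely by instruction fields.
instr = Instruction("fully_connected", (2, 3))
print(execute(instr, [1.0, 2.0, 3.0], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))
```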
-
Patent number: 12197362
Abstract: In one embodiment, a method includes determining that a bmm operation between a first activation tensor and a second activation tensor needs to be performed, collecting the second activation tensor in two blocks from activation buffers of N tensor processor units, splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile, loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks, and generating a bmm result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
Type: Grant
Filed: January 26, 2023
Date of Patent: January 14, 2025
Assignee: Meta Platforms, Inc.
Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
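The MSB/LSB tile split lends itself to a short sketch. Assuming "bmm" means batch matrix multiplication and that the split separates 16-bit values into 8-bit high-byte and low-byte tiles (an assumption; the abstract does not state bit widths), a Python model of the weight-buffer loading might look like this:

```python
# Hedged sketch of splitting activation blocks into MSB and LSB tiles and
# staging them into two weight-buffer entries. Tile widths and the numpy
# representation are assumptions, not the patented layout.
import numpy as np

def split_msb_lsb(block: np.ndarray):
    """Split a uint16 block into its high-byte and low-byte tiles."""
    block = block.astype(np.uint16)
    msb = (block >> 8).astype(np.uint8)    # most-significant byte tile
    lsb = (block & 0xFF).astype(np.uint8)  # least-significant byte tile
    return msb, lsb

# Two blocks of the second activation tensor, as in the abstract.
blocks = [np.array([[0x1234, 0xABCD]], dtype=np.uint16),
          np.array([[0x00FF, 0x8000]], dtype=np.uint16)]

# Weight buffer with two entries: entry 0 holds the MSB tiles of both blocks,
# entry 1 holds the LSB tiles, mirroring the loading scheme described above.
tiles = [split_msb_lsb(b) for b in blocks]
weight_buffer = [np.concatenate([t[0] for t in tiles], axis=1),
                 np.concatenate([t[1] for t in tiles], axis=1)]
print(weight_buffer[0], weight_buffer[1])
```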
-
Publication number: 20240281393
Abstract: In one embodiment, a method includes receiving, from a direct memory access component, a token indicating that a data chunk has become available in a first circular buffer of a pre-determined size; determining, based on the token, that a computation is to be performed with data including the data chunk; and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where the pre-determined size of the first circular buffer is subtracted from a generated address when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and where the pre-determined size of the first circular buffer is added to a generated address when the generated address is less than a pre-determined minimum associated with the first circular buffer.
Type: Application
Filed: February 21, 2023
Publication date: August 22, 2024
Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
-
Patent number: 12061968
Abstract: A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
Type: Grant
Filed: June 21, 2022
Date of Patent: August 13, 2024
Assignee: Google LLC
Inventors: Ravi Narayanaswami, Dong Hyuk Woo, Olivier Temam, Harshit Khaitan
-
Publication number: 20240256475
Abstract: In one embodiment, a method includes determining that a bmm operation between a first activation tensor and a second activation tensor needs to be performed, collecting the second activation tensor in two blocks from activation buffers of N tensor processor units, splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile, loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks, and generating a bmm result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
Type: Application
Filed: January 26, 2023
Publication date: August 1, 2024
Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
-
Publication number: 20240231819
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type: Application
Filed: November 9, 2023
Publication date: July 11, 2024
Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
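To make the dataflow concrete, here is a hypothetical software model of the computing unit: two memory banks feeding a MAC operator, with the traversal unit's control signal reduced to an index that selects which activation reaches the data bus. Real hardware would use registers and control signals; everything named below is an illustrative assumption.

```python
# Illustrative software model of the computing unit described in the abstract.

class ComputingUnit:
    def __init__(self, activations, parameters):
        self.bank1 = activations   # first memory bank: input activations
        self.bank2 = parameters    # second memory bank: parameters
        self.accumulator = 0.0

    def mac(self, act_index: int, param_index: int):
        # The traversal unit's control signal is modeled as the index that
        # selects which activation is placed on the data bus.
        data_bus = self.bank1[act_index]
        self.accumulator += data_bus * self.bank2[param_index]
        return self.accumulator

unit = ComputingUnit([1.0, 2.0, 3.0], [0.5, 0.25, 0.125])
for i in range(3):
    unit.mac(i, i)        # multiply-accumulate over the data array
print(unit.accumulator)   # 1*0.5 + 2*0.25 + 3*0.125 = 1.375
```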
-
Patent number: 12001893
Abstract: A system including a machine-learning accelerator (MLA) hardware comprising computation-control units that each have a programmable dependency matrix; and a compiler computing module configured to generate, based on a machine-learning model, dependency instructions indicating dependencies between the computation-control units; wherein the computation-control units include at least: a first computation-control unit configured to generate, after completion of a first operation, a synchronization token representing the completion of the first operation, the synchronization token specifying a recipient identifier for an intended recipient computation-control unit of the synchronization token; a second computation-control unit configured to: configure the programmable dependency matrix of the second computation-control unit according to the dependency instructions to include dependency conditions for performing operations; receive the synchronization token based on the recipient identifier; update a dependency status…
Type: Grant
Filed: December 28, 2020
Date of Patent: June 4, 2024
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Liangzhen Lai, Xu Chen, Miguel Angel Guerrero, Simon James Hollis
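The token-based synchronization can be sketched in a few lines. The following toy Python model, with invented class and field names, shows a recipient unit clearing a dependency condition when a token carrying its identifier arrives; it is not the patented design.

```python
# Toy sketch of token-based synchronization between computation-control units:
# a producer emits a token tagged with a recipient ID; the recipient clears
# the matching dependency condition and may run once all conditions hold.

class ControlUnit:
    def __init__(self, unit_id, dependencies):
        self.unit_id = unit_id
        # Programmable dependency "matrix": conditions still outstanding.
        self.pending = set(dependencies)

    def receive_token(self, token):
        if token["recipient"] == self.unit_id:
            self.pending.discard(token["op"])   # update dependency status

    def ready(self):
        return not self.pending

# Unit B may not run until unit A signals completion of "op_A".
unit_b = ControlUnit("B", dependencies={"op_A"})
token = {"op": "op_A", "recipient": "B"}  # emitted by unit A on completion
unit_b.receive_token(token)
print(unit_b.ready())   # True: dependency satisfied, unit B can proceed
```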
-
Publication number: 20240143525
Abstract: In one embodiment, a method for a direct memory access component within a machine-learning accelerator to iteratively transfer a plurality of non-contiguous blocks of data from a source memory to a destination memory through n-dimensional loops, without being re-programmed, includes reading a first block of data from a first address of the source memory, processing the first block of data with an ingress modification function, and storing the first block of data to a second address of a data buffer, by an ingress component of the direct memory access component, and reading a second block of data from a third address of the data buffer, processing the second block of data with an egress modification function, and storing the second block to a fourth address of the destination memory, by an egress component of the direct memory access component.
Type: Application
Filed: October 28, 2022
Publication date: May 2, 2024
Inventors: Xu Chen, Kyong Ho Lee, Harshit Khaitan, Liangzhen Lai
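A minimal two-stage model of this ingress/egress DMA follows, with list indices standing in for addresses and placeholder modification functions; all names are illustrative assumptions.

```python
# Minimal model of the two-stage DMA described in the abstract: an ingress
# stage reads a block from source memory, applies a modification function,
# and stores it in an intermediate data buffer; an egress stage reads it
# back, applies its own modification, and writes it to destination memory.

def dma_transfer(src, dst, buf, blocks, ingress_fn, egress_fn):
    for (src_addr, buf_addr, dst_addr) in blocks:   # non-contiguous blocks
        # Ingress: source memory -> data buffer, with ingress modification.
        buf[buf_addr] = ingress_fn(src[src_addr])
        # Egress: data buffer -> destination memory, with egress modification.
        dst[dst_addr] = egress_fn(buf[buf_addr])

src = [10, 20, 30, 40]
dst = [0] * 4
buf = [0] * 2
# Transfer two non-contiguous blocks (indices 0 and 2) without reprogramming.
dma_transfer(src, dst, buf,
             blocks=[(0, 0, 3), (2, 1, 1)],
             ingress_fn=lambda x: x + 1,   # e.g. format conversion on ingress
             egress_fn=lambda x: x * 2)    # e.g. layout change on egress
print(dst)   # [0, 62, 0, 22]
```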
-
Patent number: 11954580
Abstract: In one embodiment, a method for machine learning acceleration includes receiving, by a shared controller of a tensor processor cluster that includes multiple tensor processors, a multi-cycle instruction, determining, based on the instruction, a sequence of vector operations to be executed by the tensor processors and address information usable to determine a respective spatial partition of an input tensor on which each tensor processor is to operate when performing each vector operation. The method also includes, for each vector operation in the sequence, generating, based on the address information, a common address offset, relative to a respective base address associated with each tensor processor, at which each tensor processor is to retrieve the respective spatial partition on which the tensor processor is to operate, multicasting the common address offset to the tensor processors, and controlling the tensor processors to execute the vector operation in parallel and in lock step.
Type: Grant
Filed: September 16, 2020
Date of Patent: April 9, 2024
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Ganesh Venkatesh, Vikas Chandra
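The common-offset multicast is easy to model. In the sketch below, each mock tensor processor adds the multicast offset to its own base address to locate its spatial partition; the data layout and the reduction used as the "vector operation" are illustrative assumptions.

```python
# Sketch of the cluster-controller behavior: one common address offset is
# computed per vector operation and multicast to every tensor processor,
# each of which adds it to its own base address.

memory = list(range(100))            # shared backing store for all partitions
base_addresses = [0, 25, 50, 75]     # one base address per tensor processor

def run_vector_op(offset, length):
    # Multicast: every processor receives the same offset and executes the
    # same vector operation in parallel and in lock step.
    return [sum(memory[base + offset : base + offset + length])
            for base in base_addresses]

# Each processor reduces its own spatial partition at the common offset.
print(run_vector_op(offset=5, length=4))   # [26, 126, 226, 326]
```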
-
Publication number: 20240078417
Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
Type: Application
Filed: June 30, 2023
Publication date: March 7, 2024
Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
-
Patent number: 11922306
Abstract: A machine-learning accelerator system, comprising: a plurality of controllers each configured to traverse a feature map with n-dimensions according to instructions that specify, for each of the n-dimensions, a respective traversal size, wherein each controller comprises: a counter stack comprising counters each associated with a respective dimension of the n-dimensions of the feature map, wherein each counter is configured to increment a respective count from a respective initial value to the respective traversal size associated with the respective dimension associated with that counter; a plurality of address generators each configured to use the respective counts of the counters to generate at least one memory address at which a portion of the feature map is stored; and a dependency controller computing module configured to (1) track conditional statuses for incrementing the counters and (2) allow or disallow each of the counters to increment based on the conditional statuses.
Type: Grant
Filed: December 28, 2020
Date of Patent: March 5, 2024
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Ganesh Venkatesh, Simon James Hollis
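The counter stack behaves like an odometer. Here is a simplified Python model of the traversal paired with a row-major address generator; the dependency-gating logic is omitted, and the structure shown is an illustrative assumption rather than the patented circuit.

```python
# Simplified model of the counter stack: one counter per feature-map
# dimension, incremented odometer-style from innermost to outermost.

def traverse(sizes):
    """Yield (counts, address) for every element of an n-D feature map."""
    counts = [0] * len(sizes)
    while True:
        # Address generator: row-major linearization of the current counts.
        addr = 0
        for c, s in zip(counts, sizes):
            addr = addr * s + c
        yield tuple(counts), addr
        # Increment the innermost counter; carry outward on wrap, like an
        # odometer, until the outermost counter wraps and traversal ends.
        for dim in reversed(range(len(sizes))):
            counts[dim] += 1
            if counts[dim] < sizes[dim]:
                break
            counts[dim] = 0
        else:
            return

for counts, addr in traverse([2, 3]):
    print(counts, addr)
```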
-
Patent number: 11893159
Abstract: This disclosure describes techniques for recognizing gestures performed by a user, including techniques for conserving power when performing finger or hand gesture recognition operations that involve processing electromyography (EMG) data. In one example, a wearable device capable of being worn by a user comprises: a motion detector configured to detect motion of the wearable device; a tissue movement sensor configured to collect tissue movement information associated with motion of muscles or tissues beneath the user's skin; and a gesture detection module.
Type: Grant
Filed: October 7, 2022
Date of Patent: February 6, 2024
Assignee: META PLATFORMS TECHNOLOGIES, LLC
Inventors: Rodney Hooker, Maurizio Paganini, Harshit Khaitan
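One plausible reading of the power-saving technique, gating the costly EMG pipeline behind the low-power motion detector, can be sketched as follows; the threshold classifier and sensor interfaces are invented for illustration and are not drawn from the patent.

```python
# Hedged sketch: keep the power-hungry EMG processing off until the
# low-power motion detector reports movement. All values are toy stand-ins.

def gesture_step(motion_detected: bool, emg_samples):
    if not motion_detected:
        return None              # EMG pipeline stays powered down
    # Motion seen: wake the EMG path and run a (toy) gesture classifier.
    energy = sum(abs(s) for s in emg_samples) / max(len(emg_samples), 1)
    return "pinch" if energy > 0.5 else "no_gesture"

print(gesture_step(False, []))                    # None: nothing processed
print(gesture_step(True, [0.9, -0.8, 0.7, 0.6]))  # "pinch"
```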
-
Patent number: 11816480
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type: Grant
Filed: August 22, 2022
Date of Patent: November 14, 2023
Assignee: Google LLC
Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
-
Patent number: 11727259
Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
Type: Grant
Filed: November 10, 2022
Date of Patent: August 15, 2023
Assignee: Google LLC
Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
-
Patent number: 11709783
Abstract: In one embodiment, a method for tensor data distribution using a direct-memory access agent includes generating, by a first controller, source addresses indicating locations in a source memory where portions of a source tensor are stored. A second controller may generate destination addresses indicating locations in a destination memory where portions of a destination tensor are to be stored. The direct-memory access agent receives a source address generated by the first controller and a destination address generated by the second controller and determines a burst size. The direct-memory access agent may issue a read request comprising the source address and the burst size to read tensor data from the source memory and may store the tensor data into an alignment buffer. The direct-memory access agent then issues a write request comprising the destination address and the burst size to write data from the alignment buffer into the destination memory.
Type: Grant
Filed: November 11, 2020
Date of Patent: July 25, 2023
Assignee: Meta Platforms, Inc.
Inventors: Xu Chen, Harshit Khaitan, Yu Hsin Chen, Liangzhen Lai
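The agent's read-buffer-write sequence can be modeled in a few lines. In this sketch the burst size is chosen as the largest power of two that fits (an assumption; the abstract does not disclose the policy), and lists stand in for the memories and alignment buffer.

```python
# Toy model of the direct-memory-access agent: take one source address from
# the first controller and one destination address from the second, pick a
# burst size, read that many elements into an alignment buffer, then write
# them out to destination memory.

def dma_agent(src_mem, dst_mem, src_addr, dst_addr, remaining, max_burst=8):
    burst = min(max_burst, remaining)
    while burst & (burst - 1):        # round down to a power of two
        burst &= burst - 1
    # Read request: copy `burst` elements from source into the alignment buffer.
    alignment_buffer = src_mem[src_addr : src_addr + burst]
    # Write request: drain the alignment buffer into destination memory.
    dst_mem[dst_addr : dst_addr + burst] = alignment_buffer
    return burst

src = list(range(16))
dst = [0] * 16
moved = dma_agent(src, dst, src_addr=4, dst_addr=0, remaining=6)
print(moved, dst[:8])   # 4 elements this burst: [4, 5, 6, 7, 0, 0, 0, 0]
```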
-
Patent number: 11704562
Abstract: A system including a machine learning accelerator (MLA) hardware configured to perform machine-learning operations according to native instructions; an interpreter computing module configured to: generate, based on virtual instructions, machine language instructions configured to be processed by a processing hardware implementing the interpreter computing module; and cause the processing hardware to perform machine-learning operations according to the machine language instructions; and a compiler computing module associated with the MLA hardware, the compiler computing module configured to: receive instructions for performing an inference using a machine-learning model; based on the received instructions: generate the native instructions configured to be processed by the MLA hardware, the native instructions specifying first machine-learning operations associated with performing the inference; and generate the virtual instructions configured to be processed by the interpreter computing module, the virtual instructions…
Type: Grant
Filed: November 4, 2020
Date of Patent: July 18, 2023
Assignee: Meta Platforms, Inc.
Inventors: Harshit Khaitan, Miguel Angel Guerrero, Liangzhen Lai, Simon James Hollis
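A speculative sketch of the native/virtual split: operations a mock accelerator supports compile to native instructions, and the rest compile to virtual instructions executed by an interpreter on the host. The op lists and dispatch below are hypothetical stand-ins for the real toolchain.

```python
# Illustrative compile-time split between MLA-native and interpreter-handled
# (virtual) instructions. NATIVE_OPS is an invented capability list.

NATIVE_OPS = {"matmul", "conv2d"}   # assumed MLA-supported operations

def compile_model(ops):
    native = [op for op in ops if op in NATIVE_OPS]       # run on MLA hardware
    virtual = [op for op in ops if op not in NATIVE_OPS]  # run via interpreter
    return native, virtual

def run_inference(ops):
    native, virtual = compile_model(ops)
    for op in native:
        print(f"MLA executes native instruction: {op}")
    for op in virtual:
        # Interpreter lowers virtual instructions to host machine code.
        print(f"interpreter executes virtual instruction: {op}")

run_inference(["conv2d", "topk", "matmul"])
```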
-
Publication number: 20230031432
Abstract: This disclosure describes techniques for recognizing gestures performed by a user, including techniques for conserving power when performing finger or hand gesture recognition operations that involve processing electromyography (EMG) data. In one example, a wearable device capable of being worn by a user comprises: a motion detector configured to detect motion of the wearable device; a tissue movement sensor configured to collect tissue movement information associated with motion of muscles or tissues beneath the user's skin; and a gesture detection module.
Type: Application
Filed: October 7, 2022
Publication date: February 2, 2023
Inventors: Rodney Hooker, Maurizio Paganini, Harshit Khaitan
-
Publication number: 20230004386
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type: Application
Filed: August 22, 2022
Publication date: January 5, 2023
Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
-
Patent number: 11501144
Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
Type: Grant
Filed: September 12, 2019
Date of Patent: November 15, 2022
Assignee: Google LLC
Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo