Patents by Inventor Harshit Khaitan

Harshit Khaitan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12265492
    Abstract: In one embodiment, a method includes receiving a token indicating a data chunk becomes available in a first circular buffer of a pre-determined size from a direct memory access component, determining that a computation is to be performed with data including the data chunk based on the token, and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where a generated address is subtracted by the pre-determined size of the first circular buffer when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and where the generated address is added by the pre-determined size of the first circular buffer when the generated address is less than a pre-determined minimum associated with the first circular buffer.
    Type: Grant
    Filed: February 21, 2023
    Date of Patent: April 1, 2025
    Assignee: Meta Platforms, Inc.
    Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
  • Publication number: 20250045559
    Abstract: A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include, performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
    Type: Application
    Filed: July 9, 2024
    Publication date: February 6, 2025
    Inventors: Ravi Narayanaswami, Dong Hyuk Woo, Olivier Temam, Harshit Khaitan
  • Patent number: 12197362
    Abstract: In one embodiment, a method includes, determining that a bmm operation between a first activation tensor and a second activation tensor needs to be performed, collecting the second activation tensor in two blocks from activation buffers of N tensor processor units, splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile, loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks, and generating a bmm result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
    Type: Grant
    Filed: January 26, 2023
    Date of Patent: January 14, 2025
    Assignee: Meta Platforms, Inc.
    Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
  • Publication number: 20240281393
    Abstract: In one embodiment, a method includes receiving a token indicating a data chunk becomes available in a first circular buffer of a pre-determined size from a direct memory access component, determining that a computation is to be performed with data including the data chunk based on the token, and generating one or more addresses corresponding to one or more data chunks within the first circular buffer that are to be retrieved for the computation, where a generated address is subtracted by the pre-determined size of the first circular buffer when the generated address is greater than a pre-determined maximum associated with the first circular buffer, and where the generated address is added by the pre-determined size of the first circular buffer when the generated address is less than a pre-determined minimum associated with the first circular buffer.
    Type: Application
    Filed: February 21, 2023
    Publication date: August 22, 2024
    Inventors: Liangzhen Lai, Harshit Khaitan, Yu Hsin Chen, Kyong Ho Lee, Xu Chen
  • Patent number: 12061968
    Abstract: A computer-implemented method that includes receiving, by a processing unit, an instruction that specifies data values for performing a tensor computation. In response to receiving the instruction, the method may include, performing, by the processing unit, the tensor computation by executing a loop nest comprising a plurality of loops, wherein a structure of the loop nest is defined based on one or more of the data values of the instruction. The tensor computation can be at least a portion of a computation of a neural network layer. The data values specified by the instruction may comprise a value that specifies a type of the neural network layer, and the structure of the loop nest can be defined at least in part by the type of the neural network layer.
    Type: Grant
    Filed: June 21, 2022
    Date of Patent: August 13, 2024
    Assignee: Google LLC
    Inventors: Ravi Narayanaswami, Dong Hyuk Woo, Olivier Temam, Harshit Khaitan
  • Publication number: 20240256475
    Abstract: In one embodiment, a method includes, determining that a bmm operation between a first activation tensor and a second activation tensor needs to be performed, collecting the second activation tensor in two blocks from activation buffers of N tensor processor units, splitting each of the two blocks of the second activation tensor into an MSB tile and an LSB tile, loading the second activation tensor to weight buffers of the N tensor processor units by filling a first entry of each weight buffer of each of the N tensor processor units with contents of the MSB tiles of the two blocks and filling a second entry of the weight buffer with contents of the LSB tiles of the two blocks, and generating a bmm result using the first activation tensor distributed in the activation buffers and the second activation tensor in the weight buffers.
    Type: Application
    Filed: January 26, 2023
    Publication date: August 1, 2024
    Inventors: Yu Hsin Chen, Liangzhen Lai, Kyong Ho Lee, Harshit Khaitan
  • Publication number: 20240231819
    Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
    Type: Application
    Filed: November 9, 2023
    Publication date: July 11, 2024
    Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
  • Patent number: 12001893
    Abstract: A system including a machine-learning accelerator (MLA) hardware comprising computation-control units that each have a programmable dependency matrix; and a compiler computing module configured to generate, based on a machine-learning model, dependency instructions indicating dependencies between the computation-control units; wherein the computation-control units include at least: a first computation-control unit configured to generate, after completion of a first operation, a synchronization token representing the completion of the first operation, the synchronization token specifying a recipient identifier for an intended recipient computation-control unit of the synchronization token; a second computation-control unit configured to: configure the programmable dependency matrix of the second computation-control unit according to the dependency instructions to include dependency conditions for performing operations; receive the synchronization token based on the recipient identifier; update a dependency sta
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: June 4, 2024
    Assignee: Meta Platforms, Inc.
    Inventors: Harshit Khaitan, Liangzhen Lai, Xu Chen, Miguel Angel Guerrero, Simon James Hollis
  • Publication number: 20240143525
    Abstract: In one embodiment, a method for iteratively transferring a plurality of non-contiguous blocks of data from a source memory to a destination memory through n-dimensional loops without being re-programmed by a direct memory access within a machine-learning accelerator includes reading a first block of data from a first address of the source memory, processing the first block of data with an ingress modification function, and storing the first block of data to a second address of a data buffer, by an ingress component of the direct memory access within the machine-learning accelerator, and reading a second block of data from a third address of the data buffer, processing the second block of data with an egress modification function, and storing the second block to a fourth address of the destination memory, by an egress component of the direct memory access within the machine-learning accelerator.
    Type: Application
    Filed: October 28, 2022
    Publication date: May 2, 2024
    Inventors: Xu Chen, Kyong Ho Lee, Harshit Khaitan, Liangzhen Lai
  • Patent number: 11954580
    Abstract: In one embodiment, a method for machine learning acceleration includes receiving, by a shared controller of a tensor processor cluster that includes multiple tensor processors, a multi-cycle instruction, determining, based on the instruction, a sequence of vector operations to be executed by the tensor processors and address information usable to determine a respective spatial partition of an input tensor on which each tensor processor is to operate when performing each vector operation. The method also includes, for each vector operation in the sequence, generating, based on the address information, a common address offset, relative to a respective base address associated with each tensor processor, at which each tensor processor is to retrieve the respective spatial partition on which the tensor processor is to operate, multicasting the common address offset to the tensor processors, and controlling the tensor processors to execute the vector operation in parallel and in lock step.
    Type: Grant
    Filed: September 16, 2020
    Date of Patent: April 9, 2024
    Assignee: Meta Platforms, Inc.
    Inventors: Harshit Khaitan, Ganesh Venkatesh, Vikas Chandra
  • Publication number: 20240078417
    Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
    Type: Application
    Filed: June 30, 2023
    Publication date: March 7, 2024
    Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
  • Patent number: 11922306
    Abstract: A machine-learning accelerator system, comprising: a plurality of controllers each configured to traverse a feature map with n-dimensions according to instructions that specify, for each of the n-dimensions, a respective traversal size, wherein each controller comprises: a counter stack comprising counters each associated with a respective dimension of the n-dimensions of the feature map, wherein each counter is configured to increment a respective count from a respective initial value to the respective traversal size associated with the respective dimension associated with that counter; a plurality of address generators each configured to use the respective counts of the counters to generate at least one memory address at which a portion of the feature map is stored; and a dependency controller computing module configured to (1) track conditional statuses for incrementing the counters and (2) allow or disallow each of the counters to increment based on the conditional statuses.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: March 5, 2024
    Assignee: Meta Platforms, Inc.
    Inventors: Harshit Khaitan, Ganesh Venkatesh, Simon James Hollis
  • Patent number: 11893159
    Abstract: This disclosure describes techniques for recognizing gestures performed by a user, including techniques for conserving power when performing finger or hand gesture recognition operations that involve processing electromyography (EMG) data. In one example, a wearable device capable of being worn by a user comprises: a motion detector configured to detect motion of the wearable device; a tissue movement sensor configured to collect tissue movement information associated with motion of muscles or tissues beneath the user's skin; and a gesture detection module.
    Type: Grant
    Filed: October 7, 2022
    Date of Patent: February 6, 2024
    Assignee: META PLATFORMS TECHNOLOGIES, LLC
    Inventors: Rodney Hooker, Maurizio Paganini, Harshit Khaitan
  • Patent number: 11816480
    Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
    Type: Grant
    Filed: August 22, 2022
    Date of Patent: November 14, 2023
    Assignee: Google LLC
    Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
  • Patent number: 11727259
    Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
    Type: Grant
    Filed: November 10, 2022
    Date of Patent: August 15, 2023
    Assignee: Google LLC
    Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
  • Patent number: 11709783
    Abstract: In one embodiment, a method for tensor data distribution using a direct-memory access agent includes generating, by a first controller, source addresses indicating locations in a source memory where portions of a source tensor are stored. A second controller may generate destination addresses indicating locations in a destination memory where portions of a destination tensor are to be stored. The direct-memory access agent receives a source address generated by the first controller and a destination address generated by the second controller and determines a burst size. The direct-memory access agent may issue a read request comprising the source address and the burst size to read tensor data from the source memory and may store the tensor data into an alignment buffer. The direct-memory access agent then issues a write request comprising the destination address and the burst size to write data from the alignment buffer into the destination memory.
    Type: Grant
    Filed: November 11, 2020
    Date of Patent: July 25, 2023
    Assignee: Meta Platforms, Inc.
    Inventors: Xu Chen, Harshit Khaitan, Yu Hsin Chen, Liangzhen Lai
  • Patent number: 11704562
    Abstract: A system including a machine learning accelerator (MLA) hardware configured to perform machine-learning operations according to native instructions; an interpreter computing module configured to: generate, based on virtual instructions, machine language instructions configured to be processed by a processing hardware implementing the interpreter computing module; and cause the processing hardware to perform machine-learning operations according to the machine language instructions; and a compiler computing module associated with the MLA hardware, the compiler computing module configured to: receive instructions for performing an inference using a machine-learning model; based on the received instructions: generate the native instructions configured to be processed by the MLA hardware, the native instructions specifying first machine-learning operations associated with performing the inference; and generate the virtual instructions configured to be processed by the interpreter computing module, the virtual ins
    Type: Grant
    Filed: November 4, 2020
    Date of Patent: July 18, 2023
    Assignee: Meta Platforms, Inc.
    Inventors: Harshit Khaitan, Miguel Angel Guerrero, Liangzhen Lai, Simon James Hollis
  • Publication number: 20230031432
    Abstract: This disclosure describes techniques for recognizing gestures performed by a user, including techniques for conserving power when performing finger or hand gesture recognition operations that involve processing electromyography (EMG) data. In one example, a wearable device capable of being worn by a user comprises: a motion detector configured to detect motion of the wearable device; a tissue movement sensor configured to collect tissue movement information associated with motion of muscles or tissues beneath the user's skin; and a gesture detection module.
    Type: Application
    Filed: October 7, 2022
    Publication date: February 2, 2023
    Inventors: Rodney Hooker, Maurizio Paganini, Harshit Khaitan
  • Publication number: 20230004386
    Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
    Type: Application
    Filed: August 22, 2022
    Publication date: January 5, 2023
    Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
  • Patent number: 11501144
    Abstract: One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.
    Type: Grant
    Filed: September 12, 2019
    Date of Patent: November 15, 2022
    Assignee: Google LLC
    Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo