Patents by Inventor Avishaii Abuhatzera

Avishaii Abuhatzera has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12112171
    Abstract: Techniques for processing loops are described. An exemplary apparatus at least includes decoder circuitry to decode a single instruction, the single instruction to include a field for an opcode, the opcode to indicate execution circuitry is to perform an operation to configure execution of one or more loops, wherein the one or more loops are to include a plurality of configuration instructions and instructions that are to use metadata generated by ones of the plurality of configuration instructions; and execution circuitry to perform the operation as indicated by the opcode.
    Type: Grant
    Filed: December 26, 2020
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Anant Nori, Shankar Balachandran, Sreenivas Subramoney, Joydeep Rakshit, Vedvyas Shanbhogue, Avishaii Abuhatzera, Belliappa Kuttanna
  • Publication number: 20240005135
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Application
    Filed: April 18, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
  • Patent number: 11714998
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: August 1, 2023
    Assignee: INTEL CORPORATION
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
  • Publication number: 20230205692
    Abstract: Apparatus and method for leveraging simultaneous multithreading for bulk compute operations. For example, one embodiment of a processor comprises: a plurality of cores including a first core to simultaneously process instructions of a plurality of threads; a cache hierarchy coupled to the first core and the memory, the cache hierarchy comprising a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache; and a plurality of compute units coupled to the first core including a first compute unit associated with the L1 cache, a second compute unit associated with the L2 cache, and a third compute unit associated with the L3 cache, wherein the first core is to offload instructions for execution by the compute units, the first core to offload instructions from a first thread to the first compute unit, instructions from a second thread to the second compute unit, and instructions from a third thread to the third compute unit.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: ANANT NORI, RAHUL BERA, SHANKAR BALACHANDRAN, JOYDEEP RAKSHIT, Om Ji OMER, SREENIVAS SUBRAMONEY, AVISHAII ABUHATZERA, BELLIAPPA KUTTANNA
  • Publication number: 20220113974
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Applicant: INTEL CORPORATION
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
  • Publication number: 20220100514
    Abstract: Techniques for processing loops are described. An exemplary apparatus at least includes decoder circuitry to decode a single instruction, the single instruction to include a field for an opcode, the opcode to indicate execution circuitry is to perform an operation to configure execution of one or more loops, wherein the one or more loops are to include a plurality of configuration instructions and instructions that are to use metadata generated by ones of the plurality of configuration instructions; and execution circuitry to perform the operation as indicated by the opcode.
    Type: Application
    Filed: December 26, 2020
    Publication date: March 31, 2022
    Inventors: Anant NORI, Shankar BALACHANDRAN, Sreenivas SUBRAMONEY, Joydeep RAKSHIT, Vedvyas SHANBHOGUE, Avishaii ABUHATZERA, Belliappa KUTTANNA
  • Publication number: 20220012571
    Abstract: Apparatuses and articles of manufacture are disclosed. An example apparatus includes an activation function control and decode circuitry to populate an input buffer circuitry with an input data element bit subset of less than a threshold number of bits of the input data element retrieved from the memory circuitry. The activation function and control circuitry also populate a kernel weight buffer circuitry with a weight data element bit subset of less than the threshold number of bits of the weight data element retrieved from the memory circuitry. The apparatus also including a preprocessor circuitry to calculate a partial convolution value of at least a portion of the input data element bit subset and the weight data element bit subset to determine a predicted sign of the partial convolution value.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh, Sreenivas Subramoney, Avishaii Abuhatzera
  • Publication number: 20200320375
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Application
    Filed: June 23, 2020
    Publication date: October 8, 2020
    Applicant: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking