Patents by Inventor Kamlesh R. Pillai

Kamlesh R. Pillai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220391128
    Abstract: Example compute-in-memory (CIM) or processor-in-memory (PIM) techniques using repurposed or dedicated static random access memory (SRAM) rows of an SRAM sub-array to store look-up-table (LUT) entries for use in a multiply and accumulate (MAC) operation.
    Type: Application
    Filed: June 7, 2021
    Publication date: December 8, 2022
    Inventors: Saurabh JAIN, Srivatsa RANGACHAR SRINIVASA, Akshay Krishna RAMANATHAN, Gurpreet Singh KALSI, Kamlesh R. PILLAI, Sreenivas SUBRAMONEY
  • Publication number: 20220113974
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Applicant: INTEL CORPORATION
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
  • Patent number: 11169957
    Abstract: Systems and techniques are provided for hardware architecture used in parallel computing applications to improve computation efficiency. An integrated circuit system may include a data store that stores data for processing and a reconfigurable systolic array that may process the data. The reconfigurable systolic array may include a first row of processing elements (PE) that process the data according to a first function and a second row of PE that process the data according to a second function. The reconfigurable systolic array may also include a routing block coupled to the first row of PE, the second row of PE, and the data store. Further, the reconfigurable systolic array may receive data from the first row of PE, transmit the data received from the first row of PE to the second row of PE, and transmit data output by the second row of PE to the first row of PE.
    Type: Grant
    Filed: March 31, 2019
    Date of Patent: November 9, 2021
    Assignee: Intel Corporation
    Inventors: Kamlesh R. Pillai, Christopher J. Hughes
  • Patent number: 11138290
    Abstract: The present disclosure is directed to systems and methods for performing discrete cosine transforms and inverse discrete cosine transforms (DCT/IDCT) using a CORDIC algorithm implemented in systolic array circuitry that includes a plurality cells or nodes, each containing circuitry to implement the CORDIC algorithm. DCT/IDCT control circuitry multiplies the systolic array output matrix generated by the systolic array circuitry by a scaling factor that may include a defined scaling value or an actual cosine value. The DCT/IDCT control circuitry causes the transfer of the scaled systolic array output matrix to combination circuitry where the DCT/IDCT input matrix is combined with the scaled systolic array output matrix to provide the DCT/IDCT output matrix. The DCT/IDCT control circuitry also transfers bypass information to at least a portion of the cells or nodes in the systolic array circuitry.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: October 5, 2021
    Assignee: Intel Corporation
    Inventors: Kamlesh R. Pillai, Christopher J. Hughes
  • Publication number: 20210200711
    Abstract: A system is provided that includes a reconfigurable systolic array circuitry. The reconfigurable systolic array circuitry includes a first circuit block comprising one or more groups of processing elements and a second circuit block comprising one or more groups of processing elements. The reconfigurable systolic array circuitry further includes a first bias addition with accumulation circuitry configured to add a matrix bias to an accumulated value, to a multiplication product, or to a combination thereof. The reconfigurable systolic array circuitry additionally includes a first routing circuitry configured to route derivations from the first circuit block into the second circuit block, from the first circuit block into the first bias addition with accumulation circuitry, or into a combination thereof.
    Type: Application
    Filed: December 28, 2019
    Publication date: July 1, 2021
    Inventors: Kamlesh R. Pillai, Gurpreet Singh Kalsi, Christopher Justin Hughes
  • Publication number: 20210166114
    Abstract: Embodiments are generally directed to techniques for accelerating neural networks. Many embodiments include a hardware accelerator for a bi-directional multi-layered GRU and LC neural network. Some embodiments are particularly directed to a hardware accelerator that enables offloading of the entire LC+GRU network to the hardware accelerator. Various embodiments include a hardware accelerator with a plurality of matrix vector units to perform GRU steps in parallel with LC steps. For example, at least a portion of computation by a first matrix vector unit of a GRU step in a neural network may overlap at least a portion of computation by a second matrix vector unit of an output feature vector for the neural network. Several embodiments include overlapping computation associated with a layer of a neural network with data transfer associated with another of the neural network.
    Type: Application
    Filed: February 10, 2021
    Publication date: June 3, 2021
    Applicant: Intel Corporation
    Inventors: Gurpreet S Kalsi, Ramachandra Chakenalli Nanjegowda, Kamlesh R Pillai, Sreenivas Subramoney
  • Publication number: 20200311021
    Abstract: Systems and techniques are provided for hardware architecture used in parallel computing applications to improve computation efficiency. An integrated circuit system may include a data store that stores data for processing and a reconfigurable systolic array that may process the data. The reconfigurable systolic array may include a first row of processing elements (PE) that process the data according to a first function and a second row of PE that process the data according to a second function. The reconfigurable systolic array may also include a routing block coupled to the first row of PE, the second row of PE, and the data store. Further, the reconfigurable systolic array may receive data from the first row of PE, transmit the data received from the first row of PE to the second row of PE, and transmit data output by the second row of PE to the first row of PE.
    Type: Application
    Filed: March 31, 2019
    Publication date: October 1, 2020
    Inventors: Kamlesh R. Pillai, Christopher J. Hughes
  • Publication number: 20200026745
    Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described.
    Type: Application
    Filed: September 27, 2019
    Publication date: January 23, 2020
    Inventors: Kamlesh R. Pillai, Christopher J. Hughes, Alexander Heinecke
  • Patent number: 10528322
    Abstract: One embodiment provides a unified multifunction circuitry. The unified multifunction circuitry includes a logarithm circuitry and an antilogarithm circuitry. The logarithm circuitry is to determine a log output operand. The log output operand includes a piecewise linear approximation of a base 2 logarithm of a significand of a log input operand. The antilogarithm circuitry is to determine an antilog output operand. The antilog output operand includes a piecewise linear approximation of a base 2 antilogarithm of a fraction of a selected input operand.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: January 7, 2020
    Assignee: Intel IP Corporation
    Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi
  • Patent number: 10445064
    Abstract: Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on linear piecewise approximation. An example processor includes a piece wise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piece wise linear log approximation circuit approximates a fractional portion of a fixed point number using a linear approximation of the mantissa of the floating-point number. The piece wise linear log approximation circuit also derives an integer from the exponent.
    Type: Grant
    Filed: February 3, 2017
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi
  • Publication number: 20190228049
    Abstract: The present disclosure is directed to systems and methods for performing discrete cosine transforms and inverse discrete cosine transforms (DCT/IDCT) using a CORDIC algorithm implemented in systolic array circuitry that includes a plurality cells or nodes, each containing circuitry to implement the CORDIC algorithm. DCT/IDCT control circuitry multiplies the systolic array output matrix generated by the systolic array circuitry by a scaling factor that may include a defined scaling value or an actual cosine value. The DCT/IDCT control circuitry causes the transfer of the scaled systolic array output matrix to combination circuitry where the DCT/IDCT input matrix is combined with the scaled systolic array output matrix to provide the DCT/IDCT output matrix. The DCT/IDCT control circuitry also transfers bypass information to at least a portion of the cells or nodes in the systolic array circuitry.
    Type: Application
    Filed: March 30, 2019
    Publication date: July 25, 2019
    Applicant: Intel Corporation
    Inventors: Kamlesh R. Pillai, Christopher J. Hughes
  • Publication number: 20190042192
    Abstract: One embodiment provides a unified multifunction circuitry. The unified multifunction circuitry includes a logarithm circuitry and an antilogarithm circuitry. The logarithm circuitry is to determine a log output operand. The log output operand includes a piecewise linear approximation of a base 2 logarithm of a significand of a log input operand. The antilogarithm circuitry is to determine an antilog output operand. The antilog output operand includes a piecewise linear approximation of a base 2 antilogarithm of a fraction of a selected input operand.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Applicant: Intel IP Corporation
    Inventors: Kamlesh R. Pillai, GURPREET S. KALSI
  • Publication number: 20180225093
    Abstract: Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on linear piecewise approximation. An example processor includes a piece wise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piece wise linear log approximation circuit approximates a fractional portion of a fixed point number using a linear approximation of the mantissa of the floating-point number. The piece wise linear log approximation circuit also derives an integer from the exponent.
    Type: Application
    Filed: February 3, 2017
    Publication date: August 9, 2018
    Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi