Patents by Inventor Kamlesh R. Pillai

Kamlesh R. Pillai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

TECHNIQUES TO REPURPOSE STATIC RANDOM ACCESS MEMORY ROWS TO STORE A LOOK-UP-TABLE FOR PROCESSOR-IN-MEMORY OPERATIONS

Publication number: 20220391128

Abstract: Example compute-in-memory (CIM) or processor-in-memory (PIM) techniques using repurposed or dedicated static random access memory (SRAM) rows of an SRAM sub-array to store look-up-table (LUT) entries for use in a multiply and accumulate (MAC) operation.

Type: Application

Filed: June 7, 2021

Publication date: December 8, 2022

Inventors: Saurabh Jain, Srivatsa Rangachar Srinivasa, Akshay Krishna Ramanathan, Gurpreet Singh Kalsi, Kamlesh R. Pillai, Sreenivas Subramoney

HARDWARE-SOFTWARE CO-DESIGNED MULTI-CAST FOR IN-MEMORY COMPUTING ARCHITECTURES

Publication number: 20220113974

Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.

Type: Application

Filed: December 23, 2021

Publication date: April 14, 2022

Applicant: INTEL CORPORATION

Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera

Systems and methods for reconfigurable systolic arrays

Patent number: 11169957

Abstract: Systems and techniques are provided for hardware architecture used in parallel computing applications to improve computation efficiency. An integrated circuit system may include a data store that stores data for processing and a reconfigurable systolic array that may process the data. The reconfigurable systolic array may include a first row of processing elements (PE) that process the data according to a first function and a second row of PE that process the data according to a second function. The reconfigurable systolic array may also include a routing block coupled to the first row of PE, the second row of PE, and the data store. Further, the reconfigurable systolic array may receive data from the first row of PE, transmit the data received from the first row of PE to the second row of PE, and transmit data output by the second row of PE to the first row of PE.

Type: Grant

Filed: March 31, 2019

Date of Patent: November 9, 2021

Assignee: Intel Corporation

Inventors: Kamlesh R. Pillai, Christopher J. Hughes

Discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) systems and methods

Patent number: 11138290

Abstract: The present disclosure is directed to systems and methods for performing discrete cosine transforms and inverse discrete cosine transforms (DCT/IDCT) using a CORDIC algorithm implemented in systolic array circuitry that includes a plurality of cells or nodes, each containing circuitry to implement the CORDIC algorithm. DCT/IDCT control circuitry multiplies the systolic array output matrix generated by the systolic array circuitry by a scaling factor that may include a defined scaling value or an actual cosine value. The DCT/IDCT control circuitry causes the transfer of the scaled systolic array output matrix to combination circuitry where the DCT/IDCT input matrix is combined with the scaled systolic array output matrix to provide the DCT/IDCT output matrix. The DCT/IDCT control circuitry also transfers bypass information to at least a portion of the cells or nodes in the systolic array circuitry.

Type: Grant

Filed: March 30, 2019

Date of Patent: October 5, 2021

Assignee: Intel Corporation

Inventors: Kamlesh R. Pillai, Christopher J. Hughes
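
For readers unfamiliar with CORDIC, which the abstract above names as the per-cell algorithm, the following is a minimal software sketch of the generic rotation-mode CORDIC iteration. It is not taken from the patent: the function name `cordic_rotate`, the iteration count, and the use of floating point are assumptions chosen for illustration, and the final multiplication is the standard CORDIC gain correction, which may or may not correspond to the scaling factor the abstract mentions.

```python
import math


def cordic_rotate(angle, iterations=16):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    Generic rotation-mode CORDIC; names and parameters are hypothetical.
    """
    # Elementary rotation angles atan(2^-i), normally held in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # the product of these factors is the CORDIC gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0, 0.0, angle
    for i in range(iterations):
        # Rotate toward the residual angle z using only shifts and adds
        # (the multiplications by 2^-i stand in for hardware shifts).
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]

    # Undo the accumulated gain with a single scaling step.
    return x / gain, y / gain
```

Calling `cordic_rotate(math.pi / 8)` returns a pair close to the cosine and sine of π/8; accuracy improves by roughly one bit per iteration.
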

System and Method for Configurable Systolic Array with Partial Read/Write

Publication number: 20210200711

Abstract: A system is provided that includes a reconfigurable systolic array circuitry. The reconfigurable systolic array circuitry includes a first circuit block comprising one or more groups of processing elements and a second circuit block comprising one or more groups of processing elements. The reconfigurable systolic array circuitry further includes a first bias addition with accumulation circuitry configured to add a matrix bias to an accumulated value, to a multiplication product, or to a combination thereof. The reconfigurable systolic array circuitry additionally includes a first routing circuitry configured to route derivations from the first circuit block into the second circuit block, from the first circuit block into the first bias addition with accumulation circuitry, or into a combination thereof.

Type: Application

Filed: December 28, 2019

Publication date: July 1, 2021

Inventors: Kamlesh R. Pillai, Gurpreet Singh Kalsi, Christopher Justin Hughes

Techniques for Accelerating Neural Networks

Publication number: 20210166114

Abstract: Embodiments are generally directed to techniques for accelerating neural networks. Many embodiments include a hardware accelerator for a bi-directional multi-layered GRU and LC neural network. Some embodiments are particularly directed to a hardware accelerator that enables offloading of the entire LC+GRU network to the hardware accelerator. Various embodiments include a hardware accelerator with a plurality of matrix vector units to perform GRU steps in parallel with LC steps. For example, at least a portion of computation by a first matrix vector unit of a GRU step in a neural network may overlap at least a portion of computation by a second matrix vector unit of an output feature vector for the neural network. Several embodiments include overlapping computation associated with a layer of a neural network with data transfer associated with another layer of the neural network.

Type: Application

Filed: February 10, 2021

Publication date: June 3, 2021

Applicant: Intel Corporation

Inventors: Gurpreet S. Kalsi, Ramachandra Chakenalli Nanjegowda, Kamlesh R. Pillai, Sreenivas Subramoney

SYSTEMS AND METHODS FOR RECONFIGURABLE SYSTOLIC ARRAYS

Publication number: 20200311021

Abstract: Systems and techniques are provided for hardware architecture used in parallel computing applications to improve computation efficiency. An integrated circuit system may include a data store that stores data for processing and a reconfigurable systolic array that may process the data. The reconfigurable systolic array may include a first row of processing elements (PE) that process the data according to a first function and a second row of PE that process the data according to a second function. The reconfigurable systolic array may also include a routing block coupled to the first row of PE, the second row of PE, and the data store. Further, the reconfigurable systolic array may receive data from the first row of PE, transmit the data received from the first row of PE to the second row of PE, and transmit data output by the second row of PE to the first row of PE.

Type: Application

Filed: March 31, 2019

Publication date: October 1, 2020

Inventors: Kamlesh R. Pillai, Christopher J. Hughes

APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS OF A MATRIX OPERATIONS ACCELERATOR

Publication number: 20200026745

Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described.

Type: Application

Filed: September 27, 2019

Publication date: January 23, 2020

Inventors: Kamlesh R. Pillai, Christopher J. Hughes, Alexander Heinecke

Unified multifunction circuitry

Patent number: 10528322

Abstract: One embodiment provides a unified multifunction circuitry. The unified multifunction circuitry includes a logarithm circuitry and an antilogarithm circuitry. The logarithm circuitry is to determine a log output operand. The log output operand includes a piecewise linear approximation of a base 2 logarithm of a significand of a log input operand. The antilogarithm circuitry is to determine an antilog output operand. The antilog output operand includes a piecewise linear approximation of a base 2 antilogarithm of a fraction of a selected input operand.

Type: Grant

Filed: December 29, 2017

Date of Patent: January 7, 2020

Assignee: Intel IP Corporation

Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi
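
To illustrate what a piecewise linear approximation of a base 2 logarithm and antilogarithm can look like, here is a minimal software sketch. It is not the patented circuit: the segment count `SEGMENTS`, the equal-width segments, the chord-based coefficients, and all function names are assumptions made for this example. The logarithm is approximated over the significand in [1, 2) and the antilogarithm over the fraction in [0, 1), matching the operands the abstract above describes.

```python
import math

SEGMENTS = 8  # hypothetical segment count; real hardware may differ


def _build_table(fn, segments=SEGMENTS):
    """Return (slope, intercept) pairs approximating fn on [0, 1)."""
    table = []
    for k in range(segments):
        x0, x1 = k / segments, (k + 1) / segments
        y0, y1 = fn(x0), fn(x1)
        slope = (y1 - y0) / (x1 - x0)
        table.append((slope, y0 - slope * x0))
    return table


# One table per function: log2(1 + m) for the significand, 2**f for the fraction.
LOG_TABLE = _build_table(lambda m: math.log2(1.0 + m))
ANTILOG_TABLE = _build_table(lambda f: 2.0 ** f)


def _piecewise_linear(table, x):
    """Evaluate the segment of the table that contains x in [0, 1)."""
    k = min(int(x * len(table)), len(table) - 1)
    slope, intercept = table[k]
    return slope * x + intercept


def approx_log2(value):
    """Approximate log2(value) for value > 0."""
    mantissa, exponent = math.frexp(value)  # value = mantissa * 2**exponent
    significand = mantissa * 2.0            # rescale from [0.5, 1) to [1, 2)
    return (exponent - 1) + _piecewise_linear(LOG_TABLE, significand - 1.0)


def approx_antilog2(value):
    """Approximate 2**value by splitting value into integer and fraction."""
    integer = math.floor(value)
    fraction = value - integer
    return math.ldexp(_piecewise_linear(ANTILOG_TABLE, fraction), integer)
```

With eight segments, `approx_log2(10.0)` stays within about 0.003 of `math.log2(10.0)`; adding segments reduces the error at the cost of a larger coefficient table.
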

Implementing logarithmic and antilogarithmic operations based on piecewise linear approximation

Patent number: 10445064

Abstract: Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on piecewise linear approximation. An example processor includes a piecewise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piecewise linear log approximation circuit approximates a fractional portion of a fixed-point number using a linear approximation of the mantissa of the floating-point number. The piecewise linear log approximation circuit also derives an integer from the exponent.

Type: Grant

Filed: February 3, 2017

Date of Patent: October 15, 2019

Assignee: Intel Corporation

Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi
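
The abstract above describes deriving an integer from the exponent and a fractional portion from the mantissa. A minimal bit-level sketch of that idea follows, assuming an IEEE 754 single-precision input and a single linear segment, log2(1 + m) ≈ m. The function name `log2_fixed_point`, the 23-bit fraction width, and the single-segment choice are assumptions for illustration and are not taken from the patent.

```python
import struct

FRACTION_BITS = 23  # hypothetical fixed-point fraction width


def log2_fixed_point(value):
    """Approximate log2(value) as a signed fixed-point integer.

    The result has FRACTION_BITS fraction bits; divide by 2**FRACTION_BITS
    to read it as a real number.
    """
    # Reinterpret the value as the 32 bits of an IEEE 754 float32.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF

    if sign or exponent == 0 or exponent == 0xFF:
        raise ValueError("expects a positive, normal, finite float32")

    # Integer part: the unbiased exponent.
    integer_part = exponent - 127

    # Fraction part: with the single-segment approximation log2(1 + m) ~= m,
    # the 23 mantissa bits are used directly as the fraction bits.
    return integer_part * (1 << FRACTION_BITS) + mantissa
```

For example, `log2_fixed_point(10.0) / (1 << 23)` evaluates to 3.25, compared with a true value of about 3.32; splitting the mantissa range into more segments narrows that gap.
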

DISCRETE COSINE TRANSFORM/INVERSE DISCRETE COSINE TRANSFORM (DCT/IDCT) SYSTEMS AND METHODS

Publication number: 20190228049

Abstract: The present disclosure is directed to systems and methods for performing discrete cosine transforms and inverse discrete cosine transforms (DCT/IDCT) using a CORDIC algorithm implemented in systolic array circuitry that includes a plurality of cells or nodes, each containing circuitry to implement the CORDIC algorithm. DCT/IDCT control circuitry multiplies the systolic array output matrix generated by the systolic array circuitry by a scaling factor that may include a defined scaling value or an actual cosine value. The DCT/IDCT control circuitry causes the transfer of the scaled systolic array output matrix to combination circuitry where the DCT/IDCT input matrix is combined with the scaled systolic array output matrix to provide the DCT/IDCT output matrix. The DCT/IDCT control circuitry also transfers bypass information to at least a portion of the cells or nodes in the systolic array circuitry.

Type: Application

Filed: March 30, 2019

Publication date: July 25, 2019

Applicant: Intel Corporation

Inventors: Kamlesh R. Pillai, Christopher J. Hughes

UNIFIED MULTIFUNCTION CIRCUITRY

Publication number: 20190042192

Abstract: One embodiment provides a unified multifunction circuitry. The unified multifunction circuitry includes a logarithm circuitry and an antilogarithm circuitry. The logarithm circuitry is to determine a log output operand. The log output operand includes a piecewise linear approximation of a base 2 logarithm of a significand of a log input operand. The antilogarithm circuitry is to determine an antilog output operand. The antilog output operand includes a piecewise linear approximation of a base 2 antilogarithm of a fraction of a selected input operand.

Type: Application

Filed: December 29, 2017

Publication date: February 7, 2019

Applicant: Intel IP Corporation

Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi

IMPLEMENTING LOGARITHMIC AND ANTILOGARITHMIC OPERATIONS BASED ON PIECEWISE LINEAR APPROXIMATION

Publication number: 20180225093

Abstract: Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on piecewise linear approximation. An example processor includes a piecewise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piecewise linear log approximation circuit approximates a fractional portion of a fixed-point number using a linear approximation of the mantissa of the floating-point number. The piecewise linear log approximation circuit also derives an integer from the exponent.

Type: Application

Filed: February 3, 2017

Publication date: August 9, 2018

Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi