Patents by Inventor Kamlesh R. Pillai
Kamlesh R. Pillai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220391128
Abstract: Example compute-in-memory (CIM) or processor-in-memory (PIM) techniques using repurposed or dedicated static random access memory (SRAM) rows of an SRAM sub-array to store look-up-table (LUT) entries for use in a multiply and accumulate (MAC) operation.
Type: Application
Filed: June 7, 2021
Publication date: December 8, 2022
Inventors: Saurabh Jain, Srivatsa Rangachar Srinivasa, Akshay Krishna Ramanathan, Gurpreet Singh Kalsi, Kamlesh R. Pillai, Sreenivas Subramoney
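The general idea of replacing multiplications with table lookups can be illustrated with a minimal Python sketch. It assumes small unsigned activations (4 bits here) and one precomputed table of products per weight. The table layout, bit width, and the names `build_lut` and `lut_mac` are hypothetical; the abstract does not describe how the application lays LUT entries out across SRAM rows.

```python
def build_lut(weight: int, act_bits: int = 4) -> list[int]:
    """Precompute weight * a for every unsigned activation value a."""
    return [weight * a for a in range(1 << act_bits)]


def lut_mac(weights: list[int], activations: list[int], act_bits: int = 4) -> int:
    """Multiply-accumulate in which every multiply is replaced by a table lookup."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    luts = [build_lut(w, act_bits) for w in weights]  # stored once, reused across inputs
    acc = 0
    for lut, a in zip(luts, activations):
        if not 0 <= a < (1 << act_bits):
            raise ValueError(f"activation {a} does not fit in {act_bits} bits")
        acc += lut[a]  # lookup replaces weight * a
    return acc


print(lut_mac([2, 3, -1], [5, 7, 15]))  # 2*5 + 3*7 + (-1)*15 = 16
```

The tables cost memory but are built once per weight set, so repeated inputs only pay for lookups and additions.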
-
Publication number: 20220113974
Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
Type: Application
Filed: December 23, 2021
Publication date: April 14, 2022
Applicant: Intel Corporation
Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
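How a per-level configurable fan-out determines which subarrays receive a multicast tile can be modeled with a short Python sketch. It assumes a regular decoder tree where every decoder at a given level has the same number of children and each level enables a set of child positions; the function name `multicast_targets` and this addressing scheme are illustrative assumptions, not the application's decoder design.

```python
from itertools import product


def multicast_targets(fanouts: list[int], selections: list[set[int]]) -> list[int]:
    """Leaf subarray indices reached when each decoder level forwards to its selected children.

    fanouts[i] is the number of children per decoder at level i (root level first);
    selections[i] is the set of child positions enabled at that level.
    """
    if len(fanouts) != len(selections):
        raise ValueError("need one selection per hierarchical level")
    for fanout, sel in zip(fanouts, selections):
        if not sel or any(not 0 <= c < fanout for c in sel):
            raise ValueError("each level must select at least one valid child")
    targets = []
    for path in product(*(sorted(sel) for sel in selections)):
        index = 0
        for fanout, child in zip(fanouts, path):
            index = index * fanout + child  # mixed-radix address of the leaf subarray
        targets.append(index)
    return targets


# Two branches at the top level, four subarrays under each: eight subarrays in total.
# Enable both top-level branches but only child 2 below each one.
print(multicast_targets([2, 4], [{0, 1}, {2}]))  # [2, 6]
```

A single broadcast of one data tile thus reaches every subarray on an enabled path, instead of being written to each subarray separately.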
-
Patent number: 11169957
Abstract: Systems and techniques are provided for hardware architecture used in parallel computing applications to improve computation efficiency. An integrated circuit system may include a data store that stores data for processing and a reconfigurable systolic array that may process the data. The reconfigurable systolic array may include a first row of processing elements (PE) that process the data according to a first function and a second row of PE that process the data according to a second function. The reconfigurable systolic array may also include a routing block coupled to the first row of PE, the second row of PE, and the data store. Further, the reconfigurable systolic array may receive data from the first row of PE, transmit the data received from the first row of PE to the second row of PE, and transmit data output by the second row of PE to the first row of PE.
Type: Grant
Filed: March 31, 2019
Date of Patent: November 9, 2021
Assignee: Intel Corporation
Inventors: Kamlesh R. Pillai, Christopher J. Hughes
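The dataflow in the abstract (data store to row 1, routing block to row 2, and row 2 back to row 1) can be modeled behaviorally in a few lines of Python. This is a sketch under simplifying assumptions: each row applies one elementwise function, the loop runs a fixed number of passes, and the name `run_two_row_array` is hypothetical. It ignores cycle timing and the systolic shifting between PEs.

```python
from typing import Callable, Sequence


def run_two_row_array(
    data: Sequence[float],
    f1: Callable[[float], float],
    f2: Callable[[float], float],
    passes: int,
) -> list[float]:
    """Behavioral model of two PE rows joined by a routing block that can loop data back."""
    values = list(data)  # operands loaded from the data store
    for _ in range(passes):
        row1_out = [f1(v) for v in values]    # first row of PEs applies the first function
        row2_out = [f2(v) for v in row1_out]  # routing block forwards row 1 output to row 2
        values = row2_out                     # routing block returns row 2 output to row 1
    return values


# Row 1 doubles, row 2 adds one, and the data makes two trips around the loop.
print(run_two_row_array([1.0, 2.0], lambda x: 2 * x, lambda x: x + 1, passes=2))  # [7.0, 11.0]
```

Feeding row 2 back into row 1 lets a short, fixed array evaluate a longer chain of functions by reusing the same hardware rows.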
-
Patent number: 11138290
Abstract: The present disclosure is directed to systems and methods for performing discrete cosine transforms and inverse discrete cosine transforms (DCT/IDCT) using a CORDIC algorithm implemented in systolic array circuitry that includes a plurality of cells or nodes, each containing circuitry to implement the CORDIC algorithm. DCT/IDCT control circuitry multiplies the systolic array output matrix generated by the systolic array circuitry by a scaling factor that may include a defined scaling value or an actual cosine value. The DCT/IDCT control circuitry causes the transfer of the scaled systolic array output matrix to combination circuitry where the DCT/IDCT input matrix is combined with the scaled systolic array output matrix to provide the DCT/IDCT output matrix. The DCT/IDCT control circuitry also transfers bypass information to at least a portion of the cells or nodes in the systolic array circuitry.
Type: Grant
Filed: March 30, 2019
Date of Patent: October 5, 2021
Assignee: Intel Corporation
Inventors: Kamlesh R. Pillai, Christopher J. Hughes
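To show why CORDIC suits a DCT, here is a Python sketch that computes every cosine term of an unnormalized 1-D DCT-II with CORDIC in rotation mode. It is a sequential model only. It does not reproduce the systolic mapping, bypass signals, or combination circuitry in the abstract, and the names `cordic_cos` and `dct_ii` are hypothetical. Each iteration uses only shifts (multiplication by 2**-i) and adds; the final division removes the constant CORDIC gain.

```python
import math


def cordic_cos(theta: float, iterations: int = 40) -> float:
    """Cosine via CORDIC rotation mode: shift-and-add micro-rotations plus one gain correction."""
    # Range reduction: bring theta into [-pi/2, pi/2], where CORDIC rotation converges.
    theta = math.remainder(theta, 2 * math.pi)  # now in [-pi, pi]
    sign = 1.0
    if theta > math.pi / 2:
        theta, sign = theta - math.pi, -1.0  # cos(t) = -cos(t - pi)
    elif theta < -math.pi / 2:
        theta, sign = theta + math.pi, -1.0  # cos(t) = -cos(t + pi)

    x, y, z = 1.0, 0.0, theta
    gain = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i  # micro-rotation by atan(2**-i)
        z -= d * math.atan(2.0**-i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return sign * x / gain  # scale out the accumulated CORDIC gain


def dct_ii(signal: list[float]) -> list[float]:
    """Unnormalized 1-D DCT-II whose cosine terms all come from cordic_cos."""
    n = len(signal)
    return [
        sum(x * cordic_cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


print([round(v, 6) for v in dct_ii([1.0, 2.0, 3.0, 4.0])])
```

In hardware, the atan(2**-i) constants would come from a small table and the gain correction would be a single precomputed multiply.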
-
Publication number: 20210200711
Abstract: A system is provided that includes a reconfigurable systolic array circuitry. The reconfigurable systolic array circuitry includes a first circuit block comprising one or more groups of processing elements and a second circuit block comprising one or more groups of processing elements. The reconfigurable systolic array circuitry further includes a first bias addition with accumulation circuitry configured to add a matrix bias to an accumulated value, to a multiplication product, or to a combination thereof. The reconfigurable systolic array circuitry additionally includes a first routing circuitry configured to route derivations from the first circuit block into the second circuit block, from the first circuit block into the first bias addition with accumulation circuitry, or into a combination thereof.
Type: Application
Filed: December 28, 2019
Publication date: July 1, 2021
Inventors: Kamlesh R. Pillai, Gurpreet Singh Kalsi, Christopher Justin Hughes
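One routing option in the abstract sends results from the first circuit block into the second circuit block, and then into bias addition with accumulation. Python can model that path for a matrix product with bias. The sketch assumes the two blocks split the reduction dimension K in half and that the bias is one value per output column. Both choices, and the name `matmul_bias_two_blocks`, are illustrative assumptions rather than the application's partitioning.

```python
def matmul_bias_two_blocks(
    a: list[list[float]], b: list[list[float]], bias: list[float]
) -> list[list[float]]:
    """Compute a @ b + bias, splitting the reduction dimension K across two blocks.

    a is M x K, b is K x N, and bias has length N (one value per output column).
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    split = k // 2
    out = []
    for i in range(m):
        # First circuit block: partial dot products over t in [0, split).
        partial = [sum(a[i][t] * b[t][j] for t in range(split)) for j in range(n)]
        # Routing: partial sums flow into the second block, which accumulates the rest of K.
        acc = [partial[j] + sum(a[i][t] * b[t][j] for t in range(split, k)) for j in range(n)]
        # Bias addition with accumulation: add the bias to the accumulated value.
        out.append([acc[j] + bias[j] for j in range(n)])
    return out


print(matmul_bias_two_blocks([[1, 2], [3, 4]], [[5, 6], [7, 8]], [1, -1]))  # [[20, 21], [44, 49]]
```

Routing partial sums between blocks instead of through memory lets a larger reduction be spread over more processing elements without extra round trips.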
-
Publication number: 20210166114
Abstract: Embodiments are generally directed to techniques for accelerating neural networks. Many embodiments include a hardware accelerator for a bi-directional multi-layered GRU and LC neural network. Some embodiments are particularly directed to a hardware accelerator that enables offloading of the entire LC+GRU network to the hardware accelerator. Various embodiments include a hardware accelerator with a plurality of matrix vector units to perform GRU steps in parallel with LC steps. For example, at least a portion of computation by a first matrix vector unit of a GRU step in a neural network may overlap at least a portion of computation by a second matrix vector unit of an output feature vector for the neural network. Several embodiments include overlapping computation associated with a layer of a neural network with data transfer associated with another layer of the neural network.
Type: Application
Filed: February 10, 2021
Publication date: June 3, 2021
Applicant: Intel Corporation
Inventors: Gurpreet S. Kalsi, Ramachandra Chakenalli Nanjegowda, Kamlesh R. Pillai, Sreenivas Subramoney
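The overlap described above can be sketched in Python: one worker computes GRU step t while a second worker computes the output feature vector for the hidden state from step t-1. Several assumptions apply. The GRU uses the common update form h' = (1 - z) * h + z * n with biases omitted. The second unit's work is modeled as a hypothetical linear projection `w_out`, because the abstract does not define the "LC" stage. Parameter keys such as `"Wz"` are made up. Python threads only model the schedule here and give no real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

Matrix = list[list[float]]


def matvec(w: Matrix, v: list[float]) -> list[float]:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(p: dict[str, Matrix], x: list[float], h: list[float]) -> list[float]:
    """One GRU step with biases omitted: update gate z, reset gate r, candidate n."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def run_overlapped(p: dict[str, Matrix], w_out: Matrix, xs: list[list[float]], h0: list[float]):
    """Unit 1 runs GRU step t while unit 2 projects h_{t-1} to an output feature vector."""
    h, outputs, pending = h0, [], None
    with ThreadPoolExecutor(max_workers=2) as units:
        for x in xs:
            step = units.submit(gru_step, p, x, h)  # matrix-vector unit 1
            if pending is not None:
                outputs.append(pending.result())  # unit 2's output for h_{t-1}
            h = step.result()
            pending = units.submit(matvec, w_out, h)  # matrix-vector unit 2
        if pending is not None:
            outputs.append(pending.result())  # drain the last output
    return h, outputs


params = {name: [[0.5], [-0.25]] for name in ("Wz", "Wr", "Wn")}  # input size 1, hidden size 2
params.update({name: [[0.1, 0.0], [0.0, 0.1]] for name in ("Uz", "Ur", "Un")})
h_final, outs = run_overlapped(params, [[1.0, 1.0]], [[1.0], [0.5], [-1.0]], [0.0, 0.0])
print(len(outs))  # 3
```

The results match a plain sequential loop. Only the ordering of the work changes, which is what allows two hardware units to be busy at the same time.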
-
Publication number: 20200311021
Abstract: Systems and techniques are provided for hardware architecture used in parallel computing applications to improve computation efficiency. An integrated circuit system may include a data store that stores data for processing and a reconfigurable systolic array that may process the data. The reconfigurable systolic array may include a first row of processing elements (PE) that process the data according to a first function and a second row of PE that process the data according to a second function. The reconfigurable systolic array may also include a routing block coupled to the first row of PE, the second row of PE, and the data store. Further, the reconfigurable systolic array may receive data from the first row of PE, transmit the data received from the first row of PE to the second row of PE, and transmit data output by the second row of PE to the first row of PE.
Type: Application
Filed: March 31, 2019
Publication date: October 1, 2020
Inventors: Kamlesh R. Pillai, Christopher J. Hughes
-
Publication number: 20200026745
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described.
Type: Application
Filed: September 27, 2019
Publication date: January 23, 2020
Inventors: Kamlesh R. Pillai, Christopher J. Hughes, Alexander Heinecke
-
Patent number: 10528322
Abstract: One embodiment provides a unified multifunction circuitry. The unified multifunction circuitry includes a logarithm circuitry and an antilogarithm circuitry. The logarithm circuitry is to determine a log output operand. The log output operand includes a piecewise linear approximation of a base 2 logarithm of a significand of a log input operand. The antilogarithm circuitry is to determine an antilog output operand. The antilog output operand includes a piecewise linear approximation of a base 2 antilogarithm of a fraction of a selected input operand.
Type: Grant
Filed: December 29, 2017
Date of Patent: January 7, 2020
Assignee: Intel IP Corporation
Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi
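Both operations in the abstract can be illustrated in Python with piecewise linear approximations that share a single interpolation routine. The first is log2 of a significand in [1, 2]; the second is 2**f for a fraction f in [0, 1]. The sketch assumes eight equal-width segments whose endpoints are exact function values (chord interpolation). The segment count, the breakpoint placement, and the names `make_table`, `pwl`, `approx_log2_significand`, and `approx_antilog2_fraction` are hypothetical, not the patent's coefficients.

```python
import math


def make_table(fn, lo: float, hi: float, segments: int) -> tuple[list[float], list[float]]:
    """Breakpoints and exact function values at the segment endpoints."""
    xs = [lo + (hi - lo) * i / segments for i in range(segments + 1)]
    return xs, [fn(x) for x in xs]


def pwl(table: tuple[list[float], list[float]], x: float) -> float:
    """Evaluate the piecewise linear interpolant (chords between breakpoints)."""
    xs, ys = table
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("input outside the table range")
    segments = len(xs) - 1
    i = min(int((x - xs[0]) / (xs[-1] - xs[0]) * segments), segments - 1)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


LOG2_TABLE = make_table(math.log2, 1.0, 2.0, 8)         # significand in [1, 2]
EXP2_TABLE = make_table(lambda f: 2.0**f, 0.0, 1.0, 8)  # fraction in [0, 1]


def approx_log2_significand(m: float) -> float:
    return pwl(LOG2_TABLE, m)


def approx_antilog2_fraction(f: float) -> float:
    return pwl(EXP2_TABLE, f)


print(approx_log2_significand(1.5), math.log2(1.5))
print(approx_antilog2_fraction(0.3), 2.0**0.3)
```

Because both functions go through the same `pwl` evaluator and differ only in their tables, the sketch mirrors the idea of one shared datapath serving log and antilog.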
-
Patent number: 10445064
Abstract: Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on piecewise linear approximation. An example processor includes a piecewise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piecewise linear log approximation circuit approximates a fractional portion of a fixed-point number using a linear approximation of the mantissa of the floating-point number. The piecewise linear log approximation circuit also derives an integer from the exponent.
Type: Grant
Filed: February 3, 2017
Date of Patent: October 15, 2019
Assignee: Intel Corporation
Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi
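The split described above can be sketched in Python: the integer part of a fixed-point log2 comes from the exponent, and the fractional part comes from a linear function of the mantissa. The sketch uses Mitchell's single-segment approximation log2(1 + m) ≈ m, the simplest case of the piecewise linear idea. A multi-segment version would add a per-segment slope and offset correction. It assumes IEEE 754 binary32 inputs and 23 fractional bits in the result; `FRAC_BITS`, `float32_fields`, `approx_log2_fixed`, and `to_float` are hypothetical names.

```python
import math
import struct

FRAC_BITS = 23  # fraction bits of the fixed-point result (matches the binary32 mantissa)


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to IEEE 754 binary32) into sign, biased exponent, and mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def approx_log2_fixed(x: float) -> int:
    """log2(x) as a fixed-point integer with FRAC_BITS fractional bits.

    Integer part: the unbiased exponent. Fractional part: the mantissa bits themselves,
    i.e. the single-segment linear approximation log2(1 + m) ~= m.
    """
    sign, exponent, mantissa = float32_fields(x)
    if sign or exponent == 0 or exponent == 0xFF:
        raise ValueError("expects a positive, normal, finite float32 value")
    integer = exponent - 127
    return (integer << FRAC_BITS) + mantissa


def to_float(fixed: int) -> float:
    return fixed / (1 << FRAC_BITS)


print(to_float(approx_log2_fixed(8.0)))  # 3.0 (exact at powers of two)
print(to_float(approx_log2_fixed(1.5)), math.log2(1.5))
```

With a single segment the error stays below about 0.087. Adding more segments, as the piecewise linear approach allows, reduces it at the cost of a small coefficient table.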
-
Publication number: 20190228049
Abstract: The present disclosure is directed to systems and methods for performing discrete cosine transforms and inverse discrete cosine transforms (DCT/IDCT) using a CORDIC algorithm implemented in systolic array circuitry that includes a plurality of cells or nodes, each containing circuitry to implement the CORDIC algorithm. DCT/IDCT control circuitry multiplies the systolic array output matrix generated by the systolic array circuitry by a scaling factor that may include a defined scaling value or an actual cosine value. The DCT/IDCT control circuitry causes the transfer of the scaled systolic array output matrix to combination circuitry where the DCT/IDCT input matrix is combined with the scaled systolic array output matrix to provide the DCT/IDCT output matrix. The DCT/IDCT control circuitry also transfers bypass information to at least a portion of the cells or nodes in the systolic array circuitry.
Type: Application
Filed: March 30, 2019
Publication date: July 25, 2019
Applicant: Intel Corporation
Inventors: Kamlesh R. Pillai, Christopher J. Hughes
-
Publication number: 20190042192
Abstract: One embodiment provides a unified multifunction circuitry. The unified multifunction circuitry includes a logarithm circuitry and an antilogarithm circuitry. The logarithm circuitry is to determine a log output operand. The log output operand includes a piecewise linear approximation of a base 2 logarithm of a significand of a log input operand. The antilogarithm circuitry is to determine an antilog output operand. The antilog output operand includes a piecewise linear approximation of a base 2 antilogarithm of a fraction of a selected input operand.
Type: Application
Filed: December 29, 2017
Publication date: February 7, 2019
Applicant: Intel IP Corporation
Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi
-
Publication number: 20180225093
Abstract: Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on piecewise linear approximation. An example processor includes a piecewise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piecewise linear log approximation circuit approximates a fractional portion of a fixed-point number using a linear approximation of the mantissa of the floating-point number. The piecewise linear log approximation circuit also derives an integer from the exponent.
Type: Application
Filed: February 3, 2017
Publication date: August 9, 2018
Inventors: Kamlesh R. Pillai, Gurpreet S. Kalsi