Patents by Inventor Gurpreet Singh Kalsi
Gurpreet Singh Kalsi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11949414
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
Type: Grant
Filed: December 22, 2020
Date of Patent: April 2, 2024
Assignee: Intel Corporation
Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
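The LUT-plus-shift scheme this abstract describes can be sketched in software: precompute a table of all products of two four-bit numbers, then build larger multiplies from nibble lookups combined by the shift-and-adder step. This is an illustrative reading only, not the claimed circuit; the function name and operand widths are assumptions.

```python
# Table of all products of two 4-bit numbers, standing in for the LUT
# entries stored in the memory subarray.
LUT = [[a * b for b in range(16)] for a in range(16)]

def mac_8bit(x, y, acc=0):
    """Multiply two 8-bit operands via 4-bit LUT lookups, then accumulate.

    Each operand is split into a high and a low nibble; the four partial
    products are read from the LUT and shifted into place, mimicking the
    shift-and-adder logic located in the subarray.
    """
    x_hi, x_lo = x >> 4, x & 0xF
    y_hi, y_lo = y >> 4, y & 0xF
    acc += LUT[x_hi][y_hi] << 8   # high x high: shift by 8
    acc += LUT[x_hi][y_lo] << 4   # high x low:  shift by 4
    acc += LUT[x_lo][y_hi] << 4   # low x high:  shift by 4
    acc += LUT[x_lo][y_lo]        # low x low:   no shift
    return acc
```

For example, `mac_8bit(37, 201)` reproduces `37 * 201` using only 4-bit table lookups, shifts, and adds.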
-
Patent number: 11941534
Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the one or more bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
Type: Grant
Filed: December 28, 2019
Date of Patent: March 26, 2024
Assignee: Intel Corporation
Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
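Bit-vector distance counting of this kind is in the spirit of the well-known Myers/Hyyrö bit-parallel edit-distance algorithms, where per-position match and edit information is packed into machine words. The sketch below is that classical algorithm, shown as context for the abstract; it is not the patented circuit, and it computes only the distance, not the traceback.

```python
def edit_distance(read, ref):
    """Bit-parallel Levenshtein distance (Hyyrö's formulation).

    One bit per read position; each reference base updates all positions
    in a handful of word-wide logic operations.
    """
    m = len(read)
    mask = (1 << m) - 1
    # peq[c]: positions in the read where base c occurs, one bit per position
    peq = {}
    for i, c in enumerate(read):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, dist = mask, 0, m
    for c in ref:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # positions where the distance grows
        mh = pv & xh                    # positions where it shrinks
        if ph & (1 << (m - 1)):
            dist += 1
        elif mh & (1 << (m - 1)):
            dist -= 1
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return dist
```

A hardware version keeps these vectors in registers per reference window, which is what makes a per-window traceback output natural.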
-
Patent number: 11783170
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Grant
Filed: January 25, 2023
Date of Patent: October 10, 2023
Assignee: Intel Corporation
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
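In sparse (voxel) convolution, a "rulebook" commonly maps active input voxels to output voxels per kernel offset (weight plane), so the MAC work is organized plane by plane as partial accumulations. The sketch below illustrates that general pattern; the rulebook contents, shapes, and function name are assumptions for illustration, not the claimed instruction or hardware.

```python
import numpy as np

# Hypothetical rulebook: for each weight plane (kernel offset), a list of
# (input_voxel_index, output_voxel_index) pairs linking active voxels.
rulebook = {
    0: [(0, 0), (1, 1)],   # kernel offset 0
    1: [(0, 1), (2, 0)],   # kernel offset 1
}

def sparse_conv(features, weights, rulebook, num_out):
    """Channel-wise MAC over rulebook pairs, accumulated per weight plane.

    features: (N_in, C_in) active-voxel feature rows.
    weights:  (K, C_in, C_out), one (C_in, C_out) matrix per weight plane.
    """
    out = np.zeros((num_out, weights.shape[2]))
    for plane, pairs in rulebook.items():
        w = weights[plane]                 # weight plane (C_in, C_out)
        for i_in, i_out in pairs:
            # partial accumulation: each pair contributes independently,
            # so pairs can be spread across processing elements
            out[i_out] += features[i_in] @ w
    return out
```

Because each pair's contribution is an independent add into the output row, the pairs of one plane can be distributed across processing elements and summed, matching the "partial accumulations" phrasing.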
-
Publication number: 20230169315
Abstract: Systems, apparatuses and methods may provide for technology that generates a vector output based on a bitmask, wherein the vector output includes non-zero bit indices in a first portion of the vector output, and wherein the non-zero bit indices correspond to non-zero values in the bitmask. The technology may also generate an offset based on the bitmask, wherein the offset indicates a start position in the vector output for the non-zero bit indices.
Type: Application
Filed: September 1, 2022
Publication date: June 1, 2023
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Om Ji Omer
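One plausible reading of this abstract (an assumption on our part, not the claimed encoding) is a compaction step: the indices of set bits are packed into the front of a fixed-width output vector, and the offset records where that packed run ends so a consumer knows where valid indices stop or the next batch starts.

```python
def expand_bitmask(bitmask, width):
    """Pack set-bit indices into the first portion of a fixed-width vector.

    Returns (vector, offset). The interpretation of `offset` as the count
    of packed indices is a guess at the abstract's meaning.
    """
    idx = [i for i in range(width) if (bitmask >> i) & 1]
    vector = idx + [0] * (width - len(idx))  # zero-padded to fixed width
    offset = len(idx)                        # start position past the indices
    return vector, offset
```

For instance, the bitmask `0b1011` over 8 positions yields indices 0, 1, and 3 in the first portion, with offset 3.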
-
Publication number: 20230169319
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Application
Filed: January 25, 2023
Publication date: June 1, 2023
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Patent number: 11620818
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Grant
Filed: December 22, 2020
Date of Patent: April 4, 2023
Assignee: Intel Corporation
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Publication number: 20220391128
Abstract: Example compute-in-memory (CIM) or processor-in-memory (PIM) techniques using repurposed or dedicated static random access memory (SRAM) rows of an SRAM sub-array to store look-up-table (LUT) entries for use in a multiply and accumulate (MAC) operation.
Type: Application
Filed: June 7, 2021
Publication date: December 8, 2022
Inventors: Saurabh Jain, Srivatsa Rangachar Srinivasa, Akshay Krishna Ramanathan, Gurpreet Singh Kalsi, Kamlesh R. Pillai, Sreenivas Subramoney
-
Publication number: 20220382514
Abstract: Systems, apparatuses, and methods include technology that determines whether an operation is a floating-point based computation or an integer-based computation. When the operation is the floating-point based computation, the technology generates a map of the operation to integer-based compute engines to control the integer-based compute engines to execute the floating-point based computation. When the operation is the integer-based computation, the technology controls the integer-based compute engines to execute the integer-based computation.
Type: Application
Filed: June 6, 2022
Publication date: December 1, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreedevi Ambika, Om Omer, Sreenivas Subramoney
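The general idea of executing a floating-point operation on integer engines can be illustrated with a toy decomposition: split each float into an integer mantissa and an exponent, do the multiply as a pure integer operation, then rescale. This is a minimal sketch of the concept, not the patented mapping; the 53-bit precision choice simply matches IEEE-754 doubles.

```python
import math

def float_mul_via_int(a, b, prec=53):
    """Multiply two floats using an integer multiply on their mantissas.

    Decompose a = ma * 2**ea with 0.5 <= |ma| < 1 (math.frexp), scale the
    mantissas to exact integers, multiply on the "integer engine", and
    rebuild the result with an exponent shift (math.ldexp).
    """
    ma, ea = math.frexp(a)
    mb, eb = math.frexp(b)
    ia = int(ma * (1 << prec))        # exact integer mantissa of a
    ib = int(mb * (1 << prec))        # exact integer mantissa of b
    prod = ia * ib                    # the only multiply is integer
    return math.ldexp(prod, ea + eb - 2 * prec)
```

For normal (non-overflowing, non-subnormal) doubles this reproduces the hardware float product, since the integer product is exact and the final rescale performs the single rounding.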
-
Publication number: 20220198307
Abstract: A processor package comprises at least one Baum-Welch (BW) core. The BW core comprises a likelihood-value (LV) generator, an emission-probability generator, and a transition-probability generator. The LV generator generates forward values (FVs) and backward values (BVs) for a set of observations. The emission-probability generator generates emission probabilities for the set of observations. The transition-probability generator generates transition probabilities for the set of observations. Furthermore, the BW core comprises a look-up table comprising preconfigured transition*emission values to be used by the LV generator when generating FVs and BVs. Other embodiments are described and claimed.
Type: Application
Filed: December 23, 2020
Publication date: June 23, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh
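The forward and backward values the likelihood-value generator produces are the standard forward-backward quantities of a hidden Markov model, and each recurrence step consumes a transition-times-emission product, the quantity the abstract's look-up table precomputes. A minimal reference-style sketch (not the hardware pipeline):

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Forward values (FVs) and backward values (BVs) for an HMM.

    A:  (S, S) transition probabilities
    B:  (S, O) emission probabilities
    pi: (S,)   initial state distribution
    obs: sequence of observation indices
    """
    T, S = len(obs), len(pi)
    fwd = np.zeros((T, S))
    bwd = np.zeros((T, S))
    fwd[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # each step uses transition*emission terms, which a LUT can hold
        fwd[t] = (fwd[t - 1] @ A) * B[:, obs[t]]
    bwd[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (B[:, obs[t + 1]] * bwd[t + 1])
    return fwd, bwd
```

A useful sanity check is that the per-timestep product fwd[t] * bwd[t], summed over states, gives the same observation likelihood at every t.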
-
Publication number: 20220198306
Abstract: A processor package comprises at least one Baum-Welch (BW) core. The BW core comprises a likelihood-value generator, an emission-probability generator, and a transition-probability generator. The likelihood-value generator generates forward values and backward values for a set of observations. The emission-probability generator generates emission probabilities for the set of observations. The transition-probability generator generates transition probabilities for the set of observations. Furthermore, the BW core is to generate, in parallel, at least two types of probability values from the group consisting of forward values, backward values, emission probabilities, and transition probabilities. Other embodiments are described and claimed.
Type: Application
Filed: December 23, 2020
Publication date: June 23, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh
-
Publication number: 20220113974
Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more "supertiles" of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
Type: Application
Filed: December 23, 2021
Publication date: April 14, 2022
Applicant: Intel Corporation
Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
-
Publication number: 20220012571
Abstract: Apparatuses and articles of manufacture are disclosed. An example apparatus includes an activation function control and decode circuitry to populate an input buffer circuitry with an input data element bit subset of less than a threshold number of bits of the input data element retrieved from the memory circuitry. The activation function control and decode circuitry also populates a kernel weight buffer circuitry with a weight data element bit subset of less than the threshold number of bits of the weight data element retrieved from the memory circuitry. The apparatus also includes a preprocessor circuitry to calculate a partial convolution value of at least a portion of the input data element bit subset and the weight data element bit subset to determine a predicted sign of the partial convolution value.
Type: Application
Filed: September 24, 2021
Publication date: January 13, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh, Sreenivas Subramoney, Avishaii Abuhatzera
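The idea of predicting a convolution's sign from bit subsets can be sketched numerically: keep only the top magnitude bits of each input and weight, compute the cheap partial dot product, and read off its sign. The truncation scheme and bit widths below are assumptions for illustration, not the claimed circuitry.

```python
def msb_truncate(v, keep, total=8):
    """Keep the top `keep` magnitude bits of a `total`-bit signed value.

    Python's >> is an arithmetic shift, so the sign is preserved; negative
    values truncate toward minus infinity in this simple scheme.
    """
    drop = total - keep
    return (v >> drop) << drop

def predicted_sign(inputs, weights, keep=4):
    """Sign prediction from a partial convolution over MSB subsets only."""
    partial = sum(msb_truncate(x, keep) * msb_truncate(w, keep)
                  for x, w in zip(inputs, weights))
    return partial >= 0
```

If a ReLU follows the convolution, a confidently negative prediction lets the full-precision MAC be skipped, which is the usual motivation for this kind of early sign test.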
-
Publication number: 20210200711
Abstract: A system is provided that includes a reconfigurable systolic array circuitry. The reconfigurable systolic array circuitry includes a first circuit block comprising one or more groups of processing elements and a second circuit block comprising one or more groups of processing elements. The reconfigurable systolic array circuitry further includes a first bias addition with accumulation circuitry configured to add a matrix bias to an accumulated value, to a multiplication product, or to a combination thereof. The reconfigurable systolic array circuitry additionally includes a first routing circuitry configured to route derivations from the first circuit block into the second circuit block, from the first circuit block into the first bias addition with accumulation circuitry, or into a combination thereof.
Type: Application
Filed: December 28, 2019
Publication date: July 1, 2021
Inventors: Kamlesh R. Pillai, Gurpreet Singh Kalsi, Christopher Justin Hughes
-
Publication number: 20210201163
Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the one or more bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
Type: Application
Filed: December 28, 2019
Publication date: July 1, 2021
Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
-
Publication number: 20210110187
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Application
Filed: December 22, 2020
Publication date: April 15, 2021
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Publication number: 20210111722
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
Type: Application
Filed: December 22, 2020
Publication date: April 15, 2021
Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
-
Patent number: 10540420
Abstract: Systems and methods for a hardware-accelerated matrix decomposition circuit are described herein. This matrix decomposition circuit splits matrix decomposition operations into parallel operation circuits and serial operation circuits, and joins the parallel and serial operation circuits using specific dependency handling logic for efficient parallel execution. This provides fast matrix decomposition with low power consumption, reduced memory footprint, and reduced memory bandwidth.
Type: Grant
Filed: December 29, 2017
Date of Patent: January 21, 2020
Assignee: Intel Corporation
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Santhosh Kumar Rethinagiri, Anish N K, Dipan Kumar Mandal
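The parallel/serial split the abstract mentions shows up naturally in factorizations such as Cholesky: computing each diagonal pivot is an inherently serial step, while updating the entries below it is independent per row and can fan out to parallel circuits. Mapping that split onto Cholesky is our reading for illustration, not the patent's wording.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factorization: A = L @ L.T.

    Per column j: the sqrt/divide on the diagonal is the serial step;
    the row updates below it (the inner loop over i) are mutually
    independent and map to parallel operation circuits.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # serial step: diagonal pivot, gates everything below it
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        # parallel step: each row i > j can be computed independently
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

The dependency-handling logic in hardware corresponds to the fact that column j's parallel updates cannot start until pivot L[j][j] is ready.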
-
Patent number: 10324689
Abstract: Systems and methods for matrix-solve applications include a memory-optimized hardware acceleration (HWA) solution with scalable architecture (i.e., specialized circuitry) for HWA matrix-solve operations. The matrix-solve solutions described herein may include a scalable hardware architecture with parallel processing (e.g., "within column" processing), which provides the ability to compute several output values in parallel. The HWA matrix-solve solutions described herein may include simultaneous multi-column processing, which provides a lower execution cycle count and a reduced total number of memory accesses. This provides low-latency, energy-efficient matrix-solve solutions, which may be used to reduce energy consumption and improve performance in various matrix-based applications, such as computer vision, SLAM, AR/VR/mixed-reality, machine learning, and data analytics.
Type: Grant
Filed: November 21, 2017
Date of Patent: June 18, 2019
Assignee: Intel IP Corporation
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Dipan Kumar Mandal, Santhosh Kumar Rethinagiri, Gopi Neela
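"Within column" parallelism has a simple software analogue in triangular solves: once one unknown is fixed, the updates it forces on all remaining rows of the same column are independent of each other. The forward-substitution sketch below is our illustration of that pattern, not the patented architecture.

```python
def forward_substitute(L, b):
    """Solve L x = b for lower-triangular L by forward substitution.

    The inner loop's updates touch distinct rows of column j, so they
    are independent — the "within column" work that can be computed
    in parallel once x[j] is known.
    """
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] /= L[j][j]                 # serial: resolve unknown j
        for i in range(j + 1, n):       # parallelizable column updates
            x[i] -= L[i][j] * x[j]
    return x
```

Multi-column processing in hardware then overlaps columns whose updates have already been satisfied, cutting cycle count and memory traffic as the abstract describes.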
-
Publication number: 20190042539
Abstract: Systems and methods for a hardware-accelerated matrix decomposition circuit are described herein. This matrix decomposition circuit splits matrix decomposition operations into parallel operation circuits and serial operation circuits, and joins the parallel and serial operation circuits using specific dependency handling logic for efficient parallel execution. This provides fast matrix decomposition with low power consumption, reduced memory footprint, and reduced memory bandwidth.
Type: Application
Filed: December 29, 2017
Publication date: February 7, 2019
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Santhosh Kumar Rethinagiri, Anish N K, Dipan Kumar Mandal
-
Publication number: 20190042195
Abstract: Systems and methods for matrix-solve applications include a memory-optimized hardware acceleration (HWA) solution with scalable architecture (i.e., specialized circuitry) for HWA matrix-solve operations. The matrix-solve solutions described herein may include a scalable hardware architecture with parallel processing (e.g., "within column" processing), which provides the ability to compute several output values in parallel. The HWA matrix-solve solutions described herein may include simultaneous multi-column processing, which provides a lower execution cycle count and a reduced total number of memory accesses. This provides low-latency, energy-efficient matrix-solve solutions, which may be used to reduce energy consumption and improve performance in various matrix-based applications, such as computer vision, SLAM, AR/VR/mixed-reality, machine learning, and data analytics.
Type: Application
Filed: November 21, 2017
Publication date: February 7, 2019
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Dipan Kumar Mandal, Santhosh Kumar Rethinagiri, Gopi Neela