Patents by Inventor Gurpreet Singh Kalsi
Gurpreet Singh Kalsi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11949414
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
Type: Grant
Filed: December 22, 2020
Date of Patent: April 2, 2024
Assignee: Intel Corporation
Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
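The LUT-plus-shift scheme this abstract describes can be sketched in software: precompute a table of all products of two four-bit numbers, then build larger multiplies from nibble lookups combined by the shift-and-adder step. This is an illustrative reading only, not the claimed circuit; the function name and operand widths are assumptions.

```python
# Table of all products of two 4-bit numbers, standing in for the LUT
# entries stored in the memory subarray.
LUT = [[a * b for b in range(16)] for a in range(16)]

def mac_8bit(x, y, acc=0):
    """Multiply two 8-bit operands via 4-bit LUT lookups, then accumulate.

    Each operand is split into a high and a low nibble; the four partial
    products are read from the LUT and shifted into place, mimicking the
    shift-and-adder logic located in the subarray.
    """
    x_hi, x_lo = x >> 4, x & 0xF
    y_hi, y_lo = y >> 4, y & 0xF
    acc += LUT[x_hi][y_hi] << 8   # high x high: shift by 8
    acc += LUT[x_hi][y_lo] << 4   # high x low:  shift by 4
    acc += LUT[x_lo][y_hi] << 4   # low x high:  shift by 4
    acc += LUT[x_lo][y_lo]        # low x low:   no shift
    return acc
```

For example, `mac_8bit(37, 201)` reproduces `37 * 201` using only 4-bit table lookups, shifts, and adds.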
-
Patent number: 11941534
Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the one or more bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
Type: Grant
Filed: December 28, 2019
Date of Patent: March 26, 2024
Assignee: Intel Corporation
Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
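Bit-vector distance counting of this kind is in the spirit of the well-known Myers/Hyyrö bit-parallel edit-distance algorithms, where per-position match and edit information is packed into machine words. The sketch below is that classical algorithm, shown as context for the abstract; it is not the patented circuit, and it computes only the distance, not the traceback.

```python
def edit_distance(read, ref):
    """Bit-parallel Levenshtein distance (Hyyrö's formulation).

    One bit per read position; each reference base updates all positions
    in a handful of word-wide logic operations.
    """
    m = len(read)
    mask = (1 << m) - 1
    # peq[c]: positions in the read where base c occurs, one bit per position
    peq = {}
    for i, c in enumerate(read):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, dist = mask, 0, m
    for c in ref:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # positions where the distance grows
        mh = pv & xh                    # positions where it shrinks
        if ph & (1 << (m - 1)):
            dist += 1
        elif mh & (1 << (m - 1)):
            dist -= 1
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return dist
```

A hardware version keeps these vectors in registers per reference window, which is what makes a per-window traceback output natural.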
-
Patent number: 11783170
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Grant
Filed: January 25, 2023
Date of Patent: October 10, 2023
Assignee: Intel Corporation
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
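In sparse (voxel) convolution, a "rulebook" commonly maps active input voxels to output voxels per kernel offset (weight plane), so the MAC work is organized plane by plane as partial accumulations. The sketch below illustrates that general pattern; the rulebook contents, shapes, and function name are assumptions for illustration, not the claimed instruction or hardware.

```python
import numpy as np

# Hypothetical rulebook: for each weight plane (kernel offset), a list of
# (input_voxel_index, output_voxel_index) pairs linking active voxels.
rulebook = {
    0: [(0, 0), (1, 1)],   # kernel offset 0
    1: [(0, 1), (2, 0)],   # kernel offset 1
}

def sparse_conv(features, weights, rulebook, num_out):
    """Channel-wise MAC over rulebook pairs, accumulated per weight plane.

    features: (N_in, C_in) active-voxel feature rows.
    weights:  (K, C_in, C_out), one (C_in, C_out) matrix per weight plane.
    """
    out = np.zeros((num_out, weights.shape[2]))
    for plane, pairs in rulebook.items():
        w = weights[plane]                 # weight plane (C_in, C_out)
        for i_in, i_out in pairs:
            # partial accumulation: each pair contributes independently,
            # so pairs can be spread across processing elements
            out[i_out] += features[i_in] @ w
    return out
```

Because each pair's contribution is an independent add into the output row, the pairs of one plane can be distributed across processing elements and summed, matching the "partial accumulations" phrasing.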
-
Publication number: 20230169315
Abstract: Systems, apparatuses and methods may provide for technology that generates a vector output based on a bitmask, wherein the vector output includes non-zero bit indices in a first portion of the vector output, and wherein the non-zero bit indices correspond to non-zero values in the bitmask. The technology may also generate an offset based on the bitmask, wherein the offset indicates a start position in the vector output for the non-zero bit indices.
Type: Application
Filed: September 1, 2022
Publication date: June 1, 2023
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Om Ji Omer
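One plausible reading of this abstract (an assumption on our part, not the claimed encoding) is a compaction step: the indices of set bits are packed into the front of a fixed-width output vector, and the offset records where that packed run ends so a consumer knows where valid indices stop or the next batch starts.

```python
def expand_bitmask(bitmask, width):
    """Pack set-bit indices into the first portion of a fixed-width vector.

    Returns (vector, offset). The interpretation of `offset` as the count
    of packed indices is a guess at the abstract's meaning.
    """
    idx = [i for i in range(width) if (bitmask >> i) & 1]
    vector = idx + [0] * (width - len(idx))  # zero-padded to fixed width
    offset = len(idx)                        # start position past the indices
    return vector, offset
```

For instance, the bitmask `0b1011` over 8 positions yields indices 0, 1, and 3 in the first portion, with offset 3.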
-
Publication number: 20230169319
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Application
Filed: January 25, 2023
Publication date: June 1, 2023
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Patent number: 11620818
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Grant
Filed: December 22, 2020
Date of Patent: April 4, 2023
Assignee: Intel Corporation
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Publication number: 20220391128
Abstract: Example compute-in-memory (CIM) or processor-in-memory (PIM) techniques using repurposed or dedicated static random access memory (SRAM) rows of an SRAM sub-array to store look-up-table (LUT) entries for use in a multiply and accumulate (MAC) operation.
Type: Application
Filed: June 7, 2021
Publication date: December 8, 2022
Inventors: Saurabh Jain, Srivatsa Rangachar Srinivasa, Akshay Krishna Ramanathan, Gurpreet Singh Kalsi, Kamlesh R. Pillai, Sreenivas Subramoney
-
Publication number: 20220382514
Abstract: Systems, apparatuses, and methods include technology that determines whether an operation is a floating-point based computation or an integer-based computation. When the operation is the floating-point based computation, the technology generates a map of the operation to integer-based compute engines to control the integer-based compute engines to execute the floating-point based computation. When the operation is the integer-based computation, the technology controls the integer-based compute engines to execute the integer-based computation.
Type: Application
Filed: June 6, 2022
Publication date: December 1, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreedevi Ambika, Om Omer, Sreenivas Subramoney
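The general idea of executing a floating-point operation on integer engines can be illustrated with a toy decomposition: split each float into an integer mantissa and an exponent, do the multiply as a pure integer operation, then rescale. This is a minimal sketch of the concept, not the patented mapping; the 53-bit precision choice simply matches IEEE-754 doubles.

```python
import math

def float_mul_via_int(a, b, prec=53):
    """Multiply two floats using an integer multiply on their mantissas.

    Decompose a = ma * 2**ea with 0.5 <= |ma| < 1 (math.frexp), scale the
    mantissas to exact integers, multiply on the "integer engine", and
    rebuild the result with an exponent shift (math.ldexp).
    """
    ma, ea = math.frexp(a)
    mb, eb = math.frexp(b)
    ia = int(ma * (1 << prec))        # exact integer mantissa of a
    ib = int(mb * (1 << prec))        # exact integer mantissa of b
    prod = ia * ib                    # the only multiply is integer
    return math.ldexp(prod, ea + eb - 2 * prec)
```

For normal (non-overflowing, non-subnormal) doubles this reproduces the hardware float product, since the integer product is exact and the final rescale performs the single rounding.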
-
Publication number: 20220198307
Abstract: A processor package comprises at least one Baum-Welch (BW) core. The BW core comprises a likelihood-value (LV) generator, an emission-probability generator, and a transition-probability generator. The LV generator generates forward values (FVs) and backward values (BVs) for a set of observations. The emission-probability generator generates emission probabilities for the set of observations. The transition-probability generator generates transition probabilities for the set of observations. Furthermore, the BW core comprises a look-up table comprising preconfigured transition*emission values to be used by the LV generator when generating FVs and BVs. Other embodiments are described and claimed.
Type: Application
Filed: December 23, 2020
Publication date: June 23, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh
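The forward and backward values the likelihood-value generator produces are the standard forward-backward quantities of a hidden Markov model, and each recurrence step consumes a transition-times-emission product, the quantity the abstract's look-up table precomputes. A minimal reference-style sketch (not the hardware pipeline):

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Forward values (FVs) and backward values (BVs) for an HMM.

    A:  (S, S) transition probabilities
    B:  (S, O) emission probabilities
    pi: (S,)   initial state distribution
    obs: sequence of observation indices
    """
    T, S = len(obs), len(pi)
    fwd = np.zeros((T, S))
    bwd = np.zeros((T, S))
    fwd[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # each step uses transition*emission terms, which a LUT can hold
        fwd[t] = (fwd[t - 1] @ A) * B[:, obs[t]]
    bwd[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (B[:, obs[t + 1]] * bwd[t + 1])
    return fwd, bwd
```

A useful sanity check is that the per-timestep product fwd[t] * bwd[t], summed over states, gives the same observation likelihood at every t.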
-
Publication number: 20220198306
Abstract: A processor package comprises at least one Baum-Welch (BW) core. The BW core comprises a likelihood-value generator, an emission-probability generator, and a transition-probability generator. The likelihood-value generator generates forward values and backward values for a set of observations. The emission-probability generator generates emission probabilities for the set of observations. The transition-probability generator generates transition probabilities for the set of observations. Furthermore, the BW core is to generate, in parallel, at least two types of probability values from the group consisting of forward values, backward values, emission probabilities, and transition probabilities. Other embodiments are described and claimed.
Type: Application
Filed: December 23, 2020
Publication date: June 23, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh
-
Publication number: 20220113974
Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more "supertiles" of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
Type: Application
Filed: December 23, 2021
Publication date: April 14, 2022
Applicant: Intel Corporation
Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
-
Publication number: 20220012571
Abstract: Apparatuses and articles of manufacture are disclosed. An example apparatus includes an activation function control and decode circuitry to populate an input buffer circuitry with an input data element bit subset of less than a threshold number of bits of the input data element retrieved from the memory circuitry. The activation function control and decode circuitry also populates a kernel weight buffer circuitry with a weight data element bit subset of less than the threshold number of bits of the weight data element retrieved from the memory circuitry. The apparatus also includes a preprocessor circuitry to calculate a partial convolution value of at least a portion of the input data element bit subset and the weight data element bit subset to determine a predicted sign of the partial convolution value.
Type: Application
Filed: September 24, 2021
Publication date: January 13, 2022
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh, Sreenivas Subramoney, Avishaii Abuhatzera
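The idea of predicting a convolution's sign from bit subsets can be sketched numerically: keep only the top magnitude bits of each input and weight, compute the cheap partial dot product, and read off its sign. The truncation scheme and bit widths below are assumptions for illustration, not the claimed circuitry.

```python
def msb_truncate(v, keep, total=8):
    """Keep the top `keep` magnitude bits of a `total`-bit signed value.

    Python's >> is an arithmetic shift, so the sign is preserved; negative
    values truncate toward minus infinity in this simple scheme.
    """
    drop = total - keep
    return (v >> drop) << drop

def predicted_sign(inputs, weights, keep=4):
    """Sign prediction from a partial convolution over MSB subsets only."""
    partial = sum(msb_truncate(x, keep) * msb_truncate(w, keep)
                  for x, w in zip(inputs, weights))
    return partial >= 0
```

If a ReLU follows the convolution, a confidently negative prediction lets the full-precision MAC be skipped, which is the usual motivation for this kind of early sign test.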
-
Publication number: 20210200711
Abstract: A system is provided that includes a reconfigurable systolic array circuitry. The reconfigurable systolic array circuitry includes a first circuit block comprising one or more groups of processing elements and a second circuit block comprising one or more groups of processing elements. The reconfigurable systolic array circuitry further includes a first bias addition with accumulation circuitry configured to add a matrix bias to an accumulated value, to a multiplication product, or to a combination thereof. The reconfigurable systolic array circuitry additionally includes a first routing circuitry configured to route derivations from the first circuit block into the second circuit block, from the first circuit block into the first bias addition with accumulation circuitry, or into a combination thereof.
Type: Application
Filed: December 28, 2019
Publication date: July 1, 2021
Inventors: Kamlesh R. Pillai, Gurpreet Singh Kalsi, Christopher Justin Hughes
-
Publication number: 20210201163
Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the one or more bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
Type: Application
Filed: December 28, 2019
Publication date: July 1, 2021
Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
-
Publication number: 20210110187
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Application
Filed: December 22, 2020
Publication date: April 15, 2021
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Publication number: 20210111722
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
Type: Application
Filed: December 22, 2020
Publication date: April 15, 2021
Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
-
Patent number: 10540420
Abstract: Systems and methods for a hardware-accelerated matrix decomposition circuit are described herein. This matrix decomposition circuit splits matrix decomposition operations into parallel operation circuits and serial operation circuits, and joins the parallel and serial operation circuits using specific dependency handling logic for efficient parallel execution. This provides fast matrix decomposition with low power consumption, reduced memory footprint, and reduced memory bandwidth.
Type: Grant
Filed: December 29, 2017
Date of Patent: January 21, 2020
Assignee: Intel Corporation
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Santhosh Kumar Rethinagiri, Anish N K, Dipan Kumar Mandal
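The parallel/serial split the abstract mentions shows up naturally in factorizations such as Cholesky: computing each diagonal pivot is an inherently serial step, while updating the entries below it is independent per row and can fan out to parallel circuits. Mapping that split onto Cholesky is our reading for illustration, not the patent's wording.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factorization: A = L @ L.T.

    Per column j: the sqrt/divide on the diagonal is the serial step;
    the row updates below it (the inner loop over i) are mutually
    independent and map to parallel operation circuits.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # serial step: diagonal pivot, gates everything below it
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        # parallel step: each row i > j can be computed independently
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

The dependency-handling logic in hardware corresponds to the fact that column j's parallel updates cannot start until pivot L[j][j] is ready.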
-
Patent number: 10324689
Abstract: Systems and methods for matrix-solve applications include a memory-optimized hardware acceleration (HWA) solution with scalable architecture (i.e., specialized circuitry) for HWA matrix-solve operations. The matrix-solve solutions described herein may include a scalable hardware architecture with parallel processing (e.g., "within column" processing), which provides the ability to compute several output values in parallel. The HWA matrix-solve solutions described herein may include simultaneous multi-column processing, which provides a lower execution cycle count and a reduced total number of memory accesses. This provides low-latency, energy-efficient matrix-solve solutions, which may be used to reduce energy consumption and improve performance in various matrix-based applications, such as computer vision, SLAM, AR/VR/mixed-reality, machine learning, and data analytics.
Type: Grant
Filed: November 21, 2017
Date of Patent: June 18, 2019
Assignee: Intel IP Corporation
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Dipan Kumar Mandal, Santhosh Kumar Rethinagiri, Gopi Neela
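"Within column" parallelism has a simple software analogue in triangular solves: once one unknown is fixed, the updates it forces on all remaining rows of the same column are independent of each other. The forward-substitution sketch below is our illustration of that pattern, not the patented architecture.

```python
def forward_substitute(L, b):
    """Solve L x = b for lower-triangular L by forward substitution.

    The inner loop's updates touch distinct rows of column j, so they
    are independent — the "within column" work that can be computed
    in parallel once x[j] is known.
    """
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] /= L[j][j]                 # serial: resolve unknown j
        for i in range(j + 1, n):       # parallelizable column updates
            x[i] -= L[i][j] * x[j]
    return x
```

Multi-column processing in hardware then overlaps columns whose updates have already been satisfied, cutting cycle count and memory traffic as the abstract describes.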
-
Publication number: 20190042539
Abstract: Systems and methods for a hardware-accelerated matrix decomposition circuit are described herein. This matrix decomposition circuit splits matrix decomposition operations into parallel operation circuits and serial operation circuits, and joins the parallel and serial operation circuits using specific dependency handling logic for efficient parallel execution. This provides fast matrix decomposition with low power consumption, reduced memory footprint, and reduced memory bandwidth.
Type: Application
Filed: December 29, 2017
Publication date: February 7, 2019
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Santhosh Kumar Rethinagiri, Anish N K, Dipan Kumar Mandal
-
Publication number: 20190042195
Abstract: Systems and methods for matrix-solve applications include a memory-optimized hardware acceleration (HWA) solution with scalable architecture (i.e., specialized circuitry) for HWA matrix-solve operations. The matrix-solve solutions described herein may include a scalable hardware architecture with parallel processing (e.g., "within column" processing), which provides the ability to compute several output values in parallel. The HWA matrix-solve solutions described herein may include simultaneous multi-column processing, which provides a lower execution cycle count and a reduced total number of memory accesses. This provides low-latency, energy-efficient matrix-solve solutions, which may be used to reduce energy consumption and improve performance in various matrix-based applications, such as computer vision, SLAM, AR/VR/mixed-reality, machine learning, and data analytics.
Type: Application
Filed: November 21, 2017
Publication date: February 7, 2019
Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Dipan Kumar Mandal, Santhosh Kumar Rethinagiri, Gopi Neela