Patents by Inventor Gurpreet Singh Kalsi

Gurpreet Singh Kalsi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11949414
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 2, 2024
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
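
The abstract above (patent 11949414) centers on assembling wider multiplications from a lookup table of four-bit products plus shift-and-add logic. The following is a minimal software sketch of that general idea, assuming unsigned 8-bit operands; it is not the patented subarray circuitry, and the names (NIBBLE_LUT, lut_mac) are hypothetical.

```python
# Software model of LUT-based multiply-accumulate: a 16x16 table holds every product of
# two 4-bit values, and an 8-bit x 8-bit multiply is assembled from four lookups combined
# with shifts, then accumulated (mimicking accumulation storage in a subarray).
NIBBLE_LUT = [[a * b for b in range(16)] for a in range(16)]  # products of four-bit numbers

def lut_multiply_8bit(x: int, y: int) -> int:
    """Multiply two unsigned 8-bit operands using only 4-bit product lookups."""
    x_lo, x_hi = x & 0xF, (x >> 4) & 0xF
    y_lo, y_hi = y & 0xF, (y >> 4) & 0xF
    return (NIBBLE_LUT[x_lo][y_lo]
            + (NIBBLE_LUT[x_hi][y_lo] << 4)   # partial products shifted according to the
            + (NIBBLE_LUT[x_lo][y_hi] << 4)   # nibble positions they came from
            + (NIBBLE_LUT[x_hi][y_hi] << 8))

def lut_mac(xs, ys) -> int:
    """Accumulate LUT-based products across two operand vectors."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += lut_multiply_8bit(x, y)
    return acc

assert lut_mac([3, 200, 17], [5, 11, 250]) == 3 * 5 + 200 * 11 + 17 * 250
```
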
  • Patent number: 11941534
    Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the one or more bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
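
Patent 11941534, above, describes bit vectors that encode match and edit information between a read and a reference window. As background, the sketch below shows a standard bit-vector edit-distance scan in the style of Myers' algorithm; it illustrates the kind of computation such a distance counter performs, not the patented circuitry, and the function names are hypothetical.

```python
# Semi-global bit-vector edit distance (in the spirit of Myers, 1999): for each end
# position in the reference, track the minimum number of edits (substitution, insert,
# delete) needed to align the read against a window ending at that position.
def bitvector_edit_distances(read: str, reference: str):
    m = len(read)
    mask = (1 << m) - 1
    high_bit = 1 << (m - 1)
    peq = {}                                   # peq[c]: bit i set where read[i] == c
    for i, base in enumerate(read):
        peq[base] = peq.get(base, 0) | (1 << i)
    pv, mv, score = mask, 0, m
    for base in reference:
        eq = peq.get(base, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | ~(xh | pv)
        mh = pv & xh
        if ph & high_bit:
            score += 1
        elif mh & high_bit:
            score -= 1
        ph, mh = (ph << 1) & mask, (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        yield score                            # edit distance for a window ending here

# Best window ("GATTTACA") differs from the read by one inserted base, so the minimum is 1.
print(min(bitvector_edit_distances("GATTACA", "TTTGATTTACAGG")))  # 1
```
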
  • Patent number: 11783170
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: October 10, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
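
The abstract above (patent 11783170) concerns rulebook-driven processing of sparse voxel feature maps with channel-wise MAC partial accumulations. Below is a hedged software sketch of how a rulebook (one list of input-to-output voxel index pairs per kernel offset, i.e., per weight plane) drives such a computation; it illustrates the general sparse-convolution pattern, not the patented instruction or datapath, and all names are hypothetical.

```python
import numpy as np

def sparse_conv_rulebook(in_feats, weights, rulebooks, num_out):
    """in_feats: (N_in, C_in); weights: (K, C_in, C_out); rulebooks: K lists of
    (input_voxel, output_voxel) index pairs, one list per kernel offset / weight plane."""
    out = np.zeros((num_out, weights.shape[2]), dtype=in_feats.dtype)
    for k, rules in enumerate(rulebooks):                  # one weight plane at a time
        for in_idx, out_idx in rules:                      # gather -> channel-wise MAC -> scatter
            out[out_idx] += in_feats[in_idx] @ weights[k]  # partial accumulation
    return out

# Tiny example: 3 active input voxels, 2 kernel offsets, 2 output voxels.
feats = np.arange(6, dtype=np.float32).reshape(3, 2)  # (N_in=3, C_in=2)
w = np.ones((2, 2, 4), dtype=np.float32)              # (K=2, C_in=2, C_out=4)
rulebooks = [[(0, 0), (1, 1)], [(2, 0)]]
print(sparse_conv_rulebook(feats, w, rulebooks, num_out=2))
```
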
  • Publication number: 20230169315
    Abstract: Systems, apparatuses and methods may provide for technology that generates a vector output based on a bitmask, wherein the vector output includes non-zero bit indices in a first portion of the vector output, and wherein the non-zero bit indices correspond to non-zero values in the bitmask. The technology may also generate an offset based on the bitmask, wherein the offset indicates a start position in the vector output for the non-zero bit indices.
    Type: Application
    Filed: September 1, 2022
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Om Ji Omer
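
The abstract above (publication 20230169315) describes producing a vector of non-zero bit indices from a bitmask together with an offset giving the start position of those indices. The exact packing is not spelled out in the abstract, so the sketch below makes an assumption: several bitmasks feed one concatenated vector output, and each bitmask's offset marks where its group of indices begins. Function names are hypothetical.

```python
# Hedged sketch: compact the set-bit indices of each bitmask into one shared vector output
# and record, per bitmask, the start position (offset) of its indices in that vector.
def compact_bitmasks(bitmasks, width):
    vector, offsets = [], []
    for mask in bitmasks:
        offsets.append(len(vector))            # start position for this bitmask's indices
        vector.extend(i for i in range(width) if (mask >> i) & 1)
    return vector, offsets

vec, offs = compact_bitmasks([0b10110, 0b00001], width=5)
print(vec, offs)   # [1, 2, 4, 0] with offsets [0, 3]
```
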
  • Publication number: 20230169319
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11620818
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 4, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Publication number: 20220391128
    Abstract: Example compute-in-memory (CIM) or processor-in-memory (PIM) techniques use repurposed or dedicated static random access memory (SRAM) rows of an SRAM sub-array to store look-up-table (LUT) entries for use in a multiply and accumulate (MAC) operation.
    Type: Application
    Filed: June 7, 2021
    Publication date: December 8, 2022
    Inventors: Saurabh Jain, Srivatsa Rangachar Srinivasa, Akshay Krishna Ramanathan, Gurpreet Singh Kalsi, Kamlesh R. Pillai, Sreenivas Subramoney
  • Publication number: 20220382514
    Abstract: Systems, apparatuses, and methods include technology that determines whether an operation is a floating-point based computation or an integer-based computation. When the operation is the floating-point based computation, the technology generates a map of the operation to integer-based compute engines to control the integer-based compute engines to execute the floating-point based computation. When the operation is the integer-based computation, the technology controls the integer-based compute engines to execute the integer-based computation.
    Type: Application
    Filed: June 6, 2022
    Publication date: December 1, 2022
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreedevi Ambika, Om Omer, Sreenivas Subramoney
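
Publication 20220382514, above, maps floating-point operations onto integer-based compute engines. As a rough illustration of why such a mapping is possible, the sketch below multiplies two float32 values using only integer operations on their sign, exponent, and mantissa fields. It is a simplification (truncating rounding, no NaN/Inf/subnormal or overflow handling) and is not the patented mapping; the names are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def int_engine_fmul(a: float, b: float) -> float:
    """Multiply two normal float32 values using integer operations only."""
    xa, xb = f32_bits(a), f32_bits(b)
    sign = (xa ^ xb) & 0x80000000
    ea, eb = (xa >> 23) & 0xFF, (xb >> 23) & 0xFF
    ma = (xa & 0x7FFFFF) | 0x800000            # restore the implicit leading 1
    mb = (xb & 0x7FFFFF) | 0x800000
    prod = ma * mb                             # up-to-48-bit integer mantissa product
    exp = ea + eb - 127
    if prod & (1 << 47):                       # renormalize when the product is in [2, 4)
        prod >>= 1
        exp += 1
    mant = (prod >> 23) & 0x7FFFFF             # drop the implicit 1, truncate low bits
    return bits_f32(sign | (exp << 23) | mant)

print(int_engine_fmul(1.5, -2.25), 1.5 * -2.25)   # both print -3.375
```
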
  • Publication number: 20220198307
    Abstract: A processor package comprises at least one Baum-Welch (BW) core. The BW core comprises a likelihood-value (LV) generator, an emission-probability generator, and a transition-probability generator. The LV generator generates forward values (FVs) and backward values (BVs) for a set of observations. The emission-probability generator generates emission probabilities for the set of observations. The transition-probability generator generates transition probabilities for the set of observations. Furthermore, the BW core comprises a look-up table of preconfigured transition*emission values to be used by the LV generator when generating FVs and BVs. Other embodiments are described and claimed.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh
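
The Baum-Welch core above uses a look-up table of preconfigured transition*emission values when generating forward and backward (likelihood) values. The sketch below shows the standard forward/backward recursions written against such a precomputed table; it is an algorithmic illustration, not the patented core, and the table layout (TE[o][i][j] = A[i][j] * B[j][o]) is an assumption.

```python
def build_te_lut(A, B):
    """TE[o][i][j] = A[i][j] * B[j][o]: one entry per (observation, from-state, to-state)."""
    n, m = len(A), len(B[0])
    return [[[A[i][j] * B[j][o] for j in range(n)] for i in range(n)] for o in range(m)]

def forward_values(pi, B, TE, obs):
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    history = [alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * TE[o][i][j] for i in range(n)) for j in range(n)]
        history.append(alpha)
    return history

def backward_values(TE, obs, n):
    beta = [1.0] * n
    history = [beta]
    for o in reversed(obs[1:]):
        beta = [sum(TE[o][i][j] * beta[j] for j in range(n)) for i in range(n)]
        history.append(beta)
    return history[::-1]

A = [[0.7, 0.3], [0.4, 0.6]]     # transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]     # emission probabilities
pi = [0.5, 0.5]                  # initial state distribution
obs = [0, 1, 0]                  # observation sequence (symbol indices)
TE = build_te_lut(A, B)
alphas = forward_values(pi, B, TE, obs)
betas = backward_values(TE, obs, n=2)
print(sum(alphas[-1]))                                   # P(obs) from the forward values
print(sum(a * b for a, b in zip(alphas[0], betas[0])))   # same likelihood via alpha * beta
```
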
  • Publication number: 20220198306
    Abstract: A processor package comprises at least one Baum-Welch (BW) core. The BW core comprises a likelihood-value generator, an emission-probability generator, and a transition-probability generator. The likelihood-value generator generates forward values and backward values for a set of observations. The emission-probability generator generates emission probabilities for the set of observations. The transition-probability generator generates transition probabilities for the set of observations. Furthermore, the BW core is to generate, in parallel, at least two types of probability values from the group consisting of forward values, backward values, emission probabilities, and transition probabilities. Other embodiments are described and claimed.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh
  • Publication number: 20220113974
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
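
Publication 20220113974, above, describes hierarchical decoders with configurable multicast fan-out that distribute data tiles to processing circuits co-located with memory subarrays. The toy model below illustrates the routing idea only: a tree of decoders delivers one tile to several subarrays with a configurable per-level fan-out. The class and parameter names are hypothetical.

```python
# Toy model of hierarchical multicast delivery of a data tile to memory subarrays.
class DecoderNode:
    def __init__(self, children=None, subarray_id=None):
        self.children = children or []
        self.subarray_id = subarray_id      # set only on leaf nodes (subarrays)

    def multicast(self, tile, fanout_per_level, level=0, delivered=None):
        """Deliver `tile` to the first `fanout_per_level[level]` children at each level."""
        delivered = delivered if delivered is not None else []
        if self.subarray_id is not None:
            delivered.append((self.subarray_id, tile))
            return delivered
        for child in self.children[: fanout_per_level[level]]:
            child.multicast(tile, fanout_per_level, level + 1, delivered)
        return delivered

# Two-level hierarchy: 2 banks x 4 subarrays. Fan-out (2, 3) sends one tile load to
# 6 subarrays, so 6 compute tiles can reuse the same data from a single distribution.
leaves = [[DecoderNode(subarray_id=f"bank{b}/sub{s}") for s in range(4)] for b in range(2)]
root = DecoderNode(children=[DecoderNode(children=leaves[b]) for b in range(2)])
for target, _ in root.multicast(tile="A0", fanout_per_level=(2, 3)):
    print(target)
```
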
  • Publication number: 20220012571
    Abstract: Apparatuses and articles of manufacture are disclosed. An example apparatus includes activation function control and decode circuitry to populate input buffer circuitry with an input data element bit subset of less than a threshold number of bits of the input data element retrieved from memory circuitry. The activation function control and decode circuitry also populates kernel weight buffer circuitry with a weight data element bit subset of less than the threshold number of bits of the weight data element retrieved from the memory circuitry. The apparatus also includes preprocessor circuitry to calculate a partial convolution value of at least a portion of the input data element bit subset and the weight data element bit subset to determine a predicted sign of the partial convolution value.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh, Sreenivas Subramoney, Avishaii Abuhatzera
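
The abstract above (publication 20220012571) computes a partial convolution from bit subsets of the inputs and weights to predict the sign of the result. One plausible use, sketched below, is skipping the full-precision MAC when a ReLU would zero out a predicted-negative output. The bit-subset size and the skip policy here are assumptions, and the prediction is a heuristic that can be wrong for results near zero.

```python
# Hedged sketch: predict the sign of a dot product from the most significant bits only.
def msb_subset(value: int, total_bits: int, keep_bits: int) -> int:
    """Keep the top `keep_bits` of a signed `total_bits` value (arithmetic shift)."""
    return value >> (total_bits - keep_bits)

def predicted_sign(inputs, weights, total_bits=8, keep_bits=4) -> int:
    partial = sum(msb_subset(x, total_bits, keep_bits) * msb_subset(w, total_bits, keep_bits)
                  for x, w in zip(inputs, weights))
    return -1 if partial < 0 else 1

def conv_with_skip(inputs, weights):
    if predicted_sign(inputs, weights) < 0:
        return 0                       # ReLU would zero it out: skip the full MAC
    return max(0, sum(x * w for x, w in zip(inputs, weights)))

print(conv_with_skip([90, -120, 33], [-101, 87, -5]))   # predicted negative -> 0
```
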
  • Publication number: 20210200711
    Abstract: A system is provided that includes a reconfigurable systolic array circuitry. The reconfigurable systolic array circuitry includes a first circuit block comprising one or more groups of processing elements and a second circuit block comprising one or more groups of processing elements. The reconfigurable systolic array circuitry further includes a first bias addition with accumulation circuitry configured to add a matrix bias to an accumulated value, to a multiplication product, or to a combination thereof. The reconfigurable systolic array circuitry additionally includes a first routing circuitry configured to route derivations from the first circuit block into the second circuit block, from the first circuit block into the first bias addition with accumulation circuitry, or into a combination thereof.
    Type: Application
    Filed: December 28, 2019
    Publication date: July 1, 2021
    Inventors: Kamlesh R. Pillai, Gurpreet Singh Kalsi, Christopher Justin Hughes
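
Publication 20210200711, above, describes a reconfigurable systolic array with bias-addition-with-accumulation circuitry. The sketch below is a minimal software model of an output-stationary systolic matrix multiply in which the bias is pre-loaded into the accumulators; it illustrates the dataflow only and is not the patented array. Dimensions and names are illustrative.

```python
def systolic_matmul_bias(A, B, bias):
    """C = A @ B + bias, computed cycle by cycle as a grid of MAC processing elements."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[bias[i][j] for j in range(m)] for i in range(n)]   # bias pre-loaded in accumulators
    for step in range(k):                                    # one operand wavefront per step
        for i in range(n):
            for j in range(m):
                C[i][j] += A[i][step] * B[step][j]           # each (i, j) PE does one MAC
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
bias = [[1, 1], [1, 1]]
print(systolic_matmul_bias(A, B, bias))   # [[20, 23], [44, 51]]
```
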
  • Publication number: 20210201163
    Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the one or more bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
    Type: Application
    Filed: December 28, 2019
    Publication date: July 1, 2021
    Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
  • Publication number: 20210110187
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 15, 2021
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Publication number: 20210111722
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 15, 2021
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
  • Patent number: 10540420
    Abstract: Systems and methods for a hardware-accelerated matrix decomposition circuit are described herein. This matrix decomposition circuit splits matrix decomposition operations into parallel operation circuits and serial operation circuits, and joins the parallel and serial operation circuits using specific dependency handling logic for efficient parallel execution. This provides fast matrix decomposition with low power consumption, reduced memory footprint, and reduced memory bandwidth.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: January 21, 2020
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Santhosh Kumar Rethinagiri, Anish N K, Dipan Kumar Mandal
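
Patent 10540420, above, splits matrix decomposition into serial and parallel operation circuits joined by dependency handling. Using Cholesky decomposition as a stand-in example (the abstract does not name a specific decomposition), the sketch below marks which step of each column is inherently serial and which part can run in parallel once that step completes. It is a software illustration only.

```python
import math

def cholesky_split(A):
    """Return lower-triangular L with L @ L.T == A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Serial step: the diagonal entry depends on everything to its left in row j.
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        # Parallel step: rows i > j of column j are independent once L[j][j] is known.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
for row in cholesky_split(A):
    print([round(v, 4) for v in row])   # [[2, 0, 0], [1, 2, 0], [1, 1, 2]]
```
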
  • Patent number: 10324689
    Abstract: Systems and methods for matrix-solve applications include a memory-optimized hardware acceleration (HWA) solution with a scalable architecture (i.e., specialized circuitry) for HWA matrix-solve operations. The matrix-solve solutions described herein may include a scalable hardware architecture with parallel processing (e.g., “within column” processing), which provides the ability to compute several output values in parallel. The HWA matrix-solve solutions described herein may include simultaneous multi-column processing, which provides a lower execution cycle count and a reduced total number of memory accesses. This HWA matrix-solve approach provides low-latency, energy-efficient matrix-solve solutions, which may be used to reduce energy consumption and improve performance in various matrix-based applications, such as computer vision, SLAM, AR/VR/mixed reality, machine learning, and data analytics.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: June 18, 2019
    Assignee: Intel IP Corporation
    Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Dipan Kumar Mandal, Santhosh Kumar Rethinagiri, Gopi Neela
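
Patent 10324689, above, highlights “within column” parallel processing and simultaneous multi-column processing for matrix solves. The sketch below shows column-oriented forward substitution for a lower-triangular system with a block of right-hand sides: all right-hand-side columns are processed together, and each column update touches independent rows that could proceed in parallel. It is a software illustration, not the patented circuitry.

```python
import numpy as np

def forward_solve_columns(L, B):
    """Solve L @ X = B for lower-triangular L and a block of right-hand sides B."""
    n = L.shape[0]
    X = B.astype(float).copy()
    for j in range(n):
        X[j, :] /= L[j, j]                    # all right-hand-side columns at once
        # "Within column" update: every remaining row can be updated independently.
        X[j + 1:, :] -= np.outer(L[j + 1:, j], X[j, :])
    return X

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 1.0, 5.0]])
B = np.array([[2.0, 4.0],
              [5.0, 7.0],
              [13.0, 20.0]])
X = forward_solve_columns(L, B)
print(np.allclose(L @ X, B))    # True
```
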
  • Publication number: 20190042539
    Abstract: Systems and methods for a hardware-accelerated matrix decomposition circuit are described herein. This matrix decomposition circuit splits matrix decomposition operations into parallel operation circuits and serial operation circuits, and joins the parallel and serial operation circuits using specific dependency handling logic for efficient parallel execution. This provides fast matrix decomposition with low power consumption, reduced memory footprint, and reduced memory bandwidth.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Santhosh Kumar Rethinagiri, Anish N K, Dipan Kumar Mandal
  • Publication number: 20190042195
    Abstract: Systems and methods for matrix-solve applications include a memory-optimized hardware acceleration (HWA) solution with a scalable architecture (i.e., specialized circuitry) for HWA matrix-solve operations. The matrix-solve solutions described herein may include a scalable hardware architecture with parallel processing (e.g., “within column” processing), which provides the ability to compute several output values in parallel. The HWA matrix-solve solutions described herein may include simultaneous multi-column processing, which provides a lower execution cycle count and a reduced total number of memory accesses. This HWA matrix-solve approach provides low-latency, energy-efficient matrix-solve solutions, which may be used to reduce energy consumption and improve performance in various matrix-based applications, such as computer vision, SLAM, AR/VR/mixed reality, machine learning, and data analytics.
    Type: Application
    Filed: November 21, 2017
    Publication date: February 7, 2019
    Inventors: Gurpreet Singh Kalsi, Om Ji Omer, Dipan Kumar Mandal, Santhosh Kumar Rethinagiri, Gopi Neela