Patents by Inventor Somnath Paul
Somnath Paul has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12197601
Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.
Type: Grant
Filed: December 22, 2021
Date of Patent: January 14, 2025
Assignee: Intel Corporation
Inventors: Ren Wang, Sameh Gobriel, Somnath Paul, Yipeng Wang, Priya Autee, Abhirupa Layek, Shaman Narayana, Edwin Verplanke, Mrittika Ganguli, Jr-Shian Tsai, Anton Sorokin, Suvadeep Banerjee, Abhijit Davare, Desmond Kirkpatrick, Rajesh M. Sankaran, Jaykant B. Timbadiya, Sriram Kabisthalam Muthukumar, Narayan Ranganathan, Nalini Murari, Brinda Ganesh, Nilesh Jain
-
Patent number: 11908542
Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, the burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead of time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either the upper or lower half of the cache line, thus saving dynamic capacitance.
Type: Grant
Filed: December 23, 2019
Date of Patent: February 20, 2024
Assignee: Intel Corporation
Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
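The sparsity-map idea in this abstract can be shown with a minimal Python sketch. This is illustrative only: the function names, the dictionary layout, and the two-way half-line split are assumptions for exposition, not the patented circuit.

```python
def make_cache_line(weights):
    """Pack weight values into a cache line together with a two-bit
    sparsity map recording whether the lower and upper halves hold
    any nonzero values (kept in otherwise-unused line bytes)."""
    half = len(weights) // 2
    return {"data": list(weights),
            "map": (any(weights[:half]), any(weights[half:]))}

def read_cache_line(line):
    """Consult the sparsity map and access only the halves whose bit
    is set; an all-zero half is synthesized without a read, which is
    where the dynamic-capacitance saving comes from."""
    half = len(line["data"]) // 2
    lower_ok, upper_ok = line["map"]
    lower = line["data"][:half] if lower_ok else [0] * half
    upper = line["data"][half:] if upper_ok else [0] * half
    return lower + upper, int(lower_ok) + int(upper_ok)
```

Reading a line whose lower half is all zeros then touches only one half of the line while still returning the full word.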
-
Publication number: 20230273832
Abstract: A system for autonomous and proactive power management for energy efficient execution of machine learning workloads may include an apparatus such as a system-on-chip (SoC) comprising an accelerator configurable to load and execute a neural network and circuitry to receive a profile of the neural network. The profile may be received from a compiler and include information regarding a plurality of layers of the neural network. Responsive to the profile and the information regarding the plurality of layers, circuitry may adjust, using a local power management unit (PMU) included in the apparatus, a power level to the accelerator while the accelerator executes the neural network. The power level adjustment may be based on whether a particular layer is a compute-intensive layer or a memory-intensive layer.
Type: Application
Filed: April 12, 2023
Publication date: August 31, 2023
Inventors: Somnath Paul, Muhammad M. Khellah, Linda Zeng, Mohamed Elmalaki
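The compute- versus memory-intensive distinction this abstract relies on can be sketched in Python using arithmetic intensity (FLOPs per byte moved). The threshold value, profile format, and two power levels below are illustrative assumptions, not the published method.

```python
def classify_layer(flops, bytes_moved, threshold=10.0):
    """Label a layer compute- or memory-intensive by its arithmetic
    intensity (FLOPs per byte of traffic); the threshold is a toy value."""
    return "compute" if flops / bytes_moved >= threshold else "memory"

def plan_power_levels(profile):
    """Walk a compiler-style per-layer profile of (name, flops, bytes)
    and pick a power level per layer: high for compute-bound layers,
    low for memory-bound layers that would stall at high power anyway."""
    return [(name, "high" if classify_layer(f, b) == "compute" else "low")
            for name, f, b in profile]
```

A convolution doing many operations per byte fetched lands in the high-power bucket, while an embedding lookup dominated by memory traffic lands in the low-power bucket.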
-
Publication number: 20220415050
Abstract: A Media Analytics Co-optimizer (MAC) engine that utilizes available motion and scene information to increase the activation sparsity in artificial intelligence (AI) visual media applications. In an example, the MAC engine receives video frames and associated video characteristics determined by a video decoder and reformats the video frames by applying a threshold level of motion to the video frames and zeroing out areas that fall below the threshold level of motion. In some examples, the MAC engine further receives scene information from an optical flow engine or event processing engine and reformats further based thereon. The reformatted video frames are consumed by the first stage of AI inference.
Type: Application
Filed: August 31, 2022
Publication date: December 29, 2022
Applicant: Intel Corporation
Inventors: Palanivel Guruva reddiar, Siew Hoon Lim, Somnath Paul, Shabbir Abbasali Saifee
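A rough Python sketch of the motion-threshold reformatting step follows; the frame and motion-map shapes, the per-pixel representation, and the function name are assumptions for illustration.

```python
def sparsify_frame(frame, motion, threshold):
    """Zero every pixel whose motion magnitude falls below the
    threshold, raising activation sparsity for the first inference
    stage; returns the reformatted frame and the zeroed-pixel count."""
    out, zeroed = [], 0
    for pixel_row, motion_row in zip(frame, motion):
        row = []
        for pixel, mv in zip(pixel_row, motion_row):
            if mv < threshold:
                row.append(0)      # static region: zeroed out
                zeroed += 1
            else:
                row.append(pixel)  # moving region: kept as-is
        out.append(row)
    return out, zeroed
```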
-
Patent number: 11513893
Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.
Type: Grant
Filed: December 21, 2020
Date of Patent: November 29, 2022
Assignee: Intel Corporation
Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
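The preemptive-compute timing can be mimicked in a few lines of Python. The parity check and the single-bit "correction" below are toy stand-ins for a real ECC code, included only to show the speculate-then-maybe-delay control flow.

```python
def ecc_error(word, stored_parity):
    """Toy check: flag an error when the stored parity bit disagrees
    with the word's actual even parity (stand-in for real ECC)."""
    return bin(word).count("1") % 2 != stored_parity

def preemptive_compute(word, stored_parity, op):
    """Issue the computation immediately, in parallel with the ECC
    check; only when an error is flagged is the output delayed and
    recomputed on a 'corrected' word (a placeholder bit flip here)."""
    speculative = op(word)               # started before ECC finishes
    if not ecc_error(word, stored_parity):
        return speculative, "on_time"    # common case: no added latency
    return op(word ^ 1), "delayed"       # rare case: redo after the fix
```

In the common error-free case the speculative result ships immediately, which is the latency win the abstract describes.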
-
Patent number: 11450672
Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.
Type: Grant
Filed: April 27, 2020
Date of Patent: September 20, 2022
Assignee: Intel Corporation
Inventors: Charles Augustine, Somnath Paul, Muhammad M. Khellah, Chen Koren
-
Publication number: 20220114270
Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.
Type: Application
Filed: December 22, 2021
Publication date: April 14, 2022
Inventors: Ren WANG, Sameh GOBRIEL, Somnath PAUL, Yipeng WANG, Priya AUTEE, Abhirupa LAYEK, Shaman NARAYANA, Edwin VERPLANKE, Mrittika GANGULI, Jr-Shian TSAI, Anton SOROKIN, Suvadeep BANERJEE, Abhijit DAVARE, Desmond KIRKPATRICK
-
Publication number: 20220012563
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for high throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus and processor circuitry to execute the instructions to determine sizes of data lanes in a partition of neural network weights, determine a slice size based on a size difference between a first data lane and a second data lane of the data lanes in the partition, the first data lane including first data, the second data lane including second data, the second data of a smaller size than the first data, cut a portion of the first data from the first data lane based on the slice size, and append the portion of the first data to the second data lane.
Type: Application
Filed: September 24, 2021
Publication date: January 13, 2022
Inventors: Alejandro Castro Gonzalez, Praveen Nair, Somnath Paul, Sudheendra Kadri, Palanivel Guruvareddiar, Aaron Gubrud, Vinodh Gopal
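The slice-and-append lane balancing described here can be sketched as follows. The lane representation, the halving heuristic for the slice size, and the function name are assumptions, not details from the application.

```python
def balance_lanes(lanes):
    """Slice data off the end of the longest compressed-weight lane
    and append it to the shortest, so the lanes drain at a similar
    rate; the slice is half the size gap between the two lanes."""
    lanes = [list(lane) for lane in lanes]
    longest = max(lanes, key=len)
    shortest = min(lanes, key=len)
    slice_size = (len(longest) - len(shortest)) // 2
    if slice_size:
        cut = longest[-slice_size:]   # portion cut from the big lane
        del longest[-slice_size:]
        shortest.extend(cut)          # appended to the small lane
    return lanes, slice_size
```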
-
Patent number: 11176994
Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
Type: Grant
Filed: August 24, 2020
Date of Patent: November 16, 2021
Assignee: Intel Corporation
Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
-
Publication number: 20210319022
Abstract: Systems, apparatuses and methods include technology that determines, with a first processing engine of a plurality of processing engines, a first partial similarity measurement based on a first portion of a query vector and a first portion of a first candidate vector. The technology determines, with a second processing engine of the plurality of processing engines, a total similarity measurement based on the query vector and a second candidate vector. The technology determines, with the first processing engine, whether to compare a second portion of the query vector to a second portion of the first candidate vector based on the first partial similarity measurement and the total similarity measurement.
Type: Application
Filed: June 25, 2021
Publication date: October 14, 2021
Applicant: Intel Corporation
Inventors: Srajudheen Makkadayil, Somnath Paul, Shabbir Saifee, Bakshree Mishra, Vidhya Thyagarajan, Manoj Velayudha, Muhammad Khellah, Aniekeme Udofia
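The partial-similarity decision maps naturally to early termination in a distance computation. This Python sketch uses squared Euclidean distance and a fixed split point as illustrative stand-ins for the two processing engines; none of these choices come from the publication itself.

```python
def pruned_distance(query, candidate, best_total, split):
    """Compute the squared distance over the first `split` dimensions;
    finish the remaining dimensions only if the partial sum can still
    beat the best complete distance seen so far, otherwise prune."""
    partial = sum((q - c) ** 2
                  for q, c in zip(query[:split], candidate[:split]))
    if partial >= best_total:
        return None          # pruned: cannot improve on best_total
    return partial + sum((q - c) ** 2
                         for q, c in zip(query[split:], candidate[split:]))
```

Since squared distance only grows as more dimensions are added, a partial sum that already exceeds the best total proves the candidate cannot win, so the second half of the comparison is skipped.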
-
Publication number: 20210193196
Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, the burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead of time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either the upper or lower half of the cache line, thus saving dynamic capacitance.
Type: Application
Filed: December 23, 2019
Publication date: June 24, 2021
Applicant: Intel Corporation
Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
-
Publication number: 20210109809
Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.
Type: Application
Filed: December 21, 2020
Publication date: April 15, 2021
Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
-
Publication number: 20210043251
Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
Type: Application
Filed: August 24, 2020
Publication date: February 11, 2021
Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
-
Patent number: 10892012
Abstract: An apparatus, vision processing unit, and method are provided for clustering motion events in a content addressable memory. A motion event is received including coordinates in an image frame that have experienced a change and a timestamp of the change. A determination is made as to whether there is a valid entry in the memory having coordinates within a predefined range of the coordinates included in the motion event. In response to a determination that there is such a valid entry, the coordinates and the timestamp in the motion event are written to that entry.
Type: Grant
Filed: August 23, 2018
Date of Patent: January 12, 2021
Assignee: INTEL CORPORATION
Inventors: Turbo Majumder, Somnath Paul, Charles Augustine, Muhammad M. Khellah
-
Patent number: 10878313
Abstract: A spike sent from a first artificial neuron in a spiking neural network (SNN) to a second artificial neuron in the SNN is identified, with the spike sent over a particular artificial synapse in the SNN. The membrane potential of the second artificial neuron at a particular time step, corresponding to sending of the spike, is compared to a threshold potential, where the threshold potential is set lower than a firing potential of the second artificial neuron. A change to the synaptic weight of the particular artificial synapse is determined based on the spike, where the synaptic weight is to be decreased if the membrane potential of the second artificial neuron is lower than the threshold potential at the particular time step and the synaptic weight is to be increased if the membrane potential of the second artificial neuron is higher than the threshold potential at the particular time step.
Type: Grant
Filed: May 2, 2017
Date of Patent: December 29, 2020
Assignee: INTEL CORPORATION
Inventors: Charles Augustine, Somnath Paul, Sadique Ul Ameen Sheik, Muhammad M. Khellah
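The membrane-potential-gated weight update described above reduces to a small rule. The step size, clamping bounds, and function signature below are illustrative choices, not values taken from the patent.

```python
def update_weight(weight, membrane, threshold, step=0.05,
                  w_min=0.0, w_max=1.0):
    """On a spike arriving at a synapse, depress its weight when the
    post-neuron's membrane potential is below the sub-firing threshold
    and potentiate it when above, clamping to [w_min, w_max]."""
    if membrane < threshold:
        weight -= step      # post-neuron far from firing: depress
    elif membrane > threshold:
        weight += step      # post-neuron near firing: potentiate
    return min(max(weight, w_min), w_max)
```

Because the threshold sits below the firing potential, a synapse is strengthened whenever its input arrives while the neuron is already close to firing, even if no output spike occurs at that time step.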
-
Patent number: 10755771
Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
Type: Grant
Filed: December 19, 2018
Date of Patent: August 25, 2020
Assignee: Intel Corporation
Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
-
Patent number: 10748060
Abstract: A processor or integrated circuit includes a memory to store weight values for a plurality of neuromorphic states and a circuitry coupled to the memory. The circuitry is to detect an incoming data signal for a pre-synaptic neuromorphic state and initiate a time window for the pre-synaptic neuromorphic state in response to detecting the incoming data signal. The circuitry is further to, responsive to detecting an end of the time window: retrieve, from the memory, a weight value for a post-synaptic neuromorphic state for which an outgoing data signal is generated during the time window, the post-synaptic neuromorphic state being a fan-out connection of the pre-synaptic neuromorphic state; perform a causal update to the weight value, according to a learning function, to generate an updated weight value; and store the updated weight value back to the memory.
Type: Grant
Filed: October 14, 2016
Date of Patent: August 18, 2020
Assignee: Intel Corporation
Inventors: Somnath Paul, Charles Augustine, Muhammad M. Khellah
-
Publication number: 20200258890
Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.
Type: Application
Filed: April 27, 2020
Publication date: August 13, 2020
Inventors: Charles AUGUSTINE, Somnath PAUL, Muhammad M. KHELLAH, Chen KOREN
-
Publication number: 20200183922
Abstract: An apparatus is described. The apparatus includes a nearest neighbor search circuit to perform a search according to a first stage search and a second stage search. The nearest neighbor search circuit includes a first stage circuit and a second stage circuit. The first stage circuit includes a hash logic circuit and a content addressable memory. The hash logic circuit is to generate a hash word from an input query vector. The hash word has B bands. The content addressable memory is to store hashes of a random access memory's data items. The hashes each have B bands. The content addressable memory is to compare the hashes against the hash word on a sequential band-by-band basis. The second stage circuit includes the random access memory and a compare and sort circuit. The compare and sort circuit is to receive the input query vector. The random access memory has crosswise bit lines coupled to the compare and sort circuit.
Type: Application
Filed: February 19, 2020
Publication date: June 11, 2020
Inventors: Wootaek LIM, Minchang CHO, Somnath PAUL, Charles AUGUSTINE, Suyoung BANG, Turbo MAJUMDER, Muhammad M. KHELLAH
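The two-stage flow can be sketched in Python, with banded hashes standing in for the CAM contents and a sort standing in for the compare-and-sort circuit. The band count, hash width, and the any-band match rule are assumptions made for illustration, not details from the publication.

```python
def band_hashes(vec, bands):
    """Split a vector into `bands` chunks and hash each one into a
    fixed-width word, mimicking the B-band hash stored in the CAM."""
    step = len(vec) // bands
    return [hash(tuple(vec[b * step:(b + 1) * step])) & 0xFFFFFFFF
            for b in range(bands)]

def two_stage_search(query, database, bands):
    """Stage 1 (CAM stand-in): keep candidates sharing at least one
    full hash band with the query. Stage 2 (compare-and-sort stand-in):
    rank the survivors by exact squared distance to the query."""
    q = band_hashes(query, bands)
    survivors = [v for v in database
                 if any(a == b for a, b in zip(band_hashes(v, bands), q))]
    return sorted(survivors,
                  key=lambda v: sum((x - y) ** 2 for x, y in zip(v, query)))
```

The cheap band-matching stage filters out most of the database so the exact distance computation only runs on a handful of survivors.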
-
Patent number: 10665222
Abstract: A system, article, and method provide temporal-domain feature extraction for automatic speech recognition.
Type: Grant
Filed: June 28, 2018
Date of Patent: May 26, 2020
Assignee: Intel Corporation
Inventors: Suyoung Bang, Muhammad Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Wootaek Lim, Tobias Bocklet, David Pearce