Patents by Inventor Somnath Paul

Somnath Paul has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11908542
    Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead of time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either upper or lower half of the cache line, thus saving dynamic capacitance.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
  • Publication number: 20230273832
    Abstract: A system for autonomous and proactive power management for energy efficient execution of machine learning workloads may include an apparatus such as system-on-chip (SoC) comprising an accelerator configurable to load and execute a neural network and circuitry to receive a profile of the neural network. The profile may be received from a compiler and include information regarding a plurality of layers of the neural network. Responsive to the profile and the information regarding the plurality of layers, circuitry may adjust, using a local power management unit (PMU) included in the apparatus, a power level to the accelerator while the accelerator executes the neural network. The power level adjustment may be based on whether a particular layer is a compute-intensive layer or a memory-intensive layer.
    Type: Application
    Filed: April 12, 2023
    Publication date: August 31, 2023
    Inventors: Somnath Paul, Muhammad M. Khellah, Linda Zeng, Mohamed Elmalaki
  • Publication number: 20220415050
    Abstract: A Media Analytics Co-optimizer (MAC) engine that utilizes available motion and scene information to increase the activation sparsity in artificial intelligence (AI) visual media applications. In an example, the MAC engine receives video frames and associated video characteristics determined by a video decoder and reformats the video frames by applying a threshold level of motion to the video frames and zeroing out areas that fall below the threshold level of motion. In some examples, the MAC engine further receives scene information from an optical flow engine or event processing engine and reformats further based thereon. The reformatted video frames are consumed by the first stage of AI inference.
    Type: Application
    Filed: August 31, 2022
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Palanivel Guruvareddiar, Siew Hoon Lim, Somnath Paul, Shabbir Abbasali Saifee
  • Patent number: 11513893
    Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: November 29, 2022
    Assignee: Intel Corporation
    Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
  • Patent number: 11450672
    Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.
    Type: Grant
    Filed: April 27, 2020
    Date of Patent: September 20, 2022
    Assignee: Intel Corporation
    Inventors: Charles Augustine, Somnath Paul, Muhammad M. Khellah, Chen Koren
  • Publication number: 20220114270
    Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.
    Type: Application
    Filed: December 22, 2021
    Publication date: April 14, 2022
    Inventors: Ren WANG, Sameh GOBRIEL, Somnath PAUL, Yipeng WANG, Priya AUTEE, Abhirupa LAYEK, Shaman NARAYANA, Edwin VERPLANKE, Mrittika GANGULI, Jr-Shian TSAI, Anton SOROKIN, Suvadeep BANERJEE, Abhijit DAVARE, Desmond KIRKPATRICK
  • Publication number: 20220012563
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for high throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus and processor circuitry to execute the instructions to determine sizes of data lanes in a partition of neural network weights, determine a slice size based on a size difference between a first data lane and a second data lane of the data lanes in the partition, the first data lane including first data, the second data lane including second data, the second data of a smaller size than the first data, cut a portion of the first data from the first data lane based on the slice size, and append the portion of the first data to the second data lane.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Alejandro Castro Gonzalez, Praveen Nair, Somnath Paul, Sudheendra Kadri, Palanivel Guruvareddiar, Aaron Gubrud, Vinodh Gopal
  • Patent number: 11176994
    Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
    Type: Grant
    Filed: August 24, 2020
    Date of Patent: November 16, 2021
    Assignee: Intel Corporation
    Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
  • Publication number: 20210319022
    Abstract: Systems, apparatuses and methods include technology that determines, with a first processing engine of a plurality of processing engines, a first partial similarity measurement based on a first portion of a query vector and a first portion of a first candidate vector. The technology determines, with a second processing engine of the plurality of processing engines, a total similarity measurement based on the query vector and a second candidate vector. The technology determines, with the first processing engine, whether to compare a second portion of the query vector to a second portion of the first candidate vector based on the first partial similarity measurement and the total similarity measurement.
    Type: Application
    Filed: June 25, 2021
    Publication date: October 14, 2021
    Applicant: Intel Corporation
    Inventors: Srajudheen Makkadayil, Somnath Paul, Shabbir Saifee, Bakshree Mishra, Vidhya Thyagarajan, Manoj Velayudha, Muhammad Khellah, Aniekeme Udofia
  • Publication number: 20210193196
    Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead of time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either upper or lower half of the cache line, thus saving dynamic capacitance.
    Type: Application
    Filed: December 23, 2019
    Publication date: June 24, 2021
    Applicant: Intel Corporation
    Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
  • Publication number: 20210109809
    Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
  • Publication number: 20210043251
    Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
    Type: Application
    Filed: August 24, 2020
    Publication date: February 11, 2021
    Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
  • Patent number: 10892012
    Abstract: An apparatus, vision processing unit, and method are provided for clustering motion events in a content addressable memory. A motion event is received including coordinates in an image frame that have experienced a change and a timestamp of the change. A determination is made as to whether there is a valid entry in the memory having coordinates within a predefined range of coordinates included in the motion event. In response to a determination that there is the valid entry having the coordinates within the predefined range of coordinates included in the motion event, the coordinates and the timestamp in the motion event are written to the valid entry.
    Type: Grant
    Filed: August 23, 2018
    Date of Patent: January 12, 2021
    Assignee: Intel Corporation
    Inventors: Turbo Majumder, Somnath Paul, Charles Augustine, Muhammad M. Khellah
  • Patent number: 10878313
    Abstract: A spike sent from a first artificial neuron in a spiking neural network (SNN) to a second artificial neuron in the SNN is identified, with the spike sent over a particular artificial synapse in the SNN. The membrane potential of the second artificial neuron at a particular time step, corresponding to sending of the spike, is compared to a threshold potential, where the threshold potential is set lower than a firing potential of the second artificial neuron. A change to the synaptic weight of the particular artificial synapse is determined based on the spike, where the synaptic weight is to be decreased if the membrane potential of the second artificial neuron is lower than the threshold potential at the particular time step and the synaptic weight is to be increased if the membrane potential of the second artificial neuron is higher than the threshold potential at the particular time step.
    Type: Grant
    Filed: May 2, 2017
    Date of Patent: December 29, 2020
    Assignee: Intel Corporation
    Inventors: Charles Augustine, Somnath Paul, Sadique Ul Ameen Sheik, Muhammad M. Khellah
  • Patent number: 10755771
    Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: August 25, 2020
    Assignee: Intel Corporation
    Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
  • Patent number: 10748060
    Abstract: A processor or integrated circuit includes a memory to store weight values for a plurality of neuromorphic states and circuitry coupled to the memory. The circuitry is to detect an incoming data signal for a pre-synaptic neuromorphic state and initiate a time window for the pre-synaptic neuromorphic state in response to detecting the incoming data signal. The circuitry is further to, responsive to detecting an end of the time window: retrieve, from the memory, a weight value for a post-synaptic neuromorphic state for which an outgoing data signal is generated during the time window, the post-synaptic neuromorphic state being a fan-out connection of the pre-synaptic neuromorphic state; perform a causal update to the weight value, according to a learning function, to generate an updated weight value; and store the updated weight value back to the memory.
    Type: Grant
    Filed: October 14, 2016
    Date of Patent: August 18, 2020
    Assignee: Intel Corporation
    Inventors: Somnath Paul, Charles Augustine, Muhammad M. Khellah
  • Publication number: 20200258890
    Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.
    Type: Application
    Filed: April 27, 2020
    Publication date: August 13, 2020
    Inventors: Charles AUGUSTINE, Somnath PAUL, Muhammad M. KHELLAH, Chen KOREN
  • Publication number: 20200183922
    Abstract: An apparatus is described. The apparatus includes a nearest neighbor search circuit to perform a search according to a first stage search and a second stage search. The nearest neighbor search circuit includes a first stage circuit and a second stage circuit. The first stage search circuit includes a hash logic circuit and a content addressable memory. The hash logic circuit is to generate a hash word from an input query vector. The hash word has B bands. The content addressable memory is to store hashes of a random access memory's data items. The hashes each have B bands. The content addressable memory is to compare the hashes against the hash word on a sequential band-by-band basis. The second stage circuit includes the random access memory and a compare and sort circuit. The compare and sort circuit is to receive the input query vector. The random access memory has crosswise bit lines coupled to the compare and sort circuit.
    Type: Application
    Filed: February 19, 2020
    Publication date: June 11, 2020
    Inventors: Wootaek LIM, Minchang CHO, Somnath PAUL, Charles AUGUSTINE, Suyoung BANG, Turbo MAJUMDER, Muhammad M. KHELLAH
  • Patent number: 10665222
    Abstract: A system, article, and method provide temporal-domain feature extraction for automatic speech recognition.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: May 26, 2020
    Assignee: Intel Corporation
    Inventors: Suyoung Bang, Muhammad Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Wootaek Lim, Tobias Bocklet, David Pearce
  • Publication number: 20200133884
    Abstract: An apparatus is described. The apparatus includes a memory controller to interface with a memory side cache and an NVRAM system memory. The memory controller has logic circuitry to favor items cached in the memory side cache that are expected to be written to above items cached in the memory side cache that are expected to only be read from.
    Type: Application
    Filed: December 19, 2019
    Publication date: April 30, 2020
    Inventors: Zeshan A. CHISHTI, Somnath PAUL, Charles AUGUSTINE, Muhammad M. KHELLAH