Patents by Inventor Somnath Paul
Somnath Paul has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12197601
Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.
Type: Grant
Filed: December 22, 2021
Date of Patent: January 14, 2025
Assignee: Intel Corporation
Inventors: Ren Wang, Sameh Gobriel, Somnath Paul, Yipeng Wang, Priya Autee, Abhirupa Layek, Shaman Narayana, Edwin Verplanke, Mrittika Ganguli, Jr-Shian Tsai, Anton Sorokin, Suvadeep Banerjee, Abhijit Davare, Desmond Kirkpatrick, Rajesh M. Sankaran, Jaykant B. Timbadiya, Sriram Kabisthalam Muthukumar, Narayan Ranganathan, Nalini Murari, Brinda Ganesh, Nilesh Jain
-
Patent number: 11908542
Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, the burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead of time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either the upper or lower half of the cache line, thus saving dynamic capacitance.
Type: Grant
Filed: December 23, 2019
Date of Patent: February 20, 2024
Assignee: Intel Corporation
Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
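The sparsity-map idea in this abstract can be shown with a minimal Python sketch. This is illustrative only: the function names, the dictionary layout, and the two-way half-line split are assumptions for exposition, not the patented circuit.

```python
def make_cache_line(weights):
    """Pack weight values into a cache line together with a two-bit
    sparsity map recording whether the lower and upper halves hold
    any nonzero values (kept in otherwise-unused line bytes)."""
    half = len(weights) // 2
    return {"data": list(weights),
            "map": (any(weights[:half]), any(weights[half:]))}

def read_cache_line(line):
    """Consult the sparsity map and access only the halves whose bit
    is set; an all-zero half is synthesized without a read, which is
    where the dynamic-capacitance saving comes from."""
    half = len(line["data"]) // 2
    lower_ok, upper_ok = line["map"]
    lower = line["data"][:half] if lower_ok else [0] * half
    upper = line["data"][half:] if upper_ok else [0] * half
    return lower + upper, int(lower_ok) + int(upper_ok)
```

Reading a line whose lower half is all zeros then touches only one half of the line while still returning the full word.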
-
Publication number: 20230273832
Abstract: A system for autonomous and proactive power management for energy efficient execution of machine learning workloads may include an apparatus such as a system-on-chip (SoC) comprising an accelerator configurable to load and execute a neural network and circuitry to receive a profile of the neural network. The profile may be received from a compiler and include information regarding a plurality of layers of the neural network. Responsive to the profile and the information regarding the plurality of layers, circuitry may adjust, using a local power management unit (PMU) included in the apparatus, a power level to the accelerator while the accelerator executes the neural network. The power level adjustment may be based on whether a particular layer is a compute-intensive layer or a memory-intensive layer.
Type: Application
Filed: April 12, 2023
Publication date: August 31, 2023
Inventors: Somnath Paul, Muhammad M. Khellah, Linda Zeng, Mohamed Elmalaki
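The compute- versus memory-intensive distinction this abstract relies on can be sketched in Python using arithmetic intensity (FLOPs per byte moved). The threshold value, profile format, and two power levels below are illustrative assumptions, not the published method.

```python
def classify_layer(flops, bytes_moved, threshold=10.0):
    """Label a layer compute- or memory-intensive by its arithmetic
    intensity (FLOPs per byte of traffic); the threshold is a toy value."""
    return "compute" if flops / bytes_moved >= threshold else "memory"

def plan_power_levels(profile):
    """Walk a compiler-style per-layer profile of (name, flops, bytes)
    and pick a power level per layer: high for compute-bound layers,
    low for memory-bound layers that would stall at high power anyway."""
    return [(name, "high" if classify_layer(f, b) == "compute" else "low")
            for name, f, b in profile]
```

A convolution doing many operations per byte fetched lands in the high-power bucket, while an embedding lookup dominated by memory traffic lands in the low-power bucket.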
-
Publication number: 20220415050
Abstract: A Media Analytics Co-optimizer (MAC) engine that utilizes available motion and scene information to increase the activation sparsity in artificial intelligence (AI) visual media applications. In an example, the MAC engine receives video frames and associated video characteristics determined by a video decoder and reformats the video frames by applying a threshold level of motion to the video frames and zeroing out areas that fall below the threshold level of motion. In some examples, the MAC engine further receives scene information from an optical flow engine or event processing engine and reformats further based thereon. The reformatted video frames are consumed by the first stage of AI inference.
Type: Application
Filed: August 31, 2022
Publication date: December 29, 2022
Applicant: Intel Corporation
Inventors: Palanivel Guruva reddiar, Siew Hoon Lim, Somnath Paul, Shabbir Abbasali Saifee
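A rough Python sketch of the motion-threshold reformatting step follows; the frame and motion-map shapes, the per-pixel representation, and the function name are assumptions for illustration.

```python
def sparsify_frame(frame, motion, threshold):
    """Zero every pixel whose motion magnitude falls below the
    threshold, raising activation sparsity for the first inference
    stage; returns the reformatted frame and the zeroed-pixel count."""
    out, zeroed = [], 0
    for pixel_row, motion_row in zip(frame, motion):
        row = []
        for pixel, mv in zip(pixel_row, motion_row):
            if mv < threshold:
                row.append(0)      # static region: zeroed out
                zeroed += 1
            else:
                row.append(pixel)  # moving region: kept as-is
        out.append(row)
    return out, zeroed
```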
-
Patent number: 11513893
Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.
Type: Grant
Filed: December 21, 2020
Date of Patent: November 29, 2022
Assignee: Intel Corporation
Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
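The preemptive-compute timing can be mimicked in a few lines of Python. The parity check and the single-bit "correction" below are toy stand-ins for a real ECC code, included only to show the speculate-then-maybe-delay control flow.

```python
def ecc_error(word, stored_parity):
    """Toy check: flag an error when the stored parity bit disagrees
    with the word's actual even parity (stand-in for real ECC)."""
    return bin(word).count("1") % 2 != stored_parity

def preemptive_compute(word, stored_parity, op):
    """Issue the computation immediately, in parallel with the ECC
    check; only when an error is flagged is the output delayed and
    recomputed on a 'corrected' word (a placeholder bit flip here)."""
    speculative = op(word)               # started before ECC finishes
    if not ecc_error(word, stored_parity):
        return speculative, "on_time"    # common case: no added latency
    return op(word ^ 1), "delayed"       # rare case: redo after the fix
```

In the common error-free case the speculative result ships immediately, which is the latency win the abstract describes.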
-
Patent number: 11450672
Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.
Type: Grant
Filed: April 27, 2020
Date of Patent: September 20, 2022
Assignee: Intel Corporation
Inventors: Charles Augustine, Somnath Paul, Muhammad M. Khellah, Chen Koren
-
Publication number: 20220114270
Abstract: Examples described herein relate to offload circuitry comprising one or more compute engines that are configurable to perform a workload offloaded from a process executed by a processor based on a descriptor particular to the workload. In some examples, the offload circuitry is configurable to perform the workload, among multiple different workloads. In some examples, the multiple different workloads include one or more of: data transformation (DT) for data format conversion, Locality Sensitive Hashing (LSH) for neural network (NN), similarity search, sparse general matrix-matrix multiplication (SpGEMM) acceleration of hash based sparse matrix multiplication, data encode, data decode, or embedding lookup.
Type: Application
Filed: December 22, 2021
Publication date: April 14, 2022
Inventors: Ren WANG, Sameh GOBRIEL, Somnath PAUL, Yipeng WANG, Priya AUTEE, Abhirupa LAYEK, Shaman NARAYANA, Edwin VERPLANKE, Mrittika GANGULI, Jr-Shian TSAI, Anton SOROKIN, Suvadeep BANERJEE, Abhijit DAVARE, Desmond KIRKPATRICK
-
Publication number: 20220012563
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for high throughput compression of neural network weights. An example apparatus includes at least one memory, instructions in the apparatus and processor circuitry to execute the instructions to determine sizes of data lanes in a partition of neural network weights, determine a slice size based on a size difference between a first data lane and a second data lane of the data lanes in the partition, the first data lane including first data, the second data lane including second data, the second data of a smaller size than the first data, cut a portion of the first data from the first data lane based on the slice size, and append the portion of the first data to the second data lane.
Type: Application
Filed: September 24, 2021
Publication date: January 13, 2022
Inventors: Alejandro Castro Gonzalez, Praveen Nair, Somnath Paul, Sudheendra Kadri, Palanivel Guruvareddiar, Aaron Gubrud, Vinodh Gopal
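The slice-and-append lane balancing described here can be sketched as follows. The lane representation, the halving heuristic for the slice size, and the function name are assumptions, not details from the application.

```python
def balance_lanes(lanes):
    """Slice data off the end of the longest compressed-weight lane
    and append it to the shortest, so the lanes drain at a similar
    rate; the slice is half the size gap between the two lanes."""
    lanes = [list(lane) for lane in lanes]
    longest = max(lanes, key=len)
    shortest = min(lanes, key=len)
    slice_size = (len(longest) - len(shortest)) // 2
    if slice_size:
        cut = longest[-slice_size:]   # portion cut from the big lane
        del longest[-slice_size:]
        shortest.extend(cut)          # appended to the small lane
    return lanes, slice_size
```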
-
Patent number: 11176994
Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
Type: Grant
Filed: August 24, 2020
Date of Patent: November 16, 2021
Assignee: Intel Corporation
Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
-
Publication number: 20210319022
Abstract: Systems, apparatuses and methods include technology that determines, with a first processing engine of a plurality of processing engines, a first partial similarity measurement based on a first portion of a query vector and a first portion of a first candidate vector. The technology determines, with a second processing engine of the plurality of processing engines, a total similarity measurement based on the query vector and a second candidate vector. The technology determines, with the first processing engine, whether to compare a second portion of the query vector to a second portion of the first candidate vector based on the first partial similarity measurement and the total similarity measurement.
Type: Application
Filed: June 25, 2021
Publication date: October 14, 2021
Applicant: Intel Corporation
Inventors: Srajudheen Makkadayil, Somnath Paul, Shabbir Saifee, Bakshree Mishra, Vidhya Thyagarajan, Manoj Velayudha, Muhammad Khellah, Aniekeme Udofia
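The partial-similarity decision maps naturally to early termination in a distance computation. This Python sketch uses squared Euclidean distance and a fixed split point as illustrative stand-ins for the two processing engines; none of these choices come from the publication itself.

```python
def pruned_distance(query, candidate, best_total, split):
    """Compute the squared distance over the first `split` dimensions;
    finish the remaining dimensions only if the partial sum can still
    beat the best complete distance seen so far, otherwise prune."""
    partial = sum((q - c) ** 2
                  for q, c in zip(query[:split], candidate[:split]))
    if partial >= best_total:
        return None          # pruned: cannot improve on best_total
    return partial + sum((q - c) ** 2
                         for q, c in zip(query[split:], candidate[split:]))
```

Since squared distance only grows as more dimensions are added, a partial sum that already exceeds the best total proves the candidate cannot win, so the second half of the comparison is skipped.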
-
Publication number: 20210193196
Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, the burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead of time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either the upper or lower half of the cache line, thus saving dynamic capacitance.
Type: Application
Filed: December 23, 2019
Publication date: June 24, 2021
Applicant: Intel Corporation
Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
-
Publication number: 20210109809
Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.
Type: Application
Filed: December 21, 2020
Publication date: April 15, 2021
Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
-
Publication number: 20210043251
Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
Type: Application
Filed: August 24, 2020
Publication date: February 11, 2021
Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
-
Patent number: 10892012
Abstract: An apparatus, vision processing unit, and method are provided for clustering motion events in a content addressable memory. A motion event is received including coordinates in an image frame that have experienced a change and a timestamp of the change. A determination is made as to whether there is a valid entry in the memory having coordinates within a predefined range of the coordinates included in the motion event. In response to a determination that there is such a valid entry, the coordinates and the timestamp in the motion event are written to that entry.
Type: Grant
Filed: August 23, 2018
Date of Patent: January 12, 2021
Assignee: INTEL CORPORATION
Inventors: Turbo Majumder, Somnath Paul, Charles Augustine, Muhammad M. Khellah
-
Patent number: 10878313
Abstract: A spike sent from a first artificial neuron in a spiking neural network (SNN) to a second artificial neuron in the SNN is identified, with the spike sent over a particular artificial synapse in the SNN. The membrane potential of the second artificial neuron at a particular time step, corresponding to sending of the spike, is compared to a threshold potential, where the threshold potential is set lower than a firing potential of the second artificial neuron. A change to the synaptic weight of the particular artificial synapse is determined based on the spike, where the synaptic weight is to be decreased if the membrane potential of the second artificial neuron is lower than the threshold potential at the particular time step and the synaptic weight is to be increased if the membrane potential of the second artificial neuron is higher than the threshold potential at the particular time step.
Type: Grant
Filed: May 2, 2017
Date of Patent: December 29, 2020
Assignee: INTEL CORPORATION
Inventors: Charles Augustine, Somnath Paul, Sadique Ul Ameen Sheik, Muhammad M. Khellah
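The membrane-potential-gated weight update described above reduces to a small rule. The step size, clamping bounds, and function signature below are illustrative choices, not values taken from the patent.

```python
def update_weight(weight, membrane, threshold, step=0.05,
                  w_min=0.0, w_max=1.0):
    """On a spike arriving at a synapse, depress its weight when the
    post-neuron's membrane potential is below the sub-firing threshold
    and potentiate it when above, clamping to [w_min, w_max]."""
    if membrane < threshold:
        weight -= step      # post-neuron far from firing: depress
    elif membrane > threshold:
        weight += step      # post-neuron near firing: potentiate
    return min(max(weight, w_min), w_max)
```

Because the threshold sits below the firing potential, a synapse is strengthened whenever its input arrives while the neuron is already close to firing, even if no output spike occurs at that time step.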
-
Patent number: 10755771
Abstract: Embodiments include apparatuses, methods, and systems to implement a multi-read and/or multi-write process with a set of memory cells. The set of memory cells may be multiplexed with a same sense amplifier. As part of a multi-read process, a memory controller coupled to a memory circuit may precharge the bit lines associated with the set of memory cells, provide a single assertion of a word line signal on the word line, and then sequentially read data from the set of memory cells (using the sense amplifier) based on the precharge and the single assertion of the word line signal. Additionally, or alternatively, a multi-write process may be performed to sequentially write data to the set of memory cells based on one precharge of the associated bit lines. Other embodiments may be described and claimed.
Type: Grant
Filed: December 19, 2018
Date of Patent: August 25, 2020
Assignee: Intel Corporation
Inventors: Muhammad M. Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Suyoung Bang
-
Patent number: 10748060
Abstract: A processor or integrated circuit includes a memory to store weight values for a plurality of neuromorphic states and a circuitry coupled to the memory. The circuitry is to detect an incoming data signal for a pre-synaptic neuromorphic state and initiate a time window for the pre-synaptic neuromorphic state in response to detecting the incoming data signal. The circuitry is further to, responsive to detecting an end of the time window: retrieve, from the memory, a weight value for a post-synaptic neuromorphic state for which an outgoing data signal is generated during the time window, the post-synaptic neuromorphic state being a fan-out connection of the pre-synaptic neuromorphic state; perform a causal update to the weight value, according to a learning function, to generate an updated weight value; and store the updated weight value back to the memory.
Type: Grant
Filed: October 14, 2016
Date of Patent: August 18, 2020
Assignee: Intel Corporation
Inventors: Somnath Paul, Charles Augustine, Muhammad M. Khellah
-
Publication number: 20200258890
Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.
Type: Application
Filed: April 27, 2020
Publication date: August 13, 2020
Inventors: Charles AUGUSTINE, Somnath PAUL, Muhammad M. KHELLAH, Chen KOREN
-
Publication number: 20200183922
Abstract: An apparatus is described. The apparatus includes a nearest neighbor search circuit to perform a search according to a first stage search and a second stage search. The nearest neighbor search circuit includes a first stage circuit and a second stage circuit. The first stage circuit includes a hash logic circuit and a content addressable memory. The hash logic circuit is to generate a hash word from an input query vector. The hash word has B bands. The content addressable memory is to store hashes of a random access memory's data items. The hashes each have B bands. The content addressable memory is to compare the hashes against the hash word on a sequential band-by-band basis. The second stage circuit includes the random access memory and a compare and sort circuit. The compare and sort circuit is to receive the input query vector. The random access memory has crosswise bit lines coupled to the compare and sort circuit.
Type: Application
Filed: February 19, 2020
Publication date: June 11, 2020
Inventors: Wootaek LIM, Minchang CHO, Somnath PAUL, Charles AUGUSTINE, Suyoung BANG, Turbo MAJUMDER, Muhammad M. KHELLAH
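The two-stage flow can be sketched in Python, with banded hashes standing in for the CAM contents and a sort standing in for the compare-and-sort circuit. The band count, hash width, and the any-band match rule are assumptions made for illustration, not details from the publication.

```python
def band_hashes(vec, bands):
    """Split a vector into `bands` chunks and hash each one into a
    fixed-width word, mimicking the B-band hash stored in the CAM."""
    step = len(vec) // bands
    return [hash(tuple(vec[b * step:(b + 1) * step])) & 0xFFFFFFFF
            for b in range(bands)]

def two_stage_search(query, database, bands):
    """Stage 1 (CAM stand-in): keep candidates sharing at least one
    full hash band with the query. Stage 2 (compare-and-sort stand-in):
    rank the survivors by exact squared distance to the query."""
    q = band_hashes(query, bands)
    survivors = [v for v in database
                 if any(a == b for a, b in zip(band_hashes(v, bands), q))]
    return sorted(survivors,
                  key=lambda v: sum((x - y) ** 2 for x, y in zip(v, query)))
```

The cheap band-matching stage filters out most of the database so the exact distance computation only runs on a handful of survivors.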
-
Patent number: 10665222
Abstract: A system, article, and method provide temporal-domain feature extraction for automatic speech recognition.
Type: Grant
Filed: June 28, 2018
Date of Patent: May 26, 2020
Assignee: Intel Corporation
Inventors: Suyoung Bang, Muhammad Khellah, Somnath Paul, Charles Augustine, Turbo Majumder, Wootaek Lim, Tobias Bocklet, David Pearce