Patents by Inventor Deepak Abraham Mathaikutty

Deepak Abraham Mathaikutty has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240119269
    Abstract: A deep neural network (DNN) accelerator may facilitate dynamic sparsity-based acceleration and operate in various sparsity modes including a combined sparsity mode, a weight sparsity mode, an activation sparsity mode, and a dense mode. The DNN accelerator may receive a configuration parameter indicating whether to accelerate a layer based on sparsity in the layer's weight tensor. The configuration parameter may be generated offline, e.g., before the execution of the DNN is started. The DNN accelerator computes one or more activations of the layer during execution of a previous layer in the DNN. The one or more activations are one or more elements of an activation tensor of the layer. The DNN accelerator may determine a sparsity mode for the layer based on the configuration parameter and sparsity in the activation tensor. One or more sparse cells in the DNN accelerator may execute the layer in the sparsity mode.
    Type: Application
    Filed: December 18, 2023
    Publication date: April 11, 2024
    Inventors: Arnab Raha, Dinakar Kondru, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema
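    Sketch: a minimal Python illustration of the sparsity-mode selection described in the abstract above; the function name, the bitmap representation, and the zero-fraction threshold are illustrative assumptions, not the patented design.
      def choose_sparsity_mode(weight_sparsity_enabled, activation_bitmap, min_zero_fraction=0.25):
          """Pick a per-layer sparsity mode from an offline weight-sparsity flag and the
          observed activation sparsity bitmap (1 = nonzero element, 0 = zero element)."""
          zero_fraction = 1.0 - sum(activation_bitmap) / len(activation_bitmap)
          activation_sparse = zero_fraction >= min_zero_fraction
          if weight_sparsity_enabled and activation_sparse:
              return "combined"
          if weight_sparsity_enabled:
              return "weight"
          if activation_sparse:
              return "activation"
          return "dense"

      # Weight sparsity enabled offline, half of the activations are zero.
      print(choose_sparsity_mode(True, [1, 0, 1, 0, 1, 0, 0, 1]))  # -> "combined"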
  • Publication number: 20240111830
    Abstract: A non-linear activation function in a neural network may be approximated by one or more linear functions. The input range may be divided into input segments, each of which corresponds to a different exponent in the input range of the activation function and includes input data elements having the exponent. Target accuracies may be assigned to the identified exponents based on a statistical analysis of the input data elements. The target accuracy of an input segment may be used to determine one or more linear functions that approximate the activation function for the input segment. An error of an approximation of the activation function by a linear function for the input segment may be within the target accuracy. The parameters of the linear functions may be stored in a look-up table (LUT). During the execution of the DNN, the LUT may be used to execute the activation function.
    Type: Application
    Filed: December 8, 2023
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Umer Iftikhar Cheema, Robert Simofi, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru
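    Sketch: a minimal Python illustration of approximating an activation function with one linear segment per input exponent and a look-up table; the choice of sigmoid, positive-only inputs, and a single line fitted through the segment endpoints are illustrative assumptions, not the patented design.
      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      def build_lut(exponents):
          """One linear segment y = a*x + b per exponent e, covering [2**e, 2**(e+1))."""
          lut = {}
          for e in exponents:
              x0, x1 = 2.0 ** e, 2.0 ** (e + 1)
              a = (sigmoid(x1) - sigmoid(x0)) / (x1 - x0)
              b = sigmoid(x0) - a * x0
              lut[e] = (a, b)
          return lut

      def approx_sigmoid(x, lut):
          e = math.frexp(x)[1] - 1        # exponent of x: 2**e <= x < 2**(e+1)
          a, b = lut[e]
          return a * x + b

      lut = build_lut(range(-4, 4))
      print(approx_sigmoid(1.5, lut), sigmoid(1.5))   # approximation vs. exact value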
  • Publication number: 20240028895
    Abstract: A load module in a deep neural network (DNN) accelerator may receive a configuration parameter indicating a selection between an activation sparsity mode and a weight sparsity mode. The load module may read a sparse activation tensor, an activation sparsity bitmap, a sparse weight tensor, and a weight sparsity bitmap from a memory. The load module may densify one of the compressed tensors based on the sparsity mode and leave the other compressed tensor as is. The load module may load the dense tensor and the sparse tensor to a sparse cell. The sparse cell includes a sparsity module that may select one or more elements of the dense tensor based on the sparsity bitmap of the sparse tensor. The sparse cell also includes multiply-accumulate (MAC) units that perform MAC operations on the selected elements and the sparse tensor. MAC operations on unselected elements of the dense tensor are skipped.
    Type: Application
    Filed: September 28, 2023
    Publication date: January 25, 2024
    Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Dinakar Kondru, Umer Iftikhar Cheema, Martin Power, Niall Hanrahan
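    Sketch: a minimal NumPy illustration of densifying one compressed operand from its sparsity bitmap while the other operand stays compressed; the function name and the example values are illustrative assumptions, not the patented design.
      import numpy as np

      def densify(compressed, bitmap):
          """Re-insert zeros into a compressed (zero-free) tensor using its sparsity bitmap."""
          dense = np.zeros(bitmap.shape, dtype=compressed.dtype)
          dense[bitmap.astype(bool)] = compressed
          return dense

      # Weight sparsity mode: keep the weights compressed, densify the activations.
      act_compressed = np.array([3, 7, 2])
      act_bitmap = np.array([0, 1, 0, 1, 1, 0])
      print(densify(act_compressed, act_bitmap))   # [0 3 0 7 2 0]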
  • Publication number: 20240013040
    Abstract: A drain module may drain activations in an output tensor of a convolution from a PE array that performs the convolution. The drain module may extract activations generated in a collection of PE columns. The activations generated in the PE columns in the collection may be concatenated, e.g., activations generated in the first PE column of the collection may be followed by activations generated in the second PE column of the collection, and so on. The activations in the output tensor may be rearranged into activation vectors. Each activation vector may include activations in different output channels of the deep learning operation. The activations in each activation vector may have the same (X, Y) coordinate in the output tensor. The drain module may determine a memory address for an activation based on the activation's (X, Y, Z) coordinate in the output tensor and write the activation to the memory address.
    Type: Application
    Filed: September 26, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema, Dinakar Kondru
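    Sketch: a minimal Python illustration of deriving a memory address from an activation's (X, Y, Z) coordinate; the channel-contiguous, row-major layout and the parameter names are illustrative assumptions, not the patented design.
      def activation_address(x, y, z, out_width, out_channels, bytes_per_elem=1, base=0):
          """All output channels (Z) of one (X, Y) point are stored contiguously as an
          activation vector; the vectors are laid out in row-major (Y, X) order."""
          vector_index = y * out_width + x
          return base + (vector_index * out_channels + z) * bytes_per_elem

      # Activation at x=2, y=1, z=5 in a 4-wide output tensor with 16 output channels.
      print(activation_address(2, 1, 5, out_width=4, out_channels=16))   # 101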
  • Publication number: 20230394312
    Abstract: Activations (e.g., output activations) or weights of intermediate layers of deep neural networks (DNNs) can be pruned to increase sparsity and reduce the amount of computation required for performing the computations in the layers or subsequent layers. A pruning threshold may be determined, e.g., through an iterative process, and activations or weights having absolute values lower than the pruning threshold may be changed to zero. A first pruning threshold may be used to prune an output tensor or kernel of a layer. The loss in the accuracy of the DNN due to the pruning may be determined. A second pruning threshold may be determined based on the first pruning threshold and the accuracy loss. The DNN may be modified by adding a pruning operation to the layer. The pruning operation can prune output tensors or kernels of the layer based on the second pruning threshold.
    Type: Application
    Filed: August 22, 2023
    Publication date: December 7, 2023
    Applicant: Intel Corporation
    Inventors: Soumendu Kumar Ghosh, Shamik Kundu, Arnab Raha, Deepak Abraham Mathaikutty
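    Sketch: a minimal NumPy illustration of magnitude-based pruning with an iteratively refined threshold; the accuracy budget, the refinement step, and the function names are illustrative assumptions, not the patented design.
      import numpy as np

      def prune(tensor, threshold):
          """Zero out elements whose absolute value is below the pruning threshold."""
          return np.where(np.abs(tensor) < threshold, 0.0, tensor)

      def refine_threshold(threshold, accuracy_loss, budget=0.01, step=0.5):
          """Shrink the threshold if the measured accuracy loss exceeds the budget,
          otherwise grow it to prune more aggressively."""
          return threshold * (1 - step) if accuracy_loss > budget else threshold * (1 + step)

      t0 = 0.05
      pruned = prune(np.array([0.02, -0.3, 0.04, 0.9]), t0)
      t1 = refine_threshold(t0, accuracy_loss=0.02)    # too much loss -> smaller threshold
      print(pruned, t1)                                # [ 0.  -0.3  0.   0.9] 0.025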
  • Publication number: 20230376274
    Abstract: A fused dot-product multiply-accumulate (MAC) circuit may support variable precisions of floating-point data elements to perform computations (e.g., MAC operations) in deep learning operations. An operation mode of the circuit may be selected based on the precision of an input element. The operation mode may be a FP16 mode or a FP8 mode. In the FP8 mode, product exponents may be computed based on exponents of floating-point input elements. A maximum exponent may be selected from the one or more product exponents. A global maximum exponent may be selected from a plurality of maximum exponents. A product mantissa may be computed and aligned with another product mantissa based on a difference between the global maximum exponent and a corresponding maximum exponent. An adder tree may accumulate the aligned product mantissas and compute a partial sum mantissa. The partial sum mantissa may be normalized using the global maximum exponent.
    Type: Application
    Filed: July 31, 2023
    Publication date: November 23, 2023
    Applicant: Intel Corporation
    Inventors: Mark Anders, Arnab Raha, Amit Agarwal, Steven Hsu, Deepak Abraham Mathaikutty, Ram K. Krishnamurthy, Martin Power
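    Sketch: a minimal Python illustration of aligning product mantissas to a global maximum exponent before accumulation; the integer (mantissa, exponent) product representation is a simplifying assumption, not the patented FP8/FP16 datapath.
      def aligned_sum(products):
          """products: list of (mantissa, exponent) pairs for one dot product.
          Align every mantissa to the global maximum exponent, then accumulate."""
          global_max = max(e for _, e in products)
          acc = 0
          for m, e in products:
              acc += m >> (global_max - e)     # right shift drops low bits, as hardware would
          return acc, global_max               # partial-sum mantissa and its exponent

      # Two products: 12 * 2**3 and 5 * 2**1.
      print(aligned_sum([(12, 3), (5, 1)]))    # (13, 3), i.e. 13 * 2**3 = 104 (exact: 106)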
  • Publication number: 20230368030
    Abstract: Weights can be pruned during DNN training to increase sparsity in the weights and reduce the amount of computation required for performing the deep learning operations in DNNs. A DNN layer may have one or more weight tensors corresponding to one or more output channels of the layer. A weight tensor has weights, the values of which are determined by training the DNN. A weight tensor may have a dimension corresponding to the input channels of the layer. The weight tensor may be partitioned into subtensors, each of which has a subset of the input channels. The subtensors may have the same number of input channels. One or more subtensors may be selected, e.g., based on the weights in the one or more subtensors. The weights in a selected subtensor are pruned, e.g., changed to zeros. The weights in an unselected subtensor may be modified by further training the DNN.
    Type: Application
    Filed: July 25, 2023
    Publication date: November 16, 2023
    Inventors: Arnab Raha, Michael Wu, Deepak Abraham Mathaikutty, Martin Langhammer, Nihat Tunali
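    Sketch: a minimal NumPy illustration of partitioning the input-channel axis into equal subtensors and pruning the ones selected by their weights; the L1-norm selection rule and the function name are illustrative assumptions, not the patented design.
      import numpy as np

      def prune_weakest_groups(weights, group_size, n_prune):
          """weights: (out_channels, in_channels). Split the input channels into equal
          groups, select the n_prune groups with the smallest L1 norm per output
          channel, and set their weights to zero."""
          out_c, in_c = weights.shape
          groups = weights.reshape(out_c, in_c // group_size, group_size).copy()
          norms = np.abs(groups).sum(axis=2)                 # per-group L1 norm
          weakest = np.argsort(norms, axis=1)[:, :n_prune]   # groups selected for pruning
          for row, cols in enumerate(weakest):
              groups[row, cols] = 0.0
          return groups.reshape(out_c, in_c)

      w = np.arange(8, dtype=float).reshape(2, 4)   # 2 output channels, 4 input channels
      print(prune_weakest_groups(w, group_size=2, n_prune=1))   # weakest group per row zeroed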
  • Publication number: 20230351181
    Abstract: An activation function unit can compute activation functions approximated by Taylor series. The activation function unit may include a plurality of compute elements. Each compute element may include two multipliers and an accumulator. The first multiplier may compute intermediate products using an activation, such as an output activation of a DNN layer. The second multiplier may compute terms of Taylor series approximating an activation function based on the intermediate products from the first multiplier and coefficients of the Taylor series. The accumulator may compute a partial sum of the terms as an output of the activation function. The number of the terms may be determined based on a predetermined accuracy of the output of the activation function. The activation function unit may process multiple activations. Different activations may be input into different compute elements in different clock cycles. The activation function unit may compute activation functions with different accuracies.
    Type: Application
    Filed: July 5, 2023
    Publication date: November 2, 2023
    Inventors: Umer Iftikhar Cheema, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru, Raymond Jit-Hung Sung, Soumendu Kumar Ghosh
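    Sketch: a minimal Python illustration of the two-multiplier, one-accumulator Taylor-series datapath; the exponential function and the number of terms are illustrative assumptions, not the patented design.
      import math

      def taylor_activation(x, coefficients):
          """Multiplier 1 builds the intermediate product x**k incrementally, multiplier 2
          scales it by the Taylor coefficient, and the accumulator sums the terms."""
          power = 1.0            # x**0
          acc = 0.0
          for c in coefficients:
              acc += c * power   # multiplier 2 + accumulator
              power *= x         # multiplier 1 prepares the next term
          return acc

      # exp(x) ~ sum of x**k / k!; the term count would follow the accuracy target.
      coeffs = [1.0 / math.factorial(k) for k in range(6)]
      print(taylor_activation(0.5, coeffs), math.exp(0.5))   # approximation vs. exact value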
  • Publication number: 20230334289
    Abstract: A deep neural network (DNN) accelerator includes one or more compute blocks that perform deep learning operations in DNNs. A compute block includes a memory and one or more processing elements. The memory may include bank groups, each of which includes memory banks. The memory may also include a group selection module, buffers, interconnects, and bank selection modules. The group selection module may select a bank group for a data transfer request from a processing element and store the data transfer request in a buffer associated with the bank group. The memory address in the data transfer request may be transmitted from the buffer to a bank selection module associated with the bank group through an interconnect. The bank selection module may select a memory bank in the bank group based on the memory address. Data can be read from or written into the selected memory bank.
    Type: Application
    Filed: June 20, 2023
    Publication date: October 19, 2023
    Applicant: Intel Corporation
    Inventors: Robert Cezary Zaglewski, Deepak Abraham Mathaikutty
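    Sketch: a minimal Python illustration of selecting a bank group and a memory bank from a memory address; low-order address interleaving is an illustrative assumption, not the patented selection logic.
      def select_bank(address, n_groups, banks_per_group):
          """The lowest address bits pick the bank group; the next bits pick the
          memory bank inside that group."""
          group = address % n_groups
          bank = (address // n_groups) % banks_per_group
          return group, bank

      print(select_bank(0x2D, n_groups=4, banks_per_group=8))   # (1, 3)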
  • Publication number: 20230252299
    Abstract: Computations in processing elements (PEs) for executing a deep neural network (DNN) may be accelerated based on sparsity. A compressed activation operand and a compressed weight operand may be stored. The compressed activation operand includes one or more nonzero activations in an activation operand. The compressed weight operand includes one or more nonzero weights in a weight operand. A sparsity module associated with the PE may generate a bitmap based on an activation sparsity vector of the activation operand and a weight sparsity vector of the weight operand. The sparsity module identifies a nonzero activation (or a nonzero weight) from the compressed activation operand (or the compressed weight operand) based on the bitmap. The sparsity module may detect a fault in identifying the nonzero activation or the nonzero weight based on the number of nonzero elements in the bitmap. The sparsity module may further mitigate the fault.
    Type: Application
    Filed: April 21, 2023
    Publication date: August 10, 2023
    Inventors: Shamik Kundu, Arnab Raha, Deepak Abraham Mathaikutty
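    Sketch: a minimal NumPy illustration of combining sparsity vectors and flagging a fault when a bitmap's nonzero count disagrees with its compressed operand; this specific consistency check is an illustrative assumption, not the patented fault-detection scheme.
      import numpy as np

      def combine_and_check(act_bitmap, wgt_bitmap, act_compressed, wgt_compressed):
          """AND the sparsity vectors to find positions where both operands are nonzero;
          flag a fault if a compressed operand's length disagrees with its bitmap."""
          combined = act_bitmap & wgt_bitmap
          fault = (act_bitmap.sum() != len(act_compressed) or
                   wgt_bitmap.sum() != len(wgt_compressed))
          return combined, fault

      act_bm = np.array([1, 0, 1, 1], dtype=np.uint8)
      wgt_bm = np.array([1, 1, 0, 1], dtype=np.uint8)
      print(combine_and_check(act_bm, wgt_bm, np.array([5, 2, 9]), np.array([3, 4, 6])))
      # combined = [1 0 0 1], fault = False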
  • Publication number: 20230229917
    Abstract: A compute block can perform hybrid multiply-accumulate (MAC) operations. The compute block may include a weight compression module and a processing element (PE) array. The weight compression module may select a first group of one or more weights and a second group of one or more weights from a weight tensor of a DNN (deep neural network) layer. A weight in the first group is quantized to a power of two value. A weight in the second group is quantized to an integer. The integer and the exponent of the power of two value may be stored in a memory in lieu of the original values of the weights. A PE in the PE array includes a shifter configured to shift an activation of the layer by the exponent of the power of two value and a multiplier configured to multiply the integer by another activation of the layer.
    Type: Application
    Filed: March 15, 2023
    Publication date: July 20, 2023
    Applicant: Intel Corporation
    Inventors: Michael Wu, Arnab Raha, Deepak Abraham Mathaikutty, Nihat Tunali, Martin Langhammer
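    Sketch: a minimal Python illustration of a hybrid PE step that applies a power-of-two weight with a shift and an integer weight with a multiplier; the operand values and function name are illustrative assumptions, not the patented design.
      def hybrid_mac(act_a, act_b, po2_exponent, int_weight):
          """A shifter applies the power-of-two weight (2**po2_exponent) to one activation;
          a multiplier applies the integer weight to another; the results are accumulated."""
          shifted = act_a << po2_exponent      # act_a * 2**po2_exponent
          multiplied = act_b * int_weight
          return shifted + multiplied

      # Weight 8 stored as the exponent 3, weight 5 stored as a plain integer.
      print(hybrid_mac(act_a=7, act_b=3, po2_exponent=3, int_weight=5))   # 7*8 + 3*5 = 71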
  • Publication number: 20230229507
    Abstract: Computations in processing elements (PEs) for executing a deep neural network are scheduled via a computation scheduler based on sparsity in input data of the computations to reduce voltage droops. Each PE may perform a computation on an input operand and a weight operand. The computation scheduler may predict the workload of the PE for the computation based on a combined sparsity bitmap, which may be generated based on a sparsity bitmap of the input operand and a sparsity bitmap of the weight operand. The computation scheduler can schedule the starts of the computations in the PEs based on the predicted workloads of the PEs. The computation scheduler may instruct the PE having the highest workload to start the computation first and instruct the other PEs to start computations later. In some embodiments, the computations in the PEs may end in the same clock cycle.
    Type: Application
    Filed: March 8, 2023
    Publication date: July 20, 2023
    Applicant: Intel Corporation
    Inventors: Raymond Jit-Hung Sung, Arnab Raha, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema
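    Sketch: a minimal NumPy illustration of predicting per-PE workloads from combined sparsity bitmaps and staggering start cycles so the computations finish together; the popcount-based workload model is an illustrative assumption, not the patented scheduler.
      import numpy as np

      def schedule_starts(act_bitmaps, wgt_bitmaps):
          """Predicted workload per PE = number of nonzero activation/weight pairs
          (popcount of the ANDed bitmaps). The heaviest PE starts first; the others
          are delayed so all PEs end in the same clock cycle."""
          workloads = [(a & w).sum() for a, w in zip(act_bitmaps, wgt_bitmaps)]
          longest = max(workloads)
          return [longest - w for w in workloads]   # start-cycle offset per PE

      acts = [np.array([1, 1, 1, 0]), np.array([1, 0, 0, 0])]
      wgts = [np.array([1, 1, 0, 1]), np.array([1, 1, 1, 1])]
      print(schedule_starts(acts, wgts))            # [0, 1]: PE 0 starts first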
  • Publication number: 20230221994
    Abstract: A compute block can dynamically uncompress compressed data for executing a channel-separable operation. The compressed data includes one or more nonzero-valued data elements. The compressed data may be stored in a datastore along with a sparsity bitmap of an input operand including the compressed data. An uncompressing module may determine whether the input operand includes any zero-valued data element, e.g., by determining whether the sparsity bitmap includes a zero-valued bit. After determining that the sparsity bitmap includes a zero-valued bit, the uncompressing module inserts a zero-valued data element into the compressed data based on the position of the bit in the sparsity bitmap, generates uncompressed data, and updates the sparsity bitmap so that all the bits become ones. The uncompressed dense data is transmitted to one or more processing elements (PEs) in the compute block for computing an output operand based on the uncompressed dense data.
    Type: Application
    Filed: March 16, 2023
    Publication date: July 13, 2023
    Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung, Umer Iftikhar Cheema, Dinakar Kondru, Soumendu Kumar Ghosh
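    Sketch: a minimal NumPy illustration of uncompressing an operand only when its sparsity bitmap contains a zero-valued bit, then marking the bitmap dense; the array representation is an illustrative assumption, not the patented design.
      import numpy as np

      def uncompress(compressed, bitmap):
          """Insert a zero for every 0-bit in the sparsity bitmap, then set all bitmap
          bits to one so downstream logic treats the operand as dense."""
          if not np.all(bitmap):                     # any zero-valued bit present?
              dense = np.zeros(bitmap.shape, dtype=compressed.dtype)
              dense[bitmap.astype(bool)] = compressed
              bitmap = np.ones_like(bitmap)
          else:
              dense = compressed
          return dense, bitmap

      print(uncompress(np.array([4, 9]), np.array([0, 1, 0, 1])))
      # (array([0, 4, 0, 9]), array([1, 1, 1, 1]))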
  • Publication number: 20230140173
    Abstract: A DNN accelerator includes one or more heterogeneous tile sets. A heterogeneous tile set includes tiles of different sizes, e.g., PE arrays including different numbers of columns or rows. The DNN accelerator may identify a tile set from the tile sets for running a DNN model based on dimensions of output tensors of convolutional layers in the DNN. Within the selected tile set, a tile may be selected for a convolutional layer in the DNN, e.g., based on dimensions of the output tensor of the convolutional layer and the size of the tile. After the tile is selected, the workload for running a convolutional operation of the layer may be partitioned and assigned to individual PEs in the tile by partitioning the output tensor into output tensor segments. The workload of computing an individual output tensor segment can be assigned to an individual PE in the tile.
    Type: Application
    Filed: August 19, 2022
    Publication date: May 4, 2023
    Inventors: Arnab Raha, Umer Iftikhar Cheema, Praveen Kumar Gupta, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung
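    Sketch: a minimal Python illustration of picking a tile from a heterogeneous tile set based on the layer's output-tensor dimensions; the idle-PE cost metric is an illustrative assumption, not the patented selection rule.
      def select_tile(tile_sizes, out_height, out_width):
          """Pick the PE-array tile whose (rows, columns) shape best covers the output
          tensor, i.e. leaves the smallest fraction of PEs idle."""
          def idle_fraction(tile):
              rows, cols = tile
              used = min(rows, out_height) * min(cols, out_width)
              return 1.0 - used / (rows * cols)
          return min(tile_sizes, key=idle_fraction)

      tiles = [(16, 16), (8, 32), (32, 8)]           # one heterogeneous tile set
      print(select_tile(tiles, out_height=8, out_width=64))   # (8, 32)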
  • Publication number: 20230073661
    Abstract: A DNN (deep neural network) accelerator may accelerate deep learning, such as convolutions in frontend layers, through a scheduler for loading data to be processed. The DNN accelerator may store, in a memory, an input tensor of a convolutional layer in a DNN. The convolutional layer may be the first layer or a layer that is arranged before one or more other convolutional layers in the DNN such that data processed by the layer can be efficiently reused across data load rounds. The input tensor includes one or more channels. A channel includes activations arranged in rows and columns. The DNN accelerator may read at least a portion of the input tensor from the memory into a datastore. The datastore includes multiple databanks. The DNN accelerator may provide a vector of one or more activations to a processing element for operations such as multiplications on the vector.
    Type: Application
    Filed: November 14, 2022
    Publication date: March 9, 2023
    Applicant: Intel Corporation
    Inventors: Deepak Abraham Mathaikutty, Arnab Raha, Umer Iftikhar Cheema, Raymond Jit-Hung Sung
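    Sketch: a minimal NumPy illustration of loading rows of one input channel into datastore banks and handing a row vector to a processing element; the round-robin bank placement is an illustrative assumption, not the patented load schedule.
      import numpy as np

      def load_rows_to_databanks(channel, n_banks):
          """Place the rows of one input channel into datastore banks round-robin, so a
          row loaded once can feed several PE reads across data load rounds."""
          banks = [[] for _ in range(n_banks)]
          for r, row in enumerate(channel):
              banks[r % n_banks].append(row)
          return banks

      channel = np.arange(12).reshape(4, 3)    # 4 rows x 3 columns of activations
      banks = load_rows_to_databanks(channel, n_banks=2)
      print(banks[0][0])                       # vector of activations handed to a PE: [0 1 2]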
  • Publication number: 20230059976
    Abstract: A DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zero points from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied by a quantization scale to produce a floating-point value.
    Type: Application
    Filed: October 18, 2022
    Publication date: February 23, 2023
    Applicant: Intel Corporation
    Inventors: Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, Martin Power, Umer Iftikhar Cheema, David Thomas Bernard
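    Sketch: a minimal NumPy illustration of subtracting zero points before the MAC array, accumulating in integers, and applying the quantization scale to the output; the example quantization parameters are illustrative assumptions, not the patented PE design.
      import numpy as np

      def quantized_mac(q_acts, q_wgts, act_zp, wgt_zp, scale):
          """Subtractors produce intermediate activations and weights; the MAC unit
          accumulates integer products; one multiply by the scale yields a float."""
          inter_a = q_acts.astype(np.int32) - act_zp    # intermediate activations
          inter_w = q_wgts.astype(np.int32) - wgt_zp    # intermediate weights
          acc = int(np.dot(inter_a, inter_w))           # sequential MAC cycles
          return acc * scale

      print(quantized_mac(np.array([130, 128], dtype=np.uint8),
                          np.array([7, 3], dtype=np.int8),
                          act_zp=128, wgt_zp=0, scale=0.25))   # (2*7 + 0*3) * 0.25 = 3.5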
  • Publication number: 20230017662
    Abstract: A DNN accelerator includes a DMA engine that can rearrange the weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on a structure of a PE array, e.g., based on the number of activated PE columns in the PE array. Then the DMA engine may partition a virtual bank into a plurality of virtual sub-banks. The DMA engine may also identify data blocks from different ones of the plurality of virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine forms a linear data structure by interleaving the data blocks. The DMA engine can write the linear data structure into another memory (e.g., SRAM).
    Type: Application
    Filed: September 16, 2022
    Publication date: January 19, 2023
    Inventors: Sudheendra Kadri, Darren Crews, Deepak Abraham Mathaikutty, Andrea Deidda, Arnab Raha, Kevin Brady, David Thomas Bernard
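    Sketch: a minimal NumPy illustration of partitioning a weight tensor into virtual banks and sub-banks and interleaving blocks into a linear layout; the flat tensor, equal splits, and round-robin order are illustrative assumptions, not the patented DMA layout.
      import numpy as np

      def interleave_weight_blocks(weights, n_banks, n_subbanks):
          """Split the flattened weight tensor into virtual banks, split each bank into
          virtual sub-banks, then take one block from each sub-bank in turn to form
          the linear data structure written to the destination memory."""
          banks = np.array_split(weights.flatten(), n_banks)
          subbanks = [np.array_split(b, n_subbanks) for b in banks]
          linear = []
          for s in range(n_subbanks):             # round-robin across sub-banks
              for b in range(n_banks):
                  linear.append(subbanks[b][s])
          return np.concatenate(linear)

      w = np.arange(16)                           # toy weight tensor read from DRAM
      print(interleave_weight_blocks(w, n_banks=2, n_subbanks=2))
      # blocks interleaved as 0-3, 8-11, 4-7, 12-15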
  • Publication number: 20230014656
    Abstract: A memory array of a compute tile may store activations or weights of a DNN. The memory array may include databanks for storing contexts, context MUXs, and byte MUXs. A databank may store a context with flip-flop arrays, each of which includes a sequence of flip-flops. A logic gate and an ICG unit may gate flip-flops and control whether states of the flip-flops can be changed. The data gating can prevent a context not selected for the databank from inadvertently toggling and wasting power. A context MUX may read a context from different flip-flop arrays in a databank based on gray-coded addresses. A byte MUX can combine bits from different bytes in a context read by the context MUX. The memory array may be implemented with bit packing to reduce the distance between the context MUXs and byte MUXs and thus the lengths of the wires connecting them.
    Type: Application
    Filed: September 23, 2022
    Publication date: January 19, 2023
    Inventors: Raymond Jit-Hung Sung, Deepak Abraham Mathaikutty, Amit Agarwal, David Thomas Bernard, Steven Hsu, Martin Power, Conor Byrne, Arnab Raha
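    Sketch: a minimal Python illustration of gray-coded addressing, in which adjacent addresses differ by a single bit so address lines toggle less; the conversion helpers are generic, not the patented context MUX.
      def to_gray(addr):
          """Binary-to-Gray conversion: consecutive addresses differ in exactly one bit."""
          return addr ^ (addr >> 1)

      def from_gray(gray):
          """Gray-to-binary conversion."""
          addr = 0
          while gray:
              addr ^= gray
              gray >>= 1
          return addr

      print([to_gray(a) for a in range(4)])   # [0, 1, 3, 2]
      print(from_gray(3))                     # 2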
  • Publication number: 20230018857
    Abstract: Sparsity processing within a compute block can be done on unpacked data. The compute block includes a sparsity decoder that generates a combined sparsity vector from an activation sparsity vector and a weight sparsity vector. The activation sparsity vector indicates positions of non-zero valued activations in an activation context. The weight sparsity vector indicates positions of non-zero valued weights in a weight context. The combined sparsity vector comprises one or more zero valued bits and one or more non-zero valued bits. The sparsity decoder may determine the position of a non-zero valued bit in the combined sparsity vector and determine an address for the non-zero valued activation and the non-zero valued weight based on the position of the non-zero valued bit. The non-zero valued activation and the non-zero valued weight may be provided to a PE for performing MAC operations.
    Type: Application
    Filed: September 19, 2022
    Publication date: January 19, 2023
    Inventors: Martin Power, Conor Byrne, Niall Hanrahan, Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, David Thomas Bernard, Kevin Brady, Martin-Thomas Grymel
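    Sketch: a minimal NumPy illustration of combining sparsity vectors and deriving compressed-operand addresses from a bit position; computing each address as the count of earlier ones in the operand's own vector is an illustrative assumption, not the patented decoder.
      import numpy as np

      def decode(act_vector, wgt_vector, position):
          """The combined sparsity vector is the AND of the two vectors; the address of
          a non-zero valued operand is the number of earlier ones in its own vector."""
          combined = act_vector & wgt_vector
          if not combined[position]:
              return combined, None, None
          act_addr = int(act_vector[:position].sum())   # index into compressed activations
          wgt_addr = int(wgt_vector[:position].sum())   # index into compressed weights
          return combined, act_addr, wgt_addr

      act = np.array([1, 0, 1, 1], dtype=np.uint8)
      wgt = np.array([1, 1, 1, 0], dtype=np.uint8)
      print(decode(act, wgt, position=2))   # combined = [1 0 1 0]; addresses 1 and 2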
  • Publication number: 20230008622
    Abstract: A DNN accelerator may perform 1×N kernel decomposition to decompose a convolutional kernel into kernel vectors, each of which includes multiple weights. Through the kernel decomposition, a weight operand may be generated from a filter. The DNN accelerator converts an input tensor into input operands. An input operand includes activations and has the same size as the weight operand. The DNN accelerator may read a first activation in the input operand from memory to an internal memory of a first PE and read a second activation in the input operand from the memory to an internal memory of a second PE. The first PE may receive the second activation from the second PE through activation broadcasting between the two PEs and perform MAC operations on the input operand and weight operand. The second PE may perform MAC operations on another input operand in the input tensor and the weight operand.
    Type: Application
    Filed: September 22, 2022
    Publication date: January 12, 2023
    Inventors: Richard Boyd, David Thomas Bernard, Deepak Abraham Mathaikutty, Martin Power, Niall Hanrahan
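    Sketch: a minimal NumPy illustration of a 1xN decomposed convolution in which neighbouring PEs compute on overlapping input operands and so can share activations instead of re-reading memory; the two-PE setup and the example values are illustrative assumptions, not the patented design.
      import numpy as np

      def decomposed_1xN_conv(row, kernel_row, n_pe=2):
          """Each PE holds the same 1xN weight operand and a different input operand;
          overlapping windows mean a PE can reuse an activation its neighbour fetched."""
          N = len(kernel_row)
          outputs = []
          for pe in range(n_pe):                 # PE `pe` starts at input column `pe`
              window = row[pe:pe + N]
              outputs.append(float(np.dot(window, kernel_row)))
          return outputs

      row = np.array([1.0, 2.0, 3.0, 4.0])
      print(decomposed_1xN_conv(row, kernel_row=np.array([0.5, -1.0, 2.0])))   # [4.5, 6.0]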