Patents by Inventor Umer Iftikhar Cheema

Umer Iftikhar Cheema has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO). Short illustrative code sketches of the techniques described in these abstracts follow the listing.

  • Publication number: 20240160695
    Abstract: A non-linear activation function may be approximated by linear functions. The input range of the activation function may be divided into input segments. One or more input segments may be selected based on statistical analysis of input data elements in the input range. A parameter of a first linear function that approximates the activation function for at least part of a selected input segment may be stored in a first portion of a first look-up table (LUT). The first portion of the first LUT is dedicated to a first group of post-processing engines (PPEs). A parameter of a second linear function that approximates the activation function for at least part of an unselected input segment may be stored in a shared pool of LUT entries, which includes a second portion of the first LUT and a portion of a second LUT and is shared by multiple groups of PPEs.
    Type: Application
    Filed: December 21, 2023
    Publication date: May 16, 2024
    Applicant: Intel Corporation
    Inventors: Dinakar Kondru, Deepak Abraham Mathaikutty, Arnab Raha, Umer Iftikhar Cheema
  • Publication number: 20240119269
    Abstract: A deep neural network (DNN) accelerator may facilitate dynamic sparsity-based acceleration and operate in various sparsity modes, including a combined sparsity mode, a weight sparsity mode, an activation sparsity mode, and a dense mode. The DNN accelerator may receive a configuration parameter indicating whether to accelerate a layer based on sparsity in the layer's weight tensor. The configuration parameter may be generated offline, e.g., before the execution of the DNN is started. The DNN accelerator computes one or more activations of the layer while executing a previous layer in the DNN. The one or more activations are one or more elements of an activation tensor of the layer. The DNN accelerator may determine a sparsity mode for the layer based on the configuration parameter and sparsity in the activation tensor. One or more sparse cells in the DNN accelerator may execute the layer in the sparsity mode.
    Type: Application
    Filed: December 18, 2023
    Publication date: April 11, 2024
    Inventors: Arnab Raha, Dinakar Kondru, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema
  • Publication number: 20240111830
    Abstract: A non-linear activation function in a neural network may be approximated by one or more linear functions. The input range may be divided into input segments, each of which corresponds to a different exponent in the input range of the activation function and includes input data elements having the exponent. Target accuracies may be assigned to the identified exponents based on a statistical analysis of the input data elements. The target accuracy of an input segment may be used to determine one or more linear functions that approximate the activation function for the input segment. An error of an approximation of the activation function by a linear function for the input segment may be within the target accuracy. The parameters of the linear functions may be stored in a look-up table (LUT). During the execution of the neural network, the LUT may be used to execute the activation function.
    Type: Application
    Filed: December 8, 2023
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Umer Iftikhar Cheema, Robert Simofi, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru
  • Publication number: 20240028895
    Abstract: A load module in a deep neural network (DNN) accelerator may receive a configuration parameter indicating a selection between an activation sparsity mode and a weight sparsity mode. The load module may read a sparse activation tensor, an activation sparsity bitmap, a sparse weight tensor, and a weight sparsity bitmap from a memory. The load module may densify one of the compressed tensors based on the sparsity mode and leave the other compressed tensor as is. The load module may load the dense tensor and the sparse tensor to a sparse cell. The sparse cell includes a sparsity module that may select one or more elements of the dense tensor based on the sparsity bitmap of the sparse tensor. The sparse cell also includes multiply-accumulate (MAC) units that perform MAC operations on the selected elements and the sparse tensor. MAC operations on unselected elements of the dense tensor are skipped.
    Type: Application
    Filed: September 28, 2023
    Publication date: January 25, 2024
    Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Dinakar Kondru, Umer Iftikhar Cheema, Martin Power, Niall Hanrahan
  • Publication number: 20240013040
    Abstract: A drain module may drain activations in an output tensor of a convolution from a processing element (PE) array that performs the convolution. The drain module may extract activations generated in a collection of PE columns. The activations generated in the PE columns in the collection may be concatenated, e.g., activations generated in the first PE column of the collection may be followed by activations generated in the second PE column of the collection, and so on. The activations in the output tensor may be rearranged into activation vectors. Each activation vector may include activations in different output channels of the deep learning operation. The activations in each activation vector may have the same (X, Y) coordinate in the output tensor. The drain module may determine a memory address for an activation based on the activation's (X, Y, Z) coordinate in the output tensor and write the activation to the memory address.
    Type: Application
    Filed: September 26, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema, Dinakar Kondru
  • Publication number: 20230351181
    Abstract: An activation function unit can compute activation functions approximated by Taylor series. The activation function unit may include a plurality of compute elements. Each compute element may include two multipliers and an accumulator. The first multiplier may compute intermediate products using an activation, such as an output activation of a deep neural network (DNN) layer. The second multiplier may compute terms of Taylor series approximating an activation function based on the intermediate products from the first multiplier and coefficients of the Taylor series. The accumulator may compute a partial sum of the terms as an output of the activation function. The number of terms may be determined based on a predetermined accuracy of the output of the activation function. The activation function unit may process multiple activations. Different activations may be input into different compute elements in different clock cycles. The activation function unit may compute activation functions with different accuracies.
    Type: Application
    Filed: July 5, 2023
    Publication date: November 2, 2023
    Inventors: Umer Iftikhar Cheema, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru, Raymond Jit-Hung Sung, Soumendu Kumar Ghosh
  • Publication number: 20230229507
    Abstract: Computations in processing elements (PEs) for executing a deep neural network are scheduled via a computation scheduler based on sparsity in input data of the computations to reduce voltage droops. Each PE may compute an input operand and a weight operand in a computation. The computation scheduler may predict the workload of the PE for the computation based on a combined sparsity bitmap, which may be generated based on a sparsity bitmap of the input operand and a sparsity bitmap of the weight operand. The computation scheduler can schedule the starts of the computations in the PEs based on the predicted workloads of the PEs. The computation scheduler may instruct the PE having the highest workload to start the computation first and instruct the other PEs to start computations later. In some embodiments, the computations in the PEs may end in the same clock cycle.
    Type: Application
    Filed: March 8, 2023
    Publication date: July 20, 2023
    Applicant: Intel Corporation
    Inventors: Raymond Jit-Hung Sung, Arnab Raha, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema
  • Publication number: 20230221994
    Abstract: A compute block can dynamically uncompress compressed data for executing a channel-separable operation. The compressed data includes one or more nonzero-valued data elements. The compressed data may be stored in a datastore along with a sparsity bitmap of an input operand including the compressed data. An uncompressing module may determine whether the input operand includes any zero-valued data element, e.g., by determining whether the sparsity bitmap includes a zero-valued bit. After determining that the sparsity bitmap includes a zero-valued bit, the uncompressing module inserts a zero-valued data element into the compressed data based on a position of the bit in the sparsity bitmap, generates uncompressed data, and updates the sparsity bitmap so that all the bits become ones. The uncompressed dense data is transmitted to one or more processing elements (PEs) in the compute block for computing an output operand based on the uncompressed dense data.
    Type: Application
    Filed: March 16, 2023
    Publication date: July 13, 2023
    Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung, Umer Iftikhar Cheema, Dinakar Kondru, Soumendu Kumar Ghosh
  • Publication number: 20230140173
    Abstract: A deep neural network (DNN) accelerator includes one or more heterogeneous tile sets. A heterogeneous tile set includes tiles of different sizes, e.g., processing element (PE) arrays including different numbers of columns or rows. The DNN accelerator may identify a tile set from the tile sets for running a DNN model based on dimensions of output tensors of convolutional layers in the DNN. Within the selected tile set, a tile may be selected for a convolutional layer in the DNN, e.g., based on dimensions of the output tensor of the convolutional layer and the size of the tile. After the tile is selected, the workload for running a convolutional operation of the layer may be partitioned and assigned to individual PEs in the tile by partitioning the output tensor into output tensor segments. The workload of computing an individual output tensor segment can be assigned to an individual PE in the tile.
    Type: Application
    Filed: August 19, 2022
    Publication date: May 4, 2023
    Inventors: Arnab Raha, Umer Iftikhar Cheema, Praveen Kumar Gupta, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung
  • Publication number: 20230073661
    Abstract: A deep neural network (DNN) accelerator may accelerate deep learning operations, such as convolutions in frontend layers, through a scheduler for loading data to be processed. The DNN accelerator may store, in a memory, an input tensor of a convolutional layer in a DNN. The convolutional layer may be the first layer or a layer that is arranged before the one or more other convolutional layers in the DNN, such that data processed by the first layer can be efficiently reused across data load rounds. The input tensor includes one or more channels. A channel includes activations arranged in rows and columns. The DNN accelerator may read at least a portion of the input tensor from the memory into a datastore. The datastore includes a plurality of databanks. The DNN accelerator may provide a vector of one or more activations to a processing element for operations such as multiplications on the vector.
    Type: Application
    Filed: November 14, 2022
    Publication date: March 9, 2023
    Applicant: Intel Corporation
    Inventors: Deepak Abraham Mathaikutty, Arnab Raha, Umer Iftikhar Cheema, Raymond Jit-Hung Sung
  • Publication number: 20230059976
    Abstract: A deep neural network (DNN) accelerator may include a processing element (PE) array performing multiply-accumulate (MAC) operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zeropoints from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied by a quantization scale to produce a floating-point value.
    Type: Application
    Filed: October 18, 2022
    Publication date: February 23, 2023
    Applicant: Intel Corporation
    Inventors: Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, Martin Power, Umer Iftikhar Cheema, David Thomas Bernard
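
The sketches below give one plausible, simplified reading of each abstract above, in Python. Every function name, threshold, data layout, and numeric value in them is an assumption made for illustration; none is taken from the patents themselves.

Publication 20240160695 splits the linear-approximation parameters of an activation function between a LUT portion dedicated to one group of PPEs and a pool shared across PPE groups, choosing the dedicated segments from input statistics. A minimal sketch, assuming endpoint-matched linear fits and a simple hit-count ranking of segments:

```python
import math

def fit_linear(f, lo, hi):
    """Fit y = a*x + b to f over [lo, hi] by matching the segment endpoints."""
    a = (f(hi) - f(lo)) / (hi - lo)
    return a, f(lo) - a * lo

def build_luts(f, lo, hi, n_segments, inputs, dedicated_size):
    """Split [lo, hi] into n_segments; the segments that see the most input data
    elements go into the dedicated LUT portion, the rest into a shared pool."""
    width = (hi - lo) / n_segments
    counts = [0] * n_segments
    for x in inputs:
        counts[min(int((x - lo) / width), n_segments - 1)] += 1
    ranked = sorted(range(n_segments), key=lambda i: -counts[i])
    hot = set(ranked[:dedicated_size])
    dedicated, shared = {}, {}
    for i in range(n_segments):
        seg_lo = lo + i * width
        params = fit_linear(f, seg_lo, seg_lo + width)
        (dedicated if i in hot else shared)[i] = params
    return dedicated, shared, width

def approx(x, dedicated, shared, lo, width, n_segments):
    """Evaluate the piecewise-linear approximation, looking in the dedicated
    portion first and falling back to the shared pool."""
    idx = min(max(int((x - lo) / width), 0), n_segments - 1)
    a, b = dedicated.get(idx) or shared[idx]
    return a * x + b

# Hypothetical usage: approximate sigmoid on [-8, 8) with 16 segments, 4 of
# which fit in the LUT portion dedicated to one PPE group.
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))
samples = [math.sin(i) * 3 for i in range(1000)]   # stand-in input statistics
ded, sh, w = build_luts(sigmoid, -8.0, 8.0, 16, samples, 4)
print(approx(0.3, ded, sh, -8.0, w, 16), sigmoid(0.3))
```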
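
Publication 20240119269 selects a per-layer sparsity mode from an offline weight-side configuration parameter and the runtime sparsity of the activation tensor. A sketch, assuming a hypothetical zero-fraction threshold decides whether the activation side is worth accelerating:

```python
def choose_sparsity_mode(accel_on_weight_sparsity, activations, threshold=0.25):
    """Pick a per-layer sparsity mode from an offline configuration parameter
    (weight side) and the runtime zero fraction of the activation tensor."""
    zero_fraction = activations.count(0) / len(activations)
    act_sparse = zero_fraction >= threshold
    if accel_on_weight_sparsity and act_sparse:
        return "combined"
    if accel_on_weight_sparsity:
        return "weight"
    if act_sparse:
        return "activation"
    return "dense"

# Hypothetical usage: the weight-side flag was set offline, the activations were
# produced while executing the previous layer.
print(choose_sparsity_mode(True, [0, 3, 0, 0, 7, 1, 0, 0]))   # -> "combined"
```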
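
Publication 20240111830 segments the input range by exponent and approximates the activation function within each segment to a statistics-driven target accuracy. A sketch, assuming the segment is subdivided until an endpoint-fit linear piece meets the target (the patent only requires that the approximation error stay within the target accuracy):

```python
import math

def fit(f, lo, hi):
    """Endpoint-matched linear fit y = a*x + b over [lo, hi]."""
    a = (f(hi) - f(lo)) / (hi - lo)
    return a, f(lo) - a * lo

def max_error(f, a, b, lo, hi, probes=64):
    """Worst probed deviation of the linear piece from f over [lo, hi]."""
    xs = [lo + (hi - lo) * k / probes for k in range(probes + 1)]
    return max(abs(f(x) - (a * x + b)) for x in xs)

def approximate_exponent_segment(f, exponent, target_accuracy):
    """Approximate f over [2**e, 2**(e+1)) with as many linear pieces as needed
    to keep the error within this segment's target accuracy."""
    pieces, stack = [], [(2.0 ** exponent, 2.0 ** (exponent + 1))]
    while stack:
        lo, hi = stack.pop()
        a, b = fit(f, lo, hi)
        if max_error(f, a, b, lo, hi) <= target_accuracy:
            pieces.append((lo, hi, a, b))   # one LUT entry for this segment
        else:
            mid = (lo + hi) / 2
            stack += [(lo, mid), (mid, hi)]
    return pieces

# Hypothetical usage: a segment that statistics show to be common gets a tight target.
lut = approximate_exponent_segment(math.tanh, exponent=-1, target_accuracy=1e-3)
print(len(lut), "linear pieces for inputs in [0.5, 1.0)")
```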
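
Publication 20240028895 densifies one compressed operand according to the selected sparsity mode and lets the other operand's sparsity bitmap decide which MAC operations run. A sketch with list-based stand-ins for the tensors:

```python
def densify(values, bitmap):
    """Expand a compressed vector (nonzeros only) into dense form using its
    sparsity bitmap: bit 1 -> take the next stored value, bit 0 -> insert a zero."""
    it = iter(values)
    return [next(it) if b else 0 for b in bitmap]

def sparse_mac(dense_ops, sparse_vals, sparse_bitmap):
    """Multiply-accumulate only where the sparse operand has a nonzero; positions
    whose bitmap bit is 0 are skipped, so those dense elements are never read."""
    acc, it = 0, iter(sparse_vals)
    for pos, bit in enumerate(sparse_bitmap):
        if bit:
            acc += dense_ops[pos] * next(it)
    return acc

# Hypothetical usage in activation-sparsity mode: the weight tensor is densified,
# the activation tensor stays compressed and drives element selection.
w_dense = densify([2, 5, 7], [1, 0, 1, 0, 1])
print(sparse_mac(w_dense, [3, 4], [1, 0, 0, 0, 1]))   # 2*3 + 7*4 = 34
```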
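
Publication 20240013040 writes each activation to a memory address derived from its (X, Y, Z) coordinate so that activations sharing an (X, Y) coordinate form one contiguous activation vector across output channels. A sketch of one such address mapping; the channel-major layout and byte width are assumptions:

```python
def drain_address(x, y, z, width, channels, base=0, bytes_per_elem=1):
    """One plausible mapping: activations are laid out channel-major so that all
    Z values sharing an (X, Y) coordinate land in one contiguous activation vector."""
    return base + ((y * width + x) * channels + z) * bytes_per_elem

# Hypothetical usage: a 4x4x8 output tensor; the activation vector for (x=2, y=1)
# occupies addresses 48..55, one per output channel.
print([drain_address(2, 1, z, width=4, channels=8) for z in range(8)])
```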
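
Publication 20230351181 evaluates activation functions as truncated Taylor series using two multipliers and an accumulator per compute element. A sketch of one compute element; the stop condition on term magnitude is an assumption standing in for the patent's predetermined accuracy:

```python
import math

def taylor_activation(x, coefficients, target_accuracy):
    """Model of one compute element: the first multiplier builds the intermediate
    products x, x**2, ..., the second multiplies each by its Taylor coefficient,
    and the accumulator keeps a running partial sum. The term count here stops
    once a term's magnitude falls below the requested accuracy (an assumption)."""
    power, acc = 1.0, 0.0
    for c in coefficients:
        term = c * power            # second multiplier
        acc += term                 # accumulator
        if abs(term) < target_accuracy:
            break
        power *= x                  # first multiplier
    return acc

# Hypothetical usage: exp(x) around 0, with coefficients 1/n!.
coeffs = [1.0 / math.factorial(n) for n in range(12)]
print(taylor_activation(0.5, coeffs, 1e-6), math.exp(0.5))
```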
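
Publication 20230229507 predicts each PE's workload from the combined sparsity bitmap and staggers computation starts so the PEs finish together. A sketch, assuming one MAC per clock cycle:

```python
def schedule_starts(activation_bitmaps, weight_bitmaps):
    """Predict each PE's workload as the popcount of the AND of its activation and
    weight sparsity bitmaps, then stagger start cycles so that, at one MAC per
    cycle, every PE ends its computation in the same clock cycle."""
    workloads = [sum(a & w for a, w in zip(act, wgt))
                 for act, wgt in zip(activation_bitmaps, weight_bitmaps)]
    longest = max(workloads)
    # The PE with the highest workload starts first (cycle 0); the others start later.
    return [longest - w for w in workloads]

# Hypothetical usage with 3 PEs and 8-element operands.
acts = [[1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 0, 0, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0]]
wgts = [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 0, 0]]
print(schedule_starts(acts, wgts))   # -> [0, 4, 4]
```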
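
Publication 20230221994 uncompresses an operand on the fly for a channel-separable operation by inserting zeros where the sparsity bitmap has zero-valued bits and setting all bitmap bits to one. A sketch:

```python
def uncompress(compressed, bitmap):
    """Dynamically uncompress an operand: if the sparsity bitmap has any zero-valued
    bit, insert a zero-valued data element at that position and return the dense
    data together with a bitmap in which all bits are ones."""
    if all(bitmap):                      # operand is already dense, nothing to do
        return list(compressed), list(bitmap)
    it = iter(compressed)
    dense = [next(it) if bit else 0 for bit in bitmap]
    return dense, [1] * len(bitmap)

# Hypothetical usage before sending the operand to the PEs of a compute block.
print(uncompress([4, 9], [0, 1, 0, 0, 1]))   # -> ([0, 4, 0, 0, 9], [1, 1, 1, 1, 1])
```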
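
Publication 20230140173 first selects a heterogeneous tile set for the whole DNN model and then a tile per convolutional layer, both based on output-tensor dimensions. A sketch, assuming a simple PE-utilization heuristic as the selection criterion:

```python
def utilization(tile, out_h, out_w):
    """Fraction of a (columns, rows) PE-array tile kept busy by an output tensor."""
    cols, rows = tile
    return (min(cols, out_w) * min(rows, out_h)) / (cols * rows)

def pick_tile_set(tile_sets, layer_shapes):
    """Pick the heterogeneous tile set whose best tile, summed over the layers'
    output-tensor shapes, wastes the fewest PEs (heuristic assumed for illustration)."""
    def score(ts):
        return sum(max(utilization(t, h, w) for t in ts) for h, w in layer_shapes)
    return max(tile_sets, key=score)

def pick_tile(tile_set, out_h, out_w):
    """Within the selected set, pick the tile for one convolutional layer."""
    return max(tile_set, key=lambda t: utilization(t, out_h, out_w))

# Hypothetical usage: two tile sets of (columns, rows) sizes and two layers with
# 7x7 and 28x28 output tensors.
sets = [[(16, 16), (8, 8)], [(32, 8), (8, 32)]]
chosen_set = pick_tile_set(sets, [(7, 7), (28, 28)])
print(chosen_set, pick_tile(chosen_set, 7, 7))
```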
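
Publication 20230073661 reads portions of a frontend layer's input tensor into a banked datastore so that data can be reused across data load rounds and fed to PEs as activation vectors. A very rough sketch of the load rounds; the round-robin bank assignment is purely an assumption:

```python
def load_rounds(input_rows, num_banks, rows_per_round):
    """Toy model of the load scheduler: each round reads a slice of the input
    tensor's rows into a banked datastore (rows interleaved across databanks);
    each bank can then supply a vector of activations to a processing element."""
    for start in range(0, len(input_rows), rows_per_round):
        banks = [[] for _ in range(num_banks)]
        for i, row in enumerate(input_rows[start:start + rows_per_round]):
            banks[i % num_banks].append(row)
        yield banks

# Hypothetical usage: 8 rows of a single-channel input, 2 databanks, 4 rows per round.
rows = [[r * 10 + c for c in range(4)] for r in range(8)]
for round_id, banks in enumerate(load_rounds(rows, num_banks=2, rows_per_round=4)):
    print(round_id, banks)
```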
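
Publication 20230059976 subtracts zeropoints once, outside the MAC unit, stores the intermediate values for reuse across MAC cycles, and rescales the integer result to floating point. A sketch of one PE:

```python
def quantized_mac(q_acts, q_wgts, act_zeropoint, wgt_zeropoint, scale):
    """Model of one PE: subtractors remove the zeropoints once, the intermediate
    activations and weights are held in local storage and reused across MAC
    cycles, and the accumulated integer result is finally multiplied by the
    quantization scale to produce a floating-point value."""
    inter_acts = [a - act_zeropoint for a in q_acts]   # subtractors, outside the MAC unit
    inter_wgts = [w - wgt_zeropoint for w in q_wgts]
    acc = 0
    for a, w in zip(inter_acts, inter_wgts):           # sequential MAC cycles
        acc += a * w
    return acc * scale                                 # rescale to floating point

# Hypothetical usage with uint8-style quantized operands.
print(quantized_mac([130, 128, 200], [3, 0, 7], act_zeropoint=128, wgt_zeropoint=0, scale=0.02))
```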