Patents by Inventor Umer Iftikhar Cheema
Umer Iftikhar Cheema has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240403616
Abstract: An activation function in a neural network may be approximated by one or more linear functions. A linear function may correspond to a segment of the input range of the activation function, e.g., a linear segment. A programmable look-up table may store slopes and intercepts of linear functions. A post processing engine (PPE) array executing the activation function may determine that an input data element of the activation function falls into the linear segment and compute an output of the linear function using the input data element. The output of the linear function may be used as the approximated output of the activation function. Alternatively, the PPE array may determine that the input data element is in a saturation segment and use a fixed value associated with the saturation segment as the approximated output of the activation function.
Type: Application
Filed: November 2, 2023
Publication date: December 5, 2024
Inventors: Umer Iftikhar Cheema, Kevin Brady, Robert Simofi, Colm O Faolain, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru, Gary Baugh, Darren Crews, Fergal Connor
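A minimal sketch of this kind of piecewise-linear lookup, assuming a hypothetical table of segment breakpoints, slopes, intercepts, and fixed saturation outputs; the segment layout and all numbers below are illustrative, not taken from the application.

```python
import numpy as np

# Hypothetical LUT: segment breakpoints with a (slope, intercept) pair per linear
# segment, plus fixed outputs for the saturation segments below/above the range.
BREAKPOINTS = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])   # segment edges (illustrative)
SLOPES      = np.array([0.02, 0.15, 0.85, 0.98])       # one per interior segment
INTERCEPTS  = np.array([0.07, 0.33, 0.50, 0.24])
SAT_LOW, SAT_HIGH = 0.0, 1.0                            # fixed saturation outputs

def approx_activation(x: float) -> float:
    """Approximate an activation (e.g., a sigmoid-like curve) with linear segments."""
    if x < BREAKPOINTS[0]:         # saturation segment: return the fixed value
        return SAT_LOW
    if x >= BREAKPOINTS[-1]:
        return SAT_HIGH
    # Find the linear segment the input falls into, then evaluate slope*x + intercept.
    seg = np.searchsorted(BREAKPOINTS, x, side="right") - 1
    return SLOPES[seg] * x + INTERCEPTS[seg]
```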
-
Publication number: 20240160695
Abstract: A non-linear activation function may be approximated by linear functions. The input range of the activation function may be divided into input segments. One or more input segments may be selected based on statistical analysis of input data elements in the input range. A parameter of a first linear function that approximates the activation function for at least part of a selected input segment may be stored in a first portion of a first look-up table (LUT). The first portion of the first LUT is dedicated to a first group of post processing engines (PPEs). A parameter of a second linear function that approximates the activation function for at least part of an unselected input segment may be stored in a shared pool of LUT entries, which includes a second portion of the first LUT and a portion of a second LUT and is shared by multiple groups of PPEs.
Type: Application
Filed: December 21, 2023
Publication date: May 16, 2024
Applicant: Intel Corporation
Inventors: Dinakar Kondru, Deepak Abraham Mathaikutty, Arnab Raha, Umer Iftikhar Cheema
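A rough sketch of how selected-segment parameters might be split between a per-group dedicated LUT region and a pooled region shared by all PPE groups; the entry counts, names, and allocation rule below are assumptions for illustration.

```python
# Rough sketch: place parameters of "selected" (statistically hot) segments in a
# dedicated LUT region for one PPE group, and spill the remaining segments into a
# pool of entries shared by all PPE groups. Sizes and names are illustrative.
DEDICATED_ENTRIES = 8     # per-group portion of the first LUT (assumed size)
SHARED_ENTRIES = 16       # pooled entries spanning the rest of LUT0 plus part of LUT1

def allocate_lut(selected, unselected):
    """selected/unselected: lists of (slope, intercept) pairs, one per input segment."""
    if len(selected) > DEDICATED_ENTRIES or len(unselected) > SHARED_ENTRIES:
        raise ValueError("segment parameters exceed available LUT entries")
    dedicated = list(selected)      # only this PPE group reads these entries
    shared = list(unselected)       # every PPE group can read these entries
    return dedicated, shared

dedicated, shared = allocate_lut(
    selected=[(0.9, 0.1), (1.0, 0.0)],        # hot segments found by input statistics
    unselected=[(0.05, 0.4), (0.01, 0.48)],   # rarely-hit tail segments
)
```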
-
Publication number: 20240119269
Abstract: A deep neural network (DNN) accelerator may facilitate dynamic sparsity-based acceleration and operate in various sparsity modes including a combined sparsity mode, a weight sparsity mode, an activation sparsity mode, and a dense mode. The DNN accelerator may receive a configuration parameter indicating whether to accelerate a layer based on sparsity in a weight tensor of the layer. The configuration parameter may be generated offline, e.g., before the execution of the DNN is started. The DNN accelerator computes one or more activations of the layer in a previous layer of the DNN. The one or more activations are one or more elements of an activation tensor of the layer. The DNN accelerator may determine a sparsity mode for the layer based on the configuration parameter and sparsity in the activation tensor. One or more sparse cells in the DNN accelerator may execute the layer in the sparsity mode.
Type: Application
Filed: December 18, 2023
Publication date: April 11, 2024
Inventors: Arnab Raha, Dinakar Kondru, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema
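One plausible way to combine the offline weight-side configuration parameter with runtime activation sparsity into a per-layer mode is sketched below; the exact decision rule is an assumption, not taken from the application.

```python
def choose_sparsity_mode(weight_sparsity_enabled: bool, activation_bitmap: list[int]) -> str:
    """Pick a per-layer sparsity mode from an offline weight-side flag and the
    runtime activation bitmap (1 = nonzero activation, 0 = zero activation)."""
    activation_sparse = 0 in activation_bitmap   # any zero activations at runtime?
    if weight_sparsity_enabled and activation_sparse:
        return "combined"
    if weight_sparsity_enabled:
        return "weight"
    if activation_sparse:
        return "activation"
    return "dense"

print(choose_sparsity_mode(True, [1, 0, 1, 1]))   # -> "combined"
```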
-
Publication number: 20240111830
Abstract: A non-linear activation function in a neural network may be approximated by one or more linear functions. The input range may be divided into input segments, each of which corresponds to a different exponent in the input range of the activation function and includes input data elements having the exponent. Target accuracies may be assigned to the identified exponents based on a statistical analysis of the input data elements. The target accuracy of an input segment may be used to determine one or more linear functions that approximate the activation function for the input segment. An error of an approximation of the activation function by a linear function for the input segment may be within the target accuracy. The parameters of the linear functions may be stored in a look-up table (LUT). During the execution of the neural network, the LUT may be used to execute the activation function.
Type: Application
Filed: December 8, 2023
Publication date: April 4, 2024
Applicant: Intel Corporation
Inventors: Umer Iftikhar Cheema, Robert Simofi, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru
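A sketch of the general idea, assuming hypothetical power-of-two (exponent-based) input segments and per-segment error targets: each segment is split into as many linear pieces as needed to meet its target accuracy, and the fitted parameters would populate the LUT. The targets, segments, and fitting method are illustrative.

```python
import numpy as np

def fit_segment(fn, lo, hi, target_err, max_pieces=64):
    """Fit fn on [lo, hi) with as few linear pieces as needed so the max error of
    each piece stays within target_err. Returns a list of (lo, hi, slope, intercept)."""
    pieces = 1
    while pieces <= max_pieces:
        edges = np.linspace(lo, hi, pieces + 1)
        params, ok = [], True
        for a, b in zip(edges[:-1], edges[1:]):
            xs = np.linspace(a, b, 65)
            slope, intercept = np.polyfit(xs, fn(xs), 1)     # least-squares line
            if np.max(np.abs(fn(xs) - (slope * xs + intercept))) > target_err:
                ok = False
                break
            params.append((a, b, slope, intercept))
        if ok:
            return params
        pieces *= 2
    raise RuntimeError("could not meet target accuracy")

# Exponent-based segments: each segment covers one power-of-two range of inputs,
# with a tighter target where inputs are (hypothetically) more common.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
lut = []
for exp, target in [(0, 1e-3), (1, 1e-3), (2, 5e-3)]:        # illustrative targets
    lut += fit_segment(sigmoid, 2.0**exp, 2.0**(exp + 1), target)
```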
-
Publication number: 20240028895
Abstract: A load module in a deep neural network (DNN) accelerator may receive a configuration parameter indicating a selection between an activation sparsity mode and a weight sparsity mode. The load module may read a sparse activation tensor, an activation sparsity bitmap, a sparse weight tensor, and a weight sparsity bitmap from a memory. The load module may densify one of the compressed tensors based on the sparsity mode and leave the other compressed tensor as is. The load module may load the dense tensor and the sparse tensor to a sparse cell. The sparse cell includes a sparsity module that may select one or more elements of the dense tensor based on the sparsity bitmap of the sparse tensor. The sparse cell also includes multiply-accumulate (MAC) units that perform MAC operations on the selected elements and the sparse tensor. MAC operations on unselected elements of the dense tensor are skipped.
Type: Application
Filed: September 28, 2023
Publication date: January 25, 2024
Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Dinakar Kondru, Umer Iftikhar Cheema, Martin Power, Niall Hanrahan
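A small sketch of this data path, under the assumption of simple 1-D tensors: one compressed operand is densified from its bitmap while the other stays compressed, and the MAC only touches dense elements selected by the sparse operand's bitmap. Function names and values are illustrative.

```python
import numpy as np

def densify(compressed: np.ndarray, bitmap: np.ndarray) -> np.ndarray:
    """Expand a compressed (zero-removed) vector back to dense form using its bitmap."""
    dense = np.zeros(bitmap.size, dtype=compressed.dtype)
    dense[bitmap.astype(bool)] = compressed
    return dense

def sparse_mac(dense_vec, sparse_vec, sparse_bitmap):
    """Multiply-accumulate only the dense elements selected by the sparse bitmap;
    positions where the sparse operand is zero are skipped entirely."""
    selected = dense_vec[sparse_bitmap.astype(bool)]   # gather matching elements
    return float(np.dot(selected, sparse_vec))         # MAC over the survivors only

# Weight-sparsity mode (illustrative): activations are densified, weights stay compressed.
act = densify(np.array([3., 5.]), np.array([1, 0, 0, 1]))
out = sparse_mac(act, np.array([2., 4.]), np.array([1, 0, 0, 1]))   # -> 26.0
```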
-
Publication number: 20240013040
Abstract: A drain module may drain activations in an output tensor of a convolution from a PE array that performs the convolution. The drain module may extract activations generated in a collection of PE columns. The activations generated in the PE columns in the collection may be concatenated, e.g., activations generated in the first PE column of the collection may be followed by activations generated in the second PE column of the collection, and so on. The activations in the output tensor may be rearranged into activation vectors. Each activation vector may include activations in different output channels of the deep learning operation. The activations in each activation vector may have the same (X, Y) coordinate in the output tensor. The drain module may determine a memory address for an activation based on the activation's (X, Y, Z) coordinate in the output tensor and write the activation to the memory address.
Type: Application
Filed: September 26, 2023
Publication date: January 11, 2024
Applicant: Intel Corporation
Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema, Dinakar Kondru
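The abstract does not give the address mapping, but a channel-last linear layout is one arrangement consistent with it (all output channels for one (X, Y) stored contiguously); the sketch below assumes that layout and hypothetical tensor dimensions.

```python
def drain_address(x: int, y: int, z: int, width: int, num_channels: int,
                  bytes_per_elem: int = 1, base: int = 0) -> int:
    """Compute a (hypothetical) channel-last linear address for an output activation:
    all Z values for a given (X, Y) sit back to back, so one activation vector
    (same spatial position, all output channels) lands in one contiguous block."""
    return base + ((y * width + x) * num_channels + z) * bytes_per_elem

# Activations for (x=2, y=1) occupy num_channels consecutive addresses.
addrs = [drain_address(2, 1, z, width=8, num_channels=16) for z in range(16)]
```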
-
Publication number: 20230351181
Abstract: An activation function unit can compute activation functions approximated by Taylor series. The activation function unit may include a plurality of compute elements. Each compute element may include two multipliers and an accumulator. The first multiplier may compute intermediate products using an activation, such as an output activation of a DNN layer. The second multiplier may compute terms of Taylor series approximating an activation function based on the intermediate products from the first multiplier and coefficients of the Taylor series. The accumulator may compute a partial sum of the terms as an output of the activation function. The number of the terms may be determined based on a predetermined accuracy of the output of the activation function. The activation function unit may process multiple activations. Different activations may be input into different compute elements in different clock cycles. The activation function unit may compute activation functions with different accuracies.
Type: Application
Filed: July 5, 2023
Publication date: November 2, 2023
Inventors: Umer Iftikhar Cheema, Deepak Abraham Mathaikutty, Arnab Raha, Dinakar Kondru, Raymond Jit-Hung Sung, Soumendu Kumar Ghosh
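A scalar sketch of the compute-element loop: one multiplier builds successive powers of the activation (the intermediate products), the other scales each power by its Taylor coefficient, and an accumulator keeps the partial sum. The coefficient list and term count below are illustrative.

```python
def taylor_activation(x: float, coeffs: list[float]) -> float:
    """Evaluate a Taylor-series approximation sum(c_k * x^k). One 'multiplier' builds
    the next power of x, the other scales it by the coefficient, and an accumulator
    keeps the running partial sum; len(coeffs) sets the accuracy."""
    power, acc = 1.0, 0.0
    for c in coeffs:
        acc += c * power          # second multiplier + accumulator
        power *= x                # first multiplier: next intermediate product x^k
    return acc

# exp(x) around 0 with 6 terms: 1 + x + x^2/2 + x^3/6 + x^4/24 + x^5/120
print(taylor_activation(0.5, [1, 1, 1/2, 1/6, 1/24, 1/120]))   # ≈ 1.6487
```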
-
Publication number: 20230229507
Abstract: Computations in processing elements (PEs) for executing a deep neural network are scheduled via a computation scheduler based on sparsity in input data of the computations to reduce voltage droops. Each PE may perform a computation on an input operand and a weight operand. The computation scheduler may predict the workload of the PE for the computation based on a combined sparsity bitmap, which may be generated based on a sparsity bitmap of the input operand and a sparsity bitmap of the weight operand. The computation scheduler can schedule the starts of the computations in the PEs based on the predicted workloads of the PEs. The computation scheduler may instruct the PE having the highest workload to start the computation first and instruct the other PEs to start computations later. In some embodiments, the computations in the PEs may end in the same clock cycle.
Type: Application
Filed: March 8, 2023
Publication date: July 20, 2023
Applicant: Intel Corporation
Inventors: Raymond Jit-Hung Sung, Arnab Raha, Deepak Abraham Mathaikutty, Umer Iftikhar Cheema
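A sketch of the scheduling idea, assuming the workload is simply the population count of the ANDed activation/weight bitmaps and that each nonzero pair costs one cycle; real cost models would differ.

```python
import numpy as np

def schedule_starts(act_bitmaps, wgt_bitmaps):
    """Predict each PE's workload as the number of nonzero activation/weight pairs
    (AND of the two bitmaps) and stagger start cycles so the busiest PE starts first
    and, ideally, all PEs finish in the same clock cycle."""
    workloads = [int(np.sum(a & w)) for a, w in zip(act_bitmaps, wgt_bitmaps)]
    longest = max(workloads)
    # Start offset = how many cycles a PE can wait and still finish with the longest one.
    return [longest - w for w in workloads], workloads

acts = [np.array([1, 1, 0, 1]), np.array([1, 0, 0, 0])]
wgts = [np.array([1, 1, 1, 1]), np.array([1, 1, 0, 0])]
offsets, loads = schedule_starts(acts, wgts)   # offsets -> [0, 2]: busiest PE starts first
```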
-
Publication number: 20230221994
Abstract: A compute block can dynamically uncompress compressed data for executing a channel-separable operation. The compressed data includes one or more nonzero-valued data elements. The compressed data may be stored in a datastore along with a sparsity bitmap of an input operand including the compressed data. An uncompressing module may determine whether the input operand includes any zero-valued data element, e.g., by determining whether the sparsity bitmap includes a zero-valued bit. After determining that the sparsity bitmap includes a zero-valued bit, the uncompressing module inserts a zero-valued data element into the compressed data based on the position of the bit in the sparsity bitmap, generating uncompressed data, and updates the sparsity bitmap so that all the bits become ones. The uncompressed dense data is transmitted to one or more processing elements (PEs) in the compute block for computing an output operand based on the uncompressed dense data.
Type: Application
Filed: March 16, 2023
Publication date: July 13, 2023
Inventors: Arnab Raha, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung, Umer Iftikhar Cheema, Dinakar Kondru, Soumendu Kumar Ghosh
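A compact sketch of the uncompression step: zeros are re-inserted wherever the bitmap has zero bits, and the bitmap is rewritten to all ones to mark the data as dense. Names and values are illustrative.

```python
import numpy as np

def uncompress(compressed: np.ndarray, bitmap: np.ndarray):
    """Insert zero-valued elements back into compressed data wherever the sparsity
    bitmap has a zero bit, then mark the bitmap all-ones to signal dense data."""
    if np.all(bitmap == 1):                      # already dense: nothing to insert
        return compressed.copy(), bitmap.copy()
    dense = np.zeros(bitmap.size, dtype=compressed.dtype)
    dense[bitmap.astype(bool)] = compressed      # nonzero values go to the 1-bit slots
    return dense, np.ones_like(bitmap)

vals, bits = uncompress(np.array([7, 9]), np.array([0, 1, 0, 1]))
# vals -> [0, 7, 0, 9], bits -> [1, 1, 1, 1]
```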
-
Publication number: 20230140173
Abstract: A DNN accelerator includes one or more heterogeneous tile sets. A heterogeneous tile set includes tiles of different sizes, e.g., PE arrays including different numbers of columns or rows. The DNN accelerator may identify a tile set from the tile sets for running a DNN model based on dimensions of output tensors of convolutional layers in the DNN. Within the selected tile set, a tile may be selected for a convolutional layer in the DNN, e.g., based on dimensions of the output tensor of the convolutional layer and the size of the tile. After the tile is selected, the workload for running a convolutional operation of the layer may be partitioned and assigned to individual PEs in the tile by partitioning the output tensor into output tensor segments. The workload of computing an individual output tensor segment can be assigned to an individual PE in the tile.
Type: Application
Filed: August 19, 2022
Publication date: May 4, 2023
Inventors: Arnab Raha, Umer Iftikhar Cheema, Praveen Kumar Gupta, Deepak Abraham Mathaikutty, Raymond Jit-Hung Sung
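A toy version of tile selection, assuming the only criterion is how many PE slots would sit idle after rounding the output tensor's spatial dimensions up to whole tiles; both the metric and the tile shapes are assumptions for illustration.

```python
import math

def pick_tile(out_h: int, out_w: int, tiles: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the PE-array tile (rows, cols) from a heterogeneous tile set that wastes
    the fewest PE slots when the layer's output tensor is partitioned across it."""
    def waste(tile):
        rows, cols = tile
        padded_h = math.ceil(out_h / rows) * rows    # output rows rounded up to whole tiles
        padded_w = math.ceil(out_w / cols) * cols
        return padded_h * padded_w - out_h * out_w   # idle PE slots (illustrative metric)
    return min(tiles, key=waste)

# Heterogeneous tile set with PE arrays of different shapes (illustrative sizes).
tile = pick_tile(out_h=56, out_w=56, tiles=[(16, 16), (8, 32), (4, 64)])
```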
-
Publication number: 20230073661
Abstract: A DNN (deep neural network) accelerator may accelerate deep learning, such as convolutions in frontend layers, through a scheduler for loading data to be processed. The DNN accelerator may store, in a memory, an input tensor of a convolutional layer in a DNN. The convolutional layer may be the first layer or a layer that is arranged before the one or more other convolutional layers in the DNN, such that data processed by the first layer can be efficiently reused across data load rounds. The input tensor includes one or more channels. A channel includes activations arranged in rows and columns. The DNN accelerator may read at least a portion of the input tensor from the memory into a datastore. The datastore includes multiple databanks. The DNN accelerator may provide a vector of one or more activations to a processing element for operations such as multiplications on the vector.
Type: Application
Filed: November 14, 2022
Publication date: March 9, 2023
Applicant: Intel Corporation
Inventors: Deepak Abraham Mathaikutty, Arnab Raha, Umer Iftikhar Cheema, Raymond Jit-Hung Sung
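A loose sketch of the load step, under the assumption that each databank buffers one input row across all channels so that a per-position activation vector can be handed to a PE and the buffered rows can be reused across load rounds; the bank-to-row mapping and dimensions are assumptions.

```python
import numpy as np

def load_rows_to_banks(input_tensor: np.ndarray, start_row: int, num_banks: int):
    """Copy a band of input rows (all channels) from 'memory' into per-bank buffers,
    one row per databank, so a PE can later be fed a vector of activations and the
    same rows can be reused across several data load rounds."""
    c, h, w = input_tensor.shape
    banks = []
    for b in range(num_banks):
        row = min(start_row + b, h - 1)                 # clamp at the bottom edge
        banks.append(input_tensor[:, row, :].copy())    # shape (channels, width)
    return banks

tensor = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)   # (channels, rows, cols)
banks = load_rows_to_banks(tensor, start_row=0, num_banks=4)
vector = banks[0][:, 0]   # activations across channels at one (row, column) position
```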
-
Publication number: 20230059976
Abstract: A DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zero points from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied with a quantization scale to produce a floating-point value.
Type: Application
Filed: October 18, 2022
Publication date: February 23, 2023
Applicant: Intel Corporation
Inventors: Deepak Abraham Mathaikutty, Arnab Raha, Raymond Jit-Hung Sung, Martin Power, Umer Iftikhar Cheema, David Thomas Bernard
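A scalar sketch of the arithmetic: zero points are subtracted once, outside the MAC itself, the intermediate integer values are accumulated, and the result is scaled back to floating point. Dtypes and values below are illustrative.

```python
import numpy as np

def quantized_mac(q_acts, q_wgts, act_zero_point, wgt_zero_point, scale):
    """Subtract zero points once up front (outside the MAC loop), accumulate the
    integer products, then apply the quantization scale to get a floating-point result."""
    acts = q_acts.astype(np.int32) - act_zero_point    # intermediate activations
    wgts = q_wgts.astype(np.int32) - wgt_zero_point    # intermediate weights (reusable)
    acc = int(np.dot(acts, wgts))                      # sequential MAC cycles, summed
    return acc * scale                                 # dequantize the accumulator

out = quantized_mac(np.array([130, 120], dtype=np.uint8),
                    np.array([7, 3], dtype=np.uint8),
                    act_zero_point=128, wgt_zero_point=5, scale=0.05)
# (130-128)*(7-5) + (120-128)*(3-5) = 4 + 16 = 20 -> 20 * 0.05 = 1.0
```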