Patents by Inventor Rehan Hameed

Rehan Hameed has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEEP VISION PROCESSOR

Publication number: 20230418610

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Application

Filed: June 30, 2023

Publication date: December 28, 2023

Inventors: Wajahat Qadeer, Rehan Hameed
METHOD FOR AUTOMATIC HYBRID QUANTIZATION OF DEEP ARTIFICIAL NEURAL NETWORKS

Publication number: 20230385645

Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.

Type: Application

Filed: August 9, 2023

Publication date: November 30, 2023

Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
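
The per-layer procedure in the abstract above can be illustrated with a small sketch. This is not the patented method itself: the toy dense layers, the symmetric uniform quantizer, the RMS error used as the deviation statistic, and all function names are assumptions chosen for illustration.

```python
import math

def quantize(values, bits):
    """Symmetric uniform quantization to `bits` bits, returned as dequantized floats."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / (2 ** (bits - 1) - 1)
    return [round(v / scale) * scale for v in values]

def run_layer(weights, inputs):
    """Toy dense layer: one dot product per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def rms_deviation(reference, candidate):
    """Assumed deviation statistic: RMS error between float and low-bit outputs."""
    return math.sqrt(sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference))

def rank_layers_by_deviation(layers, network_input, bits=4):
    ranking = []
    activations = network_input
    for name, weights in layers:
        # Output activations of the floating-point layer.
        float_out = run_layer(weights, activations)
        # Convert the layer to low bit width and rerun it on the same float inputs.
        low_bit_weights = [quantize(row, bits) for row in weights]
        low_bit_out = run_layer(low_bit_weights, activations)
        ranking.append((name, rms_deviation(float_out, low_bit_out)))
        # The next layer sees the floating-point activations.
        activations = float_out
    # Order layers from most to least sensitive to quantization.
    return sorted(ranking, key=lambda item: item[1], reverse=True)

layers = [
    ("fc1", [[1.0, -1.0], [1.0, 1.0]]),
    ("fc2", [[0.9, 0.13], [0.37, 0.71]]),
]
ranking = rank_layers_by_deviation(layers, [1.0, 2.0])
```

In a hybrid scheme, the layers at the top of such a ranking would be the natural candidates to keep at higher precision; the abstract itself only states that the layers are ordered by the statistic.
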
Method for automatic hybrid quantization of deep artificial neural networks

Patent number: 11763158

Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.

Type: Grant

Filed: December 4, 2020

Date of Patent: September 19, 2023

Assignee: Deep Vision Inc.

Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
Deep vision processor

Patent number: 11734006

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Grant

Filed: July 19, 2022

Date of Patent: August 22, 2023

Assignee: Deep Vision, Inc.

Inventors: Wajahat Qadeer, Rehan Hameed
SYSTEM AND METHOD FOR PROFILING ON-CHIP PERFORMANCE OF NEURAL NETWORK EXECUTION

Publication number: 20230195590

Abstract: A method includes: accessing a static schedule of a target neural network for execution by a processing device, the target neural network including a set of layers; generating a set of expected performance metrics of the target neural network based on the static schedule, the set of expected performance metrics including a first expected performance metric for a first layer in the set of layers; accessing a set of runtime performance metrics captured during execution of the target neural network by the processing device, the set of runtime performance metrics including a first runtime performance metric for the first layer; and, in response to detecting a difference between the first runtime performance metric and the first expected performance metric exceeding a threshold, serving an alert at a user interface.

Type: Application

Filed: December 20, 2022

Publication date: June 22, 2023

Inventors: Satyanarayana Raju Uppalapati, Rajasekhar Reddy Ereddy, Sameek Banerjee, Mohammed Shahim, Shilpa Kallem, Suresh Kumar Vennam, Abhilash Bharath Ghanore, Raju Datla, Wajahat Qadeer, Rehan Hameed
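
The comparison step in the abstract above (expected versus runtime metric per layer, with an alert past a threshold) can be sketched as follows. The metric (cycle counts), the relative threshold, and the function name `find_regressions` are assumptions for illustration; the abstract does not specify them.

```python
def find_regressions(expected_cycles, runtime_cycles, threshold=0.10):
    """Return layers whose measured cost differs from the expected cost
    by more than `threshold` (as a fraction of the expected value)."""
    alerts = []
    for layer, expected in expected_cycles.items():
        measured = runtime_cycles[layer]
        relative_gap = abs(measured - expected) / expected
        if relative_gap > threshold:
            alerts.append((layer, expected, measured, relative_gap))
    return alerts

# Hypothetical numbers: expectations derived from a static schedule,
# measurements captured while the network runs on the device.
expected = {"conv1": 1000, "conv2": 2000}
measured = {"conv1": 1050, "conv2": 2600}
alerts = find_regressions(expected, measured)
```

Here only `conv2` would be flagged, since its measured cost is 30% above the expectation while `conv1` is within the assumed 10% tolerance.
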
Method and tensor traversal engine for strided memory access during execution of neural networks

Patent number: 11550586

Abstract: A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register storing a control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal comprising an initial source address, an initial destination address, a first source stride length in a first dimension, and a first source stride count in the first dimension; a source address register communicatively coupled to the control signal register; a destination address register communicatively coupled to the control signal register; a first source stride counter communicatively coupled to the control signal register; and control logic communicatively coupled to the control signal register, the source address register, and the first source stride counter.

Type: Grant

Filed: May 26, 2021

Date of Patent: January 10, 2023

Assignee: Deep Vision Inc.

Inventors: Mohamed Shahim, Raju Datla, Rehan Hameed, Shilpa Kallem
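
A software analogue of the strided transfer described in the abstract above might look like the sketch below. It models the control signal as a small record and the address registers and stride counter as local variables; the names, the single-element copy per stride, and the dense destination layout are assumptions, not details taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class ControlSignal:
    initial_source_address: int
    initial_destination_address: int
    source_stride_length: int   # elements to skip between reads
    source_stride_count: int    # number of reads in the first dimension

def strided_transfer(source, destination, control):
    """Copy one element per stride from `source` into consecutive
    slots of `destination`, starting at the initial addresses."""
    src_addr = control.initial_source_address        # source address register
    dst_addr = control.initial_destination_address   # destination address register
    for _ in range(control.source_stride_count):     # stride counter
        destination[dst_addr] = source[src_addr]
        src_addr += control.source_stride_length
        dst_addr += 1  # assumed: destination is written densely
    return destination

# A 3x4 row-major tensor stored flat; gather column 1 (elements 1, 5, 9).
tensor = list(range(12))
column = strided_transfer(tensor, [0, 0, 0], ControlSignal(1, 0, 4, 3))
```
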
DEEP VISION PROCESSOR

Publication number: 20220357946

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Application

Filed: July 19, 2022

Publication date: November 10, 2022

Inventors: Wajahat Qadeer, Rehan Hameed
Deep vision processor

Patent number: 11436014

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Grant

Filed: June 23, 2021

Date of Patent: September 6, 2022

Assignee: Deep Vision, Inc.

Inventors: Wajahat Qadeer, Rehan Hameed
METHOD AND TENSOR TRAVERSAL ENGINE FOR STRIDED MEMORY ACCESS DURING EXECUTION OF NEURAL NETWORKS

Publication number: 20210373895

Abstract: A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register storing a control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal comprising an initial source address, an initial destination address, a first source stride length in a first dimension, and a first source stride count in the first dimension; a source address register communicatively coupled to the control signal register; a destination address register communicatively coupled to the control signal register; a first source stride counter communicatively coupled to the control signal register; and control logic communicatively coupled to the control signal register, the source address register, and the first source stride counter.

Type: Application

Filed: May 26, 2021

Publication date: December 2, 2021

Inventors: Mohamed Shahim, Raju Datla, Rehan Hameed, Shilpa Kallem
DEEP VISION PROCESSOR

Publication number: 20210326133

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Application

Filed: June 23, 2021

Publication date: October 21, 2021

Inventors: Wajahat Qadeer, Rehan Hameed
Deep vision processor

Patent number: 11080056

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Grant

Filed: October 31, 2019

Date of Patent: August 3, 2021

Assignee: Deep Vision, Inc.

Inventors: Wajahat Qadeer, Rehan Hameed
METHOD FOR STATIC SCHEDULING OF ARTIFICIAL NEURAL NETWORKS FOR A PROCESSOR

Publication number: 20210191765

Abstract: A method for scheduling an artificial neural network includes: accessing a processor representation of a multicore processor comprising processor cores, direct memory access cores, and a cost model; and accessing a network structure defining a set of layers. The method also includes, for each layer in the set of layers: generating a graph based on the processor representation, the graph defining compute nodes, data transfer nodes, and edges representing dependencies between the compute nodes and the data transfer nodes; and generating a schedule for the layer based on the graph, the schedule assigning the compute nodes to the processor cores and assigning the data transfer nodes to the direct memory access cores. The method further includes aggregating the schedule for each layer in the set of layers to generate a complete schedule for the artificial neural network.

Type: Application

Filed: December 18, 2020

Publication date: June 24, 2021

Inventors: Lava Kumar Bokam, Sameek Bannerjee, Abhilash Bharath Ghanore, Rajashekar Reddy Ereddy, Wajahat Qadeer, Rehan Hameed, Mohamed Shahim, Sreenivas Aerra Reddy
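
The per-layer scheduling idea in the abstract above can be sketched as a simple list scheduler over a dependency graph. This is a greedy illustration only: the node costs stand in for the cost model, and the function name, data layout, and earliest-free-unit policy are assumptions rather than the method the application claims.

```python
from graphlib import TopologicalSorter

def schedule_layer(nodes, deps, num_cores, num_dma):
    """nodes: name -> (kind, cost), with kind "compute" or "transfer".
    deps: name -> set of predecessor names.
    Returns (name, kind, unit, start, finish) tuples in execution order."""
    free_at = {"compute": [0] * num_cores, "transfer": [0] * num_dma}
    finish = {}
    schedule = []
    for name in TopologicalSorter(deps).static_order():
        kind, cost = nodes[name]
        # A node may start once all of its predecessors have finished.
        ready = max((finish[p] for p in deps.get(name, ())), default=0)
        units = free_at[kind]
        # Compute nodes go to processor cores, transfer nodes to DMA cores.
        unit = min(range(len(units)), key=lambda i: units[i])
        start = max(ready, units[unit])
        finish[name] = start + cost
        units[unit] = finish[name]
        schedule.append((name, kind, unit, start, finish[name]))
    return schedule

# Hypothetical layer: load inputs, convolve, store outputs.
nodes = {"load": ("transfer", 2), "conv": ("compute", 5), "store": ("transfer", 1)}
deps = {"load": set(), "conv": {"load"}, "store": {"conv"}}
layer_schedule = schedule_layer(nodes, deps, num_cores=2, num_dma=1)
```

A complete schedule for a network would then be assembled from the per-layer schedules, as the abstract describes; that aggregation step is not shown here.
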
METHOD FOR AUTOMATIC HYBRID QUANTIZATION OF DEEP ARTIFICIAL NEURAL NETWORKS

Publication number: 20210174172

Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.

Type: Application

Filed: December 4, 2020

Publication date: June 10, 2021

Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
DEEP VISION PROCESSOR

Publication number: 20200409699

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Application

Filed: October 31, 2019

Publication date: December 31, 2020

Inventors: Wajahat Qadeer, Rehan Hameed
Deep vision processor

Patent number: 10474464

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Grant

Filed: July 3, 2018

Date of Patent: November 12, 2019

Assignee: DEEP VISION, INC.

Inventors: Wajahat Qadeer, Rehan Hameed
DEEP VISION PROCESSOR

Publication number: 20190012170

Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.

Type: Application

Filed: July 3, 2018

Publication date: January 10, 2019

Inventors: Wajahat Qadeer, Rehan Hameed
Low power programmable image processor

Patent number: 9477999

Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.

Type: Grant

Filed: September 22, 2014

Date of Patent: October 25, 2016

Assignee: The Board of Trustees of the Leland Stanford Junior University

Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz
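
The shifted-window computation in the abstract above can be illustrated in one dimension. The sketch below is plain sequential software, not the hardware design: each slice plays the role of one shifted version of the pixel data, and multiply-then-sum is an assumed choice of operation.

```python
def stencil_1d(pixels, stencil):
    """Apply `stencil` at every position where it fully overlaps `pixels`."""
    width = len(stencil)
    outputs = []
    for start in range(len(pixels) - width + 1):
        window = pixels[start:start + width]  # one shifted version of the data
        # Combine each pixel with its corresponding stencil value, then reduce.
        outputs.append(sum(p * s for p, s in zip(window, stencil)))
    return outputs

edges = stencil_1d([1, 2, 3, 4, 5], [1, 0, -1])
```

In the processor described, the per-pixel operations within a window are performed concurrently; the loop here only shows what is computed, not how it is parallelized.
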
LOW POWER PROGRAMMABLE IMAGE PROCESSOR

Publication number: 20150086134

Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.

Type: Application

Filed: September 22, 2014

Publication date: March 26, 2015

Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz
Hardware Function Generator Support in a DSP

Publication number: 20110119520

Abstract: The present invention relates to digital signal processors with an integrated module configured to compute a Coordinate Rotation Digital Computer (CORDIC) in a pipeline. The pipelined module can advantageously complete computation of one CORDIC computation for each clock pulse applied to the CORDIC module, thereby providing a CORDIC computation for each clock pulse. One embodiment advantageously computes a first portion of a computation with a lookup table and a second portion in accordance with a CORDIC algorithm. Advantageously, data in a CORDIC pipeline is automatically advanced in response to read instructions and can be automatically advanced from the beginning of the pipeline to the end of the pipeline to reinitialize the pipeline. This allows information to be retrieved from the CORDIC pipeline with relatively little overhead. The automatic starting and stopping of the CORDIC pipeline advantageously allows the retrieval of computations from efficient pipeline architectures on an as-needed basis.

Type: Application

Filed: November 11, 2010

Publication date: May 19, 2011

Inventors: Shoab A. Khan, Rehan Hameed, Hassan Farooq
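
For readers unfamiliar with CORDIC, the following is a minimal rotation-mode sketch in floating point. It is a generic textbook formulation, not the pipelined hardware module of the application: each loop iteration corresponds roughly to what one pipeline stage would compute, and the function name and iteration count are assumptions.

```python
import math

def cordic_sin_cos(angle, iterations=32):
    """Rotation-mode CORDIC. Converges for |angle| up to about 1.74 rad."""
    # Elementary rotation angles atan(2^-i); hardware would hold these in a table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-scale by the inverse CORDIC gain so the result has unit length.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x  # (sin, cos)

sin_half, cos_half = cordic_sin_cos(0.5)
```

In fixed-point hardware the multiplications by powers of two become shifts, which is why the algorithm maps well onto a pipeline of adders.
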
Hardware function generator support in a DSP

Publication number: 20060282489

Abstract: The present invention relates to digital signal processors with an integrated module configured to compute a Coordinate Rotation Digital Computer (CORDIC) in a pipeline. The pipelined module can advantageously complete computation of one CORDIC computation for each clock pulse applied to the CORDIC module, thereby providing a CORDIC computation for each clock pulse. One embodiment advantageously computes a first portion of a computation with a lookup table and a second portion in accordance with a CORDIC algorithm. Advantageously, data in a CORDIC pipeline is automatically advanced in response to read instructions and can be automatically advanced from the beginning of the pipeline to the end of the pipeline to reinitialize the pipeline. This allows information to be retrieved from the CORDIC pipeline with relatively little overhead. The automatic starting and stopping of the CORDIC pipeline advantageously allows the retrieval of computations from efficient pipeline architectures on an as-needed basis.

Type: Application

Filed: March 27, 2006

Publication date: December 14, 2006

Inventors: Shoab Khan, Rehan Hameed, Hassan Farooq
