Patents by Inventor Rehan Hameed

Rehan Hameed has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230418610
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: June 30, 2023
    Publication date: December 28, 2023
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20230385645
    Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
    Type: Application
    Filed: August 9, 2023
    Publication date: November 30, 2023
    Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
  • Patent number: 11763158
    Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: September 19, 2023
    Assignee: Deep Vision Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
  • Patent number: 11734006
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: July 19, 2022
    Date of Patent: August 22, 2023
    Assignee: Deep Vision, Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20230195590
    Abstract: A method includes: accessing a static schedule of a target neural network for execution by a processing device, the target neural network including a set of layers; generating a set of expected performance metrics of the target neural network based on the static schedule, the set of expected performance metrics including a first expected performance metric for a first layer in the set of layers; accessing a set of runtime performance metrics captured during execution of the target neural network by the processing device, the set of runtime performance metrics including a first runtime performance metric for the first layer; and, in response to detecting a difference between the first runtime performance metric and the first expected performance metric exceeding a threshold, serving an alert at a user interface.
    Type: Application
    Filed: December 20, 2022
    Publication date: June 22, 2023
    Inventors: Satyanarayana Raju Uppalapati, Rajasekhar Reddy Ereddy, Sameek Banerjee, Mohammed Shahim, Shilpa Kallem, Suresh Kumar Vennam, Abhilash Bharath Ghanore, Raju Datla, Wajahat Qadeer, Rehan Hameed
  • Patent number: 11550586
    Abstract: A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register storing a control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal comprising an initial source address, an initial destination address, a first source stride length in a first dimension, and a first source stride count in the first dimension; a source address register communicatively coupled to the control signal register; a destination address register communicatively coupled to the control signal register; a first source stride counter communicatively coupled to the control signal register; and control logic communicatively coupled to the control signal register, the source address register, and the first source stride counter.
    Type: Grant
    Filed: May 26, 2021
    Date of Patent: January 10, 2023
    Assignee: Deep Vision Inc.
    Inventors: Mohamed Shahim, Raju Datla, Rehan Hameed, Shilpa Kallem
  • Publication number: 20220357946
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: July 19, 2022
    Publication date: November 10, 2022
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 11436014
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: September 6, 2022
    Assignee: Deep Vision, Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20210373895
    Abstract: A tensor traversal engine in a processor system comprising a source memory component and a destination memory component, the tensor traversal engine comprising: a control signal register storing a control signal for a strided data transfer operation from the source memory component to the destination memory component, the control signal comprising an initial source address, an initial destination address, a first source stride length in a first dimension, and a first source stride count in the first dimension; a source address register communicatively coupled to the control signal register; a destination address register communicatively coupled to the control signal register; a first source stride counter communicatively coupled to the control signal register; and control logic communicatively coupled to the control signal register, the source address register, and the first source stride counter.
    Type: Application
    Filed: May 26, 2021
    Publication date: December 2, 2021
    Inventors: Mohamed Shahim, Raju Datla, Rehan Hameed, Shilpa Kallem
  • Publication number: 20210326133
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: June 23, 2021
    Publication date: October 21, 2021
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 11080056
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: August 3, 2021
    Assignee: Deep Vision, Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20210191765
    Abstract: A method for scheduling an artificial neural network includes: accessing a processor representation of a multicore processor comprising processor cores, direct memory access cores, and a cost model; and accessing a network structure defining a set of layers. The method also includes, for each layer in the set of layers: generating a graph based on the processor representation, the graph defining compute nodes, data transfer nodes, and edges representing dependencies between the compute nodes and the data transfer nodes; and generating a schedule for the layer based on the graph, the schedule assigning the compute nodes to the processor cores and assigning the data transfer nodes to the direct memory access cores. The method further includes aggregating the schedule for each layer in the set of layers to generate a complete schedule for the artificial neural network.
    Type: Application
    Filed: December 18, 2020
    Publication date: June 24, 2021
    Inventors: Lava Kumar Bokam, Sameek Bannerjee, Abhilash Bharath Ghanore, Rajashekar Reddy Ereddy, Wajahat Qadeer, Rehan Hameed, Mohamed Shahim, Sreenivas Aerra Reddy
  • Publication number: 20210174172
    Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
    Type: Application
    Filed: December 4, 2020
    Publication date: June 10, 2021
    Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
  • Publication number: 20200409699
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: October 31, 2019
    Publication date: December 31, 2020
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 10474464
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: July 3, 2018
    Date of Patent: November 12, 2019
    Assignee: DEEP VISION, INC.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20190012170
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: July 3, 2018
    Publication date: January 10, 2019
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 9477999
    Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.
    Type: Grant
    Filed: September 22, 2014
    Date of Patent: October 25, 2016
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz
  • Publication number: 20150086134
    Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.
    Type: Application
    Filed: September 22, 2014
    Publication date: March 26, 2015
    Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz
  • Publication number: 20110119520
    Abstract: The present invention relates to digital signal processors with an integrated module configured to compute a Coordinate Rotation Digital Computer (CORDIC) in a pipeline. The pipelined module can advantageously complete computation of one CORDIC computation for each clock pulse applied to the CORDIC module, thereby providing a CORDIC computation for each clock pulse. One embodiment advantageously computes a first portion of a computation with a lookup table and a second portion in accordance with a CORDIC algorithm. Advantageously, data in a CORDIC pipeline is automatically advanced in response to read instructions and can be automatically advanced from the beginning of the pipeline to the end of the pipeline to reinitialize the pipeline. This allows information to be retrieved from the CORDIC pipeline with relatively little overhead. The automatic starting and stopping of the CORDIC pipeline advantageously allows the retrieval of computations from efficient pipeline architectures on an as-needed basis.
    Type: Application
    Filed: November 11, 2010
    Publication date: May 19, 2011
    Inventors: Shoab A. Khan, Rehan Hameed, Hassan Farooq
  • Publication number: 20060282489
    Abstract: The present invention relates to digital signal processors with an integrated module configured to compute a Coordinate Rotation Digital Computer (CORDIC) in a pipeline. The pipelined module can advantageously complete computation of one CORDIC computation for each clock pulse applied to the CORDIC module, thereby providing a CORDIC computation for each clock pulse. One embodiment advantageously computes a first portion of a computation with a lookup table and a second portion in accordance with a CORDIC algorithm. Advantageously, data in a CORDIC pipeline is automatically advanced in response to read instructions and can be automatically advanced from the beginning of the pipeline to the end of the pipeline to reinitialize the pipeline. This allows information to be retrieved from the CORDIC pipeline with relatively little overhead The automatic starting and stopping of the CORDIC pipeline advantageously allows the retrieval of computations from efficient pipeline architectures on an as-needed basis.
    Type: Application
    Filed: March 27, 2006
    Publication date: December 14, 2006
    Inventors: Shoab Khan, Rehan Hameed, Hassan Farooq