Patents by Inventor Wajahat Qadeer

Wajahat Qadeer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230418610
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: June 30, 2023
    Publication date: December 28, 2023
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20230385645
    Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
    Type: Application
    Filed: August 9, 2023
    Publication date: November 30, 2023
    Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
  • Patent number: 11763158
    Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: September 19, 2023
    Assignee: Deep Vision Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
  • Patent number: 11734006
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: July 19, 2022
    Date of Patent: August 22, 2023
    Assignee: Deep Vision, Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20230195590
    Abstract: A method includes: accessing a static schedule of a target neural network for execution by a processing device, the target neural network including a set of layers; generating a set of expected performance metrics of the target neural network based on the static schedule, the set of expected performance metrics including a first expected performance metric for a first layer in the set of layers; accessing a set of runtime performance metrics captured during execution of the target neural network by the processing device, the set of runtime performance metrics including a first runtime performance metric for the first layer; and, in response to detecting a difference between the first runtime performance metric and the first expected performance metric exceeding a threshold, serving an alert at a user interface.
    Type: Application
    Filed: December 20, 2022
    Publication date: June 22, 2023
    Inventors: Satyanarayana Raju Uppalapati, Rajasekhar Reddy Ereddy, Sameek Banerjee, Mohammed Shahim, Shilpa Kallem, Suresh Kumar Vennam, Abhilash Bharath Ghanore, Raju Datla, Wajahat Qadeer, Rehan Hameed
  • Publication number: 20220357946
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: July 19, 2022
    Publication date: November 10, 2022
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 11436014
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: June 23, 2021
    Date of Patent: September 6, 2022
    Assignee: Deep Vision, Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20210326133
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: June 23, 2021
    Publication date: October 21, 2021
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 11080056
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: August 3, 2021
    Assignee: Deep Vision, Inc.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20210191765
    Abstract: A method for scheduling an artificial neural network includes: accessing a processor representation of a multicore processor comprising processor cores, direct memory access cores, and a cost model; and accessing a network structure defining a set of layers. The method also includes, for each layer in the set of layers: generating a graph based on the processor representation, the graph defining compute nodes, data transfer nodes, and edges representing dependencies between the compute nodes and the data transfer nodes; and generating a schedule for the layer based on the graph, the schedule assigning the compute nodes to the processor cores and assigning the data transfer nodes to the direct memory access cores. The method further includes aggregating the schedule for each layer in the set of layers to generate a complete schedule for the artificial neural network.
    Type: Application
    Filed: December 18, 2020
    Publication date: June 24, 2021
    Inventors: Lava Kumar Bokam, Sameek Bannerjee, Abhilash Bharath Ghanore, Rajashekar Reddy Ereddy, Wajahat Qadeer, Rehan Hameed, Mohamed Shahim, Sreenivas Aerra Reddy
  • Publication number: 20210174172
    Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
    Type: Application
    Filed: December 4, 2020
    Publication date: June 10, 2021
    Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
  • Publication number: 20200409699
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: October 31, 2019
    Publication date: December 31, 2020
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 10474464
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Grant
    Filed: July 3, 2018
    Date of Patent: November 12, 2019
    Assignee: DEEP VISION, INC.
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Publication number: 20190012170
    Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
    Type: Application
    Filed: July 3, 2018
    Publication date: January 10, 2019
    Inventors: Wajahat Qadeer, Rehan Hameed
  • Patent number: 9477999
    Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.
    Type: Grant
    Filed: September 22, 2014
    Date of Patent: October 25, 2016
    Assignee: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz
  • Publication number: 20150086134
    Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.
    Type: Application
    Filed: September 22, 2014
    Publication date: March 26, 2015
    Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz