Patents by Inventor Wajahat Qadeer
Wajahat Qadeer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230418610
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Application
Filed: June 30, 2023
Publication date: December 28, 2023
Inventors: Wajahat Qadeer, Rehan Hameed
-
Publication number: 20230385645
Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
Type: Application
Filed: August 9, 2023
Publication date: November 30, 2023
Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
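Illustrative sketch (not taken from the filing): the per-layer ranking described in the abstract above could look roughly like the following. The symmetric uniform quantizer, the use of mean squared error as the deviation statistic, the toy dense layers, and every function name here are assumptions chosen for illustration; the abstract does not specify any of them.

```python
def quantize(values, bits):
    """Symmetric uniform quantization followed by dequantization (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero rows
    scale = peak / levels
    return [round(v / scale) * scale for v in values]


def dense(weights, inputs):
    """Toy fully connected layer: one dot product per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def rank_layers_by_deviation(layers, inputs, bits=4):
    """Return layer names ordered from largest to smallest deviation, plus the statistics.

    `layers` is a list of (name, weight_matrix) pairs applied in sequence.
    """
    stats = {}
    activations = inputs
    for name, weights in layers:
        # Floating-point input and output activations of this layer.
        float_out = dense(weights, activations)
        # Convert the layer to low bit width and run it on the same inputs.
        low_weights = [quantize(row, bits) for row in weights]
        low_out = dense(low_weights, activations)
        # Per-layer deviation statistic: mean squared error (an assumption).
        stats[name] = sum((a - b) ** 2 for a, b in zip(float_out, low_out)) / len(float_out)
        activations = float_out
    order = sorted(stats, key=stats.get, reverse=True)
    return order, stats
```

Layers at the front of the returned order are the ones that lose the most accuracy at the chosen bit width under these assumptions.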
-
Patent number: 11763158
Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
Type: Grant
Filed: December 4, 2020
Date of Patent: September 19, 2023
Assignee: Deep Vision Inc.
Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
-
Patent number: 11734006
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Grant
Filed: July 19, 2022
Date of Patent: August 22, 2023
Assignee: Deep Vision, Inc.
Inventors: Wajahat Qadeer, Rehan Hameed
-
Publication number: 20230195590
Abstract: A method includes: accessing a static schedule of a target neural network for execution by a processing device, the target neural network including a set of layers; generating a set of expected performance metrics of the target neural network based on the static schedule, the set of expected performance metrics including a first expected performance metric for a first layer in the set of layers; accessing a set of runtime performance metrics captured during execution of the target neural network by the processing device, the set of runtime performance metrics including a first runtime performance metric for the first layer; and, in response to detecting a difference between the first runtime performance metric and the first expected performance metric exceeding a threshold, serving an alert at a user interface.
Type: Application
Filed: December 20, 2022
Publication date: June 22, 2023
Inventors: Satyanarayana Raju Uppalapati, Rajasekhar Reddy Ereddy, Sameek Banerjee, Mohammed Shahim, Shilpa Kallem, Suresh Kumar Vennam, Abhilash Bharath Ghanore, Raju Datla, Wajahat Qadeer, Rehan Hameed
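Illustrative sketch (not taken from the filing): the comparison step in the abstract above amounts to checking each layer's measured metric against its expected value. The function name, the dictionary layout, the absolute-difference test, and the alert text are all assumptions made for illustration; the abstract does not say what the metric is or how the difference is computed.

```python
def find_performance_alerts(expected, runtime, threshold):
    """Return one alert message per layer whose runtime metric deviates too far.

    `expected` and `runtime` map layer names to a numeric metric
    (for example a latency in microseconds, which is an assumption).
    """
    alerts = []
    for layer, expected_value in expected.items():
        if layer not in runtime:
            continue  # no measurement captured for this layer
        measured = runtime[layer]
        if abs(measured - expected_value) > threshold:
            alerts.append(f"{layer}: expected {expected_value}, measured {measured}")
    return alerts
```

A caller would then pass each returned message to whatever user interface it uses to display alerts.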
-
Publication number: 20220357946
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Application
Filed: July 19, 2022
Publication date: November 10, 2022
Inventors: Wajahat Qadeer, Rehan Hameed
-
Patent number: 11436014
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Grant
Filed: June 23, 2021
Date of Patent: September 6, 2022
Assignee: Deep Vision, Inc.
Inventors: Wajahat Qadeer, Rehan Hameed
-
Publication number: 20210326133
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Application
Filed: June 23, 2021
Publication date: October 21, 2021
Inventors: Wajahat Qadeer, Rehan Hameed
-
Patent number: 11080056
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Grant
Filed: October 31, 2019
Date of Patent: August 3, 2021
Assignee: Deep Vision, Inc.
Inventors: Wajahat Qadeer, Rehan Hameed
-
Publication number: 20210191765
Abstract: A method for scheduling an artificial neural network includes: accessing a processor representation of a multicore processor comprising processor cores, direct memory access cores, and a cost model; and accessing a network structure defining a set of layers. The method also includes, for each layer in the set of layers: generating a graph based on the processor representation, the graph defining compute nodes, data transfer nodes, and edges representing dependencies between the compute nodes and the data transfer nodes; and generating a schedule for the layer based on the graph, the schedule assigning the compute nodes to the processor cores and assigning the data transfer nodes to the direct memory access cores. The method further includes aggregating the schedule for each layer in the set of layers to generate a complete schedule for the artificial neural network.
Type: Application
Filed: December 18, 2020
Publication date: June 24, 2021
Inventors: Lava Kumar Bokam, Sameek Bannerjee, Abhilash Bharath Ghanore, Rajashekar Reddy Ereddy, Wajahat Qadeer, Rehan Hameed, Mohamed Shahim, Sreenivas Aerra Reddy
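Illustrative sketch (not taken from the filing): one simple way to realize the per-layer scheduling in the abstract above is a greedy list scheduler over the dependency graph. The greedy earliest-free-unit policy, the fixed integer cost per node standing in for the cost model, the back-to-back aggregation of layers, and all names below are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str          # "compute" (processor core) or "transfer" (DMA core)
    cost: int          # duration taken from a hypothetical cost model
    deps: list = field(default_factory=list)  # names of nodes that must finish first


def schedule_layer(nodes, num_cores, num_dma):
    """Assign each node to a unit of its kind; return (name, kind, unit, start, end) tuples."""
    free_at = {"compute": [0] * num_cores, "transfer": [0] * num_dma}
    finish, plan, pending = {}, [], list(nodes)
    while pending:
        ready = [n for n in pending if all(d in finish for d in n.deps)]
        if not ready:
            raise ValueError("dependency cycle or missing node")
        for node in ready:
            units = free_at[node.kind]
            unit = min(range(len(units)), key=units.__getitem__)  # earliest-free unit
            start = max([units[unit]] + [finish[d] for d in node.deps])
            finish[node.name] = units[unit] = start + node.cost
            plan.append((node.name, node.kind, unit, start, finish[node.name]))
            pending.remove(node)
    return plan


def schedule_network(layers, num_cores, num_dma):
    """Aggregate per-layer schedules by running the layers back to back."""
    complete, offset = [], 0
    for layer_nodes in layers:
        plan = schedule_layer(layer_nodes, num_cores, num_dma)
        complete += [(n, k, u, s + offset, e + offset) for n, k, u, s, e in plan]
        offset += max(end for _, _, _, _, end in plan)
    return complete
```

Running layers strictly back to back is the simplest aggregation; a real scheduler might overlap the data transfers of one layer with the compute of the previous one.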
-
Publication number: 20210174172
Abstract: A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer.
Type: Application
Filed: December 4, 2020
Publication date: June 10, 2021
Inventors: Wajahat Qadeer, Rehan Hameed, Satyanarayana Raju Uppalapati, Abhilash Bharath Ghanore, Kasanagottu Sai Ram
-
Publication number: 20200409699
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Application
Filed: October 31, 2019
Publication date: December 31, 2020
Inventors: Wajahat Qadeer, Rehan Hameed
-
Patent number: 10474464
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Grant
Filed: July 3, 2018
Date of Patent: November 12, 2019
Assignee: DEEP VISION, INC.
Inventors: Wajahat Qadeer, Rehan Hameed
-
Publication number: 20190012170
Abstract: Disclosed herein is a processor for deep learning. In one embodiment, the processor comprises: a load and store unit configured to load and store image pixel data and stencil data; a register unit, implementing a banked register file, configured to: load and store a subset of the image pixel data from the load and store unit, and concurrently provide access to image pixel values stored in a register file entry of the banked register file, wherein the subset of the image pixel data comprises the image pixel values stored in the register file entry; and a plurality of arithmetic logic units configured to concurrently perform one or more operations on the image pixel values stored in the register file entry and corresponding stencil data of the stencil data.
Type: Application
Filed: July 3, 2018
Publication date: January 10, 2019
Inventors: Wajahat Qadeer, Rehan Hameed
-
Patent number: 9477999
Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.
Type: Grant
Filed: September 22, 2014
Date of Patent: October 25, 2016
Assignee: The Board of Trustees of the Leland Stanford Junior University
Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz
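Illustrative sketch (not taken from the patent): the "shifted versions" idea in the abstract above can be mimicked in software for a one-dimensional row of pixels. The hardware described performs these operations concurrently; this sequential Python version only shows the data flow. The multiply-then-sum choice of operation and both function names are assumptions for illustration.

```python
def shifted_windows(pixels, width):
    """Return every contiguous window of `width` pixels, one per shift position."""
    return [pixels[i:i + width] for i in range(len(pixels) - width + 1)]


def apply_stencil(pixels, stencil):
    """Multiply each shifted window by the stencil element-wise and sum the products."""
    return [
        sum(p * s for p, s in zip(window, stencil))
        for window in shifted_windows(pixels, len(stencil))
    ]
```

For example, applying the stencil `[1, 0, -1]` to a row of evenly increasing pixel values yields the same difference at every position.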
-
Publication number: 20150086134
Abstract: A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.
Type: Application
Filed: September 22, 2014
Publication date: March 26, 2015
Inventors: Rehan Hameed, Wajahat Qadeer, Christoforos Kozyrakis, Mark A. Horowitz