Patents by Inventor Ganesh Venkatesh

Ganesh Venkatesh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200285618
    Abstract: Compressed data is oftentimes beneficial for reducing the computing resources required, for example, to transmit and store data. The compression of data is particularly useful when dealing with sparse data (data that includes numerous zeros or near-zero values) and only non-zero values above a certain threshold have significance. When dealing with compressed data, oftentimes the data needs to be decompressed for processing (e.g., by deep learning networks or other applications configured to operate on sparse or other uncompressed data). Instructions are disclosed for supporting the decompression of compressed data by a processing unit such as a CPU or GPU.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 10, 2020
    Inventors: Jorge Albericio Latorre, Jack H. Choquette, Manan Maheshkumar Patel, Jeffrey Pool, Ming Y. Siu, Ronny Meir Krashinsky, Ganesh Venkatesh
  • Publication number: 20190347125
    Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
    Type: Application
    Filed: December 31, 2016
    Publication date: November 14, 2019
    Inventors: Rajesh M. Sankaran, Gilbert Neiger, Narayan Ranganathan, Stephen R. Van Doren, Joseph Nuzman, Niall D. McDonnell, Michael A. O'Hanlon, Lokpraveen B. Mosur, Tracy Garrett Drysdale, Eriko Nurvitadhi, Asit K. Mishra, Ganesh Venkatesh, Deborah T. Marr, Nicholas P. Carter, Jonathan D. Pearce, Edward T. Grochowski, Richard J. Greco, Robert Valentine, Jesus Corbal, Thomas D. Fletcher, Dennis R. Bradford, Dwight P. Manley, Mark J. Charney, Jeffrey J. Cook, Paul Caprioli, Koichi Yamada, Kent D. Glossop, David B. Sheffield
  • Patent number: 10452551
    Abstract: A processor may include a programmable memory prefetcher that includes a programmable hardware prefetch engine and a prefetch engine control register.
    Type: Grant
    Filed: December 12, 2016
    Date of Patent: October 22, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Christopher B. Wilkerson, Seth H. Pugsley, Deborah T. Marr
  • Patent number: 10387037
    Abstract: Techniques for enabling enhanced parallelism for sparse linear algebra operations having write-to-read dependencies are disclosed. A hardware processor includes a plurality of processing elements, a memory that is heavily-banked into a plurality of banks, and an arbiter. The arbiter is to receive requests from threads executing at the plurality of processing elements seeking to perform operations involving the memory, and to maintain a plurality of lock buffers corresponding to the plurality of banks. Each of the lock buffers is able to track up to a plurality of memory addresses within the corresponding bank that are to be treated as locked in that the values stored at those memory addresses cannot be updated by those of the threads that did not cause the memory addresses to be locked until those memory addresses have been removed from being tracked by the plurality of lock buffers.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: August 20, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Deborah Marr
  • Patent number: 10372507
    Abstract: Techniques involving a compute engine architecture to support data-parallel loops with reduction operations are described. In some embodiments, a hardware processor includes a memory unit and a plurality of processing elements (PEs). Each of the PEs is directly coupled via one or more neighbor-to-neighbor links with one or more neighboring PEs so that each PE can receive a value from a neighboring PE, provide a value to a neighboring PE, or both receive a value from one neighboring PE and also provide a value to another neighboring PE. The hardware processor also includes a control engine coupled with the plurality of PEs that is to cause the plurality of PEs to collectively perform a task to generate one or more output values by each performing one or more iterations of a same subtask of the task.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Deborah Marr
  • Patent number: 10289752
    Abstract: A processor may include a gather-update-scatter accelerator, and an allocator comprising circuitry to direct an instruction to the accelerator for execution. The instruction may include a search index, an operation to be performed, and a scalar data value. The accelerator may include a content-addressable memory (CAM) storing multiple entries, each of which stores a respective index key and a data value associated with the index key. The accelerator may include a CAM controller, which includes circuitry. The CAM controller may be configured to select, based on the information in the instruction, one of the plurality of entries in the CAM on which to operate. The CAM controller may be configured to perform an arithmetic or logical operation on the selected entry dependent on the information in the instruction. The CAM controller may be configured to store a result of the operation in the selected entry in the CAM.
    Type: Grant
    Filed: December 12, 2016
    Date of Patent: May 14, 2019
    Assignee: Intel Corporation
    Inventors: Ganesh Venkatesh, Nicholas P. Carter, Deborah T. Marr
  • Publication number: 20180188961
    Abstract: Techniques for enabling enhanced parallelism for sparse linear algebra operations having write-to-read dependencies are disclosed. A hardware processor includes a plurality of processing elements, a memory that is heavily-banked into a plurality of banks, and an arbiter. The arbiter is to receive requests from threads executing at the plurality of processing elements seeking to perform operations involving the memory, and to maintain a plurality of lock buffers corresponding to the plurality of banks. Each of the lock buffers is able to track up to a plurality of memory addresses within the corresponding bank that are to be treated as locked in that the values stored at those memory addresses cannot be updated by those of the threads that did not cause the memory addresses to be locked until those memory addresses have been removed from being tracked by the plurality of lock buffers.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Ganesh Venkatesh, Deborah Marr
  • Publication number: 20180189675
    Abstract: Hardware accelerator architectures for clustering are described. A hardware accelerator includes sparse tiles and very/hyper sparse tiles. The sparse tile(s) execute operations for a clustering task involving a matrix. Each sparse tile includes a first plurality of processing units to operate upon a first plurality of blocks of the matrix that have been streamed to one or more random access memories of the sparse tiles over a high bandwidth interface from a first memory unit. Each of the very/hyper sparse tiles is to execute operations for the clustering task involving the matrix. Each of the very/hyper sparse tiles includes a second plurality of processing units to operate upon a second plurality of blocks of the matrix that have been randomly accessed over a low-latency interface from a second memory unit.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Eriko Nurvitadhi, Ganesh Venkatesh, Srivatsan Krishnan, Suchit Subhaschandra, Deborah Marr
  • Publication number: 20180189110
    Abstract: Techniques involving a compute engine architecture to support data-parallel loops with reduction operations are described. In some embodiments, a hardware processor includes a memory unit and a plurality of processing elements (PEs). Each of the PEs is directly coupled via one or more neighbor-to-neighbor links with one or more neighboring PEs so that each PE can receive a value from a neighboring PE, provide a value to a neighboring PE, or both receive a value from one neighboring PE and also provide a value to another neighboring PE. The hardware processor also includes a control engine coupled with the plurality of PEs that is to cause the plurality of PEs to collectively perform a task to generate one or more output values by each performing one or more iterations of a same subtask of the task.
    Type: Application
    Filed: December 31, 2016
    Publication date: July 5, 2018
    Inventors: Ganesh Venkatesh, Deborah Marr
  • Publication number: 20180165381
    Abstract: A processor may include a gather-update-scatter accelerator, and circuitry to direct an instruction to the accelerator for execution. The instruction may include a search index, an operation to be performed, and a scalar data value. The accelerator may include a content-associative memory (CAM) storing multiple entries, each of which stores a respective index key and a data value associated with the index key. The accelerator may include a CAM controller, including circuitry to select, based on the information in the instruction, one of the plurality of entries in the CAM on which to operate, an arithmetic logic unit (ALU), including circuitry to perform an arithmetic or logical operation on the selected entry, the operation being dependent on the information in the instruction, and circuitry to store a result of the operation in the selected entry in the CAM.
    Type: Application
    Filed: December 12, 2016
    Publication date: June 14, 2018
    Inventors: Ganesh Venkatesh, Nicholas P. Carter, Deborah T. Marr
  • Publication number: 20180165204
    Abstract: A processor may include a programmable hardware prefetch engine and a prefetch engine control register. The processor may include circuitry to receive, during execution of an application, a first instruction for configuring the prefetch engine for prefetching multiple cache lines to be accessed in the future, at predictable locations, by the application; to store, in the prefetch engine control register, dependent on information in the first instruction, data representing an amount of prefetching to be performed and data representing a stride distance between consecutive cache lines to be prefetched; to receive a second instruction for prefetching a single cache line whose location is identified in the second instruction; and to initiate, in response to receiving the second instruction, prefetching of multiple cache lines by the prefetch engine, to be performed in parallel with execution of the application and in accordance with the data stored in the prefetch engine control register.
    Type: Application
    Filed: December 12, 2016
    Publication date: June 14, 2018
    Inventors: Ganesh Venkatesh, Christopher B. Wilkerson, Seth H. Pugsley, Deborah T. Marr
  • Publication number: 20160378465
    Abstract: In one embodiment, a processor includes at least one core to execute instructions and an accelerator coupled to the at least one core. The accelerator may include a plurality of walker logics, which may be adapted to fetch at least a portion of a first array block and at least a portion of a second array block, determine whether a first index of the first array block matches a second index of the second array block, and send a first value of the first array block associated with the first index and a second value of the second array block associated with the second index to an arithmetic unit, based at least in part on the determination. Other embodiments are described and claimed.
    Type: Application
    Filed: June 23, 2015
    Publication date: December 29, 2016
    Inventors: Ganesh Venkatesh, Tianlu C. Zhang, Deborah T. Marr
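
Illustrative note on publication 20160378465: the index-matching step described in that abstract (compare an index from a first array block with an index from a second array block, and forward the associated values to an arithmetic unit when they match) can be sketched in software roughly as follows. This is only a sketch under stated assumptions, not the claimed hardware design: it assumes each block is stored as parallel lists of indices and values with indices sorted in ascending order, and it assumes the arithmetic unit performs a multiply-accumulate (a sparse dot product). The names matched_pairs and sparse_dot are hypothetical and do not come from the application.

```python
def matched_pairs(idx_a, val_a, idx_b, val_b):
    """Yield (value_a, value_b) for every index present in both blocks.

    Assumption: idx_a and idx_b are sorted in ascending order, so a single
    forward walk over each block is enough to find all matching indices.
    """
    i = j = 0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            # Indices match: forward both values to the arithmetic stage.
            yield val_a[i], val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            # The first block is behind; advance its walker.
            i += 1
        else:
            # The second block is behind; advance its walker.
            j += 1


def sparse_dot(idx_a, val_a, idx_b, val_b):
    """Assumed arithmetic stage: multiply-accumulate over matched values."""
    return sum(a * b for a, b in matched_pairs(idx_a, val_a, idx_b, val_b))


if __name__ == "__main__":
    # Indices 2 and 5 appear in both blocks, so only those values are paired.
    print(sparse_dot([0, 2, 5], [1.0, 2.0, 3.0], [2, 3, 5], [4.0, 5.0, 6.0]))
```

Separating the matching walk from the arithmetic mirrors the split in the abstract between the walker logic and the arithmetic unit; a different operation could be substituted for the multiply-accumulate without changing the walk.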