Patents by Inventor Ashok Jagannathan
Ashok Jagannathan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240118892
Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips; and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.
Type: Application
Filed: December 18, 2023
Publication date: April 11, 2024
Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
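The tile arrangement this abstract describes is easier to picture as a data structure. The Python sketch below wires one column of compute-intensive tiles (forward-propagation, back-propagation, weight-gradient) between two memory-intensive tiles and links convolutional-layer chips to fully connected-layer chips; all class names, field names, and counts are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tile:
    kind: str  # "mem", "fwd", "bwd", or "wgrad"

@dataclass
class ComputeColumn:
    """One column: three compute-intensive tiles between two memory-intensive tiles."""
    tiles: List[Tile] = field(default_factory=lambda: [
        Tile("mem"),    # first memory-intensive tile
        Tile("fwd"),    # forward-propagation compute-intensive tile
        Tile("bwd"),    # back-propagation compute-intensive tile
        Tile("wgrad"),  # weight-gradient compute-intensive tile
        Tile("mem"),    # second memory-intensive tile
    ])

@dataclass
class ConvChip:
    columns: List[ComputeColumn]

@dataclass
class FCChip:
    conv_chips: List[ConvChip]  # each conv chip couples to a respective FC chip

# Hypothetical sizing: 2 FC chips, each linked to 2 conv chips of 4 columns.
fabric = [FCChip([ConvChip([ComputeColumn() for _ in range(4)])
                  for _ in range(2)]) for _ in range(2)]
print(sum(len(c.columns) for f in fabric for c in f.conv_chips), "columns total")
```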
-
Patent number: 11663135
Abstract: A fabric controller to provide a coherent accelerator fabric, including: a host interconnect to communicatively couple to a host device; a memory interconnect to communicatively couple to an accelerator memory; an accelerator interconnect to communicatively couple to an accelerator having a last-level cache (LLC); and an LLC controller configured to provide a bias check for memory access operations.
Type: Grant
Filed: December 20, 2021
Date of Patent: May 30, 2023
Assignee: Intel Corporation
Inventors: Ritu Gupta, Aravindh V. Anantaraman, Stephen R. Van Doren, Ashok Jagannathan
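The abstract does not spell out what the "bias check" looks like; a common reading, consistent with host/device bias schemes in coherent accelerator attach protocols, is a page-granular table that decides whether an access to accelerator memory can go direct or must be routed through the host's coherence flow. The Python sketch below is a minimal behavioral model under that assumption; the table layout and defaults are invented.

```python
from enum import Enum

class Bias(Enum):
    HOST = "host"      # host may hold cached copies; route through host coherence
    DEVICE = "device"  # accelerator owns the data; access local memory directly

PAGE_SIZE = 4096
bias_table = {}  # hypothetical page-granular table: page number -> Bias

def check_bias(addr: int) -> str:
    page = addr // PAGE_SIZE
    bias = bias_table.get(page, Bias.HOST)  # assume host bias until flipped
    if bias is Bias.DEVICE:
        return "direct access to accelerator memory"
    return "request routed through the host for coherence"

bias_table[0x1000 // PAGE_SIZE] = Bias.DEVICE
print(check_bias(0x1000))    # device-bias page: short local path
print(check_bias(0x200000))  # untracked page: coherent path via host
```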
-
Publication number: 20220198110
Abstract: A method is described. The method includes maintaining a synchronized count value in each of a plurality of logic chips within a same package. The method includes comparing the count value against a same looked-for count value in each of the plurality of logic chips. The method includes each of the plurality of logic chips recording in its respective local memory at least some of its state information in response to each of the plurality of logic chips recognizing within a same cycle that the count value has reached the same looked-for count value.
Type: Application
Filed: December 23, 2020
Publication date: June 23, 2022
Inventors: Shanker Raman NAGESH, Ashok JAGANNATHAN
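A minimal behavioral sketch of the mechanism, in Python: several chips advance a lockstep count value, compare it against the same looked-for value, and each records state into its own local memory in the cycle the values match. Names and the trigger value are invented for illustration.

```python
class LogicChip:
    def __init__(self, name: str):
        self.name = name
        self.state = 0
        self.local_memory = []  # per-chip buffer for recorded state

    def step(self, count: int, trigger: int):
        self.state += 1  # stand-in for real internal state changes
        if count == trigger:  # same comparison performed in every chip
            self.local_memory.append((count, self.state))

chips = [LogicChip(f"chip{i}") for i in range(3)]
TRIGGER = 5  # the same looked-for count value in each chip
for cycle in range(10):  # the synchronized count advances in lockstep
    for chip in chips:
        chip.step(cycle, TRIGGER)

for chip in chips:
    print(chip.name, chip.local_memory)  # every chip snapshotted at cycle 5
```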
-
Publication number: 20220114105
Abstract: A fabric controller to provide a coherent accelerator fabric, including: a host interconnect to communicatively couple to a host device; a memory interconnect to communicatively couple to an accelerator memory; an accelerator interconnect to communicatively couple to an accelerator having a last-level cache (LLC); and an LLC controller configured to provide a bias check for memory access operations.
Type: Application
Filed: December 20, 2021
Publication date: April 14, 2022
Applicant: Intel Corporation
Inventors: Ritu Gupta, Aravindh V. Anantaraman, Stephen R. Van Doren, Ashok Jagannathan
-
Patent number: 11263143
Abstract: A fabric controller is provided for a coherent accelerator fabric. The coherent accelerator fabric includes a host interconnect, a memory interconnect, and an accelerator interconnect. The host interconnect communicatively couples to a host device. The memory interconnect communicatively couples to an accelerator memory. The accelerator interconnect communicatively couples to an accelerator having a last-level cache (LLC). An LLC controller is provided that is configured to provide a bias check for memory access operations on the fabric.
Type: Grant
Filed: September 29, 2017
Date of Patent: March 1, 2022
Assignee: Intel Corporation
Inventors: Ritu Gupta, Aravindh V. Anantaraman, Stephen R. Van Doren, Ashok Jagannathan
-
Publication number: 20220050683
Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips; and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.
Type: Application
Filed: October 26, 2021
Publication date: February 17, 2022
Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
-
Publication number: 20210318980
Abstract: A processor unit comprising: a first controller to couple to a host processing unit over a first link; a second controller to couple to a second processor unit over a second link, wherein the second processor unit is to couple to the host processing unit via a third link; and circuitry to determine whether to send a cache coherent request to the host processing unit over the first link or over the second link via the second processor unit.
Type: Application
Filed: June 25, 2021
Publication date: October 14, 2021
Applicant: Intel Corporation
Inventors: Rahul Pal, Nayan Amrutlal Suthar, David M. Puffer, Ashok Jagannathan
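The abstract leaves the selection policy open, saying only that circuitry chooses between the direct link and the path through the peer processor unit. The sketch below fills that gap with a purely illustrative address-interleave rule; the real criterion could equally be bandwidth, congestion, or topology based.

```python
DIRECT = "first link: straight to the host processing unit"
VIA_PEER = "second link: via the peer processor unit (host reached over its third link)"

def route_coherent_request(addr: int) -> str:
    # Illustrative policy only: interleave by one bit of the cache-line address.
    return VIA_PEER if (addr >> 6) & 1 else DIRECT

print(route_coherent_request(0x0000))  # even line -> direct link
print(route_coherent_request(0x0040))  # odd line  -> routed via the peer
```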
-
Publication number: 20190303743
Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips; and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.
Type: Application
Filed: September 27, 2016
Publication date: October 3, 2019
Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
-
Patent number: 10339060
Abstract: System, method, and processor for enabling early deallocation of tracker entries which track memory accesses are described herein. One embodiment of a method includes: maintaining an RSF corresponding to a first processing unit of a plurality of processing units to track cache lines, wherein a cache line is tracked by the RSF if the cache line is stored in both a memory and one or more other processing units, the memory being coupled to and shared by the plurality of processing units; receiving a request to access a target cache line from a processing core of the first processing unit; allocating a tracker entry corresponding to the request, the tracker entry used to track a status of the request; performing a lookup in the RSF for the target cache line; and deallocating the tracker entry responsive to a detection that the target cache line is not tracked by the RSF.
Type: Grant
Filed: December 30, 2016
Date of Patent: July 2, 2019
Assignee: Intel Corporation
Inventors: Bahaa Fahim, Ashok Jagannathan, Jeffrey D. Chamberlain, Samuel D. Strom
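In Python pseudocode, the flow the abstract describes looks roughly like this: allocate a tracker entry for the request, probe the RSF, and free the entry early when the RSF says no other processing unit holds the line, so no cross-unit snoops are outstanding. Structure names are invented; this is a behavioral sketch, not the patent's implementation.

```python
# RSF: lines cached both in shared memory and in at least one other unit.
remote_sharer_filter = {0x80: {"unit1"}}  # line address -> remote holders
trackers = {}                             # request id -> tracked line

def handle_request(req_id: int, line: int) -> bool:
    trackers[req_id] = line               # allocate a tracker entry
    if line not in remote_sharer_filter:  # RSF lookup: no remote copies
        del trackers[req_id]              # early deallocation
        return True
    return False  # remote copies exist; hold the entry for snoop responses

print(handle_request(1, 0x40))  # True: entry freed early
print(handle_request(2, 0x80))  # False: line has a remote sharer
print(trackers)                 # only request 2 is still tracked
```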
-
Publication number: 20190102311
Abstract: A fabric controller to provide a coherent accelerator fabric, including: a host interconnect to communicatively couple to a host device; a memory interconnect to communicatively couple to an accelerator memory; an accelerator interconnect to communicatively couple to an accelerator having a last-level cache (LLC); and an LLC controller configured to provide a bias check for memory access operations.
Type: Application
Filed: September 29, 2017
Publication date: April 4, 2019
Inventors: Ritu Gupta, Aravindh V. Anantaraman, Stephen R. Van Doren, Ashok Jagannathan
-
Publication number: 20180189180
Abstract: System, method, and processor for enabling early deallocation of tracker entries which track memory accesses are described herein. One embodiment of a method includes: maintaining an RSF corresponding to a first processing unit of a plurality of processing units to track cache lines, wherein a cache line is tracked by the RSF if the cache line is stored in both a memory and one or more other processing units, the memory being coupled to and shared by the plurality of processing units; receiving a request to access a target cache line from a processing core of the first processing unit; allocating a tracker entry corresponding to the request, the tracker entry used to track a status of the request; performing a lookup in the RSF for the target cache line; and deallocating the tracker entry responsive to a detection that the target cache line is not tracked by the RSF.
Type: Application
Filed: December 30, 2016
Publication date: July 5, 2018
Inventors: Bahaa Fahim, Ashok Jagannathan, Jeffrey D. Chamberlain, Samuel D. Strom
-
Patent number: 9727475
Abstract: An apparatus and method are described for distributed snoop filtering. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions and process data; first snoop logic to track a first plurality of cache lines stored in a mid-level cache (“MLC”) accessible by one or more of the cores, the first snoop logic to allocate entries for cache lines stored in the MLC and to deallocate entries for cache lines evicted from the MLC, wherein at least some of the cache lines evicted from the MLC are retained in a level 1 (L1) cache; and second snoop logic to track a second plurality of cache lines stored in a non-inclusive last level cache (NI LLC), the second snoop logic to allocate entries in the NI LLC for cache lines evicted from the MLC and to deallocate entries for cache lines stored in the MLC, wherein the second snoop logic is to store and maintain a first set of core valid bits to identify cores containing copies of the cache lines stored in the NI LLC.
Type: Grant
Filed: September 26, 2014
Date of Patent: August 8, 2017
Assignee: Intel Corporation
Inventors: Rahul Pal, Ishwar Agarwal, Yen-Cheng Liu, Joseph Nuzman, Ashok Jagannathan, Bahaa Fahim, Nithiyanandan Bashyam
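A behavioral sketch of the two cooperating filters (all names invented): one structure tracks lines resident in the mid-level caches, the other tracks lines in the non-inclusive LLC with core-valid bits, and an MLC eviction migrates tracking from the first to the second because an L1 may still hold the line.

```python
mlc_filter = {}  # line -> cores holding the line in their MLC
llc_filter = {}  # line -> core-valid bits (L1s that may still hold a copy)

def mlc_fill(line: int, core: int):
    mlc_filter.setdefault(line, set()).add(core)
    llc_filter.pop(line, None)  # tracking moves to the MLC-side filter

def mlc_evict(line: int, core: int):
    holders = mlc_filter.get(line, set())
    holders.discard(core)
    if not holders:
        mlc_filter.pop(line, None)
    # The evicting core's L1 may retain the line, so record a core-valid bit.
    llc_filter.setdefault(line, set()).add(core)

mlc_fill(0x40, core=0)
mlc_evict(0x40, core=0)
print(mlc_filter, llc_filter)  # {} {64: {0}}: snoop core 0's L1 if probed
```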
-
Patent number: 9507596
Abstract: A processor includes a core, a prefetcher, and a prefetcher control module. The prefetcher includes logic to make speculative prefetch requests through a memory subsystem for an element for execution by the core, and logic to store prefetched elements in a cache. The prefetcher control module includes logic to determine counts of memory accesses to two types of memory and, based upon the counts and the type of memory, reduce the speculative prefetch requests of the prefetcher.
Type: Grant
Filed: August 28, 2014
Date of Patent: November 29, 2016
Assignee: Intel Corporation
Inventors: Ashok Jagannathan, Prabhat Jain, Krishna N. Vinod, Avinash Sodani
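A small sketch of the control loop: count accesses per memory type and stop issuing speculative prefetches to a type once its count crosses a threshold. The two type names and the threshold are invented; the abstract only states that counts per memory type gate the prefetcher.

```python
class PrefetcherControl:
    def __init__(self, threshold: int):
        self.counts = {"type_a": 0, "type_b": 0}  # two kinds of memory
        self.threshold = threshold

    def record_access(self, mem_type: str):
        self.counts[mem_type] += 1

    def prefetch_allowed(self, mem_type: str) -> bool:
        # Reduce speculative prefetches once demand traffic is heavy.
        return self.counts[mem_type] < self.threshold

ctrl = PrefetcherControl(threshold=2)
ctrl.record_access("type_b"); ctrl.record_access("type_b")
print(ctrl.prefetch_allowed("type_b"))  # False: prefetches throttled
print(ctrl.prefetch_allowed("type_a"))  # True
```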
-
Patent number: 9430392
Abstract: Technologies for supporting large pages in hardware prefetchers are described. A processor includes a processor core comprising a pipeline, cache memory, and a hardware prefetcher coupled to the processor core and the cache memory. The hardware prefetcher is a region-based hardware prefetcher that tracks memory regions of a predefined region size defined by software to be executed by the processor. The hardware prefetcher receives incoming requests and tracks different memory regions of the predefined size with multiple streams in a stream table with stream entries. The hardware prefetcher generates a prefetch request and determines whether the prefetch request goes beyond a page boundary of the memory region being tracked. When it does, the hardware prefetcher creates a new stream entry to track the successive memory region, allowing subsequent prefetch requests to that region.
Type: Grant
Filed: March 26, 2014
Date of Patent: August 30, 2016
Assignee: Intel Corporation
Inventors: Prabhat Jain, Ashok Jagannathan
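The boundary-crossing behavior is the interesting part: instead of stalling at the edge of a tracked region, the prefetcher opens a new stream entry for the next region. A minimal Python model follows; the region size, stride, and table layout are illustrative, not the patent's.

```python
REGION_SIZE = 4096  # software-defined region size, e.g. one page
stream_table = {}   # region base -> next address to prefetch

def issue_prefetch(region_base: int, stride: int = 64) -> int:
    addr = stream_table.setdefault(region_base, region_base)
    nxt = addr + stride
    if nxt >= region_base + REGION_SIZE:
        # Crossing the page boundary: create a new stream entry for the
        # successive region so later prefetches can continue there.
        new_base = region_base + REGION_SIZE
        stream_table[new_base] = new_base
        return new_base
    stream_table[region_base] = nxt
    return nxt

addr = 0
for _ in range(64):  # 64 strides of 64 B walk through the 4 KiB region
    addr = issue_prefetch(0)
print(hex(addr), sorted(stream_table))  # a stream for region 0x1000 now exists
```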
-
Publication number: 20160092366
Abstract: An apparatus and method are described for distributed snoop filtering. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions and process data; first snoop logic to track a first plurality of cache lines stored in a mid-level cache (“MLC”) accessible by one or more of the cores, the first snoop logic to allocate entries for cache lines stored in the MLC and to deallocate entries for cache lines evicted from the MLC, wherein at least some of the cache lines evicted from the MLC are retained in a level 1 (L1) cache; and second snoop logic to track a second plurality of cache lines stored in a non-inclusive last level cache (NI LLC), the second snoop logic to allocate entries in the NI LLC for cache lines evicted from the MLC and to deallocate entries for cache lines stored in the MLC, wherein the second snoop logic is to store and maintain a first set of core valid bits to identify cores containing copies of the cache lines stored in the NI LLC.
Type: Application
Filed: September 26, 2014
Publication date: March 31, 2016
Inventors: Rahul PAL, Ishwar AGARWAL, Yen-Cheng LIU, Joseph NUZMAN, Ashok JAGANNATHAN, Bahaa FAHIM, Nithiyanandan BASHYAM
-
Publication number: 20160062768
Abstract: A processor includes a core, a prefetcher, and a prefetcher control module. The prefetcher includes logic to make speculative prefetch requests through a memory subsystem for an element for execution by the core, and logic to store prefetched elements in a cache. The prefetcher control module includes logic to determine counts of memory accesses to two types of memory and, based upon the counts and the type of memory, reduce the speculative prefetch requests of the prefetcher.
Type: Application
Filed: August 28, 2014
Publication date: March 3, 2016
Inventors: Ashok Jagannathan, Prabhat Jain, Krishna N. Vinod, Avinash Sodani
-
Patent number: 9229879
Abstract: Embodiments of the present disclosure describe techniques and configurations to reduce power consumption using unmodified information in evicted cache lines. A method includes identifying unmodified information of a cache line stored in a cache of a processor, tracking the unmodified information using a bit vector comprising one or more bits to indicate the unmodified information of the cache line, and selectively suppressing a write operation or send operation for the unmodified information of the cache line that is evicted from the cache to an input/output (I/O) component coupled to the cache, the selective suppressing being based on the one or more bits, and the I/O component being an outer component external to the cache. Other embodiments may be described and/or claimed.
Type: Grant
Filed: July 11, 2011
Date of Patent: January 5, 2016
Assignee: Intel Corporation
Inventors: Mahesh K. Kumashikar, Ashok Jagannathan
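In sketch form: a bit vector marks which parts of an evicted line are unmodified, and the write or send operation is suppressed for exactly those parts. The chunk size and vector polarity below are invented for the example.

```python
CHUNK = 8  # bytes covered by each bit of the vector (illustrative)

def evict(line: bytes, unmodified_bits: int) -> list:
    """Return only the chunks that must actually be written/sent outward."""
    out = []
    for i in range(len(line) // CHUNK):
        if not (unmodified_bits >> i) & 1:  # bit set => unmodified => suppress
            out.append(line[i * CHUNK:(i + 1) * CHUNK])
    return out

line = bytes(range(64))
# Six of eight chunks flagged unmodified: only two leave the cache,
# saving power on the I/O path.
print(len(evict(line, unmodified_bits=0b11111010)))  # 2
```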
-
Publication number: 20150278099
Abstract: Technologies for supporting large pages in hardware prefetchers are described. A processor includes a processor core comprising a pipeline, cache memory, and a hardware prefetcher coupled to the processor core and the cache memory. The hardware prefetcher is a region-based hardware prefetcher that tracks memory regions of a predefined region size defined by software to be executed by the processor. The hardware prefetcher receives incoming requests and tracks different memory regions of the predefined size with multiple streams in a stream table with stream entries. The hardware prefetcher generates a prefetch request and determines whether the prefetch request goes beyond a page boundary of the memory region being tracked. When it does, the hardware prefetcher creates a new stream entry to track the successive memory region, allowing subsequent prefetch requests to that region.
Type: Application
Filed: March 26, 2014
Publication date: October 1, 2015
Inventors: PRABHAT JAIN, ASHOK JAGANNATHAN
-
Patent number: 8862828
Abstract: Method and apparatus to efficiently store and cache data. Cores of a processor and cache slices co-located with the cores may be grouped into a cluster. A memory space may be partitioned into address regions. The cluster may be associated with an address region from the address regions. Each memory address of the address region may be mapped to one or more of the cache slices grouped into the cluster. A cache access from one or more of the cores grouped into the cluster may be biased to the address region based on the association of the cluster with the address region.
Type: Grant
Filed: August 13, 2012
Date of Patent: October 14, 2014
Assignee: Intel Corporation
Inventors: Ravindra P. Saraf, Rahul Pal, Ashok Jagannathan
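The mapping the abstract describes can be shown in a few lines: an address region selects a cluster, and a line hash selects a slice within that cluster, so cores allocating in their own cluster's region stay local. Region size, counts, and the hash below are illustrative, not the patent's.

```python
NUM_CLUSTERS = 2
SLICES_PER_CLUSTER = 4
REGION_SIZE = 1 << 30  # each cluster is associated with a 1 GiB address region

def home_slice(addr: int):
    cluster = (addr // REGION_SIZE) % NUM_CLUSTERS  # address region -> cluster
    slice_idx = (addr >> 6) % SLICES_PER_CLUSTER    # line hash inside the cluster
    return cluster, slice_idx

print(home_slice(0x1000))              # homed in cluster 0
print(home_slice(REGION_SIZE + 0x40))  # homed in cluster 1
```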
-
Publication number: 20140006715
Abstract: Method and apparatus to efficiently store and cache data. Cores of a processor and cache slices co-located with the cores may be grouped into a cluster. A memory space may be partitioned into address regions. The cluster may be associated with an address region from the address regions. Each memory address of the address region may be mapped to one or more of the cache slices grouped into the cluster. A cache access from one or more of the cores grouped into the cluster may be biased to the address region based on the association of the cluster with the address region.
Type: Application
Filed: August 13, 2012
Publication date: January 2, 2014
Applicant: INTEL CORPORATION
Inventors: Ravindra P. Saraf, Rahul Pal, Ashok Jagannathan