Patents by Inventor Anurag Chaudhary
Anurag Chaudhary has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250113287
Abstract: A transceiver for sending and receiving data packets on a communication channel. The transceiver receives a first request packet including a plurality of information fields having a first type of operation to be performed on a memory device and a first address. The transceiver stores the first type of operation and the first address in a memory associated with the transceiver, and sends to a target device the first request packet with the first address. The transceiver then receives a second request packet, including a second address in the memory device, and determines, based on the first type of operation, the first address, and the second address, that the second request packet is part of a sequence of request packets to the target device. The transceiver then eliminates, in a header of the second request packet, a portion of the second address to form a third request packet and sends the third request packet to the target device.
Type: Application
Filed: September 29, 2023
Publication date: April 3, 2025
Inventors: Mark Rosenbluth, Anurag Chaudhary, Harsh Kumar, Guan Wang
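To make the address-elision idea concrete, here is a minimal C++ sketch. It is an illustration, not the patented design: the 4 KiB region size, the field widths, and every name (TrackedRequest, CompressedHeader, tryElide, expand) are assumptions invented for the example. The sender drops the upper address bits that the receiver can restore from its stored copy of the first request.

```cpp
#include <cstdint>
#include <optional>

constexpr uint64_t kRegionMask = ~((1ULL << 12) - 1);  // assumed 4 KiB region

struct TrackedRequest {
    uint8_t  op;       // first type of operation, stored at the transceiver
    uint64_t address;  // first address, stored at the transceiver
};

struct CompressedHeader {
    uint8_t  op;
    uint16_t lowBits;  // only the low 12 address bits travel in the header
};

// Returns a compressed header when the second request continues the tracked
// sequence (same operation, same address region); otherwise send it in full.
std::optional<CompressedHeader> tryElide(const TrackedRequest& first,
                                         uint8_t op, uint64_t secondAddr) {
    bool sameSequence = (op == first.op) &&
                        ((secondAddr & kRegionMask) == (first.address & kRegionMask));
    if (!sameSequence) return std::nullopt;
    return CompressedHeader{op, static_cast<uint16_t>(secondAddr & ~kRegionMask)};
}

// Receiver side: reconstruct the full second address from the stored state.
uint64_t expand(const TrackedRequest& first, const CompressedHeader& h) {
    return (first.address & kRegionMask) | h.lowBits;
}
```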
-
Patent number: 12235765
Abstract: Various embodiments include techniques for storing data in a repurposed cache memory in a computing system. The disclosed techniques include a system level cache controller that processes a memory operation for a processing unit. The controller and the processing unit communicate over a network-on-chip. To process the memory operation, the controller selects a repurposed cache memory from a pool of active cache memories associated with processing units that are inoperable and/or in a low-power state. To select the repurposed cache memory, the controller generates a candidate vector that identifies the position of the requesting processing unit relative to the controller. The candidate vector enables the controller to select a repurposed cache memory that is, for example, on the shortest path between the processing unit and the controller. These techniques result in lower latency and improved memory performance relative to prior techniques.
Type: Grant
Filed: May 23, 2023
Date of Patent: February 25, 2025
Assignee: NVIDIA CORPORATION
Inventors: Ariel Szapiro, Anurag Chaudhary, Mark Rosenbluth, Mayank Baunthiyal
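As an illustration of candidate-based selection, the sketch below picks the repurposed cache that minimizes the routing detour between the requesting unit and the controller on a 2D mesh. The Manhattan-distance model and all names (Coord, selectRepurposedCache) are assumptions for the example; the actual candidate-vector encoding is not specified here.

```cpp
#include <cstdlib>
#include <limits>
#include <vector>

struct Coord { int x, y; };  // position of a node on the network-on-chip mesh

int hops(Coord a, Coord b) { return std::abs(a.x - b.x) + std::abs(a.y - b.y); }

// candidates: positions of active caches whose owning processing units are
// inoperable or powered down. Returns the index of the best candidate, or
// -1 if the pool is empty.
int selectRepurposedCache(Coord requester, Coord controller,
                          const std::vector<Coord>& candidates) {
    int best = -1;
    int bestCost = std::numeric_limits<int>::max();
    for (size_t i = 0; i < candidates.size(); ++i) {
        // Detour cost of routing requester -> candidate -> controller; a
        // candidate on the shortest requester-to-controller path adds zero.
        int cost = hops(requester, candidates[i]) + hops(candidates[i], controller);
        if (cost < bestCost) { bestCost = cost; best = static_cast<int>(i); }
    }
    return best;
}
```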
-
Publication number: 20250061078
Abstract: In various examples, when a bridge of a chip has received an eviction request from a client of the chip, the bridge may transmit a read request that corresponds to the same cache line to another chip without waiting for an inter-chip completion response for the eviction request. When the read request is received, the bridge may determine whether the eviction request has already been sent to the other chip and transmit the read request based at least on the eviction request having been sent, using an ordered communication network to ensure the communications are received and/or processed by the other chip in an order that maintains memory coherency. Additionally, the chips may process read-unique requests without using an inter-chip completion acknowledgement and may process copy-back requests by transmitting the corresponding copy-back write data with the copy-back requests.
Type: Application
Filed: August 15, 2023
Publication date: February 20, 2025
Inventors: Anurag Chaudhary, Guan Wang, Harsh Kumar
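The ordering argument can be sketched as follows, assuming a single in-order channel between the chips. The Bridge class, its queues, and the message format are invented for the illustration and are not the patent's interfaces.

```cpp
#include <cstdint>
#include <deque>
#include <unordered_set>

struct Message { enum Kind { Evict, Read } kind; uint64_t line; };

class Bridge {
    std::deque<Message> orderedChannel;          // in-order delivery to the other chip
    std::unordered_set<uint64_t> pendingEvicts;  // evictions sent but not yet completed
public:
    void onEviction(uint64_t line) {
        pendingEvicts.insert(line);
        orderedChannel.push_back({Message::Evict, line});  // sent; no wait for an ack
    }
    void onRead(uint64_t line) {
        // Determine whether an eviction of this line has already been sent.
        // The read may go out now without an inter-chip completion handshake:
        // ordering on the shared channel (eviction enqueued first) is what
        // guarantees the other chip observes eviction-before-read.
        bool evictionAlreadySent = pendingEvicts.count(line) != 0;
        (void)evictionAlreadySent;  // a fuller model would stall a read that
                                    // raced ahead of its eviction's send
        orderedChannel.push_back({Message::Read, line});
    }
    void onEvictionComplete(uint64_t line) { pendingEvicts.erase(line); }
};
```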
-
Publication number: 20240394186
Abstract: Various embodiments include techniques for storing data in a repurposed cache memory in a computing system. The disclosed techniques include a system level cache controller that processes a memory operation for a processing unit. The controller and the processing unit communicate over a network-on-chip. To process the memory operation, the controller selects a repurposed cache memory from a pool of active cache memories associated with processing units that are inoperable and/or in a low-power state. To select the repurposed cache memory, the controller generates a candidate vector that identifies the position of the requesting processing unit relative to the controller. The candidate vector enables the controller to select a repurposed cache memory that is, for example, on the shortest path between the processing unit and the controller. These techniques result in lower latency and improved memory performance relative to prior techniques.
Type: Application
Filed: May 23, 2023
Publication date: November 28, 2024
Inventors: Ariel Szapiro, Anurag Chaudhary, Mark Rosenbluth, Mayank Baunthiyal
-
Publication number: 20240296130
Abstract: Various embodiments include a network for transmitting data words from a source node to a destination node. The source node optionally inverts the logic levels of each data word so that the number of logic ‘1’ bits in each data word is less than or equal to half of the data bits. The destination node recovers the original data words by passing through the data words not inverted by the source node and re-inverting the data words that were inverted by the source node. As a packet is transmitted through the network, each node encodes and/or decodes the data words by generating an output transition for each logic ‘1’ bit of the input data word. Because no more than half the bits of the input data word are logic ‘1’ bits, the node generates output transitions for no more than half of the data bits.
Type: Application
Filed: March 2, 2023
Publication date: September 5, 2024
Inventors: Anurag Chaudhary, Scott Matthew Pitkethly, Peter Lindsay Gentle
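This is a bounded-weight encoding in the spirit of data bus inversion, and a minimal sketch is easy to give. The 32-bit width and the names (Encoded, encode, decode) are assumptions for the example.

```cpp
#include <bitset>
#include <cstdint>

constexpr int kWidth = 32;  // assumed word width

struct Encoded { uint32_t word; bool inverted; };

// Invert when more than half the bits are 1, so the transmitted word always
// carries at most kWidth/2 one-bits; a downstream transition encoder that
// toggles its output once per one-bit then makes at most kWidth/2
// transitions per word.
Encoded encode(uint32_t data) {
    int ones = static_cast<int>(std::bitset<kWidth>(data).count());
    if (ones > kWidth / 2) return { ~data, true };
    return { data, false };
}

// Pass non-inverted words through; re-invert the inverted ones.
uint32_t decode(const Encoded& e) {
    return e.inverted ? ~e.word : e.word;
}
```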
-
Patent number: 12072815
Abstract: Various embodiments include a network for transmitting data words from a source node to a destination node. The source node optionally inverts the logic levels of each data word so that the number of logic ‘1’ bits in each data word is less than or equal to half of the data bits. The destination node recovers the original data words by passing through the data words not inverted by the source node and re-inverting the data words that were inverted by the source node. As a packet is transmitted through the network, each node encodes and/or decodes the data words by generating an output transition for each logic ‘1’ bit of the input data word. Because no more than half the bits of the input data word are logic ‘1’ bits, the node generates output transitions for no more than half of the data bits.
Type: Grant
Filed: March 2, 2023
Date of Patent: August 27, 2024
Assignee: NVIDIA CORPORATION
Inventors: Anurag Chaudhary, Scott Matthew Pitkethly, Peter Lindsay Gentle
-
Patent number: 11809319
Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to manage processor cache. The technology can be implemented in a processor's cache-controlling logic and can enable the processor to track which locations in main memory are contentious. The technology can use the contentiousness of locations to determine where to store data in the cache and how to allocate and evict cache lines. In one example, the technology can store the data in a shared cache when the location is contentious, and can bypass the shared cache and store the data in a private cache when the location is uncontentious. This may be advantageous because storing the data in the shared cache can reduce or avoid having multiple copies in different private caches and can reduce the cache coherency overhead involved in keeping copies in the private caches in sync.
Type: Grant
Filed: January 20, 2022
Date of Patent: November 7, 2023
Assignee: Nvidia Corporation
Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
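A rough sketch of contention-directed placement follows, assuming a simple per-line access counter as the contention signal. The abstract does not commit to this heuristic; the threshold and all names here are invented for the illustration.

```cpp
#include <cstdint>
#include <unordered_map>

// Contention signal: a per-line counter as a stand-in for whatever the
// hardware actually tracks (this heuristic and its threshold are assumptions).
class ContentionTracker {
    std::unordered_map<uint64_t, uint32_t> hits;  // line address -> access count
    static constexpr uint32_t kContendedThreshold = 2;
public:
    void recordAccess(uint64_t line) { ++hits[line]; }
    bool isContentious(uint64_t line) const {
        auto it = hits.find(line);
        return it != hits.end() && it->second >= kContendedThreshold;
    }
};

enum class Destination { SharedCache, PrivateCache };

// Contentious lines go to the shared cache, so one copy serves all cores and
// coherency traffic stays low; uncontended lines bypass it and fill the
// requester's private cache for the lowest hit latency.
Destination placeFill(const ContentionTracker& t, uint64_t line) {
    return t.isContentious(line) ? Destination::SharedCache
                                 : Destination::PrivateCache;
}
```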
-
Patent number: 11789869
Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to reduce the latency of exclusive memory operations. The technology enables a processor to track which locations in main memory are contentious and to modify the order in which exclusive memory operations are processed based on that contentiousness. A thread can include multiple exclusive operations for the same memory location (e.g., an exclusive load and a complementary exclusive store). The multiple exclusive memory operations can be added to a queue with one or more intervening operations between them. The processor may process the operations in the queue based on the order in which they were added and may use the tracked contention to perform out-of-order processing for some of the exclusive operations. For example, the processor can execute the exclusive load operation and, because the corresponding location is contentious, process the complementary exclusive store operation before the intervening operations.
Type: Grant
Filed: January 20, 2022
Date of Patent: October 17, 2023
Assignee: Nvidia Corporation
Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
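A toy model of the reordering policy appears below: it promotes a queued exclusive store to a contentious address past intervening operations so the exclusive load/store pair completes quickly and the line is released to other contenders sooner. The queue representation and names are assumptions, and a real design would also verify that the promoted store has no dependences on the operations it passes.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

struct Op { enum Kind { Load, Store, ExLoad, ExStore } kind; uint64_t addr; };

// Select the next operation to process; assumes the queue is non-empty.
Op popNext(std::deque<Op>& queue, bool (*isContentious)(uint64_t)) {
    // Look for an exclusive store to a contentious address ahead of its turn.
    auto it = std::find_if(queue.begin(), queue.end(), [&](const Op& op) {
        return op.kind == Op::ExStore && isContentious(op.addr);
    });
    if (it != queue.end()) {
        Op promoted = *it;      // process the store before intervening ops
        queue.erase(it);        // (dependence checks elided in this sketch)
        return promoted;
    }
    Op next = queue.front();    // default: process in arrival order
    queue.pop_front();
    return next;
}
```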
-
Publication number: 20230244604
Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to reduce the latency of exclusive memory operations. The technology enables a processor to track which locations in main memory are contentious and to modify the order in which exclusive memory operations are processed based on that contentiousness. A thread can include multiple exclusive operations for the same memory location (e.g., an exclusive load and a complementary exclusive store). The multiple exclusive memory operations can be added to a queue with one or more intervening operations between them. The processor may process the operations in the queue based on the order in which they were added and may use the tracked contention to perform out-of-order processing for some of the exclusive operations. For example, the processor can execute the exclusive load operation and, because the corresponding location is contentious, process the complementary exclusive store operation before the intervening operations.
Type: Application
Filed: January 20, 2022
Publication date: August 3, 2023
Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
-
Publication number: 20230244603
Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to manage processor cache. The technology can be implemented in a processor's cache-controlling logic and can enable the processor to track which locations in main memory are contentious. The technology can use the contentiousness of locations to determine where to store data in the cache and how to allocate and evict cache lines. In one example, the technology can store the data in a shared cache when the location is contentious, and can bypass the shared cache and store the data in a private cache when the location is uncontentious. This may be advantageous because storing the data in the shared cache can reduce or avoid having multiple copies in different private caches and can reduce the cache coherency overhead involved in keeping copies in the private caches in sync.
Type: Application
Filed: January 20, 2022
Publication date: August 3, 2023
Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
-
Patent number: 9824009
Abstract: Systems and methods for coherency maintenance are presented. The systems and methods include utilization of multiple information-state tracking approaches or protocols at different memory or storage levels. In one embodiment, a first coherency maintenance approach (e.g., similar to a MESI protocol) can be implemented at one storage level while a second coherency maintenance approach (e.g., similar to a MOESI protocol) can be implemented at another storage level. Information at a particular storage level or tier can be tracked by a set of local state indications and a set of essence state indications. The essence state indication can be tracked “externally” from a storage layer or tier directory (e.g., in a directory of another cache level, or in a hub between cache levels). One storage level can control operations based upon the local state indications, and another storage level can control operations based at least in part upon an essence state indication.
Type: Grant
Filed: December 21, 2012
Date of Patent: November 21, 2017
Assignee: NVIDIA CORPORATION
Inventors: Anurag Chaudhary, Guillermo Juan Rozas
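One way to picture the split is with two state enums, one MESI-like and local, one MOESI-like and tracked externally. The concrete state sets and the example query below are assumptions for illustration; the patent describes the split, not these enums.

```cpp
enum class LocalState   { Modified, Exclusive, Shared, Invalid };         // MESI-like, inside the level
enum class EssenceState { Modified, Owned, Exclusive, Shared, Invalid };  // MOESI-like, tracked externally

struct LineState {
    LocalState   local;    // kept by the storage level itself
    EssenceState essence;  // kept, e.g., in a hub or another level's directory
};

// Example of a decision driven at least in part by the external essence
// state: a level may supply dirty data to a snoop without writing back
// first when the essence state marks it as the dirty owner of the line.
bool canSupplyDirtyData(const LineState& s) {
    return s.essence == EssenceState::Modified ||
           s.essence == EssenceState::Owned;
}
```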
-
Patent number: 9639471
Abstract: Attributes of access requests can be used to distinguish one set of access requests from another. The prefetcher can determine a pattern for each set of access requests and then prefetch cache lines accordingly. In an embodiment in which there are multiple caches, a prefetcher can determine a destination for the prefetched cache lines associated with a respective set of access requests. For example, the prefetcher can prefetch one set of cache lines into one cache and another set of cache lines into another cache. The prefetcher can also determine a prefetch distance for each set of access requests; for example, the prefetch distances for the sets of access requests can differ.
Type: Grant
Filed: November 27, 2012
Date of Patent: May 2, 2017
Assignee: NVIDIA Corporation
Inventor: Anurag Chaudhary
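A minimal sketch of attribute-keyed prefetch streams follows. The attribute key, the stride detector, and the per-stream fields are all assumptions, since the abstract does not fix them: requests sharing an attribute (e.g., source thread or request type) form one stream, and each stream carries its own detected stride, destination cache, and prefetch distance.

```cpp
#include <cstdint>
#include <unordered_map>

enum class Dest { L1, L2 };  // per-stream destination cache

struct Stream {
    uint64_t lastAddr = 0;
    int64_t  stride   = 0;
    Dest     dest     = Dest::L2;
    int      distance = 4;   // lines ahead to prefetch for this stream
};

class Prefetcher {
    std::unordered_map<uint32_t, Stream> streams;  // keyed by request attribute
public:
    void onAccess(uint32_t attr, uint64_t addr,
                  void (*issuePrefetch)(uint64_t addr, Dest dest)) {
        Stream& s = streams[attr];
        int64_t delta = static_cast<int64_t>(addr - s.lastAddr);
        if (s.lastAddr != 0 && delta != 0 && delta == s.stride) {
            // Pattern confirmed for this stream: run ahead by its own
            // distance, into its own destination cache.
            for (int i = 1; i <= s.distance; ++i)
                issuePrefetch(addr + i * s.stride, s.dest);
        }
        s.stride = delta;
        s.lastAddr = addr;
    }
};
```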
-
Patent number: 9563562
Abstract: Prefetching is permitted to cross from one physical memory page to another. More specifically, if a stream of access requests contains virtual addresses that map to more than one physical memory page, then prefetching can continue from a first physical memory page to a second physical memory page. The prefetching advantageously continues to the second physical memory page based on the confidence level and prefetch distance established while the first physical memory page was the target of the access requests.
Type: Grant
Filed: November 27, 2012
Date of Patent: February 7, 2017
Assignee: Nvidia Corporation
Inventors: Joseph Rowlands, Anurag Chaudhary
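A small sketch of the page-crossing behavior, assuming 4 KiB pages, 64-byte lines, and a unit-stride stream (all invented for the example): the trained confidence and distance survive the page transition instead of being reset.

```cpp
#include <cstdint>

constexpr uint64_t kPageSize = 4096;  // assumed
constexpr uint64_t kLineSize = 64;    // assumed

struct StreamState {
    uint64_t physPage;    // physical page currently being prefetched into
    int      confidence;  // trained while the previous page was the target
    int      distance;    // likewise carried over from the previous page
};

// When the stream's translations move onto a new physical page, keep the
// trained state rather than restarting the prefetcher from scratch.
void onTranslation(StreamState& s, uint64_t newPhysAddr) {
    uint64_t newPage = newPhysAddr / kPageSize;
    if (newPage != s.physPage) {
        s.physPage = newPage;
        // confidence and distance intentionally preserved across the crossing
    }
}

// Continue prefetching on the new page at the inherited distance, provided
// the inherited confidence clears the (assumed) issue threshold.
void prefetchOnNewPage(const StreamState& s, uint64_t addr,
                       void (*issue)(uint64_t)) {
    constexpr int kMinConfidence = 2;  // assumed threshold
    if (s.confidence < kMinConfidence) return;
    for (int i = 1; i <= s.distance; ++i)
        issue(addr + i * kLineSize);
}
```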
-
Patent number: 9367467
Abstract: A system and method for managing cache replacements, and a memory subsystem incorporating the system or the method. In one embodiment, the system includes: (1) a cache controller operable to control a cache and, in order: (1a) issue a pre-fetch command when the cache has a cache miss, (1b) perform at least one housekeeping task to ensure that the cache can store a replacement line, and (1c) issue a fetch command; and (2) a memory controller associated with a memory of a lower level than the cache and operable to respond to the pre-fetch command by performing at least one housekeeping task to ensure that the memory can provide the replacement line, and to respond to the fetch command by providing the replacement line.
Type: Grant
Filed: August 22, 2014
Date of Patent: June 14, 2016
Assignee: Nvidia Corporation
Inventors: Anurag Chaudhary, Guillermo Rozas
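The two-phase flow might be pictured as follows; the interfaces are invented for the illustration and the housekeeping bodies are elided. The comments mirror the (1a)-(1c) steps in the abstract: the pre-fetch hint lets the lower level do its housekeeping (e.g., opening the DRAM row) in parallel with the cache's own housekeeping (choosing and evicting a victim), so the later fetch completes without stalls.

```cpp
#include <cstdint>

struct MemoryController {
    // Housekeeping on the pre-fetch hint: e.g., open the row and reserve
    // buffers so the subsequent fetch can be served without delay.
    void preFetch(uint64_t line) { (void)line; }
    void fetch(uint64_t line)    { (void)line; }  // provides the replacement line
};

struct CacheController {
    MemoryController& mem;
    void onMiss(uint64_t line) {
        mem.preFetch(line);    // (1a) start lower-level housekeeping early
        evictVictimFor(line);  // (1b) make room so the line can be stored
        mem.fetch(line);       // (1c) the replacement line can now flow through
    }
    void evictVictimFor(uint64_t line) { (void)line; }  // write back a victim
};
```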
-
Publication number: 20160055087
Abstract: A system and method for managing cache replacements, and a memory subsystem incorporating the system or the method. In one embodiment, the system includes: (1) a cache controller operable to control a cache and, in order: (1a) issue a pre-fetch command when the cache has a cache miss, (1b) perform at least one housekeeping task to ensure that the cache can store a replacement line, and (1c) issue a fetch command; and (2) a memory controller associated with a memory of a lower level than the cache and operable to respond to the pre-fetch command by performing at least one housekeeping task to ensure that the memory can provide the replacement line, and to respond to the fetch command by providing the replacement line.
Type: Application
Filed: August 22, 2014
Publication date: February 25, 2016
Inventors: Anurag Chaudhary, Guillermo Rozas
-
Patent number: 9262328
Abstract: Cache hit information is used to manage (e.g., cap) the prefetch distance for a cache. In an embodiment in which there is a first cache and a second cache, where the second cache (e.g., a level two cache) has greater latency than the first cache (e.g., a level one cache), a prefetcher prefetches cache lines to the second cache and is configured to receive feedback from that cache. The feedback indicates whether an access request issued in response to a cache miss in the first cache results in a cache hit in the second cache. The prefetch distance for the second cache is determined according to the feedback.
Type: Grant
Filed: November 27, 2012
Date of Patent: February 16, 2016
Assignee: NVIDIA CORPORATION
Inventor: Anurag Chaudhary
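A plausible (assumed, not specified) realization of the feedback loop is a counter-style controller: second-cache hits on demand misses cap or shrink the distance, since the prefetcher is already far enough ahead, while second-cache misses stretch it.

```cpp
#include <algorithm>

class DistanceControl {
    int distance = 4;                          // assumed starting distance
    static constexpr int kMin = 1, kMax = 32;  // assumed bounds
public:
    // Feedback for a demand request that missed the first cache.
    void onFeedback(bool hitInSecondCache) {
        if (hitInSecondCache)
            distance = std::max(kMin, distance - 1);  // prefetches arrive in time: cap
        else
            distance = std::min(kMax, distance + 1);  // lagging: run further ahead
    }
    int current() const { return distance; }
};
```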
-
Publication number: 20140181404
Abstract: Systems and methods for coherency maintenance are presented. The systems and methods include utilization of multiple information-state tracking approaches or protocols at different memory or storage levels. In one embodiment, a first coherency maintenance approach (e.g., similar to a MESI protocol) can be implemented at one storage level while a second coherency maintenance approach (e.g., similar to a MOESI protocol) can be implemented at another storage level. Information at a particular storage level or tier can be tracked by a set of local state indications and a set of essence state indications. The essence state indication can be tracked “externally” from a storage layer or tier directory (e.g., in a directory of another cache level, or in a hub between cache levels). One storage level can control operations based upon the local state indications, and another storage level can control operations based at least in part upon an essence state indication.
Type: Application
Filed: December 21, 2012
Publication date: June 26, 2014
Applicant: NVIDIA CORPORATION
Inventors: Anurag Chaudhary, Guillermo Juan Rozas
-
Publication number: 20140149668
Abstract: Attributes of access requests can be used to distinguish one set of access requests from another. The prefetcher can determine a pattern for each set of access requests and then prefetch cache lines accordingly. In an embodiment in which there are multiple caches, a prefetcher can determine a destination for the prefetched cache lines associated with a respective set of access requests. For example, the prefetcher can prefetch one set of cache lines into one cache and another set of cache lines into another cache. The prefetcher can also determine a prefetch distance for each set of access requests; for example, the prefetch distances for the sets of access requests can differ.
Type: Application
Filed: November 27, 2012
Publication date: May 29, 2014
Applicant: NVIDIA CORPORATION
Inventor: Anurag Chaudhary
-
Publication number: 20140149678
Abstract: Cache hit information is used to manage (e.g., cap) the prefetch distance for a cache. In an embodiment in which there is a first cache and a second cache, where the second cache (e.g., a level two cache) has greater latency than the first cache (e.g., a level one cache), a prefetcher prefetches cache lines to the second cache and is configured to receive feedback from that cache. The feedback indicates whether an access request issued in response to a cache miss in the first cache results in a cache hit in the second cache. The prefetch distance for the second cache is determined according to the feedback.
Type: Application
Filed: November 27, 2012
Publication date: May 29, 2014
Applicant: NVIDIA CORPORATION
Inventor: Anurag Chaudhary
-
Publication number: 20140149679
Abstract: Prefetching is permitted to cross from one physical memory page to another. More specifically, if a stream of access requests contains virtual addresses that map to more than one physical memory page, then prefetching can continue from a first physical memory page to a second physical memory page. The prefetching advantageously continues to the second physical memory page based on the confidence level and prefetch distance established while the first physical memory page was the target of the access requests.
Type: Application
Filed: November 27, 2012
Publication date: May 29, 2014
Applicant: NVIDIA CORPORATION
Inventors: Joseph Rowlands, Anurag Chaudhary