Patents by Inventor Shailender Chaudhry

Shailender Chaudhry has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11809319
    Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to manage processor cache. The technology can be implemented in a processor's cache controlling logic and can enable the processor to track which locations in main memory are contentious. The technology can use the contentiousness of locations to determine where to store the data in cache and how to allocate and evict cache lines in the cache. In one example, the technology can store the data in a shared cache when the location is contentious and can bypass the shared cache and store the data in the private cache when the location is uncontentious. This may be advantageous because storing the data in shared cache can reduce or avoid having multiple copies in different private caches and can reduce the cache coherency overhead involved to keep copies in the private caches in sync.
    Type: Grant
    Filed: January 20, 2022
    Date of Patent: November 7, 2023
    Assignee: NVIDIA Corporation
    Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
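The entry above describes steering data toward the shared cache when its memory location is contentious and toward a private cache otherwise. Below is a minimal behavioral sketch of that placement decision; the class and method names (ContentionTracker, CacheHierarchy, fill) are hypothetical and the threshold is invented, not taken from the patent.

```python
from collections import defaultdict

class ContentionTracker:
    """Counts how many distinct cores have recently touched each memory line."""
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.accessors = defaultdict(set)  # line address -> set of core ids

    def record_access(self, line_addr, core_id):
        self.accessors[line_addr].add(core_id)

    def is_contentious(self, line_addr):
        # A line touched by multiple cores is treated as contentious.
        return len(self.accessors[line_addr]) >= self.threshold

class CacheHierarchy:
    """Toy model: one shared cache plus one private cache per core."""
    def __init__(self, num_cores, tracker):
        self.shared = {}
        self.private = [dict() for _ in range(num_cores)]
        self.tracker = tracker

    def fill(self, line_addr, data, core_id):
        self.tracker.record_access(line_addr, core_id)
        if self.tracker.is_contentious(line_addr):
            # Contentious line: keep a single copy in the shared cache to avoid
            # duplicating it in several private caches and paying coherency cost.
            self.shared[line_addr] = data
        else:
            # Uncontentious line: bypass the shared cache and fill the
            # requesting core's private cache.
            self.private[core_id][line_addr] = data

# Usage: cores 0 and 1 both touch line 0x40, so it lands in the shared cache.
tracker = ContentionTracker()
caches = CacheHierarchy(num_cores=2, tracker=tracker)
caches.fill(0x40, "A", core_id=0)
caches.fill(0x40, "A", core_id=1)
caches.fill(0x80, "B", core_id=0)
assert 0x40 in caches.shared and 0x80 in caches.private[0]
```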
  • Patent number: 11789869
    Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to reduce latency of exclusive memory operations. The technology enables a processor to track which locations in main memory are contentious and to modify the order exclusive memory operations are processed based on the contentiousness. A thread can include multiple exclusive operations for the same memory location (e.g., exclusive load and a complementary exclusive store). The multiple exclusive memory operations can be added to a queue and include one or more intervening operations between them in the queue. The processor may process the operations in the queue based on the order they were added and may use the tracked contention to perform out-of-order processing for some of the exclusive operations. For example, the processor can execute the exclusive load operation and because the corresponding location is contentious can process the complementary exclusive store operation before the intervening operations.
    Type: Grant
    Filed: January 20, 2022
    Date of Patent: October 17, 2023
    Assignee: NVIDIA Corporation
    Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
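The entry above describes letting a complementary exclusive store jump ahead of intervening queued operations when the location it targets is known to be contentious. The sketch below models that reordering decision over a per-thread operation queue; the queue model, operation names, and the contentious-address set are all hypothetical.

```python
from collections import deque

CONTENTIOUS = {0x100}  # addresses a (modelled) tracker has flagged as contentious

def process(queue):
    """Drain a per-thread operation queue, normally in arrival order.

    After an exclusive load to a contentious address, the complementary
    exclusive store to the same address is pulled forward past intervening
    operations so the contended line is released sooner.
    """
    executed = []
    q = deque(queue)
    while q:
        op = q.popleft()
        executed.append(op)
        kind, addr = op
        if kind == "load_exclusive" and addr in CONTENTIOUS:
            # Promote the matching exclusive store to the front of the queue.
            for i, (k2, a2) in enumerate(q):
                if k2 == "store_exclusive" and a2 == addr:
                    del q[i]
                    q.appendleft((k2, a2))
                    break
    return executed

ops = [("load_exclusive", 0x100),
       ("add", None),              # intervening, independent work
       ("load", 0x200),            # intervening, independent work
       ("store_exclusive", 0x100)]
print(process(ops))
# [('load_exclusive', 256), ('store_exclusive', 256), ('add', None), ('load', 512)]
```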
  • Publication number: 20230244604
    Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to reduce latency of exclusive memory operations. The technology enables a processor to track which locations in main memory are contentious and to modify the order exclusive memory operations are processed based on the contentiousness. A thread can include multiple exclusive operations for the same memory location (e.g., exclusive load and a complementary exclusive store). The multiple exclusive memory operations can be added to a queue and include one or more intervening operations between them in the queue. The processor may process the operations in the queue based on the order they were added and may use the tracked contention to perform out-of-order processing for some of the exclusive operations. For example, the processor can execute the exclusive load operation and because the corresponding location is contentious can process the complementary exclusive store operation before the intervening operations.
    Type: Application
    Filed: January 20, 2022
    Publication date: August 3, 2023
    Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
  • Publication number: 20230244603
    Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to manage processor cache. The technology can be implemented in a processor’s cache controlling logic and can enable the processor to track which locations in main memory are contentious. The technology can use the contentiousness of locations to determine where to store the data in cache and how to allocate and evict cache lines in the cache. In one example, the technology can store the data in a shared cache when the location is contentious and can bypass the shared cache and store the data in the private cache when the location is uncontentious. This may be advantageous because storing the data in shared cache can reduce or avoid having multiple copies in different private caches and can reduce the cache coherency overhead involved to keep copies in the private caches in sync.
    Type: Application
    Filed: January 20, 2022
    Publication date: August 3, 2023
    Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
  • Patent number: 10642744
    Abstract: An improved architectural means to address processor cache attacks based on speculative execution defines a new memory type that is both cacheable and inaccessible by speculation. Speculative execution cannot access and expose a memory location that is speculatively inaccessible. Such mechanisms can disqualify certain sensitive data from being exposed through speculative execution. Data which must be protected at a performance cost may be specifically marked. If the processor is told where secrets are stored in memory and is forbidden from speculating on those memory locations, then the processor will ensure the process trying to access those memory locations is privileged to access those locations before reading and caching them. Such a countermeasure is effective against attacks that use speculative execution to leak secrets from a processor cache.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: May 5, 2020
    Assignee: NVIDIA Corporation
    Inventors: Darrell D. Boggs, Ross Segelken, Mike Cornaby, Nick Fortino, Shailender Chaudhry, Denis Khartikov, Alok Mooley, Nathan Tuck, Gordon Vreugdenhil
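The entry above describes a memory type that is cacheable yet may not be touched while an access is still speculative. A minimal behavioral sketch is below; the page-attribute table, function, and addresses are hypothetical, and the real mechanism lives in hardware, not software.

```python
# Hypothetical per-page attributes: cacheable, and whether speculation may touch the page.
PAGE_ATTRS = {
    0x1000: {"cacheable": True, "speculation_ok": True},
    0x2000: {"cacheable": True, "speculation_ok": False},  # page holding secrets
}
MEMORY = {0x1000: 42, 0x2000: 1234}

class SpeculationFault(Exception):
    """Raised when a speculative access targets speculatively-inaccessible memory."""

def load(addr, speculative, privileged):
    attrs = PAGE_ATTRS[addr & ~0xFFF]
    if speculative and not attrs["speculation_ok"]:
        # The line may not be read or cached under speculation; the access
        # must wait until it is non-speculative and its privilege is verified.
        raise SpeculationFault(f"speculative access to protected address {addr:#x}")
    if not attrs["speculation_ok"] and not privileged:
        raise PermissionError("unprivileged access to protected memory")
    return MEMORY[addr]

print(load(0x1000, speculative=True, privileged=False))   # ordinary data: fine to speculate
print(load(0x2000, speculative=False, privileged=True))   # secret, non-speculative: allowed
try:
    load(0x2000, speculative=True, privileged=True)       # blocked while speculative
except SpeculationFault as exc:
    print("blocked:", exc)
```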
  • Publication number: 20190004961
    Abstract: An improved architectural means to address processor cache attacks based on speculative execution defines a new memory type that is both cacheable and inaccessible by speculation. Speculative execution cannot access and expose a memory location that is speculatively inaccessible. Such mechanisms can disqualify certain sensitive data from being exposed through speculative execution. Data which must be protected at a performance cost may be specifically marked. If the processor is told where secrets are stored in memory and is forbidden from speculating on those memory locations, then the processor will ensure the process trying to access those memory locations is privileged to access those locations before reading and caching them. Such a countermeasure is effective against attacks that use speculative execution to leak secrets from a processor cache.
    Type: Application
    Filed: June 28, 2018
    Publication date: January 3, 2019
    Inventors: Darrell D. Boggs, Ross Segelken, Mike Cornaby, Nick Fortino, Shailender Chaudhry, Denis Khartikov, Alok Mooley, Nathan Tuck, Gordon Vreugdenhil
  • Patent number: 9471395
    Abstract: Embodiments of the present technology provide for migrating processes executing on any one of a plurality of cores in a multi-core cluster to a core of a separate cluster without first having to transfer the processes to a predetermined core of the multi-core cluster. Similarly, the processes may be transferred from the core of the separate cluster to the given core of the multi-core cluster.
    Type: Grant
    Filed: August 23, 2012
    Date of Patent: October 18, 2016
    Assignee: NVIDIA Corporation
    Inventors: Sagheer Ahmad, Shailender Chaudhry, John George Mathieson, Mark Alan Overby
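The entry above describes moving a process straight from whichever core it occupies in the multi-core cluster to a core of a separate cluster, with no intermediate hop through a designated transfer core. The tiny sketch below only illustrates that direct hand-off; the cluster and core names are invented.

```python
class Core:
    def __init__(self, cluster, cid):
        self.cluster, self.cid = cluster, cid
        self.context = None               # saved architectural state of a migrating process

def migrate(src_core, dst_core):
    """Move the process directly from its current core to a core of the other cluster."""
    ctx = src_core.context                # state leaves the source core ...
    src_core.context = None
    dst_core.context = ctx                # ... and is restored straight onto the target core
    return dst_core

multi_core = [Core("multi_core_cluster", i) for i in range(4)]
companion = Core("companion_cluster", 0)

multi_core[3].context = {"pid": 7, "regs": [0] * 16}   # process currently on core 3
target = migrate(multi_core[3], companion)
print(target.cluster, target.cid)                       # companion_cluster 0
migrate(companion, multi_core[1])                       # and back, to any core of the cluster
```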
  • Patent number: 9280343
    Abstract: Some embodiments of the present invention provide a system for operating a store queue, wherein the store queue buffers stores that are waiting to be committed to a memory system in a processor. During operation, the system examines an entry at the head of the store queue. If the entry contains a membar token, the system examines an unacknowledged counter that keeps track of the number of store operations that have been sent from the store queue to the memory system but have not been acknowledged as being committed to the memory system. If the unacknowledged counter is non-zero, the system waits until the unacknowledged counter equals zero, and then removes the membar token from the store queue.
    Type: Grant
    Filed: August 10, 2009
    Date of Patent: March 8, 2016
    Assignee: Oracle America, Inc.
    Inventors: Haakan E. Zeffer, Robert E. Cypher, Shailender Chaudhry
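The entry above describes holding a membar token at the head of the store queue until the count of sent-but-unacknowledged stores drains to zero. A behavioral sketch, with hypothetical names, is below.

```python
from collections import deque

class StoreQueue:
    """Toy store queue with membar tokens and an unacknowledged-store counter."""
    def __init__(self):
        self.entries = deque()
        self.unacknowledged = 0   # stores sent to the memory system but not yet acknowledged

    def add_store(self, addr, data):
        self.entries.append(("store", addr, data))

    def add_membar(self):
        self.entries.append(("membar",))

    def acknowledge(self):
        # The memory system confirms one outstanding store has committed.
        self.unacknowledged -= 1

    def drain_head(self):
        """Examine the entry at the head of the queue and try to retire it."""
        head = self.entries[0]
        if head[0] == "membar":
            if self.unacknowledged != 0:
                return False          # wait: older stores are still outstanding
            self.entries.popleft()    # counter reached zero: remove the membar token
            return True
        # Ordinary store: send it to the memory system and count it as outstanding.
        self.entries.popleft()
        self.unacknowledged += 1
        return True

sq = StoreQueue()
sq.add_store(0x10, 1)
sq.add_membar()
sq.add_store(0x20, 2)

sq.drain_head()                     # sends the first store; unacknowledged == 1
assert sq.drain_head() is False     # membar must wait
sq.acknowledge()                    # the first store commits
assert sq.drain_head() is True      # membar retires; the later store may now drain
```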
  • Patent number: 9268710
    Abstract: One embodiment of the present invention provides a system that facilitates efficient transactional execution. The system starts by executing a transaction for a thread, wherein executing the transaction involves placing load-marks on cache lines which are loaded during the transaction and placing store-marks on cache lines which are stored to during the transaction. Upon completing the transaction, the system releases the load-marks and the store-marks from the cache lines which were load-marked and store-marked during the transaction. Note that during the transaction, the load-marks and store-marks prevent interfering accesses from other threads to the cache lines.
    Type: Grant
    Filed: January 18, 2007
    Date of Patent: February 23, 2016
    Assignee: Oracle America, Inc.
    Inventors: Robert E. Cypher, Shailender Chaudhry
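The entry above describes placing load-marks and store-marks on cache lines during a transaction, blocking interfering accesses from other threads, and releasing the marks when the transaction completes. A minimal sketch, assuming hypothetical class and method names, follows.

```python
class MarkConflict(Exception):
    """Another thread's marks prevent this access for now."""

class MarkedCache:
    """Toy model of per-cache-line load-marks and store-marks."""
    def __init__(self):
        self.load_marks = {}     # line -> set of thread ids holding a load-mark
        self.store_marks = {}    # line -> thread id holding the store-mark

    def load(self, line, tid):
        owner = self.store_marks.get(line)
        if owner is not None and owner != tid:
            raise MarkConflict(f"line {line:#x} is store-marked by thread {owner}")
        self.load_marks.setdefault(line, set()).add(tid)

    def store(self, line, tid):
        owner = self.store_marks.get(line)
        readers = self.load_marks.get(line, set()) - {tid}
        if (owner is not None and owner != tid) or readers:
            raise MarkConflict(f"line {line:#x} is marked by another thread")
        self.store_marks[line] = tid

    def commit(self, tid):
        # Transaction completed: release every mark this thread placed.
        for readers in self.load_marks.values():
            readers.discard(tid)
        self.store_marks = {l: t for l, t in self.store_marks.items() if t != tid}

cache = MarkedCache()
cache.load(0x40, tid=1)       # thread 1 load-marks the line during its transaction
cache.store(0x40, tid=1)      # and upgrades it with a store-mark
try:
    cache.store(0x40, tid=2)  # thread 2 is blocked while the marks are held
except MarkConflict as exc:
    print("blocked:", exc)
cache.commit(tid=1)           # marks released when the transaction completes
cache.store(0x40, tid=2)      # now succeeds
```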
  • Patent number: 9256438
    Abstract: A computer processor pipeline has both an architectural register file and a working register file. The lifetime of an entry in the working register file is determined by a predetermined number of instructions passing through a specified stage in the pipeline after the location in the working register file is allocated for an instruction. The size of the working register file is selected based upon performance characteristics. A working register file credit indicator is coupled to the front end pipeline portion and to the back end pipeline portion. The working register file credit indicator is monitored to prevent a working register file overflow. When a location in the architectural register file is read early, the location is monitored to determine whether the location is written to prior to issuance of the instruction associated with the early read.
    Type: Grant
    Filed: January 15, 2009
    Date of Patent: February 9, 2016
    Assignee: Oracle America, Inc.
    Inventors: Shailender Chaudhry, Paul Caprioli, Marc Tremblay
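The entry above describes a working register file whose entries expire after a fixed number of instructions pass a given pipeline stage, with a credit indicator that keeps the front end from overflowing it. A behavioral sketch follows; the sizes, lifetime, and names are invented for illustration.

```python
class WorkingRegisterFile:
    """Toy credit-managed working register file (WRF) in front of the architectural file."""
    def __init__(self, size=8, lifetime=16):
        self.size = size
        self.lifetime = lifetime      # entries expire this many instructions after allocation
        self.credits = size           # the front end may allocate only while credits remain
        self.entries = {}             # slot -> (value, instruction count at allocation)
        self.instr_count = 0
        self.next_slot = 0

    def allocate(self, value):
        if self.credits == 0:
            return None               # front end must wait: a WRF overflow is prevented
        slot = self.next_slot
        self.next_slot = (self.next_slot + 1) % self.size
        self.entries[slot] = (value, self.instr_count)
        self.credits -= 1
        return slot

    def tick(self):
        """One instruction passes the lifetime-defining pipeline stage."""
        self.instr_count += 1
        expired = [s for s, (_, born) in self.entries.items()
                   if self.instr_count - born >= self.lifetime]
        for s in expired:
            del self.entries[s]       # the value now lives only in the architectural file
            self.credits += 1

wrf = WorkingRegisterFile(size=2, lifetime=3)
assert wrf.allocate(10) is not None
assert wrf.allocate(20) is not None
assert wrf.allocate(30) is None       # out of credits: allocation must wait
for _ in range(3):
    wrf.tick()                        # the oldest entries expire and credits return
assert wrf.allocate(30) is not None
```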
  • Patent number: 9146744
    Abstract: Embodiments of the present invention provide a system which executes a load instruction or a store instruction. During operation the system receives a load instruction. The system then determines if an unrestricted entry or a restricted entry in a store queue contains data that satisfies the load instruction. If not, the system retrieves data for the load instruction from a cache.
    Type: Grant
    Filed: May 6, 2008
    Date of Patent: September 29, 2015
    Assignee: Oracle America, Inc.
    Inventors: Paul Caprioli, Martin Karlsson, Shailender Chaudhry, Gideon N. Levinsky
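The entry above describes a load path that first checks whether an unrestricted or restricted store-queue entry holds data satisfying the load, and only otherwise reads the cache. A minimal sketch with hypothetical names:

```python
class StoreQueueBypass:
    """Toy load path: try store-to-load forwarding first, then fall back to the cache."""
    def __init__(self, cache):
        self.unrestricted = {}    # addr -> data; entries that may freely satisfy loads
        self.restricted = {}      # addr -> data; entries usable only under extra conditions
        self.cache = cache

    def add_store(self, addr, data, restricted=False):
        (self.restricted if restricted else self.unrestricted)[addr] = data

    def load(self, addr):
        if addr in self.unrestricted:
            return self.unrestricted[addr]    # satisfied by an unrestricted entry
        if addr in self.restricted:
            return self.restricted[addr]      # satisfied by a restricted entry
        return self.cache[addr]               # no matching store: read the cache

cache = {0x30: "old value"}
sq = StoreQueueBypass(cache)
sq.add_store(0x10, "young store")
sq.add_store(0x20, "speculative store", restricted=True)
print(sq.load(0x10))    # forwarded from an unrestricted entry
print(sq.load(0x20))    # forwarded from a restricted entry
print(sq.load(0x30))    # retrieved from the cache
```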
  • Patent number: 9086889
    Abstract: Techniques are disclosed relating to reducing the latency of restarting a pipeline in a processor that implements scouting. In one embodiment, the processor may reduce pipeline restart latency using two instruction fetch units that are configured to fetch and re-fetch instructions in parallel with one another. In some embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to determining that a commit operation is to be attempted with respect to one or more deferred instructions. In other embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to receiving an indication that a request for a set of data has been received by a cache, where the indication is sent by the cache before determining whether the data is present in the cache or not.
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: July 21, 2015
    Assignee: Oracle International Corporation
    Inventors: Martin Karlsson, Sherman H. Yip, Shailender Chaudhry
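The entry above describes hiding pipeline-restart latency by starting to re-fetch instructions early, for example as soon as a commit of deferred instructions is attempted, or as soon as a cache acknowledges receiving a request. The toy timeline below only illustrates the resulting overlap; the cycle counts and names are invented.

```python
def restart_latency(commit_cycles, refetch_cycles, early_refetch):
    """Cycles from deciding to attempt a commit until the pipeline resumes execution.

    With early re-fetch, the fetch pipeline restarts in parallel with the commit
    attempt, so the two latencies overlap instead of adding up.
    """
    if early_refetch:
        return max(commit_cycles, refetch_cycles)
    return commit_cycles + refetch_cycles    # re-fetch begins only after commit completes

print(restart_latency(commit_cycles=20, refetch_cycles=15, early_refetch=False))  # 35
print(restart_latency(commit_cycles=20, refetch_cycles=15, early_refetch=True))   # 20
```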
  • Patent number: 8984264
    Abstract: The described embodiments provide a system for executing instructions in a processor. In the described embodiments, upon detecting a return of input data for a deferred instruction while executing instructions in an execute-ahead mode, the processor determines whether a replay bit is set in a corresponding entry for the returned input data in a miss buffer. If the replay bit is set, the processor transitions to a deferred-execution mode to execute deferred instructions. Otherwise, the processor continues to execute instructions in the execute-ahead mode.
    Type: Grant
    Filed: January 15, 2010
    Date of Patent: March 17, 2015
    Assignee: Oracle America, Inc.
    Inventors: Martin R. Karlsson, Sherman H. Yip, Shailender Chaudhry
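The entry above describes a per-entry replay bit in the miss buffer: when input data returns for a deferred instruction, the processor switches to deferred-execution mode only if that bit is set. A behavioral sketch with hypothetical names:

```python
EXECUTE_AHEAD, DEFERRED_EXECUTION = "execute-ahead", "deferred-execution"

class Core:
    def __init__(self):
        self.mode = EXECUTE_AHEAD
        self.miss_buffer = {}    # miss id -> {"replay": bool}

    def record_miss(self, miss_id, replay):
        # A deferred instruction waits on this miss; the replay bit says whether
        # the data's return should trigger a deferred-execution episode.
        self.miss_buffer[miss_id] = {"replay": replay}

    def data_returned(self, miss_id):
        entry = self.miss_buffer.pop(miss_id)
        if entry["replay"]:
            self.mode = DEFERRED_EXECUTION    # go execute the deferred instructions
        # otherwise remain in execute-ahead mode and keep going
        return self.mode

core = Core()
core.record_miss(1, replay=False)
core.record_miss(2, replay=True)
print(core.data_returned(1))    # execute-ahead (replay bit clear)
print(core.data_returned(2))    # deferred-execution (replay bit set)
```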
  • Patent number: 8898436
    Abstract: A register file, in a processor, includes a first plurality of registers of a first size, n-bits. A decoder uses a mapping that divides the register file into a second plurality M of registers having a second size. Each of the registers having the second size is assigned a different name in a continuous name space. Each register of the second size includes a plurality N of registers of the first size, n-bits. Each register in the plurality N of registers is assigned the same name as the register of the second size that includes that plurality. State information is maintained in the register file for each n-bit register. The dependence of an instruction on other instructions is detected through the continuous name space. The state information allows the processor to determine when the information in any portion, or all, of a register is valid.
    Type: Grant
    Filed: April 20, 2009
    Date of Patent: November 25, 2014
    Assignee: Oracle America, Inc.
    Inventors: Shailender Chaudhry, Marc Tremblay
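The entry above describes viewing a file of n-bit registers through a second, wider name space in which each wide register is a group of N n-bit registers sharing one name, with per-n-bit-register state that says when any portion of a register is valid. The sketch below models the mapping and the validity check; the widths and names are illustrative only.

```python
N_BITS = 32        # width of a narrow physical register (n bits)
N_PER_WIDE = 2     # narrow registers per wide register (N)

class RegisterFile:
    """Toy register file addressed through a continuous wide-register name space."""
    def __init__(self, num_wide):
        total = num_wide * N_PER_WIDE
        self.values = [0] * total
        self.valid = [False] * total    # per n-bit-register state information

    def narrow_indices(self, wide_name):
        # All narrow registers within a wide register share the same wide name.
        base = wide_name * N_PER_WIDE
        return range(base, base + N_PER_WIDE)

    def write_narrow(self, wide_name, part, value):
        idx = wide_name * N_PER_WIDE + part
        self.values[idx] = value & ((1 << N_BITS) - 1)
        self.valid[idx] = True

    def read_wide(self, wide_name):
        idxs = list(self.narrow_indices(wide_name))
        if not all(self.valid[i] for i in idxs):
            return None    # some portion is not yet valid: a dependence is still outstanding
        # Assemble the wide value from its narrow parts (part 0 is least significant).
        return sum(self.values[i] << (N_BITS * k) for k, i in enumerate(idxs))

rf = RegisterFile(num_wide=4)
rf.write_narrow(1, part=0, value=0xDEAD)
print(rf.read_wide(1))          # None: the upper portion has not been written yet
rf.write_narrow(1, part=1, value=0xBEEF)
print(hex(rf.read_wide(1)))     # 0xbeef0000dead
```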
  • Patent number: 8745419
    Abstract: A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe; and a logical power throttling unit coupled to the device to receive the output signal, and coupled to the decode pipe. Following the logical power throttling unit receiving the power throttling output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages.
    Type: Grant
    Filed: June 21, 2012
    Date of Patent: June 3, 2014
    Assignee: Oracle America, Inc.
    Inventors: Shailender Chaudhry, Quinn A. Jacobson, Marc Tremblay
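The entry above describes throttling power logically: once the throttling signal satisfies a criterion, the decode pipe lowers the average number of instructions decoded per cycle, leaving clock frequency and supply voltages untouched. A behavioral sketch, with invented names, limits, and widths:

```python
class DecodePipe:
    """Toy decode stage whose per-cycle decode width is logically throttled."""
    def __init__(self, max_decode_width=4):
        self.max_decode_width = max_decode_width
        self.throttled = False

    def power_signal(self, watts, limit=90.0):
        # The throttling output signal satisfies the criterion: begin logical throttling.
        self.throttled = watts > limit

    def decode(self, cycle, pending):
        if self.throttled and cycle % 2 == 1:
            width = 0    # e.g. decode nothing every other cycle ...
        else:
            width = self.max_decode_width
        # ... so the *average* instructions decoded per cycle drops, while the
        # processor cycle and supply voltages are physically unchanged.
        return pending[:width]

pipe = DecodePipe()
pipe.power_signal(watts=95.0)            # over the limit: start logical throttling
pending = list(range(8))
for cycle in range(4):
    decoded = pipe.decode(cycle, pending)
    pending = pending[len(decoded):]     # retire what was decoded this cycle
    print(cycle, decoded)
```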
  • Patent number: 8732407
    Abstract: Some embodiments of the present invention provide a system that avoids deadlock while attempting to acquire store-marks on cache lines. During operation, the system keeps track of store-mark requests that arise during execution of a thread, wherein a store-mark on a cache line indicates that one or more associated store buffer entries are waiting to be committed to the cache line. In this system, store-mark requests are processed in a pipelined manner, which allows a store-mark request to be initiated before preceding store-mark requests for the same thread complete. Next, if a store-mark request fails, within a bounded amount of time, the system removes or prevents store-marks associated with younger store-mark requests for the same thread, thereby avoiding a potential deadlock that can arise when one or more other threads attempt to store-mark the same cache lines.
    Type: Grant
    Filed: November 19, 2008
    Date of Patent: May 20, 2014
    Assignee: Oracle America, Inc.
    Inventors: Robert E. Cypher, Haakan E. Zeffer, Shailender Chaudhry
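The entry above describes pipelining store-mark requests and, when one fails, removing the same thread's store-marks belonging to younger requests within a bounded time so that threads marking the same lines cannot deadlock. A minimal sketch with hypothetical names and sequence numbers:

```python
class StoreMarkTable:
    """Toy cache-line store-mark table with pipelined, per-thread requests."""
    def __init__(self):
        self.marks = {}    # line -> (thread id, request sequence number)

    def request(self, tid, seq, line):
        """Try to store-mark `line` for request number `seq` of thread `tid`.

        Requests are pipelined, so a younger request (larger seq) may have
        already placed its mark before an older one completes.
        """
        holder = self.marks.get(line)
        if holder is not None and holder[0] != tid:
            # The request failed: within a bounded time, drop this thread's marks
            # belonging to *younger* requests, so another thread waiting on those
            # lines cannot end up in a circular wait with us.
            self.marks = {l: (t, s) for l, (t, s) in self.marks.items()
                          if not (t == tid and s > seq)}
            return False
        self.marks[line] = (tid, seq)
        return True

table = StoreMarkTable()
# Thread 1's pipelined requests: seq 1 targets line A, seq 2 targets line B.
assert table.request(tid=1, seq=2, line="B")       # the younger request completes first
assert table.request(tid=2, seq=1, line="A")       # thread 2 marks line A
assert not table.request(tid=1, seq=1, line="A")   # thread 1's older request fails ...
assert "B" not in table.marks                      # ... and its younger mark on B is dropped
assert table.request(tid=2, seq=2, line="B")       # so thread 2 can make progress on B
```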
  • Patent number: 8688963
    Abstract: The embodiments described in the instant application provide a system for generating checkpoints. In the described embodiments, while speculatively executing instructions with one or more checkpoints in use, upon detecting an occurrence of a predetermined operating condition or encountering a predetermined type of instruction, the system is configured to determine whether an additional checkpoint is to be generated by computing a factor based on one or more operating conditions of the processor. When the factor is greater than a predetermined value, the processor is configured to generate the additional checkpoint.
    Type: Grant
    Filed: April 22, 2010
    Date of Patent: April 1, 2014
    Assignee: Oracle International Corporation
    Inventors: Shailender Chaudhry, Martin R. Karlsson, Sherman H. Yip
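The entry above describes deciding whether to take an additional checkpoint during speculative execution by computing a factor from current operating conditions and comparing it with a threshold. The sketch below shows one plausible shape of that decision; the specific conditions, weights, and threshold are invented, not taken from the patent.

```python
def checkpoint_factor(instrs_since_last_checkpoint, outstanding_misses, branch_confidence):
    """Combine a few (hypothetical) operating conditions into a single factor."""
    return (0.01 * instrs_since_last_checkpoint
            + 0.5 * outstanding_misses
            + (1.0 - branch_confidence))

def maybe_generate_checkpoint(conditions, threshold=2.0):
    """On a triggering condition or instruction, checkpoint only if the factor is large enough."""
    return checkpoint_factor(**conditions) > threshold

print(maybe_generate_checkpoint({"instrs_since_last_checkpoint": 40,
                                 "outstanding_misses": 1,
                                 "branch_confidence": 0.9}))   # False: keep speculating
print(maybe_generate_checkpoint({"instrs_since_last_checkpoint": 200,
                                 "outstanding_misses": 3,
                                 "branch_confidence": 0.3}))   # True: take another checkpoint
```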
  • Publication number: 20140059548
    Abstract: Embodiments of the present technology provide for migrating processes executing on any one of a plurality of cores in a multi-core cluster to a core of a separate cluster without first having to transfer the processes to a predetermined core of the multi-core cluster. Similarly, the processes may be transferred from the core of the separate cluster to the given core of the multi-core cluster.
    Type: Application
    Filed: August 23, 2012
    Publication date: February 27, 2014
    Applicant: NVIDIA Corporation
    Inventors: Sagheer Ahmad, Shailender Chaudhry, John George Mathieson, Mark Alan Overby
  • Patent number: 8627044
    Abstract: The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue.
    Type: Grant
    Filed: October 6, 2010
    Date of Patent: January 7, 2014
    Assignee: Oracle International Corporation
    Inventors: Shailender Chaudhry, Richard Thuy Van, Robert E. Cypher, Debasish Chandra
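The entry above describes keeping a record of every instruction directly or indirectly dependent on a deferred base instruction and, when one of those instructions reaches the head of the issue queue, issuing it immediately rather than holding it there. A minimal sketch, with hypothetical names and a simplified issue model:

```python
class IssueUnit:
    """Toy issue unit that tracks transitive dependents of a deferred base instruction."""
    def __init__(self, base):
        self.base = base
        self.record = {base}        # the base plus everything directly/indirectly dependent
        self.base_deferred = False

    def note_dependency(self, instr, sources):
        # Record instr if any source is the base or an already-recorded dependent.
        if self.record & set(sources):
            self.record.add(instr)

    def issue_from_head(self, instr, operands_ready):
        if self.base_deferred and instr in self.record:
            # A known dependent has reached the head of the issue queue: issue it
            # immediately instead of holding the queue for its unavailable operands.
            return "issued"
        return "issued" if operands_ready else "stalled"

unit = IssueUnit(base="LOAD r1")
unit.note_dependency("ADD r2, r1", sources=["LOAD r1"])        # directly dependent
unit.note_dependency("MUL r3, r2", sources=["ADD r2, r1"])     # indirectly dependent
unit.base_deferred = True                                       # the load missed and was deferred
print(unit.issue_from_head("MUL r3, r2", operands_ready=False))   # issued immediately
print(unit.issue_from_head("SUB r4, r5", operands_ready=False))   # unrelated: stalls as usual
```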
  • Patent number: 8601240
    Abstract: The described embodiments provide a system for executing instructions in a processor. While executing instructions in an execute-ahead mode, the processor encounters a store instruction for which a destination address is unknown. The processor then defers the store instruction. Upon encountering a load instruction while the store instruction with the unknown destination address is deferred, the processor determines if the load instruction is to continue executing. If not, the processor defers the load instruction. Otherwise, the processor continues executing the load instruction.
    Type: Grant
    Filed: May 4, 2010
    Date of Patent: December 3, 2013
    Assignee: Oracle International Corporation
    Inventors: Shailender Chaudhry, Martin R. Karlsson, Gideon N. Levinsky
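The entry above describes deferring a store whose destination address is unknown and then, for each subsequent load encountered in execute-ahead mode, deciding whether the load may continue or must be deferred behind that store. A behavioral sketch with hypothetical names; the `may_continue` flag stands in for whatever check the processor applies:

```python
class ExecuteAheadCore:
    """Toy execute-ahead model for stores whose destination address is unknown."""
    def __init__(self):
        self.deferred = []                  # instructions waiting in the deferred queue
        self.unknown_addr_store_pending = False

    def execute_store(self, instr, addr_known):
        if not addr_known:
            # The destination address cannot be resolved yet: defer the store.
            self.deferred.append(instr)
            self.unknown_addr_store_pending = True

    def execute_load(self, instr, may_continue):
        if self.unknown_addr_store_pending and not may_continue:
            self.deferred.append(instr)     # the load is deferred as well
            return "deferred"
        return "executed"                   # the load is allowed to continue executing

core = ExecuteAheadCore()
core.execute_store("ST [r7], r2", addr_known=False)
print(core.execute_load("LD r3, [r8]", may_continue=True))    # allowed to continue
print(core.execute_load("LD r4, [r9]", may_continue=False))   # deferred behind the store
print(core.deferred)
```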