Patents by Inventor Shailender Chaudhry
Shailender Chaudhry has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11809319Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to manage processor cache. The technology can be implemented in a processor's cache controlling logic and can enable the processor to track which locations in main memory are contentious. The technology can use the contentiousness of locations to determine where to store the data in cache and how to allocate and evict cache lines in the cache. In one example, the technology can store the data in a shared cache when the location is contentious and can bypass the shared cache and store the data in the private cache when the location is uncontentious. This may be advantageous because storing the data in shared cache can reduce or avoid having multiple copies in different private caches and can reduce the cache coherency overhead involved to keep copies in the private caches in sync.Type: GrantFiled: January 20, 2022Date of Patent: November 7, 2023Assignee: Nvidia CorporationInventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
-
Patent number: 11789869Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to reduce latency of exclusive memory operations. The technology enables a processor to track which locations in main memory are contentious and to modify the order exclusive memory operations are processed based on the contentiousness. A thread can include multiple exclusive operations for the same memory location (e.g., exclusive load and a complementary exclusive store). The multiple exclusive memory operations can be added to a queue and include one or more intervening operations between them in the queue. The processor may process the operations in the queue based on the order they were added and may use the tracked contention to perform out-of-order processing for some of the exclusive operations. For example, the processor can execute the exclusive load operation and because the corresponding location is contentious can process the complementary exclusive store operation before the intervening operations.Type: GrantFiled: January 20, 2022Date of Patent: October 17, 2023Assignee: Nvidia CorporationInventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
-
Publication number: 20230244604Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to reduce latency of exclusive memory operations. The technology enables a processor to track which locations in main memory are contentious and to modify the order exclusive memory operations are processed based on the contentiousness. A thread can include multiple exclusive operations for the same memory location (e.g., exclusive load and a complementary exclusive store). The multiple exclusive memory operations can be added to a queue and include one or more intervening operations between them in the queue. The processor may process the operations in the queue based on the order they were added and may use the tracked contention to perform out-of-order processing for some of the exclusive operations. For example, the processor can execute the exclusive load operation and because the corresponding location is contentious can process the complementary exclusive store operation before the intervening operations.Type: ApplicationFiled: January 20, 2022Publication date: August 3, 2023Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
-
Publication number: 20230244603Abstract: The technology disclosed herein involves tracking contention and using the tracked contention to manage processor cache. The technology can be implemented in a processor’s cache controlling logic and can enable the processor to track which locations in main memory are contentious. The technology can use the contentiousness of locations to determine where to store the data in cache and how to allocate and evict cache lines in the cache. In one example, the technology can store the data in a shared cache when the location is contentious and can bypass the shared cache and store the data in the private cache when the location is uncontentious. This may be advantageous because storing the data in shared cache can reduce or avoid having multiple copies in different private caches and can reduce the cache coherency overhead involved to keep copies in the private caches in sync.Type: ApplicationFiled: January 20, 2022Publication date: August 3, 2023Inventors: Anurag Chaudhary, Christopher Richard Feilbach, Jasjit Singh, Manuel Gautho, Aprajith Thirumalai, Shailender Chaudhry
-
Patent number: 10642744Abstract: An improved architectural means to address processor cache attacks based on speculative execution defines a new memory type that is both cacheable and inaccessible by speculation. Speculative execution cannot access and expose a memory location that is speculatively inaccessible. Such mechanisms can disqualify certain sensitive data from being exposed through speculative execution. Data which must be protected at a performance cost may be specifically marked. If the processor is told where secrets are stored in memory and is forbidden from speculating on those memory locations, then the processor will ensure the process trying to access those memory locations is privileged to access those locations before reading and caching them. Such countermeasure is effective against attacks that use speculative execution to leak secrets from a processor cache.Type: GrantFiled: June 28, 2018Date of Patent: May 5, 2020Assignee: NVIDIA CorporationInventors: Darrell D. Boggs, Ross Segelken, Mike Cornaby, Nick Fortino, Shailender Chaudhry, Denis Khartikov, Alok Mooley, Nathan Tuck, Gordon Vreugdenhil
-
Publication number: 20190004961Abstract: An improved architectural means to address processor cache attacks based on speculative execution defines a new memory type that is both cacheable and inaccessible by speculation. Speculative execution cannot access and expose a memory location that is speculatively inaccessible. Such mechanisms can disqualify certain sensitive data from being exposed through speculative execution. Data which must be protected at a performance cost may be specifically marked. If the processor is told where secrets are stored in memory and is forbidden from speculating on those memory locations, then the processor will ensure the process trying to access those memory locations is privileged to access those locations before reading and caching them. Such countermeasure is effective against attacks that use speculative execution to leak secrets from a processor cache.Type: ApplicationFiled: June 28, 2018Publication date: January 3, 2019Inventors: Darrell D. BOGGS, Ross SEGELKEN, Mike CORNABY, Nick FORTINO, Shailender CHAUDHRY, Denis KHARTIKOV, Alok MOOLEY, Nathan TUCK, Gordon VREUGDENHIL
-
Patent number: 9471395Abstract: Embodiments of the present technology provide for migrating processes executing one any one of a plurality of cores in a multi-core cluster to a core of a separate cluster without first having to transfer the processes to a predetermined core of the multi-core cluster. Similarly, the processes may be transferred from the core of the separate cluster to the given core of the multi-core cluster.Type: GrantFiled: August 23, 2012Date of Patent: October 18, 2016Assignee: NVIDIA CorporationInventors: Sagheer Ahmad, Shailender Chaudhry, John George Mathieson, Mark Alan Overby
-
Patent number: 9280343Abstract: Some embodiments of the present invention provide a system for operating a store queue, wherein the store queue buffers stores that are waiting to be committed to a memory system in a processor. During operation, the system examines an entry at the head of the store queue. If the entry contains a membar token, the system examines an unacknowledged counter that keeps track of the number of store operations that have been sent from the store queue to the memory system but have not been acknowledged as being committed to the memory system. If the unacknowledged counter is non-zero, the system waits until the unacknowledged counter equals zero, and then removes the membar token from the store queue.Type: GrantFiled: August 10, 2009Date of Patent: March 8, 2016Assignee: ORACLE AMERICA, INC.Inventors: Haakan E. Zeffer, Robert E. Cypher, Shailender Chaudhry
-
Patent number: 9268710Abstract: One embodiment of the present invention provides a system that facilitates efficient transactional execution. The system starts by executing a transaction for a thread, wherein executing the transaction involves placing load-marks on cache lines which are loaded during the transaction and placing store-marks on cache lines which are stored to during the transaction. Upon completing the transaction, the system releases the load-marks and the store-marks from the cache lines which were load-marked and store-marked during the transaction. Note that during the transaction, the load-marks and store-marks prevent interfering accesses from other threads to the cache lines.Type: GrantFiled: January 18, 2007Date of Patent: February 23, 2016Assignee: ORACLE AMERICA, INC.Inventors: Robert E. Cypher, Shailender Chaudhry
-
Patent number: 9256438Abstract: A computer processor pipeline has both an architectural register file and a working register file. The lifetime of an entry in the working register file is determined by a predetermined number of instructions passing through a specified stage in the pipeline after the location in the working register file is allocated for an instruction. The size of the working register file is selected based upon performance characteristics. A working register file creditor indicator is coupled to the front end pipeline portion and to the back end pipeline portion. The working register file credit indicator is monitored to prevent a working register file overflow. When the a location in the architectural register file is read early, the location is monitored to determine whether the location is written to prior to issuance of the instruction associated with the early read.Type: GrantFiled: January 15, 2009Date of Patent: February 9, 2016Assignee: ORACLE AMERICA, INC.Inventors: Shailender Chaudhry, Paul Caprioli, Marc Tremblay
-
Patent number: 9146744Abstract: Embodiments of the present invention provide a system which executes a load instruction or a store instruction. During operation the system receives a load instruction. The system then determines if an unrestricted entry or a restricted entry in a store queue contains data that satisfies the load instruction. If not, the system retrieves data for the load instruction from a cache.Type: GrantFiled: May 6, 2008Date of Patent: September 29, 2015Assignee: ORACLE AMERICA, INC.Inventors: Paul Caprioli, Martin Karlsson, Shailender Chaudhry, Gideon N. Levinsky
-
Patent number: 9086889Abstract: Techniques are disclosed relating to reducing the latency of restarting a pipeline in a processor that implements scouting. In one embodiment, the processor may reduce pipeline restart latency using two instruction fetch units that are configured to fetch and re-fetch instructions in parallel with one another. In some embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to determining that a commit operation is to be attempted with respect to one or more deferred instructions. In other embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to receiving an indication that a request for a set of data has been received by a cache, where the indication is sent by the cache before determining whether the data is present in the cache or not.Type: GrantFiled: April 27, 2010Date of Patent: July 21, 2015Assignee: Oracle International CorporationInventors: Martin Karlsson, Sherman H. Yip, Shailender Chaudhry
-
Patent number: 8984264Abstract: The described embodiments provide a system for executing instructions in a processor. In the described embodiments, upon detecting a return of input data for a deferred instruction while executing instructions in an execute-ahead mode, the processor determines whether a replay bit is set in a corresponding entry for the returned input data in a miss buffer. If the replay bit is set, the processor transitions to a deferred-execution mode to execute deferred instructions. Otherwise, the processor continues to execute instructions in the execute-ahead mode.Type: GrantFiled: January 15, 2010Date of Patent: March 17, 2015Assignee: Oracle America, Inc.Inventors: Martin R. Karlsson, Sherman H. Yip, Shailender Chaudhry
-
Patent number: 8898436Abstract: A register file, in a processor, includes a first plurality of registers of a first size, n-bits. A decoder uses a mapping that divides the register file into a second plurality M of registers having a second size. Each of the registers having the second size is assigned a different name in a continuous name space. Each register of the second size includes a plurality N of registers of the first size, n-bits. Each register in the plurality N of registers is assigned the same name as the register of the second size that includes that plurality. State information is maintained in the register file for each n-bit register. The dependence of an instruction on other instructions is detected through the continuous name space. The state information allows the processor to determine when the information in any portion, or all, of a register is valid.Type: GrantFiled: April 20, 2009Date of Patent: November 25, 2014Assignee: Oracle America, Inc.Inventors: Shailender Chaudhry, Marc Tremblay
-
Patent number: 8745419Abstract: A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe; and a logical power throttling unit coupled to the device to receive the output signal, and coupled to the decode pipe. Following the logical power throttling unit receiving the power throttling output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages.Type: GrantFiled: June 21, 2012Date of Patent: June 3, 2014Assignee: Oracle America, Inc.Inventors: Shailender Chaudhry, Quinn A. Jacobson, Marc Tremblay
-
Patent number: 8732407Abstract: Some embodiments of the present invention provide a system that avoids deadlock while attempting to acquire store-marks on cache lines. During operation, the system keeps track of store-mark requests that arise during execution of a thread, wherein a store-mark on a cache line indicates that one or more associated store buffer entries are waiting to be committed to the cache line. In this system, store-mark requests are processed in a pipelined manner, which allows a store-mark request to be initiated before preceding store-mark requests for the same thread complete. Next, if a store-mark request fails, within a bounded amount of time, the system removes or prevents store-marks associated with younger store-mark requests for the same thread, thereby avoiding a potential deadlock that can arise when one or more other threads attempt to store-mark the same cache lines.Type: GrantFiled: November 19, 2008Date of Patent: May 20, 2014Assignee: Oracle America, Inc.Inventors: Robert E. Cypher, Haakan E. Zeffer, Shailender Chaudhry
-
Patent number: 8688963Abstract: The embodiments described in the instant application provide a system for generating checkpoints. In the described embodiments, while speculatively executing instructions with one or more checkpoints in use, upon detecting an occurrence of a predetermined operating condition or encountering a predetermined type of instruction, the system is configured to determine whether an additional checkpoint is to be generated by computing a factor based on one or more operating conditions of the processor. When the factor is greater than a predetermined value, the processor is configured to generate the additional checkpoint.Type: GrantFiled: April 22, 2010Date of Patent: April 1, 2014Assignee: Oracle International CorporationInventors: Shailender Chaudhry, Martin R. Karlsson, Sherman H. Yip
-
Publication number: 20140059548Abstract: Embodiments of the present technology provide for migrating processes executing one any one of a plurality of cores in a multi-core cluster to a core of a separate cluster without first having to transfer the processes to a predetermined core of the multi-core cluster. Similarly, the processes may be transferred from the core of the separate cluster to the given core of the multi-core cluster.Type: ApplicationFiled: August 23, 2012Publication date: February 27, 2014Applicant: NVIDIA CorporationInventors: Sagheer Ahmad, Shailender Chaudhry, John George Mathieson, Mark Alan Overby
-
Patent number: 8627044Abstract: The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue.Type: GrantFiled: October 6, 2010Date of Patent: January 7, 2014Assignee: Oracle International CorporationInventors: Shailender Chaudhry, Richard Thuy Van, Robert E. Cypher, Debasish Chandra
-
Patent number: 8601240Abstract: The described embodiments provide a system for executing instructions in a processor. While executing instructions in an execute-ahead mode, the processor encounters a store instruction for which a destination address is unknown. The processor then defers the store instruction. Upon encountering a load instruction while the store instruction with the unknown destination address is deferred, the processor determines if the load instruction is to continue executing. If not, the processor defers the load instruction. Otherwise, the processor continues executing the load instruction.Type: GrantFiled: May 4, 2010Date of Patent: December 3, 2013Assignee: Oracle International CorporationInventors: Shailender Chaudhry, Martin R. Karlsson, Gideon N. Levinsky