Patents by Inventor Sanyam Mehta
Sanyam Mehta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12254318
Abstract: A system determines an original instruction with a first logical register (LR) mapped to a first physical register (PR). The system determines a current instruction with a current LR. A prior instruction is associated with a second LR mapped to a second PR. The system allocates the current LR to a third PR. Responsive to determining that the current and prior instructions are executed in different iterations, the system marks the second PR as not eligible for early release. Responsive to determining that the current LR is previously mapped to the first PR, the allocation comprises a redefinition of the first LR. Responsive to determining that the first PR is eligible for early release and that the current and original instructions are executed in the same or consecutive iterations, the system releases the first PR based upon the redefinition and not the prior instruction completing or the current instruction committing.
Type: Grant
Filed: May 5, 2023
Date of Patent: March 18, 2025
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Sanyam Mehta
-
Patent number: 12079631
Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
Type: Grant
Filed: June 2, 2023
Date of Patent: September 3, 2024
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Sanyam Mehta
-
Publication number: 20240248719
Abstract: A system determines an original instruction with a first logical register (LR) mapped to a first physical register (PR). The system determines a current instruction with a current LR. A prior instruction is associated with a second LR mapped to a second PR. The system allocates the current LR to a third PR. Responsive to determining that the current and prior instructions are executed in different iterations, the system marks the second PR as not eligible for early release. Responsive to determining that the current LR is previously mapped to the first PR, the allocation comprises a redefinition of the first LR. Responsive to determining that the first PR is eligible for early release and that the current and original instructions are executed in the same or consecutive iterations, the system releases the first PR based upon the redefinition and not the prior instruction completing or the current instruction committing.
Type: Application
Filed: May 5, 2023
Publication date: July 25, 2024
Inventor: Sanyam Mehta
-
Patent number: 11941250
Abstract: A process includes determining a memory bandwidth of a processor subsystem corresponding to an execution of an application by the processor subsystem. The process includes determining an average memory latency corresponding to the execution of the application and determining an average occupancy of a miss status handling register queue associated with the execution of the application based on the memory bandwidth and the average memory latency. The process includes, based on the average occupancy of the miss status handling register queue and a capacity of the miss status handling register queue, generating data that represents a recommendation of an optimization to be applied to the application.
Type: Grant
Filed: May 6, 2022
Date of Patent: March 26, 2024
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Sanyam Mehta
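The abstract above derives an average miss status handling register (MSHR) queue occupancy from memory bandwidth and average memory latency, but does not state the formula. The following is a minimal sketch that assumes Little's Law (requests in flight = request rate × latency) and a 64-byte cache line. The function names, the `near_full` threshold, and the recommendation strings are hypothetical and are not taken from the patent.

```python
def mshr_occupancy(bandwidth_gb_per_s, latency_ns, line_bytes=64):
    """Estimate average outstanding misses via Little's Law (an assumption).

    1 GB/s equals 1 byte/ns, so bandwidth * latency is the number of bytes
    in flight; dividing by the line size gives cache lines in flight.
    """
    return bandwidth_gb_per_s * latency_ns / line_bytes


def recommend(occupancy, capacity, near_full=0.9):
    """Map occupancy versus queue capacity to a coarse, illustrative hint."""
    if occupancy >= near_full * capacity:
        # The queue is effectively saturated, so more parallelism cannot help.
        return "reduce memory traffic"
    # The queue has headroom, so more outstanding misses could be issued.
    return "increase memory-level parallelism"


if __name__ == "__main__":
    occ = mshr_occupancy(bandwidth_gb_per_s=6.4, latency_ns=100)
    print(recommend(occ, capacity=10))
```

Comparing the estimate against the queue capacity is the step the abstract describes; the 90% cut-off used here is only a placeholder for whatever criterion an actual tool would apply.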
-
Publication number: 20230359358
Abstract: A process includes determining a memory bandwidth of a processor subsystem corresponding to an execution of an application by the processor subsystem. The process includes determining an average memory latency corresponding to the execution of the application and determining an average occupancy of a miss status handling register queue associated with the execution of the application based on the memory bandwidth and the average memory latency. The process includes, based on the average occupancy of the miss status handling register queue and a capacity of the miss status handling register queue, generating data that represents a recommendation of an optimization to be applied to the application.
Type: Application
Filed: May 6, 2022
Publication date: November 9, 2023
Inventor: Sanyam Mehta
-
Publication number: 20230315471
Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
Type: Application
Filed: June 2, 2023
Publication date: October 5, 2023
Inventor: Sanyam Mehta
-
Patent number: 11687344
Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
Type: Grant
Filed: August 25, 2021
Date of Patent: June 27, 2023
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Sanyam Mehta
-
Publication number: 20230061576
Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
Type: Application
Filed: August 25, 2021
Publication date: March 2, 2023
Inventor: Sanyam Mehta
-
Patent number: 11567771
Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
Type: Grant
Filed: July 30, 2020
Date of Patent: January 31, 2023
Assignees: Marvell Asia PTE, LTD., Cray Inc.
Inventors: Harold Wade Cain, III, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
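The back-end step in the abstract above, finding the subset of a gather's access operations that can be coalesced, can be illustrated with a small software model. This sketch assumes that two accesses are coalescible when they fall in the same 64-byte cache line; the abstract does not state the hardware's actual criterion, and the `coalesce` function and its return shape are hypothetical.

```python
from collections import defaultdict


def coalesce(addresses, line_bytes=64):
    """Group per-lane byte addresses by the cache line they touch.

    Returns a list of (line_base_address, [lane indices]) pairs, one per
    coalesced access, ordered by line address. Same-line grouping is an
    assumed criterion, not one taken from the patent.
    """
    groups = defaultdict(list)
    for lane, addr in enumerate(addresses):
        groups[addr // line_bytes].append(lane)
    return [(line * line_bytes, lanes) for line, lanes in sorted(groups.items())]


if __name__ == "__main__":
    # Five lanes touching three distinct lines need three memory accesses.
    for base, lanes in coalesce([0, 8, 64, 72, 200]):
        print(hex(base), lanes)
```

The length of the returned list corresponds to the quantity that the front-end prediction unit in the abstract tries to predict before the addresses are known.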
-
Patent number: 11567767
Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
Type: Grant
Filed: July 30, 2020
Date of Patent: January 31, 2023
Assignees: MARVELL ASIA PTE, LTD., CRAY INC.
Inventors: Harold Wade Cain, III, Rabin Andrew Sugumar, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
-
Patent number: 11531544
Abstract: The system creates, in a scheduler data structure, a first entry for a consumer instruction associated with a logical register ID. The first entry includes: a scheduler entry ID; a physical register ID allocated for the logical register ID; a checkpoint ID; one or more scheduler entry IDs for one or more prior producer instructions; and a release field which indicates whether to early release a physical register. The system updates a register alias table entry to include the scheduler entry ID and the checkpoint ID of the consumer instruction. The system receives the scheduler entry ID and a checkpoint ID for a respective prior producer instruction. Responsive to determining that the received checkpoint ID does not match the checkpoint ID associated with the consumer instruction, the system sets a release field to indicate that a physical register is to remain allocated.
Type: Grant
Filed: July 29, 2021
Date of Patent: December 20, 2022
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Sanyam Mehta
-
Patent number: 11403082
Abstract: Systems and methods are configured to receive code containing an original loop that includes irregular memory accesses. The original loop can be split. A pre-execution loop that contains code to prefetch content of the memory can be generated. Execution of the pre-execution loop can access memory inclusively between a starting location and the starting location plus a prefetch distance. A modified loop that can perform at least one computation based on the content prefetched with execution of the pre-execution loop can be generated. Execution of the main loop can follow the execution of the pre-execution loop. The original loop can be replaced with the pre-execution loop and the modified loop.
Type: Grant
Filed: April 30, 2021
Date of Patent: August 2, 2022
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Sanyam Mehta, Gary William Elsesser, Terry D. Greyzck
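The loop split described in the abstract above can be sketched as a source-level transformation. This is an illustrative model only: Python has no hardware prefetch, so `prefetch` is a hypothetical hook standing in for whatever prefetch instruction a compiler would emit, and the function names and the chunking scheme are assumptions rather than the patented method.

```python
def original_loop(a, idx):
    """Original loop with an irregular (indirect) access a[idx[i]]."""
    total = 0
    for i in range(len(idx)):
        total += a[idx[i]]
    return total


def split_loop(a, idx, distance=3, prefetch=lambda array, index: None):
    """Same computation, split into a pre-execution loop and a modified loop.

    Each chunk covers iterations start..start+distance inclusive (clamped to
    the last iteration). `prefetch` is a hypothetical stand-in for a prefetch
    instruction and does nothing by default.
    """
    total = 0
    n = len(idx)
    start = 0
    while start < n:
        end = min(start + distance, n - 1)  # inclusive upper bound
        # Pre-execution loop: touch the irregular addresses only.
        for i in range(start, end + 1):
            prefetch(a, idx[i])
        # Modified loop: compute on the data requested above.
        for i in range(start, end + 1):
            total += a[idx[i]]
        start = end + 1
    return total


if __name__ == "__main__":
    data = list(range(10, 20))
    indices = [7, 2, 9, 0, 4, 4, 1]
    assert split_loop(data, indices) == original_loop(data, indices)
```

The prefetch distance controls how far the pre-execution loop runs ahead of the computation; a real compiler would choose it from memory latency and loop cost, which this sketch does not model.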
-
Publication number: 20220035632
Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
Type: Application
Filed: July 30, 2020
Publication date: February 3, 2022
Inventors: Harold Wade Cain, III, Rabin Andrew Sugumar, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
-
Publication number: 20220035633
Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
Type: Application
Filed: July 30, 2020
Publication date: February 3, 2022
Inventors: Harold Wade Cain, III, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
-
Patent number: 10698813
Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
Type: Grant
Filed: July 12, 2018
Date of Patent: June 30, 2020
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
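The final step in the abstract above, generating an allocation plan that favors HBM for data structures used by high memory utilization portions of the program, can be illustrated with a simple greedy placement. The abstract does not say how the plan is computed, so the greedy rule, the `hot_score` input, and the `plan_allocation` name are all assumptions made for this sketch.

```python
def plan_allocation(structures, hbm_capacity):
    """Greedy placement sketch (an assumed policy, not the patented one).

    `structures` is a list of (name, size_bytes, hot_score) tuples, where
    hot_score is a hypothetical measure of how much the structure is used by
    high-memory-utilization portions of the program.
    """
    plan = {}
    free = hbm_capacity
    # Consider the most heavily used structures first.
    for name, size, score in sorted(structures, key=lambda s: s[2], reverse=True):
        if score > 0 and size <= free:
            plan[name] = "HBM"
            free -= size
        else:
            plan[name] = "LBM"
    return plan


if __name__ == "__main__":
    profile = [("grid", 600, 0.7), ("index", 300, 0.2),
               ("log", 500, 0.0), ("halo", 500, 0.1)]
    print(plan_allocation(profile, hbm_capacity=1000))
```

In this example `grid` and `index` fit in HBM, while `halo` no longer fits and `log` is never touched by a hot region, so both fall back to LBM.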
-
Publication number: 20190163637
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
Type: Application
Filed: March 6, 2018
Publication date: May 30, 2019
Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
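The batch-and-pause behavior in the abstract above can be modeled in a few lines. This sketch tracks block numbers rather than byte addresses and assumes that the resume criterion is simply the distance dropping back below the maximum; the abstract leaves that criterion unspecified. The `StreamPrefetcher` class and its method names are hypothetical.

```python
class StreamPrefetcher:
    """Software model of one outstanding request buffer (ORB) entry."""

    def __init__(self, address, num_blocks, degree, max_distance):
        self.next_block = address      # next block to prefetch
        self.remaining = num_blocks    # blocks still to be requested
        self.degree = degree           # blocks requested per batch
        self.max_distance = max_distance
        self.last_read = address - 1   # no prefetched block read yet
        self.issued = []               # record of prefetched blocks

    def paused(self):
        # Distance between the last prefetched block and the latest read.
        return (self.next_block - 1) - self.last_read >= self.max_distance

    def step(self):
        """Issue the next batch; call once the previous batch has returned."""
        if self.remaining == 0 or self.paused():
            return
        count = min(self.degree, self.remaining)
        self.issued.extend(range(self.next_block, self.next_block + count))
        self.next_block += count
        self.remaining -= count

    def on_read(self, block):
        """A demand read of a prefetched block may resume prefetching."""
        self.last_read = max(self.last_read, block)
        self.step()  # assumed resume criterion: distance below the maximum


if __name__ == "__main__":
    p = StreamPrefetcher(address=100, num_blocks=10, degree=2, max_distance=4)
    for _ in range(3):
        p.step()
    print(p.issued, p.paused())
    p.on_read(100)
    print(p.issued)
```

With a degree of 2 and a maximum distance of 4, the model stops after two batches and issues a third only once the program reads one of the prefetched blocks.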
-
Patent number: 10303610
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
Type: Grant
Filed: March 6, 2018
Date of Patent: May 28, 2019
Assignee: Cray, Inc.
Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
-
Publication number: 20190042435
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
Type: Application
Filed: March 6, 2018
Publication date: February 7, 2019
Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
-
Patent number: 10185659
Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
Type: Grant
Filed: December 9, 2016
Date of Patent: January 22, 2019
Assignee: Cray, Inc.
Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
-
Publication number: 20180322064
Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
Type: Application
Filed: July 12, 2018
Publication date: November 8, 2018
Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta