Patents by Inventor Sanyam Mehta

Sanyam Mehta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11941250
    Abstract: A process includes determining a memory bandwidth of a processor subsystem corresponding to an execution of an application by the processor subsystem. The process includes determining an average memory latency corresponding to the execution of the application and determining an average occupancy of a miss status handling register queue associated with the execution of the application based on the memory bandwidth and the average memory latency. The process includes, based on the average occupancy of the miss status handling register queue and a capacity of the miss status handling register queue, generating data that represents a recommendation of an optimization to be applied to the application.
    Type: Grant
    Filed: May 6, 2022
    Date of Patent: March 26, 2024
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Publication number: 20230359358
    Abstract: A process includes determining a memory bandwidth of a processor subsystem corresponding to an execution of an application by the processor subsystem. The process includes determining an average memory latency corresponding to the execution of the application and determining an average occupancy of a miss status handling register queue associated with the execution of the application based on the memory bandwidth and the average memory latency. The process includes, based on the average occupancy of the miss status handling register queue and a capacity of the miss status handling register queue, generating data that represents a recommendation of an optimization to be applied to the application.
    Type: Application
    Filed: May 6, 2022
    Publication date: November 9, 2023
    Inventor: Sanyam Mehta
  • Publication number: 20230315471
    Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
    Type: Application
    Filed: June 2, 2023
    Publication date: October 5, 2023
    Inventor: Sanyam Mehta
  • Patent number: 11687344
    Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
    Type: Grant
    Filed: August 25, 2021
    Date of Patent: June 27, 2023
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Publication number: 20230061576
    Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
    Type: Application
    Filed: August 25, 2021
    Publication date: March 2, 2023
    Inventor: Sanyam Mehta
  • Patent number: 11567771
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: January 31, 2023
    Assignees: Marvell Asia PTE, LTD., Cray Inc.
    Inventors: Harold Wade Cain, III, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Patent number: 11567767
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: January 31, 2023
    Assignees: MARVELL ASIA PTE, LTD., CRAY INC.
    Inventors: Harold Wade Cain, III, Rabin Andrew Sugumar, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Patent number: 11531544
    Abstract: The system creates, in a scheduler data structure, a first entry for a consumer instruction associated with a logical register ID. The first entry includes: a scheduler entry ID; a physical register ID allocated for the logical register ID; a checkpoint ID; one or more scheduler entry IDs for one or more prior producer instructions; and a release field which indicates whether to early release a physical register. The system updates a register alias table entry to include the scheduler entry ID and the checkpoint ID of the consumer instruction. The system receives the scheduler entry ID and a checkpoint ID for a respective prior producer instruction. Responsive to determining that the received checkpoint ID does not match the checkpoint ID associated with the consumer instruction, the system sets a release field to indicate that a physical register is to remain allocated.
    Type: Grant
    Filed: July 29, 2021
    Date of Patent: December 20, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Patent number: 11403082
    Abstract: Systems and methods are configured to receive code containing an original loop that includes irregular memory accesses. The original loop can be split. A pre-execution loop that contains code to prefetch content of the memory can be generated. Execution of the pre-execution loop can access memory inclusively between a starting location and the starting location plus a prefetch distance. A modified loop that can perform at least one computation based on the content prefetched with execution of the pre-execution loop can be generated. Execution of the main loop can to follow the execution of the pre-execution loop. The original loop can be replaced with the pre-execution loop and the modified loop.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: August 2, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Sanyam Mehta, Gary William Elsesser, Terry D. Greyzck
  • Publication number: 20220035633
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Application
    Filed: July 30, 2020
    Publication date: February 3, 2022
    Inventors: Harold Wade Cain, III, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Publication number: 20220035632
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Application
    Filed: July 30, 2020
    Publication date: February 3, 2022
    Inventors: Harold Wade Cain, III, Rabin Andrew Sugumar, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Patent number: 10698813
    Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
    Type: Grant
    Filed: July 12, 2018
    Date of Patent: June 30, 2020
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
  • Publication number: 20190163637
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Application
    Filed: March 6, 2018
    Publication date: May 30, 2019
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
  • Patent number: 10303610
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Grant
    Filed: March 6, 2018
    Date of Patent: May 28, 2019
    Assignee: Cray, Inc.
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
  • Publication number: 20190042435
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Application
    Filed: March 6, 2018
    Publication date: February 7, 2019
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
  • Patent number: 10185659
    Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: January 22, 2019
    Assignee: Cray, Inc.
    Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
  • Publication number: 20180322064
    Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
    Type: Application
    Filed: July 12, 2018
    Publication date: November 8, 2018
    Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
  • Publication number: 20180165209
    Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
    Type: Application
    Filed: December 9, 2016
    Publication date: June 14, 2018
    Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
  • Patent number: 9946654
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Grant
    Filed: October 26, 2016
    Date of Patent: April 17, 2018
    Assignee: Cray Inc.
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
  • Publication number: 20180074963
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Application
    Filed: October 26, 2016
    Publication date: March 15, 2018
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose