Patents by Inventor Sanyam Mehta

Sanyam Mehta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12254318
    Abstract: A system determines an original instruction with a first logical register (LR) mapped to a first physical register (PR). The system determines a current instruction with a current LR. A prior instruction is associated with a second LR mapped to a second PR. The system allocates the current LR to a third PR. Responsive to determining that the current and prior instructions are executed in different iterations, the system marks the second PR as not eligible for early release. Responsive to determining that the current LR is previously mapped to the first PR, the allocation comprises a redefinition of the first LR. Responsive to determining that the first PR is eligible for early release and that the current and original instructions are executed in the same or consecutive iterations, the system releases the first PR based upon the redefinition and not the prior instruction completing or the current instruction committing.
    Type: Grant
    Filed: May 5, 2023
    Date of Patent: March 18, 2025
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Patent number: 12079631
    Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
    Type: Grant
    Filed: June 2, 2023
    Date of Patent: September 3, 2024
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Publication number: 20240248719
    Abstract: A system determines an original instruction with a first logical register (LR) mapped to a first physical register (PR). The system determines a current instruction with a current LR. A prior instruction is associated with a second LR mapped to a second PR. The system allocates the current LR to a third PR. Responsive to determining that the current and prior instructions are executed in different iterations, the system marks the second PR as not eligible for early release. Responsive to determining that the current LR is previously mapped to the first PR, the allocation comprises a redefinition of the first LR. Responsive to determining that the first PR is eligible for early release and that the current and original instructions are executed in the same or consecutive iterations, the system releases the first PR based upon the redefinition and not the prior instruction completing or the current instruction committing.
    Type: Application
    Filed: May 5, 2023
    Publication date: July 25, 2024
    Inventor: Sanyam Mehta
  • Patent number: 11941250
    Abstract: A process includes determining a memory bandwidth of a processor subsystem corresponding to an execution of an application by the processor subsystem. The process includes determining an average memory latency corresponding to the execution of the application and determining an average occupancy of a miss status handling register queue associated with the execution of the application based on the memory bandwidth and the average memory latency. The process includes, based on the average occupancy of the miss status handling register queue and a capacity of the miss status handling register queue, generating data that represents a recommendation of an optimization to be applied to the application.
    Type: Grant
    Filed: May 6, 2022
    Date of Patent: March 26, 2024
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Publication number: 20230359358
    Abstract: A process includes determining a memory bandwidth of a processor subsystem corresponding to an execution of an application by the processor subsystem. The process includes determining an average memory latency corresponding to the execution of the application and determining an average occupancy of a miss status handling register queue associated with the execution of the application based on the memory bandwidth and the average memory latency. The process includes, based on the average occupancy of the miss status handling register queue and a capacity of the miss status handling register queue, generating data that represents a recommendation of an optimization to be applied to the application.
    Type: Application
    Filed: May 6, 2022
    Publication date: November 9, 2023
    Inventor: Sanyam Mehta
  • Publication number: 20230315471
    Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
    Type: Application
    Filed: June 2, 2023
    Publication date: October 5, 2023
    Inventor: Sanyam Mehta
  • Patent number: 11687344
    Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
    Type: Grant
    Filed: August 25, 2021
    Date of Patent: June 27, 2023
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Publication number: 20230061576
    Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
    Type: Application
    Filed: August 25, 2021
    Publication date: March 2, 2023
    Inventor: Sanyam Mehta
  • Patent number: 11567771
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: January 31, 2023
    Assignees: Marvell Asia PTE, LTD., Cray Inc.
    Inventors: Harold Wade Cain, III, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Patent number: 11567767
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: January 31, 2023
    Assignees: MARVELL ASIA PTE, LTD., CRAY INC.
    Inventors: Harold Wade Cain, III, Rabin Andrew Sugumar, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Patent number: 11531544
    Abstract: The system creates, in a scheduler data structure, a first entry for a consumer instruction associated with a logical register ID. The first entry includes: a scheduler entry ID; a physical register ID allocated for the logical register ID; a checkpoint ID; one or more scheduler entry IDs for one or more prior producer instructions; and a release field which indicates whether to early release a physical register. The system updates a register alias table entry to include the scheduler entry ID and the checkpoint ID of the consumer instruction. The system receives the scheduler entry ID and a checkpoint ID for a respective prior producer instruction. Responsive to determining that the received checkpoint ID does not match the checkpoint ID associated with the consumer instruction, the system sets a release field to indicate that a physical register is to remain allocated.
    Type: Grant
    Filed: July 29, 2021
    Date of Patent: December 20, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Sanyam Mehta
  • Patent number: 11403082
    Abstract: Systems and methods are configured to receive code containing an original loop that includes irregular memory accesses. The original loop can be split. A pre-execution loop that contains code to prefetch content of the memory can be generated. Execution of the pre-execution loop can access memory inclusively between a starting location and the starting location plus a prefetch distance. A modified loop that can perform at least one computation based on the content prefetched with execution of the pre-execution loop can be generated. Execution of the main loop can to follow the execution of the pre-execution loop. The original loop can be replaced with the pre-execution loop and the modified loop.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: August 2, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Sanyam Mehta, Gary William Elsesser, Terry D. Greyzck
  • Publication number: 20220035632
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Application
    Filed: July 30, 2020
    Publication date: February 3, 2022
    Inventors: Harold Wade Cain, III, Rabin Andrew Sugumar, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Publication number: 20220035633
    Abstract: A system for processing gather and scatter instructions can implement a front-end subsystem, a back-end subsystem, or both. The front-end subsystem includes a prediction unit configured to determine a predicted quantity of coalesced memory access operations required by an instruction. A decode unit converts the instruction into a plurality of access operations based on the predicted quantity, and transmits the plurality of access operations and an indication of the predicted quantity to an issue queue. The back-end subsystem includes a load-store unit that receives a plurality of access operations corresponding to an instruction, determines a subset of the plurality of access operations that can be coalesced, and forms a coalesced memory access operation from the subset. A queue stores multiple memory addresses for a given load-store entry to provide for execution of coalesced memory accesses.
    Type: Application
    Filed: July 30, 2020
    Publication date: February 3, 2022
    Inventors: Harold Wade Cain, III, Nagesh Bangalore Lakshminarayana, Daniel Jonathan Ernst, Sanyam Mehta
  • Patent number: 10698813
    Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
    Type: Grant
    Filed: July 12, 2018
    Date of Patent: June 30, 2020
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
  • Publication number: 20190163637
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Application
    Filed: March 6, 2018
    Publication date: May 30, 2019
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
  • Patent number: 10303610
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Grant
    Filed: March 6, 2018
    Date of Patent: May 28, 2019
    Assignee: Cray, Inc.
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
  • Publication number: 20190042435
    Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
    Type: Application
    Filed: March 6, 2018
    Publication date: February 7, 2019
    Inventors: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
  • Patent number: 10185659
    Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: January 22, 2019
    Assignee: Cray, Inc.
    Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
  • Publication number: 20180322064
    Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
    Type: Application
    Filed: July 12, 2018
    Publication date: November 8, 2018
    Inventors: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta