Patents by Inventor Sooraj Puthoor

Sooraj Puthoor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230229494
    Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
    Type: Application
    Filed: January 11, 2023
    Publication date: July 20, 2023
    Inventors: Muhammad Amber HASSAAN, Anirudh Mohan KAUSHIK, Sooraj PUTHOOR, Gokul Subramanian RAVI, Bradford BECKMANN, Ashwin AJI
  • Publication number: 20230205705
    Abstract: An approach provides indirect addressing support for PIM. Indirect PIM commands include address translation information that allows memory modules to perform indirect addressing. Processing logic in a memory module processes an indirect PIM command and retrieves, from a first memory location, a virtual address of a second memory location. The processing logic calculates a corresponding physical address for the virtual address using the address translation information included with the indirect PIM command and retrieves, from the second memory location, a virtual address of a third memory location. This process is repeated any number of times until one or more indirection stop criteria are satisfied. The indirection stop criteria stop the process when work has been completed normally or to prevent errors. Implementations include the processing logic in the memory module working in cooperation with a memory controller to perform indirect addressing.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: Matthew R. Poremba, Alexandru Dutu, Sooraj Puthoor
  • Publication number: 20230196502
    Abstract: A processing unit includes one or more processor cores and a set of registers to store configuration information for the processing unit. The processing unit also includes a coprocessor configured to receive a request to modify a memory allocation for a kernel concurrently with the kernel executing on the at least one processor core. The coprocessor is configured to modify the memory allocation by modifying the configuration information stored in the set of registers. In some cases, initial configuration information is provided to the set of registers by a different processing unit. The initial configuration information is stored in the set of registers prior to the coprocessor modifying the configuration information.
    Type: Application
    Filed: January 30, 2023
    Publication date: June 22, 2023
    Inventors: Anthony GUTIERREZ, Muhammad Amber Hassaan, Sooraj Puthoor
  • Publication number: 20230195645
    Abstract: Process isolation for a PIM device includes: receiving, from a process, a call to allocate a virtual address space where the process stores a PIM configuration context; allocating the virtual address space including mapping a physical address space including PIM device configuration registers to the virtual address space only if the physical address space is not mapped to another process's virtual address space; and programming the PIM device configuration space according to the configuration context. When a PIM command is executed, a translation mechanism determines whether there is a valid mapping of a virtual address of the PIM command to a physical address of a PIM resource, such as a LIS entry. If a valid mapping exists, the translation is completed and the resource is accessed, but if there is not a valid mapping, the translation fails and the process is blocked from accessing the PIM resource.
    Type: Application
    Filed: December 20, 2021
    Publication date: June 22, 2023
    Inventors: SOORAJ PUTHOOR, MUHAMMAD AMBER HASSAAN, ASHWIN AJI, MICHAEL L. CHU, NUWAN JAYASENA
  • Publication number: 20230195375
    Abstract: Process isolation for a PIM device through exclusive locking includes receiving, from a process, a call requesting ownership of a PIM device. The request includes one or more PIM configuration parameters. The exclusive locking technique also includes granting the process ownership of the PIM device responsive to determining that ownership is available. The PIM device is configured according to the PIM configuration parameters.
    Type: Application
    Filed: December 20, 2021
    Publication date: June 22, 2023
    Inventors: SOORAJ PUTHOOR, MUHAMMAD AMBER HASSAAN, ASHWIN AJI, MICHAEL L. CHU, NUWAN JAYASENA
  • Publication number: 20230195459
    Abstract: An apparatus that manages multi-process execution in a processing-in-memory (“PIM”) device includes a gatekeeper configured to: receive an identification of one or more registered PIM processes; receive, from a process, a memory request that includes a PIM command; if the requesting process is a registered PIM process and another registered PIM process is active on the PIM device, perform a context switch of PIM state between the registered PIM processes; and issue the PIM command of the requesting process to the PIM device.
    Type: Application
    Filed: December 20, 2021
    Publication date: June 22, 2023
    Inventors: SOORAJ PUTHOOR, MUHAMMAD AMBER HASSAAN, ASHWIN AJI, MICHAEL L. CHU, NUWAN JAYASENA
  • Publication number: 20230185607
    Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.
    Type: Application
    Filed: November 23, 2022
    Publication date: June 15, 2023
    Inventors: Anthony GUTIERREZ, Sooraj PUTHOOR
  • Publication number: 20230097115
    Abstract: A processing system executes a specialized wavefront, referred to as a “garbage collecting wavefront” or GCWF, to identify and deallocate resources such as, for example, scalar registers, vector registers, and local data share space, that are no longer being used by wavefronts of a workgroup executing at the processing system (i.e., dead resources). In some embodiments, the GCWF is programmed to have compiler information regarding the resource requirements of the other wavefronts of the workgroup and specifies the program counter after which there will be a permanent drop in resource requirements for the other wavefronts. In other embodiments, the standard compute wavefronts signal the GCWF when they have completed using resources. The GCWF sends a command to deallocate the dead resources so the dead resources can be made available for additional wavefronts.
    Type: Application
    Filed: September 27, 2021
    Publication date: March 30, 2023
    Inventors: Anthony Gutierrez, Sooraj Puthoor
  • Patent number: 11573724
    Abstract: A processing apparatus is provided that includes NVRAM and one or more processors configured to process a first set and a second set of instructions according to a hierarchical processing scope and process a scoped persistence barrier residing in the program after the first instruction set and before the second instruction set. The barrier includes an instruction to cause first data to persist in the NVRAM before second data persists in the NVRAM. The first data results from execution of each of the first set of instructions processed according to the one hierarchical processing scope. The second data results from execution of each of the second set of instructions processed according to the one hierarchical processing scope. The processing apparatus also includes a controller configured to cause the first data to persist in the NVRAM before the second data persists in the NVRAM based on the scoped persistence barrier.
    Type: Grant
    Filed: June 5, 2019
    Date of Patent: February 7, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Mitesh R. Meswani, Dibakar Gope, Sooraj Puthoor
  • Patent number: 11550627
    Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: January 10, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Anthony Gutierrez, Sooraj Puthoor
  • Patent number: 11544106
    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: January 3, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
  • Patent number: 11507522
    Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: November 22, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann
  • Publication number: 20220207643
    Abstract: Implementing heterogenous wavefronts on a graphics processing unit (GPU) is disclosed. A schedule assigns heterogeneous wavefronts for execution on a compute unit of a processing device. The heterogeneous wavefronts include different types of wavefronts such as vector compute wavefronts service-level wavefronts that vary in resource requirements and instruction sets. As one example, heterogenous wavefronts may include scalar wavefronts and vector compute wavefronts that execute on scalar units and vector units, respectively. Distinct sets of instructions are executed for the heterogenous wavefronts on the compute unit. Heterogenous wavefronts are processed in the same pipeline of the processing device.
    Type: Application
    Filed: December 28, 2020
    Publication date: June 30, 2022
    Inventors: SOORAJ PUTHOOR, BRADFORD BECKMANN, NUWAN JAYASENA, ANTHONY GUTIERREZ
  • Publication number: 20220197696
    Abstract: Methods, devices, and systems for launching a compute kernel. A reference kernel dispatch packet is received by a kernel agent. The reference kernel dispatch packet is processed by the kernel agent to determine kernel dispatch information. The kernel dispatch information is stored by the kernel agent. A kernel is dispatched by the kernel agent, based on the kernel dispatch information. In some implementations, a condensed kernel dispatch packet is received by the kernel agent, the condensed kernel dispatch packet is processed by the kernel agent to retrieve the stored kernel dispatch information, and a kernel is dispatched by the kernel agent based on the retrieved kernel dispatch information.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Sooraj Puthoor, Bradford M. Beckmann
  • Publication number: 20220188493
    Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.
    Type: Application
    Filed: December 10, 2020
    Publication date: June 16, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Kevin Y. Cheng, Sooraj Puthoor, Onur Kayiran
  • Publication number: 20220114097
    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
    Type: Application
    Filed: December 20, 2021
    Publication date: April 14, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
  • Patent number: 11231931
    Abstract: A processor includes a first core and a second core to execute computer instructions. Each of the cores includes its own private memory cache and speculative load queue. The speculative load queue stores cachelines for the computer instructions and data when the core is operating in a speculative state with respect to a process or thread. The processor includes a state tracking buffer having a state field to store a speculative exclusive ownership state for each cacheline in the speculative load queue when present therein.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: January 25, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: Sooraj Puthoor
  • Patent number: 11204871
    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
    Type: Grant
    Filed: June 30, 2015
    Date of Patent: December 21, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
  • Publication number: 20210294646
    Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
    Type: Application
    Filed: March 19, 2020
    Publication date: September 23, 2021
    Inventors: Muhammad Amber HASSAAN, Anirudh Mohan KAUSHIK, Sooraj PUTHOOR, Gokul Subramanian RAVI, Bradford BECKMANN, Ashwin AJI
  • Publication number: 20210216368
    Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.
    Type: Application
    Filed: March 29, 2021
    Publication date: July 15, 2021
    Inventors: Anthony GUTIERREZ, Sooraj PUTHOOR