Patents by Inventor Sooraj Puthoor

Sooraj Puthoor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

CONDENSED COMMAND PACKET FOR HIGH THROUGHPUT AND LOW OVERHEAD KERNEL LAUNCH

Publication number: 20220197696

Abstract: Methods, devices, and systems for launching a compute kernel. A reference kernel dispatch packet is received by a kernel agent. The reference kernel dispatch packet is processed by the kernel agent to determine kernel dispatch information. The kernel dispatch information is stored by the kernel agent. A kernel is dispatched by the kernel agent, based on the kernel dispatch information. In some implementations, a condensed kernel dispatch packet is received by the kernel agent, the condensed kernel dispatch packet is processed by the kernel agent to retrieve the stored kernel dispatch information, and a kernel is dispatched by the kernel agent based on the retrieved kernel dispatch information.

Type: Application

Filed: December 23, 2020

Publication date: June 23, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Sooraj Puthoor, Bradford M. Beckmann
FPGA-BASED PROGRAMMABLE DATA ANALYSIS AND COMPRESSION FRONT END FOR GPU

Publication number: 20220188493

Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.

Type: Application

Filed: December 10, 2020

Publication date: June 16, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Kevin Y. Cheng, Sooraj Puthoor, Onur Kayiran
SYSTEM PERFORMANCE MANAGEMENT USING PRIORITIZED COMPUTE UNITS

Publication number: 20220114097

Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.

Type: Application

Filed: December 20, 2021

Publication date: April 14, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
Mechanism for mitigating information leak via cache side channels during speculative execution

Patent number: 11231931

Abstract: A processor includes a first core and a second core to execute computer instructions. Each of the cores includes its own private memory cache and speculative load queue. The speculative load queue stores cachelines for the computer instructions and data when the core is operating in a speculative state with respect to a process or thread. The processor includes a state tracking buffer having a state field to store a speculative exclusive ownership state for each cacheline in the speculative load queue when present therein.

Type: Grant

Filed: December 20, 2018

Date of Patent: January 25, 2022

Assignee: ADVANCED MICRO DEVICES, INC.

Inventor: Sooraj Puthoor
System performance management using prioritized compute units

Patent number: 11204871

Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.

Type: Grant

Filed: June 30, 2015

Date of Patent: December 21, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
HARDWARE ASSISTED FINE-GRAINED DATA MOVEMENT

Publication number: 20210294646

Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.

Type: Application

Filed: March 19, 2020

Publication date: September 23, 2021

Inventors: Muhammad Amber HASSAAN, Anirudh Mohan KAUSHIK, Sooraj PUTHOOR, Gokul Subramanian RAVI, Bradford BECKMANN, Ashwin AJI
HARDWARE ACCELERATED DYNAMIC WORK CREATION ON A GRAPHICS PROCESSING UNIT

Publication number: 20210216368

Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.

Type: Application

Filed: March 29, 2021

Publication date: July 15, 2021

Inventors: Anthony GUTIERREZ, Sooraj PUTHOOR
MEMORY REQUEST PRIORITY ASSIGNMENT TECHNIQUES FOR PARALLEL PROCESSORS

Publication number: 20210173796

Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.

Type: Application

Filed: December 6, 2019

Publication date: June 10, 2021

Inventors: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann
Hardware accelerated dynamic work creation on a graphics processing unit

Patent number: 10963299

Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.

Type: Grant

Filed: September 18, 2018

Date of Patent: March 30, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Anthony Gutierrez, Sooraj Puthoor
CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

Publication number: 20200379802

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

Type: Application

Filed: April 13, 2020

Publication date: December 3, 2020

Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
Heterogeneous graphics processing unit for scheduling thread groups for execution on variable width SIMD units

Patent number: 10713059

Abstract: A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.

Type: Grant

Filed: September 18, 2014

Date of Patent: July 14, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Joseph L. Greathouse, Mitesh R. Meswani, Sooraj Puthoor, Dmitri Yudanov, James M. O'Connor
LAXITY-AWARE, DYNAMIC PRIORITY VARIATION AT A PROCESSOR

Publication number: 20200167191

Abstract: A processing system includes a task queue, a laxity-aware task scheduler coupled to the task queue, and a workgroup dispatcher coupled to the laxity-aware task scheduler. Based on a laxity evaluation of laxity values associated with a plurality of tasks stored in the task queue, the workgroup dispatcher schedules the plurality of tasks. The laxity evaluation includes determining a priority of each task of the plurality of tasks. The laxity value is determined using laxity information, where the laxity information includes an arrival time, a task duration, a task deadline, and a number of workgroups.

Type: Application

Filed: November 26, 2018

Publication date: May 28, 2020

Inventors: Tsung Tai YEH, Bradford BECKMANN, Sooraj PUTHOOR, Matthew David SINCLAIR
Continuation analysis tasks for GPU task scheduling

Patent number: 10620994

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

Type: Grant

Filed: May 30, 2017

Date of Patent: April 14, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
DYNAMIC MEMORY MANAGEMENT ON A GRAPHICS PROCESSING UNIT

Publication number: 20200098082

Abstract: A processing unit includes one or more processor cores and a set of registers to store configuration information for the processing unit. The processing unit also includes a coprocessor configured to receive a request to modify a memory allocation for a kernel concurrently with the kernel executing on the at least one processor core. The coprocessor is configured to modify the memory allocation by modifying the configuration information stored in the set of registers. In some cases, initial configuration information is provided to the set of registers by a different processing unit. The initial configuration information is stored in the set of registers prior to the coprocessor modifying the configuration information.

Type: Application

Filed: September 21, 2018

Publication date: March 26, 2020

Inventors: Anthony GUTIERREZ, Muhammad Amber HASSAAN, Sooraj PUTHOOR
HARDWARE ACCELERATED DYNAMIC WORK CREATION ON A GRAPHICS PROCESSING UNIT

Publication number: 20200089528

Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the at least one processor core in response to the coprocessor receiving a request from the parent task concurrently with the parent task executing on the at least one processor core. In some cases, the parent task registers the child task in a task pool and the child task is a future task that is configured to monitor a completion object and enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution in response to the future task failing to detect the completion object.

Type: Application

Filed: September 18, 2018

Publication date: March 19, 2020

Inventors: Anthony GUTIERREZ, Sooraj PUTHOOR
MULTI-KERNEL WAVEFRONT SCHEDULER

Publication number: 20190370059

Abstract: Systems, apparatuses, and methods for implementing a multi-kernel wavefront scheduler are disclosed. A system includes at least a parallel processor coupled to one or more memories, wherein the parallel processor includes a command processor and a plurality of compute units. The command processor launches multiple kernels for execution on the compute units. Each compute unit includes a multi-level scheduler for scheduling wavefronts from multiple kernels for execution on its execution units. A first level scheduler creates scheduling groups by grouping together wavefronts based on the priority of their kernels. Accordingly, wavefronts from kernels with the same priority are grouped together in the same scheduling group by the first level scheduler. Next, the first level scheduler selects, from a plurality of scheduling groups, the highest priority scheduling group for execution. Then, a second level scheduler schedules wavefronts for execution from the scheduling group selected by the first level scheduler.

Type: Application

Filed: May 30, 2018

Publication date: December 5, 2019

Inventors: Sooraj Puthoor, Joseph Gross, Xulong Tang, Bradford Michael Beckmann
SCOPED PERSISTENCE BARRIERS FOR NON-VOLATILE MEMORIES

Publication number: 20190286362

Abstract: A processing apparatus is provided that includes NVRAM and one or more processors configured to process a first set and a second set of instructions according to a hierarchical processing scope and process a scoped persistence barrier residing in the program after the first instruction set and before the second instruction set. The barrier includes an instruction to cause first data to persist in the NVRAM before second data persists in the NVRAM. The first data results from execution of each of the first set of instructions processed according to the one hierarchical processing scope. The second data results from execution of each of the second set of instructions processed according to the one hierarchical processing scope. The processing apparatus also includes a controller configured to cause the first data to persist in the NVRAM before the second data persists in the NVRAM based on the scoped persistence barrier.

Type: Application

Filed: June 5, 2019

Publication date: September 19, 2019

Applicant: Advanced Micro Devices, Inc.

Inventors: Arkaprava Basu, Mitesh R. Meswani, Dibakar Gope, Sooraj Puthoor
Method and apparatus for inter-lane thread migration

Patent number: 10409610

Abstract: Briefly, methods and apparatus to migrate a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit whereby both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that when executed allow access to a dynamic thread migration swizzle buffer that allows for the migration of register context information when migrating software threads. The register context information may be located in one or more locations of a register file prior to storing the register context information into the dynamic thread migration swizzle buffer. The method and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.

Type: Grant

Filed: January 29, 2016

Date of Patent: September 10, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Beckmann, Sooraj Puthoor
Scoped persistence barriers for non-volatile memories

Patent number: 10324650

Abstract: A processing apparatus is provided that includes NVRAM and one or more processors configured to process a first set and a second set of instructions according to a hierarchical processing scope and process a scoped persistence barrier residing in the program after the first instruction set and before the second instruction set. The barrier includes an instruction to cause first data to persist in the NVRAM before second data persists in the NVRAM. The first data results from execution of each of the first set of instructions processed according to the one hierarchical processing scope. The second data results from execution of each of the second set of instructions processed according to the one hierarchical processing scope. The processing apparatus also includes a controller configured to cause the first data to persist in the NVRAM before the second data persists in the NVRAM based on the scoped persistence barrier.

Type: Grant

Filed: September 23, 2016

Date of Patent: June 18, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Arkaprava Basu, Mitesh R. Meswani, Dibakar Gope, Sooraj Puthoor
CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

Publication number: 20180349145

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

Type: Application

Filed: May 30, 2017

Publication date: December 6, 2018

Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor

prev 1 2 3 next