Patents by Inventor Keith Lowery
Keith Lowery has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160371116
Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: August 29, 2016
Publication date: December 22, 2016
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 9430281
Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Grant
Filed: November 9, 2011
Date of Patent: August 30, 2016
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
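The enqueue/dequeue split this patent describes (software fills a memory-resident queue; a hardware command processor drains it and forwards tasks to the shader core) can be sketched in Python. This is an illustrative stand-in, not the patented hardware design: the deque, `CommandProcessor` class, and `shader_core` callable are all hypothetical names.

```python
from collections import deque

class CommandProcessor:
    """Stands in for the hardware-based command processor: it drains
    the queue and forwards each task to a (simulated) shader core."""
    def __init__(self, shader_core):
        self.shader_core = shader_core

    def dequeue_and_forward(self, queue):
        results = []
        while queue:
            task = queue.popleft()          # dequeue in FIFO order
            results.append(self.shader_core(task))
        return results

def software_enqueue(queue, tasks):
    """Software-based enqueuing module: places tasks in the memory
    storage module (here, a plain deque)."""
    for task in tasks:
        queue.append(task)

# Usage: software enqueues, the command processor dequeues and forwards.
queue = deque()
software_enqueue(queue, [1, 2, 3])
cp = CommandProcessor(shader_core=lambda task: task * 2)
print(cp.dequeue_and_forward(queue))  # → [2, 4, 6]
```

The point of the split is that the producer side needs no hardware support beyond shared memory, while the consumer side needs no software scheduler in the loop.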
-
Patent number: 8782645
Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
Type: Grant
Filed: May 11, 2011
Date of Patent: July 15, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
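The two scheduling phases in this abstract (a static assignment from compiler-computed pre-runtime information, plus a dynamic correction for waiting work units) can be sketched as follows. The `parallel_ratio` metric and both function names are hypothetical; the patent does not specify what the compiler's pre-runtime information contains.

```python
def assign_work_units(work_units, pre_runtime_info):
    """Static phase: assign each work unit to the SIMD core or the
    general-purpose CPU core from compiler-computed hints."""
    return {wu: ('simd' if pre_runtime_info[wu]['parallel_ratio'] > 0.5
                 else 'cpu')
            for wu in work_units}

def reassign_waiting(assignment, waiting, slow_core):
    """Dynamic phase: if other work units of the same kernel ran
    slowly on a core, flip waiting work units off that core."""
    for wu in waiting:
        if assignment[wu] == slow_core:
            assignment[wu] = 'cpu' if slow_core == 'simd' else 'simd'
    return assignment

info = {'a': {'parallel_ratio': 0.9}, 'b': {'parallel_ratio': 0.1}}
plan = assign_work_units(['a', 'b'], info)
print(plan)  # → {'a': 'simd', 'b': 'cpu'}
reassign_waiting(plan, waiting=['a'], slow_core='simd')
print(plan)  # → {'a': 'cpu', 'b': 'cpu'}
```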
-
Patent number: 8752064
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Grant
Filed: November 30, 2011
Date of Patent: June 10, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
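The store-then-transmit batching described here can be sketched in a few lines: one system call is recorded per work item, the whole wavefront's batch goes out in a single transmission, and results are scattered back. The `transmit` callable stands in for the CPU-side handler; its name and the dict layout are assumptions for illustration.

```python
def batch_syscalls(wavefront, transmit):
    """Collect one stored system call per work item, send the batch
    in one transmission, then attach each result to its work item."""
    batch = [item['syscall'] for item in wavefront]
    results = transmit(batch)   # one round trip instead of one per item
    for item, result in zip(wavefront, results):
        item['result'] = result
    return wavefront

# Usage: a toy handler that "executes" each call by uppercasing it.
wavefront = [{'syscall': 'read'}, {'syscall': 'open'}]
batch_syscalls(wavefront, transmit=lambda calls: [c.upper() for c in calls])
print([item['result'] for item in wavefront])  # → ['READ', 'OPEN']
```

The communication win is the single round trip: the cost of reaching the processor is paid once per wavefront rather than once per work item.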
-
Patent number: 8719543
Abstract: Systems and methods are provided that utilize non-shared page tables to allow an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system. The computer system can include a multi-core central processor unit. The accelerator device can be, for example, an isolated core processor device of the multi-core central processor unit that is sequestered for use independently of the operating system, or an external device that is communicatively coupled to the computer system.
Type: Grant
Filed: December 29, 2009
Date of Patent: May 6, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Patryk Kaminski, Thomas Woller, Keith Lowery, Erich Boleyn
-
Patent number: 8683468
Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
Type: Grant
Filed: May 16, 2011
Date of Patent: March 25, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
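The mechanism above (run code before the migration point on one core, snapshot live values into a compiler-created structure, resume after the point on the other core) can be sketched with a running sum. The `MigrationPoint` class and the shape of the live-value snapshot are illustrative assumptions; the patent's structure is whatever the compiler emits.

```python
class MigrationPoint:
    """Compiler-created structure holding the live values needed to
    resume a function call on a different core."""
    def __init__(self):
        self.live_values = None

def sum_with_migration(data, migrate_at, point):
    # Code before the migration point: runs on the first (SIMD) core.
    acc = 0
    for x in data[:migrate_at]:
        acc += x
    # Condition satisfied: snapshot live values into the structure.
    point.live_values = {'acc': acc, 'index': migrate_at}
    # Code after the migration point: the second (general-purpose)
    # core reads the structure and continues from the saved state.
    state = point.live_values
    acc = state['acc']
    for x in data[state['index']:]:
        acc += x
    return acc

print(sum_with_migration([1, 2, 3, 4], migrate_at=2, point=MigrationPoint()))  # → 10
```

The invariant worth noting: the result is identical to running the whole loop on one core, because everything the second half needs is captured in the snapshot.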
-
Patent number: 8667201
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Grant
Filed: November 9, 2011
Date of Patent: March 4, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20130263144
Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
Type: Application
Filed: March 29, 2013
Publication date: October 3, 2013
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
-
Publication number: 20120331278
Abstract: A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing a number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with the given outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core.
Type: Application
Filed: June 23, 2011
Publication date: December 27, 2012
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery
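The branch-removal transformation above can be sketched directly: records are partitioned by branch outcome, and each partition is fed to a straight-line sub-kernel containing only that outcome's code, so no branch remains inside the hot loops. Function names and the two-outcome restriction are simplifying assumptions for illustration.

```python
def split_into_subkernels(records, branch_predicate, taken_body, not_taken_body):
    """Partition records by branch outcome, then run one straight-line
    sub-kernel per outcome — the branch disappears from the inner loops."""
    taken = [r for r in records if branch_predicate(r)]
    not_taken = [r for r in records if not branch_predicate(r)]
    # Each list comprehension below models a branch-free compute sub-kernel.
    return ([taken_body(r) for r in taken],
            [not_taken_body(r) for r in not_taken])

# Usage: even records are doubled, odd records are negated.
evens, odds = split_into_subkernels(
    [1, 2, 3, 4],
    branch_predicate=lambda r: r % 2 == 0,
    taken_body=lambda r: r * 2,
    not_taken_body=lambda r: -r,
)
print(evens, odds)  # → [4, 8] [-1, -3]
```

On a SIMD core this matters because divergent branches serialize lanes; uniform sub-kernels keep every lane doing the same work.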
-
Publication number: 20120297163
Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
Type: Application
Filed: May 16, 2011
Publication date: November 22, 2012
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
-
Publication number: 20120291040
Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
Type: Application
Filed: May 11, 2011
Publication date: November 15, 2012
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
-
Patent number: 8245008
Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node, as the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.
Type: Grant
Filed: February 18, 2009
Date of Patent: August 14, 2012
Assignee: Advanced Micro Devices, Inc.
Inventors: Patryk Kaminski, Keith Lowery
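The locality policy above (track each block's home node, serve a request from the requesting thread's own node when possible, keep per-node free-block caches) can be sketched as a toy allocator. The class name, dict-based block representation, and single-level cache are illustrative simplifications of the patent's multi-level design.

```python
class NumaHeap:
    """Toy NUMA-aware allocator: one free-block cache per node, and
    allocations preferentially served from the requesting thread's node."""
    def __init__(self, num_nodes):
        self.free_blocks = {n: [] for n in range(num_nodes)}

    def allocate(self, thread_node, size):
        # First try a cached free block on the thread's own node.
        for block in self.free_blocks[thread_node]:
            if block['size'] >= size:
                self.free_blocks[thread_node].remove(block)
                return block
        # Otherwise hand out a fresh block on the local node.
        return {'node': thread_node, 'size': size}

    def free(self, block):
        # Blocks return to the cache of their home node, so later
        # allocations from that node stay local.
        self.free_blocks[block['node']].append(block)

heap = NumaHeap(num_nodes=2)
block = heap.allocate(thread_node=0, size=64)
heap.free(block)
reused = heap.allocate(thread_node=0, size=32)
print(reused['node'])  # → 0
```

Freeing back to the block's home node (rather than the freeing thread's node) is what keeps the caches honest about locality.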
-
Publication number: 20120194526
Abstract: Systems, methods, and articles of manufacture for optimizing task scheduling on an accelerated processing device (APD) device are provided. In an embodiment, a method comprises: enqueuing, using the APD, one or more tasks in a memory storage; and dequeuing, using the APD, the one or more tasks from the memory storage using a hardware-based command processor, wherein the command processor forwards the one or more tasks to a shader core.
Type: Application
Filed: November 30, 2011
Publication date: August 2, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120192201
Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices.
Type: Application
Filed: November 2, 2011
Publication date: July 26, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180056
Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180072
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Application
Filed: November 30, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120179851
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20100211756
Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node, as the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.
Type: Application
Filed: February 18, 2009
Publication date: August 19, 2010
Inventors: Patryk Kaminski, Keith Lowery
-
Publication number: 20070150575
Abstract: A method and system for dynamic distributed data caching is presented. The method includes providing a cache community (402) comprising at least one peer (413). Each peer has an associated first content portion (511) indicating content to be cached by the respective peer. A client (404) may be allowed to join the cache community. A peer list (426) associated with the cache community is updated to include the client. The peer list indicates the peers in the cache community. A respective second content portion (511) is associated with each peer based on the addition of the client.
Type: Application
Filed: March 2, 2007
Publication date: June 28, 2007
Applicant: epicRealm Operating Inc.
Inventors: Keith Lowery, Bryan Chin, David Consolver, Gregg DeMasters
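The join-and-repartition flow above (a client is added to the peer list, after which each peer's content portion is recomputed over the larger community) can be sketched with a simple hash partition. The modulo-hash ownership rule and both function names are illustrative assumptions; the patent does not prescribe how content portions are computed.

```python
import hashlib

def owner(peer_list, key):
    """Map a cache key to the peer responsible for caching it; the
    peer list is sorted so ownership is deterministic."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return sorted(peer_list)[digest % len(peer_list)]

def join(peer_list, client):
    """Admit a client to the community: return the updated peer list.
    Each peer's content portion implicitly shrinks, because the key
    space is now divided among more peers."""
    return peer_list if client in peer_list else peer_list + [client]

peers = ['peer-a', 'peer-b']
community = join(peers, 'client-c')
print(len(community))  # → 3
```

With a plain modulo rule, a join remaps many keys; the patent's per-peer content portions address the same repartitioning problem that schemes like consistent hashing were designed to soften.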
-
Publication number: 20070150577
Abstract: A method and system for dynamic distributed data caching is presented. The method includes providing a cache community (402) comprising at least one peer (413). Each peer has an associated first content portion (511) indicating content to be cached by the respective peer. A client (404) may be allowed to join the cache community. A peer list (426) associated with the cache community is updated to include the client. The peer list indicates the peers in the cache community. A respective second content portion (511) is associated with each peer based on the addition of the client.
Type: Application
Filed: March 2, 2007
Publication date: June 28, 2007
Applicant: epicRealm Operating Inc.
Inventors: Keith Lowery, Bryan Chin, David Consolver, Gregg DeMasters