Patents by Inventor Keith Lowery
Keith Lowery has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160371116
Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: August 29, 2016
Publication date: December 22, 2016
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 9430281
Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Grant
Filed: November 9, 2011
Date of Patent: August 30, 2016
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
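The enqueue/dequeue split this patent describes (software fills a memory-resident queue; a hardware command processor drains it and forwards tasks to the shader core) can be sketched in Python. This is an illustrative stand-in, not the patented hardware design: the deque, `CommandProcessor` class, and `shader_core` callable are all hypothetical names.

```python
from collections import deque

class CommandProcessor:
    """Stands in for the hardware-based command processor: it drains
    the queue and forwards each task to a (simulated) shader core."""
    def __init__(self, shader_core):
        self.shader_core = shader_core

    def dequeue_and_forward(self, queue):
        results = []
        while queue:
            task = queue.popleft()          # dequeue in FIFO order
            results.append(self.shader_core(task))
        return results

def software_enqueue(queue, tasks):
    """Software-based enqueuing module: places tasks in the memory
    storage module (here, a plain deque)."""
    for task in tasks:
        queue.append(task)

# Usage: software enqueues, the command processor dequeues and forwards.
queue = deque()
software_enqueue(queue, [1, 2, 3])
cp = CommandProcessor(shader_core=lambda task: task * 2)
print(cp.dequeue_and_forward(queue))  # → [2, 4, 6]
```

The point of the split is that the producer side needs no hardware support beyond shared memory, while the consumer side needs no software scheduler in the loop.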
-
Patent number: 8782645
Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
Type: Grant
Filed: May 11, 2011
Date of Patent: July 15, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
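The two scheduling phases in this abstract (a static assignment from compiler-computed pre-runtime information, plus a dynamic correction for waiting work units) can be sketched as follows. The `parallel_ratio` metric and both function names are hypothetical; the patent does not specify what the compiler's pre-runtime information contains.

```python
def assign_work_units(work_units, pre_runtime_info):
    """Static phase: assign each work unit to the SIMD core or the
    general-purpose CPU core from compiler-computed hints."""
    return {wu: ('simd' if pre_runtime_info[wu]['parallel_ratio'] > 0.5
                 else 'cpu')
            for wu in work_units}

def reassign_waiting(assignment, waiting, slow_core):
    """Dynamic phase: if other work units of the same kernel ran
    slowly on a core, flip waiting work units off that core."""
    for wu in waiting:
        if assignment[wu] == slow_core:
            assignment[wu] = 'cpu' if slow_core == 'simd' else 'simd'
    return assignment

info = {'a': {'parallel_ratio': 0.9}, 'b': {'parallel_ratio': 0.1}}
plan = assign_work_units(['a', 'b'], info)
print(plan)  # → {'a': 'simd', 'b': 'cpu'}
reassign_waiting(plan, waiting=['a'], slow_core='simd')
print(plan)  # → {'a': 'cpu', 'b': 'cpu'}
```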
-
Patent number: 8752064
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Grant
Filed: November 30, 2011
Date of Patent: June 10, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
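The store-then-transmit batching described here can be sketched in a few lines: one system call is recorded per work item, the whole wavefront's batch goes out in a single transmission, and results are scattered back. The `transmit` callable stands in for the CPU-side handler; its name and the dict layout are assumptions for illustration.

```python
def batch_syscalls(wavefront, transmit):
    """Collect one stored system call per work item, send the batch
    in one transmission, then attach each result to its work item."""
    batch = [item['syscall'] for item in wavefront]
    results = transmit(batch)   # one round trip instead of one per item
    for item, result in zip(wavefront, results):
        item['result'] = result
    return wavefront

# Usage: a toy handler that "executes" each call by uppercasing it.
wavefront = [{'syscall': 'read'}, {'syscall': 'open'}]
batch_syscalls(wavefront, transmit=lambda calls: [c.upper() for c in calls])
print([item['result'] for item in wavefront])  # → ['READ', 'OPEN']
```

The communication win is the single round trip: the cost of reaching the processor is paid once per wavefront rather than once per work item.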
-
Patent number: 8719543
Abstract: Systems and methods are provided that utilize non-shared page tables to allow an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system. The computer system can include a multi-core central processor unit. The accelerator device can be, for example, an isolated core processor device of the multi-core central processor unit that is sequestered for use independently of the operating system, or an external device that is communicatively coupled to the computer system.
Type: Grant
Filed: December 29, 2009
Date of Patent: May 6, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Patryk Kaminski, Thomas Woller, Keith Lowery, Erich Boleyn
-
Patent number: 8683468
Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
Type: Grant
Filed: May 16, 2011
Date of Patent: March 25, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
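The mechanism above (run code before the migration point on one core, snapshot live values into a compiler-created structure, resume after the point on the other core) can be sketched with a running sum. The `MigrationPoint` class and the shape of the live-value snapshot are illustrative assumptions; the patent's structure is whatever the compiler emits.

```python
class MigrationPoint:
    """Compiler-created structure holding the live values needed to
    resume a function call on a different core."""
    def __init__(self):
        self.live_values = None

def sum_with_migration(data, migrate_at, point):
    # Code before the migration point: runs on the first (SIMD) core.
    acc = 0
    for x in data[:migrate_at]:
        acc += x
    # Condition satisfied: snapshot live values into the structure.
    point.live_values = {'acc': acc, 'index': migrate_at}
    # Code after the migration point: the second (general-purpose)
    # core reads the structure and continues from the saved state.
    state = point.live_values
    acc = state['acc']
    for x in data[state['index']:]:
        acc += x
    return acc

print(sum_with_migration([1, 2, 3, 4], migrate_at=2, point=MigrationPoint()))  # → 10
```

The invariant worth noting: the result is identical to running the whole loop on one core, because everything the second half needs is captured in the snapshot.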
-
Patent number: 8667201
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Grant
Filed: November 9, 2011
Date of Patent: March 4, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20130263144
Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
Type: Application
Filed: March 29, 2013
Publication date: October 3, 2013
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
-
Publication number: 20120331278
Abstract: A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing a number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with the given outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core.
Type: Application
Filed: June 23, 2011
Publication date: December 27, 2012
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery
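The branch-removal transformation above can be sketched directly: records are partitioned by branch outcome, and each partition is fed to a straight-line sub-kernel containing only that outcome's code, so no branch remains inside the hot loops. Function names and the two-outcome restriction are simplifying assumptions for illustration.

```python
def split_into_subkernels(records, branch_predicate, taken_body, not_taken_body):
    """Partition records by branch outcome, then run one straight-line
    sub-kernel per outcome — the branch disappears from the inner loops."""
    taken = [r for r in records if branch_predicate(r)]
    not_taken = [r for r in records if not branch_predicate(r)]
    # Each list comprehension below models a branch-free compute sub-kernel.
    return ([taken_body(r) for r in taken],
            [not_taken_body(r) for r in not_taken])

# Usage: even records are doubled, odd records are negated.
evens, odds = split_into_subkernels(
    [1, 2, 3, 4],
    branch_predicate=lambda r: r % 2 == 0,
    taken_body=lambda r: r * 2,
    not_taken_body=lambda r: -r,
)
print(evens, odds)  # → [4, 8] [-1, -3]
```

On a SIMD core this matters because divergent branches serialize lanes; uniform sub-kernels keep every lane doing the same work.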
-
Publication number: 20120297163
Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
Type: Application
Filed: May 16, 2011
Publication date: November 22, 2012
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
-
Publication number: 20120291040
Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
Type: Application
Filed: May 11, 2011
Publication date: November 15, 2012
Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
-
Patent number: 8245008
Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node, as the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.
Type: Grant
Filed: February 18, 2009
Date of Patent: August 14, 2012
Assignee: Advanced Micro Devices, Inc.
Inventors: Patryk Kaminski, Keith Lowery
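The locality policy above (track each block's home node, serve a request from the requesting thread's own node when possible, keep per-node free-block caches) can be sketched as a toy allocator. The class name, dict-based block representation, and single-level cache are illustrative simplifications of the patent's multi-level design.

```python
class NumaHeap:
    """Toy NUMA-aware allocator: one free-block cache per node, and
    allocations preferentially served from the requesting thread's node."""
    def __init__(self, num_nodes):
        self.free_blocks = {n: [] for n in range(num_nodes)}

    def allocate(self, thread_node, size):
        # First try a cached free block on the thread's own node.
        for block in self.free_blocks[thread_node]:
            if block['size'] >= size:
                self.free_blocks[thread_node].remove(block)
                return block
        # Otherwise hand out a fresh block on the local node.
        return {'node': thread_node, 'size': size}

    def free(self, block):
        # Blocks return to the cache of their home node, so later
        # allocations from that node stay local.
        self.free_blocks[block['node']].append(block)

heap = NumaHeap(num_nodes=2)
block = heap.allocate(thread_node=0, size=64)
heap.free(block)
reused = heap.allocate(thread_node=0, size=32)
print(reused['node'])  # → 0
```

Freeing back to the block's home node (rather than the freeing thread's node) is what keeps the caches honest about locality.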
-
Publication number: 20120194526
Abstract: Systems, methods, and articles of manufacture for optimizing task scheduling on an accelerated processing device (APD) device are provided. In an embodiment, a method comprises: enqueuing, using the APD, one or more tasks in a memory storage; and dequeuing, using the APD, the one or more tasks from the memory storage using a hardware-based command processor, wherein the command processor forwards the one or more tasks to a shader core.
Type: Application
Filed: November 30, 2011
Publication date: August 2, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120192201
Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices.
Type: Application
Filed: November 2, 2011
Publication date: July 26, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180056
Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180072
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Application
Filed: November 30, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120179851
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20100211756
Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node, as the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.
Type: Application
Filed: February 18, 2009
Publication date: August 19, 2010
Inventors: Patryk Kaminski, Keith Lowery
-
Publication number: 20070150575
Abstract: A method and system for dynamic distributed data caching is presented. The method includes providing a cache community (402) comprising at least one peer (413). Each peer has an associated first content portion (511) indicating content to be cached by the respective peer. A client (404) may be allowed to join the cache community. A peer list (426) associated with the cache community is updated to include the client. The peer list indicates the peers in the cache community. A respective second content portion (511) is associated with each peer based on the addition of the client.
Type: Application
Filed: March 2, 2007
Publication date: June 28, 2007
Applicant: epicRealm Operating Inc.
Inventors: Keith Lowery, Bryan Chin, David Consolver, Gregg DeMasters
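The join-and-repartition flow above (a client is added to the peer list, after which each peer's content portion is recomputed over the larger community) can be sketched with a simple hash partition. The modulo-hash ownership rule and both function names are illustrative assumptions; the patent does not prescribe how content portions are computed.

```python
import hashlib

def owner(peer_list, key):
    """Map a cache key to the peer responsible for caching it; the
    peer list is sorted so ownership is deterministic."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return sorted(peer_list)[digest % len(peer_list)]

def join(peer_list, client):
    """Admit a client to the community: return the updated peer list.
    Each peer's content portion implicitly shrinks, because the key
    space is now divided among more peers."""
    return peer_list if client in peer_list else peer_list + [client]

peers = ['peer-a', 'peer-b']
community = join(peers, 'client-c')
print(len(community))  # → 3
```

With a plain modulo rule, a join remaps many keys; the patent's per-peer content portions address the same repartitioning problem that schemes like consistent hashing were designed to soften.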
-
Publication number: 20070150577
Abstract: A method and system for dynamic distributed data caching is presented. The method includes providing a cache community (402) comprising at least one peer (413). Each peer has an associated first content portion (511) indicating content to be cached by the respective peer. A client (404) may be allowed to join the cache community. A peer list (426) associated with the cache community is updated to include the client. The peer list indicates the peers in the cache community. A respective second content portion (511) is associated with each peer based on the addition of the client.
Type: Application
Filed: March 2, 2007
Publication date: June 28, 2007
Applicant: epicRealm Operating Inc.
Inventors: Keith Lowery, Bryan Chin, David Consolver, Gregg DeMasters