Patents by Inventor Keith Lowery

Keith Lowery has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160371116
    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core. (An illustrative sketch of this software-enqueue/hardware-dequeue arrangement appears after this listing.)
    Type: Application
    Filed: August 29, 2016
    Publication date: December 22, 2016
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 9430281
    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: August 30, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 8782645
    Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information for each function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit. (An illustrative sketch of this assignment decision appears after this listing.)
    Type: Grant
    Filed: May 11, 2011
    Date of Patent: July 15, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
  • Patent number: 8752064
    Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result for each work item in the wavefront responsive to said transmitting. (An illustrative sketch of this batching pattern appears after this listing.)
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: June 10, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 8719543
    Abstract: Systems and methods are provided that utilize non-shared page tables to allow an accelerator device to share physical memory of a computer system that is managed by and operates under control of an operating system. The computer system can include a multi-core central processor unit. The accelerator device can be, for example, an isolated core processor device of the multi-core central processor unit that is sequestered for use independently of the operating system, or an external device that is communicatively coupled to the computer system.
    Type: Grant
    Filed: December 29, 2009
    Date of Patent: May 6, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Patryk Kaminski, Thomas Woller, Keith Lowery, Erich Boleyn
  • Patent number: 8683468
    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program will migrate at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    Type: Grant
    Filed: May 16, 2011
    Date of Patent: March 25, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
  • Patent number: 8667201
    Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: March 4, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20130263144
    Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
    Type: Application
    Filed: March 29, 2013
    Publication date: October 3, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
  • Publication number: 20120331278
    Abstract: A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing a number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with the given outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core. (An illustrative sketch of this sub-kernel partitioning appears after this listing.)
    Type: Application
    Filed: June 23, 2011
    Publication date: December 27, 2012
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery
  • Publication number: 20120297163
    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program will migrate at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    Type: Application
    Filed: May 16, 2011
    Publication date: November 22, 2012
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
  • Publication number: 20120291040
    Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information for each function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.
    Type: Application
    Filed: May 11, 2011
    Publication date: November 15, 2012
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff
  • Patent number: 8245008
    Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node, as the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node. (An illustrative sketch of this node-local allocation preference appears after this listing.)
    Type: Grant
    Filed: February 18, 2009
    Date of Patent: August 14, 2012
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Patryk Kaminski, Keith Lowery
  • Publication number: 20120194526
    Abstract: Systems, methods, and articles of manufacture for optimizing task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing, using the APD, one or more tasks in a memory storage; and dequeuing, using the APD, the one or more tasks from the memory storage using a hardware-based command processor, wherein the command processor forwards the one or more tasks to a shader core.
    Type: Application
    Filed: November 30, 2011
    Publication date: August 2, 2012
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120192201
    Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices.
    Type: Application
    Filed: November 2, 2011
    Publication date: July 26, 2012
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120180056
    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
    Type: Application
    Filed: November 9, 2011
    Publication date: July 12, 2012
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120180072
    Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result for each work item in the wavefront responsive to said transmitting.
    Type: Application
    Filed: November 30, 2011
    Publication date: July 12, 2012
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120179851
    Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
    Type: Application
    Filed: November 9, 2011
    Publication date: July 12, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20100211756
    Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node, as the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.
    Type: Application
    Filed: February 18, 2009
    Publication date: August 19, 2010
    Inventors: Patryk Kaminski, Keith Lowery
  • Publication number: 20070150575
    Abstract: A method and system for dynamic distributed data caching are presented. The method includes providing a cache community (402) comprising at least one peer (413). Each peer has an associated first content portion (511) indicating content to be cached by the respective peer. A client (404) may be allowed to join the cache community. A peer list (426) associated with the cache community is updated to include the client. The peer list indicates the peers in the cache community. A respective second content portion (511) is associated with each peer based on the addition of the client. (An illustrative sketch of this content redistribution appears after this listing.)
    Type: Application
    Filed: March 2, 2007
    Publication date: June 28, 2007
    Applicant: epicRealm Operating Inc.
    Inventors: Keith Lowery, Bryan Chin, David Consolver, Gregg DeMasters
  • Publication number: 20070150577
    Abstract: A method and system for dynamic distributed data caching are presented. The method includes providing a cache community (402) comprising at least one peer (413). Each peer has an associated first content portion (511) indicating content to be cached by the respective peer. A client (404) may be allowed to join the cache community. A peer list (426) associated with the cache community is updated to include the client. The peer list indicates the peers in the cache community. A respective second content portion (511) is associated with each peer based on the addition of the client.
    Type: Application
    Filed: March 2, 2007
    Publication date: June 28, 2007
    Applicant: epicRealm Operating Inc.
    Inventors: Keith Lowery, Bryan Chin, David Consolver, Gregg DeMasters
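
The software-enqueue/hardware-dequeue arrangement described above in publication 20160371116 and patent 9430281 can be pictured as a shared queue in memory: a software enqueuing module writes task packets, and a hardware command processor drains them and forwards each to the shader core. Below is a minimal C++ sketch of that producer/consumer structure; the TaskQueue class, the Task fields, and the forward_to_shader_core stub are illustrative assumptions, not the claimed implementation.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <iostream>
#include <optional>

// Hypothetical task packet; the patent does not define these fields.
struct Task {
    uint32_t kernel_id;   // which compute kernel to launch
    uint32_t grid_size;   // how many work-items to dispatch
};

// Minimal single-producer/single-consumer ring buffer standing in for the
// "memory storage module": software enqueues, a command processor dequeues.
class TaskQueue {
public:
    bool enqueue(const Task& t) {                 // software-based enqueuing module
        uint32_t head = head_.load(std::memory_order_relaxed);
        uint32_t next = (head + 1) % kCapacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // queue full
        slots_[head] = t;
        head_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<Task> dequeue() {               // hardware-based command processor
        uint32_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        Task t = slots_[tail];
        tail_.store((tail + 1) % kCapacity, std::memory_order_release);
        return t;
    }

private:
    static constexpr uint32_t kCapacity = 8;
    std::array<Task, kCapacity> slots_{};
    std::atomic<uint32_t> head_{0}, tail_{0};
};

// Stand-in for the command processor forwarding a task to the shader core.
void forward_to_shader_core(const Task& t) {
    std::cout << "dispatch kernel " << t.kernel_id
              << " with " << t.grid_size << " work-items\n";
}

int main() {
    TaskQueue q;
    q.enqueue({/*kernel_id=*/1, /*grid_size=*/256});   // host-side software enqueue
    q.enqueue({/*kernel_id=*/2, /*grid_size=*/1024});
    while (auto t = q.dequeue())                        // command processor drains queue
        forward_to_shader_core(*t);
}
```

In the claimed arrangement the dequeue side is hardware, so no host thread polls on behalf of the shader core; the read/write indices above merely stand in for whatever doorbell or pointer protocol a real command processor would use.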
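
Patent 8782645 (and its companion publication 20120291040) describes a runtime scheduler that pairs each compute kernel with a record of data to form a work unit, places it on either the general-purpose core or the SIMD core using the compiler's pre-runtime information, and may revise that placement as the runtime behavior of related work units is observed. The sketch below is a hypothetical rendering of that decision; the PreRuntimeInfo fields such as branch_divergence and simd_speedup are invented stand-ins for whatever static hints a real compiler would emit.

```cpp
#include <iostream>
#include <string>
#include <vector>

enum class Core { GeneralPurpose, Simd };

// Invented stand-in for the compiler's pre-runtime analysis of a kernel.
struct PreRuntimeInfo {
    double branch_divergence;   // 0 = uniform control flow, 1 = highly divergent
    double simd_speedup;        // estimated speedup on the SIMD core
};

// A work unit: one kernel matched with one record of data.
struct WorkUnit {
    std::string kernel;
    int record_id;
    Core assigned;
};

// Initial placement from the pre-runtime information alone.
Core initial_assignment(const PreRuntimeInfo& info) {
    return (info.simd_speedup > 1.5 && info.branch_divergence < 0.3)
               ? Core::Simd : Core::GeneralPurpose;
}

// Re-assignment hook: if earlier work units of the same kernel ran poorly on
// the SIMD core, a still-waiting work unit can be moved back to the CPU core.
Core reassess(Core original, double observed_simd_efficiency) {
    if (original == Core::Simd && observed_simd_efficiency < 0.5)
        return Core::GeneralPurpose;
    return original;
}

int main() {
    PreRuntimeInfo info{0.1, 3.0};          // compiler hints: SIMD-friendly kernel
    std::vector<WorkUnit> units;
    for (int r = 0; r < 4; ++r)
        units.push_back({"convolve", r, initial_assignment(info)});

    double observed_simd_efficiency = 0.4;  // pretend early work units ran poorly
    for (auto& wu : units) {
        wu.assigned = reassess(wu.assigned, observed_simd_efficiency);
        std::cout << wu.kernel << "[" << wu.record_id << "] -> "
                  << (wu.assigned == Core::Simd ? "SIMD core" : "CPU core") << "\n";
    }
}
```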
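
Patent 8752064 batches system calls across a wavefront: each work item stores its request, the stored calls are transmitted to the processor together, and a result comes back for each work item. A minimal sketch of that store-then-transmit pattern follows; the SyscallRequest layout and the handle_on_cpu helper are invented for illustration.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical per-work-item system call record.
struct SyscallRequest {
    int work_item;        // lane within the wavefront
    std::string op;       // e.g. "write"
    std::string payload;
};

// Stand-in for the CPU-side handler that services one stored request.
int handle_on_cpu(const SyscallRequest& req) {
    std::cout << "cpu handles " << req.op << " for work item " << req.work_item << "\n";
    return static_cast<int>(req.payload.size());   // pretend result: bytes written
}

int main() {
    constexpr int kWavefrontSize = 4;               // real wavefronts are larger (e.g. 64)
    std::vector<SyscallRequest> stored;             // one stored call per work item

    // Step 1: every work item in the wavefront stores its system call.
    for (int lane = 0; lane < kWavefrontSize; ++lane)
        stored.push_back({lane, "write", "hello from lane " + std::to_string(lane)});

    // Step 2: transmit the whole batch to the processor in one hand-off.
    std::vector<int> results(kWavefrontSize);
    for (const auto& req : stored)
        results[req.work_item] = handle_on_cpu(req);

    // Step 3: each work item receives its own result in response to the batch.
    for (int lane = 0; lane < kWavefrontSize; ++lane)
        std::cout << "work item " << lane << " result = " << results[lane] << "\n";
}
```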
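
Publication 20120331278 removes a data-dependent branch by splitting the kernel into compute sub-kernels, one per branch outcome, and assigning each data record to the sub-kernel that matches its outcome. The following is a hypothetical sketch of that partitioning; the example kernel and the runtime partitioning step are assumptions made for illustration.

```cpp
#include <iostream>
#include <vector>

// Original kernel (conceptually): if (x >= 0) take one path, else the other.
// Instead of branching per record, records are partitioned by outcome and each
// partition runs a branch-free sub-kernel.

// Sub-kernel generated for the "x >= 0" outcome.
void sub_kernel_nonnegative(std::vector<int>& records) {
    for (int& x : records) x = x * 2;        // straight-line code, no branch
}

// Sub-kernel generated for the "x < 0" outcome.
void sub_kernel_negative(std::vector<int>& records) {
    for (int& x : records) x = -x;           // straight-line code, no branch
}

int main() {
    std::vector<int> input{3, -7, 0, -2, 9};

    // Runtime partitioning: route each record to the sub-kernel for its outcome.
    std::vector<int> nonneg, neg;
    for (int x : input) (x >= 0 ? nonneg : neg).push_back(x);

    // Each work unit (sub-kernel plus its records) can now be scheduled on either
    // the general-purpose core or the SIMD core without a divergent branch.
    sub_kernel_nonnegative(nonneg);
    sub_kernel_negative(neg);

    for (int x : nonneg) std::cout << x << " ";
    for (int x : neg)    std::cout << x << " ";
    std::cout << "\n";
}
```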
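
Patent 8245008 (published as 20100211756) describes a heap manager that records which NUMA node backs each free block and prefers to satisfy an allocation from the node on which the requesting thread is running, falling back to remote nodes only when necessary. The sketch below simulates that preference with invented FreeBlock bookkeeping rather than real NUMA system calls.

```cpp
#include <iostream>
#include <map>
#include <vector>

// Simulated free block: in a real manager this would be a memory region whose
// backing NUMA node was recorded when the block was carved out.
struct FreeBlock {
    int node;          // NUMA node that owns the pages
    std::size_t size;  // bytes available
};

class NumaAwareHeap {
public:
    void add_free_block(int node, std::size_t size) { free_[node].push_back({node, size}); }

    // Allocate preferring blocks local to the requesting thread's node, falling
    // back to other nodes only when no local block is large enough.
    FreeBlock* allocate(std::size_t size, int requesting_node) {
        if (FreeBlock* b = find_on_node(requesting_node, size)) return b;  // local hit
        for (auto& [node, blocks] : free_)                                 // remote fallback
            if (node != requesting_node)
                if (FreeBlock* b = find_on_node(node, size)) return b;
        return nullptr;
    }

private:
    FreeBlock* find_on_node(int node, std::size_t size) {
        for (auto& b : free_[node])
            if (b.size >= size) { b.size -= size; return &b; }
        return nullptr;
    }
    std::map<int, std::vector<FreeBlock>> free_;  // per-node free lists
};

int main() {
    NumaAwareHeap heap;
    heap.add_free_block(/*node=*/0, /*size=*/4096);
    heap.add_free_block(/*node=*/1, /*size=*/4096);

    // A thread running on node 1 asks for memory: the node-local block is used.
    if (FreeBlock* b = heap.allocate(1024, /*requesting_node=*/1))
        std::cout << "allocated on node " << b->node << "\n";
}
```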
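
Publications 20070150575 and 20070150577 describe a cache community in which each peer caches a content portion and the portions are recomputed when a client joins the peer list. The sketch below redistributes content with a simple hash partition; that partitioning rule is an assumption made for illustration, not what the applications claim.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// A peer in the cache community together with the content portion it caches.
struct Peer {
    std::string name;
    std::vector<std::string> content;   // URLs this peer is responsible for
};

// Recompute every peer's content portion over the current peer list; a plain
// modulo-hash partition stands in for whatever assignment the community uses.
void assign_content(std::vector<Peer>& peers, const std::vector<std::string>& urls) {
    for (auto& p : peers) p.content.clear();
    std::hash<std::string> h;
    for (const auto& url : urls)
        peers[h(url) % peers.size()].content.push_back(url);
}

int main() {
    std::vector<std::string> urls{"/a.html", "/b.css", "/c.js", "/d.png", "/e.html"};
    std::vector<Peer> community{{"peer-1", {}}, {"peer-2", {}}};
    assign_content(community, urls);            // first content portions

    community.push_back({"client-joined", {}}); // peer list updated with the new client
    assign_content(community, urls);            // second content portions after the join

    for (const auto& p : community) {
        std::cout << p.name << ":";
        for (const auto& u : p.content) std::cout << " " << u;
        std::cout << "\n";
    }
}
```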