Patents by Inventor Gary R. Frost

Gary R. Frost has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Power-efficient nested map-reduce execution on a cloud of heterogeneous accelerated processing units

Patent number: 9152601

Abstract: An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics result in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.

Type: Grant

Filed: May 9, 2013

Date of Patent: October 6, 2015

Assignee: Advanced Micro Devices, Inc.

Inventors: Patryk Kaminski, Mauricio Breternitz, Gary R. Frost, Christophe Harle
POWER-EFFICIENT NESTED MAP-REDUCE EXECUTION ON A CLOUD OF HETEROGENEOUS ACCELERATED PROCESSING UNITS

Publication number: 20140333638

Abstract: An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics result in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.

Type: Application

Filed: May 9, 2013

Publication date: November 13, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Patryk KAMINSKI, Mauricio Breternitz, Gary R. Frost, Christophe Harle
GPU assisted garbage collection

Patent number: 8639730

Abstract: A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. The SPU records a list of changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.

Type: Grant

Filed: September 24, 2012

Date of Patent: January 28, 2014

Assignee: Advanced Micro Devices, Inc.

Inventors: Azeem S. Jiva, Gary R. Frost
Combining classes referenced by immutable classes into a single synthetic class

Patent number: 8473900

Abstract: A system and method for creating synthetic immutable classes. A processor identifies first and second classes, instances of which include first and second data fields, respectively. The first data fields include a data field that references the second class. In response to determining that the first class is immutable and the second class is immutable, the processor constructs a first synthetic immutable class, an instance of which comprises a combination of the first data fields and the second data fields. The processor creates an instance of the first synthetic immutable class in which the first data fields and the second data fields occupy a contiguous region of a memory. In response to determining the first synthetic immutable class does not include an accessor for the second class, the processor combines header fields of the first and second data fields into a single data field in the first synthetic immutable class.

Type: Grant

Filed: July 1, 2009

Date of Patent: June 25, 2013

Assignee: Advanced Micro Devices, Inc.

Inventor: Gary R. Frost
GPU assisted garbage collection

Patent number: 8301672

Abstract: A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. The SPU records a list of changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.

Type: Grant

Filed: September 22, 2008

Date of Patent: October 30, 2012

Assignee: Advanced Micro Devices, Inc.

Inventors: Azeem S. Jiva, Gary R. Frost
Correlating instruction sequences with CPU performance events to improve software performance

Patent number: 8195583

Abstract: A system and method are disclosed for correlating instruction sequences. A plurality of instructions is processed to parse a first sequence of instructions comprising a first area of interest. A first instruction sequence pattern is then generated from the first sequence of instructions. Pattern matching operations are performed with the first instruction sequence pattern. A second sequence of instructions are parsed, comprising a second instruction sequence pattern and a second address of interest that is a substantially equivalent match to the first instruction sequence pattern.

Type: Grant

Filed: May 27, 2009

Date of Patent: June 5, 2012

Assignee: Advanced Micro Devices, Inc.

Inventor: Gary R. Frost
DISTRIBUTING WORKLOADS IN A COMPUTING PLATFORM

Publication number: 20110289519

Abstract: Techniques are disclosed relating to distributing workloads between processors. In one embodiment, a computer system includes a first processor and a second processor. The first processor executes program instructions to receive a first set of bytecode specifying a first set of tasks and to determine whether to offload the first set of tasks to the second processor. In response to determining to offload the first set of tasks to the second processor, the program instructions are further executable to cause generation of a set of instructions to perform the first set of tasks, where the set of instructions are in a format different from that of the first set of bytecode, and where the format is supported by the second processor. The program instructions are further executable to cause the second processor to execute the set of instructions by causing the set of instructions to be provided to the second processor.

Type: Application

Filed: May 21, 2010

Publication date: November 24, 2011

Inventor: Gary R. Frost
COMBINING CLASSES REFERENCED BY IMMUTABLE CLASSES INTO A SINGLE SYNTHETIC CLASS

Publication number: 20110004866

Abstract: A system and method for creating synthetic immutable classes. A processor identifies first and second classes, instances of which include first and second data fields, respectively. The first data fields include a data field that references the second class. In response to determining that the first class is immutable and the second class is immutable, the processor constructs a first synthetic immutable class, an instance of which comprises a combination of the first data fields and the second data fields. The processor creates an instance of the first synthetic immutable class in which the first data fields and the second data fields occupy a contiguous region of a memory. In response to determining the first synthetic immutable class does not include an accessor for the second class, the processor combines header fields of the first and second data fields into a single data field in the first synthetic immutable class.

Type: Application

Filed: July 1, 2009

Publication date: January 6, 2011

Inventor: Gary R. Frost
Correlating Instruction Sequences with CPU Performance Events to Improve Software Performance

Publication number: 20100306514

Abstract: A system and method are disclosed for correlating instruction sequences. A plurality of instructions is processed to parse a first sequence of instructions comprising a first area of interest. A first instruction sequence pattern is then generated from the first sequence of instructions. Pattern matching operations are performed with the first instruction sequence pattern. A second sequence of instructions are parsed, comprising a second instruction sequence pattern and a second address of interest that is a substantially equivalent match to the first instruction sequence pattern.

Type: Application

Filed: May 27, 2009

Publication date: December 2, 2010

Inventor: Gary R. Frost
Post Processing of Dynamically Generated Code

Publication number: 20100115502

Abstract: A system and method are disclosed for improving the performance of compiled Java code. Java source code is annotated and then compiled by a Java compiler to produce annotated Java bytecode, which in turn is compiled by a just-in-time (JIT) compiler into annotated native code. The execution of the annotated native code is monitored with a patching agent, which captures the annotated native code as it is being executed. The captured native code is then provided through an application program interface to a dynamic linkage module, which in turn provides the captured native code to a user or to an application plug-in module for modifications. The modifications are saved as a patch. The annotated native code is then re-executed and the modifications to the annotated native code are applied as a patch by the patching agent.

Type: Application

Filed: November 6, 2008

Publication date: May 6, 2010

Inventors: Azeem S. Jiva, Gary R. Frost
GPU ASSISTED GARBAGE COLLECTION

Publication number: 20100082930

Abstract: A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. The SPU records a list of changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.

Type: Application

Filed: September 22, 2008

Publication date: April 1, 2010

Inventors: Azeem S. Jiva, Gary R. Frost