Patents by Inventor Gary R. Frost
Gary R. Frost has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Power-efficient nested map-reduce execution on a cloud of heterogeneous accelerated processing units
Patent number: 9152601
Abstract: An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics result in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.
Type: Grant
Filed: May 9, 2013
Date of Patent: October 6, 2015
Assignee: Advanced Micro Devices, Inc.
Inventors: Patryk Kaminski, Mauricio Breternitz, Gary R. Frost, Christophe Harle
-
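The two metrics named in the abstract of patent 9152601 (a branch-to-non-branch instruction ratio and a comparison of CPU and GPU execution times) can be illustrated with a minimal Python sketch. The abstract does not say how the metrics are combined, so the rule below, the `FunctionProfile` fields, and the `max_branch_ratio` threshold of 0.2 are all assumptions made for illustration, not the patented method.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    """Hypothetical measurements for one map or reduce function."""
    branch_instructions: int
    non_branch_instructions: int
    cpu_seconds: float
    gpu_seconds: float


def branch_ratio(profile):
    """First metric: branch instructions per non-branch instruction."""
    return profile.branch_instructions / max(profile.non_branch_instructions, 1)


def choose_device(profile, max_branch_ratio=0.2):
    """Pick "cpu" or "gpu" for a function (assumed combination rule)."""
    # Assumption: branch-heavy code is kept on the CPU regardless of timing.
    if branch_ratio(profile) > max_branch_ratio:
        return "cpu"
    # Second metric: compare the measured execution times on each device.
    return "gpu" if profile.gpu_seconds < profile.cpu_seconds else "cpu"
```

A scheduler could call `choose_device` once per map or reduce function and then queue the function on the selected device.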
POWER-EFFICIENT NESTED MAP-REDUCE EXECUTION ON A CLOUD OF HETEROGENEOUS ACCELERATED PROCESSING UNITS
Publication number: 20140333638
Abstract: An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics result in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.
Type: Application
Filed: May 9, 2013
Publication date: November 13, 2014
Applicant: Advanced Micro Devices, Inc.
Inventors: Patryk KAMINSKI, Mauricio Breternitz, Gary R. Frost, Christophe Harle
-
Patent number: 8639730
Abstract: A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. The SPU records a list of changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.
Type: Grant
Filed: September 24, 2012
Date of Patent: January 28, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Azeem S. Jiva, Gary R. Frost
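The exchange described in this abstract (a log of address and size pairs goes out, a list of changes comes back and is replayed by the CPU) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the tuple layout of each change, and the `live` set are hypothetical, the sketch runs sequentially rather than on SIMD hardware, and how liveness is determined is left out entirely.

```python
def plan_compaction(log, live, heap_base=0):
    """Build the change list from the allocation log.

    log  -- list of (address, size) pairs, one per allocated object
    live -- set of addresses of objects assumed to be still reachable
    """
    changes = []
    next_free = heap_base
    for address, size in sorted(log):
        if address not in live:
            # Dead object: record that its space can be reclaimed.
            changes.append(("free", address, size))
            continue
        if address != next_free:
            # Live object: slide it down to close the gap before it.
            changes.append(("move", address, next_free, size))
        next_free += size
    return changes


def apply_changes(heap, changes):
    """Replay the recorded moves on the real heap (a bytearray here)."""
    for change in changes:
        if change[0] == "move":
            _, src, dst, size = change
            # The right-hand slice is copied first, so overlap is safe.
            heap[dst:dst + size] = heap[src:src + size]


heap = bytearray(b"AAxxBBByC")
log = [(0, 2), (2, 2), (4, 3), (7, 1), (8, 1)]
changes = plan_compaction(log, live={0, 4, 8})
apply_changes(heap, changes)
```

After the replay, the three live objects occupy the first six bytes of `heap`; the bytes beyond that point are stale and would be treated as free space.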
-
Patent number: 8473900
Abstract: A system and method for creating synthetic immutable classes. A processor identifies first and second classes, instances of which include first and second data fields, respectively. The first data fields include a data field that references the second class. In response to determining that the first class is immutable and the second class is immutable, the processor constructs a first synthetic immutable class, an instance of which comprises a combination of the first data fields and the second data fields. The processor creates an instance of the first synthetic immutable class in which the first data fields and the second data fields occupy a contiguous region of a memory. In response to determining the first synthetic immutable class does not include an accessor for the second class, the processor combines header fields of the first and second data fields into a single data field in the first synthetic immutable class.
Type: Grant
Filed: July 1, 2009
Date of Patent: June 25, 2013
Assignee: Advanced Micro Devices, Inc.
Inventor: Gary R. Frost
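The field-combining step in this abstract can be loosely illustrated in Python with frozen dataclasses standing in for immutable classes. This is an analogy rather than the patented mechanism: Python gives no control over memory layout or object headers, so only the merging of the two field sets is shown. The `synthesize` helper, the `Point` and `Pixel` classes, and the naming scheme for merged fields are all hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass, make_dataclass


def is_immutable(cls):
    """Treat a frozen dataclass as immutable (assumption for this sketch)."""
    # __dataclass_params__ is set by the dataclass decorator.
    return is_dataclass(cls) and cls.__dataclass_params__.frozen


def synthesize(outer, inner, ref_field):
    """Return a flat frozen class, or None if either class is mutable."""
    if not (is_immutable(outer) and is_immutable(inner)):
        return None
    merged = []
    for field in fields(outer):
        if field.name == ref_field:
            # Replace the reference with the referenced class's own fields.
            merged += [(f"{ref_field}_{f.name}", f.type) for f in fields(inner)]
        else:
            merged.append((field.name, field.type))
    return make_dataclass(f"Synthetic{outer.__name__}", merged, frozen=True)


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Pixel:
    position: Point
    color: int


FlatPixel = synthesize(Pixel, Point, "position")
flat = FlatPixel(1, 2, 3)
```

Here `FlatPixel` carries `position_x`, `position_y`, and `color` directly, so reading a coordinate no longer follows a reference to a separate `Point` instance.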
-
Patent number: 8301672
Abstract: A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. The SPU records a list of changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.
Type: Grant
Filed: September 22, 2008
Date of Patent: October 30, 2012
Assignee: Advanced Micro Devices, Inc.
Inventors: Azeem S. Jiva, Gary R. Frost
-
Patent number: 8195583
Abstract: A system and method are disclosed for correlating instruction sequences. A plurality of instructions is processed to parse a first sequence of instructions comprising a first area of interest. A first instruction sequence pattern is then generated from the first sequence of instructions. Pattern matching operations are performed with the first instruction sequence pattern. A second sequence of instructions are parsed, comprising a second instruction sequence pattern and a second address of interest that is a substantially equivalent match to the first instruction sequence pattern.
Type: Grant
Filed: May 27, 2009
Date of Patent: June 5, 2012
Assignee: Advanced Micro Devices, Inc.
Inventor: Gary R. Frost
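One way to picture the pattern generation and matching in this abstract is the Python sketch below. It assumes that a pattern is simply the list of opcodes with operands dropped and that "substantially equivalent" means a similarity score above a threshold; neither assumption comes from the patent. The function names, the sample instructions, and the 0.8 threshold are hypothetical, and `difflib.SequenceMatcher` is a stand-in for whatever matcher the patent actually uses.

```python
from difflib import SequenceMatcher


def to_pattern(instructions):
    """Reduce each instruction to its opcode, dropping the operands."""
    return [instruction.split()[0] for instruction in instructions]


def find_similar(pattern, instructions, threshold=0.8):
    """Return (start_index, score) of the best matching window, or None."""
    opcodes = to_pattern(instructions)
    width = len(pattern)
    best = None
    for start in range(len(opcodes) - width + 1):
        window = opcodes[start:start + width]
        score = SequenceMatcher(None, pattern, window).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (start, score)
    return best


first = ["mov eax, [rbx]", "add eax, 1", "cmp eax, 10", "jl loop"]
second = ["push rbp", "mov ecx, [rdx]", "add ecx, 1",
          "cmp ecx, 10", "jl again", "ret"]
match = find_similar(to_pattern(first), second)
```

With these sample sequences, the best window in `second` starts at index 1, where the opcodes match the first pattern even though the registers and labels differ.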
-
Publication number: 20110289519
Abstract: Techniques are disclosed relating to distributing workloads between processors. In one embodiment, a computer system includes a first processor and a second processor. The first processor executes program instructions to receive a first set of bytecode specifying a first set of tasks and to determine whether to offload the first set of tasks to the second processor. In response to determining to offload the first set of tasks to the second processor, the program instructions are further executable to cause generation of a set of instructions to perform the first set of tasks, where the set of instructions are in a format different from that of the first set of bytecode, and where the format is supported by the second processor. The program instructions are further executable to cause the second processor to execute the set of instructions by causing the set of instructions to be provided to the second processor.
Type: Application
Filed: May 21, 2010
Publication date: November 24, 2011
Inventor: Gary R. Frost
-
Publication number: 20110004866
Abstract: A system and method for creating synthetic immutable classes. A processor identifies first and second classes, instances of which include first and second data fields, respectively. The first data fields include a data field that references the second class. In response to determining that the first class is immutable and the second class is immutable, the processor constructs a first synthetic immutable class, an instance of which comprises a combination of the first data fields and the second data fields. The processor creates an instance of the first synthetic immutable class in which the first data fields and the second data fields occupy a contiguous region of a memory. In response to determining the first synthetic immutable class does not include an accessor for the second class, the processor combines header fields of the first and second data fields into a single data field in the first synthetic immutable class.
Type: Application
Filed: July 1, 2009
Publication date: January 6, 2011
Inventor: Gary R. Frost
-
Publication number: 20100306514
Abstract: A system and method are disclosed for correlating instruction sequences. A plurality of instructions is processed to parse a first sequence of instructions comprising a first area of interest. A first instruction sequence pattern is then generated from the first sequence of instructions. Pattern matching operations are performed with the first instruction sequence pattern. A second sequence of instructions are parsed, comprising a second instruction sequence pattern and a second address of interest that is a substantially equivalent match to the first instruction sequence pattern.
Type: Application
Filed: May 27, 2009
Publication date: December 2, 2010
Inventor: Gary R. Frost
-
Publication number: 20100115502
Abstract: A system and method are disclosed for improving the performance of compiled Java code. Java source code is annotated and then compiled by a Java compiler to produce annotated Java bytecode, which in turn is compiled by a just-in-time (JIT) compiler into annotated native code. The execution of the annotated native code is monitored with a patching agent, which captures the annotated native code as it is being executed. The captured native code is then provided through an application program interface to a dynamic linkage module, which in turn provides the captured native code to a user or to an application plug-in module for modifications. The modifications are saved as a patch. The annotated native code is then re-executed and the modifications to the annotated native code are applied as a patch by the patching agent.
Type: Application
Filed: November 6, 2008
Publication date: May 6, 2010
Inventors: Azeem S. Jiva, Gary R. Frost
-
Publication number: 20100082930
Abstract: A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. The SPU records a list of changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.
Type: Application
Filed: September 22, 2008
Publication date: April 1, 2010
Inventors: Azeem S. Jiva, Gary R. Frost