Patents by Inventor Blake A. Hechtman

Blake A. Hechtman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Write combining cache microarchitecture for synchronization events

Patent number: 9477599

Abstract: A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read/write combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events since a store event may not need to reach main memory to complete.

Type: Grant

Filed: August 7, 2013

Date of Patent: October 25, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Bradford M. Beckmann
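
To make the abstract's central idea concrete, here is a minimal Python sketch of a write-combining buffer. It merges ordinary stores to the same cache line and, on a store-release, forwards every pending line before the release store itself. It is an illustration under our own assumptions, not the patented microarchitecture. The 64-byte line size and the names `NextLevel` and `WriteCombiningBuffer` are hypothetical. The next level is modeled as applying writes in call order. The multi-level hierarchy and the pool component's partial ordering are not modeled.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes (hypothetical)


class NextLevel:
    """Stand-in for the next level of the hierarchy (shared cache or memory)."""

    def __init__(self):
        self.data = {}        # byte address -> value
        self.line_writes = 0  # count of line-sized write transactions received

    def write_line(self, line_addr, dirty_bytes):
        self.line_writes += 1
        for offset, value in dirty_bytes.items():
            self.data[line_addr + offset] = value


class WriteCombiningBuffer:
    """Coalesces ordinary stores per line; a store-release drains them first."""

    def __init__(self, next_level):
        self.next_level = next_level
        self.lines = {}  # line base address -> {offset: value} (dirty bytes only)

    def store(self, addr, value):
        base = addr - addr % LINE_SIZE
        # Stores to the same line merge into one pending entry; a later store
        # to the same byte simply overwrites the earlier value.
        self.lines.setdefault(base, {})[addr % LINE_SIZE] = value

    def drain(self):
        # Forward every pending line, in insertion order, as one transaction each.
        for base, dirty in self.lines.items():
            self.next_level.write_line(base, dirty)
        self.lines.clear()

    def store_release(self, addr, value):
        # Release ordering: all earlier stores are forwarded before the release
        # store. In this model the next level applies writes in call order,
        # so no per-store acknowledgement tracking is needed.
        self.drain()
        base = addr - addr % LINE_SIZE
        self.next_level.write_line(base, {addr % LINE_SIZE: value})


if __name__ == "__main__":
    nl = NextLevel()
    wcb = WriteCombiningBuffer(nl)
    for i in range(8):
        wcb.store(i, i * 10)  # eight stores to one 64-byte line
    wcb.store_release(64, 1)  # release-store to a flag on the next line
    print(nl.line_writes)     # → 2: one coalesced line plus the release
```

Eight stores to one line leave the buffer as a single transaction. The release costs one extra transaction, and in this simplified in-order model it needs no acknowledgement tracking.
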

Mechanisms to save user/kernel copy for cross device communications

Patent number: 9436395

Abstract: Central processing units (CPUs) in computing systems manage graphics processing units (GPUs), network processors, security co-processors, and other data heavy devices as buffered peripherals using device drivers. Unfortunately, as a result of large and latency-sensitive data transfers between CPUs and these external devices, and memory partitioned into kernel-access and user-access spaces, these schemes to manage peripherals may introduce latency and memory use inefficiencies. Proposed are schemes to reduce latency and redundant memory copies using virtual to physical page remapping while maintaining user/kernel level access abstractions.

Type: Grant

Filed: March 14, 2014

Date of Patent: September 6, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Shuai Che
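
The abstract contrasts copying buffers between user and kernel space with virtual-to-physical page remapping. The following is a toy Python model of that contrast, not the patented mechanism. Two address spaces share one physical memory. `copy_transfer` duplicates page contents, while `remap_transfer` points the kernel's virtual pages at the user's physical frames and drops the user's mapping. The page size, all names, and the choice to unmap the source are our own assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class PhysicalMemory:
    def __init__(self, num_frames):
        self.frames = [bytearray(PAGE_SIZE) for _ in range(num_frames)]
        self.bytes_copied = 0  # how much data a transfer path had to copy


class AddressSpace:
    """A toy page table: virtual page number -> physical frame number."""

    def __init__(self, phys):
        self.phys = phys
        self.page_table = {}

    def map(self, vpn, frame):
        self.page_table[vpn] = frame

    def _frame(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.phys.frames[self.page_table[vpn]], offset

    def write(self, vaddr, data):
        frame, offset = self._frame(vaddr)
        frame[offset:offset + len(data)] = data  # assumes data fits in one page

    def read(self, vaddr, length):
        frame, offset = self._frame(vaddr)
        return bytes(frame[offset:offset + length])


def copy_transfer(src, src_vpns, dst, dst_vpns):
    """Conventional path: copy each page from the user's frames into the kernel's."""
    for s, d in zip(src_vpns, dst_vpns):
        src_frame = src.phys.frames[src.page_table[s]]
        dst.phys.frames[dst.page_table[d]][:] = src_frame
        src.phys.bytes_copied += PAGE_SIZE


def remap_transfer(src, src_vpns, dst, dst_vpns):
    """Remapping path: hand the user's frames to the kernel without copying.

    The source mapping is dropped so the user can no longer touch the buffer;
    the kernel's previous frame would go back to a free list (not modeled).
    """
    for s, d in zip(src_vpns, dst_vpns):
        dst.map(d, src.page_table.pop(s))


if __name__ == "__main__":
    phys = PhysicalMemory(num_frames=4)
    user, kernel = AddressSpace(phys), AddressSpace(phys)
    user.map(0, 0)
    kernel.map(10, 1)
    user.write(0, b"packet")
    remap_transfer(user, [0], kernel, [10])
    print(kernel.read(10 * PAGE_SIZE, 6), phys.bytes_copied)  # → b'packet' 0
```

The kernel sees the user's data while zero bytes are copied. The cost moves to page-table updates, which in a real system would also involve TLB maintenance that this sketch ignores.
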

Runtime for automatically load-balancing and synchronizing heterogeneous computer systems with scoped synchronization

Patent number: 9411652

Abstract: Sharing tasks among compute units in a processor can increase the efficiency of the processor. When a compute unit does not have a task in its task memory to perform, donating tasks from other compute units can prevent the compute unit from being idle while there are tasks in other parts of the processor. It is desirable to share tasks among compute units that are within defined scopes of the processor. Compute units may share tasks by allowing other compute units to access their private memory, or by donating tasks to a shared memory.

Type: Grant

Filed: August 22, 2014

Date of Patent: August 9, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Derek R. Hower
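
The abstract names two sharing paths: reading another compute unit's private task memory, or donating tasks to shared memory. It also restricts sharing to defined scopes. Below is a minimal Python sketch of both paths, assuming one task queue per compute unit and scopes identified by hypothetical string labels such as `"se0"`. An idle unit first takes from the most loaded peer in its own scope, then falls back to a shared pool. This is one plausible policy for illustration, not the patented runtime. The names `ComputeUnitQueue`, `find_work`, and `donate` are ours.

```python
from collections import deque


class ComputeUnitQueue:
    """A compute unit's private task queue, tagged with the scope it belongs to."""

    def __init__(self, cu_id, scope):
        self.cu_id = cu_id
        self.scope = scope  # hypothetical scope label, e.g. "se0"
        self.tasks = deque()


def find_work(idle, units, shared_pool):
    """Return a task for an idle compute unit, or None if nothing is available."""
    # 1. Take directly from a peer's private queue, but only within the same
    #    scope; pick the most loaded peer and take from the back of its queue.
    donors = [u for u in units
              if u is not idle and u.scope == idle.scope and u.tasks]
    if donors:
        donor = max(donors, key=lambda u: len(u.tasks))
        return donor.tasks.pop()
    # 2. Otherwise fall back to tasks that were donated to the shared pool.
    if shared_pool:
        return shared_pool.popleft()
    return None


def donate(unit, shared_pool, keep=1):
    """A busy compute unit moves all but `keep` queued tasks to the shared pool."""
    while len(unit.tasks) > keep:
        shared_pool.append(unit.tasks.pop())


if __name__ == "__main__":
    units = [ComputeUnitQueue(0, "se0"), ComputeUnitQueue(1, "se0"),
             ComputeUnitQueue(2, "se1")]
    units[0].tasks.extend(["a", "b", "c"])
    pool = deque()
    print(find_work(units[1], units, pool))  # prints c: taken within scope se0
    print(find_work(units[2], units, pool))  # prints None: no same-scope peer, empty pool
    donate(units[0], pool)                   # CU 0 keeps "a", donates "b"
    print(find_work(units[2], units, pool))  # prints b: taken from the shared pool
```

Restricting direct access to peers in the same scope keeps the synchronization needed for that path narrow. Only the shared pool has to be visible across scopes.
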

Hierarchical write-combining cache coherence

Patent number: 9396112

Abstract: A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read-only cache and write-only combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events and reduces overhead in maintaining write-only combining buffers.

Type: Grant

Filed: August 26, 2013

Date of Patent: July 19, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Bradford M. Beckmann
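
This abstract pairs read-only caches with write-only combining buffers arranged hierarchically. Below is a minimal Python sketch of that split, under assumptions of our own. The hierarchy has two levels: a private buffer per compute unit feeds a shared combining level in front of memory. There are two release scopes, labeled `"local"` and `"global"`. An acquire invalidates the read-only cache. A local release drains only the private buffer, and a global release also drains the shared level. The class names and scope labels are hypothetical, and this is not the patented protocol.

```python
class Memory:
    def __init__(self):
        self.cells = {}

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return self.cells.get(addr, 0)


class WriteCombiningLevel:
    """Write-only combining buffer in front of the level below it."""

    def __init__(self, below):
        self.below = below
        self.pending = {}  # address -> latest value written (coalesced)

    def write(self, addr, value):
        self.pending[addr] = value

    def read(self, addr):
        # Reads see this level's pending writes, else fall through.
        if addr in self.pending:
            return self.pending[addr]
        return self.below.read(addr)

    def flush(self):
        for addr, value in self.pending.items():
            self.below.write(addr, value)
        self.pending.clear()


class ComputeUnit:
    """Read-only L1 cache plus a private write-combining buffer."""

    def __init__(self, shared_level):
        self.shared = shared_level
        self.wcb = WriteCombiningLevel(shared_level)
        self.ro_cache = {}

    def load(self, addr):
        if addr in self.wcb.pending:  # a unit sees its own unflushed stores
            return self.wcb.pending[addr]
        if addr not in self.ro_cache:
            self.ro_cache[addr] = self.shared.read(addr)
        return self.ro_cache[addr]

    def store(self, addr, value):
        self.wcb.write(addr, value)  # stores never allocate in the read-only cache

    def release(self, scope):
        self.wcb.flush()             # local: visible at the shared level
        if scope == "global":
            self.shared.flush()      # global: pushed on to memory as well

    def acquire(self):
        self.ro_cache.clear()        # drop possibly stale read-only copies


if __name__ == "__main__":
    mem = Memory()
    shared = WriteCombiningLevel(mem)
    cu0, cu1 = ComputeUnit(shared), ComputeUnit(shared)
    cu1.load(0)                      # cu1 caches the initial value 0
    cu0.store(0, 42)
    cu0.release("local")
    cu1.acquire()
    print(cu1.load(0), mem.read(0))  # → 42 0
```

Because the read-only cache never holds dirty data, an acquire can simply discard it. Because the buffers hold only writes, a release only has to drain them, with no coherence lookups for readers.
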

Method for memory consistency among heterogeneous computer components

Patent number: 9361118

Abstract: A method, computer program product, and system is described that determines the correctness of using memory operations in a computing device with heterogeneous computer components. Embodiments include an optimizer based on the characteristics of a Sequential Consistency for Heterogeneous-Race-Free (SC for HRF) model that analyzes a program and determines the correctness of the ordering of events in the program. HRF models include combinations of the properties: scope order, scope inclusion, and scope transitivity. The optimizer can determine when a program is heterogeneous-race-free in accordance with an SC for HRF memory consistency model. For example, the optimizer can analyze a portion of program code, respect the properties of the SC for HRF model, and determine whether a value produced by a store memory event will be a candidate for a value observed by a load memory event. In addition, the optimizer can determine whether reordering of events is possible.

Type: Grant

Filed: May 12, 2014

Date of Patent: June 7, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Derek R. Hower, Mark D. Hill, David Wood, Steven K. Reinhardt, Benedict R. Gaster, Blake A. Hechtman, Bradford M. Beckmann
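
The abstract describes an optimizer that decides whether a program is heterogeneous-race-free, meaning whether conflicting accesses are ordered by scoped synchronization. The following is a simplified Python race check over one interleaved trace, using vector clocks. It is an illustration, not the patented optimizer or the full SC for HRF definition. Assumptions of our own:

- A release and an acquire synchronize only when they name the same flag and the same scope instance: one work-group for `"wg"`, everyone for `"global"`.
- An acquire observes the latest matching release.
- The resulting ordering composes transitively.

The trace format and the name `find_races` are hypothetical.

```python
from collections import defaultdict


def find_races(num_threads, workgroup_of, trace):
    """Return (addr, earlier_tid, later_tid) for each unordered conflicting pair."""
    vc = [[0] * num_threads for _ in range(num_threads)]  # per-thread vector clocks
    for t in range(num_threads):
        vc[t][t] = 1
    sync = {}                    # (flag, scope instance) -> clock published by releases
    history = defaultdict(list)  # addr -> [(tid, epoch, is_write)]
    races = []

    def instance(tid, scope):
        # "wg" covers only the thread's own work-group; "global" covers all threads.
        return ("wg", workgroup_of[tid]) if scope == "wg" else ("global",)

    for op, tid, *args in trace:
        if op in ("st", "ld"):
            addr, is_write = args[0], op == "st"
            for other, epoch, other_write in history[addr]:
                conflict = is_write or other_write
                # Unordered if the earlier access's epoch is not yet known to tid.
                if other != tid and conflict and epoch > vc[tid][other]:
                    races.append((addr, other, tid))
            history[addr].append((tid, vc[tid][tid], is_write))
        elif op == "rel":
            flag, scope = args
            key = (flag, instance(tid, scope))
            prev = sync.get(key, [0] * num_threads)
            sync[key] = [max(a, b) for a, b in zip(prev, vc[tid])]
            vc[tid][tid] += 1    # later accesses belong to a new epoch
        elif op == "acq":
            flag, scope = args
            key = (flag, instance(tid, scope))
            if key in sync:      # synchronizes only with a matching scope instance
                vc[tid] = [max(a, b) for a, b in zip(vc[tid], sync[key])]
    return races


if __name__ == "__main__":
    wg = [0, 0, 1]  # threads 0 and 1 share work-group 0; thread 2 is in work-group 1
    narrow = [("st", 0, "x"), ("rel", 0, "f", "wg"),
              ("acq", 2, "f", "wg"), ("ld", 2, "x")]
    wide = [("st", 0, "x"), ("rel", 0, "f", "global"),
            ("acq", 2, "f", "global"), ("ld", 2, "x")]
    print(find_races(3, wg, narrow))  # → [('x', 0, 2)]: wg scope excludes thread 2
    print(find_races(3, wg, wide))    # → []: global scope orders store before load
```

The same store/load pair is racy or race-free depending only on the scope named by the synchronization. This is the property a scoped consistency model adds over a conventional data-race-free model.
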

RUNTIME FOR AUTOMATICALLY LOAD-BALANCING AND SYNCHRONIZING HETEROGENEOUS COMPUTER SYSTEMS WITH SCOPED SYNCHRONIZATION

Publication number: 20160055033

Abstract: Sharing tasks among compute units in a processor can increase the efficiency of the processor. When a compute unit does not have a task in its task memory to perform, donating tasks from other compute units can prevent the compute unit from being idle while there are tasks in other parts of the processor. It is desirable to share tasks among compute units that are within defined scopes of the processor. Compute units may share tasks by allowing other compute units to access their private memory, or by donating tasks to a shared memory.

Type: Application

Filed: August 22, 2014

Publication date: February 25, 2016

Applicant: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Derek R. Hower

Mechanisms to Save User/Kernel Copy for Cross Device Communications

Publication number: 20150261457

Abstract: Central processing units (CPUs) in computing systems manage graphics processing units (GPUs), network processors, security co-processors, and other data heavy devices as buffered peripherals using device drivers. Unfortunately, as a result of large and latency-sensitive data transfers between CPUs and these external devices, and memory partitioned into kernel-access and user-access spaces, these schemes to manage peripherals may introduce latency and memory use inefficiencies. Proposed are schemes to reduce latency and redundant memory copies using virtual to physical page remapping while maintaining user/kernel level access abstractions.

Type: Application

Filed: March 14, 2014

Publication date: September 17, 2015

Applicant: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Shuai Che

DATA REMAPPING FOR HETEROGENEOUS PROCESSOR

Publication number: 20150106587

Abstract: A processor remaps stored data and the corresponding memory addresses of the data for different processing units of a heterogeneous processor. The processor includes a data remap engine that changes the format of the data (that is, how the data is physically arranged in segments of memory) in response to a transfer of the data from system memory to a local memory hierarchy of an accelerated processing module (APM) of the processor. The APM's local memory hierarchy includes an address remap engine that remaps the memory addresses of the data at the local memory hierarchy so that the data can be accessed by routines at the APM that are unaware of the data remapping. By remapping the data, and the corresponding memory addresses, the APM can perform operations on the data more efficiently.

Type: Application

Filed: October 16, 2013

Publication date: April 16, 2015

Applicant: Advanced Micro Devices, Inc.

Inventors: Shuai Che, Bradford Beckmann, Blake Hechtman
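
The abstract describes two cooperating engines. A data remap engine changes the physical layout of the data when it moves into the APM's local memory hierarchy. An address remap engine translates addresses so that unaware routines still find their data. Below is a minimal Python sketch assuming one specific layout change, array-of-structures to structure-of-arrays, with addresses simplified to element indices. The abstract does not say which formats are used, and `NUM_FIELDS`, `remap_data`, `remap_address`, and `LocalMemory` are hypothetical names.

```python
NUM_FIELDS = 3  # hypothetical record layout: (x, y, z)


def remap_data(aos, num_records):
    """Data remap engine: array-of-structures -> structure-of-arrays (field-major)."""
    return [aos[r * NUM_FIELDS + f]
            for f in range(NUM_FIELDS) for r in range(num_records)]


def remap_address(aos_index, num_records):
    """Address remap engine: translate an AoS element index to its SoA index."""
    record, field = divmod(aos_index, NUM_FIELDS)
    return field * num_records + record


class LocalMemory:
    """Local memory holding remapped data behind an address-remapping front end."""

    def __init__(self, aos, num_records):
        self.num_records = num_records
        self.soa = remap_data(aos, num_records)

    def load(self, aos_index):
        # Callers keep using their original AoS indices; translation happens here.
        return self.soa[remap_address(aos_index, self.num_records)]


if __name__ == "__main__":
    aos = [10, 11, 12, 20, 21, 22, 30, 31, 32, 40, 41, 42]  # four (x, y, z) records
    mem = LocalMemory(aos, num_records=4)
    print(mem.soa)      # → [10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42]
    print(mem.load(4))  # → 21: y of record 1, fetched with its original AoS index
```

After remapping, parallel work-items that each read field `x` of consecutive records touch consecutive locations (indices 0, 1, 2, 3 instead of 0, 3, 6, 9). That is one way a layout change can make accesses more efficient.
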

HIERARCHICAL WRITE-COMBINING CACHE COHERENCE

Publication number: 20150058567

Abstract: A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read-only cache and write-only combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events and reduces overhead in maintaining write-only combining buffers.

Type: Application

Filed: August 26, 2013

Publication date: February 26, 2015

Applicant: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Bradford M. Beckmann

WRITE COMBINING CACHE MICROARCHITECTURE FOR SYNCHRONIZATION EVENTS

Publication number: 20150046652

Abstract: A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read/write combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events since a store event may not need to reach main memory to complete.

Type: Application

Filed: August 7, 2013

Publication date: February 12, 2015

Applicant: Advanced Micro Devices, Inc.

Inventors: Blake A. Hechtman, Bradford M. Beckmann

METHOD FOR MEMORY CONSISTENCY AMONG HETEROGENEOUS COMPUTER COMPONENTS

Publication number: 20140337587

Abstract: A method, computer program product, and system is described that determines the correctness of using memory operations in a computing device with heterogeneous computer components. Embodiments include an optimizer based on the characteristics of a Sequential Consistency for Heterogeneous-Race-Free (SC for HRF) model that analyzes a program and determines the correctness of the ordering of events in the program. HRF models include combinations of the properties: scope order, scope inclusion, and scope transitivity. The optimizer can determine when a program is heterogeneous-race-free in accordance with an SC for HRF memory consistency model. For example, the optimizer can analyze a portion of program code, respect the properties of the SC for HRF model, and determine whether a value produced by a store memory event will be a candidate for a value observed by a load memory event. In addition, the optimizer can determine whether reordering of events is possible.

Type: Application

Filed: May 12, 2014

Publication date: November 13, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Derek R. Hower, Mark D. Hill, David Wood, Steven K. Reinhardt, Benedict R. Gaster, Blake A. Hechtman, Bradford M. Beckmann