Patents by Inventor Marc S. Orr

Marc S. Orr has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Wavefront resource virtualization

Patent number: 10360652

Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and to stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.

Type: Grant

Filed: June 13, 2014

Date of Patent: July 23, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
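
The abstract describes stopping a wavefront before it finishes so that a second wavefront can use the same hardware resource. The following is a minimal Python sketch of that idea only, assuming a single hypothetical hardware slot whose live state is reduced to a program counter. The names `Wavefront`, `HardwareSlot`, `preempt`, and `saved_pc` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Wavefront:
    name: str
    total_steps: int
    saved_pc: int = 0       # context saved in memory while not resident (hypothetical)
    finished: bool = False

class HardwareSlot:
    """A hypothetical hardware resource that holds one wavefront at a time."""

    def __init__(self) -> None:
        self.resident: Optional[Wavefront] = None
        self.pc = 0  # live state held by the hardware resource

    def schedule(self, wf: Wavefront) -> None:
        # Bind a wavefront to the resource and restore its saved context.
        assert self.resident is None, "resource is occupied"
        self.resident = wf
        self.pc = wf.saved_pc

    def run(self, steps: int) -> None:
        # Advance the resident wavefront by up to `steps` steps.
        wf = self.resident
        while steps > 0 and self.pc < wf.total_steps:
            self.pc += 1
            steps -= 1
        if self.pc == wf.total_steps:
            wf.finished = True
            self.resident = None  # a completed wavefront releases the resource

    def preempt(self) -> Wavefront:
        # Stop the resident wavefront before it completes and save its context.
        wf = self.resident
        wf.saved_pc = self.pc
        self.resident = None
        return wf

slot = HardwareSlot()
a = Wavefront("A", total_steps=10)
b = Wavefront("B", total_steps=3)
slot.schedule(a)
slot.run(4)                 # A makes partial progress
paused = slot.preempt()     # A is stopped after 4 of 10 steps
slot.schedule(b)
slot.run(10)                # B runs to completion on the same resource
slot.schedule(paused)
slot.run(10)                # A resumes from its saved context
```

Keeping the live state in the slot and copying it out on preemption models the "virtualization" aspect: the wavefront's context can live in memory while another wavefront occupies the hardware.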
Message aggregation, combining and compression for efficient data communications in GPU-based clusters

Patent number: 10320695

Abstract: A system and method for efficient network traffic management in highly data-parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces the number and/or the storage size of the original network messages by converting them into one or more new network messages. The new network messages are sent to the network interface for transmission across the network.

Type: Grant

Filed: May 26, 2016

Date of Patent: June 11, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
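
The abstract names two ways to shrink traffic: fewer messages (aggregation) and smaller payloads (combining). The following is a minimal Python sketch of both, assuming that every message is a hypothetical remote atomic-add update `(dest, addr, value)`. Combining is only valid for commutative operations such as addition. The message format and the `aggregate` function are illustrative assumptions, not the patent's design.

```python
from collections import defaultdict
from typing import NamedTuple

class Message(NamedTuple):
    dest: int   # destination node id
    addr: int   # remote address to update
    value: int  # operand of an assumed remote atomic add

def aggregate(messages: list[Message]) -> dict[int, list[tuple[int, int]]]:
    """Combine adds to the same (dest, addr), then pack one message per node."""
    combined: dict[tuple[int, int], int] = defaultdict(int)
    for m in messages:
        # Combining: two adds to one address become a single add of the sum.
        combined[(m.dest, m.addr)] += m.value
    per_dest: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for (dest, addr), value in sorted(combined.items()):
        # Aggregation: all updates for one node travel in one network message.
        per_dest[dest].append((addr, value))
    return dict(per_dest)

msgs = [Message(1, 0x10, 5), Message(2, 0x20, 1), Message(1, 0x10, 3), Message(1, 0x18, 2)]
out = aggregate(msgs)
```

Here four original messages become two network messages, one per destination node.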
Conditional atomic operations in single instruction multiple data processors

Patent number: 10209990

Abstract: A conditional fetch-and-phi operation tests a memory location to determine whether the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.

Type: Grant

Filed: June 2, 2015

Date of Patent: February 19, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
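
The leader-thread scheme in the abstract can be sketched in Python under several assumptions: phi is taken to be addition, the lanes of one wavefront are modeled as a list of operands, and the single-threaded `compare_and_swap` stands in for a hardware atomic. One plausible way to hand results back is to give each lane the value it would have fetched had the lanes run one after another (an exclusive prefix sum). That detail is an assumption, not something the abstract specifies.

```python
from itertools import accumulate
from typing import Optional

class Memory:
    def __init__(self, size: int) -> None:
        self.cells = [0] * size

    def compare_and_swap(self, addr: int, expected: int, new: int) -> tuple[bool, int]:
        # Stand-in for a hardware CAS: replace cells[addr] only if it equals expected.
        old = self.cells[addr]
        if old == expected:
            self.cells[addr] = new
            return True, old
        return False, old

def wavefront_conditional_fetch_add(mem: Memory, addr: int, expected: int,
                                    lane_operands: list[int]) -> list[Optional[int]]:
    """One leader lane issues a single CAS for the whole wavefront."""
    total = sum(lane_operands)
    # Lane 0 acts as the leader; the other lanes wait for its result.
    ok, old = mem.compare_and_swap(addr, expected, expected + total)
    if not ok:
        return [None] * len(lane_operands)  # condition failed: nothing is modified
    # Broadcast: each lane gets old + (sum of operands of lower-numbered lanes).
    prefixes = list(accumulate(lane_operands, initial=0))[:-1]
    return [old + p for p in prefixes]

mem = Memory(4)
mem.cells[2] = 7
first = wavefront_conditional_fetch_add(mem, 2, expected=7, lane_operands=[1, 2, 3, 4])
second = wavefront_conditional_fetch_add(mem, 2, expected=7, lane_operands=[1, 1])
```

The first call succeeds and leaves 17 in memory. The second fails because the location no longer holds 7. In both cases only one CAS is issued per wavefront instead of one per lane.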
Flexible framework to support memory synchronization operations

Patent number: 10198261

Abstract: A method of performing memory synchronization operations is provided. The method includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation that synchronizes a plurality of instruction sequences executing on a processor; mapping the received instruction to one or more selected cache operations in a second language executable by the cache controller; and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides instructions for mapping the received instruction to one or more other cache operations, mapping the received instruction to those other cache operations, and executing them to perform the memory synchronization operation.

Type: Grant

Filed: April 11, 2016

Date of Patent: February 5, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
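
The key idea in the abstract is that synchronization instructions are translated through a replaceable mapping rather than hard-wired cache behavior. The following is a minimal Python sketch, assuming hypothetical instruction names (`acquire`, `release`) and hypothetical cache operations that only log what they would do. None of these names come from the patent.

```python
from typing import Callable

CacheOp = Callable[[list], None]

# Hypothetical cache operations (the "second language") understood by the controller.
def invalidate_l1(log: list) -> None:
    log.append("invalidate L1")

def writeback_l1(log: list) -> None:
    log.append("write back dirty L1 lines")

def writeback_l2(log: list) -> None:
    log.append("write back dirty L2 lines")

class CacheController:
    """Programmable controller: sync instruction -> sequence of cache operations."""

    def __init__(self, mapping: dict[str, list[CacheOp]]) -> None:
        self.mapping = dict(mapping)
        self.log: list[str] = []

    def load_mapping(self, mapping: dict[str, list[CacheOp]]) -> None:
        # A second mapping changes how the same instruction is carried out.
        self.mapping = dict(mapping)

    def execute(self, instruction: str) -> None:
        for op in self.mapping[instruction]:
            op(self.log)

default_map = {"acquire": [invalidate_l1], "release": [writeback_l1]}
wider_map = {"acquire": [invalidate_l1], "release": [writeback_l1, writeback_l2]}

ctrl = CacheController(default_map)
ctrl.execute("release")       # one write-back under the first mapping
ctrl.load_mapping(wider_map)
ctrl.execute("release")       # same instruction, different cache operations
```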
Message handler compiling and scheduling in heterogeneous system architectures

Patent number: 10025605

Abstract: A receiving node in a computer system that includes a plurality of types of execution units receives an active message from a sending node. The receiving node compiles an intermediate language message handler corresponding to the active message into a machine instruction set architecture (ISA) message handler and the receiver executes the ISA message handler on a selected one of the execution units. If the active message handler is not available at the receiver, the sender sends an intermediate language version of the message handler to the receiving node. The execution unit selected to execute the message handler is chosen based on a field in the active message or on runtime criteria in the receiving system.

Type: Grant

Filed: April 8, 2016

Date of Patent: July 17, 2018

Assignee: Advanced Micro Devices, Inc.

Inventors: Shuai Che, Marc S. Orr
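
The receive path in the abstract can be sketched in Python: fetch the intermediate-language handler from the sender if it is missing, choose an execution unit from a message field or from a runtime criterion, and compile and cache the handler for that unit. This sketch rests on several assumptions. The "IL" here is a Python expression in `x`, `_finalize` is a stand-in for a real IL-to-ISA finalizer, and queue depth is an illustrative runtime criterion. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ActiveMessage:
    handler_id: str
    payload: int
    preferred_unit: Optional[str] = None  # optional field naming an execution unit

class ReceivingNode:
    def __init__(self, fetch_il: Callable[[str], str]) -> None:
        self.fetch_il = fetch_il                     # asks the sender for handler IL
        self.il_handlers: dict[str, str] = {}        # handler id -> IL text
        self.isa_cache: dict[tuple[str, str], Callable[[int], int]] = {}
        self.queue_depth = {"cpu": 0, "gpu": 0}      # runtime load per unit type

    def _finalize(self, il: str, unit: str) -> Callable[[int], int]:
        # Stand-in for compiling IL to the selected unit's ISA.
        code = compile(il, f"<{unit}>", "eval")
        return lambda x: eval(code, {}, {"x": x})

    def handle(self, msg: ActiveMessage) -> tuple[str, int]:
        if msg.handler_id not in self.il_handlers:
            # Handler not available here: obtain its IL version from the sender.
            self.il_handlers[msg.handler_id] = self.fetch_il(msg.handler_id)
        # Use the message field if present, otherwise the least-loaded unit.
        unit = msg.preferred_unit or min(self.queue_depth, key=self.queue_depth.get)
        key = (msg.handler_id, unit)
        if key not in self.isa_cache:
            self.isa_cache[key] = self._finalize(self.il_handlers[msg.handler_id], unit)
        self.queue_depth[unit] += 1
        return unit, self.isa_cache[key](msg.payload)

sender_handlers = {"scale": "x * 3"}
node = ReceivingNode(fetch_il=sender_handlers.__getitem__)
r1 = node.handle(ActiveMessage("scale", 5))                        # runtime choice
r2 = node.handle(ActiveMessage("scale", 2, preferred_unit="gpu"))  # field choice
```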
Remote scoped synchronization for work stealing and sharing

Patent number: 9804883

Abstract: Described herein is an apparatus and method for remote scoped synchronization, a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope instances encompassed by that scope. Remote scoped synchronization allows smaller scopes to be used more frequently and defers the added cost until larger-scope synchronization is actually required. This enables programmers to optimize the scope at which memory operations are performed for important communication patterns such as work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise could not access: work-items can pull valid data from, and push updates to, scopes that do not (hierarchically) contain them.

Type: Grant

Filed: November 14, 2014

Date of Patent: October 31, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
Flexible Framework to Support Memory Synchronization Operations

Publication number: 20170293487

Abstract: A method of performing memory synchronization operations is provided. The method includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation that synchronizes a plurality of instruction sequences executing on a processor; mapping the received instruction to one or more selected cache operations in a second language executable by the cache controller; and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides instructions for mapping the received instruction to one or more other cache operations, mapping the received instruction to those other cache operations, and executing them to perform the memory synchronization operation.

Type: Application

Filed: April 11, 2016

Publication date: October 12, 2017

Applicant: Advanced Micro Devices, Inc.

Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
Message Handler Compiling and Scheduling in Heterogeneous System Architectures

Publication number: 20170293499

Abstract: A receiving node in a computer system that includes a plurality of types of execution units receives an active message from a sending node. The receiving node compiles an intermediate language message handler corresponding to the active message into a machine instruction set architecture (ISA) message handler and the receiver executes the ISA message handler on a selected one of the execution units. If the active message handler is not available at the receiver, the sender sends an intermediate language version of the message handler to the receiving node. The execution unit selected to execute the message handler is chosen based on a field in the active message or on runtime criteria in the receiving system.

Type: Application

Filed: April 8, 2016

Publication date: October 12, 2017

Inventors: Shuai Che, Marc S. Orr
Method and apparatus for distributing processing core workloads among processing cores

Patent number: 9678806

Abstract: Briefly, methods and apparatus are disclosed that rebalance workloads among processing cores using a hybrid work donation and work stealing technique, reducing workload imbalances within processing devices such as GPUs. In one example, the methods and apparatus distribute work between a first processing core and a second processing core by moving queue elements from one or more workgroup queues, associated with workgroups executing on the first processing core, to a first donation queue that may also be associated with those workgroups. The method and apparatus also determine whether the queue level of the first donation queue is beyond a threshold and, if so, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core.

Type: Grant

Filed: June 26, 2015

Date of Patent: June 13, 2017

Assignee: Advanced Micro Devices, Inc.

Inventors: Shuai Che, Bradford Beckmann, Marc S. Orr, Ayse Yilmazer
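
The hybrid scheme in the abstract can be sketched in Python: workgroups donate work into a per-core donation queue, and a core whose donation queue crosses a threshold steals from another core's donation queue. This sketch reads "beyond a threshold" as the queue running low, which is an assumption. The class name `Core`, the function `rebalance`, and the parameters `low_water` and `batch` are hypothetical.

```python
from collections import deque

class Core:
    def __init__(self, name: str, workgroup_queues: list[list[int]]) -> None:
        self.name = name
        self.workgroup_queues = [deque(q) for q in workgroup_queues]
        self.donation: deque = deque()  # shared by the workgroups on this core

    def donate(self, per_queue: int = 1) -> None:
        # Each workgroup moves up to `per_queue` elements into the donation queue.
        for q in self.workgroup_queues:
            for _ in range(min(per_queue, len(q))):
                self.donation.append(q.popleft())

def rebalance(core: Core, victim: Core, low_water: int = 2, batch: int = 2) -> None:
    """Donate locally, then steal from `victim` if this core's donation queue is low."""
    core.donate()
    if len(core.donation) < low_water:
        for _ in range(min(batch, len(victim.donation))):
            core.donation.append(victim.donation.pop())  # steal from the tail

busy = Core("cu1", [[1, 2, 3, 4], [5, 6, 7, 8]])
idle = Core("cu0", [[]])
busy.donate(per_queue=3)   # busy.donation now holds 1, 2, 3, 5, 6, 7
rebalance(idle, busy)      # idle has nothing to donate, so it steals two elements
```

Stealing from the tail while the owner consumes from the head is the usual work-stealing convention. It is used here only to keep thief and owner apart in the example.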
Method and Apparatus for Regulating Processing Core Load Imbalance

Publication number: 20160378565

Abstract: Briefly, methods and apparatus are disclosed that rebalance workloads among processing cores using a hybrid work donation and work stealing technique, reducing workload imbalances within processing devices such as GPUs. In one example, the methods and apparatus distribute work between a first processing core and a second processing core by moving queue elements from one or more workgroup queues, associated with workgroups executing on the first processing core, to a first donation queue that may also be associated with those workgroups. The method and apparatus also determine whether the queue level of the first donation queue is beyond a threshold and, if so, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core.

Type: Application

Filed: June 26, 2015

Publication date: December 29, 2016

Applicant: Advanced Micro Devices, Inc.

Inventors: Shuai Che, Bradford Beckmann, Marc S. Orr, Ayse Yilmazer
Conditional Atomic Operations at a Processor

Publication number: 20160357551

Abstract: A conditional fetch-and-phi operation tests a memory location to determine whether the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.

Type: Application

Filed: June 2, 2015

Publication date: December 8, 2016

Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
Message Aggregation, Combining and Compression for Efficient Data Communications in GPU-Based Clusters

Publication number: 20160352598

Abstract: A system and method for efficient network traffic management in highly data-parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces the number and/or the storage size of the original network messages by converting them into one or more new network messages. The new network messages are sent to the network interface for transmission across the network.

Type: Application

Filed: May 26, 2016

Publication date: December 1, 2016

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
Conditional notification mechanism

Patent number: 9411663

Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.

Type: Grant

Filed: March 1, 2013

Date of Patent: August 9, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
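
The mechanism in the abstract works as follows: one context registers a memory location and a condition, and another context signals it once the location satisfies the condition. The following is a minimal single-threaded Python sketch. It assumes the condition is evaluated only on subsequent writes and that each watch fires at most once. `MonitoredMemory`, `register`, and the callback-style signal are hypothetical stand-ins for hardware contexts.

```python
from typing import Callable

Condition = Callable[[int], bool]
Signal = Callable[[int, int], None]

class MonitoredMemory:
    """Hypothetical memory whose monitor signals a waiting context when a condition holds."""

    def __init__(self, size: int) -> None:
        self.cells = [0] * size
        self.watches: list[tuple[int, Condition, Signal]] = []

    def register(self, addr: int, condition: Condition, signal: Signal) -> None:
        # The waiting context hands over a location and a condition, then may sleep.
        self.watches.append((addr, condition, signal))

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value
        still_waiting = []
        for watch in self.watches:
            w_addr, condition, signal = watch
            if w_addr == addr and condition(value):
                signal(addr, value)  # notify once; the watch is then retired
            else:
                still_waiting.append(watch)
        self.watches = still_waiting

woken = []
mem = MonitoredMemory(8)
mem.register(3, lambda v: v >= 10, lambda a, v: woken.append((a, v)))
mem.write(3, 4)    # condition not met: no signal
mem.write(5, 99)   # unwatched location: ignored
mem.write(3, 12)   # condition met: the waiting context is signaled
```

The waiting context never polls the location itself, which is the point of handing the condition to another entity.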
Processor and Methods for Remote Scoped Synchronization

Publication number: 20160139624

Abstract: Described herein is an apparatus and method for remote scoped synchronization, a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope instances encompassed by that scope. Remote scoped synchronization allows smaller scopes to be used more frequently and defers the added cost until larger-scope synchronization is actually required. This enables programmers to optimize the scope at which memory operations are performed for important communication patterns such as work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise could not access: work-items can pull valid data from, and push updates to, scopes that do not (hierarchically) contain them.

Type: Application

Filed: November 14, 2014

Publication date: May 19, 2016

Applicant: Advanced Micro Devices, Inc.

Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
Conditional notification mechanism

Patent number: 9256535

Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.

Type: Grant

Filed: April 4, 2013

Date of Patent: February 9, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
Wavefront Resource Virtualization

Publication number: 20150363903

Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and to stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.

Type: Application

Filed: June 13, 2014

Publication date: December 17, 2015

Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
Conditional Notification Mechanism

Publication number: 20140304474

Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.

Type: Application

Filed: April 4, 2013

Publication date: October 9, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
Conditional Notification Mechanism

Publication number: 20140250442

Abstract: The described embodiments include a computing device. In these embodiments, an entity in the computing device receives an identification of a memory location and a condition to be met by a value in the memory location. Upon a predetermined event occurring, the entity causes an operation to be performed when the value in the memory location meets the condition.

Type: Application

Filed: March 1, 2013

Publication date: September 4, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
Conditional Notification Mechanism

Publication number: 20140250312

Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.

Type: Application

Filed: March 1, 2013

Publication date: September 4, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
Fragmented Channels

Publication number: 20140181822

Abstract: A system, method, and computer-readable medium for task scheduling using fragmented channels are provided. A plurality of fragmented channels are stored in memory accessible to a plurality of compute units. Each fragmented channel is associated with a particular compute unit, stores a plurality of data items from tasks scheduled for processing on that compute unit, and links to another fragmented channel in the plurality of fragmented channels.

Type: Application

Filed: December 20, 2012

Publication date: June 26, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Bradford M. Beckmann, Marc S. Orr
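
The data structure in the abstract consists of channel fragments, each tied to one compute unit, holding data items and linking to another fragment. The following is a minimal Python sketch that assumes fixed-capacity fragments chained per compute unit, with a new fragment linked in whenever the current one fills. The names `Fragment`, `ChannelPool`, `items_for`, and `fragment_count` are hypothetical, and the sketch does not attempt the patent's scheduling policy.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Fragment:
    compute_unit: int                    # CU whose tasks these items belong to
    capacity: int
    items: list = field(default_factory=list)
    next: Optional["Fragment"] = None    # link to another fragment

class ChannelPool:
    """Hypothetical per-CU chains of fixed-size channel fragments."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.heads: dict[int, Fragment] = {}
        self.tails: dict[int, Fragment] = {}

    def push(self, cu: int, item: Any) -> None:
        tail = self.tails.get(cu)
        if tail is None or len(tail.items) == self.capacity:
            frag = Fragment(cu, self.capacity)
            if tail is None:
                self.heads[cu] = frag
            else:
                tail.next = frag         # link the full fragment to a fresh one
            self.tails[cu] = frag
            tail = frag
        tail.items.append(item)

    def items_for(self, cu: int) -> list:
        # Walk the linked fragments for one CU and collect its items in order.
        out, frag = [], self.heads.get(cu)
        while frag is not None:
            out.extend(frag.items)
            frag = frag.next
        return out

    def fragment_count(self, cu: int) -> int:
        count, frag = 0, self.heads.get(cu)
        while frag is not None:
            count, frag = count + 1, frag.next
        return count

pool = ChannelPool(capacity=2)
for i in range(5):
    pool.push(0, i)      # CU 0 fills three fragments: [0, 1] -> [2, 3] -> [4]
pool.push(1, "x")        # CU 1 gets its own fragment
```

Growing a channel by linking small fragments avoids reserving one large contiguous buffer per compute unit up front.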