Patents by Inventor Marc S. Orr
Marc S. Orr has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10360652Abstract: A processor comprising hardware logic configured to execute of a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.Type: GrantFiled: June 13, 2014Date of Patent: July 23, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
-
Patent number: 10320695Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.Type: GrantFiled: May 26, 2016Date of Patent: June 11, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
-
Patent number: 10209990Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory locations stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.Type: GrantFiled: June 2, 2015Date of Patent: February 19, 2019Assignee: Advanced Micro Devices, Inc.Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
-
Patent number: 10198261Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.Type: GrantFiled: April 11, 2016Date of Patent: February 5, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
-
Patent number: 10025605Abstract: A receiving node in a computer system that includes a plurality of types of execution units receives an active message from a sending node. The receiving node compiles an intermediate language message handler corresponding to the active message into a machine instruction set architecture (ISA) message handler and the receiver executes the ISA message handler on a selected one of the execution units. If the active message handler is not available at the receiver, the sender sends an intermediate language version of the message handler to the receiving node. The execution unit selected to execute the message handler is chosen based on a field in the active message or on runtime criteria in the receiving system.Type: GrantFiled: April 8, 2016Date of Patent: July 17, 2018Assignee: Advanced Micro Devices, Inc.Inventors: Shuai Che, Marc S. Orr
-
Patent number: 9804883Abstract: Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.Type: GrantFiled: November 14, 2014Date of Patent: October 31, 2017Assignee: Advanced Micro Devices, Inc.Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
-
Publication number: 20170293487Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.Type: ApplicationFiled: April 11, 2016Publication date: October 12, 2017Applicant: Advanced Micro Devices, Inc.Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
-
Publication number: 20170293499Abstract: A receiving node in a computer system that includes a plurality of types of execution units receives an active message from a sending node. The receiving node compiles an intermediate language message handler corresponding to the active message into a machine instruction set architecture (ISA) message handler and the receiver executes the ISA message handler on a selected one of the execution units. If the active message handler is not available at the receiver, the sender sends an intermediate language version of the message handler to the receiving node. The execution unit selected to execute the message handler is chosen based on a field in the active message or on runtime criteria in the receiving system.Type: ApplicationFiled: April 8, 2016Publication date: October 12, 2017Inventors: Shuai Che, Marc S. Orr
-
Patent number: 9678806Abstract: Briefly, methods and apparatus to rebalance workloads among processing cores utilizing a hybrid work donation and work stealing technique are disclosed that improve workload imbalances within processing devices such as, for example, GPUs. In one example, the methods and apparatus allow for workload distribution between a first processing core and a second processing core by providing queue elements from one or more workgroup queues associated with workgroups executing on the first processing core to a first donation queue that may also be associated with the workgroups executing on the first processing core. The method and apparatus also determine if a queue level of the first donation queue is beyond a threshold, and if so, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core.Type: GrantFiled: June 26, 2015Date of Patent: June 13, 2017Assignee: Advanced Micro Devices, Inc.Inventors: Shuai Che, Bradford Beckmann, Marc S. Orr, Ayse Yilmazer
-
Publication number: 20160378565Abstract: Briefly, methods and apparatus to rebalance workloads among processing cores utilizing a hybrid work donation and work stealing technique are disclosed that improve workload imbalances within processing devices such as, for example, GPUs. In one example, the methods and apparatus allow for workload distribution between a first processing core and a second processing core by providing queue elements from one or more workgroup queues associated with workgroups executing on the first processing core to a first donation queue that may also be associated with the workgroups executing on the first processing core. The method and apparatus also determine if a queue level of the first donation queue is beyond a threshold, and if so, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core.Type: ApplicationFiled: June 26, 2015Publication date: December 29, 2016Applicant: Advanced Micro DevicesInventors: Shuai Che, Bradford Beckmann, Marc S. Orr, Ayse Yilmazer
-
Publication number: 20160357551Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory locations stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.Type: ApplicationFiled: June 2, 2015Publication date: December 8, 2016Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
-
Publication number: 20160352598Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.Type: ApplicationFiled: May 26, 2016Publication date: December 1, 2016Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
-
Patent number: 9411663Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.Type: GrantFiled: March 1, 2013Date of Patent: August 9, 2016Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
-
Publication number: 20160139624Abstract: Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.Type: ApplicationFiled: November 14, 2014Publication date: May 19, 2016Applicant: ADVANCED MICRO DEVICES, INC.Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
-
Patent number: 9256535Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.Type: GrantFiled: April 4, 2013Date of Patent: February 9, 2016Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
-
Publication number: 20150363903Abstract: A processor comprising hardware logic configured to execute of a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.Type: ApplicationFiled: June 13, 2014Publication date: December 17, 2015Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
-
Publication number: 20140304474Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.Type: ApplicationFiled: April 4, 2013Publication date: October 9, 2014Applicant: Advanced Micro Devices, Inc.Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
-
Publication number: 20140250442Abstract: The described embodiments include a computing device. In these embodiments, an entity in the computing device receives an identification of a memory location and a condition to be met by a value in the memory location. Upon a predetermined event occurring, the entity causes an operation to be performed when the value in the memory location meets the condition.Type: ApplicationFiled: March 1, 2013Publication date: September 4, 2014Applicant: Advanced Micro Devices, Inc.Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
-
Publication number: 20140250312Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.Type: ApplicationFiled: March 1, 2013Publication date: September 4, 2014Applicant: Advanced Micro Devices, Inc.Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
-
Publication number: 20140181822Abstract: A system, method and a computer-readable medium for task scheduling using fragmented channels is provided. A plurality of fragmented channels are stored in memory accessible to a plurality of compute units. Each fragmented channel is associated with a particular compute unit. Each fragmented channel also stores a plurality of data items from tasks scheduled for processing on the associated compute unit and links to another fragmented channel in the plurality of fragmented channels.Type: ApplicationFiled: December 20, 2012Publication date: June 26, 2014Applicant: Advanced Micro Devices, Inc.Inventors: Bradford M. BECKMANN, Marc S. Orr