Patents by Inventor Marc S. Orr

Marc S. Orr has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10360652
    Abstract: A processor comprising hardware logic configured to execute of a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
    Type: Grant
    Filed: June 13, 2014
    Date of Patent: July 23, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
  • Patent number: 10320695
    Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.
    Type: Grant
    Filed: May 26, 2016
    Date of Patent: June 11, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
  • Patent number: 10209990
    Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory locations stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.
    Type: Grant
    Filed: June 2, 2015
    Date of Patent: February 19, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
  • Patent number: 10198261
    Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.
    Type: Grant
    Filed: April 11, 2016
    Date of Patent: February 5, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
  • Patent number: 10025605
    Abstract: A receiving node in a computer system that includes a plurality of types of execution units receives an active message from a sending node. The receiving node compiles an intermediate language message handler corresponding to the active message into a machine instruction set architecture (ISA) message handler and the receiver executes the ISA message handler on a selected one of the execution units. If the active message handler is not available at the receiver, the sender sends an intermediate language version of the message handler to the receiving node. The execution unit selected to execute the message handler is chosen based on a field in the active message or on runtime criteria in the receiving system.
    Type: Grant
    Filed: April 8, 2016
    Date of Patent: July 17, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shuai Che, Marc S. Orr
  • Patent number: 9804883
    Abstract: Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: October 31, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
  • Publication number: 20170293487
    Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.
    Type: Application
    Filed: April 11, 2016
    Publication date: October 12, 2017
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20170293499
    Abstract: A receiving node in a computer system that includes a plurality of types of execution units receives an active message from a sending node. The receiving node compiles an intermediate language message handler corresponding to the active message into a machine instruction set architecture (ISA) message handler and the receiver executes the ISA message handler on a selected one of the execution units. If the active message handler is not available at the receiver, the sender sends an intermediate language version of the message handler to the receiving node. The execution unit selected to execute the message handler is chosen based on a field in the active message or on runtime criteria in the receiving system.
    Type: Application
    Filed: April 8, 2016
    Publication date: October 12, 2017
    Inventors: Shuai Che, Marc S. Orr
  • Patent number: 9678806
    Abstract: Briefly, methods and apparatus to rebalance workloads among processing cores utilizing a hybrid work donation and work stealing technique are disclosed that improve workload imbalances within processing devices such as, for example, GPUs. In one example, the methods and apparatus allow for workload distribution between a first processing core and a second processing core by providing queue elements from one or more workgroup queues associated with workgroups executing on the first processing core to a first donation queue that may also be associated with the workgroups executing on the first processing core. The method and apparatus also determine if a queue level of the first donation queue is beyond a threshold, and if so, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: June 13, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shuai Che, Bradford Beckmann, Marc S. Orr, Ayse Yilmazer
  • Publication number: 20160378565
    Abstract: Briefly, methods and apparatus to rebalance workloads among processing cores utilizing a hybrid work donation and work stealing technique are disclosed that improve workload imbalances within processing devices such as, for example, GPUs. In one example, the methods and apparatus allow for workload distribution between a first processing core and a second processing core by providing queue elements from one or more workgroup queues associated with workgroups executing on the first processing core to a first donation queue that may also be associated with the workgroups executing on the first processing core. The method and apparatus also determine if a queue level of the first donation queue is beyond a threshold, and if so, steal one or more queue elements from a second donation queue associated with workgroups executing on the second processing core.
    Type: Application
    Filed: June 26, 2015
    Publication date: December 29, 2016
    Applicant: Advanced Micro Devices
    Inventors: Shuai Che, Bradford Beckmann, Marc S. Orr, Ayse Yilmazer
  • Publication number: 20160357551
    Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory locations stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.
    Type: Application
    Filed: June 2, 2015
    Publication date: December 8, 2016
    Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
  • Publication number: 20160352598
    Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.
    Type: Application
    Filed: May 26, 2016
    Publication date: December 1, 2016
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
  • Patent number: 9411663
    Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.
    Type: Grant
    Filed: March 1, 2013
    Date of Patent: August 9, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20160139624
    Abstract: Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.
    Type: Application
    Filed: November 14, 2014
    Publication date: May 19, 2016
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
  • Patent number: 9256535
    Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.
    Type: Grant
    Filed: April 4, 2013
    Date of Patent: February 9, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20150363903
    Abstract: A processor comprising hardware logic configured to execute of a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
    Type: Application
    Filed: June 13, 2014
    Publication date: December 17, 2015
    Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
  • Publication number: 20140304474
    Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.
    Type: Application
    Filed: April 4, 2013
    Publication date: October 9, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20140250442
    Abstract: The described embodiments include a computing device. In these embodiments, an entity in the computing device receives an identification of a memory location and a condition to be met by a value in the memory location. Upon a predetermined event occurring, the entity causes an operation to be performed when the value in the memory location meets the condition.
    Type: Application
    Filed: March 1, 2013
    Publication date: September 4, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20140250312
    Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.
    Type: Application
    Filed: March 1, 2013
    Publication date: September 4, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20140181822
    Abstract: A system, method and a computer-readable medium for task scheduling using fragmented channels is provided. A plurality of fragmented channels are stored in memory accessible to a plurality of compute units. Each fragmented channel is associated with a particular compute unit. Each fragmented channel also stores a plurality of data items from tasks scheduled for processing on the associated compute unit and links to another fragmented channel in the plurality of fragmented channels.
    Type: Application
    Filed: December 20, 2012
    Publication date: June 26, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Bradford M. BECKMANN, Marc S. Orr