Patents by Inventor Bradford M. Beckmann

Bradford M. Beckmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10198261
    Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.
    Type: Grant
    Filed: April 11, 2016
    Date of Patent: February 5, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20190034151
    Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
    Type: Application
    Filed: July 27, 2017
    Publication date: January 31, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Bradford M. Beckmann
  • Publication number: 20190013051
    Abstract: A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode.
    Type: Application
    Filed: September 12, 2018
    Publication date: January 10, 2019
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Nuwan S. Jayasena, Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Lisa R. Hsu
  • Patent number: 10079044
    Abstract: A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: September 18, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Nuwan S. Jayasena, Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Lisa R. Hsu
  • Patent number: 10019377
    Abstract: The described embodiments include a computing device with two or more types of processors and a memory that is shared between the two or more types of processors. The computing device performs operations for handling cache coherency between the two or more types of processors. During operation, the computing device sets a cache coherency indicator in metadata in a page table entry in a page table, the page table entry information about a page of data that is stored in the memory. The computing device then uses the cache coherency indicator to determine operations to be performed when accessing data in the page of data in the memory. For example, the computing device can use the coherency indicator to determine whether a coherency operation is to be performed when a processor of a given type accesses data in the page of data in the memory.
    Type: Grant
    Filed: May 23, 2016
    Date of Patent: July 10, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Arkaprava Basu, Bradford M. Beckmann, Shuai Che, Sooraj Puthoor
  • Patent number: 9898287
    Abstract: A method, a non-transitory computer readable medium, and a processor for repacking dynamic wavefronts during program code execution on a processing unit, each dynamic wavefront including multiple threads are presented. If a branch instruction is detected, a determination is made whether all wavefronts following a same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point, which is a beginning of a program code segment to be executed by both a taken branch and a not taken branch from a previous branch instruction. The dynamic wavefronts are repacked with all threads that follow the same control path, if all wavefronts following the same control path have reached the branch instruction or the reconvergence point.
    Type: Grant
    Filed: April 9, 2015
    Date of Patent: February 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sooraj Puthoor, Bradford M. Beckmann, Dmitri Yudanov
  • Publication number: 20170337136
    Abstract: The described embodiments include a computing device with two or more types of processors and a memory that is shared between the two or more types of processors. The computing device performs operations for handling cache coherency between the two or more types of processors. During operation, the computing device sets a cache coherency indicator in metadata in a page table entry in a page table, the page table entry information about a page of data that is stored in the memory. The computing device then uses the cache coherency indicator to determine operations to be performed when accessing data in the page of data in the memory. For example, the computing device can use the coherency indicator to determine whether a coherency operation is to be performed when a processor of a given type accesses data in the page of data in the memory.
    Type: Application
    Filed: May 23, 2016
    Publication date: November 23, 2017
    Inventors: Arkaprava Basu, Bradford M. Beckmann, Shuai Che, Sooraj Puthoor
  • Patent number: 9804883
    Abstract: Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: October 31, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
  • Publication number: 20170293487
    Abstract: A method of performing memory synchronization operations is provided that includes receiving, at a programmable cache controller in communication with one or more caches, an instruction in a first language to perform a memory synchronization operation of synchronizing a plurality of instruction sequences executing on a processor, mapping the received instruction in the first language to one or more selected cache operations in a second language executable by the cache controller and executing the one or more cache operations to perform the memory synchronization operation. The method further comprises receiving a second mapping that provides mapping instructions to map the received instruction to one or more other cache operations, mapping the received instruction to one or more other cache operations and executing the one or more other cache operations to perform the memory synchronization operation.
    Type: Application
    Filed: April 11, 2016
    Publication date: October 12, 2017
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shuai Che, Marc S. Orr, Bradford M. Beckmann
  • Patent number: 9766936
    Abstract: The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism performs a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the resource is not available for performing the operation and until another resource is selected for performing the operation, the selection mechanism identifies a next resource in the table and selects the next resource for performing the operation when the next resource is available for performing the operation.
    Type: Grant
    Filed: November 6, 2015
    Date of Patent: September 19, 2017
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Bradford M. Beckmann, Mithuna S. Thottethodi, James M. O'Connor, Mauricio Breternitz, Lisa R. Hsu, Gabriel H. Loh, Yasuko Eckert
  • Patent number: 9697147
    Abstract: A processing system comprises one or more processor devices and other system components coupled to a stacked memory device having a set of stacked memory layers and a set of one or more logic layers. The set of logic layers implements a metadata manager that offloads metadata management from the other system components. The set of logic layers also includes a memory interface coupled to memory cell circuitry implemented in the set of stacked memory layers and coupleable to the devices external to the stacked memory device. The memory interface operates to perform memory accesses for the external devices and for the metadata manager. By virtue of the metadata manager's tight integration with the stacked memory layers, the metadata manager may perform certain memory-intensive metadata management operations more efficiently than could be performed by the external devices.
    Type: Grant
    Filed: August 6, 2012
    Date of Patent: July 4, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, James M. O'Connor, Bradford M. Beckmann, Michael Ignatowski
  • Patent number: 9652390
    Abstract: Apparatus, computer readable medium, integrated circuit, and method of moving a plurality of data items to a first cache or a second cache are presented. The method includes receiving an indication that the first cache requested the plurality of data items. The method includes storing information indicating that the first cache requested the plurality of data items. The information may include an address for each of the plurality of data items. The method includes determining based at least on the stored information to move the plurality of data items to the second cache. The method includes moving the plurality of data items to the second cache. The method may include determining a time interval between receiving the indication that the first cache requested the plurality of data items and moving the plurality of data items to the second cache. A scratch pad memory is disclosed.
    Type: Grant
    Filed: August 5, 2014
    Date of Patent: May 16, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: JunLi Gu, Bradford M. Beckmann, Yuan Xie
  • Publication number: 20170004080
    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
    Type: Application
    Filed: June 30, 2015
    Publication date: January 5, 2017
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
  • Publication number: 20160357551
    Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory locations stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.
    Type: Application
    Filed: June 2, 2015
    Publication date: December 8, 2016
    Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
  • Publication number: 20160352598
    Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.
    Type: Application
    Filed: May 26, 2016
    Publication date: December 1, 2016
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
  • Patent number: 9477599
    Abstract: A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read/write combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events since a store event may not need to reach main memory to complete.
    Type: Grant
    Filed: August 7, 2013
    Date of Patent: October 25, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Blake A. Hechtman, Bradford M. Beckmann
  • Publication number: 20160239302
    Abstract: A method, a non-transitory computer readable medium, and a processor for repacking dynamic wavefronts during program code execution on a processing unit, each dynamic wavefront including multiple threads are presented. If a branch instruction is detected, a determination is made whether all wavefronts following a same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point, which is a beginning of a program code segment to be executed by both a taken branch and a not taken branch from a previous branch instruction. The dynamic wavefronts are repacked with all threads that follow the same control path, if all wavefronts following the same control path have reached the branch instruction or the reconvergence point.
    Type: Application
    Filed: April 9, 2015
    Publication date: August 18, 2016
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Sooraj Puthoor, Bradford M. Beckmann, Dmitri Yudanov
  • Patent number: 9411663
    Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.
    Type: Grant
    Filed: March 1, 2013
    Date of Patent: August 9, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
  • Patent number: 9396112
    Abstract: A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read-only cache and write-only combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events and reduces overhead in maintaining write-only combining buffers.
    Type: Grant
    Filed: August 26, 2013
    Date of Patent: July 19, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Blake A. Hechtman, Bradford M. Beckmann
  • Patent number: 9361118
    Abstract: A method, computer program product, and system is described that determines the correctness of using memory operations in a computing device with heterogeneous computer components. Embodiments include an optimizer based on the characteristics of a Sequential Consistency for Heterogeneous-Race-Free (SC for HRF) model that analyzes a program and determines the correctness of the ordering of events in the program. HRF models include combinations of the properties: scope order, scope inclusion, and scope transitivity. The optimizer can determine when a program is heterogeneous-race-free in accordance with an SC for HRF memory consistency model. For example, the optimizer can analyze a portion of program code, respect the properties of the SC for HRF model, and determine whether a value produced by a store memory event will be a candidate for a value observed by a load memory event. In addition, the optimizer can determine whether reordering of events is possible.
    Type: Grant
    Filed: May 12, 2014
    Date of Patent: June 7, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Derek R. Hower, Mark D. Hill, David Wood, Steven K. Reinhardt, Benedict R. Gaster, Blake A. Hechtman, Bradford M. Beckmann