Patents by Inventor Bradford M. Beckmann

Bradford M. Beckmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9354892
    Abstract: Methods, media, and computing systems are provided. The method includes, the media are configured for, and the computing system includes a processor with control logic for allocating memory for storing a plurality of local register states for work items to be executed in single instruction multiple data hardware and for repacking wavefronts that include work items associated with a program instruction responsive to a conditional statement. The repacking is configured to create repacked wavefronts that include at least one of a wavefront containing work items that all pass the conditional statement and a wavefront containing work items that all fail the conditional statement.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: May 31, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Timothy G. Rogers, Bradford M. Beckmann, James M. O'Connor
  • Publication number: 20160139624
    Abstract: Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.
    Type: Application
    Filed: November 14, 2014
    Publication date: May 19, 2016
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Marc S. Orr, Bradford M. Beckmann, Ayse Yilmazer, Shuai Che, David A. Wood, Mark D. Hill
  • Patent number: 9342334
    Abstract: A system and method for simulating new instructions without compiler support for the new instructions. A simulator detects a given region in code generated by a compiler. The given region may be a candidate for vectorization or may be a region already vectorized. In response to the detection, the simulator suspends execution of a time-based simulation. The simulator then serially executes the region for at least two iterations using a functional-based simulation and using instructions with operands which correspond to P or less lanes of single-instruction-multiple-data (SIMD) execution. The value P is a maximum number of lanes of SIMD exection supported both by the compiler. The simulator stores checkpoint state during the serial execution. In response to determining no inter-iteration memory dependencies exist, the simulator returns to the time-based simulation and resumes execution using N-wide vector instructions.
    Type: Grant
    Filed: June 22, 2012
    Date of Patent: May 17, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford M. Beckmann, Nilay Vaish, Steven K. Reinhardt
  • Patent number: 9317296
    Abstract: Methods, and media, and computer systems are provided. The method includes, the media includes control logic for, and the computer system includes a processor with control logic for overriding an execution mask of SIMD hardware to enable at least one of a plurality of lanes of the SIMD hardware. Overriding the execution mask is responsive to a data parallel computation and a diverged control flow of a workgroup.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: April 19, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Timothy G. Rogers, Bradford M. Beckmann, James M. O'Connor
  • Publication number: 20160062803
    Abstract: The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism performs a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the resource is not available for performing the operation and until another resource is selected for performing the operation, the selection mechanism identifies a next resource in the table and selects the next resource for performing the operation when the next resource is available for performing the operation.
    Type: Application
    Filed: November 6, 2015
    Publication date: March 3, 2016
    Inventors: Bradford M. Beckmann, Mithuna S. Thottethodi, James M. O'Connor, Mauricio Breternitz, Lisa R. Hsu, Gabriel H. Loh, Yasuko Eckert
  • Publication number: 20160041909
    Abstract: Apparatus, computer readable medium, integrated circuit, and method of moving a plurality of data items to a first cache or a second cache are presented. The method includes receiving an indication that the first cache requested the plurality of data items. The method includes storing information indicating that the first cache requested the plurality of data items. The information may include an address for each of the plurality of data items. The method includes determining based at least on the stored information to move the plurality of data items to the second cache. The method includes moving the plurality of data items to the second cache. The method may include determining a time interval between receiving the indication that the first cache requested the plurality of data items and moving the plurality of data items to the second cache. A scratch pad memory is disclosed.
    Type: Application
    Filed: August 5, 2014
    Publication date: February 11, 2016
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: JunLi Gu, Bradford M. Beckmann, Yuan Xie
  • Patent number: 9256535
    Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.
    Type: Grant
    Filed: April 4, 2013
    Date of Patent: February 9, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
  • Publication number: 20150363903
    Abstract: A processor comprising hardware logic configured to execute of a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
    Type: Application
    Filed: June 13, 2014
    Publication date: December 17, 2015
    Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
  • Patent number: 9201777
    Abstract: A die-stacked memory device implements an integrated QoS manager to provide centralized QoS functionality in furtherance of one or more specified QoS objectives for the sharing of the memory resources by other components of the processing system. The die-stacked memory device includes a set of one or more stacked memory dies and one or more logic dies. The logic dies implement hardware logic for a memory controller and the QoS manager. The memory controller is coupleable to one or more devices external to the set of one or more stacked memory dies and operates to service memory access requests from the one or more external devices. The QoS manager comprises logic to perform operations in furtherance of one or more QoS objectives, which may be specified by a user, by an operating system, hypervisor, job management software, or other application being executed, or specified via hardcoded logic or firmware.
    Type: Grant
    Filed: December 23, 2012
    Date of Patent: December 1, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lisa R. Hsu, Gabriel H. Loh, Bradford M. Beckmann, Michael Ignatowski
  • Patent number: 9189399
    Abstract: A processor system presented here has a plurality of execution cores and a plurality of stack caches, wherein each of the stack caches is associated with a different one of the execution cores. A method of managing stack data for the processor system is presented here. The method maintains a stack cache manager for the plurality of execution cores. The stack cache manager includes entries for stack data accessed by the plurality of execution cores. The method processes, for a requesting execution core of the plurality of execution cores, a virtual address for requested stack data. The method continues by accessing the stack cache manager to search for an entry of the stack cache manager that includes the virtual address for requested stack data, and using information in the entry to retrieve the requested stack data.
    Type: Grant
    Filed: May 3, 2013
    Date of Patent: November 17, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lena E. Olson, Yasuko Eckert, Bradford M. Beckmann
  • Patent number: 9183055
    Abstract: The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation.
    Type: Grant
    Filed: February 7, 2013
    Date of Patent: November 10, 2015
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Bradford M. Beckmann, Mithuna S. Thottethodi, James M. O'Connor, Mauricio Breternitz, Lisa R. Hsu, Gabriel H. Loh, Yasuko Eckert
  • Patent number: 9170948
    Abstract: A die-stacked memory device implements an integrated coherency manager to offload cache coherency protocol operations for the devices of a processing system. The die-stacked memory device includes a set of one or more stacked memory dies and a set of one or more logic dies. The one or more logic dies implement hardware logic providing a memory interface and the coherency manager. The memory interface operates to perform memory accesses in response to memory access requests from the coherency manager and the one or more external devices. The coherency manager comprises logic to perform coherency operations for shared data stored at the stacked memory dies. Due to the integration of the logic dies and the memory dies, the coherency manager can access shared data stored in the memory dies and perform related coherency operations with higher bandwidth and lower latency and power consumption compared to the external devices.
    Type: Grant
    Filed: December 23, 2012
    Date of Patent: October 27, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Bradford M. Beckmann, Lisa R. Hsu, Michael Ignatowski, Michael J. Schulte
  • Publication number: 20150293845
    Abstract: Described is a system and method for a multi-level memory hierarchy. Each level is based on different attributes including, for example, power, capacity, bandwidth, reliability, and volatility. In some embodiments, the different levels of the memory hierarchy may use an on-chip stacked dynamic random access memory, (providing fast, high-bandwidth, low-energy access to data) and an off-chip non-volatile random access memory, (providing low-power, high-capacity storage), in order to provide higher-capacity, lower power, and higher-bandwidth performance. The multi-level memory may present a unified interface to a processor so that specific memory hardware and software implementation details are hidden. The multi-level memory enables the illusion of a single-level memory that satisfies multiple conflicting constraints. A comparator receives a memory address from the processor, processes the address and reads from or writes to the appropriate memory level.
    Type: Application
    Filed: April 11, 2014
    Publication date: October 15, 2015
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Lisa R. Hsu, James M. O'Connor, Vilas K. Sridharan, Gabriel H. Loh, Nuwan S. Jayasena, Bradford M. Beckmann
  • Patent number: 9135185
    Abstract: A die-stacked memory device incorporates a data translation controller at one or more logic dies of the device to provide data translation services for data to be stored at, or retrieved from, the die-stacked memory device. The data translation operations implemented by the data translation controller can include compression/decompression operations, encryption/decryption operations, format translations, wear-leveling translations, data ordering operations, and the like. Due to the tight integration of the logic dies and the memory dies, the data translation controller can perform data translation operations with higher bandwidth and lower latency and power consumption compared to operations performed by devices external to the die-stacked memory device.
    Type: Grant
    Filed: December 23, 2012
    Date of Patent: September 15, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Bradford M. Beckmann, James M. O'Connor, Michael Ignatowski, Michael J. Schulte, Lisa R. Hsu, Nuwan S. Jayasena
  • Patent number: 9075730
    Abstract: A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache and one or more sources for memory requests. In response to receiving a request to allocate data of a first type, a cache controller allocates the data in the cache responsive to determining a limit of an amount of data of the first type permitted in the cache is not reached. The controller maintains an amount and location information of the data of the first type stored in the cache. Additionally, the cache may be partitioned with each partition designated for storing data of a given type. Allocation of data of the first type is dependent at least upon the availability of a first partition and a limit of an amount of data of the first type in a second partition.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: July 7, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mithuna S. Thottethodi, Gabriel H. Loh, James M. O'Connor, Yasuko Eckert, Bradford M. Beckmann
  • Patent number: 9015448
    Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
  • Publication number: 20150100758
    Abstract: A data processor includes a register file divided into at least a first portion and a second portion for storing data. A single instruction, multiple data (SIMD) unit is also divided into at least a first lane and a second lane. The first and second lanes of the SIMD unit correspond respectively to the first and second portions of the register file. Furthermore, each lane of the SIMD unit is capable of data processing. The data processor also includes a realignment element in communication with the register file and the SIMD unit. The realignment element is configured to selectively realign conveyance of data between the first portion of the register file and the first lane of the SIMD unit to the second lane of the SIMD unit.
    Type: Application
    Filed: October 3, 2013
    Publication date: April 9, 2015
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Timothy G. Rogers, Bradford M. Beckmann, James M. O'Connor
  • Patent number: 9003130
    Abstract: A data processing device is provided that facilitates cache coherence policies. In one embodiment, a data processing device utilizes invalidation tags in connection with a cache that is associated with a processing engine. In some embodiments, the cache is configured to store a plurality of cache entries where each cache entry includes a cache line configured to store data and a corresponding cache tag configured to store address information associated with data stored in the cache line. Such address information includes invalidation flags with respect to addresses stored in the cache tags. Each cache tag is associated with an invalidation tag configured to store information related to invalidation commands of addresses stored in the cache tag. In such embodiment, the cache is configured to set invalidation flags of cache tags based upon information stored in respective invalidation tags.
    Type: Grant
    Filed: December 19, 2012
    Date of Patent: April 7, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James O'Connor, Bradford M. Beckmann
  • Patent number: 8984255
    Abstract: A data processing device is provided that employs multiple translation look-aside buffers (TLBs) associated with respective processors that are configured to store selected address translations of a page table of a memory shared by the processors. The processing device is configured such that when an address translation is requested by a processor and is not found in the TLB associated with that processor, another TLB is probed for the requested address translation. The probe across to the other TLB may occur in advance of a walk of the page table for the requested address or alternatively a walk can be initiated concurrently with the probe. Where the probe successfully finds the requested address translation, the page table walk can be avoided or discontinued.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: March 17, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lisa Hsu, Nuwan Jayasena, Andrew Kegel, Bradford M. Beckmann
  • Publication number: 20150058567
    Abstract: A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read-only cache and write-only combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events and reduces overhead in maintaining write-only combining buffers.
    Type: Application
    Filed: August 26, 2013
    Publication date: February 26, 2015
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Blake A. Hechtman, Bradford M. Beckmann