Patents Assigned to Advanced Micro Devices, Inc.
  • Publication number: 20180069767
    Abstract: Techniques described herein improve processor performance in situations where a large number of system service requests are being received from other devices. More specifically, upon detecting that certain operating conditions that indicate a processor slowdown are present, the processor performs one or more system service adjustment techniques. These techniques include throttling (reducing the rate of handling) of such requests, coalescing (grouping multiple requests into a single group) the requests, disabling microarchitectural structures (such as caches or branch prediction units) or updates to those structures, and prefetching data for or pre-performing these requests. Each of these adjustment techniques helps to reduce the number of and/or workload associated with servicing requests for system services.
    Type: Application
    Filed: September 6, 2016
    Publication date: March 8, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Joseph L. Greathouse, Guru Prasadh V. Venkataramani, Jan Vesely
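    As a rough illustration of the throttling and coalescing techniques named in this abstract, the Python sketch below rate-limits a batch of service requests and merges requests that share a grouping key. The request format, rate limit, and grouping key are assumptions for the example, not details from the application.
```python
import time
from collections import defaultdict

def throttle_and_coalesce(requests, max_groups_per_second, group_key):
    """Illustrative model: merge requests that share a grouping key
    (e.g. the same page for an invalidation), then pace the merged groups."""
    groups = defaultdict(list)
    for req in requests:                       # coalesce: one group per key
        groups[group_key(req)].append(req)
    serviced = []
    for key, grouped in groups.items():        # throttle: limit the service rate
        serviced.append((key, len(grouped)))
        time.sleep(1.0 / max_groups_per_second)
    return serviced

# Example: coalesce requests that target the same page.
reqs = [{"page": 0x1000}, {"page": 0x1000}, {"page": 0x2000}]
print(throttle_and_coalesce(reqs, max_groups_per_second=100,
                            group_key=lambda r: r["page"]))
```
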
  • Patent number: 9910788
    Abstract: A processor device includes a cache and a memory storing a set of counters. Each counter of the set is associated with a corresponding block of a plurality of blocks of the cache. The processor device further includes a cache access monitor to, for each time quantum for a series of one or more time quanta, increment counter values of the set of counters based on accesses to the corresponding blocks of the cache. The processor device further includes a transfer engine to, after completion of each time quantum, transfer the counter values of the set of counters for the time quantum to a corresponding location in a system memory.
    Type: Grant
    Filed: September 22, 2015
    Date of Patent: March 6, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Philip J. Rogers, Benjamin T. Sander, Anthony Asaro
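    The counter scheme can be pictured with the toy model below: one counter per cache block, incremented on access and transferred to a stand-in for system memory at the end of each time quantum. The quantum boundary, block count, and dump format are illustrative assumptions.
```python
class CacheAccessMonitor:
    """Toy model: one counter per cache block, incremented on access and
    flushed to a 'system memory' buffer at the end of each time quantum."""
    def __init__(self, num_blocks):
        self.counters = [0] * num_blocks
        self.system_memory = []          # stand-in for the per-quantum dump area

    def record_access(self, block_index):
        self.counters[block_index] += 1

    def end_quantum(self):
        # Transfer engine: copy this quantum's counts out, then reset.
        self.system_memory.append(list(self.counters))
        self.counters = [0] * len(self.counters)

mon = CacheAccessMonitor(num_blocks=4)
for blk in (0, 0, 2, 3, 2):
    mon.record_access(blk)
mon.end_quantum()
print(mon.system_memory)   # [[2, 0, 2, 1]]
```
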
  • Patent number: 9910638
    Abstract: Square root operations in a computer processor are disclosed. A first iteration for calculating partial results of a square root operation is performed in a larger number of cycles than remaining iterations. The first iteration requires calculation of a first digit that is larger than the subsequent digits. The first iteration thus requires multiplication of values that are larger than the corresponding values for the subsequent digits. By splitting the first digit into two parts, the required multiplications can be performed in less time than if the first digit were not split. Performing these multiplications in less time reduces the total delay for clock cycles associated with the first digit calculations, which increases the achievable clock frequency. A multiply-and-accumulate unit that performs either packed-single operations or double-precision operations may be used, along with a combined division/square root unit for simultaneous execution of division and square root operations.
    Type: Grant
    Filed: August 25, 2016
    Date of Patent: March 6, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Hanbing Liu, John Kelley, Michael Estlick, Erik Swanson, Jay Fleischman
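    The sketch below is a generic schoolbook digit-by-digit square root, not the patented hardware datapath; it only illustrates the per-digit multiply whose first, widest instance the patent shortens by splitting the first digit into two parts. The radix and digit grouping are assumptions for the example.
```python
def digit_by_digit_isqrt(n, base=10):
    """Schoolbook digit-by-digit integer square root: each iteration finds the
    largest digit d with (2*root*base + d) * d <= remainder, so every digit
    costs a multiplication; the first digit is the widest and therefore the
    slowest, which is the step the patent targets."""
    groups = []
    while n > 0:
        groups.append(n % (base * base))     # split n into base**2-digit groups
        n //= base * base
    groups.reverse()
    root, remainder = 0, 0
    for g in groups:
        remainder = remainder * base * base + g
        d = 0
        while (2 * root * base + d + 1) * (d + 1) <= remainder:
            d += 1
        remainder -= (2 * root * base + d) * d
        root = root * base + d
    return root

print(digit_by_digit_isqrt(152399025))   # 12345
```
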
  • Patent number: 9910714
    Abstract: The described embodiments include a system for executing a load using a first processor and a second processor in a computer system. During operation, a load balancer executing on the first processor obtains one or more attributes of a load to be executed on the computer system. Next, the load balancer applies a set of configurable rules to the one or more attributes to select a processor from the first and second processors for executing the load. Finally, the system executes the load on the selected processor.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: March 6, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Kent F. Knox, Jian Liu
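    A minimal sketch of rule-based processor selection follows; the attribute names, rule predicates, and default target are hypothetical and stand in for the configurable rules described in the abstract.
```python
def select_processor(load_attributes, rules):
    """Apply configurable rules in order; the first rule whose predicate
    matches the load's attributes decides which processor runs the load."""
    for predicate, processor in rules:
        if predicate(load_attributes):
            return processor
    return "cpu"                     # assumed default target if no rule matches

# Hypothetical rule set: large data-parallel loads go to the GPU,
# latency-sensitive loads stay on the CPU.
rules = [
    (lambda a: a.get("data_parallel") and a["size"] > 1_000_000, "gpu"),
    (lambda a: a.get("latency_sensitive"), "cpu"),
]
print(select_processor({"data_parallel": True, "size": 5_000_000}, rules))  # gpu
```
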
  • Patent number: 9910605
    Abstract: A die-stacked hybrid memory device implements a first set of one or more memory dies implementing first memory cell circuitry of a first memory architecture type and a second set of one or more memory dies implementing second memory cell circuitry of a second memory architecture type different than the first memory architecture type. The die-stacked hybrid memory device further includes a set of one or more logic dies electrically coupled to the first and second sets of one or more memory dies, the set of one or more logic dies comprising a memory interface and a page migration manager, the memory interface coupleable to a device external to the die-stacked hybrid memory device, and the page migration manager to transfer memory pages between the first set of one or more memory dies and the second set of one or more memory dies.
    Type: Grant
    Filed: November 16, 2016
    Date of Patent: March 6, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. Jayasena, Gabriel H. Loh, James M. O'Connor, Niladrish Chatterjee
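    The page migration manager can be approximated by the toy model below, which promotes pages to the faster memory dies once their access count crosses a threshold. The threshold, the two page pools, and the promotion-only policy are assumptions, not details from the patent.
```python
class PageMigrationManager:
    """Toy model: pages start in the 'slow' memory dies and are migrated to
    the 'fast' dies once their access count reaches a hot threshold."""
    def __init__(self, hot_threshold=8):
        self.fast, self.slow = set(), set()
        self.access_counts = {}
        self.hot_threshold = hot_threshold

    def touch(self, page):
        self.access_counts[page] = self.access_counts.get(page, 0) + 1
        if page not in self.fast:
            self.slow.add(page)
        if self.access_counts[page] >= self.hot_threshold and page in self.slow:
            self.slow.discard(page)
            self.fast.add(page)       # migrate the hot page to the faster dies

mgr = PageMigrationManager(hot_threshold=3)
for _ in range(3):
    mgr.touch(0x4000)
print(0x4000 in mgr.fast)             # True
```
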
  • Publication number: 20180060124
    Abstract: With the success of programming models such as OpenCL and CUDA, heterogeneous computing platforms are becoming mainstream. However, these heterogeneous systems are low-level, not composable, and their behavior is often implementation-defined even for standardized programming models. In contrast, the method and system embodiments for the heterogeneous parallel primitives (HPP) programming model disclosed herein provide a flexible and composable programming platform that guarantees behavior even when developing high-performance code.
    Type: Application
    Filed: October 30, 2017
    Publication date: March 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Benedict R. Gaster, Lee W. Howes
  • Publication number: 20180060074
    Abstract: Disclosed are a method and a processing device directed to determining global branch history for branch prediction. The method includes shifting first bits of a branch signature into a current global branch history and performing a bitwise exclusive-or (XOR) function on second bits of the branch signature and shifted bits of the current global branch history. In this way, the current global branch history is updated. The processing device implements the method using shift logic configured to store and shift bits representing a current global branch history, a register configured to store the current global branch history, decision circuitry configured to determine whether or not a branch is taken, and XOR gates.
    Type: Application
    Filed: August 30, 2016
    Publication date: March 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Steven R. Havlir
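    A bit-level sketch of the described update follows: low bits of the branch signature are shifted into the history, and a second slice of the signature is XORed with the shifted history. The history width and how the signature splits into the two slices are assumptions for the example.
```python
HISTORY_BITS = 16   # assumed width of the global branch history register

def update_global_history(history, branch_signature, shift_in_bits=2, xor_bits=6):
    """Shift the 'first bits' of the branch signature into the history, then
    XOR the 'second bits' of the signature with the shifted history."""
    first = branch_signature & ((1 << shift_in_bits) - 1)
    second = (branch_signature >> shift_in_bits) & ((1 << xor_bits) - 1)
    shifted = ((history << shift_in_bits) | first) & ((1 << HISTORY_BITS) - 1)
    return shifted ^ second

h = 0
for sig in (0b101101, 0b010011, 0b111000):
    h = update_global_history(h, sig)
print(format(h, f"0{HISTORY_BITS}b"))
```
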
  • Publication number: 20180060039
    Abstract: Square root operations in a computer processor are disclosed. A first iteration for calculating partial results of a square root operation is performed in a larger number of cycles than remaining iterations. The first iteration requires calculation of a first digit that is larger than the subsequent digits. The first iteration thus requires multiplication of values that are larger than the corresponding values for the subsequent digits. By splitting the first digit into two parts, the required multiplications can be performed in less time than if the first digit were not split. Performing these multiplications in less time reduces the total delay for clock cycles associated with the first digit calculations, which increases the achievable clock frequency. A multiply-and-accumulate unit that performs either packed-single operations or double-precision operations may be used, along with a combined division/square root unit for simultaneous execution of division and square root operations.
    Type: Application
    Filed: August 25, 2016
    Publication date: March 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Hanbing Liu, John Kelley, Michael Estlick, Erik Swanson, Jay Fleischman
  • Publication number: 20180060073
    Abstract: Techniques for improving branch target buffer (“BTB”) operation are disclosed. A compressed BTB is included within a branch prediction unit along with an uncompressed BTB. To support prediction of up to two branch instructions per cycle, the uncompressed BTB includes entries that each store data for up to two branch predictions. The compressed BTB includes entries that store data for only a single branch instruction for situations where storing that single branch instruction in the uncompressed BTB would waste space in that buffer. Space would be wasted in the uncompressed BTB because, to support two branch lookups per cycle, prediction data for two branches must share certain features (such as cache line address) in order to be stored together in a single entry.
    Type: Application
    Filed: August 30, 2016
    Publication date: March 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Steven R. Havlir
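    The allocation decision can be illustrated with the toy policy below: a branch that can pair with another branch from the same cache line fills a two-branch uncompressed entry, while a lone branch goes to the compressed BTB so it does not occupy half of a two-branch entry. The pairing bookkeeping shown here is an assumption, not the patented mechanism.
```python
def allocate_btb_entry(new_branch, pending_by_line):
    """new_branch is (cache_line, target); pending_by_line holds branches still
    waiting for a same-cache-line partner. Returns which structure receives
    the prediction data."""
    line = new_branch[0]
    if line in pending_by_line:
        partner = pending_by_line.pop(line)
        return ("uncompressed", (partner, new_branch))   # two branches share one entry
    pending_by_line[line] = new_branch
    return ("compressed", (new_branch,))                 # single-branch entry

pending = {}
print(allocate_btb_entry((0x40, 0x100), pending))   # compressed, no partner yet
print(allocate_btb_entry((0x40, 0x180), pending))   # uncompressed pair, same cache line
```
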
  • Patent number: 9903913
    Abstract: A capture clock generation control mechanism is provided. The capture clock generation control mechanism controls the number of at-speed clocks generated and supplied to one or more scan chains during scan testing of a microcircuit based on control data stored in a JTAG or scan test register. The scan test register may be formed out of scan cells and comprise part of a scan chain. Automatic Test Pattern Generation (ATPG) tools may generate the data that is loaded into the scan test register to automatically configure the clock generation control mechanism. The clock control mechanism may include the ability to adjust the position of the at-speed clocks within a capture cycle, thereby facilitating transition fault detection.
    Type: Grant
    Filed: January 20, 2014
    Date of Patent: February 27, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Atchyuth Gorti, Anirudh Kadiyala, Bill K. C. Kwan, Venkat Kuchipudi
  • Patent number: 9904623
    Abstract: A system includes a functional unit, at least one cache coupled to the functional unit, and a power management unit coupled to the functional unit and the at least one cache, the power management unit configured to trigger the functional unit to initiate prefetching of data to repopulate the at least one cache prior to a predicted exit of the functional unit from an idle mode to an active mode. The system may further include a prediction unit to predict the exit from the idle mode for the functional unit as occurring a predetermined duration from an entry into the idle mode. The prediction unit may determine the predetermined duration based on a history of idle mode durations indicative of durations of previous instances in which the functional unit was in the idle mode.
    Type: Grant
    Filed: May 1, 2015
    Date of Patent: February 27, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Madhu Saravana Sibi Govindan, William Lloyd Bircher, Aniruddha Dasgupta, Dongyuan Zhan
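    A minimal sketch of the idle-exit prediction follows: the next idle duration is estimated from a short history of past idle durations, and a prefetch is scheduled a little before the predicted exit. The history depth, averaging rule, and lead time are illustrative assumptions.
```python
from collections import deque

class IdleExitPredictor:
    """Toy predictor: the next idle duration is estimated as the average of
    recent idle durations, and a cache-repopulating prefetch is scheduled
    shortly before the predicted exit."""
    def __init__(self, history_len=8, prefetch_lead=0.2):
        self.history = deque(maxlen=history_len)
        self.prefetch_lead = prefetch_lead

    def record_idle(self, duration):
        self.history.append(duration)

    def prefetch_time_after_idle_entry(self):
        if not self.history:
            return None                       # no prediction available yet
        predicted = sum(self.history) / len(self.history)
        return max(0.0, predicted - self.prefetch_lead)

p = IdleExitPredictor()
for d in (1.0, 1.2, 0.8):
    p.record_idle(d)
print(p.prefetch_time_after_idle_entry())     # ~0.8 (1.0 average minus the lead)
```
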
  • Publication number: 20180052779
    Abstract: A data cache region prefetcher creates a region when a data cache miss occurs. Each region includes a predetermined range of data lines proximate to each data cache miss and is tagged with an associated instruction pointer register (RIP). The data cache region prefetcher compares subsequent memory requests against the predetermined range of data lines for each of the existing regions. For each match, the data cache region prefetcher sets an access bit and attempts to identify a pseudo-random access pattern based on the set access bits. The data cache region prefetcher increments or decrements appropriate counters to track how often the pseudo-random access pattern occurs. If the pseudo-random access pattern occurs frequently, then the next time a memory request is processed with the same RIP and pattern, the data cache region prefetcher prefetches the data lines in accordance with the pseudo-random access pattern for that RIP.
    Type: Application
    Filed: October 13, 2016
    Publication date: February 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Donald W. McCauley, William E. Jones
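    The region tracking can be pictured with the toy model below: a miss opens a region tagged with the requesting RIP, later accesses in the region set access bits, and a counter per (RIP, pattern) pair triggers prefetching once the pattern recurs often enough. The region size and confidence threshold are assumptions for the example.
```python
from collections import defaultdict

REGION_LINES = 8    # assumed region size in cache lines

class RegionPrefetcher:
    """Toy model: regions opened on a miss, access bits set on later hits,
    and per-(RIP, pattern) counters that gate prefetching."""
    def __init__(self, threshold=2):
        self.regions = {}                         # base line -> (rip, access bits)
        self.pattern_counts = defaultdict(int)    # (rip, pattern) -> occurrence count
        self.threshold = threshold

    def on_miss(self, line, rip):
        self.regions[line] = (rip, 0)             # open a region around the miss

    def on_access(self, line, rip):
        for base, (r, bits) in list(self.regions.items()):
            if r != rip or not (0 <= line - base < REGION_LINES):
                continue
            bits |= 1 << (line - base)            # record the access pattern
            self.regions[base] = (r, bits)
            self.pattern_counts[(r, bits)] += 1
            if self.pattern_counts[(r, bits)] >= self.threshold:
                # Pattern seen often enough: prefetch the lines it covers.
                return [base + i for i in range(REGION_LINES) if bits >> i & 1]
        return []

rp = RegionPrefetcher(threshold=2)
rp.on_miss(100, rip=0x401000)
rp.on_access(101, rip=0x401000)
print(rp.on_access(101, rip=0x401000))   # [101] once the pattern recurs
```
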
  • Publication number: 20180052613
    Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit are disclosed. The system and method include storing data in one or more memory-dependent architectural register numbers (MdArns), allocating the one or more MdArns to a MEMFILE, and writing the allocated MdArns to a map file, wherein the map file contains an MdArn map to enable subsequent access to an entry in the MEMFILE. Upon receipt of a load request, a base, an index, a displacement, and a match/hit are checked via the map file to identify an entry in the MEMFILE and an associated store; on a hit, the entry is provided responsive to the load request from the one or more MdArns.
    Type: Application
    Filed: December 15, 2016
    Publication date: February 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Betty Ann McDaniel, Michael D. Achenbach, David N. Suggs, Frank C. Galloway, Kai Troester, Krishnan V. Ramani
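    A toy model of the MEMFILE lookup follows: a store's base, index, and displacement are recorded against an MdArn, and a later load with the same components receives the stored data without going through the load/store unit. The table layout and data payload are illustrative assumptions.
```python
class Memfile:
    """Toy store-to-load bypass: a map file keyed by (base, index, displacement)
    points at an MdArn whose data can satisfy a matching load directly."""
    def __init__(self):
        self.map_file = {}          # (base, index, disp) -> MdArn
        self.entries = {}           # MdArn -> stored data

    def record_store(self, base, index, disp, mdarn, data):
        self.map_file[(base, index, disp)] = mdarn
        self.entries[mdarn] = data

    def lookup_load(self, base, index, disp):
        mdarn = self.map_file.get((base, index, disp))
        if mdarn is None:
            return None             # miss: the load goes to the load/store unit
        return self.entries[mdarn]  # hit: bypass, forward the stored data

m = Memfile()
m.record_store(base="rbx", index="rsi", disp=16, mdarn=3, data=0xDEAD)
print(hex(m.lookup_load("rbx", "rsi", 16)))   # 0xdead
```
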
  • Publication number: 20180052778
    Abstract: A processing apparatus and a method of accessing data using cache hot set detection are provided that include receiving a plurality of requests to access data in a cache. The cache includes a plurality of cache sets each including N number of cache lines. Each request includes an address. The apparatus and method also include storing, in an HSVC array, cache line victims of one or more of the plurality of cache sets determined to be hot sets. Each cache line victim includes a corresponding address that is determined, using an HSD array, to belong to the one or more determined cache hot sets based on a hot set frequency of a plurality of addresses mapped to the set in the cache.
    Type: Application
    Filed: August 22, 2016
    Publication date: February 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Adithya Yalavarti, Johnsy Kanjirapallil John
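    The hot-set handling can be approximated with the sketch below: per-set access counts play the role of the HSD array, and lines evicted from sets whose count crosses a threshold are retained in a small victim store playing the role of the HSVC array. The threshold and victim-store capacity are assumptions.
```python
from collections import defaultdict, OrderedDict

class HotSetVictimCache:
    """Toy model: count accesses per cache set; victims evicted from sets whose
    count crosses the hot threshold are kept in a small FIFO victim store."""
    def __init__(self, hot_threshold=2, capacity=8):
        self.set_counts = defaultdict(int)
        self.victims = OrderedDict()
        self.hot_threshold = hot_threshold
        self.capacity = capacity

    def on_access(self, set_index):
        self.set_counts[set_index] += 1

    def on_evict(self, set_index, address, data):
        if self.set_counts[set_index] >= self.hot_threshold:
            self.victims[address] = data
            if len(self.victims) > self.capacity:
                self.victims.popitem(last=False)   # drop the oldest victim

    def lookup(self, address):
        return self.victims.get(address)

hsvc = HotSetVictimCache(hot_threshold=2)
hsvc.on_access(5)
hsvc.on_access(5)
hsvc.on_evict(5, address=0x80, data="line")
print(hsvc.lookup(0x80))    # 'line'
```
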
  • Publication number: 20180052631
    Abstract: A method and apparatus of compressing addresses for transmission includes receiving a transaction at a first device from a source that includes a memory address request for a memory location on a second device. It is determined whether a first part of the memory address is stored in a cache located on the first device. If the first part of the memory address is not stored in the cache, the first part of the memory address is stored in the cache and the entire memory address and information relating to the storage of the first part are transmitted to the second device. If the first part of the memory address is stored in the cache, only a second part of the memory address and an identifier that indicates the cache way in which the first part of the address is stored are transmitted to the second device.
    Type: Application
    Filed: November 8, 2016
    Publication date: February 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Greggory D. Donley
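    A minimal sketch of the link-address compression follows: the upper part of each address is cached at the sender, so on a hit only the lower part and a way identifier travel on the link, while on a miss the full address is sent and installed. The split point, way count, and replacement order are assumptions for the example.
```python
class AddressCompressor:
    """Toy sender-side model: cache the upper address bits in a few ways; a hit
    sends only the way index plus the lower bits, a miss sends the full address."""
    UPPER_SHIFT = 20     # assumed split between the first and second address parts
    NUM_WAYS = 4

    def __init__(self):
        self.ways = [None] * self.NUM_WAYS
        self.next_victim = 0

    def compress(self, address):
        upper = address >> self.UPPER_SHIFT
        lower = address & ((1 << self.UPPER_SHIFT) - 1)
        if upper in self.ways:
            return ("hit", self.ways.index(upper), lower)   # way id + lower bits only
        way = self.next_victim
        self.ways[way] = upper
        self.next_victim = (way + 1) % self.NUM_WAYS
        return ("miss", way, address)                       # full address + way installed

c = AddressCompressor()
print(c.compress(0x12345678))    # miss: full address goes out
print(c.compress(0x12349ABC))    # hit: only the way index and low 20 bits go out
```
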
  • Patent number: 9899074
    Abstract: A data processing system includes a memory channel and a data processor coupled to the memory channel. The data processor is adapted to access at least one rank and has refresh logic. In response to an activation of the refresh logic, the data processor generates refresh cycles to a bank of the memory channel. The data processor selects one of a first state corresponding to a first auto-refresh command that causes the data processor to auto-refresh the bank, and a second state corresponding to a second auto-refresh command that causes the data processor to auto-refresh a selected subset of the bank. The data processor initiates a switch between the first state and the second state in response to the refresh logic detecting a first condition related to the bank, and between the second state and the first state in response to the refresh logic detecting a second condition.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: February 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Kedarnath Balakrishnan
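    The switching behavior can be illustrated with the toy policy below, which refreshes the whole bank while it is idle and falls back to refreshing a subset of the bank when requests are queued against it. The patent only requires two switching conditions; the specific conditions used here are assumptions.
```python
def next_refresh_state(state, bank_idle, pending_requests):
    """Toy policy: full-bank auto-refresh while idle, subset auto-refresh while
    real requests are waiting, so refresh steals less time from accesses."""
    if state == "refresh_all" and pending_requests > 0:
        return "refresh_subset"      # first condition: traffic arrived
    if state == "refresh_subset" and bank_idle:
        return "refresh_all"         # second condition: bank went idle again
    return state

state = "refresh_all"
state = next_refresh_state(state, bank_idle=False, pending_requests=3)
print(state)    # refresh_subset
```
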
  • Patent number: 9898568
    Abstract: Systems, apparatuses, and methods for reducing the load on the bitlines of a ROM bitcell array are described. The connections between nets of a ROM bitcell array may be assigned based on their programmed values using a traditional approach. Then, a plurality of optimizations may be performed on the assignment of nets to reduce the load on the bitlines of the array. A first optimization may swap the connections between ground and bitline for the nets of a given column responsive to detecting that the number of connections to the corresponding bitline is greater than the number of connections to ground for the given column. A second optimization may remove the connection of a net to a bitline if three consecutive nets of a given column are connected to the bitline.
    Type: Grant
    Filed: June 23, 2015
    Date of Patent: February 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Naveen Chandra Srivastava, Janardhan Achanta, Pankaj Kumar, Shreekanth Karandoor Sampigethaya
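    The first optimization can be sketched as below: if more nets in a column connect to the bitline than to ground, the column's connections are swapped so the bitline carries the lighter load. The column representation is an assumption; the three-consecutive-nets optimization is not modeled.
```python
def optimize_rom_column(column_bits):
    """Toy version of the first optimization: a 1 means the net connects to the
    bitline, a 0 means it connects to ground. If bitline connections outnumber
    ground connections, swap the column's polarity and record the swap."""
    bitline_connections = sum(column_bits)
    ground_connections = len(column_bits) - bitline_connections
    if bitline_connections > ground_connections:
        return [1 - b for b in column_bits], True    # swapped column, flag set
    return list(column_bits), False

print(optimize_rom_column([1, 1, 1, 0]))   # ([0, 0, 0, 1], True)
```
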
  • Patent number: 9899992
    Abstract: A circuit adapts to the occurrence of metastable states. The circuit inhibits passing of the metastable state to circuits that follow, by clock gating the output stage. In order to determine whether or not to gate the clock of the output stage, two detect circuits may be used. One circuit detects metastability and another circuit detects metastability resolved to a wrong logic level. The results from one or both detector circuits are used to gate the next clock cycle if needed, waiting for the metastable situation to be resolved.
    Type: Grant
    Filed: August 17, 2016
    Date of Patent: February 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Greg Sadowski
  • Patent number: 9898287
    Abstract: A method, a non-transitory computer readable medium, and a processor for repacking dynamic wavefronts during program code execution on a processing unit, each dynamic wavefront including multiple threads, are presented. If a branch instruction is detected, a determination is made whether all wavefronts following a same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point, which is a beginning of a program code segment to be executed by both a taken branch and a not taken branch from a previous branch instruction. The dynamic wavefronts are repacked with all threads that follow the same control path, if all wavefronts following the same control path have reached the branch instruction or the reconvergence point.
    Type: Grant
    Filed: April 9, 2015
    Date of Patent: February 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sooraj Puthoor, Bradford M. Beckmann, Dmitri Yudanov
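    A toy repacking step is sketched below: once the wavefronts on a control path reach the compaction or reconvergence point, their threads are regrouped so each new wavefront contains only threads following the same path. The wavefront size and thread labeling are assumptions for the example.
```python
from collections import defaultdict

WAVEFRONT_SIZE = 4    # assumed threads per wavefront for the example

def repack_wavefronts(threads):
    """Toy repacking: 'threads' maps thread id -> control path label; threads
    on the same path are packed together into new wavefronts."""
    by_path = defaultdict(list)
    for tid, path in threads.items():
        by_path[path].append(tid)
    wavefronts = []
    for path, tids in by_path.items():
        for i in range(0, len(tids), WAVEFRONT_SIZE):
            wavefronts.append((path, tids[i:i + WAVEFRONT_SIZE]))
    return wavefronts

threads = {0: "taken", 1: "not_taken", 2: "taken", 3: "taken", 4: "not_taken"}
print(repack_wavefronts(threads))
```
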
  • Publication number: 20180046583
    Abstract: Techniques for improving translation lookaside buffer (TLB) operation are disclosed. A particular entry of the TLB is to be updated with data associated with a large page size. The TLB updates replacement policy data for that TLB entry for that large page size to indicate that the TLB entry is not the least-recently-used. To prevent smaller pages from evicting the TLB entry for the large page size, the TLB also updates replacement policy data for that TLB entry for the smaller page size to indicate that the TLB entry is not the least-recently-used.
    Type: Application
    Filed: August 12, 2016
    Publication date: February 15, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Anthony J. Bybell, John M. King
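    The replacement-policy update can be pictured with the toy model below: filling an entry with a large-page translation marks it most recently used in the replacement state for both the large and the small page size, so small-page fills will not select it as the LRU victim. Two page sizes and a per-size LRU list are assumptions for the example.
```python
class TlbLru:
    """Toy model: one LRU recency list per page size (index 0 = least recently
    used). A large-page fill is marked most recently used in both lists so that
    small-page activity cannot evict it."""
    def __init__(self, num_entries=4):
        self.lru = {"4K": list(range(num_entries)), "2M": list(range(num_entries))}

    def touch(self, entry, page_size):
        order = self.lru[page_size]
        order.remove(entry)
        order.append(entry)                  # move to most-recently-used position

    def fill_large_page(self, entry):
        self.touch(entry, "2M")
        self.touch(entry, "4K")              # protect the entry from small-page evictions

tlb = TlbLru()
tlb.fill_large_page(2)
print(tlb.lru["4K"][0] != 2 and tlb.lru["2M"][-1] == 2)   # True
```
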