Patents Assigned to Advanced Micro Devices
  • Patent number: 10515173
    Abstract: An electronic device includes a first integrated circuit chip including a processing functional block, and a second integrated circuit chip including an input-output (IO) functional block. The IO functional block performs one or more IO processing operations on behalf of the processing functional block in the first integrated circuit chip. The first integrated circuit chip lacks at least some elements of the IO functional block, so that the processing functional block is unable to perform corresponding IO operations without the IO functional block.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: December 24, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: David A. Roberts, Dean Gonzales
  • Publication number: 20190384722
    Abstract: A data processing system includes a memory, a group of input/output (I/O) devices, and an input/output memory management unit (IOMMU). The IOMMU is connected to the memory and adapted to allocate a hardware resource, from among a group of hardware resources, to receive an address translation request for a memory access from an I/O device. The IOMMU detects address translation requests from the group of I/O devices. The IOMMU reorders the address translation requests such that the order of dispatching an address translation request is based on a policy associated with the I/O device that is requesting the memory access. In response to the reordering, the IOMMU selectively allocates a hardware resource to the I/O device based on the policy associated with that I/O device.
    Type: Application
    Filed: June 13, 2018
    Publication date: December 19, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Michael LeBeane, Eric Van Tassell
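A minimal Python sketch of the policy-driven reordering described in the abstract above. The device names, numeric priorities, and the `Iommu` class are invented for illustration; the patent does not specify the policy's form.

```python
import heapq
from itertools import count

# Hypothetical per-device dispatch policy: lower value = dispatched sooner.
DEVICE_PRIORITY = {"nic0": 0, "gpu0": 1, "usb0": 2}

class Iommu:
    def __init__(self, num_translation_slots):
        self.free_slots = num_translation_slots  # pool of translation hardware
        self.pending = []                        # heap of reordered requests
        self._arrival = count()                  # keeps FIFO order per device

    def receive(self, device, virt_addr):
        # Requests are keyed by (policy, arrival), so the heap yields them in
        # policy order rather than plain arrival order.
        key = (DEVICE_PRIORITY.get(device, 99), next(self._arrival))
        heapq.heappush(self.pending, (key, device, virt_addr))

    def dispatch(self):
        # Selectively allocate a hardware slot to the best pending request.
        while self.pending and self.free_slots > 0:
            _, device, virt_addr = heapq.heappop(self.pending)
            self.free_slots -= 1
            print(f"translating {hex(virt_addr)} for {device}")

iommu = Iommu(num_translation_slots=2)
iommu.receive("usb0", 0x1000)
iommu.receive("gpu0", 0x2000)
iommu.receive("nic0", 0x3000)
iommu.dispatch()  # nic0 and gpu0 are served first despite arriving later
```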
  • Patent number: 10509736
    Abstract: An input-output (IO) memory management unit (IOMMU) uses a reverse map table (RMT) to ensure that address translations acquired from a nested page table are correct and that IO devices are permitted to access pages in a memory when performing memory accesses in a computing device. A translation lookaside buffer (TLB) flushing mechanism is used to invalidate address translation information in TLBs that are affected by changes in the RMT. A modified Address Translation Caching (ATC) mechanism may be used, in which only partial address translation information is provided to IO devices so that the RMT is checked when performing memory accesses for the IO devices using the cached address translation information.
    Type: Grant
    Filed: April 10, 2018
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Nippon Raval, David A. Kaplan, Philip Ng
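A toy model of the reverse-map-table check from the abstract above, assuming a simplified RMT that records, per system physical page, which guest and guest-physical page it belongs to. The table layout and function names are illustrative, not the patent's actual structures.

```python
# Toy reverse map table: system physical page -> (guest id, guest physical page).
rmt = {
    0x100: ("guest_a", 0x10),
    0x101: ("guest_a", 0x11),
    0x102: ("guest_b", 0x10),
}

def nested_translate(guest_id, guest_phys_page):
    # Stand-in for a nested page table walk (hypothetical fixed mapping;
    # the 0x11 entry below is deliberately wrong to show the check firing).
    table = {("guest_a", 0x10): 0x100, ("guest_a", 0x11): 0x102}
    return table.get((guest_id, guest_phys_page))

def iommu_access(guest_id, guest_phys_page):
    sys_page = nested_translate(guest_id, guest_phys_page)
    if sys_page is None:
        return "fault: no translation"
    # The RMT check: the translation is honored only if the RMT agrees that
    # this system page really maps back to this guest at this guest address.
    if rmt.get(sys_page) != (guest_id, guest_phys_page):
        return f"fault: RMT mismatch for page {hex(sys_page)}"
    return f"access to {hex(sys_page)} permitted"

print(iommu_access("guest_a", 0x10))  # permitted
print(iommu_access("guest_a", 0x11))  # RMT mismatch: tampered translation
```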
  • Patent number: 10509452
    Abstract: Techniques for managing power distribution amongst processors in a massively parallel computer architecture are disclosed. The techniques utilize a hierarchy that organizes the various processors of the massively parallel computer architecture. The hierarchy groups the processors into sets at the lowest level. When processors complete tasks, the power assigned to those processors is distributed to other processors in the same group so that the performance of those processors can be increased. Hierarchical organization simplifies the calculations required for determining how and when to distribute power, because when tasks are complete and power is available for distribution, only a relatively small number of processors need to be considered as candidates to receive that power. The number of processors that are grouped together can be adjusted in real time based on performance factors to improve the trade-off between calculation speed and power distribution efficacy.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Xinwei Chen, Leonardo de Paula Rosa Piga
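A sketch of the group-local redistribution idea from the abstract above, assuming equal sharing among the still-busy members of a group; the sharing rule and wattages are invented for illustration.

```python
# When a processor finishes, its power budget is split among the still-busy
# members of the same group only, keeping the redistribution calculation local.
class Processor:
    def __init__(self, name, power_watts):
        self.name = name
        self.power = power_watts
        self.busy = True

def finish_task(group, proc):
    proc.busy = False
    peers = [p for p in group if p.busy]
    if not peers:
        return  # nothing to boost; budget returns to a higher hierarchy level
    share = proc.power / len(peers)
    for p in peers:
        p.power += share  # raise the power (and performance) of remaining peers
    proc.power = 0.0

group = [Processor(f"cpu{i}", 10.0) for i in range(4)]
finish_task(group, group[0])
print([(p.name, p.power) for p in group])
# cpu0 drops to 0 W; cpu1..cpu3 each rise to ~13.3 W
```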
  • Patent number: 10509596
    Abstract: A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Dmitri Yudanov, Jiasheng Chen
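A behavioral sketch of the local/non-local routing decision in the abstract above. The execution-unit names and address ranges are invented; the patent describes the mechanism, not these values.

```python
# An execution unit's access is either "local" (served directly through TSVs
# coordinated by the control die) or "non-local" (routed through the
# distributed cache fabric / interconnect bus).
LOCAL_RANGES = {
    "eu0": range(0x0000, 0x4000),
    "eu1": range(0x4000, 0x8000),
}

def access(execution_unit, addr):
    if addr in LOCAL_RANGES[execution_unit]:
        # Control die activates the TSVs for the target cells; no bus used.
        return f"{execution_unit}: direct TSV access to {hex(addr)}"
    # Otherwise the request travels over the interconnect bus in the control die.
    return f"{execution_unit}: fabric/bus access to {hex(addr)}"

print(access("eu0", 0x0100))  # local, direct
print(access("eu0", 0x5000))  # non-local, via fabric
```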
  • Patent number: 10510185
    Abstract: A technique for performing rasterization and pixel shading with decoupled resolution is provided herein. The technique involves performing rasterization as normal to generate fine rasterization data and a set of (fine) quads. The quads are accumulated into a tile buffer and coarse quads are generated from the quads in the tile buffer based on a shading rate. The shading rate determines how many pixels of the fine quads are combined to generate coarse pixels of the coarse quads. Combining fine pixels involves generating a single coarse pixel for each group of fine pixels to be combined. The positions of the coarse pixels of the coarse quads are set based on the positions of the corresponding fine pixels. The coarse quads are shaded normally and the resulting shaded coarse quads are modified based on the fine rasterization data to generate shaded fine quads.
    Type: Grant
    Filed: August 25, 2017
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Skyler Jonathon Saleh, Christopher J. Brennan, Andrew S. Pomianowski, Ruijin Wu
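A minimal sketch of the shading-rate idea from the abstract above: a tile of fine pixels is shaded at a coarser rate, and each shaded coarse color is written back to the fine pixels it covers. The shader function is a placeholder and the tile/rate values are illustrative.

```python
def shade(x, y):
    return (x * 16, y * 16, 128)  # placeholder pixel shader

def shade_tile(tile_w, tile_h, rate):
    # rate = how many fine pixels (per axis) share one coarse pixel
    fine = [[None] * tile_w for _ in range(tile_h)]
    for cy in range(0, tile_h, rate):
        for cx in range(0, tile_w, rate):
            color = shade(cx, cy)        # one shader invocation per coarse pixel
            for dy in range(rate):       # broadcast to the covered fine pixels
                for dx in range(rate):
                    fine[cy + dy][cx + dx] = color
    return fine

tile = shade_tile(4, 4, rate=2)   # 4 shader invocations instead of 16
for row in tile:
    print(row)
```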
  • Patent number: 10509732
    Abstract: A cache controller applies an aging policy to a portion of a cache based on access metrics for different test regions of the cache, whereby each test region implements a different aging policy. The aging policy for each region establishes an initial age value for each entry of the cache, and a particular aging policy can set the age for a given entry based on whether the entry was placed in the cache in response to a demand request from a processor core or in response to a prefetch request. The cache controller can use the age value of each entry as a criterion in its cache replacement policy.
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Paul Moyer
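A sketch of age-based replacement with distinct initial ages for demand and prefetch fills, as described above. The specific age values, and the policy of evicting the oldest entry, are invented for illustration; the patent leaves the replacement criterion open.

```python
class CacheSet:
    def __init__(self, num_ways, demand_init_age, prefetch_init_age):
        self.entries = [None] * num_ways       # each entry: [tag, age]
        self.demand_init_age = demand_init_age
        self.prefetch_init_age = prefetch_init_age

    def age_all(self):
        # Periodic aging of every valid entry.
        for e in self.entries:
            if e is not None:
                e[1] += 1

    def fill(self, tag, is_prefetch):
        # Initial age depends on whether the fill was a demand or a prefetch.
        age = self.prefetch_init_age if is_prefetch else self.demand_init_age
        if None in self.entries:
            self.entries[self.entries.index(None)] = [tag, age]
            return None
        # Replacement policy: evict the oldest entry.
        victim = max(range(len(self.entries)), key=lambda i: self.entries[i][1])
        evicted = self.entries[victim][0]
        self.entries[victim] = [tag, age]
        return evicted

# Hypothetical test-region setting: prefetches start "older", so they are
# evicted sooner if they go unused.
s = CacheSet(num_ways=2, demand_init_age=0, prefetch_init_age=3)
s.fill(0xA, is_prefetch=False)
s.fill(0xB, is_prefetch=True)
s.age_all()
print(hex(s.fill(0xC, is_prefetch=False)))  # evicts 0xB, the aged prefetch
```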
  • Patent number: 10509752
    Abstract: A data processing system includes a processing unit that forms a base die, has a group of through-silicon vias (TSVs), and is connected to a memory system. The memory system includes a die stack that includes a first die and a second die. The first die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads. The group of micro-bump landing pads is connected to the group of TSVs of the processing unit using a corresponding group of micro-bumps. The first die has a group of memory die TSVs. The second die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads connected to the group of memory die TSVs of the first die. The first die communicates with the processing unit using first cycle timing, and with the second die using second cycle timing.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Russell Schreiber, John Wuu, Michael K. Ciraula, Patrick J. Shyvers
  • Patent number: 10510721
    Abstract: Various molded chip combinations and methods of manufacturing the same are disclosed. In one aspect, a molded chip combination is provided that includes a first semiconductor chip that has a first PHY region, a second semiconductor chip that has a second PHY region, an interconnect chip interconnecting the first PHY region to the second PHY region, and a molding joining together the first semiconductor chip, the second semiconductor chip and the interconnect chip.
    Type: Grant
    Filed: August 11, 2017
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Milind S. Bhagavat, Lei Fu, Ivor Barber, Chia-Ken Leong, Rahul Agarwal
  • Patent number: 10510164
    Abstract: A processing unit, method, and medium for decompressing or generating textures within a graphics processing unit (GPU). The textures are compressed with a variable-rate compression scheme such as JPEG. The compressed textures are retrieved from system memory and transferred to local cache memory on the GPU without first being decompressed. A table is utilized by the cache to locate individual blocks within the compressed texture. A decompressing shader processor receives compressed blocks and then performs on-the-fly decompression of the blocks. The decompressed blocks are then processed as usual by a texture consuming shader processor of the GPU.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Konstantine Iourcha, John W. Brothers
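A sketch of the block-offset-table idea from the abstract above, using zlib as a stand-in for the variable-rate codec (the patent names JPEG-like schemes). Because each block compresses to a different size, a table of offsets lets a single block be located and decompressed on demand.

```python
import zlib

BLOCK_TEXELS = 16

def compress_texture(texels):
    table, blob = [], b""
    for i in range(0, len(texels), BLOCK_TEXELS):
        comp = zlib.compress(bytes(texels[i:i + BLOCK_TEXELS]))
        table.append(len(blob))      # offset of this block in the stream
        blob += comp
    table.append(len(blob))          # end sentinel
    return table, blob

def fetch_block(table, blob, block_idx):
    # "Decompressing shader" step: locate the block via the table, then
    # decompress just that block on the fly.
    start, end = table[block_idx], table[block_idx + 1]
    return list(zlib.decompress(blob[start:end]))

texels = [i % 251 for i in range(64)]                # 4 blocks of dummy texels
table, blob = compress_texture(texels)
print(fetch_block(table, blob, 2) == texels[32:48])  # True
```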
  • Patent number: 10503658
    Abstract: The present disclosure is directed to techniques for migrating data between heterogeneous memories in a computing system. More specifically, the techniques involve migrating data between a memory having better access characteristics (e.g., lower latency but smaller capacity) and a memory having worse access characteristics (e.g., higher latency but greater capacity). Migrations occur with a variable migration granularity. A migration granularity specifies a number of memory pages, having virtual addresses that are contiguous in virtual address space, that are migrated in a single migration operation. A history-based technique that adjusts migration granularity based on the history of memory utilization by an application is provided. A profiling-based technique that adjusts migration granularity based on a profiling operation is also provided.
    Type: Grant
    Filed: April 27, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Jee Ho Ryoo
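A sketch of the history-based adjustment from the abstract above: if migrated runs of contiguous pages are mostly used, grow the granularity; if they mostly go unused, shrink it. The thresholds and the utilization metric are invented for illustration.

```python
class Migrator:
    GRANULARITIES = [1, 2, 4, 8, 16]   # pages per migration operation

    def __init__(self):
        self.level = 0
        self.migrated = set()
        self.used = set()

    @property
    def granularity(self):
        return self.GRANULARITIES[self.level]

    def migrate(self, base_page):
        # Contiguous virtual pages move together in one operation.
        pages = range(base_page, base_page + self.granularity)
        self.migrated.update(pages)
        return list(pages)

    def on_access(self, page):
        if page in self.migrated:
            self.used.add(page)

    def adjust(self):
        if not self.migrated:
            return
        utilization = len(self.used) / len(self.migrated)
        if utilization > 0.75 and self.level < len(self.GRANULARITIES) - 1:
            self.level += 1            # history says big moves pay off
        elif utilization < 0.25 and self.level > 0:
            self.level -= 1            # most migrated pages went unused
        self.migrated.clear()
        self.used.clear()

m = Migrator()
m.migrate(100)
m.on_access(100)
m.adjust()
print(m.granularity)   # grew from 1 to 2: the whole migrated run was used
```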
  • Patent number: 10503648
    Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
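A sketch of the prediction unit from the abstract above: a table keyed by address region remembers which node last cached data there, and a matching request triggers a speculative probe in parallel with the memory path. The region granularity and table structure are invented.

```python
REGION_BITS = 12   # predict at 4 KiB-region granularity (illustrative)

class PredictionUnit:
    def __init__(self):
        self.owner_table = {}   # region -> node id of the last known cacher

    def observe_fill(self, addr, node):
        self.owner_table[addr >> REGION_BITS] = node

    def on_request(self, addr, requester):
        predicted = self.owner_table.get(addr >> REGION_BITS)
        actions = [f"forward request for {hex(addr)} toward memory"]
        if predicted is not None and predicted != requester:
            # A correct prediction returns the data before the memory access
            # would have completed; a wrong one costs only the extra probe.
            actions.append(f"speculative probe to node {predicted}")
        return actions

pu = PredictionUnit()
pu.observe_fill(0x4000_2000, node=2)
print(pu.on_request(0x4000_2ABC, requester=0))
```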
  • Patent number: 10503669
    Abstract: Described is a method and apparatus for application migration between a dockable device and a docking station in a seamless manner. The dockable device includes a processor and the docking station includes a high-performance processor. The method includes determining a docking state of a dockable device while at least one application is running. Application migration from the dockable device to a docking station is initiated when the dockable device is moving to a docked state. Application migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run during the application migration from the dockable device to the docking station or during the application migration from the docking station to the dockable device.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jonathan Lawrence Campbell, Yuping Shen
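A minimal state-machine sketch of the dock-triggered flow above. All names are illustrative; the abstract only specifies that migration starts on a docking-state change and that the application keeps running throughout.

```python
class Session:
    def __init__(self):
        self.location = "dockable_device"   # where the app currently runs
        self.running = True

    def on_dock_event(self, docked):
        target = "docking_station" if docked else "dockable_device"
        if target != self.location:
            self.migrate(target)

    def migrate(self, target):
        # State is transferred while the application continues to execute;
        # only the final switch-over changes where work is performed.
        assert self.running
        print(f"migrating live from {self.location} to {target}")
        self.location = target

s = Session()
s.on_dock_event(docked=True)    # moving to docked state -> migrate to station
s.on_dock_event(docked=False)   # undocking -> migrate back
```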
  • Patent number: 10503655
    Abstract: The described embodiments include a computing device that caches data acquired from a main memory in a high-bandwidth memory (HBM), the computing device including channels for accessing data stored in corresponding portions of the HBM. During operation, the computing device configures each of the channels so that data blocks stored in the corresponding portion of the HBM contain a given number of cache lines. Based on records of accesses of cache lines in the HBM that were acquired from pages in the main memory, the computing device sets a data block size for each of the pages, the data block size being a number of cache lines. The computing device stores, in the HBM, data blocks acquired from each of the pages in the main memory using a channel whose data block size corresponds to the data block size set for that page.
    Type: Grant
    Filed: July 21, 2016
    Date of Patent: December 10, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Mitesh R. Meswani, Jee Ho Ryoo
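A sketch of choosing a per-page block size (in cache lines) from access records, then caching the page through a channel configured with that size, per the abstract above. The decision rule and thresholds are invented for illustration.

```python
CHANNEL_BLOCK_SIZES = [1, 2, 4, 8]   # cache lines per block, one per channel

def pick_block_size(lines_touched_per_visit):
    # Pages whose cache lines are accessed densely benefit from big blocks;
    # sparsely used pages would waste HBM capacity on them.
    avg = sum(lines_touched_per_visit) / len(lines_touched_per_visit)
    return max(s for s in CHANNEL_BLOCK_SIZES if s <= avg) if avg >= 1 else 1

def cache_page(page_id, access_record):
    size = pick_block_size(access_record)
    channel = CHANNEL_BLOCK_SIZES.index(size)
    return f"page {page_id}: blocks of {size} line(s) via channel {channel}"

print(cache_page("A", [8, 8, 8]))   # dense page -> 8-line blocks
print(cache_page("B", [1, 2, 1]))   # sparse page -> 1-line blocks
```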
  • Patent number: 10503641
    Abstract: A cache coherence bridge protocol provides an interface between a cache coherence protocol of a host processor and a cache coherence protocol of a processor-in-memory, thereby decoupling coherence mechanisms of the host processor and the processor-in-memory. The cache coherence bridge protocol requires limited change to existing host processor cache coherence protocols. The cache coherence bridge protocol may be used to facilitate interoperability between host processors and processor-in-memory devices designed by different vendors and both the host processors and processor-in-memory devices may implement coherence techniques among computing units within each processor. The cache coherence bridge protocol may support different granularity of cache coherence permissions than those used by cache coherence protocols of a host processor and/or a processor-in-memory.
    Type: Grant
    Filed: May 31, 2016
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael W. Boyer, Nuwan Jayasena
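A thin sketch of the decoupling idea above: host-protocol messages are translated at the chip boundary into the processor-in-memory's own protocol, possibly at a coarser permission granularity. The message names and region granularity are invented; the patent defines no specific message set.

```python
# Translation table at the bridge; neither side needs to understand the
# other's internal coherence states.
HOST_TO_BRIDGE = {
    "host_read_shared": "pim_grant_shared",
    "host_read_excl":   "pim_invalidate_then_grant",
}

def bridge(host_msg, region):
    # The bridge may track permissions at a coarser granularity (regions)
    # than the cache-line granularity used inside each device.
    pim_msg = HOST_TO_BRIDGE[host_msg]
    return f"{pim_msg} for region {region}"

print(bridge("host_read_excl", region=7))
```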
  • Patent number: 10503670
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses in a computing system are disclosed. In various embodiments, a computing system includes computing resources and a memory controller coupled to a memory device. The memory controller determines that a memory request targets a given rank of multiple ranks. The memory controller determines a predicted latency for the given rank as the amount of time during which the pending queue in the memory controller, which stores outstanding memory requests, holds no memory requests targeting the given rank. The memory controller determines a total bank latency as the amount of time needed to refresh, with per-bank refresh operations, the banks in the given rank that have not yet been refreshed. If there are no pending requests targeting the given rank, the predicted latency and the total bank latency are used to select between per-bank and all-bank refresh operations.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Jing Wang
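A sketch of the refresh-mode decision described above for an idle rank: compare the predicted idle time against the time needed to refresh the remaining banks one at a time. The timing constants are invented for illustration.

```python
PER_BANK_REFRESH_NS = 60
ALL_BANK_REFRESH_NS = 350   # refreshes every bank at once, rank unusable

def choose_refresh(predicted_idle_ns, banks_left_to_refresh):
    total_bank_latency_ns = banks_left_to_refresh * PER_BANK_REFRESH_NS
    if predicted_idle_ns >= total_bank_latency_ns:
        # Enough predicted idle time to hide per-bank refreshes, which keep
        # the other banks available if a request does arrive early.
        return "per-bank refresh"
    return "all-bank refresh"

print(choose_refresh(predicted_idle_ns=500, banks_left_to_refresh=8))  # per-bank
print(choose_refresh(predicted_idle_ns=200, banks_left_to_refresh=8))  # all-bank
```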
  • Patent number: 10503203
    Abstract: Various semiconductor chip clock signal pathways are disclosed. In one aspect, a semiconductor chip with a receiver includes a clock signals pathway for conveying plural clock phases in the receiver. The clock signals pathway includes plural wires in an arrangement that has a first edge, a second edge separated from the first edge, and a midline between the first edge and the second edge. Each of the wires conveys a clock phase. The wires of the arrangement are routed so that, along the length of the clock signals pathway, each wire spends about the same fraction of its length at or nearer the first edge or the second edge as at or nearer the midline.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Dirk J. Robinson
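A sketch of the equalization property described above, modeled as rotating four phase wires through the track positions segment by segment so that every wire occupies every position equally often. The rotation pattern and segment count are invented for illustration.

```python
WIRES = ["ck0", "ck90", "ck180", "ck270"]
POSITIONS = ["edge_a", "mid_a", "mid_b", "edge_b"]

def layout(num_segments):
    segments_at = {w: {p: 0 for p in POSITIONS} for w in WIRES}
    for seg in range(num_segments):
        for slot, wire in enumerate(WIRES):
            pos = POSITIONS[(slot + seg) % len(POSITIONS)]  # rotate each segment
            segments_at[wire][pos] += 1
    return segments_at

for wire, counts in layout(num_segments=8).items():
    print(wire, counts)   # every wire: 2 segments in each of the 4 positions
```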
  • Patent number: 10503640
    Abstract: A processor includes multiple processing units (e.g., processor cores), with each processing unit associated with at least one private, dedicated cache. The processor is also associated with a system memory that stores all data that can be accessed by the multiple processing units. A coherency manager (e.g., a coherence directory) of the processor enforces a specified coherency scheme to ensure data coherency between the different caches and between the caches and the system memory. In response to a memory access request to a given cache resulting in a cache miss, the coherency manager identifies the current access latency to the system memory as well as the current access latencies to other caches of the processor. The coherency manager transfers the targeted data to the given cache from the cache or system memory having the lower access latency.
    Type: Grant
    Filed: April 24, 2018
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Yasuko Eckert
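A minimal sketch of the latency-based fill decision in the abstract above: on a miss, the coherency manager sources the data from whichever holder (another cache or system memory) currently has the lowest access latency. The location names and latencies are illustrative runtime estimates.

```python
def choose_source(holders, current_latency_ns):
    # holders: locations (other caches and/or "memory") that have the line
    return min(holders, key=lambda h: current_latency_ns[h])

latency = {"l2_core3": 40, "l3_slice1": 55, "memory": 90}
print(choose_source(["l2_core3", "memory"], latency))   # l2_core3
latency["l2_core3"] = 120                               # congestion spike
print(choose_source(["l2_core3", "memory"], latency))   # memory now wins
```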
  • Publication number: 20190371041
    Abstract: Techniques for improving memory utilization for communication between stages of a graphics processing pipeline are disclosed. The techniques include analyzing the output instructions of a first shader program to determine whether any of them output data that is not used by a second shader program. If gaps exist between the used output data, the compiler packs the data to reduce the memory footprint. Based on the usage information and the data packing, the compiler generates optimized output instructions in the first shader program and optimized input instructions in the second shader program, so that the first shader program outputs the used data and the second shader program reads it in a packed format. If needed, the compiler inserts instructions that perform runtime checks to identify unused output data of the first shader program based on information not known at compile time.
    Type: Application
    Filed: May 30, 2018
    Publication date: December 5, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Guohua Jin, Richard A. Burns, Todd Martin, Gianpaolo Tommasi
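A compile-time sketch of the packing step described above: producer outputs the consumer never reads are dropped, and the survivors are assigned contiguous slots that both shaders are rewritten to use. The attribute names are invented.

```python
producer_outputs = ["position", "normal", "tangent", "uv0", "uv1"]
consumer_inputs = {"position", "uv0"}        # what the second stage reads

def pack_interface(outputs, used):
    packing = {}
    slot = 0
    for attr in outputs:
        if attr in used:             # keep only attributes the consumer uses
            packing[attr] = slot     # both shaders are rewritten to this slot
            slot += 1
    return packing

print(pack_interface(producer_outputs, consumer_inputs))
# {'position': 0, 'uv0': 1} -> 2 slots exported instead of 5, no gaps
```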
  • Publication number: 20190371043
    Abstract: A modified bilinear filter and method for use in a texture processor system are described herein. The system includes a texture processor, which includes a texture address unit and a texture data unit. The texture data unit includes a bilinear filter. An application sends a texture instruction, which is processed by the texture address unit to obtain at least a level of detail (LOD) map and texel data. The texture data unit generates modified texel inputs from the LOD map, the texel data, and at least two weights in a texture space region. The bilinear filter applies the at least two weights to the modified texel inputs, where the modified texel inputs and weights prevent finer LOD values from leaking into an area of coarser LOD values.
    Type: Application
    Filed: May 30, 2018
    Publication date: December 5, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
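A sketch of the weight-modification idea in the abstract above: a standard bilinear blend in which texels tagged with a finer LOD than the sampling region's LOD receive zero weight, with the remaining weights renormalized, so fine detail does not leak into a coarser-LOD area. The per-texel LOD tagging and zero/renormalize rule are illustrative, not the patent's exact formulation.

```python
def modified_bilinear(texels, lods, fx, fy, region_lod):
    # texels/lods: 2x2 neighborhoods; fx, fy: fractional sample position
    w = [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]
    flat_t = [texels[0][0], texels[0][1], texels[1][0], texels[1][1]]
    flat_l = [lods[0][0], lods[0][1], lods[1][0], lods[1][1]]
    # Zero the weight of any texel carrying a finer (smaller) LOD than the
    # region, then renormalize over the surviving weights.
    w = [wi if li >= region_lod else 0.0 for wi, li in zip(w, flat_l)]
    total = sum(w)
    if total == 0.0:
        return 0.0
    return sum(wi * ti for wi, ti in zip(w, flat_t)) / total

texels = [[1.0, 1.0], [1.0, 0.0]]    # bottom-right texel differs
lods = [[2, 2], [2, 1]]              # ...and carries a finer LOD (1 < 2)
print(modified_bilinear(texels, lods, fx=0.5, fy=0.5, region_lod=2))
# 1.0: the finer-LOD texel is excluded instead of leaking into the blend
```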