Patents Assigned to Advanced Micro Devices
  • Patent number: 10515173
    Abstract: An electronic device includes a first integrated circuit chip including a processing functional block, and a second integrated circuit chip including an input-output (IO) functional block. The IO functional block performs one or more IO processing operations on behalf of the processing functional block in the first integrated circuit chip. The first integrated circuit chip lacks at least some elements of the IO functional block, so that the processing functional block is unable to perform corresponding IO operations without the IO functional block.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: December 24, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: David A. Roberts, Dean Gonzales
  • Publication number: 20190384722
    Abstract: A data processing system includes a memory, a group of input/output (I/O) devices, and an input/output memory management unit (IOMMU). The IOMMU is connected to the memory and adapted to allocate a hardware resource, from among a group of hardware resources, to receive an address translation request for a memory access from an I/O device. The IOMMU detects address translation requests from the group of I/O devices. The IOMMU reorders the address translation requests such that the order of dispatching an address translation request is based on a policy associated with the I/O device that is requesting the memory access. In response to the reordering, the IOMMU selectively allocates a hardware resource to the I/O device based on the policy associated with that I/O device.
    Type: Application
    Filed: June 13, 2018
    Publication date: December 19, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Michael LeBeane, Eric Van Tassell
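A minimal Python sketch of the policy-driven reordering described in the abstract above. The device names, numeric priorities, and the `Iommu` class are invented for illustration; the patent does not specify the policy's form.

```python
import heapq
from itertools import count

# Hypothetical per-device dispatch policy: lower value = dispatched sooner.
DEVICE_PRIORITY = {"nic0": 0, "gpu0": 1, "usb0": 2}

class Iommu:
    def __init__(self, num_translation_slots):
        self.free_slots = num_translation_slots  # pool of translation hardware
        self.pending = []                        # heap of reordered requests
        self._arrival = count()                  # keeps FIFO order per device

    def receive(self, device, virt_addr):
        # Requests are keyed by (policy, arrival), so the heap yields them in
        # policy order rather than plain arrival order.
        key = (DEVICE_PRIORITY.get(device, 99), next(self._arrival))
        heapq.heappush(self.pending, (key, device, virt_addr))

    def dispatch(self):
        # Selectively allocate a hardware slot to the best pending request.
        while self.pending and self.free_slots > 0:
            _, device, virt_addr = heapq.heappop(self.pending)
            self.free_slots -= 1
            print(f"translating {hex(virt_addr)} for {device}")

iommu = Iommu(num_translation_slots=2)
iommu.receive("usb0", 0x1000)
iommu.receive("gpu0", 0x2000)
iommu.receive("nic0", 0x3000)
iommu.dispatch()  # nic0 and gpu0 are served first despite arriving later
```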
  • Patent number: 10509736
    Abstract: An input-output (IO) memory management unit (IOMMU) uses a reverse map table (RMT) to ensure that address translations acquired from a nested page table are correct and that IO devices are permitted to access pages in a memory when performing memory accesses in a computing device. A translation lookaside buffer (TLB) flushing mechanism is used to invalidate address translation information in TLBs that are affected by changes in the RMT. A modified Address Translation Caching (ATC) mechanism may be used, in which only partial address translation information is provided to IO devices so that the RMT is checked when performing memory accesses for the IO devices using the cached address translation information.
    Type: Grant
    Filed: April 10, 2018
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Nippon Raval, David A. Kaplan, Philip Ng
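A toy model of the reverse-map-table check from the abstract above, assuming a simplified RMT that records, per system physical page, which guest and guest-physical page it belongs to. The table layout and function names are illustrative, not the patent's actual structures.

```python
# Toy reverse map table: system physical page -> (guest id, guest physical page).
rmt = {
    0x100: ("guest_a", 0x10),
    0x101: ("guest_a", 0x11),
    0x102: ("guest_b", 0x10),
}

def nested_translate(guest_id, guest_phys_page):
    # Stand-in for a nested page table walk (hypothetical fixed mapping;
    # the 0x11 entry below is deliberately wrong to show the check firing).
    table = {("guest_a", 0x10): 0x100, ("guest_a", 0x11): 0x102}
    return table.get((guest_id, guest_phys_page))

def iommu_access(guest_id, guest_phys_page):
    sys_page = nested_translate(guest_id, guest_phys_page)
    if sys_page is None:
        return "fault: no translation"
    # The RMT check: the translation is honored only if the RMT agrees that
    # this system page really maps back to this guest at this guest address.
    if rmt.get(sys_page) != (guest_id, guest_phys_page):
        return f"fault: RMT mismatch for page {hex(sys_page)}"
    return f"access to {hex(sys_page)} permitted"

print(iommu_access("guest_a", 0x10))  # permitted
print(iommu_access("guest_a", 0x11))  # RMT mismatch: tampered translation
```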
  • Patent number: 10509452
    Abstract: Techniques for managing power distribution amongst processors in a massively parallel computer architecture are disclosed. The techniques utilize a hierarchy that organizes the various processors of the massively parallel computer architecture. The hierarchy groups the processors into sets at the lowest level. When processors complete tasks, the power assigned to those processors is distributed to other processors in the same group so that the performance of those processors can be increased. Hierarchical organization simplifies the calculations required for determining how and when to distribute power, because when tasks are complete and power is available for distribution, only a relatively small number of processors need to be considered as candidates to receive that power. The number of processors that are grouped together can be adjusted in real time based on performance factors to improve the trade-off between calculation speed and power distribution efficacy.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Xinwei Chen, Leonardo de Paula Rosa Piga
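A sketch of the group-local redistribution idea from the abstract above, assuming equal sharing among the still-busy members of a group; the sharing rule and wattages are invented for illustration.

```python
# When a processor finishes, its power budget is split among the still-busy
# members of the same group only, keeping the redistribution calculation local.
class Processor:
    def __init__(self, name, power_watts):
        self.name = name
        self.power = power_watts
        self.busy = True

def finish_task(group, proc):
    proc.busy = False
    peers = [p for p in group if p.busy]
    if not peers:
        return  # nothing to boost; budget returns to a higher hierarchy level
    share = proc.power / len(peers)
    for p in peers:
        p.power += share  # raise the power (and performance) of remaining peers
    proc.power = 0.0

group = [Processor(f"cpu{i}", 10.0) for i in range(4)]
finish_task(group, group[0])
print([(p.name, p.power) for p in group])
# cpu0 drops to 0 W; cpu1..cpu3 each rise to ~13.3 W
```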
  • Patent number: 10509596
    Abstract: A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Dmitri Yudanov, Jiasheng Chen
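A behavioral sketch of the local/non-local routing decision in the abstract above. The execution-unit names and address ranges are invented; the patent describes the mechanism, not these values.

```python
# An execution unit's access is either "local" (served directly through TSVs
# coordinated by the control die) or "non-local" (routed through the
# distributed cache fabric / interconnect bus).
LOCAL_RANGES = {
    "eu0": range(0x0000, 0x4000),
    "eu1": range(0x4000, 0x8000),
}

def access(execution_unit, addr):
    if addr in LOCAL_RANGES[execution_unit]:
        # Control die activates the TSVs for the target cells; no bus used.
        return f"{execution_unit}: direct TSV access to {hex(addr)}"
    # Otherwise the request travels over the interconnect bus in the control die.
    return f"{execution_unit}: fabric/bus access to {hex(addr)}"

print(access("eu0", 0x0100))  # local, direct
print(access("eu0", 0x5000))  # non-local, via fabric
```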
  • Patent number: 10510185
    Abstract: A technique for performing rasterization and pixel shading with decoupled resolution is provided herein. The technique involves performing rasterization as normal to generate fine rasterization data and a set of (fine) quads. The quads are accumulated into a tile buffer and coarse quads are generated from the quads in the tile buffer based on a shading rate. The shading rate determines how many pixels of the fine quads are combined to generate coarse pixels of the coarse quads. Combining fine pixels involves generating a single coarse pixel for each group of fine pixels to be combined. The positions of the coarse pixels of the coarse quads are set based on the positions of the corresponding fine pixels. The coarse quads are shaded normally and the resulting shaded coarse quads are modified based on the fine rasterization data to generate shaded fine quads.
    Type: Grant
    Filed: August 25, 2017
    Date of Patent: December 17, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Skyler Jonathon Saleh, Christopher J. Brennan, Andrew S. Pomianowski, Ruijin Wu
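A minimal sketch of the shading-rate idea from the abstract above: a tile of fine pixels is shaded at a coarser rate, and each shaded coarse color is written back to the fine pixels it covers. The shader function is a placeholder and the tile/rate values are illustrative.

```python
def shade(x, y):
    return (x * 16, y * 16, 128)  # placeholder pixel shader

def shade_tile(tile_w, tile_h, rate):
    # rate = how many fine pixels (per axis) share one coarse pixel
    fine = [[None] * tile_w for _ in range(tile_h)]
    for cy in range(0, tile_h, rate):
        for cx in range(0, tile_w, rate):
            color = shade(cx, cy)        # one shader invocation per coarse pixel
            for dy in range(rate):       # broadcast to the covered fine pixels
                for dx in range(rate):
                    fine[cy + dy][cx + dx] = color
    return fine

tile = shade_tile(4, 4, rate=2)   # 4 shader invocations instead of 16
for row in tile:
    print(row)
```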
  • Patent number: 10509732
    Abstract: A cache controller applies an aging policy to a portion of a cache based on access metrics for different test regions of the cache, whereby each test region implements a different aging policy. The aging policy for each region establishes an initial age value for each entry of the cache, and a particular aging policy can set the age for a given entry based on whether the entry was placed in the cache in response to a demand request from a processor core or in response to a prefetch request. The cache controller can use the age value of each entry as a criterion in its cache replacement policy.
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Paul Moyer
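A sketch of age-based replacement with distinct initial ages for demand and prefetch fills, as described above. The specific age values, and the policy of evicting the oldest entry, are invented for illustration; the patent leaves the replacement criterion open.

```python
class CacheSet:
    def __init__(self, num_ways, demand_init_age, prefetch_init_age):
        self.entries = [None] * num_ways       # each entry: [tag, age]
        self.demand_init_age = demand_init_age
        self.prefetch_init_age = prefetch_init_age

    def age_all(self):
        # Periodic aging of every valid entry.
        for e in self.entries:
            if e is not None:
                e[1] += 1

    def fill(self, tag, is_prefetch):
        # Initial age depends on whether the fill was a demand or a prefetch.
        age = self.prefetch_init_age if is_prefetch else self.demand_init_age
        if None in self.entries:
            self.entries[self.entries.index(None)] = [tag, age]
            return None
        # Replacement policy: evict the oldest entry.
        victim = max(range(len(self.entries)), key=lambda i: self.entries[i][1])
        evicted = self.entries[victim][0]
        self.entries[victim] = [tag, age]
        return evicted

# Hypothetical test-region setting: prefetches start "older", so they are
# evicted sooner if they go unused.
s = CacheSet(num_ways=2, demand_init_age=0, prefetch_init_age=3)
s.fill(0xA, is_prefetch=False)
s.fill(0xB, is_prefetch=True)
s.age_all()
print(hex(s.fill(0xC, is_prefetch=False)))  # evicts 0xB, the aged prefetch
```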
  • Patent number: 10509752
    Abstract: A data processing system includes a processing unit that forms a base die, has a group of through-silicon vias (TSVs), and is connected to a memory system. The memory system includes a die stack that includes a first die and a second die. The first die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads. The group of micro-bump landing pads is connected to the group of TSVs of the processing unit using a corresponding group of micro-bumps. The first die has a group of memory die TSVs. The second die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads connected to the group of memory die TSVs of the first die. The first die communicates with the processing unit using first cycle timing, and with the second die using second cycle timing.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Russell Schreiber, John Wuu, Michael K. Ciraula, Patrick J. Shyvers
  • Patent number: 10510721
    Abstract: Various molded chip combinations and methods of manufacturing the same are disclosed. In one aspect, a molded chip combination is provided that includes a first semiconductor chip that has a first PHY region, a second semiconductor chip that has a second PHY region, an interconnect chip interconnecting the first PHY region to the second PHY region, and a molding joining together the first semiconductor chip, the second semiconductor chip and the interconnect chip.
    Type: Grant
    Filed: August 11, 2017
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Milind S. Bhagavat, Lei Fu, Ivor Barber, Chia-Ken Leong, Rahul Agarwal
  • Patent number: 10510164
    Abstract: A processing unit, method, and medium for decompressing or generating textures within a graphics processing unit (GPU). The textures are compressed with a variable-rate compression scheme such as JPEG. The compressed textures are retrieved from system memory and transferred to local cache memory on the GPU without first being decompressed. A table is utilized by the cache to locate individual blocks within the compressed texture. A decompressing shader processor receives compressed blocks and then performs on-the-fly decompression of the blocks. The decompressed blocks are then processed as usual by a texture consuming shader processor of the GPU.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: December 17, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Konstantine Iourcha, John W. Brothers
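A sketch of the block-offset-table idea from the abstract above, using zlib as a stand-in for the variable-rate codec (the patent names JPEG-like schemes). Because each block compresses to a different size, a table of offsets lets a single block be located and decompressed on demand.

```python
import zlib

BLOCK_TEXELS = 16

def compress_texture(texels):
    table, blob = [], b""
    for i in range(0, len(texels), BLOCK_TEXELS):
        comp = zlib.compress(bytes(texels[i:i + BLOCK_TEXELS]))
        table.append(len(blob))      # offset of this block in the stream
        blob += comp
    table.append(len(blob))          # end sentinel
    return table, blob

def fetch_block(table, blob, block_idx):
    # "Decompressing shader" step: locate the block via the table, then
    # decompress just that block on the fly.
    start, end = table[block_idx], table[block_idx + 1]
    return list(zlib.decompress(blob[start:end]))

texels = [i % 251 for i in range(64)]                # 4 blocks of dummy texels
table, blob = compress_texture(texels)
print(fetch_block(table, blob, 2) == texels[32:48])  # True
```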
  • Patent number: 10503658
    Abstract: The present disclosure is directed to techniques for migrating data between heterogeneous memories in a computing system. More specifically, the techniques involve migrating data between a memory having better access characteristics (e.g., lower latency but smaller capacity) and a memory having worse access characteristics (e.g., higher latency but greater capacity). Migrations occur with a variable migration granularity. A migration granularity specifies a number of memory pages, having virtual addresses that are contiguous in virtual address space, that are migrated in a single migration operation. A history-based technique that adjusts migration granularity based on the history of memory utilization by an application is provided. A profiling-based technique that adjusts migration granularity based on a profiling operation is also provided.
    Type: Grant
    Filed: April 27, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Jee Ho Ryoo
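A sketch of the history-based adjustment from the abstract above: if migrated runs of contiguous pages are mostly used, grow the granularity; if they mostly go unused, shrink it. The thresholds and the utilization metric are invented for illustration.

```python
class Migrator:
    GRANULARITIES = [1, 2, 4, 8, 16]   # pages per migration operation

    def __init__(self):
        self.level = 0
        self.migrated = set()
        self.used = set()

    @property
    def granularity(self):
        return self.GRANULARITIES[self.level]

    def migrate(self, base_page):
        # Contiguous virtual pages move together in one operation.
        pages = range(base_page, base_page + self.granularity)
        self.migrated.update(pages)
        return list(pages)

    def on_access(self, page):
        if page in self.migrated:
            self.used.add(page)

    def adjust(self):
        if not self.migrated:
            return
        utilization = len(self.used) / len(self.migrated)
        if utilization > 0.75 and self.level < len(self.GRANULARITIES) - 1:
            self.level += 1            # history says big moves pay off
        elif utilization < 0.25 and self.level > 0:
            self.level -= 1            # most migrated pages went unused
        self.migrated.clear()
        self.used.clear()

m = Migrator()
m.migrate(100)
m.on_access(100)
m.adjust()
print(m.granularity)   # grew from 1 to 2: the whole migrated run was used
```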
  • Patent number: 10503648
    Abstract: Systems, apparatuses, and methods for accelerating cache to cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, the first prediction unit generates a prediction of whether data targeted by the request is cached by another processing node. The first prediction unit is configured to cause a speculative probe to be sent to a second processing node responsive to predicting that the data targeted by the memory request is cached by the second processing node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
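A sketch of the prediction unit from the abstract above: a table keyed by address region remembers which node last cached data there, and a matching request triggers a speculative probe in parallel with the memory path. The region granularity and table structure are invented.

```python
REGION_BITS = 12   # predict at 4 KiB-region granularity (illustrative)

class PredictionUnit:
    def __init__(self):
        self.owner_table = {}   # region -> node id of the last known cacher

    def observe_fill(self, addr, node):
        self.owner_table[addr >> REGION_BITS] = node

    def on_request(self, addr, requester):
        predicted = self.owner_table.get(addr >> REGION_BITS)
        actions = [f"forward request for {hex(addr)} toward memory"]
        if predicted is not None and predicted != requester:
            # A correct prediction returns the data before the memory access
            # would have completed; a wrong one costs only the extra probe.
            actions.append(f"speculative probe to node {predicted}")
        return actions

pu = PredictionUnit()
pu.observe_fill(0x4000_2000, node=2)
print(pu.on_request(0x4000_2ABC, requester=0))
```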
  • Patent number: 10503669
    Abstract: Described is a method and apparatus for application migration between a dockable device and a docking station in a seamless manner. The dockable device includes a processor and the docking station includes a high-performance processor. The method includes determining a docking state of a dockable device while at least one application is running. Application migration from the dockable device to a docking station is initiated when the dockable device is moving to a docked state. Application migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run during the application migration from the dockable device to the docking station or during the application migration from the docking station to the dockable device.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jonathan Lawrence Campbell, Yuping Shen
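A minimal state-machine sketch of the dock-triggered flow above. All names are illustrative; the abstract only specifies that migration starts on a docking-state change and that the application keeps running throughout.

```python
class Session:
    def __init__(self):
        self.location = "dockable_device"   # where the app currently runs
        self.running = True

    def on_dock_event(self, docked):
        target = "docking_station" if docked else "dockable_device"
        if target != self.location:
            self.migrate(target)

    def migrate(self, target):
        # State is transferred while the application continues to execute;
        # only the final switch-over changes where work is performed.
        assert self.running
        print(f"migrating live from {self.location} to {target}")
        self.location = target

s = Session()
s.on_dock_event(docked=True)    # moving to docked state -> migrate to station
s.on_dock_event(docked=False)   # undocking -> migrate back
```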
  • Patent number: 10503655
    Abstract: The described embodiments include a computing device that caches data acquired from a main memory in a high-bandwidth memory (HBM), the computing device including channels for accessing data stored in corresponding portions of the HBM. During operation, the computing device configures each of the channels so that data blocks stored in the corresponding portion of the HBM contain a given number of cache lines. Based on records of accesses of cache lines in the HBM that were acquired from pages in the main memory, the computing device sets a data block size for each of the pages, the data block size being a number of cache lines. The computing device stores, in the HBM, data blocks acquired from each of the pages in the main memory using a channel whose data block size corresponds to the data block size set for that page.
    Type: Grant
    Filed: July 21, 2016
    Date of Patent: December 10, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Mitesh R. Meswani, Jee Ho Ryoo
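A sketch of choosing a per-page block size (in cache lines) from access records, then caching the page through a channel configured with that size, per the abstract above. The decision rule and thresholds are invented for illustration.

```python
CHANNEL_BLOCK_SIZES = [1, 2, 4, 8]   # cache lines per block, one per channel

def pick_block_size(lines_touched_per_visit):
    # Pages whose cache lines are accessed densely benefit from big blocks;
    # sparsely used pages would waste HBM capacity on them.
    avg = sum(lines_touched_per_visit) / len(lines_touched_per_visit)
    return max(s for s in CHANNEL_BLOCK_SIZES if s <= avg) if avg >= 1 else 1

def cache_page(page_id, access_record):
    size = pick_block_size(access_record)
    channel = CHANNEL_BLOCK_SIZES.index(size)
    return f"page {page_id}: blocks of {size} line(s) via channel {channel}"

print(cache_page("A", [8, 8, 8]))   # dense page -> 8-line blocks
print(cache_page("B", [1, 2, 1]))   # sparse page -> 1-line blocks
```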
  • Patent number: 10503641
    Abstract: A cache coherence bridge protocol provides an interface between a cache coherence protocol of a host processor and a cache coherence protocol of a processor-in-memory, thereby decoupling coherence mechanisms of the host processor and the processor-in-memory. The cache coherence bridge protocol requires limited change to existing host processor cache coherence protocols. The cache coherence bridge protocol may be used to facilitate interoperability between host processors and processor-in-memory devices designed by different vendors and both the host processors and processor-in-memory devices may implement coherence techniques among computing units within each processor. The cache coherence bridge protocol may support different granularity of cache coherence permissions than those used by cache coherence protocols of a host processor and/or a processor-in-memory.
    Type: Grant
    Filed: May 31, 2016
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael W. Boyer, Nuwan Jayasena
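A thin sketch of the decoupling idea above: host-protocol messages are translated at the chip boundary into the processor-in-memory's own protocol, possibly at a coarser permission granularity. The message names and region granularity are invented; the patent defines no specific message set.

```python
# Translation table at the bridge; neither side needs to understand the
# other's internal coherence states.
HOST_TO_BRIDGE = {
    "host_read_shared": "pim_grant_shared",
    "host_read_excl":   "pim_invalidate_then_grant",
}

def bridge(host_msg, region):
    # The bridge may track permissions at a coarser granularity (regions)
    # than the cache-line granularity used inside each device.
    pim_msg = HOST_TO_BRIDGE[host_msg]
    return f"{pim_msg} for region {region}"

print(bridge("host_read_excl", region=7))
```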
  • Patent number: 10503670
    Abstract: Systems, apparatuses, and methods for performing efficient memory accesses in a computing system are disclosed. In various embodiments, a computing system includes computing resources and a memory controller coupled to a memory device. The memory controller determines that a memory request targets a given rank of multiple ranks. The memory controller determines a predicted latency for the given rank as the amount of time during which the pending queue in the memory controller, which stores outstanding memory requests, holds no memory requests targeting the given rank. The memory controller determines a total bank latency as the amount of time needed to refresh, with per-bank refresh operations, the banks in the given rank that have not yet been refreshed. If there are no pending requests targeting the given rank, the predicted latency and the total bank latency are used to select between per-bank and all-bank refresh operations.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Jing Wang
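A sketch of the refresh-mode decision described above for an idle rank: compare the predicted idle time against the time needed to refresh the remaining banks one at a time. The timing constants are invented for illustration.

```python
PER_BANK_REFRESH_NS = 60
ALL_BANK_REFRESH_NS = 350   # refreshes every bank at once, rank unusable

def choose_refresh(predicted_idle_ns, banks_left_to_refresh):
    total_bank_latency_ns = banks_left_to_refresh * PER_BANK_REFRESH_NS
    if predicted_idle_ns >= total_bank_latency_ns:
        # Enough predicted idle time to hide per-bank refreshes, which keep
        # the other banks available if a request does arrive early.
        return "per-bank refresh"
    return "all-bank refresh"

print(choose_refresh(predicted_idle_ns=500, banks_left_to_refresh=8))  # per-bank
print(choose_refresh(predicted_idle_ns=200, banks_left_to_refresh=8))  # all-bank
```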
  • Patent number: 10503203
    Abstract: Various semiconductor chip clock signal pathways are disclosed. In one aspect, a semiconductor chip with a receiver includes a clock signals pathway for conveying plural clock phases in the receiver. The clock signals pathway includes plural wires in an arrangement that has a first edge, a second edge separated from the first edge, and a midline between the first edge and the second edge. Each of the wires conveys a clock phase. The wires of the arrangement are routed so that, along the length of the clock signals pathway, each wire spends about the same fraction of its length at or nearer the first edge or the second edge as at or nearer the midline.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Dirk J. Robinson
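A sketch of the equalization property described above, modeled as rotating four phase wires through the track positions segment by segment so that every wire occupies every position equally often. The rotation pattern and segment count are invented for illustration.

```python
WIRES = ["ck0", "ck90", "ck180", "ck270"]
POSITIONS = ["edge_a", "mid_a", "mid_b", "edge_b"]

def layout(num_segments):
    segments_at = {w: {p: 0 for p in POSITIONS} for w in WIRES}
    for seg in range(num_segments):
        for slot, wire in enumerate(WIRES):
            pos = POSITIONS[(slot + seg) % len(POSITIONS)]  # rotate each segment
            segments_at[wire][pos] += 1
    return segments_at

for wire, counts in layout(num_segments=8).items():
    print(wire, counts)   # every wire: 2 segments in each of the 4 positions
```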
  • Patent number: 10503640
    Abstract: A processor includes multiple processing units (e.g., processor cores), with each processing unit associated with at least one private, dedicated cache. The processor is also associated with a system memory that stores all data that can be accessed by the multiple processing units. A coherency manager (e.g., a coherence directory) of the processor enforces a specified coherency scheme to ensure data coherency between the different caches and between the caches and the system memory. In response to a memory access request to a given cache resulting in a cache miss, the coherency manager identifies the current access latency to the system memory as well as the current access latencies to other caches of the processor. The coherency manager transfers the targeted data to the given cache from the cache or system memory having the lower access latency.
    Type: Grant
    Filed: April 24, 2018
    Date of Patent: December 10, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Yasuko Eckert
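A minimal sketch of the latency-based fill decision in the abstract above: on a miss, the coherency manager sources the data from whichever holder (another cache or system memory) currently has the lowest access latency. The location names and latencies are illustrative runtime estimates.

```python
def choose_source(holders, current_latency_ns):
    # holders: locations (other caches and/or "memory") that have the line
    return min(holders, key=lambda h: current_latency_ns[h])

latency = {"l2_core3": 40, "l3_slice1": 55, "memory": 90}
print(choose_source(["l2_core3", "memory"], latency))   # l2_core3
latency["l2_core3"] = 120                               # congestion spike
print(choose_source(["l2_core3", "memory"], latency))   # memory now wins
```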
  • Publication number: 20190371041
    Abstract: Techniques for improving memory utilization for communication between stages of a graphics processing pipeline are disclosed. The techniques include analyzing the output instructions of a first shader program to determine whether any of them output data that is not used by a second shader program. If gaps exist between the used output data, the compiler packs the data to reduce the memory footprint. Based on the usage information and the data packing, the compiler generates optimized output instructions in the first shader program and optimized input instructions in the second shader program, so that the first shader program outputs the used data and the second shader program reads it in a packed format. If needed, the compiler inserts instructions that perform runtime checks to identify unused output data of the first shader program based on information not known at compile time.
    Type: Application
    Filed: May 30, 2018
    Publication date: December 5, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Guohua Jin, Richard A. Burns, Todd Martin, Gianpaolo Tommasi
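A compile-time sketch of the packing step described above: producer outputs the consumer never reads are dropped, and the survivors are assigned contiguous slots that both shaders are rewritten to use. The attribute names are invented.

```python
producer_outputs = ["position", "normal", "tangent", "uv0", "uv1"]
consumer_inputs = {"position", "uv0"}        # what the second stage reads

def pack_interface(outputs, used):
    packing = {}
    slot = 0
    for attr in outputs:
        if attr in used:             # keep only attributes the consumer uses
            packing[attr] = slot     # both shaders are rewritten to this slot
            slot += 1
    return packing

print(pack_interface(producer_outputs, consumer_inputs))
# {'position': 0, 'uv0': 1} -> 2 slots exported instead of 5, no gaps
```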
  • Publication number: 20190371043
    Abstract: A modified bilinear filter and method for use in a texture processor system are described herein. The system includes a texture processor, which includes a texture address unit and a texture data unit. The texture data unit includes a bilinear filter. An application sends a texture instruction, which is processed by the texture address unit to obtain at least a level of detail (LOD) map and texel data. The texture data unit generates modified texel inputs from the LOD map, the texel data, and at least two weights in a texture space region. The bilinear filter applies the at least two weights to the modified texel inputs, where the modified texel inputs and weights prevent finer LOD values from leaking into an area of coarser LOD values.
    Type: Application
    Filed: May 30, 2018
    Publication date: December 5, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
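A sketch of the weight-modification idea in the abstract above: a standard bilinear blend in which texels tagged with a finer LOD than the sampling region's LOD receive zero weight, with the remaining weights renormalized, so fine detail does not leak into a coarser-LOD area. The per-texel LOD tagging and zero/renormalize rule are illustrative, not the patent's exact formulation.

```python
def modified_bilinear(texels, lods, fx, fy, region_lod):
    # texels/lods: 2x2 neighborhoods; fx, fy: fractional sample position
    w = [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]
    flat_t = [texels[0][0], texels[0][1], texels[1][0], texels[1][1]]
    flat_l = [lods[0][0], lods[0][1], lods[1][0], lods[1][1]]
    # Zero the weight of any texel carrying a finer (smaller) LOD than the
    # region, then renormalize over the surviving weights.
    w = [wi if li >= region_lod else 0.0 for wi, li in zip(w, flat_l)]
    total = sum(w)
    if total == 0.0:
        return 0.0
    return sum(wi * ti for wi, ti in zip(w, flat_t)) / total

texels = [[1.0, 1.0], [1.0, 0.0]]    # bottom-right texel differs
lods = [[2, 2], [2, 1]]              # ...and carries a finer LOD (1 < 2)
print(modified_bilinear(texels, lods, fx=0.5, fy=0.5, region_lod=2))
# 1.0: the finer-LOD texel is excluded instead of leaking into the blend
```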