Patents Assigned to Advanced Micro Devices, Inc.
-
Patent number: 10510164
Abstract: A processing unit, method, and medium for decompressing or generating textures within a graphics processing unit (GPU). The textures are compressed with a variable-rate compression scheme such as JPEG. The compressed textures are retrieved from system memory and transferred to local cache memory on the GPU without first being decompressed. A table is utilized by the cache to locate individual blocks within the compressed texture. A decompressing shader processor receives compressed blocks and then performs on-the-fly decompression of the blocks. The decompressed blocks are then processed as usual by a texture-consuming shader processor of the GPU.
Type: Grant
Filed: June 13, 2016
Date of Patent: December 17, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Konstantine Iourcha, John W. Brothers
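A minimal sketch of the block-lookup idea the abstract describes: an offset table maps block indices to byte positions inside a variable-rate-compressed texture, so the cache can fetch one compressed block without decompressing the whole texture. All names are hypothetical, not AMD's implementation.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

struct CompressedTexture {
    std::vector<uint8_t>  bytes;        // compressed texture as stored in cache
    std::vector<uint32_t> blockOffset;  // blockOffset[i] = byte start of block i
};

// Return a pointer/length span for one compressed block; a decompressing
// shader processor would consume exactly this span on the fly.
inline std::pair<const uint8_t*, uint32_t>
fetchCompressedBlock(const CompressedTexture& tex, uint32_t blockIndex) {
    uint32_t begin = tex.blockOffset[blockIndex];
    uint32_t end   = (blockIndex + 1 < tex.blockOffset.size())
                         ? tex.blockOffset[blockIndex + 1]
                         : static_cast<uint32_t>(tex.bytes.size());
    return { tex.bytes.data() + begin, end - begin };
}
```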
-
Patent number: 10510721
Abstract: Various molded chip combinations and methods of manufacturing the same are disclosed. In one aspect, a molded chip combination is provided that includes a first semiconductor chip that has a first PHY region, a second semiconductor chip that has a second PHY region, an interconnect chip interconnecting the first PHY region to the second PHY region, and a molding joining together the first semiconductor chip, the second semiconductor chip, and the interconnect chip.
Type: Grant
Filed: August 11, 2017
Date of Patent: December 17, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Milind S. Bhagavat, Lei Fu, Ivor Barber, Chia-Ken Leong, Rahul Agarwal
-
Patent number: 10509732
Abstract: A cache controller applies an aging policy to a portion of a cache based on access metrics for different test regions of the cache, whereby each test region implements a different aging policy. The aging policy for each region establishes an initial age value for each entry of the cache, and a particular aging policy can set the age for a given entry based on whether the entry was placed in the cache in response to a demand request from a processor core or in response to a prefetch request. The cache controller can use the age value of each entry as a criterion in its cache replacement policy.
Type: Grant
Filed: April 27, 2016
Date of Patent: December 17, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Paul Moyer
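A minimal sketch of age-based replacement with policy-defined initial ages, as the abstract outlines; the concrete age values, the set size, and all names are assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

enum class RequestType { Demand, Prefetch };

struct AgingPolicy {
    uint8_t demandInitialAge;    // age given to entries filled by demand requests
    uint8_t prefetchInitialAge;  // age given to entries filled by prefetches
};

struct CacheEntry { uint64_t tag; uint8_t age; bool valid; };

// A fill receives its initial age from the policy of the region it lands in.
uint8_t initialAge(const AgingPolicy& p, RequestType t) {
    return t == RequestType::Demand ? p.demandInitialAge : p.prefetchInitialAge;
}

// Replacement uses age as the criterion: evict the oldest valid entry.
std::size_t selectVictim(const std::array<CacheEntry, 8>& set) {
    std::size_t victim = 0;
    uint8_t oldest = 0;
    for (std::size_t i = 0; i < set.size(); ++i) {
        if (!set[i].valid) return i;   // prefer a free way
        if (set[i].age >= oldest) { oldest = set[i].age; victim = i; }
    }
    return victim;
}
```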
-
Patent number: 10509452
Abstract: Techniques for managing power distribution among processors in a massively parallel computer architecture are disclosed. The techniques utilize a hierarchy that organizes the various processors of the architecture, grouping numbers of processors together at the lowest level. When processors complete tasks, the power assigned to those processors is distributed to other processors in the same group so that the performance of those processors can be increased. Hierarchical organization simplifies the calculations required for determining how and when to distribute power: when tasks complete and power becomes available for distribution, only a relatively small number of processors need be considered to receive that power. The number of processors grouped together can be adjusted in real time based on performance factors to improve the trade-off between calculation speed and power distribution efficacy.
Type: Grant
Filed: April 26, 2017
Date of Patent: December 17, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Xinwei Chen, Leonardo de Paula Rosa Piga
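A minimal sketch of the group-level redistribution step: when one processor in a group finishes its task, its power budget is split among the group's still-active members. The even split and all names are assumptions.

```cpp
#include <cstddef>
#include <vector>

struct Proc { double powerBudgetW; bool active; };

void redistributeOnCompletion(std::vector<Proc>& group, std::size_t finished) {
    double freed = group[finished].powerBudgetW;
    group[finished].active = false;
    group[finished].powerBudgetW = 0.0;

    std::size_t activeCount = 0;
    for (const Proc& p : group) if (p.active) ++activeCount;
    if (activeCount == 0) return;          // nothing left in this group to boost

    double share = freed / activeCount;    // only group members are considered,
    for (Proc& p : group)                  // which keeps the calculation small
        if (p.active) p.powerBudgetW += share;
}
```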
-
Patent number: 10509596
Abstract: A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided. The technique includes receiving a memory access request from an execution unit and identifying whether the request corresponds to memory cells of the stacked dies that are local to the execution unit or non-local. Local accesses are made "directly," that is, without using a bus. A control die coordinates operations for such local accesses, activating the particular through-silicon vias associated with the memory cells that hold the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die.
Type: Grant
Filed: December 21, 2017
Date of Patent: December 17, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Dmitri Yudanov, Jiasheng Chen
-
Patent number: 10510185
Abstract: A technique for performing rasterization and pixel shading at decoupled resolutions is provided. Rasterization proceeds as normal to generate fine rasterization data and a set of fine quads. The quads are accumulated into a tile buffer, and coarse quads are generated from the quads in the tile buffer based on a shading rate. The shading rate determines how many pixels of the fine quads are combined to generate coarse pixels of the coarse quads. Combining fine pixels involves generating a single coarse pixel for each group of fine pixels to be combined. The positions of the coarse pixels of the coarse quads are set based on the positions of the corresponding fine pixels. The coarse quads are shaded normally, and the resulting shaded coarse quads are modified based on the fine rasterization data to generate shaded fine quads.
Type: Grant
Filed: August 25, 2017
Date of Patent: December 17, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Skyler Jonathon Saleh, Christopher J. Brennan, Andrew S. Pomianowski, Ruijin Wu
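A minimal sketch of collapsing a tile of fine pixels into coarse pixels at a given shading rate (e.g., rate 2 means one coarse pixel per 2x2 block of fine pixels), with each coarse pixel's position taken from a corresponding fine pixel as the abstract describes. The tile layout and anchoring choice are assumptions.

```cpp
#include <vector>

struct Pixel { float x, y; };

std::vector<Pixel> buildCoarsePixels(const std::vector<Pixel>& fine,
                                     int tileW, int tileH, int rate) {
    std::vector<Pixel> coarse;
    for (int y = 0; y < tileH; y += rate) {
        for (int x = 0; x < tileW; x += rate) {
            // One coarse pixel stands in for a rate x rate block of fine
            // pixels; anchor its position at the block's top-left fine pixel.
            coarse.push_back(fine[y * tileW + x]);
        }
    }
    return coarse;  // shaded normally, then expanded back into fine quads
}
```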
-
Patent number: 10503670
Abstract: Systems, apparatuses, and methods for performing efficient memory accesses in a computing system are disclosed. In various embodiments, a computing system includes computing resources and a memory controller coupled to a memory device. The memory controller determines that a memory request targets a given rank of multiple ranks. The memory controller determines a predicted latency for the given rank as the amount of time for which the memory controller's pending queue of outstanding memory requests holds no requests targeting that rank. The memory controller determines the total bank latency as the amount of time needed to refresh, with per-bank refresh operations, the banks in the given rank that have not yet been refreshed. If there are no pending requests targeting the given rank, the predicted latency and the total bank latency are used to select between per-bank and all-bank refresh operations.
Type: Grant
Filed: December 21, 2017
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Guanhao Shen, Ravindra N. Bhargava, James Raymond Magro, Kedarnath Balakrishnan, Jing Wang
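A minimal sketch of the refresh-mode choice, assuming the two latencies are compared directly (the abstract only says both are used in the selection); all names are hypothetical.

```cpp
enum class RefreshMode { PerBank, AllBank };

RefreshMode chooseRefreshMode(double predictedIdleNs,    // predicted time the rank
                                                         // has no pending requests
                              double perBankRefreshNs,   // cost of one per-bank
                                                         // refresh operation
                              int banksLeftToRefresh) {
    double totalBankLatencyNs = perBankRefreshNs * banksLeftToRefresh;
    // If the rank is expected to stay idle long enough to refresh the
    // remaining banks one at a time, per-bank refreshes hide the cost;
    // otherwise a single all-bank refresh is the better choice.
    return (predictedIdleNs >= totalBankLatencyNs) ? RefreshMode::PerBank
                                                   : RefreshMode::AllBank;
}
```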
-
Patent number: 10503669
Abstract: Described is a method and apparatus for migrating applications between a dockable device and a docking station in a seamless manner. The dockable device includes a processor, and the docking station includes a high-performance processor. The method includes determining the docking state of the dockable device while an application is running. Migration from the dockable device to the docking station is initiated when the dockable device is moving to a docked state; migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run throughout the migration in either direction.
Type: Grant
Filed: April 27, 2018
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Jonathan Lawrence Campbell, Yuping Shen
-
Patent number: 10503655
Abstract: The described embodiments include a computing device that caches data acquired from a main memory in a high-bandwidth memory (HBM), with channels for accessing data stored in corresponding portions of the HBM. During operation, the computing device configures each channel so that data blocks stored in its portion of the HBM comprise a corresponding number of cache lines. Based on records of accesses to cache lines in the HBM that were acquired from pages in main memory, the computing device sets a data block size for each page, the data block size being a number of cache lines. The computing device then stores data blocks acquired from each page in the HBM using a channel whose data block size matches that page's data block size.
Type: Grant
Filed: July 21, 2016
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Mitesh R. Meswani, Jee Ho Ryoo
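A minimal sketch of deriving a per-page block size from an access record; the density heuristic, the thresholds, and the page geometry are all assumptions layered on the abstract's idea.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLinesPerPage = 64;  // e.g., 4 KiB page of 64 B cache lines

// One bit per cache line: set if the line was accessed while cached in the HBM.
uint32_t chooseBlockSizeInLines(const std::bitset<kLinesPerPage>& touched) {
    std::size_t used = touched.count();
    // Densely accessed pages benefit from larger blocks (fewer fills);
    // sparsely accessed pages would waste bandwidth, so use smaller blocks.
    if (used > kLinesPerPage / 2) return 8;
    if (used > kLinesPerPage / 8) return 4;
    return 1;
}
```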
-
Patent number: 10503640
Abstract: A processor includes multiple processing units (e.g., processor cores), each associated with at least one private, dedicated cache. The processor is also associated with a system memory that stores all data accessible to the processing units. A coherency manager (e.g., a coherence directory) of the processor enforces a specified coherency scheme to ensure data coherency among the caches and between the caches and system memory. When a memory access request to a given cache results in a cache miss, the coherency manager identifies the current access latency to system memory as well as the current access latencies to the processor's other caches, and transfers the targeted data to the given cache from whichever cache or system memory has the lower access latency.
Type: Grant
Filed: April 24, 2018
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Yasuko Eckert
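A minimal sketch of the source-selection step: on a miss, compare the current latency of system memory against each peer cache holding the line and fill from the cheapest source. All names and the flat latency representation are assumptions.

```cpp
#include <cstdint>
#include <vector>

struct Source { int id; uint32_t currentLatencyCycles; };

// sources[0] is system memory; the rest are peer caches holding the line.
int pickTransferSource(const std::vector<Source>& sources) {
    int best = sources[0].id;
    uint32_t bestLat = sources[0].currentLatencyCycles;
    for (const Source& s : sources) {
        if (s.currentLatencyCycles < bestLat) {
            bestLat = s.currentLatencyCycles;
            best = s.id;
        }
    }
    return best;  // the coherency manager forwards the fill from this source
}
```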
-
Patent number: 10503641
Abstract: A cache coherence bridge protocol provides an interface between the cache coherence protocol of a host processor and the cache coherence protocol of a processor-in-memory, thereby decoupling the coherence mechanisms of the two. The bridge protocol requires limited change to existing host processor cache coherence protocols. It may be used to facilitate interoperability between host processors and processor-in-memory devices designed by different vendors, and both the host processor and the processor-in-memory may implement coherence techniques among the computing units within each device. The bridge protocol may also support a different granularity of cache coherence permissions than the granularities used by the cache coherence protocols of the host processor and/or the processor-in-memory.
Type: Grant
Filed: May 31, 2016
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Michael W. Boyer, Nuwan Jayasena
-
Patent number: 10503203
Abstract: Various semiconductor chip clock signal pathways are disclosed. In one aspect, a semiconductor chip with a receiver includes a clock signals pathway for conveying plural clock phases in the receiver. The pathway includes plural wires in an arrangement that has a first edge, a second edge separated from the first edge, and a midline between the two edges. Each of the wires conveys a clock phase. The wires are routed so that, along the length of the pathway, each wire spends about the same percentage of its run at or nearer the first edge or the second edge as at or nearer the midline.
Type: Grant
Filed: December 12, 2017
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Dirk J. Robinson
-
Patent number: 10503658
Abstract: The present disclosure is directed to techniques for migrating data between heterogeneous memories in a computing system. More specifically, the techniques involve migrating data between a memory having better access characteristics (e.g., lower latency but lower capacity) and a memory having worse access characteristics (e.g., higher latency but greater capacity). Migrations occur with a variable migration granularity. A migration granularity specifies the number of memory pages, contiguous in virtual address space, that are migrated in a single migration operation. A history-based technique that adjusts migration granularity based on the history of memory utilization by an application is provided, as is a profiling-based technique that adjusts migration granularity based on a profiling operation.
Type: Grant
Filed: April 27, 2017
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Arkaprava Basu, Jee Ho Ryoo
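A minimal sketch of the history-based adjustment: grow the migration granularity while recently migrated pages keep getting used, shrink it when they do not. The thresholds, bounds, and names are assumptions, not the patented method's specifics.

```cpp
#include <cstdint>

struct MigrationState {
    uint32_t granularityPages = 1;  // pages moved per migration operation
};

// usedFraction: share of the pages moved in the last migration that the
// application actually touched, taken from the memory utilization history.
void adjustGranularity(MigrationState& st, double usedFraction) {
    if (usedFraction > 0.75 && st.granularityPages < 512)
        st.granularityPages *= 2;   // contiguity paid off: move more at once
    else if (usedFraction < 0.25 && st.granularityPages > 1)
        st.granularityPages /= 2;   // mostly wasted bandwidth: move less at once
}
```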
-
Patent number: 10503648
Abstract: Systems, apparatuses, and methods for accelerating cache-to-cache data transfers are disclosed. A system includes at least a plurality of processing nodes and prediction units, an interconnect fabric, and a memory. A first prediction unit is configured to receive memory requests generated by a first processing node as the requests traverse the interconnect fabric on the path to memory. When the first prediction unit receives a memory request, it generates a prediction of whether the data targeted by the request is cached by another processing node. Responsive to predicting that the targeted data is cached by a second processing node, the first prediction unit causes a speculative probe to be sent to that node. The speculative probe accelerates the retrieval of the data from the second processing node if the prediction is correct.
Type: Grant
Filed: December 12, 2017
Date of Patent: December 10, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Ganesh Balakrishnan, Ann Ling, Ravindra N. Bhargava
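A minimal sketch of one way such a prediction unit could work: remember the last node seen caching each line and probe it speculatively on the next request. The last-owner heuristic and all names are assumptions; the patent does not specify this predictor.

```cpp
#include <cstdint>
#include <unordered_map>

struct Prediction { bool cachedElsewhere; int ownerNode; };

class ProbePredictor {
    std::unordered_map<uint64_t, int> lastOwner_;  // line address -> last caching node
public:
    // Consulted as the request traverses the fabric on its way to memory.
    Prediction predict(uint64_t lineAddr) const {
        auto it = lastOwner_.find(lineAddr);
        if (it == lastOwner_.end()) return {false, -1};
        return {true, it->second};   // send a speculative probe to this node
    }
    // Trained from coherence responses that reveal the line's actual owner.
    void train(uint64_t lineAddr, int ownerNode) { lastOwner_[lineAddr] = ownerNode; }
};
```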
-
Publication number: 20190372561
Abstract: A clock detector includes a first detector circuit, a second detector circuit, and a toggle detector circuit. The first detector circuit activates a first detect signal in response to detecting that a clock signal, which toggles between first and second logic states when present, is stuck in the first logic state, and keeps the first detect signal inactive otherwise. The second detector circuit provides a second detect signal in response to detecting that the clock signal is stuck in the second logic state, and keeps the second detect signal inactive otherwise. The toggle detector circuit activates a toggle detect signal in response to both the first detect signal and the second detect signal being inactive, and keeps the toggle detect signal inactive in response to activation of either the first detect signal or the second detect signal.
Type: Application
Filed: May 30, 2018
Publication date: December 5, 2019
Applicant: Advanced Micro Devices, Inc.
Inventors: Hariprasad TT, Satish Sankaralingam, Dinakar Venkata Sarraju
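A minimal software model of the detector logic: two stuck-at detect signals derived from a window of clock samples, and a toggle detect signal that is active only when neither stuck-at signal is. This is a sampled-value sketch, not the actual circuit; the windowed-sampling scheme is an assumption.

```cpp
struct ClockDetector {
    bool stuckHigh = false;  // first detect signal (stuck in first logic state)
    bool stuckLow  = false;  // second detect signal (stuck in second logic state)

    // Evaluate a window of clock samples taken against a reference clock.
    void sample(const bool* samples, int n) {
        bool sawHigh = false, sawLow = false;
        for (int i = 0; i < n; ++i) {
            if (samples[i]) sawHigh = true;
            else            sawLow  = true;
        }
        stuckHigh = sawHigh && !sawLow;  // never left the first logic state
        stuckLow  = sawLow  && !sawHigh; // never left the second logic state
    }

    // Toggle detect is active only while both stuck-at signals are inactive.
    bool toggleDetect() const { return !stuckHigh && !stuckLow; }
};
```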
-
Publication number: 20190371043
Abstract: A modified bilinear filter, and a method for its use in a texture processor system, are described herein. The system includes a texture processor, which includes a texture address unit and a texture data unit; the texture data unit includes a bilinear filter. An application sends a texture instruction, which the texture address unit processes to obtain at least a level of detail (LOD) map and texel data. The texture data unit generates modified texel inputs from the LOD map, the texel data, and at least two weights in a texture space region. The bilinear filter applies the at least two weights to the modified texel inputs, where the modified texel inputs and weights prevent finer LOD values from leaking into an area of coarser LOD values.
Type: Application
Filed: May 30, 2018
Publication date: December 5, 2019
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
-
Publication number: 20190371041
Abstract: Techniques for improving memory utilization for communication between stages of a graphics processing pipeline are disclosed. The techniques include analyzing the output instructions of a first shader program to determine whether any of them output data that is not used by a second shader program. The compiler performs data packing if gaps exist between the used output data, reducing the memory footprint. Based on the usage and packing information, the compiler generates optimized output instructions in the first shader program and optimized input instructions in the second shader program, so that the used data is written by the first shader program and read by the second shader program in a packed format. If needed, the compiler inserts instructions that perform runtime checks to identify unused output data of the first shader program based on information not known at compile time.
Type: Application
Filed: May 30, 2018
Publication date: December 5, 2019
Applicant: Advanced Micro Devices, Inc.
Inventors: Guohua Jin, Richard A. Burns, Todd Martin, Gianpaolo Tommasi
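A minimal sketch of the compile-time packing step: given which producer-shader outputs the consumer shader actually reads, assign dense slots only to the used ones so unused outputs leave no gaps. The slot representation and names are assumptions.

```cpp
#include <cstddef>
#include <vector>

// usedByConsumer[i] is true if output attribute i of the producer shader is
// read by the consumer shader (known from analyzing both programs).
std::vector<int> assignPackedSlots(const std::vector<bool>& usedByConsumer) {
    std::vector<int> slot(usedByConsumer.size(), -1);  // -1 = not stored at all
    int next = 0;
    for (std::size_t i = 0; i < usedByConsumer.size(); ++i)
        if (usedByConsumer[i]) slot[i] = next++;       // dense, gap-free layout
    return slot;  // output and input instructions are then rewritten to these slots
}
```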
-
Patent number: 10496561
Abstract: Systems, apparatuses, and methods for routing traffic through vertically stacked memory are disclosed. A computing system includes a host processor die and multiple vertically stacked memory dies. The host processor die generates memory access requests for the data stored in the multiple memory array banks of the memory dies. At least one memory die uses an on-die network switch with a programmable routing table for routing packets corresponding to the generated memory requests. Routes use both vertical hops and horizontal hops to reach the target memory array bank and to avoid any congested or failed resources along the way. The vertically stacked memory dies use through-silicon-via interconnects, and at least one via does not traverse all of the memory dies; accordingly, the host processor die does not have a direct connection to one or more of the memory dies.
Type: Grant
Filed: April 18, 2017
Date of Patent: December 3, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: David A. Roberts, Sudhanva Gurumurthi
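A minimal sketch of a programmable routing table for such a stacked-memory network: each entry maps a destination bank to a next hop that is either a horizontal on-die link or a vertical TSV to an adjacent die, and entries can be rewritten to steer around congested or failed resources. All names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Next-hop choices at an on-die network switch: four horizontal on-die
// directions, two vertical TSV directions, or delivery to a local bank.
enum class Hop : uint8_t { North, South, East, West, Up, Down, Local };

struct RoutingTable {
    std::vector<Hop> nextHop;  // indexed by destination bank id

    Hop route(uint32_t destBank) const { return nextHop[destBank]; }

    // Reprogram one entry at runtime, e.g., to avoid a failed TSV or a
    // congested link along the old route.
    void reprogram(uint32_t destBank, Hop h) { nextHop[destBank] = h; }
};
```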
-
Patent number: 10489218
Abstract: A method of monitoring, by one or more cores of a multi-core processor, speculative instructions that store data to a shared memory location, where a semaphore associated with the memory location specifies the availability of the location to store data. Speculative instructions are flushed when the semaphore specifies that the memory location is unavailable, and further speculative instructions are suppressed from being issued when the count of flushed speculative instructions rises above a specified threshold. The speculative instructions are executed, and the data is stored to the memory location, when the semaphore specifies that the location is available.
Type: Grant
Filed: December 19, 2017
Date of Patent: November 26, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Douglas Benson Hunt, William E. Jones
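A minimal sketch of the throttling rule: count speculative stores flushed because the semaphore marked the location unavailable, and stop issuing new speculation once the count exceeds a threshold. The counter-reset policy, threshold value, and names are assumptions.

```cpp
#include <cstdint>

struct SpeculationThrottle {
    uint32_t flushedCount = 0;  // speculative stores flushed so far
    uint32_t threshold    = 4;  // the patent's threshold is unspecified

    void onFlush()   { ++flushedCount; }    // semaphore said: unavailable
    void onSuccess() { flushedCount = 0; }  // store committed; assume reset

    // Issue logic consults this before dispatching another speculative store.
    bool allowSpeculation() const { return flushedCount < threshold; }
};
```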
-
Patent number: 10491916
Abstract: The present disclosure is directed to a system and method for exploiting the camera and depth information associated with rendered video frames, such as those rendered by a server operating as part of a cloud gaming service, to encode the rendered video frames more efficiently for transmission over a network. The method and system can be used in a server operating in a cloud gaming service to improve, for example, the latency, downstream bandwidth, and/or computational processing power associated with playing a video game over that service. They can further be used in other applications where the camera and depth information of a rendered or captured video frame is available.
Type: Grant
Filed: October 1, 2013
Date of Patent: November 26, 2019
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Khaled Mammou, Ihab Amer, Gabor Sines, Lei Zhang, Michael Schmit, Daniel Wong