Patents Assigned to Advanced Micro Devices

Preemptive cache writeback with transaction support

Patent number: 10452548

Abstract: A method of preemptive cache writeback includes transmitting, from a first cache controller of a first cache to a second cache controller of a second cache, an unused bandwidth message representing an unused bandwidth between the first cache and the second cache during a first cycle. During a second cycle, a cache line containing dirty data is preemptively written back from the second cache to the first cache based on the unused bandwidth message. Further, the cache line in the second cache is written over in response to a cache miss to the second cache.

Type: Grant

Filed: September 28, 2017

Date of Patent: October 22, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: David A. Roberts, Elliot H. Mednick
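Illustrative sketch for the preemptive cache writeback abstract above: a minimal Python model, not the patented implementation. It assumes the unused bandwidth message can be reduced to a count of cache lines that fit in the idle cycle. The `Cache` class, its `lines` dictionary, and `preemptive_writeback` are hypothetical names introduced only for this example.

```python
class Cache:
    """Toy model of the second cache: address -> dirty flag (assumed layout)."""

    def __init__(self, lines):
        self.lines = dict(lines)

    def preemptive_writeback(self, unused_bandwidth):
        """Write back up to `unused_bandwidth` dirty lines and return their addresses.

        `unused_bandwidth` stands in for the unused bandwidth message and is
        assumed to be a number of cache lines.
        """
        written = []
        for addr, dirty in self.lines.items():
            if len(written) == unused_bandwidth:
                break
            if dirty:
                # The line stays cached but is now clean, so a later miss
                # can overwrite it without waiting for a writeback.
                self.lines[addr] = False
                written.append(addr)
        return written
```

In this model a later cache miss that selects one of the cleaned lines as a victim can overwrite it immediately, which is the benefit the abstract describes.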
Temperature-aware task scheduling and proactive power management

Patent number: 10452437

Abstract: Systems, apparatuses, and methods for performing temperature-aware task scheduling and proactive power management. A SoC includes a plurality of processing units and a task queue storing pending tasks. The SoC calculates a thermal metric for each pending task to predict an amount of heat the pending task will generate. The SoC also determines a thermal gradient for each processing unit to predict a rate at which the processing unit's temperature will change when executing a task. The SoC also monitors a thermal margin of how far each processing unit is from reaching its thermal limit. The SoC minimizes non-uniform heat generation on the SoC by scheduling pending tasks from the task queue to the processing units based on the thermal metrics for the pending tasks, the thermal gradients of each processing unit, and the thermal margin available on each processing unit.

Type: Grant

Filed: June 24, 2016

Date of Patent: October 22, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Abhinandan Majumdar, Brian J. Kocoloski, Leonardo Piga, Wei Huang, Yasuko Eckert
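Illustrative sketch for the temperature-aware scheduling abstract above: a small greedy scheduler in Python, offered under stated assumptions rather than as the patented method. It assumes the predicted temperature rise of a task on a unit is the task's thermal metric multiplied by the unit's thermal gradient, and it picks the unit that would keep the largest thermal margin. `Unit`, `schedule`, and the scoring rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    gradient: float  # assumed: temperature rise per unit of task heat
    margin: float    # assumed: degrees remaining before the thermal limit


def schedule(task_heat, units):
    """Assign each task (given by its thermal metric) to a processing unit.

    Returns a list of (thermal_metric, unit_name) pairs; unit_name is None
    when no unit can run the task without exceeding its thermal limit.
    """
    assignment = []
    # Place the hottest tasks first, while the most margin is still available.
    for heat in sorted(task_heat, reverse=True):
        candidates = [u for u in units if u.margin - heat * u.gradient >= 0]
        if not candidates:
            assignment.append((heat, None))
            continue
        # Choose the unit left with the most margin after running the task.
        best = max(candidates, key=lambda u: u.margin - heat * u.gradient)
        best.margin -= heat * best.gradient
        assignment.append((heat, best.name))
    return assignment
```

Because each choice favors the unit with the most remaining margin, consecutive tasks tend to alternate between units instead of piling onto one, which approximates the goal of reducing non-uniform heat generation.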
Error injection for assessment of error detection and correction techniques using error injection logic and non-volatile memory

Patent number: 10452505

Abstract: A memory system includes a non-volatile memory unit, a content-addressable memory unit coupled to the non-volatile memory unit, and an error injection logic unit coupled to the non-volatile memory unit and the content addressable memory unit. The non-volatile memory unit is programmed to allow a first error injection onto a first data word using the error injection logic unit. The error injection logic in combination with the content addressable memory unit replaces a bit cell in the memory system. The memory system performs an evaluation of various error detection and correction techniques.

Type: Grant

Filed: December 20, 2017

Date of Patent: October 22, 2019

Assignee: Advanced Micro Devices, Inc.

Inventor: Michael K. Ciraula
Adaptive resizable cache/LCM for improved power

Patent number: 10452554

Abstract: Systems, apparatuses and methods of adaptively controlling a cache operating voltage are provided that comprise receiving indications of a plurality of cache usage amounts. Each cache usage amount corresponds to an amount of data to be accessed in a cache by one of a plurality of portions of a data processing application. The plurality of cache usage amounts are determined based on the received indications of the plurality of cache usage amounts. A voltage level applied to the cache is adaptively controlled based on one or more of the plurality of determined cache usage amounts. Memory access to the cache is controlled to be directed to a non-failing portion of the cache at the applied voltage level.

Type: Grant

Filed: April 8, 2016

Date of Patent: October 22, 2019

Assignees: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULC

Inventors: Ihab Amer, Khaled Mammou, Haibo Liu, Edward Harold, Fabio Gulino, Samuel Naffziger, Gabor Sines, Lawrence A. Bair, Andy Sung, Lei Zhang
Primitive level preemption using discrete non-real-time and real time pipelines

Patent number: 10453243

Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate the at least one fixed function hardware block. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.

Type: Grant

Filed: January 3, 2019

Date of Patent: October 22, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Anirudh R. Acharya, Swapnil Sakharshete, Michael Mantor, Mangesh P. Nijasure, Todd Martin, Vineet Goel
Method and apparatus of image processing

Patent number: 10455211

Abstract: A method and apparatus of precomputing includes capturing a first image by a first image capturing device. An image space for the first image is defined and pixels in the image space are analyzed for validity. Valid pixels are stored as valid pixel groups and the valid pixel groups are processed.

Type: Grant

Filed: May 25, 2017

Date of Patent: October 22, 2019

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Michael L. Schmit, Radhakrishna Giduthuri, Kiriti Nagesh Gowda
SPLIT FRAME RENDERING

Publication number: 20190318527

Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.

Type: Application

Filed: June 26, 2019

Publication date: October 17, 2019

Applicant: Advanced Micro Devices, Inc.

Inventors: Mangesh P. NIJASURE, Todd MARTIN, Michael MANTOR
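Illustrative sketch for the split frame rendering abstract above: one way triangles might be routed to screen-space pipelines by overlap, written in Python under simplifying assumptions. It assumes the render surface is cut into equal-width vertical strips, one per pipeline, and uses the triangle's bounding box as a conservative overlap test. The function name and parameters are hypothetical.

```python
def pipelines_for_triangle(triangle, surface_width, num_pipelines):
    """Return indices of the pipelines whose strip the triangle may overlap.

    `triangle` is a sequence of three (x, y) vertices in screen space.
    Strips are assumed to be vertical and of equal width.
    """
    strip = surface_width / num_pipelines
    xs = [x for x, _ in triangle]
    # Clamp the bounding box to the render surface.
    lo = max(0.0, min(xs))
    hi = min(surface_width, max(xs))
    if lo > hi:
        return []  # triangle lies entirely off the surface
    first = int(lo // strip)
    last = min(num_pipelines - 1, int(hi // strip))
    return list(range(first, last + 1))
```

A triangle spanning several strips is sent to every corresponding pipeline, and each pipeline then renders only the part that falls inside its own portion.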
TECHNIQUES FOR IMPROVED LATENCY OF THREAD SYNCHRONIZATION MECHANISMS

Publication number: 20190317831

Abstract: A memory fence or other similar operation is executed with reduced latency. An early fence operation is executed and acts as a hint to the processor executing the thread that executes the fence. This hint causes the processor to begin performing sub-operations for the fence earlier than if no such hint were executed. Examples of sub-operations for the fence include operations to make data written to by writes prior to the fence operation available to other threads. A resolving fence, which occurs after the early fence, performs the remaining sub-operations for the fence. By triggering some or all of the sub-operations for a memory fence that will occur in the future, the early fence operation reduces the amount of latency associated with that memory fence operation.

Type: Application

Filed: April 12, 2018

Publication date: October 17, 2019

Applicant: Advanced Micro Devices, Inc.

Inventors: Amin Farmahini-Farahani, David A. Roberts, Nuwan Jayasena
METHOD AND SYSTEM FOR HARDWARE MAPPING INFERENCE PIPELINES

Publication number: 20190318229

Abstract: Methods and systems for hardware mapping inference pipelines in deep neural network (DNN) systems. Each layer of the inference pipeline is mapped to a queue, which in turn is associated with one or more processing elements. Each queue has multiple elements, where an element represents the task to be completed for a given input. Each input is associated with a queue packet which identifies, for example, a type of DNN layer, which DNN layer to use, a next DNN layer to use and a data pointer. A queue packet is written into the element of a queue, and the processing elements read the element and process the input based on the information in the queue packet. The processing element then writes another queue packet to another queue based on the processed queue packet. Multiple inputs can be processed in parallel and on-the-fly using the queues independent of layer starting points.

Type: Application

Filed: April 12, 2018

Publication date: October 17, 2019

Applicant: Advanced Micro Devices, Inc.

Inventor: Shuai Che
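Illustrative sketch for the inference pipeline abstract above: a toy Python model of per-layer queues and queue packets, not the published design. Each packet is assumed to be a dictionary holding the input identifier, the layer to apply, the next layer, and the data itself in place of a data pointer. The `LAYERS` table, the packet fields, and `run_pipeline` are hypothetical, and layer names are assumed to be unique.

```python
from collections import deque

# Hypothetical layer functions standing in for real DNN layers.
LAYERS = {
    "scale": lambda x: [2 * v for v in x],
    "relu": lambda x: [max(0, v) for v in x],
    "sum": lambda x: [sum(x)],
}


def run_pipeline(inputs, order):
    """Push every input through one queue per layer, in the given layer order."""
    next_of = dict(zip(order, order[1:] + [None]))
    queues = {name: deque() for name in order}
    results = {}
    # Each input enters the first layer's queue as a queue packet.
    for input_id, data in inputs.items():
        queues[order[0]].append(
            {"input": input_id, "layer": order[0], "next": next_of[order[0]], "data": data}
        )
    # A single loop plays the role of the processing elements.
    while any(queues.values()):
        for name in order:
            if not queues[name]:
                continue
            pkt = queues[name].popleft()
            out = LAYERS[pkt["layer"]](pkt["data"])
            if pkt["next"] is None:
                results[pkt["input"]] = out
            else:
                # Write a new packet into the queue of the next layer.
                nxt = pkt["next"]
                queues[nxt].append(
                    {"input": pkt["input"], "layer": nxt, "next": next_of[nxt], "data": out}
                )
    return results
```

Because every packet names its own next layer, different inputs can sit at different layers at the same moment, which mirrors the abstract's point about processing inputs independently of layer starting points.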
FAST THREAD WAKE-UP THROUGH EARLY LOCK RELEASE

Publication number: 20190317832

Abstract: A thread holding a lock notifies a sleeping thread that is waiting on the lock that the lock holding thread is “about” to release the lock. In response to the notification, the waiting thread is woken up. While the waiting thread is woken up, the lock holding thread completes other operations prior to actually releasing the lock and then releases the lock. The notification to the waiting thread hides latency associated with waking up the waiting thread by allowing operations that wake up the waiting thread to occur while the lock holding thread is performing the other operations prior to releasing the lock.

Type: Application

Filed: April 12, 2018

Publication date: October 17, 2019

Applicant: Advanced Micro Devices, Inc.

Inventors: Nuwan Jayasena, Amin Farmahini-Farahani, David A. Roberts
Hardware controlled receive response generation

Patent number: 10447452

Abstract: An approach is provided for generating response frames. Incoming frames are processed by a receive controller to determine type and attributes. Based on the type and the attributes of the incoming frame, a response frame is constructed and transmitted by a transmit controller. A response frame is constructed by setting values in a frame template. A block ACK can be implemented by means of a block ACK scoreboard.

Type: Grant

Filed: July 13, 2015

Date of Patent: October 15, 2019

Assignees: ADVANCED MICRO DEVICES, INC., AMD FAR EAST LTD.

Inventors: Douglas A. Mammoser, Sebastian Ahmed
Dynamic virtualized field-programmable gate array resource control for performance and reliability

Patent number: 10447273

Abstract: A method for allocating field-programmable gate array (FPGA) resources includes monitoring a first operating metric for one or more computing devices and identifying a first portion of a plurality of macro components of a set of one or more FPGA devices in the one or more computing devices, where the first portion is allocated for implementing one or more user defined functions. The method also includes, in response to a first change in the first operating metric, reallocating the first portion of the macro components for implementing a system function associated with the first operating metric, and generating a first notification indicating the reallocation of the first portion.

Type: Grant

Filed: September 11, 2018

Date of Patent: October 15, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: David A. Roberts, Shenghsun Cho
TRACKING STORES AND LOADS BY BYPASSING LOAD STORE UNITS

Publication number: 20190310845

Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. Store-load pairs which have a strong history of store-to-load forwarding are identified. Once identified, the load is memory renamed to the register stored by the store. The memory dependency predictor may also be used to detect loads that are dependent on a store but cannot be renamed. In such a configuration, the dependence is signaled to the load store unit and the load store unit uses the information to issue the load after the identified store has its physical address.

Type: Application

Filed: June 24, 2019

Publication date: October 10, 2019

Applicant: Advanced Micro Devices, Inc.

Inventors: Krishnan V. Ramani, Kai Troester, Frank C. Galloway, David N. Suggs, Michael D. Achenbach, Betty Ann McDaniel, Marius Evers
METHOD OF DEBUGGING A PROCESSOR

Publication number: 20190311072

Abstract: Methods for designing a processor based on executing a randomly created and randomly executed executable on a fabricated processor. By implementing randomization at multiple levels in the testing of the processor, coupled with highly specific test generation constraint rules, highly focused tests on a micro-architectural feature are implemented while at the same time applying a high degree of random permutation in the way it stresses that specific feature. This allows for the detection and diagnosis of errors and bugs in the processor that elude traditional testing methods. Once the errors and bugs are detected and diagnosed, the processor can then be redesigned to no longer produce the anomalies. By eliminating the errors and bugs in the processor, a processor with improved computational efficiency and reliability can be fabricated.

Type: Application

Filed: April 10, 2018

Publication date: October 10, 2019

Applicant: Advanced Micro Devices, Inc.

Inventor: Eric W. Schieve
Capacitive structure for memory write assist

Patent number: 10438636

Abstract: Write assist circuitry facilitates increased voltage applied to a memory device such as a memory cell or bitcell in changing a logical state of the memory device during a write operation. The write assist circuitry includes a second capacitive line or “metal cap” in addition to a first capacitive line coupled to one of a pair of bitlines to which voltage may be selectively applied. The capacitive lines provide increased write assistance to the memory device. The second capacitive line structurally lies in a second orientation and is formed in an integrated circuit second metal layer relative to the first capacitive line in some embodiments. The additional capacitive line provides negative bitline assistance by selectively driving its corresponding bitlines to be negative during a write operation.

Type: Grant

Filed: December 7, 2017

Date of Patent: October 8, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Tawfik Ahmed, Amlan Ghosh, Keith A. Kasprak, Ricardo Cantu
Single instruction multiple data page table walk scheduling at input output memory management unit

Patent number: 10437736

Abstract: A data processing system includes a memory and an input output memory management unit that is connected to the memory. The input output memory management unit is adapted to receive batches of address translation requests. The input output memory management unit has instructions that identify, from among the batches of address translation requests, a later batch having a lower number of memory access requests than an earlier batch, and selectively schedules access to a page table walker for each address translation request of a batch.

Type: Grant

Filed: December 22, 2017

Date of Patent: October 8, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Arkaprava Basu, Eric Van Tassell, Mark Oskin, Guilherme Cox, Gabriel Loh
Hybrid video encoder apparatus and methods

Patent number: 10440359

Abstract: Methods and apparatus for video processing are disclosed. In one embodiment the work of processing different types of video frames is allocated between a plurality of computing resources. For example, different computing resources can be used for I, P and B frames, where an I frame is an intra-frame encoded with no other frames as a reference; a P frame is encoded with one previous I or P frame as a reference and a B frame is encoded with one previous and one future frame as references. In one example, a central processing unit (CPU) performs encoding of I frames and P frames of a video and a graphics processing unit (GPU) performs initial encoding of B frames of the video in connection with a fixed function video encoder configured to perform entropy encoding of the B frames.

Type: Grant

Filed: May 30, 2013

Date of Patent: October 8, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael L. Schmit, Radhakrishna Giduthuri, Rajy Meeyakhan Rawther, Vicky W. Tsang, Passant V. Karunaratne
Metal zero contact via redundancy on output nodes and inset power rail architecture

Patent number: 10438937

Abstract: A system and method for creating layout for non-planar cells with redundancy in one or more of output contacts and power contacts are described. In various implementations, cell layout is created for a first cell with non-planar devices. An available local path in the first cell is identified for redundant output signal routing, which includes a free available metal zero layer track. Redundant metal zero layer is placed in an available metal zero track of the available local path. Redundant contacts and redundant metal one layer are placed in a free track in the available local path to connect an original output contact to a redundant output contact. An available external path is identified between the first cell and a second cell for redundant power or ground routing. One or more metal zero extension layers and/or metal one extension layers are placed in the identified external path.

Type: Grant

Filed: April 27, 2018

Date of Patent: October 8, 2019

Assignee: Advanced Micro Devices, Inc.

Inventor: Richard T. Schultz
MEMORY POOLS IN A MEMORY MODEL FOR A UNIFIED COMPUTING SYSTEM

Publication number: 20190303302

Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a plurality of processors. The method includes receiving a memory operation from a processor that references an address in a shared memory, mapping the received memory operation to at least one of a plurality of virtual memory pools to produce a mapping result, and providing the mapping result to the processor.

Type: Application

Filed: June 17, 2019

Publication date: October 3, 2019

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
Acceleration of cache-to-cache data transfers for producer-consumer communication

Patent number: 10430343

Abstract: A communication bypass mechanism accelerates cache-to-cache data transfers for communication traffic between caching agents that have separate last-level caches. A method includes bypassing a last-level cache of a first caching agent in response to a cache line having a modified state being evicted from a penultimate-level cache of the first caching agent and a communication attribute of a shadow tag entry associated with the cache line being set. The communication attribute indicates prior communication of the cache line with a second caching agent having a second last-level cache.

Type: Grant

Filed: February 21, 2017

Date of Patent: October 1, 2019

Assignee: Advanced Micro Devices, Inc.

Inventor: Patrick N. Conway
