Patents Assigned to Advanced Micro Devices
-
Patent number: 10452548Abstract: A method of preemptive cache writeback includes transmitting, from a first cache controller of a first cache to a second cache controller of a second cache, an unused bandwidth message representing an unused bandwidth between the first cache and the second cache during a first cycle. During a second cycle, a cache line containing dirty data is preemptively written back from the second cache to the first cache based on the unused bandwidth message. Further, the cache line in the second cache is written over in response to a cache miss to the second cache.Type: GrantFiled: September 28, 2017Date of Patent: October 22, 2019Assignee: Advanced Micro Devices, Inc.Inventors: David A. Roberts, Elliot H. Mednick
-
Patent number: 10452437Abstract: Systems, apparatuses, and methods for performing temperature-aware task scheduling and proactive power management. A SoC includes a plurality of processing units and a task queue storing pending tasks. The SoC calculates a thermal metric for each pending task to predict an amount of heat the pending task will generate. The SoC also determines a thermal gradient for each processing unit to predict a rate at which the processing unit's temperature will change when executing a task. The SoC also monitors a thermal margin of how far each processing unit is from reaching its thermal limit. The SoC minimizes non-uniform heat generation on the SoC by scheduling pending tasks from the task queue to the processing units based on the thermal metrics for the pending tasks, the thermal gradients of each processing unit, and the thermal margin available on each processing unit.Type: GrantFiled: June 24, 2016Date of Patent: October 22, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Abhinandan Majumdar, Brian J. Kocoloski, Leonardo Piga, Wei Huang, Yasuko Eckert
-
Patent number: 10452505Abstract: A memory system includes a non-volatile memory unit, a content-addressable memory unit coupled to the non-volatile memory unit, and an error injection logic unit coupled to the non-volatile memory unit and the content addressable memory unit. The non-volatile memory unit is programmed to allow a first error injection onto a first data word using the error injection logic unit. The error injection logic in combination with the content addressable memory unit replaces a bit cell in the memory system. The memory system performs an evaluation of various error detection and correction techniques.Type: GrantFiled: December 20, 2017Date of Patent: October 22, 2019Assignee: Advanced Micro Devices, Inc.Inventor: Michael K. Ciraula
-
Patent number: 10452554Abstract: Systems, apparatuses and methods of adaptively controlling a cache operating voltage are provided that comprise receiving indications of a plurality of cache usage amounts. Each cache usage amount corresponds to an amount of data to be accessed in a cache by one of a plurality of portions of a data processing application. The plurality of cache usage amounts are determining based on the received indications of the plurality of cache usage amounts. A voltage level applied to the cache is adaptively controlled based on one or more of the plurality of determined cache usage amounts. Memory access to the cache is controlled to be directed to a non-failing portion of the cache at the applied voltage level.Type: GrantFiled: April 8, 2016Date of Patent: October 22, 2019Assignees: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULCInventors: Ihab Amer, Khaled Mammou, Haibo Liu, Edward Harold, Fabio Gulino, Samuel Naffziger, Gabor Sines, Lawrence A. Bair, Andy Sung, Lei Zhang
-
Patent number: 10453243Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate the at least one fixed function hardware block. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.Type: GrantFiled: January 3, 2019Date of Patent: October 22, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Anirudh R. Acharya, Swapnil Sakharshete, Michael Mantor, Mangesh P. Nijasure, Todd Martin, Vineet Goel
-
Patent number: 10455211Abstract: A method and apparatus of precomputing includes capturing a first image by a first image capturing device. An image space for the first image is defined and pixels in the image space are analyzed for validity. Valid pixels are stored as valid pixel groups and the valid pixel groups are processed.Type: GrantFiled: May 25, 2017Date of Patent: October 22, 2019Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Michael L. Schmit, Radhakrishna Giduthuri, Kiriti Nagesh Gowda
-
Publication number: 20190318527Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.Type: ApplicationFiled: June 26, 2019Publication date: October 17, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Mangesh P. NIJASURE, Todd MARTIN, Michael MANTOR
-
Publication number: 20190317831Abstract: A memory fence or other similar operation is executed with reduced latency. An early fence operation is executed and acts as a hint to the processor executing the thread that executes the fence. This hint causes the processor to begin performing sub-operations for the fence earlier than if no such hint were executed. Examples of sub-operations for the fence include operations to make data written to by writes prior to the fence operation available to other threads. A resolving fence, which occurs after the early fence, performs the remaining sub-operations for the fence. By triggering some or all of the sub-operations for a memory fence that will occur in the future, the early fence operation reduces the amount of latency associated with that memory fence operation.Type: ApplicationFiled: April 12, 2018Publication date: October 17, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Amin Farmahini-Farahani, David A. Roberts, Nuwan Jayasena
-
Publication number: 20190318229Abstract: Methods and systems for hardware mapping inference pipelines in deep neural network (DNN) systems. Each layer of the inference pipeline is mapped to a queue, which in turn is associated with one or more processing elements. Each queue has multiple elements, where an element represents the task to be completed for a given input. Each input is associated with a queue packet which identifies, for example, a type of DNN layer, which DNN layer to use, a next DNN layer to use and a data pointer. A queue packet is written into the element of a queue, and the processing elements read the element and process the input based on the information in the queue packet. The processing element then writes another queue packet to another queue based on the processed queue packet. Multiple inputs can be processed in parallel and on-the-fly using the queues independent of layer starting points.Type: ApplicationFiled: April 12, 2018Publication date: October 17, 2019Applicant: Advanced Micro Devices, Inc.Inventor: Shuai Che
-
Publication number: 20190317832Abstract: A thread holding a lock notifies a sleeping thread that is waiting on the lock that the lock holding thread is “about” to release the lock. In response to the notification, the waiting thread is woken up. While the waiting thread is woken up, the lock holding thread completes other operations prior to actually releasing the lock and then releases the lock. The notification to the waiting thread hides latency associated with waking up the waiting thread by allowing operations that wake up the waiting thread to occur while the lock holding thread is performing the other operations prior to releasing the thread.Type: ApplicationFiled: April 12, 2018Publication date: October 17, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Nuwan Jayasena, Amin Farmahini-Farahani, David A. Roberts
-
Patent number: 10447452Abstract: An approach is provided for generating response frames. Incoming frames are processed by a receive controller to determine type and attributes. Based on the type and the attributes of the incoming frame, a response frame is constructed and transmitted by a transmit controller. A response frame is constructed by setting values in a frame template. A block ACK can be implemented by means of a block ACK scoreboard.Type: GrantFiled: July 13, 2015Date of Patent: October 15, 2019Assignees: ADVANCED MICRO DEVICES, INC., AMD FAR EAST LTD.Inventors: Douglas A. Mammoser, Sebastian Ahmed
-
Patent number: 10447273Abstract: A method for allocating field-programmable gate array (FPGA) resources includes monitoring a first operating metric for one or more computing devices, identifying a first portion of plurality of macro components of a set of one or more FPGA devices in the one or more computing devices, where the first portion is allocated for implementing one or more user defined functions. The method also includes, in response to a first change in the first operating metric, reallocating the first portion of the macro components for implementing a system function associated with the first operating metric, and generating a first notification indicating the reallocation of the first portion.Type: GrantFiled: September 11, 2018Date of Patent: October 15, 2019Assignee: Advanced Micro Devices, Inc.Inventors: David A. Roberts, Shenghsun Cho
-
Publication number: 20190310845Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. Store-load pairs which have a strong history of store-to-load forwarding are identified. Once identified, the load is memory renamed to the register stored by the store. The memory dependency predictor may also be used to detect loads that are dependent on a store but cannot be renamed. In such a configuration, the dependence is signaled to the load store unit and the load store unit uses the information to issue the load after the identified store has its physical address.Type: ApplicationFiled: June 24, 2019Publication date: October 10, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Krishnan V. Ramani, Kai Troester, Frank C. Galloway, David N. Suggs, Michael D. Achenbach, Betty Ann McDaniel, Marius Evers
-
Publication number: 20190311072Abstract: Methods for designing a processor based on executing a randomly created and randomly executed executable on a fabricated processor. By implementing randomization at multiple levels in the testing of the processor, coupled with highly specific test generation constraint rules, highly focused tests on a micro-architectural feature are implemented while at the same time applying a high degree of random permutation in the way it stresses that specific feature. This allows for the detection and diagnosis of errors and bugs in the processor that elude traditional testing methods. Once the errors and bugs are detected and diagnosed, the processor can then be redesigned to no longer produce the anomalies. By eliminating the errors and bugs in the processor, a processor with improved computational efficiency and reliability can be fabricated.Type: ApplicationFiled: April 10, 2018Publication date: October 10, 2019Applicant: Advanced Micro Devices, Inc.Inventor: Eric W. Schieve
-
Patent number: 10438636Abstract: Write assist circuitry facilitates increased voltage applied to a memory device such as a memory cell or bitcell in changing a logical state of the memory device during a write operation. The write assist circuitry includes a second capacitive line or “metal cap” in addition to a first capacitive line coupled to one of a pair of bitlines to which voltage may be selectively applied. The capacitive lines provide increased write assistance to the memory device. The second capacitive line structurally lies in a second orientation and is formed in an integrated circuit second metal layer relative to the first capacitive line in some embodiments. The additional capacitive line provides negative bitline assistance by selectively driving its corresponding bitlines to be negative during a write operation.Type: GrantFiled: December 7, 2017Date of Patent: October 8, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Tawfik Ahmed, Amlan Ghosh, Keith A. Kasprak, Ricardo Cantu
-
Patent number: 10437736Abstract: A data processing system includes a memory and an input output memory management unit that is connected to the memory. The input output memory management unit is adapted to receive batches of address translation requests. The input output memory management unit has instructions that identify, from among the batches of address translation requests, a later batch having a lower number of memory access requests than an earlier batch, and selectively schedules access to a page table walker for each address translation request of a batch.Type: GrantFiled: December 22, 2017Date of Patent: October 8, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Arkaprava Basu, Eric Van Tassell, Mark Oskin, Guilherme Cox, Gabriel Loh
-
Patent number: 10440359Abstract: Methods and apparatus for video processing are disclosed. In one embodiment the work of processing of different types of video frames is allocated between a plurality of computing resources. For example, different computing resources for can be used for I, P and B frames, where an I frame is an intra-frame encoded with no other frames as a reference; a P frame is encoded with one previous I or P frame as a reference and a B frame is encoded with one previous and one future frame as references. In one example, a central processing unit (CPU) performs encoding of I frames and P frames of a video and a graphics processing unit (GPU) performs initial encoding of B frames of the video in connection with a fixed function video encoder configured to perform entropy encoding of the B frames.Type: GrantFiled: May 30, 2013Date of Patent: October 8, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Michael L. Schmit, Radhakrishna Giduthuri, Rajy Meeyakhan Rawther, Vicky W. Tsang, Passant V. Karunaratne
-
Patent number: 10438937Abstract: A system and method for creating layout for non-planar cells with redundancy in one or more of output contacts and power contacts are described. In various implementations, cell layout is created for a first cell with non-planar devices. An available local path in the first cell is identified for redundant output signal routing, which includes a free available metal zero layer track. Redundant metal zero layer is placed in an available metal zero track of the available local path. Redundant contacts and redundant metal one layer are placed in a free track in the available local path to connect an original output contact to a redundant output contact. An available external path is identified between the first cell and a second cell for redundant power or ground routing. One or more metal zero extension layers and/or metal one extension layers are placed in the identified external path.Type: GrantFiled: April 27, 2018Date of Patent: October 8, 2019Assignee: Advanced Micro Devices, Inc.Inventor: Richard T. Schultz
-
Publication number: 20190303302Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a plurality of processors. The method includes receiving a memory operation from a processor that receives a memory operation from a processor that references an address in a shared memory, mapping the received memory operation to at least one of a plurality of virtual memory pools to produce a mapping result, and providing the mapping result to the processor.Type: ApplicationFiled: June 17, 2019Publication date: October 3, 2019Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
-
Patent number: 10430343Abstract: A communication bypass mechanism accelerates cache-to-cache data transfers for communication traffic between caching agents that have separate last-level caches. A method includes bypassing a last-level cache of a first caching agent in response to a cache line having a modified state being evicted from a penultimate-level cache of the first caching agent and a communication attribute of a shadow tag entry associated with the cache line being set. The communication attribute indicates prior communication of the cache line with a second caching agent having a second last-level cache.Type: GrantFiled: February 21, 2017Date of Patent: October 1, 2019Assignee: Advanced Micro Devices, Inc.Inventor: Patrick N. Conway