Patents Assigned to Advanced Micro Devices
  • Patent number: 10134355
    Abstract: A processor performs vertex coloring for a graph based at least in part on the degree of each vertex of the graph and based at least in part on another coloring approach, such as comparison of random values assigned to the vertices. For each vertex in the graph, the processor determines whether the degree of the vertex is a local maximum; that is, whether the degree of the vertex is greater than the degree of each of its connected vertices. Each vertex having a local-maximum degree is assigned a specified or randomly selected color, and is then omitted from future iterations of the coloring process. After a stop criterion is met, the processor assigns random values to the remaining uncolored vertices and assigns colors based on comparisons of the random values.
    Type: Grant
    Filed: May 22, 2015
    Date of Patent: November 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Shuai Che
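A minimal Python sketch of the two-phase coloring described in the abstract above: vertices whose residual degree is a strict local maximum are colored and removed first, and once a stop criterion is met (here, a fixed iteration count, which is an assumption) the remaining vertices are colored by comparing random values. The graph representation, the stop criterion, and the color-assignment details are illustrative, not taken from the patent.

```python
import random

def color_graph(adj, max_degree_iters=3, seed=0):
    """adj: dict mapping vertex -> set of neighbors (undirected graph)."""
    rng = random.Random(seed)
    color = {}                      # vertex -> assigned color
    remaining = set(adj)
    next_color = 0

    # Phase 1: repeatedly color vertices whose degree is a strict local maximum.
    for _ in range(max_degree_iters):      # stop criterion: fixed iteration count (assumption)
        degree = {v: sum(1 for n in adj[v] if n in remaining) for v in remaining}
        local_max = [v for v in remaining
                     if all(degree[v] > degree[n] for n in adj[v] if n in remaining)]
        if not local_max:
            break
        for v in local_max:
            color[v] = next_color          # local-maximum vertices of this round share one color
        next_color += 1
        remaining -= set(local_max)        # omit colored vertices from later iterations

    # Phase 2: color the rest by comparing random values assigned to the vertices.
    rand = {v: rng.random() for v in remaining}
    while remaining:
        # A vertex whose random value beats all of its uncolored neighbors is colored this round.
        winners = [v for v in remaining
                   if all(rand[v] > rand[n] for n in adj[v] if n in remaining)]
        for v in winners:
            used = {color[n] for n in adj[v] if n in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        remaining -= set(winners)
    return color

if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
    print(color_graph(adj))
```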
  • Patent number: 10133672
    Abstract: Described is a system and method for efficient pointer chasing in systems having a single memory node or a network of memory nodes. In particular, a pointer chasing command is sent along with a memory request by an issuing node to a memory node. The pointer chasing command indicates the number of interdependent memory accesses and the information needed for the identified interdependent memory accesses. An address computing unit associated with the memory node determines the relevant memory address for an interdependent memory access without further interaction with, or a return trip to, the issuing node.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: November 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Paula Aguilera Diez, Amin Farmahini-Farahani, Nuwan Jayasena
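An illustrative Python model of the idea in the entry above: rather than round-tripping to the issuing node for every hop of a linked structure, one request carries a pointer-chasing command (the number of dependent accesses plus the information needed to compute each next address), and an address-computing unit at the memory node walks the chain locally. The command fields and the offset-based address computation are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass
class PointerChaseCommand:
    start_address: int      # address of the first element
    num_accesses: int       # number of interdependent memory accesses to perform
    next_ptr_offset: int    # where, within an element, the pointer to the next element lives

class MemoryNode:
    """Models a memory node whose address-computing unit walks pointer chains locally."""
    def __init__(self, memory):
        self.memory = memory            # dict: address -> value

    def serve(self, cmd):
        results = []
        addr = cmd.start_address
        for _ in range(cmd.num_accesses):
            results.append(self.memory[addr])                 # dependent load
            addr = self.memory[addr + cmd.next_ptr_offset]    # next address computed locally,
                                                              # without returning to the issuer
        return results                                        # single reply to the issuing node

if __name__ == "__main__":
    # A three-element linked list: value at addr, pointer to the next element at addr + 1.
    mem = {100: 7, 101: 200, 200: 8, 201: 300, 300: 9, 301: 0}
    node = MemoryNode(mem)
    print(node.serve(PointerChaseCommand(start_address=100, num_accesses=3, next_ptr_offset=1)))
    # -> [7, 8, 9], returned with one round trip instead of three
```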
  • Patent number: 10133574
    Abstract: A system-on-a-chip includes a plurality of instruction processors and a hardware block such as a system management unit. The hardware block accesses values of performance counters associated with the plurality of instruction processors, which indicate instruction arrival rates and instruction service rates, and modifies one or more operating points of one or more of the plurality of instruction processors based on comparisons of the instruction arrival rates and the instruction service rates to achieve optimized system metrics.
    Type: Grant
    Filed: June 14, 2016
    Date of Patent: November 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Akanksha Jain, Wei Huang, Indrani Paul
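A toy Python sketch of the control idea in the entry above: compare each processor's instruction arrival rate to its instruction service rate, as read from performance counters, and raise or lower its operating point accordingly. The margin, step size, and operating-point range are invented for illustration.

```python
def adjust_operating_points(counters, p_states, margin=0.05):
    """
    counters: dict mapping processor id -> (instruction_arrival_rate, instruction_service_rate)
    p_states: dict mapping processor id -> current operating-point index (0 = slowest)
    Returns a new dict of operating-point indices.
    """
    MAX_P_STATE = 7                                  # assumed number of operating points
    new_states = {}
    for cpu, (arrival, service) in counters.items():
        p = p_states[cpu]
        if arrival > service * (1 + margin):         # work is arriving faster than it is served
            p = min(p + 1, MAX_P_STATE)              # raise the operating point
        elif arrival < service * (1 - margin):       # headroom: lower it to save power
            p = max(p - 1, 0)
        new_states[cpu] = p
    return new_states

if __name__ == "__main__":
    counters = {0: (1.2e9, 1.0e9), 1: (0.4e9, 1.0e9)}
    print(adjust_operating_points(counters, {0: 3, 1: 3}))   # -> {0: 4, 1: 2}
```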
  • Patent number: 10127044
    Abstract: A processor, a device, and a non-transitory computer readable medium for performing branch prediction in a processor are presented. The processor includes a front end unit. The front end unit includes a level 1 branch target buffer (BTB), a BTB index predictor (BIP), and a level 1 hash perceptron (HP). The BTB is configured to predict a target address. The BIP is configured to generate a prediction based on a program counter and a global history, wherein the prediction includes a speculative partial target address, a global history value, a global history shift value, and a way prediction. The HP is configured to predict whether a branch instruction is taken or not taken.
    Type: Grant
    Filed: October 24, 2014
    Date of Patent: November 13, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Douglas Williams, Sahil Arora, Nikhil Gupta, Wei-Yu Chen, Debjit Das Sarma, Marius Evers
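The hash perceptron named in the entry above is a known class of predictor: a table of signed weight vectors, indexed by a hash of the program counter, is dotted with the global branch history to produce a taken/not-taken decision. The sketch below is a generic hashed-perceptron predictor in Python, not AMD's implementation; the table size, history length, hash, and training threshold are arbitrary.

```python
class HashedPerceptronPredictor:
    def __init__(self, num_entries=256, history_len=16, threshold=30):
        self.num_entries = num_entries
        self.history_len = history_len
        self.threshold = threshold
        # Each table entry holds a bias weight plus one weight per global-history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(num_entries)]
        self.ghr = [1] * history_len              # global history: +1 = taken, -1 = not taken

    def _index(self, pc):
        return (pc >> 2) % self.num_entries       # simple PC hash (assumption)

    def predict(self, pc):
        w = self.weights[self._index(pc)]
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.ghr))
        return y >= 0, y                          # predicted taken iff the dot product is >= 0

    def update(self, pc, taken):
        pred, y = self.predict(pc)
        t = 1 if taken else -1
        w = self.weights[self._index(pc)]
        # Train on a misprediction or when the output magnitude is below the threshold.
        if pred != taken or abs(y) <= self.threshold:
            w[0] += t
            for i, hi in enumerate(self.ghr):
                w[i + 1] += t * hi
        self.ghr = self.ghr[1:] + [t]             # shift the new outcome into the global history

if __name__ == "__main__":
    p = HashedPerceptronPredictor()
    for _ in range(100):                          # an always-taken branch becomes easy to predict
        p.update(pc=0x400123, taken=True)
    print(p.predict(0x400123)[0])                 # -> True
```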
  • Publication number: 20180321946
    Abstract: A method for use in a processor for arbitrating between multiple processes to select wavefronts for execution on a shader core is provided. The processor includes a compute pipeline configured to issue wavefronts to the shader core for execution, a hardware queue descriptor associated with the compute pipeline, and the shader core. The shader core is configured to execute work for the compute pipeline corresponding to a first memory queue descriptor, using data for the first memory queue descriptor that is loaded into a first hardware queue descriptor. The processor is configured to detect a context switch condition, and, responsive to the context switch condition, perform a context switch operation including loading data for a second memory queue descriptor into the first hardware queue descriptor. The shader core is configured to execute work corresponding to the second memory queue descriptor that is loaded into the first hardware queue descriptor.
    Type: Application
    Filed: July 19, 2018
    Publication date: November 8, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Mark Leather, Michael Mantor, Rex McCrary, Sebastien Nussbaum, Philip J. Rogers, Ralph Clay Taylor, Thomas Woller
  • Patent number: 10121555
    Abstract: A non-volatile memory device has at least one non-volatile flash memory formatted with physical addresses to read and write data that is organized into blocks of data, wherein the blocks of data are organized into pages of data, and wherein the pages of data are organized into cells of data. The non-volatile memory device includes a non-volatile memory controller to direct read and write requests to the non-volatile flash memory for the storage and retrieval of data. The non-volatile memory controller includes a flash translation layer to correlate the logical addresses of read and write requests with the physical address locations of the non-volatile flash memory where the data is read or written. The flash translation layer, when writing to a physical address location, chooses between a wear-leveling circuit and a wear-limiting circuit to select the physical address location.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: November 6, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Amro Awad, Sergey Blagodurov
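A simplified Python sketch of a flash translation layer along the lines of the entry above: it maps logical addresses to physical blocks and, on each write, chooses between a wear-leveling policy (spread writes by picking the least-worn free block) and a wear-limiting policy (avoid blocks close to their erase-cycle budget). The selection rule, data structures, and parameters are assumptions for illustration.

```python
class FlashTranslationLayer:
    def __init__(self, num_blocks, erase_budget=1000, wear_limit_fraction=0.9):
        self.l2p = {}                                   # logical address -> physical block
        self.erase_counts = [0] * num_blocks
        self.free_blocks = set(range(num_blocks))
        self.erase_budget = erase_budget
        self.wear_limit_fraction = wear_limit_fraction

    def _wear_leveling_pick(self):
        # Spread wear: choose the free block with the fewest erase cycles.
        return min(self.free_blocks, key=lambda b: self.erase_counts[b])

    def _wear_limiting_pick(self):
        # Protect nearly worn-out blocks: exclude blocks past the wear limit when possible.
        limit = self.erase_budget * self.wear_limit_fraction
        healthy = [b for b in self.free_blocks if self.erase_counts[b] < limit]
        pool = healthy or list(self.free_blocks)
        return min(pool, key=lambda b: self.erase_counts[b])

    def write(self, logical_addr, hot_data=False):
        # Assumed policy switch: frequently rewritten ("hot") data uses the wear-limiting pick.
        block = self._wear_limiting_pick() if hot_data else self._wear_leveling_pick()
        self.free_blocks.discard(block)
        old = self.l2p.get(logical_addr)
        if old is not None:                             # invalidate and recycle the old block
            self.erase_counts[old] += 1
            self.free_blocks.add(old)
        self.l2p[logical_addr] = block
        return block

if __name__ == "__main__":
    ftl = FlashTranslationLayer(num_blocks=4)
    for i in range(6):
        print(ftl.write(logical_addr=0, hot_data=(i % 2 == 0)))
```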
  • Patent number: 10122392
    Abstract: Systems, apparatuses, and methods for implementing a negative resistance circuit for bandwidth extension are disclosed. Within a feedback path of a differential signal path, capacitors are placed on the inputs and outputs of a fully differential amplifier connecting to the differential signal path. In one embodiment, a circuit includes a fully differential amplifier and four capacitors. A first capacitor is coupled between a first signal path and a non-inverting input terminal of the amplifier and a second capacitor is coupled between the first signal path and a non-inverting output terminal of the amplifier. A third capacitor is coupled between a second signal path and an inverting input terminal of the amplifier and a fourth capacitor is coupled between the second signal path and an inverting output terminal of the amplifier. The first and second signal paths carry a differential signal.
    Type: Grant
    Filed: August 18, 2016
    Date of Patent: November 6, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Milam Paraschou, Gerald R. Talbot, Dean E. Gonzales
  • Patent number: 10120430
    Abstract: A system and method for managing operating modes within a semiconductor chip for optimal power and performance while meeting a reliability target are described. A semiconductor chip includes a functional unit and a corresponding reliability monitor. The functional unit provides actual usage values to the reliability monitor. The reliability monitor determines expected usage values based on a reliability target and the age of the semiconductor chip. The reliability monitor compares the actual usage values and the expected usage values. The result of this comparison is used to increase or decrease current operational parameters.
    Type: Grant
    Filed: September 7, 2016
    Date of Patent: November 6, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Stephen V. Kosonocky, Thomas Burd, Adam Clark, Larry D. Hewitt, John Vincent Faricelli, John P. Petry
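A small Python sketch of the comparison loop outlined in the entry above: the expected usage at a given age is derived from a reliability target (here, a simple linear wear budget over the chip's rated lifetime, which is an assumption), and an operational parameter is nudged up or down depending on whether actual usage is running below or above that expectation.

```python
def expected_usage(age_hours, lifetime_hours, usage_budget):
    """Linear wear budget: the fraction of the budget allowed to be consumed by this age."""
    return usage_budget * min(age_hours / lifetime_hours, 1.0)

def adjust_operating_point(actual_usage, age_hours, current_freq_mhz,
                           lifetime_hours=87600,      # assumed ~10-year reliability target
                           usage_budget=1.0e6,        # abstract "usage units" (assumption)
                           step_mhz=100):
    expected = expected_usage(age_hours, lifetime_hours, usage_budget)
    if actual_usage < expected:
        return current_freq_mhz + step_mhz     # under budget: allow higher performance
    elif actual_usage > expected:
        return current_freq_mhz - step_mhz     # over budget: back off to protect reliability
    return current_freq_mhz

if __name__ == "__main__":
    # One year in, the part has consumed less wear than the linear budget allows.
    print(adjust_operating_point(actual_usage=5.0e4, age_hours=8760, current_freq_mhz=3000))
    # -> 3100
```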
  • Patent number: 10121221
    Abstract: Described is a method and apparatus to accelerate rendering of 3D graphics images. When rendering, the transformation matrix (or equivalent) used for projecting primitives is modified so that a resulting image is smaller and/or warped compared to a regular unmodified rendering. The effect of such transformation is fewer pixels being rendered and thus a better performance. To compute the final image, the warped image is rectified by an inverse transformation. Depending on the warping transformation used, the resulting (rectified) image will be blurred in a controlled way, either simulating a directional motion blur, location-dependent sharpness/blurriness or other blurring effects. By intelligently selecting the warping transformation in correspondence with the rendered scene, overall performance is increased without losing the perceived fidelity of the final image.
    Type: Grant
    Filed: January 17, 2017
    Date of Patent: November 6, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Evgene Fainstain
  • Publication number: 20180314652
    Abstract: Described is a method and apparatus for application migration between a dockable device and a docking station in a seamless manner. The dockable device includes a processor and the docking station includes a high-performance processor. The method includes determining a docking state of a dockable device while at least an application is running. Application migration from the dockable device to a docking station is initiated when the dockable device is moving to a docked state. Application migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run during the application migration from the dockable device to the docking station or during the application migration from the docking station to the dockable device.
    Type: Application
    Filed: April 27, 2018
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Jonathan Lawrence Campbell, Yuping Shen
  • Publication number: 20180316851
    Abstract: A method and apparatus of seam finding includes determining an overlap area between a first image and a second image. The first image is captured by a first image capturing device and the second image is captured by a second image capturing device. A plurality of seam paths for stitching the first image with the second image is computed and a cost is computed for each seam path. A seam is selected to stitch the first image to the second image based upon the cost for the seam path for that seam being less than the cost for all other computed seam paths, and that seam is maintained as the selected seam for stitching based upon predefined criteria.
    Type: Application
    Filed: April 28, 2017
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael L. Schmit, Radhakrishna Giduthuri, Kiriti Nagesh Gowda
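A compact Python sketch of the selection rule in the entry above: given the overlap region between two images, compute a cost for each candidate seam path (here, the sum of absolute pixel differences along a straight vertical seam, an illustrative cost) and keep the seam whose cost is lower than all others, subject to a predefined criterion (here, a minimum improvement over the currently selected seam, also an assumption).

```python
def seam_cost(img_a, img_b, column):
    """Cost of a straight vertical seam at `column`: sum of absolute differences in the overlap."""
    return sum(abs(row_a[column] - row_b[column]) for row_a, row_b in zip(img_a, img_b))

def select_seam(img_a, img_b, current_seam=None, min_improvement=0.0):
    """img_a, img_b: equally sized 2-D lists covering the overlap area of the two captures."""
    width = len(img_a[0])
    costs = {col: seam_cost(img_a, img_b, col) for col in range(width)}
    best_col = min(costs, key=costs.get)
    # Predefined criterion (assumption): only switch seams if the new one is clearly better.
    if current_seam is not None and costs[best_col] >= costs[current_seam] - min_improvement:
        return current_seam, costs[current_seam]
    return best_col, costs[best_col]

if __name__ == "__main__":
    overlap_a = [[10, 12, 30], [11, 12, 31]]
    overlap_b = [[10, 20, 90], [12, 25, 95]]
    print(select_seam(overlap_a, overlap_b))          # -> (0, 1): column 0 is the cheapest seam
```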
  • Publication number: 20180314670
    Abstract: Embodiments of a peripheral component are described herein. Embodiments provide alternatives to the use of an external bridge integrated circuit (IC) architecture. For example, an embodiment multiplexes a peripheral bus such that multiple processors in one peripheral component can use one peripheral interface slot without requiring an external bridge IC. Embodiments are usable with known bus protocols.
    Type: Application
    Filed: July 3, 2018
    Publication date: November 1, 2018
    Applicants: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
    Inventors: Shahin SOLKI, Stephen MOREIN, Mark S. GROSSMAN
  • Publication number: 20180314638
    Abstract: Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if the data satisfies the heuristic.
    Type: Application
    Filed: April 26, 2017
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael W. LeBeane, Walter B. Benton, Vinay Agarwala
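A rough Python model of the flow described in the entry above: the NIC writes incoming data to main memory and publishes coherence information; the GPU checks that information against a heuristic and, if it is satisfied, injects (preloads) the data into its cache. The specific heuristic shown (inject only payloads below a size threshold destined for a currently scheduled kernel) is purely an assumption for the sketch.

```python
from dataclasses import dataclass

@dataclass
class CoherenceInfo:
    address: int
    size_bytes: int
    target_kernel: str             # which GPU kernel the data is destined for (assumed field)

class GpuComputeNode:
    def __init__(self, cache_inject_limit=64 * 1024):
        self.main_memory = {}          # address -> payload
        self.gpu_cache = {}            # address -> payload
        self.scheduled_kernels = set()
        self.cache_inject_limit = cache_inject_limit

    def nic_receive(self, address, payload, target_kernel):
        self.main_memory[address] = payload                     # NIC sends data to main memory
        info = CoherenceInfo(address, len(payload), target_kernel)
        self.gpu_handle_coherence(info)                         # NIC sends coherence information

    def satisfies_heuristic(self, info):
        # Assumed heuristic: small payloads for kernels that are already scheduled.
        return (info.size_bytes <= self.cache_inject_limit
                and info.target_kernel in self.scheduled_kernels)

    def gpu_handle_coherence(self, info):
        if self.satisfies_heuristic(info):
            # Cache injection: load the data from main memory into the GPU cache up front.
            self.gpu_cache[info.address] = self.main_memory[info.address]

if __name__ == "__main__":
    node = GpuComputeNode()
    node.scheduled_kernels.add("stencil_kernel")
    node.nic_receive(0x1000, b"x" * 512, "stencil_kernel")
    print(0x1000 in node.gpu_cache)       # -> True: the data was injected into the GPU cache
```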
  • Publication number: 20180314306
    Abstract: Techniques for managing power distribution amongst processors in a massively parallel computer architecture are disclosed. The techniques utilize a hierarchy that organizes the various processors of the massively parallel computer architecture. The hierarchy groups the processors together at its lowest level. When processors complete tasks, the power assigned to those processors is distributed to other processors in the same group so that the performance of those processors can be increased. Hierarchical organization simplifies the calculations required for determining how and when to distribute power, because when tasks are complete and power is available for distribution, a relatively small number of processors are available for consideration to receive that power. The number of processors that are grouped together can be adjusted in real time based on performance factors to improve the trade-off between calculation speed and power distribution efficacy.
    Type: Application
    Filed: April 26, 2017
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Xinwei Chen, Leonardo de Paula Rosa Piga
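A small Python sketch of the redistribution step from the entry above: processors sit in lowest-level groups, and when a processor finishes its task, its power allocation is handed out among the still-active processors in the same group, so only that group needs to be considered. The group size, power units, and even-split policy are illustrative assumptions.

```python
def redistribute_power(groups, power, finished_cpu):
    """
    groups: list of lists of processor ids (the hierarchy's lowest-level groups)
    power:  dict mapping processor id -> current power allocation (watts)
    finished_cpu: processor whose task just completed
    Returns the updated power dict (modified in place).
    """
    group = next(g for g in groups if finished_cpu in g)       # only this group is considered
    active = [cpu for cpu in group if cpu != finished_cpu and power[cpu] > 0]
    released = power[finished_cpu]
    power[finished_cpu] = 0
    if active:
        share = released / len(active)         # assumed policy: split the freed power evenly
        for cpu in active:
            power[cpu] += share
    return power

if __name__ == "__main__":
    groups = [[0, 1, 2, 3], [4, 5, 6, 7]]      # group size could be retuned at run time
    power = {cpu: 10.0 for cpu in range(8)}
    print(redistribute_power(groups, power, finished_cpu=2))
    # -> processors 0, 1, 3 now hold ~13.3 W each; the other group is untouched
```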
  • Publication number: 20180314579
    Abstract: Techniques for handling memory errors are disclosed. Various memory units of an accelerated processing device (“APD”) include error units for detecting errors in data stored in the memory (e.g., using parity protection or error correcting code). Upon detecting an error considered to be an “initial uncorrectable error,” the error unit triggers transmission of an initial uncorrectable error interrupt (“IUE interrupt”) to a processor. This IUE interrupt includes information identifying the specific memory unit in which the error occurred (and possibly other information about the error). A halt interrupt is generated and transmitted to the processor in response to the data having the error being consumed (i.e., used by an operation such as an instruction or command), which causes the APD to halt operations. If the data having the error is not consumed, then the halt interrupt is never generated (although the fact that the error occurred may remain logged).
    Type: Application
    Filed: April 28, 2017
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Carlos Sampayo, Michael Mantor
  • Publication number: 20180314436
    Abstract: The present disclosure is directed to techniques for migrating data between heterogeneous memories in a computing system. More specifically, the techniques involve migrating data between a memory having better access characteristics (e.g., lower latency but greater capacity) and a memory having worse access characteristics (e.g., higher latency but lower capacity). Migrations occur with a variable migration granularity. A migration granularity specifies a number of memory pages, having virtual addresses that are contiguous in virtual address space, that are migrated in a single migration operation. A history-based technique that adjusts migration granularity based on the history of memory utilization by an application is provided. A profiling-based technique that adjusts migration granularity based on a profiling operation is also provided.
    Type: Application
    Filed: April 27, 2017
    Publication date: November 1, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Jee Ho Ryoo
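A short Python sketch of the history-based variant described in the entry above: the migration granularity (how many virtually contiguous pages move in one migration operation) grows when recent history shows the application touching most of the pages that were migrated together, and shrinks when it does not. The thresholds and the power-of-two granularity ladder are assumptions.

```python
def adjust_migration_granularity(current_pages, touched_fraction_history,
                                 grow_threshold=0.75, shrink_threshold=0.25,
                                 min_pages=1, max_pages=64):
    """
    current_pages: pages per migration operation (contiguous in virtual address space)
    touched_fraction_history: recent fractions of migrated pages the application actually used
    """
    avg = sum(touched_fraction_history) / len(touched_fraction_history)
    if avg >= grow_threshold:
        return min(current_pages * 2, max_pages)     # history says larger moves pay off
    if avg <= shrink_threshold:
        return max(current_pages // 2, min_pages)    # history says we are over-migrating
    return current_pages

if __name__ == "__main__":
    print(adjust_migration_granularity(8, [0.9, 0.8, 0.85]))   # -> 16
    print(adjust_migration_granularity(8, [0.1, 0.2, 0.15]))   # -> 4
```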
  • Patent number: 10114761
    Abstract: Techniques are provided for managing address translation request traffic where memory access requests can be made with differing quality-of-service levels, which specify latency and/or bandwidth requirements. The techniques involve translation lookaside buffers. Within the translation lookaside buffers, certain resources are reserved for specific quality-of-service levels. More specifically, translation lookaside buffer slots, which store the actual translations, as well as finite state machines in a work queue, are reserved for specific quality-of-service levels. The translation lookaside buffer receives multiple requests for address translation. The translation lookaside buffer selects requests having the highest quality-of-service level for which a finite state machine is available.
    Type: Grant
    Filed: February 24, 2017
    Date of Patent: October 30, 2018
    Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
    Inventors: Wade K. Smith, Kostantinos Danny Christidis
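A simplified Python model of the reservation scheme in the entry above: a fixed pool of translation finite state machines (FSMs) is partitioned so some are reserved for a specific quality-of-service level, and among pending translation requests the TLB picks the highest QoS level that still has an FSM available to it. The pool sizes and the two-level QoS split are assumptions.

```python
class TlbWorkQueue:
    def __init__(self):
        # Assumed partition: FSMs 0-1 reserved for the high QoS level, 2-5 shared by all levels.
        self.reserved = {"high": {0, 1}}
        self.shared = {2, 3, 4, 5}
        self.busy = set()

    def _available_fsm(self, qos):
        for fsm in sorted(self.reserved.get(qos, set()) | self.shared):
            if fsm not in self.busy:
                return fsm
        return None

    def select_request(self, pending):
        """pending: list of (request_id, qos) tuples; returns (request_id, fsm) or None."""
        order = {"high": 0, "low": 1}                       # assumed QoS ordering
        for req_id, qos in sorted(pending, key=lambda r: order[r[1]]):
            fsm = self._available_fsm(qos)
            if fsm is not None:                             # highest QoS level with a free FSM wins
                self.busy.add(fsm)
                return req_id, fsm
        return None

if __name__ == "__main__":
    q = TlbWorkQueue()
    q.busy = {2, 3, 4, 5}                                   # all shared FSMs are occupied
    print(q.select_request([("A", "low"), ("B", "high")]))  # -> ('B', 0): only high QoS proceeds
```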
  • Patent number: 10115221
    Abstract: Described are a video graphics system, graphics processor, and methods for rendering three-dimensional objects. A buffer is partitioned into tiles. Each tile includes a plurality of pixels. Each pixel of each tile includes at least one sample. Each sample has a stencil value associated therewith. It is determined that each sample in a given tile has the same stencil value. A single stencil value is stored in the buffer for that tile. The single stencil value represents the stencil value for every sample in that tile.
    Type: Grant
    Filed: May 1, 2007
    Date of Patent: October 30, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: Christopher Brennan
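A minimal Python sketch of the compression idea from the entry above: when every sample in a tile carries the same stencil value, store that single value for the tile instead of one value per sample. The tile size and the (uniform/full) encoding are assumptions for the sketch.

```python
def compress_stencil_tile(tile_samples):
    """
    tile_samples: 2-D list of per-sample stencil values for one tile.
    Returns ('uniform', value) when one stencil value covers the whole tile,
    otherwise ('full', tile_samples).
    """
    first = tile_samples[0][0]
    if all(sample == first for row in tile_samples for sample in row):
        return ("uniform", first)           # one stencil value stored for the entire tile
    return ("full", tile_samples)           # fall back to per-sample storage

def read_stencil(compressed_tile, x, y):
    kind, data = compressed_tile
    return data if kind == "uniform" else data[y][x]

if __name__ == "__main__":
    uniform_tile = [[0x7F] * 8 for _ in range(8)]
    mixed_tile = [[0x7F] * 8 for _ in range(8)]
    mixed_tile[3][5] = 0x00
    print(compress_stencil_tile(uniform_tile))                    # -> ('uniform', 127)
    print(read_stencil(compress_stencil_tile(mixed_tile), 5, 3))  # -> 0
```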
  • Patent number: 10117356
    Abstract: A heat sink connector pin includes a pin assembly with a linkage that translates downward movement of a pin head or cap into mechanical movement of multiple movable fingers at an opposing end of the pin, from a retracted position that allows insertion of the heat sink connector pin through an opening in the substrate, such as a through-hole, to an outward extended position in which the multiple fingers engage or grasp a bottom surface of the substrate. In one example, the movable fingers are rotatably connected to share a same rotational axis with each other. In one example, the pin assembly includes a sleeve adapted to receive the shaft structure and is adapted to engage with the pin head. The sleeve includes a substrate stop surface adapted to contact a top surface of the substrate during insertion of the pin through the substrate.
    Type: Grant
    Filed: November 28, 2016
    Date of Patent: October 30, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Donald L. Lambert
  • Publication number: 20180307619
    Abstract: A system including a gasket communicatively coupled between a unified northbridge (UNB) having a cache coherent interconnect (CCI) interface and a processor having an Advanced eXtensible Interface (AXI) coherency extension (ACE). The gasket is configured to translate requests from the processor that include ACE commands into equivalent CCI commands, wherein each request from the processor maps onto a specific CCI request type. The gasket is further configured to translate ACE tags into CCI tags. The gasket is further configured to translate CCI encoded probes from a system resource interface (SRI) into equivalent ACE snoop transactions. The gasket is further configured to translate the memory map to inter-operate with a UNB/coherent HyperTransport (cHT) environment. The gasket is further configured to receive a barrier transaction that is used to provide ordering for transactions.
    Type: Application
    Filed: July 2, 2018
    Publication date: October 25, 2018
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri