Patents Assigned to Advanced Micros Devices, Inc.
  • Publication number: 20190197761
    Abstract: A texture processor based ray tracing accelerator method and system are described. The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Skyler Jonathon Saleh, Maxim V. Kazakov, Vineet Goel
  • Patent number: 10331357
    Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. The system and method include storing data in one or more memory dependent architectural register numbers (MdArns), allocating the one or more MdArns to a MEMFILE, writing the allocated one or more MdArns to a map file, wherein the map file contains a MdArn map to enable subsequent access to an entry in the MEMFILE, upon receipt of a load request, checking a base, an index, a displacement and a match/hit via the map file to identify an entry in the MEMFILE and an associated store, and on a hit, providing the entry responsive to the load request from the one or more MdArns.
    Type: Grant
    Filed: December 15, 2016
    Date of Patent: June 25, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Betty Ann McDaniel, Michael D. Achenbach, David N. Suggs, Frank C. Galloway, Kai Troester, Krishnan V. Ramani
  • Patent number: 10331537
    Abstract: Described herein are waterfall counters and an application to architectural vulnerability factor (AVF) estimation. Waterfall counters count events that are generated at event generation logic. The waterfall counters are a combination of small, fast counters local to the event generation logic, and larger, global counters in fast memory. The local counters can be saturation or oscillation counters. When a local counter is saturated or evicted, the value from the local counter is added to the global counter. This addition can be done using logic local to the local or global counter. The waterfall counters provide a full-accuracy event count without the high bandwidth that is needed to maintain the global counters. An AVF estimation can be determined based on ratios from counts of read events, write events, and total events using the waterfall counters.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: June 25, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Manish Gupta, Vilas Sridharan, David A. Roberts
  • Patent number: 10332570
    Abstract: A memory device includes a memory cell coupled to a bitline and a bitline complement. A first capacitive structure is charged with a first voltage source such as a memory supply voltage. A second capacitive structure is charged with a second voltage source such as a core supply voltage. A coupling structure selectively and capacitively couples the first capacitive structure and the second capacitive structure to the bitline or the bitline complement, thereby applying a negative bitline write assist to the memory cell during a write operation.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: June 25, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Tawfik Ahmed
  • Patent number: 10333500
    Abstract: A circuit includes a latch configured to update a stored state of the latch in response to an input data signal and a pulsed clock signal. The circuit includes a pulse generator configured to generate the pulsed clock signal based on an input clock signal, the input data signal, and a feedback signal indicative of a stored state of the latch. The pulse generator may be configured to generate a pulse enable signal based on the input data signal, the input clock signal, and the feedback signal. The pulsed clock signal may be based on the pulse enable signal and the input clock signal. The pulse generator may generate the pulsed clock signal to have a pulse of a first signal level in response to an indication that the stored state of the latch needs to change and generates the pulsed clock signal to have a second signal level, otherwise.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: June 25, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David Scott Vickers, Patrick J. Shyvers
  • Patent number: 10331196
    Abstract: A system and method for providing efficient clock gating capability for functional units are described. A functional unit uses a clock gating circuit for power management. A setup time of a single device propagation delay is provided for a received enable signal. When each of a clock signal, the enable signal and a delayed clock signal is asserted, an evaluate node of the clock gating circuit is discharged. When each of the clock signal and a second clock signal is asserted and the enable signal is negated, the evaluate node is left floating for a duration equal to the hold time. Afterward, the devices in a delayed onset keeper are turned on and the evaluate node has a path to the power supply. When the clock signal is negated, the evaluate node is precharged.
    Type: Grant
    Filed: June 19, 2017
    Date of Patent: June 25, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Russell Schreiber
  • Publication number: 20190188557
    Abstract: Methods, devices, systems, and instructions for adaptive quantization in an artificial neural network (ANN) calculate a distribution of ANN information; select a quantization function from a set of quantization functions based on the distribution; apply the quantization function to the ANN information to generate quantized ANN information; load the quantized ANN information into the ANN; and generate an output based on the quantized ANN information. Some examples recalculate the distribution of ANN information and reselect the quantization function from the set of quantization functions based on the resampled distribution if the output does not sufficiently correlate with a known correct output. In some examples, the ANN information includes a set of training data. In some examples, the ANN information includes a plurality of link weights.
    Type: Application
    Filed: December 20, 2017
    Publication date: June 20, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Daniel I. Lowell, Sergey Voronov, Mayank Daga
  • Publication number: 20190188577
    Abstract: A system assigns experts of a mixture-of-experts artificial intelligence model to processing devices in an automated manner. The system includes an orchestrator component that maintains priority data that stores, for each of a set of experts, and for each of a set of execution parameters, ranking information that ranks different processing devices for the particular execution parameter. In one example, for the execution parameter of execution speed, and for a first expert, the priority data indicates that a central processing unit (“CPU”) executes the first expert faster than a graphics processing unit (“GPU”). In this example, for the execution parameter of power consumption, and for the first expert, the priority data indicates that a GPU uses less power than a CPU. The priority data stores such information for one or more processing devices, one or more experts, and one or more execution characteristics.
    Type: Application
    Filed: December 20, 2017
    Publication date: June 20, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nicholas Malaya, Nuwan Jayasena
  • Publication number: 20190187990
    Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.
    Type: Application
    Filed: December 19, 2017
    Publication date: June 20, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King
  • Patent number: 10324650
    Abstract: A processing apparatus is provided that includes NVRAM and one or more processors configured to process a first set and a second set of instructions according to a hierarchical processing scope and process a scoped persistence barrier residing in the program after the first instruction set and before the second instruction set. The barrier includes an instruction to cause first data to persist in the NVRAM before second data persists in the NVRAM. The first data results from execution of each of the first set of instructions processed according to the one hierarchical processing scope. The second data results from execution of each of the second set of instructions processed according to the one hierarchical processing scope. The processing apparatus also includes a controller configured to cause the first data to persist in the NVRAM before the second data persists in the NVRAM based on the scoped persistence barrier.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: June 18, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Mitesh R. Meswani, Dibakar Gope, Sooraj Puthoor
  • Patent number: 10324760
    Abstract: The described embodiments include a computing device that has two or more levels of memory, each level of memory having different performance characteristics. During operation, the computing device receives a request to lease an available block of memory in a specified level of memory for storing an object. When a block of memory is available for leasing in the specified level of memory, the computing device stores the object in the block of memory in the specified level of memory. The computing device also commences the lease for the block of memory by setting an indicator for the block of memory to indicate that the block of memory is leased. During the lease (i.e., until the lease is terminated), the object is kept in the block of memory.
    Type: Grant
    Filed: April 29, 2016
    Date of Patent: June 18, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: Mitesh Meswani
  • Patent number: 10324860
    Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.
    Type: Grant
    Filed: September 5, 2017
    Date of Patent: June 18, 2019
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
  • Publication number: 20190179798
    Abstract: A method and apparatus for scheduling instructions of a shader program for a graphics processing unit (GPU) with a fixed number of registers. The method and apparatus include computing, via a processing unit (PU), a liveness-based register usage across all basic blocks in the shader program, computing, via the PU, the range of numbers of waves of a plurality of registers for the shader program, assessing the impact of available post-register allocation optimizations, computing, via the PU, the scoring data based on number of waves of the plurality of registers, and computing, via the PU, the number of waves for execution for the plurality of registers.
    Type: Application
    Filed: February 4, 2019
    Publication date: June 13, 2019
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Robert A. Gottlieb, Christopher L. Reeve, Michael John Bedy
  • Patent number: 10321052
    Abstract: A method and apparatus of seam finding includes determining an overlap area between a first image and a second image. The first image is captured by a first image capturing device and the second image is captured by a second image capturing device. A plurality of seam paths for stitching the first image with the second image is computed and a cost is computed for each seam path. A seam is selected to stitch the first image to the second image based upon the cost for the seam path for that seam being less than a cost for all other computed seam paths, that seam is maintained as the selected seam for stitching based upon a predefined criteria.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: June 11, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael L. Schmit, Radhakrishna Giduthuri, Kiriti Nagesh Gowda
  • Patent number: 10318344
    Abstract: Systems, apparatuses, and methods for predicting page migration granularities for phases of an application executing on a non-uniform memory access (NUMA) system architecture are disclosed herein. A system with a plurality of processing units and memory devices executes a software application. The system identifies a plurality of phases of the application based on one or more characteristics (e.g., memory access pattern) of the application. The system predicts which page migration granularity will maximize performance for each phase of the application. The system performs a page migration at a first page migration granularity during a first phase of the application based on a first prediction. The system performs a page migration at a second page migration granularity during a second phase of the application based on a second prediction, wherein the second page migration granularity is different from the first page migration granularity.
    Type: Grant
    Filed: July 13, 2017
    Date of Patent: June 11, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Anthony Thomas Gutierrez
  • Patent number: 10320695
    Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.
    Type: Grant
    Filed: May 26, 2016
    Date of Patent: June 11, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
  • Patent number: 10318363
    Abstract: A system and method for managing operating parameters within a system for optimal power and reliability are described. A device includes a functional unit and a corresponding reliability evaluator. The functional unit provides reliability information to one or more reliability monitors, which translate the information to reliability values. The reliability evaluator determines an overall reliability level for the system based on the reliability values. The reliability monitor compares the actual usage values and the expected usage values. When system has maintained a relatively high level of reliability for a given time interval, the reliability evaluator sends an indication to update operating parameters to reduce reliability of the system, which also reduces power consumption for the system.
    Type: Grant
    Filed: October 28, 2016
    Date of Patent: June 11, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Greg Sadowski, Steven E. Raasch, Shomit N. Das, Wayne Burleson
  • Patent number: 10318153
    Abstract: A processor modifies memory management mode for a range of memory locations of a multilevel memory hierarchy based on changes in an application phase of an application executing at a processor. The processor monitors the application phase (e.g., computation-bound phase, input/output phase, or memory access phase) of the executing application and in response to a change in phase consults a management policy to identify a memory management mode. The processor automatically reconfigures a memory controller and other modules so that a range of memory locations of the multilevel memory hierarchy are managed according to the identified memory management mode. By changing the memory management mode for the range of memory locations according to the application phase, the processor improves processing efficiency and flexibility.
    Type: Grant
    Filed: December 19, 2014
    Date of Patent: June 11, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sergey Blagodurov, Mitesh Ramesh Meswani, Gabriel H. Loh, Mauricio Breternitz, Jr., Mark Richard Nutter, John Robert Slice, David Andrew Roberts, Michael Ignatowski, Mark Henry Oskin
  • Patent number: 10318340
    Abstract: In one form, a computer system includes a central processing unit, a memory controller coupled to the central processing unit and capable of accessing non-volatile random access memory (NVRAM), and an NVRAM-aware operating system. The NVRAM-aware operating system causes the central processing unit to selectively execute selected ones of a plurality of application programs, and is responsive to a predetermined operation to cause the central processing unit to execute a memory persistence procedure using the memory controller to access the NVRAM.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: June 11, 2019
    Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Sergey Blagodurov, Gabriel H. Loh, Mauricio Breternitz
  • Publication number: 20190171452
    Abstract: A system and method for load fusion fuses small load operations into fewer, larger load operations. The system detects that a pair of adjacent operations are consecutive load operations, where the adjacent micro-operations refers to micro-operations flowing through adjacent dispatch slots and the consecutive load micro-operations refers to both of the adjacent micro-operations being load micro-operations. The consecutive load operations are then reviewed to determine if the data sizes are the same and if the load operation addresses are consecutive. The two load operations are then fused together to form one load micro-operation with twice the data size and one load data micro-operation with no load component.
    Type: Application
    Filed: December 1, 2017
    Publication date: June 6, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventor: John M. King