Patents Assigned to Advanced Micro Devices
-
Patent number: 10339068Abstract: Systems, apparatuses, and methods for implementing a virtualized translation lookaside buffer (TLB) are disclosed herein. In one embodiment, a system includes at least an execution unit and a first TLB. The system supports the execution of a plurality of virtual machines in a virtualization environment. The system detects a translation request generated by a first virtual machine with a first virtual memory identifier (VMID). The translation request is conveyed from the execution unit to the first TLB. The first TLB performs a lookup of its cache using at least a portion of a first virtual address and the first VMID. If the lookup misses in the cache, the first TLB allocates an entry which is addressable by the first virtual address and the first VMID, and the first TLB sends the translation request with the first VMID to a second TLB.Type: GrantFiled: April 24, 2017Date of Patent: July 2, 2019Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Wade K. Smith, Anthony Asaro
-
Publication number: 20190196839Abstract: A system and method for increasing address generation operations per cycle is described. In particular, a unified address generation scheduler queue (AGSQ) is a single queue structure which is accessed by first and second pickers in a picking cycle. Picking collisions are avoided by assigning a first set of entries to the first picker and a second set of entries to the second picker. The unified AGSQ uses a shifting, collapsing queue structure to shift other micro-operations into issued entries, which in turn collapses the queue and re-balances the unified AGSQ. A second level and delayed picker picks a third micro-operation that is ready for issue in the picking cycle. The third micro-operation is picked from the remaining entries across the first set of entries and the second set of entries. The third micro-operation issues in a next picking cycle.Type: ApplicationFiled: December 22, 2017Publication date: June 27, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Christopher Spence Oliver, Hanbing Liu, Christopher James Burke, Michael D. Achenbach
-
Publication number: 20190196978Abstract: A data processing system includes a memory and an input output memory management unit that is connected to the memory. The input output memory management unit is adapted to receive batches of address translation requests. The input output memory management unit has instructions that identify, from among the batches of address translation requests, a later batch having a lower number of memory access requests than an earlier batch, and selectively schedules access to a page table walker for each address translation request of a batch.Type: ApplicationFiled: December 22, 2017Publication date: June 27, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Arkaprava Basu, Eric Van Tassell, Mark Oskin, Guilherme Cox, Gabriel Loh
-
Publication number: 20190197761Abstract: A texture processor based ray tracing accelerator method and system are described. The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.Type: ApplicationFiled: December 22, 2017Publication date: June 27, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Skyler Jonathon Saleh, Maxim V. Kazakov, Vineet Goel
-
Publication number: 20190196742Abstract: A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.Type: ApplicationFiled: December 21, 2017Publication date: June 27, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Dmitri Yudanov, Jiasheng Chen
-
Patent number: 10331196Abstract: A system and method for providing efficient clock gating capability for functional units are described. A functional unit uses a clock gating circuit for power management. A setup time of a single device propagation delay is provided for a received enable signal. When each of a clock signal, the enable signal and a delayed clock signal is asserted, an evaluate node of the clock gating circuit is discharged. When each of the clock signal and a second clock signal is asserted and the enable signal is negated, the evaluate node is left floating for a duration equal to the hold time. Afterward, the devices in a delayed onset keeper are turned on and the evaluate node has a path to the power supply. When the clock signal is negated, the evaluate node is precharged.Type: GrantFiled: June 19, 2017Date of Patent: June 25, 2019Assignee: Advanced Micro Devices, Inc.Inventor: Russell Schreiber
-
Patent number: 10332570Abstract: A memory device includes a memory cell coupled to a bitline and a bitline complement. A first capacitive structure is charged with a first voltage source such as a memory supply voltage. A second capacitive structure is charged with a second voltage source such as a core supply voltage. A coupling structure selectively and capacitively couples the first capacitive structure and the second capacitive structure to the bitline or the bitline complement, thereby applying a negative bitline write assist to the memory cell during a write operation.Type: GrantFiled: December 12, 2017Date of Patent: June 25, 2019Assignee: Advanced Micro Devices, Inc.Inventor: Tawfik Ahmed
-
Patent number: 10331357Abstract: A system and method for tracking stores and loads to reduce load latency when forming the same memory address by bypassing a load store unit within an execution unit is disclosed. The system and method include storing data in one or more memory dependent architectural register numbers (MdArns), allocating the one or more MdArns to a MEMFILE, writing the allocated one or more MdArns to a map file, wherein the map file contains a MdArn map to enable subsequent access to an entry in the MEMFILE, upon receipt of a load request, checking a base, an index, a displacement and a match/hit via the map file to identify an entry in the MEMFILE and an associated store, and on a hit, providing the entry responsive to the load request from the one or more MdArns.Type: GrantFiled: December 15, 2016Date of Patent: June 25, 2019Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Betty Ann McDaniel, Michael D. Achenbach, David N. Suggs, Frank C. Galloway, Kai Troester, Krishnan V. Ramani
-
Patent number: 10331537Abstract: Described herein are waterfall counters and an application to architectural vulnerability factor (AVF) estimation. Waterfall counters count events that are generated at event generation logic. The waterfall counters are a combination of small, fast counters local to the event generation logic, and larger, global counters in fast memory. The local counters can be saturation or oscillation counters. When a local counter is saturated or evicted, the value from the local counter is added to the global counter. This addition can be done using logic local to the local or global counter. The waterfall counters provide a full-accuracy event count without the high bandwidth that is needed to maintain the global counters. An AVF estimation can be determined based on ratios from counts of read events, write events, and total events using the waterfall counters.Type: GrantFiled: December 23, 2016Date of Patent: June 25, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Manish Gupta, Vilas Sridharan, David A. Roberts
-
Patent number: 10333500Abstract: A circuit includes a latch configured to update a stored state of the latch in response to an input data signal and a pulsed clock signal. The circuit includes a pulse generator configured to generate the pulsed clock signal based on an input clock signal, the input data signal, and a feedback signal indicative of a stored state of the latch. The pulse generator may be configured to generate a pulse enable signal based on the input data signal, the input clock signal, and the feedback signal. The pulsed clock signal may be based on the pulse enable signal and the input clock signal. The pulse generator may generate the pulsed clock signal to have a pulse of a first signal level in response to an indication that the stored state of the latch needs to change and generates the pulsed clock signal to have a second signal level, otherwise.Type: GrantFiled: January 30, 2018Date of Patent: June 25, 2019Assignee: Advanced Micro Devices, Inc.Inventors: David Scott Vickers, Patrick J. Shyvers
-
Publication number: 20190188557Abstract: Methods, devices, systems, and instructions for adaptive quantization in an artificial neural network (ANN) calculate a distribution of ANN information; select a quantization function from a set of quantization functions based on the distribution; apply the quantization function to the ANN information to generate quantized ANN information; load the quantized ANN information into the ANN; and generate an output based on the quantized ANN information. Some examples recalculate the distribution of ANN information and reselect the quantization function from the set of quantization functions based on the resampled distribution if the output does not sufficiently correlate with a known correct output. In some examples, the ANN information includes a set of training data. In some examples, the ANN information includes a plurality of link weights.Type: ApplicationFiled: December 20, 2017Publication date: June 20, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Daniel I. Lowell, Sergey Voronov, Mayank Daga
-
Publication number: 20190188577Abstract: A system assigns experts of a mixture-of-experts artificial intelligence model to processing devices in an automated manner. The system includes an orchestrator component that maintains priority data that stores, for each of a set of experts, and for each of a set of execution parameters, ranking information that ranks different processing devices for the particular execution parameter. In one example, for the execution parameter of execution speed, and for a first expert, the priority data indicates that a central processing unit (“CPU”) executes the first expert faster than a graphics processing unit (“GPU”). In this example, for the execution parameter of power consumption, and for the first expert, the priority data indicates that a GPU uses less power than a CPU. The priority data stores such information for one or more processing devices, one or more experts, and one or more execution characteristics.Type: ApplicationFiled: December 20, 2017Publication date: June 20, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Nicholas Malaya, Nuwan Jayasena
-
Publication number: 20190187990Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.Type: ApplicationFiled: December 19, 2017Publication date: June 20, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Gregory W. Smaus, John M. King
-
Patent number: 10324860Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.Type: GrantFiled: September 5, 2017Date of Patent: June 18, 2019Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
-
Patent number: 10324760Abstract: The described embodiments include a computing device that has two or more levels of memory, each level of memory having different performance characteristics. During operation, the computing device receives a request to lease an available block of memory in a specified level of memory for storing an object. When a block of memory is available for leasing in the specified level of memory, the computing device stores the object in the block of memory in the specified level of memory. The computing device also commences the lease for the block of memory by setting an indicator for the block of memory to indicate that the block of memory is leased. During the lease (i.e., until the lease is terminated), the object is kept in the block of memory.Type: GrantFiled: April 29, 2016Date of Patent: June 18, 2019Assignee: ADVANCED MICRO DEVICES, INC.Inventor: Mitesh Meswani
-
Patent number: 10324650Abstract: A processing apparatus is provided that includes NVRAM and one or more processors configured to process a first set and a second set of instructions according to a hierarchical processing scope and process a scoped persistence barrier residing in the program after the first instruction set and before the second instruction set. The barrier includes an instruction to cause first data to persist in the NVRAM before second data persists in the NVRAM. The first data results from execution of each of the first set of instructions processed according to the one hierarchical processing scope. The second data results from execution of each of the second set of instructions processed according to the one hierarchical processing scope. The processing apparatus also includes a controller configured to cause the first data to persist in the NVRAM before the second data persists in the NVRAM based on the scoped persistence barrier.Type: GrantFiled: September 23, 2016Date of Patent: June 18, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Arkaprava Basu, Mitesh R. Meswani, Dibakar Gope, Sooraj Puthoor
-
Publication number: 20190179798Abstract: A method and apparatus for scheduling instructions of a shader program for a graphics processing unit (GPU) with a fixed number of registers. The method and apparatus include computing, via a processing unit (PU), a liveness-based register usage across all basic blocks in the shader program, computing, via the PU, the range of numbers of waves of a plurality of registers for the shader program, assessing the impact of available post-register allocation optimizations, computing, via the PU, the scoring data based on number of waves of the plurality of registers, and computing, via the PU, the number of waves for execution for the plurality of registers.Type: ApplicationFiled: February 4, 2019Publication date: June 13, 2019Applicant: ADVANCED MICRO DEVICES, INC.Inventors: Robert A. Gottlieb, Christopher L. Reeve, Michael John Bedy
-
Patent number: 10321052Abstract: A method and apparatus of seam finding includes determining an overlap area between a first image and a second image. The first image is captured by a first image capturing device and the second image is captured by a second image capturing device. A plurality of seam paths for stitching the first image with the second image is computed and a cost is computed for each seam path. A seam is selected to stitch the first image to the second image based upon the cost for the seam path for that seam being less than a cost for all other computed seam paths, that seam is maintained as the selected seam for stitching based upon a predefined criteria.Type: GrantFiled: April 28, 2017Date of Patent: June 11, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Michael L. Schmit, Radhakrishna Giduthuri, Kiriti Nagesh Gowda
-
Patent number: 10318363Abstract: A system and method for managing operating parameters within a system for optimal power and reliability are described. A device includes a functional unit and a corresponding reliability evaluator. The functional unit provides reliability information to one or more reliability monitors, which translate the information to reliability values. The reliability evaluator determines an overall reliability level for the system based on the reliability values. The reliability monitor compares the actual usage values and the expected usage values. When system has maintained a relatively high level of reliability for a given time interval, the reliability evaluator sends an indication to update operating parameters to reduce reliability of the system, which also reduces power consumption for the system.Type: GrantFiled: October 28, 2016Date of Patent: June 11, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Greg Sadowski, Steven E. Raasch, Shomit N. Das, Wayne Burleson
-
Patent number: 10318340Abstract: In one form, a computer system includes a central processing unit, a memory controller coupled to the central processing unit and capable of accessing non-volatile random access memory (NVRAM), and an NVRAM-aware operating system. The NVRAM-aware operating system causes the central processing unit to selectively execute selected ones of a plurality of application programs, and is responsive to a predetermined operation to cause the central processing unit to execute a memory persistence procedure using the memory controller to access the NVRAM.Type: GrantFiled: December 31, 2014Date of Patent: June 11, 2019Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.Inventors: Sergey Blagodurov, Gabriel H. Loh, Mauricio Breternitz