Patents Assigned to Advanced Micros Devices, Inc.
-
Patent number: 10860489Abstract: Techniques are disclosed for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by applying repeated transforms applied to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.Type: GrantFiled: October 31, 2018Date of Patent: December 8, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
-
Patent number: 10862809Abstract: The described embodiments include an electronic device that handles network packets. During operation, the electronic device receives a carrier packet, the carrier packet that includes a tunneled packet in a payload of the carrier packet, wherein the tunneled packet includes a packet priority of the tunneled packet and the carrier packet includes a packet priority of the carrier packet. The electronic device then updates the packet priority of the carrier packet based on the packet priority of the tunneled packet.Type: GrantFiled: May 19, 2017Date of Patent: December 8, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventor: David A. Roberts
-
Patent number: 10861507Abstract: Systems, apparatuses, and methods for implementing a sampling circuit with increased headroom are disclosed. A sampling circuit includes at least a pair of input signal transistors connected via their drains to a cross-coupled pair of state nodes. The cross-coupled pair of state nodes are coupled to a tail transistor device via the sources of N-type transistors. When clock goes low, the circuit precharges the cross-coupled pair of state nodes while simultaneously attempting to amplify the difference between the pair of input signals. The amplification is performed by a pair of transistors in series between a source of each input signal transistor and ground. Each gate of the pair of transistors is connected to an inverted clock signal. When clock goes high, the circuit stops precharging and a voltage difference between the pair of input signals is regenerated to create a resulting differential voltage on the pair of state nodes.Type: GrantFiled: March 28, 2019Date of Patent: December 8, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Milam Paraschou, Jeffrey Cooper
-
Patent number: 10861122Abstract: Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which perform rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array, and, in response, generates a signal. The redundant shader switch receives the generated signal, and, in response, transfers the data destined for each shader pipe identified as being defective independently to the redundant shader pipe array.Type: GrantFiled: May 17, 2016Date of Patent: December 8, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
-
Publication number: 20200379544Abstract: Platform power management includes boosting performance in a platform power boost mode or restricting performance to keep a power or temperature under a desired threshold in a platform power cap mode. Platform power management exploits the mutually exclusive nature of activities and the associated headroom created in a temperature and/or power budget of a server platform to boost performance of a particular component while also keeping temperature and/or power below a threshold or budget.Type: ApplicationFiled: May 31, 2019Publication date: December 3, 2020Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Indrani Paul, Sriram Sambamurthy, Larry David Hewitt, Kevin M. Lepak, Samuel D. Naffziger, Adam Neil Calder Clark, Aaron Joseph Grenat, Steven Frederick Liepe, Sandhya Shyamasundar, Wonje Choi, Dana Glenn Lewis, Leonardo de Paula Rosa Piga
-
Publication number: 20200380761Abstract: Described herein are techniques for performing ray tracing operations. A command processor executes custom instructions for orchestrating a ray tracing pipeline. The custom instructions cause the command processor to perform a series of loop iterations, each at a particular recursion depth. In a first loop iteration, a ray generation shader is executed that triggers execution of a trace ray operation. In any other iteration, zero or more shaders are executed based on the contents of a shader queue. Any shader may trigger execution of a trace ray operation. The trace ray operation determines whether a ray specified by the shader intersects a triangle. The ray trace operation places shader entries into a shader queue, at the current recursion depth plus 1. The command processor updates the current recursion depth based on whether a trace ray operation is executed. The loop ends when the recursion depth is less than a threshold.Type: ApplicationFiled: May 28, 2019Publication date: December 3, 2020Applicant: Advanced Micro Devices, Inc.Inventor: Rohan Mehalwal
-
Publication number: 20200379814Abstract: Techniques for scheduling resources on a managed computer system are provided herein. A generative adversarial network generates predicted resource utilization. An orchestrator trains the generative adversarial network and provides the predicted resource utilization from the generative adversarial network to a resource scheduler for usage when the quality of the predicted resource utilization is above a threshold. The quality is measured as the ability of a generator component of the generative adversarial network to “fool” a discriminator component of the generative adversarial network into misclassifying the predicted resource utilization as being real (i.e., being of the type that is actually measured from the computer system).Type: ApplicationFiled: May 29, 2019Publication date: December 3, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Sergey Blagodurov, Abhinav Vishnu, Thaleia Dimitra Doudali, Jagadish B. Kotra
-
Publication number: 20200379820Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.Type: ApplicationFiled: May 29, 2019Publication date: December 3, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Sergey Blagodurov, Anthony T. Gutierrez, Matthew D. Sinclair, David A. Wood, Bradford M. Beckmann
-
Publication number: 20200380063Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.Type: ApplicationFiled: May 31, 2019Publication date: December 3, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Majed Valad Beigi, Amin Farmahini-Farahani, Sudhanva Gurumurthi
-
Patent number: 10855222Abstract: A clock synthesizer has integrated voltage droop detection and clock stretching. An oscillator of the clock synthesizer receives a control current from a digital to analog converter and generates an oscillator output signal. A droop detector and clock stretching circuit responds to a voltage droop of a supply voltage supplying circuits coupled to the oscillator output signal, to cause a portion of the oscillator control current to be diverted from the oscillator to thereby cause the oscillator to reduce the first frequency. The diversion can be accomplished through shunt circuits or a current mirror circuit.Type: GrantFiled: June 29, 2018Date of Patent: December 1, 2020Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Dirk J. Robinson, Andy Huei Chu, Yan Sun, Saket Sham Doshi
-
Patent number: 10853904Abstract: A processor employs a hierarchical register file for a graphics processing unit (GPU). A top level of the hierarchical register file is stored at a local memory of the GPU (e.g., a memory on the same integrated circuit die as the GPU). Lower levels of the hierarchical register file are stored at a different, larger memory, such as a remote memory located on a different die than the GPU. A register file control module monitors the status of in-flight wavefronts at the GPU, and in particular whether each in-flight wavefront is active, predicted to be become active, or inactive. The register file control module places execution data for active and predicted-active wavefronts in the top level of the hierarchical register file and places execution data for inactive wavefronts at lower levels of the hierarchical register file.Type: GrantFiled: March 24, 2016Date of Patent: December 1, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Yasuko Eckert, Nuwan Jayasena
-
Patent number: 10853075Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.Type: GrantFiled: December 23, 2019Date of Patent: December 1, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Varun Agrawal, John Kalamatianos, Adithya Yalavarti, Jingjie Qian
-
Patent number: 10848137Abstract: A C-element circuit for use in an oscillator or the like includes a first input terminal for receiving a first input signal, a second input terminal for receiving a second input signal, and an output latch for providing an output signal based on a relationship between the two input signals. A stack of input transistors is included with an outer pair of input transistors with gates connected to the first input terminal and an inner pair of input transistors with gates connected to a second input terminal. A balancing circuit operates to equalize a first delay of a change in the first input signal affecting the output signal with a second delay of a change in the second input signal affecting the output signal. Bypass control techniques are provided for using the C-element circuit with a single input.Type: GrantFiled: May 8, 2019Date of Patent: November 24, 2020Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.Inventors: Mikhail Rodionov, Stephen Victor Kosonocky, Joyce Cheuk Wai Wong
-
Patent number: 10848135Abstract: A receiver circuit holds an output voltage at a first output voltage level using a first device of a first type coupled between a first node and a first power supply node, and a second device of a second type coupled between the first node and the first power supply node. The first device is selectively enabled using an input signal. The second device is selectively enabled using a feedback signal. The second device is substantially larger than the first device. The receiver circuit switches the output voltage from the first output voltage level to a second output voltage level responsive to an input voltage level transitioning across a first threshold voltage level from a first input voltage level to a second input voltage level.Type: GrantFiled: August 12, 2019Date of Patent: November 24, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Ashish Sahu, Girish Anathahally Singrigowda, Aniket Bharat Waghide, Prasanth K. Vallur
-
Patent number: 10846253Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. When a memory controller in a computing system determines a threshold number of memory access requests have not been sent to the memory device in a current mode of a read mode and a write mode, a first cost corresponding to a latency associated with sending remaining requests in either the read queue or the write queue associated with the current mode is determined. If the first cost exceeds the cost of a data bus turnaround, the cost of a data bus turnaround comprising a latency incurred when switching a transmission direction of the data bus from one direction to an opposite direction, then a second cost is determined for sending remaining memory access requests to the memory device. If the second cost does not exceed the cost of the data bus turnaround, then a time for the data bus turnaround is indicated and the current mode of the memory controller is changed.Type: GrantFiled: December 21, 2017Date of Patent: November 24, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Guanhao Shen, Ravindra N. Bhargava, Kedarnath Balakrishnan
-
Patent number: 10848177Abstract: Methods and apparatus are described. A method, implemented in a decoder, includes receiving two or more signals from an encoder over two or more respective wires. At least one of the two or more signals includes at least one code that was recoded by the encoder. The decoder receives a recoding table. The recoding table provides a mapping indicating the recoding for each code that was recoded by the encoder in the received two or more signals. The decoder decodes the two or more received signals using the received recoding table.Type: GrantFiled: February 17, 2017Date of Patent: November 24, 2020Assignee: Advanced Micro Devices, Inc.Inventor: Greg Sadowski
-
Patent number: 10846095Abstract: A system and method for a virtual load queue is described. Load micro-operations are processed through an instruction pipeline without requiring an entry in a load queue (LDQ). An address generation scheduler queue (AGSQ) entry is allocated to the load micro-operation and a LDQ entry is not allocated to the load micro-operation. The LDQ entries are reserved for the N oldest load micro-operations, where N is the depth of the LDQ. Deallocation of the AGSQ entry is done if the load micro-operation is one of the N oldest load micro-operations, or upon successful completion of the load micro-operation. Deallocation of the AGSQ entry is not done if the load micro-operation gets a bad status and is not one of the N oldest micro-operations. Consequently, the AGSQ acts as a virtual queue for the LDQ and mitigates the limiting effect of the LDQ depth.Type: GrantFiled: November 28, 2017Date of Patent: November 24, 2020Assignee: Advanced Micro Devices, Inc.Inventor: John M. King
-
Publication number: 20200364573Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.Type: ApplicationFiled: June 28, 2019Publication date: November 19, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya
-
Patent number: 10838864Abstract: A miss in a cache by a thread in a wavefront is detected. The wavefront includes a plurality of threads that are executing a memory access request concurrently on a corresponding plurality of processor cores. A priority is assigned to the thread based on whether the memory access request is addressed to a local memory or a remote memory. The memory access request for the thread is performed based on the priority. In some cases, the cache is selectively bypassed depending on whether the memory access request is addressed to the local or remote memory. A cache block is requested in response to the miss. The cache block is biased towards a least recently used position in response to requesting the cache block from the local memory and towards a most recently used position in response to requesting the cache block from the remote memory.Type: GrantFiled: May 30, 2018Date of Patent: November 17, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Michael W. Boyer, Onur Kayiran, Yasuko Eckert, Steven Raasch, Muhammad Shoaib Bin Altaf
-
Patent number: 10838727Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.Type: GrantFiled: December 14, 2018Date of Patent: November 17, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann