Patents Assigned to Advanced Micro Devices, Inc.

Byte select cache compression

Patent number: 10860489

Abstract: Techniques are disclosed for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by repeatedly applying transforms to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.

Type: Grant

Filed: October 31, 2018

Date of Patent: December 8, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Shomit N. Das, Matthew Tomei, David A. Wood

Modifying carrier packets based on information in tunneled packets

Patent number: 10862809

Abstract: The described embodiments include an electronic device that handles network packets. During operation, the electronic device receives a carrier packet that includes a tunneled packet in its payload, wherein the tunneled packet includes a packet priority of the tunneled packet and the carrier packet includes a packet priority of the carrier packet. The electronic device then updates the packet priority of the carrier packet based on the packet priority of the tunneled packet.
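The priority update described in this abstract can be illustrated with a minimal sketch. The class names, the field names, and the policy of copying the tunneled priority directly onto the carrier are assumptions made for illustration; the abstract only says the carrier priority is updated "based on" the tunneled priority.

```python
from dataclasses import dataclass


@dataclass
class TunneledPacket:
    priority: int   # priority carried by the inner (tunneled) packet
    data: bytes


@dataclass
class CarrierPacket:
    priority: int             # priority of the outer (carrier) packet
    payload: TunneledPacket   # the tunneled packet rides in the payload


def update_carrier_priority(carrier: CarrierPacket) -> CarrierPacket:
    # Assumed policy: the carrier simply inherits the tunneled packet's priority.
    carrier.priority = carrier.payload.priority
    return carrier
```

With this policy, a carrier packet created with priority 0 around a tunneled packet of priority 5 leaves the function with priority 5.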

Type: Grant

Filed: May 19, 2017

Date of Patent: December 8, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventor: David A. Roberts

Sense amplifier with increased headroom

Patent number: 10861507

Abstract: Systems, apparatuses, and methods for implementing a sampling circuit with increased headroom are disclosed. A sampling circuit includes at least a pair of input signal transistors connected via their drains to a cross-coupled pair of state nodes. The cross-coupled pair of state nodes are coupled to a tail transistor device via the sources of N-type transistors. When clock goes low, the circuit precharges the cross-coupled pair of state nodes while simultaneously attempting to amplify the difference between the pair of input signals. The amplification is performed by a pair of transistors in series between a source of each input signal transistor and ground. Each gate of the pair of transistors is connected to an inverted clock signal. When clock goes high, the circuit stops precharging and a voltage difference between the pair of input signals is regenerated to create a resulting differential voltage on the pair of state nodes.

Type: Grant

Filed: March 28, 2019

Date of Patent: December 8, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Milam Paraschou, Jeffrey Cooper

Redundancy method and apparatus for shader column repair

Patent number: 10861122

Abstract: Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which performs rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array, and, in response, generates a signal. The redundant shader switch receives the generated signal, and, in response, transfers the data destined for each shader pipe identified as being defective independently to the redundant shader pipe array.
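As a rough illustration of the switching behavior in this abstract, the sketch below reroutes work destined for defective pipes to a redundant array. The function name and the data layout (a list indexed by pipe number, a set of defective pipe indices) are hypothetical and are not taken from the patent.

```python
def route_to_pipes(data_per_pipe, defective):
    """Split work between the primary shader pipes and the redundant array.

    data_per_pipe: list of work items, where the index is the shader pipe number.
    defective: set of pipe indices flagged as defective (the sequencer's signal).
    """
    primary = {}
    redundant = []
    for pipe, item in enumerate(data_per_pipe):
        if pipe in defective:
            # Data destined for a defective pipe goes to the redundant array.
            redundant.append((pipe, item))
        else:
            primary[pipe] = item
    return primary, redundant
```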

Type: Grant

Filed: May 17, 2016

Date of Patent: December 8, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras

PLATFORM POWER MANAGER FOR RACK LEVEL POWER AND THERMAL CONSTRAINTS

Publication number: 20200379544

Abstract: Platform power management includes boosting performance in a platform power boost mode or restricting performance to keep a power or temperature under a desired threshold in a platform power cap mode. Platform power management exploits the mutually exclusive nature of activities and the associated headroom created in a temperature and/or power budget of a server platform to boost performance of a particular component while also keeping temperature and/or power below a threshold or budget.

Type: Application

Filed: May 31, 2019

Publication date: December 3, 2020

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Indrani Paul, Sriram Sambamurthy, Larry David Hewitt, Kevin M. Lepak, Samuel D. Naffziger, Adam Neil Calder Clark, Aaron Joseph Grenat, Steven Frederick Liepe, Sandhya Shyamasundar, Wonje Choi, Dana Glenn Lewis, Leonardo de Paula Rosa Piga

COMMAND PROCESSOR BASED MULTI DISPATCH SCHEDULER

Publication number: 20200380761

Abstract: Described herein are techniques for performing ray tracing operations. A command processor executes custom instructions for orchestrating a ray tracing pipeline. The custom instructions cause the command processor to perform a series of loop iterations, each at a particular recursion depth. In a first loop iteration, a ray generation shader is executed that triggers execution of a trace ray operation. In any other iteration, zero or more shaders are executed based on the contents of a shader queue. Any shader may trigger execution of a trace ray operation. The trace ray operation determines whether a ray specified by the shader intersects a triangle. The ray trace operation places shader entries into a shader queue, at the current recursion depth plus 1. The command processor updates the current recursion depth based on whether a trace ray operation is executed. The loop ends when the recursion depth is less than a threshold.

Type: Application

Filed: May 28, 2019

Publication date: December 3, 2020

Applicant: Advanced Micro Devices, Inc.

Inventor: Rohan Mehalwal

COMPUTER RESOURCE SCHEDULING USING GENERATIVE ADVERSARIAL NETWORKS

Publication number: 20200379814

Abstract: Techniques for scheduling resources on a managed computer system are provided herein. A generative adversarial network generates predicted resource utilization. An orchestrator trains the generative adversarial network and provides the predicted resource utilization from the generative adversarial network to a resource scheduler for usage when the quality of the predicted resource utilization is above a threshold. The quality is measured as the ability of a generator component of the generative adversarial network to “fool” a discriminator component of the generative adversarial network into misclassifying the predicted resource utilization as being real (i.e., being of the type that is actually measured from the computer system).
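The quality gate in this abstract can be sketched as follows. Measuring quality as the fraction of generated samples that a discriminator labels as real is one plausible reading; the function names, the callable discriminator, and the threshold parameter are assumptions for illustration only.

```python
def fooling_rate(discriminator, generated_samples):
    """Fraction of generated samples the discriminator classifies as real."""
    fooled = sum(1 for sample in generated_samples if discriminator(sample))
    return fooled / len(generated_samples)


def maybe_forward(discriminator, generated_samples, threshold):
    """Return the predictions for the scheduler only if quality exceeds the threshold."""
    quality = fooling_rate(discriminator, generated_samples)
    return generated_samples if quality > threshold else None
```

Here `discriminator` is any callable returning `True` when it judges a sample to be real; a `None` result stands for "do not hand these predictions to the resource scheduler yet".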

Type: Application

Filed: May 29, 2019

Publication date: December 3, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Sergey Blagodurov, Abhinav Vishnu, Thaleia Dimitra Doudali, Jagadish B. Kotra

SYNCHRONIZATION MECHANISM FOR WORKGROUPS

Publication number: 20200379820

Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.
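The wake-up decision in this abstract might be sketched like this. The numeric limit used to decide that the update count is "low", and the choice to return a pair of lists, are hypothetical details not specified in the abstract.

```python
def workgroups_to_wake(waiting, updates_before_met, low_update_limit):
    """Return (wake_now, wake_later) for workgroups waiting on a condition.

    waiting: list of waiting workgroup ids, in wake order.
    updates_before_met: number of condition-variable updates seen before the
        condition was met.
    low_update_limit: assumed cutoff at or below which contention is "high".
    """
    high_contention = updates_before_met <= low_update_limit
    if high_contention:
        # Wake one workgroup now and schedule one more for a later time.
        return waiting[:1], waiting[1:2]
    # Otherwise several workgroups can be woken at the same time.
    return list(waiting), []
```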

Type: Application

Filed: May 29, 2019

Publication date: December 3, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Sergey Blagodurov, Anthony T. Gutierrez, Matthew D. Sinclair, David A. Wood, Bradford M. Beckmann

TEMPERATURE-BASED ADJUSTMENTS FOR IN-MEMORY MATRIX MULTIPLICATION

Publication number: 20200380063

Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.
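The mapping step in this abstract can be illustrated with a small sketch that assigns effective portions to the coolest regions first. How a portion is classified as effective is not modeled here, and the one-portion-per-region layout and all names are assumptions.

```python
def map_portions(portions, region_temps):
    """Assign matrix portions to memory regions, effective portions to coolest regions.

    portions: list of (name, is_effective) pairs.
    region_temps: list of (region, temperature) pairs, one region per portion.
    """
    coolest_first = [region for region, _ in sorted(region_temps, key=lambda rt: rt[1])]
    # Stable sort: effective portions (key False) come before ineffective ones.
    effective_first = sorted(portions, key=lambda p: not p[1])
    return {name: region for (name, _), region in zip(effective_first, coolest_first)}
```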

Type: Application

Filed: May 31, 2019

Publication date: December 3, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Majed Valad Beigi, Amin Farmahini-Farahani, Sudhanva Gurumurthi

Clock synthesizer with integrated voltage droop detection and clock stretching

Patent number: 10855222

Abstract: A clock synthesizer has integrated voltage droop detection and clock stretching. An oscillator of the clock synthesizer receives a control current from a digital to analog converter and generates an oscillator output signal. A droop detector and clock stretching circuit responds to a voltage droop of a supply voltage supplying circuits coupled to the oscillator output signal, to cause a portion of the oscillator control current to be diverted from the oscillator to thereby cause the oscillator to reduce the first frequency. The diversion can be accomplished through shunt circuits or a current mirror circuit.

Type: Grant

Filed: June 29, 2018

Date of Patent: December 1, 2020

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Dirk J. Robinson, Andy Huei Chu, Yan Sun, Saket Sham Doshi

Hierarchical register file at a graphics processing unit

Patent number: 10853904

Abstract: A processor employs a hierarchical register file for a graphics processing unit (GPU). A top level of the hierarchical register file is stored at a local memory of the GPU (e.g., a memory on the same integrated circuit die as the GPU). Lower levels of the hierarchical register file are stored at a different, larger memory, such as a remote memory located on a different die than the GPU. A register file control module monitors the status of in-flight wavefronts at the GPU, and in particular whether each in-flight wavefront is active, predicted to become active, or inactive. The register file control module places execution data for active and predicted-active wavefronts in the top level of the hierarchical register file and places execution data for inactive wavefronts at lower levels of the hierarchical register file.
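A minimal sketch of the placement rule in this abstract follows. The status strings and the two-list return value are hypothetical; the abstract does not describe how status is encoded or how many lower levels exist.

```python
def place_wavefronts(statuses):
    """Decide which register-file level holds each wavefront's execution data.

    statuses: dict mapping wavefront id to "active", "predicted_active",
        or "inactive" (assumed encoding).
    Returns (top_level, lower_levels) as lists of wavefront ids.
    """
    top_level, lower_levels = [], []
    for wavefront, status in statuses.items():
        if status in ("active", "predicted_active"):
            top_level.append(wavefront)     # local, on-die memory
        else:
            lower_levels.append(wavefront)  # larger, more remote memory
    return top_level, lower_levels
```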

Type: Grant

Filed: March 24, 2016

Date of Patent: December 1, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Yasuko Eckert, Nuwan Jayasena

Controlling accesses to a branch prediction unit for sequences of fetch groups

Patent number: 10853075

Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.
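The access-suppression idea in this abstract can be sketched as a simple counter. The record format (a dict from fetch group id to the number of following groups that need no prediction) and the function name are assumptions for illustration.

```python
def predictor_accesses(fetch_groups, skip_counts):
    """Return the fetch groups for which the branch predictor is actually accessed.

    fetch_groups: group ids in fetch order.
    skip_counts: dict mapping a group id to the number of subsequent groups
        previously found to contain no CTIs or only not-taken conditional CTIs.
    """
    accessed = []
    remaining_skips = 0
    for group in fetch_groups:
        if remaining_skips > 0:
            # The predictor access for this group is prevented.
            remaining_skips -= 1
            continue
        accessed.append(group)
        remaining_skips = skip_counts.get(group, 0)
    return accessed
```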

Type: Grant

Filed: December 23, 2019

Date of Patent: December 1, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Varun Agrawal, John Kalamatianos, Adithya Yalavarti, Jingjie Qian

Symmetrical balanced c-element

Patent number: 10848137

Abstract: A C-element circuit for use in an oscillator or the like includes a first input terminal for receiving a first input signal, a second input terminal for receiving a second input signal, and an output latch for providing an output signal based on a relationship between the two input signals. A stack of input transistors is included with an outer pair of input transistors with gates connected to the first input terminal and an inner pair of input transistors with gates connected to a second input terminal. A balancing circuit operates to equalize a first delay of a change in the first input signal affecting the output signal with a second delay of a change in the second input signal affecting the output signal. Bypass control techniques are provided for using the C-element circuit with a single input.
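For readers unfamiliar with the term, the logical behavior of a standard (Muller) C-element can be modeled in a few lines: the output follows the inputs when they agree and holds its previous value when they differ. This is a behavioral sketch only; it does not model the transistor stack, the delay balancing, or the bypass control described in the abstract.

```python
def c_element(a, b, previous_output):
    """Behavioral model of a two-input C-element with a latched output."""
    if a == b:
        return a            # both inputs agree: output switches to that value
    return previous_output  # inputs differ: the latch holds the old output
```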

Type: Grant

Filed: May 8, 2019

Date of Patent: November 24, 2020

Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.

Inventors: Mikhail Rodionov, Stephen Victor Kosonocky, Joyce Cheuk Wai Wong

Receiver design with reduced variation

Patent number: 10848135

Abstract: A receiver circuit holds an output voltage at a first output voltage level using a first device of a first type coupled between a first node and a first power supply node, and a second device of a second type coupled between the first node and the first power supply node. The first device is selectively enabled using an input signal. The second device is selectively enabled using a feedback signal. The second device is substantially larger than the first device. The receiver circuit switches the output voltage from the first output voltage level to a second output voltage level responsive to an input voltage level transitioning across a first threshold voltage level from a first input voltage level to a second input voltage level.

Type: Grant

Filed: August 12, 2019

Date of Patent: November 24, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Ashish Sahu, Girish Anathahally Singrigowda, Aniket Bharat Waghide, Prasanth K. Vallur

Dynamic page state aware scheduling of read/write burst transactions

Patent number: 10846253

Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. When a memory controller in a computing system determines a threshold number of memory access requests have not been sent to the memory device in a current mode of a read mode and a write mode, a first cost corresponding to a latency associated with sending remaining requests in either the read queue or the write queue associated with the current mode is determined. If the first cost exceeds the cost of a data bus turnaround, the cost of a data bus turnaround comprising a latency incurred when switching a transmission direction of the data bus from one direction to an opposite direction, then a second cost is determined for sending remaining memory access requests to the memory device. If the second cost does not exceed the cost of the data bus turnaround, then a time for the data bus turnaround is indicated and the current mode of the memory controller is changed.

Type: Grant

Filed: December 21, 2017

Date of Patent: November 24, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Guanhao Shen, Ravindra N. Bhargava, Kedarnath Balakrishnan

Reducing power needed to send signals over wires

Patent number: 10848177

Abstract: Methods and apparatus are described. A method, implemented in a decoder, includes receiving two or more signals from an encoder over two or more respective wires. At least one of the two or more signals includes at least one code that was recoded by the encoder. The decoder receives a recoding table. The recoding table provides a mapping indicating the recoding for each code that was recoded by the encoder in the received two or more signals. The decoder decodes the two or more received signals using the received recoding table.
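The decoder side of this abstract can be illustrated with a table lookup. Representing each wire's signal as a list of integer codes and the recoding table as a dict from recoded code to original code are assumptions; codes absent from the table are treated as not recoded.

```python
def decode(signals, recoding_table):
    """Undo the encoder's recoding on every wire.

    signals: list of code sequences, one sequence per wire.
    recoding_table: dict mapping a recoded code back to its original code.
    """
    return [
        [recoding_table.get(code, code) for code in wire]
        for wire in signals
    ]
```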

Type: Grant

Filed: February 17, 2017

Date of Patent: November 24, 2020

Assignee: Advanced Micro Devices, Inc.

Inventor: Greg Sadowski

System and method for processing a load micro-operation by allocating an address generation scheduler queue entry without allocating a load queue entry

Patent number: 10846095

Abstract: A system and method for a virtual load queue is described. Load micro-operations are processed through an instruction pipeline without requiring an entry in a load queue (LDQ). An address generation scheduler queue (AGSQ) entry is allocated to the load micro-operation and a LDQ entry is not allocated to the load micro-operation. The LDQ entries are reserved for the N oldest load micro-operations, where N is the depth of the LDQ. Deallocation of the AGSQ entry is done if the load micro-operation is one of the N oldest load micro-operations, or upon successful completion of the load micro-operation. Deallocation of the AGSQ entry is not done if the load micro-operation gets a bad status and is not one of the N oldest micro-operations. Consequently, the AGSQ acts as a virtual queue for the LDQ and mitigates the limiting effect of the LDQ depth.

Type: Grant

Filed: November 28, 2017

Date of Patent: November 24, 2020

Assignee: Advanced Micro Devices, Inc.

Inventor: John M. King

ACCELERATING NEURAL NETWORKS WITH ONE SHOT SKIP LAYER PRUNING

Publication number: 20200364573

Abstract: Systems, methods, and devices for pruning a convolutional neural network (CNN). A subset of layers of the CNN is chosen, and for each layer of the subset of layers, how salient each filter in the layer is to an output of the CNN is determined, a subset of the filters in the layer is determined based on the salience of each filter in the layer, and the subset of filters in the layer is pruned. In some implementations, the layers of the subset of layers of the CNN are non-contiguous. In some implementations, the subset of layers includes odd numbered layers of the CNN and excludes even numbered layers of the CNN. In some implementations, the subset of layers includes even numbered layers of the CNN and excludes odd numbered layers of the CNN.
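The layer-skipping selection in this abstract can be sketched as below. The salience scores are taken as given inputs, and the choice to prune the least salient filters down to a fixed keep fraction is an assumption; the abstract only says the pruned subset is determined from salience.

```python
def prune_plan(layer_saliences, keep_fraction=0.5, use_odd_layers=True):
    """Pick filters to prune in every other layer of a CNN.

    layer_saliences: one list of per-filter salience scores per layer,
        with layers numbered from 1.
    Returns a dict mapping layer number to sorted indices of filters to prune.
    """
    plan = {}
    for number, saliences in enumerate(layer_saliences, start=1):
        if (number % 2 == 1) != use_odd_layers:
            continue  # skip layers outside the chosen (non-contiguous) subset
        n_prune = len(saliences) - int(len(saliences) * keep_fraction)
        by_salience = sorted(range(len(saliences)), key=lambda i: saliences[i])
        plan[number] = sorted(by_salience[:n_prune])  # least salient filters
    return plan
```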

Type: Application

Filed: June 28, 2019

Publication date: November 19, 2020

Applicant: Advanced Micro Devices, Inc.

Inventors: Arun Coimbatore Ramachandran, Chandra Kumar Ramasamy, Prakash Sathyanath Raghavendra, Keerthan Subraya Shagrithaya

Prioritizing local and remote memory access in a non-uniform memory access architecture

Patent number: 10838864

Abstract: A miss in a cache by a thread in a wavefront is detected. The wavefront includes a plurality of threads that are executing a memory access request concurrently on a corresponding plurality of processor cores. A priority is assigned to the thread based on whether the memory access request is addressed to a local memory or a remote memory. The memory access request for the thread is performed based on the priority. In some cases, the cache is selectively bypassed depending on whether the memory access request is addressed to the local or remote memory. A cache block is requested in response to the miss. The cache block is biased towards a least recently used position in response to requesting the cache block from the local memory and towards a most recently used position in response to requesting the cache block from the remote memory.
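The insertion bias at the end of this abstract can be sketched with a recency-ordered list. Placing the block at the exact LRU or MRU end is a simplification of "biased towards", and the list representation is an assumption.

```python
def insert_block(recency_list, block, from_local_memory):
    """Insert a fetched cache block into a list ordered from MRU (front) to LRU (back)."""
    if from_local_memory:
        # Local blocks are cheap to refetch, so they sit near the eviction end.
        recency_list.append(block)
    else:
        # Remote blocks are costly to refetch, so they sit near the MRU end.
        recency_list.insert(0, block)
    return recency_list
```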

Type: Grant

Filed: May 30, 2018

Date of Patent: November 17, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Michael W. Boyer, Onur Kayiran, Yasuko Eckert, Steven Raasch, Muhammad Shoaib Bin Altaf

Device and method for cache utilization aware data compression

Patent number: 10838727

Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.

Type: Grant

Filed: December 14, 2018

Date of Patent: November 17, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
