Patents Assigned to Advanced Micro Devices, Inc.
-
Patent number: 10491524
Abstract: A system for implementing load balancing schemes includes one or more processing units, a memory, and a communication fabric with a plurality of switches coupled to the processing unit(s) and the memory. A switch of the fabric determines a first number of streams on a first input port that are targeting a first output port. The switch also determines a second number of requestors, from all input ports, that are targeting the first output port. Then, the switch calculates a throttle factor for the first input port by dividing the first number of streams by the second number of requestors. The switch applies the throttle factor to regulate bandwidth on the first input port for requestors targeting the first output port. The switch also calculates throttle factors for the other ports and applies them when regulating bandwidth on those ports.
Type: Grant
Filed: November 7, 2017
Date of Patent: November 26, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
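A minimal Python sketch of the throttle-factor calculation this abstract describes; the function names and the idea of scaling a total bandwidth figure are illustrative assumptions, not the patented implementation:

```python
# Illustrative sketch of the throttle-factor idea from the abstract above.
# Names (ports, streams, requestors) are hypothetical, not from the patent.

def throttle_factor(streams_on_input_port, requestors_all_ports):
    """Fraction of the output port's demand that comes from this input port."""
    if requestors_all_ports == 0:
        return 0.0
    return streams_on_input_port / requestors_all_ports

def allowed_bandwidth(total_bandwidth, streams_on_input_port, requestors_all_ports):
    """Regulate the input port's share of the output port's bandwidth."""
    return total_bandwidth * throttle_factor(streams_on_input_port, requestors_all_ports)

# Example: 2 streams on input port 0 and 8 requestors (across all input ports)
# targeting the same output port -> port 0 is throttled to 1/4 of the bandwidth.
print(allowed_bandwidth(100.0, 2, 8))  # 25.0
```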
-
Patent number: 10489350
Abstract: Techniques for handling data compression are disclosed in which metadata indicates which portions of data are compressed and which portions of data are not compressed. Segments of a buffer referred to as block groups store compressed blocks of data along with uncompressed blocks of data and hash blocks. If a block group includes a block that is a hash of another block in the block group, then the other block is considered to be compressed. If the block group does not include a block that is a hash of another block in the block group, then the blocks in the block group are uncompressed. The hash function used to generate the hash is selected to prevent “collisions,” which occur when the data being stored in the buffer is such that it is possible for a hash block and an uncompressed block to be the same.
Type: Grant
Filed: February 24, 2017
Date of Patent: November 26, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Greg Sadowski
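A rough Python sketch of the block-group test described above; the block size, the choice of hash, and the function names are assumptions made for illustration:

```python
# A block is treated as compressed if some other block in its block group
# equals that block's hash; otherwise the group holds uncompressed data.
import hashlib

BLOCK_SIZE = 32  # bytes per block -- an assumed size for this sketch

def block_hash(block: bytes) -> bytes:
    # Truncate a cryptographic hash to one block; the patent only requires a
    # hash chosen so a hash block cannot collide with stored uncompressed data.
    return hashlib.sha256(block).digest()[:BLOCK_SIZE]

def compressed_block_indices(block_group: list[bytes]) -> set[int]:
    """Indices of blocks whose hash appears as another block in the group."""
    blocks = set(block_group)
    return {i for i, b in enumerate(block_group) if block_hash(b) in blocks}
```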
-
Publication number: 20190354833
Abstract: Methods and systems for reducing communication frequency in neural networks (NN) are described. The method includes running, in an initial epoch, mini-batches of samples from a training set through the NN and determining one or more errors from the ground truth, where the ground truth is the given label for the sample. The errors are recorded for each sample and are sorted in non-decreasing order. In the next epoch, mini-batches of samples are formed starting from the sample with the smallest error in the sorted list. The parameters of the NN are updated and the mini-batches are run. Mini-batches are communicated to the other processing elements only if a previous update has made a significant impact on the NN, where significance is measured by determining whether the errors, or the accumulated errors since the last communication update, meet or exceed a significance threshold.
Type: Application
Filed: July 5, 2018
Publication date: November 21, 2019
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
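A loose Python sketch of the epoch-to-epoch reordering and the significance test described above; the function names and the threshold value are illustrative assumptions:

```python
def next_epoch_batches(errors, batch_size):
    """errors: list of (sample_id, error) recorded in the previous epoch.
    Mini-batches for the next epoch start from the smallest-error samples."""
    ordered = [sid for sid, err in sorted(errors, key=lambda x: x[1])]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def should_communicate(accumulated_error, threshold=0.05):
    """Communicate an update to other processing elements only when the
    errors accumulated since the last communication reach the threshold."""
    return accumulated_error >= threshold
```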
-
Patent number: 10482043
Abstract: A memory module includes a memory, a cache to cache copies of information stored in the memory, and a controller. The controller is configured to access first data from the memory or the cache in response to receiving a read request from a processor. The controller is also configured to transmit a first signal a first nondeterministic time interval after receiving the read request. The first signal indicates that the first data is available. The controller is further configured to transmit a second signal a first deterministic time interval after receiving a first transmit request from the processor in response to the first signal. The second signal includes the first data. The memory module also includes a buffer to store a write request until completion and a counter that is incremented in response to receiving the write request and decremented in response to completing the write request.
Type: Grant
Filed: July 28, 2017
Date of Patent: November 19, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Aaron Nygren, Michael Ignatowski, David A. Roberts
-
Patent number: 10474468
Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.
Type: Grant
Filed: February 22, 2017
Date of Patent: November 12, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Michael J. Mantor, Brian D. Emberling, Mark Fowler, Mark M. Leather
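The two execution orders described in the abstract amount to swapping the nesting of two loops; a minimal Python sketch, with hypothetical names, follows:

```python
def run_wavefront(instructions, portions, mode, execute):
    """execute(portion, instruction) performs one instruction on one wavefront portion."""
    if mode == "first":
        # First mode: run each instruction across every portion before moving on.
        for instr in instructions:
            for portion in portions:
                execute(portion, instr)
    else:
        # Second mode: finish the whole instruction set on one portion at a time.
        for portion in portions:
            for instr in instructions:
                execute(portion, instr)
```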
-
Patent number: 10474211
Abstract: A data processing system includes a power manager for providing a power event depth signal in response to a power event request signal. A plurality of real-time clients is coupled to the power manager. Each real-time client includes a client buffer that has a plurality of entries for storing data. The real-time client also includes a register for storing a watermark threshold for the client buffer, as well as logic for providing an allow signal when a number of valid entries in the client buffer exceeds the watermark threshold. A power management state machine is coupled to each of the plurality of real-time clients. The power management state machine provides a power event start signal in response to all of the plurality of real-time clients providing respective allow signals.
Type: Grant
Filed: July 28, 2017
Date of Patent: November 12, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Sonu Arora, Alexander Branover, Benjamin Tsien
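A minimal Python sketch of the per-client allow logic and the all-clients condition described above; the class and attribute names are illustrative, not taken from the patent:

```python
class RealTimeClient:
    def __init__(self, watermark_threshold):
        self.watermark_threshold = watermark_threshold
        self.valid_entries = 0  # data currently held in the client buffer

    def allow(self):
        """Assert the allow signal when buffered data exceeds the watermark threshold."""
        return self.valid_entries > self.watermark_threshold

def power_event_start(clients):
    """The state machine issues the power event start signal only when every client allows it."""
    return all(client.allow() for client in clients)
```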
-
Patent number: 10474490
Abstract: A technique for efficient time-division of resources in a virtualized accelerated processing device (“APD”) is provided. In a virtualization scheme implemented on the APD, different virtual machines are assigned different “time-slices” in which to use the APD. When a time-slice expires, the APD performs a virtualization context switch by stopping operations for a current virtual machine (“VM”) and starting operations for another VM. Typically, each VM is assigned a fixed length of time, after which a virtualization context switch is performed. This fixed length of time can lead to inefficiencies. Therefore, in some situations, in response to a VM having no more work to perform on the APD and the APD being idle, a virtualization context switch is performed “early.” This virtualization context switch is “early” in the sense that the virtualization context switch is performed before the fixed length of time for the time-slice expires.
Type: Grant
Filed: June 29, 2017
Date of Patent: November 12, 2019
Assignees: Advanced Micro Devices, Inc.; ATI Technologies ULC
Inventors: Gongxian Jeffrey Cheng, Louis Regniere, Anthony Asaro
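The switch decision reduces to a small predicate; a sketch in Python, with hypothetical parameter names, under the assumptions stated in the abstract:

```python
def should_context_switch(time_in_slice, slice_length, vm_has_work, apd_idle):
    """Decide whether to perform a virtualization context switch now."""
    if time_in_slice >= slice_length:
        return True                            # normal case: the fixed time-slice expired
    return (not vm_has_work) and apd_idle      # "early" switch: current VM is done and the APD is idle
```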
-
Patent number: 10467138
Abstract: A processing system includes a first socket, a second socket, and an interface between the first socket and the second socket. A first memory is associated with the first socket and a second memory is associated with the second socket. The processing system also includes a controller for the first memory. The controller is to receive a first request for a first memory transaction with the second memory and perform the first memory transaction along a path that includes the interface and bypasses at least one second cache associated with the second memory.
Type: Grant
Filed: December 28, 2015
Date of Patent: November 5, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Paul Blinzer, Ali Ibrahim, Benjamin T. Sander, Vydhyanathan Kalyanasundharam
-
Patent number: 10467178
Abstract: Embodiments of a peripheral component are described herein. Embodiments provide alternatives to the use of an external bridge integrated circuit (IC) architecture. For example, an embodiment multiplexes a peripheral bus such that multiple processors in one peripheral component can use one peripheral interface slot without requiring an external bridge IC. Embodiments are usable with known bus protocols.
Type: Grant
Filed: December 9, 2016
Date of Patent: November 5, 2019
Assignees: Advanced Micro Devices, Inc.; ATI Technologies ULC
Inventors: Shahin Solki, Stephen Morein, Mark S. Grossman
-
Patent number: 10467013
Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
Type: Grant
Filed: November 29, 2018
Date of Patent: November 5, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston
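A very rough Python sketch of the yield-and-resume idea in the abstract: each workitem keeps its own program counter, and on a synchronization instruction it records the next instruction and yields so another workitem can run. The scheduler loop and all names are illustrative assumptions:

```python
def run_group(workitems, program, execute):
    """execute(workitem, instruction) performs one instruction for one workitem."""
    pcs = {w: 0 for w in workitems}   # one program counter per workitem
    ready = list(workitems)
    while ready:
        w = ready.pop(0)
        while pcs[w] < len(program):
            instr = program[pcs[w]]
            pcs[w] += 1               # PC now points at the instruction after this one
            if instr == "sync":
                ready.append(w)       # yield the processor; another workitem runs next
                break
            execute(w, instr)
        # falling out of the inner loop means the workitem finished the stream
```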
-
Publication number: 20190332561
Abstract: A data processing system includes a processing unit that forms a base die and has a group of through-silicon vias (TSVs), and is connected to a memory system. The memory system includes a die stack that includes a first die and a second die. The first die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads. The group of micro-bump landing pads are connected to the group of TSVs of the processing unit using a corresponding group of micro-bumps. The first die has a group of memory die TSVs. The second die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads connected to the group of TSVs of the first die. The first die communicates with the processing unit using first cycle timing, and with the second die using second cycle timing.
Type: Application
Filed: April 27, 2018
Publication date: October 31, 2019
Applicant: Advanced Micro Devices, Inc.
Inventors: Russell Schreiber, John Wuu, Michael K. Ciraula, Patrick J. Shyvers
-
Patent number: 10459776
Abstract: Techniques for managing message transmission in a large networked computer system that includes multiple individual networked computing systems are disclosed. Message passing among the computing systems includes a sending computing device transmitting a message to a receiver computing device and the receiver computing device consuming that message. A build-up of data stored in a buffer at the receiver can reduce performance. In order to reduce the potential performance degradation associated with large amounts of “waiting” data in the buffer, a sending computer system first determines whether the receiver computer system is ready to receive a message and does not transmit the message if the receiver computer system is not ready. To determine whether the receiver computer system is ready to receive a message, the receiver computer system, at the request of the sending computer system, checks a counting filter that stores indications of whether particular messages are ready.
Type: Grant
Filed: June 5, 2017
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Shuai Che
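A sketch of the readiness handshake described above, with the counting filter modeled as a simple per-message-tag counter in Python; the data structure details and API names are assumptions made for illustration, since the abstract does not specify them:

```python
from collections import Counter

class CountingFilter:
    """Tracks, per message tag, how many messages the receiver is ready to accept."""
    def __init__(self):
        self.counts = Counter()

    def mark_ready(self, message_tag):
        self.counts[message_tag] += 1

    def consume(self, message_tag):
        if self.counts[message_tag] > 0:
            self.counts[message_tag] -= 1
            return True
        return False

def try_send(outbox, receiver_filter, message_tag, message):
    """Transmit only if the receiver reports it is ready for this message."""
    if receiver_filter.consume(message_tag):
        outbox.append((message_tag, message))
        return True
    return False  # hold the message rather than letting data pile up at the receiver
```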
-
Patent number: 10460513
Abstract: Improvements to graphics processing pipelines are disclosed. More specifically, the vertex shader stage, which performs vertex transformations, and the hull or geometry shader stages are combined. If tessellation is disabled and geometry shading is enabled, then the graphics processing pipeline includes a combined vertex and geometry shader stage. If tessellation is enabled, then the graphics processing pipeline includes a combined vertex and hull shader stage. If tessellation and geometry shading are both disabled, then the graphics processing pipeline does not use a combined shader stage. The combined shader stages improve efficiency by reducing the number of executing instances of shader programs and associated resources reserved.
Type: Grant
Filed: December 23, 2016
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Mangesh P. Nijasure, Randy W. Ramsey, Todd Martin
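The selection rule stated in the abstract can be written as a small predicate; a Python sketch with illustrative names (not the GPU's internal stage identifiers):

```python
def combined_stage(tessellation_enabled, geometry_shading_enabled):
    """Return which combined shader stage the pipeline uses, if any."""
    if tessellation_enabled:
        return "combined vertex+hull shader"
    if geometry_shading_enabled:
        return "combined vertex+geometry shader"
    return None  # both disabled: no combined stage is used
```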
-
Patent number: 10459850
Abstract: Systems, apparatuses, and methods for implementing virtualized process isolation are disclosed. A system includes a kernel and multiple guest virtual machines (VMs) executing on the system's processing hardware. Each guest VM includes a vShim layer for managing kernel accesses to user space and guest accesses to kernel space. The vShim layer also maintains a set of page tables separate from the kernel page tables. In one embodiment, data in the user space is encrypted and the kernel goes through the vShim layer to access user space data. When the kernel attempts to access a user space address, the kernel exits and the vShim layer is launched to process the request. If the kernel has permission to access the user space address, the vShim layer copies the data to a region in kernel space and then returns execution to the kernel. The vShim layer prevents the kernel from accessing the user space address if the kernel does not have permission to access the user space address.
Type: Grant
Filed: September 20, 2016
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: David A. Kaplan
-
Patent number: 10459726
Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent micro-operations are consecutive store micro-operations, where adjacent refers to micro-operations flowing through adjacent dispatch slots and consecutive store refers to both of the adjacent micro-operations being store micro-operations. The consecutive store operations are then reviewed to determine whether the data sizes are the same and whether the store operation addresses are consecutive. The two store operations are then fused together to form one store operation with twice the data size and one store data HI operation.
Type: Grant
Filed: November 27, 2017
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: John M. King
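A hypothetical Python sketch of the fusion test described above: two store micro-operations are fused when their data sizes match and their addresses are consecutive. The field names are illustrative, and the store data HI handling is omitted:

```python
from dataclasses import dataclass

@dataclass
class StoreOp:
    address: int
    size: int  # bytes

def can_fuse(a: StoreOp, b: StoreOp) -> bool:
    """Same data size, and the second store's address follows directly after the first."""
    return a.size == b.size and b.address == a.address + a.size

def fuse(a: StoreOp, b: StoreOp) -> StoreOp:
    """One store with twice the data size (the companion store data HI op is not modeled here)."""
    assert can_fuse(a, b)
    return StoreOp(address=a.address, size=a.size * 2)
```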
-
Patent number: 10458857
Abstract: A calibrated temperature sensor includes a power on oscillator responsive to a calibration enable signal for providing a power on clock signal, a temperature dependent oscillator responsive to the calibration enable signal for providing a temperature dependent clock signal, and a measurement logic circuit. The measurement logic circuit counts a first number of pulses of the temperature dependent clock signal during a first calibration period using the power on clock signal, a second number of pulses of the temperature dependent clock signal during a second calibration period using a system clock signal, a third number of pulses of the power on clock signal over a third calibration period using the system clock signal, and a fourth number of pulses of the temperature dependent clock signal using the system clock signal during a normal operation mode, wherein the first calibration period precedes both the second and third calibration periods.
Type: Grant
Filed: February 22, 2018
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Ravinder Reddy Rachala, Stephen Victor Kosonocky, Stephen C. Ennis
-
Patent number: 10452548
Abstract: A method of preemptive cache writeback includes transmitting, from a first cache controller of a first cache to a second cache controller of a second cache, an unused bandwidth message representing an unused bandwidth between the first cache and the second cache during a first cycle. During a second cycle, a cache line containing dirty data is preemptively written back from the second cache to the first cache based on the unused bandwidth message. Further, the cache line in the second cache is written over in response to a cache miss to the second cache.
Type: Grant
Filed: September 28, 2017
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: David A. Roberts, Elliot H. Mednick
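A simplified Python sketch of the preemptive-writeback decision described above; treating the unused bandwidth message as a count of spare line transfers is an assumption for illustration:

```python
def preemptive_writebacks(dirty_lines, unused_bandwidth_lines):
    """During a cycle with spare bandwidth, write back up to that many dirty lines early."""
    to_write_back = dirty_lines[:unused_bandwidth_lines]
    remaining = dirty_lines[unused_bandwidth_lines:]
    return to_write_back, remaining

# A line written back early is clean, so a later cache miss can simply overwrite
# it in the second cache without first evicting dirty data to the first cache.
```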
-
Patent number: 10452505
Abstract: A memory system includes a non-volatile memory unit, a content-addressable memory unit coupled to the non-volatile memory unit, and an error injection logic unit coupled to the non-volatile memory unit and the content-addressable memory unit. The non-volatile memory unit is programmed to allow a first error injection onto a first data word using the error injection logic unit. The error injection logic, in combination with the content-addressable memory unit, replaces a bit cell in the memory system. The memory system performs an evaluation of various error detection and correction techniques.
Type: Grant
Filed: December 20, 2017
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Michael K. Ciraula
-
Patent number: 10453243
Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate at least one of the fixed function hardware blocks. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.
Type: Grant
Filed: January 3, 2019
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Anirudh R. Acharya, Swapnil Sakharshete, Michael Mantor, Mangesh P. Nijasure, Todd Martin, Vineet Goel
-
Patent number: 10452437
Abstract: Systems, apparatuses, and methods for performing temperature-aware task scheduling and proactive power management. A SoC includes a plurality of processing units and a task queue storing pending tasks. The SoC calculates a thermal metric for each pending task to predict an amount of heat the pending task will generate. The SoC also determines a thermal gradient for each processing unit to predict a rate at which the processing unit's temperature will change when executing a task. The SoC also monitors a thermal margin of how far each processing unit is from reaching its thermal limit. The SoC minimizes non-uniform heat generation on the SoC by scheduling pending tasks from the task queue to the processing units based on the thermal metrics for the pending tasks, the thermal gradients of each processing unit, and the thermal margin available on each processing unit.
Type: Grant
Filed: June 24, 2016
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Abhinandan Majumdar, Brian J. Kocoloski, Leonardo Piga, Wei Huang, Yasuko Eckert
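An illustrative Python sketch of a scheduling policy combining the three inputs named in the abstract (thermal metric, gradient, margin); the scoring formula itself is an assumption, since the abstract only states which quantities are considered:

```python
def pick_unit(task_thermal_metric, units):
    """units: list of (unit_id, thermal_gradient, thermal_margin).
    Choose the unit whose predicted temperature rise leaves the most headroom."""
    best = None
    for unit_id, gradient, margin in units:
        predicted_rise = task_thermal_metric * gradient  # assumed prediction model
        headroom = margin - predicted_rise
        if headroom >= 0 and (best is None or headroom > best[1]):
            best = (unit_id, headroom)
    return best[0] if best else None  # None: defer the task, no unit has thermal margin
```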