Patents Assigned to Advanced Micro Devices, Inc.
-
Patent number: 10491524
Abstract: A system for implementing load balancing schemes includes one or more processing units, a memory, and a communication fabric with a plurality of switches coupled to the processing unit(s) and the memory. A switch of the fabric determines a first number of streams on a first input port that are targeting a first output port. The switch also determines a second number of requestors, from all input ports, that are targeting the first output port. Then, the switch calculates a throttle factor for the first input port by dividing the first number of streams by the second number of requestors. The switch applies the throttle factor to regulate bandwidth on the first input port for requestors targeting the first output port. The switch also calculates throttle factors for the other ports and applies them when regulating bandwidth on those ports.
Type: Grant
Filed: November 7, 2017
Date of Patent: November 26, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
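A minimal Python sketch of the throttle-factor calculation this abstract describes; the function names and the idea of scaling a total bandwidth figure are illustrative assumptions, not the patented implementation:

```python
# Illustrative sketch of the throttle-factor idea from the abstract above.
# Names (ports, streams, requestors) are hypothetical, not from the patent.

def throttle_factor(streams_on_input_port, requestors_all_ports):
    """Fraction of the output port's demand that comes from this input port."""
    if requestors_all_ports == 0:
        return 0.0
    return streams_on_input_port / requestors_all_ports

def allowed_bandwidth(total_bandwidth, streams_on_input_port, requestors_all_ports):
    """Regulate the input port's share of the output port's bandwidth."""
    return total_bandwidth * throttle_factor(streams_on_input_port, requestors_all_ports)

# Example: 2 streams on input port 0 and 8 requestors (across all input ports)
# targeting the same output port -> port 0 is throttled to 1/4 of the bandwidth.
print(allowed_bandwidth(100.0, 2, 8))  # 25.0
```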
-
Patent number: 10489350
Abstract: Techniques for handling data compression are disclosed in which metadata indicates which portions of data are compressed and which portions of data are not compressed. Segments of a buffer referred to as block groups store compressed blocks of data along with uncompressed blocks of data and hash blocks. If a block group includes a block that is a hash of another block in the block group, then the other block is considered to be compressed. If the block group does not include a block that is a hash of another block in the block group, then the blocks in the block group are uncompressed. The hash function used to generate the hash is selected to prevent “collisions,” which occur when the data being stored in the buffer is such that it is possible for a hash block and an uncompressed block to be the same.
Type: Grant
Filed: February 24, 2017
Date of Patent: November 26, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Greg Sadowski
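A rough Python sketch of the block-group test described above; the block size, the choice of hash, and the function names are assumptions made for illustration:

```python
# A block is treated as compressed if some other block in its block group
# equals that block's hash; otherwise the group holds uncompressed data.
import hashlib

BLOCK_SIZE = 32  # bytes per block -- an assumed size for this sketch

def block_hash(block: bytes) -> bytes:
    # Truncate a cryptographic hash to one block; the patent only requires a
    # hash chosen so a hash block cannot collide with stored uncompressed data.
    return hashlib.sha256(block).digest()[:BLOCK_SIZE]

def compressed_block_indices(block_group: list[bytes]) -> set[int]:
    """Indices of blocks whose hash appears as another block in the group."""
    blocks = set(block_group)
    return {i for i, b in enumerate(block_group) if block_hash(b) in blocks}
```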
-
Publication number: 20190354833
Abstract: Methods and systems for reducing communication frequency in neural networks (NN) are described. The method includes running, in an initial epoch, mini-batches of samples from a training set through the NN and determining one or more errors from the ground truth, where the ground truth is the given label for the sample. The errors are recorded for each sample and are sorted in non-decreasing order. In the next epoch, mini-batches of samples are formed starting from the sample with the smallest error in the sorted list. The parameters of the NN are updated and the mini-batches are run. Mini-batches are communicated to the other processing elements only if a previous update has made a significant impact on the NN, where significance is measured by determining whether the errors, or the accumulated errors since the last communication update, meet or exceed a significance threshold.
Type: Application
Filed: July 5, 2018
Publication date: November 21, 2019
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
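A loose Python sketch of the epoch-to-epoch reordering and the significance test described above; the function names and the threshold value are illustrative assumptions:

```python
def next_epoch_batches(errors, batch_size):
    """errors: list of (sample_id, error) recorded in the previous epoch.
    Mini-batches for the next epoch start from the smallest-error samples."""
    ordered = [sid for sid, err in sorted(errors, key=lambda x: x[1])]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def should_communicate(accumulated_error, threshold=0.05):
    """Communicate an update to other processing elements only when the
    errors accumulated since the last communication reach the threshold."""
    return accumulated_error >= threshold
```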
-
Patent number: 10482043
Abstract: A memory module includes a memory, a cache to cache copies of information stored in the memory, and a controller. The controller is configured to access first data from the memory or the cache in response to receiving a read request from a processor. The controller is also configured to transmit a first signal a first nondeterministic time interval after receiving the read request. The first signal indicates that the first data is available. The controller is further configured to transmit a second signal a first deterministic time interval after receiving a first transmit request from the processor in response to the first signal. The second signal includes the first data. The memory module also includes a buffer to store a write request until completion and a counter that is incremented in response to receiving the write request and decremented in response to completing the write request.
Type: Grant
Filed: July 28, 2017
Date of Patent: November 19, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Aaron Nygren, Michael Ignatowski, David A. Roberts
-
Patent number: 10474468
Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.
Type: Grant
Filed: February 22, 2017
Date of Patent: November 12, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Michael J. Mantor, Brian D. Emberling, Mark Fowler, Mark M. Leather
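The two execution orders described in the abstract amount to swapping the nesting of two loops; a minimal Python sketch, with hypothetical names, follows:

```python
def run_wavefront(instructions, portions, mode, execute):
    """execute(portion, instruction) performs one instruction on one wavefront portion."""
    if mode == "first":
        # First mode: run each instruction across every portion before moving on.
        for instr in instructions:
            for portion in portions:
                execute(portion, instr)
    else:
        # Second mode: finish the whole instruction set on one portion at a time.
        for portion in portions:
            for instr in instructions:
                execute(portion, instr)
```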
-
Patent number: 10474211
Abstract: A data processing system includes a power manager for providing a power event depth signal in response to a power event request signal. A plurality of real-time clients is coupled to the power manager. Each real-time client includes a client buffer that has a plurality of entries for storing data. The real-time client also includes a register for storing a watermark threshold for the client buffer, as well as logic for providing an allow signal when a number of valid entries in the client buffer exceeds the watermark threshold. A power management state machine is coupled to each of the plurality of real-time clients. The power management state machine provides a power event start signal in response to all of the plurality of real-time clients providing respective allow signals.
Type: Grant
Filed: July 28, 2017
Date of Patent: November 12, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Sonu Arora, Alexander Branover, Benjamin Tsien
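A minimal Python sketch of the per-client allow logic and the all-clients condition described above; the class and attribute names are illustrative, not taken from the patent:

```python
class RealTimeClient:
    def __init__(self, watermark_threshold):
        self.watermark_threshold = watermark_threshold
        self.valid_entries = 0  # data currently held in the client buffer

    def allow(self):
        """Assert the allow signal when buffered data exceeds the watermark threshold."""
        return self.valid_entries > self.watermark_threshold

def power_event_start(clients):
    """The state machine issues the power event start signal only when every client allows it."""
    return all(client.allow() for client in clients)
```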
-
Patent number: 10474490
Abstract: A technique for efficient time-division of resources in a virtualized accelerated processing device (“APD”) is provided. In a virtualization scheme implemented on the APD, different virtual machines are assigned different “time-slices” in which to use the APD. When a time-slice expires, the APD performs a virtualization context switch by stopping operations for a current virtual machine (“VM”) and starting operations for another VM. Typically, each VM is assigned a fixed length of time, after which a virtualization context switch is performed. This fixed length of time can lead to inefficiencies. Therefore, in some situations, in response to a VM having no more work to perform on the APD and the APD being idle, a virtualization context switch is performed “early.” This virtualization context switch is “early” in the sense that the virtualization context switch is performed before the fixed length of time for the time-slice expires.
Type: Grant
Filed: June 29, 2017
Date of Patent: November 12, 2019
Assignees: Advanced Micro Devices, Inc.; ATI Technologies ULC
Inventors: Gongxian Jeffrey Cheng, Louis Regniere, Anthony Asaro
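The switch decision reduces to a small predicate; a sketch in Python, with hypothetical parameter names, under the assumptions stated in the abstract:

```python
def should_context_switch(time_in_slice, slice_length, vm_has_work, apd_idle):
    """Decide whether to perform a virtualization context switch now."""
    if time_in_slice >= slice_length:
        return True                            # normal case: the fixed time-slice expired
    return (not vm_has_work) and apd_idle      # "early" switch: current VM is done and the APD is idle
```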
-
Patent number: 10467138
Abstract: A processing system includes a first socket, a second socket, and an interface between the first socket and the second socket. A first memory is associated with the first socket and a second memory is associated with the second socket. The processing system also includes a controller for the first memory. The controller is to receive a first request for a first memory transaction with the second memory and perform the first memory transaction along a path that includes the interface and bypasses at least one second cache associated with the second memory.
Type: Grant
Filed: December 28, 2015
Date of Patent: November 5, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Paul Blinzer, Ali Ibrahim, Benjamin T. Sander, Vydhyanathan Kalyanasundharam
-
Patent number: 10467178
Abstract: Embodiments of a peripheral component are described herein. Embodiments provide alternatives to the use of an external bridge integrated circuit (IC) architecture. For example, an embodiment multiplexes a peripheral bus such that multiple processors in one peripheral component can use one peripheral interface slot without requiring an external bridge IC. Embodiments are usable with known bus protocols.
Type: Grant
Filed: December 9, 2016
Date of Patent: November 5, 2019
Assignees: Advanced Micro Devices, Inc.; ATI Technologies ULC
Inventors: Shahin Solki, Stephen Morein, Mark S. Grossman
-
Patent number: 10467013
Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
Type: Grant
Filed: November 29, 2018
Date of Patent: November 5, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston
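A very rough Python sketch of the yield-and-resume idea in the abstract: each workitem keeps its own program counter, and on a synchronization instruction it records the next instruction and yields so another workitem can run. The scheduler loop and all names are illustrative assumptions:

```python
def run_group(workitems, program, execute):
    """execute(workitem, instruction) performs one instruction for one workitem."""
    pcs = {w: 0 for w in workitems}   # one program counter per workitem
    ready = list(workitems)
    while ready:
        w = ready.pop(0)
        while pcs[w] < len(program):
            instr = program[pcs[w]]
            pcs[w] += 1               # PC now points at the instruction after this one
            if instr == "sync":
                ready.append(w)       # yield the processor; another workitem runs next
                break
            execute(w, instr)
        # falling out of the inner loop means the workitem finished the stream
```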
-
Publication number: 20190332561
Abstract: A data processing system includes a processing unit that forms a base die and has a group of through-silicon vias (TSVs), and is connected to a memory system. The memory system includes a die stack that includes a first die and a second die. The first die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads. The group of micro-bump landing pads are connected to the group of TSVs of the processing unit using a corresponding group of micro-bumps. The first die has a group of memory die TSVs. The second die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads connected to the group of TSVs of the first die. The first die communicates with the processing unit using first cycle timing, and with the second die using second cycle timing.
Type: Application
Filed: April 27, 2018
Publication date: October 31, 2019
Applicant: Advanced Micro Devices, Inc.
Inventors: Russell Schreiber, John Wuu, Michael K. Ciraula, Patrick J. Shyvers
-
Patent number: 10459776
Abstract: Techniques for managing message transmission in a large networked computer system that includes multiple individual networked computing systems are disclosed. Message passing among the computing systems includes a sending computing device transmitting a message to a receiver computing device and the receiver computing device consuming that message. A build-up of data stored in a buffer at the receiver can reduce performance. In order to reduce the potential performance degradation associated with large amounts of “waiting” data in the buffer, a sending computer system first determines whether the receiver computer system is ready to receive a message and does not transmit the message if the receiver computer system is not ready. To determine whether the receiver computer system is ready to receive a message, the receiver computer system, at the request of the sending computer system, checks a counting filter that stores indications of whether particular messages are ready.
Type: Grant
Filed: June 5, 2017
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Shuai Che
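A sketch of the readiness handshake described above, with the counting filter modeled as a simple per-message-tag counter in Python; the data structure details and API names are assumptions made for illustration, since the abstract does not specify them:

```python
from collections import Counter

class CountingFilter:
    """Tracks, per message tag, how many messages the receiver is ready to accept."""
    def __init__(self):
        self.counts = Counter()

    def mark_ready(self, message_tag):
        self.counts[message_tag] += 1

    def consume(self, message_tag):
        if self.counts[message_tag] > 0:
            self.counts[message_tag] -= 1
            return True
        return False

def try_send(outbox, receiver_filter, message_tag, message):
    """Transmit only if the receiver reports it is ready for this message."""
    if receiver_filter.consume(message_tag):
        outbox.append((message_tag, message))
        return True
    return False  # hold the message rather than letting data pile up at the receiver
```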
-
Patent number: 10460513
Abstract: Improvements to graphics processing pipelines are disclosed. More specifically, the vertex shader stage, which performs vertex transformations, and the hull or geometry shader stages are combined. If tessellation is disabled and geometry shading is enabled, then the graphics processing pipeline includes a combined vertex and geometry shader stage. If tessellation is enabled, then the graphics processing pipeline includes a combined vertex and hull shader stage. If tessellation and geometry shading are both disabled, then the graphics processing pipeline does not use a combined shader stage. The combined shader stages improve efficiency by reducing the number of executing instances of shader programs and associated resources reserved.
Type: Grant
Filed: December 23, 2016
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Mangesh P. Nijasure, Randy W. Ramsey, Todd Martin
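The selection rule stated in the abstract can be written as a small predicate; a Python sketch with illustrative names (not the GPU's internal stage identifiers):

```python
def combined_stage(tessellation_enabled, geometry_shading_enabled):
    """Return which combined shader stage the pipeline uses, if any."""
    if tessellation_enabled:
        return "combined vertex+hull shader"
    if geometry_shading_enabled:
        return "combined vertex+geometry shader"
    return None  # both disabled: no combined stage is used
```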
-
Patent number: 10459850
Abstract: Systems, apparatuses, and methods for implementing virtualized process isolation are disclosed. A system includes a kernel and multiple guest virtual machines (VMs) executing on the system's processing hardware. Each guest VM includes a vShim layer for managing kernel accesses to user space and guest accesses to kernel space. The vShim layer also maintains a set of page tables separate from the kernel page tables. In one embodiment, data in the user space is encrypted and the kernel goes through the vShim layer to access user space data. When the kernel attempts to access a user space address, the kernel exits and the vShim layer is launched to process the request. If the kernel has permission to access the user space address, the vShim layer copies the data to a region in kernel space and then returns execution to the kernel. The vShim layer prevents the kernel from accessing the user space address if the kernel does not have permission to access the user space address.
Type: Grant
Filed: September 20, 2016
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: David A. Kaplan
-
Patent number: 10459726
Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent micro-operations are consecutive store micro-operations, where adjacent refers to micro-operations flowing through adjacent dispatch slots and consecutive store refers to both of the adjacent micro-operations being store micro-operations. The consecutive store operations are then reviewed to determine whether the data sizes are the same and whether the store operation addresses are consecutive. The two store operations are then fused together to form one store operation with twice the data size and one store data HI operation.
Type: Grant
Filed: November 27, 2017
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: John M. King
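A hypothetical Python sketch of the fusion test described above: two store micro-operations are fused when their data sizes match and their addresses are consecutive. The field names are illustrative, and the store data HI handling is omitted:

```python
from dataclasses import dataclass

@dataclass
class StoreOp:
    address: int
    size: int  # bytes

def can_fuse(a: StoreOp, b: StoreOp) -> bool:
    """Same data size, and the second store's address follows directly after the first."""
    return a.size == b.size and b.address == a.address + a.size

def fuse(a: StoreOp, b: StoreOp) -> StoreOp:
    """One store with twice the data size (the companion store data HI op is not modeled here)."""
    assert can_fuse(a, b)
    return StoreOp(address=a.address, size=a.size * 2)
```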
-
Patent number: 10458857
Abstract: A calibrated temperature sensor includes a power on oscillator responsive to a calibration enable signal for providing a power on clock signal, a temperature dependent oscillator responsive to the calibration enable signal for providing a temperature dependent clock signal, and a measurement logic circuit. The measurement logic circuit counts a first number of pulses of the temperature dependent clock signal during a first calibration period using the power on clock signal, a second number of pulses of the temperature dependent clock signal during a second calibration period using a system clock signal, a third number of pulses of the power on clock signal over a third calibration period using the system clock signal, and a fourth number of pulses of the temperature dependent clock signal using the system clock signal during a normal operation mode, wherein the first calibration period precedes both the second and third calibration periods.
Type: Grant
Filed: February 22, 2018
Date of Patent: October 29, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Ravinder Reddy Rachala, Stephen Victor Kosonocky, Stephen C. Ennis
-
Patent number: 10452548
Abstract: A method of preemptive cache writeback includes transmitting, from a first cache controller of a first cache to a second cache controller of a second cache, an unused bandwidth message representing an unused bandwidth between the first cache and the second cache during a first cycle. During a second cycle, a cache line containing dirty data is preemptively written back from the second cache to the first cache based on the unused bandwidth message. Further, the cache line in the second cache is written over in response to a cache miss to the second cache.
Type: Grant
Filed: September 28, 2017
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: David A. Roberts, Elliot H. Mednick
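A simplified Python sketch of the preemptive-writeback decision described above; treating the unused bandwidth message as a count of spare line transfers is an assumption for illustration:

```python
def preemptive_writebacks(dirty_lines, unused_bandwidth_lines):
    """During a cycle with spare bandwidth, write back up to that many dirty lines early."""
    to_write_back = dirty_lines[:unused_bandwidth_lines]
    remaining = dirty_lines[unused_bandwidth_lines:]
    return to_write_back, remaining

# A line written back early is clean, so a later cache miss can simply overwrite
# it in the second cache without first evicting dirty data to the first cache.
```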
-
Patent number: 10452505
Abstract: A memory system includes a non-volatile memory unit, a content-addressable memory unit coupled to the non-volatile memory unit, and an error injection logic unit coupled to the non-volatile memory unit and the content-addressable memory unit. The non-volatile memory unit is programmed to allow a first error injection onto a first data word using the error injection logic unit. The error injection logic, in combination with the content-addressable memory unit, replaces a bit cell in the memory system. The memory system performs an evaluation of various error detection and correction techniques.
Type: Grant
Filed: December 20, 2017
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventor: Michael K. Ciraula
-
Patent number: 10453243
Abstract: Processing of non-real-time and real-time workloads is performed using discrete pipelines. A first pipeline includes a first shader and one or more fixed function hardware blocks. A second pipeline includes a second shader that is configured to emulate at least one of the fixed function hardware blocks. First and second memory elements store first state information for the first pipeline and second state information for the second pipeline, respectively. A non-real-time workload executing in the first pipeline is preempted at a primitive boundary in response to a real-time workload being dispatched for execution in the second pipeline. The first memory element retains the first state information in response to preemption of the non-real-time workload. The first pipeline is configured to resume processing the subsequent primitive on the basis of the first state information stored in the first memory element.
Type: Grant
Filed: January 3, 2019
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Anirudh R. Acharya, Swapnil Sakharshete, Michael Mantor, Mangesh P. Nijasure, Todd Martin, Vineet Goel
-
Patent number: 10452437
Abstract: Systems, apparatuses, and methods for performing temperature-aware task scheduling and proactive power management. A SoC includes a plurality of processing units and a task queue storing pending tasks. The SoC calculates a thermal metric for each pending task to predict an amount of heat the pending task will generate. The SoC also determines a thermal gradient for each processing unit to predict a rate at which the processing unit's temperature will change when executing a task. The SoC also monitors a thermal margin of how far each processing unit is from reaching its thermal limit. The SoC minimizes non-uniform heat generation on the SoC by scheduling pending tasks from the task queue to the processing units based on the thermal metrics for the pending tasks, the thermal gradients of each processing unit, and the thermal margin available on each processing unit.
Type: Grant
Filed: June 24, 2016
Date of Patent: October 22, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Abhinandan Majumdar, Brian J. Kocoloski, Leonardo Piga, Wei Huang, Yasuko Eckert
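An illustrative Python sketch of a scheduling policy combining the three inputs named in the abstract (thermal metric, gradient, margin); the scoring formula itself is an assumption, since the abstract only states which quantities are considered:

```python
def pick_unit(task_thermal_metric, units):
    """units: list of (unit_id, thermal_gradient, thermal_margin).
    Choose the unit whose predicted temperature rise leaves the most headroom."""
    best = None
    for unit_id, gradient, margin in units:
        predicted_rise = task_thermal_metric * gradient  # assumed prediction model
        headroom = margin - predicted_rise
        if headroom >= 0 and (best is None or headroom > best[1]):
            best = (unit_id, headroom)
    return best[0] if best else None  # None: defer the task, no unit has thermal margin
```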