Patents Assigned to Advanced Micro Devices

Predicting page migration granularity for heterogeneous memory systems

Patent number: 10318344

Abstract: Systems, apparatuses, and methods for predicting page migration granularities for phases of an application executing on a non-uniform memory access (NUMA) system architecture are disclosed herein. A system with a plurality of processing units and memory devices executes a software application. The system identifies a plurality of phases of the application based on one or more characteristics (e.g., memory access pattern) of the application. The system predicts which page migration granularity will maximize performance for each phase of the application. The system performs a page migration at a first page migration granularity during a first phase of the application based on a first prediction. The system performs a page migration at a second page migration granularity during a second phase of the application based on a second prediction, wherein the second page migration granularity is different from the first page migration granularity.
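A toy model can make the per-phase idea concrete. The sketch below is a hypothetical illustration, not the patented predictor. For each phase signature, which is a label assumed to summarize the phase's memory access pattern, it remembers the best throughput observed at each candidate granularity. It then predicts the granularity with the highest observed value. The candidate sizes and the `record`/`predict` names are assumptions.

```python
from collections import defaultdict

# Hypothetical candidate migration granularities, in bytes.
GRANULARITIES = (4 * 1024, 64 * 1024, 2 * 1024 * 1024)


class GranularityPredictor:
    """Toy per-phase predictor: remembers which granularity performed best."""

    def __init__(self, default=GRANULARITIES[0]):
        self.default = default
        # phase signature -> {granularity: best observed throughput}
        self.history = defaultdict(dict)

    def record(self, phase, granularity, throughput):
        """Store the best throughput seen for this phase and granularity."""
        best = self.history[phase].get(granularity, 0.0)
        self.history[phase][granularity] = max(best, throughput)

    def predict(self, phase):
        """Return the granularity with the highest observed throughput."""
        observed = self.history.get(phase)
        if not observed:
            return self.default  # unseen phase: fall back to the default
        return max(observed, key=observed.get)


predictor = GranularityPredictor()
predictor.record("streaming", 2 * 1024 * 1024, 9.5)
predictor.record("streaming", 4 * 1024, 3.1)
predictor.record("random", 4 * 1024, 7.0)
predictor.record("random", 2 * 1024 * 1024, 2.2)
print(predictor.predict("streaming"), predictor.predict("random"))  # prints 2097152 4096
```

In this sketch, two phases with different access patterns receive different granularities. A real system would also have to weigh the cost of the migration itself, which this sketch ignores.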

Type: Grant

Filed: July 13, 2017

Date of Patent: June 11, 2019

Assignee: Advanced Micro Devices, Inc.

Inventor: Anthony Thomas Gutierrez

Message aggregation, combining and compression for efficient data communications in GPU-based clusters

Patent number: 10320695

Abstract: A system and method for efficient management of network traffic in highly data-parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node converts the original network messages into one or more new network messages, reducing at least one of the number of messages or their storage size. The new network messages are sent to the network interface for transmission across the network.
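One simple way to reduce both message count and size is to batch messages by destination and compress each batch. The sketch below is only an illustration of that idea. The length-prefixed framing, the per-destination grouping, and the use of `zlib` are assumptions, not details taken from the patent.

```python
from __future__ import annotations

import zlib
from collections import defaultdict
from typing import Iterable


def aggregate(messages: Iterable[tuple[int, bytes]], compress: bool = True) -> dict[int, bytes]:
    """Combine (destination, payload) messages into one payload per destination."""
    batches: dict[int, list[bytes]] = defaultdict(list)
    for dest, payload in messages:
        batches[dest].append(payload)
    combined = {}
    for dest, payloads in batches.items():
        # Prefix each payload with its 4-byte length so the receiver can split it.
        framed = b"".join(len(p).to_bytes(4, "little") + p for p in payloads)
        combined[dest] = zlib.compress(framed) if compress else framed
    return combined


def split(blob: bytes, compressed: bool = True) -> list[bytes]:
    """Receiver side: undo compression and framing."""
    data = zlib.decompress(blob) if compressed else blob
    parts, i = [], 0
    while i < len(data):
        n = int.from_bytes(data[i:i + 4], "little")
        parts.append(data[i + 4:i + 4 + n])
        i += 4 + n
    return parts


outgoing = aggregate([(1, b"put x"), (2, b"get y"), (1, b"put z")])
print(len(outgoing))  # prints 2: three messages became two
```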

Type: Grant

Filed: May 26, 2016

Date of Patent: June 11, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood

Techniques for changing management modes of multilevel memory hierarchy

Patent number: 10318153

Abstract: A processor modifies the memory management mode for a range of memory locations of a multilevel memory hierarchy based on changes in the phase of an application executing at the processor. The processor monitors the application phase (e.g., computation-bound phase, input/output phase, or memory access phase) of the executing application and, in response to a change in phase, consults a management policy to identify a memory management mode. The processor automatically reconfigures a memory controller and other modules so that a range of memory locations of the multilevel memory hierarchy is managed according to the identified memory management mode. By changing the memory management mode for the range of memory locations according to the application phase, the processor improves processing efficiency and flexibility.
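The phase-to-mode lookup can be pictured as a small policy table that is consulted on each phase change. In this hypothetical sketch, the two modes (a hardware-managed cache and software-visible flat memory) and the table contents are assumptions chosen for illustration. The abstract does not name specific modes.

```python
from enum import Enum


class Phase(Enum):
    COMPUTE = "computation-bound"
    IO = "input/output"
    MEMORY = "memory access"


class Mode(Enum):
    CACHE = "hardware-managed cache"  # hypothetical mode
    FLAT = "software-visible flat memory"  # hypothetical mode


# Hypothetical management policy: application phase -> mode for the range.
POLICY = {Phase.COMPUTE: Mode.CACHE, Phase.IO: Mode.CACHE, Phase.MEMORY: Mode.FLAT}


class MemoryRange:
    """A range of memory locations whose management mode can change."""

    def __init__(self, mode: Mode = Mode.CACHE):
        self.mode = mode
        self.reconfigurations = 0

    def on_phase_change(self, phase: Phase) -> Mode:
        new_mode = POLICY[phase]
        if new_mode is not self.mode:
            # Stand-in for reprogramming the memory controller and other modules.
            self.mode = new_mode
            self.reconfigurations += 1
        return self.mode
```

The sketch reconfigures only when the policy actually selects a different mode, so consecutive phases that map to the same mode cost nothing.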

Type: Grant

Filed: December 19, 2014

Date of Patent: June 11, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Sergey Blagodurov, Mitesh Ramesh Meswani, Gabriel H. Loh, Mauricio Breternitz, Jr., Mark Richard Nutter, John Robert Slice, David Andrew Roberts, Michael Ignatowski, Mark Henry Oskin

SYSTEM AND METHOD FOR LOAD FUSION

Publication number: 20190171452

Abstract: A system and method for load fusion fuses small load operations into fewer, larger load operations. The system detects that a pair of adjacent operations are consecutive load operations, where adjacent micro-operations are micro-operations flowing through adjacent dispatch slots and consecutive load micro-operations are adjacent micro-operations that are both load micro-operations. The consecutive load operations are then checked to determine whether their data sizes are the same and whether their addresses are consecutive. If so, the two load operations are fused to form one load micro-operation with twice the data size and one load data micro-operation with no load component.
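The fusion check itself reduces to two comparisons. The sketch below is a simplified model built on several assumptions. Loads are described only by a byte address and a size. The second load must immediately follow the first in ascending address order. The companion load data micro-operation is not modeled. `LoadUop` and `try_fuse` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LoadUop:
    addr: int  # starting byte address
    size: int  # data size in bytes


def try_fuse(first: LoadUop, second: LoadUop) -> LoadUop | None:
    """Fuse two loads from adjacent dispatch slots, or return None."""
    if first.size != second.size:
        return None  # data sizes must match
    if second.addr != first.addr + first.size:
        return None  # addresses must be consecutive
    # One wider load covering both original loads.
    return LoadUop(first.addr, 2 * first.size)
```

A real implementation would also have to respect alignment and memory-ordering constraints, which this sketch ignores.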

Type: Application

Filed: December 1, 2017

Publication date: June 6, 2019

Applicant: Advanced Micro Devices, Inc.

Inventor: John M. King

System and method for identifying graphics workloads for dynamic allocation of resources among GPU shaders

Patent number: 10311626

Abstract: A GPU filters graphics workloads to identify candidates for profiling. In response to receiving a graphics workload for the first time, the GPU determines if the graphics workload would require the GPU shaders to use fewer resources than would be spent profiling and determining a resource allocation for subsequent receipts of the same or a similar graphics workload. The GPU can further determine if the shaders are processing more than one graphics workload at the same time, such that the performance characteristics of each individual graphics workload cannot be effectively isolated. The GPU then profiles and stores resource allocations for a plurality of shaders for processing the filtered graphics workloads, and applies those stored resource allocations when the same or a similar graphics workload is received subsequently by the GPU.
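The filtering rules in the abstract map naturally onto a few guard clauses plus a lookup table. This is a hypothetical sketch. The cost units, the workload `signature` string, and the shape of the stored allocation are assumptions, not the GPU's actual interface.

```python
class ProfileFilter:
    """Toy filter deciding which graphics workloads are worth profiling."""

    def __init__(self, profiling_cost):
        self.profiling_cost = profiling_cost  # hypothetical cost units
        self.allocations = {}  # workload signature -> stored resource allocation

    def should_profile(self, signature, estimated_cost, concurrent_workloads):
        if signature in self.allocations:
            return False  # already profiled: reuse the stored allocation
        if estimated_cost < self.profiling_cost:
            return False  # cheaper to run than to profile
        if concurrent_workloads > 1:
            return False  # overlapping workloads cannot be isolated
        return True

    def store(self, signature, allocation):
        """Remember the allocation chosen after profiling."""
        self.allocations[signature] = allocation

    def lookup(self, signature):
        """Return the stored allocation for a repeated workload, if any."""
        return self.allocations.get(signature)
```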

Type: Grant

Filed: October 19, 2016

Date of Patent: June 4, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Rashad Oreifej, Angel E. Socarras, Mark Russell Anderson, Randy Wayne Ramsey

Method and apparatus for performing memory prefetching

Patent number: 10310981

Abstract: A method and apparatus for performing memory prefetching includes determining whether to initiate prefetching. Upon a determination to initiate prefetching, a first memory row is determined as a suitable prefetch candidate, and it is determined whether a particular set of one or more cachelines of the first memory row is to be prefetched.

Type: Grant

Filed: September 19, 2016

Date of Patent: June 4, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Yasuko Eckert, Nuwan Jayasena, Reena Panda, Onur Kayiran, Michael W. Boyer

Method and system for streaming information in wireless virtual reality

Patent number: 10310266

Abstract: Described is a method and system to efficiently compress and stream texture-space rendered content that enables low latency wireless virtual reality applications. In particular, camera motion, object motion/deformation, and shading information are decoupled and each type of information is then compressed as needed and streamed separately, while taking into account its tolerance to delays.

Type: Grant

Filed: February 10, 2016

Date of Patent: June 4, 2019

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Khaled Mammou, Layla A. Mah

Memory including side-car arrays with irregular sized entries

Patent number: 10311191

Abstract: A system and method for floorplanning a memory. A computing system includes a memory and a processing unit that generates memory access requests. Each memory line in the memory is M bits in size. A memory macro block includes at least a primary array and a sidecar array. The primary array stores a first portion of the memory line being accessed, and the sidecar array stores a second, smaller portion of the same memory line. The primary array and the sidecar array have different heights. The height of the sidecar array is based on the height of a notch in at least one corner of the memory macro block. The notch creates on-die space for a reserved area on the die. The notches result in cross-shaped, T-shaped, and/or L-shaped memory macro blocks.

Type: Grant

Filed: January 26, 2017

Date of Patent: June 4, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: John J. Wuu, Patrick J. Shyvers, Ryan Alan Selby

Method and apparatus for providing clock signals for a scan chain

Patent number: 10310015

Abstract: An integrated circuit device includes a plurality of flip flops configured into a scan chain. The plurality of flip flops includes at least one flip flop of a first type and at least one flip flop of a second type. A method includes generating a first scan clock signal for loading scan data into the at least one flip flop of the first type; generating a second scan clock signal and a third scan clock signal for loading the scan data into the at least one flip flop of the second type; and, responsive to the first, second, and third scan clock signals, loading a test pattern into the scan chain defined by the at least one flip flop of the first type and the at least one flip flop of the second type.

Type: Grant

Filed: July 19, 2013

Date of Patent: June 4, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Thomas A. Clouqueur, Dwight K. Elvey, Kamran Zarrineh

Stacked dies and dummy components for improved thermal performance

Patent number: 10312221

Abstract: Various semiconductor chip devices with stacked chips are disclosed. In one aspect, a semiconductor chip device includes a stack of plural semiconductor chips. Each pair of adjacent semiconductor chips in the stack is electrically connected by plural interconnects and physically connected by a first insulating bonding layer. A first stack of dummy chips is positioned opposite a first side of the stack of semiconductor chips and separated from the plural semiconductor chips by a first gap. Each pair of adjacent dummy chips in the first stack is physically connected by a second insulating bonding layer. A second stack of dummy chips is positioned opposite a second side of the stack of semiconductor chips and separated from the plural semiconductor chips by a second gap. Each pair of adjacent dummy chips in the second stack is physically connected by a third insulating bonding layer.

Type: Grant

Filed: December 17, 2017

Date of Patent: June 4, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Rahul Agarwal, Kaushik Mysore Srinivasa Setty, Milind S. Bhagavat, Brett P. Wilkerson

System and method for dynamically allocating memory to hold pending write requests

Patent number: 10310997

Abstract: A processing system employs a memory module as a temporary write buffer to store write requests when a write buffer at a memory controller reaches a threshold capacity, and de-allocates the temporary write buffer when the write buffer capacity falls below the threshold. Upon receiving a write request, the memory controller stores the write request in a write buffer until the write request can be written to main memory. The memory controller can temporarily extend the memory controller's write buffer to the memory module, thereby accommodating temporary periods of high memory activity without requiring a large permanent write buffer at the memory controller.
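The spill-and-reclaim behavior can be modeled with two FIFO queues. In this hypothetical sketch, `local` stands for the controller's write buffer, `overflow` for the temporary buffer carved out of the memory module, and `threshold` for the occupancy at which spilling begins. The class and method names are illustrative only. As `local` drains, requests move back from `overflow` into `local` so that writes retire in arrival order.

```python
from collections import deque


class WriteBuffer:
    """Controller write buffer that spills into a temporary memory-module buffer."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.local = deque()   # write buffer at the memory controller
        self.overflow = None   # temporary buffer in the memory module (allocated on demand)

    def enqueue(self, req):
        if len(self.local) >= self.threshold:
            if self.overflow is None:
                self.overflow = deque()  # allocate the temporary write buffer
            self.overflow.append(req)
        else:
            self.local.append(req)

    def drain_one(self):
        """Write the oldest buffered request to main memory and return it."""
        req = self.local.popleft()
        if self.overflow:
            # Refill from the overflow region to preserve arrival order.
            self.local.append(self.overflow.popleft())
        if self.overflow is not None and not self.overflow and len(self.local) < self.threshold:
            self.overflow = None  # de-allocate once occupancy falls below the threshold
        return req
```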

Type: Grant

Filed: September 22, 2016

Date of Patent: June 4, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Amin Farmahini Farahani, David A. Roberts, Nuwan Jayasena

Secure system memory training

Patent number: 10311236

Abstract: Systems, apparatuses, and methods for performing secure system memory training are disclosed. In one embodiment, a system includes a boot media, a security processor with a first memory, a system memory, and one or more main processors coupled to the system memory. The security processor is configured to retrieve first data from the boot media and store and authenticate the first data in the first memory. The first data includes a first set of instructions which are executable to retrieve, from the boot media, a configuration block with system memory training parameters. The security processor also executes a second set of instructions to initialize and train the system memory using the training parameters. After training the system memory, the security processor retrieves, authenticates, and stores boot code in the system memory and releases the one or more main processors from reset to execute the boot code.

Type: Grant

Filed: November 22, 2016

Date of Patent: June 4, 2019

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Kathirkamanathan Nadarajah, Oswin Housty, Sergey Blotsky, Tan Peng, Hary Devapriyan Mahesan

SYSTEM AND METHOD FOR STORE FUSION

Publication number: 20190163475

Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent operations are consecutive store operations, where adjacent micro-operations are micro-operations flowing through adjacent dispatch slots and consecutive store micro-operations are adjacent micro-operations that are both store micro-operations. The consecutive store operations are then checked to determine whether their data sizes are the same and whether their addresses are consecutive. If so, the two store operations are fused to form one store operation with twice the data size and one store data HI operation.

Type: Application

Filed: November 27, 2017

Publication date: May 30, 2019

Applicant: Advanced Micro Devices, Inc.

Inventor: John M. King

COMPUTATIONAL SENSOR

Publication number: 20190164251

Abstract: A system and method for controlling characteristics of collected image data are disclosed. The system and method include performing pre-processing of an image using GPUs; configuring an optic based on the pre-processing, the configuring being designed to account for features of the pre-processed image; acquiring an image using the configured optic; processing the acquired image using GPUs; and determining whether the processed acquired image accounts for the features of the pre-processed image. If the determination is affirmative, the image is output; if it is negative, the configuring of the optic and the acquisition of the image are repeated.

Type: Application

Filed: December 5, 2017

Publication date: May 30, 2019

Applicant: Advanced Micro Devices, Inc.

Inventors: Allen H. Rush, Hui Zhou

SYSTEM AND METHOD FOR VIRTUAL LOAD QUEUE

Publication number: 20190163471

Abstract: A system and method for a virtual load queue is described. Load micro-operations are processed through an instruction pipeline without requiring an entry in a load queue (LDQ). An address generation scheduler queue (AGSQ) entry is allocated to the load micro-operation, and an LDQ entry is not. The LDQ entries are reserved for the N oldest load micro-operations, where N is the depth of the LDQ. The AGSQ entry is deallocated if the load micro-operation is one of the N oldest load micro-operations, or upon successful completion of the load micro-operation. The AGSQ entry is not deallocated if the load micro-operation gets a bad status and is not one of the N oldest load micro-operations. Consequently, the AGSQ acts as a virtual queue for the LDQ and mitigates the limiting effect of the LDQ depth.
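The deallocation rule for AGSQ entries can be written as a small predicate. This sketch assumes that a load's age is given as a rank among in-flight loads (0 for the oldest) and that completion and bad status are known booleans. The function and parameter names are hypothetical.

```python
def release_agsq_entry(age_rank, ldq_depth, completed, bad_status):
    """Decide whether a load micro-operation's AGSQ entry may be deallocated."""
    if age_rank < ldq_depth:
        return True  # one of the N oldest loads: an LDQ entry is reserved for it
    if completed and not bad_status:
        return True  # completed successfully
    return False  # bad status (or still pending) and not among the N oldest: keep it
```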

Type: Application

Filed: November 28, 2017

Publication date: May 30, 2019

Applicant: Advanced Micro Devices, Inc.

Inventor: John M. King

Clock divider device and methods thereof

Patent number: 10303200

Abstract: A method for implementing clock dividers includes, in response to detecting a voltage drop at a processor core, providing an input clock signal to a transmission gate multiplexer that selects one of two stretch-enable signals. In some embodiments, selecting one of the two stretch-enable signals includes inputting a set of core clock enable signals into a clock divider circuit and modifying the set of core clock enable signals to generate the stretch-enable signals. An output clock signal is generated based on the selected stretch-enable signal.

Type: Grant

Filed: February 24, 2017

Date of Patent: May 28, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Deepesh John, Steven Kommrusch, Vibhor Mittal

Dynamic clock control to increase stutter efficiency in the memory subsystem

Patent number: 10304506

Abstract: Systems, apparatuses, and methods for implementing dynamic clock control to increase stutter efficiency in a memory subsystem are disclosed. A system includes at least a processor, a memory, and a communication fabric coupled to the processor and memory. The system implements a stutter mode for a first region of the fabric, with stutter mode including an idle state and an active state. Stutter efficiency is defined as the idle time divided by the sum of the active time and the idle time. Reducing the exit latency of going from the idle state to the active state increases the stutter efficiency, which in turn increases the power savings achieved by the stutter mode. Since the phase-locked loop (PLL) is one of the main contributors to the exit latency, the PLL is powered down and one or more bypass clocks are provided during the stutter mode.
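The stutter efficiency definition is simple arithmetic, and a worked example shows why a shorter exit latency helps. For illustration, the sketch assumes that exit latency counts as part of the active time. The durations are made-up numbers in microseconds, not measurements.

```python
def stutter_efficiency(idle_time: float, active_time: float) -> float:
    """Stutter efficiency = idle / (active + idle)."""
    total = active_time + idle_time
    if total <= 0:
        raise ValueError("total stutter cycle time must be positive")
    return idle_time / total


# Hypothetical cycle: 90 us idle, 8 us of useful work, plus the exit latency.
work, idle = 8.0, 90.0
with_pll_relock = stutter_efficiency(idle, work + 10.0)  # slower exit (hypothetical 10 us)
with_bypass = stutter_efficiency(idle, work + 2.0)       # faster exit (hypothetical 2 us)
print(f"{with_pll_relock:.3f} -> {with_bypass:.3f}")     # prints 0.833 -> 0.900
```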

Type: Grant

Filed: November 10, 2017

Date of Patent: May 28, 2019

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Alexander J. Branover, Benjamin Tsien, Bradley Kent, Joyce C. Wong

Swizzling in 3D stacked memory

Patent number: 10303398

Abstract: A processing system includes a compute die and a stacked memory stacked with the compute die. The stacked memory includes a first memory die and a second memory die stacked on top of the first memory die. A parallel access using a single memory address is directed towards different memory banks of the first memory die and the second memory die. The single memory address of the parallel access is swizzled to access the first memory die and the second memory die at different physical locations.
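One common way to make the same address land in different banks on different dies is to XOR the bank bits with the die index. The bit layout below (10 column bits, 3 bank bits, and the remaining bits for the row) and the XOR scheme are assumptions for illustration. The abstract does not specify how the address is swizzled.

```python
COL_BITS = 10  # hypothetical column-address bits
BANK_BITS = 3  # hypothetical: 8 banks per die


def decode(addr):
    """Split an address into (row, bank, column) under the assumed layout."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return row, bank, col


def swizzled_location(addr, die):
    """Physical (row, bank, column) on a given die for a shared address."""
    row, bank, col = decode(addr)
    # XOR the bank bits with the die index so each die uses a different bank
    # (distinct for dies numbered below 2**BANK_BITS).
    return row, bank ^ (die & ((1 << BANK_BITS) - 1)), col
```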

Type: Grant

Filed: October 26, 2017

Date of Patent: May 28, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: John Wuu, Michael K. Ciraula, Russell Schreiber, Samuel Naffziger

Bufferless communication for redundant multithreading using register permutation

Patent number: 10303472

Abstract: Systems, apparatuses, and methods for implementing bufferless communication for redundant multithreading applications using register permutation are disclosed. In one embodiment, a system includes a parallel processing unit, a register file, and a scheduler. The scheduler is configured to cause a plurality of threads to execute in lockstep on the parallel processing unit. The plurality of threads includes a first thread and a second thread executing on adjacent first and second lanes, respectively, of the parallel processing unit. The second thread is configured to perform a register permute operation from a first register location to a second register location in a first instruction cycle, with the second register location associated with the second lane. The second thread is configured to read from the second register location in a second instruction cycle, wherein the first and second instruction cycles are successive instruction cycles.
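A toy model of lane pairing shows how a permute can replace a communication buffer. The sketch assumes that lanes are grouped into (even, odd) pairs running the same computation. Each odd lane obtains its neighbor's result through a permute instead of through memory, then compares it with its own. The pairing scheme and the function names are hypothetical.

```python
def permute_from_left_neighbor(registers):
    """Model a cross-lane permute: each odd lane receives lane (i - 1)'s value."""
    return [registers[i - 1] if i % 2 else registers[i] for i in range(len(registers))]


def find_mismatches(results):
    """Return odd lanes whose result differs from their redundant even partner."""
    permuted = permute_from_left_neighbor(results)  # first instruction cycle
    # Second cycle: the trailing lane reads the permuted register and compares.
    return [i for i in range(1, len(results), 2) if permuted[i] != results[i]]


print(find_mismatches([3, 3, 5, 6]))  # prints [3]
```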

Type: Grant

Filed: November 22, 2016

Date of Patent: May 28, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Daniel I. Lowell, Manish Gupta

Performance and power optimization via block oriented performance measurement and control

Patent number: RE47420

Abstract: An integrated circuit includes a plurality of functional blocks. Utilization information is generated for the various functional blocks. Based on that information, the power consumption, and thus the performance level, of each functional block can be tuned. When a functional block is heavily loaded by an application, the performance level, and thus the power consumption, of that functional block is increased. At the same time, other functional blocks that are not heavily utilized, and therefore have lower performance requirements, can be kept at a relatively low power consumption level. As a result, overall power consumption can be reduced without unduly impacting performance.
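The control policy amounts to mapping each block's utilization to a performance level. The thresholds, level names, and block names below are hypothetical. The abstract does not specify how utilization is measured or quantized.

```python
from __future__ import annotations


def pick_level(utilization: float, low_water: float = 0.3, high_water: float = 0.7) -> str:
    """Map a utilization fraction in [0, 1] to a performance/power level."""
    if utilization >= high_water:
        return "high"  # heavily loaded: raise performance (and power)
    if utilization <= low_water:
        return "low"  # lightly used: keep power low
    return "medium"


def tune(blocks: dict[str, float]) -> dict[str, str]:
    """Choose a level independently for each functional block."""
    return {name: pick_level(u) for name, u in blocks.items()}


print(tune({"fpu": 0.9, "dma": 0.1, "lsu": 0.5}))
# prints {'fpu': 'high', 'dma': 'low', 'lsu': 'medium'}
```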

Type: Grant

Filed: July 22, 2016

Date of Patent: June 4, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Morrie Altmejd, Evandro Menezes, Dave Tobias
