Patents Assigned to Advanced Micro Devices
-
Patent number: 10318344Abstract: Systems, apparatuses, and methods for predicting page migration granularities for phases of an application executing on a non-uniform memory access (NUMA) system architecture are disclosed herein. A system with a plurality of processing units and memory devices executes a software application. The system identifies a plurality of phases of the application based on one or more characteristics (e.g., memory access pattern) of the application. The system predicts which page migration granularity will maximize performance for each phase of the application. The system performs a page migration at a first page migration granularity during a first phase of the application based on a first prediction. The system performs a page migration at a second page migration granularity during a second phase of the application based on a second prediction, wherein the second page migration granularity is different from the first page migration granularity.Type: GrantFiled: July 13, 2017Date of Patent: June 11, 2019Assignee: Advanced Micro Devices, Inc.Inventor: Anthony Thomas Gutierrez
-
Patent number: 10320695Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.Type: GrantFiled: May 26, 2016Date of Patent: June 11, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
-
Patent number: 10318153Abstract: A processor modifies memory management mode for a range of memory locations of a multilevel memory hierarchy based on changes in an application phase of an application executing at a processor. The processor monitors the application phase (e.g., computation-bound phase, input/output phase, or memory access phase) of the executing application and in response to a change in phase consults a management policy to identify a memory management mode. The processor automatically reconfigures a memory controller and other modules so that a range of memory locations of the multilevel memory hierarchy are managed according to the identified memory management mode. By changing the memory management mode for the range of memory locations according to the application phase, the processor improves processing efficiency and flexibility.Type: GrantFiled: December 19, 2014Date of Patent: June 11, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Sergey Blagodurov, Mitesh Ramesh Meswani, Gabriel H. Loh, Mauricio Breternitz, Jr., Mark Richard Nutter, John Robert Slice, David Andrew Roberts, Michael Ignatowski, Mark Henry Oskin
-
Publication number: 20190171452Abstract: A system and method for load fusion fuses small load operations into fewer, larger load operations. The system detects that a pair of adjacent operations are consecutive load operations, where the adjacent micro-operations refers to micro-operations flowing through adjacent dispatch slots and the consecutive load micro-operations refers to both of the adjacent micro-operations being load micro-operations. The consecutive load operations are then reviewed to determine if the data sizes are the same and if the load operation addresses are consecutive. The two load operations are then fused together to form one load micro-operation with twice the data size and one load data micro-operation with no load component.Type: ApplicationFiled: December 1, 2017Publication date: June 6, 2019Applicant: Advanced Micro Devices, Inc.Inventor: John M. King
-
Patent number: 10311626Abstract: A GPU filters graphics workloads to identify candidates for profiling. In response to receiving a graphics workload for the first time, the GPU determines if the graphics workload would require the GPU shaders to use fewer resources than would be spent profiling and determining a resource allocation for subsequent receipts of the same or a similar graphics workload. The GPU can further determine if the shaders are processing more than one graphics workload at the same time, such that the performance characteristics of each individual graphics workload cannot be effectively isolated. The GPU then profiles and stores resource allocations for a plurality of shaders for processing the filtered graphics workloads, and applies those stored resource allocations when the same or a similar graphics workload is received subsequently by the GPU.Type: GrantFiled: October 19, 2016Date of Patent: June 4, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Rashad Oreifej, Angel E. Socarras, Mark Russell Anderson, Randy Wayne Ramsey
-
Patent number: 10310981Abstract: A method and apparatus for performing memory prefetching includes determining whether to initiate prefetching. Upon a determination to initiate prefetching, a first memory row is determined as a suitable prefetch candidate, and it is determined whether a particular set of one or more cachelines of the first memory row is to be prefetched.Type: GrantFiled: September 19, 2016Date of Patent: June 4, 2019Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Yasuko Eckert, Nuwan Jayasena, Reena Panda, Onur Kayiran, Michael W. Boyer
-
Patent number: 10310266Abstract: Described is a method and system to efficiently compress and stream texture-space rendered content that enables low latency wireless virtual reality applications. In particular, camera motion, object motion/deformation, and shading information are decoupled and each type of information is then compressed as needed and streamed separately, while taking into account its tolerance to delays.Type: GrantFiled: February 10, 2016Date of Patent: June 4, 2019Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Khaled Mammou, Layla A. Mah
-
Patent number: 10311191Abstract: A system and method for floorplanning a memory. A computing system includes a processing unit which generates memory access requests and a memory. The size of each memory line in the memory includes M bits. A memory macro block includes at least a primary array and a sidecar array. The primary array stores a first portion of a memory line and the sidecar array stores a second smaller portion of the memory line being accessed. The primary array and the sidecar array have different heights. The height of the sidecar array is based on a notch height in at least one corner of the memory macro block. The notch creates on-die space for s reserved area on the die. The notches result in cross-shaped, T-shaped, and/or L-shaped memory macro blocks.Type: GrantFiled: January 26, 2017Date of Patent: June 4, 2019Assignee: Advanced Micro Devices, Inc.Inventors: John J. Wuu, Patrick J. Shyvers, Ryan Alan Selby
-
Patent number: 10310015Abstract: An integrated circuit device includes a plurality of flip flops configured into a scan chain. The plurality of flip flops includes at least flip flop of a first type and at least one flip flop of a second type. A method includes generating a first scan clock signal for loading scan data into at least one flip flop of a first type, generating a second scan clock signal and a third scan clock signal for loading the scan data into at least one flip flop of a second type, and loading a test pattern into a scan chain defined by the at least flip flop of the first type and the at least one flip flop of the second type responsive to the first, second, and third scan clock signals.Type: GrantFiled: July 19, 2013Date of Patent: June 4, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Thomas A. Clouqueur, Dwight K. Elvey, Kamran Zarrineh
-
Patent number: 10312221Abstract: Various semiconductor chip devices with stacked chips are disclosed. In one aspect, a semiconductor chip device includes a stack of plural semiconductor chips. Each two adjacent semiconductor chips of the plural semiconductor chips is electrically connected by plural interconnects and physically connected by a first insulating bonding layer. A first stack of dummy chips is positioned opposite a first side of the stack of semiconductor chips and separated from the plural semiconductor chips by a first gap. Each two adjacent of the first dummy chips are physically connected by a second insulating bonding layer. A second stack of dummy chips is positioned opposite a second side of the stack of semiconductor chips and separated from the plural semiconductor chips by a second gap. Each two adjacent of the second dummy chips are physically connected by a third insulating bonding layer.Type: GrantFiled: December 17, 2017Date of Patent: June 4, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Rahul Agarwal, Kaushik Mysore Srinivasa Setty, Milind S. Bhagavat, Brett P. Wilkerson
-
Patent number: 10310997Abstract: A processing system employs a memory module as a temporary write buffer to store write requests when a write buffer at a memory controller reaches a threshold capacity, and de-allocates the temporary write buffer when the write buffer capacity falls below the threshold. Upon receiving a write request, the memory controller stores the write request in a write buffer until the write request can be written to main memory. The memory controller can temporarily extend the memory controller's write buffer to the memory module, thereby accommodating temporary periods of high memory activity without requiring a large permanent write buffer at the memory controller.Type: GrantFiled: September 22, 2016Date of Patent: June 4, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Amin Farmahini Farahani, David A. Roberts, Nuwan Jayasena
-
Patent number: 10311236Abstract: Systems, apparatuses, and methods for performing secure system memory training are disclosed. In one embodiment, a system includes a boot media, a security processor with a first memory, a system memory, and one or more main processors coupled to the system memory. The security processor is configured to retrieve first data from the boot media and store and authenticate the first data in the first memory. The first data includes a first set of instructions which are executable to retrieve, from the boot media, a configuration block with system memory training parameters. The security processor also executes a second set of instructions to initialize and train the system memory using the training parameters. After training the system memory, the security processor retrieves, authenticates, and stores boot code in the system memory and releases the one or more main processors from reset to execute the boot code.Type: GrantFiled: November 22, 2016Date of Patent: June 4, 2019Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Kathirkamanathan Nadarajah, Oswin Housty, Sergey Blotsky, Tan Peng, Hary Devapriyan Mahesan
-
Publication number: 20190163475Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent operations are consecutive store operations, where the adjacent micro-operations refers to micro-operations flowing through adjacent dispatch slots and the consecutive store micro-operations refers to both of the adjacent micro-operations being store micro-operations. The consecutive store operations are then reviewed to determine if the data sizes are the same and if the store operation addresses are consecutive. The two store operations are then fused together to form one store operation with twice the data size and one store data HI operation.Type: ApplicationFiled: November 27, 2017Publication date: May 30, 2019Applicant: Advanced Micro Devices, Inc.Inventor: John M. King
-
Publication number: 20190164251Abstract: A system and method for controlling characteristics of collected image data are disclosed. The system and method include performing pre-processing of an image using GPUs, configuring an optic based on the pre-processing, the configuring being designed to account for features of the pre-processed image, acquiring an image using the configured optic, processing the acquired image using GPUs, and determining if the processed acquired image accounts for feature of the pre-processed image, and the determination is affirmative, outputting the image, wherein if the determination is negative repeating the configuring of the optic and re-acquiring the image.Type: ApplicationFiled: December 5, 2017Publication date: May 30, 2019Applicant: Advanced Micro Devices, Inc.Inventors: Allen H. Rush, Hui Zhou
-
Publication number: 20190163471Abstract: A system and method for a virtual load queue is described. Load micro-operations are processed through an instruction pipeline without requiring an entry in a load queue (LDQ). An address generation scheduler queue (AGSQ) entry is allocated to the load micro-operation and a LDQ entry is not allocated to the load micro-operation. The LDQ entries are reserved for the N oldest load micro-operations, where N is the depth of the LDQ. Deallocation of the AGSQ entry is done if the load micro-operation is one of the N oldest load micro-operations, or upon successful completion of the load micro-operation. Deallocation of the AGSQ entry is not done if the load micro-operation gets a bad status and is not one of the N oldest micro-operations. Consequently, the AGSQ acts as a virtual queue for the LDQ and mitigates the limiting effect of the LDQ depth.Type: ApplicationFiled: November 28, 2017Publication date: May 30, 2019Applicant: Advanced Micro Devices, Inc.Inventor: John M. King
-
Patent number: 10303200Abstract: A method for implementing clock dividers includes providing, in response to detecting a voltage drop at a processor core, an input clock signal to a transmission gate multiplexer for selecting between one of two stretch-enable signals. In some embodiments, selecting between the one of two stretch-enable signals includes inputting a set of core clock enable signals into a clock divider circuit, and modifying the set of core clock enable signals to generate the stretch-enable signals. An output clock signal is generated based on the selected stretch-enable signal.Type: GrantFiled: February 24, 2017Date of Patent: May 28, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Deepesh John, Steven Kommrusch, Vibhor Mittal
-
Patent number: 10304506Abstract: Systems, apparatuses, and methods for implementing dynamic clock control to increase stutter efficiency in a memory subsystem are disclosed. A system includes at least a processor, a memory, and a communication fabric coupled to the processor and memory. The system implements a stutter mode for a first region of the fabric, with stutter mode including an idle state and an active state. Stutter efficiency is defined as the idle time divided by the sum of the active time and the idle time. Reducing the exit latency of going from the idle state to the active state increases the stutter efficiency which increases the power savings achieved by implementing the stutter mode. Since the phase-locked loop (PLL) is one of the main contributors to the exit latency, the PLL is powered down and one or more bypass clocks are provided during the stutter mode.Type: GrantFiled: November 10, 2017Date of Patent: May 28, 2019Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Alexander J. Branover, Benjamin Tsien, Bradley Kent, Joyce C. Wong
-
Patent number: 10303398Abstract: A processing system includes a compute die and a stacked memory stacked with the compute die. The stacked memory includes a first memory die and a second memory die stacked on top of the first memory die. A parallel access using a single memory address is directed towards different memory banks of the first memory die and the second memory die. The single memory address of the parallel access is swizzled to access the first memory die and the second memory die at different physical locations.Type: GrantFiled: October 26, 2017Date of Patent: May 28, 2019Assignee: Advanced Micro Devices, Inc.Inventors: John Wuu, Michael K. Ciraula, Russell Schreiber, Samuel Naffziger
-
Patent number: 10303472Abstract: Systems, apparatuses, and methods for implementing bufferless communication for redundant multithreading applications using register permutation are disclosed. In one embodiment, a system includes a parallel processing unit, a register file, and a scheduler. The scheduler is configured to cause execution of a plurality of threads to be performed in lockstep on the parallel processing unit. The plurality of threads include a first thread and a second thread executing on adjacent first and second lanes, respectively, of the parallel processing unit. The second thread is configured to perform a register permute operation from a first register location to a second register location in a first instruction cycle, with the second register location associated with the second processing lane. The second thread is configured to read from the second register location in a second instruction cycle, wherein the first and second instruction cycles are successive instruction cycles.Type: GrantFiled: November 22, 2016Date of Patent: May 28, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Daniel I. Lowell, Manish Gupta
-
Patent number: RE47420Abstract: An integrated circuit includes a plurality of functional blocks. Utilization information for the various functional blocks is generated. Based on that information, the power consumption and thus the performance levels of the functional blocks can be tuned. Thus, when a functional block is heavily loaded by an application, the performance level and thus power consumption of that particular functional block is increased. At the same time, other functional blocks that are not being heavily utilized and thus have lower performance requirements can be kept at a relatively low power consumption level. Thus, power consumption can be reduced overall without unduly impacting performance.Type: GrantFiled: July 22, 2016Date of Patent: June 4, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Morrie Altmejd, Evandro Menezes, Dave Tobias