Patents Assigned to Advanced Micro Devices

INSTRUCTION CACHE PREFETCH THROTTLE

Publication number: 20210173783

Abstract: Techniques for controlling prefetching of instructions into an instruction cache are provided. The techniques include tracking either or both of branch target buffer misses and instruction cache misses, modifying a throttle toggle based on the tracking, and adjusting prefetch activity based on the throttle toggle.

Type: Application

Filed: December 10, 2019

Publication date: June 10, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Aparna Thyagarajan, Ashok Tirupathy Venkatachar, Marius Evers, Angelo Wong, William E. Jones
Dedicated interface for coupling flash memory and dynamic random access memory

Patent number: 11029852

Abstract: The present application describes embodiments of an interface for coupling flash memory and dynamic random access memory (DRAM) in a processing system. Some embodiments include a dedicated interface between a flash memory and DRAM. The dedicated interface is to provide access to the flash memory in response to instructions received over a DRAM interface between the DRAM and a processing device. Some embodiments of a method include accessing a flash memory via a dedicated interface between the flash memory and a dynamic random access memory (DRAM) in response to an instruction received over a DRAM interface between the DRAM and a processing device.

Type: Grant

Filed: December 14, 2016

Date of Patent: June 8, 2021

Assignee: Advanced Micro Devices, Inc.

Inventor: James Bauman
Method and apparatus for power reduction for data movement

Patent number: 11030135

Abstract: A method of and device for transferring data is provided. The method includes determining a difference between a data segment that was transferred last relative to each of one or more data segments available to be transferred next. In some embodiments, for so long as no data segment available to be sent has been waiting too long, the data segment chosen to be sent next is the data segment having the smallest difference relative to the data segment transferred last. The chosen data segment is then transmitted as the next data segment transferred.

Type: Grant

Filed: December 7, 2018

Date of Patent: June 8, 2021

Assignee: Advanced Micro Devices, Inc.

Inventor: Greg Sadowski
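
The selection rule in the abstract of patent 11030135 above can be illustrated with a minimal sketch. The abstract does not say how the "difference" between segments is measured or how "waiting too long" is decided, so the sketch assumes a bit-level Hamming distance and a fixed wait threshold. The names `Segment`, `bit_difference`, `choose_next`, and `max_wait` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    data: int    # payload bits, modelled as an integer (assumption)
    waited: int  # number of transfers this segment has waited through


def bit_difference(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ (Hamming distance)."""
    return bin(a ^ b).count("1")


def choose_next(last_sent: int, pending: List[Segment], max_wait: int) -> Segment:
    """Pick the next segment to transfer from a non-empty pending list."""
    # Starvation guard (assumed policy): an overdue segment goes first.
    overdue = [s for s in pending if s.waited >= max_wait]
    if overdue:
        return max(overdue, key=lambda s: s.waited)
    # Otherwise send the segment that differs least from the last one sent.
    return min(pending, key=lambda s: bit_difference(last_sent, s.data))


pending = [
    Segment(data=0b11110000, waited=0),
    Segment(data=0b10101011, waited=1),
    Segment(data=0b01010101, waited=2),
]
chosen = choose_next(0b10101010, pending, max_wait=4)
overdue_choice = choose_next(0b10101010, pending, max_wait=2)
```

With `max_wait=4` no segment is overdue, so the segment that differs from the last transfer by a single bit is chosen. With `max_wait=2` the third segment is overdue and is sent first, even though all eight of its bits differ.
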
Protecting host memory from access by untrusted accelerators

Patent number: 11030117

Abstract: A host processor receives an address translation request from an accelerator, which may be trusted or un-trusted. The address translation request includes a virtual address in a virtual address space that is shared by the host processor and the accelerator. The host processor encrypts a physical address in a host memory indicated by the virtual address in response to the accelerator being permitted to access the physical address. The host processor then provides the encrypted physical address to the accelerator. The accelerator provides memory access requests including the encrypted physical address to the host processor, which decrypts the physical address and selectively accesses a location in the host memory indicated by the decrypted physical address depending upon whether the accelerator is permitted to access the location indicated by the decrypted physical address.

Type: Grant

Filed: July 14, 2017

Date of Patent: June 8, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Nuwan Jayasena, Brandon K. Potter, Andrew G. Kegel
Centroid selection for variable rate shading

Patent number: 11030791

Abstract: A technique for determining the centroid for fragments generated using variable rate shading. Because the barycentric interpolation used to determine texture coordinates for pixels is based on the premise that the point being interpolated is within the triangle, centroids that are outside of the triangle can produce undesirable visual artifacts. Another concern, however, is that the further the centroid is from the center of a pixel, the less accurate quad-based pixel derivatives become for attributes of that pixel. To address these concerns, the position of the sample that is both covered by the triangle and the closest to the center of the pixel, out of all covered samples of the pixel, is used as the centroid for a partially covered pixel. For a fully covered pixel (all samples in a pixel are covered by a triangle), the center of that pixel is used as the centroid.

Type: Grant

Filed: December 19, 2018

Date of Patent: June 8, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Skyler Jonathon Saleh, Pazhani Pillai
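
The centroid rule in the abstract of patent 11030791 above can be sketched as follows. This is only an illustration under stated assumptions: pixel-local coordinates in the unit square, squared Euclidean distance as the closeness measure, and made-up sample positions. The function name `pick_centroid` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pick_centroid(center: Point, samples: List[Point], covered: List[bool]) -> Point:
    """Return the centroid for a pixel that has at least one covered sample."""
    # Fully covered pixel: the pixel center is used as the centroid.
    if all(covered):
        return center
    # Partially covered pixel: use the covered sample closest to the center.
    covered_samples = [s for s, c in zip(samples, covered) if c]
    return min(
        covered_samples,
        key=lambda s: (s[0] - center[0]) ** 2 + (s[1] - center[1]) ** 2,
    )


center = (0.5, 0.5)
# Illustrative sample positions, not a real hardware sample pattern.
samples = [(0.1, 0.1), (0.6, 0.4), (0.3, 0.8), (0.9, 0.9)]

partial = pick_centroid(center, samples, [True, False, True, False])
full = pick_centroid(center, samples, [True, True, True, True])
```

In the partially covered case only the first and third samples are candidates, and the third one, `(0.3, 0.8)`, is nearer to the center. Because the chosen point is always a covered sample, it lies inside the triangle, which is the property the abstract relies on for barycentric interpolation.
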
Virtual space memory bandwidth reduction

Patent number: 11030095

Abstract: A processing system includes a central processing unit (CPU) and a graphics processing unit (GPU) that has a plurality of compute units. The GPU receives an image from the CPU and determines a total result area in a virtual-matrix-multiplication space of a virtual matrix-multiplication output matrix based on convolutional parameters associated with the image in an image space. The GPU partitions the total result area of the virtual matrix-multiplication output matrix into a plurality of virtual segments. The GPU allocates convolution operations to the plurality of compute units based on each virtual segment of the plurality of virtual segments.

Type: Grant

Filed: December 10, 2018

Date of Patent: June 8, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Swapnil Sakharshete, Samuel Lawrence Wasmundt
Systems and methods for selectively bypassing address-generation hardware in processor instruction pipelines

Patent number: 11023241

Abstract: Systems and methods selectively bypass address-generation hardware in processor instruction pipelines. In an embodiment, a processor includes an address-generation stage and an address-generation-bypass-determination unit (ABDU). The ABDU receives a load/store instruction. If an effective address for the load/store instruction is not known at the ABDU, the ABDU routes the load/store instruction via the address-generation stage of the processor. If, however, the effective address of the load/store instruction is known at the ABDU, the ABDU routes the load/store instruction to bypass the address-generation stage of the processor.

Type: Grant

Filed: August 21, 2018

Date of Patent: June 1, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Andrej Kocev, Jay Fleischman, Kai Troester, Johnny C. Chu, Tim J. Wilkens, Neil Marketkar, Michael W. Long
Instructions for performing multi-line memory accesses

Patent number: 11023410

Abstract: A system is described that performs memory access operations. The system includes a processor in a first node, a memory in a second node, a communication interconnect coupled to the processor and the memory, and an interconnect controller in the first node coupled between the processor and the communication interconnect. Upon executing a multi-line memory access instruction, the processor prepares a memory access operation for accessing, in the memory, a block of data including at least some of each of at least two lines of data. The processor then causes the interconnect controller to use a single remote direct memory access memory transfer to perform the memory access operation for the block of data via the communication interconnect.

Type: Grant

Filed: September 11, 2018

Date of Patent: June 1, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: David A. Roberts, Shenghsun Cho
Method and apparatus for asynchronous scheduling

Patent number: 11023242

Abstract: A method and apparatus of asynchronous scheduling in a graphics device includes sending one or more instructions from an instruction scheduler to one or more instruction first-in/first-out (FIFO) devices. An instruction in the one or more FIFO devices is selected for execution by a single-instruction/multiple-data (SIMD) pipeline unit. It is determined whether all operands for the selected instruction are available for execution of the instruction, and if all the operands are available, the selected instruction is executed on the SIMD pipeline unit. The self-timed arithmetic pipeline unit (the SIMD pipeline unit) is effectively encapsulated in a synchronous (e.g., clocked by a global clock) scheduler and register file environment.

Type: Grant

Filed: January 27, 2017

Date of Patent: June 1, 2021

Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.

Inventors: John Kalamatianos, Greg Sadowski, Syed Zohaib M. Gilani
Methods and apparatus for decoding video using re-ordered motion vector buffer

Patent number: 11025934

Abstract: A host processor, such as a central processing unit (CPU), is programmed to execute a software driver that causes the host processor to generate a motion compensation command for a plurality of cores of a massively parallel processor, such as a graphics processing unit (GPU), to provide motion compensation for encoded video. The motion compensation command for the plurality of cores of the massively parallel processor contains executable instructions for processing a plurality of motion vectors grouped by a plurality of prediction modes from a re-ordered motion vector buffer by the plurality of cores of the massively parallel processor.

Type: Grant

Filed: December 16, 2014

Date of Patent: June 1, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael L. Schmit, Ashish Farmer, Radhakrishna Giduthuri
PATTERN-BASED CACHE BLOCK COMPRESSION

Publication number: 20210157485

Abstract: Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is identified based on metadata of the cache block. The cache block pattern is applied to a byte dictionary of the cache block. An uncompressed cache block is output based on the cache block pattern and the byte dictionary. A subset of cache block patterns is determined from a training cache trace based on a set of compressed sizes and a target number of patterns for each size.

Type: Application

Filed: September 23, 2020

Publication date: May 27, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Matthew Tomei, Shomit N. Das, David A. Wood
REGISTER WRITE SUPPRESSION

Publication number: 20210157598

Abstract: Techniques are provided for allocating registers for a processor. The techniques include identifying a first instruction of an instruction dispatch set that meets all register allocation suppression criteria of a first set of register allocation suppression criteria, suppressing register allocation for the first instruction, identifying a second instruction of the instruction dispatch set that does not meet all register allocation suppression criteria of a second set of register allocation suppression criteria, and allocating a register for the second instruction.

Type: Application

Filed: November 26, 2019

Publication date: May 27, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Neil N. Marketkar, Arun A. Nair
MECHANISM FOR SUPPORTING DISCARD FUNCTIONALITY IN A RAY TRACING CONTEXT

Publication number: 20210158601

Abstract: Described herein is a technique for performing ray tracing. According to this technique, instead of executing intersection and/or any hit shaders during traversal of an acceleration structure to determine the closest hit for a ray, an acceleration structure is fully traversed in an invocation of a shader program, and the closest intersection with a triangle is recorded in a data structure associated with the material of the triangle. Later, a scheduler launches waves by grouping together multiple data items associated with the same material. The rays processed by that wave are processed with a continuation ray, rather than the full original ray. A continuation ray starts from the previous point of intersection and extends in the direction of the original ray. These steps help counter divergence that would occur if a single shader program that inlined the intersection and any hit shaders were executed.

Type: Application

Filed: February 3, 2021

Publication date: May 27, 2021

Applicant: Advanced Micro Devices, Inc.

Inventor: Skyler Jonathon Saleh
TECHNIQUES FOR PERFORMING STORE-TO-LOAD FORWARDING

Publication number: 20210157590

Abstract: A technique for performing store-to-load forwarding is provided. The technique includes determining a virtual address for data to be loaded for the load instruction, identifying a matching store instruction from one or more store instruction memories by comparing a virtual-address-based comparison value for the load instruction to one or more virtual-address-based comparison values of one or more store instructions, determining a physical address for the load instruction, and validating the load instruction based on a comparison between the physical address of the load instruction and a physical address of the matching store instruction.

Type: Application

Filed: November 27, 2019

Publication date: May 27, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: John M. King, Matthew T. Sobel
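
The two-phase check in the abstract of publication 20210157590 above (match on a virtual-address-based value, then validate with physical addresses) can be sketched in software. The abstract does not define the comparison value, the search order, or what happens after a failed validation, so the sketch assumes the low 12 address bits as the comparison value, a youngest-first search, and a simple "no forwarding" result on failure. All names (`StoreEntry`, `compare_value`, `forward_load`, `translate`, `page_table`) are hypothetical, and the steps run sequentially here rather than as hardware would perform them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StoreEntry:
    vaddr: int  # virtual address written by the store
    paddr: int  # physical address of the store
    data: int   # value the store will write


def compare_value(vaddr: int) -> int:
    """Assumed virtual-address-based comparison value: the low 12 bits."""
    return vaddr & 0xFFF


def forward_load(
    load_vaddr: int,
    store_queue: List[StoreEntry],
    translate: Callable[[int], int],
) -> Optional[int]:
    """Return forwarded store data, or None if nothing can be forwarded."""
    key = compare_value(load_vaddr)
    # Phase 1: find a candidate store using only virtual-address bits.
    # Searching youngest-first is an assumption of this sketch.
    match = next(
        (s for s in reversed(store_queue) if compare_value(s.vaddr) == key),
        None,
    )
    if match is None:
        return None
    # Phase 2: validate the candidate against the load's physical address.
    if translate(load_vaddr) != match.paddr:
        return None  # false match on the partial comparison value
    return match.data


page_table = {0x1: 0x80, 0x2: 0x90}  # virtual page -> physical page (made up)


def translate(vaddr: int) -> int:
    return (page_table[vaddr >> 12] << 12) | (vaddr & 0xFFF)


stores = [StoreEntry(vaddr=0x1010, paddr=0x80010, data=0xAA)]
hit = forward_load(0x1010, stores, translate)
alias = forward_load(0x2010, stores, translate)
miss = forward_load(0x1020, stores, translate)
```

The load from `0x2010` shares its low 12 bits with the store, so phase 1 finds a candidate, but its physical address differs and phase 2 rejects it. That case shows why a physical-address comparison is still needed after the cheaper virtual-address match.
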
COMPILER OPERATIONS FOR HETEROGENEOUS CODE OBJECTS

Publication number: 20210157559

Abstract: Described herein are techniques for performing compilation operations for heterogeneous code objects. According to the techniques, a compiler identifies architectures targeted by a compilation unit, compiles the compilation unit into a heterogeneous code object that includes a different code object portion for each identified architecture, performs name mangling on functions of the compilation unit, links the heterogeneous code object with a second code object to form an executable, and generates relocation records for the executable.

Type: Application

Filed: November 22, 2019

Publication date: May 27, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Steven Tony Tye, Brian Laird Sumner, Konstantin Zhuravlyov
LOADER AND RUNTIME OPERATIONS FOR HETEROGENEOUS CODE OBJECTS

Publication number: 20210157611

Abstract: Described herein are techniques for executing a heterogeneous code object executable. According to the techniques, a loader identifies a first memory appropriate for loading a first architecture-specific portion of the heterogeneous code object executable, wherein the first architecture-specific portion includes instructions for a first architecture, identifies a second memory appropriate for loading a second architecture-specific portion of the heterogeneous code object executable, wherein the second architecture-specific portion includes instructions for a second architecture that is different from the first architecture, loads the first architecture-specific portion into the first memory and the second architecture-specific portion into the second memory, and performs relocations on the first architecture-specific portion and on the second architecture-specific portion.

Type: Application

Filed: November 22, 2019

Publication date: May 27, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Steven Tony Tye, Brian Laird Sumner, Konstantin Zhuravlyov
ARTIFICIAL NEURAL NETWORK EMULATION OF HOTSPOTS

Publication number: 20210158222

Abstract: Methods, devices, and systems for emulating a compute kernel with an ANN. The compute kernel is executed on a processor, and it is determined whether the compute kernel is a hotspot kernel. If the compute kernel is a hotspot kernel, the compute kernel is emulated with an ANN, and the ANN is substituted for the compute kernel.

Type: Application

Filed: November 25, 2019

Publication date: May 27, 2021

Applicant: Advanced Micro Devices, Inc.

Inventor: Nicholas Malaya
Multi-chip package with offset 3D structure

Patent number: 11018125

Abstract: Various semiconductor chip devices and methods of manufacturing the same are disclosed. In one aspect, a semiconductor chip device is provided that has a reconstituted semiconductor chip package that includes an interposer that has a first side and a second and opposite side and a metallization stack on the first side, a first semiconductor chip on the metallization stack and at least partially encased by a dielectric layer on the metallization stack, and plural semiconductor chips positioned over and at least partially laterally overlapping the first semiconductor chip.

Type: Grant

Filed: July 13, 2020

Date of Patent: May 25, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Milind S. Bhagavat, Rahul Agarwal, Gabriel H. Loh
Control of performance levels of different types of processors via a user interface

Patent number: 11016555

Abstract: An apparatus and a method for controlling power consumption associated with a computing device having first and second processors configured to perform different types of operations includes providing a user interface that allows, during normal operation of the computing device, at least one of: (i) a user selection of desired performance levels of the first and second processors relative to one another, such that higher desired performance levels of one processor correspond to lower desired performance levels of the other processor, and (ii) a user selection of a desired performance level of the first processor and a user selection of a desired performance level of the second processor, the two user selections being made independently of one another. The apparatus and method control, during normal operation of the computing device, performance levels of the processors in response to the one or more user selections of the desired performance levels.

Type: Grant

Filed: August 1, 2018

Date of Patent: May 25, 2021

Assignee: Advanced Micro Devices, Inc.

Inventor: I-Ming Lin
Implementing a micro-operation cache with compaction

Patent number: 11016763

Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.

Type: Grant

Filed: March 8, 2019

Date of Patent: May 25, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Jagadish B. Kotra, John Kalamatianos
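
The compaction behaviour in the abstract of patent 11016763 above can be modelled with a small sketch. The abstract gives no sizes or policies, so the capacity constants, the first-fit choice of way, the modulo set index, and the oldest-group-first eviction from way 0 are all placeholder assumptions. The class `UopCache` and its methods are hypothetical.

```python
from typing import List, Tuple

WAY_CAPACITY = 8  # micro-ops that fit in one cache line (assumed)
NUM_WAYS = 2      # ways per set (assumed)
NUM_SETS = 4      # sets in the cache (assumed)

Group = Tuple[int, int]  # (address tag, number of micro-ops)


class UopCache:
    def __init__(self) -> None:
        # sets[s][w] lists the groups stored in way w of set s; the list
        # itself stands in for the per-way metadata.
        self.sets: List[List[List[Group]]] = [
            [[] for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)
        ]

    def insert(self, addr: int, uop_count: int) -> int:
        """Store a decoded group and return the index of the way used."""
        if uop_count > WAY_CAPACITY:
            raise ValueError("group does not fit in a single way")
        ways = self.sets[addr % NUM_SETS]
        # Compaction: use a way that can hold the group with no eviction.
        for index, groups in enumerate(ways):
            if sum(n for _, n in groups) + uop_count <= WAY_CAPACITY:
                groups.append((addr, uop_count))
                return index
        # No room anywhere: evict whole groups, oldest first, from way 0
        # (placeholder policy) until the new group fits.
        victim = ways[0]
        while sum(n for _, n in victim) + uop_count > WAY_CAPACITY:
            victim.pop(0)
        victim.append((addr, uop_count))
        return 0


cache = UopCache()
first = cache.insert(0, 3)    # set 0, way 0 is empty
second = cache.insert(4, 4)   # shares way 0 with the first group
third = cache.insert(8, 5)    # does not fit in way 0, goes to way 1
fourth = cache.insert(12, 4)  # evicts only the oldest group in way 0
```

After the last insert, way 0 of set 0 holds the groups tagged 4 and 12. Only the group tagged 0 was evicted, which reflects replacement at the granularity of micro-operation groups rather than whole cache lines.
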
