Patents Assigned to Advanced Micro Devices
  • Patent number: 10552339
    Abstract: An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.
    Type: Grant
    Filed: June 12, 2018
    Date of Patent: February 4, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Joseph L. Greathouse
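    A minimal C++ sketch of the cost comparison this abstract describes; the cost model, the per-IPI cycle figure, and names such as estimate_ipi_cost and select_shootdown are illustrative assumptions, not the patented OS logic.

      #include <cstdint>

      enum class ShootdownMechanism { InterProcessorInterrupt, HardwareBroadcast };

      // Hypothetical cost model: an IPI must reach every core that may cache the
      // stale translation, while a hardware broadcast has a roughly fixed cost.
      uint64_t estimate_ipi_cost(unsigned cores_sharing_mapping, uint64_t per_ipi_cycles) {
          return cores_sharing_mapping * per_ipi_cycles;
      }

      // Pick the cheaper mechanism, mirroring the comparison in the abstract
      // (IPI versus hardware broadcast, or IPI versus a threshold cost).
      ShootdownMechanism select_shootdown(unsigned cores_sharing_mapping,
                                          uint64_t per_ipi_cycles,
                                          uint64_t broadcast_cycles) {
          uint64_t ipi_cost = estimate_ipi_cost(cores_sharing_mapping, per_ipi_cycles);
          return (ipi_cost < broadcast_cycles) ? ShootdownMechanism::InterProcessorInterrupt
                                               : ShootdownMechanism::HardwareBroadcast;
      }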
  • Publication number: 20200035017
    Abstract: Improvements to graphics processing pipelines are disclosed. More specifically, the vertex shader stage, which performs vertex transformations, and the hull or geometry shader stages are combined. If tessellation is disabled and geometry shading is enabled, then the graphics processing pipeline includes a combined vertex and geometry shader stage. If tessellation is enabled, then the graphics processing pipeline includes a combined vertex and hull shader stage. If tessellation and geometry shading are both disabled, then the graphics processing pipeline does not use a combined shader stage. The combined shader stages improve efficiency by reducing the number of executing instances of shader programs and associated resources reserved.
    Type: Application
    Filed: October 2, 2019
    Publication date: January 30, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mangesh P. Nijasure, Randy W. Ramsey, Todd Martin
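    The pipeline-configuration choice described in this entry reduces to a small decision table; the sketch below is an illustrative mapping only (the enum and function names are invented), not AMD's driver implementation.

      enum class PipelineLayout {
          Separate,                // neither tessellation nor geometry shading
          CombinedVertexHull,      // tessellation enabled
          CombinedVertexGeometry   // tessellation disabled, geometry shading enabled
      };

      // Map the pipeline states named in the abstract to a combined-stage layout.
      PipelineLayout choose_layout(bool tessellation_enabled, bool geometry_enabled) {
          if (tessellation_enabled) return PipelineLayout::CombinedVertexHull;
          if (geometry_enabled)     return PipelineLayout::CombinedVertexGeometry;
          return PipelineLayout::Separate;
      }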
  • Publication number: 20200034195
    Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.
    Type: Application
    Filed: July 30, 2018
    Publication date: January 30, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael W. LeBeane, Khaled Hamidouche, Bradford M. Beckmann
  • Publication number: 20200034144
    Abstract: Overhead associated with verifying function return addresses to protect against security exploits is reduced by taking advantage of branch prediction mechanisms for predicting return addresses. More specifically, returning from a function includes popping a return address from a data stack. Well-known security exploits overwrite the return address on the data stack to hijack control flow. In some processors, a separate data structure referred to as a control stack is used to verify the data stack. When a return instruction is executed, the processor issues an exception if the return addresses on the control stack and the data stack are not identical. This overhead can be avoided by taking advantage of the return address stack, which is a data structure used by the branch predictor to predict return addresses. In most situations, if this prediction is correct, the above check does not need to occur, thus reducing the associated overhead.
    Type: Application
    Filed: July 26, 2018
    Publication date: January 30, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Marius Evers, David A. Kaplan, Debjit Das Sarma
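    A toy C++ model of the check this abstract describes, under the assumption that a correct return-address-stack prediction lets the explicit control-stack comparison be skipped; the ToyCpu structure and its fields are illustrative, not the patented hardware.

      #include <cstdint>
      #include <stdexcept>
      #include <vector>

      struct ToyCpu {
          std::vector<uint64_t> data_stack;     // architectural stack (can be overwritten)
          std::vector<uint64_t> control_stack;  // protected copy of return addresses
          std::vector<uint64_t> ras;            // branch predictor's return address stack

          void call(uint64_t return_address) {
              data_stack.push_back(return_address);
              control_stack.push_back(return_address);
              ras.push_back(return_address);
          }

          // Assumes a matching call() preceded this return.
          uint64_t ret() {
              uint64_t target    = data_stack.back();    data_stack.pop_back();
              uint64_t predicted = ras.back();           ras.pop_back();
              uint64_t shadow    = control_stack.back(); control_stack.pop_back();
              if (predicted == target) {
                  return target;  // prediction correct: skip the explicit comparison
              }
              if (shadow != target) {
                  throw std::runtime_error("return address mismatch");  // exception path
              }
              return target;
          }
      };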
  • Patent number: 10547273
    Abstract: A temperature sensor has a first transistor with a gate voltage tied to maintain the first transistor in an off state with leakage current flowing through the transistor, the leakage current varying with temperature. A second transistor is coupled to the first transistor and receives a gate voltage to keep the second transistor in an on state. A current mirror mirrors the leakage current and supplies a mirrored current that controls the frequency of an oscillator signal, which varies with the mirrored current. The temperature of the first transistor is determined based on the frequency of the oscillator signal.
    Type: Grant
    Filed: October 27, 2017
    Date of Patent: January 28, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ravinder Reddy Rachala, Stephen V. Kosonocky
  • Patent number: 10545875
    Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: January 28, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
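    A hypothetical sketch of the lookup order the abstract describes: memory requests consult a small tag accelerator cache before the data cache is accessed. The block size of eight tags and the class names are assumptions.

      #include <array>
      #include <cstddef>
      #include <cstdint>
      #include <optional>
      #include <unordered_map>

      constexpr size_t kTagsPerBlock = 8;   // assumed: several tags fetched per access
      using TagBlock = std::array<uint64_t, kTagsPerBlock>;

      struct TagAcceleratorCache {
          std::unordered_map<uint64_t, TagBlock> blocks;  // keyed by tag-block index

          // Consult the TAC first; only on a miss is the data cache accessed.
          std::optional<uint64_t> lookup(uint64_t set_index) {
              auto it = blocks.find(set_index / kTagsPerBlock);
              if (it == blocks.end()) return std::nullopt;          // TAC miss
              return it->second[set_index % kTagsPerBlock];         // TAC hit
          }

          // Cache a whole tag block fetched from the data cache for neighbouring accesses.
          void fill(uint64_t set_index, const TagBlock& fetched_from_data_cache) {
              blocks[set_index / kTagsPerBlock] = fetched_from_data_cache;
          }
      };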
  • Patent number: 10546365
    Abstract: An apparatus, such as a head mounted device (HMD), includes one or more processors configured to implement a graphics pipeline that renders pixels in window space with a nonuniform pixel spacing. The apparatus also includes a first distortion function that maps the non-uniformly spaced pixels in window space to uniformly spaced pixels in raster space. The apparatus further includes a scan converter configured to sample the pixels in window space through the first distortion function. The scan converter is configured to render display pixels used to generate an image for display to a user based on the uniformly spaced pixels in raster space. In some cases, the pixels in the window space are rendered such that a pixel density per subtended area is constant across the user's field of view.
    Type: Grant
    Filed: December 15, 2017
    Date of Patent: January 28, 2020
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael Mantor, Laurent Lefebvre, Mika Tuomi, Kiia Kallio
  • Patent number: 10540316
    Abstract: Systems, apparatuses, and methods for implementing a cancel and replay mechanism for ordered requests are disclosed. A system includes at least an ordering master, a memory controller, a coherent slave coupled to the memory controller, and an interconnect fabric coupled to the ordering master and the coherent slave. The ordering master generates a write request which is forwarded to the coherent slave on the path to memory. The coherent slave sends invalidating probes to all processing nodes and then sends an indication that the write request is globally visible to the ordering master when all cached copies of the data targeted by the write request have been invalidated. In response to receiving the globally visible indication, the ordering master starts a timer. If the timer expires before all older requests have become globally visible, then the write request is cancelled and replayed to ensure forward progress in the fabric and avoid a potential deadlock scenario.
    Type: Grant
    Filed: December 28, 2017
    Date of Patent: January 21, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Eric Christopher Morton, Chen-Ping Yang, Amit P. Apte, Elizabeth M. Cooper
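    A simplified software model of the cancel-and-replay rule, assuming the ordering master can poll whether older requests have become globally visible; the function names and the polling approach are illustrative, not the fabric hardware.

      #include <chrono>
      #include <functional>

      enum class WriteOutcome { Completed, CancelledAndReplayed };

      // Once a write is reported globally visible, a timer starts; if older ordered
      // requests are still not globally visible when it expires, the write is
      // cancelled and replayed to guarantee forward progress.
      WriteOutcome track_ordered_write(std::function<bool()> older_requests_visible,
                                       std::chrono::microseconds timeout) {
          auto deadline = std::chrono::steady_clock::now() + timeout;
          while (std::chrono::steady_clock::now() < deadline) {
              if (older_requests_visible()) return WriteOutcome::Completed;
          }
          return WriteOutcome::CancelledAndReplayed;  // caller re-issues the write
      }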
  • Patent number: 10540290
    Abstract: Methods and apparatus obtain one or more system page table entries that represent virtual system (e.g., memory) page to physical system page translations. A number of the obtained system page table entries that can be encoded in each of a plurality of translation lookaside buffer (TLB) entry encoding formats are determined. The method and apparatus may select one of the TLB entry encoding formats that encode a number of the obtained system page table entries. The method and apparatus may encode a number of obtained system page table entries in the TLB entry encoding format selected into a compressed encoding format TLB entry. The method and apparatus may associate the compressed encoding format TLB entry with an encoding format indication of the encoding format selected. The method and apparatus may decode a compressed encoding format TLB entry based on a determined TLB entry encoding format.
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: January 21, 2020
    Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Gabriel H Loh, Jimshed Mirza
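    An illustrative encoder consistent with the abstract: count how many contiguous page table entries a candidate format could cover, pick the format that covers the most, and store the format indication with the compressed entry. The two formats, the span limit of eight, and the field layout are assumptions.

      #include <algorithm>
      #include <cstddef>
      #include <cstdint>
      #include <utility>
      #include <vector>

      struct CompressedTlbEntry {
          uint8_t  format;              // encoding-format indication stored with the entry
          uint64_t base_virtual_page;
          uint64_t base_physical_page;
          uint8_t  span;                // number of page table entries represented
      };

      // ptes holds (virtual page, physical page) pairs for neighbouring pages; assumed
      // non-empty. Format 1 covers a run contiguous in both spaces; format 0 covers one.
      CompressedTlbEntry encode(const std::vector<std::pair<uint64_t, uint64_t>>& ptes) {
          const size_t max_span = std::min<size_t>(ptes.size(), 8);  // assumed format limit
          size_t run = 1;
          while (run < max_span &&
                 ptes[run].first  == ptes[0].first  + run &&
                 ptes[run].second == ptes[0].second + run) {
              ++run;
          }
          const uint8_t format = (run > 1) ? 1 : 0;  // pick the format covering the most PTEs
          return {format, ptes[0].first, ptes[0].second, static_cast<uint8_t>(run)};
      }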
  • Patent number: 10542268
    Abstract: A system and method for providing video compression that includes encoding, using an encoding engine, a YUV stream wherein Y, U and V color values are encoded in parallel and patching together the Y, U and V color streams to form a compressed YUV output stream. The encoding engine further includes encoding each color value of the YUV stream in parallel using parallel encoding engines and a control engine for controlling operation of all of the encoding engines in parallel. The YUV stream has an average bits per pixel value that varies from a first value to a second value that is double the first value. The encoding engine includes encoding the YUV stream in generally the same amount of time regardless of the average bits per pixel value.
    Type: Grant
    Filed: April 19, 2017
    Date of Patent: January 21, 2020
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Haibin Li, Zhen Chen, Lei Zhang, Ji Zhou, Zhong Cai
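    A conceptual sketch of the parallel-plane idea using host threads; encode_plane is a placeholder for a real per-plane encoder, and nothing here reflects the patented engine's actual design.

      #include <cstdint>
      #include <thread>
      #include <vector>

      // Placeholder: a real encoder would compress the plane here.
      std::vector<uint8_t> encode_plane(const std::vector<uint8_t>& plane) {
          return plane;
      }

      // Encode the Y, U and V planes in parallel, then patch the streams together.
      std::vector<uint8_t> encode_yuv(const std::vector<uint8_t>& y,
                                      const std::vector<uint8_t>& u,
                                      const std::vector<uint8_t>& v) {
          std::vector<uint8_t> y_out, u_out, v_out;
          std::thread ty([&] { y_out = encode_plane(y); });
          std::thread tu([&] { u_out = encode_plane(u); });
          std::thread tv([&] { v_out = encode_plane(v); });
          ty.join(); tu.join(); tv.join();

          std::vector<uint8_t> stream;
          stream.insert(stream.end(), y_out.begin(), y_out.end());
          stream.insert(stream.end(), u_out.begin(), u_out.end());
          stream.insert(stream.end(), v_out.begin(), v_out.end());
          return stream;
      }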
  • Patent number: 10540304
    Abstract: Systems, apparatuses, and methods for reducing the toggle rates on buses are disclosed. A computing system includes a source which provides packets for transmission on a bus. The packet is compressed by a compression engine. The compressed data format of the packet includes locations (bit positions) referred to as holes which do not include valid data. A bus configuration module identifies the locations of the holes and replaces the holes with information from a previous packet transmitted earlier on the bus. The bus configuration module also determines a new transmission bus width for the packet for lowering the bus toggle rate on the bus during transmission.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: January 21, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Greg Sadowski, Tri Minh Nguyen
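    A hypothetical sketch of the hole-filling step: bit positions that carry no valid data are driven with the bits from the previous packet so those wires do not toggle. The vector-of-bools representation and the valid_mask parameter are assumptions; the transmission bus-width selection described in the abstract is not modeled.

      #include <cstddef>
      #include <vector>

      // All three vectors are assumed to have the same length (the bus width).
      std::vector<bool> fill_holes(const std::vector<bool>& compressed_packet,
                                   const std::vector<bool>& valid_mask,     // true = real data
                                   const std::vector<bool>& previous_packet) {
          std::vector<bool> on_the_wire(compressed_packet.size());
          for (size_t bit = 0; bit < compressed_packet.size(); ++bit) {
              on_the_wire[bit] = valid_mask[bit] ? compressed_packet[bit]
                                                 : previous_packet[bit];  // hole: repeat old bit
          }
          return on_the_wire;
      }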
  • Patent number: 10540200
    Abstract: A hardware context manager in a field-programmable gate array (FPGA) device includes configuration logic configured to program one or more programming regions in the FPGA device based on configuration data for implementing a target configuration of the one or more programming regions. Context management logic in the hardware context manager is coupled with the configuration logic and saves a first context corresponding to the target configuration by retrieving first state information from the set of one or more programming regions, where the first state information is generated based on the target configuration, and storing the retrieved first state information in a context memory. The context management logic restores the first context by transferring the first state information from the context memory to the one or more programming regions, and causing the configuration logic to program the one or more programming regions based on the configuration data.
    Type: Grant
    Filed: November 10, 2017
    Date of Patent: January 21, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kevin Y. Cheng, David A. Roberts, William C. Brantley
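    An abstract software model (not the FPGA logic itself) of the save/restore flow this abstract describes; the context is keyed by a string identifier here purely for illustration.

      #include <cstdint>
      #include <string>
      #include <unordered_map>
      #include <vector>

      struct ProgrammingRegion {
          std::vector<uint8_t> configuration;  // bitstream currently loaded
          std::vector<uint8_t> state;          // runtime state generated by that configuration
      };

      struct HardwareContextManager {
          std::unordered_map<std::string, std::vector<uint8_t>> context_memory;

          // Saving a context retrieves region state and stores it in the context memory.
          void save_context(const std::string& id, const ProgrammingRegion& region) {
              context_memory[id] = region.state;
          }

          // Restoring reprograms the region from its configuration data and
          // transfers the saved state back.
          void restore_context(const std::string& id, ProgrammingRegion& region,
                               const std::vector<uint8_t>& configuration_data) {
              region.configuration = configuration_data;
              region.state = context_memory.at(id);
          }
      };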
  • Patent number: 10541841
    Abstract: Systems, apparatuses, and methods for performing transmit equalization at a target high speed are disclosed. A computing system includes at least a transmitter, receiver, and a communication channel connecting the transmitter and the receiver. The communication channel includes a plurality of lanes which are subdivided into a first subset of lanes and a second subset of lanes. During equalization training, the first subset of lanes operate at a first speed while the second subset of lanes operate at a second speed. The first speed is the desired target speed for operating the communication link while the second speed is a relatively low speed capable of reliably carrying data over a given lane prior to equalization training. The first subset of lanes are trained at the first speed while feedback is conveyed from the receiver to the transmitter using the second subset of lanes operating at the second speed.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: January 21, 2020
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Shiqi Sun, Michael J. Tresidder, Yanfeng Wang
  • Patent number: 10540802
    Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: January 21, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Maxim V. Kazakov, Mark Fowler
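    An illustrative, simplified translation from normalized coordinates to residency-map texel coordinates using a per-level offset; the descriptor layout and the way the offset is applied are assumptions, not the hardware's exact math.

      #include <cstdint>

      struct ResidencyMapLevel {
          uint32_t width, height;   // residency-map dimensions for this LOD
          uint32_t texel_offset;    // offset of this level relative to the reference level
      };

      struct TexelCoord { uint32_t x, y; };

      // Scale normalized (u, v) into the level's residency map and apply its offset.
      TexelCoord to_residency_coords(float u, float v, const ResidencyMapLevel& level) {
          uint32_t x = static_cast<uint32_t>(u * level.width);
          uint32_t y = static_cast<uint32_t>(v * level.height);
          if (x >= level.width)  x = level.width  - 1;   // clamp coordinates at 1.0
          if (y >= level.height) y = level.height - 1;
          return {level.texel_offset + x, y};
      }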
  • Patent number: 10540280
    Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: January 21, 2020
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Mark Fowler, Jimshed Mirza, Anthony Asaro
  • Patent number: 10541013
    Abstract: A word line driver circuit receives a word line input signal and supplies a word line driver output signal to a word line. The word line driver circuit includes a transistor having a first current carrying terminal coupled to the word line driver output signal and a second current carrying terminal coupled to a first node. A gate of the transistor is coupled to the word line input signal, and the transistor provides a path from the word line to the first node while the word line is asserted. A programmable word line underdrive circuit is coupled between the first node and a ground node to reduce a voltage on the word line output signal. A plurality of word line driver circuits are coupled to the first node and use the word line underdrive circuit to underdrive their respective word lines.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: January 21, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Russell J. Schreiber, Tawfik Ahmed, Ilango Jeyasubramanian
  • Publication number: 20200020384
    Abstract: In one form, a memory controller includes a command queue, an arbiter, a refresh logic circuit, and a final arbiter. The command queue receives and stores memory access requests for a memory. The arbiter selectively picks accesses from the command queue according to a first type of accesses and a second type of accesses. The first type of accesses and the second type of accesses correspond to different page statuses of corresponding memory accesses in the memory. The refresh logic circuit generates a refresh command to a bank of the memory and provides a priority indicator with the refresh command whose value is set according to a number of pending refreshes. The final arbiter selectively orders the refresh command with respect to memory access requests of the first type accesses and the second type accesses based on the priority indicator.
    Type: Application
    Filed: July 18, 2018
    Publication date: January 16, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Liang Zhao, YuBin Yao
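    An illustrative priority rule consistent with the abstract: the refresh command's priority grows with the number of pending refreshes, and only a sufficiently urgent refresh is ordered ahead of page-hit traffic. The thresholds and the page-hit rule are assumptions.

      enum class RefreshPriority { Low, Elevated, Urgent };

      // Set the priority indicator according to the number of pending refreshes.
      RefreshPriority refresh_priority(unsigned pending_refreshes) {
          if (pending_refreshes >= 6) return RefreshPriority::Urgent;
          if (pending_refreshes >= 3) return RefreshPriority::Elevated;
          return RefreshPriority::Low;
      }

      // A final arbiter might let only urgent refreshes preempt page-hit ("first type")
      // accesses, while elevated ones win only against page misses.
      bool refresh_wins(RefreshPriority p, bool candidate_access_is_page_hit) {
          if (p == RefreshPriority::Urgent)   return true;
          if (p == RefreshPriority::Elevated) return !candidate_access_is_page_hit;
          return false;
      }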
  • Publication number: 20200019530
    Abstract: A method and system for partial wavefront merger is described. Vector processing machines employ the partial wavefront merger to merge partial wavefronts into one or more wavefronts. The system includes a partial wavefront manager and unified registers. The partial wavefront manager detects wavefronts in different single-instruction-multiple-data (“SIMD”) units which contain inactive work items and active work items (hereinafter referred to as “partial wavefronts”), moves the partial wavefronts into one or more SIMD unit(s) and merges the partial wavefronts into one or more wavefront(s). The unified registers allow each active work item in the one or more merged wavefront(s) to access the previously allocated registers in the originating SIMD units. Consequently, the contents of the unified registers do not have to be copied to the SIMD unit(s) executing the one or more merged wavefront(s).
    Type: Application
    Filed: July 23, 2018
    Publication date: January 16, 2020
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Yunpeng Zhu, Jimshed Mirza
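    A toy model of detecting and packing partial wavefronts; the wavefront width of 64 and the bitset representation are assumptions, and the unified-register mechanism is not modeled.

      #include <bitset>
      #include <cstddef>
      #include <vector>

      constexpr size_t kWaveSize = 64;  // assumed wavefront width

      struct Wavefront {
          std::bitset<kWaveSize> active;  // one bit per work item
      };

      // Pack the active work items of partial wavefronts into as few wavefronts as possible.
      std::vector<Wavefront> merge_partial_wavefronts(const std::vector<Wavefront>& waves) {
          size_t total_active = 0;
          for (const auto& w : waves) total_active += w.active.count();

          std::vector<Wavefront> merged((total_active + kWaveSize - 1) / kWaveSize);
          size_t slot = 0;
          for (const auto& w : waves) {
              for (size_t lane = 0; lane < kWaveSize; ++lane) {
                  if (!w.active[lane]) continue;
                  merged[slot / kWaveSize].active.set(slot % kWaveSize);
                  ++slot;
              }
          }
          return merged;
      }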
  • Patent number: 10535178
    Abstract: Systems, apparatuses, and methods for performing shader writes to compressed surfaces are disclosed. In one embodiment, a processor includes at least a memory and one or more shader units. In one embodiment, a shader unit of the processor is configured to receive a write request targeted to a compressed surface. The shader unit is configured to identify a first block of the compressed surface targeted by the write request. Responsive to determining that the data of the write request targets less than the entirety of the first block, the shader unit reads the first block from the cache and decompresses the first block. Next, the shader unit merges the data of the write request with the decompressed first block. Then, the shader unit compresses the merged data and writes the merged data to the cache.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: January 14, 2020
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Jimshed Mirza, Christopher J. Brennan, Anthony Chan, Leon Lai
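    A sketch of the read-modify-write path for a partial write to a compressed block; compress and decompress are identity placeholders standing in for the real surface codec, and cache interaction is omitted.

      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Placeholder codecs standing in for the actual surface compression scheme.
      std::vector<uint8_t> decompress(const std::vector<uint8_t>& block)  { return block; }
      std::vector<uint8_t> compress(const std::vector<uint8_t>& texels)   { return texels; }

      // Assumes write_offset + write_data.size() fits within the decompressed block.
      std::vector<uint8_t> partial_write(const std::vector<uint8_t>& compressed_block,
                                         const std::vector<uint8_t>& write_data,
                                         size_t write_offset) {
          std::vector<uint8_t> texels = decompress(compressed_block);   // read and decompress
          for (size_t i = 0; i < write_data.size(); ++i) {
              texels[write_offset + i] = write_data[i];                 // merge the new data
          }
          return compress(texels);                                      // recompress for write-back
      }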
  • Patent number: 10535393
    Abstract: An electronic device including a memory functional block having multiple ranks of memory and a memory controller functional block coupled to the memory. The memory controller includes refresh logic that detects, based on buffered memory accesses for each rank of memory of the ranks of memory, two or more ranks of memory for which a refresh is to be performed during a refresh interval. Based at least in part on one or more properties of buffered memory accesses for the two or more ranks of memory, the refresh logic determines a refresh order for performing refreshes for the two or more ranks of memory during the refresh interval. The memory controller then performs, in the refresh order, refreshes for the two or more ranks of memory during the refresh interval.
    Type: Grant
    Filed: July 21, 2018
    Date of Patent: January 14, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Kedarnath Balakrishnan
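    One possible ordering policy consistent with this abstract, refreshing the due rank with the fewest buffered accesses first; the actual patented policy may weigh other properties of the buffered accesses.

      #include <algorithm>
      #include <vector>

      struct RankStatus {
          unsigned rank;
          bool     refresh_due;
          unsigned buffered_accesses;
      };

      // Return the refresh order for ranks due this interval, least-busy rank first.
      std::vector<unsigned> refresh_order(const std::vector<RankStatus>& ranks) {
          std::vector<RankStatus> due;
          for (const auto& r : ranks)
              if (r.refresh_due) due.push_back(r);
          std::sort(due.begin(), due.end(), [](const RankStatus& a, const RankStatus& b) {
              return a.buffered_accesses < b.buffered_accesses;
          });
          std::vector<unsigned> order;
          for (const auto& r : due) order.push_back(r.rank);
          return order;
      }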