Patents Assigned to Advanced Micro Devices, Inc.

PERSISTENT WEIGHTS IN TRAINING

Publication number: 20220101110

Abstract: Techniques are disclosed for performing machine learning operations. The techniques include fetching weights for a first layer in a first format; performing matrix multiplication of the weights fetched in the first format with values provided by a prior layer in a forwards training pass; fetching the weights for the first layer in a second format different from the first format; and performing matrix multiplication for a backwards pass, the matrix multiplication including multiplication of the weights fetched in the second format with values corresponding to values provided as the result of the forwards training pass for the first layer.
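
The two-format idea can be illustrated with a minimal sketch. It assumes the first format is the row-major layout used for the forward product y = Wx, and the second format is a transposed read used for the backward product W^T · dy. The function names are hypothetical; this is not AMD's implementation.

```python
from typing import List

Matrix = List[List[float]]

def fetch_row_major(w: Matrix) -> Matrix:
    """First format (assumption): weights as stored, one row per output."""
    return [row[:] for row in w]

def fetch_transposed(w: Matrix) -> Matrix:
    """Second format (assumption): the same stored weights read column by column (W^T)."""
    return [list(col) for col in zip(*w)]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def forward(w: Matrix, x: List[float]) -> List[float]:
    # Forward pass: y = W x, weights fetched in the first format.
    return matvec(fetch_row_major(w), x)

def backward_input_grad(w: Matrix, grad_y: List[float]) -> List[float]:
    # Backward pass: dL/dx = W^T dL/dy, the same weights fetched in the second format.
    return matvec(fetch_transposed(w), grad_y)

W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
print(forward(W, [1.0, 0.0, 1.0]))         # [4.0, 10.0]
print(backward_input_grad(W, [1.0, 1.0]))  # [5.0, 7.0, 9.0]
```

The weights are stored once; only the fetch pattern changes between passes, so no separate transposed copy has to be kept.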

Type: Application

Filed: September 25, 2020

Publication date: March 31, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Swapnil P. Sakharshete, Maxim V. Kazakov
Access log and address translation log for a processor

Patent number: 11288205

Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.
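
As a rough illustration of how software might combine the two logs, the sketch below records physical miss addresses and physical-to-virtual page mappings, then maps each logged miss back to a virtual address. The 4 KiB page size and all names are assumptions, not details from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

@dataclass
class ProcessorLogs:
    access_log: List[int] = field(default_factory=list)             # physical addresses of misses
    translation_log: Dict[int, int] = field(default_factory=dict)   # physical page -> virtual page

    def record_miss(self, paddr: int) -> None:
        self.access_log.append(paddr)

    def record_translation(self, vaddr: int, paddr: int) -> None:
        # Called after a page walk: store the physical -> virtual page mapping.
        self.translation_log[paddr >> PAGE_SHIFT] = vaddr >> PAGE_SHIFT

    def missed_virtual_addresses(self) -> List[Optional[int]]:
        """Software-side view: translate each logged miss back to a virtual address."""
        out: List[Optional[int]] = []
        for pa in self.access_log:
            vpage = self.translation_log.get(pa >> PAGE_SHIFT)
            out.append(None if vpage is None else (vpage << PAGE_SHIFT) | (pa & PAGE_MASK))
        return out

logs = ProcessorLogs()
logs.record_translation(vaddr=0x7F000123, paddr=0x00042123)
logs.record_miss(0x00042ABC)
logs.record_miss(0x00099000)   # no translation logged for this page
print([hex(a) if a is not None else None for a in logs.missed_virtual_addresses()])
# ['0x7f000abc', None]
```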

Type: Grant

Filed: June 23, 2015

Date of Patent: March 29, 2022

Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC

Inventors: Benjamin T. Sander, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Mike Mantor
Enhanced atomics for workgroup synchronization

Patent number: 11288095

Abstract: Techniques for synchronizing workgroups are provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
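
The queue-and-priority scheme can be sketched as follows. The two queue names and their priorities are hypothetical stand-ins for "synchronization status"; the abstract does not specify which statuses exist.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical ready queues keyed by synchronization status; lower number = higher priority.
QUEUE_PRIORITY = {"sync_satisfied": 0, "no_sync": 1}

class WorkgroupScheduler:
    def __init__(self) -> None:
        self.ready: Dict[str, Deque[str]] = {name: deque() for name in QUEUE_PRIORITY}

    def mark_ready(self, wg: str, sync_status: str) -> None:
        # A non-executing workgroup became ready: queue it by its sync status.
        self.ready[sync_status].append(wg)

    def schedule(self, free_slots: int) -> List[str]:
        # Resources are available: drain higher-priority queues first.
        picked: List[str] = []
        for name in sorted(self.ready, key=QUEUE_PRIORITY.__getitem__):
            q = self.ready[name]
            while q and len(picked) < free_slots:
                picked.append(q.popleft())
        return picked

s = WorkgroupScheduler()
s.mark_ready("wg3", "no_sync")
s.mark_ready("wg1", "sync_satisfied")
s.mark_ready("wg2", "sync_satisfied")
print(s.schedule(free_slots=2))  # ['wg1', 'wg2']
print(s.schedule(free_slots=2))  # ['wg3']
```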

Type: Grant

Filed: September 30, 2019

Date of Patent: March 29, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
Dynamic control of multi-region fabric

Patent number: 11289131

Abstract: Systems, apparatuses, and methods for implementing dynamic control of a multi-region fabric are disclosed. A system includes one or more processing units, one or more memory devices, and a communication fabric coupled to the processing unit(s) and memory device(s). The system partitions the fabric into multiple regions based on different traffic types and/or periodicities of the clients connected to the regions. For example, the system partitions the fabric into a stutter region for predictable, periodic clients and a non-stutter region for unpredictable, non-periodic clients. The system power-gates the entirety of the fabric in response to detecting a low activity condition. After power-gating the entirety of the fabric, the system periodically wakes up one or more stutter regions while keeping the non-stutter regions in power-gated mode. Each stutter region monitors stutter client(s) for activity and processes any requests before going back into power-gated mode.
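
A toy model of one wake period, assuming each region simply tracks a count of pending client requests; region names and fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Region:
    name: str
    stutter: bool       # True for predictable, periodic clients
    gated: bool = False
    pending: int = 0    # requests waiting from this region's clients

def enter_low_activity(regions: List[Region]) -> None:
    # Low activity detected: power-gate the entire fabric.
    for r in regions:
        r.gated = True

def periodic_wake(regions: List[Region]) -> List[str]:
    """One wake period: wake only stutter regions, serve their requests, re-gate."""
    served: List[str] = []
    for r in regions:
        if not r.stutter:
            continue            # non-stutter regions stay power-gated
        r.gated = False
        if r.pending:
            served.append(f"{r.name}:{r.pending}")
            r.pending = 0
        r.gated = True          # back into power-gated mode
    return served

fabric = [Region("display", stutter=True, pending=3), Region("usb", stutter=False, pending=1)]
enter_low_activity(fabric)
print(periodic_wake(fabric))      # ['display:3']
print([r.gated for r in fabric])  # [True, True]
print(fabric[1].pending)          # 1
```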

Type: Grant

Filed: December 7, 2020

Date of Patent: March 29, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Benjamin Tsien, Alexander J. Branover, Alan Dodson Smith, Chintan S. Patel
Real-time and low latency packetization protocol for live compressed video data

Patent number: 11290515

Abstract: Systems, apparatuses, and methods for implementing real-time, low-latency packetization protocols for live compressed video data are disclosed. A wireless transmitter includes at least a codec and a media access control (MAC) layer unit. In order for the codec to communicate with the MAC layer unit, the codec encodes the compression ratio in a header embedded inside the encoded video stream. The MAC layer unit extracts the compression ratio from the header and determines a modulation coding scheme (MCS) for transferring the video stream based on the compression ratio. The MAC layer unit and the codec also implement a feedback loop such that the MAC layer unit can command the codec to adjust the compression ratio. Since the changes to the video might not be implemented immediately, the MAC layer unit relies on the header to determine when the video data is coming in with the requested compression ratio.
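
The header handshake and feedback loop might look like the sketch below. The 2-byte header layout, the marker byte, and the ratio-to-MCS table are all hypothetical; the abstract does not give a format or a mapping.

```python
import struct
from typing import List, Tuple

MAGIC = 0xA5  # hypothetical marker byte for the embedded header

def codec_emit(frame: bytes, compression_ratio: int) -> bytes:
    # The codec prepends a 2-byte header: marker, then the compression ratio.
    return struct.pack("BB", MAGIC, compression_ratio) + frame

def mac_parse(packet: bytes) -> Tuple[int, bytes]:
    # The MAC layer unit extracts the ratio from the header inside the stream.
    magic, ratio = struct.unpack_from("BB", packet)
    if magic != MAGIC:
        raise ValueError("missing header")
    return ratio, packet[2:]

# Hypothetical policy: (minimum ratio, MCS index). A more compressed stream
# needs fewer bits, so a lower, more robust MCS is enough.
MCS_TABLE: List[Tuple[int, int]] = [(20, 2), (10, 5), (0, 9)]

def choose_mcs(ratio: int) -> int:
    for min_ratio, mcs in MCS_TABLE:
        if ratio >= min_ratio:
            return mcs
    raise ValueError("no MCS for this ratio")

def first_packet_with_ratio(packets: List[bytes], requested: int) -> int:
    # Feedback loop: after requesting a new ratio, the MAC waits for a header
    # that actually carries it instead of assuming the change is immediate.
    for i, pkt in enumerate(packets):
        if mac_parse(pkt)[0] == requested:
            return i
    return -1
```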

Type: Grant

Filed: December 7, 2017

Date of Patent: March 29, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Ngoc Vinh Vu, Darren Rae Di Cera, Adam William Lynch, Shane Bentley, Douglas Mammoser, David Robert Stark, Jr.
METHOD AND APPARATUS FOR PROVIDING PERSISTENCE TO REMOTE NON-VOLATILE MEMORY

Publication number: 20220091974

Abstract: A processing device and methods of controlling remote persistent writes are provided. Methods include receiving an instruction of a program to issue a persistent write to remote memory. The methods also include logging an entry in a local domain when the persistent write instruction is received and providing a first indication that the persistent write will be persisted to the remote memory. The methods also include executing the persistent write to the remote memory and providing a second indication that the persistent write to the remote memory is completed. The methods also include providing the first and second indications when it is determined not to execute the persistent write according to global ordering and providing the second indication without providing the first indication when it is determined to execute the persistent write to remote memory according to global ordering.
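
The two-indication rule can be summarized in a short sketch. The indication strings, the log structure, and the dict standing in for remote memory are hypothetical.

```python
from typing import Dict, List, Tuple

def persistent_write(addr: int, data: bytes, global_ordering: bool,
                     local_log: List[Tuple[int, bytes]], remote: Dict[int, bytes]) -> List[str]:
    """Return the indications the program observes, in order."""
    indications: List[str] = []
    local_log.append((addr, data))          # log an entry in the local domain
    if not global_ordering:
        # Once logged locally, the write is guaranteed to persist,
        # so the program can be told early (first indication).
        indications.append("will_persist")
    remote[addr] = data                     # execute the write to remote memory
    indications.append("persist_complete")  # second indication
    return indications
```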

Type: Application

Filed: September 24, 2020

Publication date: March 24, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Nuwan Jayasena, Shaizeen Aga
DATA INTEGRITY FOR PERSISTENT MEMORY SYSTEMS AND THE LIKE

Publication number: 20220091921

Abstract: A data processor provides memory commands to a memory channel according to predetermined criteria. The data processor includes a first error code generation circuit, a second error code generation circuit, and a queue. The first error code generation circuit generates a first type of error code in response to data of a write request. The second error code generation circuit generates a second type of error code for the write request, the second type of error code being different from the first type of error code. The queue is coupled to the first error code generation circuit and to the second error code generation circuit, and provides write commands to an interface, the write commands including the data, the first type of error code, and the second type of error code.
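
To show what "two different error-code types per write" means, the sketch below uses per-byte even parity as the first type and CRC-32 (via Python's `zlib.crc32`) as the second. Both choices are assumptions for illustration; the abstract does not name the codes.

```python
import zlib
from dataclasses import dataclass
from typing import List

def parity_bits(data: bytes) -> List[int]:
    # First error-code type (assumption): one even-parity bit per byte.
    return [bin(b).count("1") & 1 for b in data]

def crc32(data: bytes) -> int:
    # Second error-code type (assumption): CRC-32 over the whole write.
    return zlib.crc32(data) & 0xFFFFFFFF

@dataclass
class WriteCommand:
    addr: int
    data: bytes
    ecc1: List[int]
    ecc2: int

def enqueue_write(queue: List[WriteCommand], addr: int, data: bytes) -> None:
    # The queue forwards the data together with both error codes.
    queue.append(WriteCommand(addr, data, parity_bits(data), crc32(data)))

def check(cmd: WriteCommand) -> bool:
    return parity_bits(cmd.data) == cmd.ecc1 and crc32(cmd.data) == cmd.ecc2
```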

Type: Application

Filed: December 7, 2021

Publication date: March 24, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Kedarnath Balakrishnan, James R. Magro, Kevin Michael Lepak, Vilas Sridharan
REFRESH MANAGEMENT LIST FOR DRAM

Publication number: 20220091784

Abstract: A memory controller includes a command queue having a first input for receiving memory access requests, and a memory interface queue having an output for coupling to a memory channel adapted for connecting to at least one dynamic random access memory (DRAM) module. A refresh control circuit monitors activate commands to be sent over the memory channel. In response to an activate command meeting a designated condition, the refresh control circuit identifies a candidate aggressor row associated with the activate command. A command is sent to the DRAM requesting that the candidate aggressor row be queued for mitigation in a future refresh or refresh management event.
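
A minimal sketch of the refresh control circuit, assuming the "designated condition" is a per-row activate count reaching a threshold within a refresh window. The threshold and the command name `QUEUE_MITIGATION` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

class RefreshControl:
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold              # hypothetical designated condition
        self.activates: Counter = Counter()     # (bank, row) -> activates this window
        self.sent: List[Tuple[str, int, int]] = []

    def on_activate(self, bank: int, row: int) -> None:
        self.activates[(bank, row)] += 1
        if self.activates[(bank, row)] == self.threshold:
            # Candidate aggressor row: ask the DRAM to queue it for mitigation
            # in a future refresh or refresh management event (sent once).
            self.sent.append(("QUEUE_MITIGATION", bank, row))

    def on_refresh_window_end(self) -> None:
        self.activates.clear()
```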

Type: Application

Filed: September 21, 2020

Publication date: March 24, 2022

Applicant: Advanced Micro Devices, Inc.

Inventor: Kevin M. Brandl
TIME DOMAIN MULTIPLY AND ACCUMULATE SYSTEM

Publication number: 20220091822

Abstract: A multiply-accumulate computation is performed using digital logic circuits. To perform the computation, a plurality of target signals are received at a respective plurality of ripple counters. The counter outputs of the respective ripple counters are scaled by setting stop count values. Counter outputs of the respective ripple counters are adjusted with respective constant values by setting counter reset values at the respective ripple counters. Each ripple counter counts pulses of its respective target signal for an adjusted period. The count values of the ripple counters are summed. The results may be used to calculate an average value for an adaptive voltage and frequency scaling process.
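
A behavioral sketch of the time-domain multiply-accumulate, assuming the "stop count" sets how many clock ticks each counter observes (the scaling) and the reset value is the counter's starting count (the constant adjustment). Each count then approximates reset_i + rate_i × period_i, and the counts are summed. This is an illustrative model, not a circuit description.

```python
from typing import List

def ripple_count(pulses: List[int], stop_count: int, reset_value: int) -> int:
    """pulses[t] is 1 if the target signal pulses at tick t (assumption).
    Start from reset_value and count only the first stop_count ticks."""
    count = reset_value
    for sample in pulses[:stop_count]:
        count += sample
    return count

def time_domain_mac(signals: List[List[int]], stop_counts: List[int],
                    reset_values: List[int]) -> int:
    # Sum of per-counter results: sum_i (reset_i + rate_i * period_i).
    return sum(ripple_count(s, n, r) for s, n, r in zip(signals, stop_counts, reset_values))

signals = [[1, 0] * 8, [1] * 16]    # a slow and a fast pulse train
total = time_domain_mac(signals, stop_counts=[8, 4], reset_values=[0, 3])
print(total, total / len(signals))  # 11 5.5
```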

Type: Application

Filed: September 22, 2020

Publication date: March 24, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Ravinder Reddy Rachala, Stephen Victor Kosonocky, Miguel Rodriguez
SYSTEM AND METHOD FOR APPLICATION MIGRATION FOR A DOCKABLE DEVICE

Publication number: 20220092001

Abstract: Described is a method and apparatus for application migration between a dockable device and a docking station in a seamless manner. The dockable device includes a processor and the docking station includes a high-performance processor. The method includes determining a docking state of a dockable device while at least an application is running. Application migration from the dockable device to a docking station is initiated when the dockable device is moving to a docked state. Application migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run during the application migration from the dockable device to the docking station or during the application migration from the docking station to the dockable device.
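
The docking-state logic reduces to a small state machine. The host labels and the `RunningApp` class are hypothetical; real migration would also transfer the application's execution state, which is not modeled here.

```python
from enum import Enum

class DockState(Enum):
    UNDOCKED = 0
    DOCKED = 1

class RunningApp:
    """Hypothetical model of an application that can run on either processor."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.host = "dockable_device"   # starts on the dockable device's processor
        self.running = True
        self.migrations = 0

def on_docking_transition(app: RunningApp, moving_to: DockState) -> str:
    # Docking migrates the app to the docking station's high-performance
    # processor; undocking migrates it back. The app is never stopped here.
    target = "docking_station" if moving_to is DockState.DOCKED else "dockable_device"
    if app.host != target:
        app.host = target
        app.migrations += 1
    return app.host

app = RunningApp("editor")
print(on_docking_transition(app, DockState.DOCKED))    # docking_station
print(on_docking_transition(app, DockState.UNDOCKED))  # dockable_device
print(app.running, app.migrations)                     # True 2
```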

Type: Application

Filed: December 6, 2021

Publication date: March 24, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Jonathan Lawrence Campbell, Yuping Shen
FINE-GRAINED CONDITIONAL DISPATCHING

Publication number: 20220091880

Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
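
A sketch of the dispatch rule: workgroups of the second kernel are dispatched in order unless a workgroup dependency instruction from the first kernel has flagged one for priority. Class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Set

class Dispatcher:
    def __init__(self, kernel2_workgroups: List[int]) -> None:
        self.pending: Deque[int] = deque(kernel2_workgroups)  # default in-order dispatch
        self.prioritized: Set[int] = set()

    def workgroup_dependency(self, wg_id: int) -> None:
        # Executed by a workgroup of kernel 1: prioritize kernel 2's wg_id.
        self.prioritized.add(wg_id)

    def dispatch_next(self) -> int:
        # A flagged workgroup jumps ahead of unflagged ones.
        for wg in list(self.pending):
            if wg in self.prioritized:
                self.pending.remove(wg)
                self.prioritized.discard(wg)
                return wg
        return self.pending.popleft()

d = Dispatcher([0, 1, 2, 3])
d.workgroup_dependency(2)
print([d.dispatch_next() for _ in range(4)])  # [2, 0, 1, 3]
```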

Type: Application

Filed: September 24, 2020

Publication date: March 24, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
Methods and apparatus for decoding video using re-ordered motion vector buffer

Patent number: 11284096

Abstract: A host processor, such as a central processing unit (CPU), is programmed to execute a software driver that causes the host processor to generate a motion compensation command for a plurality of cores of a massively parallel processor, such as a graphics processing unit (GPU), to provide motion compensation for encoded video. The motion compensation command for the plurality of cores of the massively parallel processor contains executable instructions for processing a plurality of motion vectors grouped by a plurality of prediction modes from a re-ordered motion vector buffer by the plurality of cores of the massively parallel processor.
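
The re-ordering step amounts to a stable group-by on prediction mode, so each group can be handed to the GPU cores as one uniform batch. The field names and mode labels below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class MotionVector(NamedTuple):
    block: int   # block index in decode order
    mode: str    # prediction mode (hypothetical labels, e.g. "fwd", "bi")
    dx: int
    dy: int

def reorder_by_mode(mvs: List[MotionVector]) -> Dict[str, List[MotionVector]]:
    """Group motion vectors by prediction mode; decode order is kept within
    each group, and groups appear in first-seen order."""
    groups: Dict[str, List[MotionVector]] = defaultdict(list)
    for mv in mvs:
        groups[mv.mode].append(mv)
    return dict(groups)
```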

Type: Grant

Filed: February 25, 2021

Date of Patent: March 22, 2022

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Michael L. Schmit, Ashish Farmer, Radhakrishna Giduthuri
Register renaming after a non-pickable scheduler queue

Patent number: 11281466

Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
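
The rename stage that sits between the NSQ and the pickable scheduler queues can be sketched with a free list of physical register numbers (PRNs). The freeing-at-retirement policy is a common convention assumed here, not something the abstract states.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

class Renamer:
    def __init__(self, num_physical: int) -> None:
        self.free_list: Deque[int] = deque(range(num_physical))  # available PRNs
        self.map: Dict[str, int] = {}                              # arch reg -> PRN

    def rename(self, op: dict) -> dict:
        # Called when an op arrives from the NSQ: map sources, allocate a new dest PRN.
        psrcs = [self.map.get(r, -1) for r in op.get("srcs", [])]
        prn = self.free_list.popleft()
        old: Optional[int] = self.map.get(op["dst"])
        self.map[op["dst"]] = prn
        return {**op, "psrcs": psrcs, "pdst": prn, "old_pdst": old}

    def retire(self, renamed: dict) -> None:
        # Assumption: the destination's previous PRN is freed at retirement.
        if renamed["old_pdst"] is not None:
            self.free_list.append(renamed["old_pdst"])

nsq = deque([{"name": "ld1", "dst": "xmm0"}, {"name": "ld2", "dst": "xmm0"}])
pickable: List[dict] = []          # stands in for the pickable scheduler queues
r = Renamer(4)
while nsq:
    pickable.append(r.rename(nsq.popleft()))
```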

Type: Grant

Filed: October 22, 2019

Date of Patent: March 22, 2022

Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC

Inventors: Arun A. Nair, Michael Estlick, Erik Swanson, Sneha V. Desai, Donglin Ji
Argmax use for machine learning

Patent number: 11281470

Abstract: A processing device is provided which comprises memory and a processor. The processor is configured to receive an array of floating point numbers each having a plurality of bits used to represent a probability value. For each floating point number, the processor is configured to replace values in a portion of the bits used to represent the probability value with index values to represent an index corresponding to a location of a corresponding floating point number in the memory. The processor is also configured to process the floating point numbers using SIMD instructions to execute one of an argmax operation and an argmin operation.
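
The index-embedding trick works because, for non-negative IEEE-754 float32 values such as probabilities, the raw bit pattern read as an unsigned integer orders the same way as the value. Overwriting the low mantissa bits with the element's index lets a single integer max (or min) reduction, which maps to SIMD integer instructions, return the winner's index directly. The sketch below uses 8 index bits as an assumption; values that differ only in those low bits become ties, resolved toward the larger index for argmax and the smaller index for argmin.

```python
import struct
from typing import List

INDEX_BITS = 8                      # assumption: at most 256 elements per batch
INDEX_MASK = (1 << INDEX_BITS) - 1

def f32_bits(x: float) -> int:
    # Raw IEEE-754 single-precision bit pattern as an unsigned integer.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def pack(probs: List[float]) -> List[int]:
    """Replace the low mantissa bits of each non-negative float32 with its index."""
    if len(probs) > INDEX_MASK + 1:
        raise ValueError("too many elements for INDEX_BITS")
    return [(f32_bits(p) & ~INDEX_MASK) | i for i, p in enumerate(probs)]

def argmax(probs: List[float]) -> int:
    return max(pack(probs)) & INDEX_MASK

def argmin(probs: List[float]) -> int:
    return min(pack(probs)) & INDEX_MASK

print(argmax([0.1, 0.7, 0.2]), argmin([0.1, 0.7, 0.2]))  # 1 0
```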

Type: Grant

Filed: December 19, 2019

Date of Patent: March 22, 2022

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Michael L. Schmit, Lakshmi Kumar
Deskewing method for a physical layer interface on a multi-chip module

Patent number: 11283589

Abstract: Systems, apparatuses, and methods for implementing a deskewing method for a physical layer interface on a multi-chip module are disclosed. A circuit connected to a plurality of communication lanes trains each lane to synchronize a local clock of the lane with a corresponding global clock at a beginning of a timing window. Next, the circuit symbol rotates each lane by a single step responsive to determining that all of the plurality of lanes have an incorrect symbol alignment. Responsive to determining that some but not all of the plurality of lanes have a correct symbol alignment, the circuit symbol rotates lanes which have an incorrect symbol alignment by a single step. When the end of the timing window has been reached, the circuit symbol rotates lanes which have a correct symbol alignment and adjusts a phase of a corresponding global clock to compensate for missed symbol rotations.

Type: Grant

Filed: December 21, 2020

Date of Patent: March 22, 2022

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Varun Gupta, Milam Paraschou, Gerald R. Talbot, Gurunath Dollin, Damon Tohidi, Eric Ian Carpenter, Chad S. Gallun, Jeffrey Cooper, Hanwoo Cho, Thomas H. Likens, III, Scott F. Dow, Michael J. Tresidder
Trusted memory zone

Patent number: 11281495

Abstract: A system and method for providing security of sensitive information within chips using SIMD micro-architecture are described. A command processor within a parallel data processing unit, such as a graphics processing unit (GPU), schedules commands across multiple compute units based on state information. When the command processor determines a rescheduling condition is satisfied, it causes the overwriting of at least a portion of data stored in each of the one or more local memories used by the multiple compute units. The command processor also stores in the secure memory a copy of state information associated with a given group of commands and later checks it to ensure corruption by a malicious or careless program is prevented.
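
A simplified model of the command processor's two safeguards: scrubbing every compute unit's local memory on a rescheduling condition, and saving a copy of a command group's state in secure memory for a later integrity check. Sizes and names are hypothetical.

```python
from typing import Dict, List

class CommandProcessor:
    def __init__(self, num_cus: int, local_mem_size: int) -> None:
        self.local_mem: List[bytearray] = [bytearray(local_mem_size) for _ in range(num_cus)]
        self.secure_mem: Dict[str, Dict[str, int]] = {}   # group id -> saved state copy

    def save_state(self, group: str, state: Dict[str, int]) -> None:
        self.secure_mem[group] = dict(state)

    def reschedule(self) -> None:
        # Rescheduling condition met: overwrite each local memory so the next
        # command group cannot read data left behind by the previous one.
        for mem in self.local_mem:
            mem[:] = bytes(len(mem))

    def verify_state(self, group: str, state: Dict[str, int]) -> bool:
        # Later check: detect corruption of the group's state.
        return self.secure_mem.get(group) == state
```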

Type: Grant

Filed: August 27, 2018

Date of Patent: March 22, 2022

Assignee: Advanced Micro Devices, Inc.

Inventor: Rex Eldon McCrary
Reducing chiplet wakeup latency

Patent number: 11281280

Abstract: Systems, apparatuses, and methods for reducing chiplet interrupt latency are disclosed. A system includes one or more processing nodes, one or more memory devices, a communication fabric coupled to the processing unit(s) and memory device(s) via link interfaces, and a power management unit. The power management unit manages the power states of the various components and the link interfaces of the system. If the power management unit detects a request to wake up a given component, and the link interface to the given component is powered down, then the power management unit sends an out-of-band signal to wake up the given component in parallel with powering up the link interface. Also, when multiple link interfaces need to be powered up, the power management unit powers up the multiple link interfaces in an order which complies with voltage regulator load-step requirements while minimizing the latency of pending operations.
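
Two ideas from the abstract, sketched under simple assumptions. First, waking a component out-of-band overlaps its wake time with link power-up, so latency becomes a max instead of a sum. Second, links can be grouped into power-up batches whose combined current step stays within a voltage regulator limit, taking links with the most pending operations first. The greedy batching is a hypothetical policy, and a link whose inrush alone exceeds the limit still gets its own batch.

```python
from typing import List, Tuple

def wake_latency(link_up_us: float, component_wake_us: float, out_of_band: bool) -> float:
    if out_of_band:
        return max(link_up_us, component_wake_us)   # wake in parallel with link power-up
    return link_up_us + component_wake_us           # wake request waits for the link

def power_up_order(links: List[Tuple[str, float, int]], max_step_amps: float) -> List[List[str]]:
    """links: (name, inrush current in A, pending ops). Returns power-up batches."""
    batches: List[List[str]] = []
    current: List[str] = []
    load = 0.0
    for name, amps, _ in sorted(links, key=lambda link: -link[2]):
        if current and load + amps > max_step_amps:
            batches.append(current)
            current, load = [], 0.0
        current.append(name)
        load += amps
    if current:
        batches.append(current)
    return batches
```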

Type: Grant

Filed: May 18, 2020

Date of Patent: March 22, 2022

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Benjamin Tsien, Michael J. Tresidder, Ivan Yanfeng Wang, Kevin M. Lepak, Ann Ling, Richard M. Born, John P. Petry, Bryan P. Broussard, Eric Christopher Morton
Dynamic banking and bit separation in memories

Patent number: 11281592

Abstract: Memories that are configurable to operate in either a banked mode or a bit-separated mode. The memories include a plurality of memory banks; multiplexing circuitry; input circuitry; and output circuitry. The input circuitry inputs at least a portion of a memory address and configuration information to the multiplexing circuitry. The multiplexing circuitry generates read data by combining a selected subset of data corresponding to the address from each of the plurality of memory banks, the subset selected based on the configuration information, if the configuration information indicates a bit-separated mode. The multiplexing circuitry generates the read data by combining data corresponding to the address from one of the memory banks, the one of the memory banks selected based on the configuration information, if the configuration information indicates a banked mode. The output circuitry outputs the generated read data from the memory.
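
The two read modes can be modeled in software. In banked mode the configuration selects one bank and its whole word is returned; in bit-separated mode an equal-width slice is taken from every bank (slice position chosen by the configuration) and the slices are concatenated. The equal-width slicing is an assumption for illustration.

```python
from typing import List

class ConfigurableMemory:
    def __init__(self, banks: List[List[int]], word_bits: int) -> None:
        self.banks = banks          # banks[b][addr] -> word of word_bits bits
        self.word_bits = word_bits

    def read(self, addr: int, mode: str, select: int) -> int:
        if mode == "banked":
            # Whole word from the single bank chosen by the configuration.
            return self.banks[select][addr]
        if mode == "bit_separated":
            # One slice of word_bits / num_banks bits from every bank,
            # slice chosen by `select`, concatenated bank 0 first (low bits).
            width = self.word_bits // len(self.banks)
            mask = (1 << width) - 1
            out = 0
            for b, bank in enumerate(self.banks):
                piece = (bank[addr] >> (select * width)) & mask
                out |= piece << (b * width)
            return out
        raise ValueError(f"unknown mode: {mode}")

mem = ConfigurableMemory([[0xAB], [0xCD]], word_bits=8)
print(hex(mem.read(0, "banked", 1)))         # 0xcd
print(hex(mem.read(0, "bit_separated", 0)))  # 0xdb
```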

Type: Grant

Filed: November 11, 2019

Date of Patent: March 22, 2022

Assignee: Advanced Micro Devices, Inc.

Inventor: Russell J. Schreiber
Transfer of cachelines in a processing system based on transfer costs

Patent number: 11275688

Abstract: A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of cachelines between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a requested cacheline, the second cache selects the candidate first cache having the shortest total path from the second cache to the candidate first cache and from the candidate first cache to the compute unit issuing a request for the requested cacheline.
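
The selection rule is a minimum over candidates of dist(second cache → candidate) + dist(candidate → requester). The hop-count table below is hypothetical.

```python
from typing import Dict, List, Tuple

def pick_source(candidates: List[str], requester: str, l2: str,
                dist: Dict[Tuple[str, str], int]) -> str:
    """Choose the first-level cache holding a valid copy with the shortest total path."""
    return min(candidates, key=lambda c: dist[(l2, c)] + dist[(c, requester)])

dist = {("L2", "cu0"): 1, ("cu0", "cu3"): 5,   # total 6
        ("L2", "cu1"): 3, ("cu1", "cu3"): 1}   # total 4
print(pick_source(["cu0", "cu1"], "cu3", "L2", dist))  # cu1
```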

Type: Grant

Filed: December 2, 2019

Date of Patent: March 15, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Sriram Srinivasan, John Kelley, Matthew Schoenwald
Method for matrix data broadcast in parallel processing

Patent number: 11275612

Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than sending multiple read accesses to a memory subsystem for the shared data, the system generates a single access request. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.
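
Collapsing per-unit reads into broadcast requests can be sketched as building one request per distinct address, carrying a bitmask of the compute units that should receive the data. The (address, mask) request format is a hypothetical stand-in for "information to identify the multiple compute units".

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_requests(assignments: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """assignments: compute unit -> addresses of the data it needs.
    Returns one (address, cu_mask) request per distinct address;
    bit i of cu_mask is set for each compute unit i receiving the broadcast."""
    masks: Dict[int, int] = defaultdict(int)
    for cu, addrs in assignments.items():
        for a in addrs:
            masks[a] |= 1 << cu
    return sorted(masks.items())

# Six per-unit reads collapse into three requests.
reqs = build_requests({0: [0x100, 0x200], 1: [0x100, 0x300], 2: [0x100]})
print([(hex(a), bin(m)) for a, m in reqs])
# [('0x100', '0b111'), ('0x200', '0b1'), ('0x300', '0b10')]
```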

Type: Grant

Filed: December 20, 2019

Date of Patent: March 15, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Li Peng, Jian Yang, Chi Tang
