Patents Assigned to Advanced Micro Devices, Inc.
-
Publication number: 20220101110
Abstract: Techniques are disclosed for performing machine learning operations. The techniques include fetching weights for a first layer in a first format; performing matrix multiplication of the weights fetched in the first format with values provided by a prior layer in a forwards training pass; fetching the weights for the first layer in a second format different from the first format; and performing matrix multiplication for a backwards pass, the matrix multiplication including multiplication of the weights fetched in the second format with values corresponding to values provided as the result of the forwards training pass for the first layer.
Type: Application
Filed: September 25, 2020
Publication date: March 31, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Swapnil P. Sakharshete, Maxim V. Kazakov
-
Patent number: 11288205
Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.
Type: Grant
Filed: June 23, 2015
Date of Patent: March 29, 2022
Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
Inventors: Benjamin T. Sander, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Mike Mantor
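As a rough illustration of how software might combine the two logs described in this abstract, the Python sketch below counts cache misses per virtual page. The log layouts, the 4 KiB page size, and the name hot_virtual_pages are assumptions made for the example and are not taken from the patent.

```python
from collections import Counter

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def hot_virtual_pages(access_log, translation_log):
    """Count cache misses per virtual page.

    access_log: iterable of physical addresses that missed in the cache.
    translation_log: dict mapping physical page number -> virtual page number
    (a hypothetical layout for the address translation log).
    """
    counts = Counter()
    for paddr in access_log:
        vpage = translation_log.get(paddr >> PAGE_SHIFT)
        if vpage is not None:  # skip misses with no logged translation
            counts[vpage] += 1
    return counts
```

A memory manager could, for instance, use such counts to decide which virtual pages to migrate; that policy is outside what the abstract states.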
-
Patent number: 11288095
Abstract: A technique for synchronizing workgroups is provided. The technique comprises detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
Type: Grant
Filed: September 30, 2019
Date of Patent: March 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
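The scheduling order described in this abstract can be sketched with a small set of prioritized queues. This is a minimal Python illustration only: the class name ReadyQueues, the convention that index 0 is the highest priority, and the way a priority is assigned to a workgroup are all assumptions, since the abstract does not specify how synchronization status maps to a queue.

```python
from collections import deque


class ReadyQueues:
    """Hypothetical model of prioritized ready queues for workgroups."""

    def __init__(self, num_priorities):
        # Assumption: queue 0 has the highest priority.
        self.queues = [deque() for _ in range(num_priorities)]

    def mark_ready(self, workgroup, priority):
        # The caller picks the queue, e.g. from the workgroup's sync status.
        self.queues[priority].append(workgroup)

    def schedule(self, free_slots):
        # Drain higher-priority queues first until resources run out.
        scheduled = []
        for queue in self.queues:
            while queue and len(scheduled) < free_slots:
                scheduled.append(queue.popleft())
        return scheduled
```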
-
Patent number: 11289131
Abstract: Systems, apparatuses, and methods for implementing dynamic control of a multi-region fabric are disclosed. A system includes at least one or more processing units, one or more memory devices, and a communication fabric coupled to the processing unit(s) and memory device(s). The system partitions the fabric into multiple regions based on different traffic types and/or periodicities of the clients connected to the regions. For example, the system partitions the fabric into a stutter region for predictable, periodic clients and a non-stutter region for unpredictable, non-periodic clients. The system power-gates the entirety of the fabric in response to detecting a low activity condition. After power-gating the entirety of the fabric, the system periodically wakes up one or more stutter regions while keeping the other non-stutter regions in power-gated mode. Each stutter region monitors stutter client(s) for activity and processes any requests before going back into power-gated mode.
Type: Grant
Filed: December 7, 2020
Date of Patent: March 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Tsien, Alexander J. Branover, Alan Dodson Smith, Chintan S. Patel
-
Patent number: 11290515
Abstract: Systems, apparatuses, and methods for implementing real-time, low-latency packetization protocols for live compressed video data are disclosed. A wireless transmitter includes at least a codec and a media access control (MAC) layer unit. In order for the codec to communicate with the MAC layer unit, the codec encodes the compression ratio in a header embedded inside the encoded video stream. The MAC layer unit extracts the compression ratio from the header and determines a modulation coding scheme (MCS) for transferring the video stream based on the compression ratio. The MAC layer unit and the codec also implement a feedback loop such that the MAC layer unit can command the codec to adjust the compression ratio. Since the changes to the video might not be implemented immediately, the MAC layer unit relies on the header to determine when the video data is coming in with the requested compression ratio.
Type: Grant
Filed: December 7, 2017
Date of Patent: March 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Ngoc Vinh Vu, Darren Rae Di Cera, Adam William Lynch, Shane Bentley, Douglas Mammoser, David Robert Stark, Jr.
-
Publication number: 20220091974
Abstract: A processing device and methods of controlling remote persistent writes are provided. Methods include receiving an instruction of a program to issue a persistent write to remote memory. The methods also include logging an entry in a local domain when the persistent write instruction is received and providing a first indication that the persistent write will be persisted to the remote memory. The methods also include executing the persistent write to the remote memory and providing a second indication that the persistent write to the remote memory is completed. The methods also include providing the first and second indications when it is determined not to execute the persistent write according to global ordering and providing the second indication without providing the first indication when it is determined to execute the persistent write to remote memory according to global ordering.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Nuwan Jayasena, Shaizeen Aga
-
Publication number: 20220091921
Abstract: A data processor provides memory commands to a memory channel according to predetermined criteria. The data processor includes a first error code generation circuit, a second error code generation circuit, and a queue. The first error code generation circuit generates a first type of error code in response to data of a write request. The second error code generation circuit generates a second type of error code for the write request, the second type of error code different from the first type of error code. The queue is coupled to the first error code generation circuit and to the second error code generation circuit, for providing write commands to an interface, the write commands including the data, the first type of error code, and the second type of error code.
Type: Application
Filed: December 7, 2021
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Kedarnath Balakrishnan, James R. Magro, Kevin Michael Lepak, Vilas Sridharan
-
Publication number: 20220091784
Abstract: A memory controller includes a command queue having a first input for receiving memory access requests, and a memory interface queue having an output for coupling to a memory channel adapted for connecting to at least one dynamic random access memory (DRAM) module. A refresh control circuit monitors activate commands to be sent over the memory channel. In response to an activate command meeting a designated condition, the refresh control circuit identifies a candidate aggressor row associated with the activate command. A command is sent to the DRAM requesting that the candidate aggressor row be queued for mitigation in a future refresh or refresh management event.
Type: Application
Filed: September 21, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventor: Kevin M. Brandl
-
Publication number: 20220091822
Abstract: A multiply-accumulate computation is performed using digital logic circuits. To perform the computation, a plurality of target signals are received at a respective plurality of ripple counters. The counter outputs of the respective ripple counters are scaled by setting stop count values. Counter outputs of the respective ripple counters are adjusted with respective constant values by setting counter reset values at the respective ripple counters. Each ripple counter counts pulses of the target signals for an adjusted period. The count values of the ripple counters are summed. The results may be used to calculate an average value for an adaptive voltage and frequency scaling process.
Type: Application
Filed: September 22, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Ravinder Reddy Rachala, Stephen Victor Kosonocky, Miguel Rodriguez
-
Publication number: 20220092001
Abstract: Described is a method and apparatus for application migration between a dockable device and a docking station in a seamless manner. The dockable device includes a processor and the docking station includes a high-performance processor. The method includes determining a docking state of a dockable device while at least one application is running. Application migration from the dockable device to a docking station is initiated when the dockable device is moving to a docked state. Application migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run during the application migration from the dockable device to the docking station or during the application migration from the docking station to the dockable device.
Type: Application
Filed: December 6, 2021
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Jonathan Lawrence Campbell, Yuping Shen
-
Publication number: 20220091880
Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
-
Patent number: 11284096
Abstract: A host processor, such as a central processing unit (CPU), programmed to execute a software driver that causes the host processor to generate a motion compensation command for a plurality of cores of a massively parallel processor, such as a graphics processing unit (GPU), to provide motion compensation for encoded video. The motion compensation command for the plurality of cores of the massively parallel processor contains executable instructions for processing a plurality of motion vectors grouped by a plurality of prediction modes from a re-ordered motion vector buffer by the plurality of cores of the massively parallel processor.
Type: Grant
Filed: February 25, 2021
Date of Patent: March 22, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Michael L. Schmit, Ashish Farmer, Radhakrishna Giduthuri
-
Patent number: 11281466
Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
Type: Grant
Filed: October 22, 2019
Date of Patent: March 22, 2022
Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
Inventors: Arun A. Nair, Michael Estlick, Erik Swanson, Sneha V. Desai, Donglin Ji
-
Patent number: 11281470
Abstract: A processing device is provided which comprises memory and a processor. The processor is configured to receive an array of floating point numbers each having a plurality of bits used to represent a probability value. For each floating point number, the processor is configured to replace values in a portion of the bits used to represent the probability value with index values to represent an index corresponding to a location of a corresponding floating point number in the memory. The processor is also configured to process the floating point numbers using SIMD instructions to execute one of an argmax operation and an argmin operation.
Type: Grant
Filed: December 19, 2019
Date of Patent: March 22, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Michael L. Schmit, Lakshmi Kumar
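The index-packing idea in this abstract can be illustrated in scalar Python. The sketch below is an assumption-laden illustration rather than the patented implementation: it assumes 32-bit IEEE 754 floats, non-negative probability values (so the raw bit patterns order the same way as the values), and that the index replaces the low mantissa bits. The names float_bits and packed_argmax are hypothetical, and the SIMD reduction is stood in for by Python's built-in max.

```python
import struct


def float_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def packed_argmax(probs, index_bits=8):
    """Argmax via a single max over values with the index packed in.

    Assumes every value is non-negative, so comparing bit patterns as
    unsigned integers matches comparing the floats themselves.
    """
    if len(probs) > (1 << index_bits):
        raise ValueError("too many elements for the chosen index width")
    mask = (1 << index_bits) - 1
    # Overwrite the low mantissa bits of each value with its position.
    packed = [(float_bits(p) & ~mask) | i for i, p in enumerate(probs)]
    # One max reduction now carries the winning index along with the value.
    return max(packed) & mask
```

The trade-off is precision: values that differ only in the overwritten low bits compare as equal, in which case this sketch returns the larger index.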
-
Patent number: 11283589
Abstract: Systems, apparatuses, and methods for implementing a deskewing method for a physical layer interface on a multi-chip module are disclosed. A circuit connected to a plurality of communication lanes trains each lane to synchronize a local clock of the lane with a corresponding global clock at a beginning of a timing window. Next, the circuit symbol rotates each lane by a single step responsive to determining that all of the plurality of lanes have an incorrect symbol alignment. Responsive to determining that some but not all of the plurality of lanes have a correct symbol alignment, the circuit symbol rotates lanes which have an incorrect symbol alignment by a single step. When the end of the timing window has been reached, the circuit symbol rotates lanes which have a correct symbol alignment and adjusts a phase of a corresponding global clock to compensate for missed symbol rotations.
Type: Grant
Filed: December 21, 2020
Date of Patent: March 22, 2022
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Varun Gupta, Milam Paraschou, Gerald R. Talbot, Gurunath Dollin, Damon Tohidi, Eric Ian Carpenter, Chad S. Gallun, Jeffrey Cooper, Hanwoo Cho, Thomas H. Likens, III, Scott F. Dow, Michael J. Tresidder
-
Patent number: 11281495
Abstract: A system and method for providing security of sensitive information within chips using SIMD micro-architecture are described. A command processor within a parallel data processing unit, such as a graphics processing unit (GPU), schedules commands across multiple compute units based on state information. When the command processor determines a rescheduling condition is satisfied, it causes the overwriting of at least a portion of data stored in each of the one or more local memories used by the multiple compute units. The command processor also stores in the secure memory a copy of state information associated with a given group of commands and later checks it to ensure corruption by a malicious or careless program is prevented.
Type: Grant
Filed: August 27, 2018
Date of Patent: March 22, 2022
Assignee: Advanced Micro Devices, Inc.
Inventor: Rex Eldon McCrary
-
Patent number: 11281280
Abstract: Systems, apparatuses, and methods for reducing chiplet interrupt latency are disclosed. A system includes one or more processing nodes, one or more memory devices, a communication fabric coupled to the processing unit(s) and memory device(s) via link interfaces, and a power management unit. The power management unit manages the power states of the various components and the link interfaces of the system. If the power management unit detects a request to wake up a given component, and the link interface to the given component is powered down, then the power management unit sends an out-of-band signal to wake up the given component in parallel with powering up the link interface. Also, when multiple link interfaces need to be powered up, the power management unit powers up the multiple link interfaces in an order which complies with voltage regulator load-step requirements while minimizing the latency of pending operations.
Type: Grant
Filed: May 18, 2020
Date of Patent: March 22, 2022
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Benjamin Tsien, Michael J. Tresidder, Ivan Yanfeng Wang, Kevin M. Lepak, Ann Ling, Richard M. Born, John P. Petry, Bryan P. Broussard, Eric Christopher Morton
-
Patent number: 11281592
Abstract: Memories that are configurable to operate in either a banked mode or a bit-separated mode. The memories include a plurality of memory banks; multiplexing circuitry; input circuitry; and output circuitry. The input circuitry inputs at least a portion of a memory address and configuration information to the multiplexing circuitry. The multiplexing circuitry generates read data by combining a selected subset of data corresponding to the address from each of the plurality of memory banks, the subset selected based on the configuration information, if the configuration information indicates a bit-separated mode. The multiplexing circuitry generates the read data by combining data corresponding to the address from one of the memory banks, the one of the memory banks selected based on the configuration information, if the configuration information indicates a banked mode. The output circuitry outputs the generated read data from the memory.
Type: Grant
Filed: November 11, 2019
Date of Patent: March 22, 2022
Assignee: Advanced Micro Devices, Inc.
Inventor: Russell J. Schreiber
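To make the two read modes in this abstract concrete, here is a small behavioral model in Python. It is only one possible reading: the 8-bit word width, the equal-sized bit slices, the concatenation order (bank 0 in the low bits), and the names WORD_BITS and read_word are assumptions chosen for illustration and do not come from the patent.

```python
WORD_BITS = 8  # assumption: each bank stores 8-bit words


def read_word(banks, address, mode, select):
    """Model the multiplexing step for one read.

    banks: list of lists, one list of words per memory bank.
    mode: "banked" or "bit-separated" (the configuration information).
    select: bank number in banked mode, slice number in bit-separated mode.
    """
    words = [bank[address] for bank in banks]
    if mode == "banked":
        # Banked mode: the whole word comes from one selected bank.
        return words[select]
    if mode != "bit-separated":
        raise ValueError("unknown mode: " + mode)
    # Bit-separated mode: take the same slice of bits from every bank
    # and concatenate the slices, bank 0 in the lowest bits (assumed).
    slice_bits = WORD_BITS // len(banks)
    slice_mask = (1 << slice_bits) - 1
    result = 0
    for i, word in enumerate(words):
        piece = (word >> (select * slice_bits)) & slice_mask
        result |= piece << (i * slice_bits)
    return result
```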
-
Patent number: 11275688
Abstract: A processing system includes a plurality of compute units, with each compute unit having an associated first cache of a plurality of first caches, and a second cache shared by the plurality of compute units. The second cache operates to manage transfers of cachelines between the first caches of the plurality of first caches such that when multiple candidate first caches contain a valid copy of a requested cacheline, the second cache selects the candidate first cache having the shortest total path from the second cache to the candidate first cache and from the candidate first cache to the compute unit issuing a request for the requested cacheline.
Type: Grant
Filed: December 2, 2019
Date of Patent: March 15, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Sriram Srinivasan, John Kelley, Matthew Schoenwald
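The selection rule in this abstract amounts to minimizing a two-leg path length. The Python sketch below illustrates that rule under assumed inputs: path lengths are given as hop counts in two lookup tables, and the names pick_source_cache, hops_from_l2, and hops_between are hypothetical.

```python
def pick_source_cache(candidates, requester, hops_from_l2, hops_between):
    """Pick the first-level cache that should supply the cacheline.

    candidates: caches holding a valid copy of the requested cacheline.
    hops_from_l2: dict cache -> path length from the shared second cache.
    hops_between: dict (cache, requester) -> path length from that cache
    to the compute unit that issued the request.
    """
    return min(
        candidates,
        key=lambda cache: hops_from_l2[cache] + hops_between[(cache, requester)],
    )
```

For example, a cache that is two hops from the shared cache but one hop from the requester (total three) would be chosen over one that is one hop from the shared cache but five hops from the requester (total six).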
-
Patent number: 11275612
Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcast.
Type: Grant
Filed: December 20, 2019
Date of Patent: March 15, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Li Peng, Jian Yang, Chi Tang
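The request-merging step in this abstract can be sketched as grouping reads by address. The Python below is a minimal illustration, not the patented mechanism: the request representation, the output dictionary keys, and the name coalesce_reads are assumptions.

```python
from collections import defaultdict


def coalesce_reads(requests):
    """Merge reads of the same address into one broadcast request.

    requests: iterable of (compute_unit_id, address) pairs.
    Returns one request per distinct address, listing every compute unit
    that should receive the data when it is broadcast.
    """
    targets = defaultdict(list)
    for compute_unit, address in requests:
        if compute_unit not in targets[address]:  # ignore repeated reads
            targets[address].append(compute_unit)
    return [
        {"address": address, "broadcast_to": units}
        for address, units in targets.items()
    ]
```

With three compute units reading the same address, this produces one request instead of three, which is the reduction in memory accesses the abstract describes.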