Patents Assigned to Advanced Micro Devices
-
Publication number: 20220103907
Abstract: Techniques are provided herein for processing video data. The techniques include identifying one or more input factors including one or more of signal quality factors, video content complexity factors, and hardware buffering factors for one or more of a video encoding system and a video playback system; evaluating the one or more input factors to determine adjustments to apply to one or both of the video encoding system and the video playback system; and applying the determined adjustments to the one or both of the video encoding system and the video playback system.
Type: Application
Filed: September 25, 2020
Publication date: March 31, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Adam H. Li, Eugene Kuznetsov, Girish P. Subramaniam, Jihyuk Choi
-
Publication number: 20220100662
Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
Type: Application
Filed: December 9, 2021
Publication date: March 31, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: John M. King, Gregory W. Smaus
-
Publication number: 20220101110
Abstract: Techniques are disclosed for performing machine learning operations. The techniques include fetching weights for a first layer in a first format; performing matrix multiplication of the weights fetched in the first format with values provided by a prior layer in a forwards training pass; fetching the weights for the first layer in a second format different from the first format; and performing matrix multiplication for a backwards pass, the matrix multiplication including multiplication of the weights fetched in the second format with values corresponding to values provided as the result of the forwards training pass for the first layer.
Type: Application
Filed: September 25, 2020
Publication date: March 31, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Swapnil P. Sakharshete, Maxim V. Kazakov
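The two-format idea in this abstract can be illustrated with a small sketch. The abstract does not say what the two formats are; the code below assumes, purely for illustration, that the first format is a row-major [in][out] layout and the second is the same weights stored pre-transposed as [out][in]. All function and variable names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


# Weights for one layer (2 inputs -> 3 outputs), first format: [in][out].
weights_fwd = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]

# Assumed second format: the same weights stored pre-transposed as [out][in],
# so the backward pass can fetch them without transposing on the fly.
weights_bwd = transpose(weights_fwd)


def forward(x):
    """Forward pass: activations from the prior layer times the weights."""
    return matmul(x, weights_fwd)


def backward(grad_out):
    """Backward pass: gradient w.r.t. the layer input, using the second format."""
    return matmul(grad_out, weights_bwd)
```

Under this assumption, `forward([[1.0, 1.0]])` multiplies against the first layout and `backward([[1.0, 0.0, 0.0]])` multiplies against the second; both read the same underlying weights.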
-
Publication number: 20220101179
Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.
Type: Application
Filed: September 25, 2020
Publication date: March 31, 2022
Applicant: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
-
Publication number: 20220100391
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
Type: Application
Filed: September 25, 2020
Publication date: March 31, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Michael W. LeBeane, Khaled Hamidouche, Hari S. Thangirala, Brandon Keith Potter
-
Patent number: 11288095
Abstract: A technique for synchronizing workgroups is provided. The technique comprises detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
Type: Grant
Filed: September 30, 2019
Date of Patent: March 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
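The queueing steps in this abstract can be sketched roughly as follows. This is a minimal software model, not the patented hardware: the class name, the synchronization status labels, and the numeric priorities are all hypothetical, and the only behavior taken from the abstract is that ready workgroups are queued by synchronization status and drained in queue-priority order when resources are free.

```python
from collections import deque


class WorkgroupScheduler:
    """Toy model: one ready queue per synchronization status, drained by priority."""

    def __init__(self, queue_priorities):
        # queue_priorities maps a (hypothetical) sync status to a priority;
        # a higher number means the queue is served first.
        self.priorities = dict(queue_priorities)
        self.ready = {status: deque() for status in self.priorities}

    def mark_ready(self, workgroup, sync_status):
        """Place a non-executing workgroup that became ready into its queue."""
        self.ready[sync_status].append(workgroup)

    def schedule(self, free_slots):
        """Pick up to free_slots workgroups, highest-priority queue first."""
        scheduled = []
        by_priority = sorted(self.priorities, key=self.priorities.get, reverse=True)
        for status in by_priority:
            queue = self.ready[status]
            while queue and len(scheduled) < free_slots:
                scheduled.append(queue.popleft())
        return scheduled
```

For example, with statuses `"holds_lock"` (priority 2), `"lock_released"` (priority 1), and `"no_sync"` (priority 0), a workgroup queued under `"holds_lock"` is scheduled ahead of earlier arrivals in the lower-priority queues.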
-
Patent number: 11288205
Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.
Type: Grant
Filed: June 23, 2015
Date of Patent: March 29, 2022
Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
Inventors: Benjamin T. Sander, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Mike Mantor
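One way software might combine the two logs is sketched below. The abstract only says the logs can be used for memory management; joining them to rank virtual pages by miss count is an assumed use, and the 4 KiB page size, the in-memory data structures, and every name here are hypothetical.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed page size, not stated in the abstract

access_log = []        # physical addresses of logged cache misses
translation_log = {}   # physical page number -> virtual page number


def record_miss(phys_addr):
    """Model the processor appending a cache-miss entry to the access log."""
    access_log.append(phys_addr)


def record_translation(virt_addr, phys_addr):
    """Model a page walk storing a physical-to-virtual mapping."""
    translation_log[phys_addr // PAGE_SIZE] = virt_addr // PAGE_SIZE


def hot_virtual_pages(top_n):
    """Join the two logs and rank virtual pages by number of logged misses."""
    counts = Counter()
    for phys_addr in access_log:
        vpn = translation_log.get(phys_addr // PAGE_SIZE)
        if vpn is not None:  # skip misses with no logged translation
            counts[vpn] += 1
    return counts.most_common(top_n)
```

A memory manager could use such a ranking to decide, for instance, which pages are worth migrating; that policy is outside what the abstract describes.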
-
Patent number: 11289131
Abstract: Systems, apparatuses, and methods for implementing dynamic control of a multi-region fabric are disclosed. A system includes at least one or more processing units, one or more memory devices, and a communication fabric coupled to the processing unit(s) and memory device(s). The system partitions the fabric into multiple regions based on different traffic types and/or periodicities of the clients connected to the regions. For example, the system partitions the fabric into a stutter region for predictable, periodic clients and a non-stutter region for unpredictable, non-periodic clients. The system power-gates the entirety of the fabric in response to detecting a low activity condition. After power-gating the entirety of the fabric, the system periodically wakes up one or more stutter regions while keeping the other non-stutter regions in power-gated mode. Each stutter region monitors stutter client(s) for activity and processes any requests before going back into power-gated mode.
Type: Grant
Filed: December 7, 2020
Date of Patent: March 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Tsien, Alexander J. Branover, Alan Dodson Smith, Chintan S. Patel
-
Patent number: 11290515
Abstract: Systems, apparatuses, and methods for implementing real-time, low-latency packetization protocols for live compressed video data are disclosed. A wireless transmitter includes at least a codec and a media access control (MAC) layer unit. In order for the codec to communicate with the MAC layer unit, the codec encodes the compression ratio in a header embedded inside the encoded video stream. The MAC layer unit extracts the compression ratio from the header and determines a modulation coding scheme (MCS) for transferring the video stream based on the compression ratio. The MAC layer unit and the codec also implement a feedback loop such that the MAC layer unit can command the codec to adjust the compression ratio. Since the changes to the video might not be implemented immediately, the MAC layer unit relies on the header to determine when the video data is coming in with the requested compression ratio.
Type: Grant
Filed: December 7, 2017
Date of Patent: March 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Ngoc Vinh Vu, Darren Rae Di Cera, Adam William Lynch, Shane Bentley, Douglas Mammoser, David Robert Stark, Jr.
-
Publication number: 20220091974
Abstract: A processing device and methods of controlling remote persistent writes are provided. Methods include receiving an instruction of a program to issue a persistent write to remote memory. The methods also include logging an entry in a local domain when the persistent write instruction is received and providing a first indication that the persistent write will be persisted to the remote memory. The methods also include executing the persistent write to the remote memory and providing a second indication that the persistent write to the remote memory is completed. The methods also include providing the first and second indications when it is determined not to execute the persistent write according to global ordering and providing the second indication without providing the first indication when it is determined to execute the persistent write to remote memory according to global ordering.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Nuwan Jayasena, Shaizeen Aga
-
Publication number: 20220091921
Abstract: A data processor provides memory commands to a memory channel according to predetermined criteria. The data processor includes a first error code generation circuit, a second error code generation circuit, and a queue. The first error code generation circuit generates a first type of error code in response to data of a write request. The second error code generation circuit generates a second type of error code for the write request, the second type of error code different from the first type of error code. The queue is coupled to the first error code generation circuit and to the second error code generation circuit, for providing write commands to an interface, the write commands including the data, the first type of error code, and the second type of error code.
Type: Application
Filed: December 7, 2021
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Kedarnath Balakrishnan, James R. Magro, Kevin Michael Lepak, Vilas Sridharan
-
Publication number: 20220091784
Abstract: A memory controller includes a command queue having a first input for receiving memory access requests, and a memory interface queue having an output for coupling to a memory channel adapted for connecting to at least one dynamic random access memory (DRAM) module. A refresh control circuit monitors activate commands to be sent over the memory channel. In response to an activate command meeting a designated condition, the refresh control circuit identifies a candidate aggressor row associated with the activate command. A command is sent to the DRAM requesting that the candidate aggressor row be queued for mitigation in a future refresh or refresh management event.
Type: Application
Filed: September 21, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventor: Kevin M. Brandl
-
Publication number: 20220092001
Abstract: Described is a method and apparatus for application migration between a dockable device and a docking station in a seamless manner. The dockable device includes a processor and the docking station includes a high-performance processor. The method includes determining a docking state of a dockable device while at least an application is running. Application migration from the dockable device to a docking station is initiated when the dockable device is moving to a docked state. Application migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run during the application migration from the dockable device to the docking station or during the application migration from the docking station to the dockable device.
Type: Application
Filed: December 6, 2021
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Jonathan Lawrence Campbell, Yuping Shen
-
Publication number: 20220091822
Abstract: A multiply-accumulate computation is performed using digital logic circuits. To perform the computation, a plurality of target signals are received at a respective plurality of ripple counters. The counter outputs of the respective ripple counters are scaled by setting stop count values. Counter outputs of the respective ripple counters are adjusted with respective constant values by setting counter reset values at the respective ripple counters. Each ripple counter counts pulses of its target signal for an adjusted period. The count values of the ripple counters are summed. The results may be used to calculate an average value for an adaptive voltage and frequency scaling process.
Type: Application
Filed: September 22, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Ravinder Reddy Rachala, Stephen Victor Kosonocky, Miguel Rodriguez
-
Publication number: 20220091880
Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Marcus Nathaniel Chow, Matthew D. Sinclair, Bradford M. Beckmann, David A. Wood
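The dispatch-ordering effect described in this abstract can be modeled with a small sketch. It is a software analogy only: the `KernelDispatch` class and its methods are hypothetical, and `prioritize` stands in for the effect of a workgroup dependency instruction executed by a workgroup of an earlier kernel dispatch.

```python
from collections import deque


class KernelDispatch:
    """Toy model of a kernel dispatch whose workgroups await dispatching."""

    def __init__(self, workgroup_ids):
        self.pending = deque(workgroup_ids)  # default dispatch order
        self.prioritized = deque()           # workgroups named by a dependency hint

    def prioritize(self, workgroup_id):
        """Model the effect of a dependency instruction naming this workgroup."""
        if workgroup_id in self.pending:
            self.pending.remove(workgroup_id)
            self.prioritized.append(workgroup_id)

    def dispatch_next(self):
        """Dispatch prioritized workgroups before any non-prioritized one."""
        if self.prioritized:
            return self.prioritized.popleft()
        if self.pending:
            return self.pending.popleft()
        return None
```

With workgroups 0 through 3 pending and a hint naming workgroup 2, the model dispatches 2 first and then falls back to the default order for the rest.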
-
Patent number: 11281466
Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
Type: Grant
Filed: October 22, 2019
Date of Patent: March 22, 2022
Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
Inventors: Arun A. Nair, Michael Estlick, Erik Swanson, Sneha V. Desai, Donglin Ji
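The renamer and free list mentioned in this abstract follow a widely used register-renaming pattern, which can be sketched generically as below. This is not the patented floating point unit: the `Renamer` class, its methods, and the register name `"xmm0"` in the usage note are hypothetical, and the NSQ and pickable scheduler queues are not modeled.

```python
from collections import deque


class Renamer:
    """Generic register renamer backed by a free list of physical register numbers."""

    def __init__(self, num_physical):
        # Initially every physical register file entry is available.
        self.free_list = deque(range(num_physical))
        self.rename_map = {}  # architectural register -> physical register number

    def rename_dest(self, arch_reg):
        """Allocate a physical register for a destination architectural register.

        Returns (new_prn, previous_prn); previous_prn can be released once the
        operation that overwrote it retires.
        """
        if not self.free_list:
            raise RuntimeError("no free physical registers; the pipeline would stall")
        prn = self.free_list.popleft()
        previous = self.rename_map.get(arch_reg)
        self.rename_map[arch_reg] = prn
        return prn, previous

    def release(self, prn):
        """Return a physical register number to the free list."""
        self.free_list.append(prn)
```

Renaming the destination `"xmm0"` twice hands out two different physical register numbers, and the first one returns to the free list only when `release` is called.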
-
Patent number: 11284096
Abstract: A host processor, such as a central processing unit (CPU), programmed to execute a software driver that causes the host processor to generate a motion compensation command for a plurality of cores of a massively parallel processor, such as a graphics processing unit (GPU), to provide motion compensation for encoded video. The motion compensation command for the plurality of cores of the massively parallel processor contains executable instructions for processing a plurality of motion vectors grouped by a plurality of prediction modes from a re-ordered motion vector buffer by the plurality of cores of the massively parallel processor.
Type: Grant
Filed: February 25, 2021
Date of Patent: March 22, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Michael L. Schmit, Ashish Farmer, Radhakrishna Giduthuri
-
Patent number: 11281495
Abstract: A system and method for providing security of sensitive information within chips using SIMD micro-architecture are described. A command processor within a parallel data processing unit, such as a graphics processing unit (GPU), schedules commands across multiple compute units based on state information. When the command processor determines a rescheduling condition is satisfied, it causes the overwriting of at least a portion of data stored in each of the one or more local memories used by the multiple compute units. The command processor also stores in the secure memory a copy of state information associated with a given group of commands and later checks it to ensure corruption by a malicious or careless program is prevented.
Type: Grant
Filed: August 27, 2018
Date of Patent: March 22, 2022
Assignee: Advanced Micro Devices, Inc.
Inventor: Rex Eldon McCrary
-
Patent number: 11281470
Abstract: A processing device is provided which comprises memory and a processor. The processor is configured to receive an array of floating point numbers each having a plurality of bits used to represent a probability value. For each floating point number, the processor is configured to replace values in a portion of the bits used to represent the probability value with index values to represent an index corresponding to a location of a corresponding floating point number in the memory. The processor is also configured to process the floating point numbers using SIMD instructions to execute one of an argmax operation and an argmin operation.
Type: Grant
Filed: December 19, 2019
Date of Patent: March 22, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Michael L. Schmit, Lakshmi Kumar
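The index-packing idea in this abstract can be sketched in scalar form. The code below assumes 32-bit IEEE-754 floats holding non-negative probabilities, and assumes the replaced bits are the lowest mantissa bits; neither detail is stated in the abstract. It relies on the fact that non-negative IEEE-754 bit patterns sort in the same order as their values, so a plain integer max or min over the packed words finds the winner and carries its index along. The patent describes doing this with SIMD instructions; this sketch uses ordinary Python, and all names are hypothetical.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE-754 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def pack_with_index(probs, index_bits):
    """Overwrite the low index_bits of each float's bit pattern with its index."""
    if len(probs) > (1 << index_bits):
        raise ValueError("index_bits too small for this many values")
    if any(p < 0 for p in probs):
        raise ValueError("bit-pattern ordering only holds for non-negative values")
    mask = (1 << index_bits) - 1
    return [(float_to_bits(p) & ~mask) | i for i, p in enumerate(probs)]


def packed_argmax(probs, index_bits=8):
    """Approximate argmax: max over packed words, index read from the low bits."""
    mask = (1 << index_bits) - 1
    return max(pack_with_index(probs, index_bits)) & mask


def packed_argmin(probs, index_bits=8):
    """Approximate argmin: min over packed words, index read from the low bits."""
    mask = (1 << index_bits) - 1
    return min(pack_with_index(probs, index_bits)) & mask
```

Because the low mantissa bits are discarded, two values that differ only in those bits are treated as equal, and the index bits then break the tie. The result is exact whenever the top values are separated by more than that lost precision.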
-
Patent number: 11281592
Abstract: Memories that are configurable to operate in either a banked mode or a bit-separated mode. The memories include a plurality of memory banks; multiplexing circuitry; input circuitry; and output circuitry. The input circuitry inputs at least a portion of a memory address and configuration information to the multiplexing circuitry. The multiplexing circuitry generates read data by combining a selected subset of data corresponding to the address from each of the plurality of memory banks, the subset selected based on the configuration information, if the configuration information indicates a bit-separated mode. The multiplexing circuitry generates the read data by combining data corresponding to the address from one of the memory banks, the one of the memory banks selected based on the configuration information, if the configuration information indicates a banked mode. The output circuitry outputs the generated read data from the memory.
Type: Grant
Filed: November 11, 2019
Date of Patent: March 22, 2022
Assignee: Advanced Micro Devices, Inc.
Inventor: Russell J. Schreiber