Patents Assigned to Advanced Micro Devices, Inc.
-
Patent number: 11232622
Abstract: An apparatus includes a command buffer configured to temporarily store commands. The apparatus also includes processing units disposed at a substrate. The processing units are configured to access a plurality of copies of a command from the command buffer. The processing units include first processing units (such as fixed-function hardware blocks) to perform geometry operations indicated by the command on a set of primitives. The geometry operations are performed concurrently by the first processing units. The processing units also include second processing units (such as shaders) to process mutually exclusive sets of pixels generated by rasterizing the set of primitives. The apparatus also includes a cache to temporarily store the pixels after shading by the shaders. The processing units stop or interrupt processing commands in response to detecting a synchronization point and resume processing the commands once all the processing units have completed the commands that precede the synchronization point.
Type: Grant
Filed: November 27, 2019
Date of Patent: January 25, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Skyler J. Saleh, Ruijin Wu
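A minimal Python sketch of the two ideas in this abstract, under stated assumptions: pixels are split into mutually exclusive sets by interleaving screen tiles across shader units, and the synchronization point is modeled as a hypothetical "SYNC" entry in each unit's copy of the command stream, enforced with a barrier. The tile size, the "SYNC" marker, and the function names are illustrative, not taken from the patent.

```python
import threading
from typing import List, Tuple

TILE = 8  # hypothetical tile edge length, in pixels

def owner(x: int, y: int, num_units: int, tile: int = TILE) -> int:
    """Shader unit that owns pixel (x, y). Tiles are interleaved across units,
    so every pixel belongs to exactly one unit (mutually exclusive sets)."""
    return (x // tile + y // tile) % num_units

def run(commands: List[str], num_units: int) -> List[Tuple[int, int]]:
    """Each unit walks its own copy of the command stream; "SYNC" is the
    synchronization point. Returns (unit, command index) in completion order."""
    barrier = threading.Barrier(num_units)
    lock = threading.Lock()
    log: List[Tuple[int, int]] = []

    def unit(uid: int) -> None:
        for i, cmd in enumerate(commands):
            if cmd == "SYNC":
                barrier.wait()  # stop until every unit has finished earlier commands
                continue
            with lock:          # stand-in for actually executing the command
                log.append((uid, i))

    threads = [threading.Thread(target=unit, args=(u,)) for u in range(num_units)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

With four units and the stream ["draw", "draw", "SYNC", "draw"], all eight pre-sync completions are logged before any unit executes the command after the synchronization point.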
-
Patent number: 11233510
Abstract: Systems, apparatuses, and methods for efficiently performing operations in a memory system are disclosed. A computing system includes one or more processing units and a memory for storing data. The memory includes multiple rows and columns for storing the data, with each intersection of a row and a column being a memory bit cell. The memory processes operations. For particular operations, two or more operands are accessed simultaneously to generate a result without first being read out and stored. Two indications are generated specifying at least a first row and a second row targeted by the operation. The memory generates a result by performing the operation on the stored value in each of the one or more cells in the first row and the respective stored value in the one or more cells in the second row.
Type: Grant
Filed: April 27, 2018
Date of Patent: January 25, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: John J. Wuu, Edward Chang
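As a rough illustration, the row-to-row operation can be modeled in Python as a bitwise function applied cell by cell to two stored rows, identified only by their row indices (the "two indications"). How the hardware senses both rows at once is not modeled; the class and method names are hypothetical.

```python
from typing import Callable, List

Row = List[int]  # one bit per memory bit cell

class BitArray:
    """Toy memory array that combines two stored rows without a read-out step."""

    def __init__(self, rows: int, cols: int):
        self.cells: List[Row] = [[0] * cols for _ in range(rows)]

    def write(self, r: int, bits: Row) -> None:
        self.cells[r] = list(bits)

    def row_op(self, first: int, second: int, op: Callable[[int, int], int]) -> Row:
        # The two row indices stand in for the indications naming the target rows;
        # the operation pairs each cell in the first row with the same column
        # in the second row.
        a, b = self.cells[first], self.cells[second]
        return [op(x, y) for x, y in zip(a, b)]
```

For example, writing [1, 1, 0, 0] and [1, 0, 1, 0] to rows 0 and 1 and calling `row_op(0, 1, lambda x, y: x & y)` yields [1, 0, 0, 0].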
-
Patent number: 11231962
Abstract: With the success of programming models such as OpenCL and CUDA, heterogeneous computing platforms are becoming mainstream. However, these heterogeneous systems are low-level, not composable, and their behavior is often implementation-defined even for standardized programming models. In contrast, the method and system embodiments for the heterogeneous parallel primitives (HPP) programming model disclosed herein provide a flexible and composable programming platform that guarantees behavior even in the case of developing high-performance code.
Type: Grant
Filed: October 30, 2017
Date of Patent: January 25, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Benedict R. Gaster, Lee W. Howes
-
Patent number: 11232847
Abstract: Methods, devices, and systems for testing a number of combinations of memory in a computer system. A modular memory device is installed in a memory channel in communication with a processor. The modular memory device includes a number of memory storage devices, which include a number of pins. A subset of the memory storage devices is selected. A subset of the pins that do not correspond to the selected memory storage devices and are not part of a memory map of the computer system is also selected, and each pin in that subset is configured with a termination impedance. The selected subset of memory storage devices is then tested.
Type: Grant
Filed: September 20, 2019
Date of Patent: January 25, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Glennis Eliagh Covington, Benjamin Lyle Winston, Santha Kumar Parameswaran, Shannon T. Kesner
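The selection logic can be sketched with plain set arithmetic: for each candidate subset of memory storage devices, the pins to terminate are those that belong to no selected device and are not in the memory map. The pin naming, the fixed subset size `k`, and the function name are assumptions for illustration only.

```python
from itertools import combinations
from typing import Dict, Iterator, Set, Tuple

def plan_tests(device_pins: Dict[str, Set[str]],
               memory_map_pins: Set[str],
               k: int) -> Iterator[Tuple[Tuple[str, ...], Set[str]]]:
    """Yield (devices_under_test, pins_to_terminate) for every k-device subset."""
    all_pins: Set[str] = set().union(*device_pins.values())
    for subset in combinations(sorted(device_pins), k):
        used = set().union(*(device_pins[d] for d in subset))
        # Terminate pins that serve no selected device and are outside the memory map.
        yield subset, all_pins - used - memory_map_pins
```

With devices d0 = {a, b}, d1 = {c}, d2 = {d, e} and pin e in the memory map, testing d0 alone terminates pins c and d.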
-
Patent number: 11231931
Abstract: A processor includes a first core and a second core to execute computer instructions. Each of the cores includes its own private memory cache and speculative load queue. The speculative load queue stores cachelines for the computer instructions and data when the core is operating in a speculative state with respect to a process or thread. The processor includes a state tracking buffer having a state field to store a speculative exclusive ownership state for each cacheline in the speculative load queue when present therein.
Type: Grant
Filed: December 20, 2018
Date of Patent: January 25, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventor: Sooraj Puthoor
-
Patent number: 11227651
Abstract: A read path for reading data from a memory includes a sense amplifier having data (SAT) and data complement (SAC) output nodes and a latch. The latch includes an input tri-state inverter including first and second PMOS transistors connected between VDD and an intermediate node, and first and second NMOS transistors connected between VSS and the intermediate node. A gate connection of the first PMOS and NMOS transistors is connected to the SAT node; a gate connection of the second PMOS transistor is connected to a sense amplifier enable complement input; and a gate connection of the second NMOS transistor is connected to a sense amplifier enable input. The latch also includes an output driver with an input connected to the intermediate node and an output connected to a data output node. The latch thus has two gate delays between the SAT node and the data output node.
Type: Grant
Filed: November 22, 2019
Date of Patent: January 18, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Arijit Banerjee, Russell Schreiber, Kyle Whittle
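The latch's logic behavior can be sketched at the switch level: the input stage drives the intermediate node to NOT SAT only when sense amplifier enable (SAE) is high and its complement is low, and otherwise floats; the output driver inverts again, giving the two gate delays. The model assumes the floating intermediate node simply keeps its last value (the abstract does not describe a retention mechanism), and the reset value and names are hypothetical.

```python
from typing import Optional

def tristate_inverter(sat: int, sae: int) -> Optional[int]:
    """Input stage. Returns the driven level of the intermediate node, or None
    when neither pull-up nor pull-down path conducts (high impedance)."""
    saeb = 1 - sae                        # sense amplifier enable complement
    pull_up = sat == 0 and saeb == 0      # both series PMOS on (gates low)
    pull_down = sat == 1 and sae == 1     # both series NMOS on (gates high)
    if pull_up:
        return 1
    if pull_down:
        return 0
    return None

class ReadLatch:
    def __init__(self) -> None:
        self.node = 1  # intermediate node; hypothetical reset value

    def step(self, sat: int, sae: int) -> int:
        driven = tristate_inverter(sat, sae)   # gate delay 1
        if driven is not None:
            self.node = driven                 # assumed to hold while floating
        return 1 - self.node                   # gate delay 2: output driver
```

While enabled, the data output follows SAT; once SAE drops, the output holds the last sensed value.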
-
Patent number: 11226819
Abstract: A processing unit includes a plurality of processing elements and one or more caches. A first thread executes a program that includes one or more prefetch instructions to prefetch information into a first cache. Prefetching is selectively enabled when executing the first thread on a first processing element, depending on whether one or more second threads previously executed the program on the first processing element. The first thread is then dispatched to execute the program on the first processing element. In some cases, a dispatcher receives the first thread for dispatching to the first processing element. The dispatcher modifies the prefetch instruction to disable prefetching into the first cache in response to the one or more second threads having previously executed the program on the first processing element.
Type: Grant
Filed: November 20, 2017
Date of Patent: January 18, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Brian Emberling, Michael Mantor
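A minimal Python sketch of the dispatcher behavior, assuming the dispatcher keeps a history of (processing element, program) pairs and that "disabling" a prefetch means clearing an enable flag on that instruction. The abstract does not say how long history is kept, so here it never expires; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Instr:
    op: str
    enabled: bool = True

@dataclass
class Dispatcher:
    # (processing element, program) pairs that have already run; never expires here
    history: Set[Tuple[int, str]] = field(default_factory=set)

    def dispatch(self, program: str, pe: int, instrs: List[Instr]) -> List[Instr]:
        """Return the instructions to run on `pe`, with prefetches disabled when an
        earlier thread already ran `program` there (its data is likely cached)."""
        warm = (pe, program) in self.history
        out = [Instr(i.op, i.enabled and not (warm and i.op == "prefetch"))
               for i in instrs]
        self.history.add((pe, program))
        return out
```

The first thread to run a program on an element keeps its prefetches; later threads on the same element skip them, while the same program on a different element still prefetches.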
-
Patent number: 11226900
Abstract: An approach for tracking data stored in caches uses a Bloom filter to reduce the number of addresses that need to be tracked by a coherence directory. When a requested address is determined to not be currently tracked by either the coherence directory or the Bloom filter, tracking of the address is initiated in the Bloom filter, but not in the coherence directory. Initiating tracking of the address in the Bloom filter includes setting hash bits in the Bloom filter so that subsequent requests for the address will “hit” the Bloom filter. When a requested address is determined to be tracked by the coherence directory, the Bloom filter is not used to track the address.
Type: Grant
Filed: January 29, 2020
Date of Patent: January 18, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Weon Taek Na, Yasuko Eckert, Mark H. Oskin, Gabriel H. Loh, William Louie Walker, Michael Warren Boyer
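The lookup order described above can be sketched in Python: check the coherence directory first, then the Bloom filter, and on a miss in both, start tracking the address in the Bloom filter only. The filter size, hash count, and the use of SHA-256 to derive bit positions are arbitrary choices for illustration; the abstract does not say when or whether an address moves into the directory, so that step is omitted.

```python
import hashlib
from typing import Dict, Iterator, Set

class BloomFilter:
    def __init__(self, bits: int = 1024, hashes: int = 3):
        self.bits = [False] * bits
        self.k = hashes

    def _positions(self, addr: int) -> Iterator[int]:
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(h[:4], "little") % len(self.bits)

    def add(self, addr: int) -> None:
        for p in self._positions(addr):
            self.bits[p] = True        # set the hash bits for this address

    def might_contain(self, addr: int) -> bool:
        return all(self.bits[p] for p in self._positions(addr))

class Tracker:
    def __init__(self) -> None:
        self.directory: Dict[int, Set[str]] = {}  # precise per-address tracking
        self.bloom = BloomFilter()

    def on_request(self, addr: int) -> str:
        if addr in self.directory:
            return "directory"         # tracked precisely; Bloom filter not used
        if self.bloom.might_contain(addr):
            return "bloom"             # later requests "hit" the filter
        self.bloom.add(addr)           # start tracking in the Bloom filter only
        return "new"
```

A Bloom filter can report false positives but never false negatives, which is why it can stand in for the directory for a large set of addresses at a small storage cost.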
-
Patent number: 11227214
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
Type: Grant
Filed: November 14, 2017
Date of Patent: January 18, 2022
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Sateesh Lagudu, Lei Zhang, Allen Rush
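The load-accumulate-write pattern can be sketched in pure Python. For brevity this assumes 1-D signals and blocking along the channel dimension only, whereas the patent describes 3D blocks; a dictionary stands in for external memory. Per-feature sums stay in an internal accumulator across channel blocks, so each feature's output is written exactly once.

```python
from typing import Dict, List, Optional

def conv1d(x: List[float], w: List[float]) -> List[float]:
    """'Valid' 1-D correlation of one input channel with one kernel."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]

def blocked_conv(inputs: List[List[float]],           # [channel][position]
                 weights: List[List[List[float]]],    # [feature][channel][tap]
                 channel_block: int,
                 external: Dict[int, List[float]]) -> int:
    """Load channels in blocks, keep per-feature sums on chip, write each feature once.
    Returns the number of block loads from the stand-in external memory."""
    accs: List[Optional[List[float]]] = [None] * len(weights)  # internal memory
    loads = 0
    for c0 in range(0, len(inputs), channel_block):
        block = inputs[c0:c0 + channel_block]         # one block from external memory
        loads += 1
        for f, wf in enumerate(weights):
            for c, x in enumerate(block, start=c0):
                part = conv1d(x, wf[c])
                acc = accs[f]
                accs[f] = part if acc is None else [a + p for a, p in zip(acc, part)]
    for f, acc in enumerate(accs):
        external[f] = acc or []                       # single write per feature
    return loads
```

With three channels and a block size of two, the input is loaded in two blocks, while the output for each feature reaches external memory in one write rather than once per channel.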
-
Publication number: 20220012933
Abstract: Techniques for performing shader operations are provided. The techniques include performing pixel shading at a shading rate defined by pixel shader variable rate shading ("VRS") data, and updating the pixel shader VRS data, which indicates one or more shading rates for one or more tiles, based on whether each of those tiles includes triangle edges, to generate updated VRS data.
Type: Application
Filed: September 23, 2021
Publication date: January 13, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Skyler Jonathon Saleh, Vineet Goel, Pazhani Pillai, Ruijin Wu, Christopher J. Brennan, Andrew S. Pomianowski
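The abstract does not say how the rates change, so the following Python sketch assumes one plausible policy: tiles that contain triangle edges are refined to full rate (one sample per pixel), and interior tiles are coarsened to at least a hypothetical `interior_rate`. Rates are written as (width, height) of the pixel block covered by one shading sample; detecting which tiles contain edges is taken as given.

```python
from typing import Dict, Set, Tuple

Tile = Tuple[int, int]  # (tile x, tile y)
Rate = Tuple[int, int]  # (width, height) of pixels covered by one shading sample

def update_vrs(rates: Dict[Tile, Rate],
               edge_tiles: Set[Tile],
               interior_rate: Rate = (2, 2)) -> Dict[Tile, Rate]:
    """Return updated per-tile shading rates (assumed policy, see lead-in)."""
    updated: Dict[Tile, Rate] = {}
    for tile, rate in rates.items():
        if tile in edge_tiles:
            updated[tile] = (1, 1)   # full rate where edges may alias
        else:
            # Interior tiles are coarsened to at least interior_rate.
            updated[tile] = (max(rate[0], interior_rate[0]),
                             max(rate[1], interior_rate[1]))
    return updated
```

Under this policy, a tile already shaded at 4x4 stays at 4x4, a 1x1 interior tile drops to 2x2, and any tile crossed by an edge returns to 1x1.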
-
Patent number: 11221772
Abstract: A system includes a memory system comprising a memory module and a processor adapted to access the memory module using a memory controller that includes a controller having an input for receiving a power state change request signal and an output for providing memory operations, and a memory operation array comprising a plurality of entries. Each entry includes a plurality of encoded fields. The memory operation array is programmable to store different sequences of commands for particular types of memory of a plurality of types of memory in the plurality of entries that initiate entry into and exit from supported low power modes for the particular types of memory. The controller is responsive to an activation of the power state change request signal to access the memory operation array to fetch at least one entry, and to issue at least one memory operation indicated by the at least one entry.
Type: Grant
Filed: January 7, 2019
Date of Patent: January 11, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin M. Brandl, Thomas H. Hamilton
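A small Python model of the programmable memory operation array: sequences are keyed by memory type and power-state transition, and a power state change request causes the controller to fetch the matching entries and issue them in order. The (command, argument) encoding and the command names in the usage note are illustrative placeholders, not the patent's field layout or any DRAM standard's exact sequence.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int]  # hypothetical encoding: (command, argument)

class PowerStateController:
    def __init__(self) -> None:
        # (memory type, transition) -> programmed command sequence
        self.array: Dict[Tuple[str, str], List[Entry]] = {}
        self.issued: List[Entry] = []   # stand-in for the memory command bus

    def program(self, mem_type: str, transition: str, entries: List[Entry]) -> None:
        """Reprogram the sequence used for one memory type and transition."""
        self.array[(mem_type, transition)] = list(entries)

    def on_power_state_change(self, mem_type: str, transition: str) -> None:
        """Fetch the programmed entries and issue the operations they encode."""
        for entry in self.array[(mem_type, transition)]:
            self.issued.append(entry)
```

Because the sequences live in a programmable array rather than fixed logic, supporting a new memory type only requires writing new entries, for example `program("LPDDR4", "enter", [("PREA", 0), ("WAIT", 10), ("SRE", 0)])`.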
-
Patent number: 11222685
Abstract: A memory controller interfaces with a dynamic random access memory (DRAM) over a memory channel. A refresh control circuit monitors an activate counter, which counts a rolling number of activate commands sent over the memory channel to a memory region of the DRAM. In response to the activate counter being above an intermediate management threshold value, the refresh control circuit issues a refresh management (RFM) command only if there is no refresh (REF) command currently held at the refresh command circuit for the memory region.
Type: Grant
Filed: May 15, 2020
Date of Patent: January 11, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin M. Brandl, Kedarnath Balakrishnan, Jing Wang, Guanhao Shen
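The decision rule reduces to a short predicate. This Python sketch assumes per-region state consisting of the rolling activate count and a flag for a queued REF; what happens to the counter after an RFM or REF is not described in the abstract and is not modeled. Names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RegionState:
    activates: int = 0          # rolling count of activates sent to this region
    ref_pending: bool = False   # a REF command is currently held for this region

def may_issue_rfm(state: RegionState, intermediate_threshold: int) -> bool:
    """Above the intermediate threshold, an RFM is permitted only when no REF
    is already held for the region (the pending REF will refresh it anyway)."""
    return state.activates > intermediate_threshold and not state.ref_pending
```

Skipping the RFM when a REF is already queued avoids spending command bandwidth on two refresh-type operations for the same region.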
-
Patent number: 11221902
Abstract: Error handling for resilient software includes: receiving data indicating a region of resilient memory; detecting an error associated with a region of memory; and, in response to the region of memory falling within the region of resilient memory, preventing an exception from being raised for the error by preventing the region of memory from being identified as including the error.
Type: Grant
Filed: December 16, 2019
Date of Patent: January 11, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Sudhanva Gurumurthi, Vilas Sridharan
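A minimal Python sketch of the error path, assuming software registers resilient regions as address ranges and that "identifying a region as including the error" means recording its address in a list of faulty locations. The exception type and method names are hypothetical.

```python
from typing import List, Tuple

class UncorrectableError(Exception):
    """Raised for memory errors outside any resilient region."""

class ErrorHandler:
    def __init__(self) -> None:
        self.resilient: List[Tuple[int, int]] = []  # [start, end) address ranges
        self.faulty: List[int] = []                  # addresses flagged as bad

    def register_resilient(self, start: int, size: int) -> None:
        self.resilient.append((start, start + size))

    def on_error(self, addr: int) -> None:
        if any(lo <= addr < hi for lo, hi in self.resilient):
            return  # tolerated: not marked faulty, no exception raised
        self.faulty.append(addr)
        raise UncorrectableError(f"uncorrectable memory error at {addr:#x}")
```

Software that can tolerate corrupted values in a buffer (for example, by recomputing them) can register that buffer as resilient and keep running instead of being interrupted.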
-
Patent number: 11223575
Abstract: Systems, apparatuses, and methods for efficient data transfer in a computing system are disclosed. A source generates packets to send across a communication fabric (or fabric) to a destination. The source generates partition enable signals for the partitions of payload data. The source negates an enable signal for a particular partition when the source determines the packet type indicates the particular partition should have an associated asserted enable signal in the packet, but the source also determines the particular partition includes a particular data pattern. Routing components of the fabric disable clock signals to storage elements assigned to store the particular partition. The destination inserts the particular data pattern for the particular partition in the payload data.
Type: Grant
Filed: December 23, 2019
Date of Patent: January 11, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Greggory D. Donley, Vydhyanathan Kalyanasundharam, Mark A. Silla, Ashwin Chincholi
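The source and destination sides can be sketched in Python, assuming the "particular data pattern" is an all-zero partition, a hypothetical 4-byte partition size, and a packet type for which every partition would normally be enabled. A negated enable means the partition carries no data, which is what lets the fabric clock-gate its storage.

```python
from typing import List, Optional, Tuple

PART = 4              # hypothetical partition size, in bytes
PATTERN = bytes(PART) # hypothetical "particular data pattern": all zeros

def encode(payload: bytes) -> Tuple[List[bool], List[Optional[bytes]]]:
    """Source side: negate the enable for partitions that match the pattern."""
    parts = [payload[i:i + PART] for i in range(0, len(payload), PART)]
    enables = [p != PATTERN for p in parts]
    # Disabled partitions carry no data; fabric storage for them can be clock-gated.
    data = [p if en else None for p, en in zip(parts, enables)]
    return enables, data

def decode(enables: List[bool], data: List[Optional[bytes]]) -> bytes:
    """Destination side: reinsert the pattern wherever the enable was negated."""
    return b"".join(d if en and d is not None else PATTERN
                    for en, d in zip(enables, data))
```

A 12-byte payload whose middle 4 bytes are zero travels with enables [True, False, True] and is reconstructed exactly at the destination.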
-
Patent number: 11216378
Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
Type: Grant
Filed: September 19, 2016
Date of Patent: January 4, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: John M. King, Gregory W. Smaus
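A simplified Python sketch of the per-core table, collapsing the patent's "series of states" into two hypothetical states: an address seen in a contended lock instruction, and one whose spin-loop load obtained the line in the exclusive state. The abstract does not describe when the core stops sending negative acknowledgments, so `clear` is a hypothetical stand-in for that step.

```python
from enum import Enum, auto
from typing import Dict

class LockState(Enum):
    CONTENDED = auto()   # address seen in a contended lock instruction
    EXCLUSIVE = auto()   # spin-loop load obtained the line in exclusive state

class Core:
    def __init__(self) -> None:
        self.table: Dict[int, LockState] = {}  # lock address contention table

    def note_contended(self, addr: int) -> None:
        self.table.setdefault(addr, LockState.CONTENDED)

    def spin_load(self, addr: int, got_exclusive: bool) -> None:
        if addr in self.table and got_exclusive:
            self.table[addr] = LockState.EXCLUSIVE

    def clear(self, addr: int) -> None:
        self.table.pop(addr, None)  # hypothetical: stop protecting this line

    def snoop(self, addr: int) -> str:
        """Response to another core's request for the semaphore's cache line."""
        if self.table.get(addr) is LockState.EXCLUSIVE:
            return "NAK"            # keep the line exclusive in this core
        return "ACK"
```

Refusing snoops while the line is held exclusively keeps the semaphore line from bouncing between cores during the acquire.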
-
Patent number: 11216373
Abstract: A memory controller may be configured with command logic that is capable of sending a memory access command having incomplete address information via a command/address bus that connects the memory controller to memory modules. The memory controller may send the memory access command via the bus for accessing data stored at memory locations of the memory modules. The memory locations may correspond to different near-memory generated addresses, reflecting that the data is not address-aligned across the memory modules. Nonetheless, because of the near-memory address generation, the memory controller can send a single memory access command having incomplete address information to access the data stored at the different addresses, rather than sending multiple memory access commands that each specify complete address information on the bus. This conserves available bus bandwidth, reduces power consumption, and increases compute throughput.
Type: Grant
Filed: May 29, 2020
Date of Patent: January 4, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Shaizeen Aga, Nuwan Jayasena, Johnathan Alsop
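One way to picture near-memory address generation, as a Python sketch: each module holds a per-module base (assumed to be configured ahead of time), and a single broadcast command carries only an index, from which every module completes its own address. The base-plus-index scheme and all names are assumptions, not the patent's address format.

```python
from typing import Dict, List

class MemoryModule:
    def __init__(self, base: int, data: Dict[int, int]):
        self.base = base   # per-module offset, assumed configured in advance
        self.data = data   # address -> stored value

    def read(self, index: int) -> int:
        # Near-memory address generation: complete the address locally.
        return self.data[self.base + index]

def broadcast_read(modules: List[MemoryModule], index: int) -> List[int]:
    """One command carrying only `index` (incomplete address) reaches every module,
    even though the resulting addresses differ from module to module."""
    return [m.read(index) for m in modules]
```

Two modules with bases 100 and 340 both respond to one command with index 1, reading addresses 101 and 341 respectively, instead of needing two fully addressed commands.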
-
Patent number: 11216052
Abstract: A processing unit includes a plurality of components configured to execute instructions and a controller. The controller is configured to determine a power consumption of the processing unit, determine a waiting status of the processing unit based on waiting statuses of components, and selectively modify an operating state of the processing unit based on the waiting status and the power consumption of the processing unit. In some cases, the operating state is modified in response to a percentage of the components that are waiting for an action to complete being below a threshold percentage and the power consumption of the processing unit being below a power limit. In some cases, the controller identifies a pattern in the power consumption by the processing unit and modifies the operating state of the processing unit to increase the power consumption of the processing unit based on the pattern identified by the controller.
Type: Grant
Filed: September 28, 2018
Date of Patent: January 4, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventor: Greg Sadowski
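The threshold condition can be sketched as a single Python function. It assumes the "operating state" is a clock frequency raised in fixed steps up to a cap; the 25% threshold, step size, and cap are hypothetical values, and the pattern-based variant in the last sentence is not modeled.

```python
from typing import List

def next_state(freq_mhz: int, waiting: List[bool], power_w: float,
               power_limit_w: float, wait_threshold: float = 0.25,
               step_mhz: int = 100, max_mhz: int = 2500) -> int:
    """Raise the clock only when few components are waiting and power has headroom."""
    frac_waiting = sum(waiting) / len(waiting)
    if frac_waiting < wait_threshold and power_w < power_limit_w:
        return min(freq_mhz + step_mhz, max_mhz)  # mostly busy and under the limit
    return freq_mhz                               # otherwise leave the state alone
```

The waiting check matters because a higher clock buys little when most components are stalled on an outside action, such as a memory access, even if there is power headroom.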
-
Patent number: 11216279
Abstract: A processor includes a prediction engine coupled to a training engine. The prediction engine includes a loop exit predictor. The training engine includes a loop exit branch monitor coupled to a loop detector. Based on at least one of a plurality of call return levels, the loop detector of the processor takes a snapshot of a retired predicted block during a first retirement time, compares the snapshot to a subsequent retired predicted block at a second retirement time, and based on the comparison, identifies a loop and loop exit branches within the loop for use by the loop exit branch monitor and the loop exit predictor to determine whether to override a general purpose conditional prediction.
Type: Grant
Filed: November 26, 2018
Date of Patent: January 4, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Anthony Jarvis, Thomas Clouqueur
-
Patent number: 11216250
Abstract: A method includes providing a set of one or more computational units implemented in a set of one or more field programmable gate array (FPGA) devices, where the set of one or more computational units is configured to generate a plurality of output values based on one or more input values. The method further includes, for each computational unit of the set of computational units, performing a first calculation in the computational unit using a first number representation, where a first output of the plurality of output values is based on the first calculation, determining a second number representation based on the first output value, and performing a second calculation in the computational unit using the second number representation, where a second output of the plurality of output values is based on the second calculation.
Type: Grant
Filed: December 6, 2017
Date of Patent: January 4, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Nicholas P. Malaya, Elliot H. Mednick
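As an illustration of choosing a second number representation from a first output, this Python sketch assumes the representations are signed 16-bit fixed-point formats that differ only in how many fractional bits they use. After a first calculation, the integer width is sized to the observed magnitude and the remaining bits go to the fraction. The word width and rule are assumptions, not the patent's method.

```python
import math

TOTAL_BITS = 16  # hypothetical word width of the FPGA datapath

def quantize(x: float, frac_bits: int) -> float:
    """Round x to signed fixed point with `frac_bits` fractional bits, saturating."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (TOTAL_BITS - 1)), (1 << (TOTAL_BITS - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

def choose_frac_bits(sample: float) -> int:
    """Give |sample| just enough integer bits (plus a sign bit); the rest is fraction."""
    int_bits = max(1, math.ceil(math.log2(abs(sample) + 1))) + 1
    return max(0, TOTAL_BITS - int_bits)
```

For x = 314.159, a first pass with 4 fractional bits gives 314.1875; sizing from that output selects 6 fractional bits, and the second pass gives 314.15625, roughly ten times closer to x without overflowing.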
-
Publication number: 20210406177
Abstract: A method of controlling a cache is disclosed. The method comprises receiving a request to allocate a portion of memory to store data. The method also comprises directly mapping the portion of memory to an assigned contiguous portion of the cache memory when the request includes a cache residency request that the data continuously reside in the cache memory. The method also comprises mapping the portion of memory to the cache memory using associative mapping when the request does not include such a cache residency request.
Type: Application
Filed: September 25, 2020
Publication date: December 30, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Chintan S. Patel, Vydhyanathan Kalyanasundharam, Benjamin Tsien
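A Python sketch of the two allocation paths, under simplifying assumptions: cache-resident allocations take the next contiguous run of cache lines from a reserved region and are mapped one-to-one, while other allocations fall back to ordinary set-associative placement. Keeping reserved lines out of associative use is not modeled, and the class and method names are hypothetical.

```python
from typing import Dict, Tuple

class Cache:
    def __init__(self, num_sets: int, ways: int, line: int = 64):
        self.num_sets, self.ways, self.line = num_sets, ways, line
        self.next_free = 0                   # next unused line in the resident region
        self.resident: Dict[int, int] = {}   # memory line number -> fixed cache slot

    def allocate(self, addr: int, size: int, resident: bool) -> str:
        if not resident:
            return "associative"             # ordinary set-associative placement
        first, last = addr // self.line, (addr + size - 1) // self.line
        n = last - first + 1
        if self.next_free + n > self.num_sets * self.ways:
            raise RuntimeError("no room left for cache-resident data")
        for k in range(n):                   # contiguous, one-to-one mapping
            self.resident[first + k] = self.next_free + k
        self.next_free += n
        return "direct"

    def lookup(self, addr: int) -> Tuple[str, int]:
        line = addr // self.line
        if line in self.resident:
            return ("direct", self.resident[line])   # exactly one possible slot
        return ("set", line % self.num_sets)         # any way within this set
```

Direct mapping gives resident data a fixed location that normal replacement never evicts, while everything else keeps the flexibility of associative placement.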