Patents Assigned to Advanced Micro Device, Inc.
-
Publication number: 20220122652Abstract: A memory controller interfaces with a random access memory over a memory channel. A refresh control circuit monitors an activate counter which counts a rolling number of activate commands sent over the memory channel to a memory region of the memory. In response to the activate counter being above an intermediate management threshold value, the refresh control circuit only issue a refresh management (RFM) command if there is no REF command currently held at the refresh command circuit for the memory region.Type: ApplicationFiled: December 29, 2021Publication date: April 21, 2022Applicant: Advanced Micro Devices, Inc.Inventors: Kevin M. Brandl, Kedarnath Balakrishnan, Jing Wang, Guanhao Shen
-
Patent number: 11307631Abstract: A processing unit includes compute units partitioned into one or islands that are provided with operating voltages and clock signals having clock frequencies independent of providing operating voltages or clock signals to other islands of compute units. The processing unit also includes dynamic voltage and frequency scaling (DVFS) hardware configured to compute one or more numbers of active memory barriers in the one or more islands. The DVFS hardware is also configured to modify the operating voltages or clock frequencies provided to the one or more islands in response to a change in numbers of active memory barriers in the one or more islands. In some cases, the operating voltage or clock frequency provided to an island is increased in response to the number of active memory barriers in the island decreasing. The operating voltage or clock frequency provided to the island is decreased in response to the number of active memory barriers in the island increasing.Type: GrantFiled: May 29, 2019Date of Patent: April 19, 2022Assignee: ADVANCED MICRO DEVICES, INC.Inventor: Vedula Venkata Srikant Bharadwaj
-
Patent number: 11309222Abstract: Various semiconductor chips with solder capped probe test pads are disclosed. In accordance with one aspect of the present invention, a semiconductor chip is provided that includes a substrate, plural input/output (I/O) structures on the substrate and plural test pads on the substrate. Each of the test pads includes a first conductor pad and a first solder cap on the first conductor pad.Type: GrantFiled: August 29, 2019Date of Patent: April 19, 2022Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Lei Fu, Milind S. Bhagavat, Chia-Hao Cheng
-
Patent number: 11308057Abstract: Described herein is a system and method for multiplexer tree (muxtree) indexing. Muxtree indexing performs hashing and row reduction in parallel by use of each select bit only once in a particular path of the muxtree. The muxtree indexing generates a different final index as compared to conventional hashed indexing but still results in a fair hash, where all table entries get used with equal distribution with uniformly random selects.Type: GrantFiled: November 28, 2017Date of Patent: April 19, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Steven R. Havlir, Patrick J. Shyvers
-
Patent number: 11309911Abstract: A data processing platform, method, and program product perform compression and decompression of a set of data items. Suffix data and a prefix are selected for each respective data item in the set of data items based on data content of the respective data item. The set of data items is sorted based on the prefixes. The prefixes are encoded by querying multiple encoding tables to create a code word containing compressed information representing values of all prefixes for the set of data items. The code word and suffix data for each of the data items are stored in memory. The code word is decompressed to recover the prefixes. The recovered prefixes are paired with their respective suffix data.Type: GrantFiled: August 16, 2019Date of Patent: April 19, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Alexander D. Breslow, Nuwan Jayasena, John Kalamatianos
-
Patent number: 11308648Abstract: Sampling circuitry independently accesses channels of texture data that represent a set of pixels. One or more processing units separately compress the channels of the texture data and store compressed data representative of the channels of the texture data for the set of pixels. The channels can include a red channel, a blue channel, and a green channel that represent color values of the set of pixels and an alpha channel that represents degrees of transparency of the set of pixels. Storing the compressed data can include writing the compress data to portions of a cache. The processing units can identify a subset of the set of pixels that share a value of a first channel of the plurality of channels and represent the value of the first channel over the subset of the set of pixels using information representing the value, the first channel, and boundaries of the subset.Type: GrantFiled: September 23, 2020Date of Patent: April 19, 2022Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULCInventors: Saurabh Sharma, Laurent Lefebvre, Sagar Shankar Bhandare, Ruijin Wu
-
Patent number: 11307993Abstract: For one or more stages of execution of a software application at a first processor, a remap vector of a second processor is reconfigured to represent a dynamic mapping of virtual address groups to physical address groups for that stage. Each bit position of the remap vector is configured to store a value indicating whether a corresponding virtual address group is actively mapped to a corresponding physical address group. Address translation operations issued during a stage of execution of the software application are selectively processed based on the configuration of the remap vector for that stage, with the particular value at the bit position of the remap vector associated with the corresponding virtual address group controlling whether processing of the address translation operation is continued to obtain a virtual-to-physical address translation sought by the address translation operation or processing of the address translation operation is ceased and a fault is issued.Type: GrantFiled: November 26, 2018Date of Patent: April 19, 2022Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULCInventors: Anthony Asaro, Richard E. George
-
Publication number: 20220114097Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.Type: ApplicationFiled: December 20, 2021Publication date: April 14, 2022Applicant: Advanced Micro Devices, Inc.Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
-
Patent number: 11301256Abstract: Embodiments disclose a system and method for reducing virtual address translation latency in a wide execution engine that implements virtual memory. One example method describes a method comprising receiving a wavefront, classifying the wavefront into a subset based on classification criteria selected to reduce virtual address translation latency associated with a memory support structure, and scheduling the wavefront for processing based on the classifying.Type: GrantFiled: August 22, 2014Date of Patent: April 12, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Lisa R. Hsu, James Michael O'Connor
-
Patent number: 11294747Abstract: A neural network runs a known input data set using an error free power setting and using an error prone power setting. The differences in the outputs of the neural network using the two different power settings determine a high level error rate associated with the output of the neural network using the error prone power setting. If the high level error rate is excessive, the error prone power setting is adjusted to reduce errors by changing voltage and/or clock frequency utilized by the neural network system. If the high level error rate is within bounds, the error prone power setting can remain allowing the neural network to operate with an acceptable error tolerance and improved efficiency. The error tolerance can be specified by the neural network application.Type: GrantFiled: January 31, 2018Date of Patent: April 5, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Andrew G. Kegel, David A. Roberts
-
Patent number: 11294678Abstract: Systems, apparatuses, and methods for implementing scheduler queue assignment logic are disclosed. A processor includes at least a decode unit, scheduler queue assignment logic, scheduler queues, pickers, and execution units. The assignment logic receives a plurality of operations from a decode unit in each clock cycle. The assignment logic includes a separate logical unit for each different type of operation which is executable by the different execution units of the processor. For each different type of operation, the assignment logic determines which of the possible assignment permutations are valid for assigning different numbers of operations to scheduler queues in a given clock cycle. The assignment logic receives an indication of how many operations to assign in the given clock cycle, and then the assignment logic selects one of the valid assignment permutations for the number of operations specified by the indication.Type: GrantFiled: May 29, 2018Date of Patent: April 5, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Matthew T. Sobel, Donald A. Priore, Alok Garg
-
Patent number: 11294724Abstract: An approach is provided for allocating a shared resource to threads in a multi-threaded microprocessor based upon the usefulness of the shared resource to each of the threads. The usefulness of a shared resource to a thread is determined based upon the number of entries in the shared resource that are allocated to the thread and the number of active entries that the thread has in the shared resource. Threads that are allocated a large number of entries in the shared resource and have a small number of active entries in the shared resource, indicative of a low level of parallelism, can operate efficiently with fewer entries in the shared resource, and have their allocation limit in the shared resource reduced.Type: GrantFiled: September 27, 2019Date of Patent: April 5, 2022Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Kai Troester, Neil Marketkar, Matthew T. Sobel, Srinivas Keshav
-
Patent number: 11294710Abstract: A processing system suspends execution of a program thread based on an access latency required for a program thread to access memory. The processing system employs different memory modules having different memory technologies, located at different points in the processing system, and the like, or a combination thereof. The different memory modules therefore have different access latencies for memory transactions (e.g., memory reads and writes). When a program thread issues a memory transaction that results in an access to a memory module having a relatively long access latency (referred to as “slow” memory), the processor suspends execution of the program thread and releases processor resources used by the program thread. When the processor receives a response to the memory transaction from the memory module, the processor resumes execution of the suspended program thread.Type: GrantFiled: November 10, 2017Date of Patent: April 5, 2022Assignee: Advanced Micro Devices, Inc.Inventor: Douglas Benson Hunt
-
Patent number: 11295507Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.Type: GrantFiled: November 6, 2020Date of Patent: April 5, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Mark Leather, Michael Mantor
-
Patent number: 11294810Abstract: A processing system includes an interconnect fabric coupleable to a local memory and at least one compute cluster coupled to the interconnect fabric. The compute cluster includes a processor core and a cache hierarchy. The cache hierarchy has a plurality of caches and a throttle controller configured to throttle a rate of memory requests issuable by the processor core based on at least one of an access latency metric and a prefetch accuracy metric. The access latency metric represents an average access latency for memory requests for the processor core and the prefetch accuracy metric represents an accuracy of a prefetcher of a cache of the cache hierarchy.Type: GrantFiled: December 12, 2017Date of Patent: April 5, 2022Assignee: Advanced Micro Devices, Inc.Inventors: William L. Walker, William E. Jones
-
Publication number: 20220100257Abstract: Systems, methods, devices, and computer-implemented instructions for processor power management implemented in a compiler. In some implementations, a characteristic of code is determined. An instruction based on the determined characteristic is inserted into the code. The code and inserted instruction are compiled to generate compiled code. The compiled code is output.Type: ApplicationFiled: September 25, 2020Publication date: March 31, 2022Applicant: Advanced Micro Devices, Inc.Inventors: Vedula Venkata Srikant Bharadwaj, Shomit N. Das, Anthony T. Gutierrez, Vignesh Adhinarayanan
-
Publication number: 20220103907Abstract: Techniques are provided herein for processing video data. The techniques include identifying one or more input factors including one or more of signal quality factors, video content complexity factors, and hardware buffering factors for one or more of a video encoding system and a video playback system; evaluating the one or more input factors to determine adjustments to apply to one or both of the video encoding system and the video playback system; and applying the determine adjustments to the one or both of the video encoding system and the video playback system.Type: ApplicationFiled: September 25, 2020Publication date: March 31, 2022Applicant: Advanced Micro Devices, Inc.Inventors: Adam H. Li, Eugene Kuznetsov, Girish P. Subramaniam, Jihyuk Choi
-
Publication number: 20220100662Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.Type: ApplicationFiled: December 9, 2021Publication date: March 31, 2022Applicant: Advanced Micro Devices, Inc.Inventors: John M. King, Gregory W. Smaus
-
Publication number: 20220100391Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.Type: ApplicationFiled: September 25, 2020Publication date: March 31, 2022Applicant: Advanced Micro Devices, Inc.Inventors: Michael W. LeBeane, Khaled Hamidouche, Hari S. Thangirala, Brandon Keith Potter
-
Publication number: 20220101179Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.Type: ApplicationFiled: September 25, 2020Publication date: March 31, 2022Applicant: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov