Patents Assigned to Advanced Micro Devices
-
Patent number: 11854138
Abstract: Described herein is a technique for modifying a bounding volume hierarchy. The technique includes combining preferred orientations of child nodes of a first bounding box node to generate a first preferred orientation; based on the first preferred orientation, converting one or more child nodes of the first bounding box node into one or more oriented bounding box nodes; combining preferred orientations of child nodes of a second bounding box node to generate a second preferred orientation; and based on the second preferred orientation, maintaining one or more children of the second bounding box node as non-oriented bounding box nodes.
Type: Grant
Filed: July 23, 2021
Date of Patent: December 26, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Matthäus G Chajdas, Michael A. Kern, David Ronald Oldcorn
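A minimal C++ sketch of the idea in this abstract, not the patented algorithm: each child carries a hypothetical "preferred orientation" angle, the parent combines them, and children are converted to oriented boxes only when they agree with the combined orientation. The `ChildNode` fields, the averaging rule, and the tolerance are all invented for illustration.

```cpp
// Illustrative sketch only: combine hypothetical per-child preferred
// orientations and convert children to oriented bounding boxes when they
// agree on a dominant orientation; otherwise keep them axis-aligned.
#include <cmath>
#include <cstdio>
#include <vector>

struct ChildNode {
    float preferredAngle;   // preferred orientation in radians (hypothetical)
    bool  oriented = false; // true if converted to an oriented bounding box
};

// Combine child orientations by averaging their unit direction vectors.
static float combineOrientations(const std::vector<ChildNode>& children) {
    float sx = 0.f, sy = 0.f;
    for (const auto& c : children) {
        sx += std::cos(c.preferredAngle);
        sy += std::sin(c.preferredAngle);
    }
    return std::atan2(sy, sx);
}

// Convert children to oriented boxes only if all agree with the combined
// orientation within a tolerance.
static void decideOrientation(std::vector<ChildNode>& children, float tolerance) {
    const float combined = combineOrientations(children);
    bool agree = true;
    for (const auto& c : children)
        agree = agree && std::fabs(c.preferredAngle - combined) < tolerance;
    for (auto& c : children)
        c.oriented = agree;
}

int main() {
    std::vector<ChildNode> box1 = {{0.50f}, {0.52f}, {0.49f}};  // similar orientations
    std::vector<ChildNode> box2 = {{0.10f}, {1.40f}, {2.90f}};  // divergent orientations
    decideOrientation(box1, 0.1f);
    decideOrientation(box2, 0.1f);
    std::printf("box1 children oriented: %d\n", box1[0].oriented);
    std::printf("box2 children oriented: %d\n", box2[0].oriented);
}
```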
-
Patent number: 11854652
Abstract: A sense amplifier is biased to reduce leakage current and equalize matched transistor bias during an idle state. A first read select transistor couples a true bit line and a sense amplifier true (SAT) signal line, and a second read select transistor couples a complement bit line and a sense amplifier complement (SAC) signal line. The SAT and SAC signal lines are precharged during a precharge state. An equalization circuit shorts the SAT and SAC signal lines during the precharge state. A differential sense amplifier circuit for latching the memory cell value is coupled to the SAT signal line and the SAC signal line. The precharge circuit and the differential sense amplifier circuit are turned off during a sleep state to cause the SAT and SAC signal lines to float. A sleep circuit shorts the SAT and SAC signal lines during the sleep state.
Type: Grant
Filed: November 10, 2022
Date of Patent: December 26, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Russell J. Schreiber, Ryan T. Freese, Eric W. Busta
-
Patent number: 11853734
Abstract: A processing system includes a compiler that automatically identifies sequences of instructions of tileable source code that can be replaced with tensor operations. The compiler generates enhanced code that replaces the identified sequences of instructions with tensor operations that invoke a special-purpose hardware accelerator. By automatically replacing instructions with tensor operations that invoke the special-purpose hardware accelerator, the compiler makes the performance improvements achievable through the special-purpose hardware accelerator available to programmers using high-level programming languages.
Type: Grant
Filed: May 10, 2022
Date of Patent: December 26, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Gregory P. Rodgers, Joseph L. Greathouse
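The following toy C++ sketch illustrates the general shape of such a rewrite, not the AMD compiler itself: a tiny invented IR is scanned for a pattern assumed to be a tileable matrix multiply, and matching operations are replaced with a single "tensor" operation that stands in for an accelerator call. The `IrOp` structure and pattern test are assumptions.

```cpp
// Toy illustration: walk a tiny invented IR and replace a recognized
// tileable loop-nest pattern with one tensor operation that would invoke
// a hardware accelerator.
#include <cstdio>
#include <string>
#include <vector>

struct IrOp {
    std::string kind;   // e.g. "loop_nest_3d", "add", "tensor_matmul"
    std::string detail; // hypothetical payload describing the computation
};

// Hypothetical pattern check: a 3-deep loop nest whose body is a
// multiply-accumulate is treated as a matrix multiply.
static bool isTileableMatmul(const IrOp& op) {
    return op.kind == "loop_nest_3d" && op.detail == "mul_add_accumulate";
}

static void replaceWithTensorOps(std::vector<IrOp>& program) {
    for (auto& op : program) {
        if (isTileableMatmul(op)) {
            op.kind = "tensor_matmul";            // single accelerator-backed op
            op.detail = "offload_to_accelerator";
        }
    }
}

int main() {
    std::vector<IrOp> program = {
        {"add", "scalar"},
        {"loop_nest_3d", "mul_add_accumulate"},
    };
    replaceWithTensorOps(program);
    for (const auto& op : program)
        std::printf("%s (%s)\n", op.kind.c_str(), op.detail.c_str());
}
```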
-
Patent number: 11854602
Abstract: A memory controller monitors memory commands selected for dispatch to the memory and sends commands controlling a read clock state. A memory includes a read clock circuit and a mode register. The read clock circuit has an output for providing a hybrid read clock signal in response to a clock signal and a read clock mode signal. The mode register provides the read clock mode signal in response to a read clock mode, wherein the read clock circuit provides the hybrid read clock signal as a free-running clock signal that toggles continuously when the read clock mode is a first mode, and as a strobe signal that is active only in response to the memory receiving a read command when the read clock mode is a second mode.
Type: Grant
Filed: June 27, 2022
Date of Patent: December 26, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Aaron John Nygren, Karthik Gopalakrishnan, Tsun Ho Liu
-
Patent number: 11854139
Abstract: A processing unit employs a hardware traversal engine to traverse an acceleration structure such as a ray tracing structure. The hardware traversal engine includes one or more memory modules to store state information and other data used for the structure traversal, and control logic to execute a traversal process based on the stored data and based on received information indicating a source node of the acceleration structure to be used for the traversal process. By employing a hardware traversal engine, the processing unit is able to execute the traversal process more quickly and efficiently, conserving processing resources and improving overall processing efficiency.
Type: Grant
Filed: December 28, 2021
Date of Patent: December 26, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Konstantin Igorevich Shkurko, Michael Mantor
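As a software analogue only, the sketch below shows the kind of stack-based traversal loop, starting from a given source node, that such an engine performs in hardware. The `Node` layout and the trivial `hits` test are invented; a real traversal would test the ray against each node's bounding volume.

```cpp
// Software analogue of acceleration-structure traversal, shown only to
// illustrate the work a hardware traversal engine performs.
#include <cstdio>
#include <vector>

struct Node {
    bool leaf;
    int  left, right;   // child indices for interior nodes
    int  primitive;     // primitive id for leaf nodes
};

// Hypothetical intersection test; a real engine tests the ray against the
// node's bounding volume here.
static bool hits(const Node&) { return true; }

static void traverse(const std::vector<Node>& nodes, int sourceNode) {
    std::vector<int> stack = {sourceNode};   // traversal state kept in memory
    while (!stack.empty()) {
        const int idx = stack.back();
        stack.pop_back();
        const Node& n = nodes[idx];
        if (!hits(n)) continue;
        if (n.leaf) {
            std::printf("intersect primitive %d\n", n.primitive);
        } else {
            stack.push_back(n.left);
            stack.push_back(n.right);
        }
    }
}

int main() {
    // Root (0) is interior; children 1 and 2 are leaves.
    std::vector<Node> nodes = {
        {false, 1, 2, -1}, {true, -1, -1, 7}, {true, -1, -1, 9},
    };
    traverse(nodes, /*sourceNode=*/0);
}
```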
-
Patent number: 11853111
Abstract: Methods and apparatuses control electrical current supplied to a plurality of processing units in a multi-processor system. Current usage information corresponding to each of the processing units is received by a controller to determine a threshold current for each of the processing units. The controller determines a frequency reduction action and an instructions-per-cycle (IPC) reduction action for each of the processing units based on the threshold current and regulates operations of the processing units based on the determined frequency and IPC reduction actions.
Type: Grant
Filed: September 8, 2022
Date of Patent: December 26, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Amitabh Mehra, Richard Martin Born, Sriram Srinivasan, Sneha Komatireddy, Michael L Golden, Xiuting Kaleen C. Man, Gokul Subramani Ramalingam Lakshmi Devi, Xiaojie He
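A simplified C++ sketch of the control loop described above, under stated assumptions: the per-core threshold, the 10% frequency reduction, and the 20% IPC reduction are invented values, and the real controller operates on hardware telemetry rather than a vector of structs.

```cpp
// Simplified per-core current controller: when reported current exceeds a
// threshold, apply a frequency reduction and an IPC reduction action.
#include <cstdio>
#include <vector>

struct CoreState {
    double currentAmps;   // reported current usage
    double freqGhz;       // current operating frequency
    double ipcLimit;      // fraction of peak issue rate allowed
};

static void regulate(std::vector<CoreState>& cores, double thresholdAmps) {
    for (auto& core : cores) {
        if (core.currentAmps > thresholdAmps) {
            core.freqGhz  *= 0.9;   // frequency reduction action (invented policy)
            core.ipcLimit *= 0.8;   // IPC reduction action (invented policy)
        }
    }
}

int main() {
    std::vector<CoreState> cores = {{12.0, 4.0, 1.0}, {7.5, 4.0, 1.0}};
    regulate(cores, /*thresholdAmps=*/10.0);
    for (const auto& c : cores)
        std::printf("%.2f GHz, ipc limit %.2f\n", c.freqGhz, c.ipcLimit);
}
```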
-
Patent number: 11853193
Abstract: An approach is provided for a program profiler to implement inverse performance driven program analysis, which enables a user to specify a desired optimization end state and receive instructions on how to implement the optimization end state. The program profiler accesses profile data from an execution of a plurality of tasks executed on a plurality of computing resources. The program profiler constructs a dependency graph based on the profile data. The program profiler causes a user interface to be presented that represents the profile data. The program profiler receives an input for a modification of one or more execution attributes of one or more target tasks. The program profiler determines that the modification is projected to improve a performance metric while maintaining a validity of the dependency graph. The program profiler presents, via the user interface, one or more steps to implement the modification.
Type: Grant
Filed: October 29, 2021
Date of Patent: December 26, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Budirijanto Purnomo, Chen Shen
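The short sketch below illustrates only the projection step under invented assumptions: tasks are listed in dependency (topological) order, a proposed modification shortens one target task, and the projected makespan is recomputed against the unchanged dependency edges. It is not the profiler's actual analysis.

```cpp
// Rough sketch of the projection idea: shorten one target task and
// recompute the schedule over the same dependency graph to see whether the
// performance metric (makespan) is projected to improve.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Task {
    double duration;
    std::vector<int> deps;   // indices of tasks this task depends on
};

// Earliest-finish schedule over tasks given in topological order.
static double makespan(const std::vector<Task>& tasks) {
    std::vector<double> finish(tasks.size(), 0.0);
    double total = 0.0;
    for (size_t i = 0; i < tasks.size(); ++i) {
        double start = 0.0;
        for (int d : tasks[i].deps) start = std::max(start, finish[d]);
        finish[i] = start + tasks[i].duration;
        total = std::max(total, finish[i]);
    }
    return total;
}

int main() {
    std::vector<Task> tasks = {{4.0, {}}, {3.0, {0}}, {2.0, {0, 1}}};
    const double before = makespan(tasks);
    tasks[1].duration = 1.0;   // proposed modification of a target task
    const double after = makespan(tasks);
    std::printf("projected makespan: %.1f -> %.1f\n", before, after);
}
```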
-
Publication number: 20230409336
Abstract: In accordance with described techniques for VLIW dynamic communication, an instruction that causes dynamic communication of data to at least one processing element of a very long instruction word (VLIW) machine is dispatched to a plurality of processing elements of the VLIW machine. A first count of data communications issued by the plurality of processing elements and a second count of data communications served by the plurality of processing elements are maintained. At least one additional instruction is determined for dispatch to the plurality of processing elements of the VLIW machine based on the first count and the second count. For example, an instruction that is independent of the dispatched instruction is determined for dispatch while the first count and the second count are unequal, and an instruction that is dependent on the dispatched instruction is determined for dispatch based on the first count and the second count being equal.
Type: Application
Filed: June 17, 2022
Publication date: December 21, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Sriseshan Srikanth, Karthik Ramu Sangaiah, Anthony Thomas Gutierrez, Vedula Venkata Srikant Bharadwaj, John Kalamatianos
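A minimal C++ sketch of the counter-based dispatch rule stated in the abstract: while the issued and served counts differ, only instructions independent of the outstanding communication are eligible; once the counts match, dependent instructions may be dispatched. The names and the enum are invented; the real mechanism lives in the VLIW dispatch hardware.

```cpp
// Sketch of the issued/served counter rule for choosing the next
// instruction to dispatch on a VLIW machine.
#include <cstdio>

struct CommCounters {
    unsigned issued = 0;   // data communications issued by the processing elements
    unsigned served = 0;   // data communications served by the processing elements
};

enum class NextDispatch { IndependentOnly, DependentAllowed };

static NextDispatch chooseNext(const CommCounters& c) {
    // Unequal counts mean communication is still outstanding.
    return (c.issued == c.served) ? NextDispatch::DependentAllowed
                                  : NextDispatch::IndependentOnly;
}

int main() {
    CommCounters c;
    c.issued = 3; c.served = 1;
    std::printf("dependent allowed: %d\n",
                chooseNext(c) == NextDispatch::DependentAllowed);
    c.served = 3;   // all outstanding communications now served
    std::printf("dependent allowed: %d\n",
                chooseNext(c) == NextDispatch::DependentAllowed);
}
```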
-
Publication number: 20230409337
Abstract: Devices and methods for partial sorting for coherence recovery are provided. The partial sorting is efficiently executed by utilizing existing hardware along the memory path (e.g., memory local to the compute unit). The devices include an accelerated processing device which comprises memory and a processor. The processor is, for example, a compute unit of a GPU which comprises a plurality of SIMD units and is configured to determine, for data entries each comprising a plurality of bits, a number of occurrences of different types of the data entries by storing the number of occurrences in one or more portions of the memory local to the processor, sort the data entries based on the determined number of occurrences stored in the one or more portions of the memory local to the processor, and execute the sorted data entries.
Type: Application
Filed: June 21, 2022
Publication date: December 21, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Matthäus G. Chajdas, Christopher J. Brennan
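A counting-sort style C++ sketch of the occurrence-count approach, not the GPU implementation: each entry is reduced to a small key, per-key occurrence counts are tallied in fast local storage (standing in for memory local to a compute unit), and the entries are then emitted grouped by key.

```cpp
// Counting-sort sketch: tally occurrences of each key, then emit entries
// grouped by key (a partial sort sufficient to recover coherence).
#include <array>
#include <cstdio>
#include <vector>

int main() {
    // Each "data entry" is reduced to a small key, e.g. a few of its bits.
    std::vector<unsigned> keys = {2, 0, 3, 2, 1, 2, 0, 3};

    std::array<unsigned, 4> counts{};        // per-key occurrence counts (local storage)
    for (unsigned k : keys) ++counts[k];

    // Emit entries ordered by key using the stored counts.
    for (unsigned key = 0; key < counts.size(); ++key)
        for (unsigned i = 0; i < counts[key]; ++i)
            std::printf("%u ", key);
    std::printf("\n");
}
```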
-
Publication number: 20230409982
Abstract: Methods, devices, and systems for emulating a compute kernel with an ANN. The compute kernel is executed on a processor, and it is determined whether the compute kernel is a hotspot kernel. If the compute kernel is a hotspot kernel, the compute kernel is emulated with an ANN, and the ANN is substituted for the compute kernel.
Type: Application
Filed: August 25, 2023
Publication date: December 21, 2023
Applicant: Advanced Micro Devices, Inc.
Inventor: Nicholas Malaya
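The following is only a structural sketch of the substitution step: a kernel judged to be a hotspot (by an invented invocation-count criterion) is routed through a stand-in emulator function in place of a real trained network. Both functions and the threshold are hypothetical.

```cpp
// Simplified illustration: if a kernel is identified as a hotspot, route
// subsequent calls through an "ANN emulator" stand-in; otherwise keep the
// original kernel.
#include <cstdio>
#include <functional>

static double exactKernel(double x) { return x * x; }
static double annEmulation(double x) { return x * x + 0.001; }  // approximate stand-in

int main() {
    const unsigned long long invocations = 1'500'000;
    const bool hotspot = invocations > 1'000'000;   // invented hotspot criterion

    // Substitute the emulator for the kernel when it is a hotspot.
    std::function<double(double)> kernel = hotspot ? &annEmulation : &exactKernel;
    std::printf("result: %f (emulated: %d)\n", kernel(3.0), hotspot);
}
```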
-
Publication number: 20230409232
Abstract: A method and apparatus for training data in a computer system includes reading data stored in a first memory address in a memory and writing it to a buffer. Training data is generated for transmission to the first memory address. The data is transmitted to the first memory address. Information relating to the training data is read from the first memory address and the stored data is read from the buffer and written to the memory area where the training data was transmitted.
Type: Application
Filed: June 21, 2022
Publication date: December 21, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Anwar Kashem, Craig Daniel Eaton, Pouya Najafi Ashtiani, Tsun Ho Liu
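A minimal C++ sketch of the save / train / restore flow described above, with an array standing in for the memory and an invented training pattern: the contents at the target address are preserved in a buffer, a training pattern is written and read back, and the saved data is then written back to the same location.

```cpp
// Sketch of the training flow: save existing data, write a training
// pattern, read back the training information, restore the original data.
#include <array>
#include <cstdio>

int main() {
    std::array<int, 4> memory = {10, 11, 12, 13};   // stands in for the memory
    const int addr = 2;                              // "first memory address"

    const int buffer = memory[addr];     // 1. save existing data to a buffer
    memory[addr] = 0x5A5A5A5A;           // 2. write a training pattern (invented)
    const int observed = memory[addr];   // 3. read back training information
    memory[addr] = buffer;               // 4. restore the saved data

    std::printf("trained with %#x, restored %d\n",
                static_cast<unsigned>(observed), memory[addr]);
}
```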
-
Publication number: 20230409868
Abstract: Activation scaled clipping layers for neural networks are described. An activation scaled clipping layer processes an output of a neuron in a neural network using a scaling parameter and a clipping parameter. The scaling parameter defines how numerical values are amplified relative to zero. The clipping parameter specifies a numerical threshold that causes the neuron output to be expressed as a value defined by the numerical threshold if the neuron output satisfies the numerical threshold. In some implementations, the scaling parameter is linear and treats numbers within a numerical range as being equivalent, such that any number in the range is scaled by a defined magnitude, regardless of value. Alternatively, the scaling parameter is nonlinear, which causes the activation scaled clipping layer to amplify numbers within a range by different magnitudes. Each scaling and clipping parameter is learnable during training of a machine learning model implementing the neural network.
Type: Application
Filed: June 20, 2022
Publication date: December 21, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Hai Xiao, Adam H Li, Harris Eleftherios Gasparakis
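A minimal numeric sketch of the linear case, assuming a symmetric clip: the neuron output is amplified by a scaling parameter and then clamped at the clipping threshold. The parameter values are invented; in the described layers both parameters would be learned during training.

```cpp
// Scaled-clipping activation sketch: scale the neuron output, then clamp it
// at the clipping threshold.
#include <algorithm>
#include <cstdio>

static float scaledClip(float x, float scale, float clip) {
    const float scaled = scale * x;             // linear scaling relative to zero
    return std::clamp(scaled, -clip, clip);     // clipping threshold (symmetric here)
}

int main() {
    const float scale = 2.0f, clip = 3.0f;      // invented, normally learnable
    for (float x : {-4.0f, -1.0f, 0.5f, 4.0f})
        std::printf("f(%.1f) = %.1f\n", x, scaledClip(x, scale, clip));
}
```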
-
Patent number: 11847048
Abstract: A processing device and methods of controlling remote persistent writes are provided. Methods include receiving an instruction of a program to issue a persistent write to remote memory. The methods also include logging an entry in a local domain when the persistent write instruction is received and providing a first indication that the persistent write will be persisted to the remote memory. The methods also include executing the persistent write to the remote memory and providing a second indication that the persistent write to the remote memory is completed. The methods also include providing the first and second indications when it is determined not to execute the persistent write according to global ordering and providing the second indication without providing the first indication when it is determined to execute the persistent write to remote memory according to global ordering.
Type: Grant
Filed: September 24, 2020
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Nuwan Jayasena, Shaizeen Aga
-
Patent number: 11847463
Abstract: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more of a plurality of lanes and a mask vector indicating which of the lanes are enabled and which are disabled for the operation. The execution pipeline executes the instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
Type: Grant
Filed: September 27, 2019
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Kai Troester, Scott Thomas Bingham, John M. King, Michael Estlick, Erik Swanson, Robert Weidner
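A conceptual C++ sketch of the two execution modes, with an exception standing in for the hardware memory fault: first a single wide load over the whole block is attempted; if it faults, the fallback performs one load per enabled lane so that disabled lanes can never cause a fault. The `wideLoad` helper and the forced fault are assumptions for illustration.

```cpp
// Two-mode masked load sketch: try one wide load; on a fault, fall back to
// per-lane loads for enabled lanes only.
#include <cstdio>
#include <stdexcept>
#include <vector>

// Hypothetical wide load: throws to model a memory fault on the block access.
static std::vector<int> wideLoad(const std::vector<int>& block, bool fault) {
    if (fault) throw std::runtime_error("memory fault");
    return block;
}

int main() {
    std::vector<int>  block = {1, 2, 3, 4};
    std::vector<bool> mask  = {true, false, true, false};  // enabled lanes
    std::vector<int>  lanes(block.size(), 0);

    try {
        lanes = wideLoad(block, /*fault=*/true);            // first execution mode
    } catch (const std::runtime_error&) {
        for (size_t i = 0; i < block.size(); ++i)           // second execution mode:
            if (mask[i]) lanes[i] = block[i];               // per-lane loads, enabled only
    }
    for (int v : lanes) std::printf("%d ", v);
    std::printf("\n");
}
```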
-
Patent number: 11847462
Abstract: A software-based instruction scoreboard indicates dependencies between closely-issued instructions issued to an arithmetic logic unit (ALU) pipeline. The software-based instruction scoreboard inserts one or more control words into the command stream between the dependent instructions, which is then executed by the ALU pipeline. The control words identify the instruction(s) upon which the dependent instructions depend (parent instructions) so that the GPU hardware can ensure that the ALU pipeline does not stall while the dependent instruction waits for results from the parent instruction.
Type: Grant
Filed: December 15, 2020
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventor: Brian Emberling
-
Patent number: 11847061
Abstract: A technical solution to the technical problem of how to support memory-centric operations on cached data uses a novel memory-centric memory operation that invokes write back functionality on cache controllers and memory controllers. The write back functionality enforces selective flushing of dirty, i.e., modified, cached data that is needed for memory-centric memory operations from caches to the completion level of the memory-centric memory operations, and updates the coherence state appropriately at each cache level. The technical solution ensures that commands to implement the selective cache flushing are ordered before the memory-centric memory operation at the completion level of the memory-centric memory operation.
Type: Grant
Filed: July 26, 2021
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Shaizeen Aga, Nuwan Jayasena, John Kalamatianos
-
Patent number: 11848269
Abstract: A system and method for creating layout for standard cells are described. In various implementations, a floating metal net in the metal zero layer of a standard cell is selected for conversion to a power rail. The metal zero layer is a lowest metal layer above the gate region of a transistor. A semiconductor process (or process) forms a power rail in a metal zero track reserved for power rails. The process forms another power rail in a metal zero track reserved for floating metal nets, and electrically shorts the two power rails using a local interconnect layer between the two power rails. The charging and discharging times of a source region physically connected to the two power rails decrease.
Type: Grant
Filed: October 4, 2021
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Partha Pratim Ghosh, Pratap Kumar Das, Prasanth M
-
Patent number: 11847055
Abstract: A technical solution to the technical problem of how to reduce the undesirable side effects of offloading computations to memory uses read hints to preload results of memory-side processing into a processor-side cache. A cache controller, in response to identifying a read hint in a memory-side processing instruction, causes results of the memory-side processing to be preloaded into a processor-side cache. Implementations include, without limitation, enabling or disabling the preloading based upon cache thrashing levels, preloading results, or portions of results, of memory-side processing to particular destination caches, preloading results based upon priority and/or degree of confidence, and/or during periods of low data bus and/or command bus utilization, last stores considerations, and enforcing an ordering constraint to ensure that preloading occurs after memory-side processing results are complete.
Type: Grant
Filed: June 30, 2021
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Shaizeen Aga, Nuwan Jayasena
-
Patent number: 11847062
Abstract: In response to eviction of a first clean data block from an intermediate level of cache in a multi-cache hierarchy of a processing system, a cache controller accesses an address of the first clean data block. The controller initiates a fetch of the first clean data block from a system memory into a last-level cache using the accessed address.
Type: Grant
Filed: December 16, 2021
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Tarun Nakra, Jay Fleischman, Gautam Tarasingh Hazari, Akhil Arunkumar, William L. Walker, Gabriel H. Loh, John Kalamatianos, Marko Scrbak
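A minimal C++ sketch of that eviction hook, with a set standing in for the last-level cache and invented type names: when a clean block leaves the intermediate cache, its address is used to fetch the same block from memory into the last-level cache so a later access can still hit on-chip.

```cpp
// Eviction-hook sketch: on clean eviction from an intermediate cache, fetch
// the block from memory into the last-level cache using its address.
#include <cstdio>
#include <unordered_set>

struct LastLevelCache {
    std::unordered_set<unsigned long long> lines;   // resident block addresses
    void fetchFromMemory(unsigned long long addr) { lines.insert(addr); }
    bool contains(unsigned long long addr) const { return lines.count(addr) != 0; }
};

static void onCleanEviction(unsigned long long addr, LastLevelCache& llc) {
    // The controller reads the evicted block's address and initiates the fetch.
    llc.fetchFromMemory(addr);
}

int main() {
    LastLevelCache llc;
    onCleanEviction(0x1000, llc);   // clean block 0x1000 evicted from an intermediate cache
    std::printf("LLC hit for 0x1000: %d\n", llc.contains(0x1000));
}
```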
-
Publication number: 20230401159
Abstract: A method and system for providing memory in a computer system. The method includes receiving a memory access request for a shared memory address from a processor, mapping the received memory access request to at least one virtual memory pool to produce a mapping result, and providing the mapping result to the processor.
Type: Application
Filed: August 24, 2023
Publication date: December 14, 2023
Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
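A minimal C++ sketch of the request flow only: a shared memory address is mapped onto one of several virtual memory pools and the mapping result is handed back to the requester. The interleaved pool layout, pool count, and pool size are invented for illustration and are not taken from the publication.

```cpp
// Map a shared memory address to a virtual memory pool and return the
// mapping result (pool index plus offset within the pool).
#include <cstdio>

struct MappingResult {
    unsigned pool;                  // which virtual memory pool
    unsigned long long offset;      // offset within that pool
};

static MappingResult mapRequest(unsigned long long sharedAddr,
                                unsigned poolCount,
                                unsigned long long poolSize) {
    // Invented interleaving: consecutive pool-sized chunks rotate across pools.
    return {static_cast<unsigned>((sharedAddr / poolSize) % poolCount),
            sharedAddr % poolSize};
}

int main() {
    const MappingResult r = mapRequest(0x12345678, /*poolCount=*/4, /*poolSize=*/1ull << 20);
    std::printf("pool %u, offset %#llx\n", r.pool, r.offset);
}
```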