Patents Assigned to Advanced Micro Devices, Inc.
-
Patent number: 11948000
Abstract: Systems, apparatuses, and methods for performing command buffer gang submission are disclosed. A system includes at least first and second processors and a memory. The first processor (e.g., CPU) generates a command buffer and stores the command buffer in the memory. A mechanism is implemented where a granularity of work provided to the second processor (e.g., GPU) is increased which, in turn, increases the opportunities for parallel work. In gang submission mode, the user-mode driver (UMD) specifies a set of multiple queues and command buffers to execute on those multiple queues, and that work is guaranteed to execute as a single unit from the GPU operating system scheduler point of view. Using gang submission, synchronization between command buffers executing on multiple queues in the same submit is safe. This opens up optimization opportunities for application use (explicit gang submission) and for internal driver use (implicit gang submission).
Type: Grant
Filed: March 31, 2021
Date of Patent: April 2, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Mitchell Howard Singer, Derrick Trevor Owens
-
Patent number: 11947473
Abstract: Systems, apparatuses, and methods for implementing duplicated registers for access by initiators across multiple semiconductor dies are disclosed. A system includes multiple initiators on multiple semiconductor dies of a chiplet processor. One of the semiconductor dies is the master die, and this master die has copies of registers which can be accessed by the multiple initiators on the multiple semiconductor dies. When a given initiator on a given secondary die generates a register access, the register access is routed to the master die and a particular duplicate copy of the register maintained for the given secondary die. From the point of view of software, the multiple semiconductor dies appear as a single die, and the multiple initiators appear as a single initiator. Multiple types of registers can be maintained by the master die, with a flush register being one of the register types.
Type: Grant
Filed: October 12, 2021
Date of Patent: April 2, 2024
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Haikun Dong, Kostantinos Danny Christidis, Ling-Ling Wang, MinHua Wu, Gaojian Cong, Rui Wang
-
Patent number: 11948251
Abstract: A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive launch time interval of the domain shader and a hull shader latency. In some cases, the throttling circuitry includes a first counter that is incremented in response to launching a thread group from the buffer and a second counter that modifies the first counter based on a measured latency of the domain shader.
Type: Grant
Filed: October 26, 2022
Date of Patent: April 2, 2024
Assignee: Advanced Micro Devices, Inc.
Inventor: Nishank Pathak
-
Patent number: 11947456
Abstract: Techniques for invalidating cache lines are provided. The techniques include issuing, to a first level of a memory hierarchy, a weak exclusive read request for a speculatively executing store instruction; determining whether to invalidate one or more cache lines associated with the store instruction in one or more memories; and issuing the weak invalidation request to additional levels of the memory hierarchy.
Type: Grant
Filed: September 30, 2021
Date of Patent: April 2, 2024
Assignee: Advanced Micro Devices, Inc.
Inventor: Paul J. Moyer
-
Publication number: 20240104685
Abstract: Devices and methods of tiled rendering are provided which comprise dividing a frame to be rendered into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles, interleaving execution of same subpasses of multiple tiles of the frame by executing one or more subpasses as skip operations, storing visibility data, for subsequently ordered subpasses of the tiles, at memory addresses allocated for data of corresponding adjacent tiles in a first direction of traversal and rendering the tiles for the subsequently ordered subpasses using the visibility data stored at the memory addresses allocated for corresponding adjacent tiles in a second direction of traversal, opposite the first direction of traversal.
Type: Application
Filed: September 28, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Ruijin Wu, Michael John Livesley, Kiia Kallio, Jan H. Achrenius, Mika Tuomi
-
Publication number: 20240104023
Abstract: A/D bit storage, processing, and mode management techniques through use of a dense A/D bit representation are described. In one example, a memory management unit employs an A/D bit representation generation module to generate the dense A/D bit representation. In an implementation, the A/D bit representation is stored adjacent to existing page table structures of the multilevel page table hierarchy. In another example, the memory management unit supports use of modes as part of A/D bit storage.
Type: Application
Filed: September 26, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventor: William A. Moyes
-
Publication number: 20240106782
Abstract: In accordance with described techniques for filtered responses to memory operation messages, a computing system or computing device includes a memory system that receives memory operation messages. A filter component in the memory system receives the responses to the memory operation messages, and filters one or more of the responses based on a filterable condition. A tracking logic component tracks the one or more responses as filtered responses for communication completion.
Type: Application
Filed: September 28, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Johnathan Robert Alsop, Shaizeen Dilawarhusen Aga, Mohamed Assem Abd ElMohsen Ibrahim
-
Publication number: 20240103897
Abstract: Systems and methods are disclosed for managing diversified virtual memory by an engine. Techniques disclosed include receiving one or more request messages, each request message including a job descriptor that specifies an operation to be performed on a respective virtual memory space, processing the job descriptors by generating one or more commands for transmission to one or more virtual memory managers, and transmitting the one or more commands to the one or more virtual memory managers (VMMs) for processing.
Type: Application
Filed: September 27, 2022
Publication date: March 28, 2024
Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Norman Vernon Douglas Stewart, Mihir Shaileshbhai Doctor, Omar Fakhri Ahmed
-
Publication number: 20240104844
Abstract: Devices and methods for multi-resolution geometric representation for ray tracing are described which include casting a ray in a space comprising objects represented by geometric shapes and approximating a volume of the geometric shapes using an accelerated hierarchy structure. The accelerated hierarchy structure comprises first nodes each representing a volume of one of the geometric shapes in the space and second nodes each representing an approximate volume of a group of the geometric shapes. When the ray is determined to intersect a bounding box of a second node representing one group of the geometric shapes, a selection is made between traversal and non-traversal of other second nodes based on a LOD for representing the volume of the one group of geometric shapes.
Type: Application
Filed: September 28, 2022
Publication date: March 28, 2024
Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Sho Ikeda, Paritosh Vijay Kulkarni, Takahiro Harada
-
Publication number: 20240103719
Abstract: Generating optimization instructions for data processing pipelines is described. A pipeline optimization system computes resource usage information that describes memory and compute usage metrics during execution of each stage of the data processing pipeline. The system additionally generates data storage information that describes how data output by each pipeline stage is utilized by other stages of the pipeline. The pipeline optimization system then generates the optimization instructions to control how memory operations are performed for a specific data processing pipeline during execution. In implementations, the optimization instructions cause a memory system to discard data (e.g., invalidate cache entries) without copying the discarded data to another storage location after the data is no longer needed by the pipeline. The optimization instructions alternatively or additionally control at least one of evicting, writing-back, or prefetching data to minimize latency during pipeline execution.
Type: Application
Filed: September 28, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventor: Harris Eleftherios Gasparakis
-
Publication number: 20240103739
Abstract: Multi-level cell memory management techniques are described. In one example, the memory controller is configured to control whether a single-level cell operation or a multi-level cell operation is to be used, using different mapping schemes. The single-level cell operation, for instance, is usable to store a data word using two states whereas the multi-level cell operation is usable to store the data word by also using an intermediate state. In order to store the data word using two states, the memory controller is configurable to separate the data word across two word lines in the physical memory. In an implementation, use of the different operations and corresponding mapping schemes by the memory controller alternates between adjacent word lines in physical memory.
Type: Application
Filed: September 25, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventor: SeyedMohammad SeyedzadehDelcheh
-
Publication number: 20240103745
Abstract: A memory controller coupled to a memory module receives both processing-in-memory (PIM) requests and memory requests from a host (e.g., a host processor). The memory controller issues PIM requests to one group of memory banks and concurrently issues memory requests to one or more other groups of memory banks. Accordingly, memory requests are performed on groups of memory banks that would otherwise be idle while PIM requests are performed on the one group of memory banks. Optionally, the memory controller coupled to the memory module also takes various actions when switching between operating in a PIM mode and a non-processing-in-memory mode to reduce or hide overhead when switching between the two modes.
Type: Application
Filed: September 28, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Niti Madan, Johnathan Robert Alsop, Alexandru Dutu, Mahzabeen Islam, Yasuko Eckert, Nuwan S Jayasena
-
Publication number: 20240103860
Abstract: Predicates for processing in memory are described. In accordance with the described techniques, a predicate instruction to compute a conditional value based on data stored in a memory is provided to a processing-in-memory component. A response that includes the conditional value computed by the processing-in-memory component is received, and the conditional value is stored in a predicate register. One or more conditional instructions are provided to the processing-in-memory component based on the conditional value stored in the predicate register.
Type: Application
Filed: September 26, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan S. Jayasena
-
Publication number: 20240103763
Abstract: In accordance with the described techniques for bank-level parallelism for processing in memory, a plurality of commands are received for execution by a processing in memory component embedded in a memory. The memory includes a first bank and a second bank. The plurality of commands include a first stream of commands which cause the processing in memory component to perform operations that access the first bank and a second stream of commands which cause the processing in memory component to perform operations that access the second bank. A next row of the first bank that is to be accessed by the processing in memory component is identified. Further, a precharge command is scheduled to close a first row of the first bank and an activate command is scheduled to open the next row of the first bank in parallel with execution of the second stream of commands.
Type: Application
Filed: September 27, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Mahzabeen Islam, Shaizeen Dilawarhusen Aga, Johnathan Robert Alsop, Mohamed Assem Abd ElMohsen Ibrahim, Nuwan S Jayasena
-
Publication number: 20240106438
Abstract: An integrated circuit includes a power supply monitor, a clock generator, and a divider. The power supply monitor is operable to provide a trigger signal in response to a power supply voltage dropping below a threshold voltage. The clock generator is operable to provide a first clock signal having a frequency dependent on a value of a frequency control word, and to change the frequency of the first clock signal over time using a native slope in response to a change in the frequency control word. The divider is responsive to an assertion of the trigger signal to divide a frequency of the first clock signal by a divide value to provide a second clock signal.
Type: Application
Filed: November 30, 2023
Publication date: March 28, 2024
Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Kaushik Mazumdar, Ashish Jain, Joyce Cheuk Wai Wong, Mikhail Rodionov
-
Publication number: 20240103879
Abstract: Block data load with transpose techniques are described. In one example, an input is received, at a control unit, specifying an instruction to load a block of data to at least one memory module using a transpose operation. Responsive to the receiving the input by the control unit, the block of data is caused to be loaded to the at least one memory module by transposing the block of data to form a transposed block of data and storing the transposed block of data in the at least one memory.
Type: Application
Filed: September 25, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Bin He, Michael John Mantor, Brian Emberling, Liang Huang, Chao Liu
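Illustrative note (not part of the publication record): the general idea of transposing a block while loading it can be sketched in a few lines of Python. This is only a software analogy under stated assumptions; the function name `load_block_transposed` and the dictionary standing in for a memory module are hypothetical and do not reflect the claimed hardware.

```python
from typing import Dict, List, Sequence


def load_block_transposed(block: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return a transposed copy of a rectangular block (rows become columns).

    Hypothetical helper: a software stand-in for a "load with transpose"
    instruction, not the mechanism described in the publication.
    """
    if not block:
        return []
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("block must be rectangular")
    # zip(*block) yields the columns of the block in order.
    return [list(column) for column in zip(*block)]


# Assumption: a plain dict models the destination memory module.
memory: Dict[str, List[List[int]]] = {}
memory["tile0"] = load_block_transposed([[1, 2, 3], [4, 5, 6]])
```

Here a 2x3 block is stored as a 3x2 block, so a consumer that wants column-major access can read it row by row.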
-
Publication number: 20240106813
Abstract: A method and system for distributing keys in a key distribution system includes receiving a connection for communication from a first component. A determination is made whether the first component requires a key be generated and distributed. Based upon a security mode for the communication, the key is generated and distributed to the first component.
Type: Application
Filed: September 28, 2022
Publication date: March 28, 2024
Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Norman Vernon Douglas Stewart, Mihir Shaileshbhai Doctor, Omar Fakhri Ahmed, Hemaprabhu Jayanna, John Traver
-
Publication number: 20240104015
Abstract: In accordance with the described techniques for data compression and decompression for processing in memory, a page address is received by a processing in memory component that maps to a first location in memory where data of a page is maintained. The data of the page is compressed by the processing in memory component. Further, compressed data of the page is written by the processing in memory component to a compressed block device responsive to the compressed data satisfying one or more compressibility criteria. The compressed block device is a portion of the memory dedicated to storing data in a compressed form.
Type: Application
Filed: September 26, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Kishore Punniyamurthy, Jagadish B Kotra
-
Publication number: 20240103730
Abstract: In accordance with described techniques for reduction of parallel memory operation messages, a computing system or computing device includes a memory system that receives memory operation messages. A shared response component in the memory system receives responses to the memory operation messages, and identifies a set of the responses that are coalesceable. The shared response component then coalesces the set of the responses into a combined message for communication completion through a communication path in the memory system.
Type: Application
Filed: September 28, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Johnathan Robert Alsop, Shaizeen Dilawarhusen Aga, Mohamed Assem Abd ElMohsen Ibrahim
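Illustrative note (not part of the publication record): coalescing responses into combined messages can be pictured with a small Python sketch. The abstract does not say which responses count as coalesceable, so the rule used here (same requester and same response kind) is purely an assumption, as are the names `Response` and `coalesce`.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Response(NamedTuple):
    request_id: int
    requester: str  # hypothetical: who is waiting for the response
    kind: str       # hypothetical: e.g. "ack"


def coalesce(responses: List[Response]) -> List[Tuple[str, str, Tuple[int, ...]]]:
    """Group responses by (requester, kind) and emit one combined message per group.

    Assumed coalescing rule for illustration only; the publication may
    define coalesceability differently.
    """
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for response in responses:
        groups[(response.requester, response.kind)].append(response.request_id)
    # Dicts preserve insertion order, so groups appear in first-seen order.
    return [(requester, kind, tuple(ids)) for (requester, kind), ids in groups.items()]
```

With this rule, two acknowledgements bound for the same requester travel as one combined message instead of two, which is the traffic reduction the abstract is after.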
-
Patent number: 11941723
Abstract: Systems, methods, and techniques dynamically utilize load balancing for workgroup assignments between a group of shader engines by a command processor of a graphics processing unit (GPU). Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units. Each workgroup of the plurality of workgroups is dynamically assigned to a respective shader engine for execution based at least in part on indications of available resources respectively associated with each of the shader engines. In various embodiments, the indications of available resources may include physical parameters regarding each shader engine, as well as current status information regarding the processing of workgroups assigned to each shader engine.
Type: Grant
Filed: December 29, 2021
Date of Patent: March 26, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Randy Ramsey, Yash Ukidave
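Illustrative note (not part of the patent record): one simple way to picture resource-aware workgroup assignment is a greedy scheduler that always picks the least-loaded engine. The sketch below is an assumption-laden Python analogy, not the patented method; `ShaderEngine`, `assign_workgroups`, and the load metric (pending workgroups per active compute unit) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShaderEngine:
    name: str
    active_compute_units: int    # "physical parameter" (assumed to be > 0)
    pending_workgroups: int = 0  # "current status" (assumed metric)

    def load(self) -> float:
        # Hypothetical load metric: queued work relative to capacity.
        return self.pending_workgroups / self.active_compute_units


def assign_workgroups(workgroups: List[str], engines: List[ShaderEngine]) -> Dict[str, str]:
    """Greedily assign each workgroup to the engine with the lowest current load."""
    assignment: Dict[str, str] = {}
    for workgroup in workgroups:
        # min() returns the first engine among ties, so order breaks ties.
        target = min(engines, key=lambda engine: engine.load())
        target.pending_workgroups += 1
        assignment[workgroup] = target.name
    return assignment
```

Because the load is normalized by active compute units, an engine with more active units absorbs proportionally more workgroups before it stops being the preferred target.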