Patents Assigned to Advanced Micro Device, Inc.
-
Publication number: 20240112297Abstract: Methods and devices are provided for processing image data on a sub-frame portion basis using layers of a convolutional neural network. The processing device comprises memory and a processor. The processor is configured to determine, for an input tile of an image, a receptive field via backward propagation and determine a size of the input tile based on the receptive field and an amount of local memory allocated to store data for the input tile. The processor determines whether the amount of local memory allocated to store the data of the input tile and padded data for the receptive field.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Tung Chuen Kwong, Ying Liu, Akila Subramaniam
-
Publication number: 20240112747Abstract: A memory controller includes a first arbiter for selecting memory commands for dispatch to a memory over a first channel, a second arbiter for selecting memory commands for dispatch to the memory over a second channel, and a test circuit. The test circuit generates a respective testing sequence of read commands and write commands for each of the first channel and second channel, and causes the testing sequences to be transmitted over the first and second channels at least partially overlapping in time without selection by the first or second arbiters.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Tahsin Askar, Naveen Davanam, Kedarnath Balakrishnan, Kevin M. Brandl, James R. Magro
-
Publication number: 20240111622Abstract: A disclosed method can include (i) reporting, by a microcontroller, detection of a violation of a physical infrastructure constraint to a machine check architecture, (ii) triggering, by the machine check architecture in response to the reporting, a machine-check exception such that the violation of the physical infrastructure constraint is recorded, and (iii) performing a corrective action based on the triggering of the machine-check exception. Various other apparatuses, systems, and methods are also disclosed.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Advanced Micro Devices, IncInventors: Siddharth K. Shah, Vilas Sridharan, Amitabh Mehra, Anil Harwani, William Fischofer
-
Publication number: 20240111421Abstract: Connection modification based on traffic pattern is described. In accordance with the described techniques, a traffic pattern of memory operations across a set of connections between at least one device and at least one memory is monitored. The traffic pattern is then compared to a threshold traffic pattern condition, such as an amount of data traffic in different directions across the connections. A traffic direction of at least one connection of the set of connections is modified based on the traffic pattern corresponding to the threshold traffic pattern condition.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Nathaniel Morris, Kevin Yu-Cheng Cheng, Atul Kumar Sujayendra Sandur, Sergey Blagodurov
-
Publication number: 20240111355Abstract: Methods and systems are disclosed for reducing power consumption by a system including a digital unit and an optical unit. Techniques disclosed comprise generating a workload signature of an incoming workload to be executed by the system. Based on the generated workload signature, techniques disclosed comprise matching the incoming workload with a profile of stored workload profiles. The workload profiles are generated by a trace capture unit. Based on the associated profile, a task submission transaction is sent to the optical unit of the system, representative of a request to execute the incoming workload by the optical unit.Type: ApplicationFiled: September 29, 2022Publication date: April 4, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Sergey Blagodurov, Kevin Y. Cheng, SeyedMohammad SeyedzadehDelcheh, Masab Ahmad
-
Patent number: 11947487Abstract: Methods and systems are disclosed for performing dataflow execution by an accelerated processing unit (APU). Techniques disclosed include decoding information from one or more dataflow instructions. The decoded information is associated with dataflow execution of a computational task. Techniques disclosed further include configuring, based on the decoded information, dataflow circuitry, and, then, executing the dataflow execution of the computational task using the dataflow circuitry.Type: GrantFiled: June 28, 2022Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Johnathan Robert Alsop, Karthik Ramu Sangaiah, Anthony T. Gutierrez
-
Patent number: 11947456Abstract: Techniques for invalidating cache lines are provided. The techniques include issuing, to a first level of a memory hierarchy, a weak exclusive read request for a speculatively executing store instruction; determining whether to invalidate one or more cache lines associated with the store instruction in one or more memories; and issuing the weak invalidation request to additional levels of the memory hierarchy.Type: GrantFiled: September 30, 2021Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventor: Paul J. Moyer
-
Patent number: 11947473Abstract: Systems, apparatuses, and methods for implementing duplicated registers for access by initiators across multiple semiconductor dies are disclosed. A system includes multiple initiators on multiple semiconductor dies of a chiplet processor. One of the semiconductor dies is the master die, and this master die has copies of registers which can be accessed by the multiple initiators on the multiple semiconductor dies. When a given initiator on a given secondary die generates a register access, the register access is routed to the master die and a particular duplicate copy of the register maintained for the given secondary die. From the point of view of software, the multiple semiconductor dies appear as a single die, and the multiple initiators appear as a single initiator. Multiple types of registers can be maintained by the master die, with a flush register being one of the register types.Type: GrantFiled: October 12, 2021Date of Patent: April 2, 2024Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Haikun Dong, Kostantinos Danny Christidis, Ling-Ling Wang, MinHua Wu, Gaojian Cong, Rui Wang
-
Patent number: 11948251Abstract: A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive launch time interval of the domain shader and a hull shader latency. In some cases, the throttling circuitry includes a first counter that is incremented in response to launching a thread group from the buffer and a second counter that modifies the first counter based on a measured latency of the domain shader.Type: GrantFiled: October 26, 2022Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventor: Nishank Pathak
-
Patent number: 11947380Abstract: Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.Type: GrantFiled: August 18, 2022Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Ranjith Kumar Sajja, Sreekanth Godey, Anirudh R. Acharya
-
Patent number: 11948000Abstract: Systems, apparatuses, and methods for performing command buffer gang submission are disclosed. A system includes at least first and second processors and a memory. The first processor (e.g., CPU) generates a command buffer and stores the command buffer in the memory. A mechanism is implemented where a granularity of work provided to the second processor (e.g., GPU) is increased which, in turn, increases the opportunities for parallel work. In gang submission mode, the user-mode driver (UMD) specifies a set of multiple queues and command buffers to execute on those multiple queues, and that work is guaranteed to execute as a single unit from the GPU operating system scheduler point of view. Using gang submission, synchronization between command buffers executing on multiple queues in the same submit is safe. This opens up optimization opportunities for application use (explicit gang submission) and for internal driver use (implicit gang submission).Type: GrantFiled: March 31, 2021Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Mitchell Howard Singer, Derrick Trevor Owens
-
Patent number: 11948073Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch given data and broadcast the given data to other inference cores of the inference accelerator engine. Each inference core fetches second data unique to the respective inference core.Type: GrantFiled: August 30, 2018Date of Patent: April 2, 2024Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Lei Zhang, Sateesh Lagudu, Allen Rush
-
Patent number: 11947833Abstract: A method and apparatus for training data in a computer system includes reading data stored in a first memory address in a memory and writing it to a buffer. Training data is generated for transmission to the first memory address. The data is transmitted to the first memory address. Information relating to the training data is read from the first memory address and the stored data is read from the buffer and written to the memory area where the training data was transmitted.Type: GrantFiled: June 21, 2022Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Anwar Kashem, Craig Daniel Eaton, Pouya Najafi Ashtiani, Tsun Ho Liu
-
Patent number: 11947476Abstract: Methods and systems are disclosed for cross-chiplet performance data streaming. Techniques disclosed include accumulating, by a subservient chiplet, event data associated with an event indicative of a performance aspect of the subservient chiplet; sending, by the subservient chiplet, the event data over a chiplet bus to a master chiplet; and adding, by the master chiplet, the received event data to an event record, the event record containing previously received, from the subservient chiplet over the chiplet bus, event data associated with the event.Type: GrantFiled: March 31, 2022Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Bryan Broussard, Pravesh Gupta, Benjamin Tsien, Vydhyanathan Kalyanasundharam
-
Patent number: 11948223Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.Type: GrantFiled: July 11, 2022Date of Patent: April 2, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
-
Publication number: 20240106813Abstract: A method and system for distributing keys in a key distribution system includes receiving a connection for communication from a first component. A determination is made whether the first component requires a key be generated and distributed. Based upon a security mode for the communication, the key generated and distributed to the first component.Type: ApplicationFiled: September 28, 2022Publication date: March 28, 2024Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Norman Vernon Douglas Stewart, Mihir Shaileshbhai Doctor, Omar Fakhri Ahmed, Hemaprabhu Jayanna, John Traver
-
Publication number: 20240104023Abstract: A/D bit storage, processing, and mode management techniques through use of a dense A/D bit representation are described. In one example, a memory management unit employs an A/D bit representation generation module to generate the dense A/D bit representation. In an implementation, the A/D bit representation is stored adjacent to existing page table structures of the multilevel page table hierarchy. In another example, memory management unit supports use of modes as part of A/D bit storage.Type: ApplicationFiled: September 26, 2022Publication date: March 28, 2024Applicant: Advanced Micro Devices, Inc.Inventor: William A. Moyes
-
Publication number: 20240104844Abstract: Devices and methods for multi-resolution geometric representation for ray tracing are described which include casting a ray in a space comprising objects represented by geometric shapes and approximating a volume of the geometric shapes using an accelerated hierarchy structure. The accelerated hierarchy structure comprises first nodes each representing a volume of one of the geometric shapes in the space and second nodes each representing an approximate volume of a group of the geometric shapes. When the ray is determined to intersect a bounding box of a second node representing one group of the geometric shapes, a selection is made between traversal and non-traversal of other second nodes based on a LOD for representing the volume of the one group of geometric shapes.Type: ApplicationFiled: September 28, 2022Publication date: March 28, 2024Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Sho Ikeda, Paritosh Vijay Kulkarni, Takahiro Harada
-
Publication number: 20240104685Abstract: Devices and methods method of tiled rendering are provided which comprises dividing a frame to be rendered, into a plurality of tiles, receiving commands to execute a plurality of subpasses of the tiles, interleaving execution of same subpasses of multiple tiles of the frame by executing one or more subpasses as skip operations, storing visibility data, for subsequently ordered subpasses of the tiles, at memory addresses allocated for data of corresponding adjacent tiles in a first direction of traversal and rendering the tiles for the subsequently ordered subpasses using the visibility data stored at the memory addresses allocated for corresponding adjacent tiles in a second direction of traversal, opposite the first direction of traversal.Type: ApplicationFiled: September 28, 2022Publication date: March 28, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Ruijin Wu, Michael John Livesley, Kiia Kallio, Jan H. Achrenius, Mika Tuomi
-
Publication number: 20240103719Abstract: Generating optimization instructions for data processing pipelines is described. A pipeline optimization system computes resource usage information that describes memory and compute usage metrics during execution of each stage of the data processing pipeline. The system additionally generates data storage information that describes how data output by each pipeline stage is utilized by other stages of the pipeline. The pipeline optimization system then generates the optimization instructions to control how memory operations are performed for a specific data processing pipeline during execution. In implementations, the optimization instructions cause a memory system to discard data (e.g., invalidate cache entries) without copying the discarded data to another storage location after the data is no longer needed by the pipeline. The optimization instructions alternatively or additionally control at least one of evicting, writing-back, or prefetching data to minimize latency during pipeline execution.Type: ApplicationFiled: September 28, 2022Publication date: March 28, 2024Applicant: Advanced Micro Devices, Inc.Inventor: Harris Eleftherios Gasparakis