Patents by Inventor Mark Fowler
Mark Fowler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11074075Abstract: Systems, apparatuses, and methods for maintaining separate pending load and store counters are disclosed herein. In one embodiment, a system includes at least one execution unit, a memory subsystem, and a pair of counters for each thread of execution. In one embodiment, the system implements a software based approach for managing dependencies between instructions. In one embodiment, the execution unit(s) maintains counters to support the software-based approach for managing dependencies between instructions. The execution unit(s) are configured to execute instructions that are used to manage the dependencies during run-time. In one embodiment, the execution unit(s) execute wait instructions to wait until a given counter is equal to a specified value before continuing to execute the instruction sequence.Type: GrantFiled: February 24, 2017Date of Patent: July 27, 2021Assignee: Advanced Micro Devices, Inc.Inventors: Mark Fowler, Brian D. Emberling
-
Patent number: 10991146Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.Type: GrantFiled: December 20, 2019Date of Patent: April 27, 2021Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Maxim V. Kazakov, Mark Fowler
-
Patent number: 10943389Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, he fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.Type: GrantFiled: December 9, 2016Date of Patent: March 9, 2021Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Laurent Lefebvre, Michael Mantor, Mark Fowler, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi, Christopher J. Brennan
-
Publication number: 20210011760Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.Type: ApplicationFiled: July 24, 2020Publication date: January 14, 2021Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
-
Publication number: 20200310684Abstract: A processing unit is configured to access a first memory that supports atomic operations and a second memory via an interface. The second memory or the interface does not support atomicity of the atomic operations. A trap handler is configured to trap atomic operations and enforce atomicity of the trapped atomic operations. The processing unit selectively provides atomic operations to the trap handler in response to detecting that memory access requests in the atomic operations are directed to the second memory via the interface. In some cases, the processing unit detects a frequency of traps that result from atomic operations that include memory access requests to a page stored in the second memory. The processing unit transfers the page from the second memory to the first memory in response to the trap frequency exceeding a threshold.Type: ApplicationFiled: March 29, 2019Publication date: October 1, 2020Inventor: Mark FOWLER
-
Publication number: 20200250877Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.Type: ApplicationFiled: December 20, 2019Publication date: August 6, 2020Inventors: Maxim V. KAZAKOV, Mark FOWLER
-
Patent number: 10725822Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.Type: GrantFiled: July 31, 2018Date of Patent: July 28, 2020Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
-
Publication number: 20200042348Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.Type: ApplicationFiled: July 31, 2018Publication date: February 6, 2020Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
-
Patent number: 10540280Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.Type: GrantFiled: December 23, 2016Date of Patent: January 21, 2020Assignees: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULCInventors: Mark Fowler, Jimshed Mirza, Anthony Asaro
-
Patent number: 10540802Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.Type: GrantFiled: January 31, 2019Date of Patent: January 21, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Maxim V. Kazakov, Mark Fowler
-
Patent number: 10474468Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.Type: GrantFiled: February 22, 2017Date of Patent: November 12, 2019Assignee: Advanced Micro Devices, Inc.Inventors: Michael J. Mantor, Brian D. Emberling, Mark Fowler, Mark M. Leather
-
Patent number: 10403333Abstract: A memory controller includes a host interface for receiving memory access requests including access addresses, a memory interface for providing memory accesses to a memory system, and an address decoder coupled to the host interface for programmably mapping the access addresses to selected ones of a plurality of regions. The address decoder is programmable to map the access addresses to a first region having a non-power-of-two size using a primary decoder and a secondary decoder each having power-of-two sizes, and providing a first region mapping signal in response. A command queue stores the memory access requests and region mapping signals. An arbiter picks the memory access requests from the command queue based on a plurality of criteria, which are evaluated based in part on the region mapping signals, and provides corresponding memory accesses to the memory interface in response.Type: GrantFiled: July 15, 2016Date of Patent: September 3, 2019Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Kevin M. Brandl, Thomas Hamilton, Hideki Kanayama, Kedarnath Balakrishnan, James R. Magro, Guanhao Shen, Mark Fowler
-
Publication number: 20190122417Abstract: A system, method and a non-transitory computer readable storage medium are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from one or more primitives. A bin is identified for processing the primitive batch. At least a portion of each primitive intersecting the identified bin is processed and a next bin for processing the primitive batch is identified based on an intercept walk order. The processing is iteratively repeated for the one or more primitives in the primitive batch for successive bins until all primitives of the primitive batch are completely processed. Then, the one or more primitives in the primitive batch are further processed.Type: ApplicationFiled: November 2, 2018Publication date: April 25, 2019Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
-
Patent number: 10169906Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.Type: GrantFiled: March 29, 2013Date of Patent: January 1, 2019Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
-
Patent number: 10152434Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.Type: GrantFiled: December 20, 2016Date of Patent: December 11, 2018Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
-
Publication number: 20180246724Abstract: Systems, apparatuses, and methods for maintaining separate pending load and store counters are disclosed herein. In one embodiment, a system includes at least one execution unit, a memory subsystem, and a pair of counters for each thread of execution. In one embodiment, the system implements a software based approach for managing dependencies between instructions. In one embodiment, the execution unit(s) maintains counters to support the software-based approach for managing dependencies between instructions. The execution unit(s) are configured to execute instructions that are used to manage the dependencies during run-time. In one embodiment, the execution unit(s) execute wait instructions to wait until a given counter is equal to a specified value before continuing to execute the instruction sequence.Type: ApplicationFiled: February 24, 2017Publication date: August 30, 2018Inventors: Mark Fowler, Brian D. Emberling
-
Publication number: 20180239606Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.Type: ApplicationFiled: February 22, 2017Publication date: August 23, 2018Inventors: Michael J. Mantor, Brian D. Emberling, Mark Fowler, Mark M. Leather
-
Publication number: 20180181488Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.Type: ApplicationFiled: December 23, 2016Publication date: June 28, 2018Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Mark Fowler, Jimshed Mirza, Anthony Asaro
-
Publication number: 20180173649Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.Type: ApplicationFiled: December 20, 2016Publication date: June 21, 2018Inventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
-
Publication number: 20180165872Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, he fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.Type: ApplicationFiled: December 9, 2016Publication date: June 14, 2018Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Laurent Lefebvre, Michael Mantor, Mark Fowler, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi, Christopher J. Brennan