Patents by Inventor Mark Fowler

Mark Fowler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Indicating instruction scheduling mode for processing wavefront portions

Patent number: 10474468

Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.

Type: Grant

Filed: February 22, 2017

Date of Patent: November 12, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael J. Mantor, Brian D. Emberling, Mark Fowler, Mark M. Leather
Memory controller with flexible address decoding

Patent number: 10403333

Abstract: A memory controller includes a host interface for receiving memory access requests including access addresses, a memory interface for providing memory accesses to a memory system, and an address decoder coupled to the host interface for programmably mapping the access addresses to selected ones of a plurality of regions. The address decoder is programmable to map the access addresses to a first region having a non-power-of-two size using a primary decoder and a secondary decoder each having power-of-two sizes, and providing a first region mapping signal in response. A command queue stores the memory access requests and region mapping signals. An arbiter picks the memory access requests from the command queue based on a plurality of criteria, which are evaluated based in part on the region mapping signals, and provides corresponding memory accesses to the memory interface in response.

Type: Grant

Filed: July 15, 2016

Date of Patent: September 3, 2019

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Kevin M. Brandl, Thomas Hamilton, Hideki Kanayama, Kedarnath Balakrishnan, James R. Magro, Guanhao Shen, Mark Fowler
HYBRID RENDER WITH DEFERRED PRIMITIVE BATCH BINNING

Publication number: 20190122417

Abstract: A system, method and a non-transitory computer readable storage medium are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from one or more primitives. A bin is identified for processing the primitive batch. At least a portion of each primitive intersecting the identified bin is processed and a next bin for processing the primitive batch is identified based on an intercept walk order. The processing is iteratively repeated for the one or more primitives in the primitive batch for successive bins until all primitives of the primitive batch are completely processed. Then, the one or more primitives in the primitive batch are further processed.

Type: Application

Filed: November 2, 2018

Publication date: April 25, 2019

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
Hybrid render with deferred primitive batch binning

Patent number: 10169906

Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.

Type: Grant

Filed: March 29, 2013

Date of Patent: January 1, 2019

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
Efficient arbitration for memory accesses

Patent number: 10152434

Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.

Type: Grant

Filed: December 20, 2016

Date of Patent: December 11, 2018

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
SEPARATE TRACKING OF PENDING LOADS AND STORES

Publication number: 20180246724

Abstract: Systems, apparatuses, and methods for maintaining separate pending load and store counters are disclosed herein. In one embodiment, a system includes at least one execution unit, a memory subsystem, and a pair of counters for each thread of execution. In one embodiment, the system implements a software based approach for managing dependencies between instructions. In one embodiment, the execution unit(s) maintains counters to support the software-based approach for managing dependencies between instructions. The execution unit(s) are configured to execute instructions that are used to manage the dependencies during run-time. In one embodiment, the execution unit(s) execute wait instructions to wait until a given counter is equal to a specified value before continuing to execute the instruction sequence.

Type: Application

Filed: February 24, 2017

Publication date: August 30, 2018

Inventors: Mark Fowler, Brian D. Emberling
VARIABLE WAVEFRONT SIZE

Publication number: 20180239606

Abstract: Systems, apparatuses, and methods for processing variable wavefront sizes on a processor are disclosed. In one embodiment, a processor includes at least a scheduler, cache, and multiple execution units. When operating in a first mode, the processor executes the same instruction on multiple portions of a wavefront before proceeding to the next instruction of the shader program. When operating in a second mode, the processor executes a set of instructions on a first portion of a wavefront. In the second mode, when the processor finishes executing the set of instructions on the first portion of the wavefront, the processor executes the set of instructions on a second portion of the wavefront, and so on until all portions of the wavefront have been processed. The processor determines the operating mode based on one or more conditions.

Type: Application

Filed: February 22, 2017

Publication date: August 23, 2018

Inventors: Michael J. Mantor, Brian D. Emberling, Mark Fowler, Mark M. Leather
HIGH-SPEED SELECTIVE CACHE INVALIDATES AND WRITE-BACKS ON GPUS

Publication number: 20180181488

Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.

Type: Application

Filed: December 23, 2016

Publication date: June 28, 2018

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Mark Fowler, Jimshed Mirza, Anthony Asaro
EFFICIENT ARBITRATION FOR MEMORY ACCESSES

Publication number: 20180173649

Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.

Type: Application

Filed: December 20, 2016

Publication date: June 21, 2018

Inventors: Rostyslav Kyrychynskyi, Anthony Asaro, Kostantinos Danny Christidis, Mark Fowler, Michael J. Mantor, Robert Scott Hartog
NO ALLOCATE CACHE POLICY

Publication number: 20180165221

Abstract: A system and method for efficiently performing data allocation in a cache memory are described. A lookup is performed in a cache responsive to detecting an access request. If the targeted data is found in the cache and the targeted data is of a no allocate data type indicating the targeted data is not expected to be reused, then the targeted data is read from the cache without updating cache replacement policy information for the targeted data responsive to the access. If the lookup results in a miss, to the targeted data is prevented from being allocated in the cache.

Type: Application

Filed: December 9, 2016

Publication date: June 14, 2018

Inventor: Mark Fowler
REMOVING OR IDENTIFYING OVERLAPPING FRAGMENTS AFTER Z-CULLING

Publication number: 20180165872

Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, he fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.

Type: Application

Filed: December 9, 2016

Publication date: June 14, 2018

Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Laurent Lefebvre, Michael Mantor, Mark Fowler, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi, Christopher J. Brennan
No allocate cache policy

Patent number: 9996478

Abstract: A system and method for efficiently performing data allocation in a cache memory are described. A lookup is performed in a cache responsive to detecting an access request. If the targeted data is found in the cache and the targeted data is of a no allocate data type indicating the targeted data is not expected to be reused, then the targeted data is read from the cache without updating cache replacement policy information for the targeted data responsive to the access. If the lookup results in a miss, the targeted data is prevented from being allocated in the cache.

Type: Grant

Filed: December 9, 2016

Date of Patent: June 12, 2018

Assignee: Advanced Micro Devices, Inc.

Inventor: Mark Fowler
Split storage of anti-aliased samples

Patent number: 9934551

Abstract: Embodiments of the present invention are directed to improving the performance of anti-aliased image rendering. One embodiment is a method of rendering a pixel from an anti-aliased image. The method includes: storing a first set and a second set of samples from a plurality of anti-aliased samples of the pixel respectively in a first memory and a second memory; and rendering a determined number of said samples from one of only the first set or the first and second sets. Corresponding system and computer program product embodiments are also disclosed.

Type: Grant

Filed: September 30, 2016

Date of Patent: April 3, 2018

Assignee: Advanced Micro Devices, Inc.

Inventor: Mark Fowler
MEMORY CONTROLLER WITH FLEXIBLE ADDRESS DECODING

Publication number: 20180019006

Abstract: A memory controller includes a host interface for receiving memory access requests including access addresses, a memory interface for providing memory accesses to a memory system, and an address decoder coupled to the host interface for programmably mapping the access addresses to selected ones of a plurality of regions. The address decoder is programmable to map the access addresses to a first region having a non-power-of-two size using a primary decoder and a secondary decoder each having power-of-two sizes, and providing a first region mapping signal in response. A command queue stores the memory access requests and region mapping signals. An arbiter picks the memory access requests from the command queue based on a plurality of criteria, which are evaluated based in part on the region mapping signals, and provides corresponding memory accesses to the memory interface in response.

Type: Application

Filed: July 15, 2016

Publication date: January 18, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Kevin M. Brandl, Thomas Hamilton, Hideki Kanayama, Kedarnath Balakrishnan, James R. Magro, Guanhao Shen, Mark Fowler
SPLIT STORAGE OF ANTI-ALIASED SAMPLES

Publication number: 20170018053

Abstract: Embodiments of the present invention are directed to improving the performance of anti-aliased image rendering. One embodiment is a method of rendering a pixel from an anti-aliased image. The method includes: storing a first set and a second set of samples from a plurality of anti-aliased samples of the pixel respectively in a first memory and a second memory; and rendering a determined number of said samples from one of only the first set or the first and second sets. Corresponding system and computer program product embodiments are also disclosed.

Type: Application

Filed: September 30, 2016

Publication date: January 19, 2017

Applicant: Advanced Micro Devices, Inc.

Inventor: Mark Fowler
SHARED VIRTUAL ADDRESS SPACE FOR HETEROGENEOUS PROCESSORS

Publication number: 20160378674

Abstract: A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.

Type: Application

Filed: June 23, 2015

Publication date: December 29, 2016

Inventors: Gongxian Jeffrey Cheng, Mark Fowler, Philip J. Rogers, Benjamin T. Sander, Anthony Asaro, Mike Mantor, Raja Koduri
ACCESS LOG AND ADDRESS TRANSLATION LOG FOR A PROCESSOR

Publication number: 20160378682

Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.

Type: Application

Filed: June 23, 2015

Publication date: December 29, 2016

Inventors: Benjamin T. Sander, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Mike Mantor
System and method for performing depth testing at top and bottom of graphics pipeline

Patent number: 9076265

Abstract: Embodiments of a system and method including graphics processing of a pixel sample are described. According to an embodiment, a first depth test processes a value, such as a z/stencil value, of a pixel sample and determines whether the value of the pixel sample satisfies the first depth test. If the value of the pixel sample satisfies the first depth test, the value of the pixel sample is not immediately written to storage, such as a Z-buffer. That is, if the value of the pixel sample satisfies the first depth test, the depth processing logic prevents or delays a write operation for the value of the pixel sample to storage at that time. A second depth test is performed on the value of the pixel sample if the value of the pixel sample satisfied the first depth test. If the value of the pixel sample satisfies the second depth test, the value of the pixel sample is then written to storage.

Type: Grant

Filed: June 16, 2006

Date of Patent: July 7, 2015

Assignee: ATI TECHNOLOGIES ULC

Inventors: Mark Fowler, Chris Brennan
Shared memory space in a unified memory model

Patent number: 9009419

Abstract: Methods and systems are provided for mapping a memory instruction to a shared memory address space in a computer arrangement having a CPU and an APD. A method includes receiving a memory instruction that refers to an address in the shared memory address space, mapping the memory instruction based on the address to a memory resource associated with either the CPU or the APD, and performing the memory instruction based on the mapping.

Type: Grant

Filed: July 31, 2012

Date of Patent: April 14, 2015

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Anthony Asaro, Kevin Normoyle, Mark D. Hummel, Mark Fowler
Tiling compaction in multi-processor systems

Patent number: 8963931

Abstract: A method and system for processing a graphics frame in a multi-processor computing environment are described. Embodiments of the present invention enable the reduction of the memory footprint required for processing a graphics frame in a multi-processor system. In one embodiment a method of processing a graphics frame using a plurality of processors is presented. The method includes determining a respective assignment of tiles of the graphics frame to each processor of the plurality of processors; allocating a memory area in a local memory of each processor, where the size of the allocated memory area substantially corresponds to the aggregate size of tiles assigned to the respective processor; and storing the tiles of the respective assignment of tiles in the memory area of each respective processor.

Type: Grant

Filed: September 10, 2010

Date of Patent: February 24, 2015

Assignee: Advanced Micro Devices, Inc.

Inventor: Mark Fowler

prev 1 2 3 4 next