Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210241516
    Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.
    Type: Application
    Filed: November 6, 2020
    Publication date: August 5, 2021
    Inventors: Mark LEATHER, Michael MANTOR
  • Publication number: 20210209831
    Abstract: A method, system, and non-transitory computer readable storage medium for rasterizing primitives are disclosed. The method, system, and non-transitory computer readable storage medium include: generating a primitive batch from a sequence of one or more primitives, wherein the primitive batch includes primitives sorted into one or more row groups based on which row of a plurality of rows each primitive intersects; and processing each row group, the processing for each row group including: identifying one or more primitive column intercepts for each of the one or more primitives in the row group, wherein each combination of primitive column intercept and row identifies a bin; and rasterizing the one or more primitives that intersect the bin.
    Type: Application
    Filed: March 22, 2021
    Publication date: July 8, 2021
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael Mantor, Laurent Lefebvre, Mikko Alho, Mika Tuomi, Kiia Kallio
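    The row-group and column-intercept binning described in this abstract can be illustrated with a minimal sketch. This is not the patented implementation: it assumes each primitive is approximated by an integer, inclusive bounding box `(x0, y0, x1, y1)`, and the names `bin_batch`, `bin_w`, and `bin_h` are hypothetical. Recording a primitive index in a bin stands in for rasterizing it there.

```python
from collections import defaultdict


def bin_batch(prims, bin_w, bin_h):
    """Sort a primitive batch into row groups, then into (row, col) bins.

    prims: list of bounding boxes (x0, y0, x1, y1), an assumed stand-in
    for real primitives; a bounding box is a conservative approximation.
    """
    # Step 1: sort primitives into row groups by the bin rows they touch.
    row_groups = defaultdict(list)
    for pid, (x0, y0, x1, y1) in enumerate(prims):
        for row in range(y0 // bin_h, y1 // bin_h + 1):
            row_groups[row].append(pid)

    # Step 2: per row group, find each primitive's column intercepts.
    # Every (row, column intercept) pair identifies one bin.
    bins = defaultdict(list)
    for row in sorted(row_groups):
        for pid in row_groups[row]:
            x0, _, x1, _ = prims[pid]
            for col in range(x0 // bin_w, x1 // bin_w + 1):
                bins[(row, col)].append(pid)  # placeholder for rasterization
    return dict(bins)
```

    Calling `bin_batch([(0, 0, 10, 10), (20, 5, 40, 25)], 16, 16)` returns a dictionary keyed by `(row, col)` whose values list the primitives that touch each bin.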
  • Publication number: 20210157588
    Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Jiasheng CHEN, Bin HE, Jian HUANG, Michael MANTOR
  • Publication number: 20210157639
    Abstract: A graphics processing unit (GPU) adjusts a clock frequency based on identifying a program thread executing at the processing unit, wherein the program thread is detected based on a workload to be executed. By adjusting the clock frequency based on the identified program thread, the processing unit adapts to different processing demands of different program threads. Further, by identifying the program thread based on workload, the processing unit adapts the clock frequency based on processing demands, thereby conserving processing resources.
    Type: Application
    Filed: November 22, 2019
    Publication date: May 27, 2021
    Inventors: Mangesh P. NIJASURE, Michael MANTOR, Ashkan HOSSEINZADEH NAMIN, Louis REGNIERE
  • Publication number: 20210117269
    Abstract: A system and method for protecting memory instructions against faults are described. The system and method include converting the slave instructions to dummy operations, modifying memory arbiter to issue up to N master and N slave global/shared memory instructions per cycle, sending master memory requests to memory system, using slave requests for error checking, entering master requests to the GM/LM FIFO, storing slave requests in a register, and comparing the entered master requests with the stored slave requests.
    Type: Application
    Filed: December 7, 2020
    Publication date: April 22, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Michael Mantor, Sudhanva Gurumurthi
  • Publication number: 20210096877
    Abstract: An arithmetic logic unit (ALU) pipeline of a processing unit collapses execution bubbles in response to a stall at a stage of the ALU pipeline. An execution bubble occurs at the pipeline in response to an invalid instruction being placed in the pipeline for execution. The invalid instruction thus consumes an available “slot” in the pipeline, and proceeds through the pipeline until a stall in a subsequent stage (that is, a stage after the stage executing the invalid instruction) is detected. In response to detecting the stall, the ALU continues to execute instructions that are behind the invalid instruction in the pipeline, thereby collapsing the execution bubble and conserving resources of the ALU.
    Type: Application
    Filed: September 26, 2019
    Publication date: April 1, 2021
    Inventors: Bin HE, Michael MANTOR, Jiasheng CHEN
  • Publication number: 20210089304
    Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
    Type: Application
    Filed: September 24, 2019
    Publication date: March 25, 2021
    Inventors: Bin HE, Michael MANTOR, Jiasheng CHEN, Jian HUANG
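    The round-and-iteration scheme in this abstract resembles blocked matrix multiplication, which the following minimal sketch illustrates. It is an assumption-laden illustration rather than the patented hardware behavior: plain Python lists stand in for the registers and output buffer, and `matmul_rounds`, `portion`, and `subset` are hypothetical names.

```python
def matmul_rounds(a, b, portion=4, subset=2):
    """Multiply a (m x k) by b (k x n) in rounds of fetched portions."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]  # stands in for the output buffer

    for r0 in range(0, m, portion):
        for c0 in range(0, n, portion):
            # Fetch one portion of each matrix into "registers" once per round.
            a_regs = [row[:] for row in a[r0:r0 + portion]]
            b_regs = [[b[i][j] for i in range(k)]
                      for j in range(c0, min(c0 + portion, n))]
            acc = [[0] * len(b_regs) for _ in a_regs]

            # Iterations: each combination of a row subset and a column
            # subset reuses the fetched registers without refetching.
            for rs in range(0, len(a_regs), subset):
                for cs in range(0, len(b_regs), subset):
                    for i in range(rs, min(rs + subset, len(a_regs))):
                        for j in range(cs, min(cs + subset, len(b_regs))):
                            acc[i][j] += sum(
                                x * y for x, y in zip(a_regs[i], b_regs[j])
                            )

            # Write accumulated results only after all iterations complete.
            for i, row in enumerate(acc):
                for j, value in enumerate(row):
                    out[r0 + i][c0 + j] = value
    return out
```

    The point of the structure is data reuse: each fetched portion serves several multiply/accumulate iterations before the next fetch.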
  • Publication number: 20210090205
    Abstract: The address of the draw or dispatch packet responsible for creating an exception is tied to a shader/wavefront back to the draw command from which it originated. In various embodiments, a method of operating a graphics pipeline and exception handling includes receiving, at a command processor of a graphics processing unit (GPU), an exception signal indicating an occurrence of a pipeline exception at a shader stage of a graphics pipeline. The shader stage generates an exception signal in response to a pipeline exception and transmits the exception signal to the command processor. The command processor determines, based on the exception signal, an address of a command packet responsible for the occurrence of the pipeline exception.
    Type: Application
    Filed: September 24, 2019
    Publication date: March 25, 2021
    Inventors: Michael MANTOR, Alexander Fuad ASHKAR, Randy RAMSEY, Mangesh P. NIJASURE, Brian EMBERLING
  • Patent number: 10957094
    Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.
    Type: Grant
    Filed: August 29, 2016
    Date of Patent: March 23, 2021
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael Mantor, Laurent Lefebvre, Mikko Alho, Mika Tuomi, Kiia Kallio
  • Patent number: 10943389
    Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, the fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: March 9, 2021
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Laurent Lefebvre, Michael Mantor, Mark Fowler, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi, Christopher J. Brennan
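    The buffer behavior described in this abstract can be sketched in a few lines. This is a minimal software illustration, not the patented hardware design: the `Fragment` and `PostZCullBuffer` names, the `payload` field, and the `remove` flag that selects between removing and marking are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    payload: str
    overlapped: bool = False


class PostZCullBuffer:
    """First-in-first-out buffer of post-z-cull fragments."""

    def __init__(self):
        self.fifo = deque()

    def push(self, frag, remove=True):
        """Check the new fragment against buffered ones, then append it."""
        if remove:
            # Drop any older fragment at the same screen position.
            self.fifo = deque(
                f for f in self.fifo if (f.x, f.y) != (frag.x, frag.y)
            )
        else:
            # Keep older fragments but mark them as overlapping.
            for f in self.fifo:
                if (f.x, f.y) == (frag.x, frag.y):
                    f.overlapped = True
        # In either case the new fragment is added to the buffer.
        self.fifo.append(frag)

    def flush(self):
        """Return the buffered fragments in order (for pixel shading)."""
        out = list(self.fifo)
        self.fifo.clear()
        return out
```

    With `remove=True`, only the newest fragment at each screen position reaches `flush()`, so an occluded fragment is never shaded.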
  • Publication number: 20210064366
    Abstract: An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations.
    Type: Application
    Filed: August 30, 2019
    Publication date: March 4, 2021
    Inventors: Randy RAMSEY, William David ISENBERG, Michael MANTOR
  • Publication number: 20210049729
    Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
    Type: Application
    Filed: May 21, 2020
    Publication date: February 18, 2021
    Inventors: Timour T. PALTASHEV, Michael MANTOR, Rex Eldon MCCRARY
  • Patent number: 10922868
    Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: February 16, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mangesh P. Nijasure, Todd Martin, Michael Mantor
  • Patent number: 10860418
    Abstract: A system and method for protecting memory instructions against faults are described. The system and method include converting the slave instructions to dummy operations, modifying memory arbiter to issue up to N master and N slave global/shared memory instructions per cycle, sending master memory requests to memory system, using slave requests for error checking, entering master requests to the GM/LM FIFO, storing slave requests in a register, and comparing the entered master requests with the stored slave requests.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: December 8, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: John Kalamatianos, Michael Mantor, Sudhanva Gurumurthi
  • Publication number: 20200379767
    Abstract: A method of context bouncing includes receiving, at a command processor of a graphics processing unit (GPU), a conditional execute packet providing a hash identifier corresponding to an encapsulated state. The encapsulated state includes one or more context state packets following the conditional execute packet. A command packet following the encapsulated state is executed based at least in part on determining whether the hash identifier of the encapsulated state matches one of a plurality of hash identifiers of active context states currently stored at the GPU.
    Type: Application
    Filed: May 30, 2019
    Publication date: December 3, 2020
    Inventors: Rex Eldon MCCRARY, Yi LUO, Harry J. WISE, Alexander Fuad ASHKAR, Michael MANTOR
  • Publication number: 20200293286
    Abstract: A graphics processing unit (GPU) implements operations, with associated op codes, to perform mixed precision mathematical operations. The GPU includes an arithmetic logic unit (ALU) with different execution paths, wherein each execution path executes a different mixed precision operation. By implementing mixed precision operations at the ALU in response to designated op codes that delineate the operations, the GPU efficiently increases the precision of specified mathematical operations while reducing execution overhead.
    Type: Application
    Filed: October 2, 2019
    Publication date: September 17, 2020
    Inventors: Bin HE, Michael MANTOR, Jiasheng CHEN
  • Publication number: 20200293329
    Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
    Type: Application
    Filed: April 28, 2020
    Publication date: September 17, 2020
    Inventors: Jiasheng CHEN, YunXiao ZOU, Bin HE, Angel E. SOCARRAS, QingCheng WANG, Wei YUAN, Michael MANTOR
  • Publication number: 20200210341
    Abstract: Embodiments include methods, systems and non-transitory computer-readable media including instructions for executing a prefetch kernel with reduced intermediate state storage resource requirements. These include executing a prefetch kernel on a graphics processing unit (GPU), such that the prefetch kernel begins executing before a processing kernel. The prefetch kernel performs memory operations that are based upon at least a subset of memory operations in the processing kernel.
    Type: Application
    Filed: March 9, 2020
    Publication date: July 2, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan S. Jayasena, James Michael O'Connor, Michael Mantor
  • Patent number: 10664942
    Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: May 26, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Timour T. Paltashev, Michael Mantor, Rex Eldon McCrary
  • Patent number: 10656951
    Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
    Type: Grant
    Filed: October 20, 2017
    Date of Patent: May 19, 2020
    Assignees: ADVANCED MICRO DEVICES, INC., ADVANCED MICRO DEVICES (SHANGHAI) CO., LTD.
    Inventors: Jiasheng Chen, YunXiao Zou, Bin He, Angel E. Socarras, QingCheng Wang, Wei Yuan, Michael Mantor