Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220100528
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
    Type: Application
    Filed: September 25, 2020
    Publication date: March 31, 2022
    Inventors: Sateesh LAGUDU, Allen H. RUSH, Michael MANTOR, Arun Vaidyanathan ANANTHANARAYAN, Prasad NAGABHUSHANAMGARI, Maxim V. KAZAKOV
  • Patent number: 11263044
    Abstract: A graphics processing unit (GPU) adjusts a frequency of a clock based on identifying a program thread executing at the processing unit, wherein the program thread is detected based on a workload to be executed. By adjusting the clock frequency based on the identified program thread, the processing unit adapts to different processing demands of different program threads. Further, by identifying the program thread based on workload, the processing unit adapts the clock frequency based on processing demands, thereby conserving processing resources.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: March 1, 2022
    Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
    Inventors: Mangesh P. Nijasure, Michael Mantor, Ashkan Hosseinzadeh Namin, Louis Regniere
  • Publication number: 20220051985
    Abstract: A semiconductor package includes a first die, a second die, and an interconnect die coupled to a first plurality of through-die vias in the first die and a second plurality of through-die vias in the second die. The interconnect die provides communications pathways between the first die and the second die.
    Type: Application
    Filed: October 30, 2020
    Publication date: February 17, 2022
    Inventors: RAHUL AGARWAL, RAJA SWAMINATHAN, MICHAEL S. ALFANO, GABRIEL H. LOH, ALAN D. SMITH, GABRIEL WONG, MICHAEL MANTOR
  • Patent number: 11226819
    Abstract: A processing unit includes a plurality of processing elements and one or more caches. A first thread executes a program that includes one or more prefetch instructions to prefetch information into a first cache. Prefetching is selectively enabled when executing the first thread on a first processing element dependent upon whether one or more second threads previously executed the program on the first processing element. The first thread is then dispatched to execute the program on the first processing element. In some cases, a dispatcher receives the first thread for dispatching to the first processing element. The dispatcher modifies the prefetch instruction to disable prefetching into the first cache in response to the one or more second threads having previously executed the program on the first processing element.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: January 18, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Brian Emberling, Michael Mantor
  • Publication number: 20210405968
    Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
    Type: Application
    Filed: September 23, 2020
    Publication date: December 30, 2021
    Inventors: Bin HE, Shubh SHAH, Michael MANTOR
  • Patent number: 11200060
    Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: December 14, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Sateesh Lagudu, Arun Vaidyanathan Ananthanarayan, Michael Mantor, Allen H. Rush
  • Patent number: 11169811
    Abstract: A method of context bouncing includes receiving, at a command processor of a graphics processing unit (GPU), a conditional execute packet providing a hash identifier corresponding to an encapsulated state. The encapsulated state includes one or more context state packets following the conditional execute packet. A command packet following the encapsulated state is executed based at least in part on determining whether the hash identifier of the encapsulated state matches one of a plurality of hash identifiers of active context states currently stored at the GPU.
    Type: Grant
    Filed: May 30, 2019
    Date of Patent: November 9, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Rex Eldon McCrary, Yi Luo, Harry J. Wise, Alexander Fuad Ashkar, Michael Mantor
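The hash-matching step in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical packet encoding (tuples tagged `cond_exec`, `state`, and `cmd`) and assumes that a hash match means the encapsulated state packets are skipped and the stored context is reused. The abstract only says that execution depends on the match, so that policy is an illustration and not the patented method.

```python
def process_stream(packets, active_contexts):
    """Walk a command stream, reusing a stored context when its hash matches.

    Hypothetical packet shapes (not taken from the patent):
      ("cond_exec", hash_id, n_state)  conditional execute packet
      ("state", key, value)            context state packet
      ("cmd", name)                    command packet (e.g. a draw)
    """
    log = []
    current = None
    i = 0
    while i < len(packets):
        kind = packets[i][0]
        if kind == "cond_exec":
            _, hash_id, n_state = packets[i]
            state_packets = packets[i + 1 : i + 1 + n_state]
            if hash_id in active_contexts:
                # Assumed policy: the state is already resident, so skip it.
                log.append(("reuse", hash_id))
            else:
                # Assumed policy: load the encapsulated state as a new context.
                active_contexts[hash_id] = {k: v for _, k, v in state_packets}
                log.append(("load", hash_id))
            current = hash_id
            i += 1 + n_state
        elif kind == "cmd":
            log.append(("exec", packets[i][1], current))
            i += 1
        else:
            raise ValueError(f"unexpected packet outside encapsulated state: {packets[i]}")
    return log


packets = [
    ("cond_exec", 0xA, 2), ("state", "blend", 1), ("state", "depth", 0), ("cmd", "draw0"),
    ("cond_exec", 0xB, 1), ("state", "blend", 0), ("cmd", "draw1"),
    ("cond_exec", 0xA, 2), ("state", "blend", 1), ("state", "depth", 0), ("cmd", "draw2"),
]
active = {}
log = process_stream(packets, active)
```

In this run the third conditional execute packet carries a hash that is already active, so its two state packets are skipped before `draw2` executes.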
  • Publication number: 20210241516
    Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.
    Type: Application
    Filed: November 6, 2020
    Publication date: August 5, 2021
    Inventors: Mark LEATHER, Michael MANTOR
  • Publication number: 20210209831
    Abstract: A method, system, and non-transitory computer readable storage medium for rasterizing primitives are disclosed. The method, system, and non-transitory computer readable storage medium include: generating a primitive batch from a sequence of one or more primitives, wherein the primitive batch includes primitives sorted into one or more row groups based on which row of a plurality of rows each primitive intersects; and processing each row group, the processing for each row group including: identifying one or more primitive column intercepts for each of the one or more primitives in the row group, wherein each combination of primitive column intercept and row identifies a bin; and rasterizing the one or more primitives that intersect the bin.
    Type: Application
    Filed: March 22, 2021
    Publication date: July 8, 2021
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael Mantor, Laurent Lefebvre, Mikko Alho, Mika Tuomi, Kiia Kallio
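The row-group and column-intercept steps in the abstract above can be sketched as follows. This is a simplified illustration: it assumes each primitive is represented by an inclusive pixel bounding box `(x0, y0, x1, y1)`, which is a conservative stand-in for a real triangle intersection test, and the function name and bin dimensions are hypothetical.

```python
from collections import defaultdict


def bin_primitives(prims, bin_w, bin_h):
    """Map each (row, column) bin to the primitives whose bounding box touches it.

    prims: dict of primitive id -> (x0, y0, x1, y1), inclusive pixel bounds.
    """
    # Step 1: sort primitives into row groups by the rows they intersect.
    row_groups = defaultdict(list)
    for pid, (x0, y0, x1, y1) in prims.items():
        for row in range(y0 // bin_h, y1 // bin_h + 1):
            row_groups[row].append(pid)

    # Step 2: per row group, find each primitive's column intercepts.
    # Every (row, column) pair identifies one bin.
    bins = defaultdict(list)
    for row in sorted(row_groups):
        for pid in row_groups[row]:
            x0, _, x1, _ = prims[pid]
            for col in range(x0 // bin_w, x1 // bin_w + 1):
                bins[(row, col)].append(pid)
    return dict(bins)


prims = {"t0": (0, 0, 10, 10), "t1": (30, 5, 70, 40)}
bins = bin_primitives(prims, bin_w=32, bin_h=32)
```

A rasterizer would then visit the bins one at a time and rasterize only the primitives listed for that bin; that step is omitted here.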
  • Publication number: 20210157588
    Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Jiasheng CHEN, Bin HE, Jian HUANG, Michael MANTOR
  • Publication number: 20210157639
    Abstract: A graphics processing unit (GPU) adjusts a frequency of a clock based on identifying a program thread executing at the processing unit, wherein the program thread is detected based on a workload to be executed. By adjusting the clock frequency based on the identified program thread, the processing unit adapts to different processing demands of different program threads. Further, by identifying the program thread based on workload, the processing unit adapts the clock frequency based on processing demands, thereby conserving processing resources.
    Type: Application
    Filed: November 22, 2019
    Publication date: May 27, 2021
    Inventors: Mangesh P. NIJASURE, Michael MANTOR, Ashkan HOSSEINZADEH NAMIN, Louis REGNIERE
  • Publication number: 20210117269
    Abstract: A system and method for protecting memory instructions against faults are described. The system and method include converting the slave instructions to dummy operations, modifying memory arbiter to issue up to N master and N slave global/shared memory instructions per cycle, sending master memory requests to memory system, using slave requests for error checking, entering master requests to the GM/LM FIFO, storing slave requests in a register, and comparing the entered master requests with the stored slave requests.
    Type: Application
    Filed: December 7, 2020
    Publication date: April 22, 2021
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John Kalamatianos, Michael Mantor, Sudhanva Gurumurthi
  • Publication number: 20210096877
    Abstract: An arithmetic logic unit (ALU) pipeline of a processing unit collapses execution bubbles in response to a stall at a stage of the ALU pipeline. An execution bubble occurs at the pipeline in response to an invalid instruction being placed in the pipeline for execution. The invalid instruction thus consumes an available “slot” in the pipeline, and proceeds through the pipeline until a stall in a subsequent stage (that is, a stage after the stage executing the invalid instruction) is detected. In response to detecting the stall, the ALU continues to execute instructions that are behind the invalid instruction in the pipeline, thereby collapsing the execution bubble and conserving resources of the ALU.
    Type: Application
    Filed: September 26, 2019
    Publication date: April 1, 2021
    Inventors: Bin HE, Michael MANTOR, Jiasheng CHEN
  • Publication number: 20210089304
    Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
    Type: Application
    Filed: September 24, 2019
    Publication date: March 25, 2021
    Inventors: Bin HE, Michael MANTOR, Jiasheng CHEN, Jian HUANG
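The round structure in the abstract above (fetch portions once, iterate over combinations of their subsets, then write the accumulated results) can be sketched in plain Python. The sketch assumes a portion of the first matrix is a block of rows and a portion of the second is a block of columns; the function name, the block sizes, and the `fetches` counter are illustrative and do not come from the application.

```python
def matmul_rounds(A, B, rows_per_round=2, cols_per_round=2):
    """Multiply A (n x k) by B (k x m) in rounds that reuse fetched portions."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]  # stands in for the output buffer
    fetches = 0
    for r0 in range(0, n, rows_per_round):
        for c0 in range(0, m, cols_per_round):
            # Fetch one portion of each matrix into "registers" before the round.
            a_regs = [A[i] for i in range(r0, min(r0 + rows_per_round, n))]
            b_regs = [[B[t][j] for t in range(k)]
                      for j in range(c0, min(c0 + cols_per_round, m))]
            fetches += 1
            # Iterations: every combination of a row subset and a column subset
            # is multiplied and accumulated without fetching again.
            acc = {}
            for ai, a_row in enumerate(a_regs):
                for bj, b_col in enumerate(b_regs):
                    acc[(ai, bj)] = sum(x * y for x, y in zip(a_row, b_col))
            # Write accumulated results out only after all iterations finish.
            for (ai, bj), value in acc.items():
                C[r0 + ai][c0 + bj] = value
    return C, fetches


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, 2, 1], [0, 1, 1, 3]]
C, fetches = matmul_rounds(A, B)
```

With 2-row and 2-column portions, each round performs four multiply/accumulate iterations per fetch, which is the reuse the abstract describes.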
  • Publication number: 20210090205
    Abstract: The address of the draw or dispatch packet responsible for creating an exception is tied to a shader/wavefront back to the draw command from which it originated. In various embodiments, a method of operating a graphics pipeline and exception handling includes receiving, at a command processor of a graphics processing unit (GPU), an exception signal indicating an occurrence of a pipeline exception at a shader stage of a graphics pipeline. The shader stage generates an exception signal in response to a pipeline exception and transmits the exception signal to the command processor. The command processor determines, based on the exception signal, an address of a command packet responsible for the occurrence of the pipeline exception.
    Type: Application
    Filed: September 24, 2019
    Publication date: March 25, 2021
    Inventors: Michael MANTOR, Alexander Fuad ASHKAR, Randy RAMSEY, Mangesh P. NIJASURE, Brian EMBERLING
  • Patent number: 10957094
    Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.
    Type: Grant
    Filed: August 29, 2016
    Date of Patent: March 23, 2021
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael Mantor, Laurent Lefebvre, Mikko Alho, Mika Tuomi, Kiia Kallio
  • Patent number: 10943389
    Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, the fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: March 9, 2021
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Laurent Lefebvre, Michael Mantor, Mark Fowler, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi, Christopher J. Brennan
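The buffer update rule in the abstract above can be sketched as a small Python function. The `Fragment` class, the `tag` field, and the `remove` flag are hypothetical names chosen for illustration, and the sketch ignores buffer capacity and the later hand-off to the pixel shader.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    tag: str                  # illustrative label for the fragment's source
    overlapped: bool = False  # set when a newer fragment covers the same position


def add_fragment(buffer, new, remove=True):
    """Check a new post-z-cull fragment against the buffer, then append it."""
    if remove:
        # Drop any older fragment at the same screen position.
        buffer[:] = [f for f in buffer if (f.x, f.y) != (new.x, new.y)]
    else:
        # Alternatively, keep older fragments but mark them as overlapping.
        for f in buffer:
            if (f.x, f.y) == (new.x, new.y):
                f.overlapped = True
    # In either case the new fragment is added to the buffer.
    buffer.append(new)


removed_mode = []
marked_mode = []
for x, y, tag in [(0, 0, "a"), (1, 0, "b"), (0, 0, "c")]:
    add_fragment(removed_mode, Fragment(x, y, tag), remove=True)
    add_fragment(marked_mode, Fragment(x, y, tag), remove=False)
```

Here fragment `c` lands on the same screen position as `a`, so `a` is either removed or flagged, while `b` is left untouched.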
  • Publication number: 20210064366
    Abstract: An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations.
    Type: Application
    Filed: August 30, 2019
    Publication date: March 4, 2021
    Inventors: Randy RAMSEY, William David ISENBERG, Michael MANTOR
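The accumulator-based dispatch in the abstract above can be illustrated with a short sketch. The abstract does not say how the comparison is made, so the policy here (pick the pending wave that keeps the largest accumulated value smallest) is purely an assumption, as are the characteristic names and the numeric scale.

```python
CHARACTERISTICS = ("fetch", "alu", "export")  # illustrative names


def accumulate(running_waves):
    """Sum each behavioral characteristic over the waves now executing."""
    return {c: sum(w[c] for w in running_waves) for c in CHARACTERISTICS}


def pick_next_wave(accumulators, pending_waves):
    """Assumed policy: choose the wave that minimizes the peak accumulated value."""
    def peak_after(wave):
        return max(accumulators[c] + wave[c] for c in CHARACTERISTICS)
    return min(pending_waves, key=peak_after)


running = [
    {"fetch": 8, "alu": 2, "export": 1},
    {"fetch": 7, "alu": 3, "export": 1},
]
pending = [
    {"name": "fetch_heavy", "fetch": 9, "alu": 1, "export": 0},
    {"name": "alu_heavy", "fetch": 1, "alu": 9, "export": 0},
]
accumulators = accumulate(running)
chosen = pick_next_wave(accumulators, pending)
```

Because the running waves already consume most of the fetch budget, this policy selects the ALU-heavy wave next.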
  • Publication number: 20210049729
    Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
    Type: Application
    Filed: May 21, 2020
    Publication date: February 18, 2021
    Inventors: Timour T. PALTASHEV, Michael MANTOR, Rex Eldon MCCRARY
  • Patent number: 10922868
    Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: February 16, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mangesh P. Nijasure, Todd Martin, Michael Mantor