Patents by Inventor Michael Mantor

Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11494192
    Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is configured to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is configured to provide information associated with the instruction to the processing element in response to the instruction being in a second set of instructions. A control unit gates at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: November 8, 2022
    Assignees: Advanced Micro Devices, Inc., ADVANCED MICRO DEVICES (SHANGHAI) CO., LTD.
    Inventors: Jiasheng Chen, YunXiao Zou, Bin He, Angel E. Socarras, QingCheng Wang, Wei Yuan, Michael Mantor
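    A minimal Python sketch of the gating idea in the abstract above; the instruction sets, class names, and the software "powered" flag are illustrative assumptions, not details from the patent.

        # Route operands through one multiplexer array and gate the unused one.
        FIRST_SET = {"v_add_f32", "v_mul_f32"}      # instructions served by mux array A (assumed)
        SECOND_SET = {"v_fma_f64", "v_sqrt_f64"}    # instructions served by mux array B (assumed)

        class MuxArray:
            def __init__(self, name):
                self.name = name
                self.powered = True                 # stands in for power/clock being supplied

            def select(self, operands):
                assert self.powered, f"{self.name} is gated off"
                return operands                     # stand-in for the routed operand bundle

        def issue(instruction, operands, mux_a, mux_b):
            if instruction in SECOND_SET:
                mux_a.powered = False               # control unit gates power/clock to array A
                return mux_b.select(operands)
            mux_b.powered = False                   # otherwise array B is the idle one
            return mux_a.select(operands)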
  • Publication number: 20220320042
    Abstract: A multi-die parallel processor semiconductor package includes a first base IC die including a first plurality of virtual compute dies 3D stacked on top of the first base IC die. A first subset of a parallel processing pipeline logic is positioned at the first plurality of virtual compute dies. Additionally, a second subset of the parallel processing pipeline logic is positioned at the first base IC die. The multi-die parallel processor semiconductor package also includes a second base IC die including a second plurality of virtual compute dies 3D stacked on top of the second base IC die. An active bridge chip communicably couples a first interconnect structure of the first base IC die to a first interconnect structure of the second base IC die.
    Type: Application
    Filed: March 30, 2021
    Publication date: October 6, 2022
    Inventor: Michael MANTOR
  • Publication number: 20220277508
    Abstract: A method, computer system, and a non-transitory computer-readable storage medium for performing primitive batch binning are disclosed. The method, computer system, and non-transitory computer-readable storage medium include techniques for generating a primitive batch from a plurality of primitives, computing respective bin intercepts for each of the plurality of primitives in the primitive batch, and shading the primitive batch by iteratively processing each of the respective bin intercepts computed until all of the respective bin intercepts are processed.
    Type: Application
    Filed: May 16, 2022
    Publication date: September 1, 2022
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
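    A small Python sketch of the flow in the abstract above, assuming 32x32-pixel bins and an axis-aligned bounding-box intercept test; those specifics are assumptions, not taken from the application.

        BIN_W, BIN_H = 32, 32                       # assumed bin size

        def bin_intercepts(primitive):
            """Bins overlapped by a primitive's screen-space bounding box."""
            (x0, y0), (x1, y1) = primitive["bbox"]
            return {(bx, by)
                    for bx in range(int(x0) // BIN_W, int(x1) // BIN_W + 1)
                    for by in range(int(y0) // BIN_H, int(y1) // BIN_H + 1)}

        def shade_batch(primitives, shade):
            """Shade the batch one bin at a time until every intercept is processed."""
            intercepts = {id(p): bin_intercepts(p) for p in primitives}
            for b in sorted(set().union(*intercepts.values())):
                for p in primitives:
                    if b in intercepts[id(p)]:
                        shade(p, b)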
  • Publication number: 20220269620
    Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.
    Type: Application
    Filed: February 8, 2022
    Publication date: August 25, 2022
    Inventors: Benjamin T. SANDER, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Michael Mantor
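    A Python sketch of how software might combine the two logs described above; the 4 KiB page size, log layout, and function names are assumptions.

        from collections import Counter

        PAGE = 4096
        miss_log = []                               # physical addresses of logged cache misses
        xlate_log = {}                              # physical page -> virtual page (from page walks)

        def record_miss(paddr):
            miss_log.append(paddr)

        def record_translation(vaddr, paddr):
            xlate_log[paddr // PAGE] = vaddr // PAGE

        def hot_virtual_pages(n=10):
            """Join the two logs to rank virtual pages by miss count."""
            counts = Counter(xlate_log.get(p // PAGE) for p in miss_log)
            counts.pop(None, None)                  # misses whose translation was never logged
            return counts.most_common(n)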
  • Patent number: 11409536
    Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.
    Type: Grant
    Filed: November 3, 2016
    Date of Patent: August 9, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Bin He, YunXiao Zou, Jiasheng Chen, Michael Mantor
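    A Python sketch of building a wider result from a pair of narrower lanes, with the low lane's carry staggered into the high lane one step later; the 32-bit lane width and the choice of addition as the operation are assumptions.

        MASK32 = (1 << 32) - 1

        def lane_add32(a, b, carry_in=0):
            """One 32-bit channel device: returns (sum, carry out)."""
            s = a + b + carry_in
            return s & MASK32, s >> 32

        def pair_add64(a, b):
            """64-bit add built from a pair of staggered 32-bit lanes."""
            lo, carry = lane_add32(a & MASK32, b & MASK32)       # first lane
            hi, _ = lane_add32(a >> 32, b >> 32, carry)          # second lane, one step later
            return (hi << 32) | lo

        assert pair_add64(2**40 + 5, 2**41 + 7) == 2**40 + 2**41 + 12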
  • Patent number: 11409840
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: August 9, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari
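    A Python sketch of mapping memory interfaces to mutually exclusive row and column subsets based on the operand matrix shape; the 4x4 array, the two DMA engines, and the interleaved split are assumptions.

        ARRAY_ROWS = ARRAY_COLS = 4                 # assumed processor-element-array grid

        def map_memory_interfaces(m_rows, n_cols, num_engines=2):
            """Give each DMA engine a disjoint slice of the array's rows and columns."""
            rows = list(range(min(m_rows, ARRAY_ROWS)))
            cols = list(range(min(n_cols, ARRAY_COLS)))
            return [{"rows": rows[e::num_engines], "cols": cols[e::num_engines]}
                    for e in range(num_engines)]

        print(map_memory_interfaces(4, 4))   # square matrix: rows and columns split evenly
        print(map_memory_interfaces(4, 1))   # single-column operand leaves one engine with no columns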
  • Publication number: 20220237851
    Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.
    Type: Application
    Filed: March 29, 2022
    Publication date: July 28, 2022
    Inventors: Mark LEATHER, Michael MANTOR
  • Patent number: 11397578
    Abstract: An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: July 26, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Randy Ramsey, William David Isenberg, Michael Mantor
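    A Python sketch of the dispatch decision described above; the specific characteristics (fetch bandwidth, ALU usage, exports), the budget values, and the first-fit policy are assumptions.

        BUDGET = {"fetch_bw": 100.0, "alu": 100.0, "exports": 100.0}   # assumed per-PE limits

        def dispatch(wave_profile, accumulators):
            """Place a wave on a processing element whose accumulated load can absorb it."""
            for pe, acc in enumerate(accumulators):
                if all(acc[k] + wave_profile[k] <= BUDGET[k] for k in BUDGET):
                    for k in BUDGET:
                        acc[k] += wave_profile[k]   # charge the new wave to this PE
                    return pe
            return None                             # hold the wave until running waves retire

        accumulators = [dict.fromkeys(BUDGET, 0.0) for _ in range(4)]
        print(dispatch({"fetch_bw": 30.0, "alu": 55.0, "exports": 10.0}, accumulators))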
  • Patent number: 11386518
    Abstract: The address of the draw or dispatch packet responsible for creating an exception ties a shader/wavefront back to the draw command from which it originated. In various embodiments, a method of operating a graphics pipeline and exception handling includes receiving, at a command processor of a graphics processing unit (GPU), an exception signal indicating an occurrence of a pipeline exception at a shader stage of a graphics pipeline. The shader stage generates an exception signal in response to a pipeline exception and transmits the exception signal to the command processor. The command processor determines, based on the exception signal, an address of a command packet responsible for the occurrence of the pipeline exception.
    Type: Grant
    Filed: September 24, 2019
    Date of Patent: July 12, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Michael Mantor, Alexander Fuad Ashkar, Randy Ramsey, Mangesh P. Nijasure, Brian Emberling
  • Patent number: 11379941
    Abstract: Improvements in the graphics processing pipeline are disclosed. More specifically, a new primitive shader stage performs tasks of the vertex shader stage or a domain shader stage if tessellation is enabled, a geometry shader if enabled, and a fixed function primitive assembler. The primitive shader stage is compiled by a driver from user-provided vertex or domain shader code, geometry shader code, and from code that performs functions of the primitive assembler. Moving tasks of the fixed function primitive assembler to a primitive shader that executes in programmable hardware provides many benefits, such as removal of a fixed function crossbar, removal of dedicated parameter and position buffers that are unusable in general compute mode, and other benefits.
    Type: Grant
    Filed: January 25, 2017
    Date of Patent: July 5, 2022
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Todd Martin, Mangesh P. Nijasure, Randy W. Ramsey, Michael Mantor, Laurent Lefebvre
  • Publication number: 20220197655
    Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.
    Type: Application
    Filed: December 10, 2021
    Publication date: June 23, 2022
    Inventors: Sateesh LAGUDU, Arun Vaidyanathan ANANTHANARAYAN, Michael Mantor, Allen H. Rush
  • Publication number: 20220197973
    Abstract: A processing system includes a first set and a second set of general-purpose registers (GPRs) and memory access circuitry that fetches nonzero values of a sparse matrix into consecutive slots in the first set. The memory access circuitry also fetches values of an expanded matrix into consecutive slots in the second set of GPRs. The expanded matrix is formed based on values of a vector and locations of the nonzero values in the sparse matrix. The processing system also includes a set of multipliers that concurrently perform multiplication of the nonzero values in slots of the first set of GPRs with the values of the vector in corresponding slots of the second set. Reduced sum circuitry accumulates results from the set of multipliers for rows of the sparse matrix.
    Type: Application
    Filed: December 17, 2020
    Publication date: June 23, 2022
    Inventors: Sateesh LAGUDU, Allen H. RUSH, Michael MANTOR
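    A Python sketch of the dataflow in the abstract above, assuming a CSR-style sparse matrix; the register files are modelled as plain lists, and the CSR layout itself is an assumption.

        def spmv(values, col_idx, row_ptr, vector):
            """y = A @ x for a CSR matrix A given by (values, col_idx, row_ptr)."""
            gpr_a = values                                    # first GPR set: packed nonzeros
            gpr_b = [vector[c] for c in col_idx]              # second GPR set: expanded matrix
            products = [a * b for a, b in zip(gpr_a, gpr_b)]  # concurrent multipliers
            return [sum(products[row_ptr[r]:row_ptr[r + 1]])  # reduced sum per row
                    for r in range(len(row_ptr) - 1)]

        # 2x3 matrix [[1, 0, 2], [0, 3, 0]] times [1, 2, 3] -> [7, 6]
        print(spmv([1, 2, 3], [0, 2, 1], [0, 2, 3], [1, 2, 3]))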
  • Publication number: 20220188076
    Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
    Type: Application
    Filed: December 14, 2020
    Publication date: June 16, 2022
    Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor
  • Patent number: 11335052
    Abstract: A system, method and a non-transitory computer readable storage medium are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from one or more primitives. A bin is identified for processing the primitive batch. At least a portion of each primitive intersecting the identified bin is processed and a next bin for processing the primitive batch is identified based on an intercept walk order. The processing is iteratively repeated for the one or more primitives in the primitive batch for successive bins until all primitives of the primitive batch are completely processed. Then, the one or more primitives in the primitive batch are further processed.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: May 17, 2022
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
  • Patent number: 11295507
    Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: April 5, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mark Leather, Michael Mantor
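    A Python sketch of the two modes described above; the four shader engines, the even split, and the dictionary return shape are assumptions.

        def connect_front_ends(mode, engines=("SE0", "SE1", "SE2", "SE3")):
            """Mode 1: one front end feeds every engine. Mode 2: the partition switch splits them."""
            if mode == 1:
                return {"FE0": list(engines)}
            half = len(engines) // 2
            return {"FE0": list(engines[:half]),    # first FE keeps the first subset
                    "FE1": list(engines[half:])}    # second FE drives the rest

        print(connect_front_ends(1))
        print(connect_front_ends(2))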
  • Publication number: 20220100528
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
    Type: Application
    Filed: September 25, 2020
    Publication date: March 31, 2022
    Inventors: Sateesh LAGUDU, Allen H. RUSH, Michael MANTOR, Arun Vaidyanathan ANANTHANARAYAN, Prasad NAGABHUSHANAMGARI, Maxim V. KAZAKOV
  • Publication number: 20220100813
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
    Type: Application
    Filed: September 25, 2020
    Publication date: March 31, 2022
    Inventors: Sateesh LAGUDU, Allen H. RUSH, Michael MANTOR, Arun Vaidyanathan ANANTHANARAYAN, Prasad NAGABHUSHANAMGARI
  • Patent number: 11263044
    Abstract: A graphics processing unit (GPU) adjusts a clock frequency based on identifying a program thread executing at the processing unit, wherein the program thread is detected based on a workload to be executed. By adjusting the clock frequency based on the identified program thread, the processing unit adapts to different processing demands of different program threads. Further, by identifying the program thread based on workload, the processing unit adapts the clock frequency based on processing demands, thereby conserving processing resources.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: March 1, 2022
    Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
    Inventors: Mangesh P. Nijasure, Michael Mantor, Ashkan Hosseinzadeh Namin, Louis Regniere
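    A Python sketch of adjusting the clock to the detected workload; the workload classes, thresholds, and frequencies are assumptions, not values from the patent.

        FREQ_MHZ = {"compute_heavy": 1900, "geometry_heavy": 1600, "light": 1100}

        def classify(workload):
            """Crude stand-in for detecting the program thread from its workload."""
            if workload["alu_ops"] > 10 * workload["tri_count"]:
                return "compute_heavy"
            if workload["tri_count"] > 100_000:
                return "geometry_heavy"
            return "light"

        def set_clock_for(workload, write_clock):
            write_clock(FREQ_MHZ[classify(workload)])

        set_clock_for({"alu_ops": 5_000_000, "tri_count": 200_000},
                      lambda mhz: print(f"clock -> {mhz} MHz"))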
  • Publication number: 20220051985
    Abstract: A semiconductor package includes a first die, a second die, and an interconnect die coupled to a first plurality of through-die vias in the first die and a second plurality of through-die vias in the second die. The interconnect die provides communication pathways between the first die and the second die.
    Type: Application
    Filed: October 30, 2020
    Publication date: February 17, 2022
    Inventors: RAHUL AGARWAL, RAJA SWAMINATHAN, MICHAEL S. ALFANO, GABRIEL H. LOH, ALAN D. SMITH, GABRIEL WONG, MICHAEL MANTOR
  • Patent number: 11226819
    Abstract: A processing unit includes a plurality of processing elements and one or more caches. A first thread executes a program that includes one or more prefetch instructions to prefetch information into a first cache. Prefetching is selectively enabled when executing the first thread on a first processing element dependent upon whether one or more second threads previously executed the program on the first processing element. The first thread is then dispatched to execute the program on the first processing element. In some cases, a dispatcher receives the first thread for dispatching to the first processing element. The dispatcher modifies the prefetch instruction to disable prefetching into the first cache in response to the one or more second threads having previously executed the program on the first processing element.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: January 18, 2022
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Brian Emberling, Michael Mantor
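    A Python sketch of the dispatch-time decision described above; the (processing element, program) tracking set and the string-based instruction list are assumptions.

        ran_before = set()                          # (processing_element, program_id) pairs seen so far

        def dispatch_thread(pe, program_id, instructions):
            """Strip prefetches when the program already warmed this PE's cache."""
            if (pe, program_id) in ran_before:
                instructions = [i for i in instructions if not i.startswith("prefetch")]
            ran_before.add((pe, program_id))
            return instructions

        print(dispatch_thread(0, "kernelA", ["prefetch r0", "load r1", "add r2"]))  # prefetch kept
        print(dispatch_thread(0, "kernelA", ["prefetch r0", "load r1", "add r2"]))  # prefetch stripped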