Patents by Inventor Michael Mantor
Michael Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11720328Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.Type: GrantFiled: September 23, 2020Date of Patent: August 8, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Bin He, Shubh Shah, Michael Mantor
-
Publication number: 20230206543Abstract: A processing unit employs a hardware traversal engine to traverse an acceleration structure such as a ray tracing structure. The hardware traversal engine includes one or more memory modules to store state information and other data used for the structure traversal, and control logic to execute a traversal process based on the stored data and based on received information indicating a source node of the acceleration structure to be used for the traversal process. By employing a hardware traversal engine, the processing unit is able to execute the traversal process more quickly and efficiently, conserving processing resources and improving overall processing efficiency.Type: ApplicationFiled: December 28, 2021Publication date: June 29, 2023Inventors: Konstantin Igorevich SHKURKO, Michael Mantor
-
Patent number: 11675568Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.Type: GrantFiled: December 14, 2020Date of Patent: June 13, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor
-
Patent number: 11635967Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.Type: GrantFiled: September 25, 2020Date of Patent: April 25, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
-
Patent number: 11630667Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.Type: GrantFiled: November 27, 2019Date of Patent: April 18, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Jiasheng Chen, Bin He, Jian Huang, Michael Mantor
-
Publication number: 20230097562Abstract: Described herein is a technique for performing ray tracing operations. The technique includes encountering, at a non-leaf node, a pointer to a bottom-level acceleration structure having one or more delta instances; identifying an index associated with the pointer, wherein the index identifies an instance within the bottom-level acceleration structure; and obtaining data for the instance based on the pointer and the index.Type: ApplicationFiled: September 28, 2021Publication date: March 30, 2023Applicant: Advanced Micro Devices, Inc.Inventors: Konstantin I. Shkurko, Matthäus G. Chajdas, Michael Mantor
-
Publication number: 20230097279Abstract: Methods and systems are disclosed for executing operations on single-instruction-multiple-data (SIMD) units. Techniques disclosed perform a dot product operation on input data during one computer cycle, including convolving the input data, generating intermediate data, and applying one or more transitional operations to the intermediate data to generate output data. Aspects described, wherein the input data is an input to a layer of a convolutional neural network and the generated output data is the output of the layer.Type: ApplicationFiled: September 29, 2021Publication date: March 30, 2023Applicant: Advanced Micro Devices, Inc.Inventors: Brian Emberling, Michael Mantor, Michael Y. Chow, Bin He
-
Patent number: 11609791Abstract: A first workload is executed in a first subset of pipelines of a processing unit. A second workload is executed in a second subset of the pipelines of the processing unit. The second workload is dependent upon the first workload. The first and second workloads are suspended and state information for the first and second workloads is stored in a first memory in response to suspending the first and second workloads. In some cases, a third workload executes in a third subset of the pipelines of the processing unit concurrently with executing the first and second workloads. In some cases, a fourth workload is executed in the first and second pipelines after suspending the first and second workloads. The first and second pipelines are resumed on the basis of the stored state information in response to completion or suspension of the fourth workload.Type: GrantFiled: November 30, 2017Date of Patent: March 21, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Anirudh R. Acharya, Michael Mantor
-
Publication number: 20230076872Abstract: Embodiments include methods, systems and non-transitory computer-readable computer readable media including instructions for executing a prefetch kernel that includes memory accesses for prefetching data for a processing kernel into a memory, and, subsequent to executing at least a portion of the prefetch kernel, executing the processing kernel where the processing kernel includes accesses to data that is stored into the memory resulting from execution of the prefetch kernel.Type: ApplicationFiled: November 11, 2022Publication date: March 9, 2023Applicant: Advanced Micro Devices, Inc.Inventors: Nuwan S. Jayasena, James Michael O'Connor, Michael Mantor
-
Patent number: 11500778Abstract: Embodiments include methods, systems and non-transitory computer-readable computer readable media including instructions for executing a prefetch kernel with reduced intermediate state storage resource requirements. These include executing a prefetch kernel on a graphics processing unit (GPU), such that the prefetch kernel begins executing before a processing kernel. The prefetch kernel performs memory operations that are based upon at least a subset of memory operations in the processing kernel.Type: GrantFiled: March 9, 2020Date of Patent: November 15, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Nuwan S. Jayasena, James Michael O'Connor, Michael Mantor
-
Patent number: 11494192Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.Type: GrantFiled: April 28, 2020Date of Patent: November 8, 2022Assignees: Advanced Micro Devices, Inc., ADVANCED MICRO DEVICES (SHANGHAI) CO., LTD.Inventors: Jiasheng Chen, YunXiao Zou, Bin He, Angel E. Socarras, QingCheng Wang, Wei Yuan, Michael Mantor
-
Publication number: 20220320042Abstract: A multi-die parallel processor semiconductor package includes a first base IC die including a first plurality of virtual compute dies 3D stacked on top of the first base IC die. A first subset of a parallel processing pipeline logic is positioned at the first plurality of virtual compute dies. Additionally, a second subset of the parallel processing pipeline logic is positioned at the first base IC die. The multi-die parallel processor semiconductor package also includes a second base IC die including a second plurality of virtual compute dies 3D stacked on top of the second base IC die. An active bridge chip communicably couples a first interconnect structure of the first base IC die to a first interconnect structure of the second base IC die.Type: ApplicationFiled: March 30, 2021Publication date: October 6, 2022Inventor: Michael MANTOR
-
Publication number: 20220277508Abstract: A method, computer system, and a non-transitory computer-readable storage medium for performing primitive batch binning are disclosed. The method, computer system, and non-transitory computer-readable storage medium include techniques for generating a primitive batch from a plurality of primitives, computing respective bin intercepts for each of the plurality of primitives in the primitive batch, and shading the primitive batch by iteratively processing each of the respective bin intercepts computed until all of the respective bin intercepts are processed.Type: ApplicationFiled: May 16, 2022Publication date: September 1, 2022Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
-
Publication number: 20220269620Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.Type: ApplicationFiled: February 8, 2022Publication date: August 25, 2022Inventors: Benjamin T. SANDER, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Michael Mantor
-
Patent number: 11409840Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.Type: GrantFiled: September 25, 2020Date of Patent: August 9, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari
-
Patent number: 11409536Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.Type: GrantFiled: November 3, 2016Date of Patent: August 9, 2022Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Bin He, YunXiao Zou, Jiasheng Chen, Michael Mantor
-
Publication number: 20220237851Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.Type: ApplicationFiled: March 29, 2022Publication date: July 28, 2022Inventors: Mark LEATHER, Michael MANTOR
-
Patent number: 11397578Abstract: An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations.Type: GrantFiled: August 30, 2019Date of Patent: July 26, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Randy Ramsey, William David Isenberg, Michael Mantor
-
Patent number: 11386518Abstract: The address of the draw or dispatch packet responsible for creating an exception is tied to a shader/wavefront back to the draw command from which it originated. In various embodiments, a method of operating a graphics pipeline and exception handling includes receiving, at a command processor of a graphics processing unit (GPU), an exception signal indicating an occurrence of a pipeline exception at a shader stage of a graphics pipeline. The shader stage generates an exception signal in response to a pipeline exception and transmits the exception signal to the command processor. The command processor determines, based on the exception signal, an address of a command packet responsible for the occurrence of the pipeline exception.Type: GrantFiled: September 24, 2019Date of Patent: July 12, 2022Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Michael Mantor, Alexander Fuad Ashkar, Randy Ramsey, Mangesh P. Nijasure, Brian Emberling
-
Patent number: 11379941Abstract: Improvements in the graphics processing pipeline are disclosed. More specifically, a new primitive shader stage performs tasks of the vertex shader stage or a domain shader stage if tessellation is enabled, a geometry shader if enabled, and a fixed function primitive assembler. The primitive shader stage is compiled by a driver from user-provided vertex or domain shader code, geometry shader code, and from code that performs functions of the primitive assembler. Moving tasks of the fixed function primitive assembler to a primitive shader that executes in programmable hardware provides many benefits, such as removal of a fixed function crossbar, removal of dedicated parameter and position buffers that are unusable in general compute mode, and other benefits.Type: GrantFiled: January 25, 2017Date of Patent: July 5, 2022Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Todd Martin, Mangesh P. Nijasure, Randy W. Ramsey, Michael Mantor, Laurent Lefebvre