Patents by Inventor Matthaeus G. Chajdas

Matthaeus G. Chajdas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240111710
    Abstract: A semiconductor module comprises multiple non-homogeneous semiconductor dies disposed on the semiconductor module, with each semiconductor die having a set of circuitry modules that are common to all of the semiconductor dies and also a set of supporting circuitry modules that are distinct between the semiconductor dies. An interconnect communicatively couples the semiconductor dies together. Commands for processing by the semiconductor module may be routed to individual semiconductor dies based on capabilities of the particular circuitry modules disposed on those individual semiconductor dies.
    Type: Application
    Filed: September 29, 2022
    Publication date: April 4, 2024
    Inventor: Matthaeus G. Chajdas
  • Publication number: 20240111574
    Abstract: Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache.
    Type: Application
    Filed: September 29, 2022
    Publication date: April 4, 2024
    Inventors: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Dominik Baumeister, Fabian Robert Sebastian Wildgrube
  • Publication number: 20240111575
    Abstract: Systems, apparatuses, and methods for implementing a message passing system to schedule work in a computing system. In various implementations, a processor includes a global scheduler, and a plurality of local schedulers with each of the local schedulers coupled to a plurality of processors. The processor further includes a shared cache that is shared by the plurality of local schedulers. Also, a plurality of mailboxes are implemented to enable communication between the local schedulers and the global scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and store an indication in a mailbox for a first local scheduler of the plurality of local schedulers. Responsive to detecting the message in the mailbox, the first local scheduler identifies a location of the one or more work items in the shared cache and retrieves them for scheduling locally.
    Type: Application
    Filed: September 29, 2022
    Publication date: April 4, 2024
    Inventors: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Dominik Baumeister, Fabian Robert Sebastian Wildgrube
  • Publication number: 20240111578
    Abstract: A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Inventors: Matthaeus G. Chajdas, Christopher J. Brennan, Michael Mantor, Robert W. Martin, Nicolai Haehnle
  • Publication number: 20240112397
    Abstract: In response to receiving a scene description, a processing system generates a set of planes in the scene and a bounding volume representing a partition of the scene. Using the set of planes in the scene, a compute unit of an accelerated processing unit performs a spatial test on the bounding volume to determine whether the bounding volume intersects one or more planes of the set of planes in the scene. Based on the spatial test, the compute unit generates intersection data indicating whether the bounding volume intersects one or more planes of the set of planes in the scene. The accelerated processing unit then uses the intersection data to render the scene.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Inventors: Christopher J. Brennan, Matthaeus G. Chajdas
  • Publication number: 20240087223
    Abstract: A method and a processing device for performing rendering are disclosed. The method comprises generating a base hierarchy tree comprising data representing a first object and generating a second hierarchy tree representing a second object comprising shared data of the base hierarchy tree and the second hierarchy tree and difference data. The method further comprises storing the difference data in the memory without storing the shared data, and generating an overlay hierarchy tree comprising the shared data, the difference data, and indication information indicating nodes of the overlay hierarchy tree that comprise the difference data. The method further comprises rendering the first object using the data stored for the base hierarchy tree, and rendering the second object using any one or a combination of the shared data, the difference data, and the indication information.
    Type: Application
    Filed: November 10, 2023
    Publication date: March 14, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Matthäus G. Chajdas, Konstantin I. Shkurko
  • Patent number: 11915337
    Abstract: Systems, apparatuses, and methods for implementing a downsampler in a single compute shader pass are disclosed. A central processing unit (CPU) issues a single-pass compute shader kernel to perform downsampling of a texture on a graphics processing unit (GPU). The GPU includes a plurality of compute units for executing thread groups of the kernel. Each thread group fetches a patch of the texture, and each individual thread downsamples four quads of texels to compute mip levels 1 and 2 independently of the other threads. For mip level 3, texel data is written back over one of the local data share (LDS) entries from which the texel data was loaded. This eliminates the need for a barrier between loads and stores for computing mip level 3. The remaining mip levels are computed in a similar fashion by the thread groups of the single-pass kernel.
    Type: Grant
    Filed: February 23, 2021
    Date of Patent: February 27, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lou Isabelle Kramer, Matthäus G. Chajdas
  • Patent number: 11854138
    Abstract: Described herein is a technique for modifying a bounding volume hierarchy. The techniques include combining preferred orientations of child nodes of a first bounding box node to generate a first preferred orientation; based on the first preferred orientation, converting one or more child nodes of the first bounding box node into one or more oriented bounding box nodes; combining preferred orientations of child nodes of a second bounding box node to generate a second preferred orientation; and based on the second preferred orientation, maintaining one or more children of the second bounding box node as non-oriented bounding box nodes.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: December 26, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Matthäus G. Chajdas, Michael A. Kern, David Ronald Oldcorn
  • Publication number: 20230409337
    Abstract: Devices and methods for partial sorting for coherence recovery are provided. The partial sorting is efficiently executed by utilizing existing hardware along the memory path (e.g., memory local to the compute unit). The devices include an accelerated processing device which comprises memory and a processor. The processor is, for example, a compute unit of a GPU which comprises a plurality of SIMD units and is configured to determine, for data entries each comprising a plurality of bits, a number of occurrences of different types of the data entries by storing the number of occurrences in one or more portions of the memory local to the processor, sort the data entries based on the determined number of occurrences stored in the one or more portions of the memory local to the processor and execute the sorted data entries.
    Type: Application
    Filed: June 21, 2022
    Publication date: December 21, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Matthäus G. Chajdas, Christopher J. Brennan
  • Patent number: 11816792
    Abstract: Devices and methods for using ray tracing to render similar but different objects in a scene are described which include rendering a second object using an overlay hierarchy tree. The overlay hierarchy tree comprises shared data from a base hierarchy tree comprising data representing a first object in the scene, a second hierarchy tree representing the second object in the scene, difference data representing a difference between the first object and the second object and indication information which indicates nodes of the overlay hierarchy tree comprising difference data.
    Type: Grant
    Filed: December 16, 2021
    Date of Patent: November 14, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Matthäus G. Chajdas, Konstantin I. Shkurko
  • Publication number: 20230196669
    Abstract: Devices and methods for using ray tracing to render similar but different objects in a scene are described which include rendering a second object using an overlay hierarchy tree. The overlay hierarchy tree comprises shared data from a base hierarchy tree comprising data representing a first object in the scene, a second hierarchy tree representing the second object in the scene, difference data representing a difference between the first object and the second object and indication information which indicates nodes of the overlay hierarchy tree comprising difference data.
    Type: Application
    Filed: December 16, 2021
    Publication date: June 22, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Matthäus G. Chajdas, Konstantin I. Shkurko
  • Publication number: 20230097562
    Abstract: Described herein is a technique for performing ray tracing operations. The technique includes encountering, at a non-leaf node, a pointer to a bottom-level acceleration structure having one or more delta instances; identifying an index associated with the pointer, wherein the index identifies an instance within the bottom-level acceleration structure; and obtaining data for the instance based on the pointer and the index.
    Type: Application
    Filed: September 28, 2021
    Publication date: March 30, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Konstantin I. Shkurko, Matthäus G. Chajdas, Michael Mantor
  • Publication number: 20230099806
    Abstract: Described herein is a technique for performing operations for a bounding volume hierarchy. The techniques include: for a bounding box with quantized orientation, the bounding box being part of a bounding volume hierarchy, rotating a ray according to the quantized orientation to generate a rotated ray; performing an intersection test against the bounding box with the rotated ray; and according to the results of the intersection test, continuing traversal of the bounding volume hierarchy.
    Type: Application
    Filed: September 29, 2021
    Publication date: March 30, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: David Ronald Oldcorn, Matthäus G. Chajdas, Michael A. Kern
  • Publication number: 20230027725
    Abstract: Described herein is a technique for modifying a bounding volume hierarchy. The techniques include combining preferred orientations of child nodes of a first bounding box node to generate a first preferred orientation; based on the first preferred orientation, converting one or more child nodes of the first bounding box node into one or more oriented bounding box nodes; combining preferred orientations of child nodes of a second bounding box node to generate a second preferred orientation; and based on the second preferred orientation, maintaining one or more children of the second bounding box node as non-oriented bounding box nodes.
    Type: Application
    Filed: July 23, 2021
    Publication date: January 26, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Matthäus G. Chajdas, Michael A. Kern, David Ronald Oldcorn
  • Patent number: 11481967
    Abstract: Systems, apparatuses, and methods for executing a shader core instruction to invoke depth culling are disclosed. A shader core executes an instruction to invoke a culling function on a depth culling unit for one or more entities prior to completing a corresponding draw call. The shader core provides a mode and coordinates to the depth culling unit as a result of executing the instruction. The depth culling unit implements the culling function to access a live depth buffer to determine whether one or more primitives corresponding to the entities are occluded. The culling unit returns indication(s) to the shader core regarding the result(s) of processing the one or more primitives. For example, if the results indicate a primitive is occluded, the shader core cancels the draw call for the primitive.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: October 25, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Matthäus G. Chajdas, Christopher J. Brennan
  • Publication number: 20220197878
    Abstract: Systems, apparatuses, and methods for implementing a collapsed stack are disclosed. A parallel processor includes a plurality of compute units for executing wavefronts of a given application. Each compute unit includes multiple single-instruction, multiple-data (SIMD) units. When the work-items executing on the execution lanes of a SIMD unit are writing data values to a stack, many of the data values are repeated. In these cases, when the lanes are pushing duplicate data values to the stack, a control unit deduplicates the duplicate data values and stores the deduplicated data values. The control unit then generates a control word that maps the deduplicated data values to execution lanes and stores the control word in association with the stored data values. When the stored data values are restored, the control word is used to determine which lanes receive which values of the stored data values.
    Type: Application
    Filed: December 21, 2020
    Publication date: June 23, 2022
    Inventors: Matthäus G. Chajdas, Christopher J. Brennan
  • Publication number: 20220068012
    Abstract: Systems, apparatuses, and methods for executing a shader core instruction to invoke depth culling are disclosed. A shader core executes an instruction to invoke a culling function on a depth culling unit for one or more entities prior to completing a corresponding draw call. The shader core provides a mode and coordinates to the depth culling unit as a result of executing the instruction. The depth culling unit implements the culling function to access a live depth buffer to determine whether one or more primitives corresponding to the entities are occluded. The culling unit returns indication(s) to the shader core regarding the result(s) of processing the one or more primitives. For example, if the results indicate a primitive is occluded, the shader core cancels the draw call for the primitive.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 3, 2022
    Inventors: Matthäus G. Chajdas, Christopher J. Brennan
  • Publication number: 20210287325
    Abstract: Systems, apparatuses, and methods for implementing a downsampler in a single compute shader pass are disclosed. A central processing unit (CPU) issues a single-pass compute shader kernel to perform downsampling of a texture on a graphics processing unit (GPU). The GPU includes a plurality of compute units for executing thread groups of the kernel. Each thread group fetches a patch of the texture, and each individual thread downsamples four quads of texels to compute mip levels 1 and 2 independently of the other threads. For mip level 3, texel data is written back over one of the local data share (LDS) entries from which the texel data was loaded. This eliminates the need for a barrier between loads and stores for computing mip level 3. The remaining mip levels are computed in a similar fashion by the thread groups of the single-pass kernel.
    Type: Application
    Filed: February 23, 2021
    Publication date: September 16, 2021
    Inventors: Lou Isabelle Kramer, Matthäus G. Chajdas
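
The collapsed-stack abstract (publication 20220197878) describes deduplicating values that SIMD lanes push to a stack and keeping a control word that maps the stored values back to lanes. The following is only a minimal software sketch of that idea, not the patented hardware design. The names `collapse` and `restore` are hypothetical, and representing the control word as a list of per-lane slot indices is an assumption; the abstract does not specify an encoding.

```python
from typing import Dict, List, Sequence, Tuple


def collapse(lane_values: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Deduplicate per-lane values.

    Returns (unique_values, control_word), where control_word[i] is the
    index into unique_values holding the value pushed by lane i.
    The list-of-indices encoding is an assumption made for illustration.
    """
    unique_values: List[int] = []
    slot_of: Dict[int, int] = {}
    control_word: List[int] = []
    for value in lane_values:
        if value not in slot_of:
            # First time this value is seen: give it a new storage slot.
            slot_of[value] = len(unique_values)
            unique_values.append(value)
        control_word.append(slot_of[value])
    return unique_values, control_word


def restore(unique_values: Sequence[int], control_word: Sequence[int]) -> List[int]:
    """Rebuild the per-lane values from the stored values and control word."""
    return [unique_values[slot] for slot in control_word]


if __name__ == "__main__":
    lanes = [7, 7, 3, 7, 3, 3, 9, 7]
    stored, word = collapse(lanes)
    print(stored, word)
    print(restore(stored, word) == lanes)
```

In this sketch eight lane values collapse to three stored values, and the saving grows with the amount of repetition across lanes.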
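
The partial-sorting abstract (publication 20230409337) describes counting how often each type of data entry occurs, storing the counts in memory local to the processor, and then sorting by those counts. One common way to realize that pattern in software is a counting sort over a few key bits, sketched below. This is a swapped-in illustration, not the method claimed in the application: the function name `partial_sort`, the choice of the low `key_bits` bits as the entry "type", and the plain Python list standing in for local memory are all assumptions.

```python
from typing import List, Sequence


def partial_sort(entries: Sequence[int], key_bits: int = 2) -> List[int]:
    """Group entries by their low `key_bits` bits using a counting sort.

    Entries sharing a key end up adjacent; order within a key is preserved.
    Only the key bits are sorted on, hence "partial".
    """
    num_bins = 1 << key_bits
    mask = num_bins - 1

    # Pass 1: histogram of key occurrences (stand-in for local memory counters).
    counts = [0] * num_bins
    for entry in entries:
        counts[entry & mask] += 1

    # Exclusive prefix sum turns the counts into starting offsets per key.
    offsets = [0] * num_bins
    running = 0
    for key in range(num_bins):
        offsets[key] = running
        running += counts[key]

    # Pass 2: scatter each entry to the next free position for its key.
    result = [0] * len(entries)
    for entry in entries:
        key = entry & mask
        result[offsets[key]] = entry
        offsets[key] += 1
    return result


if __name__ == "__main__":
    print(partial_sort([5, 2, 7, 0, 6, 1, 3, 4], key_bits=1))
```

Grouping entries with the same key next to each other is what lets neighbouring lanes work on similar data afterwards, which is the coherence the abstract refers to.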
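
The scene-partition abstract (publication 20240112397) describes a spatial test that checks whether a bounding volume intersects one or more planes and records the result as intersection data. The sketch below shows one standard form of such a test, assuming the bounding volume is an axis-aligned box and each plane is given as a normal `n` and offset `d` with `n · x + d = 0`. Both assumptions, and the function names, are illustrative; the abstract does not specify the volume type or plane representation.

```python
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]
Plane = Tuple[Vec3, float]  # (normal, d) for the plane n . x + d = 0


def aabb_intersects_plane(box_min: Vec3, box_max: Vec3, normal: Vec3, d: float) -> bool:
    """Return True if the axis-aligned box touches or straddles the plane."""
    center = [(lo + hi) / 2.0 for lo, hi in zip(box_min, box_max)]
    half = [(hi - lo) / 2.0 for lo, hi in zip(box_min, box_max)]
    # Projection radius of the box onto the plane normal.
    radius = sum(h * abs(n) for h, n in zip(half, normal))
    # Signed distance of the box center from the plane (scaled by |normal|).
    distance = sum(n * c for n, c in zip(normal, center)) + d
    return abs(distance) <= radius


def intersection_data(box_min: Vec3, box_max: Vec3, planes: Sequence[Plane]) -> List[bool]:
    """One flag per plane: does the box intersect it?"""
    return [aabb_intersects_plane(box_min, box_max, n, d) for n, d in planes]


if __name__ == "__main__":
    planes = [((1.0, 0.0, 0.0), -1.0),   # x = 1
              ((1.0, 0.0, 0.0), -5.0),   # x = 5
              ((1.0, 1.0, 0.0), -1.0)]   # x + y = 1
    print(intersection_data((0.0, 0.0, 0.0), (2.0, 2.0, 2.0), planes))
```

Because both sides of the comparison scale with the length of the normal, the test does not require unit-length normals.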
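
The message-passing abstract (publication 20240111575) describes a global scheduler that stores work items in a shared cache and leaves an indication in a mailbox, after which the addressed local scheduler locates and retrieves the items. The sketch below models that flow in software only. The class names, the dictionary used as the shared cache, the `deque` mailboxes, and the integer slot used as the "indication" are all hypothetical stand-ins chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence


class GlobalScheduler:
    """Stores work in the shared cache and notifies a local scheduler by mailbox."""

    def __init__(self, shared_cache: Dict[int, List[str]],
                 mailboxes: Dict[int, Deque[int]]) -> None:
        self.shared_cache = shared_cache
        self.mailboxes = mailboxes
        self._next_slot = 0

    def dispatch(self, local_id: int, work_items: Sequence[str]) -> None:
        slot = self._next_slot
        self._next_slot += 1
        self.shared_cache[slot] = list(work_items)  # store the work items
        self.mailboxes[local_id].append(slot)       # indication: where to find them


class LocalScheduler:
    """Polls its mailbox and pulls the referenced work into a local queue."""

    def __init__(self, local_id: int, shared_cache: Dict[int, List[str]],
                 mailboxes: Dict[int, Deque[int]]) -> None:
        self.local_id = local_id
        self.shared_cache = shared_cache
        self.mailboxes = mailboxes
        self.local_queue: List[str] = []

    def poll(self) -> None:
        mailbox = self.mailboxes[self.local_id]
        while mailbox:
            slot = mailbox.popleft()
            # Retrieve the items from the shared cache for local scheduling.
            self.local_queue.extend(self.shared_cache.pop(slot))


if __name__ == "__main__":
    cache: Dict[int, List[str]] = {}
    boxes: Dict[int, Deque[int]] = {0: deque(), 1: deque()}
    global_sched = GlobalScheduler(cache, boxes)
    local_1 = LocalScheduler(1, cache, boxes)
    global_sched.dispatch(1, ["item_a", "item_b"])
    local_1.poll()
    print(local_1.local_queue)
```

Only the small mailbox message travels between schedulers in this model; the work items themselves stay in the shared cache until the local scheduler fetches them.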