Patents by Inventor Maxim V. Kazakov

Maxim V. Kazakov has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11960399
    Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
    Type: Grant
    Filed: December 21, 2021
    Date of Patent: April 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
  • Publication number: 20240029336
    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
    Type: Application
    Filed: October 3, 2023
    Publication date: January 25, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
  • Patent number: 11790590
    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: October 17, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
  • Publication number: 20230289191
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
    Type: Application
    Filed: March 30, 2023
    Publication date: September 14, 2023
    Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
  • Publication number: 20230266975
    Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
    Type: Application
    Filed: April 28, 2023
    Publication date: August 24, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
  • Publication number: 20230195628
    Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
    Type: Application
    Filed: December 21, 2021
    Publication date: June 22, 2023
    Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
  • Patent number: 11657119
    Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: May 23, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel
  • Patent number: 11656877
    Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: May 23, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
  • Patent number: 11635967
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: April 25, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
  • Publication number: 20230069890
    Abstract: An accelerated processing device is provided which comprises a plurality of compute units each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises LDS in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory, in communication with each of the SIMD units and a second cache portion of memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory and the LDS is reserved as part of another of the register file, the first portion of cache memory and the LDS.
    Type: Application
    Filed: September 3, 2021
    Publication date: March 9, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
  • Publication number: 20230004385
    Abstract: A processing device is provided which comprises a plurality of compute units configured to process data, a plurality of arithmetic logic units, instantiated separate from the plurality of compute units, and configured to store the data at the arithmetic logic units and perform calculations using the data and an interconnect network, connecting the arithmetic logic units and configured to provide the arithmetic logic units with shared access to the data for communication between the arithmetic logic units. The interconnect network is also configured to provide the compute units with shared access to the data for communication between the compute units.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
  • Publication number: 20230004871
    Abstract: Methods, systems, and devices for pipeline fusion of a plurality of kernels. In some implementations, a first batch of a first kernel is executed on a first processing device to generate a first output of the first kernel based on an input. A first batch of a second kernel is executed on a second processing device to generate a first output of the second kernel based on the first output of the first kernel. A second batch of the first kernel is executed on the first processing device to generate a second output of the first kernel based on the input. The execution of the second batch of the first kernel overlaps at least partially in time with executing the first batch of the second kernel.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Swapnil P. Sakharshete, Maxim V. Kazakov, Milind N. Nemlekar, Samuel Lawrence Wasmundt
  • Publication number: 20220413858
    Abstract: A processing device is provided which comprises memory, a plurality of registers and a processor. The processor is configured to execute a plurality of portions of a program, allocate a number of the registers per portion of the program such that a number of remaining registers are available as a register cache and transfer data between the number of registers, which are allocated per portion of the program, and the register cache. The processor loads data to the allocated registers to execute a portion of the program, stores data, resulting from execution of the portion, in the register cache, reloads the data in the allocated registers and executes another portion of the program using the data reloaded to the allocated registers. A called function uses the number of allocated registers, which is less than an architectural limit of registers allocated per portion of the program.
    Type: Application
    Filed: June 28, 2021
    Publication date: December 29, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
  • Patent number: 11521342
    Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.
    Type: Grant
    Filed: April 14, 2021
    Date of Patent: December 6, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Maxim V. Kazakov, Mark Fowler
  • Publication number: 20220318021
    Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Maxim V. Kazakov
  • Publication number: 20220319089
    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
  • Publication number: 20220309125
    Abstract: A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of MACs configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum values of bits of product exponents values of the elements of the first matrix and second matrix and determine keep bit values for product exponents values to be kept for matrix multiplication. The processor also comprises a plurality of multiplexor arrays each configured to receive bits of the elements of the first matrix and the second matrix and the keep bit values and provide data for selecting which elements of the first matrix and the second matrix values are provided to the MACs for matrix multiplication.
    Type: Application
    Filed: March 26, 2021
    Publication date: September 29, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Swapnil P. Sakharshete, Pramod Vasant Argade, Maxim V. Kazakov, Alexander M. Potapov
  • Publication number: 20220309126
    Abstract: A processing device is provided which comprises memory configured to store data and a processor configured to receive a portion of data of a first matrix comprising a first plurality of elements and receive a portion of data of a second matrix comprising a second plurality of elements. The processor is also configured to determine values for a third matrix by dropping a number of products from products of pairs of elements of the first and second matrices based on approximating the products of the pairs of elements as a sum of the exponents of the pairs of elements and performing matrix multiplication on remaining products of the pairs of elements of the first and second matrices.
    Type: Application
    Filed: March 26, 2021
    Publication date: September 29, 2022
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Pramod Vasant Argade, Swapnil P. Sakharshete, Maxim V. Kazakov, Alexander M. Potapov
  • Patent number: 11403221
    Abstract: A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. During servicing of the miss request, the lower level cache merges identification information of multiple memory access requests targeting a same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until the lower level shared cache is ready to issue the merged memory access response. An intermediate router in the communication fabric broadcasts the merged memory access response into multiple memory access responses to send to corresponding compute units.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: August 2, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Onur Kayiran, Yasuko Eckert, Mark Henry Oskin, Gabriel H. Loh, Steven E. Raasch, Maxim V. Kazakov
  • Publication number: 20220207411
    Abstract: A graphics processing unit (GPU) for clustering of machine learning (ML) functional components, including: a plurality of compute units; a plurality of ML clusters, wherein each of the ML clusters comprises at least one arithmetic logic unit (ALU), and wherein each of the ML clusters is associated with a respective subset of the compute units; and a plurality of memory modules each positioned on the GPU adjacent to a respective ML cluster of the plurality of ML clusters, wherein each ML cluster is configured to directly access one or more adjacent memory modules.
    Type: Application
    Filed: December 28, 2020
    Publication date: June 30, 2022
    Inventors: Maxim V. Kazakov, Milind N. Nemlekar, Swapnil Sakharshete, Vineet Goel