Patents by Inventor Maxim V. KAZAKOV

Maxim V. KAZAKOV has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Vertical and horizontal broadcast of shared operands

Patent number: 12327124

Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.

Type: Grant

Filed: March 30, 2023

Date of Patent: June 10, 2025

Assignee: Advanced Micro Devices, Inc.

Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
Processing device and method of sharing storage between cache memory, local data storage and register files

Patent number: 12265484

Abstract: An accelerated processing device is provided which comprises a plurality of compute units each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises LDS in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory, in communication with each of the SIMD units and a second cache portion of memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory and the LDS is reserved as part of another of the register file, the first portion of cache memory and the LDS.

Type: Grant

Filed: September 3, 2021

Date of Patent: April 1, 2025

Assignee: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov
DIRECT-CONNECTED MACHINE LEARNING ACCELERATOR

Publication number: 20250086515

Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.

Type: Application

Filed: November 21, 2024

Publication date: March 13, 2025

Applicant: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov
Wavefront selection and execution

Patent number: 12248789

Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.

Type: Grant

Filed: April 28, 2023

Date of Patent: March 11, 2025

Assignee: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov
Wavefront selection and execution

Patent number: 12217061

Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.

Type: Grant

Filed: April 28, 2023

Date of Patent: February 4, 2025

Assignee: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov
Approximation of matrices for matrix multiply operations

Patent number: 12197533

Abstract: A processing device is provided which comprises memory configured to store data and a processor configured to receive a portion of data of a first matrix comprising a first plurality of elements and receive a portion of data of a second matrix comprising a second plurality of elements. The processor is also configured to determine values for a third matrix by dropping a number of products from products of pairs of elements of the first and second matrices based on approximating the products of the pairs of elements as a sum of the exponents of the pairs of elements and performing matrix multiplication on remaining products of the pairs of elements of the first and second matrices.

Type: Grant

Filed: March 26, 2021

Date of Patent: January 14, 2025

Assignee: Advanced Micro Devices, Inc.

Inventors: Pramod Vasant Argade, Swapnil P. Sakharshete, Maxim V. Kazakov, Alexander M. Potapov
Multi-accelerator compute dispatch

Patent number: 12165252

Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.

Type: Grant

Filed: October 3, 2023

Date of Patent: December 10, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
Direct-connected machine learning accelerator

Patent number: 12165016

Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.

Type: Grant

Filed: September 25, 2020

Date of Patent: December 10, 2024

Assignee: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov
Data compressor for approximation of matrices for matrix multiply operations

Patent number: 12072952

Abstract: A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of MACs configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum values of bits of product exponents values of the elements of the first matrix and second matrix and determine keep bit values for product exponents values to be kept for matrix multiplication. The processor also comprises a plurality of multiplexor arrays each configured to receive bits of the elements of the first matrix and the second matrix and the keep bit values and provide data for selecting which elements of the first matrix and the second matrix values are provided to the MACs for matrix multiplication.

Type: Grant

Filed: March 26, 2021

Date of Patent: August 27, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Swapnil P. Sakharshete, Pramod Vasant Argade, Maxim V. Kazakov, Alexander M. Potapov
Techniques for reducing serialization in divergent control flow

Patent number: 12014208

Abstract: Techniques for executing shader programs with divergent control flow on a single instruction multiple data (“SIMD”) processor are disclosed. These techniques includes detecting entry into a divergent section of a shader program and, for the work-items that enter the divergent section, placing a task entry into a task queue associated with the target of each work-item. The target is the destination, in code, of any particular work-item, and is also referred to as a code segment herein. The task queues store task entries for code segments generated by different (or the same) wavefronts. A command processor examines task lists and schedules wavefronts for execution by grouping together tasks in the same task list into wavefronts and launching those wavefronts. By grouping tasks from different wavefronts together for execution in the same front, serialization of execution is greatly reduced or eliminated.

Type: Grant

Filed: June 29, 2018

Date of Patent: June 18, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Skyler Jonathon Saleh, Maxim V. Kazakov
Relaxed invalidation for cache coherence

Patent number: 11960399

Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.

Type: Grant

Filed: December 21, 2021

Date of Patent: April 16, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
MULTI-ACCELERATOR COMPUTE DISPATCH

Publication number: 20240029336

Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.

Type: Application

Filed: October 3, 2023

Publication date: January 25, 2024

Applicant: Advanced Micro Devices, Inc.

Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
Multi-accelerator compute dispatch

Patent number: 11790590

Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.

Type: Grant

Filed: March 31, 2021

Date of Patent: October 17, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
VERTICAL AND HORIZONTAL BROADCAST OF SHARED OPERANDS

Publication number: 20230289191

Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.

Type: Application

Filed: March 30, 2023

Publication date: September 14, 2023

Inventors: Sateesh LAGUDU, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
WAVEFRONT SELECTION AND EXECUTION

Publication number: 20230266975

Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.

Type: Application

Filed: April 28, 2023

Publication date: August 24, 2023

Applicant: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov
RELAXED INVALIDATION FOR CACHE COHERENCE

Publication number: 20230195628

Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.

Type: Application

Filed: December 21, 2021

Publication date: June 22, 2023

Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
Wavefront selection and execution

Patent number: 11656877

Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.

Type: Grant

Filed: March 31, 2021

Date of Patent: May 23, 2023

Assignee: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov
Hardware accelerated convolution

Patent number: 11657119

Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.

Type: Grant

Filed: August 30, 2019

Date of Patent: May 23, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel
Vertical and horizontal broadcast of shared operands

Patent number: 11635967

Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.

Type: Grant

Filed: September 25, 2020

Date of Patent: April 25, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
PROCESSING DEVICE AND METHOD OF SHARING STORAGE BETWEEN CACHE MEMORY, LOCAL DATA STORAGE AND REGISTER FILES

Publication number: 20230069890

Abstract: An accelerated processing device is provided which comprises a plurality of compute units each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises LDS in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory, in communication with each of the SIMD units and a second cache portion of memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory and the LDS is reserved as part of another of the register file, the first portion of cache memory and the LDS.

Type: Application

Filed: September 3, 2021

Publication date: March 9, 2023

Applicant: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov

1 2 3 next