Patents by Inventor Maxim V. KAZAKOV
Maxim V. KAZAKOV has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12327124Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.Type: GrantFiled: March 30, 2023Date of Patent: June 10, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
-
Patent number: 12265484Abstract: An accelerated processing device is provided which comprises a plurality of compute units each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises LDS in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory, in communication with each of the SIMD units and a second cache portion of memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory and the LDS is reserved as part of another of the register file, the first portion of cache memory and the LDS.Type: GrantFiled: September 3, 2021Date of Patent: April 1, 2025Assignee: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov
-
Publication number: 20250086515Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.Type: ApplicationFiled: November 21, 2024Publication date: March 13, 2025Applicant: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov
-
Patent number: 12248789Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.Type: GrantFiled: April 28, 2023Date of Patent: March 11, 2025Assignee: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov
-
Patent number: 12217061Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.Type: GrantFiled: April 28, 2023Date of Patent: February 4, 2025Assignee: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov
-
Patent number: 12197533Abstract: A processing device is provided which comprises memory configured to store data and a processor configured to receive a portion of data of a first matrix comprising a first plurality of elements and receive a portion of data of a second matrix comprising a second plurality of elements. The processor is also configured to determine values for a third matrix by dropping a number of products from products of pairs of elements of the first and second matrices based on approximating the products of the pairs of elements as a sum of the exponents of the pairs of elements and performing matrix multiplication on remaining products of the pairs of elements of the first and second matrices.Type: GrantFiled: March 26, 2021Date of Patent: January 14, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Pramod Vasant Argade, Swapnil P. Sakharshete, Maxim V. Kazakov, Alexander M. Potapov
-
Patent number: 12165252Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.Type: GrantFiled: October 3, 2023Date of Patent: December 10, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
-
Patent number: 12165016Abstract: Techniques are disclosed for communicating between a machine learning accelerator and one or more processing cores. The techniques include obtaining data at the machine learning accelerator via an input/output die; processing the data at the machine learning accelerator to generate machine learning processing results; and exporting the machine learning processing results via the input/output die, wherein the input/output die is coupled to one or more processor chiplets via one or more processor ports, and wherein the input/output die is coupled to the machine learning accelerator via an accelerator port.Type: GrantFiled: September 25, 2020Date of Patent: December 10, 2024Assignee: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov
-
Patent number: 12072952Abstract: A processing device is provided which comprises memory configured to store data and a processor. The processor comprises a plurality of MACs configured to perform matrix multiplication of elements of a first matrix and elements of a second matrix. The processor also comprises a plurality of logic devices configured to sum values of bits of product exponents values of the elements of the first matrix and second matrix and determine keep bit values for product exponents values to be kept for matrix multiplication. The processor also comprises a plurality of multiplexor arrays each configured to receive bits of the elements of the first matrix and the second matrix and the keep bit values and provide data for selecting which elements of the first matrix and the second matrix values are provided to the MACs for matrix multiplication.Type: GrantFiled: March 26, 2021Date of Patent: August 27, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Swapnil P. Sakharshete, Pramod Vasant Argade, Maxim V. Kazakov, Alexander M. Potapov
-
Patent number: 12014208Abstract: Techniques for executing shader programs with divergent control flow on a single instruction multiple data (“SIMD”) processor are disclosed. These techniques includes detecting entry into a divergent section of a shader program and, for the work-items that enter the divergent section, placing a task entry into a task queue associated with the target of each work-item. The target is the destination, in code, of any particular work-item, and is also referred to as a code segment herein. The task queues store task entries for code segments generated by different (or the same) wavefronts. A command processor examines task lists and schedules wavefronts for execution by grouping together tasks in the same task list into wavefronts and launching those wavefronts. By grouping tasks from different wavefronts together for execution in the same front, serialization of execution is greatly reduced or eliminated.Type: GrantFiled: June 29, 2018Date of Patent: June 18, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Skyler Jonathon Saleh, Maxim V. Kazakov
-
Patent number: 11960399Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.Type: GrantFiled: December 21, 2021Date of Patent: April 16, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
-
Publication number: 20240029336Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.Type: ApplicationFiled: October 3, 2023Publication date: January 25, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
-
Patent number: 11790590Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.Type: GrantFiled: March 31, 2021Date of Patent: October 17, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Milind N. Nemlekar, Maxim V. Kazakov, Prerit Dak
-
Publication number: 20230289191Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.Type: ApplicationFiled: March 30, 2023Publication date: September 14, 2023Inventors: Sateesh LAGUDU, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
-
Publication number: 20230266975Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.Type: ApplicationFiled: April 28, 2023Publication date: August 24, 2023Applicant: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov
-
Publication number: 20230195628Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.Type: ApplicationFiled: December 21, 2021Publication date: June 22, 2023Inventors: Akhil Arunkumar, Tarun Nakra, Maxim V. Kazakov, Milind N. Nemlekar
-
Patent number: 11656877Abstract: Techniques are provided for executing wavefronts. The techniques include at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.Type: GrantFiled: March 31, 2021Date of Patent: May 23, 2023Assignee: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov
-
Patent number: 11657119Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.Type: GrantFiled: August 30, 2019Date of Patent: May 23, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel
-
Patent number: 11635967Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.Type: GrantFiled: September 25, 2020Date of Patent: April 25, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
-
Publication number: 20230069890Abstract: An accelerated processing device is provided which comprises a plurality of compute units each including a plurality of SIMD units, and each SIMD unit comprises a register file. The accelerated processing device also comprises LDS in communication with each of the SIMD units. The accelerated processing device also comprises a first portion of cache memory, in communication with each of the SIMD units and a second cache portion of memory shared by the compute units. The compute units are configured to execute a program in which a storage portion of at least one of the register file of a SIMD unit, the first portion of cache memory and the LDS is reserved as part of another of the register file, the first portion of cache memory and the LDS.Type: ApplicationFiled: September 3, 2021Publication date: March 9, 2023Applicant: Advanced Micro Devices, Inc.Inventor: Maxim V. Kazakov