Patents by Inventor Maxim Kazakov

Maxim Kazakov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

INSTRUCTION ENCODING TO IMPLEMENT INCREASED REGISTER CAPACITY PER THREAD

Publication number: 20250068423

Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.

Type: Application

Filed: August 22, 2023

Publication date: February 27, 2025

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya
DATA MULTICAST IN COMPUTE CORE CLUSTERS

Publication number: 20240220254

Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.

Type: Application

Filed: December 30, 2022

Publication date: July 4, 2024

Applicant: Intel Corporation

Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
SCALABLE AND CONFIGURABLE CLUSTERED SYSTOLIC ARRAY

Publication number: 20240220448

Abstract: A scalable and configurable clustered systolic array is described. An example of an apparatus includes a cluster including multiple cores; and a cache memory coupled with the cluster, wherein each core includes multiple processing resources, a memory coupled with the plurality of processing resources, a systolic array coupled with the memory, and one or more interconnects with one or more other cores of the plurality of cores; and wherein the systolic arrays of the cores are configurable by the apparatus to form a logically combined systolic array for processing of an operation by a cooperative group of threads running on one or more of the plurality of cores in the cluster.

Type: Application

Filed: December 30, 2022

Publication date: July 4, 2024

Applicant: Intel Corporation

Inventors: Chunhui Mei, Jiasheng Chen, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, Rama S.B. Harihara, Maxim Kazakov
SYNCHRONIZATION FOR DATA MULTICAST IN COMPUTE CORE CLUSTERS

Publication number: 20240220335

Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.

Type: Application

Filed: December 30, 2022

Publication date: July 4, 2024

Applicant: Intel Corporation

Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE

Publication number: 20240168807

Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.

Type: Application

Filed: November 18, 2022

Publication date: May 23, 2024

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Guei-Yuan Lueh, Maxim Kazakov, Fangwen Fu, Supratim Pal, Kaiyu Chen
DETERMINISTIC BROADCASTING FROM SHARED MEMORY

Publication number: 20240111534

Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.

Type: Application

Filed: September 30, 2022

Publication date: April 4, 2024

Applicant: Intel Corporation

Inventors: Fangwen Fu, Chunhui Mei, Maxim Kazakov, Biju George, Jorge Parra, Supratim Pal
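
The duplicate-load handling described in the abstract above (detect duplicate requests, read once, send the data to every requester) can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the patented hardware design: the function name `broadcast_load`, the `(thread_id, address)` request format, and the dictionary standing in for the cache memory are all hypothetical, and each thread is assumed to issue at most one load.

```python
from collections import defaultdict


def broadcast_load(requests, cache):
    """Serve load requests while reading each distinct address only once.

    requests: list of (thread_id, address) pairs (hypothetical format).
    cache: mapping of address -> data, standing in for the cache memory.
    Returns (per-thread results, number of cache reads performed).
    """
    # Group the requesting threads by address so duplicates are detected.
    waiters = defaultdict(list)
    for thread_id, address in requests:
        waiters[address].append(thread_id)

    results = {}
    reads = 0
    for address, thread_ids in waiters.items():
        data = cache[address]  # one read covers all duplicate requests
        reads += 1
        for thread_id in thread_ids:
            results[thread_id] = data  # deliver to every requesting thread
    return results, reads


# Hypothetical example: threads 0, 1 and 3 all request address 0x100.
cache = {0x100: "A", 0x200: "B"}
requests = [(0, 0x100), (1, 0x100), (2, 0x200), (3, 0x100)]
results, reads = broadcast_load(requests, cache)
```

In this example four requests are served with two cache reads, because three of them target the same address.
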
SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING

Publication number: 20240112295

Abstract: Shared local registers for thread team processing are described. An example of an apparatus includes one or more processors including a graphics processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly referenced in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.

Type: Application

Filed: September 30, 2022

Publication date: April 4, 2024

Applicant: Intel Corporation

Inventors: Biju George, Fangwen Fu, Supratim Pal, Jorge Parra, Chunhui Mei, Maxim Kazakov, Joydeep Ray
Memory request arbitration

Patent number: 10572399

Abstract: In an example, a method of arbitrating memory requests may include tagging a first batch of memory requests with first metadata identifying that the first batch of memory requests originates from a first group of threads. The method may include tagging a second batch of memory requests with second metadata identifying that the second batch of memory requests originates from the first group of threads. The method may include storing the first and second batches of memory requests in a conflict arbitration queue. The method may include performing, using the first metadata and the second metadata, conflict arbitration between only the first batch of memory requests and the second batch of memory requests stored in the conflict arbitration queue, which may include at least one other batch of memory requests stored that originates from a group of threads different from the first group of threads stored therein.

Type: Grant

Filed: July 13, 2016

Date of Patent: February 25, 2020

Assignee: QUALCOMM Incorporated

Inventor: Maxim Kazakov
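
The tagging scheme in the abstract above can be sketched in software as follows. This is a minimal illustration, not the claimed method: the `Batch` class, the `conflicts_within_group` function, and the choice to treat a "conflict" as two batches touching the same address are assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    group_id: int    # metadata tag: the thread group the batch originates from
    addresses: list  # addresses touched by the requests in the batch


def conflicts_within_group(queue, group_id):
    """Return addresses shared by batches that carry the same group tag.

    Batches tagged with a different group stay in the queue but are
    skipped, so arbitration is limited to one group's batches.
    """
    same_group = [b for b in queue if b.group_id == group_id]
    conflicts = set()
    for i, first in enumerate(same_group):
        for second in same_group[i + 1:]:
            conflicts |= set(first.addresses) & set(second.addresses)
    return conflicts


# Hypothetical queue: two batches from group 0 and one from group 1.
queue = [
    Batch(group_id=0, addresses=[0x10, 0x20]),
    Batch(group_id=1, addresses=[0x20, 0x30]),  # other group, not compared
    Batch(group_id=0, addresses=[0x20, 0x40]),
]
group0_conflicts = conflicts_within_group(queue, 0)
group1_conflicts = conflicts_within_group(queue, 1)
```

Here the two group 0 batches are found to overlap on address 0x20, while the group 1 batch, although it also touches 0x20, is never compared against them.
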
Inter-subgroup data sharing

Patent number: 10223436

Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.

Type: Grant

Filed: September 7, 2016

Date of Patent: March 5, 2019

Assignee: QUALCOMM Incorporated

Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
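
The two steps named in the abstract above (synchronize both subgroups with a barrier, then transfer data between them) can be mimicked on a CPU with Python threads. This is only an analogy under stated assumptions: each subgroup is modeled as a single thread, and `run_subgroups`, the `shared` staging dictionary, and the subgroup names "A" and "B" are hypothetical.

```python
import threading


def run_subgroups(values_a, values_b):
    """Exchange data between two subgroups after a barrier."""
    shared = {}                     # staging area visible to both subgroups
    received = {}
    barrier = threading.Barrier(2)  # one party per subgroup

    def subgroup(name, peer, values):
        shared[name] = list(values)    # publish this subgroup's data
        barrier.wait()                 # wait until both have published
        received[name] = shared[peer]  # read the other subgroup's data

    threads = [
        threading.Thread(target=subgroup, args=("A", "B", values_a)),
        threading.Thread(target=subgroup, args=("B", "A", values_b)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


received = run_subgroups([1, 2], [3, 4])
```

Without the barrier, one subgroup could try to read the other's data before it had been published; the barrier guarantees that both writes are complete before either read.
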
Vertex shaders for binning based graphics processing

Patent number: 10062139

Abstract: This disclosure describes examples of using two vertex shaders, each during a different graphics processing pass, in a binning architecture for graphics processing. A first vertex shader processes a subset of attributes of a vertex in a binning pass, where the subset of attributes includes those that contribute to visibility determination and attributes that may benefit from being processed with a vertex shader that provides functional flexibility. A second, different vertex shader processes another subset of attributes of the vertex in the rendering pass.

Type: Grant

Filed: July 25, 2016

Date of Patent: August 28, 2018

Assignee: QUALCOMM Incorporated

Inventors: Maxim Kazakov, Andrew Evan Gruber
Resource sharing on shader processor of GPU

Patent number: 10026145

Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.

Type: Grant

Filed: December 13, 2016

Date of Patent: July 17, 2018

Assignee: QUALCOMM Incorporated

Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
RESOURCE SHARING ON SHADER PROCESSOR OF GPU

Publication number: 20180165786

Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.

Type: Application

Filed: December 13, 2016

Publication date: June 14, 2018

Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
VERTEX SHADERS FOR BINNING BASED GRAPHICS PROCESSING

Publication number: 20180025463

Abstract: This disclosure describes examples of using two vertex shaders, each during a different graphics processing pass, in a binning architecture for graphics processing. A first vertex shader processes a subset of attributes of a vertex in a binning pass, where the subset of attributes includes those that contribute to visibility determination and attributes that may benefit from being processed with a vertex shader that provides functional flexibility. A second, different vertex shader processes another subset of attributes of the vertex in the rendering pass.

Type: Application

Filed: July 25, 2016

Publication date: January 25, 2018

Inventors: Maxim Kazakov, Andrew Evan Gruber
MEMORY REQUEST ARBITRATION

Publication number: 20180018097

Abstract: In an example, a method of arbitrating memory requests may include tagging a first batch of memory requests with first metadata identifying that the first batch of memory requests originates from a first group of threads. The method may include tagging a second batch of memory requests with second metadata identifying that the second batch of memory requests originates from the first group of threads. The method may include storing the first and second batches of memory requests in a conflict arbitration queue. The method may include performing, using the first metadata and the second metadata, conflict arbitration between only the first batch of memory requests and the second batch of memory requests stored in the conflict arbitration queue, which may include at least one other batch of memory requests stored that originates from a group of threads different from the first group of threads stored therein.

Type: Application

Filed: July 13, 2016

Publication date: January 18, 2018

Inventor: Maxim Kazakov
INTER-SUBGROUP DATA SHARING

Publication number: 20170316076

Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.

Type: Application

Filed: September 7, 2016

Publication date: November 2, 2017

Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
Register spill management for general purpose registers (GPRs)

Patent number: 9779469

Abstract: Techniques are described for copying data only from a subset of memory locations allocated to a set of instructions to free memory locations for higher priority instructions to execute. Data from a dynamic portion of one or more general purpose registers (GPRs) allocated to the set of instructions may be copied and stored to another memory unit while data from a static portion of the one or more GPRs allocated to the set of instructions may not be copied and stored to another memory unit.

Type: Grant

Filed: August 17, 2015

Date of Patent: October 3, 2017

Assignee: QUALCOMM Incorporated

Inventors: Lee Howes, Maxim Kazakov
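
The partial spill described in the abstract above (copy out only the dynamic portion of the allocated GPRs and leave the static portion in place) can be sketched as below. This is a simplified illustration, not the patented technique: the function names, the list standing in for the register file, and the assumption that the static portion occupies the lowest register indices are all hypothetical.

```python
def spill_dynamic_portion(registers, static_count):
    """Copy the dynamic portion out and free it for other instructions.

    registers: list standing in for the GPRs allocated to one set of
        instructions; the first static_count entries are assumed static.
    Returns the saved dynamic values (the copy in "another memory unit").
    """
    saved = registers[static_count:]  # slicing copies the dynamic values
    for i in range(static_count, len(registers)):
        registers[i] = None           # dynamic registers are now free
    return saved


def restore_dynamic_portion(registers, static_count, saved):
    """Write the saved dynamic values back into the register list."""
    registers[static_count:] = saved


# Hypothetical allocation: two static registers, three dynamic ones.
regs = [1, 2, 3, 4, 5]
saved = spill_dynamic_portion(regs, static_count=2)
after_spill = list(regs)
restore_dynamic_portion(regs, 2, saved)
```

Only three of the five values are copied, which is the saving the abstract points to: the static portion is never moved.
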
REGISTER SPILL MANAGEMENT FOR GENERAL PURPOSE REGISTERS (GPRs)

Publication number: 20170053374

Abstract: Techniques are described for copying data only from a subset of memory locations allocated to a set of instructions to free memory locations for higher priority instructions to execute. Data from a dynamic portion of one or more general purpose registers (GPRs) allocated to the set of instructions may be copied and stored to another memory unit while data from a static portion of the one or more GPRs allocated to the set of instructions may not be copied and stored to another memory unit.

Type: Application

Filed: August 17, 2015

Publication date: February 23, 2017

Inventors: Lee Howes, Maxim Kazakov
Image processing device

Patent number: 9218686

Abstract: The present invention intends to provide an image processing apparatus that can process geometrical primitives rapidly and with restrained memory consumption. A sequence of vertex data of a primitive sequence is stored in an index buffer, and size data of the primitive is stored in the head of the sequence. The result of processing of the primitive sequence in a geometry shader is stored in a cache buffer, and a primitive output, which is the output result stored in the cache buffer, is reused in reprocessing of the primitive processed in the geometry shader.

Type: Grant

Filed: December 2, 2011

Date of Patent: December 22, 2015

Assignee: DIGITAL MEDIA PROFESSIONALS INC.

Inventor: Maxim Kazakov
IMAGE PROCESSING DEVICE

Publication number: 20130113790

Abstract: The present invention intends to provide an image processing apparatus that can process geometrical primitives rapidly and with restrained memory consumption. A sequence of vertex data of a primitive sequence is stored in an index buffer, and size data of the primitive is stored in the head of the sequence. The result of processing of the primitive sequence in a geometry shader is stored in a cache buffer, and a primitive output, which is the output result stored in the cache buffer, is reused in reprocessing of the primitive processed in the geometry shader.

Type: Application

Filed: December 2, 2011

Publication date: May 9, 2013

Applicant: DIGITAL MEDIA PROFESSIONALS INC.

Inventor: Maxim Kazakov