Patents by Inventor Maxim Kazakov
Maxim Kazakov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250068423Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.Type: ApplicationFiled: August 22, 2023Publication date: February 27, 2025Applicant: Intel CorporationInventors: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya
-
Publication number: 20240220448Abstract: A scalable and configurable clustered systolic array is described. An example of apparatus includes a cluster including multiple cores; and a cache memory coupled with the cluster, wherein each core includes multiple processing resources, a memory coupled with the plurality of processing resources, a systolic array coupled with the memory, and one or more interconnects with one or more other cores of the plurality of cores; and wherein the systolic arrays of the cores are configurable by the apparatus to form a logically combined systolic array for processing of an operation by a cooperative group of threads running on one or more of the plurality of cores in the cluster.Type: ApplicationFiled: December 30, 2022Publication date: July 4, 2024Applicant: Intel CorporationInventors: Chunhui Mei, Jiasheng Chen, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, Rama S.B. Harihara, Maxim Kazakov
-
Publication number: 20240220335Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.Type: ApplicationFiled: December 30, 2022Publication date: July 4, 2024Applicant: Intel CorporationInventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
-
Publication number: 20240220254Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.Type: ApplicationFiled: December 30, 2022Publication date: July 4, 2024Applicant: Intel CorporationInventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
-
Publication number: 20240168807Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.Type: ApplicationFiled: November 18, 2022Publication date: May 23, 2024Applicant: Intel CorporationInventors: Jorge Eduardo Parra Osorio, Guei-Yuan Lueh, Maxim Kazakov, Fangwen Fu, Supratim Pal, Kaiyu Chen
-
Publication number: 20240112295Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphic processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly reference in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Intel CorporationInventors: Biju George, Fangwen Fu, Supratim Pal, Jorge Parra, Chunhui Mei, Maxim Kazakov, Joydeep Ray
-
Publication number: 20240111534Abstract: Embodiments described herein provide a technique enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load request from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Intel CorporationInventors: Fangwen Fu, Chunhui Mei, Maxim Kazakov, Biju George, Jorge Parra, Supratim Pal
-
Patent number: 10572399Abstract: In an example, a method of arbitrating memory requests may include tagging a first batch of memory requests with first metadata identifying that the first batch of memory requests originates from a first group of threads. The method may include tagging a second batch of memory requests with second metadata identifying that the second batch of memory requests originates from the first group of threads. The method may include storing the first and second batches of memory requests in a conflict arbitration queue. The method may include performing, using the first metadata and the second metadata, conflict arbitration between only the first batch of memory of requests and the second batch of memory requests stored in the conflict arbitration queue, which may include at least one other batch of memory requests stored that originates from a group of threads different from the first group of threads stored therein.Type: GrantFiled: July 13, 2016Date of Patent: February 25, 2020Assignee: QUALCOMM IncorporatedInventor: Maxim Kazakov
-
Patent number: 10223436Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.Type: GrantFiled: September 7, 2016Date of Patent: March 5, 2019Assignee: QUALCOMM IncorporatedInventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
-
Patent number: 10062139Abstract: This disclosure describes examples of using two vertex shaders each one during different graphics processing passes in a binning architecture for graphics processing. A first vertex shader processes subset of attributes of a vertex in a binning pass, where the subset of attributes include those that contribute to visibility determination and attributes that may benefit from being processed with a vertex shader that provides functional flexibility. A second, different vertex shader processes another subset of attributes of the vertex in the rendering pass.Type: GrantFiled: July 25, 2016Date of Patent: August 28, 2018Assignee: QUALCOMM IncorporatedInventors: Maxim Kazakov, Andrew Evan Gruber
-
Patent number: 10026145Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.Type: GrantFiled: December 13, 2016Date of Patent: July 17, 2018Assignee: QUALCOMM IncorporatedInventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
-
Publication number: 20180165786Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.Type: ApplicationFiled: December 13, 2016Publication date: June 14, 2018Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
-
Publication number: 20180025463Abstract: This disclosure describes examples of using two vertex shaders each one during different graphics processing passes in a binning architecture for graphics processing. A first vertex shader processes subset of attributes of a vertex in a binning pass, where the subset of attributes include those that contribute to visibility determination and attributes that may benefit from being processed with a vertex shader that provides functional flexibility. A second, different vertex shader processes another subset of attributes of the vertex in the rendering pass.Type: ApplicationFiled: July 25, 2016Publication date: January 25, 2018Inventors: Maxim Kazakov, Andrew Evan Gruber
-
Publication number: 20180018097Abstract: In an example, a method of arbitrating memory requests may include tagging a first batch of memory requests with first metadata identifying that the first batch of memory requests originates from a first group of threads. The method may include tagging a second batch of memory requests with second metadata identifying that the second batch of memory requests originates from the first group of threads. The method may include storing the first and second batches of memory requests in a conflict arbitration queue. The method may include performing, using the first metadata and the second metadata, conflict arbitration between only the first batch of memory of requests and the second batch of memory requests stored in the conflict arbitration queue, which may include at least one other batch of memory requests stored that originates from a group of threads different from the first group of threads stored therein.Type: ApplicationFiled: July 13, 2016Publication date: January 18, 2018Inventor: Maxim Kazakov
-
Publication number: 20170316076Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.Type: ApplicationFiled: September 7, 2016Publication date: November 2, 2017Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
-
Patent number: 9779469Abstract: Techniques are described for copying data only from a subset of memory locations allocated to a set of instructions to free memory locations for higher priority instructions to execute. Data from a dynamic portion of one or more general purpose registers (GPRs) allocated to the set of instructions may be copied and stored to another memory unit while data from a static portion of the one or more GPRs allocated to the set of instructions may not be copied and stored to another memory unit.Type: GrantFiled: August 17, 2015Date of Patent: October 3, 2017Assignee: QUALCOMM IncorporatedInventors: Lee Howes, Maxim Kazakov
-
Publication number: 20170053374Abstract: Techniques are described for copying data only from a subset of memory locations allocated to a set of instructions to free memory locations for higher priority instructions to execute. Data from a dynamic portion of one or more general purpose registers (GPRs) allocated to the set of instructions may be copied and stored to another memory unit while data from a static portion of the one or more GPRs allocated to the set of instructions may not be copied and stored to another memory unit.Type: ApplicationFiled: August 17, 2015Publication date: February 23, 2017Inventors: Lee Howes, Maxim Kazakov
-
Patent number: 9218686Abstract: The present invention intends to provide an image processing apparatus that can process geometrical primitives rapidly and with restrained memory consumption. A sequence of vertex data of a primitive sequence is stored in an index buffer, and size data of the primitive is stored in the head of the sequence. The result of processing of the primitive sequence in a geometry shader is stored in a cache buffer, and a primitive output, which is the output result stored in the cache buffer, is reused in reprocessing of the primitive processed in the geometry shader.Type: GrantFiled: December 2, 2011Date of Patent: December 22, 2015Assignee: DIGITAL MEDIA PROFESSIONALS INC.Inventor: Maxim Kazakov
-
Publication number: 20130113790Abstract: The present invention intends to provide an image processing apparatus that can process geometrical primitives rapidly and with restrained memory consumption. A sequence of vertex data of a primitive sequence is stored in an index buffer, and size data of the primitive is stored in the head of the sequence. The result of processing of the primitive sequence in a geometry shader is stored in a cache buffer, and a primitive output, which is the output result stored in the cache buffer, is reused in reprocessing of the primitive processed in the geometry shader.Type: ApplicationFiled: December 2, 2011Publication date: May 9, 2013Applicant: DIGITAL MEDIA PROFESSIONALS INC.Inventor: Maxim Kazakov