Patents by Inventor Maxim Kazakov

Maxim Kazakov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250068423
    Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of a processing resource and decode the instruction based on that number of registers.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 27, 2025
    Applicant: Intel Corporation
    Inventors: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya
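As a rough illustration of the idea in the abstract above, here is a minimal Python sketch of register-count-aware decode. All names, thresholds, and the "large-grf"/"small-grf" variants are invented for illustration; the patent describes decode circuitry, not software:

```python
# Toy model of register-count-aware instruction decode: the decoder asks
# how many registers the register file grants each thread and emits a
# different decoded form accordingly. Purely illustrative.

def regs_per_thread(total_regs, threads):
    """Registers available to each thread when the file is split evenly."""
    return total_regs // threads

def decode(opcode, total_regs, threads):
    """Decode `opcode` into a variant chosen by per-thread register count."""
    n = regs_per_thread(total_regs, threads)
    variant = "large-grf" if n >= 256 else "small-grf"
    return {"op": opcode, "variant": variant, "regs": n}

print(decode("mad", total_regs=4096, threads=8))
# {'op': 'mad', 'variant': 'large-grf', 'regs': 512}
print(decode("mad", total_regs=4096, threads=32))
# {'op': 'mad', 'variant': 'small-grf', 'regs': 128}
```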
  • Publication number: 20240220448
    Abstract: A scalable and configurable clustered systolic array is described. An example of apparatus includes a cluster including multiple cores; and a cache memory coupled with the cluster, wherein each core includes multiple processing resources, a memory coupled with the plurality of processing resources, a systolic array coupled with the memory, and one or more interconnects with one or more other cores of the plurality of cores; and wherein the systolic arrays of the cores are configurable by the apparatus to form a logically combined systolic array for processing of an operation by a cooperative group of threads running on one or more of the plurality of cores in the cluster.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, Jiasheng Chen, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, Rama S.B. Harihara, Maxim Kazakov
  • Publication number: 20240220335
    Abstract: Synchronization for data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a graphics processing unit (GPU), the GPU including one or more clusters of cores and a memory, wherein each cluster of cores includes a plurality of cores, each core including one or more processing resources, shared local memory, and gateway circuitry, wherein the GPU is to initiate broadcast of a data element from a producer core to one or more consumer cores, and synchronize the broadcast of the data element utilizing the gateway circuitry of the producer core and the one or more consumer cores, and wherein synchronizing the broadcast of the data element includes establishing a multi-core barrier for broadcast of the data element.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
  • Publication number: 20240220254
    Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
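The multicast idea in the abstract above can be sketched in a few lines of Python. This is a toy software model (class and method names are invented); the patent describes broadcast circuitry and interconnects in hardware:

```python
# Toy model of intra-cluster data multicast: when a core requests a data
# element that other cores in the cluster have announced they also need,
# memory is read once and the element is broadcast to every waiting core.

class Cluster:
    def __init__(self, num_cores):
        self.reads_from_memory = 0                        # actual memory traffic
        self.local = [dict() for _ in range(num_cores)]   # per-core storage
        self.pending = {}                                 # addr -> cores wanting it

    def announce_need(self, core, addr):
        self.pending.setdefault(addr, set()).add(core)

    def load(self, core, addr, memory):
        """First requester reads memory once and multicasts to all
        announced consumers; later requesters hit their local copy."""
        if addr in self.local[core]:
            return self.local[core][addr]
        value = memory[addr]
        self.reads_from_memory += 1
        for consumer in self.pending.pop(addr, set()) | {core}:
            self.local[consumer][addr] = value
        return value

memory = {0x10: 42}
cluster = Cluster(num_cores=4)
for c in (1, 2, 3):
    cluster.announce_need(c, 0x10)
cluster.load(0, 0x10, memory)       # single read, broadcast to cores 1-3
print(cluster.reads_from_memory)    # 1
print(cluster.local[2][0x10])       # 42
```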
  • Publication number: 20240168807
    Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.
    Type: Application
    Filed: November 18, 2022
    Publication date: May 23, 2024
    Applicant: Intel Corporation
    Inventors: Jorge Eduardo Parra Osorio, Guei-Yuan Lueh, Maxim Kazakov, Fangwen Fu, Supratim Pal, Kaiyu Chen
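The cross-thread register sharing described above can be modeled briefly in Python. Everything here (the `mma` function, the `(thread, register)` operand references) is invented for illustration and is not the actual ISA encoding:

```python
# Toy model of cross-thread register sharing for a matrix-multiply
# instruction: the decoded instruction for thread 0 names an operand held
# in thread 1's register space, so the operand is fetched directly from
# the other thread's registers instead of being copied first.

regs = {
    0: {"A": [[0.0, 1.0], [2.0, 3.0]]},   # thread 0's register space
    1: {"B": [[1.0, 0.0], [0.0, 1.0]]},   # thread 1's register space
}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mma(dst_thread, a_ref, b_ref):
    """Matrix multiply where each operand reference is (thread, reg_name),
    so an operand may come from another thread's register space."""
    a = regs[a_ref[0]][a_ref[1]]
    b = regs[b_ref[0]][b_ref[1]]
    regs[dst_thread]["C"] = matmul(a, b)
    return regs[dst_thread]["C"]

# Thread 0 multiplies its own A by thread 1's B, with no copy into thread 0.
result = mma(0, (0, "A"), (1, "B"))
print(result)   # [[0.0, 1.0], [2.0, 3.0]]
```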
  • Publication number: 20240112295
    Abstract: Shared local registers for thread team processing are described. An example of an apparatus includes one or more processors including a graphics processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate to the first processing resource a shared local register (SLR) space that may be directly referenced in ISA instructions, the SLR space being accessible to the threads of the thread team and inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Biju George, Fangwen Fu, Supratim Pal, Jorge Parra, Chunhui Mei, Maxim Kazakov, Joydeep Ray
  • Publication number: 20240111534
    Abstract: Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Fangwen Fu, Chunhui Mei, Maxim Kazakov, Biju George, Jorge Parra, Supratim Pal
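The duplicate-load detection described above can be sketched as a short Python function. This is a toy software model of the idea (function name and data are invented); the patent describes memory access circuitry:

```python
# Toy model of duplicate-load coalescing: load requests from many hardware
# threads are scanned for duplicate addresses, each unique address is read
# from the cache exactly once, and the result is fanned out to every
# requesting thread.

def coalesce_loads(requests, cache):
    """requests: list of (thread_id, address). Returns (results, reads)
    where results maps thread_id -> value and reads counts cache reads."""
    by_addr = {}
    for thread, addr in requests:
        by_addr.setdefault(addr, []).append(thread)
    results, reads = {}, 0
    for addr, threads in by_addr.items():
        value = cache[addr]          # single read per unique address
        reads += 1
        for t in threads:            # broadcast to all duplicate requesters
            results[t] = value
    return results, reads

cache = {0x100: 7, 0x104: 9}
reqs = [(0, 0x100), (1, 0x100), (2, 0x104), (3, 0x100)]
results, reads = coalesce_loads(reqs, cache)
print(reads)            # 2 reads instead of 4
print(results[3])       # 7
```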
  • Patent number: 10572399
    Abstract: In an example, a method of arbitrating memory requests may include tagging a first batch of memory requests with first metadata identifying that the first batch of memory requests originates from a first group of threads. The method may include tagging a second batch of memory requests with second metadata identifying that the second batch of memory requests originates from the first group of threads. The method may include storing the first and second batches of memory requests in a conflict arbitration queue. The method may include performing, using the first metadata and the second metadata, conflict arbitration between only the first batch of memory requests and the second batch of memory requests stored in the conflict arbitration queue, which may also contain at least one other batch of memory requests originating from a group of threads different from the first group of threads.
    Type: Grant
    Filed: July 13, 2016
    Date of Patent: February 25, 2020
    Assignee: QUALCOMM Incorporated
    Inventor: Maxim Kazakov
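The metadata-tagged arbitration in the abstract above can be modeled compactly in Python. The `Batch` structure, the 4-bank address mapping, and the group tags are all invented for illustration; the patent describes a hardware arbitration queue:

```python
# Toy model of metadata-tagged conflict arbitration: batches in the queue
# carry a tag naming the thread group they came from, and arbitration
# compares only batches sharing that tag, skipping interleaved batches
# from other groups.

from collections import namedtuple

Batch = namedtuple("Batch", ["group", "addresses"])  # metadata + requests

def arbitrate(queue, group):
    """Find bank conflicts only among batches tagged with `group`,
    ignoring batches from other thread groups in the same queue."""
    seen_banks, conflicts = set(), []
    for batch in queue:
        if batch.group != group:       # other groups never participate
            continue
        for addr in batch.addresses:
            bank = addr % 4            # toy 4-bank address mapping
            if bank in seen_banks:
                conflicts.append(addr)
            seen_banks.add(bank)
    return conflicts

queue = [
    Batch("g0", [0, 1]),    # first batch from group g0
    Batch("g1", [2, 6]),    # interleaved batch from a different group
    Batch("g0", [4, 2]),    # second g0 batch; addr 4 maps to bank 0 again
]
print(arbitrate(queue, "g0"))   # [4] — conflict detected only within g0
```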
  • Patent number: 10223436
    Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.
    Type: Grant
    Filed: September 7, 2016
    Date of Patent: March 5, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
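The barrier-synchronized inter-subgroup transfer described above can be sketched with Python threads standing in for GPU work-items. This is a loose software analogy, not the GPU mechanism itself:

```python
import threading

# Toy model of inter-subgroup data transfer: work-items from two subgroups
# synchronize on a barrier, after which one subgroup reads data the other
# wrote to shared memory.

shared = [None] * 4
barrier = threading.Barrier(8)       # 2 subgroups x 4 work-items

def work_item(subgroup, lane, out):
    if subgroup == 0:
        shared[lane] = lane * 10     # subgroup 0 produces
    barrier.wait()                   # all work-items synchronize here
    if subgroup == 1:
        out[lane] = shared[lane]     # subgroup 1 consumes after the barrier

out = [None] * 4
threads = [threading.Thread(target=work_item, args=(sg, lane, out))
           for sg in (0, 1) for lane in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)    # [0, 10, 20, 30]
```

The barrier makes the transfer deterministic: no consumer can read `shared` before every producer has written its slot.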
  • Patent number: 10062139
    Abstract: This disclosure describes examples of using two vertex shaders, each during a different graphics processing pass, in a binning architecture for graphics processing. A first vertex shader processes a subset of attributes of a vertex in a binning pass, where the subset includes attributes that contribute to visibility determination and attributes that may benefit from being processed with a vertex shader that provides functional flexibility. A second, different vertex shader processes another subset of attributes of the vertex in the rendering pass.
    Type: Grant
    Filed: July 25, 2016
    Date of Patent: August 28, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Maxim Kazakov, Andrew Evan Gruber
  • Patent number: 10026145
    Abstract: Techniques are described for allowing concurrent execution of multiple different tasks and preempted, prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on the needs of a first "host" shader to allow the first shader to execute normally on the GPU. The GPU may observe two sets of tasks: "host" tasks and "guest" tasks. Based on, for example, detecting an availability of resources, the GPU may determine that a "guest" task may run while the "host" task is running. A second "guest" shader executes on the GPU by using resources that were configured for the first "host" shader if resources are available and, in some examples, additional resources are obtained through software-programmable means.
    Type: Grant
    Filed: December 13, 2016
    Date of Patent: July 17, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
  • Publication number: 20180165786
    Abstract: Techniques are described for allowing concurrent execution of multiple different tasks and preempted, prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on the needs of a first "host" shader to allow the first shader to execute normally on the GPU. The GPU may observe two sets of tasks: "host" tasks and "guest" tasks. Based on, for example, detecting an availability of resources, the GPU may determine that a "guest" task may run while the "host" task is running. A second "guest" shader executes on the GPU by using resources that were configured for the first "host" shader if resources are available and, in some examples, additional resources are obtained through software-programmable means.
    Type: Application
    Filed: December 13, 2016
    Publication date: June 14, 2018
    Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
  • Publication number: 20180025463
    Abstract: This disclosure describes examples of using two vertex shaders, each during a different graphics processing pass, in a binning architecture for graphics processing. A first vertex shader processes a subset of attributes of a vertex in a binning pass, where the subset includes attributes that contribute to visibility determination and attributes that may benefit from being processed with a vertex shader that provides functional flexibility. A second, different vertex shader processes another subset of attributes of the vertex in the rendering pass.
    Type: Application
    Filed: July 25, 2016
    Publication date: January 25, 2018
    Inventors: Maxim Kazakov, Andrew Evan Gruber
  • Publication number: 20180018097
    Abstract: In an example, a method of arbitrating memory requests may include tagging a first batch of memory requests with first metadata identifying that the first batch of memory requests originates from a first group of threads. The method may include tagging a second batch of memory requests with second metadata identifying that the second batch of memory requests originates from the first group of threads. The method may include storing the first and second batches of memory requests in a conflict arbitration queue. The method may include performing, using the first metadata and the second metadata, conflict arbitration between only the first batch of memory requests and the second batch of memory requests stored in the conflict arbitration queue, which may also contain at least one other batch of memory requests originating from a group of threads different from the first group of threads.
    Type: Application
    Filed: July 13, 2016
    Publication date: January 18, 2018
    Inventor: Maxim Kazakov
  • Publication number: 20170316076
    Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.
    Type: Application
    Filed: September 7, 2016
    Publication date: November 2, 2017
    Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
  • Patent number: 9779469
    Abstract: Techniques are described for copying data only from a subset of memory locations allocated to a set of instructions to free memory locations for higher priority instructions to execute. Data from a dynamic portion of one or more general purpose registers (GPRs) allocated to the set of instructions may be copied and stored to another memory unit while data from a static portion of the one or more GPRs allocated to the set of instructions may not be copied and stored to another memory unit.
    Type: Grant
    Filed: August 17, 2015
    Date of Patent: October 3, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Lee Howes, Maxim Kazakov
  • Publication number: 20170053374
    Abstract: Techniques are described for copying data only from a subset of memory locations allocated to a set of instructions to free memory locations for higher priority instructions to execute. Data from a dynamic portion of one or more general purpose registers (GPRs) allocated to the set of instructions may be copied and stored to another memory unit while data from a static portion of the one or more GPRs allocated to the set of instructions may not be copied and stored to another memory unit.
    Type: Application
    Filed: August 17, 2015
    Publication date: February 23, 2017
    Inventors: Lee Howes, Maxim Kazakov
  • Patent number: 9218686
    Abstract: The present invention aims to provide an image processing apparatus that can process geometric primitives rapidly and with restrained memory consumption. A sequence of vertex data for a primitive sequence is stored in an index buffer, and the size data of the primitive is stored at the head of the sequence. The result of processing the primitive sequence in a geometry shader is stored in a cache buffer, and the primitive output stored in the cache buffer is reused when the primitive is reprocessed in the geometry shader.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: December 22, 2015
    Assignee: DIGITAL MEDIA PROFESSIONALS INC.
    Inventor: Maxim Kazakov
  • Publication number: 20130113790
    Abstract: The present invention aims to provide an image processing apparatus that can process geometric primitives rapidly and with restrained memory consumption. A sequence of vertex data for a primitive sequence is stored in an index buffer, and the size data of the primitive is stored at the head of the sequence. The result of processing the primitive sequence in a geometry shader is stored in a cache buffer, and the primitive output stored in the cache buffer is reused when the primitive is reprocessed in the geometry shader.
    Type: Application
    Filed: December 2, 2011
    Publication date: May 9, 2013
    Applicant: DIGITAL MEDIA PROFESSIONALS INC.
    Inventor: Maxim Kazakov