Patents by Inventor Stephen Junkins

Stephen Junkins has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250200811
    Abstract: Variable width interleaved coding for graphics processing is described. An example of an apparatus includes one or more processors including a graphic processor; and memory for storage of data including data for graphics processing, wherein the graphics processor includes an encoder pipeline to provide variable width interleaved coding and a decoder pipeline to decode the variable width interleaved coding, and wherein the encoder pipeline is to receive a plurality of bitstreams from workgroups; perform parallel entropy encoding on the bitstreams to generate a plurality of encoded bitstreams for each of the workgroups; perform variable interleaving of the bitstreams for each workgroup based at least in part on data requirements for decoding received from the decoder pipeline; and compact outputs for each of the workgroups into a contiguous stream of interleaved data.
    Type: Application
    Filed: December 27, 2024
    Publication date: June 19, 2025
    Applicant: Intel Corporation
    Inventors: Stephen Junkins, Sreenivas Kothandaraman, Prasoonkumar Surti, Srihari Pratapa, William Hux, John Feit
  • Publication number: 20250117360
    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
    Type: Application
    Filed: October 30, 2024
    Publication date: April 10, 2025
    Applicant: Intel Corporation
    Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
  • Publication number: 20250068429
    Abstract: A Streaming Wave Coalescer (SWC) circuit stores a first set of state values associated with a first subset of threads of a first wave in a bin based on each of the first subset of threads including a first set of instructions to be executed. A second set of state values associated with a second subset of threads of a second wave is stored in the bin based on each of the second subset of threads including the first set of instructions to be executed and based on the first wave and the second wave both being associated with a hard key. A third wave is formed from the threads of the first subset and the second subset and is emitted for execution. As a result of reorganizing the threads and reconstituting a different wave, thread divergence of waves sent for execution is reduced.
    Type: Application
    Filed: December 12, 2023
    Publication date: February 27, 2025
    Inventors: John Stephen Junkins, Christopher J. Brennan, Ian Richard Beaumont, Kellie Marks, Matthaeus G. Chajdas, Max Oberberger, Michael John Bedy, Michael Mantor, Sean Keely
  • Patent number: 12223682
    Abstract: Variable width interleaved coding for graphics processing is described. An example of an apparatus includes one or more processors including a graphic processor; and memory for storage of data including data for graphics processing, wherein the graphics processor includes an encoder pipeline to provide variable width interleaved coding and a decoder pipeline to decode the variable width interleaved coding, and wherein the encoder pipeline is to receive a plurality of bitstreams from workgroups; perform parallel entropy encoding on the bitstreams to generate a plurality of encoded bitstreams for each of the workgroups; perform variable interleaving of the bitstreams for each workgroup based at least in part on data requirements for decoding received from the decoder pipeline; and compact outputs for each of the workgroups into a contiguous stream of interleaved data.
    Type: Grant
    Filed: June 24, 2021
    Date of Patent: February 11, 2025
    Assignee: INTEL CORPORATION
    Inventors: Stephen Junkins, Sreenivas Kothandaraman, Prasoonkumar Surti, Srihari Pratapa, William Hux, John Feit
  • Patent number: 12174783
    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
    Type: Grant
    Filed: June 24, 2021
    Date of Patent: December 24, 2024
    Assignee: Intel Corporation
    Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
  • Publication number: 20240394956
    Abstract: Apparatus and method for efficient graphics processing including ray tracing. For example, one embodiment of a graphics processor comprises: execution hardware logic to execute graphics commands and render images; an interface to couple functional units of the execution hardware logic to a tiled resource; and a tiled resource manager to manage access by the functional units to the tiled resource, a functional unit of the execution hardware logic to generate a request with a hash identifier (ID) to request access to a portion of the tiled resource, wherein the tiled resource manager is to determine whether a portion of the tiled resource identified by the hash ID exists, and if not, to allocate a new portion of the tiled resource and associate the new portion with the hash ID.
    Type: Application
    Filed: May 28, 2024
    Publication date: November 28, 2024
    Inventors: Sven Woop, Michael J. Doyle, Sreenivas Kothandaraman, Karthik Vaidyanathan, Abhishek R. Appu, Carsten Benthin, Prasoonkumar Surti, Holger Gruen, Stephen Junkins, Adam Lake, Bret G. Alfieri, Gabor Liktor, Joshua Barczak, Won-Jong Lee
  • Patent number: 12002145
    Abstract: Apparatus and method for efficient graphics processing including ray tracing. For example, one embodiment of a graphics processor comprises: execution hardware logic to execute graphics commands and render images; an interface to couple functional units of the execution hardware logic to a tiled resource; and a tiled resource manager to manage access by the functional units to the tiled resource, a functional unit of the execution hardware logic to generate a request with a hash identifier (ID) to request access to a portion of the tiled resource, wherein the tiled resource manager is to determine whether a portion of the tiled resource identified by the hash ID exists, and if not, to allocate a new portion of the tiled resource and associate the new portion with the hash ID.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: June 4, 2024
    Assignee: Intel Corporation
    Inventors: Sven Woop, Michael J. Doyle, Sreenivas Kothandaraman, Karthik Vaidyanathan, Abhishek R. Appu, Carsten Benthin, Prasoonkumar Surti, Holger Gruen, Stephen Junkins, Adam Lake, Bret G. Alfieri, Gabor Liktor, Joshua Barczak, Won-Jong Lee
  • Patent number: 11640297
    Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
    Type: Grant
    Filed: June 15, 2021
    Date of Patent: May 2, 2023
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. MacPherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
  • Publication number: 20230057492
    Abstract: Interleaving of variable bitrate streams for GPU implementations is described. An example of an apparatus includes one or more processors including a graphic processor, the graphics processor including a super-compression encoder pipeline to provide variable width interleaved coding; and memory for storage of data, wherein the graphics processor is to perform parallel dictionary encoding on a bitstream of symbols one of multiple workgroups, the workgroup to employ a plurality of encoders to generate a plurality of token-streams of variable lengths; create a histogram including at least tokens from the plurality of token-streams for the workgroup to generate an optimized entropy code; entropy code each of the plurality of token-streams for the workgroup into an encoded bitstream; and variably interleave the encoded bitstreams to generate an interleaved bitstream and bookkeep a size of the interleaved bitstream.
    Type: Application
    Filed: June 30, 2022
    Publication date: February 23, 2023
    Applicant: Intel Corporation
    Inventors: Sreenivas Kothandaraman, Stephen Junkins, Srihari Pratapa, Prasoonkumar Surti
  • Publication number: 20220414053
    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
    Type: Application
    Filed: June 24, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
  • Publication number: 20220301228
    Abstract: Variable width interleaved coding for graphics processing is described. An example of an apparatus includes one or more processors including a graphic processor; and memory for storage of data including data for graphics processing, wherein the graphics processor includes an encoder pipeline to provide variable width interleaved coding and a decoder pipeline to decode the variable width interleaved coding, and wherein the encoder pipeline is to receive a plurality of bitstreams from workgroups; perform parallel entropy encoding on the bitstreams to generate a plurality of encoded bitstreams for each of the workgroups; perform variable interleaving of the bitstreams for each workgroup based at least in part on data requirements for decoding received from the decoder pipeline; and compact outputs for each of the workgroups into a contiguous stream of interleaved data.
    Type: Application
    Filed: June 24, 2021
    Publication date: September 22, 2022
    Applicant: Intel Corporation
    Inventors: Stephen Junkins, Sreenivas Kothandaraman, Prasoonkumar Surti, Srihari Pratapa, William Hux, John Feit
  • Patent number: 11436695
    Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
    Type: Grant
    Filed: March 12, 2021
    Date of Patent: September 6, 2022
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, James A. Valerio, David Puffer, Abhishek R. Appu, Stephen Junkins
  • Publication number: 20220051467
    Abstract: Apparatus and method for efficient graphics processing including ray tracing. For example, one embodiment of a graphics processor comprises: execution hardware logic to execute graphics commands and render images; an interface to couple functional units of the execution hardware logic to a tiled resource; and a tiled resource manager to manage access by the functional units to the tiled resource, a functional unit of the execution hardware logic to generate a request with a hash identifier (ID) to request access to a portion of the tiled resource, wherein the tiled resource manager is to determine whether a portion of the tiled resource identified by the hash ID exists, and if not, to allocate a new portion of the tiled resource and associate the new portion with the hash ID.
    Type: Application
    Filed: December 23, 2020
    Publication date: February 17, 2022
    Inventors: Sven Woop, Michael J. Doyle, Sreenivas Kothandaraman, Karthik Vaidyanathan, Abhishek R. Appu, Carsten Benthin, Prasoonkumar Surti, Holger GRUEN, Stephen Junkins, Adam Lake, Bret G. Alfieri, Gabor Liktor, Joshua Barczak, Won-Jong Lee
  • Publication number: 20220051476
    Abstract: Apparatus and method for efficient graphics processing including ray tracing. For example, one embodiment of a graphics processor comprises: execution hardware logic to execute graphics commands and render images; an interface to couple functional units of the execution hardware logic to a tiled resource; and a tiled resource manager to manage access by the functional units to the tiled resource, a functional unit of the execution hardware logic to generate a request with a hash identifier (ID) to request access to a portion of the tiled resource, wherein the tiled resource manager is to determine whether a portion of the tiled resource identified by the hash ID exists, and if not, to allocate a new portion of the tiled resource and associate the new portion with the hash ID.
    Type: Application
    Filed: December 23, 2020
    Publication date: February 17, 2022
    Inventors: Sven Woop, Michael J. Doyle, Sreenivas Kothandaraman, Karthik Vaidyanathan, Abhishek R. Appu, Carsten Benthin, Prasoonkumar Surti, Holger GRUEN, Stephen Junkins, Adam Lake, Bret G. Alfieri, Gabor Liktor, Joshua Barczak, Won-Jong Lee
  • Publication number: 20210272231
    Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
    Type: Application
    Filed: March 12, 2021
    Publication date: September 2, 2021
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, James A. Valerio, David Puffer, Abhishek R. Appu, Stephen Junkins
  • Patent number: 11042370
    Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: June 22, 2021
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
  • Patent number: 10949945
    Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
    Type: Grant
    Filed: May 11, 2020
    Date of Patent: March 16, 2021
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, James A. Valerio, David Puffer, Abhishek R. Appu, Stephen Junkins
  • Publication number: 20200342564
    Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
    Type: Application
    Filed: May 11, 2020
    Publication date: October 29, 2020
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, James A. Valerio, David Puffer, Abhishek R. Appu, Stephen Junkins
  • Patent number: 10657618
    Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
    Type: Grant
    Filed: June 14, 2019
    Date of Patent: May 19, 2020
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, James A. Valerio, David Puffer, Abhishek R. Appu, Stephen Junkins
  • Publication number: 20190304052
    Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
    Type: Application
    Filed: June 14, 2019
    Publication date: October 3, 2019
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, James A. Valerio, David Puffer, Abhishek R. Appu, Stephen Junkins