Patents by Inventor Fangwen Fu

Fangwen Fu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12346694
    Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: July 1, 2025
    Assignee: Intel Corporation
    Inventors: Chandra Gurram, Wei-yu Chen, Fangwen Fu, Sabareesh Ganapathy, Varghese George, Guei-Yuan Lueh, Subramaniam Maiyuran, Mike Macpherson, Supratim Pal, Jorge Parra
  • Publication number: 20250147762
    Abstract: Described herein is a graphics processor having processing resources with configurable thread and register configurations. Program code can configure a number of registers and accumulators that will be used by hardware threads during execution of the program code by the graphics processor. Processing resources within the graphics processor can be configured to assign different numbers of registers and accumulators to hardware threads based on the configuration requested by program code to be executed by the processing resource.
    Type: Application
    Filed: November 8, 2023
    Publication date: May 8, 2025
    Applicant: Intel Corporation
    Inventors: Vasanth Ranganathan, Gang Chen, Supratim Pal, Jorge Eduardo Parra Osorio, Arthur Hunter, Boris Kuznetsov, Deepak N K, Siva Kumar Seemakurthi, James Valerio, Shubham Dinesh Chavan, Abhishek Kumar Singh, Samir Pandya, Sandeep Tippannanavar Niranjan, Alan Curtis, Jain Philip, Maltesh Kulkarni, Fangwen Fu, John Wiegert, Brent Schwartz
  • Publication number: 20250117359
    Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
    Type: Application
    Filed: October 11, 2024
    Publication date: April 10, 2025
    Applicant: Intel Corporation
    Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
  • Publication number: 20250110741
    Abstract: An apparatus to facilitate supporting 8-bit floating point format for parallel computing and stochastic rounding operations in a graphics architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that is to operate on 8-bit floating point operands to perform a parallel dot product operation; a scheduler to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and circuitry to execute the decoded instruction to perform 32-way dot-product using 8-bit wide dot-product layers, each 8-bit wide dot-product layer comprises one or more sets of interconnected multipliers, shifters, and adders, wherein each set of multipliers, shifters, and adders is to generate a dot product of the 8-bit floating point operands.
    Type: Application
    Filed: September 29, 2023
    Publication date: April 3, 2025
    Applicant: Intel Corporation
    Inventors: Jorge Eduardo Parra Osorio, Fangwen Fu, Guei-Yuan Lueh, Hong Jiang, Jiasheng Chen, Naveen K. Mellempudi, Kevin Hurd, Chunhui Mei, Alexandre Hadj-Chaib, Elliot Taylor, Shuai Mu
  • Publication number: 20250110733
    Abstract: An apparatus to facilitate conversion operations and special value use cases supporting 8-bit floating point format in a graphics architecture is disclosed. The apparatus includes a processor comprising a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction to cause the processor to perform conversion operation corresponding to an 8-bit floating point format operand; a scheduler to schedule the decoded instruction and provide input data for an input operand of the conversion operation indicated by the decoded instruction; and conversion circuitry to execute the decoded instruction to perform the conversion operation to convert the input operand to an output operand in accordance with the 8-bit floating point format operand, the conversion circuitry comprising hardware circuitry to rescale, normalize, and convert the input operand to the output operand.
    Type: Application
    Filed: September 29, 2023
    Publication date: April 3, 2025
    Applicant: Intel Corporation
    Inventors: Jorge Eduardo Parra Osorio, Fangwen Fu, Guei-Yuan Lueh, Jiasheng Chen, Naveen K. Mellempudi, Kevin Hurd, Alexandre Hadj-Chaib, Elliot Taylor, Marius Cornea-Hasegan
  • Publication number: 20250104180
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Application
    Filed: December 3, 2024
    Publication date: March 27, 2025
    Applicant: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
  • Patent number: 12242846
    Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adder, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.
    Type: Grant
    Filed: March 27, 2024
    Date of Patent: March 4, 2025
    Assignee: INTEL CORPORATION
    Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
  • Publication number: 20250068423
    Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 27, 2025
    Applicant: Intel Corporation
    Inventors: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya
  • Publication number: 20250036361
    Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a multi-lane parallel floating-point unit and a multi-lane parallel integer unit. The multi-lane parallel integer unit includes an integer pipeline including a plurality of parallel integer logic units configured to perform integer compute operations on a plurality of input data elements and a format conversion pipeline including a plurality of parallel format conversion units configured to convert a plurality of input data elements from a first one of a plurality of datatype formats to a second one of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
    Type: Application
    Filed: July 25, 2023
    Publication date: January 30, 2025
    Applicant: Intel Corporation
    Inventors: Supratim Pal, Jiasheng Chen, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Guei-Yuan Lueh, Pradeep K. Golconda, Fangwen Fu, Wei Xiong, Hongzheng Li, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Shuai Mu, Clifford Gibson, Buqi Cheng
  • Publication number: 20250036412
    Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a plurality of processing resources. A processing resource of the plurality of processing resources includes a source crossbar communicatively coupled with a register file, the source crossbar to reorder data elements of a source operand and a format conversion pipeline to convert a plurality of input data elements specified by the source operand from a first format of a plurality of datatype formats to a second format of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
    Type: Application
    Filed: July 25, 2023
    Publication date: January 30, 2025
    Applicant: Intel Corporation
    Inventors: Supratim Pal, Jiasheng Chen, Christopher Spencer, Jorge E. Parra Osorio, Kevin Hurd, Guei-Yuan Lueh, Pradeep K. Golconda, Fangwen Fu, Wei Xiong, Hongzheng Li, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Shuai Mu, Clifford Gibson, Buqi Cheng
  • Publication number: 20250037347
    Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline including a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.
    Type: Application
    Filed: July 25, 2023
    Publication date: January 30, 2025
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Supratim Pal, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Takashi Nakagawa, Guei-Yuan Lueh, Pradeep K. Golconda, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Clifford Gibson, Li-An Tang, Fangwen Fu, Kaiyu Chen, Buqi Cheng
  • Patent number: 12198222
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Grant
    Filed: December 7, 2023
    Date of Patent: January 14, 2025
    Assignee: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
  • Patent number: 12189571
    Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: January 7, 2025
    Assignee: Intel Corporation
    Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
  • Publication number: 20240427842
    Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.
    Type: Application
    Filed: May 24, 2024
    Publication date: December 26, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
  • Publication number: 20240256274
    Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adder, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.
    Type: Application
    Filed: March 27, 2024
    Publication date: August 1, 2024
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
  • Patent number: 12039000
    Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.
    Type: Grant
    Filed: February 2, 2023
    Date of Patent: July 16, 2024
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
  • Publication number: 20240232088
    Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
    Type: Application
    Filed: October 25, 2022
    Publication date: July 11, 2024
    Applicant: Intel Corporation
    Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
  • Publication number: 20240231957
    Abstract: Embodiments described herein provide a technique to facilitate the synchronization of workgroups executed on multiple graphics cores of a graphics core cluster. One embodiment provides a graphics core including a cache memory and a graphics core coupled with the cache memory. The graphics core includes execution resources to execute an instruction via a plurality of hardware threads and barrier circuitry to synchronize execution of the plurality of hardware threads, wherein the barrier circuitry is configured to provide a plurality of re-usable named barriers.
    Type: Application
    Filed: October 25, 2022
    Publication date: July 11, 2024
    Applicant: Intel Corporation
    Inventors: Fangwen Fu, Chunhui Mei, John A. Wiegert, Yongsheng Liu, Ben J. Ashbaugh
  • Publication number: 20240220254
    Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, Yongsheng Liu, John A. Wiegert, Vasanth Ranganathan, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, James Valerio, Alan M. Curtis, Maxim Kazakov
  • Publication number: 20240220448
    Abstract: A scalable and configurable clustered systolic array is described. An example of apparatus includes a cluster including multiple cores; and a cache memory coupled with the cluster, wherein each core includes multiple processing resources, a memory coupled with the plurality of processing resources, a systolic array coupled with the memory, and one or more interconnects with one or more other cores of the plurality of cores; and wherein the systolic arrays of the cores are configurable by the apparatus to form a logically combined systolic array for processing of an operation by a cooperative group of threads running on one or more of the plurality of cores in the cluster.
    Type: Application
    Filed: December 30, 2022
    Publication date: July 4, 2024
    Applicant: Intel Corporation
    Inventors: Chunhui Mei, Jiasheng Chen, Ben J. Ashbaugh, Fangwen Fu, Hong Jiang, Guei-Yuan Lueh, Rama S.B. Harihara, Maxim Kazakov