Patents by Inventor Changwon RHEE

Changwon RHEE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134797
    Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
    Type: Application
    Filed: October 24, 2022
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
  • Publication number: 20240111826
    Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang
  • Publication number: 20240111825
    Abstract: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Changwon Rhee, Kevin Hurd, Gregory Henry, Peter Caday, Kristopher Wong
  • Publication number: 20240103810
    Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprising two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.
    Type: Application
    Filed: September 27, 2022
    Publication date: March 28, 2024
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Supratim Pal, Changwon Rhee, Hong Jiang, Kevin Hurd, Shuai Mu
  • Publication number: 20230086275
    Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.
    Type: Application
    Filed: September 22, 2021
    Publication date: March 23, 2023
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Changwon Rhee, Sabareesh Ganapathy, Gregory Henry, Fangwen Fu
  • Publication number: 20220416999
    Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky
  • Publication number: 20220413848
    Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
  • Publication number: 20220413854
    Abstract: An apparatus to facilitate 64-bit two-dimensional (2D) block load with transpose is disclosed. The apparatus includes a processor comprising processing resources; and load store pipeline hardware circuitry coupled to the processing resources, the load store pipeline hardware circuitry to receive a 64-bit two-dimensional (2D) block load message with transpose from the processing resources. The load store pipeline hardware circuitry comprising a load store pipeline sequencer to map rows of a block of memory corresponding to the 64-bit 2D block load message with transpose to 64-bit standard load messages; and load store pipeline return circuitry to: sequentially number general register files (GRFs) used for returning elements of the block of memory accessed by the 64-bit standard load messages to the processing resources; and return, to the processing resources, the sequentially numbered GRFs in response to the 64-bit 2D block load message with transpose.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Supratim Pal, Prathamesh Raghunath Shinde, Ben J. Ashbaugh, Changwon Rhee, Hong Jiang, FangWen Fu
  • Publication number: 20220414010
    Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprising: a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request to dynamically change a priority level of the requested data.
    Type: Application
    Filed: March 25, 2022
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Xiaodong Qiu, Yong Jiang, Changwon Rhee, Cui Tang, Shuangpeng Zhou, Lei Chen, Danyu Bi, Peiqing Jiang, Chengxi Wu
  • Patent number: 10484640
    Abstract: Techniques related to compositing video content are discussed. Such techniques may include generating transparency data for a surface of first video content and storing it in a stream out buffer, accessing the transparency data via the stream out buffer when there is no change to the surface of the first video content, and compositing the first video content with second video content based on the accessed transparency data.
    Type: Grant
    Filed: June 3, 2015
    Date of Patent: November 19, 2019
    Assignee: Intel Corporation
    Inventors: Yunbiao Lin, Changliang Wang, Tiehan Lu, Qingyuan Zhang, Li Xu, Bo Zhao, Changwon Rhee
  • Publication number: 20180288353
    Abstract: Techniques related to compositing video content are discussed. Such techniques may include generating transparency data for a surface of first video content and storing it in a stream out buffer, accessing the transparency data via the stream out buffer when there is no change to the surface of the first video content, and compositing the first video content with second video content based on the accessed transparency data.
    Type: Application
    Filed: June 3, 2015
    Publication date: October 4, 2018
    Inventors: Yunbiao LIN, Changliang WANG, Tiehan LU, Qingyuan ZHANG, Li XU, Bo ZHAO, Changwon RHEE