Patents by Inventor Changwon RHEE
Changwon RHEE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240134797Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory; and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.Type: ApplicationFiled: October 24, 2022Publication date: April 25, 2024Applicant: Intel CorporationInventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
-
Publication number: 20240111826Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Intel CorporationInventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang
-
Publication number: 20240111825Abstract: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.Type: ApplicationFiled: September 30, 2022Publication date: April 4, 2024Applicant: Intel CorporationInventors: Jiasheng Chen, Changwon Rhee, Kevin Hurd, Gregory Henry, Peter Caday, Kristopher Wong
-
Publication number: 20240103810Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprising two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.Type: ApplicationFiled: September 27, 2022Publication date: March 28, 2024Applicant: Intel CorporationInventors: Jiasheng Chen, Supratim Pal, Changwon Rhee, Hong Jiang, Kevin Hurd, Shuai Mu
-
Publication number: 20230086275Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.Type: ApplicationFiled: September 22, 2021Publication date: March 23, 2023Applicant: Intel CorporationInventors: Jiasheng Chen, Changwon Rhee, Sabareesh Ganapathy, Gregory Henry, Fangwen Fu
-
Publication number: 20220416999Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky
-
Publication number: 20220413848Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
-
Publication number: 20220413854Abstract: An apparatus to facilitate 64-bit two-dimensional (2D) block load with transpose is disclosed. The apparatus includes a processor comprising processing resources; and load store pipeline hardware circuitry coupled to the processing resources, the load store pipeline hardware circuitry to receive a 64-bit two-dimensional (2D) block load message with transpose from the processing resources. The load store pipeline hardware circuitry comprising a load store pipeline sequencer to map rows of a block of memory corresponding to the 64-bit 2D block load message with transpose to 64-bit standard load messages; and load store pipeline return circuitry to: sequentially number general register files (GRFs) used for returning elements of the block of memory accessed by the 64-bit standard load messages to the processing resources; and return, to the processing resources, the sequentially numbered GRFs in response to the 64-bit 2D block load message with transpose.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Joydeep Ray, Supratim Pal, Prathamesh Raghunath Shinde, Ben J. Ashbaugh, Changwon Rhee, Hong Jiang, FangWen Fu
-
Publication number: 20220414010Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprising: a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request to dynamically change a priority level of the requested data.Type: ApplicationFiled: March 25, 2022Publication date: December 29, 2022Applicant: Intel CorporationInventors: Xiaodong Qiu, Yong Jiang, Changwon Rhee, Cui Tang, Shuangpeng Zhou, Lei Chen, Danyu Bi, Peiqing Jiang, Chengxi Wu
-
Patent number: 10484640Abstract: Techniques related to compositing video content are discussed. Such techniques may include generating transparency data for a surface of first video content and storing it in a stream out buffer, accessing the transparency data via the stream out buffer when there is no change to the surface of the first video content, and compositing the first video content with second video content based on the accessed transparency data.Type: GrantFiled: June 3, 2015Date of Patent: November 19, 2019Assignee: Intel CorporationInventors: Yunbiao Lin, Changliang Wang, Tiehan Lu, Qingyuan Zhang, Li Xu, Bo Zhao, Changwon Rhee
-
Publication number: 20180288353Abstract: Techniques related to compositing video content are discussed. Such techniques may include generating transparency data for a surface of first video content and storing it in a stream out buffer, accessing the transparency data via the stream out buffer when there is no change to the surface of the first video content, and compositing the first video content with second video content based on the accessed transparency data.Type: ApplicationFiled: June 3, 2015Publication date: October 4, 2018Inventors: Yunbiao LIN, Changliang WANG, Tiehan LU, Qingyuan ZHANG, Li XU, Bo ZHAO, Changwon RHEE