Patents by Inventor THOMAS F. RAOUX

THOMAS F. RAOUX has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques to manage execution of divergent shaders

Patent number: 11776195

Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
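The batching behavior this abstract describes can be illustrated with a minimal software sketch. It is a toy model, not the patented hardware: the class name `SortingUnit`, the method names, the lane count, and the "hit"/"miss" shader labels are all assumptions made for illustration, and the call-stack mechanism is left out.

```python
from collections import defaultdict

SIMD_WIDTH = 8  # assumed lane count; the abstract does not fix a number


class SortingUnit:
    """Toy model: queues call requests per subroutine until a full SIMD batch exists."""

    def __init__(self, simd_width=SIMD_WIDTH):
        self.simd_width = simd_width
        self.pending = defaultdict(list)  # subroutine id -> queued caller ids

    def request(self, subroutine_id, caller_id):
        """Queue one request; return (subroutine_id, batch) once a batch is full, else None."""
        queue = self.pending[subroutine_id]
        queue.append(caller_id)
        if len(queue) == self.simd_width:
            batch = list(queue)
            queue.clear()
            return subroutine_id, batch
        return None

    def flush(self):
        """Return the partially filled batches, e.g. when no more requests are expected."""
        batches = [(sid, list(q)) for sid, q in self.pending.items() if q]
        self.pending.clear()
        return batches


def demo_sorting_unit():
    unit = SortingUnit(simd_width=4)
    dispatched = []
    # Ten callers diverge across two hypothetical shaders, "hit" and "miss".
    for caller_id in range(10):
        shader = "hit" if caller_id % 3 else "miss"
        batch = unit.request(shader, caller_id)
        if batch is not None:
            dispatched.append(batch)
    return dispatched, unit.flush()
```

In this sketch, each full batch holds requests for a single subroutine, so all lanes of one SIMD dispatch would run the same code even though the callers diverged.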

Type: Grant

Filed: August 31, 2021

Date of Patent: October 3, 2023

Assignee: Intel Corporation

Inventors: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux
TECHNIQUES TO MANAGE EXECUTION OF DIVERGENT SHADERS

Publication number: 20220068005

Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.

Type: Application

Filed: August 31, 2021

Publication date: March 3, 2022

Inventors: John G. GIERACH, Karthik VAIDYANATHAN, Thomas F. RAOUX
Techniques to manage execution of divergent shaders

Patent number: 11107263

Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.

Type: Grant

Filed: November 13, 2018

Date of Patent: August 31, 2021

Assignee: Intel Corporation

Inventors: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux
Facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance at computing devices

Patent number: 10796397

Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.

Type: Grant

Filed: June 12, 2015

Date of Patent: October 6, 2020

Assignee: INTEL CORPORATION

Inventors: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S. B. Harihara
Method and apparatus for efficient processing of derived uniform values in a graphics processor

Patent number: 10726605

Abstract: Various embodiments enable low frequency calculation of derived uniform values. A compiler can identify one or more portions of a shader that calculate a derived value based on an input value. For example, this portion may include instructions that use constant values, or the results of prior functions that used constant values. The constant values may include hardcoded values provided by the program (e.g., immediates) and/or other constant values. This portion of the shader can be extracted by the compiler and compiled into a first program. The compiler can compile the remainder of the shader into a second program that receives the derived uniform values from the first program. By extracting the portion(s) of the program that calculates a derived value into a separate program, the derived uniform value or values can be calculated at a lower frequency than if they were calculated for each pixel.
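The split this abstract describes can be sketched in a few lines. The sketch assumes a hypothetical rotation shader whose only uniform input is `angle`; the function names are illustrative and do not come from the patent, and a real compiler would perform the extraction automatically rather than by hand.

```python
import math


def shade_naive(pixels, angle):
    """Unsplit shader: recomputes cos/sin of the uniform for every pixel."""
    return [x * math.cos(angle) - y * math.sin(angle) for x, y in pixels]


def derive_uniforms(angle):
    """'First program': depends only on uniform input, so it can run once per draw."""
    return math.cos(angle), math.sin(angle)


def shade_split(pixels, derived):
    """'Second program': per-pixel work that consumes the derived uniform values."""
    cos_a, sin_a = derived
    return [x * cos_a - y * sin_a for x, y in pixels]
```

Here `derive_uniforms` runs once, while `shade_split` runs for every pixel, so the trigonometric calls are no longer repeated per pixel.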

Type: Grant

Filed: September 15, 2017

Date of Patent: July 28, 2020

Assignee: Intel Corporation

Inventors: Travis T. Schluessler, Aleksander Neyman, Guei-Yuan Lueh, Thomas F. Raoux, Bartosz Spitzbarth
TECHNIQUES TO MANAGE EXECUTION OF SHADERS

Publication number: 20200151936

Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.

Type: Application

Filed: November 13, 2018

Publication date: May 14, 2020

Inventors: John G. GIERACH, Karthik VAIDYANATHAN, Thomas F. RAOUX
Reducing memory latency in graphics operations

Patent number: 10552934

Abstract: Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.
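As a rough illustration of the gather-table idea, the sketch below copies selected uniform values from source buffers into a flat list standing in for the register file. The table layout (pairs of buffer name and offset), the function name, and the buffer names are assumptions; the abstract does not specify a format.

```python
def gather_to_grf(buffers, gather_table, grf_size):
    """Fill a register-file image from (buffer_id, offset) entries, in table order."""
    if len(gather_table) > grf_size:
        raise ValueError("gather table has more entries than register slots")
    grf = [0] * grf_size
    for slot, (buffer_id, offset) in enumerate(gather_table):
        grf[slot] = buffers[buffer_id][offset]
    return grf


def demo_gather():
    # Hypothetical uniform buffers; names and contents are illustrative only.
    buffers = {
        "material": [0.5, 0.25, 1.0],
        "camera": [10.0, 20.0],
    }
    gather_table = [("camera", 1), ("material", 0), ("material", 2)]
    return gather_to_grf(buffers, gather_table, grf_size=4)
```

In this model the uniforms sit in registers before the shader runs, so the shader itself would not need to issue memory loads for them.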

Type: Grant

Filed: July 1, 2016

Date of Patent: February 4, 2020

Assignee: Intel Corporation

Inventors: Michael Apodaca, David M. Cimini, Thomas F. Raoux, Somnath Ghosh, Uddipan Mukherjee, Debraj Bose, Sthiti Deka, Yohai Gevim
Method and apparatus for subdividing shader workloads in a graphics processor for efficient machine configuration

Patent number: 10360717

Abstract: An apparatus and method for splitting shaders. For example, one embodiment of a method comprises: receiving a request for compilation of a shader in a graphics processing environment; determining whether there is sufficient work associated with the shader to justify splitting the shader into two or more blocks of program code; evaluating the program code of the shader to identify dependencies between the blocks of program code if there is sufficient work; subdividing the shader into the two or more blocks in accordance with the identified dependencies; and individually executing the two or more blocks of code on a graphics processor. In addition, one embodiment includes the operations of determining whether any of the regions that can be subdivided are likely to run faster with different machine configurations than if the shader is executed without being subdivided, and subdividing the shader only for those regions that are likely to run faster with different machine configurations.

Type: Grant

Filed: December 29, 2017

Date of Patent: July 23, 2019

Assignee: Intel Corporation

Inventors: John G. Gierach, Travis Schluessler, Thomas F. Raoux, Peng Guo
METHOD AND APPARATUS FOR SUBDIVIDING SHADER WORKLOADS IN A GRAPHICS PROCESSOR FOR EFFICIENT MACHINE CONFIGURATION

Publication number: 20190206110

Abstract: An apparatus and method for splitting shaders. For example, one embodiment of a method comprises: receiving a request for compilation of a shader in a graphics processing environment; determining whether there is sufficient work associated with the shader to justify splitting the shader into two or more blocks of program code; evaluating the program code of the shader to identify dependencies between the blocks of program code if there is sufficient work; subdividing the shader into the two or more blocks in accordance with the identified dependencies; and individually executing the two or more blocks of code on a graphics processor. In addition, one embodiment includes the operations of determining whether any of the regions that can be subdivided are likely to run faster with different machine configurations than if the shader is executed without being subdivided, and subdividing the shader only for those regions that are likely to run faster with different machine configurations.

Type: Application

Filed: December 29, 2017

Publication date: July 4, 2019

Inventors: JOHN G. GIERACH, TRAVIS SCHLUESSLER, THOMAS F. RAOUX, PENG GUO
Hardware instruction set to replace a plurality of atomic operations with a single atomic operation

Patent number: 10318292

Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
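The arithmetic behind this abstract can be shown with a small sketch. It models the memory location as a one-element list and the lanes as a Python list; the function names are hypothetical, and the plain `+=` stands in for what would be a hardware atomic.

```python
from itertools import accumulate


def per_lane_atomic_adds(counter, lane_values):
    """Baseline: one add per lane, applied in lane order."""
    results = []
    for value in lane_values:
        counter[0] += value  # stands in for one atomic add per lane
        results.append(counter[0])
    return results


def aggregated_atomic_add(counter, lane_values):
    """Same results with one update to the shared location."""
    accumulated = list(accumulate(lane_values))  # accumulation step: prefix sum across lanes
    if not accumulated:
        return []
    old = counter[0]               # pre-existing value, to be broadcast to all lanes
    counter[0] += accumulated[-1]  # the single atomic operation (modelled, not truly atomic)
    return [old + acc for acc in accumulated]  # broadcast step
```

Both functions leave the counter at the same final value and return the same per-lane values; subtracting each lane's own input from its result gives the value a fetch-and-add would have returned to that lane.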

Type: Grant

Filed: November 17, 2014

Date of Patent: June 11, 2019

Assignee: Intel Corporation

Inventors: Satyajit Sarangi, Thomas F. Raoux, Guei-Yuan Lueh, Subramaniam Maiyuran
METHOD AND APPARATUS FOR EFFICIENT PROCESSING OF DERIVED UNIFORM VALUES IN A GRAPHICS PROCESSOR

Publication number: 20190087998

Abstract: Various embodiments enable low frequency calculation of derived uniform values. A compiler can identify one or more portions of a shader that calculate a derived value based on an input value. For example, this portion may include instructions that use constant values, or the results of prior functions that used constant values. The constant values may include hardcoded values provided by the program (e.g., immediates) and/or other constant values. This portion of the shader can be extracted by the compiler and compiled into a first program. The compiler can compile the remainder of the shader into a second program that receives the derived uniform values from the first program. By extracting the portion(s) of the program that calculates a derived value into a separate program, the derived uniform value or values can be calculated at a lower frequency than if they were calculated for each pixel.

Type: Application

Filed: September 15, 2017

Publication date: March 21, 2019

Inventors: Travis T. SCHLUESSLER, Aleksander NEYMAN, Guei-Yuan LUEH, Thomas F. RAOUX, Bartosz SPITZBARTH
SPECIALIZED CODE PATHS IN GPU PROCESSING

Publication number: 20190066256

Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine an instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
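A hand-written analogue of such a specialized path is sketched below. It assumes a hypothetical blend shader in which `alpha` is frequently exactly 0.0 or 1.0; the function names and the choice of common values are illustrative and are not taken from the patent, where the compiler would introduce the branch.

```python
def blend_general(src, dst, alpha):
    """General path: full multiply-add per channel."""
    return [s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst)]


def blend_specialized(src, dst, alpha):
    """Same result, with cheap branches for the assumed common values of alpha."""
    if alpha == 0.0:
        return list(dst)  # fully transparent: source contributes nothing
    if alpha == 1.0:
        return list(src)  # fully opaque: destination contributes nothing
    return blend_general(src, dst, alpha)
```

The extra branches pay off only when the common values really are frequent, which is why the abstract frames this as a decision made by the shader compiler.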

Type: Application

Filed: October 25, 2018

Publication date: February 28, 2019

Applicant: INTEL CORPORATION

Inventors: SAURABH SHARMA, ABHISHEK VENKATESH, TRAVIS T. SCHLUESSLER, THOMAS F. RAOUX, RAHUL P. SATHE, JON HASSELGREN
Specialized code paths in GPU processing

Patent number: 10140678

Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine an instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.

Type: Grant

Filed: April 1, 2016

Date of Patent: November 27, 2018

Assignee: INTEL CORPORATION

Inventors: Saurabh Sharma, Abhishek Venkatesh, Travis T. Schluessler, Thomas F. Raoux, Rahul P. Sathe, Jon Hasselgren
REDUCING MEMORY LATENCY IN GRAPHICS OPERATIONS

Publication number: 20180005345

Abstract: Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.

Type: Application

Filed: July 1, 2016

Publication date: January 4, 2018

Applicant: Intel Corporation

Inventors: Michael Apodaca, David M. Cimini, Thomas F. Raoux, Somnath Ghosh, Uddipan Mukherjee, Debraj Bose, Sthiti Deka, Yohai Gevim
SPECIALIZED CODE PATHS IN GPU PROCESSING

Publication number: 20170178277

Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine an instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.

Type: Application

Filed: April 1, 2016

Publication date: June 22, 2017

Inventors: SAURABH SHARMA, ABHISHEK VENKATESH, TRAVIS T. SCHLUESSLER, THOMAS F. RAOUX, RAHUL P. SATHE, JON HASSELGREN
Increasing Thread Payload for 3D Pipeline with Wider SIMD Execution Width

Publication number: 20170178384

Abstract: Reducing SIMD fragmentation for SIMD execution widths of 32 or even 64 channels in a single hardware thread leads to better EU utilization. Increasing SIMD execution widths to 32 or 64 channels per thread enables handling more vertices, patches, primitives and triangles per EU hardware thread. Modified 3D pipeline shader payloads can handle multiple patches in the case of domain shaders, multiple primitives in the case of geometry shaders when the primitive object instance count is greater than one, and multiple triangles in the case of pixel shaders.

Type: Application

Filed: December 21, 2015

Publication date: June 22, 2017

Inventors: Jayashree Venkatesh, Gang Chen, Thomas F. Raoux, Guei-Yuan Lueh, Subramaniam Maiyuran
Apparatus and method for efficient prefix sum operation

Patent number: 9632979

Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
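The idea of building a prefix sum from rounds of simultaneous additions can be sketched with a generic log-step (Hillis-Steele style) scan. This is a substitute technique chosen for illustration, not the regioning-based instruction sequence the patent claims; each list comprehension stands in for one SIMD add across all lanes.

```python
def simd_prefix_sum(values):
    """Inclusive prefix sum in about log2(n) rounds of lane-parallel additions."""
    lanes = list(values)
    offset = 1
    while offset < len(lanes):
        # One "SIMD add": every lane i >= offset adds the value held `offset`
        # lanes to its left. All reads use the previous round's contents,
        # because the comprehension builds a new list before rebinding `lanes`.
        lanes = [
            lanes[i] + lanes[i - offset] if i >= offset else lanes[i]
            for i in range(len(lanes))
        ]
        offset *= 2
    return lanes
```

For eight elements this takes three rounds (offsets 1, 2 and 4) instead of seven dependent scalar additions, with later rounds combining the intermediate results of earlier ones.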

Type: Grant

Filed: June 1, 2015

Date of Patent: April 25, 2017

Assignee: Intel Corporation

Inventors: Satyajit Sarangi, Thomas F. Raoux
FACILITATING DYNAMIC RUNTIME TRANSFORMATION OF GRAPHICS PROCESSING COMMANDS FOR IMPROVED GRAPHICS PERFORMANCE AT COMPUTING DEVICES

Publication number: 20160364828

Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.

Type: Application

Filed: June 12, 2015

Publication date: December 15, 2016

Applicant: INTEL CORPORATION

Inventors: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S.B. Harihara
APPARATUS AND METHOD FOR EFFICIENT PREFIX SUM OPERATION

Publication number: 20160350262

Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.

Type: Application

Filed: June 1, 2015

Publication date: December 1, 2016

Inventors: SATYAJIT SARANGI, THOMAS F. RAOUX
HARDWARE INSTRUCTION SET TO REPLACE A PLURALITY OF ATOMIC OPERATIONS WITH A SINGLE ATOMIC OPERATION

Publication number: 20160139934

Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.

Type: Application

Filed: November 17, 2014

Publication date: May 19, 2016

Applicant: Intel Corporation

Inventors: SATYAJIT SARANGI, THOMAS F. RAOUX, GUEI-YUAN LUEH, SUBRAMANIAM MAIYURAN