Patents by Inventor THOMAS F. RAOUX
THOMAS F. RAOUX has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11776195
Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
Type: Grant
Filed: August 31, 2021
Date of Patent: October 3, 2023
Assignee: Intel Corporation
Inventors: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux
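The batching idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the `SortingUnit` class, its `request` and `flush` methods, and the default lane count of 8 are hypothetical names and values chosen for illustration. Requests for the same subroutine are accumulated until there are as many as there are SIMD lanes, and then dispatched together.

```python
from collections import defaultdict

SIMD_WIDTH = 8  # assumed lane count, not taken from the patent


class SortingUnit:
    """Hypothetical software model of a unit that groups subroutine requests."""

    def __init__(self, width=SIMD_WIDTH):
        self.width = width
        self.pending = defaultdict(list)  # subroutine name -> queued arguments
        self.dispatched = []              # (subroutine name, batch) in dispatch order

    def request(self, subroutine, arg):
        """Queue one instantiation; dispatch once a full SIMD batch exists."""
        queue = self.pending[subroutine]
        queue.append(arg)
        if len(queue) == self.width:
            self.dispatched.append((subroutine, list(queue)))
            queue.clear()

    def flush(self):
        """Dispatch whatever partial batches remain."""
        for subroutine, queue in self.pending.items():
            if queue:
                self.dispatched.append((subroutine, list(queue)))
                queue.clear()


unit = SortingUnit(width=4)
for i in range(5):
    unit.request("hit_shader", i)
unit.request("miss_shader", 9)
unit.flush()
print(unit.dispatched)
```

In this model a full batch fills every lane, while `flush` shows the cost of partial batches: the leftover requests run with lanes idle.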
-
Publication number: 20220068005
Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
Type: Application
Filed: August 31, 2021
Publication date: March 3, 2022
Inventors: John G. GIERACH, Karthik VAIDYANATHAN, Thomas F. RAOUX
-
Patent number: 11107263
Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
Type: Grant
Filed: November 13, 2018
Date of Patent: August 31, 2021
Assignee: Intel Corporation
Inventors: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux
-
Patent number: 10796397
Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
Type: Grant
Filed: June 12, 2015
Date of Patent: October 6, 2020
Assignee: INTEL CORPORATION
Inventors: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S. B. Harihara
-
Patent number: 10726605
Abstract: Various embodiments enable low frequency calculation of derived uniform values. A compiler can identify one or more portions of a shader that calculate a derived value based on an input value. For example, this portion may include instructions that use constant values, or the results of prior functions that used constant values. The constant values may include hardcoded values provided by the program (e.g., immediates) and/or other constant values. This portion of the shader can be extracted by the compiler and compiled into a first program. The compiler can compile the remainder of the shader into a second program that receives the derived uniform values from the first program. By extracting the portion(s) of the program that calculates a derived value into a separate program, the derived uniform value or values can be calculated at a lower frequency than if they were calculated for each pixel.
Type: Grant
Filed: September 15, 2017
Date of Patent: July 28, 2020
Assignee: Intel Corporation
Inventors: Travis T. Schluessler, Aleksander Neyman, Guei-Yuan Lueh, Thomas F. Raoux, Bartosz Spitzbarth
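The split described in this abstract resembles hoisting a loop-invariant expression out of per-pixel code. The sketch below is a hand-written illustration of that effect, not compiler output and not the patented method. The function names and the example expression `sin(time) * 0.5 + 0.5` are assumptions made up for the example.

```python
import math


def shader_naive(pixels, time):
    """Unsplit shader: recomputes the derived value for every pixel."""
    return [p * (math.sin(time) * 0.5 + 0.5) for p in pixels]


def derive_uniforms(time):
    """Hypothetical 'first program': depends only on uniform inputs, runs once."""
    return math.sin(time) * 0.5 + 0.5


def shader_split(pixels, scale):
    """Hypothetical 'second program': per-pixel work that consumes the derived uniform."""
    return [p * scale for p in pixels]


pixels = [0.0, 0.25, 0.5, 1.0]
scale = derive_uniforms(time=1.5)      # evaluated once per draw
result = shader_split(pixels, scale)   # evaluated once per pixel
print(result == shader_naive(pixels, 1.5))
```

Both versions produce the same pixel values. The split version evaluates the sine once instead of once per pixel.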
-
Publication number: 20200151936
Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
Type: Application
Filed: November 13, 2018
Publication date: May 14, 2020
Inventors: John G. GIERACH, Karthik VAIDYANATHAN, Thomas F. RAOUX
-
Patent number: 10552934
Abstract: Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.
Type: Grant
Filed: July 1, 2016
Date of Patent: February 4, 2020
Assignee: Intel Corporation
Inventors: Michael Apodaca, David M. Cimini, Thomas F. Raoux, Somnath Ghosh, Uddipan Mukherjee, Debraj Bose, Sthiti Deka, Yohai Gevim
-
Patent number: 10360717
Abstract: An apparatus and method for splitting shaders. For example, one embodiment of a method comprises: receiving a request for compilation of a shader in a graphics processing environment; determining whether there is sufficient work associated with the shader to justify splitting the shader into two or more blocks of program code; evaluating the program code of the shader to identify dependencies between the blocks of program code if there is sufficient work; subdividing the shader into the two or more blocks in accordance with the identified dependencies; and individually executing the two or more blocks of code on a graphics processor. In addition, one embodiment includes the operations of determining whether any of the regions that can be subdivided are likely to run faster with different machine configurations than if the shader is executed without being subdivided, and subdividing the shader only for those regions that are likely to run faster with different machine configurations.
Type: Grant
Filed: December 29, 2017
Date of Patent: July 23, 2019
Assignee: Intel Corporation
Inventors: John G. Gierach, Travis Schluessler, Thomas F. Raoux, Peng Guo
-
Publication number: 20190206110
Abstract: An apparatus and method for splitting shaders. For example, one embodiment of a method comprises: receiving a request for compilation of a shader in a graphics processing environment; determining whether there is sufficient work associated with the shader to justify splitting the shader into two or more blocks of program code; evaluating the program code of the shader to identify dependencies between the blocks of program code if there is sufficient work; subdividing the shader into the two or more blocks in accordance with the identified dependencies; and individually executing the two or more blocks of code on a graphics processor. In addition, one embodiment includes the operations of determining whether any of the regions that can be subdivided are likely to run faster with different machine configurations than if the shader is executed without being subdivided, and subdividing the shader only for those regions that are likely to run faster with different machine configurations.
Type: Application
Filed: December 29, 2017
Publication date: July 4, 2019
Inventors: JOHN G. GIERACH, TRAVIS SCHLUESSLER, THOMAS F. RAOUX, PENG GUO
-
Patent number: 10318292
Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
Type: Grant
Filed: November 17, 2014
Date of Patent: June 11, 2019
Assignee: Intel Corporation
Inventors: Satyajit Sarangi, Thomas F. Raoux, Guei-Yuan Lueh, Subramaniam Maiyuran
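A common way to picture replacing many atomic adds with one is the following software model. It is a sketch only: the `Counter` class and both function names are hypothetical, and the exact instruction semantics in the patent may differ. Each lane wants the value the counter held just before its own add. The coalesced version gets the same answers from a prefix sum across lanes, one atomic add of the total, and a broadcast of the counter's old value.

```python
from itertools import accumulate


class Counter:
    """Hypothetical shared memory cell that also counts atomic operations."""

    def __init__(self, value=0):
        self.value = value
        self.atomic_ops = 0

    def atomic_add(self, amount):
        """Add `amount` and return the value held before the add."""
        old = self.value
        self.value += amount
        self.atomic_ops += 1
        return old


def per_lane_atomics(counter, lane_values):
    """Baseline: one atomic add per processing lane."""
    return [counter.atomic_add(v) for v in lane_values]


def coalesced_atomic(counter, lane_values):
    """One atomic add for all lanes, using a prefix sum plus a broadcast."""
    inclusive = list(accumulate(lane_values))   # accumulation step
    total = inclusive[-1] if inclusive else 0
    old = counter.atomic_add(total)             # the single atomic operation
    # Broadcast `old` to every lane and add the sum of the lanes before it.
    return [old + inc - v for inc, v in zip(inclusive, lane_values)]


a, b = Counter(100), Counter(100)
print(per_lane_atomics(a, [3, 1, 4, 1]), a.atomic_ops)
print(coalesced_atomic(b, [3, 1, 4, 1]), b.atomic_ops)
```

Both calls return the same per-lane results and leave the counter at the same final value, but the second issues one atomic operation instead of four.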
-
Publication number: 20190087998
Abstract: Various embodiments enable low frequency calculation of derived uniform values. A compiler can identify one or more portions of a shader that calculate a derived value based on an input value. For example, this portion may include instructions that use constant values, or the results of prior functions that used constant values. The constant values may include hardcoded values provided by the program (e.g., immediates) and/or other constant values. This portion of the shader can be extracted by the compiler and compiled into a first program. The compiler can compile the remainder of the shader into a second program that receives the derived uniform values from the first program. By extracting the portion(s) of the program that calculates a derived value into a separate program, the derived uniform value or values can be calculated at a lower frequency than if they were calculated for each pixel.
Type: Application
Filed: September 15, 2017
Publication date: March 21, 2019
Inventors: Travis T. SCHLUESSLER, Aleksander NEYMAN, Guei-Yuan LUEH, Thomas F. RAOUX, Bartosz SPITZBARTH
-
Publication number: 20190066256
Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
Type: Application
Filed: October 25, 2018
Publication date: February 28, 2019
Applicant: INTEL CORPORATION
Inventors: SAURABH SHARMA, ABHISHEK VENKATESH, TRAVIS T. SCHLUESSLER, THOMAS F. RAOUX, RAHUL P. SATHE, JON HASSELGREN
-
Patent number: 10140678
Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
Type: Grant
Filed: April 1, 2016
Date of Patent: November 27, 2018
Assignee: INTEL CORPORATION
Inventors: Saurabh Sharma, Abhishek Venkatesh, Travis T. Schluessler, Thomas F. Raoux, Rahul P. Sathe, Jon Hasselgren
-
Publication number: 20180005345
Abstract: Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.
Type: Application
Filed: July 1, 2016
Publication date: January 4, 2018
Applicant: Intel Corporation
Inventors: Michael Apodaca, David M. Cimini, Thomas F. Raoux, Somnath Ghosh, Uddipan Mukherjee, Debraj Bose, Sthiti Deka, Yohai Gevim
-
Publication number: 20170178277
Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
Type: Application
Filed: April 1, 2016
Publication date: June 22, 2017
Inventors: SAURABH SHARMA, ABHISHEK VENKATESH, TRAVIS T. SCHLUESSLER, THOMAS F. RAOUX, RAHUL P. SATHE, JON HASSELGREN
-
Publication number: 20170178384
Abstract: Reducing SIMD fragmentation for SIMD execution widths of 32 or even 64 channels in a single hardware thread leads to better EU utilization. Increasing SIMD execution widths to 32 or 64 channels per thread, enables handling more vertices, patches, primitives and triangles per EU hardware thread. Modified 3D pipeline shader payloads can handle multiple patches in case of domain shaders or multiple primitives when primitive object instance count is greater than one in the case of geometry shaders and multiple triangles in case of pixel shaders.
Type: Application
Filed: December 21, 2015
Publication date: June 22, 2017
Inventors: Jayashree Venkatesh, Gang Chen, Thomas F. Raoux, Guei-Yuan Lueh, Subramaniam Maiyuran
-
Patent number: 9632979
Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
Type: Grant
Filed: June 1, 2015
Date of Patent: April 25, 2017
Assignee: Intel Corporation
Inventors: Satyajit Sarangi, Thomas F. Raoux
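A prefix sum built from rounds of simultaneous additions can be sketched with the well-known Hillis-Steele scan, shown here as a stand-in. The patent's register regioning scheme is not reproduced, and the function name `simd_prefix_sum` is an assumption. Each loop iteration models one SIMD step in which all lanes add at once, so later steps add intermediate results to other intermediate results.

```python
def simd_prefix_sum(values):
    """Inclusive prefix sum using about log2(n) lane-parallel addition steps."""
    lanes = list(values)
    stride = 1
    while stride < len(lanes):
        # One modeled SIMD step: every lane at index >= stride adds the value
        # `stride` positions to its left. All lanes read the previous state,
        # which the list comprehension guarantees by building a new list.
        lanes = [
            lanes[i] + lanes[i - stride] if i >= stride else lanes[i]
            for i in range(len(lanes))
        ]
        stride *= 2
    return lanes


print(simd_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))
# prints [1, 3, 6, 10, 15, 21, 28, 36]
```

For eight elements this takes three steps (strides 1, 2, and 4) rather than seven sequential additions, at the cost of more total additions.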
-
Publication number: 20160364828
Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
Type: Application
Filed: June 12, 2015
Publication date: December 15, 2016
Applicant: INTEL CORPORATION
Inventors: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S.B. Harihara
-
Publication number: 20160350262
Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
Type: Application
Filed: June 1, 2015
Publication date: December 1, 2016
Inventors: SATYAJIT SARANGI, THOMAS F. RAOUX
-
Publication number: 20160139934
Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
Type: Application
Filed: November 17, 2014
Publication date: May 19, 2016
Applicant: Intel Corporation
Inventors: SATYAJIT SARANGI, THOMAS F. RAOUX, GUEI-YUAN LUEH, SUBRAMANIAM MAIYURAN