Patents by Inventor THOMAS F. RAOUX

THOMAS F. RAOUX has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11776195
    Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
    Type: Grant
    Filed: August 31, 2021
    Date of Patent: October 3, 2023
    Assignee: Intel Corporation
    Inventors: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux
  • Publication number: 20220068005
    Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
    Type: Application
    Filed: August 31, 2021
    Publication date: March 3, 2022
    Inventors: John G. GIERACH, Karthik VAIDYANATHAN, Thomas F. RAOUX
  • Patent number: 11107263
    Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: August 31, 2021
    Assignee: Intel Corporation
    Inventors: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux
  • Patent number: 10796397
    Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
    Type: Grant
    Filed: June 12, 2015
    Date of Patent: October 6, 2020
    Assignee: INTEL CORPORATION
    Inventors: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S. B. Harihara
  • Patent number: 10726605
    Abstract: Various embodiments enable low frequency calculation of derived uniform values. A compiler can identify one or more portions of a shader that calculate a derived value based on an input value. For example, this portion may include instructions that use constant values, or the results of prior functions that used constant values. The constant values may include hardcoded values provided by the program (e.g., immediates) and/or other constant values. This portion of the shader can be extracted by the compiler and compiled into a first program. The compiler can compile the remainder of the shader into a second program that receives the derived uniform values from the first program. By extracting the portion(s) of the program that calculates a derived value into a separate program, the derived uniform value or values can be calculated at a lower frequency than if they were calculated for each pixel.
    Type: Grant
    Filed: September 15, 2017
    Date of Patent: July 28, 2020
    Assignee: Intel Corporation
    Inventors: Travis T. Schluessler, Aleksander Neyman, Guei-Yuan Lueh, Thomas F. Raoux, Bartosz Spitzbarth
  • Publication number: 20200151936
    Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
    Type: Application
    Filed: November 13, 2018
    Publication date: May 14, 2020
    Inventors: John G. GIERACH, Karthik VAIDYANATHAN, Thomas F. RAOUX
  • Patent number: 10552934
    Abstract: Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: February 4, 2020
    Assignee: Intel Corporation
    Inventors: Michael Apodaca, David M. Cimini, Thomas F. Raoux, Somnath Ghosh, Uddipan Mukherjee, Debraj Bose, Sthiti Deka, Yohai Gevim
  • Patent number: 10360717
    Abstract: An apparatus and method for splitting shaders. For example, one embodiment of a method comprises: receiving a request for compilation of a shader in a graphics processing environment; determining whether there is sufficient work associated with the shader to justify splitting the shader into two or more blocks of program code; evaluating the program code of the shader to identify dependencies between the blocks of program code if there is sufficient work; subdividing the shader into the two or more blocks in accordance with the identified dependencies; and individually executing the two or more blocks of code on a graphics processor. In addition, one embodiment includes the operations of determining whether any of the regions that can be subdivided are likely to run faster with different machine configurations than if the shader is executed without being subdivided, and subdividing the shader only for those regions that are likely to run faster with different machine configurations.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: July 23, 2019
    Assignee: Intel Corporation
    Inventors: John G. Gierach, Travis Schluessler, Thomas F. Raoux, Peng Guo
  • Publication number: 20190206110
    Abstract: An apparatus and method for splitting shaders. For example, one embodiment of a method comprises: receiving a request for compilation of a shader in a graphics processing environment; determining whether there is sufficient work associated with the shader to justify splitting the shader into two or more blocks of program code; evaluating the program code of the shader to identify dependencies between the blocks of program code if there is sufficient work; subdividing the shader into the two or more blocks in accordance with the identified dependencies; and individually executing the two or more blocks of code on a graphics processor. In addition, one embodiment includes the operations of determining whether any of the regions that can be subdivided are likely to run faster with different machine configurations than if the shader is executed without being subdivided, and subdividing the shader only for those regions that are likely to run faster with different machine configurations.
    Type: Application
    Filed: December 29, 2017
    Publication date: July 4, 2019
    Inventors: JOHN G. GIERACH, TRAVIS SCHLUESSLER, THOMAS F. RAOUX, PENG GUO
  • Patent number: 10318292
    Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
    Type: Grant
    Filed: November 17, 2014
    Date of Patent: June 11, 2019
    Assignee: Intel Corporation
    Inventors: Satyajit Sarangi, Thomas F. Raoux, Guei-Yuan Lueh, Subramaniam Maiyuran
  • Publication number: 20190087998
    Abstract: Various embodiments enable low frequency calculation of derived uniform values. A compiler can identify one or more portions of a shader that calculate a derived value based on an input value. For example, this portion may include instructions that use constant values, or the results of prior functions that used constant values. The constant values may include hardcoded values provided by the program (e.g., immediates) and/or other constant values. This portion of the shader can be extracted by the compiler and compiled into a first program. The compiler can compile the remainder of the shader into a second program that receives the derived uniform values from the first program. By extracting the portion(s) of the program that calculates a derived value into a separate program, the derived uniform value or values can be calculated at a lower frequency than if they were calculated for each pixel.
    Type: Application
    Filed: September 15, 2017
    Publication date: March 21, 2019
    Inventors: Travis T. SCHLUESSLER, Aleksander NEYMAN, Guei-Yuan LUEH, Thomas F. RAOUX, Bartosz SPITZBARTH
  • Publication number: 20190066256
    Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine an instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
    Type: Application
    Filed: October 25, 2018
    Publication date: February 28, 2019
    Applicant: INTEL CORPORATION
    Inventors: SAURABH SHARMA, ABHISHEK VENKATESH, TRAVIS T. SCHLUESSLER, THOMAS F. RAOUX, RAHUL P. SATHE, JON HASSELGREN
  • Patent number: 10140678
    Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine an instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: November 27, 2018
    Assignee: INTEL CORPORATION
    Inventors: Saurabh Sharma, Abhishek Venkatesh, Travis T. Schluessler, Thomas F. Raoux, Rahul P. Sathe, Jon Hasselgren
  • Publication number: 20180005345
    Abstract: Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: July 1, 2016
    Publication date: January 4, 2018
    Applicant: Intel Corporation
    Inventors: Michael Apodaca, David M. Cimini, Thomas F. Raoux, Somnath Ghosh, Uddipan Mukherjee, Debraj Bose, Sthiti Deka, Yohai Gevim
  • Publication number: 20170178384
    Abstract: Reducing SIMD fragmentation for SIMD execution widths of 32 or even 64 channels in a single hardware thread leads to better EU utilization. Increasing SIMD execution widths to 32 or 64 channels per thread enables handling more vertices, patches, primitives and triangles per EU hardware thread. Modified 3D pipeline shader payloads can handle multiple patches in the case of domain shaders, multiple primitives when the primitive object instance count is greater than one in the case of geometry shaders, and multiple triangles in the case of pixel shaders.
    Type: Application
    Filed: December 21, 2015
    Publication date: June 22, 2017
    Inventors: Jayashree Venkatesh, Gang Chen, Thomas F. Raoux, Guei-Yuan Lueh, Subramaniam Maiyuran
  • Publication number: 20170178277
    Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine an instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
    Type: Application
    Filed: April 1, 2016
    Publication date: June 22, 2017
    Inventors: SAURABH SHARMA, ABHISHEK VENKATESH, TRAVIS T. SCHLUESSLER, THOMAS F. RAOUX, RAHUL P. SATHE, JON HASSELGREN
  • Patent number: 9632979
    Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
    Type: Grant
    Filed: June 1, 2015
    Date of Patent: April 25, 2017
    Assignee: Intel Corporation
    Inventors: Satyajit Sarangi, Thomas F. Raoux
  • Publication number: 20160364828
    Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
    Type: Application
    Filed: June 12, 2015
    Publication date: December 15, 2016
    Applicant: INTEL CORPORATION
    Inventors: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S. B. Harihara
  • Publication number: 20160350262
    Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
    Type: Application
    Filed: June 1, 2015
    Publication date: December 1, 2016
    Inventors: SATYAJIT SARANGI, THOMAS F. RAOUX
  • Publication number: 20160139934
    Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
    Type: Application
    Filed: November 17, 2014
    Publication date: May 19, 2016
    Applicant: Intel Corporation
    Inventors: SATYAJIT SARANGI, THOMAS F. RAOUX, GUEI-YUAN LUEH, SUBRAMANIAM MAIYURAN