Patents by Inventor Marek Targowski

Marek Targowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11029960
    Abstract: Apparatus and method for widened SIMD execution on a limited register file. For example, one embodiment of an apparatus comprises: instruction dispatch circuitry to dispatch instructions of a thread for execution, including a first instruction to indicate a start of a double execution instruction sequence and a second instruction to indicate an end of a double execution instruction sequence; and execution circuitry including single instruction multiple data (SIMD) circuitry, the execution circuitry to execute the double execution instruction sequence in a first pass using a first set of lanes of the SIMD circuitry and to execute the double execution instruction sequence in a second pass following the first pass using a second set of lanes of the SIMD circuitry.
    Type: Grant
    Filed: December 7, 2018
    Date of Patent: June 8, 2021
    Assignee: Intel Corporation
    Inventors: Marek Targowski, Konrad Trifunović
  • Patent number: 10956359
    Abstract: A mechanism is described for facilitating smart spill/fill data transfers in computing environments. A method of embodiments, as described herein, includes facilitating dividing a kernel into regions including low pressure regions and high pressure regions, where the low pressure regions are associated with low use of registers hosted by a processor of a computing device, while the high pressure regions are associated with high use of the registers. The method may further include transferring of data between memory and the registers based on at least one of the low pressure regions and the high pressure regions.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: March 23, 2021
    Assignee: Intel Corporation
    Inventors: Marek Targowski, Konrad Trifunovic
  • Publication number: 20200183697
    Abstract: Apparatus and method for widened SIMD execution on a limited register file. For example, one embodiment of an apparatus comprises: instruction dispatch circuitry to dispatch instructions of a thread for execution, including a first instruction to indicate a start of a double execution instruction sequence and a second instruction to indicate an end of a double execution instruction sequence; and execution circuitry including single instruction multiple data (SIMD) circuitry, the execution circuitry to execute the double execution instruction sequence in a first pass using a first set of lanes of the SIMD circuitry and to execute the double execution instruction sequence in a second pass following the first pass using a second set of lanes of the SIMD circuitry.
    Type: Application
    Filed: December 7, 2018
    Publication date: June 11, 2020
    Inventors: Marek Targowski, Konrad Trifunovic
  • Publication number: 20190286430
    Abstract: Apparatus and method for optimizing shader execution. For example, one embodiment of a graphics processing apparatus comprises: a plurality of execution units to execute shader programs; optimization detection circuitry and/or logic to identify one or more portions of shader program code to be optimized including one or more reduction operations which require read/write memory operations and associated barrier operations; and optimization circuitry and/or logic to optimize the shader program code by converting a plurality of the read/write memory operations to read/write register operations and removing one or more barrier operations to generate optimized shader program code; the execution units to execute the optimized shader program code.
    Type: Application
    Filed: March 15, 2018
    Publication date: September 19, 2019
    Inventor: Marek Targowski
  • Patent number: 10409571
    Abstract: Apparatus and method for optimizing shader execution. For example, one embodiment of a graphics processing apparatus comprises: a plurality of execution units to execute shader programs; optimization detection circuitry and/or logic to identify one or more portions of shader program code to be optimized including one or more reduction operations which require read/write memory operations and associated barrier operations; and optimization circuitry and/or logic to optimize the shader program code by converting a plurality of the read/write memory operations to read/write register operations and removing one or more barrier operations to generate optimized shader program code; the execution units to execute the optimized shader program code.
    Type: Grant
    Filed: March 15, 2018
    Date of Patent: September 10, 2019
    Assignee: Intel Corporation
    Inventor: Marek Targowski
  • Publication number: 20190057061
    Abstract: A mechanism is described for facilitating smart spill/fill data transfers in computing environments. A method of embodiments, as described herein, includes facilitating dividing a kernel into regions including low pressure regions and high pressure regions, where the low pressure regions are associated with low use of registers hosted by a processor of a computing device, while the high pressure regions are associated with high use of the registers. The method may further include transferring of data between memory and the registers based on at least one of the low pressure regions and the high pressure regions.
    Type: Application
    Filed: August 16, 2017
    Publication date: February 21, 2019
    Applicant: Intel Corporation
    Inventors: Marek Targowski, Konrad Trifunovic
  • Patent number: 9652300
    Abstract: Systems and methods for the processing of EU threads (also known as warps) in a thread group. The status of each EU thread in the group may be monitored, to determine if it is executing or if it is halted and waiting at a synchronization barrier. If certain threshold conditions are met, the waiting EU threads may be preempted to allow execution of threads from another thread group. The threshold conditions may include a minimum number of EUs in use, a minimum number of EU threads in the first thread group that are waiting at the synchronization barrier and/or a maximum number of EU threads that are still executing, and a minimum wait time for one or more of the EU threads waiting at the barrier.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: May 16, 2017
    Assignee: Intel Corporation
    Inventor: Marek Targowski
  • Patent number: 9449360
    Abstract: Methods and apparatuses to reduce the number of sequential operations such as atomic operations in an application to be performed on a shared memory cell may be provided. A translation unit can detect in the application multiple atomic operations to be performed on the same memory and replace the multiple atomic operations with an equivalent single atomic operation. In some implementations, the application includes shader code. In some implementations, each of the multiple atomic operations increments a value stored at the same memory by an update amount. The translation unit may calculate the partial prefix sum over all the atomic operations and replace the multiple atomic operations with a single atomic operation to increment the value stored at memory by the sum of the update amounts.
    Type: Grant
    Filed: December 28, 2011
    Date of Patent: September 20, 2016
    Assignee: Intel Corporation
    Inventors: Tomasz Janczak, Marek Targowski
  • Publication number: 20140198110
    Abstract: Methods and apparatuses to reduce the number of sequential operations such as atomic operations in an application to be performed on a shared memory cell may be provided. A translation unit can detect in the application multiple atomic operations to be performed on the same memory and replace the multiple atomic operations with an equivalent single atomic operation. In some implementations, the application includes shader code. In some implementations, each of the multiple atomic operations increments a value stored at the same memory by an update amount. The translation unit may calculate the partial prefix sum over all the atomic operations and replace the multiple atomic operations with a single atomic operation to increment the value stored at memory by the sum of the update amounts.
    Type: Application
    Filed: December 28, 2011
    Publication date: July 17, 2014
    Inventors: Tomasz Janczak, Marek Targowski
  • Publication number: 20140007111
    Abstract: Systems and methods for the processing of EU threads (also known as warps) in a thread group. The status of each EU thread in the group may be monitored, to determine if it is executing or if it is halted and waiting at a synchronization barrier. If certain threshold conditions are met, the waiting EU threads may be preempted to allow execution of threads from another thread group. The threshold conditions may include a minimum number of EUs in use, a minimum number of EU threads in the first thread group that are waiting at the synchronization barrier and/or a maximum number of EU threads that are still executing, and a minimum wait time for one or more of the EU threads waiting at the barrier.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 2, 2014
    Inventor: Marek Targowski
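
Illustrative sketch for patent 9449360 (atomic operation merging): the abstract describes replacing several atomic increments on the same memory cell with one atomic increment by the sum of the update amounts, using partial prefix sums. The Python below is only a rough model of that idea, not the patented translation unit. The function names, the one-element list standing in for the shared memory cell, and the use of prefix sums to rebuild each operation's "old value" result are all assumptions made for illustration.

```python
from itertools import accumulate


def naive_atomic_adds(cell, updates):
    """Baseline: one simulated atomic add per update.

    `cell` is a one-element list standing in for a shared memory cell
    (an assumption of this sketch). Returns the value each add observed
    before it ran, plus the number of atomic operations issued.
    """
    old_values = []
    for amount in updates:
        old_values.append(cell[0])  # value seen by this atomic add
        cell[0] += amount           # the simulated atomic update
    return old_values, len(updates)


def merged_atomic_add(cell, updates):
    """Merged form: a single simulated atomic add by the total.

    The exclusive prefix sums give each original operation its offset
    from the base value, so the per-operation results can be rebuilt
    without issuing more atomics.
    """
    if not updates:
        return [], 0
    prefix = list(accumulate(updates, initial=0))  # [0, u0, u0+u1, ...]
    offsets, total = prefix[:-1], prefix[-1]
    base = cell[0]
    cell[0] += total                # the only simulated atomic update
    return [base + off for off in offsets], 1
```

Both functions leave the cell with the same final value and report the same per-operation results; the merged form issues one update instead of one per operation. A real implementation would rely on hardware atomics and would have to account for operations issued by other threads.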
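
Illustrative sketch for patent 9652300 (preempting EU threads waiting at a barrier): the abstract lists threshold conditions under which waiting EU threads may be preempted. The Python below shows one way such a check could be written. The class names, field names, and cycle-based wait time are hypothetical, and requiring all four conditions at once is an assumption, since the abstract joins some of them with "and/or".

```python
from dataclasses import dataclass


@dataclass
class GroupStatus:
    """Observed state of one thread group (hypothetical fields)."""
    eus_in_use: int
    waiting_threads: int      # EU threads halted at the barrier
    executing_threads: int    # EU threads still running
    longest_wait_cycles: int  # wait time of the longest-waiting thread


@dataclass
class Thresholds:
    """Tunable limits corresponding to the conditions in the abstract."""
    min_eus_in_use: int
    min_waiting: int
    max_executing: int
    min_wait_cycles: int


def should_preempt(status, limits):
    """Return True when every threshold condition holds.

    Assumption: all four conditions must be met together; the abstract
    also allows other combinations.
    """
    return (
        status.eus_in_use >= limits.min_eus_in_use
        and status.waiting_threads >= limits.min_waiting
        and status.executing_threads <= limits.max_executing
        and status.longest_wait_cycles >= limits.min_wait_cycles
    )
```

With limits such as `Thresholds(8, 6, 2, 1000)`, a group with 7 threads waiting and 1 still executing for over 1000 cycles on 8 busy EUs would qualify for preemption, while the same group early in its wait would not.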