Patents by Inventor Noam Itzhaki
Noam Itzhaki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230281104
Abstract: Disclosed examples include generating instrumented code by inserting profiling instructions at insertion points in code; outputting the instrumented code for execution by second programmable circuitry; and accessing profiling data generated by the second programmable circuitry based on the instrumented code.
Type: Application
Filed: May 12, 2023
Publication date: September 7, 2023
Inventors: Konstantin Levit-Gurevich, Aleksey Alekseev, Michael Berezalsky, Sion Berkowits, Julia Fedorova, Anton V. Gorshkov, Sunpyo Hong, Noam Itzhaki, Arik Narkis
-
Patent number: 11694299
Abstract: Embodiments are disclosed for emulation of graphics processing unit instructions. An example method includes executing an instrumented kernel using a logic circuit, the instrumented kernel including an emulation sequence; saving, in response to a determination that the emulation sequence is to be executed, source data to a shared memory; setting an emulation request flag to indicate to processor circuitry separate from the logic circuit that offloaded execution of the emulation sequence is to be executed; monitoring the emulation request flag to determine whether the offloaded execution of the emulation sequence is complete; and accessing resulting data from the shared memory.
Type: Grant
Filed: September 24, 2021
Date of Patent: July 4, 2023
Assignee: INTEL CORPORATION
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
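The flag-based offload described in this abstract can be pictured with a small host-side sketch. This is only an illustration under stated assumptions, not the patented implementation: two Python threads stand in for the logic circuit and the separate processor circuitry, and the names `SharedMemory`, `host_emulator`, and `offload` are hypothetical. A cleared flag is assumed to signal completion.

```python
import threading
import time


class SharedMemory:
    """Hypothetical stand-in for memory visible to both sides."""

    def __init__(self):
        self.source = None                # source data saved by the kernel side
        self.result = None                # resulting data written by the host side
        self.request = threading.Event()  # emulation request flag


def host_emulator(shm, emulate):
    """Host side: wait for a request, emulate, publish the result, clear the flag."""
    shm.request.wait()
    shm.result = emulate(shm.source)
    shm.request.clear()  # assumption: a cleared flag means the offload is complete


def offload(shm, data, poll_s=0.001):
    """Kernel side: save source data, set the flag, poll it, read the result."""
    shm.source = data
    shm.request.set()
    while shm.request.is_set():  # monitor the flag until the host clears it
        time.sleep(poll_s)
    return shm.result
```

In this sketch the result is written before the flag is cleared, so the polling side never reads stale data once it observes the cleared flag.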
-
Patent number: 11650902
Abstract: Disclosed examples to perform instruction-level graphics processing unit (GPU) profiling based on binary instrumentation include: accessing, via a GPU driver executed by a processor, binary code generated by a GPU compiler based on application programming interface (API)-based code provided by an application; accessing, via the GPU driver executed by the processor, instrumented binary code, the instrumented binary code generated by a binary instrumentation module that inserts profiling instructions in the binary code based on an instrumentation schema provided by a profiling application; and providing, via the GPU driver executed by the processor, the instrumented binary code from the GPU driver to a GPU, the instrumented binary code structured to cause the GPU to collect and store profiling data in a memory based on the profiling instructions while executing the instrumented binary code.
Type: Grant
Filed: November 8, 2017
Date of Patent: May 16, 2023
Assignee: Intel Corporation
Inventors: Konstantin Levit-Gurevich, Aleksey Alekseev, Michael Berezalsky, Sion Berkowits, Julia Fedorova, Anton V. Gorshkov, Sunpyo Hong, Noam Itzhaki, Arik Narkis
-
Publication number: 20220012844
Abstract: Embodiments are disclosed for emulation of graphics processing unit instructions. An example method includes executing an instrumented kernel using a logic circuit, the instrumented kernel including an emulation sequence; saving, in response to a determination that the emulation sequence is to be executed, source data to a shared memory; setting an emulation request flag to indicate to processor circuitry separate from the logic circuit that offloaded execution of the emulation sequence is to be executed; monitoring the emulation request flag to determine whether the offloaded execution of the emulation sequence is complete; and accessing resulting data from the shared memory.
Type: Application
Filed: September 24, 2021
Publication date: January 13, 2022
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
-
Patent number: 11132761
Abstract: Embodiments are disclosed for emulation of graphics processing unit instructions. An example apparatus includes a kernel accessor to access an instruction of an original GPU kernel, the original GPU kernel intended to be executed at a first GPU. An instruction support determiner is to determine whether execution of the instruction is supported by a second GPU different from the first GPU. An instruction modifier is to, in response to determining that the execution of the instruction is not supported by the second GPU, create an instrumented GPU kernel based on the original GPU kernel. The instrumented GPU kernel includes an emulation sequence. The emulation sequence is to, when executed by the second GPU, cause the second GPU to emulate execution of the instruction by the first GPU.
Type: Grant
Filed: February 6, 2020
Date of Patent: September 28, 2021
Assignee: INTEL CORPORATION
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
-
Patent number: 11120521
Abstract: Techniques and apparatus for profiling graphics processing unit (GPU) processes using binary instrumentation are described. In one embodiment, for example, an apparatus may include at least one memory comprising instructions and a processor coupled to the at least one memory. The processor may execute the instructions to implement a profiling process to profile a graphics processing unit (GPU) application being executed via a GPU, the profiling process to perform an instrumentation phase to determine an operating process being executed via the GPU and to generate instrumented binary code for the operating process, perform an execution phase to collect profiling data for a command of the operating process, and perform a completion phase for a profiling application executed via the processor to read the profiling data. Other embodiments are described.
Type: Grant
Filed: December 28, 2018
Date of Patent: September 14, 2021
Assignee: INTEL CORPORATION
Inventors: Orr Goldman, Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
-
Publication number: 20210192674
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve operation of a graphics processing unit (GPU). An example apparatus includes an instruction generator to insert profiling instructions into a GPU kernel to generate an instrumented GPU kernel, the instrumented GPU kernel is to be executed by a GPU, a trace analyzer to generate an occupancy map associated with the GPU executing the instrumented GPU kernel, a parameter calculator to determine one or more operating parameters of the GPU based on the occupancy map, and a processor optimizer to invoke a GPU driver to adjust a workload of the GPU based on the one or more operating parameters.
Type: Application
Filed: November 12, 2020
Publication date: June 24, 2021
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
-
Patent number: 10922779
Abstract: Techniques and apparatus for profiling graphics processing unit (GPU) processes using binary instrumentation are described. In one embodiment, for example, an apparatus may include at least one memory comprising instructions and a processor coupled to the at least one memory. The processor may execute the instructions to determine a plurality of profiling modes for profiling an operating process of a graphics processing unit (GPU) application, access original binary code for the GPU application, and generate a multi-mode instrumented binary code comprising a plurality of instrumentation modes, each of the plurality of instrumentation modes corresponding to at least one of the plurality of profiling modes. Other embodiments are described.
Type: Grant
Filed: December 28, 2018
Date of Patent: February 16, 2021
Assignee: INTEL CORPORATION
Inventors: Orr Goldman, Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
-
Patent number: 10867362
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve operation of a graphics processing unit (GPU). An example apparatus includes an instruction generator to insert profiling instructions into a GPU kernel to generate an instrumented GPU kernel, the instrumented GPU kernel is to be executed by a GPU, a trace analyzer to generate an occupancy map associated with the GPU executing the instrumented GPU kernel, a parameter calculator to determine one or more operating parameters of the GPU based on the occupancy map, and a processor optimizer to invoke a GPU driver to adjust a workload of the GPU based on the one or more operating parameters.
Type: Grant
Filed: September 12, 2018
Date of Patent: December 15, 2020
Assignee: INTEL CORPORATION
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
-
Publication number: 20200279348
Abstract: Embodiments are disclosed for emulation of graphics processing unit instructions. An example apparatus includes a kernel accessor to access an instruction of an original GPU kernel, the original GPU kernel intended to be executed at a first GPU. An instruction support determiner is to determine whether execution of the instruction is supported by a second GPU different from the first GPU. An instruction modifier is to, in response to determining that the execution of the instruction is not supported by the second GPU, create an instrumented GPU kernel based on the original GPU kernel. The instrumented GPU kernel includes an emulation sequence. The emulation sequence is to, when executed by the second GPU, cause the second GPU to emulate execution of the instruction by the first GPU.
Type: Application
Filed: February 6, 2020
Publication date: September 3, 2020
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
-
Patent number: 10559057
Abstract: Embodiments are disclosed for emulation of graphics processing unit instructions. An example apparatus includes a kernel accessor to access an instruction of an original GPU kernel, the original GPU kernel intended to be executed at a first GPU. An instruction support determiner is to determine whether execution of the instruction is supported by a second GPU different from the first GPU. An instruction modifier is to, in response to determining that the execution of the instruction is not supported by the second GPU, create an instrumented GPU kernel based on the original GPU kernel. The instrumented GPU kernel includes an emulation sequence. The emulation sequence is to, when executed by the second GPU, cause the second GPU to emulate execution of the instruction by the first GPU.
Type: Grant
Filed: September 27, 2018
Date of Patent: February 11, 2020
Assignee: INTEL CORPORATION
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman
-
Patent number: 10504492
Abstract: An apparatus for generating dynamic trace data of binary code running on one or more execution units of a Graphics Processing Unit (GPU) through binary instrumentation is presented. In embodiments, the apparatus may include an input interface disposed in the GPU to receive instrumented binary code and communication data, and an output interface disposed in the GPU, and coupled to a memory of a computer hosting the GPU. In embodiments, the memory may be further coupled to the input interface and a Central Processing Unit (CPU) of the computer, the memory having a trace buffer and a control buffer, the control buffer including an overflow flag of the trace buffer. In embodiments, the apparatus may further include an execution unit (EU) disposed in the GPU and coupled to the input interface and to the output interface, to conditionally execute the instrumented binary code and generate dynamic trace data when the overflow flag is not set to indicate an overflow condition.
Type: Grant
Filed: May 4, 2018
Date of Patent: December 10, 2019
Assignee: Intel Corporation
Inventors: Sunpyo Hong, Konstantin Levit-Gurevich, Michael Berezalsky, Arik Narkis, Noam Itzhaki
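The overflow-flag gating in this abstract can be sketched in a few lines. The following is a minimal single-threaded model, assuming a fixed-capacity trace buffer and a host that clears the flag when it drains the buffer; the class `TraceChannel` and its methods are hypothetical names, and no hardware interface is modeled.

```python
class TraceChannel:
    """Hypothetical model of a trace buffer plus a control-buffer overflow flag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.trace = []        # trace buffer
        self.overflow = False  # overflow flag kept in the control buffer

    def emit(self, record):
        """Instrumented-code side: write a record only while the flag is clear."""
        if self.overflow:
            return False       # tracing is skipped during an overflow condition
        self.trace.append(record)
        if len(self.trace) >= self.capacity:
            self.overflow = True  # buffer is full; later records are dropped
        return True

    def drain(self):
        """Host side (assumed): read out the records and clear the flag."""
        records, self.trace = self.trace, []
        self.overflow = False
        return records
```

With a capacity of 2, a third call to `emit` returns `False` until `drain` has been called, after which tracing resumes.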
-
Patent number: 10467118
Abstract: Techniques and apparatus for performance analysis of a program are described. In one embodiment, for example, an apparatus may include at least one memory, and logic, at least a portion of which is comprised in hardware coupled to the at least one memory, to access a program for performance analysis, the program comprising at least one producer instruction and at least one consumer instruction for the at least one producer instruction, and generate an analysis program based on the program, the analysis program comprising a stall time instruction set to determine a stall time of the at least one producer instruction, the stall time instruction set comprising a first time stamp instruction immediately preceding a consumer instruction, a second time stamp instruction immediately following the consumer instruction, and a stall time instruction to determine the stall time as the difference between the second time stamp and the first time stamp. Other embodiments are described and claimed.
Type: Grant
Filed: September 28, 2017
Date of Patent: November 5, 2019
Assignee: INTEL CORPORATION
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
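The stall-time measurement in this abstract amounts to bracketing each consumer instruction with two time stamps and subtracting them. A toy sketch follows, assuming a simulated cycle counter in which a producer issues asynchronously and its consumer blocks until the result is ready; `Instr`, `instrument`, and `run` are hypothetical names, and time-stamp reads are assumed to cost zero cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    op: str                    # "produce", "consume", "timestamp", "stall_time", or other
    reg: Optional[str] = None  # register (or time-stamp slot) the instruction refers to


def instrument(program):
    """Wrap each consumer with two time-stamp reads followed by a subtraction."""
    out = []
    for ins in program:
        if ins.op == "consume":
            out.append(Instr("timestamp", "t1"))  # immediately preceding the consumer
            out.append(ins)
            out.append(Instr("timestamp", "t2"))  # immediately following the consumer
            out.append(Instr("stall_time"))       # records t2 - t1
        else:
            out.append(ins)
    return out


def run(program, latency):
    """Toy interpreter: returns the stall time measured at each consumer."""
    clock, ready_at, stamps, stalls = 0, {}, {}, []
    for ins in program:
        if ins.op == "produce":
            ready_at[ins.reg] = clock + latency   # result arrives `latency` cycles later
            clock += 1
        elif ins.op == "consume":
            clock = max(clock, ready_at.get(ins.reg, 0)) + 1  # wait, then execute
        elif ins.op == "timestamp":
            stamps[ins.reg] = clock
        elif ins.op == "stall_time":
            stalls.append(stamps["t2"] - stamps["t1"])
        else:
            clock += 1
    return stalls
```

In this model the measured difference includes the one cycle the consumer itself takes, so a consumer whose input is already available reports a stall time of 1 rather than 0.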
-
Patent number: 10459705
Abstract: Systems, apparatuses and methods may provide for technology that receives compiled code and identifies a plurality of blocks in the compiled code. Instrumented code may be generated from the compiled code by modifying the blocks to include probes to measure latencies of the blocks during execution of the instrumented code on a graphics processing unit.
Type: Grant
Filed: December 28, 2017
Date of Patent: October 29, 2019
Assignee: Intel Corporation
Inventors: Anton V. Gorshkov, Michael Berezalsky, Konstantin Levit-Gurevich, Julia Fedorova, Noam Itzhaki, Arik Narkis, Sion Berkowits
-
Publication number: 20190213706
Abstract: Techniques and apparatus for profiling graphics processing unit (GPU) processes using binary instrumentation are described. In one embodiment, for example, an apparatus may include at least one memory comprising instructions and a processor coupled to the at least one memory. The processor may execute the instructions to implement a profiling process to profile a graphics processing unit (GPU) application being executed via a GPU, the profiling process to perform an instrumentation phase to determine an operating process being executed via the GPU and to generate instrumented binary code for the operating process, perform an execution phase to collect profiling data for a command of the operating process, and perform a completion phase for a profiling application executed via the processor to read the profiling data. Other embodiments are described.
Type: Application
Filed: December 28, 2018
Publication date: July 11, 2019
Applicant: INTEL CORPORATION
Inventors: Orr Goldman, Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
-
Publication number: 20190139183
Abstract: Techniques and apparatus for profiling graphics processing unit (GPU) processes using binary instrumentation are described. In one embodiment, for example, an apparatus may include at least one memory comprising instructions and a processor coupled to the at least one memory. The processor may execute the instructions to determine a plurality of profiling modes for profiling an operating process of a graphics processing unit (GPU) application, access original binary code for the GPU application, and generate a multi-mode instrumented binary code comprising a plurality of instrumentation modes, each of the plurality of instrumentation modes corresponding to at least one of the plurality of profiling modes. Other embodiments are described.
Type: Application
Filed: December 28, 2018
Publication date: May 9, 2019
Inventors: Orr Goldman, Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
-
Publication number: 20190095309
Abstract: Techniques and apparatus for performance analysis of a program are described. In one embodiment, for example, an apparatus may include at least one memory, and logic, at least a portion of which is comprised in hardware coupled to the at least one memory, to access a program for performance analysis, the program comprising at least one producer instruction and at least one consumer instruction for the at least one producer instruction, and generate an analysis program based on the program, the analysis program comprising a stall time instruction set to determine a stall time of the at least one producer instruction, the stall time instruction set comprising a first time stamp instruction immediately preceding a consumer instruction, a second time stamp instruction immediately following the consumer instruction, and a stall time instruction to determine the stall time as the difference between the second time stamp and the first time stamp. Other embodiments are described and claimed.
Type: Application
Filed: September 28, 2017
Publication date: March 28, 2019
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
-
Publication number: 20190042223
Abstract: Systems, apparatuses and methods may provide for technology that receives compiled code and identifies a plurality of blocks in the compiled code. Instrumented code may be generated from the compiled code by modifying the blocks to include probes to measure latencies of the blocks during execution of the instrumented code on a graphics processing unit.
Type: Application
Filed: December 28, 2017
Publication date: February 7, 2019
Inventors: Anton V. Gorshkov, Michael Berezalsky, Konstantin Levit-Gurevich, Julia Fedorova, Noam Itzhaki, Arik Narkis, Sion Berkowits
-
Publication number: 20190043457
Abstract: An apparatus for generating dynamic trace data of binary code running on one or more execution units of a Graphics Processing Unit (GPU) through binary instrumentation is presented. In embodiments, the apparatus may include an input interface disposed in the GPU to receive instrumented binary code and communication data, and an output interface disposed in the GPU, and coupled to a memory of a computer hosting the GPU. In embodiments, the memory may be further coupled to the input interface and a Central Processing Unit (CPU) of the computer, the memory having a trace buffer and a control buffer, the control buffer including an overflow flag of the trace buffer. In embodiments, the apparatus may further include an execution unit (EU) disposed in the GPU and coupled to the input interface and to the output interface, to conditionally execute the instrumented binary code and generate dynamic trace data when the overflow flag is not set to indicate an overflow condition.
Type: Application
Filed: May 4, 2018
Publication date: February 7, 2019
Inventors: Sunpyo Hong, Konstantin Levit-Gurevich, Michael Berezalsky, Arik Narkis, Noam Itzhaki
-
Publication number: 20190043158
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve operation of a graphics processing unit (GPU). An example apparatus includes an instruction generator to insert profiling instructions into a GPU kernel to generate an instrumented GPU kernel, the instrumented GPU kernel is to be executed by a GPU, a trace analyzer to generate an occupancy map associated with the GPU executing the instrumented GPU kernel, a parameter calculator to determine one or more operating parameters of the GPU based on the occupancy map, and a processor optimizer to invoke a GPU driver to adjust a workload of the GPU based on the one or more operating parameters.
Type: Application
Filed: September 12, 2018
Publication date: February 7, 2019
Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis, Orr Goldman