Patents by Inventor Gadi Haber
Gadi Haber has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11579881Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to, use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.Type: GrantFiled: June 29, 2017Date of Patent: February 14, 2023Assignee: Intel CorporationInventors: Gadi Haber, Robert Valentine, Ayal Zaks, Jesus Corbal San Adrian
-
Patent number: 11188342Abstract: An apparatus and method for a speculative conditional move instruction. A processor comprising: a decoder to decode a first speculative conditional move instruction; a prediction storage to store prediction data related to previously executed speculative conditional move instructions; and execution circuitry to read first prediction data associated with the speculative conditional move instruction and to execute the speculative conditional move instruction either speculatively or non-speculatively based on the first prediction data.Type: GrantFiled: April 1, 2020Date of Patent: November 30, 2021Assignee: Intel CorporationInventors: Amjad Aboud, Gadi Haber, Jared Warner Stark, IV
-
Publication number: 20200225959Abstract: An apparatus and method for a speculative conditional move instruction. A processor comprising: a decoder to decode a first speculative conditional move instruction; a prediction storage to store prediction data related to previously executed speculative conditional move instructions; and execution circuitry to read first prediction data associated with the speculative conditional move instruction and to execute the speculative conditional move instruction either speculatively or non-speculatively based on the first prediction data.Type: ApplicationFiled: April 1, 2020Publication date: July 16, 2020Applicant: Intel CorporationInventors: AMJAD ABOUD, GADI HABER, JARED WARNER STARK, IV
-
Patent number: 10620961Abstract: An apparatus and method for a speculative conditional move instruction. A processor comprising: a decoder to decode a first speculative conditional move instruction; a prediction storage to store prediction data related to previously executed speculative conditional move instructions; and execution circuitry to read first prediction data associated with the speculative conditional move instruction and to execute the speculative conditional move instruction either speculatively or non-speculatively based on the first prediction data.Type: GrantFiled: March 30, 2018Date of Patent: April 14, 2020Assignee: Intel CorporationInventors: Amjad Aboud, Gadi Haber, Jared Warner Stark, IV
-
Publication number: 20190303163Abstract: An apparatus and method for a speculative conditional move instruction. A processor comprising: a decoder to decode a first speculative conditional move instruction; a prediction storage to store prediction data related to previously executed speculative conditional move instructions; and execution circuitry to read first prediction data associated with the speculative conditional move instruction and to execute the speculative conditional move instruction either speculatively or non-speculatively based on the first prediction data.Type: ApplicationFiled: March 30, 2018Publication date: October 3, 2019Inventors: AMJAD ABOUD, GADI HABER, JARED Warner STARK, IV
-
Patent number: 10209764Abstract: Embodiments described herein relate to improving processor power-performance using a binary analyzer routine. In one example, a processor includes a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline including at least fetch, decode, and execute stages, wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine including instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence, analyze hardware metrics relating to execution of the hot-spot sequence, and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.Type: GrantFiled: December 20, 2016Date of Patent: February 19, 2019Assignee: Intel CorporationInventors: Konstantin Levit-Gurevich, Gadi Haber
-
Publication number: 20190004801Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to, use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.Type: ApplicationFiled: June 29, 2017Publication date: January 3, 2019Inventors: Gadi Haber, Robert Valentine, Ayal Zaks, Jesus Corbal San Adrian
-
Publication number: 20180173534Abstract: A processor may include a decoder to decode a first instance of a branch instruction for which the resolved branch direction is data dependent and add results of the decoding to a stream of decoded instructions for execution. The processor may include a code generator to inject, into the stream of decoded instructions, branch resolution code to resolve the branch condition for a second instance of the branch instruction following the first instance at a predetermined look-ahead distance. The processor may include an execution unit to execute the branch resolution code, storing an indication of the resolved branch direction for the second instance in an entry of a prediction queue for the branch instruction. The processor may include a branch predictor to receive the second instance of the branch instruction, and output the resolved branch direction as the predicted branch direction for the second instance of the branch instruction.Type: ApplicationFiled: December 20, 2016Publication date: June 21, 2018Inventors: Leeor Peled, Gadi Haber, Yiannakis Sazeides
-
Publication number: 20180173291Abstract: Embodiments described herein relate to improving processor power-performance using a binary analyzer routine. In one example, a processor includes a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline including at least fetch, decode, and execute stages, wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine including instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence, analyze hardware metrics relating to execution of the hot-spot sequence, and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.Type: ApplicationFiled: December 20, 2016Publication date: June 21, 2018Inventors: Konstantin Levit-Gurevich, Gadi Haber
-
Publication number: 20170132007Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.Type: ApplicationFiled: January 23, 2017Publication date: May 11, 2017Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Patent number: 9552207Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.Type: GrantFiled: February 2, 2016Date of Patent: January 24, 2017Assignee: Intel CorporationInventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Publication number: 20160378480Abstract: Embodiments for systems, methods, and apparatuses for improving performance of status dependent computations are detailed. In an embodiment, an hardware apparatus comprises decoder hardware to decode an instruction, operand retrieval hardware to retrieve data from at least one source operand associated with the instruction decoded by the decoder hardware, and execution hardware to execute the decoded instruction to generate a result including at least one status bit and to cause the result and at least one status bit to be stored in a single destination physical storage location, wherein the at least one status bit and result are accessible through a read of the single register.Type: ApplicationFiled: June 27, 2015Publication date: December 29, 2016Inventors: Pavel G. Matveyev, Dmitry M. Maslennikov, Paul Caprioli, Gadi Haber
-
Publication number: 20160154651Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.Type: ApplicationFiled: February 2, 2016Publication date: June 2, 2016Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Patent number: 9348594Abstract: An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code executing on the ASMP is analyzed by a binary analysis unit to determine what functions are called by the program code and select which of the cores are to execute the program code, or a code segment thereof. Selection may be made to provide for native execution of the program code, to minimize power consumption, and so forth. Control operations based on this selection may then be inserted into the program code, forming instrumented program code. The instrumented program code is then executed by the ASMP.Type: GrantFiled: December 29, 2011Date of Patent: May 24, 2016Assignee: Intel CorporationInventors: Koichi Yamada, Boris Ginzburg, Wei Li, Ronny Ronen, Esfir Natanzon, Konstantin Levit-Gurevich, Gadi Haber, Alon Naveh, Eliezer Weissmann, Michael Mishaeli
-
Patent number: 9250906Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.Type: GrantFiled: August 30, 2015Date of Patent: February 2, 2016Assignee: Intel CorporationInventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Publication number: 20150370567Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.Type: ApplicationFiled: August 30, 2015Publication date: December 24, 2015Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Patent number: 9141361Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.Type: GrantFiled: September 30, 2012Date of Patent: September 22, 2015Assignee: Intel CorporationInventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Publication number: 20140095832Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.Type: ApplicationFiled: September 30, 2012Publication date: April 3, 2014Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Publication number: 20140019723Abstract: An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code for execution on the ASMP is analyzed and a determination is made as to whether to allow the program code, or a code segment thereof to execute on a first core natively or to use binary translation on the code and execute the translated code on a second core which consumes less power than the first core during execution.Type: ApplicationFiled: December 28, 2011Publication date: January 16, 2014Inventors: Koichi Yamada, Ronny Ronen, Wei Li, Boris Ginzburg, Gadi Haber, Konstantin Levit-Gurevich, Esfir Natanzon, Alon Naveh, Eliezer Weissmann, Michael Mishaeli
-
Publication number: 20130268742Abstract: An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code executing on the ASMP is analyzed by a binary analysis unit to determine what functions are called by the program code and select which of the cores are to execute the program code, or a code segment thereof. Selection may be made to provide for native execution of the program code, to minimize power consumption, and so forth. Control operations based on this selection may then be inserted into the program code, forming instrumented program code. The instrumented program code is then executed by the ASMP.Type: ApplicationFiled: December 29, 2011Publication date: October 10, 2013Inventors: Koichi Yamada, Boris Ginzburg, Wei Li, Ronny Ronen, Esfir Natanzon, Konstantin Levit-Gurevich, Gadi Haber, Alon Naveh, Eliezer Weissmann, Michael Mishaeli