Patents by Inventor Oren Ben-Kiki

Oren Ben-Kiki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10664284
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Grant
    Filed: February 28, 2019
    Date of Patent: May 26, 2020
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Patent number: 10528345
    Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient to perform the range modification, the range modification is performed and one or more modified range indices of the set of range indices is stored back to memory. Otherwise an error signal is set when the range size is not sufficient to perform said range modification. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.
    Type: Grant
    Filed: March 27, 2015
    Date of Patent: January 7, 2020
    Assignee: Intel Corporation
    Inventors: Ilan Pardo, Oren Ben-Kiki, Arch D. Robison, Nadav Chachmon, James H. Cownie
  • Patent number: 10346195
    Abstract: A processor is described having logic circuitry of a general purpose CPU core to save multiple copies of context of a thread of the general purpose CPU core to prepare multiple micro-threads of a multi-threaded accelerator for execution to accelerate operations for the thread through parallel execution of the micro-threads.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: July 9, 2019
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo, Eliezer Weissmann, Robert Valentine, Yuval Yosef
  • Publication number: 20190196838
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Application
    Filed: February 28, 2019
    Publication date: June 27, 2019
    Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Publication number: 20190171462
    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
    Type: Application
    Filed: November 26, 2018
    Publication date: June 6, 2019
    Inventors: ILAN PARDO, DROR MARKOVICH, OREN BEN-KIKI, YUVAL YOSEF
  • Patent number: 10255077
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Grant
    Filed: August 2, 2016
    Date of Patent: April 9, 2019
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Patent number: 10146533
    Abstract: A processor includes circuitry to decode at least one instruction and an execution unit. The decoded instruction may compute a floating point result. The execution unit includes circuitry to execute the instruction to determine the floating point result, compute the amount of precision lost in a mantissa of the floating point result, compare the amount of precision lost to a numeric accumulation error precision threshold, determine whether a numeric accumulation error occurred based on the comparison, and write a value to a flag. The amount of precision lost corresponds to a plurality of bits lost in the mantissa of the floating point result. The value to be written to the flag may be based on the determination that the numeric accumulation error occurred. The flag may be for notification that the numeric accumulation error occurred.
    Type: Grant
    Filed: September 29, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Ilan Pardo, Oren Ben-Kiki
  • Patent number: 10140129
    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Ilan Pardo, Dror Markovich, Oren Ben-Kiki, Yuval Yosef
  • Patent number: 10101999
    Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
    Type: Grant
    Filed: January 10, 2017
    Date of Patent: October 16, 2018
    Assignee: intel corporation
    Inventors: Enrique De Lucas, Pedro Marcuello, Oren Ben-Kiki, Ilan Pardo, Yuval Yosef
  • Patent number: 10095521
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands. The accelerator invocation instruction stores command data specifying the command within a command register. One or more accelerators read the command data from the command register and responsively attempt to execute the command identified by the command data. Upon a switch from a first context to a second context, an accelerator context save/restore pointer identifies a region within system memory where the accelerator is to save its state and later the accelerator context save/restore pointer aids in restoring its state upon returning to the first context.
    Type: Grant
    Filed: May 3, 2016
    Date of Patent: October 9, 2018
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Patent number: 10095517
    Abstract: An apparatus and method are described for retrieving elements from a linked structure. For example, one embodiment of an apparatus comprises: a decode unit to decode a first instruction, the first instruction to utilize a current address value, an end address value, and an offset; and an execution unit to execute the first instruction to cause the execution unit to compare the current address value with the end address value, the execution unit to perform no additional operation with respect to the first instruction if the current address value is equal to the end address value; and if the current address value is not equal to the end address value, then the execution unit to add the offset value to the current address value to identify a next address pointer within an element structure, the execution unit to further set the current address value equal to the next address pointer.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: October 9, 2018
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo
  • Patent number: 10089113
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a system according to one embodiment comprises: a processor includes a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among two or more of the SMT cores; and at least one of the SMT cores including at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to the instruction cache circuit and the data cache circuit, a communication interconnect circuit including a peripheral component interconnect express (PCIe) circuit to communicatively couple one or more of the SMT cores to an accelerator device and a memory access circuit to identify an accelerator context save/restore region in a memory responsive to a context save/restore value, the accelerator context save/restore region to share an accelerator context state.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: October 2, 2018
    Assignee: INTEL CORPORATION
    Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Patent number: 10083037
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among the SMT cores, and at least one L2 cache circuit to store both instructions and data. The processor further comprises a communication interconnect circuit including a PCIe circuit to communicatively couple one or more of the SMT cores to an accelerator device, the PCIe circuit to provide the accelerator device access to resources of the processor including the at least one shared cache circuit. The processor further comprises a memory access circuit to identify an accelerator context save/restore region in a memory determined by an accelerator context save/restore value, the accelerator context save/restore region to store an accelerator context state.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: September 25, 2018
    Assignee: INTEL CORPORATION
    Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Patent number: 9934090
    Abstract: An apparatus and method are described for enforcement of reserved bits. For example, one embodiment of a processor comprises: a memory management unit to store a set of bits including a set of reserved bits to a system memory; reserved bit enforcement logic to generate a pseudo-random pattern in the reserved bits and an error correction code over the pseudo-random pattern prior to storing the reserved bits; the memory management unit to load the reserved bits including the pseudo-random pattern and the error correction code; the reserved bit enforcement logic to use the error correction code to determine whether the reserved bits have been modified by software; and if the reserved bits have been modified, then the processor to generate an error condition and if not modified, then the processor to continue normal execution.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: April 3, 2018
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo
  • Publication number: 20180088941
    Abstract: A processor includes circuitry to decode at least one instruction and an execution unit. The decoded instruction may compute a floating point result. The execution unit includes circuitry to execute the instruction to determine the floating point result, compute the amount of precision lost in a mantissa of the floating point result, compare the amount of precision lost to a numeric accumulation error precision threshold, determine whether a numeric accumulation error occurred based on the comparison, and write a value to a flag. The amount of precision lost corresponds to a plurality of bits lost in the mantissa of the floating point result. The value to be written to the flag may be based on the determination that the numeric accumulation error occurred. The flag may be for notification that the numeric accumulation error occurred.
    Type: Application
    Filed: September 29, 2016
    Publication date: March 29, 2018
    Inventors: Ilan Pardo, Oren Ben-Kiki
  • Patent number: 9747108
    Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 27, 2015
    Date of Patent: August 29, 2017
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo, Arch D. Robison, James H. Cownie
  • Publication number: 20170177341
    Abstract: An apparatus and method are described for retrieving elements from a linked structure. For example, one embodiment of an apparatus comprises: a decode unit to decode a first instruction, the first instruction to utilize a current address value, an end address value, and an offset; and an execution unit to execute the first instruction to cause the execution unit to compare the current address value with the end address value, the execution unit to perform no additional operation with respect to the first instruction if the current address value is equal to the end address value; and if the current address value is not equal to the end address value, then the execution unit to add the offset value to the current address value to identify a next address pointer within an element structure, the execution unit to further set the current address value equal to the next address pointer.
    Type: Application
    Filed: December 22, 2015
    Publication date: June 22, 2017
    Inventors: OREN BEN-KIKI, ILAN PARDO
  • Publication number: 20170177439
    Abstract: An apparatus and method are described for enforcement of reserved bits. For example, one embodiment of a processor comprises: a memory management unit to store a set of bits including a set of reserved bits to a system memory; reserved bit enforcement logic to generate a pseudo-random pattern in the reserved bits and an error correction code over the pseudo-random pattern prior to storing the reserved bits; the memory management unit to load the reserved bits including the pseudo-random pattern and the error correction code; the reserved bit enforcement logic to use the error correction code to determine whether the reserved bits have been modified by software; and if the reserved bits have been modified, then the processor to generate an error condition and if not modified, then the processor to continue normal execution.
    Type: Application
    Filed: December 22, 2015
    Publication date: June 22, 2017
    Inventors: Oren Ben-Kiki, Ilan Pardo
  • Publication number: 20170147344
    Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
    Type: Application
    Filed: January 10, 2017
    Publication date: May 25, 2017
    Inventors: ENRIQUE DE LUCAS, PEDRO MARCUELLO, OREN BEN-KIKI, ILAN PARDO, YUVAL YOSEF
  • Publication number: 20170017491
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators.
    Type: Application
    Filed: September 30, 2016
    Publication date: January 19, 2017
    Inventors: Oren Ben-Kiki, ILAN PARDO, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef