Patents by Inventor Oren Ben-Kiki
Oren Ben-Kiki has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10664284
Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
Type: Grant
Filed: February 28, 2019
Date of Patent: May 26, 2020
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
-
Patent number: 10528345
Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient to perform the range modification, the range modification is performed and one or more modified range indices of the set of range indices is stored back to memory. Otherwise an error signal is set when the range size is not sufficient to perform said range modification. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.
Type: Grant
Filed: March 27, 2015
Date of Patent: January 7, 2020
Assignee: Intel Corporation
Inventors: Ilan Pardo, Oren Ben-Kiki, Arch D. Robison, Nadav Chachmon, James H. Cownie
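Illustration (not part of the patent record): the lock, check, modify, unlock sequence in the abstract above can be modeled in software. The sketch below is a rough analogy only, assuming a half-open index range and a mutex in place of the hardware lock. The names IndexRange, add_next, and subtract_end are hypothetical, and a returned False stands in for the error signal.

```python
import threading


class IndexRange:
    """Hypothetical in-memory 'set of range indices': the half-open range [next, end)."""

    def __init__(self, next_index, end_index):
        self.next_index = next_index
        self.end_index = end_index
        self.lock = threading.Lock()


def add_next(rng, count):
    """Model of an 'add next' operation: claim `count` indices from the front.

    Returns (ok, first_claimed_index); ok=False stands in for the error signal.
    """
    with rng.lock:  # lock the indices; the lock is released when the block exits
        size = rng.end_index - rng.next_index  # load the indices, compute the size
        if size < count:
            return False, None  # insufficient range: signal error, change nothing
        first = rng.next_index
        rng.next_index = first + count  # store the modified index back
        return True, first


def subtract_end(rng, count):
    """Model of a 'subtract end' operation: claim `count` indices from the back."""
    with rng.lock:
        if rng.end_index - rng.next_index < count:
            return False, None
        rng.end_index -= count
        return True, rng.end_index


work = IndexRange(0, 10)
print(add_next(work, 4))      # claims indices 0..3
print(subtract_end(work, 3))  # claims indices 7..9
print(add_next(work, 4))      # only 3 indices remain, so this reports failure
```

Claiming from both ends of one range is the kind of use the "next" and "end" instruction names suggest; the abstract itself does not say how software would use them.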
-
Patent number: 10346195
Abstract: A processor is described having logic circuitry of a general purpose CPU core to save multiple copies of context of a thread of the general purpose CPU core to prepare multiple micro-threads of a multi-threaded accelerator for execution to accelerate operations for the thread through parallel execution of the micro-threads.
Type: Grant
Filed: December 29, 2012
Date of Patent: July 9, 2019
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Ilan Pardo, Eliezer Weissmann, Robert Valentine, Yuval Yosef
-
Publication number: 20190196838
Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
Type: Application
Filed: February 28, 2019
Publication date: June 27, 2019
Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
-
Publication number: 20190171462
Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
Type: Application
Filed: November 26, 2018
Publication date: June 6, 2019
Inventors: Ilan Pardo, Dror Markovich, Oren Ben-Kiki, Yuval Yosef
-
Patent number: 10255077
Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
Type: Grant
Filed: August 2, 2016
Date of Patent: April 9, 2019
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
-
Patent number: 10146533
Abstract: A processor includes circuitry to decode at least one instruction and an execution unit. The decoded instruction may compute a floating point result. The execution unit includes circuitry to execute the instruction to determine the floating point result, compute the amount of precision lost in a mantissa of the floating point result, compare the amount of precision lost to a numeric accumulation error precision threshold, determine whether a numeric accumulation error occurred based on the comparison, and write a value to a flag. The amount of precision lost corresponds to a plurality of bits lost in the mantissa of the floating point result. The value to be written to the flag may be based on the determination that the numeric accumulation error occurred. The flag may be for notification that the numeric accumulation error occurred.
Type: Grant
Filed: September 29, 2016
Date of Patent: December 4, 2018
Assignee: Intel Corporation
Inventors: Ilan Pardo, Oren Ben-Kiki
-
Patent number: 10140129
Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
Type: Grant
Filed: December 28, 2012
Date of Patent: November 27, 2018
Assignee: Intel Corporation
Inventors: Ilan Pardo, Dror Markovich, Oren Ben-Kiki, Yuval Yosef
-
Patent number: 10101999
Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
Type: Grant
Filed: January 10, 2017
Date of Patent: October 16, 2018
Assignee: Intel Corporation
Inventors: Enrique De Lucas, Pedro Marcuello, Oren Ben-Kiki, Ilan Pardo, Yuval Yosef
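Illustration (not part of the patent record): a Bloom filter answers "possibly seen" or "definitely not seen" for an address, which is why it suits conservative collision detection. The sketch below is a simplified software analogy that assumes one filter of load addresses and one of store addresses. It does not model thread age ordering ("younger" loads and stores), and the class names, filter size, and SHA-256-based hashing are hypothetical choices, not details from the patent.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter over integer addresses (false positives possible,
    false negatives impossible)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one integer

    def _positions(self, address):
        # Derive num_hashes bit positions by hashing the address with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{address}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, address):
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def might_contain(self, address):
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


class CollisionDetector:
    """Hypothetical pairing of a load filter and a store filter."""

    def __init__(self):
        self.loads = BloomFilter()
        self.stores = BloomFilter()

    def record_load(self, address):
        """Record a load; return True if it may collide with a recorded store."""
        hit = self.stores.might_contain(address)
        self.loads.add(address)
        return hit

    def record_store(self, address):
        """Record a store; return True if it may collide with a recorded load."""
        hit = self.loads.might_contain(address)
        self.stores.add(address)
        return hit


detector = CollisionDetector()
print(detector.record_load(0x1000))   # nothing stored yet, so no collision
print(detector.record_store(0x1000))  # same address was loaded: possible collision
```

A reported hit only means a collision is possible; a real design would need a slower exact check or a conservative recovery path for false positives.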
-
Patent number: 10095517
Abstract: An apparatus and method are described for retrieving elements from a linked structure. For example, one embodiment of an apparatus comprises: a decode unit to decode a first instruction, the first instruction to utilize a current address value, an end address value, and an offset; and an execution unit to execute the first instruction to cause the execution unit to compare the current address value with the end address value, the execution unit to perform no additional operation with respect to the first instruction if the current address value is equal to the end address value; and if the current address value is not equal to the end address value, then the execution unit to add the offset value to the current address value to identify a next address pointer within an element structure, the execution unit to further set the current address value equal to the next address pointer.
Type: Grant
Filed: December 22, 2015
Date of Patent: October 9, 2018
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Ilan Pardo
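Illustration (not part of the patent record): the compare-then-advance behavior in the abstract above resembles one step of a linked-list traversal. The sketch below is one reading of it, assuming the "next address pointer" is the value stored at current + offset. Memory is modeled as a dictionary, and the addresses, the offset of 8, and the names step and walk are hypothetical.

```python
def step(memory, current, end, offset):
    """One modeled execution of the instruction: returns the new current address."""
    if current == end:
        return current  # at the end address: perform no additional operation
    # current + offset locates the next-pointer field inside the element;
    # the value stored there becomes the new current address.
    return memory[current + offset]


NEXT_OFFSET = 8  # assumed position of the next pointer within each element
END = 0          # assumed end address value (a null-style sentinel)

# Three elements at addresses 100, 200, 300: payload at +0, next pointer at +8.
memory = {
    100: "a", 108: 200,
    200: "b", 208: 300,
    300: "c", 308: END,
}


def walk(memory, head, end, offset):
    """Collect payloads by repeatedly applying step until the end address."""
    items = []
    current = head
    while current != end:
        items.append(memory[current])
        current = step(memory, current, end, offset)
    return items


print(walk(memory, 100, END, NEXT_OFFSET))
```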
-
Patent number: 10095521
Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands. The accelerator invocation instruction stores command data specifying the command within a command register. One or more accelerators read the command data from the command register and responsively attempt to execute the command identified by the command data. Upon a switch from a first context to a second context, an accelerator context save/restore pointer identifies a region within system memory where the accelerator is to save its state and later the accelerator context save/restore pointer aids in restoring its state upon returning to the first context.
Type: Grant
Filed: May 3, 2016
Date of Patent: October 9, 2018
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
-
Patent number: 10089113
Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a system according to one embodiment comprises: a processor includes a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among two or more of the SMT cores; and at least one of the SMT cores including at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to the instruction cache circuit and the data cache circuit, a communication interconnect circuit including a peripheral component interconnect express (PCIe) circuit to communicatively couple one or more of the SMT cores to an accelerator device and a memory access circuit to identify an accelerator context save/restore region in a memory responsive to a context save/restore value, the accelerator context save/restore region to share an accelerator context state.
Type: Grant
Filed: September 30, 2016
Date of Patent: October 2, 2018
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
-
Patent number: 10083037
Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among the SMT cores, and at least one L2 cache circuit to store both instructions and data. The processor further comprises a communication interconnect circuit including a PCIe circuit to communicatively couple one or more of the SMT cores to an accelerator device, the PCIe circuit to provide the accelerator device access to resources of the processor including the at least one shared cache circuit. The processor further comprises a memory access circuit to identify an accelerator context save/restore region in a memory determined by an accelerator context save/restore value, the accelerator context save/restore region to store an accelerator context state.
Type: Grant
Filed: September 30, 2016
Date of Patent: September 25, 2018
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
-
Patent number: 9934090
Abstract: An apparatus and method are described for enforcement of reserved bits. For example, one embodiment of a processor comprises: a memory management unit to store a set of bits including a set of reserved bits to a system memory; reserved bit enforcement logic to generate a pseudo-random pattern in the reserved bits and an error correction code over the pseudo-random pattern prior to storing the reserved bits; the memory management unit to load the reserved bits including the pseudo-random pattern and the error correction code; the reserved bit enforcement logic to use the error correction code to determine whether the reserved bits have been modified by software; and if the reserved bits have been modified, then the processor to generate an error condition and if not modified, then the processor to continue normal execution.
Type: Grant
Filed: December 22, 2015
Date of Patent: April 3, 2018
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Ilan Pardo
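Illustration (not part of the patent record): the store-time and load-time checks in the abstract above can be approximated in software. The sketch below makes several assumptions: a 48-bit reserved field split into a 16-bit random pattern and a 32-bit check value, and CRC-32 (via zlib.crc32) swapped in as a simple detection code in place of the error correction code the abstract names. All names and field widths are hypothetical.

```python
import secrets
import zlib

PATTERN_BITS = 16  # assumed width of the pseudo-random pattern
CHECK_BITS = 32    # CRC-32 used here as a stand-in for the error correction code


def _check(pattern):
    """Check value computed over the pattern bytes."""
    return zlib.crc32(pattern.to_bytes(PATTERN_BITS // 8, "big"))


def make_reserved():
    """Store side: fill the reserved field with a random pattern plus its check value."""
    pattern = secrets.randbits(PATTERN_BITS)
    return (pattern << CHECK_BITS) | _check(pattern)


def reserved_modified(reserved):
    """Load side: True if the reserved field no longer matches its check value."""
    if reserved >> (PATTERN_BITS + CHECK_BITS):
        return True  # bits set outside the field: treat as modified
    pattern = reserved >> CHECK_BITS
    stored = reserved & ((1 << CHECK_BITS) - 1)
    return _check(pattern) != stored


field = make_reserved()
print(reserved_modified(field))      # untouched field passes the check
print(reserved_modified(field ^ 1))  # a single flipped bit is detected
```

Because the pattern is random, software cannot produce a valid reserved field by writing a fixed value such as all zeros unless it also computes a matching check value, which is the property this model is meant to show.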
-
Publication number: 20180088941
Abstract: A processor includes circuitry to decode at least one instruction and an execution unit. The decoded instruction may compute a floating point result. The execution unit includes circuitry to execute the instruction to determine the floating point result, compute the amount of precision lost in a mantissa of the floating point result, compare the amount of precision lost to a numeric accumulation error precision threshold, determine whether a numeric accumulation error occurred based on the comparison, and write a value to a flag. The amount of precision lost corresponds to a plurality of bits lost in the mantissa of the floating point result. The value to be written to the flag may be based on the determination that the numeric accumulation error occurred. The flag may be for notification that the numeric accumulation error occurred.
Type: Application
Filed: September 29, 2016
Publication date: March 29, 2018
Inventors: Ilan Pardo, Oren Ben-Kiki
-
Patent number: 9747108
Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.
Type: Grant
Filed: March 27, 2015
Date of Patent: August 29, 2017
Assignee: Intel Corporation
Inventors: Oren Ben-Kiki, Ilan Pardo, Arch D. Robison, James H. Cownie
-
Publication number: 20170177341
Abstract: An apparatus and method are described for retrieving elements from a linked structure. For example, one embodiment of an apparatus comprises: a decode unit to decode a first instruction, the first instruction to utilize a current address value, an end address value, and an offset; and an execution unit to execute the first instruction to cause the execution unit to compare the current address value with the end address value, the execution unit to perform no additional operation with respect to the first instruction if the current address value is equal to the end address value; and if the current address value is not equal to the end address value, then the execution unit to add the offset value to the current address value to identify a next address pointer within an element structure, the execution unit to further set the current address value equal to the next address pointer.
Type: Application
Filed: December 22, 2015
Publication date: June 22, 2017
Inventors: Oren Ben-Kiki, Ilan Pardo
-
Publication number: 20170177439
Abstract: An apparatus and method are described for enforcement of reserved bits. For example, one embodiment of a processor comprises: a memory management unit to store a set of bits including a set of reserved bits to a system memory; reserved bit enforcement logic to generate a pseudo-random pattern in the reserved bits and an error correction code over the pseudo-random pattern prior to storing the reserved bits; the memory management unit to load the reserved bits including the pseudo-random pattern and the error correction code; the reserved bit enforcement logic to use the error correction code to determine whether the reserved bits have been modified by software; and if the reserved bits have been modified, then the processor to generate an error condition and if not modified, then the processor to continue normal execution.
Type: Application
Filed: December 22, 2015
Publication date: June 22, 2017
Inventors: Oren Ben-Kiki, Ilan Pardo
-
Publication number: 20170147344
Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
Type: Application
Filed: January 10, 2017
Publication date: May 25, 2017
Inventors: Enrique De Lucas, Pedro Marcuello, Oren Ben-Kiki, Ilan Pardo, Yuval Yosef
-
Publication number: 20170017491
Abstract: An apparatus and method are described for providing low-latency invocation of accelerators.
Type: Application
Filed: September 30, 2016
Publication date: January 19, 2017
Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef