Patents by Inventor Yuval Yosef

Yuval Yosef has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10664284
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Grant
    Filed: February 28, 2019
    Date of Patent: May 26, 2020
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Patent number: 10346195
    Abstract: A processor is described having logic circuitry of a general purpose CPU core to save multiple copies of context of a thread of the general purpose CPU core to prepare multiple micro-threads of a multi-threaded accelerator for execution to accelerate operations for the thread through parallel execution of the micro-threads.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: July 9, 2019
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo, Eliezer Weissmann, Robert Valentine, Yuval Yosef
  • Publication number: 20190196838
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Application
    Filed: February 28, 2019
    Publication date: June 27, 2019
    Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Publication number: 20190171462
    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
    Type: Application
    Filed: November 26, 2018
    Publication date: June 6, 2019
    Inventors: ILAN PARDO, DROR MARKOVICH, OREN BEN-KIKI, YUVAL YOSEF
  • Patent number: 10255077
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Grant
    Filed: August 2, 2016
    Date of Patent: April 9, 2019
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Patent number: 10185566
    Abstract: In one embodiment, the present invention includes a multicore processor having first and second cores to independently execute instructions, the first core visible to an operating system (OS) and the second core transparent to the OS and heterogeneous from the first core. A task controller, which may be included in or coupled to the multicore processor, can cause dynamic migration of a first process scheduled by the OS to the first core to the second core transparently to the OS. Other embodiments are described and claimed.
    Type: Grant
    Filed: April 27, 2012
    Date of Patent: January 22, 2019
    Assignee: Intel Corporation
    Inventors: Alon Naveh, Yuval Yosef, Eliezer Weissmann, Anil Aggarwal, Efraim Rotem, Avi Mendelson, Ronny Ronen, Boris Ginzburg, Michael Mishaeli, Scott D. Hahn, David A. Koufaty, Ganapati Srinivasa, Guy Therien
  • Patent number: 10140129
    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Ilan Pardo, Dror Markovich, Oren Ben-Kiki, Yuval Yosef
  • Patent number: 10101999
    Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
    Type: Grant
    Filed: January 10, 2017
    Date of Patent: October 16, 2018
    Assignee: intel corporation
    Inventors: Enrique De Lucas, Pedro Marcuello, Oren Ben-Kiki, Ilan Pardo, Yuval Yosef
  • Patent number: 10095521
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands. The accelerator invocation instruction stores command data specifying the command within a command register. One or more accelerators read the command data from the command register and responsively attempt to execute the command identified by the command data. Upon a switch from a first context to a second context, an accelerator context save/restore pointer identifies a region within system memory where the accelerator is to save its state and later the accelerator context save/restore pointer aids in restoring its state upon returning to the first context.
    Type: Grant
    Filed: May 3, 2016
    Date of Patent: October 9, 2018
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Patent number: 10089113
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a system according to one embodiment comprises: a processor includes a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among two or more of the SMT cores; and at least one of the SMT cores including at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to the instruction cache circuit and the data cache circuit, a communication interconnect circuit including a peripheral component interconnect express (PCIe) circuit to communicatively couple one or more of the SMT cores to an accelerator device and a memory access circuit to identify an accelerator context save/restore region in a memory responsive to a context save/restore value, the accelerator context save/restore region to share an accelerator context state.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: October 2, 2018
    Assignee: INTEL CORPORATION
    Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Patent number: 10083037
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among the SMT cores, and at least one L2 cache circuit to store both instructions and data. The processor further comprises a communication interconnect circuit including a PCIe circuit to communicatively couple one or more of the SMT cores to an accelerator device, the PCIe circuit to provide the accelerator device access to resources of the processor including the at least one shared cache circuit. The processor further comprises a memory access circuit to identify an accelerator context save/restore region in a memory determined by an accelerator context save/restore value, the accelerator context save/restore region to store an accelerator context state.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: September 25, 2018
    Assignee: INTEL CORPORATION
    Inventors: Oren Ben-Kiki, Ilan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Patent number: 9720730
    Abstract: In one embodiment, the present invention includes a multicore processor with first and second groups of cores. The second group can be of a different instruction set architecture (ISA) than the first group or of the same ISA set but having different power and performance support level, and is transparent to an operating system (OS). The processor further includes a migration unit that handles migration requests for a number of different scenarios and causes a context switch to dynamically migrate a process from the second core to a first core of the first group. This dynamic hardware-based context switch can be transparent to the OS. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: August 1, 2017
    Assignee: Intel Corporation
    Inventors: Boris Ginzburg, Ilya Osadchiy, Ronny Ronen, Eliezer Weissmann, Michael Mishaeli, Alon Naveh, David A. Koufaty, Scott D. Hahn, Tong Li, Avi Mendleson, Eugene Gorbatov, Hisham Abu-Salah, Dheeraj R. Subbareddy, Paolo Narvaez, Aamer Jaleel, Efraim Rotem, Yuval Yosef, Anil Aggarwal, Kenzo Van Craeynest
  • Publication number: 20170147344
    Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
    Type: Application
    Filed: January 10, 2017
    Publication date: May 25, 2017
    Inventors: ENRIQUE DE LUCAS, PEDRO MARCUELLO, OREN BEN-KIKI, ILAN PARDO, YUVAL YOSEF
  • Patent number: 9588771
    Abstract: In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: March 7, 2017
    Assignee: Intel Corporation
    Inventors: Hong Wang, John Shen, Hong Jiang, Richard Hankins, Per Hammarlund, Dion Rodgers, Gautham Chinya, Baiju Patel, Shiv Kaushik, Bryant Bigbee, Gad Sheaffer, Yoav Talgam, Yuval Yosef, James P. Held
  • Publication number: 20170017492
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators.
    Type: Application
    Filed: September 30, 2016
    Publication date: January 19, 2017
    Inventors: Oren Ben-Kiki, ILAN PARDO, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Publication number: 20170017491
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators.
    Type: Application
    Filed: September 30, 2016
    Publication date: January 19, 2017
    Inventors: Oren Ben-Kiki, ILAN PARDO, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef
  • Patent number: 9542193
    Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: January 10, 2017
    Assignee: Intel Corporation
    Inventors: Enrique De Lucas, Pedro Marcuello, Oren Ben-Kiki, Ilan Pardo, Yuval Yosef
  • Publication number: 20160342419
    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
    Type: Application
    Filed: August 2, 2016
    Publication date: November 24, 2016
    Inventors: Ben Oren-Kiki, Yuval Yosef, Ilan Pardo, Dror Markovich
  • Patent number: 9459874
    Abstract: In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: October 4, 2016
    Assignee: Intel Corporation
    Inventors: Hong Wang, John Shen, Hong Jiang, Richard Hankins, Per Hammarlund, Dion Rodgers, Gautham Chinya, Baiju Patel, Shiv Kaushik, Bryant Bigbee, Gad Sheaffer, Yoav Talgam, Yuval Yosef, James P. Held
  • Publication number: 20160246597
    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators.
    Type: Application
    Filed: May 3, 2016
    Publication date: August 25, 2016
    Inventors: Oren Ben-Kiki, llan Pardo, Robert Valentine, Eliezer Weissmann, Dror Markovich, Yuval Yosef