Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11755099
    Abstract: Example methods and apparatus to facilitate dynamic core selection are disclosed. An example apparatus includes a first processor core of a first type; a second processor core of a second type different from the first type; and software to: access a user-supplied hint indicative of a user preference to execute program code on the first processor core, the user-supplied hint including a user-defined attribute of the program code; monitor performance of the program code on the first processor core; determine, based on the user-defined attribute of the program code, a predicted performance of the program code on the second processor core is better than the performance of the program code on the first processor core; and ignore the user preference by migrating the program code from the first processor core for execution on the second processor core.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: September 12, 2023
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Publication number: 20220326756
    Abstract: Example methods and apparatus to facilitate dynamic core selection are disclosed.
    Type: Application
    Filed: June 28, 2022
    Publication date: October 13, 2022
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Patent number: 10725755
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Grant
    Filed: June 6, 2017
    Date of Patent: July 28, 2020
    Assignee: Intel Corporation
    Inventors: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffrey J. Cook, Omar M. Shaikh, Suresh Srinivas
  • Patent number: 10540178
    Abstract: A processor for redundant stores includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, a binary translator, and a memory execution unit. The binary translator includes circuitry to identify a first region of the instruction stream including a redundant store, mark a first starting instruction of the first region with a protection designator, mark a first ending instruction of the first region with a clear designator, and store an amended instruction stream with the markings. The memory execution unit includes circuitry to track the first redundant store based on the protection designator and the clear designator to eliminate the first redundant store.
    Type: Grant
    Filed: September 14, 2016
    Date of Patent: January 21, 2020
    Assignee: Intel Corporation
    Inventors: Vineeth Mekkat, Youfeng Wu, Sebastian Winkel, Oleg Margulis
  • Patent number: 10534424
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Grant
    Filed: January 2, 2016
    Date of Patent: January 14, 2020
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Patent number: 10489158
    Abstract: A processor of an aspect includes a decode unit to decode a persistent store fence instruction. The processor also includes a memory subsystem module coupled with the decode unit. The memory subsystem module, in response to the persistent store fence instruction, is to ensure that a given data corresponding to the persistent store fence instruction is stored persistently in a persistent storage before data of all subsequent store instructions is stored persistently in the persistent storage. The subsequent store instructions occur after the persistent store fence instruction in original program order. Other processors, methods, systems, and articles of manufacture are also disclosed.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: November 26, 2019
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu, Rajesh M Sankaran
  • Publication number: 20190332158
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Application
    Filed: July 11, 2019
    Publication date: October 31, 2019
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Patent number: 10437319
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Grant
    Filed: January 2, 2016
    Date of Patent: October 8, 2019
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Patent number: 10437318
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Grant
    Filed: January 2, 2016
    Date of Patent: October 8, 2019
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Patent number: 10387296
    Abstract: Methods and systems to identify threads responsible for causing a concurrency bug in a computer program having a plurality of concurrently executing threads are disclosed. An example method disclosed herein includes defining, with a processor, a data type. The data type including a first predicate, the first predicate being invoked using a first program instruction inserted in a first thread of the plurality of threads, a second predicate, the second predicate being invoked using a second program instruction inserted in a second thread of the plurality of threads, and an expression defining a relationship between the first predicate and the second predicate. The method further includes, in response to determining the relationship is satisfied during execution of the computer program, identifying the first thread and the second thread as responsible for the concurrency bug.
    Type: Grant
    Filed: August 26, 2015
    Date of Patent: August 20, 2019
    Assignee: Intel corporation
    Inventors: Youfeng Wu, Justin Gottschlich, Gilles Pokam, Shiliang Hu, Ali-Reza Adl-Tabatabai, Cristiano Pereira
  • Patent number: 10324768
    Abstract: Embodiments described herein utilize restricted transactional memory (RTM) instructions to implement speculative compile time optimizations that will be automatically rolled back by hardware in the event of a missed speculation. In one embodiment, a lightweight version of RTM for speculative compiler optimization is described to provide lower operational overhead in comparison to conventional RTM implementations used when performing SLE.
    Type: Grant
    Filed: December 17, 2014
    Date of Patent: June 18, 2019
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu, Sara S. Baghsorkhi, Albert Hartono, Robert Valentine
  • Patent number: 10303525
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: May 28, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar, Hideki Ido, Youfeng Wu, Cheng Wang
  • Patent number: 10296343
    Abstract: A processing device including a first shadow register, a second shadow register, and an instruction execution circuit, communicatively coupled to the first shadow register and the second shadow register, to receive a sequence of instructions comprising a first local commit marker, a first global commit marker, and a first register access instruction referencing an architectural register, speculatively execute the first register access instruction to generate a speculative register state value associated with a physical register, responsive to identifying the first local commit marker, store, in the first shadow register, the speculative register state value, and responsive to identifying the first global commit marker, store, in the second shadow register, the speculative register state value.
    Type: Grant
    Filed: March 30, 2017
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: Vineeth Mekkat, Jason M. Agron, Youfeng Wu
  • Patent number: 10268497
    Abstract: Methods and apparatus relating to conjugate code generation for efficient dynamic optimizations are described. In an embodiment, a binary code and an intermediate representation (IR) code are generated based at least partially on a source program. The binary code and the intermediate code are transmitted to a virtual machine logic. The binary code and the IR code each include a plurality of regions that are in one-to-one correspondence. Other embodiments are also claimed and described.
    Type: Grant
    Filed: October 24, 2013
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: Hongbo Rong, Hyunchul Park, Cheng Wang, Youfeng Wu
  • Patent number: 10235177
    Abstract: In an example, an apparatus includes a binary translator (BT) including circuitry to: analyze a code block; determine that an architectural register mapped to a physical register in the physical register file is available for early reclamation; and insert a reclamation hint into the code block. In another example, a processor reclaims the physical register based at least in part on the reclamation hint.
    Type: Grant
    Filed: July 2, 2016
    Date of Patent: March 19, 2019
    Assignee: Intel Corporation
    Inventors: Vineeth Mekkat, Janghaeng Lee, Youfeng Wu
  • Patent number: 10191745
    Abstract: In one example a processor includes a region formation engine to identify a region of code for translation from a guest instruction set architecture to a native instruction set architecture. The processor also includes a binary translator to translate the region of code. The region formation engine is to perform aggressive region formation, which includes forming a region across a boundary of a return instruction. The translated region of code is to prevent a side entry into the translated region of code at a translated return target instruction included in the translated region of code. In more specific examples, performing aggressive region formation includes a region formation grow phase and a region formation cleanup phase. In the grow phase priority may be given to growing complete paths from a call target to a corresponding return. The region formation cleanup phase may comprise eliminating call targets that are not reachable.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: January 29, 2019
    Assignee: Intel Corporation
    Inventors: Hou-Jen Ko, Girish Venkatasubramanian, Jason Agron, Tyler Sondag, Youfeng Wu
  • Patent number: 10120686
    Abstract: A processor includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, and a binary translator. The binary translator includes circuitry to identify a redundant store in the instruction stream, mark the start and end of a region of the instruction stream with the redundant store, remove the redundant store, and store an amended instruction stream with the redundant store removed.
    Type: Grant
    Filed: June 7, 2016
    Date of Patent: November 6, 2018
    Assignee: Intel Corporation
    Inventors: Vineeth Mekkat, Oleg Margulis, Ching-Tsun Chou, Youfeng Wu
  • Publication number: 20180285112
    Abstract: A processing device including a first shadow register, a second shadow register, and an instruction execution circuit, communicatively coupled to the first shadow register and the second shadow register, to receive a sequence of instructions comprising a first local commit marker, a first global commit marker, and a first register access instruction referencing an architectural register, speculatively execute the first register access instruction to generate a speculative register state value associated with a physical register, responsive to identifying the first local commit marker, store, in the first shadow register, the speculative register state value, and responsive to identifying the first global commit marker, store, in the second shadow register, the speculative register state value.
    Type: Application
    Filed: March 30, 2017
    Publication date: October 4, 2018
    Inventors: Vineeth Mekkat, Jason M. Agron, Youfeng Wu
  • Publication number: 20180285113
    Abstract: In one example a processor includes a region formation engine to identify a region of code for translation from a guest instruction set architecture to a native instruction set architecture. The processor also includes a binary translator to translate the region of code. The region formation engine is to perform aggressive region formation, which includes forming a region across a boundary of a return instruction. The translated region of code is to prevent a side entry into the translated region of code at a translated return target instruction included in the translated region of code. In more specific examples, performing aggressive region formation includes a region formation grow phase and a region formation cleanup phase. In the grow phase priority may be given to growing complete paths from a call target to a corresponding return. The region formation cleanup phase may comprise eliminating call targets that are not reachable.
    Type: Application
    Filed: March 31, 2017
    Publication date: October 4, 2018
    Applicant: Intel Corporation
    Inventors: Hou-Jen Ko, Girish Venkatasubramanian, Jason Agron, Tyler Sondag, Youfeng Wu
  • Patent number: 10078357
    Abstract: In one embodiment, the present invention includes an apparatus having a core including functional units each to execute instructions of a target instruction set architecture (ISA) and a power controller to control a power mode of a first functional unit responsive to a power identification field of a power instruction of a power region of a code block to be executed on the core. Other embodiments are described and claimed.
    Type: Grant
    Filed: January 12, 2015
    Date of Patent: September 18, 2018
    Assignee: Intel Corporation
    Inventors: Jaewoong Chung, Hanjun Kim, Youfeng Wu