Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20150186183
    Abstract: A processor includes a decoder to decode an instruction, a scheduler to schedule the instruction, and an execution unit to execute the instruction. The instruction is to load a memory operation applicable to a quantity of addresses into an execution vector. The execution vector includes a plurality of vector positions for respective addressees. The instruction is further to evaluate, for a given address in the execution vector at a vector position, whether a cache indicates that a previous memory operation was performed at a higher vector position than the vector position of the given address. The instruction is also to determine, based on the evaluation whether the cache indicates that the previous memory operation was performed at a higher vector position than the vector position of the given address, whether the memory operation will cause a memory error.
    Type: Application
    Filed: December 30, 2013
    Publication date: July 2, 2015
    Inventors: NALINI VASUDEVAN, YOUFENG WU, CHENG WANG, SARA BAGHSORKHI, ALBERT HARTONO
  • Publication number: 20150169226
    Abstract: Technologies for persistent memory programming include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may assign a virtual memory address of a target location in persistent memory to a persistent memory pointer using persistent pointer strategy, and may dereference the pointer using the same strategy. Persistent pointer strategies include off-holder, ID-in-value, optimistic rectification, and pessimistic rectification. The computing device may log changes to persistent memory during the execution of a data consistency section, and commit changes to the persistent memory when the last data consistency section ends. Data consistency sections may be grouped by log group identifier. Using type metadata stored in the nonvolatile region, the computing device may identify the type of a root object within the nonvolatile region and then recursively identify the type of all objects referenced by the root object. Other embodiments are described and claimed.
    Type: Application
    Filed: September 25, 2014
    Publication date: June 18, 2015
    Inventors: Xipeng Shen, Youfeng Wu, Cheng Wang, Hyunchul Park, Hongbo Rong
  • Publication number: 20150160715
    Abstract: In one embodiment, the present invention includes an apparatus having a core including functional units each to execute instructions of a target instruction set architecture (ISA) and a power controller to control a power mode of a first functional unit responsive to a power identification field of a power instruction of a power region of a code block to be executed on the core. Other embodiments are described and claimed.
    Type: Application
    Filed: January 12, 2015
    Publication date: June 11, 2015
    Inventors: Jaewoong Chung, Hanjun Kim, Youfeng Wu
  • Patent number: 8954775
    Abstract: In one embodiment, the present invention includes an apparatus having a core including functional units each to execute instructions of a target instruction set architecture (ISA) and a power controller to control a power mode of a first functional unit responsive to a power identification field of a power instruction of a power region of a code block to be executed on the core. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 20, 2012
    Date of Patent: February 10, 2015
    Assignee: Intel Corporation
    Inventors: Jaewoong Chung, Hanjun Kim, Youfeng Wu
  • Publication number: 20150039861
    Abstract: In an embodiment, a system includes a processor including one or more cores and a plurality of alias registers to store memory range information associated with a plurality of operations of a loop. The memory range information references one or more memory locations within a memory. The system also includes register assignment means for assigning each of the alias registers to a corresponding operation of the loop, where the assignments are made according to a rotation schedule, and one of the alias registers is assigned to a first operation in a first iteration of the loop and to a second operation in a subsequent iteration of the loop. The system also includes the memory coupled to the processor. Other embodiments are described and claimed.
    Type: Application
    Filed: May 30, 2013
    Publication date: February 5, 2015
    Inventors: Hongbo Rong, Cheng Wang, Hyunchul Park, Youfeng Wu
  • Patent number: 8935678
    Abstract: Methods and an apparatus to form a resilient objective instruction construct are provided. An example method obtains a source instruction construct and forms a resilient objective instruction construct by compiling one or more resilient transactions.
    Type: Grant
    Filed: April 9, 2012
    Date of Patent: January 13, 2015
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Cheng Wang, Ho-Seop Kim
  • Publication number: 20140359591
    Abstract: In an embodiment, a system includes a processor including at least one core to execute operations of a loop that includes S stages. The system also includes stage insertion means for adding a delay stage to the loop to increase a lifetime of a corresponding register associated with a first variable of the loop and to delay storage of contents of the register. The system also includes a dynamic random access memory (DRAM). Other embodiments are described and claimed.
    Type: Application
    Filed: May 30, 2013
    Publication date: December 4, 2014
    Inventors: Hyunchul Park, Hongbo Rong, Youfeng Wu
  • Publication number: 20140298306
    Abstract: Apparatuses and methods may provide for determining a level of performance for processing one or more loops by a dynamic compiler and executing code optimizations to generate a pipelined schedule for the one or more loops that achieves the determined level of performance within a prescribed time period. In one example, a dependence graph may be established for the one or more loops, and each dependence graph may be partitioned into stages based on the level of performance.
    Type: Application
    Filed: March 29, 2013
    Publication date: October 2, 2014
    Inventors: Hongbo Rong, Hyunchul Park, Youfeng Wu
  • Publication number: 20140282423
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to manage concurrent predicate expressions. An example method discloses inserting a first condition hook into a first thread, the first condition hook associated with a first condition, inserting a second condition hook into a second thread, the second condition hook associated with a second condition, preventing the second thread from executing until the first condition is satisfied, and identifying a concurrency violation when the second condition is satisfied.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Inventors: Justin E. Gottschlich, Cristiano Ligieri Pereira, Gilles Pokam, Youfeng Wu
  • Publication number: 20140281246
    Abstract: A system, processor, and method to predict with high accuracy and retain instruction boundaries for previously executed instructions in order to decode variable length instructions is disclosed. In at least one embodiment, a disclosed processor includes an instruction fetch unit, an instruction cache, a boundary byte predictor, and an instruction decoder. In some embodiments, the instruction fetch unit provides an instruction address and the instruction cache produces an instruction tag and instruction cache content corresponding to the instruction address. The instruction decoder, in some embodiments, includes boundary byte logic to determine an instruction boundary in the instruction cache content.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventors: Mauricio Breternitz, JR., Youfeng Wu, Peter Sassone, James Mason, Aashish Phansalkar, Balaji Vijayan
  • Publication number: 20140281382
    Abstract: A system and method to enhance execution of architected instructions in a processor uses auxiliary code to optimize execution of base microcode. An execution context of the architected instructions may be profiled to detect potential optimizations, resulting in generation and storage of auxiliary microcode. When the architected instructions are decoded to base microcode for execution, the base microcode may be enhanced or modified using retrieved auxiliary code.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: Intel Corporation
    Inventors: James E. Smith, Denis M. Khartikov, Shiliang Hu, Youfeng Wu
  • Publication number: 20140223166
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Application
    Filed: January 31, 2014
    Publication date: August 7, 2014
    Applicant: INTEL CORPORATION
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Publication number: 20140208085
    Abstract: Logic and instruction to efficiently monitor loop trip count. Loop trip count information of a loop may be stored in a dedicated hardware buffer. Average loop trip count of the loop may be calculated based on the stored loop trip count information. Based on the average trip count, loop optimizations may be applied or removed from the loop. The stored loop trip count information may include an identifier identifying the loop, a total loop trip count of the loop, and an exit count of the loop.
    Type: Application
    Filed: March 30, 2012
    Publication date: July 24, 2014
    Inventors: Jaewoong Chung, Hyunchul Park, Hongbo Rong, Cheng Wang, Youfeng Wu
  • Patent number: 8789031
    Abstract: In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: July 22, 2014
    Assignee: Intel Corporation
    Inventors: Wei Liu, Lixin Su, Youfeng Wu, Herbert Hum
  • Publication number: 20140122845
    Abstract: In one embodiment, the present invention includes a processor having a core to execute instructions. This core can include various structures and logic that enable instructions of different atomic regions to be executed in an overlapping manner. To this end, the core can include a register file having registers to store data for use in execution of the instructions, and multiple shadow register files each to store a register checkpoint on initiation of a given atomic region. In this way, overlapping execution of atomic regions identified by a programmer or compiler can occur. Other embodiments are described and claimed.
    Type: Application
    Filed: December 30, 2011
    Publication date: May 1, 2014
    Inventors: Jaewoong Chung, Cheng Wang, Youfeng Wu
  • Publication number: 20140095778
    Abstract: Methods and apparatus are disclosed to cache code in non-volatile memory. A disclosed example method includes identifying an instance of a code request for first code, identifying whether the first code is stored on non-volatile (NV) random access memory (RAM) cache, and when the first code is absent from the NV RAM cache, adding the first code to the NV RAM cache when a first condition associated with the first code is met and preventing storage of the first code to the NV RAM cache when the first condition is not met.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Jaewoong Chung, Youfeng Wu, Cheng Wang
  • Publication number: 20140096132
    Abstract: Technologies for performing flexible code acceleration on a computing device includes initializing an accelerator virtual device on the computing device. The computing device allocates memory-mapped input and output (I/O) for the accelerator virtual device and also allocates an accelerator virtual device context for a code to be accelerated. The computing device accesses a bytecode of the code to be accelerated and determines whether the bytecode is an operating system-dependent bytecode. If not, the computing device performs hardware acceleration of the bytecode via the memory-mapped I/O using an internal binary translation module. However, if the bytecode is operating system-dependent, the computing device performs software acceleration of the bytecode.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 8683243
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Grant
    Filed: March 11, 2011
    Date of Patent: March 25, 2014
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Publication number: 20140032885
    Abstract: Example methods and apparatus to manage partial commit-checkpoints are disclosed. A disclosed example method includes identifying a commit instruction associated with a region of instructions executed by a processor, identifying candidate instructions from the region of instructions, and generating a processor partial commit-checkpoint to save a current state of the processor, the checkpoint based on calculated register values associated with live instructions, and including instruction reference addresses to link the candidate instructions.
    Type: Application
    Filed: September 30, 2013
    Publication date: January 30, 2014
    Inventors: Edson Borin, Youfeng Wu
  • Publication number: 20140007054
    Abstract: Methods and systems to identify and reproduce concurrency bugs in multi-threaded programs are disclosed. An example method disclosed herein includes defining a data type. The data type includes a first predicate associated with a first thread of a multi-threaded program that is associated with a first condition, a second predicate that is associated with a second thread of the multi-threaded program, the second predicate being associated with a second condition, and an expression that defines a relationship between the first predicate and the second predicate. The relationship, when satisfied, causes the concurrency bug to be detected. A concurrency bug detector conforming to the data type is used to detect the concurrency bug in the multi-threaded program.
    Type: Application
    Filed: June 27, 2012
    Publication date: January 2, 2014
    Inventors: Youfeng Wu, Justin E. Gottschlich, Gilles Pokam, Shiliang Hu, Ali-Reza Adl-Tabatabai, Cristiano L. Pereira