Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7865885
    Abstract: Dynamic optimization of application code is performed by selecting a portion of the application code as a possible transaction. A transaction has a property that when it is executed, it is either atomically committed or atomically aborted. Determining whether to convert the selected portion of the application code to a transaction includes determining whether to apply at least one of a group of code optimizations to the portion of the application code. If it is determined to apply at least one of the code optimizations of the group of optimizations to the portion of application code, then the optimization is applied to the portion of the code and the portion of the code is converted to a transaction.
    Type: Grant
    Filed: September 27, 2006
    Date of Patent: January 4, 2011
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Cheng Wang, Ho-seop Kim
  • Publication number: 20100306512
    Abstract: A method and apparatus for efficient register checkpointing is herein described. A transaction is detected in program code. A recovery block is inserted in the program code to perform recovery operations in response to an abort of the first transaction. A roll-back edge is potentially inserted from an abort point to the recovery block. A control flow edge is inserted from the recovery block to a entry point of the transaction. Checkpoint code is inserted before the entry point to backup live-in registers in backup storage elements and recovery code is inserted in the recovery block to restore the live-in registers from the backup storage elements in response to an abort of the transaction.
    Type: Application
    Filed: August 13, 2010
    Publication date: December 2, 2010
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 7844946
    Abstract: Methods and an apparatus for forming a transaction object instruction construct are provided. An example method translates a source instruction construct to form a transactional objective instruction construct, executes the transactional objective instruction construct, intercepts an aborted transaction associated with the transactional objective instruction construct during execution, maintains a graph of nodes and edges associated with the executed transactional objective instruction construct to predict a deadlock situation, and resolves the deadlock situation associated with the transactional objective instruction construct based on the graph.
    Type: Grant
    Filed: September 26, 2006
    Date of Patent: November 30, 2010
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Cheng Wang
  • Patent number: 7840953
    Abstract: In a method for reducing code size a replaceable subset of instructions at a first location within a set of instructions and a matching target subset of instructions at a second location within the set of instructions are identified. A base offset and a relative offset are determined. The base offset and the relative offset indicate an absolute offset from the first location to the second location. An instruction to cause a base offset storage element to be loaded with the base offset is inserted prior to the first location. The replaceable subset of instructions is replaced with a second instruction to cause a program counter to be modified based on the relative offset and a value in the base offset register so that the modified program counter indicates the second location.
    Type: Grant
    Filed: December 22, 2004
    Date of Patent: November 23, 2010
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Mauricio Breternitz, Jr.
  • Patent number: 7818744
    Abstract: An apparatus and method for redundant transient fault detection. In one embodiment, the method includes the replication of an application into two communicating threads, a leading thread and a trailing thread. The trailing thread may repeat computations performed by the leading thread to detect transient faults, referred to herein as “soft errors.” A first in, first out (FIFO) buffer of shared memory is reserved for passing data between the leading thread and the trailing thread. The FIFO buffer may include a buffer head variable to write data to the FIFO buffer and a buffer tail variable to read data from the FIFO buffer. In one embodiment, data passing between the leading thread data buffering is restricted according to a data unit size and thread synchronization between a leading thread and the trailing thread is limited to buffer overflow/underflow detection. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: October 19, 2010
    Assignee: Intel Corporation
    Inventors: Cheng C. Wang, Youfeng Wu
  • Patent number: 7802136
    Abstract: A method and apparatus for efficient register checkpointing is herein described. A transaction is detected in program code. A recovery block is inserted in the program code to perform recovery operations in response to an abort of the first transaction. A roll-back edge is potentially inserted from an abort point to the recovery block. A control flow edge is inserted from the recovery block to a entry point of the transaction. Checkpoint code is inserted before the entry point to backup live-in registers in backup storage elements and recovery code is inserted in the recovery block to restore the live-in registers from the backup storage elements in response to an abort of the transaction.
    Type: Grant
    Filed: December 28, 2006
    Date of Patent: September 21, 2010
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 7757221
    Abstract: A method and apparatus for dynamic binary translator to support precise exceptions with minimal optimization constraints. In one embodiment, the method includes the translation of a source binary application generated for a source instruction set architecture (ISA) into a sequential, intermediate representation (IR) of the source binary application. In one embodiment, the sequential IR is modified to incorporate exception recovery information for each of the exception instructions identified from the source binary application to enable a dynamic binary translator (DBT) to represent exception recovery values as regular values used by IR instructions. In one embodiment, the sequential IR may be optimized with a constraint on movement of an exception instruction downward past an irreversible instruction to form a non-sequential IR. In one embodiment, the non-sequential IR is optimized to form a translated binary application for a target ISA. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: July 13, 2010
    Assignee: Intel Corporation
    Inventors: Bixia Zheng, Cheng C. Wang, Ho-seop Kim, Mauricio Breternitz, Jr., Youfeng Wu
  • Patent number: 7752613
    Abstract: A method and apparatus for disambiguating in a dynamic binary translator is described. The method comprises selecting a code segment for load-store memory disambiguation based at least in part on a measure of likelihood of frequency of execution of the code segment; heuristically identifying one or more ambiguous memory dependencies in the code segment for disambiguation by runtime checks; based at least in part on inspecting instructions in the code segment, and using a pointer analysis of the code segment to identify all other ambiguous memory dependencies that can be removed by the runtime checks.
    Type: Grant
    Filed: December 5, 2006
    Date of Patent: July 6, 2010
    Assignee: Intel Corporation
    Inventors: Bolei Guo, Youfeng Wu
  • Publication number: 20100169861
    Abstract: A method and apparatus for optimizing parallelized single threaded programs is herein described. Code regions, such as dependency chains, are replicated utilizing any known method, such as dynamic code replication. A flow network associated with a replicated code region is built and a minimum cut algorithm is applied to determine duplicated nodes, which may include a single instruction or a group of instructions, to be removed. The dependency of removed nodes is fulfilled with inserted communication to ensure proper data consistency of the original single-threaded program. As a result, both performance and power consumption is optimized for parallel code sections through removal of expensive workload nodes and replacement with communication between other replicated code regions to be executed in parallel.
    Type: Application
    Filed: December 29, 2008
    Publication date: July 1, 2010
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 7725887
    Abstract: In a method for reducing code size, replaceable subsets of instructions at first locations in areas of infrequently executed instructions in a set of instructions and target subsets of instructions at second locations in the set of instructions are identified, wherein each replaceable subset matches at least one target subset. If multiple target subsets of instructions match one replaceable subset of instructions, one of the multiple matching target subsets is chosen as the matching target subset for the one replaceable subset based on whether the multiple target subsets are located in regions of frequently executed code. For each of at least some of the replaceable subsets of instructions, the replaceable subset of instructions is replaced with an instruction to cause the matching target subset of instructions at the second location to be executed.
    Type: Grant
    Filed: December 22, 2004
    Date of Patent: May 25, 2010
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Mauricio Breternitz, Jr.
  • Patent number: 7703088
    Abstract: Selected regions of native instructions translated in a DBT environment from non-native instructions are compressed based on the independent compression of different fields of selected instructions using compression tables to reduce a length of selected fields. The regions of compressed instructions are stored and de-compressed into the native instructions during subsequent execution using de-compression tables. Specifically, for native instructions of a selected region, selected types of opcodes and/or operands may be compressed independently. The types may be selected by profiling the opcodes using benchmark programs and creating an opcode conversion table prior to compression, and scanning of the operands and creating an operand conversion table during compression of the opcodes.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: April 20, 2010
    Assignee: Intel Corporation
    Inventors: Zhiyuan Li, Youfeng Wu
  • Patent number: 7694281
    Abstract: A first potential hot trace of a program is determined. A second potential hot trace of the program is determined. A common path from the first potential hot trace and the second potential hot trace is selected as the selected hot trace of the program.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: April 6, 2010
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Bixia Zheng, Ho-seop Kim, Mauricio Breternitz, Jr., Youfeng Wu
  • Publication number: 20100083236
    Abstract: Methods and apparatus relating to compact trace trees for dynamic binary parallelization are described. In one embodiment, a compact trace tree (CTT) is generated to improve the effectiveness of dynamic binary parallelization. CTT may be used to determine which traces are to be duplicated and specialized for execution on separate processing elements. Other embodiments are also described and claimed.
    Type: Application
    Filed: September 30, 2008
    Publication date: April 1, 2010
    Inventors: Joao Paulo Porto, Edson Borin, Youfeng Wu, Cheng Wang
  • Publication number: 20090313616
    Abstract: A method and apparatus for improving parallelism through optimal code replication is herein described. An optimal replication factor for code is determined based on costs associated with a plurality of replication factors. The code is replicated by the optimal replication factor, and then the code is potentially executed in parallel to obtain parallelized efficient execution.
    Type: Application
    Filed: June 16, 2008
    Publication date: December 17, 2009
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 7620781
    Abstract: Implementation of a Bloom filter using multiple single-ported memory slices. A control value is combined with a hashed address value such that the resultant address value has the property that one, and only one, of the k memories or slices is selected for a given input value, a, for each bank. Collisions are thereby avoided and the multiple hash accesses for a given input value, a, may be performed concurrently. Other embodiments are also described and claimed.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: November 17, 2009
    Assignee: Intel Corporation
    Inventors: Mauricio Breternitz, Jr., Youfeng Wu, Peter G. Sassone, Jeffrey P. Rupley, II, Wesley Attrot, Bryan Black
  • Publication number: 20090172644
    Abstract: Methods, systems and machine readable media are disclosed for performing dynamic information flow tracking. One method includes executing operations of a program with a main thread, and tracking the main thread's execution of the operations of the program with a tracking thread. The method further includes updating, with the tracking thread, a taint value associated with the value of the main thread to reflect whether the value is tainted, and determining, with the tracking thread based upon the taint value, whether use of the value by the main thread violates a specific security policy.
    Type: Application
    Filed: December 27, 2007
    Publication date: July 2, 2009
    Inventors: Vijayanand Nagarajan, Ho-Seop Kim, Youfeng Wu, Rajiv Gupta
  • Publication number: 20090172654
    Abstract: Disclosed are methods, machine readable medium and systems that dynamically translate binary programs. The dynamic binary translation may include identifying a hot code trace of a program. The translation may further include determining a completion ratio for the hot code trace. The translation may also include packaging the hot code trace into a transactional memory region in response to the completion ratio having a predetermined relationship to a threshold ratio.
    Type: Application
    Filed: December 28, 2007
    Publication date: July 2, 2009
    Inventors: Chengyan Zhao, Cheng Wang, Youfeng Wu
  • Publication number: 20090172713
    Abstract: Methods and apparatuses enable on-demand instruction emulation via user-level exception handling. A non-supported instruction triggers an exception during runtime of a program. In response to the exception, a user-level or application-level exception handler is launched, instead of a kernel-level handler. Then the exception handler can execute at the application layer instead of the kernel level. The handler identifies the instruction and emulates the instruction, where emulation of the instruction is supported by the handler. Emulating the instructions enables the program to continue execution. Repeated instruction emulation is amortized via dynamic binary translation of hot code.
    Type: Application
    Filed: December 31, 2007
    Publication date: July 2, 2009
    Inventors: Ho-Seop Kim, Mauricio Breternitz, JR., Youfeng Wu
  • Publication number: 20090125894
    Abstract: A method, system, and computer readable medium for converting a series of computer executable instructions in control flow graph form into an intermediate representation, of a type similar to Static Single Assignment (SSA), used in the compiler arts. The indeterminate representation may facilitate compilation optimizations such as constant propagation, sparse conditional constant propagation, dead code elimination, global value numbering, partial redundancy elimination, strength reduction, and register allocation. The method, system, and computer readable medium are capable of operating on the control flow graph to construct an SSA representation in parallel, thus exploiting recent advances in multi-core processing and massively parallel computing systems. Other embodiments may be employed, and other embodiments are described and claimed.
    Type: Application
    Filed: November 14, 2007
    Publication date: May 14, 2009
    Inventors: Sreekumar R. Nair, Youfeng Wu
  • Publication number: 20090077360
    Abstract: In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed.
    Type: Application
    Filed: September 18, 2007
    Publication date: March 19, 2009
    Inventors: Wei Liu, Lixin Su, Youfeng Wu, Herbert Hum