Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20050210197
    Abstract: According to one embodiment, a system is disclosed. The system includes a central processing unit (CPU), a first cache memory coupled to the CPU to store only data for vital loads that are to be immediately processed at the CPU, a second cache memory coupled to the CPU to store data for semi-vital loads to be processed at the CPU, and a third cache memory coupled to the CPU, the first cache memory and the second cache memory to store non-vital loads to be processed at the CPU.
    Type: Application
    Filed: March 18, 2004
    Publication date: September 22, 2005
    Inventors: Ryan Rakvic, Youfeng Wu, Bryan Black, John Shen
  • Publication number: 20050149915
    Abstract: Methods and apparatus for optimizing a program undergoing dynamic binary translation using profile information are disclosed. A disclosed system optimizes foreign program instructions through an enhanced dynamic binary translation process. The foreign program instructions are translated into native program instructions. Loops within the native program instructions are instrumented with profiling instructions and optimized. The profiling information is collected during execution of the loop. After profiling information is collected, the loop may be further optimized by inserting prefetching instructions into the optimized loop. The prefetched loop is then linked back into the native program instructions and is executable.
    Type: Application
    Filed: December 29, 2003
    Publication date: July 7, 2005
    Inventors: Youfeng Wu, Orna Etzion
  • Patent number: 6848100
    Abstract: A hierarchical software profiling mechanism that gathers hierarchical path profile information has been described. Software to be profiled is instrumented with instructions that save an outer path sum when an inner region is entered, and restore the outer path sum when the inner region is exited. When the inner region is being executed, an inner path sum is generated and a profile indicator representing the inner path traversed is updated prior to the outer path sum being restored. The software to be profiled is instrumented using information from augmented control flow graphs that represent the software.
    Type: Grant
    Filed: March 31, 2000
    Date of Patent: January 25, 2005
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Ali Adl-Tabatabai, David A. Berson, Jesse Fang, Rajiv Gupta
  • Patent number: 6836841
    Abstract: In one embodiment, a method for speculatively reusing regions of code includes identifying a reuse region and a data input to the reuse region, determining whether a data output of the reuse region is contained within reuse region instance information pertaining to a plurality of instances of the reuse region, and when the data output is not contained within the reuse region instance information, predicting the data output of the reuse region based on the reuse region instance information.
    Type: Grant
    Filed: June 29, 2000
    Date of Patent: December 28, 2004
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Dong-Yuan Chen
  • Publication number: 20040216095
    Abstract: An arrangement is provided for data value recovery in an optimized program by precisely allocating predicate registers to guard branching instructions in the optimized program at compilation time. At execution time, an execution path leading to a recovery point is determined based on values of predicate registers guarding branching blocks. The values of non-current and non-resident data may be recovered at the recovery point according to the determined execution path. Optimization annotations may also be utilized for data value recovery.
    Type: Application
    Filed: April 25, 2003
    Publication date: October 28, 2004
    Inventor: Youfeng Wu
  • Publication number: 20040133886
    Abstract: Methods and apparatus to compile a software program to manage parallel μcaches are disclosed. The compiler identifies a first set of load instructions for possibly bypassing a first cache and attempts to schedule the software program such that the load instructions in the first set have at least a first predetermined latency greater than the latency of the first cache. The compiler also identifies a second set of load instructions in the scheduled software program having less than the first predetermined latency. The second set of load instructions is marked to access the first cache. The compiler identifies a third set of load instructions for possibly bypassing a second cache and attempts to schedule the software program such that the load instructions in the third set have at least a second predetermined latency greater than the latency of the second cache.
    Type: Application
    Filed: December 17, 2003
    Publication date: July 8, 2004
    Inventor: Youfeng Wu
  • Publication number: 20040078790
    Abstract: Methods and apparatus to manage bypassing of a first cache are disclosed. In one such method, a load instruction having an expected latency greater than or equal to a predetermined threshold is identified. A request is then made to schedule the identified load instruction to have a predetermined latency. The software program is then scheduled. An actual latency associated with the load instruction in the scheduled software program is then compared to the predetermined latency. If the actual latency is greater than or equal to the predetermined latency, the load instruction is marked to bypass the first cache.
    Type: Application
    Filed: October 22, 2002
    Publication date: April 22, 2004
    Inventors: Youfeng Wu, Li-Ling Chen
  • Publication number: 20040015930
    Abstract: A method and system for collaborative profiling for continuous detection of profile phase transitions is disclosed. In one embodiment, the method comprises using hardware and software to perform continuous edge profiling on a program, detecting profile phase transitions continuously, and optimizing the program based upon the profile phase transitions and the edge profile.
    Type: Application
    Filed: March 26, 2001
    Publication date: January 22, 2004
    Inventor: Youfeng Wu
  • Patent number: 6668372
    Abstract: An efficient software profiling technique utilizes a combination of software resources and hardware resources. Control flow graphs are partitioned into single entry regions and then further into blocks. Blocks are separated into profiled blocks and non-profiled blocks. Each profiled block has an existing instruction modified, or an auxiliary instruction added, thereby allowing the generation of a profiling counter address with little or no overhead in terms of end-user program execution speed. A register set is maintained that defines the scope for functions or procedures. The register set includes a base address register and an offset register. Profile counter addresses are generated from the register contents and information included in instructions within profiled blocks.
    Type: Grant
    Filed: October 13, 1999
    Date of Patent: December 23, 2003
    Assignee: Intel Corporation
    Inventor: Youfeng Wu
  • Publication number: 20030204666
    Abstract: A mechanism for maintaining reuse invalidation information includes a reuse buffer and a reuse invalidation buffer. The reuse buffer stores multiple instances of the reuse region. Each instance stored in the reuse buffer is identified by one or more versions. The reuse invalidation buffer contains multiple entries. Each entry in the reuse invalidation buffer includes one or more pairs of pointers pointing to instances and versions of instances held in the reuse buffer.
    Type: Application
    Filed: April 8, 2003
    Publication date: October 30, 2003
    Inventor: Youfeng Wu
  • Publication number: 20030204840
    Abstract: An apparatus and method for one-pass profiling to concurrently generate a frequency profile and a stride profile to enable pre-fetching of irregular program data are described. In one embodiment, the method includes the selective generation of stride profile information according to partially generated frequency profile information to concurrently form a stride profile and a frequency profile during execution of a user program instrumented during a single profiling pass. Once the stride profile and frequency profile are generated, prefetch instructions are inserted into the user program utilizing the stride profile and the frequency profile. In one embodiment, the present invention utilizes profiling to identify regular stride patterns in irregular program code, which is referred to herein as stride profiling.
    Type: Application
    Filed: April 30, 2002
    Publication date: October 30, 2003
    Inventor: Youfeng Wu
  • Patent number: 6629314
    Abstract: A mechanism for maintaining reuse invalidation information includes a reuse buffer and a reuse invalidation buffer. The reuse buffer stores multiple instances of the reuse region. Each instance stored in the reuse buffer is identified by one or more versions. The reuse invalidation buffer contains multiple entries. Each entry in the reuse invalidation buffer includes one or more pairs of pointers pointing to instances and versions of instances held in the reuse buffer.
    Type: Grant
    Filed: June 29, 2000
    Date of Patent: September 30, 2003
    Assignee: Intel Corporation
    Inventor: Youfeng Wu
  • Patent number: 6625725
    Abstract: A speculative code reuse mechanism includes a reuse buffer, a main processing core and a reuse checking core. The reuse buffer includes inputs and outputs of previously executed instances of code reuse regions. Aliased reuse regions are regions that access memory locations that may change between executions of the region. When an aliased code reuse region is encountered and a matching instance exists in the reuse buffer, the main core speculatively executes code occurring after the reuse region, while the reuse checking core executes code from the reuse region to verify the matching instance. If the matching instance is verified, the speculative execution is committed, and if the matching instance is not verified, the speculative execution is squashed.
    Type: Grant
    Filed: December 22, 1999
    Date of Patent: September 23, 2003
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Jesse Fang
  • Publication number: 20030126591
    Abstract: A compiler technique uses profile feedback to determine stride values for memory references, so that prefetch instructions can be inserted for the loads that can be effectively prefetched. The compiler first identifies a set of loads and instruments them to profile the difference between the load addresses in the current iteration and in the previous iteration. The frequency of each stride difference is also profiled, allowing the compiler to insert prefetch instructions for loads with near-constant strides. The compiler employs code analysis to determine the best prefetching distance, to reduce the profiling cost, and to reduce the prefetching overhead.
    Type: Application
    Filed: December 21, 2001
    Publication date: July 3, 2003
    Inventors: Youfeng Wu, Mauricio Serrano
  • Publication number: 20030110366
    Abstract: A data processing apparatus, a computer, an article including a machine-accessible medium, and a method of processing data are disclosed. The data processing apparatus may include a pair of pipelines sharing an instruction cache, data cache, and a branch predictor with the second pipeline running ahead of the first pipeline using a data value prediction module. The pipelines may be included in one or more processors and coupled to a memory to form a computer. The method includes executing a plurality of instructions using the pipeline pair, such that when a cache miss is encountered by the second pipeline during execution of a LOAD instruction, the data value prediction module supplies a predicted load value in lieu of a cached value, enabling continued execution of the plurality of instructions by the second pipeline without waiting for the return of the cached value.
    Type: Application
    Filed: December 12, 2001
    Publication date: June 12, 2003
    Applicant: Intel Corporation
    Inventors: Youfeng Wu, Tin-Fook Ngai
  • Publication number: 20030101442
    Abstract: The present invention relates to a method, apparatus, and system to formulate regions of reusable instructions. The method includes selecting initial regions. The method further includes computing UEU(E,R) and DED(X,R), wherein UEU(E,R) represents a number of upward exposed registers at a main entry E of a region R that are used in the region R and DED(X,R) represents a number of downward exposed registers at a main exit X of the region R that are defined in the region R. The method also includes applying code motion. The method additionally includes applying tail duplication.
    Type: Application
    Filed: September 28, 2001
    Publication date: May 29, 2003
    Inventor: Youfeng Wu
  • Publication number: 20030101444
    Abstract: The present invention relates to a method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code. The method includes compiling a computer program. The method further includes improving performance of the computer program by optimizing frequently executed code and using compiler transformation to handle infrequently executed code with hardware support. The method also includes temporarily storing the results produced during execution of a region to improve performance of the computer program. The method additionally includes committing those results when execution of the region completes successfully.
    Type: Application
    Filed: October 30, 2001
    Publication date: May 29, 2003
    Inventors: Youfeng Wu, Li-Ling Chen
  • Patent number: 6571385
    Abstract: The invention is directed to the transformation of software loops having early exit conditions, thereby allowing the loops to be more effectively converted to a single basic block for software pipelining. The invention assigns a predicate register for each early exit condition of the software loop. The predicate registers are set when the corresponding early exit condition is satisfied. In this manner, when the loop terminates, the predicate registers can be examined to determine which early exit conditions were satisfied. The invention produces loops with lower recurrence-constrained and resource-constrained initiation intervals (II) than conventional techniques.
    Type: Grant
    Filed: March 22, 1999
    Date of Patent: May 27, 2003
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, Dong-Yuan Chen, Youfeng Wu, Daniel M. Lavery
  • Publication number: 20030074653
    Abstract: A compiler-directed speculative approach to resolve performance-degrading long latency events in an application is described. One or more performance-degrading instructions are identified from multiple instructions to be executed in a program. A set of instructions prefetching the performance-degrading instruction is defined within the program. Finally, at least one speculative bit of each instruction of the identified set of instructions is marked to indicate a predetermined execution of the instruction.
    Type: Application
    Filed: September 28, 2001
    Publication date: April 17, 2003
    Inventors: Dz-Ching Ju, Youfeng Wu
  • Publication number: 20030066061
    Abstract: A method and apparatus for providing compiler transformation of code using regions with simplified data and control flow and value specialization are described. In one embodiment, the method includes identifying in the code a plurality of potential candidates for value specialization, selecting a group of candidates from the plurality of potential candidates based on a value profile associated with each potential candidate, and determining specialized data for each selected candidate using a corresponding value profile. The method further includes forming a plurality of optimized regions based on corresponding specialized data. Each optimized region includes one or more selected candidates.
    Type: Application
    Filed: September 29, 2001
    Publication date: April 3, 2003
    Inventors: Youfeng Wu, Li-Ling Chen
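The save/restore scheme in the abstract of patent 6848100 can be illustrated with a small Python sketch. It replays a stream of instrumentation events and assumes the per-edge increments have already been computed from the augmented control flow graph; the event names and the `hierarchical_path_profile` function are illustrative assumptions, not taken from the patent. It shows only the mechanics the abstract describes: the outer path sum is saved when an inner region is entered, the inner path's counter is updated on exit, and only then is the outer sum restored.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, int]


def hierarchical_path_profile(events: Iterable[Event]) -> Tuple[Counter, Counter]:
    """Replay (hypothetical) instrumentation events and count outer and inner path sums.

    Events:
      ("edge", inc)         add the edge's precomputed increment to the current path sum
      ("enter_inner", rid)  save the outer path sum, start a fresh sum for inner region rid
      ("exit_inner", _)     count the inner path, then restore the saved outer sum
      ("outer_exit", _)     count the finished outer path and reset the sum
    """
    outer_paths: Counter = Counter()
    inner_paths: Counter = Counter()      # keyed by (inner region id, inner path sum)
    saved: List[Tuple[int, int]] = []     # stack of (outer path sum, inner region id)
    path_sum = 0
    for kind, value in events:
        if kind == "edge":
            path_sum += value
        elif kind == "enter_inner":
            saved.append((path_sum, value))
            path_sum = 0
        elif kind == "exit_inner":
            outer_sum, region = saved.pop()
            inner_paths[(region, path_sum)] += 1  # update the inner profile first...
            path_sum = outer_sum                  # ...then restore the outer path sum
        elif kind == "outer_exit":
            outer_paths[path_sum] += 1
            path_sum = 0
        else:
            raise ValueError(f"unknown event {kind!r}")
    return outer_paths, inner_paths


# Two executions of an outer region that each pass through inner region 7.
trace = [("edge", 1), ("enter_inner", 7), ("edge", 2), ("exit_inner", 0),
         ("edge", 3), ("outer_exit", 0),
         ("enter_inner", 7), ("exit_inner", 0), ("outer_exit", 0)]
outer, inner = hierarchical_path_profile(trace)
print(dict(outer))  # {4: 1, 0: 1}
print(dict(inner))  # {(7, 2): 1, (7, 0): 1}
```

Because the inner sum starts from zero, the inner region's paths are numbered independently of how the outer region reached it, which keeps the inner counter table small.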
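The reuse-then-predict flow in the abstract of patent 6836841 can be modeled in software roughly as follows. This is a sketch under stated assumptions: `ReuseBuffer` and `run_region` are hypothetical names, region instances are keyed by their input values, and simple last-value prediction stands in for whatever predictor a real design would use. Here the region is just executed after the prediction to check it; an actual hardware design would presumably overlap that check with speculative execution.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional, Tuple


class ReuseBuffer:
    """Hypothetical model of a reuse buffer for one code region: it keeps the
    outputs of the most recently used `capacity` instances, keyed by their inputs."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.instances: "OrderedDict[Hashable, int]" = OrderedDict()

    def lookup(self, inputs: Hashable) -> Optional[int]:
        if inputs in self.instances:
            self.instances.move_to_end(inputs)  # mark as most recently used
            return self.instances[inputs]
        return None

    def predict(self) -> Optional[int]:
        # Assumed predictor: the output of the most recently used instance.
        if not self.instances:
            return None
        return self.instances[next(reversed(self.instances))]

    def record(self, inputs: Hashable, output: int) -> None:
        self.instances[inputs] = output
        self.instances.move_to_end(inputs)
        if len(self.instances) > self.capacity:
            self.instances.popitem(last=False)  # evict the least recently used instance


def run_region(buf: ReuseBuffer, region: Callable[..., int],
               inputs: Tuple[int, ...]) -> Tuple[int, str]:
    """Return (output, outcome) for one dynamic execution of the region."""
    hit = buf.lookup(inputs)
    if hit is not None:
        return hit, "reused"           # inputs match a stored instance: skip the region
    guess = buf.predict()              # no match: predict from stored instances
    actual = region(*inputs)           # execute the region to check the prediction
    buf.record(inputs, actual)
    if guess is None:
        return actual, "executed"
    return actual, ("prediction-correct" if guess == actual else "squashed")


def tens(x: int) -> int:
    return x // 10


buf = ReuseBuffer(capacity=2)
print(run_region(buf, tens, (12,)))  # (1, 'executed')
print(run_region(buf, tens, (12,)))  # (1, 'reused')
print(run_region(buf, tens, (15,)))  # (1, 'prediction-correct')
print(run_region(buf, tens, (27,)))  # (2, 'squashed')
```

A "squashed" outcome stands for the case where work done speculatively with the predicted value would have to be discarded.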
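The bypass-marking steps listed in the abstract of application 20040078790 can be sketched as a small compiler pass in Python. The `Load` record, its `slack` field and `toy_schedule` are hypothetical stand-ins for a real instruction scheduler. The part that follows the abstract is the final comparison: only loads whose scheduled latency actually reached the requested latency are marked to bypass the first cache.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Load:
    name: str
    expected_latency: int  # cycles the load is expected to take, e.g. from a miss profile
    slack: int             # hypothetical: most cycles the scheduler can place before first use
    scheduled_latency: int = 0
    bypass_first_cache: bool = False


Scheduler = Callable[[List[Load], Dict[str, int]], None]


def toy_schedule(loads: List[Load], requests: Dict[str, int]) -> None:
    """Hypothetical scheduler: grants a requested latency only up to the load's slack;
    loads without a request get a 1-cycle load-to-use distance."""
    for ld in loads:
        ld.scheduled_latency = min(requests.get(ld.name, 1), ld.slack)


def mark_bypass(loads: List[Load], threshold: int, requested_latency: int,
                schedule: Scheduler = toy_schedule) -> List[Load]:
    # 1. Identify loads whose expected latency is at or above the threshold.
    candidates = [ld for ld in loads if ld.expected_latency >= threshold]
    # 2. Request the predetermined latency for those loads, then schedule the program.
    schedule(loads, {ld.name: requested_latency for ld in candidates})
    # 3. Mark a candidate to bypass only if the schedule gave it the requested latency.
    for ld in candidates:
        ld.bypass_first_cache = ld.scheduled_latency >= requested_latency
    return loads


loads = [Load("a", expected_latency=20, slack=12),
         Load("b", expected_latency=20, slack=4),
         Load("c", expected_latency=2, slack=12)]
mark_bypass(loads, threshold=10, requested_latency=10)
print([ld.name for ld in loads if ld.bypass_first_cache])  # ['a']
```

Load `b` is a long-latency candidate, but the scheduler could only give it 4 cycles, so it stays on the first cache; bypassing it would likely expose the slower cache's latency as a stall.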
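A minimal sketch of the stride-profiling idea in the abstract of application 20030126591, assuming the profiler has already recorded the sequence of addresses touched by one load. `profile_strides`, `dominant_stride`, the 0.7 "near-constant" threshold and the ceiling-based `prefetch_distance` rule are illustrative assumptions, not the compiler's actual heuristics.

```python
from collections import Counter
from typing import Optional, Sequence


def profile_strides(addresses: Sequence[int]) -> Counter:
    """Count each stride: the address difference between consecutive executions of one load."""
    return Counter(cur - prev for prev, cur in zip(addresses, addresses[1:]))


def dominant_stride(addresses: Sequence[int], threshold: float = 0.7) -> Optional[int]:
    """Return the most frequent non-zero stride if it accounts for at least
    `threshold` of all observed strides (a near-constant stride); otherwise None."""
    strides = profile_strides(addresses)
    total = sum(strides.values())
    if total == 0:
        return None
    stride, count = strides.most_common(1)[0]
    if stride != 0 and count / total >= threshold:
        return stride
    return None


def prefetch_distance(stride: int, miss_latency: int, iteration_cycles: int) -> int:
    """Hypothetical rule: prefetch enough iterations ahead to hide the miss latency."""
    iterations_ahead = -(-miss_latency // iteration_cycles)  # ceiling division
    return stride * iterations_ahead


# Addresses seen by one load in a pointer-chasing loop whose nodes happen to be
# laid out 64 bytes apart, with one irregular jump in the middle.
trace = [1000, 1064, 1128, 1192, 1256, 5000, 5064, 5128, 5192, 5256]
s = dominant_stride(trace)
if s is not None:
    print(f"prefetch {prefetch_distance(s, 200, 30)} bytes ahead")  # prefetch 448 bytes ahead
```

The single 3744-byte jump does not prevent a prefetch, because 8 of the 9 observed strides are 64 bytes. A load whose strides are scattered would get no prefetch instruction.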