Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7308682
    Abstract: An arrangement is provided for data value recovery in an optimized program by precisely allocating predicate registers to guard branching instructions in the optimized program at compilation time. At execution time, an execution path leading to a recovery point is determined based on values of predicate registers guarding branching blocks. The values of non-current and non-resident data may be recovered at the recovery point according to the determined execution path. Optimization annotations may also be utilized for data value recovery.
    Type: Grant
    Filed: April 25, 2003
    Date of Patent: December 11, 2007
    Assignee: Intel Corporation
    Inventor: Youfeng Wu
  • Publication number: 20070240141
    Abstract: In one embodiment, the present invention includes a method for instrumenting a code block with code to perform dynamic information flow tracking. Then during execution, it may be determined whether a pattern of input data to the code block has been previously received by the code block. If so, the code block may be executed, otherwise the instrumented code block may be executed. Other embodiments are described and claimed.
    Type: Application
    Filed: March 30, 2006
    Publication date: October 11, 2007
    Inventors: Feng Qin, Cheng Wang, Ho-Seop Kim, Yuanyuan Zhou, Youfeng Wu
  • Publication number: 20070174837
    Abstract: An apparatus and method for redundant software thread computation. In one embodiment, the method includes the replication of an application into two communicating threads, a leading thread and a trailing thread. In one embodiment, the trailing thread repeats computations performed by the leading thread to detect transient faults, referred to herein as “soft errors.” A first in, first out (FIFO) buffer of shared memory is reserved for passing data between the leading thread and the trailing thread. The FIFO buffer may include a buffer head variable to write data to the FIFO buffer and a buffer tail variable to read data from the FIFO buffer. In one embodiment, data passing between the leading thread and the trailing thread is restricted according to a data unit size, and thread synchronization between the leading thread and the trailing thread is limited to buffer overflow/underflow detection. Other embodiments are described and claimed.
    Type: Application
    Filed: December 30, 2005
    Publication date: July 26, 2007
    Inventors: Cheng Wang, Youfeng Wu
  • Publication number: 20070094164
    Abstract: A method to compress microcode utilizing a genetic algorithm includes generating a population of chromosomes, each chromosome including one or more elements that indicate a cluster to which a portion of microcode memory belongs. The method further includes determining a fitness value of each chromosome and modifying the population of chromosomes based on the fitness values of the chromosomes to generate a new population of chromosomes. In addition, the method includes compressing the microcode memory using a cluster-based compression technique, wherein clusters are selected according to a chromosome from the new population with the best fitness value. Other embodiments are also disclosed.
    Type: Application
    Filed: September 27, 2005
    Publication date: April 26, 2007
    Inventors: Youfeng Wu, Mauricio Breternitz
  • Publication number: 20070079293
    Abstract: A first potential hot trace of a program is determined. A second potential hot trace of the program is determined. A common path from the first potential hot trace and the second potential hot trace is selected as the selected hot trace of the program.
    Type: Application
    Filed: September 30, 2005
    Publication date: April 5, 2007
    Inventors: Cheng Wang, Bixia Zheng, Ho-seop Kim, Mauricio Breternitz, Youfeng Wu
  • Publication number: 20070079296
    Abstract: Selected regions of native instructions translated in a DBT environment from non-native instructions are compressed based on the independent compression of different fields of selected instructions using compression tables to reduce a length of selected fields. The regions of compressed instructions are stored and de-compressed into the native instructions during subsequent execution using de-compression tables. Specifically, for native instructions of a selected region, selected types of opcodes and/or operands may be compressed independently. The types may be selected by profiling the opcodes using benchmark programs and creating an opcode conversion table prior to compression, and scanning of the operands and creating an operand conversion table during compression of the opcodes.
    Type: Application
    Filed: September 30, 2005
    Publication date: April 5, 2007
    Inventors: Zhiyuan Li, Youfeng Wu
  • Publication number: 20070079304
    Abstract: A method and apparatus for dynamic binary translator to support precise exceptions with minimal optimization constraints. In one embodiment, the method includes the translation of a source binary application generated for a source instruction set architecture (ISA) into a sequential, intermediate representation (IR) of the source binary application. In one embodiment, the sequential IR is modified to incorporate exception recovery information for each of the exception instructions identified from the source binary application to enable a dynamic binary translator (DBT) to represent exception recovery values as regular values used by IR instructions. In one embodiment, the sequential IR may be optimized with a constraint on movement of an exception instruction downward past an irreversible instruction to form a non-sequential IR. In one embodiment, the non-sequential IR is optimized to form a translated binary application for a target ISA. Other embodiments are described and claimed.
    Type: Application
    Filed: September 30, 2005
    Publication date: April 5, 2007
    Inventors: Bixia Zheng, Cheng Wang, Ho-seop Kim, Mauricio Breternitz, Youfeng Wu
  • Patent number: 7188234
    Abstract: A data processing apparatus, a computer, an article including a machine-accessible medium, and a method of processing data are disclosed. The data processing apparatus may include a pair of pipelines sharing an instruction cache, data cache, and a branch predictor with the second pipeline running ahead of the first pipeline using a data value prediction module. The pipelines may be included in one or more processors and coupled to a memory to form a computer. The method includes executing a plurality of instructions using the pipeline pair, such that when a cache miss is encountered by the second pipeline during execution of a LOAD instruction, the data value prediction module supplies a predicted load value in lieu of a cached value, enabling continued execution of the plurality of instructions by the second pipeline without waiting for the return of the cached value.
    Type: Grant
    Filed: December 12, 2001
    Date of Patent: March 6, 2007
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Tin-Fook Ngai
  • Publication number: 20070022279
    Abstract: An arrangement is provided for compressing microcode ROM (“uROM”) in a processor and for efficiently accessing a compressed uROM. A clustering-based approach may be used to effectively compress a uROM. The approach groups similar columns of microcode into different clusters and identifies unique patterns within each cluster. Only unique patterns identified in each cluster are stored in a pattern storage. Indices, which help map an address of a microcode word (“uOP”) to be fetched from a uROM to unique patterns required for the uOP, may be stored in an index storage. Typically, it takes longer to fetch a uOP from a compressed uROM than from an uncompressed uROM. The compressed uROM may be so designed that the process of fetching a uOP (or uOPs) from a compressed uROM may be fully pipelined to reduce the access latency.
    Type: Application
    Filed: July 20, 2005
    Publication date: January 25, 2007
    Inventors: Youfeng Wu, Sangwook Kim, Mauricio Breternitz, Herbert Hum
  • Patent number: 7120749
    Abstract: According to one embodiment, a system is disclosed. The system includes a central processing unit (CPU), a first cache memory coupled to the CPU to store only data for vital loads that are to be immediately processed at the CPU, a second cache memory coupled to the CPU to store data for semi-vital loads to be processed at the CPU, and a third cache memory coupled to the CPU, the first cache memory and the second cache memory to store non-vital loads to be processed at the CPU.
    Type: Grant
    Filed: March 18, 2004
    Date of Patent: October 10, 2006
    Assignee: Intel Corporation
    Inventors: Ryan Rakvic, Youfeng Wu, Bryan Black, John Shen
  • Publication number: 20060206886
    Abstract: In a method for reducing code size, replaceable subsets of instructions at first locations in areas of infrequently executed instructions in a set of instructions and target subsets of instructions at second locations in the set of instructions are identified, wherein each replaceable subset matches at least one target subset. If multiple target subsets of instructions match one replaceable subset of instructions, one of the multiple matching target subsets is chosen as the matching target subset for the one replaceable subset based on whether the multiple target subsets are located in regions of frequently executed code. For each of at least some of the replaceable subsets of instructions, the replaceable subset of instructions is replaced with an instruction to cause the matching target subset of instructions at the second location to be executed.
    Type: Application
    Filed: December 22, 2004
    Publication date: September 14, 2006
    Applicant: INTEL CORPORATION
    Inventors: Youfeng Wu, Mauricio Breternitz
  • Patent number: 7100155
    Abstract: An apparatus and method for profiling candidate reuse regions and candidate load instructions aids in the selection of computation reuse regions and computation reuse instructions with good reuse qualities. Registers holding input values for candidate reuse regions are sampled periodically when the candidate reuse region is encountered. The register contents are combined into set-values. When a relatively small number of set-values account for a large percentage of occurrences, the candidate reuse region may be a good computation reuse region. Load instructions are profiled for the location accessed and the value loaded. The location and value are combined into location-values. The relative occurrence frequency of location-values can be used to evaluate load instructions as candidate instructions for reuse.
    Type: Grant
    Filed: March 10, 2000
    Date of Patent: August 29, 2006
    Assignee: Intel Corporation
    Inventor: Youfeng Wu
  • Patent number: 7095342
    Abstract: In one embodiment, the present invention includes a method to compress data stored in a memory to reduce size and power consumption. The method includes segmenting each word of a code portion into multiple fields, forming tables having unique entries for each of the fields, and assigning a pointer to each of the unique entries in each of the tables. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 31, 2005
    Date of Patent: August 22, 2006
    Assignee: Intel Corporation
    Inventors: Herbert Hum, Mauricio Breternitz, Jr., Youfeng Wu, Sangwook Kim
  • Publication number: 20060136678
    Abstract: In a method for reducing code size, a replaceable subset of instructions at a first location within a set of instructions and a matching target subset of instructions at a second location within the set of instructions are identified. A base offset and a relative offset are determined. The base offset and the relative offset indicate an absolute offset from the first location to the second location. An instruction to cause a base offset storage element to be loaded with the base offset is inserted prior to the first location. The replaceable subset of instructions is replaced with a second instruction to cause a program counter to be modified based on the relative offset and a value in the base offset register so that the modified program counter indicates the second location.
    Type: Application
    Filed: December 22, 2004
    Publication date: June 22, 2006
    Applicant: INTEL CORPORATION
    Inventors: Youfeng Wu, Mauricio Breternitz
  • Patent number: 7039909
    Abstract: A method and apparatus for providing compiler transformation of code using regions with simplified data and control flow and value specialization are described. In one embodiment, the method includes identifying in the code a plurality of potential candidates for value specialization, selecting a group of candidates from the plurality of potential candidates based on a value profile associated with each potential candidate, and determining specialized data for each selected candidate using a corresponding value profile. The method further includes forming a plurality of optimized regions based on corresponding specialized data. Each optimized region includes one or more selected candidates.
    Type: Grant
    Filed: September 29, 2001
    Date of Patent: May 2, 2006
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Li-Ling Chen
  • Patent number: 7032217
    Abstract: A method and system for collaborative profiling for continuous detection of profile phase transitions is disclosed. In one embodiment, the method comprises using hardware and software to perform continuous edge profiling on a program; detecting profile phase transitions continuously; and optimizing the program based upon the profile phase transitions and edge profile.
    Type: Grant
    Filed: March 26, 2001
    Date of Patent: April 18, 2006
    Assignee: Intel Corporation
    Inventor: Youfeng Wu
  • Publication number: 20050289203
    Abstract: Methods are disclosed to implement bit scan operations using properties of two's complement arithmetic and compute zero index instructions. A data value may be provided and the most-significant or least-significant bit may be determined using the methods set forth herein.
    Type: Application
    Filed: June 24, 2004
    Publication date: December 29, 2005
    Inventors: Mauricio Breternitz, Youfeng Wu, Tal Abir
  • Patent number: 6964043
    Abstract: The present invention relates to a method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code. The method includes compiling a computer program. The method further includes improving performance of the computer program by optimizing frequently executed code and using compiler transformation to handle infrequently executed code with hardware support. The method also includes storing temporarily the results produced during execution of a region to improve performance of the computer program. The method additionally includes committing the results produced when the execution of the region is completed successfully.
    Type: Grant
    Filed: October 30, 2001
    Date of Patent: November 8, 2005
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Li-Ling Chen
  • Publication number: 20050240896
    Abstract: A method, machine readable medium, and system are disclosed. In one embodiment the method comprises collecting a loop trip count continuously during runtime of a region of code being executed that contains a loop, categorizing the trip count to identify one or more code modification techniques applicable to the loop, and dynamically applying the one or more applicable code modification techniques to alter the code that relates to the loop.
    Type: Application
    Filed: March 31, 2004
    Publication date: October 27, 2005
    Inventors: Youfeng Wu, Mauricio Breternitz
  • Patent number: 6959435
    Abstract: A compiler-directed speculative approach to resolve performance-degrading long latency events in an application is described. One or more performance-degrading instructions are identified from multiple instructions to be executed in a program. A set of instructions prefetching the performance-degrading instruction is defined within the program. Finally, at least one speculative bit of each instruction of the identified set of instructions is marked to indicate a predetermined execution of the instruction.
    Type: Grant
    Filed: September 28, 2001
    Date of Patent: October 25, 2005
    Assignee: Intel Corporation
    Inventors: Dz-Ching Ju, Youfeng Wu
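
Illustrative note on publication 20050289203 (bit scan operations): the abstract mentions using properties of two's complement arithmetic to find the least-significant or most-significant set bit of a data value. The sketch below shows only the well-known textbook identity `x & -x`, which isolates the lowest set bit. It is not taken from the application and is not a rendering of its claimed method; the function names are hypothetical, and the "compute zero index" instructions named in the abstract are not modeled. Python's `int.bit_length()` stands in for a hardware index computation.

```python
def lowest_set_bit_index(x: int) -> int:
    """Index of the least-significant set bit of x, or -1 if x is 0.

    Hypothetical helper for illustration; assumes x is non-negative.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return -1
    # In two's complement, -x equals ~x + 1, so x & -x keeps only
    # the lowest set bit of x and clears every other bit.
    isolated = x & -x
    # isolated is a power of two; its bit length minus one is its index.
    return isolated.bit_length() - 1


def highest_set_bit_index(x: int) -> int:
    """Index of the most-significant set bit of x, or -1 if x is 0.

    Hypothetical helper for illustration; assumes x is non-negative.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return x.bit_length() - 1
```

For example, for the value `0b101000` the lowest set bit is at index 3 and the highest at index 5.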