Patents by Inventor Yuan C. Chou

Yuan C. Chou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120131311
    Abstract: The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access.
    Type: Application
    Filed: November 23, 2010
    Publication date: May 24, 2012
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
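
A minimal C++ sketch of the discontinuous instruction-prefetch idea described in the abstract above; the predictor table, cache-line size, and training rule are assumptions for illustration, not details taken from the filing:

```cpp
// Illustrative model of the discontinuous instruction-prefetch idea in
// publication 20120131311. The predictor table, line size, and training rule
// are assumptions, not details from the filing.
#include <cstdint>
#include <iostream>
#include <unordered_map>

constexpr uint64_t kLineBytes = 64;

struct DiscontinuityPredictor {
    std::unordered_map<uint64_t, uint64_t> next_line;  // line -> discontinuous successor seen last time
    uint64_t prev_line = 0;

    // Called on a cache miss or on the first demand fetch of a prefetched line.
    void on_demand_fetch(uint64_t fetch_addr) {
        uint64_t line = fetch_addr / kLineBytes;
        bool discontinuous = (line != prev_line && line != prev_line + 1);
        if (discontinuous) {
            next_line[prev_line] = line;               // train: remember the observed jump
            if (auto it = next_line.find(line); it != next_line.end())
                std::cout << "prefetch line 0x" << std::hex << it->second << std::dec << "\n";
        }
        prev_line = line;
    }
};

int main() {
    DiscontinuityPredictor p;
    p.on_demand_fetch(0x1000);   // miss in straight-line code
    p.on_demand_fetch(0x8000);   // jump into a distant function: trains the predictor
    p.on_demand_fetch(0x1040);   // return path
    p.on_demand_fetch(0x8000);   // same jump again: the line that followed it last time is prefetched
}
```
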
  • Patent number: 8099586
    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: January 17, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
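
A cycle-level sketch of the early-flush policy in patent 8099586: if a mispredicted branch lingers unretired for too many cycles, the whole pipeline is flushed and fetch restarts at the oldest in-flight instruction, with the correct outcome saved first. The structure names and the cycle threshold are assumptions.

```cpp
#include <cstdint>
#include <iostream>

struct SavedOutcome { uint64_t branch_pc; bool taken; bool valid; };

struct EarlyFlushUnit {
    static constexpr int kThreshold = 32;   // assumed cycle threshold
    int timer = 0;
    bool tracking = false;
    SavedOutcome outcome{};

    void on_mispredict(uint64_t branch_pc, bool correct_taken) {
        tracking = true;
        timer = 0;
        outcome = {branch_pc, correct_taken, true};   // store correct outcome plus identifying PC
    }
    void on_branch_retire() { tracking = false; timer = 0; }

    // Returns the refetch PC if a flush should happen this cycle, else 0.
    uint64_t tick(uint64_t oldest_in_flight_pc) {
        if (!tracking) return 0;
        if (++timer >= kThreshold) {
            tracking = false;
            return oldest_in_flight_pc;   // flush everything, refetch from the oldest instruction
        }
        return 0;
    }
};

int main() {
    EarlyFlushUnit u;
    u.on_mispredict(0x4008, /*correct_taken=*/true);
    for (int cycle = 0; cycle < 40; ++cycle)
        if (uint64_t pc = u.tick(/*oldest_in_flight_pc=*/0x4000))
            std::cout << "flush at cycle " << cycle << ", refetch from 0x"
                      << std::hex << pc << "\n";
}
```
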
  • Patent number: 8086804
    Abstract: A method for pre-fetching data. The method includes obtaining a pre-fetch request. The pre-fetch request identifies new data to pre-fetch from memory and store in a cache. The method further includes identifying a set in the cache to store the new data and identifying a value of a hotness indicator for the set. The hotness indicator value defines a number of replacements of at least one line in the set. The method further includes determining whether the value of the hotness indicator exceeds a predefined threshold, and storing the new data in the set when the value of the hotness indicator does not exceed the predefined threshold.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: December 27, 2011
    Assignee: Oracle America, Inc.
    Inventor: Yuan C. Chou
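
A small sketch of the hotness-based admission filter in patent 8086804: a prefetched line is installed only if its target set has not seen too many recent replacements. The set count, counter, and threshold are assumed values.

```cpp
#include <array>
#include <cstdint>
#include <iostream>

constexpr int kSets = 256;
constexpr int kHotThreshold = 8;          // assumed threshold

std::array<int, kSets> hotness{};         // replacements observed per set

int set_index(uint64_t line_addr) { return static_cast<int>(line_addr % kSets); }

// Called by the replacement logic whenever a line is evicted from a set.
void on_replacement(uint64_t line_addr) { ++hotness[set_index(line_addr)]; }

// Returns true if the prefetched line should actually be stored in the cache.
bool admit_prefetch(uint64_t line_addr) {
    return hotness[set_index(line_addr)] <= kHotThreshold;   // skip "hot" sets
}

int main() {
    for (int i = 0; i < 10; ++i) on_replacement(0x40);        // the set holding 0x40 becomes hot
    std::cout << std::boolalpha
              << "admit into hot set:  " << admit_prefetch(0x40) << "\n"
              << "admit into cold set: " << admit_prefetch(0x41) << "\n";
}
```
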
  • Publication number: 20110276760
    Abstract: Techniques are disclosed relating to a processor that supports a non-committing store instruction, which is executable during a scouting thread to provide data to a subsequently executed load instruction. The processor may include a memory access unit configured to perform an instance of the non-committing store instruction by storing, in an entry of a store buffer, a value destined for a memory address, without committing the instance of the non-committing store instruction. In response to subsequently receiving an instance of a load instruction of the scouting thread that specifies a load from the memory address, the memory access unit is configured to perform the instance of the load instruction by retrieving the value. The memory access unit may retrieve the value from the store buffer or from a cache of the processor.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 10, 2011
    Inventor: Yuan C. Chou
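
A toy model of the non-committing store described in the entry above: a scouting thread's store leaves its value in a store buffer that is never committed, and a later scouting load is satisfied from that buffer before falling back to the cache. All names here are illustrative.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_map>

struct ScoutMemoryUnit {
    std::unordered_map<uint64_t, uint64_t> store_buffer;  // addr -> value, never committed
    std::unordered_map<uint64_t, uint64_t> cache;         // stands in for the data cache

    void noncommitting_store(uint64_t addr, uint64_t value) {
        store_buffer[addr] = value;                        // buffered only; no commit, no writeback
    }
    std::optional<uint64_t> scout_load(uint64_t addr) const {
        if (auto it = store_buffer.find(addr); it != store_buffer.end()) return it->second;
        if (auto it = cache.find(addr); it != cache.end()) return it->second;
        return std::nullopt;                               // would trigger a (prefetching) miss
    }
    void discard_scout_state() { store_buffer.clear(); }   // scouting results are thrown away
};

int main() {
    ScoutMemoryUnit mu;
    mu.cache[0x200] = 7;
    mu.noncommitting_store(0x100, 42);
    std::cout << *mu.scout_load(0x100) << " " << *mu.scout_load(0x200) << "\n";  // 42 7
    mu.discard_scout_state();
}
```
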
  • Publication number: 20110258415
    Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency.
    Type: Application
    Filed: June 30, 2011
    Publication date: October 20, 2011
    Applicant: SUN MICROSYSTEMS, INC.
    Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks
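
A rough model of the "evil twin" repair described above: when a double-precision read finds its two single-precision halves renamed to different physical registers, fix-up operations are inserted so both halves end up in one physical register. The structures, register numbers, and operation names are invented for illustration.

```cpp
#include <iostream>
#include <string>
#include <vector>

struct RenameState {
    // Physical register currently holding each single-precision half of %dN.
    std::vector<int> even_half{-1, -1}, odd_half{-1, -1};   // two logical DP registers for the demo
};

// Returns the operations to insert before a double-precision read of logical register d.
std::vector<std::string> before_dp_read(RenameState& rs, int d, int& next_phys) {
    if (rs.even_half[d] == rs.odd_half[d])
        return {};                                           // both halves already live together
    int merged = next_phys++;                                // fresh physical double-precision register
    std::vector<std::string> fixup = {
        "move_low  p" + std::to_string(rs.even_half[d]) + " -> p" + std::to_string(merged),
        "move_high p" + std::to_string(rs.odd_half[d])  + " -> p" + std::to_string(merged)};
    rs.even_half[d] = rs.odd_half[d] = merged;               // later readers depend on one register
    return fixup;
}

int main() {
    RenameState rs;
    rs.even_half[0] = 8;   // a single-precision write renamed the low half of %d0 to p8
    rs.odd_half[0] = 12;   // a second single-precision write renamed the high half to p12
    int next_phys = 32;
    for (const auto& op : before_dp_read(rs, 0, next_phys)) std::cout << op << "\n";
}
```
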
  • Publication number: 20110179230
    Abstract: A method of read-set and write-set management distinguishes between shared and non-shared memory regions. A shared memory region, used by a transactional memory application, which may be shared by one or more concurrent transactions is identified. A non-shared memory region, used by the transactional memory application, which is not shared by the one or more concurrent transactions is identified. A subset of a read-set and a write-set that access the shared memory region is checked for conflicts with the one or more concurrent transactions at a first granularity. A subset of the read-set and the write-set that access the non-shared memory region is checked for conflicts with the one or more concurrent transactions at a second granularity. The first granularity is finer than the second granularity.
    Type: Application
    Filed: January 15, 2010
    Publication date: July 21, 2011
    Applicant: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
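
A sketch of the two-granularity conflict check in publication 20110179230: read/write-set entries that fall in a shared region are compared at a fine (word) granularity, while entries in non-shared regions are compared at a coarse (region) granularity. The region size and the shared-region table are assumptions.

```cpp
#include <cstdint>
#include <iostream>
#include <set>
#include <unordered_set>

constexpr uint64_t kRegionBytes = 4096;

std::unordered_set<uint64_t> shared_regions;   // regions used by more than one concurrent transaction

bool is_shared(uint64_t addr) { return shared_regions.count(addr / kRegionBytes) != 0; }

// Conflict key: fine-grained (word) for shared data, coarse (region) otherwise.
uint64_t conflict_key(uint64_t addr) {
    return is_shared(addr) ? addr / 8 : (addr / kRegionBytes) | (1ull << 63);
}

bool conflicts(const std::set<uint64_t>& my_write_set, uint64_t other_access) {
    return my_write_set.count(conflict_key(other_access)) != 0;
}

int main() {
    shared_regions.insert(0x1000 / kRegionBytes);
    std::set<uint64_t> write_set = {conflict_key(0x1000), conflict_key(0x20000)};
    std::cout << std::boolalpha
              << conflicts(write_set, 0x1008)  << "\n"   // shared region, different word: no conflict
              << conflicts(write_set, 0x20040) << "\n";  // non-shared region, same region: conflict
}
```
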
  • Patent number: 7984265
    Abstract: A computer processor and a method of using the computer processor take advantage of information in the event address register of the computer processor by saving information from the event address register to an event address register history buffer. Thus, the event address register history buffer includes a cluster of events associated with execution of a computer program. The cluster of events is analyzed and the computer program modified, either statically or dynamically, to eliminate or at least ameliorate the effects of such events in further execution of the computer program.
    Type: Grant
    Filed: May 16, 2008
    Date of Patent: July 19, 2011
    Assignee: Oracle America, Inc.
    Inventors: Wei Chung Hsu, Yuan C. Chou
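
A simple model of the event-address-register history buffer in patent 7984265: each hardware event's EAR contents are copied into a small ring buffer, and clusters of events at the same instruction address flag code worth re-optimizing. The buffer size and the clustering rule are assumptions.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>
#include <map>

struct EarRecord { uint64_t instr_addr; uint64_t data_addr; };

struct EarHistoryBuffer {
    static constexpr size_t kCapacity = 64;
    std::deque<EarRecord> records;

    void on_event(const EarRecord& r) {                 // copy EAR contents into the history buffer
        if (records.size() == kCapacity) records.pop_front();
        records.push_back(r);
    }
    // Returns instruction addresses that appear at least min_count times,
    // i.e. clusters that a static or dynamic optimizer could target.
    std::map<uint64_t, int> clusters(int min_count) const {
        std::map<uint64_t, int> counts, hot;
        for (const auto& r : records) ++counts[r.instr_addr];
        for (const auto& [pc, n] : counts) if (n >= min_count) hot[pc] = n;
        return hot;
    }
};

int main() {
    EarHistoryBuffer buf;
    for (int i = 0; i < 5; ++i) buf.on_event({0x400123, 0x7f00 + 64ull * i});  // repeated miss PC
    buf.on_event({0x400200, 0x9000});
    for (const auto& [pc, n] : buf.clusters(3))
        std::cout << "hot event PC 0x" << std::hex << pc << std::dec << " x" << n << "\n";
}
```
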
  • Patent number: 7925865
    Abstract: In the described embodiments, a method for prefetching data and/or instructions may include generating control flow information for each retired branch instruction. A correlation table may be maintained based on the generated control flow information and cache miss addresses for each retired instruction that incurs one or more cache misses. Each correlation table entry may correspond to an index, and may contain a tag and a correlation list. The correlation list may consist of a specified number of cache miss addresses that most frequently follow the cache miss address for the index. A prefetch operation may be performed for each cache miss based on the contents of the correlation table entry corresponding to the index. The index may be generated using a combination of bits of a given cache miss address and one or more bits of the program control flow information for the given cache miss address.
    Type: Grant
    Filed: June 2, 2008
    Date of Patent: April 12, 2011
    Assignee: Oracle America, Inc.
    Inventors: Yuan C. Chou, Yasuko Watanabe
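
A sketch of the correlation prefetcher in patent 7925865: a table indexed by a hash of a miss address and recent control-flow bits stores the miss addresses that most often followed it, and those addresses are prefetched the next time the index recurs. The table geometry, hash, and list length are assumed.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

struct CorrelationEntry { uint64_t tag = 0; std::vector<uint64_t> followers; };

struct CorrelationPrefetcher {
    static constexpr size_t kFollowers = 4;              // assumed correlation-list length
    std::unordered_map<uint64_t, CorrelationEntry> table;
    uint64_t last_index = 0, last_tag = 0;
    bool have_last = false;

    static uint64_t make_index(uint64_t miss_addr, uint64_t branch_history) {
        return (miss_addr >> 6) ^ (branch_history & 0xff);   // mix address and control-flow bits
    }

    // Called on every cache miss of a retired instruction.
    std::vector<uint64_t> on_miss(uint64_t miss_addr, uint64_t branch_history) {
        if (have_last) {                                  // train: record the follower of the prior miss
            auto& e = table[last_index];
            e.tag = last_tag;
            if (e.followers.size() < kFollowers) e.followers.push_back(miss_addr);
        }
        last_index = make_index(miss_addr, branch_history);
        last_tag = miss_addr >> 6;
        have_last = true;
        auto it = table.find(last_index);                 // predict: prefetch known followers
        if (it != table.end() && it->second.tag == last_tag) return it->second.followers;
        return {};
    }
};

int main() {
    CorrelationPrefetcher p;
    p.on_miss(0x10000, 0b1011); p.on_miss(0x24000, 0b1011);   // learn 0x10000 -> 0x24000
    for (uint64_t a : p.on_miss(0x10000, 0b1011)) std::cout << "prefetch 0x" << std::hex << a << "\n";
}
```
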
  • Publication number: 20100274992
    Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency.
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks
  • Publication number: 20100274994
    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittel, Yuan C. Chou, Jared C. Smolens
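
A sketch of the operating mode described in the entry above: once an evil twin "producer" (a single-precision write) is detected, the decoder enters a mode that rewrites such instructions to break the potential dependency, and it leaves the mode after a number of single-precision writes have committed. The counter threshold shown is an assumed exit criterion.

```cpp
#include <iostream>

struct EvilTwinMode {
    static constexpr int kExitAfterCommits = 16;   // assumed exit criterion
    bool active = false;
    int committed_sp_writes = 0;

    void on_decode_sp_write() { active = true; }   // detecting a producer enters the mode

    // Returns true if the decoder should emit the dependency-breaking form.
    bool rewrite_sp_write() const { return active; }

    void on_commit_sp_write() {
        if (!active) return;
        if (++committed_sp_writes >= kExitAfterCommits) {   // exit criterion reached
            active = false;
            committed_sp_writes = 0;
        }
    }
};

int main() {
    EvilTwinMode m;
    m.on_decode_sp_write();
    for (int i = 0; i < 20; ++i) m.on_commit_sp_write();
    std::cout << std::boolalpha << m.rewrite_sp_write() << "\n";   // false: mode has been exited
}
```
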
  • Patent number: 7793044
    Abstract: In accordance with one embodiment, an enhanced chip multiprocessor permits an L1 cache to request ownership of a data line from a shared L2 cache. A determination is made whether to deny or grant the request for ownership based on the sharing of the data line. In one embodiment, the sharing of the data line is determined from an enhanced L2 cache directory entry associated with the data line. If ownership of the data line is granted, the data line is passed from the shared L2 cache to the requesting L1 cache and an associated enhanced L1 cache directory entry and the enhanced L2 cache directory entry are updated to reflect the L1 cache ownership of the data line. Consequently, updates of the data line by the L1 cache do not go through the shared L2 cache, thus reducing transaction pressure on the shared L2 cache.
    Type: Grant
    Filed: January 16, 2007
    Date of Patent: September 7, 2010
    Assignee: Oracle America, Inc.
    Inventors: Lawrence A. Spracklen, Yuan C. Chou, Santosh G. Abraham
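
A sketch of the ownership-grant decision in patent 7793044: the shared L2 consults its directory entry for a line and grants an L1 cache exclusive ownership only when no other L1 is sharing it. The directory layout is an assumed simplification.

```cpp
#include <bitset>
#include <cstdint>
#include <iostream>
#include <unordered_map>

constexpr int kNumCores = 8;

struct L2DirectoryEntry {
    std::bitset<kNumCores> sharers;   // which L1 caches hold the line
    int owner = -1;                   // core owning the line, or -1 if L2-owned
};

std::unordered_map<uint64_t, L2DirectoryEntry> l2_directory;

// An L1 asks the shared L2 for ownership of `line`. Grant only if no other
// L1 shares it; on a grant, record the owner so its updates bypass the L2.
bool request_ownership(uint64_t line, int core) {
    auto& e = l2_directory[line];
    std::bitset<kNumCores> others = e.sharers;
    others.reset(core);
    if (others.any()) return false;   // shared elsewhere: deny, keep the line L2-owned
    e.sharers.set(core);
    e.owner = core;
    return true;
}

int main() {
    l2_directory[0x80].sharers.set(3);                   // core 3 already caches the line
    std::cout << std::boolalpha
              << request_ownership(0x80, 0) << " "       // denied: another sharer exists
              << request_ownership(0x90, 0) << "\n";     // granted: private line
}
```
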
  • Patent number: 7757047
    Abstract: Maintaining a cache of indications of exclusively-owned coherence state for memory space units (e.g., cache line) allows reduction, if not elimination, of delay from missing store operations. In addition, the indications are maintained without corresponding data of the memory space unit, thus allowing representation of a large memory space with a relatively small missing store operation accelerator. With the missing store operation accelerator, a store operation, which misses in low-latency memory (e.g., L1 or L2 cache), proceeds as if the targeted memory space unit resides in the low-latency memory, if indicated in the missing store operation accelerator. When a store operation misses in low-latency memory and hits in the accelerator, a positive acknowledgement is transmitted to the writing processing unit allowing the store operation to proceed. An entry is allocated for the store operation, the store data is written into the allocated entry, and the target of the store operation is requested from memory.
    Type: Grant
    Filed: November 12, 2005
    Date of Patent: July 13, 2010
    Assignee: Oracle America, Inc.
    Inventors: Santosh G. Abraham, Lawrence A. Spracklen, Yuan C. Chou
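
A sketch of the missing-store accelerator in patent 7757047: a small table of line addresses known to be exclusively owned (with no data kept). A store that misses the low-latency caches but hits this table is acknowledged immediately and its data buffered while the target line is fetched from memory. The structure names and sizes are assumptions.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <unordered_set>

struct MissingStoreAccelerator {
    std::unordered_set<uint64_t> exclusive_lines;           // lines this node owns exclusively
    std::unordered_map<uint64_t, uint64_t> pending_stores;  // buffered data awaiting the line

    // A store missed in the low-latency caches; returns true if it may proceed now.
    bool on_store_miss(uint64_t line, uint64_t data) {
        if (exclusive_lines.count(line) == 0) return false; // fall back to the normal coherence path
        pending_stores[line] = data;                         // ack the store, buffer its data,
        return true;                                         // and request the line from memory
    }
    void on_line_arrival(uint64_t line) { pending_stores.erase(line); } // merge and complete
};

int main() {
    MissingStoreAccelerator acc;
    acc.exclusive_lines.insert(0x7000 / 64);
    std::cout << std::boolalpha << acc.on_store_miss(0x7000 / 64, 0xdead) << "\n";  // true: fast path
    acc.on_line_arrival(0x7000 / 64);
}
```
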
  • Publication number: 20100169611
    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
    Type: Application
    Filed: December 30, 2008
    Publication date: July 1, 2010
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
  • Publication number: 20100077154
    Abstract: A method for pre-fetching data. The method includes obtaining a pre-fetch request. The pre-fetch request identifies new data to pre-fetch from memory and store in a cache. The method further includes identifying a set in the cache to store the new data and identifying a value of a hotness indicator for the set. The hotness indicator value defines a number of replacements of at least one line in the set. The method further includes determining whether the value of the hotness indicator exceeds a predefined threshold, and storing the new data in the set when the value of the hotness indicator does not exceed the predefined threshold.
    Type: Application
    Filed: September 24, 2008
    Publication date: March 25, 2010
    Applicant: SUN MICROSYSTEMS, INC.
    Inventor: Yuan C. Chou
  • Patent number: 7650485
    Abstract: A multithreading processor achieves a very large lookahead instruction window by allowing non-sequential fetch and processing of the dynamic instruction stream. A speculative thread is spawned at a specified point in the dynamic instruction stream and the instructions subsequent to the specified point are speculatively executed so that these instructions are fetched and issued out of sequential order. Only minimal modifications to an existing multithreading processor design are required to achieve the very large lookahead instruction window: changes to the control logic of the issue unit and only three additional bits in the register scoreboard.
    Type: Grant
    Filed: April 10, 2007
    Date of Patent: January 19, 2010
    Assignee: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
  • Publication number: 20090300340
    Abstract: A method for prefetching data and/or instructions from a main memory to a cache memory may include generating control flow information by storing respective information for each retired branch instruction. The method may further include storing respective one or more cache miss addresses for each retired instruction that incurs one or more cache misses, with the respective one or more cache miss addresses corresponding respectively to the one or more cache misses. A correlation table may be maintained based on the generated control flow information and the stored cache miss addresses. Each respective correlation table entry may correspond to a respective index, and may contain a respective tag and a respective correlation list. The correlation list may consist of a specified number of cache miss addresses that most frequently follow the cache miss address used in generating the index to which the respective correlation table entry corresponds.
    Type: Application
    Filed: June 2, 2008
    Publication date: December 3, 2009
    Inventors: Yuan C. Chou, Yasuko Watanabe
  • Publication number: 20090287903
    Abstract: A computer processor and a method of using the computer processor take advantage of information in the event address register of the computer processor by saving information from the event address register to an event address register history buffer. Thus, the event address register history buffer includes a cluster of events associated with execution of a computer program. The cluster of events is analyzed and the computer program modified, either statically or dynamically, to eliminate or at least ameliorate the effects of such events in further execution of the computer program.
    Type: Application
    Filed: May 16, 2008
    Publication date: November 19, 2009
    Inventors: Wei Chung Hsu, Yuan C. Chou
  • Patent number: 7600098
    Abstract: A method and system for efficient implementation of a large store buffer within a processor include a store buffer having a first component configured to hold a plurality of younger stores requested by the processor and a second component configured to hold a plurality of older stores. The first component is implemented as a small content addressable memory (CAM) and the second component includes a first-in-first-out (FIFO) buffer to hold the data and addresses of the plurality of older stores and an address disambiguator to hold the addresses of each of the plurality of older stores found in the FIFO buffer. The processor uses the small CAM to perform most of the store-to-load forwarding in a fast and efficient way, thereby enhancing processor performance.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: October 6, 2009
    Assignee: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
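
A sketch of the two-level store buffer in patent 7600098: a small fully associative structure (the CAM) holds the youngest stores and handles most store-to-load forwarding, while older stores spill into a FIFO whose addresses are tracked separately so the slower FIFO scan runs only when needed. The capacities are assumed values.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>
#include <optional>
#include <unordered_set>

struct StoreEntry { uint64_t addr; uint64_t data; };

struct TwoLevelStoreBuffer {
    static constexpr size_t kCamSize = 8;                 // assumed young-store CAM capacity
    std::deque<StoreEntry> cam;                           // youngest stores, searched on every load
    std::deque<StoreEntry> fifo;                          // older stores, drained to the cache in order
    std::unordered_multiset<uint64_t> fifo_addrs;         // "address disambiguator" for the FIFO

    void store(uint64_t addr, uint64_t data) {
        cam.push_back({addr, data});
        if (cam.size() > kCamSize) {                      // oldest young store spills into the FIFO
            fifo.push_back(cam.front());
            fifo_addrs.insert(cam.front().addr);
            cam.pop_front();
        }
    }
    // Store-to-load forwarding: fast CAM search first; only if the address is
    // known to exist in the FIFO does the slower FIFO scan run.
    std::optional<uint64_t> load(uint64_t addr) const {
        for (auto it = cam.rbegin(); it != cam.rend(); ++it)
            if (it->addr == addr) return it->data;
        if (fifo_addrs.count(addr))
            for (auto it = fifo.rbegin(); it != fifo.rend(); ++it)
                if (it->addr == addr) return it->data;
        return std::nullopt;                              // not in flight: read from the cache
    }
};

int main() {
    TwoLevelStoreBuffer sb;
    for (uint64_t i = 0; i < 12; ++i) sb.store(0x1000 + 8 * i, i);    // the earliest spill into the FIFO
    std::cout << *sb.load(0x1000) << " " << *sb.load(0x1058) << "\n"; // 0 (FIFO hit), 11 (CAM hit)
}
```
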
  • Patent number: 7543282
    Abstract: One embodiment of the present invention provides a system that selectively executes different versions of executable code for the same source code. During operation, the system first receives an executable code module which includes two or more versions of executable code for the same source code, wherein the two or more versions of the executable code are optimized in different ways. Next, the system executes the executable code module by first evaluating a test condition, and subsequently executing a specific version of the executable code based on the outcome of the evaluation, so that the execution is optimized for the test condition.
    Type: Grant
    Filed: March 24, 2006
    Date of Patent: June 2, 2009
    Assignee: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
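
A sketch of the multi-version dispatch in patent 7543282: a module carries two differently optimized versions of the same source code, and a test condition evaluated at run time selects which one executes. The condition shown here (working-set size versus an assumed cache size) and both function bodies are invented examples.

```cpp
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Version optimized for data that fits in cache (assumed: simple linear pass).
long sum_cache_resident(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}
// Version optimized for large data (assumed: same result, different tuning,
// e.g. software prefetch or blocking in a real build).
long sum_streaming(const std::vector<int>& v) {
    long s = 0;
    for (size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

long sum(const std::vector<int>& v) {
    constexpr size_t kCacheBytes = 2u << 20;                 // assumed test condition
    bool fits = v.size() * sizeof(int) <= kCacheBytes;       // evaluate the condition once,
    return fits ? sum_cache_resident(v) : sum_streaming(v);  // then run the matching version
}

int main() {
    std::vector<int> small(1000, 1), large(1 << 21, 1);
    std::cout << sum(small) << " " << sum(large) << "\n";
}
```
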
  • Patent number: 7543112
    Abstract: The storage of a data line in one or more L1 caches and/or a shared L2 cache of a chip multiprocessor is dynamically optimized based on the sharing of the data line. In one embodiment, an enhanced L2 cache directory entry associated with the data line is generated in an L2 cache directory of the shared L2 cache. The enhanced L2 cache directory entry includes a cache mask indicating a storage state of the data line in the one or more L1 caches and the shared L2 cache. In some embodiments, where the data line is stored in the shared L2 cache only, a portion of the cache mask indicates a storage history of the data line in the one or more L1 caches.
    Type: Grant
    Filed: June 20, 2006
    Date of Patent: June 2, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Yuan C. Chou, Santosh G. Abraham, Lawrence A. Spracklen
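
A sketch of the enhanced L2 directory entry in patent 7543112: a cache mask records where a line currently lives (which L1 caches and/or the shared L2), and when the line is L2-only, spare mask bits keep a short history of prior L1 residency that can guide future placement. The encoding shown is an assumption.

```cpp
#include <bitset>
#include <iostream>

constexpr int kNumL1 = 8;

struct EnhancedL2Entry {
    std::bitset<kNumL1> l1_present;     // which L1 caches currently hold the line
    bool l2_present = false;            // line resident in the shared L2
    std::bitset<kNumL1> l1_history;     // meaningful only while the line is L2-only

    void on_l1_fill(int core)  { l1_present.set(core); }
    void on_l1_evict(int core) {
        l1_present.reset(core);
        if (l1_present.none() && l2_present) l1_history.set(core);  // remember which L1 used it
    }
};

int main() {
    EnhancedL2Entry e;
    e.l2_present = true;
    e.on_l1_fill(2);
    e.on_l1_evict(2);                   // line is now L2-only; the history records core 2
    std::cout << "present in L1s: " << e.l1_present
              << "  history: " << e.l1_history << "\n";
}
```
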