Patents by Inventor Yuan C. Chou

Yuan C. Chou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20100274994
    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittel, Yuan C. Chou, Jared C. Smolens
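
A rough illustration of the operating mode described in publication 20100274994 above: once an evil-twin producer (a single-precision write) is detected, later single-precision writes are flagged for sequence modification until an exit criterion is met. This is a minimal Python sketch under assumed names and a made-up exit threshold, not the patented hardware.

    from dataclasses import dataclass

    EXIT_AFTER_SP_COMMITS = 4        # assumed exit criterion: commit 4 single-precision writes

    @dataclass
    class Instr:
        op: str                      # e.g. "fadds" (single precision) or "faddd" (double)
        dest: str                    # logical floating-point register written
        is_sp_write: bool            # True for a single-precision write (potential producer)

    def run_mode(stream):
        """Return the instructions that would be modified to break potential dependencies."""
        in_mode, committed, modified = False, 0, []
        for ins in stream:
            if ins.is_sp_write and not in_mode:
                in_mode = True                       # detection of a producer enters the mode
            if in_mode and ins.is_sp_write:
                modified.append(ins)                 # stand-in for rewriting the sequence
                committed += 1
                if committed >= EXIT_AFTER_SP_COMMITS:
                    in_mode, committed = False, 0    # exit criterion reached

        return modified
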
  • Publication number: 20100274992
    Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency.
    Type: Application
    Filed: April 22, 2009
    Publication date: October 28, 2010
    Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks
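
Publication 20100274992 above describes inserting instructions so that both single-precision halves of a double-precision source end up in one physical register. The sketch below models that idea on a toy rename map; the register names, the merge tuple format, and the allocation of p20 are illustrative assumptions.

    rename_map = {"f0": "p12", "f1": "p7"}   # halves of one logical double live in different
    phys = {"p12": 1.5, "p7": 2.5}           # physical registers after two single-precision writes

    def read_double(hi, lo, inserted):
        """Before a double-precision read, merge split halves into one physical register."""
        if rename_map[hi] != rename_map[lo]:
            merged = "p20"                                    # newly allocated physical register (assumed)
            inserted.append(("merge", rename_map[hi], rename_map[lo], merged))
            phys[merged] = (phys[rename_map[hi]], phys[rename_map[lo]])
            rename_map[hi] = rename_map[lo] = merged          # the read now resolves to one register
        return phys[rename_map[hi]]

    extra = []
    read_double("f0", "f1", extra)
    print(extra)    # the instruction(s) inserted to eliminate the evil twin condition
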
  • Patent number: 7793044
    Abstract: In accordance with one embodiment, an enhanced chip multiprocessor permits an L1 cache to request ownership of a data line from a shared L2 cache. A determination is made whether to deny or grant the request for ownership based on the sharing of the data line. In one embodiment, the sharing of the data line is determined from an enhanced L2 cache directory entry associated with the data line. If ownership of the data line is granted, the current data line is passed from the shared L2 to the requesting L1 cache and an associated enhanced L1 cache directory entry and the enhanced L2 cache directory entry are updated to reflect the L1 cache ownership of the data line. Consequently, updates of the data line by the L1 cache do not go through the shared L2 cache, thus reducing transaction pressure on the shared L2 cache.
    Type: Grant
    Filed: January 16, 2007
    Date of Patent: September 7, 2010
    Assignee: Oracle America, Inc.
    Inventors: Lawrence A. Spracklen, Yuan C. Chou, Santosh G. Abraham
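
One way to picture the ownership decision in patent 7793044 above: the shared L2 consults its directory entry for the line and grants ownership only when no other L1 cache is sharing it. The Python below is a simplified model; the directory layout and addresses are invented for illustration.

    directory = {                                    # enhanced L2 cache directory (modeled as a dict)
        0x1000: {"sharers": {0}, "owner": None},     # line cached by L1 number 0 only
        0x2000: {"sharers": {0, 1}, "owner": None},  # line shared by two L1 caches
    }

    def request_ownership(line, l1_id):
        entry = directory[line]
        if entry["sharers"] - {l1_id}:               # other L1 caches share the line: deny
            return False
        entry["owner"] = l1_id                       # grant: record ownership in the L2 entry
        entry["sharers"] = {l1_id}                   # subsequent updates bypass the shared L2
        return True

    assert request_ownership(0x1000, 0)              # sole sharer: granted
    assert not request_ownership(0x2000, 0)          # actively shared: denied
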
  • Patent number: 7757047
    Abstract: Maintaining a cache of indications of exclusively-owned coherence state for memory space units (e.g., cache line) allows reduction, if not elimination, of delay from missing store operations. In addition, the indications are maintained without corresponding data of the memory space unit, thus allowing representation of a large memory space with a relatively small missing store operation accelerator. With the missing store operation accelerator, a store operation, which misses in low-latency memory (e.g., L1 or L2 cache), proceeds as if the targeted memory space unit resides in the low-latency memory, if indicated in the missing store operation accelerator. When a store operation misses in low-latency memory and hits in the accelerator, a positive acknowledgement is transmitted to the writing processing unit allowing the store operation to proceed. An entry is allocated for the store operation, the store data is written into the allocated entry, and the target of the store operation is requested from memory.
    Type: Grant
    Filed: November 12, 2005
    Date of Patent: July 13, 2010
    Assignee: Oracle America, Inc.
    Inventors: Santosh G. Abraham, Lawrence A. Spracklen, Yuan C. Chou
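
The accelerator in patent 7757047 above keeps only ownership indications, not data. The sketch below shows the decision path for a store that misses the low-latency caches; the unit size, table contents, and function names are assumptions.

    UNIT = 64                                  # assumed memory-space unit: one 64-byte line
    accelerator = {0x10000, 0x10040}           # units indicated as exclusively owned (no data kept)
    l2 = {}                                    # low-latency cache contents: address -> data
    store_buffer = []                          # entries allocated for acknowledged stores

    def request_line_from_memory(unit):
        pass                                   # stand-in for the background memory request

    def store(addr, data):
        unit = addr & ~(UNIT - 1)
        if addr in l2:
            l2[addr] = data                    # ordinary hit: accelerator not consulted
            return "hit"
        if unit in accelerator:
            store_buffer.append((addr, data))  # allocate an entry and write the store data into it
            request_line_from_memory(unit)     # fetch the target of the store in the background
            return "ack"                       # positive acknowledgement: the store proceeds
        return "stall"                         # true miss: fall back to normal handling

    print(store(0x10048, 7))                   # -> "ack"
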
  • Publication number: 20100169611
    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
    Type: Application
    Filed: December 30, 2008
    Publication date: July 1, 2010
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
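
A small model of the timer condition in publication 20100169611 above: if a mispredicted branch has not retired within a threshold number of cycles, the pipeline is flushed early and fetch restarts at the oldest in-flight instruction, with the correct outcome saved and tagged so the branch can still be resolved. The threshold and argument names are illustrative assumptions.

    THRESHOLD = 8                              # assumed timer threshold, in cycles

    def on_misprediction(cycles_until_retire, oldest_pc, branch_id, correct_target):
        timer = 0                              # starts incrementing when the misprediction is detected
        saved = {"branch_id": branch_id, "target": correct_target}   # outcome stored before any flush
        for _ in range(cycles_until_retire):
            timer += 1
            if timer >= THRESHOLD:
                # Predetermined condition met before retirement: flush the whole
                # pipeline and restart fetch at the oldest instruction's address.
                return ("early_flush", oldest_pc, saved)
        return ("flush_at_retirement", correct_target, saved)

    print(on_misprediction(20, oldest_pc=0x400, branch_id=3, correct_target=0x480))
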
  • Publication number: 20100077154
    Abstract: A method for pre-fetching data. The method includes obtaining a pre-fetch request. The pre-fetch request identifies new data to pre-fetch from memory and store in a cache. The method further includes identifying a set in the cache to store the new data and identifying a value of a hotness indicator for the set. The hotness indicator value defines a number of replacements of at least one line in the set. The method further includes determining whether the value of the hotness indicator exceeds a predefined threshold, and storing the new data in the set when the value of the hotness indicator does not exceed the predefined threshold.
    Type: Application
    Filed: September 24, 2008
    Publication date: March 25, 2010
    Applicant: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
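
The pre-fetch admission check in publication 20100077154 above is easy to sketch: a per-set counter of recent replacements acts as the hotness indicator, and a pre-fetch is dropped when its target set is hot. The threshold, set count, and line size below are assumed values.

    HOT_THRESHOLD = 3
    NUM_SETS = 256

    replacements = [0] * NUM_SETS              # hotness indicator: replacement count per set
    cache = [dict() for _ in range(NUM_SETS)]  # set index -> {address: data}

    def on_replacement(addr):
        replacements[(addr >> 6) % NUM_SETS] += 1    # demand replacements heat the set up

    def handle_prefetch(addr, data):
        s = (addr >> 6) % NUM_SETS             # set index, assuming 64-byte lines
        if replacements[s] > HOT_THRESHOLD:
            return False                       # set is hot: do not store the new data
        cache[s][addr] = data                  # cool set: install the pre-fetched data
        return True
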
  • Patent number: 7650485
    Abstract: A multithreading processor achieves a very large lookahead instruction window by allowing non-sequential fetch and processing of the dynamic instruction stream. A speculative thread is spawned at a specified point in the dynamic instruction stream, and the instructions subsequent to that point are speculatively executed so that they are fetched and issued out of sequential order. Only minimal modifications to the existing design of a multithreading processor are required to achieve the very large lookahead instruction window: changes to the control logic of the issue unit and three additional bits in the register scoreboard.
    Type: Grant
    Filed: April 10, 2007
    Date of Patent: January 19, 2010
    Assignee: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
  • Publication number: 20090300340
    Abstract: A method for prefetching data and/or instructions from a main memory to a cache memory may include generating control flow information by storing respective information for each retired branch instruction. The method may further include storing respective one or more cache miss addresses for each retired instruction that incurs one or more cache misses, with the respective one or more cache miss addresses corresponding respectively to the one or more cache misses. A correlation table may be maintained based on the generated control flow information and the stored cache miss addresses. Each respective correlation table entry may correspond to a respective index, and may contain a respective tag and a respective correlation list. The correlation list may consist of a specified number of cache miss addresses that most frequently follow the cache miss address used in generating the index to which the respective correlation table entry corresponds.
    Type: Application
    Filed: June 2, 2008
    Publication date: December 3, 2009
    Inventors: Yuan C. Chou, Yasuko Watanabe
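
The correlation table in publication 20090300340 above pairs an index derived from a miss address with the miss addresses that most often follow it. The Python below keeps that structure with a Counter per entry; the table size, list length, and hashing are assumptions, and the control-flow information from retired branches is omitted for brevity.

    from collections import Counter

    TABLE_SIZE = 1024
    LIST_LEN = 4                                     # assumed correlation-list length
    table = {}                                       # index -> {"tag": miss address, "followers": Counter}

    def record_miss(prev_miss, this_miss):
        idx = hash(prev_miss) % TABLE_SIZE
        entry = table.get(idx)
        if entry is None or entry["tag"] != prev_miss:
            entry = table[idx] = {"tag": prev_miss, "followers": Counter()}   # simple replacement
        entry["followers"][this_miss] += 1

    def prefetch_candidates(miss):
        entry = table.get(hash(miss) % TABLE_SIZE)
        if entry and entry["tag"] == miss:
            return [a for a, _ in entry["followers"].most_common(LIST_LEN)]
        return []

    record_miss(0x100, 0x180)
    record_miss(0x100, 0x180)
    record_miss(0x100, 0x200)
    print([hex(a) for a in prefetch_candidates(0x100)])   # most frequent followers first
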
  • Publication number: 20090287903
    Abstract: A computer processor and a method of using the computer processor take advantage of information in the event address register of the computer processor by saving information from the event address register to an event address register history buffer. Thus, the event address register history buffer includes a cluster of events associated with execution of a computer program. The cluster of events is analyzed and the computer program modified, either statically or dynamically, to eliminate or at least ameliorate the effects of such events in further execution of the computer program.
    Type: Application
    Filed: May 16, 2008
    Publication date: November 19, 2009
    Inventors: Wei Chung Hsu, Yuan C. Chou
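
Publication 20090287903 above analyzes clusters of events captured from the event address register. A minimal illustration of that post-processing step: sampled addresses from a (hypothetical) history buffer are grouped by region so the hottest code or data can be targeted for modification. The buffer contents and granularity are invented.

    from collections import Counter

    ear_history = [0x7f001040, 0x7f001048, 0x7f001040, 0x7f200010, 0x7f001044]

    def cluster(history, granularity=64):
        """Group sampled event addresses into cache-line-sized regions (assumed granularity)."""
        return Counter(addr & ~(granularity - 1) for addr in history)

    for region, count in cluster(ear_history).most_common():
        print(hex(region), count)      # hottest regions first: candidates for static or
                                       # dynamic modification of the program
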
  • Patent number: 7600098
    Abstract: A method and system for efficient implementation of a large store buffer includes a store buffer within a processor having a first component configured to hold a plurality of younger stores requested by the processor and a second component configured to hold a plurality of older stores. The first component is implemented as a small content addressable memory (CAM), and the second component includes a first-in-first-out (FIFO) buffer to hold the data and addresses of the older stores and an address disambiguator to hold the address of each older store found in the FIFO buffer. The processor uses the small CAM to perform most of the store-to-load forwarding in a fast and efficient way, thereby enhancing processor performance.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: October 6, 2009
    Assignee: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
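
The two-level store buffer of patent 7600098 above splits young stores (a small CAM) from older ones (a FIFO guarded by an address disambiguator). The sketch below keeps that split; the sizes and the exact disambiguator (here an exact set rather than a hardware structure) are simplifying assumptions.

    from collections import deque

    CAM_SIZE = 8                                   # assumed capacity of the young-store CAM

    cam = deque()                                  # young stores: (address, data), youngest last
    fifo = deque()                                 # older stores in program order
    disambiguator = set()                          # addresses of the older stores held in the FIFO

    def store(addr, data):
        if len(cam) == CAM_SIZE:                   # CAM full: demote the oldest young store
            a, d = cam.popleft()
            fifo.append((a, d))
            disambiguator.add(a)
        cam.append((addr, data))

    def load(addr):
        for a, d in reversed(cam):                 # fast path: most forwarding hits the small CAM
            if a == addr:
                return d
        if addr in disambiguator:                  # only search the FIFO when a match must exist
            for a, d in reversed(fifo):
                if a == addr:
                    return d
        return None                                # no forwarding: read from the cache instead
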
  • Patent number: 7543112
    Abstract: The storage of a data line in one or more L1 caches and/or a shared L2 cache of a chip multiprocessor is dynamically optimized based on the sharing of the data line. In one embodiment, an enhanced L2 cache directory entry associated with the data line is generated in an L2 cache directory of the shared L2 cache. The enhanced L2 cache directory entry includes a cache mask indicating a storage state of the data line in the one or more L1 caches and the shared L2 cache. In some embodiments, where the data line is stored in the shared L2 cache only, a portion of the cache mask indicates a storage history of the data line in the one or more L1 caches.
    Type: Grant
    Filed: June 20, 2006
    Date of Patent: June 2, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Yuan C. Chou, Santosh G. Abraham, Lawrence A. Spracklen
  • Patent number: 7543282
    Abstract: One embodiment of the present invention provides a system that selectively executes different versions of executable code for the same source code. During operation, the system first receives an executable code module which includes two or more versions of executable code for the same source code, wherein the two or more versions of the executable code are optimized in different ways. Next, the system executes the executable code module by first evaluating a test condition, and subsequently executing a specific version of the executable code based on the outcome of the evaluation, so that the execution is optimized for the test condition.
    Type: Grant
    Filed: March 24, 2006
    Date of Patent: June 2, 2009
    Assignee: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
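
Patent 7543282 above is simple to convey in software terms: a module carries two versions of the same routine, each optimized differently, and a test condition evaluated at run time picks between them. The routines and the size-based condition below are invented for illustration.

    def copy_small(dst, src):
        dst[:len(src)] = src                       # version assumed to be tuned for short inputs

    def copy_large(dst, src, chunk=4096):
        for i in range(0, len(src), chunk):        # version assumed to be tuned for long inputs
            dst[i:i + chunk] = src[i:i + chunk]

    def copy(dst, src):
        # Evaluate the test condition first, then execute the matching optimized version.
        if len(src) < 1024:
            copy_small(dst, src)
        else:
            copy_large(dst, src)

    buf = bytearray(8192)
    copy(buf, b"x" * 8192)                         # dispatches to the large-input version
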
  • Patent number: 7529911
    Abstract: One embodiment of the present invention provides a system that improves the effectiveness of prefetching during execution of instructions in scout mode. Upon encountering a non-data dependent stall condition, the system performs a checkpoint and commences execution of instructions in scout mode, wherein instructions are speculatively executed to prefetch future memory operations, but wherein results are not committed to the architectural state of a processor. When the system executes a load instruction during scout mode, if the load instruction causes a lower-level cache miss, the system allows the load instruction to access a higher-level cache. Next, the system places the load instruction and subsequent dependent instructions into a deferred queue, and resumes execution of the program in scout mode.
    Type: Grant
    Filed: May 26, 2005
    Date of Patent: May 5, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Lawrence A. Spracklen, Yuan C. Chou, Santosh G. Abraham
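
A compact model of the scout-mode behavior in patent 7529911 above: a load that misses the lower-level cache is allowed to access the higher-level cache, then the load and its dependents are placed in a deferred queue while scouting continues. The instruction encoding, cache contents, and issue_l2_access hook are assumptions; nothing here commits architectural state.

    l1 = {0x100: 1}                        # lower-level cache contents (address -> value)
    deferred = []                          # deferred queue: missing loads and their dependents

    def issue_l2_access(addr):
        pass                               # stand-in for the access to the higher-level cache

    def scout(instructions):
        poisoned = set()                   # destinations whose values are unavailable
        for op, dest, src, addr in instructions:
            if op == "load" and addr not in l1:
                issue_l2_access(addr)      # the miss still prefetches from the higher level
                deferred.append((op, dest, src, addr))
                poisoned.add(dest)
            elif src in poisoned:
                deferred.append((op, dest, src, addr))   # dependents are deferred too
                poisoned.add(dest)
            # otherwise: execute speculatively for its prefetching side effects only

    scout([("load", "r1", None, 0x200), ("add", "r2", "r1", None)])
    print(deferred)                        # both the missing load and its dependent instruction
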
  • Publication number: 20090106495
    Abstract: A method is disclosed that uses a non-coherent store instruction to reduce inter-thread communication latency between threads sharing a level one write-through cache. When a thread executes the non-coherent store instruction, the level one cache is immediately updated with the data value. The data value is immediately available to another thread sharing the level-one write-through cache. A computer system having reduced inter-thread communication latency is disclosed. The computer system includes a first plurality of processor cores, each processor core including a second plurality of processing engines sharing a level one write-through cache. The level one caches are connected to a level two cache via a crossbar switch. The computer system further implements a non-coherent store instruction that updates a data value in the level one cache prior to updating the corresponding data value in the level two cache.
    Type: Application
    Filed: October 23, 2007
    Publication date: April 23, 2009
    Applicant: Sun Microsystems, Inc.
    Inventor: Yuan C. Chou
  • Patent number: 7487296
    Abstract: A multi-stride prefetcher includes a recurring prefetch table that in turn includes a stream table and an index table. The stream table includes a valid field and a tag field. The stream table also includes a thread number field to help support multi-threaded processor cores. The tag field stores a tag from an address associated with a cache miss. The index table includes fields for storing information characterizing a state machine. The fields include a learning bit. The multi-stride prefetcher prefetches data into a cache for a plurality of streams of cache misses, each stream having a plurality of strides.
    Type: Grant
    Filed: February 17, 2005
    Date of Patent: February 3, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Sorin Iacobovici, Sudarshan Kadambi, Yuan C. Chou
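
The multi-stride prefetcher of patent 7487296 above learns a repeating pattern of strides per miss stream before issuing prefetches. The sketch below folds the stream table (keyed by thread and tag) and the index-table state machine, including a learning phase, into one dictionary; the tag function, pattern length, and prefetch depth are assumptions.

    MAX_STRIDES = 2                               # assumed length of the recurring stride pattern
    streams = {}                                  # (thread, tag) -> per-stream state machine

    def miss(thread, addr):
        tag = addr >> 12                          # assumed tag: page-sized region of the miss
        st = streams.setdefault((thread, tag),
                                {"last": addr, "strides": [], "learning": True, "pos": 0})
        stride, st["last"] = addr - st["last"], addr
        if stride == 0:
            return []
        if st["learning"]:                        # learning bit set: record strides, no prefetch yet
            st["strides"].append(stride)
            if len(st["strides"]) == MAX_STRIDES:
                st["learning"] = False
            return []
        # Trained: walk one round of the recurring pattern ahead of this miss address.
        nxt = (st["pos"] + 1) % MAX_STRIDES       # position of the next expected stride
        out, a, p = [], addr, nxt
        for _ in range(MAX_STRIDES):
            a += st["strides"][p]
            p = (p + 1) % MAX_STRIDES
            out.append(a)
        st["pos"] = nxt
        return out

    for a in (0x1000, 0x1040, 0x1100):
        miss(0, a)                                # learn the 0x40 / 0xC0 stride pattern
    print([hex(a) for a in miss(0, 0x1140)])      # -> ['0x1200', '0x1240']
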
  • Patent number: 7475230
    Abstract: One embodiment of the present invention provides a system that performs register file checkpointing to support speculative execution within a processor. During operation, the system commences speculative execution of a program from a point of speculation, at which the outcome of a long latency instruction is speculatively predicted. During this speculative execution, registers are updated by checkpointing an old value of the register, if the register has not already been checkpointed, and then updating the architectural state of the register with the new value. In this way, only registers that are updated during the speculative execution are checkpointed, instead of checkpointing all of the architectural registers prior to commencing speculative execution.
    Type: Grant
    Filed: May 16, 2003
    Date of Patent: January 6, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Yuan C. Chou, Santosh G. Abraham
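
Patent 7475230 above checkpoints registers lazily: only the first speculative write to a register saves its old value. A few lines of Python convey the idea; the register file contents and speculative values are invented.

    registers = {"r1": 10, "r2": 20, "r3": 30}     # architectural register file
    checkpoint = {}                                # old values, filled only when needed

    def speculative_write(reg, value):
        if reg not in checkpoint:                  # first update since the point of speculation
            checkpoint[reg] = registers[reg]       # checkpoint the old value once
        registers[reg] = value                     # then update the architectural state

    def recover():
        registers.update(checkpoint)               # restore only the registers that were touched
        checkpoint.clear()

    speculative_write("r1", 99)
    speculative_write("r1", 100)                   # old value is not saved a second time
    recover()
    assert registers == {"r1": 10, "r2": 20, "r3": 30}
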
  • Patent number: 7457923
    Abstract: A dynamic prediction is made whether a load instruction will miss a cache. Data is prefetched for the load instruction when a cache miss is predicted. Thus, the prefetch is only performed if a trigger event correlated with a cache miss for that load instruction is detected. This selective execution of the prefetches for a particular load instruction improves processor utilization and performance.
    Type: Grant
    Filed: May 11, 2005
    Date of Patent: November 25, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Yuan C. Chou, Wei Chung Hsu
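
Patent 7457923 above issues a prefetch only when a load is dynamically predicted to miss. One plausible (not necessarily the patented) trigger is a small saturating counter per load PC, sketched below; the table, thresholds, and prefetch hook are assumptions.

    counters = {}                                  # load PC -> 2-bit saturating miss counter

    def predict_miss(pc):
        return counters.get(pc, 0) >= 2            # high counter: a cache miss is predicted

    def train(pc, missed):
        c = counters.get(pc, 0)
        counters[pc] = min(3, c + 1) if missed else max(0, c - 1)

    def prefetch(addr):
        pass                                       # stand-in for issuing the prefetch

    def execute_load(pc, addr, cache, memory):
        if predict_miss(pc):
            prefetch(addr)                         # selective prefetch for likely-missing loads only
        missed = addr not in cache
        train(pc, missed)                          # keep the per-load prediction up to date
        return cache.get(addr, memory.get(addr))
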
  • Patent number: 7373482
    Abstract: One embodiment of the present invention provides a system that improves the effectiveness of prefetching during execution of instructions in scout mode. During operation, the system executes program instructions in a normal-execution mode. Upon encountering a condition which causes the processor to enter scout mode, the system performs a checkpoint and commences execution of instructions in scout mode, wherein the instructions are speculatively executed to prefetch future memory operations, but wherein results are not committed to the architectural state of a processor. During execution of a load instruction during scout mode, if the load instruction is a special load instruction and if the load instruction causes a lower-level cache miss, the system waits for data to be returned from a higher-level cache before resuming execution of subsequent instructions in scout mode, instead of disregarding the result of the load instruction and immediately resuming execution in scout mode.
    Type: Grant
    Filed: May 26, 2005
    Date of Patent: May 13, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Lawrence A. Spracklen, Yuan C. Chou, Santosh G. Abraham
  • Patent number: 7340567
    Abstract: Typically, missing read operation instances account for a small fraction of the operation instances of an application, but for nearly all of the performance degradation due to access latency. Hence, a small predictor structure maintains sufficient information for performing value prediction for the small fraction of operations (the missing instances of read operations) that account for nearly all of the access latency performance degradation. With such a small predictor structure, a processor performs value prediction for selective instances of read operations, those selective instances being read operations that are unavailable in a first memory (e.g., those instances of read operations that miss in L2 cache). Respective actual values for prior missing instances of the read operations are stored and used for value predictions of respective subsequent instances of the read operations.
    Type: Grant
    Filed: April 14, 2004
    Date of Patent: March 4, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Yuan C. Chou, Santosh G. Abraham
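
The predictor in patent 7340567 above stores values only for the few read instances that miss in L2, so a small structure covers most of the latency-critical cases. The table below keys predictions by the read's PC and keeps just the last returned value; both choices are illustrative simplifications.

    predictor = {}                                 # read PC -> last value returned for a missing read

    def on_l2_miss(pc):
        return predictor.get(pc)                   # predicted value, or None if not yet trained

    def on_value_return(pc, actual):
        predictor[pc] = actual                     # remember the actual value for later instances
        return actual

    predicted = on_l2_miss(0x400)                  # dependents may proceed with the prediction
    actual = on_value_return(0x400, 42)            # when memory responds, verify the speculation
    mispredicted = predicted is not None and predicted != actual
    print(on_l2_miss(0x400))                       # the next instance of this read predicts 42
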
  • Patent number: 7127592
    Abstract: One embodiment of the present invention provides a system that dynamically allocates physical registers in a windowed processor architecture. The system includes a physical register file and a register map that maps architectural registers defined within an executing program to physical registers within the physical register file. The system also includes a window allocation mechanism that allocates a new name space for a register window without allocating physical registers for the register window, thereby allowing the physical registers to be dynamically allocated as needed instead of being allocated at window initialization time.
    Type: Grant
    Filed: January 8, 2003
    Date of Patent: October 24, 2006
    Assignee: Sun Microsystems, Inc.
    Inventors: Santosh G. Abraham, Yuan C. Chou
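
Patent 7127592 above decouples opening a register window from allocating physical registers. In the sketch below, a window open only advances the window number (a new name space); a physical register leaves the free list on the first write to each name, not at window-initialization time. The free-list size and register names are assumptions.

    free_list = ["p%d" % i for i in range(16)]     # physical register free list (assumed size)
    register_map = {}                              # (window, architectural reg) -> physical reg
    phys_file = {}                                 # physical reg -> value
    current_window = 0

    def open_window():
        """Allocate a fresh name space for a new register window, no physical registers yet."""
        global current_window
        current_window += 1

    def write(arch_reg, value):
        key = (current_window, arch_reg)
        if key not in register_map:                # first write to this name in this window
            register_map[key] = free_list.pop()    # allocate a physical register on demand
        phys_file[register_map[key]] = value

    def read(arch_reg):
        return phys_file[register_map[(current_window, arch_reg)]]

    open_window()
    write("l0", 7)
    print(read("l0"), len(free_list))              # 7 15: one register used, not a whole window
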