Patents by Inventor Yuan C. Chou
Yuan C. Chou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20120131311
Abstract: The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access.
Type: Application
Filed: November 23, 2010
Publication date: May 24, 2012
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventor: Yuan C. Chou
-
Patent number: 8099586
Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
Type: Grant
Filed: December 30, 2008
Date of Patent: January 17, 2012
Assignee: Oracle America, Inc.
Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
-
Patent number: 8086804
Abstract: A method for pre-fetching data. The method includes obtaining a pre-fetch request. The pre-fetch request identifies new data to pre-fetch from memory and store in a cache. The method further includes identifying a set in the cache to store the new data and identifying a value of a hotness indicator for the set. The hotness indicator value defines a number of replacements of at least one line in the set. The method further includes determining whether the value of the hotness indicator exceeds a predefined threshold, and storing the new data in the set when the value of the hotness indicator does not exceed the predefined threshold.
Type: Grant
Filed: September 24, 2008
Date of Patent: December 27, 2011
Assignee: Oracle America, Inc.
Inventor: Yuan C. Chou
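The hotness check described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented design: the class names `CacheSet` and `Cache`, the oldest-first replacement policy, the per-set replacement counter, and the `hot_threshold` parameter are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CacheSet:
    ways: int
    lines: list = field(default_factory=list)  # resident tags, oldest first
    hotness: int = 0                           # replacements seen in this set (assumed metric)

    def insert(self, tag):
        """Install a line, evicting the oldest one if the set is full."""
        if tag in self.lines:
            return
        if len(self.lines) >= self.ways:
            self.lines.pop(0)
            self.hotness += 1                  # every replacement makes the set "hotter"
        self.lines.append(tag)


class Cache:
    def __init__(self, num_sets=4, ways=2, line_size=64, hot_threshold=2):
        self.sets = [CacheSet(ways) for _ in range(num_sets)]
        self.line_size = line_size
        self.hot_threshold = hot_threshold     # hypothetical predefined threshold

    def _locate(self, addr):
        """Map a byte address to (set, tag)."""
        line = addr // self.line_size
        return self.sets[line % len(self.sets)], line // len(self.sets)

    def demand_fill(self, addr):
        """Demand fetches are always installed."""
        cache_set, tag = self._locate(addr)
        cache_set.insert(tag)

    def prefetch(self, addr):
        """Store prefetched data only if the target set is not hot.

        Returns True if the data was stored, False if the prefetch was dropped.
        """
        cache_set, tag = self._locate(addr)
        if cache_set.hotness > self.hot_threshold:
            return False
        cache_set.insert(tag)
        return True
```

In this model, a set that has already seen many replacements rejects further prefetches, so speculative data does not displace lines in a set that is under heavy demand.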
-
Publication number: 20110276760
Abstract: Techniques relating to a processor that supports a non-committing store instruction that is executable during a scouting thread to provide data to a subsequently executed load instruction. The processor may include a memory access unit configured to perform an instance of the non-committing store instruction by storing a value in an entry of a store buffer without committing the instance of the non-committing store instruction. In response to subsequently receiving an instance of a load instruction of the scouting thread that specifies a load from the memory address, the memory access unit is configured to perform the instance of the load instruction by retrieving the value. The memory access unit may retrieve the value from the store buffer or from a cache of the processor.
Type: Application
Filed: May 6, 2010
Publication date: November 10, 2011
Inventor: Yuan C. Chou
-
Publication number: 20110258415
Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency.
Type: Application
Filed: June 30, 2011
Publication date: October 20, 2011
Applicant: SUN MICROSYSTEMS, INC.
Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks
-
Publication number: 20110179230
Abstract: A method of read-set and write-set management distinguishes between shared and non-shared memory regions. A shared memory region, used by a transactional memory application, which may be shared by one or more concurrent transactions is identified. A non-shared memory region, used by the transactional memory application, which is not shared by the one or more concurrent transactions is identified. A subset of a read-set and a write-set that access the shared memory region is checked for conflicts with the one or more concurrent transactions at a first granularity. A subset of the read-set and the write-set that access the non-shared memory region is checked for conflicts with the one or more concurrent transactions at a second granularity. The first granularity is finer than the second granularity.
Type: Application
Filed: January 15, 2010
Publication date: July 21, 2011
Applicant: Sun Microsystems, Inc.
Inventor: Yuan C. Chou
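The two-granularity conflict check in this abstract can be sketched in a few lines. The example below is illustrative only: the function names `block_ids` and `conflicts`, the `is_shared` predicate, and the 8-byte and 4096-byte granularities are assumptions, not details taken from the application.

```python
def block_ids(addresses, is_shared, fine=8, coarse=4096):
    """Map byte addresses to tracking blocks.

    Shared addresses are tracked at the fine granularity, non-shared
    addresses at the coarse one (both sizes are assumed values).
    """
    ids = set()
    for addr in addresses:
        if is_shared(addr):
            ids.add(("shared", addr // fine))
        else:
            ids.add(("non-shared", addr // coarse))
    return ids


def conflicts(tx_a, tx_b, is_shared):
    """Two transactions conflict if one writes a block the other reads or writes."""
    a_reads = block_ids(tx_a["reads"], is_shared)
    a_writes = block_ids(tx_a["writes"], is_shared)
    b_reads = block_ids(tx_b["reads"], is_shared)
    b_writes = block_ids(tx_b["writes"], is_shared)
    return bool(a_writes & (b_reads | b_writes)) or bool(b_writes & a_reads)
```

With these assumed sizes, two accesses to different 8-byte words in the shared region do not conflict, while any two accesses to the same 4096-byte block in the non-shared region are treated as a conflict. Coarse tracking is cheap there because, by definition, concurrent transactions are not expected to touch that region.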
-
Patent number: 7984265
Abstract: A computer processor and a method of using the computer processor take advantage of information in the event address register of the computer processor by saving information from the event address register to an event address register history buffer. Thus, the event address register history buffer includes a cluster of events associated with execution of a computer program. The cluster of events is analyzed and the computer program modified, either statically or dynamically, to eliminate or at least ameliorate the effects of such events in further execution of the computer program.
Type: Grant
Filed: May 16, 2008
Date of Patent: July 19, 2011
Assignee: Oracle America, Inc.
Inventors: Wei Chung Hsu, Yuan C. Chou
-
Patent number: 7925865
Abstract: In the described embodiments, a method for prefetching data and/or instructions may include generating control flow information for each retired branch instruction. A correlation table may be maintained based on the generated control flow information and cache miss addresses for each retired instruction that incurs one or more cache misses. Each correlation table entry may correspond to an index, and may contain a tag and a correlation list. The correlation list may consist of a specified number of cache miss addresses that most frequently follow the cache miss address for the index. A prefetch operation may be performed for each cache miss based on the contents of the correlation table entry corresponding to the index. The index may be generated using a combination of bits of a given cache miss address and one or more bits of the program control flow information for the given cache miss address.
Type: Grant
Filed: June 2, 2008
Date of Patent: April 12, 2011
Assignee: Oracle America, Inc.
Inventors: Yuan C. Chou, Yasuko Watanabe
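A correlation table of the kind this abstract describes can be modeled roughly as follows. This is a hedged sketch, not the patented mechanism: the class name `CorrelationPrefetcher`, the XOR-based index, the branch-history register, the use of the full miss address as the tag, and the table sizes are all assumptions. Miss addresses are treated as cache-line numbers for simplicity.

```python
from collections import Counter


class CorrelationPrefetcher:
    def __init__(self, index_bits=8, history_bits=4, list_len=2):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.list_len = list_len   # number of addresses kept in a correlation list
        self.table = {}            # index -> (tag, Counter of follower miss lines)
        self.history = 0           # outcomes of recently retired branches
        self.prev_index = None     # table index of the previous miss

    def retire_branch(self, taken):
        """Fold a retired branch outcome into the control flow information."""
        self.history = ((self.history << 1) | int(taken)) & self.history_mask

    def _index(self, miss_line):
        # Assumed hash: combine miss-address bits with control-flow bits.
        return (miss_line ^ self.history) & self.index_mask

    def on_miss(self, miss_line):
        """Record a cache miss and return the lines to prefetch for it."""
        # Train: the current miss follows the previous miss.
        if self.prev_index is not None:
            _tag, followers = self.table[self.prev_index]
            followers[miss_line] += 1
        index = self._index(miss_line)
        entry = self.table.get(index)
        if entry is None or entry[0] != miss_line:
            entry = (miss_line, Counter())   # allocate or replace on a tag mismatch
            self.table[index] = entry
        self.prev_index = index
        # Correlation list: the most frequent followers of this miss.
        return [line for line, _count in entry[1].most_common(self.list_len)]
```

Because the index mixes in branch history, the same miss address reached along a different control-flow path selects a different entry, which is the point of combining the two sources of bits.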
-
Publication number: 20100274992
Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency.
Type: Application
Filed: April 22, 2009
Publication date: October 28, 2010
Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks
-
Publication number: 20100274994
Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
Type: Application
Filed: April 22, 2009
Publication date: October 28, 2010
Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittel, Yuan C. Chou, Jared C. Smolens
-
Patent number: 7793044
Abstract: In accordance with one embodiment, an enhanced chip multiprocessor permits an L1 cache to request ownership of a data line from a shared L2 cache. A determination is made whether to deny or grant the request for ownership based on the sharing of the data line. In one embodiment, the sharing of the data line is determined from an enhanced L2 cache directory entry associated with the data line. If ownership of the data line is granted, the current data line is passed from the shared L2 to the requesting L1 cache and an associated enhanced L1 cache directory entry and the enhanced L2 cache directory entry are updated to reflect the L1 cache ownership of the data line. Consequently, updates of the data line by the L1 cache do not go through the shared L2 cache, thus reducing transaction pressure on the shared L2 cache.
Type: Grant
Filed: January 16, 2007
Date of Patent: September 7, 2010
Assignee: Oracle America, Inc.
Inventors: Lawrence A. Spracklen, Yuan C. Chou, Santosh G. Abraham
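The grant-or-deny decision in this abstract can be illustrated with a toy directory model. The abstract says only that the decision is based on the sharing of the data line, so the specific policy below (grant only when no other L1 cache holds the line) is an assumption, as are the class name `L2Directory` and its methods.

```python
class L2Directory:
    """Toy model of a shared-L2 directory that tracks sharers and owners."""

    def __init__(self):
        self.sharers = {}  # line address -> set of L1 ids holding a copy
        self.owner = {}    # line address -> L1 id that owns the line

    def record_read(self, line, l1_id):
        """Note that an L1 cache has fetched a copy of the line."""
        self.sharers.setdefault(line, set()).add(l1_id)

    def request_ownership(self, line, l1_id):
        """Grant ownership only if no other L1 shares or owns the line (assumed policy)."""
        others = self.sharers.get(line, set()) - {l1_id}
        if others or self.owner.get(line) not in (None, l1_id):
            return False                  # line is shared: deny the request
        self.owner[line] = l1_id          # update the directory entry
        self.sharers[line] = {l1_id}
        return True
```

Once a request is granted in this model, the owning L1 would update the line locally without sending each write through the shared L2, which is the traffic reduction the abstract refers to.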
-
Patent number: 7757047
Abstract: Maintaining a cache of indications of exclusively-owned coherence state for memory space units (e.g., cache line) allows reduction, if not elimination, of delay from missing store operations. In addition, the indications are maintained without corresponding data of the memory space unit, thus allowing representation of a large memory space with a relatively small missing store operation accelerator. With the missing store operation accelerator, a store operation, which misses in low-latency memory (e.g., L1 or L2 cache), proceeds as if the targeted memory space unit resides in the low-latency memory, if indicated in the missing store operation accelerator. When a store operation misses in low-latency memory and hits in the accelerator, a positive acknowledgement is transmitted to the writing processing unit allowing the store operation to proceed. An entry is allocated for the store operation, the store data is written into the allocated entry, and the target of the store operation is requested from memory.
Type: Grant
Filed: November 12, 2005
Date of Patent: July 13, 2010
Assignee: Oracle America, Inc.
Inventors: Santosh G. Abraham, Lawrence A. Spracklen, Yuan C. Chou
-
Publication number: 20100169611
Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
Type: Application
Filed: December 30, 2008
Publication date: July 1, 2010
Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell, Paul J. Jordan, Manish Shah
-
Publication number: 20100077154
Abstract: A method for pre-fetching data. The method includes obtaining a pre-fetch request. The pre-fetch request identifies new data to pre-fetch from memory and store in a cache. The method further includes identifying a set in the cache to store the new data and identifying a value of a hotness indicator for the set. The hotness indicator value defines a number of replacements of at least one line in the set. The method further includes determining whether the value of the hotness indicator exceeds a predefined threshold, and storing the new data in the set when the value of the hotness indicator does not exceed the predefined threshold.
Type: Application
Filed: September 24, 2008
Publication date: March 25, 2010
Applicant: SUN MICROSYSTEMS, INC.
Inventor: Yuan C. Chou
-
Patent number: 7650485
Abstract: A multithreading processor achieves a very large lookahead instruction window by allowing non-sequential fetch and processing of the dynamic instruction stream. A speculative thread is spawned at a specified point in the dynamic instruction stream and the instructions subsequent to the specified point are speculatively executed so that these instructions are fetched and issued out of sequential order. Very minimal modifications to existing processor design of a multithreading processor are required to achieve the very large lookahead instruction window. The modifications include changes to the control logic of the issue unit and only three additional bits in the register scoreboard.
Type: Grant
Filed: April 10, 2007
Date of Patent: January 19, 2010
Assignee: Sun Microsystems, Inc.
Inventor: Yuan C. Chou
-
Publication number: 20090300340
Abstract: A method for prefetching data and/or instructions from a main memory to a cache memory may include generating control flow information by storing respective information for each retired branch instruction. The method may further include storing respective one or more cache miss addresses for each retired instruction that incurs one or more cache misses, with the respective one or more cache miss addresses corresponding respectively to the one or more cache misses. A correlation table may be maintained based on the generated control flow information and the stored cache miss addresses. Each respective correlation table entry may correspond to a respective index, and may contain a respective tag and a respective correlation list. The correlation list may consist of a specified number of cache miss addresses that most frequently follow the cache miss address used in generating the index to which the respective correlation table entry corresponds.
Type: Application
Filed: June 2, 2008
Publication date: December 3, 2009
Inventors: Yuan C. Chou, Yasuko Watanabe
-
Publication number: 20090287903
Abstract: A computer processor and a method of using the computer processor take advantage of information in the event address register of the computer processor by saving information from the event address register to an event address register history buffer. Thus, the event address register history buffer includes a cluster of events associated with execution of a computer program. The cluster of events is analyzed and the computer program modified, either statically or dynamically, to eliminate or at least ameliorate the effects of such events in further execution of the computer program.
Type: Application
Filed: May 16, 2008
Publication date: November 19, 2009
Inventors: Wei Chung Hsu, Yuan C. Chou
-
Patent number: 7600098
Abstract: A method and system for efficient implementation of a large store buffer within a processor includes a store buffer within a processor having a first component configured to hold a plurality of younger stores requested by the processor and a second component configured to hold a plurality of older stores. The first component is implemented as a small content addressable memory (CAM) and the second component includes a first-in-first-out (FIFO) buffer to hold the data and addresses of the plurality of older stores and an address disambiguator to hold the addresses of each of the plurality of older stores found in the FIFO buffer. The processor uses the small CAM to perform most of the store-to-load forwarding in a fast and efficient way thereby enhancing processor performance.
Type: Grant
Filed: September 29, 2006
Date of Patent: October 6, 2009
Assignee: Sun Microsystems, Inc.
Inventor: Yuan C. Chou
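The split between a small CAM for younger stores and a FIFO plus address disambiguator for older stores can be modeled in software. This is a minimal sketch under assumptions: the class name `TwoLevelStoreBuffer`, the entry counts, and in particular the behavior on a disambiguator hit (searching the FIFO for the youngest matching store) are hypothetical, since the abstract does not say how such loads are handled.

```python
from collections import Counter, deque


class TwoLevelStoreBuffer:
    def __init__(self, cam_entries=4):
        self.cam_entries = cam_entries
        self.cam = deque()              # (addr, data) for the youngest stores
        self.fifo = deque()             # (addr, data) for older stores, oldest at the left
        self.disambiguator = Counter()  # addr -> number of matching FIFO entries

    def store(self, addr, data):
        """Place a new store in the CAM, aging the oldest CAM store into the FIFO."""
        self.cam.append((addr, data))
        if len(self.cam) > self.cam_entries:
            old_addr, old_data = self.cam.popleft()
            self.fifo.append((old_addr, old_data))
            self.disambiguator[old_addr] += 1

    def load(self, addr):
        """Return (source, data) for a load, forwarding from the youngest matching store."""
        for a, d in reversed(self.cam):         # fast path: small CAM
            if a == addr:
                return ("cam", d)
        if self.disambiguator[addr] > 0:        # an older store to this address exists
            for a, d in reversed(self.fifo):    # assumed slow path
                if a == addr:
                    return ("fifo", d)
        return ("memory", None)                 # no pending store to this address

    def drain_one(self, memory):
        """Commit the oldest store in the FIFO to memory (a dict in this model)."""
        if self.fifo:
            addr, data = self.fifo.popleft()
            self.disambiguator[addr] -= 1
            memory[addr] = data
```

Most loads that need forwarding hit recently executed stores, so they resolve in the small CAM. The disambiguator only has to answer whether an older store to the same address exists, which is cheaper than making the whole large buffer content-addressable.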
-
Patent number: 7543282
Abstract: One embodiment of the present invention provides a system that selectively executes different versions of executable code for the same source code. During operation, the system first receives an executable code module which includes two or more versions of executable code for the same source code, wherein the two or more versions of the executable code are optimized in different ways. Next, the system executes the executable code module by first evaluating a test condition, and subsequently executing a specific version of the executable code based on the outcome of the evaluation, so that the execution is optimized for the test condition.
Type: Grant
Filed: March 24, 2006
Date of Patent: June 2, 2009
Assignee: Sun Microsystems, Inc.
Inventor: Yuan C. Chou
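The idea of bundling several versions of the same code and choosing one at run time can be shown with a small dispatcher. This is an illustration at the source level only, with hypothetical names (`sum_small`, `sum_large`, `make_dispatcher`) and an assumed test condition based on input length; the patent concerns versions of compiled executable code, which this sketch does not model.

```python
def sum_small(values):
    """Version tuned for short inputs: a plain loop."""
    total = 0
    for v in values:
        total += v
    return total


def sum_large(values):
    """Version for long inputs: delegates to the built-in sum."""
    return sum(values)


def make_dispatcher(test, version_if_true, version_if_false):
    """Bundle two versions of the same computation behind one entry point."""
    def run(*args):
        # Evaluate the test condition first, then run the matching version.
        chosen = version_if_true if test(*args) else version_if_false
        return chosen(*args)
    return run


# Assumed test condition: inputs shorter than 8 elements use the small version.
total = make_dispatcher(lambda values: len(values) < 8, sum_small, sum_large)
```

Both versions compute the same result, so the test condition affects only which implementation runs, not the answer the caller sees.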
-
Patent number: 7543112
Abstract: The storage of a data line in one or more L1 caches and/or a shared L2 cache of a chip multiprocessor is dynamically optimized based on the sharing of the data line. In one embodiment, an enhanced L2 cache directory entry associated with the data line is generated in an L2 cache directory of the shared L2 cache. The enhanced L2 cache directory entry includes a cache mask indicating a storage state of the data line in the one or more L1 caches and the shared L2 cache. In some embodiments, where the data line is stored in the shared L2 cache only, a portion of the cache mask indicates a storage history of the data line in the one or more L2 caches.
Type: Grant
Filed: June 20, 2006
Date of Patent: June 2, 2009
Assignee: Sun Microsystems, Inc.
Inventors: Yuan C. Chou, Santosh G. Abraham, Lawrence A. Spracklen