Patents by Inventor Yuan C. Chou

Yuan C. Chou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11829763
    Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
    Type: Grant
    Filed: August 13, 2019
    Date of Patent: November 28, 2023
    Assignee: Apple Inc.
    Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
  • Publication number: 20210049015
    Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
    Type: Application
    Filed: August 13, 2019
    Publication date: February 18, 2021
    Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
  • Patent number: 10296460
    Abstract: The disclosed embodiments relate to a method for controlling prefetching in a processor to prevent over-saturation of interfaces in the memory hierarchy of the processor. While the processor is executing, the method determines a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy. Next, the method selectively adjusts a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization, wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy, and wherein when the occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.
    Type: Grant
    Filed: June 29, 2016
    Date of Patent: May 21, 2019
    Assignee: Oracle International Corporation
    Inventors: Suraj Sudhir, Yuan C. Chou
  • Patent number: 10013356
    Abstract: The disclosed embodiments relate to a system that generates prefetches for a stream of data accesses with multiple strides. During operation, while a processor is generating the stream of data accesses, the system examines a sequence of strides associated with the stream of data accesses. Next, upon detecting a pattern having a single constant stride in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the single constant stride. Similarly, upon detecting a recurring pattern having two or more different strides in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the recurring pattern having two or more different strides.
    Type: Grant
    Filed: July 8, 2015
    Date of Patent: July 3, 2018
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
  • Patent number: 9946543
    Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
    Type: Grant
    Filed: March 14, 2016
    Date of Patent: April 17, 2018
    Assignee: Oracle International Corporation
    Inventor: Yuan C. Chou
  • Publication number: 20180004670
    Abstract: The disclosed embodiments relate to a method for controlling prefetching in a processor to prevent over-saturation of interfaces in the memory hierarchy of the processor. While the processor is executing, the method determines a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy. Next, the method selectively adjusts a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization, wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy, and wherein when the occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.
    Type: Application
    Filed: June 29, 2016
    Publication date: January 4, 2018
    Applicant: Oracle International Corporation
    Inventors: Suraj Sudhir, Yuan C. Chou
  • Patent number: 9690707
    Abstract: The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access.
    Type: Grant
    Filed: November 23, 2010
    Date of Patent: June 27, 2017
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
  • Patent number: 9665375
    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: May 30, 2017
    Assignee: Oracle International Corporation
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell
  • Publication number: 20170010970
    Abstract: The disclosed embodiments relate to a system that generates prefetches for a stream of data accesses with multiple strides. During operation, while a processor is generating the stream of data accesses, the system examines a sequence of strides associated with the stream of data accesses. Next, upon detecting a pattern having a single constant stride in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the single constant stride. Similarly, upon detecting a recurring pattern having two or more different strides in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the recurring pattern having two or more different strides.
    Type: Application
    Filed: July 8, 2015
    Publication date: January 12, 2017
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
  • Patent number: 9535697
    Abstract: The present embodiments provide a system that facilitates lazy register window fills in a processor. During program execution, when the system encounters a restore instruction for a register window, the system determines if the restore instruction causes an underflow condition that requires the register window to be filled from a stack in memory. If so, the system completes the restore instruction by updating state information for the register window to indicate that the restore instruction is complete without actually filling the individual registers that comprise the register window from the stack. During subsequent program execution, the system lazily fills registers in the register window from the stack as the registers are accessed by the program.
    Type: Grant
    Filed: July 1, 2013
    Date of Patent: January 3, 2017
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
  • Patent number: 9442727
    Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.
    Type: Grant
    Filed: October 14, 2013
    Date of Patent: September 13, 2016
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
  • Publication number: 20160196138
    Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
    Type: Application
    Filed: March 14, 2016
    Publication date: July 7, 2016
    Applicant: Oracle International Corporation
    Inventor: Yuan C. Chou
  • Patent number: 9367312
    Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
    Type: Grant
    Filed: February 26, 2014
    Date of Patent: June 14, 2016
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
  • Patent number: 9304927
    Abstract: The disclosed embodiments relate to a method for dynamically changing a prefetching configuration in a computer system, wherein the prefetching configuration specifies how to change an ahead distance that specifies how many references ahead to prefetch for each stream. During operation of the computer system, the method keeps track of one or more stream lengths, wherein a stream is a sequence of memory references with a constant stride. Next, the method dynamically changes the prefetching configuration for the computer system based on observed stream lengths in a most-recent window of time.
    Type: Grant
    Filed: August 27, 2012
    Date of Patent: April 5, 2016
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Suryanarayana Murthy Durbhakula, Yuan C. Chou
  • Publication number: 20150242209
    Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
    Type: Application
    Filed: February 26, 2014
    Publication date: August 27, 2015
    Applicant: Oracle International Corporation
    Inventor: Yuan C. Chou
  • Patent number: 9110811
    Abstract: A method and apparatus for determining data to be prefetched based on previous cache miss history is disclosed. In one embodiment, a processor includes a first cache memory and a controller circuit. The controller circuit is configured to load data from a first address into the first cache memory responsive to a cache miss corresponding to the first address. The controller circuit is further configured to determine, responsive to a cache miss for the first address, if a previous cache miss occurred at a second address. Responsive to determining that the previous cache miss occurred at the second address, the controller circuit is configured to load data from a second address into the first cache.
    Type: Grant
    Filed: September 18, 2012
    Date of Patent: August 18, 2015
    Assignee: Oracle International Corporation
    Inventor: Yuan C. Chou
  • Patent number: 9047197
    Abstract: A method is disclosed that uses a non-coherent store instruction to reduce inter-thread communication latency between threads sharing a level one write-through cache. When a thread executes the non-coherent store instruction, the level one cache is immediately updated with the data value. The data value is immediately available to another thread sharing the level-one write-through cache. A computer system having reduced inter-thread communication latency is disclosed. The computer system includes a first plurality of processor cores, each processor core including a second plurality of processing engines sharing a level one write-through cache. The level one caches are connected to a level two cache via a crossbar switch. The computer system further implements a non-coherent store instruction that updates a data value in the level one cache prior to updating the corresponding data value in the level two cache.
    Type: Grant
    Filed: October 23, 2007
    Date of Patent: June 2, 2015
    Assignee: Oracle America, Inc.
    Inventor: Yuan C. Chou
  • Publication number: 20150106590
    Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.
    Type: Application
    Filed: October 14, 2013
    Publication date: April 16, 2015
    Applicant: Oracle International Corporation
    Inventor: Yuan C. Chou
  • Patent number: 9009449
    Abstract: A system that executes program instructions on a processor is described. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. While executing in the lookahead mode, if the processor determines that the lookahead mode is unlikely to uncover any additional outer-level cache misses, the system terminates the lookahead mode. Then, after the unresolved data dependency is resolved, the system recommences execution in the normal-execution mode from the instruction that triggered the lookahead mode.
    Type: Grant
    Filed: November 10, 2011
    Date of Patent: April 14, 2015
    Assignee: Oracle International Corporation
    Inventors: Yuan C. Chou, Eric W. Mahurin
  • Publication number: 20150006864
    Abstract: The present embodiments provide a system that facilitates lazy register window fills in a processor. During program execution, when the system encounters a restore instruction for a register window, the system determines if the restore instruction causes an underflow condition that requires the register window to be filled from a stack in memory. If so, the system completes the restore instruction by updating state information for the register window to indicate that the restore instruction is complete without actually filling the individual registers that comprise the register window from the stack. During subsequent program execution, the system lazily fills registers in the register window from the stack as the registers are accessed by the program.
    Type: Application
    Filed: July 1, 2013
    Publication date: January 1, 2015
    Inventor: Yuan C. Chou
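The load-address prediction flow summarized for patent 11829763 (publication 20210049015) can be illustrated with a small software model. This is a hypothetical sketch, not the patented design. Its assumptions: both tables are unbounded dictionaries keyed by the load's program counter, `threshold` is an invented tuning parameter, and an address mismatch resets confidence. The abstract does not describe what happens on a mismatch.

```python
from dataclasses import dataclass, field


@dataclass
class LoadAddressPredictor:
    # Hypothetical model: both tables are unbounded dicts keyed by load PC,
    # and the confidence threshold is an assumed tuning parameter.
    threshold: int = 2
    prediction_table: dict = field(default_factory=dict)  # pc -> predicted address
    learning_table: dict = field(default_factory=dict)    # pc -> [address, confidence]

    def lookup(self, pc):
        """Prediction-table lookup after fetch: predicted address on a hit, else None."""
        return self.prediction_table.get(pc)

    def train(self, pc, actual_addr):
        """Called on a prediction-table miss once the load's address is computed."""
        entry = self.learning_table.get(pc)
        if entry is None:
            # Learning-table miss: begin tracking this load (assumed policy).
            self.learning_table[pc] = [actual_addr, 0]
            return
        if entry[0] == actual_addr:
            entry[1] += 1  # stored address matches: raise confidence
        else:
            # Mismatch handling is not described in the abstract; resetting is an assumption.
            entry[0], entry[1] = actual_addr, 0
        if entry[1] >= self.threshold:
            # Confidence meets the threshold: copy the entry into the prediction
            # table so the next lookup for this PC hits.
            self.prediction_table[pc] = entry[0]


# Usage: a load at PC 0x400 that always reads address 0x1000.
predictor = LoadAddressPredictor(threshold=2)
for _ in range(3):
    if predictor.lookup(0x400) is None:
        predictor.train(0x400, 0x1000)
print(hex(predictor.lookup(0x400)))
```

The learning table filters out loads whose addresses are unstable, so only loads that repeat the same address become eligible for prediction.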
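The multi-stride prefetching described for patent 10013356 (publication 20170010970) can be sketched as a search for the shortest stride pattern that repeats across the recent access history. A period of 1 is the single-constant-stride case, and longer periods cover recurring patterns of two or more strides. This is a minimal sketch under stated assumptions. The 64-byte `LINE_SIZE`, the `max_period` and `degree` parameters, and the requirement of two full repetitions are all illustrative choices, not details from the abstract.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


def find_stride_pattern(strides, max_period=4):
    """Return the shortest stride pattern that repeats over the whole window, or None.

    A period of 1 is the single-constant-stride case; longer periods cover
    recurring patterns of two or more different strides.
    """
    for period in range(1, max_period + 1):
        # Require at least two full repetitions before trusting a pattern.
        if len(strides) < 2 * period:
            break
        if all(strides[i] == strides[i - period] for i in range(period, len(strides))):
            return strides[-period:]
    return None


def prefetch_lines(addresses, degree=4, max_period=4):
    """Extrapolate the detected pattern and return cache-line addresses to prefetch."""
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    pattern = find_stride_pattern(strides, max_period)
    if pattern is None:
        return []  # no pattern detected: issue no prefetches
    lines = []
    addr = addresses[-1]
    for i in range(degree):
        # The next stride continues the cycle where the history left off.
        addr += pattern[i % len(pattern)]
        line = addr - addr % LINE_SIZE  # align down to a cache-line boundary
        if line not in lines:
            lines.append(line)
    return lines


print(prefetch_lines([0, 128, 256, 384]))      # constant stride of 128
print(prefetch_lines([0, 64, 192, 256, 384]))  # alternating strides of 64 and 128
```

A single-stride prefetcher would see the second stream as irregular. Matching the repeating pair (64, 128) lets the same logic keep prefetching it.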
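The bandwidth-aware prefetch dropping in patent 10296460 (publication 20180004670) can be modeled as a miss buffer with an adjustable high-water mark. This is a hypothetical sketch. The abstract states that the mark is adjusted according to measured bandwidth utilization and that prefetches which miss are dropped once occupancy exceeds the mark. It does not give the adjustment policy. The utilization thresholds, step size, and minimum mark below are therefore assumptions.

```python
from collections import deque


class MissBuffer:
    """Miss buffer whose prefetch-dropping high-water mark tracks bandwidth use.

    Hypothetical parameters: the thresholds, step size, and floor on the mark
    are assumptions; the abstract does not specify an adjustment policy.
    """

    def __init__(self, capacity=32, min_mark=8, step=4, high_util=0.9, low_util=0.6):
        self.capacity = capacity
        self.min_mark = min_mark
        self.step = step
        self.high_util = high_util
        self.low_util = low_util
        self.mark = capacity  # start permissive: no prefetches are dropped
        self.entries = deque()  # outstanding demand misses and prefetches

    def adjust_mark(self, utilization):
        """Lower the mark when the interface to the next level is busy, raise it when idle."""
        if utilization > self.high_util:
            self.mark = max(self.min_mark, self.mark - self.step)
        elif utilization < self.low_util:
            self.mark = min(self.capacity, self.mark + self.step)

    def request(self, kind):
        """Try to allocate an entry for a request ('demand' or 'prefetch') that missed."""
        if kind == "prefetch" and len(self.entries) > self.mark:
            return "dropped"  # occupancy exceeds the high-water mark
        if len(self.entries) >= self.capacity:
            return "stalled"  # buffer full: the request cannot be allocated yet
        self.entries.append(kind)
        return "allocated"

    def fill_returned(self):
        """Free the oldest entry when its data comes back from the lower level."""
        if self.entries:
            self.entries.popleft()


# Usage: four demand misses outstanding, then the interface becomes saturated.
mb = MissBuffer(capacity=8, min_mark=2, step=2)
for _ in range(4):
    mb.request("demand")
for _ in range(3):
    mb.adjust_mark(0.95)  # tighten the mark once per measurement interval
print(mb.mark, mb.request("prefetch"), mb.request("demand"))
```

Demand misses are never subject to the mark. Only speculative prefetches are sacrificed when the interface is busy.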
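The redundant-prefetch filtering in patent 9442727 (publication 20150106590) can be sketched with per-instruction hit statistics. This is a hypothetical sketch, not the patented mechanism. It assumes a saturating counter per prefetch PC as a stand-in for "hit rate" and an invented `threshold`. It also periodically lets a filtered prefetch through so that its statistics can recover. That sampling is purely an assumption, because the abstract does not say how filtered instructions keep being evaluated.

```python
class SoftwarePrefetchFilter:
    """Per-PC filter that skips software prefetches which keep hitting in the cache.

    Hypothetical details: a saturating hit counter per prefetch PC, a filtering
    threshold, and periodic sampling of filtered prefetches are all assumptions.
    """

    def __init__(self, threshold=3, max_count=3, sample_interval=8):
        self.threshold = threshold
        self.max_count = max_count
        self.sample_interval = sample_interval
        self.counters = {}  # prefetch PC -> saturating count of recent cache hits
        self.skipped = {}   # prefetch PC -> filtered encounters since the last sample

    def should_execute(self, pc):
        """Decide whether the prefetch at this PC is issued or filtered out."""
        if self.counters.get(pc, 0) < self.threshold:
            return True
        n = self.skipped.get(pc, 0) + 1
        if n >= self.sample_interval:
            # Occasionally let a likely-redundant prefetch through so its
            # statistics can change if the program's behavior does.
            self.skipped[pc] = 0
            return True
        self.skipped[pc] = n
        return False

    def record(self, pc, hit):
        """Update statistics after an executed prefetch; hit means the line was already cached."""
        count = self.counters.get(pc, 0)
        if hit:
            self.counters[pc] = min(self.max_count, count + 1)
        else:
            self.counters[pc] = max(0, count - 1)


# Usage: a prefetch at PC 0x10 that always targets an already-cached line.
pf_filter = SoftwarePrefetchFilter()
decisions = []
for _ in range(12):
    run = pf_filter.should_execute(0x10)
    decisions.append(run)
    if run:
        pf_filter.record(0x10, hit=True)
print(decisions)
```

After three redundant executions the prefetch is suppressed. Only one in every `sample_interval` encounters is still issued to refresh its statistics.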