Patents by Inventor Yuan C. Chou
Yuan C. Chou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11829763
Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
Type: Grant
Filed: August 13, 2019
Date of Patent: November 28, 2023
Assignee: Apple Inc.
Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
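As an illustration only, the two-table flow in this abstract can be sketched in Python. The class and method names, the dictionary-based tables keyed by a load's program counter, the default threshold of 2, and the handling of a learning-table miss or an address mismatch are all assumptions made for the sketch; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class LearningEntry:
    address: int          # last address observed for this load
    confidence: int = 0   # how many times in a row it has repeated


class LoadAddressPredictor:
    """Hypothetical model: a learning table that promotes stable loads
    into a prediction table once a confidence threshold is met."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.learning = {}    # load PC -> LearningEntry
        self.prediction = {}  # load PC -> predicted address

    def predict(self, pc):
        """Prediction-table lookup; returns None on a miss."""
        return self.prediction.get(pc)

    def train(self, pc, actual_address):
        """Called after a prediction-table miss, once the address is known."""
        entry = self.learning.get(pc)
        if entry is None:
            # Assumption: a learning-table miss allocates a fresh entry.
            self.learning[pc] = LearningEntry(actual_address)
            return
        if entry.address == actual_address:
            entry.confidence += 1          # stored address matched
        else:
            # Assumption: a mismatch retrains the entry from scratch.
            entry.address = actual_address
            entry.confidence = 0
        if entry.confidence >= self.threshold:
            # Promote so the next prediction-table lookup hits.
            self.prediction[pc] = entry.address
```

With a threshold of 2, a load must resolve to the same address three times (one allocation plus two matches) before `predict` starts returning it.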
-
Publication number: 20210049015
Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
Type: Application
Filed: August 13, 2019
Publication date: February 18, 2021
Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
-
Patent number: 10296460
Abstract: The disclosed embodiments relate to a method for controlling prefetching in a processor to prevent over-saturation of interfaces in the memory hierarchy of the processor. While the processor is executing, the method determines a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy. Next, the method selectively adjusts a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization, wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy, and wherein when the occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.
Type: Grant
Filed: June 29, 2016
Date of Patent: May 21, 2019
Assignee: Oracle International Corporation
Inventors: Suraj Sudhir, Yuan C. Chou
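The prefetch-dropping rule in this abstract can be modelled with a small Python sketch. Everything numeric here is an assumption for illustration: the buffer capacity, the two candidate high-water marks, the 0.8 utilization threshold, and the simple two-level adjustment policy are not stated in the abstract, and the `MissBuffer` class is hypothetical.

```python
class MissBuffer:
    """Hypothetical miss buffer with an adjustable prefetch-dropping mark."""

    def __init__(self, capacity=16, low_mark=4, high_mark=12, busy_threshold=0.8):
        self.capacity = capacity
        self.low_mark = low_mark
        self.high_mark = high_mark
        self.busy_threshold = busy_threshold
        self.occupancy = 0
        self.mark = high_mark   # current prefetch-dropping high-water mark

    def adjust_mark(self, bandwidth_utilization):
        # Assumed policy: a busy interface lowers the mark so prefetches
        # are dropped sooner; an idle interface raises it again.
        if bandwidth_utilization > self.busy_threshold:
            self.mark = self.low_mark
        else:
            self.mark = self.high_mark

    def request(self, is_prefetch):
        """Record a request that missed in the cache.

        Returns True if it is accepted into the miss buffer, False if dropped.
        """
        if is_prefetch and self.occupancy > self.mark:
            return False    # occupancy exceeds the mark: drop the prefetch
        if self.occupancy >= self.capacity:
            return False    # buffer full (real hardware would stall a demand)
        self.occupancy += 1
        return True

    def fill(self):
        """Data returned from the lower level frees one entry."""
        if self.occupancy > 0:
            self.occupancy -= 1
```

Demand requests are never subject to the mark in this sketch; only prefetches are dropped, which matches the abstract's wording.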
-
Patent number: 10013356
Abstract: The disclosed embodiments relate to a system that generates prefetches for a stream of data accesses with multiple strides. During operation, while a processor is generating the stream of data accesses, the system examines a sequence of strides associated with the stream of data accesses. Next, upon detecting a pattern having a single constant stride in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the single constant stride. Similarly, upon detecting a recurring pattern having two or more different strides in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the recurring pattern having two or more different strides.
Type: Grant
Filed: July 8, 2015
Date of Patent: July 3, 2018
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventor: Yuan C. Chou
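A minimal Python sketch of the idea in this abstract follows. It assumes a simple detection rule (the most recent strides must repeat exactly once with some period of at most `max_period`) and works on raw addresses rather than cache lines; the function names and that rule are illustrative assumptions, not the patented mechanism.

```python
def detect_stride_pattern(addresses, max_period=4):
    """Return the shortest recurring stride pattern, or None.

    A period of 1 is the single-constant-stride case; longer periods
    are recurring patterns of two or more strides.
    """
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    for period in range(1, max_period + 1):
        if len(strides) < 2 * period:
            break                          # not enough history to confirm
        recent = strides[-2 * period:]
        if recent[:period] == recent[period:]:
            return recent[period:]         # pattern seen twice back to back
    return None


def next_prefetch_addresses(addresses, count, max_period=4):
    """Extrapolate `count` future addresses from the detected pattern."""
    pattern = detect_stride_pattern(addresses, max_period)
    if pattern is None:
        return []
    predicted = []
    address = addresses[-1]
    for i in range(count):
        address += pattern[i % len(pattern)]
        predicted.append(address)
    return predicted
```

For the access stream 0, 8, 72, 80, 144 the strides are 8, 64, 8, 64, so the sketch detects the two-stride pattern [8, 64] and predicts 152, 216, 224 as the next three addresses.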
-
Patent number: 9946543
Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
Type: Grant
Filed: March 14, 2016
Date of Patent: April 17, 2018
Assignee: Oracle International Corporation
Inventor: Yuan C. Chou
-
Publication number: 20180004670
Abstract: The disclosed embodiments relate to a method for controlling prefetching in a processor to prevent over-saturation of interfaces in the memory hierarchy of the processor. While the processor is executing, the method determines a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy. Next, the method selectively adjusts a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization, wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy, and wherein when the occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.
Type: Application
Filed: June 29, 2016
Publication date: January 4, 2018
Applicant: Oracle International Corporation
Inventors: Suraj Sudhir, Yuan C. Chou
-
Patent number: 9690707
Abstract: The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access.
Type: Grant
Filed: November 23, 2010
Date of Patent: June 27, 2017
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventor: Yuan C. Chou
-
Patent number: 9665375
Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.
Type: Grant
Filed: April 26, 2012
Date of Patent: May 30, 2017
Assignee: Oracle International Corporation
Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell
-
Publication number: 20170010970
Abstract: The disclosed embodiments relate to a system that generates prefetches for a stream of data accesses with multiple strides. During operation, while a processor is generating the stream of data accesses, the system examines a sequence of strides associated with the stream of data accesses. Next, upon detecting a pattern having a single constant stride in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the single constant stride. Similarly, upon detecting a recurring pattern having two or more different strides in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the recurring pattern having two or more different strides.
Type: Application
Filed: July 8, 2015
Publication date: January 12, 2017
Applicant: ORACLE INTERNATIONAL CORPORATION
Inventor: Yuan C. Chou
-
Patent number: 9535697
Abstract: The present embodiments provide a system that facilitates lazy register window fills in a processor. During program execution, when the system encounters a restore instruction for a register window, the system determines if the restore instruction causes an underflow condition that requires the register window to be filled from a stack in memory. If so, the system completes the restore instruction by updating state information for the register window to indicate that the restore instruction is complete without actually filling the individual registers that comprise the register window from the stack. During subsequent program execution, the system lazily fills registers in the register window from the stack as the registers are accessed by the program.
Type: Grant
Filed: July 1, 2013
Date of Patent: January 3, 2017
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventor: Yuan C. Chou
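The lazy-fill behaviour in this abstract can be sketched in Python as a per-register "filled" flag. The `LazyRegisterWindow` class, the list standing in for the stack frame in memory, the `loads` counter, and the rule that a write marks a register as filled are all assumptions made for illustration.

```python
class LazyRegisterWindow:
    """Hypothetical register window that is filled from the stack on demand."""

    def __init__(self, size):
        self.regs = [0] * size
        self.filled = [True] * size   # True: register holds a valid value
        self.stack_frame = None       # saved copy of the window in memory
        self.loads = 0                # memory loads actually performed

    def restore_with_underflow(self, stack_frame):
        """Complete the restore by updating state only; load nothing yet."""
        self.stack_frame = list(stack_frame)
        self.filled = [False] * len(self.regs)

    def read(self, index):
        """Fill the register from the stack the first time it is read."""
        if not self.filled[index]:
            self.regs[index] = self.stack_frame[index]
            self.filled[index] = True
            self.loads += 1
        return self.regs[index]

    def write(self, index, value):
        # Assumption: a write supersedes the saved value, so no fill is needed.
        self.regs[index] = value
        self.filled[index] = True
```

In this model a window whose registers are never read after the restore costs no loads at all, which is the saving the abstract points to.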
-
Patent number: 9442727
Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.
Type: Grant
Filed: October 14, 2013
Date of Patent: September 13, 2016
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventor: Yuan C. Chou
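One way to picture the filtering described in this abstract is a per-instruction hit-rate table, sketched below in Python. The `PrefetchFilter` class, the minimum sample count, and the 0.75 redundancy ratio are assumptions chosen for the example; the abstract does not say how the collected information is stored or thresholded.

```python
class PrefetchFilter:
    """Hypothetical filter keyed by the prefetch instruction's PC."""

    def __init__(self, min_samples=4, redundant_ratio=0.75):
        self.min_samples = min_samples
        self.redundant_ratio = redundant_ratio
        self.stats = {}   # prefetch PC -> [times executed, times it hit]

    def record(self, pc, hit_in_cache):
        """Log one executed prefetch; a cache hit means it was redundant."""
        counts = self.stats.setdefault(pc, [0, 0])
        counts[0] += 1
        if hit_in_cache:
            counts[1] += 1

    def should_execute(self, pc):
        """Return False when this prefetch is likely to be redundant."""
        executed, hits = self.stats.get(pc, (0, 0))
        if executed < self.min_samples:
            return True   # not enough history yet: let it run
        return hits / executed < self.redundant_ratio
```

A filtered prefetch is no longer executed in this sketch, so its statistics stop updating; a fuller model would presumably need some way to re-sample filtered instructions, which is left out here.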
-
Publication number: 20160196138
Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
Type: Application
Filed: March 14, 2016
Publication date: July 7, 2016
Applicant: Oracle International Corporation
Inventor: Yuan C. Chou
-
Patent number: 9367312
Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
Type: Grant
Filed: February 26, 2014
Date of Patent: June 14, 2016
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventor: Yuan C. Chou
-
Patent number: 9304927
Abstract: The disclosed embodiments relate to a method for dynamically changing a prefetching configuration in a computer system, wherein the prefetching configuration specifies how to change an ahead distance that specifies how many references ahead to prefetch for each stream. During operation of the computer system, the method keeps track of one or more stream lengths, wherein a stream is a sequence of memory references with a constant stride. Next, the method dynamically changes the prefetching configuration for the computer system based on observed stream lengths in a most-recent window of time.
Type: Grant
Filed: August 27, 2012
Date of Patent: April 5, 2016
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Suryanarayana Murthy Durbhakula, Yuan C. Chou
-
Publication number: 20150242209
Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.
Type: Application
Filed: February 26, 2014
Publication date: August 27, 2015
Applicant: Oracle International Corporation
Inventor: Yuan C. Chou
-
Patent number: 9110811
Abstract: A method and apparatus for determining data to be prefetched based on previous cache miss history is disclosed. In one embodiment, a processor includes a first cache memory and a controller circuit. The controller circuit is configured to load data from a first address into the first cache memory responsive to a cache miss corresponding to the first address. The controller circuit is further configured to determine, responsive to a cache miss for the first address, if a previous cache miss occurred at a second address. Responsive to determining that the previous cache miss occurred at the second address, the controller circuit is configured to load data from a second address into the first cache.
Type: Grant
Filed: September 18, 2012
Date of Patent: August 18, 2015
Assignee: Oracle International Corporation
Inventor: Yuan C. Chou
-
Patent number: 9047197
Abstract: A method is disclosed that uses a non-coherent store instruction to reduce inter-thread communication latency between threads sharing a level one write-through cache. When a thread executes the non-coherent store instruction, the level one cache is immediately updated with the data value. The data value is immediately available to another thread sharing the level-one write-through cache. A computer system having reduced inter-thread communication latency is disclosed. The computer system includes a first plurality of processor cores, each processor core including a second plurality of processing engines sharing a level one write-through cache. The level one caches are connected to a level two cache via a crossbar switch. The computer system further implements a non-coherent store instruction that updates a data value in the level one cache prior to updating the corresponding data value in the level two cache.
Type: Grant
Filed: October 23, 2007
Date of Patent: June 2, 2015
Assignee: Oracle America, Inc.
Inventor: Yuan C. Chou
-
Publication number: 20150106590
Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.
Type: Application
Filed: October 14, 2013
Publication date: April 16, 2015
Applicant: Oracle International Corporation
Inventor: Yuan C. Chou
-
Patent number: 9009449
Abstract: A system that executes program instructions on a processor is described. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. While executing in the lookahead mode, if the processor determines that the lookahead mode is unlikely to uncover any additional outer-level cache misses, the system terminates the lookahead mode. Then, after the unresolved data dependency is resolved, the system recommences execution in the normal-execution mode from the instruction that triggered the lookahead mode.
Type: Grant
Filed: November 10, 2011
Date of Patent: April 14, 2015
Assignee: Oracle International Corporation
Inventors: Yuan C. Chou, Eric W. Mahurin
-
Publication number: 20150006864
Abstract: The present embodiments provide a system that facilitates lazy register window fills in a processor. During program execution, when the system encounters a restore instruction for a register window, the system determines if the restore instruction causes an underflow condition that requires the register window to be filled from a stack in memory. If so, the system completes the restore instruction by updating state information for the register window to indicate that the restore instruction is complete without actually filling the individual registers that comprise the register window from the stack. During subsequent program execution, the system lazily fills registers in the register window from the stack as the registers are accessed by the program.
Type: Application
Filed: July 1, 2013
Publication date: January 1, 2015
Inventor: Yuan C. Chou