Patents by Inventor Yuan C. Chou
Yuan C. Chou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12175248Abstract: Disclosed techniques relate to re-use of speculative results from an incorrect execution path. In some embodiments, when a control transfer instruction is mispredicted, a load instruction may have been executed on the wrong path. In disclosed embodiments, result storage circuitry records information that indicates destination registers of speculatively-executed load instructions including a first load instruction. Control flow tracker circuitry may store information indicating a reconvergence point for the control transfer instruction.Type: GrantFiled: April 21, 2023Date of Patent: December 24, 2024Assignee: Apple Inc.Inventors: Yuan C. Chou, Deepankar Duggal, Debasish Chandra, Niket K Choudhary, Richard F. Russo
-
Patent number: 12159142Abstract: Techniques are disclosed relating to predicting values for load operations. In some embodiments, front-end circuitry is configured to predict values of load operations based on multiple value tagged geometric length predictor (VTAGE) prediction tables (based on program counter information and branch history information). Training circuitry may adjust multiple VTAGE learning tables based on completed load operations. Control circuitry may pre-compute access information (e.g., an index) for a VTAGE learning table for a load based on branch history information that is available to the front-end circuitry but that is unavailable to the training circuitry, store the pre-computed access information, and provide the pre-computed access information from the first storage circuitry to the training circuitry to access the VTAGE learning table based on completion of the load. This may facilitate VTAGE training without pipelining the branch history information.Type: GrantFiled: May 2, 2023Date of Patent: December 3, 2024Assignee: Apple Inc.Inventors: Yuan C. Chou, Chang Xu, Deepankar Duggal, Debasish Chandra
-
Publication number: 20240362027Abstract: Techniques are disclosed relating to load value prediction. In some embodiments, a processor includes load address prediction circuitry and load value prediction circuitry. Training circuitry may train loads in a given entry, and may include a first entry configured to store first predicted load address information and a confidence indication of confidence that the first predicted load address information is correct and a second entry configured to store first predicted load value information and a confidence indication of confidence that the first predicted load value information is correct (note a given entry may be configured to load or value prediction at different times). Control circuitry may, in response to an entry in the training circuitry reaching a threshold level of confidence, allocate a corresponding entry in either the load value prediction circuitry or the load address prediction circuitry.Type: ApplicationFiled: July 5, 2024Publication date: October 31, 2024Inventors: Yuan C. Chou, Debasish Chandra, Mridul Agarwal, Haoyan Jia
-
Publication number: 20240354109Abstract: Disclosed techniques relate to re-use of speculative results from an incorrect execution path. In some embodiments, when a control transfer instruction is mispredicted, a load instruction may have been executed on the wrong path. In disclosed embodiments, result storage circuitry records information that indicates destination registers of speculatively-executed load instructions including a first load instruction. Control flow tracker circuitry may store information indicating a reconvergence point for the control transfer instruction.Type: ApplicationFiled: April 21, 2023Publication date: October 24, 2024Inventors: Yuan C. Chou, Deepankar Duggal, Debasish Chandra, Niket K. Choudhary, Richard F. Russo
-
Publication number: 20240354111Abstract: Disclosed techniques relate to re-use of speculative results from an incorrect execution path. In some embodiments, when a first control transfer instruction is mispredicted, a second control transfer instruction may have been executed on the wrong path because of the misprediction. Result storage circuitry may record information indicating a determined direction for the second control transfer instruction. Control flow tracker circuitry may store, for the first control transfer instruction, information indicating a reconvergence point. Re-use control circuitry may track registers written by instructions prior to the reconvergence point, determine, based on the tracked registers, that the second control transfer instruction does not depend on data from any instruction between the first control transfer instruction and the reconvergence point, and use the recorded determined direction for the second control transfer instruction, notwithstanding the misprediction of the first control transfer instruction.Type: ApplicationFiled: April 21, 2023Publication date: October 24, 2024Inventors: Yuan C. Chou, Deepankar Duggal, Debasish Chandra, Niket K. Choudhary, Richard F. Russo
-
Publication number: 20240329990Abstract: A system, e.g., a system on a chip (SOC), may include one or more processors. A processor may execute an instruction synchronization barrier (ISB) instruction to enforce an ordering constraint on instructions. To execute the ISB instruction, the processor may determine whether contexts of the processor required for execution of instructions older than the ISB instruction are consumed for the older instructions. Responsive to determining that the contexts are consumed for the older instructions, the processor may initiate fetching of an instruction younger than the ISB instruction, without waiting for the older instructions to retire.Type: ApplicationFiled: June 11, 2024Publication date: October 3, 2024Applicant: Apple Inc.Inventors: Deepankar Duggal, Kulin N Kothari, Mridul Agarwal, Chang Xu, Yanran Yang, Richard F Russo, Yuan C Chou, Douglas C Holman
-
Patent number: 12067398Abstract: Techniques are disclosed relating to load value prediction. In some embodiments, a processor includes learning table circuitry that is shared for both address and value prediction. Loads may be trained for value prediction when they are eligible for both value and address prediction. Entries in the learning table may be promoted to an address prediction table or a load value prediction table for prediction, e.g., when they reach a threshold confidence level in the training table. In some embodiments, the learning table stores a hash of a predicted load value and control circuitry uses a probing load to retrieve the actual predicted load value for the value prediction table.Type: GrantFiled: April 29, 2022Date of Patent: August 20, 2024Assignee: Apple Inc.Inventors: Yuan C. Chou, Debasish Chandra, Mridul Agarwal, Haoyan Jia
-
Patent number: 12045615Abstract: A system, e.g., a system on a chip (SOC), may include one or more processors. A processor may execute an instruction synchronization barrier (ISB) instruction to enforce an ordering constraint on instructions. To execute the ISB instruction, the processor may determine whether contexts of the processor required for execution of instructions older than the ISB instruction are consumed for the older instructions. Responsive to determining that the contexts are consumed for the older instructions, the processor may initiate fetching of an instruction younger than the ISB instruction, without waiting for the older instructions to retire.Type: GrantFiled: September 16, 2022Date of Patent: July 23, 2024Assignee: Apple Inc.Inventors: Deepankar Duggal, Kulin N Kothari, Mridul Agarwal, Chang Xu, Yanran Yang, Richard F Russo, Yuan C Chou, Douglas C Holman
-
Patent number: 11829763Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.Type: GrantFiled: August 13, 2019Date of Patent: November 28, 2023Assignee: Apple Inc.Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
-
Publication number: 20210049015Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.Type: ApplicationFiled: August 13, 2019Publication date: February 18, 2021Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
-
Patent number: 10296460Abstract: The disclosed embodiments relate to a method for controlling prefetching in a processor to prevent over-saturation of interfaces in the memory hierarchy of the processor. While the processor is executing, the method determines a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy. Next, the method selectively adjusts a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization, wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy, and wherein when the occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.Type: GrantFiled: June 29, 2016Date of Patent: May 21, 2019Assignee: Oracle International CorporationInventors: Suraj Sudhir, Yuan C. Chou
-
Patent number: 10013356Abstract: The disclosed embodiments relate to a system that generates prefetches for a stream of data accesses with multiple strides. During operation, while a processor is generating the stream of data accesses, the system examines a sequence of strides associated with the stream of data accesses. Next, upon detecting a pattern having a single constant stride in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the single constant stride. Similarly, upon detecting a recurring pattern having two or more different strides in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the recurring pattern having two or more different strides.Type: GrantFiled: July 8, 2015Date of Patent: July 3, 2018Assignee: ORACLE INTERNAIONAL CORPORATIONInventor: Yuan C. Chou
-
Patent number: 9946543Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.Type: GrantFiled: March 14, 2016Date of Patent: April 17, 2018Assignee: Oracle International CorporationInventor: Yuan C. Chou
-
Publication number: 20180004670Abstract: The disclosed embodiments relate to a method for controlling prefetching in a processor to prevent over-saturation of interfaces in the memory hierarchy of the processor. While the processor is executing, the method determines a bandwidth utilization of an interface from a cache in the processor to a lower level of the memory hierarchy. Next, the method selectively adjusts a prefetch-dropping high-water mark for occupancy of a miss buffer associated with the cache based on the determined bandwidth utilization, wherein the miss buffer stores entries for outstanding demand requests and prefetches that missed in the cache and are waiting for corresponding data to be returned from the lower level of the memory hierarchy, and wherein when the occupancy of the miss buffer exceeds the prefetch-dropping high-water mark, subsequent prefetches that cause a cache miss are dropped.Type: ApplicationFiled: June 29, 2016Publication date: January 4, 2018Applicant: Oracle International CorporationInventors: Suraj Sudhir, Yuan C. Chou
-
Patent number: 9690707Abstract: The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access.Type: GrantFiled: November 23, 2010Date of Patent: June 27, 2017Assignee: ORACLE INTERNATIONAL CORPORATIONInventor: Yuan C. Chou
-
Patent number: 9665375Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.Type: GrantFiled: April 26, 2012Date of Patent: May 30, 2017Assignee: Oracle International CorporationInventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell
-
Publication number: 20170010970Abstract: The disclosed embodiments relate to a system that generates prefetches for a stream of data accesses with multiple strides. During operation, while a processor is generating the stream of data accesses, the system examines a sequence of strides associated with the stream of data accesses. Next, upon detecting a pattern having a single constant stride in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the single constant stride. Similarly, upon detecting a recurring pattern having two or more different strides in the examined sequence of strides, the system issues prefetch instructions to prefetch a sequence of data cache lines consistent with the recurring pattern having two or more different strides.Type: ApplicationFiled: July 8, 2015Publication date: January 12, 2017Applicant: ORACLE INTERNATIONAL CORPORATIONInventor: Yuan C. Chou
-
Patent number: 9535697Abstract: The present embodiments provide a system that facilitates lazy register window fills in a processor. During program execution, when the system encounters a restore instruction for a register window, the system determines if the restore instruction causes an underflow condition that requires the register window to be filled from a stack in memory. If so, the system completes the restore instruction by updating state information for the register window to indicate that the restore instruction is complete without actually filling the individual registers that comprise the register window from the stack. During subsequent program execution, the system lazily fills registers in the register window from the stack as the registers are accessed by the program.Type: GrantFiled: July 1, 2013Date of Patent: January 3, 2017Assignee: ORACLE INTERNATIONAL CORPORATIONInventor: Yuan C. Chou
-
Patent number: 9442727Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.Type: GrantFiled: October 14, 2013Date of Patent: September 13, 2016Assignee: ORACLE INTERNATIONAL CORPORATIONInventor: Yuan C. Chou
-
Publication number: 20160196138Abstract: A processor includes an execution pipeline configured to execute instructions for threads, wherein the architectural state of a thread includes a set of register windows for the thread. The processor also includes a physical register file (PRF) containing both speculative and architectural versions of registers for each thread. When an instruction that writes to a destination register enters a rename stage, the rename stage allocates an entry for the destination register in the PRF. When an instruction that has written to a speculative version of a destination register enters a commit stage, the commit stage converts the speculative version into an architectural version. It also deallocates an entry for a previous version of the destination register from the PRF. When a register-window-restore instruction that deallocates a register window enters the commit stage, the commit stage deallocates local and output registers for the deallocated register window from the PRF.Type: ApplicationFiled: March 14, 2016Publication date: July 7, 2016Applicant: Oracle International CorporationInventor: Yuan C. Chou