Prefetching Patents (Class 712/207)
-
Publication number: 20100306503Abstract: A microprocessor includes a cache memory, an instruction set having first and second prefetch instructions each configured to instruct the microprocessor to prefetch a cache line of data from a system memory into the cache memory, and a memory subsystem configured to execute the first and second prefetch instructions. For the first prefetch instruction the memory subsystem is configured to forego prefetching the cache line of data from the system memory into the cache memory in response to a predetermined set of conditions. For the second prefetch instruction the memory subsystem is configured to complete prefetching the cache line of data from the system memory into the cache memory in response to the predetermined set of conditions.Type: ApplicationFiled: May 17, 2010Publication date: December 2, 2010Applicant: VIA TECHNOLOGIES, INC.Inventors: G. Glenn Henry, Colin Eddy, Rodney E. Hooker
-
Patent number: 7840761Abstract: A processor executes one or more prefetch threads and one or more main computing threads. Each prefetch thread executes instructions ahead of a main computing thread to retrieve data for the main computing thread, such as data that the main computing thread may use in the immediate future. Data is retrieved for the prefetch thread and stored in a memory, such as data fetched from an external memory and stored in a buffer. A prefetch controller determines whether the memory is full. If the memory is full, a cache controller stalls at least one prefetch thread. The stall may continue until at least some of the data is transferred from the memory to a cache for use by at least one main computing thread. The stalled prefetch thread or threads are then reactivated.Type: GrantFiled: April 1, 2005Date of Patent: November 23, 2010Assignee: STMicroelectronics, Inc.Inventors: Osvaldo M. Colavin, Davide Rizzo
-
Patent number: 7836277Abstract: A method of managing an instruction cache and a process of using the method are provided. The processor may comprise a processor core which is operated either during an active mode or during an inactive mode wherein the process core performs at least one instruction during the active mode, an instruction cache which pre-traces a first instruction and determines, during the inactive mode, whether the processor core will meet a cache miss with regard to the first instruction, wherein the first instruction is to be performed by the processor core during the active mode, a coarse-grained array which performs a second instruction during the inactive mode, and a configuration memory which stores configuration information of the coarse-grained array, wherein the coarse-grained array performs the second instruction using the configuration information.Type: GrantFiled: March 5, 2008Date of Patent: November 16, 2010Assignee: Samsung Electronics Co., Ltd.Inventors: Il Hyun Park, Dong-Hoon Yoo, Dong Kwan Suh, Soojung Ryu, Jeongwook Kim
-
Patent number: 7836262Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.Type: GrantFiled: June 5, 2007Date of Patent: November 16, 2010Assignee: Apple Inc.Inventors: Ramesh Gunna, Sudarshan Kadambi
-
Patent number: 7836281Abstract: A system that facilitates improving performance of a processor during scout mode. During a normal-execution mode, the system executes instructions for using main thread. Upon encountering a stall condition during execution of the main thread, the system generates a checkpoint. The system then enters a scout mode, wherein instructions are speculatively executed by a speculative thread to prefetch future memory references, but results are not committed to the architectural state of the processor. Upon encountering a memory reference during scout mode, the system issues a prefetch for the memory reference. If the stall condition that caused the processor to enter scout mode is resolved, the system uses the checkpoint to resume execution of the main thread from the instruction that caused the stall condition, and simultaneously continues executing instructions in scout mode using the speculative thread from the point where the speculative thread left off.Type: GrantFiled: October 6, 2005Date of Patent: November 16, 2010Assignee: Oracle America, Inc.Inventors: Marc Tremblay, Shailender Chaudhry
-
Publication number: 20100287357Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.Type: ApplicationFiled: March 10, 2010Publication date: November 11, 2010Applicant: XMTT INC.Inventor: Uzi Y. Vishkin
-
Patent number: 7831806Abstract: A data processing apparatus comprises a processor for executing a stream of instructions, and a prefetch unit for prefetching instructions from a memory prior to sending those instructions to the processor for execution. The prefetch unit receives from the memory a plurality of prefetched instructions from sequential addresses in memory, and detects whether any prefetched instructions are an instruction flow changing instruction, and outputs a fetch address for a next instruction to be prefetched by the prefetch unit. Address generation logic is also provided which, for a selected prefetched instruction that is detected to be an instruction flow changing instruction, determines a target address to be output as the fetch address. Address generation logic has a first address generation path and a further generation path for determining the target address. The first address generation path generates the target address more quickly than the further address generation path.Type: GrantFiled: February 18, 2004Date of Patent: November 9, 2010Assignee: ARM LimitedInventor: Paul Anthony Gilkerson
-
Patent number: 7827389Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. The processing unit dispatches a first set of instructions in order from a first buffer for execution. The processing unit receives updated results from the execution of the first set of instructions. The processing unit updates, in a first register, at least one register entry associated with each instruction in the first set of instructions, with the updated results. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to the completed execution of the first set of instructions from the first buffer, the processing unit copies the set of entries from the first register to a second register.Type: GrantFiled: June 15, 2007Date of Patent: November 2, 2010Assignee: International Business Machines CorporationInventors: Hung Q. Le, Dung Q. Nguyen
-
Patent number: 7822943Abstract: Systems, methods and computer program products for improving data stream prefetching in a microprocessor are described herein.Type: GrantFiled: August 4, 2008Date of Patent: October 26, 2010Assignee: MIPS Technologies, Inc.Inventor: Keith E. Diefendorff
-
Patent number: 7814298Abstract: A method, system and computer program product for promoting a trace in an instruction processing circuit is disclosed. They comprise determining if a current trace is promotable and determining if a next trace is appendable to the current trace. They include promoting the current trace and the next trace if the current trace is promotable and the next trace is appendable.Type: GrantFiled: November 16, 2007Date of Patent: October 12, 2010Assignee: Oracle America, Inc.Inventors: Richard Thaik, John Gregory Favor, Joseph Rowlands, Leonard Eric Shar, Matthew Ashcraft
-
Patent number: 7814247Abstract: A pre-fetch circuit of a semiconductor memory apparatus can carry out a high-frequency operating test through a low-frequency channel of a test equipment. The pre-fetch circuit of a semiconductor memory apparatus can includes: a pre-fetch unit for pre-fetching data bits in a first predetermined number; a plurality of registers provided in the first predetermined number, each of which latches a data in order or a data out of order of the pre-fetched data in response to different control signals; and a control unit for selectively activating the different control signals in response to a test mode signal, whereby some of the registers latch the data out of order.Type: GrantFiled: July 18, 2008Date of Patent: October 12, 2010Assignee: Hynix Semiconductor Inc.Inventor: Young-Ju Kim
-
Publication number: 20100250854Abstract: An efficient and effective compiler data prefetching technique is disclosed in which memory accesses may be prefetched are represented in linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched.Type: ApplicationFiled: March 16, 2010Publication date: September 30, 2010Inventor: Dz-ching Ju
-
Patent number: 7805592Abstract: Techniques are disclosed for handling control transfer instructions in pipelined processors. Such instructions may cause the sequence of subsequent instructions to change, and thus may require subsequent instructions to be deleted from the processor's pipeline. Pre-decode means (110) are provided for at least partially decoding control transfer instructions early in the pipeline. Subsequent instructions can then be prevented from progressing through the pipeline. The mechanism required to delete unwanted instructions is thereby simplified.Type: GrantFiled: October 7, 2002Date of Patent: September 28, 2010Assignee: Altera CorporationInventors: Nicholas Paul Joyce, Nigel Peter Topham
-
Publication number: 20100241811Abstract: Technologies are generally described for allocating available prefetch bandwidth among processor cores in a multiprocessor computing system. The prefetch bandwidth associated with an off-chip memory interface of the multiprocessor may be determined, partitioned, and allocated across multiple processor cores.Type: ApplicationFiled: March 20, 2009Publication date: September 23, 2010Inventor: Yan Solihin
-
Patent number: 7802077Abstract: A new class traces for a processing engine, called “extended blocks,” possess an architecture that permits possible many entry points but only a single exit point. These extended blocks may be indexed based upon the address of the last instruction therein. Use of the new trace architecture provides several advantages, including reduction of instruction redundancies, dynamic block extension and a sharing of instructions among various extended blocks.Type: GrantFiled: June 30, 2000Date of Patent: September 21, 2010Assignee: Intel CorporationInventors: Stephen J. Jourdan, Lihu Rappoport, Ronny Ronen, Adi Yoaz
-
Patent number: 7793085Abstract: A memory control circuit for providing a small-circuit-size memory control circuit capable of reducing a branch penalty during the execution of a branch instruction in a CPU. A branch-destination buffer caches a branch-destination instruction and a branch-destination-instruction address determined by a branch instruction executed by the CPU. When the CPU executes a branch instruction thereafter, if the branch-destination-instruction address output from the CPU matches an instruction address in the branch-destination buffer, the corresponding branch-destination instruction stored in the branch-destination buffer is sent to the CPU. When a branch instruction is executed, an address comparison circuit compares the branch-destination-instruction address with the branch-source-instruction address.Type: GrantFiled: December 30, 2004Date of Patent: September 7, 2010Assignee: Fujitsu Semiconductor LimitedInventor: Kenji Furuya
-
Patent number: 7783863Abstract: A method of handling a trace to be aborted includes receiving an indication of a trace to be aborted and an indication of an abort reason corresponding to an execution of the trace to be aborted. The trace to be aborted has a trace type associated therewith and includes a sequence of the operations, and represents a sequence of at least two of the instructions. The method further includes identifying a corrective action based at least in part on the type of the trace to be aborted and on the abort reason, not taking into account a correspondence between the at least one operation that caused the execution to be aborted and the at least one instruction that the at least one operation at least in part represents. A next trace and its trace type is determined for execution, where the determining is based on the trace to be aborted and on the corrective action.Type: GrantFiled: October 24, 2007Date of Patent: August 24, 2010Assignee: Oracle America, Inc.Inventors: Christopher Patrick Nelson, John Gregory Favor, Richard Win Thaik, Matthew William Ashcraft
-
Patent number: 7779234Abstract: The present invention includes a system and method for implementing a hardware-supported thread assist under load lookahead mechanism for a microprocessor. According to an embodiment of the present invention, hardware thread-assist mode can be activated when one thread of the microprocessor is in a sleep mode. When load lookahead mode is activated, the fixed point unit copies the content of one or more architected facilities from an active thread to corresponding architected facilities in the first inactive thread. The load-store unit performs at least one speculative load in load lookahead mode and writes the results of the at least one speculative load to a duplicated architected facility in the first inactive thread.Type: GrantFiled: October 23, 2007Date of Patent: August 17, 2010Assignee: International Business Machines CorporationInventors: James W. Bishop, Hung Q. Le, Dung Q. Nguyen, Wolfram Sauer, Benjamin W. Stolt, Michael T. Vaden
-
Patent number: 7779232Abstract: A method and apparatus for dynamically managing instruction buffer depths for non-predicted branches reduces wasted energy and resources associated with low confidence branch prediction conditions. A portion of the instruction buffer for a instruction thread is allocated for storing predicted branch instruction streams and another portion, which may be zero-sized during high prediction confidence conditions, is allocated to the non-predicted branch instruction stream. The size of the buffers is adjusted dynamically in conformity with an on-going prediction confidence that provides a measure of how well branch prediction mechanisms are working for a given instruction thread. An alternate instruction fetch address table can be maintained and multiplexed with the main fetch address register for addressing the instruction cache, so that the instruction stream can be quickly shifted to the non-predicted path when a branch instruction is resolved to the non-predicted path.Type: GrantFiled: August 28, 2007Date of Patent: August 17, 2010Assignee: International Business Machines CorporationInventors: Richard W. Doing, Michael O. Klett, Kevin N. Magill, Brian R. Mestan, David Mui, Balaram Sinharoy, Jeffrey R. Summers
-
System and method for implementing a software-supported thread assist mechanism for a microprocessor
Patent number: 7779233Abstract: A system and computer-implementable method for implementing software-supported thread assist within a data processing system, wherein the data processing system supports processing instructions within at least a first thread and a second thread. An instruction dispatch unit (IDU) places the first thread into a sleep mode. The IDU separates an instruction stream for the second thread into at least a first independent instruction stream and a second independent instruction stream. The first independent instruction stream is processed utilizing facilities allocated to the first thread and the second independent instruction stream is processed utilizing facilities allocated to the second thread.Type: GrantFiled: October 23, 2007Date of Patent: August 17, 2010Assignee: International Business Machines CorporationInventors: Hung Q. Le, Dung Q. Nguyen -
Patent number: 7769954Abstract: A data processing system includes: a cache memory comprising a plurality of ways, each of which stores a data line including a data and address information of the data; an analysis module that analyzes whether or not a data requested in a read instruction is to be used in a subsequent instruction to be executed within a predetermined time period after the execution of the read instruction is started; a mode selection module that selects one of a plurality of access modes for accessing the cache memory based on a result of the analysis module; and an access unit that accesses the cache memory in the selected one of the access modes when the read instruction is executed.Type: GrantFiled: March 29, 2007Date of Patent: August 3, 2010Assignee: Kabushiki Kaisha ToshibaInventor: Kenta Yasufuku
-
Patent number: 7761667Abstract: A mechanism is provided that identifies instructions that access storage and may be candidates for catch prefetching. The mechanism augments these instructions so that any given instance of the instruction operates in one of four modes, namely normal, unexecuted, data gathering, and validation. In the normal mode, the instruction merely performs the function specified in the software runtime environment. An instruction in unexecuted mode, upon the next execution, is placed in data gathering mode. When an instruction in the data gathering mode is encountered, the mechanism of the present invention collects data to discover potential fixed storage access patterns. When an instruction is in validation mode, the mechanism of the present invention validates the presumed fixed storage access patterns.Type: GrantFiled: August 12, 2008Date of Patent: July 20, 2010Assignee: International Business Machines CorporationInventors: Christopher Michael Donawa, Allan Henry Kielstra
-
Publication number: 20100153689Abstract: Instruction execution includes fetching an instruction that comprises a first set of one or more bits identifying the instruction, and a second set of one or more bits associated with a first address value. It further includes executing the instruction to determine whether to perform a trap, wherein executing the instruction includes selecting from a plurality of tests at least one test for determining whether to perform a trap and carrying out the at least one test.Type: ApplicationFiled: February 12, 2010Publication date: June 17, 2010Inventors: Jack Choquette, Gil Tene, Michael A. Wolf
-
Patent number: 7730288Abstract: A method and apparatus for executing instructions. The method includes receiving a first load instruction and a second load instruction. The method also includes issuing the first load instruction and the second load instruction to a cascaded delayed execution pipeline unit having at least a first execution pipeline and a second execution pipeline, wherein the second execution pipeline executes an instruction in a common issue group in a delayed manner relative to another instruction in the common issue group executed in the first execution pipeline. The method also includes accessing a cache by executing the first load instruction and the second load instruction. A delay between execution of the first load instruction and the second load instruction allows the cache to complete the access with the first load instruction before beginning the access with the second load instruction.Type: GrantFiled: June 27, 2007Date of Patent: June 1, 2010Assignee: International Business Machines CorporationInventor: David Arnold Luick
-
Patent number: 7730289Abstract: A method for preloading data in a CPU pipeline is provided, which includes the following steps. When a hint instruction is executed, allocate and initiate an entry in a preload table. When a load instruction is fetched, load a piece of data from a memory into the entry according to the entry. When a use instruction which uses the data loaded by the load instruction is executed, forward the data for the use instruction from the entry instead of from the memory. When the load instruction is executed, update the entry according to the load instruction.Type: GrantFiled: September 27, 2007Date of Patent: June 1, 2010Assignee: Faraday Technology Corp.Inventors: I-Jui Sung, Ming-Chung Kao
-
Patent number: 7725659Abstract: A method of obtaining data, comprising at least one sector, for use by at least a first thread wherein each processor cycle is allocated to at least one thread, includes the steps of: requesting data for at least a first thread; upon receipt of at least a first sector of the data, determining whether the at least first sector is aligned with the at least first thread, wherein a given sector is aligned with a given thread when a processor cycle in which the given sector will be written is allocated to the given thread; responsive to a determination that the at least first sector is aligned with the at least first thread, bypassing the at least first sector, wherein bypassing a sector comprises reading the sector while it is being written; and responsive to a determination that the at least first sector is not aligned with the at least first thread, delaying the writing of the at least first sector until the occurrence of a processor cycle allocated to the at least first thread by retaining the at least first sType: GrantFiled: September 5, 2007Date of Patent: May 25, 2010Assignee: International Business Machines CorporationInventors: Michael Karl Gschwind, Hans Mikael Jacobson, Robert Alan Philhower
-
Patent number: 7721070Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.Type: GrantFiled: September 22, 2008Date of Patent: May 18, 2010Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Publication number: 20100122064Abstract: A device may include a data processing logic cell field and one or more sequential CPUs. The logic cell field and the CPUs may be configured to be coupled to each other for data exchange. The data exchange may be in block form using lines leading to a cache memory. In a method for operating a reconfigurable unit having runtime-limited configurations, the configurations may be able to increase their maximum allowed runtime, e.g., by triggering a parallel counter. An increase in configuration runtime by the configurations may be suppressed in response to an interrupt.Type: ApplicationFiled: September 30, 2009Publication date: May 13, 2010Inventor: MARTIN VORBACH
-
Patent number: 7716427Abstract: In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing entries in the queue to a window of addresses encompassing multiple cache blocks, where the window of addresses is derived from the received address. The prefetch engine compares entries in the prefetch queue to a window of 2M contiguous cache blocks. The prefetch engine suppresses allocation of a new entry when any entry in the prefetch queue is within the address window. The prefetch engine further suppresses allocation of a new entry when the data address of the store instruction is equal to an address in a border area of the address window.Type: GrantFiled: January 4, 2008Date of Patent: May 11, 2010Assignee: International Business Machines CorporationInventors: John Barry Griswell, Jr., Hung Qui Le, Francis Patrick O'Connell, William J. Starke, Jeffrey Adam Stuecheli, Albert Thomas Williams
-
Publication number: 20100115513Abstract: Provided is a virtual machine including a first virtualization module operating on a physical CPU, for providing a first CPU, and a second virtualization module operating on the first CPU, for providing second CPU. The second virtualization module includes first processor control information holding a state of the first CPU obtained at a time of execution of the user program. The first virtualization module includes second processor control information containing a state of the physical CPU obtained at the time of the execution of the second virtualization module, third processor control information containing a state of the physical CPU obtained at the time of the execution of the user program, and prefetch entry information in which information to be prefetched from the third processor control information is set, and, upon detection of a event, the information set in the prefetch entry information is reflected to the first processor control information.Type: ApplicationFiled: October 30, 2009Publication date: May 6, 2010Inventors: Toshiomi MORIKI, Naoya Hattori, Yuji Tsushima
-
Patent number: 7711927Abstract: An instruction preload instruction executed in a first processor instruction set operating mode is operative to correctly preload instructions in a different, second instruction set. The instructions are pre-decoded according to the second instruction set encoding in response to an instruction set preload indicator (ISPI). In various embodiments, the ISPI may be set prior to executing the preload instruction, or may comprise part of the preload instruction or the preload target address.Type: GrantFiled: March 14, 2007Date of Patent: May 4, 2010Assignee: QUALCOMM IncorporatedInventors: Thomas Andrew Sartorius, Brian Michael Stempel, Rodney Wayne Smith
-
Patent number: 7707388Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.Type: GrantFiled: November 29, 2006Date of Patent: April 27, 2010Assignee: XMTT Inc.Inventor: Uzi Vishkin
-
Patent number: 7702856Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.Type: GrantFiled: November 9, 2005Date of Patent: April 20, 2010Assignee: Intel CorporationInventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
-
Patent number: 7702888Abstract: An apparatus for executing branch predictor directed prefetch operations. During operation, a branch prediction unit may provide an address of a first instruction to the fetch unit. The fetch unit may send a fetch request for the first instruction to the instruction cache to perform a fetch operation. In response to detecting a cache miss corresponding to the first instruction, the fetch unit may execute one or more prefetch operation while the cache miss corresponding to the first instruction is being serviced. The branch prediction unit may provide an address of a predicted next instruction in the instruction stream to the fetch unit. The fetch unit may send a prefetch request for the predicted next instruction to the instruction cache to execute the prefetch operation. The fetch unit may store prefetched instruction data obtained from a next level of memory in the instruction cache or in a prefetch buffer.Type: GrantFiled: February 28, 2007Date of Patent: April 20, 2010Assignee: GlobalFoundries Inc.Inventors: Marius Evers, Trivikram Krishnamurthy
-
Publication number: 20100082948Abstract: In a CCW fetching section, for each input/output device being a control objective, a result prediction table in which prediction values of status values to be returned from an input/output device as execution results of CCW commands, is referred to. Then, based on the prediction values, commands being pre-fetching objectives are pre-fetched from a CCW program stored in a memory, and transmitted to a CCW executing section. On the other hand, in the CCW executing section, the pre-fetched commands are sequentially executed, and the actual status values as the execution results are received from the input/output device. Then, when the received actual status values are not same as the predicted status values, success or failure in prediction is notified to the CCW fetching section, and also, the result prediction table is updated in the CCW fetching section.Type: ApplicationFiled: June 29, 2009Publication date: April 1, 2010Applicant: FUJITSU LIMITEDInventors: Tsukasa Matsuda, Hideki Yamanaka
-
Patent number: 7689774Abstract: A system and method for improving the page crossing performance of a data prefetcher is presented. A prefetch engine tracks times at which a data stream terminates due to a page boundary. When a certain percentage of data streams terminate at page boundaries, the prefetch engine sets an aggressive profile flag. In turn, when the data prefetch engine receives a real address that corresponds to the beginning/end of a new page, and the aggressive profile flag is set, the prefetch engine uses an aggressive startup profile to generate and schedule prefetches on the assumption that the real address is highly likely to be the continuation of a long data stream. As a result, the system and method minimize latency when crossing real page boundaries when a program is predominately accessing long streams.Type: GrantFiled: April 6, 2007Date of Patent: March 30, 2010Assignee: International Business Machines CorporationInventors: Francis Patrick O'Connell, Jeffrey A. Stuecheli
-
Patent number: 7689775Abstract: Computer implemented method, system and computer program product for prefetching data in a data processing system. A computer implemented method for prefetching data in a data processing system includes generating attribute information of prior data streams by associating attributes of each prior data stream with a storage access instruction which caused allocation of the data stream, and then recording the generated attribute information. The recorded attribute information is accessed, and a behavior of a new data stream is modified using the accessed recorded attribute information.Type: GrantFiled: March 9, 2009Date of Patent: March 30, 2010Assignee: International Business Machines CorporationInventors: John Barry Griswell, Jr., Francis Patrick O'Connell
-
Patent number: 7681188Abstract: One embodiment of the present invention provides a system that facilitates locked prefetch scheduling in general cyclic regions of a computer program. The system operates by first receiving a source code for the computer program and compiling the source code into intermediate code. The system then performs a trace detection on the intermediate code. Next, the system inserts prefetch instructions and corresponding locks into the intermediate code. Finally, the system generates executable code from the intermediate code, wherein a lock for a given prefetch instruction prevents subsequent prefetches from being issued until the data value returns for the given prefetch instruction.Type: GrantFiled: April 29, 2005Date of Patent: March 16, 2010Assignee: Sun Microsystems, Inc.Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Yonghong Song
-
Patent number: 7676659Abstract: In a processor executing instructions from a variable-length instruction set, a preload instruction is operative to retrieve from memory a data block corresponding to an instruction cache line, pre-decode instructions from a variable-length instruction set in the data block, and load the instructions and pre-decode information into the instruction cache. An instruction execution unit indicates to a pre-decoder the position within the data block of a first valid instruction. The pre-decoder successively determines the length of each instruction and hence the instruction boundaries. An instruction cache line offset indicator that identifies the position of the first valid instruction may be generated and provided to the pre-decoder in a variety of ways.Type: GrantFiled: April 4, 2007Date of Patent: March 9, 2010Assignee: QUALCOMM IncorporatedInventors: Brian Michael Stempel, Thomas Andrew Sartorius, Rodney Wayne Smith
-
Publication number: 20100049947Abstract: A processor and an early-load method thereof are provided. In the early-load method, an instruction is fetched and determined in an instruction fetch stage to obtain a determination result. Whether to early-load an early-loaded data corresponding to the instruction is determined according to the determination result. A target data is fetched according to the instruction in an instruction execution stage if the early-loaded data is not loaded correctly. The early-loaded data is served as the target data if the early-loaded data is loaded correctly.Type: ApplicationFiled: August 22, 2008Publication date: February 25, 2010Applicant: FARADAY TECHNOLOGY CORP.Inventors: Shun-Chieh Chang, Yuan-Hwa Li, Yuan-Jung Kuo, Chin-Ling Huang, Chung-Ping Chung
-
Patent number: 7669194Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.Type: GrantFiled: August 26, 2004Date of Patent: February 23, 2010Assignee: International Business Machines CorporationInventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell
-
Patent number: 7664920Abstract: A microprocessor includes a hierarchical memory subsystem, an instruction decoder, and a stream prefetch unit. The decoder decodes an instruction that specifies a locality characteristic parameter. In one embodiment, the parameter specifies a relative urgency with which a data stream specified by the instruction is needed rather than specifying exactly which of the cache memories in the hierarchy to prefetch the data stream into. The prefetch unit selects one of the cache memory levels in the hierarchy for prefetching the data stream into based on the memory subsystem configuration and on the relative urgency. In another embodiment, the prefetch unit instructs the memory subsystem to mark the prefetched cache line for early, late, or normal eviction according to its cache line replacement policy based on the parameter value.Type: GrantFiled: August 11, 2006Date of Patent: February 16, 2010Assignee: MIPS Technologies, Inc.Inventor: Keith E. Diefendorff
-
Patent number: 7664942Abstract: Embodiments of the present invention provide a system that executes program code in a processor. The system starts by executing the program code in a normal mode using a primary strand while concurrently executing the program code ahead of the primary strand using a subordinate strand in a scout mode. Upon resolving a branch using the subordinate strand, the system records a resolution for the branch in a speculative branch resolution table. Upon subsequently encountering the branch using the primary strand, the system uses the recorded resolution from the speculative branch resolution table to predict a resolution for the branch for the primary strand. Upon determining that the resolution of the branch was mispredicted for the primary strand, the system determines that the subordinate strand mispredicted the branch. The system then recovers the subordinate strand to the branch and restarts the subordinate strand executing the program code.Type: GrantFiled: August 25, 2008Date of Patent: February 16, 2010Assignee: Sun Microsystems, Inc.Inventors: Marc Tremblay, Shailender Chaudhry
-
Publication number: 20100036987Abstract: Techniques for interrupt processing are described. An exceptional condition is detected in one or more stages of an instruction pipeline in a processor. In response to the detected exceptional condition and prior to the processor accepting an interrupt in response to the detected exceptional condition, an instruction cache is checked for the presence of an instruction at a starting address of an interrupt handler. The instruction at the starting address of the interrupt vector table is prefetched from storage above the instruction cache when the instruction is not present in the instruction cache to load the instruction in the instruction cache, whereby the instruction is made available in the instruction cache by the time the processor accepts the interrupt in response to the detected exceptional condition.Type: ApplicationFiled: August 8, 2008Publication date: February 11, 2010Applicant: QUALCOMM INCORPORATEDInventors: Daren Eugene Streett, Brian Michael Stempel
-
Patent number: 7657723Abstract: A system and method are described for a memory management processor which, using a table of reference addresses embedded in the object code, can open the appropriate memory pages to expedite the retrieval of information from memory referenced by instructions in the execution pipeline. A suitable compiler parses the source code and collects references to branch addresses, calls to other routines, or data references, and creates reference tables listing the addresses for these references at the beginning of each routine. These tables are received by the memory management processor as the instructions of the routine are beginning to be loaded into the execution pipeline, so that the memory management processor can begin opening memory pages where the referenced information is stored. Opening the memory pages where the referenced information is located before the instructions reach the instruction processor helps lessen memory latency delays which can greatly impede processing performance.Type: GrantFiled: January 28, 2009Date of Patent: February 2, 2010Assignee: Micron Technology, Inc.Inventor: Dean A. Klein
-
Patent number: 7657883Abstract: A dispatch scheduler in a multithreading microprocessor is disclosed. Each of N concurrently executing threads has one of P priorities. P N-bit round-robin vectors are generated, each being a 1-bit left-rotated and subsequently sign-extended version of an N-bit 1-hot input vector indicating the last thread selected for dispatching at the priority. N P-input muxes each receive a corresponding one of the N bits of each of the P round-robin vectors and selects the input specified by the thread priority. Selection logic selects an instruction for dispatching from the thread having a dispatch value greater than or equal to any of the threads left thereof in the N-bit input vectors. The dispatch value of each of the threads comprises a least-significant bit equal to the corresponding P-input mux output, a most-significant bit that is true if the instruction is dispatchable, and middle bits comprising the priority of the thread.Type: GrantFiled: March 22, 2005Date of Patent: February 2, 2010Assignee: MIPS Technologies, Inc.Inventor: Michael Gottlieb Jensen
-
Patent number: 7647477Abstract: Inspecting a currently fetched instruction group and determining branching behavior of the currently fetched instruction group, allows for intelligent instruction prefetching. A currently fetched instruction group is predecoded and, assuming the currently fetch instruction group includes a branch type instruction, a branch target is characterized in relation to a fetch boundary, which delimits a memory region contiguous with the memory region that hosts the currently fetched instruction group. Instruction prefetching is included based, at least in part, on the predecoded characterization of the branch target.Type: GrantFiled: November 23, 2004Date of Patent: January 12, 2010Assignee: Sun Microsystems, Inc.Inventors: Paul Caprioli, Shailender Chaudhry
-
Publication number: 20100005251Abstract: The memory unit is compatible with a plurality of operation modes. The plurality of operation modes include the normal mode allowing access and the standby mode consuming a lower power than the normal mode. The branch detection section detects a branch instruction from an instruction fetched from the memory unit by the CPU. The mode control section changes an operation mode of the memory unit according to a detection result by the branch detection section.Type: ApplicationFiled: December 23, 2008Publication date: January 7, 2010Applicant: NEC ELECTRONICS CORPORATIONInventor: Kiminari Yamazoe
-
Publication number: 20090313456Abstract: A method, storage medium, processor instruction and processor to for specifying a value in a first portion of a conditional pre-fetch instruction associated with a branch instruction used for effectuating a branch operation, specifying a target instruction address in a second portion of the instruction, evaluating the value to determine whether a condition is met, and pre-fetching one or more instructions starting at the target instruction address into an instruction buffer of the processor when the condition is met, is provided.Type: ApplicationFiled: August 13, 2009Publication date: December 17, 2009Applicant: SONY COMPUTER ENTERTAINMENT INC.Inventors: Masahiro Yasue, Akiyuki Hatakeyama
-
Publication number: 20090313455Abstract: A multithreaded processor is provided with a saturating counter which serves to generate a thread preference signal to steer selection of which program thread operations are taken from for issue into the multiple processor pipelines. The counter is updated based upon the selections made for issue. The counter is a saturating counter and its sign bit may be used as a thread preference signal when discriminating between two threads. The update made to the count value can be weighted depending upon programmable priorities associated with the respective threads as well as a weighting based upon the time taken to execute the type of operation selected.Type: ApplicationFiled: December 15, 2005Publication date: December 17, 2009Inventors: David Hennah Mansell, Stuart David Biles