Prefetching Patents (Class 712/207)
-
Patent number: 7484041
Abstract: Systems and methods for improving the performance of a multiprocessor system by enabling a first processor to initiate the retrieval of data and the storage of the data in the cache memory of a second processor. One embodiment comprises a system having a plurality of processors coupled to a bus, where each processor has a corresponding cache memory. The processors are configured so that a first one of the processors can issue a preload command directing a target processor to load data into the target processor's cache memory. The preload command may be issued in response to a preload instruction in program code, or in response to an event. The first processor may include an explicit identifier of the target processor in the preload command, or the selection of the target processor may be left to another agent, such as an arbitrator coupled to the bus.
Type: Grant
Filed: April 4, 2005
Date of Patent: January 27, 2009
Assignee: Kabushiki Kaisha Toshiba
Inventor: Takashi Yoshikawa
-
Publication number: 20090024835
Abstract: A system and method for pre-fetching data from system memory. A multi-core processor accesses a cache hit predictor concurrently with sending a memory request to a cache subsystem. The predictor has two tables. The first table is indexed by a portion of a memory address and provides a hit prediction based on a first counter value. The second table is indexed by a core number and provides a hit prediction based on a second counter value. If neither table predicts a hit, a pre-fetch request is sent to memory. In response to detecting said hit prediction is incorrect, the pre-fetch is cancelled.
Type: Application
Filed: July 19, 2007
Publication date: January 22, 2009
Inventors: Michael K. Fertig, Patrick Conway, Kevin Michael Lepak, Cissy Xumin Yuan
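The two-table predictor in this abstract can be modeled in a few lines. The sketch below is illustrative only, not the claimed hardware: the table sizes, saturating-counter width, hit threshold, and index function are all assumptions.

```python
# Illustrative model of a two-table cache hit predictor: one table
# indexed by address bits, one indexed by core number. All sizing and
# threshold choices here are assumptions for the sketch.

class HitPredictor:
    def __init__(self, addr_entries=256, cores=4, threshold=2, max_count=3):
        self.addr_table = [0] * addr_entries   # indexed by address bits
        self.core_table = [0] * cores          # indexed by core number
        self.threshold = threshold
        self.max_count = max_count

    def _addr_index(self, address):
        # Drop assumed 64-byte cache-line offset bits before indexing.
        return (address >> 6) % len(self.addr_table)

    def predicts_hit(self, address, core):
        # If either table predicts a hit, the speculative pre-fetch
        # to memory is suppressed.
        return (self.addr_table[self._addr_index(address)] >= self.threshold
                or self.core_table[core] >= self.threshold)

    def update(self, address, core, was_hit):
        # Train both saturating counters on the actual cache outcome.
        for table, idx in ((self.addr_table, self._addr_index(address)),
                           (self.core_table, core)):
            if was_hit:
                table[idx] = min(table[idx] + 1, self.max_count)
            else:
                table[idx] = max(table[idx] - 1, 0)

predictor = HitPredictor()
if not predictor.predicts_hit(0x1F40, core=0):
    pass  # neither table predicts a hit: send the pre-fetch to memory
predictor.update(0x1F40, core=0, was_hit=False)
```

A real implementation would also need the cancellation path the abstract mentions, i.e. squashing the in-flight pre-fetch when the prediction turns out to be wrong.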
-
Patent number: 7480769
Abstract: A microprocessor coupled to a system memory includes a load request signal that requests data be loaded from the system memory into the microprocessor in response to a load instruction. The load request signal includes a load virtual page address. The microprocessor also includes a prefetch request signal that requests a cache line be prefetched from the system memory into the microprocessor in response to a prefetch instruction. The prefetch request signal includes a prefetch virtual page address.
Type: Grant
Filed: August 11, 2006
Date of Patent: January 20, 2009
Assignee: MIPS Technologies, Inc.
Inventors: Keith E. Diefendorff, Thomas A. Petersen
-
Patent number: 7480783
Abstract: Disclosed are systems for loading an unaligned word from a specified unaligned word address in a memory, the unaligned word comprising a plurality of indexed portions crossing a word boundary, a method of operating the system comprising: loading a first aligned word commencing at an aligned word address rounded from the specified unaligned word address; identifying an index representing the location of the unaligned word address relative to the aligned word address; loading a second aligned word commencing at an aligned word address rounded from a second unaligned word address; and combining indexed portions of the first and second aligned words using the identified index to construct the unaligned word.
Type: Grant
Filed: August 19, 2004
Date of Patent: January 20, 2009
Assignees: STMicroelectronics Limited, Hewlett-Packard Company
Inventors: Mark O. Homewood, Paolo Faraboschi
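The load-two-and-combine scheme this abstract describes is the classic way to read an unaligned word on hardware that only supports aligned loads. A minimal sketch, assuming a 4-byte word and a byte-addressed memory (both assumptions, not taken from the patent):

```python
# Sketch of unaligned-word loading: round down to an aligned address,
# load two consecutive aligned words, and splice them at the byte index
# of the unaligned address. Word size is an assumed 4 bytes.

WORD = 4

def load_unaligned(memory, addr):
    """Load a WORD-byte value from an address that may cross a word boundary."""
    aligned = addr & ~(WORD - 1)     # first aligned address, rounded down
    index = addr - aligned           # offset of the unaligned address within the word
    first = memory[aligned:aligned + WORD]                # first aligned word
    second = memory[aligned + WORD:aligned + 2 * WORD]    # second aligned word
    # Combine the tail of the first word with the head of the second.
    return first[index:] + second[:index]

mem = bytes(range(16))  # bytes 0x00 .. 0x0F
assert load_unaligned(mem, 6) == bytes([6, 7, 8, 9])
```

When `index` is zero the address was already aligned and the result is simply the first word, so the same path handles both cases.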
-
Publication number: 20090019261
Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
Type: Application
Filed: September 22, 2008
Publication date: January 15, 2009
Applicant: Seiko Epson Corporation
Inventors: Le Trong NGUYEN, Derek J. LENTZ, Yoshiyuki MIYAYAMA, Sanjiv GARG, Yasuaki HAGIWARA, Johannes WANG, Te-Li LAU, Sze-Shun WANG, Quang H. TRANG
-
Publication number: 20090019260
Abstract: Disclosed herein is a mass prefetching method for disk arrays. In order to improve disk read performance for a non-sequential read having spatial locality as well as for a sequential read, when a host requests a block to be read, all the blocks of the strip to which the block belongs are read. This is designated as strip prefetching (SP). Throttled Strip Prefetching (TSP), proposed in the present invention, investigates whether SP is beneficial by an online disk simulation, and does not perform SP if it is determined that SP is not beneficial. Since all prefetching operations of TSP are aligned in the strip of the disk array, the disk independence loss is resolved, and thus the performance of disk arrays is improved for concurrent sequential reads of multiple processes. TSP may however suffer from the loss of disk parallelism due to the disk independence of SP for a single sequential read. In order to solve this problem, this invention proposes Massive Stripe Prefetching (MSP).
Type: Application
Filed: January 2, 2008
Publication date: January 15, 2009
Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Kyu-Ho Park, Sung-Hoon Baek
-
Publication number: 20090006813
Abstract: An apparatus, system, and method are disclosed. In one embodiment, the apparatus includes a system memory-side prefetcher that is coupled to a memory controller. The system memory-side prefetcher includes a stride detection unit to identify one or more patterns in a stream. The system memory-side prefetcher also includes a prefetch injection unit to insert prefetches into the memory controller based on the detected one or more patterns. The system memory-side prefetcher also includes a prefetch data forwarding unit to forward the prefetched data to a cache memory coupled to a processor.
Type: Application
Filed: June 28, 2007
Publication date: January 1, 2009
Inventors: Abhishek Singhal, Hemant G. Rotithor
-
Patent number: 7472256
Abstract: Profile information can be used to target read operations that cause a substantial portion of misses in a program. A software value prediction technique that utilizes latency and is applied to the targeted read operations facilitates aggressive speculative execution without significant performance impact and without hardware support. A software value predictor issues prefetches for targeted read operations during speculative execution, and utilizes values from these prefetches during subsequent speculative execution, since the earlier prefetches should have completed, to update a software value prediction structure(s). Such a software based value prediction technique allows for aggressive speculative execution without the overhead of a hardware value predictor.
Type: Grant
Filed: April 12, 2005
Date of Patent: December 30, 2008
Assignee: Sun Microsystems, Inc.
Inventors: Sreekumar R. Nair, Santosh G. Abraham
-
Publication number: 20080307202
Abstract: Provided are a method and system for loading test data into execution units in a graphics card to test the execution units. Test instructions are loaded into a cache in a graphics module comprising multiple execution units coupled to the cache on a bus during a design test mode. The cache instructions are concurrently transferred to an instruction queue of each execution unit to concurrently load the cache instructions into the instruction queues of the execution units. The execution units concurrently execute the cache instructions to fetch test instructions from the cache to load into memories of the execution units and execute during the design test mode.
Type: Application
Filed: June 7, 2007
Publication date: December 11, 2008
Inventors: Allan WONG, Ke YIN, Naveen MATAM, Anthony BABELLA, Wing Hang WONG
-
Patent number: 7461237
Abstract: A system that suppresses duplicative prefetches for branch target cache lines. During operation, the system fetches a first cache line into a fetch buffer. The system then prefetches a second cache line, which immediately follows the first cache line, into the fetch buffer. If a control transfer instruction in the first cache line has a target instruction which is located in the second cache line, the system determines if the control transfer instruction is also located at the end of the first cache line so that a corresponding delay slot for the control transfer instruction is located at the beginning of the second cache line. If so, the system suppresses a subsequent prefetch for a target cache line containing the target instruction because the target instruction is located in the second cache line which has already been prefetched.
Type: Grant
Filed: April 20, 2005
Date of Patent: December 2, 2008
Assignee: Sun Microsystems, Inc.
Inventors: Abid Ali, Paul Caprioli, Shailender Chaudhry, Miles Lee
-
Publication number: 20080288751
Abstract: A processor system (100) includes a central processing unit (102) and a prefetch engine (110). The prefetch engine (110) is coupled to the central processing unit (102). The prefetch engine (110) is configured to detect, when data associated with the central processing unit (102) is read from a memory (114), a stride pattern in an address stream based upon whether sums of a current stride and a previous stride are equal for a number of consecutive reads. The prefetch engine (110) is also configured to prefetch, for the central processing unit (102), data from the memory (114) based on the detected stride pattern.
Type: Application
Filed: May 17, 2007
Publication date: November 20, 2008
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Andrej Kocev
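Stride prefetchers of this general family track the difference between consecutive demand addresses and issue prefetches once the stride has repeated often enough. The sketch below is a simplified model in that spirit, not the claimed engine: it compares consecutive strides directly rather than their sums, and the confidence threshold and prefetch degree are assumptions.

```python
# Minimal constant-stride prefetcher model: build confidence when the
# same non-zero stride repeats, then prefetch `degree` lines ahead.
# Threshold and degree are assumed values for illustration.

class StridePrefetcher:
    def __init__(self, threshold=2, degree=2):
        self.last_addr = None
        self.last_stride = None
        self.confidence = 0
        self.threshold = threshold
        self.degree = degree

    def observe(self, addr):
        """Feed one demand address; return addresses to prefetch (may be empty)."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.last_stride and stride != 0:
                self.confidence += 1
            else:
                self.confidence = 0   # pattern broken: retrain
            self.last_stride = stride
            if self.confidence >= self.threshold:
                prefetches = [addr + stride * i
                              for i in range(1, self.degree + 1)]
        self.last_addr = addr
        return prefetches

p = StridePrefetcher()
for a in (0x100, 0x140, 0x180):
    p.observe(a)                       # stride 0x40 establishing confidence
assert p.observe(0x1C0) == [0x200, 0x240]
```

A hardware version would keep one such entry per load instruction pointer rather than a single global tracker, so that interleaved streams do not destroy each other's confidence.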
-
Publication number: 20080276073
Abstract: An apparatus is provided for buffering instructions. An instruction store has memory locations for storing instructions. Each instruction can be associated with a timer such that an instruction dispatcher causes the instruction to be sent when the timer indicates that the instruction should be sent.
Type: Application
Filed: May 2, 2007
Publication date: November 6, 2008
Applicant: Analog Devices, Inc.
Inventors: Joern Soerensen, Dilip Muthukrishnan, William Plumb, Thomas Keller, Morag Clark
-
Patent number: 7447877
Abstract: A method and apparatus for converting memory instructions to prefetch operations during a thread switch window is disclosed. In one embodiment, memory access instructions that are already inside an instruction pipeline when the current thread is switched out may be decoded and then converted to the complementary prefetch operations. The prefetch operation may place the data into the cache during the execution of the alternate thread.
Type: Grant
Filed: June 13, 2002
Date of Patent: November 4, 2008
Assignee: Intel Corporation
Inventors: Bharadwaj Pudipeddi, Udo Walterscheidt
-
Patent number: 7441110
Abstract: A mechanism is described that predicts the usefulness of a prefetching instruction during the instruction's decode cycle. Prefetching instructions that are predicted as useful (prefetch useful data) are sent to an execution unit of the processor for execution, while instructions that are predicted as not useful are discarded. The prediction regarding the usefulness of a prefetching instruction is performed utilizing a branch prediction mask contained in the branch history mechanism. This mask is compared to information contained in the prefetching instruction that records the branch path between the prefetching instruction and actual use of the data. Both instructions and data can be prefetched using this mechanism.
Type: Grant
Filed: December 10, 1999
Date of Patent: October 21, 2008
Assignee: International Business Machines Corporation
Inventors: Thomas R. Puzak, Allan M. Hartstein, Mark Charney, Daniel A. Prener, Peter H. Oden
-
Patent number: 7437542
Abstract: A conjugate processor includes an instruction set architecture (ISA) visible portion having a main pipeline, and an h-flow portion having an h-flow pipeline. The binary executed on the conjugate processor includes an essential portion that is executed on the main pipeline and a non-essential portion that is executed on the h-flow pipeline. The non-essential portion includes hint calculus that is used to provide hints to the main pipeline. The conjugate processor also includes a conjugate mapping table that maps triggers to h-flow targets. Triggers can be instruction attributes, data attributes, state attributes or event attributes. When a trigger is satisfied, the h-flow code specified by the target is executed in the h-flow pipeline.
Type: Grant
Filed: January 13, 2006
Date of Patent: October 14, 2008
Assignee: Intel Corporation
Inventors: Hong Wang, Ralph Kling, Yong-Fong Lee, David A. Berson, Michael A. Kozuch, Konrad Lai
-
Patent number: 7434005
Abstract: A preload controller for controlling a bus access device that reads out data from a main memory via a bus and transfers the readout data to a temporary memory, including a first acquiring device to acquire access hint information which represents a data access interval to the main memory, a second acquiring device to acquire system information which represents a transfer delay time in transfer of data via the bus by the bus access device, a determining device to determine a preload unit count based on the data access interval represented by the access hint information and the transfer delay time represented by the system information, and a management device to instruct the bus access device to read out data for the preload unit count from the main memory and to transfer the readout data to the temporary memory ahead of a data access of the data.
Type: Grant
Filed: June 14, 2005
Date of Patent: October 7, 2008
Assignee: Kabushiki Kaisha Toshiba
Inventors: Seiji Maeda, Yusuke Shirota
-
Publication number: 20080243268
Abstract: According to one example embodiment of the inventive subject matter, there is provided a mechanism that controls which prefetchers are applied to execute an application in a computing system by turning them on and off. In one embodiment, this may be accomplished for example with a software control process that may run in the background. In another example embodiment, this may be accomplished using a hardware control machine, or a combination of hardware and software. The prefetchers are turned on and off in order to increase the performance of the computing system.
Type: Application
Filed: March 31, 2007
Publication date: October 2, 2008
Inventors: Meenakshi A. Kandaswamy, Simon C. Steely
-
Publication number: 20080244080
Abstract: A processor includes non-volatile memory into which streamed application components may be pre-fetched from a slower storage medium in order to decrease stall times during execution of the application. Alternatively, the application components pre-fetched into the non-volatile memory may be from a traditionally-loaded application rather than a streamed application. The order in which components of the application are prefetched into the non-volatile memory may be based on load order hints. For at least one embodiment, the load order hints are derived from server-side load ordering logic. For at least one other embodiment, the load order hints are provided by the application itself via a mechanism such as an application programming interface. For at least one other embodiment, the load order hints are generated by the client using profile data. Or, a combination of such approaches may be used. Other embodiments are also described and claimed.
Type: Application
Filed: March 29, 2007
Publication date: October 2, 2008
Inventors: Thomas H. James, Steven Grobman
-
Publication number: 20080244231
Abstract: In some embodiments, the invention involves a novel combination of techniques for prefetching data and passing messages between and among cores in a multi-processor/multi-core platform. In an embodiment, a receiving core has a message queue and a message prefetcher. Incoming messages are simultaneously written to the message queue and the message prefetcher. The prefetcher speculatively fetches data referenced in the received message so that the data is available when the message is executed in the execution pipeline, or shortly thereafter. Other embodiments are described and claimed.
Type: Application
Filed: March 30, 2007
Publication date: October 2, 2008
Inventors: Aaron Kunze, Erik J. Johnson, Hermann Gartler
-
Publication number: 20080244232
Abstract: Apparatus and computing systems associated with data pre-fetching are described. One embodiment includes a processor that includes a first unit to store data corresponding to a load instruction and an instruction pointer (IP) value associated with the load instruction. The processor also includes a second unit to produce a predicted demand address for a next load instruction, the predicted demand address being based on a constant stride value. The processor also includes a third unit to generate an instruction pointer pre-fetch (IPP) request for the predicted demand address. The processor may also include units to arbitrate between generated IP pre-fetch requests and alternative pre-fetch requests.
Type: Application
Filed: April 2, 2007
Publication date: October 2, 2008
Inventors: Marina Sherman, Jack Doweck
-
Patent number: 7430650
Abstract: A cache prefetching algorithm uses previously requested address and data patterns to predict future data needs and prefetch such data from memory into cache. A requested address is compared to previously requested addresses and returned data to compute a set of increments, and the set of increments is added to the currently requested address and returned data to generate a set of prefetch candidates. Weight functions are used to prioritize prefetch candidates. The prefetching method requires no changes to application code or operating system (OS) and is transparent to the compiler and the processor. The prefetching method comprises a parallel algorithm well-suited to implementation on an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA), or to integration into a processor.
Type: Grant
Filed: June 13, 2005
Date of Patent: September 30, 2008
Inventor: Richard A. Ross
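The increment-and-weight idea above can be sketched compactly. This is a loose, software-only illustration under stated assumptions: it tracks only address increments (not returned-data patterns), uses a simple occurrence count as the weight function, and the history depth and candidate count are invented for the example.

```python
# Simplified increment-based prefetcher: record increments between
# consecutive demand addresses, weight each increment by how often it
# has occurred, and add the top-weighted increments to the current
# address to form prefetch candidates. All parameters are assumptions.

from collections import Counter

class IncrementPrefetcher:
    def __init__(self, candidates=2):
        self.weights = Counter()   # increment -> occurrence count (the weight)
        self.last_addr = None
        self.candidates = candidates

    def observe(self, addr):
        """Feed one demand address; return prioritized prefetch candidates."""
        if self.last_addr is not None:
            inc = addr - self.last_addr
            if inc > 0:                 # only track forward increments
                self.weights[inc] += 1
        self.last_addr = addr
        best = [inc for inc, _ in self.weights.most_common(self.candidates)]
        return [addr + inc for inc in best]

p = IncrementPrefetcher()
p.observe(0)
p.observe(64)          # increment 64 observed once
print(p.observe(128))  # 64 now dominant: candidate is 128 + 64 = 192
```

Because each candidate is an independent add-and-rank operation, the per-increment work parallelizes naturally, which is consistent with the abstract's point about ASIC/FPGA suitability.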
-
Patent number: 7430640
Abstract: The decision to prefetch inodes is based upon the detecting of access patterns that would benefit from such a prefetch. Once the decision to prefetch is made, a plurality of inodes are prefetched in parallel. Further, the prefetching of inodes is paced, such that the prefetching substantially matches the speed at which an application requests inodes.
Type: Grant
Filed: November 8, 2005
Date of Patent: September 30, 2008
Assignee: International Business Machines Corporation
Inventors: Frank B. Schmuck, James C. Wyllie
-
Publication number: 20080229069
Abstract: An instruction preload instruction executed in a first processor instruction set operating mode is operative to correctly preload instructions in a different, second instruction set. The instructions are pre-decoded according to the second instruction set encoding in response to an instruction set preload indicator (ISPI). In various embodiments, the ISPI may be set prior to executing the preload instruction, or may comprise part of the preload instruction or the preload target address.
Type: Application
Filed: March 14, 2007
Publication date: September 18, 2008
Applicant: QUALCOMM INCORPORATED
Inventors: Thomas Andrew Sartorius, Brian Michael Stempel, Rodney Wayne Smith
-
Publication number: 20080229072
Abstract: A prefetch processing apparatus includes a central-processing-unit monitor unit that monitors processing states of the central processing unit in association with time elapsed from start time of executing a program. A cache-miss-data address obtaining unit obtains cache-miss-data addresses in association with the time elapsed from the start time of executing the program, and a cycle determining unit determines a cycle of time required for executing the program. An identifying unit identifies a prefetch position in a cycle in which a prefetch-target address is to be prefetched by associating the cycle determined by the cycle determining unit with the cache-miss data addresses obtained by the cache-miss-data address obtaining unit. The prefetch-target address is an address of data on which prefetch processing is to be performed.
Type: Application
Filed: March 5, 2008
Publication date: September 18, 2008
Applicant: FUJITSU LIMITED
Inventors: Shuji Yamamura, Takashi Aoki
-
Publication number: 20080229071
Abstract: A prefetch control apparatus includes a prefetch controller for controlling prefetch of read data into a cache memory caching data to be transferred between a computer apparatus and a storage device, and which enhances a read efficiency of the read data from the storage device, a sequentiality decider for deciding whether the read data that are read from the storage device toward the computer apparatus are sequential access data, a locality decider for deciding whether the read data have locality of data arrangement in the predetermined storage area, in a case where the read data that are read from the storage device toward the computer apparatus have been decided not to be sequential access data, and a prefetcher for prefetching the read data in a case where the read data has the locality of the data arrangement.
Type: Application
Filed: March 5, 2008
Publication date: September 18, 2008
Applicant: Fujitsu Limited
Inventors: Katsuhiko SHIOYA, Eiichi YAMANAKA
-
Publication number: 20080229070
Abstract: Cache circuitry, a data processing apparatus including such cache circuitry, and a method for prefetching data into such cache circuitry, are provided. The cache circuitry has a cache storage comprising a plurality of cache lines for storing data values, and control circuitry which is responsive to an access request issued by a device of the data processing apparatus identifying a memory address of a data value to be accessed, to cause a lookup operation to be performed to determine whether the data value for that memory address is stored within the cache storage. If not, a linefill operation is initiated to retrieve the data value from memory.
Type: Application
Filed: March 12, 2007
Publication date: September 18, 2008
Applicant: ARM Limited
Inventors: Elodie Charra, Philippe Jean-Pierre Raphalen, Frederic Claude Marie Piry, Philippe Luc, Gilles Eric Grandou
-
Patent number: 7421540
Abstract: A mechanism is provided that identifies instructions that access storage and may be candidates for cache prefetching. The mechanism augments these instructions so that any given instance of the instruction operates in one of four modes, namely normal, unexecuted, data gathering, and validation. In the normal mode, the instruction merely performs the function specified in the software runtime environment. An instruction in unexecuted mode, upon the next execution, is placed in data gathering mode. When an instruction in the data gathering mode is encountered, the mechanism of the present invention collects data to discover potential fixed storage access patterns. When an instruction is in validation mode, the mechanism of the present invention validates the presumed fixed storage access patterns.
Type: Grant
Filed: May 3, 2005
Date of Patent: September 2, 2008
Assignee: International Business Machines Corporation
Inventors: Christopher Michael Donawa, Allan Henry Kielstra
-
Publication number: 20080209173
Abstract: An apparatus for executing branch predictor directed prefetch operations. During operation, a branch prediction unit may provide an address of a first instruction to the fetch unit. The fetch unit may send a fetch request for the first instruction to the instruction cache to perform a fetch operation. In response to detecting a cache miss corresponding to the first instruction, the fetch unit may execute one or more prefetch operations while the cache miss corresponding to the first instruction is being serviced. The branch prediction unit may provide an address of a predicted next instruction in the instruction stream to the fetch unit. The fetch unit may send a prefetch request for the predicted next instruction to the instruction cache to execute the prefetch operation. The fetch unit may store prefetched instruction data obtained from a next level of memory in the instruction cache or in a prefetch buffer.
Type: Application
Filed: February 28, 2007
Publication date: August 28, 2008
Inventors: Marius Evers, Trivikram Krishnamurthy
-
Publication number: 20080201529
Abstract: An apparatus, program product and method initiate, in connection with a context switch operation, a prefetch of data likely to be used by a thread prior to resuming execution of that thread. As a result, once it is known that a context switch will be performed to a particular thread, data may be prefetched on behalf of that thread so that when execution of the thread is resumed, more of the working state for the thread is likely to be cached, or at least in the process of being retrieved into cache memory, thus reducing cache-related performance penalties associated with context switching.
Type: Application
Filed: April 24, 2008
Publication date: August 21, 2008
Applicant: International Business Machines Corporation
Inventors: Jeffrey Powers Bradford, Harold F. Kossman, Timothy John Mullins
-
Publication number: 20080189518
Abstract: A processor includes a cache memory that has an array, word lines, and bit lines. A control module accesses cells of the array during access cycles to access instructions stored in the cache memory. The control module performs one of a first discrete read and a first sequential read to access instructions in a first set of cells of the array that are connected to a first word line and selectively performs one of a second discrete read and a second sequential read based on a branch instruction to access instructions in a second set of cells of the array that are connected to a second word line. The second word line is different than the first word line.
Type: Application
Filed: April 2, 2008
Publication date: August 7, 2008
Inventors: Sehat Sutardja, Jason T. Su, Hong-Yi Chen, Jason Sheu, Jensen Tjeng
-
Patent number: 7409486
Abstract: A protocol chip and a bridge are connected to a first bus, while the bridge and a micro processor (MP) are connected to a second bus. The MP generates parameter information and writes it into a local memory (LM), and issues a write command which includes access destination information to this parameter information to a protocol chip. The bridge pre-fetches the parameter information from the LM using the access destination information within the write command which is transferred to the protocol chip via itself, and when receiving a read command from the protocol chip, transmits the parameter information which has been pre-fetched to the protocol chip via the first bus, without passing the read command through to the MP.
Type: Grant
Filed: March 27, 2006
Date of Patent: August 5, 2008
Assignee: Hitachi, Ltd.
Inventors: Osamu Torigoe, Hideaki Shima, Shouji Katoh
-
Publication number: 20080184010
Abstract: According to the present invention, there is provided an instruction cache prefetch control apparatus having an external memory, a CPU and an instruction cache unit, the instruction cache unit having: an instruction cache data memory which receives and stores the instruction sequence; a prefetch buffer which prefetches and stores an instruction sequence next to the instruction sequence as a target of a fetch request from the CPU when the next instruction sequence is not stored in the instruction cache data memory; an instruction cache write control unit which selectively outputs, to the instruction cache data memory, one of the instruction sequence output from the external memory and the instruction sequence stored in the prefetch buffer; and a hit or miss determination access control unit which, upon receiving, from the CPU, a fetch request for the instruction sequence stored in the prefetch buffer, transfers the instruction sequence from the prefetch buffer to the instruction cache data memory and stores th
Type: Application
Filed: December 31, 2007
Publication date: July 31, 2008
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Masato Uchiyama
-
Patent number: 7404042
Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.
Type: Grant
Filed: May 18, 2005
Date of Patent: July 22, 2008
Assignee: QUALCOMM Incorporated
Inventors: Brian Michael Stempel, Jeffrey Todd Bridges, Rodney Wayne Smith, Thomas Andrew Sartorius
-
Publication number: 20080168259
Abstract: A DMA device prefetches descriptors into a descriptor prefetch buffer. The size of descriptor prefetch buffer holds an appropriate number of descriptors for a given latency environment. To support a linked list of descriptors, the DMA engine prefetches descriptors based on the assumption that they are sequential in memory and discards any descriptors that are found to violate this assumption. The DMA engine seeks to keep the descriptor prefetch buffer full by requesting multiple descriptors per transaction whenever possible. The bus engine fetches these descriptors from system memory and writes them to the prefetch buffer. The DMA engine may also use an aggressive prefetch where the bus engine requests the maximum number of descriptors that the buffer will support whenever there is any space in the descriptor prefetch buffer. The DMA device discards any remaining descriptors that cannot be stored.Type: Application
Filed: January 10, 2007
Publication date: July 10, 2008
Inventors: Giora Biran, Luis E. De la Torre, Bernard C. Drerup, Jyoti Gupta, Richard Nicholas
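The speculate-sequential-then-discard policy above can be modeled in software. The sketch below is a behavioral illustration only: the descriptor layout (a payload plus a next-descriptor address), the 16-byte descriptor size, and the buffer capacity are all assumptions invented for the example.

```python
# Behavioral model of speculative descriptor prefetch: fill the buffer
# assuming descriptors are sequential in memory, then discard prefetched
# descriptors whose address does not match the consumed descriptor's link.

from collections import deque

DESC_SIZE = 16  # assumed descriptor size in bytes

class DescriptorPrefetcher:
    def __init__(self, memory, capacity=4):
        self.memory = memory          # models system memory: addr -> (payload, next_addr)
        self.buffer = deque()         # descriptor prefetch buffer
        self.capacity = capacity

    def prefetch_from(self, addr):
        # Aggressive prefetch: request descriptors for every free slot,
        # speculating that they are laid out sequentially.
        while len(self.buffer) < self.capacity and addr in self.memory:
            self.buffer.append((addr, self.memory[addr]))
            addr += DESC_SIZE

    def pop(self):
        # Consume the next descriptor; if its link breaks the sequential
        # assumption, the speculatively prefetched descriptors are stale.
        addr, (payload, next_addr) = self.buffer.popleft()
        if next_addr != addr + DESC_SIZE:
            self.buffer.clear()           # discard mis-speculated descriptors
            if next_addr is not None:
                self.prefetch_from(next_addr)
        return payload
```

Feeding it a chain that is sequential except for one link (say 0 → 16 → 32 → 128 → 144) exercises both paths: the buffer is reused while the layout stays sequential and flushed at the non-sequential link.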
-
Publication number: 20080162819
Abstract: A design structure for prefetching instruction lines is provided. The design structure is embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design. The design structure comprises a processor having a level 2 cache, and a level 1 cache configured to receive instruction lines from the level 2 cache is described, wherein each instruction line comprises one or more instructions. The processor also includes a processor core configured to execute instructions retrieved from the level 1 cache, and circuitry configured to fetch a first instruction line from a level 2 cache, identify, in the first instruction line, an address identifying a first data line containing data targeted by a data access instruction contained in the first instruction line or a different instruction line, and prefetch, from the level 2 cache, the first data line using the extracted address.
Type: Application
Filed: March 13, 2008
Publication date: July 3, 2008
Inventor: DAVID A. LUICK
-
Patent number: 7389405Abstract: A method and architecture accesses a unified memory in a micro-processing system having a two-phase clock. The unified memory is accessed during a first instruction cycle. When a program code discontinuity is encountered, the unified memory is accessed a first time during an instruction cycle with a dummy access. The unified memory is accessed a second time during the instruction cycle when a program code discontinuity is encountered with either a data access, as in the case of a last instruction of a loop, or an instruction access, as in the case of a jump instruction.Type: GrantFiled: November 17, 2003Date of Patent: June 17, 2008Assignee: Mediatek, Inc.Inventor: Frederic Boutaud
-
Publication number: 20080140997Abstract: Embodiments of the present invention relate to a data processing system and method that use metadata associated with data to be retrieved from storage to identify further data, and to retrieve at least a portion of that further data from the storage in accordance with a prefetch policy.Type: ApplicationFiled: February 4, 2005Publication date: June 12, 2008Inventor: Shailendra Tripathi
-
Publication number: 20080140996Abstract: When misses occur in an instruction cache, prefetching techniques are used that minimize miss rates, memory access bandwidth, and power use. One of the prefetching techniques operates when a miss occurs. A notification that a fetch address missed in an instruction cache is received. The fetch address that caused the miss is analyzed to determine an attribute of the fetch address and based on the attribute a line of instructions is prefetched. The attribute may indicate that the fetch address is a target address of a non-sequential operation. Another attribute may indicate that the fetch address is a target address of a non-sequential operation and the target address is more than X % into a cache line. A further attribute may indicate that the fetch address is an even address in the instruction cache. Such attributes may be combined to determine whether to prefetch.Type: ApplicationFiled: December 8, 2006Publication date: June 12, 2008Inventors: Michael William Morrow, James Norris Dieffenderfer
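The attribute checks this abstract describes (target of a non-sequential operation, more than X% into a cache line) can be sketched as a small predicate. The line size, the rule for combining attributes, and the function name are assumptions for illustration, not the patent's actual policy.

```c
#include <stdint.h>

#define LINE_BYTES 64  /* assumed instruction-cache line size */

/* Illustrative policy: on a miss, prefetch the next line only when the
 * fetch address was the target of a non-sequential operation AND lands
 * more than 'pct' percent into its cache line, since little of the
 * current line remains to be executed. */
static int should_prefetch_next(uint32_t fetch_addr, int non_sequential,
                                int pct)
{
    unsigned offset = fetch_addr % LINE_BYTES;  /* bytes into the line */
    return non_sequential && (offset * 100 > (unsigned)pct * LINE_BYTES);
}
```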
-
Patent number: 7383417Abstract: Efficient prefetching of data before the data is read by a program. A prefetching apparatus, for prefetching data from a file to a buffer before the data is read by a program, includes: a history recorder, for recording a history of a plurality of data readings issued by the program while performing data reading; a prefetching generator, for generating a plurality of prefetchings that correspond to the plurality of data readings recorded in the history; a prefetching process determination unit, for determining, based on the history, the performance order for the plurality of prefetchings; and a prefetching unit, for performing the plurality of prefetchings in the performance order when the program is subsequently executed.Type: GrantFiled: March 15, 2006Date of Patent: June 3, 2008Assignee: International Business Machines CorporationInventors: Toshiaki Yasue, Hideaki Komatsu
-
Patent number: 7373482Abstract: One embodiment of the present invention provides a system that improves the effectiveness of prefetching during execution of instructions in scout mode. During operation, the system executes program instructions in a normal-execution mode. Upon encountering a condition which causes the processor to enter scout mode, the system performs a checkpoint and commences execution of instructions in scout mode, wherein the instructions are speculatively executed to prefetch future memory operations, but wherein results are not committed to the architectural state of a processor. During execution of a load instruction during scout mode, if the load instruction is a special load instruction and if the load instruction causes a lower-level cache miss, the system waits for data to be returned from a higher-level cache before resuming execution of subsequent instructions in scout mode, instead of disregarding the result of the load instruction and immediately resuming execution in scout mode.Type: GrantFiled: May 26, 2005Date of Patent: May 13, 2008Assignee: Sun Microsystems, Inc.Inventors: Lawrence A. Spracklen, Yuan C. Chou, Santosh G. Abraham
-
Patent number: 7370153Abstract: Method and apparatus for implementing controlled pre-fetching of data. An extended data structure can be used to specify where and when data is to be pre-fetched, and how much pre-fetching is to be performed, if any. The extended data structure has a pre-fetch flag that signals a host controller whether pre-fetching is to be done. If the pre-fetch flag is set, pre-fetching is performed; otherwise pre-fetching is not performed. The host controller parses the extended data structure and formulates a data request that is sent to the disk drive. Pre-fetched data can be stored in a buffer memory for future use.Type: GrantFiled: August 6, 2004Date of Patent: May 6, 2008Assignee: NVIDIA CorporationInventor: Radoslav Danilak
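A minimal sketch of such an extended data structure, assuming hypothetical field names and sector-based units; the patent does not specify a layout:

```c
#include <stdint.h>

/* Hypothetical extended request descriptor: the pre-fetch flag plus the
 * where/how-much fields a host controller would parse before building
 * the request it sends to the disk drive. */
typedef struct {
    uint64_t lba;            /* starting disk address of the demand read */
    uint32_t length;         /* demand length, in sectors */
    uint8_t  prefetch_flag;  /* nonzero: also pre-fetch ahead of the read */
    uint32_t prefetch_len;   /* how much to pre-fetch, in sectors */
} ext_req_t;

/* Total sectors the formulated request covers: the demand read plus the
 * pre-fetch region only when the flag is set. */
static uint32_t total_sectors(const ext_req_t *r)
{
    return r->length + (r->prefetch_flag ? r->prefetch_len : 0);
}
```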
-
Patent number: 7363625Abstract: An SMT system is designed to allow software alteration of thread priority. In one case, the system signals a change in a thread priority based on the state of instruction execution and in particular when the instruction has completed execution. To alter the priority of a thread, the software uses a special form of a “no operation” (NOP) instruction (hereafter termed thread priority NOP). When the thread priority NOP is dispatched, its special NOP is decoded in the decode unit of the IDU into an operation that writes a special code into the completion table for the thread priority NOP. A “trouble” bit is also set in the completion table that indicates which instruction group contains the thread priority NOP. The trouble bit indicates that special processing is required after instruction completion. The thread priority instruction is processed after completion using the special code to change a thread's priority.Type: GrantFiled: April 24, 2003Date of Patent: April 22, 2008Assignee: International Business Machines CorporationInventors: William E. Burky, Ronald N. Kalla, David A. Schroter, Balaram Sinharoy
-
Publication number: 20080091921Abstract: Systems and methods for prefetching data in a microprocessor environment are provided. The method comprises decoding a first instruction; determining if the first instruction comprises both a load instruction and embedded prefetch data; processing the load instruction; and processing the prefetch data, in response to determining that the first instruction comprises the prefetch data, wherein processing the prefetch data comprises determining a prefetch multiple, a prefetch address and the number of elements to prefetch, based on the prefetch data.Type: ApplicationFiled: October 12, 2006Publication date: April 17, 2008Inventors: Diab Abuaiadh, Daniel Citron
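Since the abstract gives no instruction encoding, the sketch below assumes a made-up 32-bit layout for the embedded prefetch data, purely to illustrate decoding a prefetch multiple, an element count, and an address from a single word:

```c
#include <stdint.h>

/* Decoded prefetch information (field names hypothetical). */
typedef struct {
    unsigned multiple;  /* prefetch multiple */
    unsigned count;     /* number of elements to prefetch */
    uint32_t addr;      /* prefetch address (offset form, assumed) */
} pf_info_t;

/* Illustrative split of one 32-bit word of embedded prefetch data:
 * top 4 bits = multiple, next 8 bits = element count, low 20 bits =
 * address offset. The real encoding is not given by the patent. */
static pf_info_t decode_prefetch(uint32_t word)
{
    pf_info_t p;
    p.multiple = word >> 28;
    p.count    = (word >> 20) & 0xFFu;
    p.addr     = word & 0xFFFFFu;
    return p;
}
```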
-
Patent number: 7360059Abstract: In one embodiment, a digital signal processor includes look ahead logic to decrease the number of bubbles inserted in the processing pipeline. The processor receives data containing instructions in a plurality of buffers and decodes the size of a first instruction. The beginning of a second instruction is determined based on the size of the first instruction. The size of the second instruction is decoded and the processor determines whether loading the second instruction will deplete one of the plurality of buffers.Type: GrantFiled: February 3, 2006Date of Patent: April 15, 2008Assignee: Analog Devices, Inc.Inventors: Thomas Tomazin, William C. Anderson, Charles P. Roth, Kayla Chalmers, Juan G. Revilla, Ravi P. Singh
-
Publication number: 20080082790Abstract: An accelerator system supplements standard computer memory management units specifically in the case of sparse data. The accelerator processes requests for data from an analysis application running on the processor system by pre-fetching a subset of the irregularly ordered data and forming that data into a dense, sequentially-ordered array, which is then placed directly into the processor's main memory, for example. In one example, the memory controller is implemented as a separate, add-on coprocessor so that actions of the memory controller will take place simultaneously with the calculations of the processor system. This system addresses the problems caused by a lack of sequential and spatial locality in sparse data. In effect, the complicated data access characteristic of irregular structures, which are a characteristic of sparse matrices, is transferred from the code level to the hardware level.Type: ApplicationFiled: August 16, 2007Publication date: April 3, 2008Inventors: Oleg Vladimirovich Diyankov, Yuri Ivanovich Konotop, John Victor Batson
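The accelerator's central operation, packing irregularly placed elements into a dense sequential array, is essentially a gather. A minimal software sketch of that operation (in the patent, dedicated hardware does this concurrently with the processor's calculations):

```c
#include <stddef.h>

/* Follow an index list into irregularly ordered data and pack the
 * results into a dense, sequentially ordered array the processor can
 * then stream through with full spatial locality. */
static void gather(const double *sparse, const size_t *idx, size_t n,
                   double *dense)
{
    for (size_t i = 0; i < n; i++)
        dense[i] = sparse[idx[i]];
}
```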
-
Patent number: 7346762Abstract: A method of executing program instructions may include receiving, in a processor, an instruction that causes the processor to read data from or write data to a portion of memory that is shared by one or more processes, at least one process of which manipulates data in a format that is different than a format of data in the shared portion of memory. The method may further include executing alternate instructions in place of the received instruction. The alternate instructions may effect transformation of data associated with the shared portion of memory from a first data format to a second data format.Type: GrantFiled: January 6, 2006Date of Patent: March 18, 2008Assignee: Apple Inc.Inventors: Ronnie G. Misra, Joshua H. Shaffer
-
Patent number: 7346741Abstract: A method and apparatus for retrieving instructions to be processed by a microprocessor is provided. By pre-fetching instructions in anticipation of being requested, instead of waiting for the instructions to be requested, the latency involved in requesting instructions from higher levels of memory may be avoided. A pre-fetched line of instruction may be stored into a pre-fetch buffer residing on a microprocessor. The pre-fetch buffer may be used by the microprocessor as an alternate source from which to retrieve a requested instruction when the requested instruction is not stored within the first level cache. The particular line of instruction being pre-fetched may be identified based on a configurable stride value. The configurable stride value may be adjusted to maximize the likelihood that a requested instruction, not present in the first level cache, is present in the pre-fetch buffer. The configurable stride value may be updated manually or automatically.Type: GrantFiled: May 10, 2005Date of Patent: March 18, 2008Assignee: Sun Microsystems, Inc.Inventors: Brian F. Keish, Quinn Jacobson, Lakshminarasim Varadadesikan
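A toy model of a stride-based pre-fetch buffer, assuming a single-entry buffer and hypothetical names; the real buffer holds whole instruction lines and is consulted only when the first-level cache misses. The hit/miss counters stand in for the feedback that would drive automatic stride adjustment.

```c
/* Single-entry pre-fetch buffer with a configurable stride. After each
 * lookup, the line at (missed line + stride) is pre-fetched in
 * anticipation of the next request. */
typedef struct {
    long line;    /* line currently buffered; -1 means empty */
    long stride;  /* configurable stride, in cache lines */
    long hits, misses;  /* feedback for tuning the stride */
} pf_buf_t;

/* Called when a requested line misses the first-level cache. Returns 1
 * if the pre-fetch buffer supplies the line (avoiding the higher-level
 * memory latency), 0 otherwise; either way the next predicted line is
 * pre-fetched. */
static int pf_lookup(pf_buf_t *b, long miss_line)
{
    int hit = (b->line == miss_line);
    if (hit) b->hits++; else b->misses++;
    b->line = miss_line + b->stride;  /* pre-fetch the predicted line */
    return hit;
}
```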
-
Patent number: 7343481Abstract: A data processing system incorporates an instruction prefetch unit 8 including a static branch predictor 12. A static branch prediction cache 30, 32, 34 is provided for storing a most recently encountered static branch prediction such that a subsequent request to fetch the already encountered branch instruction can be identified before the opcode for that branch instruction is returned. The cached static branch prediction can thus redirect the prefetching to the branch target address sooner than the static predictor 12.Type: GrantFiled: March 19, 2003Date of Patent: March 11, 2008Assignee: ARM LimitedInventor: David James Williamson
-
Patent number: 7334088Abstract: A computer system and a method for enhancing cache prefetch behavior. A computer system including a processor, a main memory, a prefetch controller, a cache memory, and a prefetch buffer, wherein each page in the main memory has associated with it a tag, which is used for controlling the prefetching of a variable subset of lines from that page as well as lines from at least one other page. Coupled to the processor is the prefetch controller, which responds to the processor determining that a fault (or miss) occurred on a line of data by fetching the corresponding line of data together with the corresponding tag, storing the tag in the prefetch buffer, and sending the corresponding line of data to the cache memory.Type: GrantFiled: December 20, 2002Date of Patent: February 19, 2008Assignee: International Business Machines CorporationInventor: Peter Franaszek
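The per-page tag can be modeled as a bitmask selecting which of the page's lines to prefetch. The one-bit-per-line encoding and the eight-line page below are assumptions for illustration; the patent leaves the tag format open.

```c
#include <stddef.h>
#include <stdint.h>

#define LINES_PER_PAGE 8  /* assumed lines per page, one tag bit each */

/* Expand a page's tag into the list of line indices to prefetch.
 * Returns the number of selected lines written to 'out'. */
static size_t lines_to_prefetch(uint8_t tag, unsigned out[LINES_PER_PAGE])
{
    size_t n = 0;
    for (unsigned i = 0; i < LINES_PER_PAGE; i++)
        if (tag & (1u << i))
            out[n++] = i;  /* bit i set: prefetch line i of the page */
    return n;
}
```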
-
Patent number: 7328433Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.Type: GrantFiled: October 2, 2003Date of Patent: February 5, 2008Assignee: Intel CorporationInventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim