Prefetching Patents (Class 712/207)
  • Patent number: 7484041
    Abstract: Systems and methods for improving the performance of a multiprocessor system by enabling a first processor to initiate the retrieval of data and the storage of the data in the cache memory of a second processor. One embodiment comprises a system having a plurality of processors coupled to a bus, where each processor has a corresponding cache memory. The processors are configured so that a first one of the processors can issue a preload command directing a target processor to load data into the target processor's cache memory. The preload command may be issued in response to a preload instruction in program code, or in response to an event. The first processor may include an explicit identifier of the target processor in the preload command, or the selection of the target processor may be left to another agent, such as an arbitrator coupled to the bus.
    Type: Grant
    Filed: April 4, 2005
    Date of Patent: January 27, 2009
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takashi Yoshikawa
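    A small C sketch of the preload command this abstract describes; the encoding, the field names, and the TARGET_ANY convention are invented for illustration and are not taken from the patent:

      /* Preload command of patent 7484041 (hypothetical encoding): the first
         processor directs a target processor to load data into the target's
         own cache; the target may be named explicitly or left to another
         agent, such as an arbitrator coupled to the bus. */
      #include <stdint.h>

      #define TARGET_ANY (-1)  /* let the bus arbitrator pick the target */

      struct preload_cmd {
          uint64_t addr;       /* data the target should load into its cache */
          int      target_id;  /* explicit target processor, or TARGET_ANY */
      };

      /* Issued in response to a preload instruction in program code or to
         an event observed by the first processor. */
      struct preload_cmd make_preload(uint64_t addr, int target_id)
      {
          struct preload_cmd c = { addr, target_id };
          return c;
      }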
  • Publication number: 20090024835
    Abstract: A system and method for pre-fetching data from system memory. A multi-core processor accesses a cache hit predictor concurrently with sending a memory request to a cache subsystem. The predictor has two tables. The first table is indexed by a portion of a memory address and provides a hit prediction based on a first counter value. The second table is indexed by a core number and provides a hit prediction based on a second counter value. If neither table predicts a hit, a pre-fetch request is sent to memory. In response to detecting said hit prediction is incorrect, the pre-fetch is cancelled.
    Type: Application
    Filed: July 19, 2007
    Publication date: January 22, 2009
    Inventors: Michael K. Fertig, Patrick Conway, Kevin Michael Lepak, Cissy Xumin Yuan
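    A minimal C sketch of the two-table predictor described above; the table sizes, the line-address index bits, and the 2-bit saturating counters are assumptions for illustration, not details from the filing:

      #include <stdbool.h>
      #include <stdint.h>

      #define ADDR_TABLE_SIZE 1024
      #define MAX_CORES       8

      static uint8_t addr_ctr[ADDR_TABLE_SIZE]; /* first table: indexed by address bits */
      static uint8_t core_ctr[MAX_CORES];       /* second table: indexed by core number */

      static bool predicts_hit(uint8_t c) { return c >= 2; } /* 2-bit counter, hit at >= 2 */

      /* Consulted concurrently with the cache lookup: predict a hit if
         either table does; otherwise a prefetch request goes to memory. */
      bool predict_hit(uint64_t addr, unsigned core)
      {
          unsigned i = (addr >> 6) & (ADDR_TABLE_SIZE - 1); /* line-address bits */
          return predicts_hit(addr_ctr[i]) || predicts_hit(core_ctr[core]);
      }

      /* Train both counters on the real outcome; a wrong hit prediction would
         also cancel the in-flight prefetch. */
      void train(uint64_t addr, unsigned core, bool was_hit)
      {
          unsigned i = (addr >> 6) & (ADDR_TABLE_SIZE - 1);
          if (was_hit) {
              if (addr_ctr[i] < 3)    addr_ctr[i]++;
              if (core_ctr[core] < 3) core_ctr[core]++;
          } else {
              if (addr_ctr[i] > 0)    addr_ctr[i]--;
              if (core_ctr[core] > 0) core_ctr[core]--;
          }
      }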
  • Patent number: 7480769
    Abstract: A microprocessor coupled to a system memory includes a load request signal that requests data be loaded from the system memory into the microprocessor in response to a load instruction. The load request signal includes a load virtual page address. The microprocessor also includes a prefetch request signal that requests a cache line be prefetched from the system memory into the microprocessor in response to a prefetch instruction. The prefetch request signal includes a prefetch virtual page address.
    Type: Grant
    Filed: August 11, 2006
    Date of Patent: January 20, 2009
    Assignee: MIPS Technologies, Inc.
    Inventors: Keith E. Diefendorff, Thomas A. Petersen
  • Patent number: 7480783
    Abstract: Disclosed are systems for loading an unaligned word from a specified unaligned word address in a memory, the unaligned word comprising a plurality of indexed portions crossing a word boundary, a method of operating the system comprising: loading a first aligned word commencing at an aligned word address rounded from the specified unaligned word address; identifying an index representing the location of the unaligned word address relative to the aligned word address; loading a second aligned word commencing at an aligned word address rounded from a second unaligned word address; and combining indexed portions of the first and second aligned words using the identified index to construct the unaligned word.
    Type: Grant
    Filed: August 19, 2004
    Date of Patent: January 20, 2009
    Assignees: STMicroelectronics Limited, Hewlett-Packard Company
    Inventors: Mark O. Homewood, Paolo Faraboschi
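    The combining step lends itself to a short C sketch; this one assumes 32-bit words and little-endian byte order, neither of which is specified by the abstract:

      #include <stdint.h>
      #include <string.h>

      /* Read the aligned 32-bit word at byte offset off (off % 4 == 0). */
      static uint32_t load_aligned32(const uint8_t *mem, size_t off)
      {
          uint32_t w;
          memcpy(&w, mem + off, sizeof w);
          return w;
      }

      uint32_t load_unaligned32(const uint8_t *mem, size_t off)
      {
          size_t   aligned = off & ~(size_t)3; /* round down to a word address */
          unsigned index   = off & 3;          /* location relative to that word */
          uint32_t first   = load_aligned32(mem, aligned);

          if (index == 0)
              return first;                    /* no word boundary is crossed */

          /* second aligned word, then combine the indexed portions */
          uint32_t second = load_aligned32(mem, aligned + 4);
          return (first >> (8 * index)) | (second << (8 * (4 - index)));
      }

    For index 1, the result takes the upper three bytes of the first word and the lowest byte of the second.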
  • Publication number: 20090019261
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Application
    Filed: September 22, 2008
    Publication date: January 15, 2009
    Applicant: Seiko Epson Corporation
    Inventors: Le Trong NGUYEN, Derek J. LENTZ, Yoshiyuki MIYAYAMA, Sanjiv GARG, Yasuaki HAGIWARA, Johannes WANG, Te-Li LAU, Sze-Shun WANG, Quang H. TRANG
  • Publication number: 20090019260
    Abstract: Disclosed herein is a mass prefetching method for disk arrays. In order to improve disk read performance for a non-sequential read having spatial locality as well as a sequential read, when a host requests a block to be read, all the blocks of the strip to which the block belongs are read. This is designated as strip prefetching (SP). Throttled Strip Prefetching (TSP), proposed in the present invention, investigates whether SP is beneficial by an online disk simulation, and does not perform SP if it is determined that SP is not beneficial. Since all prefetching operations of TSP are aligned in the strip of the disk array, the disk independence loss is resolved, and thus the performance of disk arrays is improved for concurrent sequential reads of multiple processes. TSP may, however, suffer from the loss of disk parallelism due to the disk independence of SP for a single sequential read. In order to solve this problem, this invention proposes Massive Stripe Prefetching (MSP).
    Type: Application
    Filed: January 2, 2008
    Publication date: January 15, 2009
    Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Kyu-Ho Park, Sung-Hoon Baek
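    A loose C sketch of the throttling idea; the strip size and the benefit heuristic below stand in for the patent's online disk simulation and are assumptions:

      #include <stdbool.h>
      #include <stdint.h>

      #define BLOCKS_PER_STRIP 16

      /* Online bookkeeping; the cache layer would bump prefetched_used each
         time a demand read hits a block that SP brought in. */
      static long prefetched_used, prefetched_total;

      static bool sp_beneficial(void)
      {
          /* stand-in for the patent's online disk simulation */
          return prefetched_total == 0 || 2 * prefetched_used >= prefetched_total;
      }

      /* Plan the physical read for a demand block: a whole strip-aligned
         strip when SP looks beneficial, otherwise just the requested block. */
      void plan_read(uint64_t block, uint64_t *start, unsigned *count)
      {
          if (sp_beneficial()) {
              *start = block - block % BLOCKS_PER_STRIP; /* strip-aligned */
              *count = BLOCKS_PER_STRIP;
              prefetched_total += BLOCKS_PER_STRIP - 1;
          } else {
              *start = block;                            /* throttled */
              *count = 1;
          }
      }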
  • Publication number: 20090006813
    Abstract: An apparatus, system, and method are disclosed. In one embodiment, the apparatus includes a system memory-side prefetcher that is coupled to a memory controller. The system memory-side prefetcher includes a stride detection unit to identify one or more patterns in a stream. The system memory-side prefetcher also includes a prefetch injection unit to insert prefetches into the memory controller based on the detected one or more patterns. The system memory-side prefetcher also includes a prefetch data forwarding unit to forward the prefetched data to a cache memory coupled to a processor.
    Type: Application
    Filed: June 28, 2007
    Publication date: January 1, 2009
    Inventors: Abhishek Singhal, Hemant G. Rotithor
  • Patent number: 7472256
    Abstract: Profile information can be used to target read operations that cause a substantial portion of misses in a program. A software value prediction technique that utilizes latency and is applied to the targeted read operations facilitates aggressive speculative execution without significant performance impact and without hardware support. A software value predictor issues prefetches for targeted read operations during speculative execution, and utilizes values from these prefetches during subsequent speculative execution, since the earlier prefetches should have completed, to update a software value prediction structure(s). Such a software based value prediction technique allows for aggressive speculative execution without the overhead of a hardware value predictor.
    Type: Grant
    Filed: April 12, 2005
    Date of Patent: December 30, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Sreekumar R. Nair, Santosh G. Abraham
  • Publication number: 20080307202
    Abstract: Provided are a method and system for loading test data into execution units in a graphics card to test the execution units. Test instructions are loaded into a cache in a graphics module comprising multiple execution units coupled to the cache on a bus during a design test mode. The cache instructions are concurrently transferred to an instruction queue of each execution unit to concurrently load the cache instructions into the instruction queues of the execution units. The execution units concurrently execute the cache instructions to fetch test instructions from the cache to load into memories of the execution units and execute during the design test mode.
    Type: Application
    Filed: June 7, 2007
    Publication date: December 11, 2008
    Inventors: Allan WONG, Ke YIN, Naveen MATAM, Anthony BABELLA, Wing Hang WONG
  • Patent number: 7461237
    Abstract: A system that suppresses duplicative prefetches for branch target cache lines. During operation, the system fetches a first cache line into a fetch buffer. The system then prefetches a second cache line, which immediately follows the first cache line, into the fetch buffer. If a control transfer instruction in the first cache line has a target instruction which is located in the second cache line, the system determines if the control transfer instruction is also located at the end of the first cache line so that a corresponding delay slot for the control transfer instruction is located at the beginning of the second cache line. If so, the system suppresses a subsequent prefetch for a target cache line containing the target instruction because the target instruction is located in the second cache line which has already been prefetched.
    Type: Grant
    Filed: April 20, 2005
    Date of Patent: December 2, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Abid Ali, Paul Caprioli, Shailender Chaudhry, Miles Lee
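    The suppression condition reduces to a small predicate; the line and instruction sizes in this C sketch are assumptions:

      #include <stdbool.h>
      #include <stdint.h>

      #define LINE_BYTES 64
      #define INSN_BYTES 4

      /* True when the subsequent prefetch of the target line can be
         suppressed: the control transfer sits in the last slot of the first
         line (so its delay slot opens the next line) and the target also
         lies in that next, already-prefetched line. */
      bool suppress_target_prefetch(uint64_t branch_pc, uint64_t target_pc)
      {
          uint64_t line    = branch_pc / LINE_BYTES;
          bool last_slot   = branch_pc % LINE_BYTES == LINE_BYTES - INSN_BYTES;
          bool target_next = target_pc / LINE_BYTES == line + 1;
          return last_slot && target_next;
      }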
  • Publication number: 20080288751
    Abstract: A processor system (100) includes a central processing unit (102) and a prefetch engine (110). The prefetch engine (110) is coupled to the central processing unit (102). The prefetch engine (110) is configured to detect, when data associated with the central processing unit (102) is read from a memory (114), a stride pattern in an address stream based upon whether sums of a current stride and a previous stride are equal for a number of consecutive reads. The prefetch engine (110) is also configured to prefetch, for the central processing unit (102), data from the memory (114) based on the detected stride pattern.
    Type: Application
    Filed: May 17, 2007
    Publication date: November 20, 2008
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventor: Andrej Kocev
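    A possible C rendering of the stride detector; the confirmation count is an assumption. Comparing sums of adjacent strides also catches alternating patterns such as +8, +24, +8, +24, whose pairwise sums stay constant at +32:

      #include <stdbool.h>
      #include <stdint.h>

      #define CONFIRMS_NEEDED 3 /* "a number of consecutive reads" (assumed) */

      static int64_t prev_addr, prev_stride, prev_sum;
      static int confirms;

      /* Feed each address read on behalf of the CPU; returns true once the
         sum of the current and previous strides has repeated often enough,
         at which point the prefetch engine can start fetching ahead. */
      bool observe_read(int64_t addr)
      {
          int64_t stride = addr - prev_addr;
          int64_t sum    = stride + prev_stride;
          confirms       = (sum == prev_sum) ? confirms + 1 : 0;
          prev_addr      = addr;
          prev_stride    = stride;
          prev_sum       = sum;
          return confirms >= CONFIRMS_NEEDED;
      }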
  • Publication number: 20080276073
    Abstract: An apparatus is provided for buffering instructions. An instruction store has memory locations for storing instructions. Each instruction can be associated with a timer such that an instruction dispatcher causes the instruction to be sent when the timer indicates that the instruction should be sent.
    Type: Application
    Filed: May 2, 2007
    Publication date: November 6, 2008
    Applicant: Analog Devices, Inc.
    Inventors: Joern Soerensen, Dilip Muthukrishnan, William Plumb, Thomas Keller, Morag Clark
  • Patent number: 7447877
    Abstract: A method and apparatus for converting memory instructions to prefetch operations during a thread switch window is disclosed. In one embodiment, memory access instructions that are already inside an instruction pipeline when the current thread is switched out may be decoded and then converted to the complementary prefetch operations. The prefetch operation may place the data into the cache during the execution of the alternate thread.
    Type: Grant
    Filed: June 13, 2002
    Date of Patent: November 4, 2008
    Assignee: Intel Corporation
    Inventors: Bharadwaj Pudipeddi, Udo Walterscheidt
  • Patent number: 7441110
    Abstract: A mechanism is described that predicts the usefulness of a prefetching instruction during the instruction's decode cycle. Prefetching instructions that are predicted as useful (prefetch useful data) are sent to an execution unit of the processor for execution, while instructions that are predicted as not useful are discarded. The prediction regarding the usefulness of a prefetching instruction is performed utilizing a branch prediction mask contained in the branch history mechanism. This mask is compared to information contained in the prefetching instruction that records the branch path between the prefetching instruction and actual use of the data. Both instructions and data can be prefetched using this mechanism.
    Type: Grant
    Filed: December 10, 1999
    Date of Patent: October 21, 2008
    Assignee: International Business Machines Corporation
    Inventors: Thomas R. Puzak, Allan M. Hartstein, Mark Charney, Daniel A. Prener, Peter H. Oden
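    The usefulness test amounts to a masked path comparison; this C sketch invents the mask widths and the valid-bits parameter for illustration:

      #include <stdbool.h>
      #include <stdint.h>

      /* Compare the branch-history mask against the branch path recorded in
         the prefetching instruction, restricted to the branches that lie
         between the prefetch and the actual use of the data; an instruction
         whose paths disagree is discarded at decode. */
      bool prefetch_is_useful(uint16_t predicted_mask, uint16_t recorded_path,
                              uint16_t valid_bits)
      {
          return (predicted_mask & valid_bits) == (recorded_path & valid_bits);
      }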
  • Patent number: 7437542
    Abstract: A conjugate processor includes an instruction set architecture (ISA) visible portion having a main pipeline, and an h-flow portion having an h-flow pipeline. The binary executed on the conjugate processor includes an essential portion that is executed on the main pipeline and a non-essential portion that is executed on the h-flow pipeline. The non-essential portion includes hint calculus that is used to provide hints to the main pipeline. The conjugate processor also includes a conjugate mapping table that maps triggers to h-flow targets. Triggers can be instruction attributes, data attributes, state attributes or event attributes. When a trigger is satisfied, the h-flow code specified by the target is executed in the h-flow pipeline.
    Type: Grant
    Filed: January 13, 2006
    Date of Patent: October 14, 2008
    Assignee: Intel Corporation
    Inventors: Hong Wang, Ralph Kling, Yong-Fong Lee, David A. Berson, Michael A. Kozuch, Konrad Lai
  • Patent number: 7434005
    Abstract: A preload controller for controlling a bus access device that reads out data from a main memory via a bus and transfers the readout data to a temporary memory, including a first acquiring device to acquire access hint information which represents a data access interval to the main memory, a second acquiring device to acquire system information which represents a transfer delay time in transfer of data via the bus by the bus access device, a determining device to determine a preload unit count based on the data access interval represented by the access hint information and the transfer delay time represented by the system information, and a management device to instruct the bus access device to read out data for the preload unit count from the main memory and to transfer the readout data to the temporary memory ahead of a data access of the data.
    Type: Grant
    Filed: June 14, 2005
    Date of Patent: October 7, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Seiji Maeda, Yusuke Shirota
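    One plausible reading of the determining device is a ceiling division: if the bus needs delay time units to deliver one preload unit and the program touches main memory every interval units, roughly ceil(delay / interval) units must be in flight for data to arrive ahead of use. The formula is an interpretation, not the patent's wording:

      /* Preload unit count from the data access interval (access hint
         information) and the bus transfer delay (system information). */
      unsigned preload_unit_count(unsigned delay, unsigned interval)
      {
          return (delay + interval - 1) / interval; /* ceil(delay / interval) */
      }

    For example, a 400-cycle transfer delay against a 100-cycle access interval gives a preload unit count of 4.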
  • Publication number: 20080243268
    Abstract: According to one example embodiment of the inventive subject matter, there is provided a mechanism that controls which prefetchers are applied to execute an application in a computing system by turning them on and off. In one embodiment, this may be accomplished for example with a software control process that may run in the background. In another example embodiment, this may be accomplished using a hardware control machine, or a combination of hardware and software. The prefetchers are turned on and off in order to increase the performance of the computing system.
    Type: Application
    Filed: March 31, 2007
    Publication date: October 2, 2008
    Inventors: Meenakshi A. Kandaswamy, Simon C. Steely
  • Publication number: 20080244080
    Abstract: A processor includes non-volatile memory into which streamed application components may be pre-fetched from a slower storage medium in order to decrease stall times during execution of the application. Alternatively, the application components pre-fetched into the non-volatile memory may be from a traditionally-loaded application rather than a streamed application. The order in which components of the application are prefetched into the non-volatile memory may be based on load order hints. For at least one embodiment, the load order hints are derived from server-side load ordering logic. For at least one other embodiment, the load order hints are provided by the application itself via a mechanism such as an application programming interface. For at least one other embodiment, the load order hints are generated by the client using profile data. Or, a combination of such approaches may be used. Other embodiments are also described and claimed.
    Type: Application
    Filed: March 29, 2007
    Publication date: October 2, 2008
    Inventors: Thomas H. James, Steven Grobman
  • Publication number: 20080244231
    Abstract: In some embodiments, the invention involves a novel combination of techniques for prefetching data and passing messages between and among cores in a multi-processor/multi-core platform. In an embodiment, a receiving core has a message queue and a message prefetcher. Incoming messages are simultaneously written to the message queue and the message prefetcher. The prefetcher speculatively fetches data referenced in the received message so that the data is available when the message is executed in the execution pipeline, or shortly thereafter. Other embodiments are described and claimed.
    Type: Application
    Filed: March 30, 2007
    Publication date: October 2, 2008
    Inventors: Aaron Kunze, Erik J. Johnson, Hermann Gartler
  • Publication number: 20080244232
    Abstract: Apparatus and computing systems associated with data pre-fetching are described. One embodiment includes a processor that includes a first unit to store data corresponding to a load instruction and an instruction pointer (IP) value associated with the load instruction. The processor also includes a second unit to produce a predicted demand address for a next load instruction, the predicted demand address being based on a constant stride value. The processor also includes a third unit to generate an instruction pointer pre-fetch (IPP) request for the predicted demand address. The processor may also include units to arbitrate between generated IP pre-fetch requests and alternative pre-fetch requests.
    Type: Application
    Filed: April 2, 2007
    Publication date: October 2, 2008
    Inventors: Marina Sherman, Jack Doweck
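    A C sketch of an IP-indexed constant-stride table in the spirit of the first two units; the table size, direct-mapped indexing, and confirmation rule are assumptions:

      #include <stdint.h>

      #define IPP_ENTRIES 256

      struct ipp_entry { uint64_t ip, last_addr; int64_t stride; };
      static struct ipp_entry table[IPP_ENTRIES]; /* first unit: per-IP history */

      /* Second unit: return the predicted demand address for this load, or 0
         when a constant stride has not been confirmed yet; the third unit
         would turn a nonzero result into an IPP request. */
      uint64_t ipp_observe(uint64_t ip, uint64_t addr)
      {
          struct ipp_entry *e = &table[ip % IPP_ENTRIES];
          uint64_t pred = 0;

          if (e->ip == ip) {
              int64_t stride = (int64_t)(addr - e->last_addr);
              if (stride != 0 && stride == e->stride)
                  pred = addr + stride; /* constant stride seen twice in a row */
              e->stride = stride;
          } else {
              e->ip = ip;               /* new load instruction claims the slot */
              e->stride = 0;
          }
          e->last_addr = addr;
          return pred;
      }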
  • Patent number: 7430650
    Abstract: A cache prefetching algorithm uses previously requested address and data patterns to predict future data needs and prefetch such data from memory into cache. A requested address is compared to previously requested addresses and returned data to compute a set of increments, and the set of increments is added to the currently requested address and returned data to generate a set of prefetch candidates. Weight functions are used to prioritize prefetch candidates. The prefetching method requires no changes to application code or operating system (OS) and is transparent to the compiler and the processor. The prefetching method comprises a parallel algorithm well-suited to implementation on an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA), or to integration into a processor.
    Type: Grant
    Filed: June 13, 2005
    Date of Patent: September 30, 2008
    Inventor: Richard A. Ross
  • Patent number: 7430640
    Abstract: The decision to prefetch inodes is based upon the detecting of access patterns that would benefit from such a prefetch. Once the decision to prefetch is made, a plurality of inodes are prefetched in parallel. Further, the prefetching of inodes is paced, such that the prefetching substantially matches the speed at which an application requests inodes.
    Type: Grant
    Filed: November 8, 2005
    Date of Patent: September 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Frank B. Schmuck, James C. Wyllie
  • Publication number: 20080229069
    Abstract: An instruction preload instruction executed in a first processor instruction set operating mode is operative to correctly preload instructions in a different, second instruction set. The instructions are pre-decoded according to the second instruction set encoding in response to an instruction set preload indicator (ISPI). In various embodiments, the ISPI may be set prior to executing the preload instruction, or may comprise part of the preload instruction or the preload target address.
    Type: Application
    Filed: March 14, 2007
    Publication date: September 18, 2008
    Applicant: QUALCOMM INCORPORATED
    Inventors: Thomas Andrew Sartorius, Brian Michael Stempel, Rodney Wayne Smith
  • Publication number: 20080229072
    Abstract: A prefetch processing apparatus includes a central-processing-unit monitor unit that monitors processing states of the central processing unit in association with time elapsed from start time of executing a program. A cache-miss-data address obtaining unit obtains cache-miss-data addresses in association with the time elapsed from the start time of executing the program, and a cycle determining unit determines a cycle of time required for executing the program. An identifying unit identifies a prefetch position in a cycle in which a prefetch-target address is to be prefetched by associating the cycle determined by the cycle determining unit with the cache-miss data addresses obtained by the cache-miss-data address obtaining unit. The prefetch-target address is an address of data on which prefetch processing is to be performed.
    Type: Application
    Filed: March 5, 2008
    Publication date: September 18, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Shuji Yamamura, Takashi Aoki
  • Publication number: 20080229071
    Abstract: A prefetch control apparatus includes a prefetch controller for controlling prefetch of read data into a cache memory caching data to be transferred between a computer apparatus and a storage device, and which enhances a read efficiency of the read data from the storage device, a sequentiality decider for deciding whether the read data that are read from the storage device toward the computer apparatus are sequential access data, a locality decider for deciding whether the read data have locality of data arrangement in the predetermined storage area, in a case where the read data that are read from the storage device toward the computer apparatus have been decided not to be sequential access data, and a prefetcher for prefetching the read data in a case where the read data has the locality of the data arrangement.
    Type: Application
    Filed: March 5, 2008
    Publication date: September 18, 2008
    Applicant: Fujitsu Limited
    Inventors: Katsuhiko SHIOYA, Eiichi YAMANAKA
  • Publication number: 20080229070
    Abstract: Cache circuitry, a data processing apparatus including such cache circuitry, and a method for prefetching data into such cache circuitry, are provided. The cache circuitry has a cache storage comprising a plurality of cache lines for storing data values, and control circuitry which is responsive to an access request issued by a device of the data processing apparatus identifying a memory address of a data value to be accessed, to cause a lookup operation to be performed to determine whether the data value for that memory address is stored within the cache storage. If not, a linefill operation is initiated to retrieve the data value from memory.
    Type: Application
    Filed: March 12, 2007
    Publication date: September 18, 2008
    Applicant: ARM Limited
    Inventors: Elodie Charra, Philippe Jean-Pierre Raphalen, Frederic Claude Marie Piry, Philippe Luc, Gilles Eric Grandou
  • Patent number: 7421540
    Abstract: A mechanism is provided that identifies instructions that access storage and may be candidates for cache prefetching. The mechanism augments these instructions so that any given instance of the instruction operates in one of four modes, namely normal, unexecuted, data gathering, and validation. In the normal mode, the instruction merely performs the function specified in the software runtime environment. An instruction in unexecuted mode, upon the next execution, is placed in data gathering mode. When an instruction in the data gathering mode is encountered, the mechanism of the present invention collects data to discover potential fixed storage access patterns. When an instruction is in validation mode, the mechanism of the present invention validates the presumed fixed storage access patterns.
    Type: Grant
    Filed: May 3, 2005
    Date of Patent: September 2, 2008
    Assignee: International Business Machines Corporation
    Inventors: Christopher Michael Donawa, Allan Henry Kielstra
  • Publication number: 20080209173
    Abstract: An apparatus for executing branch predictor directed prefetch operations. During operation, a branch prediction unit may provide an address of a first instruction to the fetch unit. The fetch unit may send a fetch request for the first instruction to the instruction cache to perform a fetch operation. In response to detecting a cache miss corresponding to the first instruction, the fetch unit may execute one or more prefetch operations while the cache miss corresponding to the first instruction is being serviced. The branch prediction unit may provide an address of a predicted next instruction in the instruction stream to the fetch unit. The fetch unit may send a prefetch request for the predicted next instruction to the instruction cache to execute the prefetch operation. The fetch unit may store prefetched instruction data obtained from a next level of memory in the instruction cache or in a prefetch buffer.
    Type: Application
    Filed: February 28, 2007
    Publication date: August 28, 2008
    Inventors: Marius Evers, Trivikram Krishnamurthy
  • Publication number: 20080201529
    Abstract: An apparatus, program product and method initiate, in connection with a context switch operation, a prefetch of data likely to be used by a thread prior to resuming execution of that thread. As a result, once it is known that a context switch will be performed to a particular thread, data may be prefetched on behalf of that thread so that when execution of the thread is resumed, more of the working state for the thread is likely to be cached, or at least in the process of being retrieved into cache memory, thus reducing cache-related performance penalties associated with context switching.
    Type: Application
    Filed: April 24, 2008
    Publication date: August 21, 2008
    Applicant: International Business Machines Corporation
    Inventors: Jeffrey Powers Bradford, Harold F. Kossman, Timothy John Mullins
  • Publication number: 20080189518
    Abstract: A processor includes a cache memory that has an array, word lines, and bit lines. A control module accesses cells of the array during access cycles to access instructions stored in the cache memory. The control module performs one of a first discrete read and a first sequential read to access instructions in a first set of cells of the array that are connected to a first word line and selectively performs one of a second discrete read and a second sequential read based on a branch instruction to access instructions in a second set of cells of the array that are connected to a second word line. The second word line is different than the first word line.
    Type: Application
    Filed: April 2, 2008
    Publication date: August 7, 2008
    Inventors: Sehat Sutardja, Jason T. Su, Hong-Yi Chen, Jason Sheu, Jensen Tjeng
  • Patent number: 7409486
    Abstract: A protocol chip and a bridge are connected to a first bus, while the bridge and a micro processor (MP) are connected to a second bus. The MP generates parameter information and writes it into a local memory (LM), and issues a write command which includes access destination information to this parameter information to a protocol chip. The bridge pre-fetches the parameter information from the LM using the access destination information within the write command which is transferred to the protocol chip via itself, and when receiving a read command from the protocol chip, transmits the parameter information which has been pre-fetched to the protocol chip via the first bus, without passing the read command through to the MP.
    Type: Grant
    Filed: March 27, 2006
    Date of Patent: August 5, 2008
    Assignee: Hitachi, Ltd.
    Inventors: Osamu Torigoe, Hideaki Shima, Shouji Katoh
  • Publication number: 20080184010
    Abstract: According to the present invention, there is provided an instruction cache prefetch control apparatus having an external memory, a CPU and an instruction cache unit, the instruction cache unit having: an instruction cache data memory which receives and stores the instruction sequence; a prefetch buffer which prefetches and stores an instruction sequence next to the instruction sequence as a target of a fetch request from the CPU when the next instruction sequence is not stored in the instruction cache data memory; an instruction cache write control unit which selectively outputs, to the instruction cache data memory, one of the instruction sequence output from the external memory and the instruction sequence stored in the prefetch buffer; and a hit or miss determination access control unit which, upon receiving, from the CPU, a fetch request for the instruction sequence stored in the prefetch buffer, transfers the instruction sequence from the prefetch buffer to the instruction cache data memory and stores th
    Type: Application
    Filed: December 31, 2007
    Publication date: July 31, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Masato Uchiyama
  • Patent number: 7404042
    Abstract: A fetch section of a processor comprises an instruction cache and a pipeline of several stages for obtaining instructions. Instructions may cross cache line boundaries. The pipeline stages process two addresses to recover a complete boundary crossing instruction. During such processing, if the second piece of the instruction is not in the cache, the fetch with regard to the first line is invalidated and recycled. On this first pass, processing of the address for the second part of the instruction is treated as a pre-fetch request to load instruction data to the cache from higher level memory, without passing any of that data to the later stages of the processor. When the first line address passes through the fetch stages again, the second line address follows in the normal order, and both pieces of the instruction can be fetched from the cache and combined in the normal manner.
    Type: Grant
    Filed: May 18, 2005
    Date of Patent: July 22, 2008
    Assignee: QUALCOMM Incorporated
    Inventors: Brian Michael Stempel, Jeffrey Todd Bridges, Rodney Wayne Smith, Thomas Andrew Sartorius
  • Publication number: 20080168259
    Abstract: A DMA device prefetches descriptors into a descriptor prefetch buffer. The size of descriptor prefetch buffer holds an appropriate number of descriptors for a given latency environment. To support a linked list of descriptors, the DMA engine prefetches descriptors based on the assumption that they are sequential in memory and discards any descriptors that are found to violate this assumption. The DMA engine seeks to keep the descriptor prefetch buffer full by requesting multiple descriptors per transaction whenever possible. The bus engine fetches these descriptors from system memory and writes them to the prefetch buffer. The DMA engine may also use an aggressive prefetch where the bus engine requests the maximum number of descriptors that the buffer will support whenever there is any space in the descriptor prefetch buffer. The DMA device discards any remaining descriptors that cannot be stored.
    Type: Application
    Filed: January 10, 2007
    Publication date: July 10, 2008
    Inventors: Giora Biran, Luis E. De la Torre, Bernard C. Drerup, Jyoti Gupta, Richard Nicholas
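    A rough C model of the sequential-descriptor assumption; the descriptor layout and the use of indices rather than addresses for the link field are simplifications:

      #include <stdint.h>
      #include <string.h>

      #define PREFETCH_DEPTH 8 /* buffer sized for the latency environment */

      struct descriptor {
          uint64_t src, dst, len;
          uint64_t next; /* index of the next descriptor (simplified link) */
      };

      static struct descriptor buf[PREFETCH_DEPTH]; /* descriptor prefetch buffer */

      /* Bus engine: burst-read up to `room` descriptors starting at
         first_idx, assuming they are sequential in memory, then keep only
         the prefix whose links really are sequential. */
      unsigned prefetch_descriptors(const struct descriptor *mem,
                                    uint64_t first_idx, unsigned room)
      {
          unsigned n = room < PREFETCH_DEPTH ? room : PREFETCH_DEPTH;
          memcpy(buf, &mem[first_idx], n * sizeof buf[0]);

          for (unsigned i = 0; i + 1 < n; i++)
              if (buf[i].next != first_idx + i + 1) /* assumption violated */
                  return i + 1; /* discard the remaining descriptors */
          return n;
      }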
  • Publication number: 20080162819
    Abstract: A design structure for prefetching instruction lines is provided. The design structure is embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design. The design structure comprises a processor having a level 2 cache, and a level 1 cache configured to receive instruction lines from the level 2 cache is described, wherein each instruction line comprises one or more instructions. The processor also includes a processor core configured to execute instructions retrieved from the level 1 cache, and circuitry configured to fetch a first instruction line from a level 2 cache, identify, in the first instruction line, an address identifying a first data line containing data targeted by a data access instruction contained in the first instruction line or a different instruction line, and prefetch, from the level 2 cache, the first data line using the extracted address.
    Type: Application
    Filed: March 13, 2008
    Publication date: July 3, 2008
    Inventor: DAVID A. LUICK
  • Patent number: 7389405
    Abstract: A method and architecture accesses a unified memory in a micro-processing system having a two-phase clock. The unified memory is accessed during a first instruction cycle. When a program code discontinuity is encountered, the unified memory is accessed a first time during an instruction cycle with a dummy access. The unified memory is accessed a second time during the instruction cycle when a program code discontinuity is encountered with either a data access, as in the case of a last instruction of a loop, or an instruction access, as in the case of a jump instruction.
    Type: Grant
    Filed: November 17, 2003
    Date of Patent: June 17, 2008
    Assignee: Mediatek, Inc.
    Inventor: Frederic Boutaud
  • Publication number: 20080140997
    Abstract: Embodiments of the present invention relate to a data processing system and method for using metadata associated with data to be retrieved from storage to identify further data, and to retrieve at least a portion of that further data from the storage in accordance with a prefetch policy.
    Type: Application
    Filed: February 4, 2005
    Publication date: June 12, 2008
    Inventor: Shailendra Tripathi
  • Publication number: 20080140996
    Abstract: When misses occur in an instruction cache, prefetching techniques are used that minimize miss rates, memory access bandwidth, and power use. One of the prefetching techniques operates when a miss occurs. A notification that a fetch address missed in an instruction cache is received. The fetch address that caused the miss is analyzed to determine an attribute of the fetch address and based on the attribute a line of instructions is prefetched. The attribute may indicate that the fetch address is a target address of a non-sequential operation. Another attribute may indicate that the fetch address is a target address of a non-sequential operation and the target address is more than X % into a cache line. A further attribute may indicate that the fetch address is an even address in the instruction cache. Such attributes may be combined to determine whether to prefetch.
    Type: Application
    Filed: December 8, 2006
    Publication date: June 12, 2008
    Inventors: Michael William Morrow, James Norris Dieffenderfer
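    The attribute checks combine into a small predicate; this C sketch assumes a 64-byte line, reads "X%" as 50%, and interprets "even address" as an even line index:

      #include <stdbool.h>
      #include <stdint.h>

      #define LINE_BYTES 64

      /* Decide, on an instruction-cache miss, whether to prefetch the next
         line, based on attributes of the missing fetch address. */
      bool should_prefetch_next(uint64_t fetch_addr, bool is_branch_target)
      {
          unsigned offset   = fetch_addr % LINE_BYTES;
          bool deep_in_line = offset > LINE_BYTES / 2;            /* > X% in */
          bool even_line    = ((fetch_addr / LINE_BYTES) & 1) == 0;
          return (is_branch_target && deep_in_line) || even_line;
      }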
  • Patent number: 7383417
    Abstract: The efficient performance of prefetching of data prior to the reading of the data by a program. A prefetching apparatus, for prefetching data from a file to a buffer before the data is read by a program, includes: a history recorder, for recording a history for a plurality of data readings issued by the program while performing data reading; a prefetching generator, for generating a plurality of prefetchings that correspond to the plurality of data readings recorded in the history; a prefetching process determination unit, for determining, based on the history, the performance order for the plurality of prefetchings; and a prefetching unit, for performing, when following the determination of the performance order the program is executed, the plurality of prefetchings in the performance order.
    Type: Grant
    Filed: March 15, 2006
    Date of Patent: June 3, 2008
    Assignee: International Business Machines Corporation
    Inventors: Toshiaki Yasue, Hideaki Komatsu
  • Patent number: 7373482
    Abstract: One embodiment of the present invention provides a system that improves the effectiveness of prefetching during execution of instructions in scout mode. During operation, the system executes program instructions in a normal-execution mode. Upon encountering a condition which causes the processor to enter scout mode, the system performs a checkpoint and commences execution of instructions in scout mode, wherein the instructions are speculatively executed to prefetch future memory operations, but wherein results are not committed to the architectural state of a processor. During execution of a load instruction during scout mode, if the load instruction is a special load instruction and if the load instruction causes a lower-level cache miss, the system waits for data to be returned from a higher-level cache before resuming execution of subsequent instructions in scout mode, instead of disregarding the result of the load instruction and immediately resuming execution in scout mode.
    Type: Grant
    Filed: May 26, 2005
    Date of Patent: May 13, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Lawrence A. Spracklen, Yuan C. Chou, Santosh G. Abraham
  • Patent number: 7370153
    Abstract: Method and apparatus for implementing controlled pre-fetching of data. An extended data structure can be used to specify where and when data is to be pre-fetched, and how much pre-fetching is to be performed, if any. The extended data structure has a pre-fetch flag that signals a host controller if pre-fetching is to be done. If the pre-fetch flag is set, pre-fetching is performed, otherwise pre-fetching is not performed. The host controller parses the extended data structure and formulates a data request that is sent to the disk drive. Pre-fetched data can be stored in a buffer memory for future use.
    Type: Grant
    Filed: August 6, 2004
    Date of Patent: May 6, 2008
    Assignee: NVIDIA Corporation
    Inventor: Radoslav Danilak
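    A guess in C at the shape of the extended data structure; every field name here is invented for illustration:

      #include <stdbool.h>
      #include <stdint.h>

      struct ext_request {
          uint64_t lba;          /* starting disk address of the demand read */
          uint32_t length;       /* bytes requested by the host */
          bool     prefetch;     /* pre-fetch flag checked by the host controller */
          uint64_t prefetch_lba; /* where pre-fetching starts */
          uint32_t prefetch_len; /* how much to pre-fetch into buffer memory */
      };

      /* Host controller: widen the disk request only when the flag is set. */
      uint32_t total_transfer(const struct ext_request *r)
      {
          return r->length + (r->prefetch ? r->prefetch_len : 0);
      }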
  • Patent number: 7363625
    Abstract: An SMT system is designed to allow software alteration of thread priority. In one case, the system signals a change in a thread priority based on the state of instruction execution and in particular when the instruction has completed execution. To alter the priority of a thread, the software uses a special form of a “no operation” (NOP) instruction (hereafter termed thread priority NOP). When the thread priority NOP is dispatched, its special NOP is decoded in the decode unit of the IDU into an operation that writes a special code into the completion table for the thread priority NOP. A “trouble” bit is also set in the completion table that indicates which instruction group contains the thread priority NOP. The trouble bit indicates that special processing is required after instruction completion. The thread priority instruction is processed after completion using the special code to change a thread's priority.
    Type: Grant
    Filed: April 24, 2003
    Date of Patent: April 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: William E. Burky, Ronald N. Kalla, David A. Schroter, Balaram Sinharoy
  • Publication number: 20080091921
    Abstract: Systems and methods for prefetching data in a microprocessor environment are provided. The method comprises decoding a first instruction; determining if the first instruction comprises both a load instruction and embedded prefetch data; processing the load instruction; and processing the prefetch data, in response to determining that the first instruction comprises the prefetch data, wherein processing the prefetch data comprises determining a prefetch multiple, a prefetch address and the number of elements to prefetch, based on the prefetch data.
    Type: Application
    Filed: October 12, 2006
    Publication date: April 17, 2008
    Inventors: Diab Abuaiadh, Daniel Citron
  • Patent number: 7360059
    Abstract: In one embodiment, a digital signal processor includes look ahead logic to decrease the number of bubbles inserted in the processing pipeline. The processor receives data containing instructions in a plurality of buffers and decodes the size of a first instruction. The beginning of a second instruction is determined based on the size of the first instruction. The size of the second instruction is decoded and the processor determines whether loading the second instruction will deplete one of the plurality of buffers.
    Type: Grant
    Filed: February 3, 2006
    Date of Patent: April 15, 2008
    Assignee: Analog Devices, Inc.
    Inventors: Thomas Tomazin, William C. Anderson, Charles P. Roth, Kayla Chalmers, Juan G. Revilla, Ravi P. Singh
  • Publication number: 20080082790
    Abstract: An accelerator system supplements standard computer memory management units specifically in the case of sparse data. The accelerator processes requests for data from an analysis application running on the processor system by pre-fetching a subset of the irregularly ordered data and forming that data into a dense, sequentially-ordered array, which is then placed directly into the processor's main memory, for example. In one example, the memory controller is implemented as a separate, add-on coprocessor so that actions of the memory controller will take place simultaneously with the calculations of the processor system. This system addresses the problems caused by a lack of sequential and spatial locality in sparse data. In effect, the complicated data access characteristic of irregular structures, which are a characteristic of sparse matrices, is transferred from the code level to the hardware level.
    Type: Application
    Filed: August 16, 2007
    Publication date: April 3, 2008
    Inventors: Oleg Vladimirovich Diyankov, Yuri Ivanovich Konotop, John Victor Batson
  • Patent number: 7346762
    Abstract: A method of executing program instructions may include receiving, in a processor, an instruction that causes the processor to read data from or write data to a portion of memory that is shared by one or more processes, at least one process of which manipulates data in a format that is different than a format of data in the shared portion of memory. The method may further include executing alternate instructions in place of the received instruction. The alternate instructions may effect transformation of data associated with the shared portion of memory from a first data format to a second data format.
    Type: Grant
    Filed: January 6, 2006
    Date of Patent: March 18, 2008
    Assignee: Apple Inc.
    Inventors: Ronnie G. Misra, Joshua H. Shaffer
  • Patent number: 7346741
    Abstract: A method and apparatus for retrieving instructions to be processed by a microprocessor is provided. By pre-fetching instructions in anticipation of being requested, instead of waiting for the instructions to be requested, the latency involved in requesting instructions from higher levels of memory may be avoided. A pre-fetched line of instruction may be stored into a pre-fetch buffer residing on a microprocessor. The pre-fetch buffer may be used by the microprocessor as an alternate source from which to retrieve a requested instruction when the requested instruction is not stored within the first level cache. The particular line of instruction being pre-fetched may be identified based on a configurable stride value. The configurable stride value may be adjusted to maximize the likelihood that a requested instruction, not present in the first level cache, is present in the pre-fetch buffer. The configurable stride value may be updated manually or automatically.
    Type: Grant
    Filed: May 10, 2005
    Date of Patent: March 18, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Brian F. Keish, Quinn Jacobson, Lakshminarasim Varadadesikan
  • Patent number: 7343481
    Abstract: A data processing system incorporates an instruction prefetch unit 8 including a static branch predictor 12. A static branch prediction cache 30, 32, 34 is provided for storing a most recently encountered static branch prediction such that a subsequent request to fetch the already encountered branch instruction can be identified before the opcode for that branch instruction is returned. The cached static branch prediction can thus redirect the prefetching to the branch target address sooner than the static predictor 12.
    Type: Grant
    Filed: March 19, 2003
    Date of Patent: March 11, 2008
    Assignee: ARM Limited
    Inventor: David James Williamson
  • Patent number: 7334088
    Abstract: A computer system and a method for enhancing the cache prefetch behavior. A computer system including a processor, a main memory, a prefetch controller, a cache memory, and a prefetch buffer, wherein each page in the main memory has associated with it a tag, which is used for controlling the prefetching of a variable subset of lines from this page as well as lines from at least one other page. And, coupled to the processor is a prefetch controller, wherein the prefetch controller responds to the processor determining a fault (or miss) occurred to a line of data by fetching a corresponding line of data with the corresponding tag, with the corresponding tag to be stored in the prefetch buffer, and sending the corresponding line of data to the cache memory.
    Type: Grant
    Filed: December 20, 2002
    Date of Patent: February 19, 2008
    Assignee: International Business Machines Corporation
    Inventor: Peter Franaszek
  • Patent number: 7328433
    Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: February 5, 2008
    Assignee: Intel Corporation
    Inventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
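    A thumbnail of the helper-thread scheme in C with pthreads; the two atomic counters play the role of the inserted counting mechanisms, the 64-element lead bound is an assumption, and __builtin_prefetch is a GCC/Clang builtin:

      #include <pthread.h>
      #include <stdatomic.h>

      #define N        1000000
      #define MAX_LEAD 64 /* how far the helper may run ahead (assumed) */

      static int data[N];
      static _Atomic long main_pos;   /* counting mechanism in the main thread */
      static _Atomic long helper_pos; /* counting mechanism in the helper */

      static void *helper(void *arg)
      {
          (void)arg;
          for (long i = 0; i < N; i++) {
              while (i - atomic_load(&main_pos) > MAX_LEAD)
                  ; /* throttle so prefetched lines are not evicted early */
              __builtin_prefetch(&data[i]); /* warm the cache for the main thread */
              atomic_store(&helper_pos, i);
          }
          return 0;
      }

      long run_with_helper(void)
      {
          pthread_t t;
          long sum = 0;
          pthread_create(&t, 0, helper, 0);
          for (long i = 0; i < N; i++) {
              sum += data[i]; /* likely hits lines the helper touched */
              atomic_store(&main_pos, i);
          }
          pthread_join(t, 0);
          return sum;
      }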