Prefetching Patents (Class 712/207)
  • Patent number: 8087024
    Abstract: In general, in one aspect, the disclosure describes a processor that includes an instruction store to store instructions of at least a portion of at least one program and multiple engines coupled to the shared instruction store. The engines provide multiple execution threads and include an instruction cache to cache a subset of the at least the portion of the at least one program from the instruction store, with different respective portions of the engine's instruction cache being allocated to different respective ones of the engine threads.
    Type: Grant
    Filed: November 18, 2008
    Date of Patent: December 27, 2011
    Assignee: Intel Corporation
    Inventors: Sridhar Lakshmanamurthy, Wilson Y. Liao, Prashant R. Chandra, Jeen-Yuan Miin, Yim Pun
  • Publication number: 20110314261
    Abstract: In an embodiment, a method is provided for prefetching attributes used in access control evaluation. In this method, an access control policy that comprises rules is retrieved. These rules further comprise parameters. At least one of the rules is categorized into a class from multiple classes based on at least one of the parameters. Here, the class is a grouping based on at least one of these parameters. An attribute associated with the at least one of these parameters is identified and this attribute is mapped to the class.
    Type: Application
    Filed: June 17, 2010
    Publication date: December 22, 2011
    Applicant: SAP AG
    Inventors: Achim D. Brucker, Helmut Petritsch
  • Publication number: 20110314262
    Abstract: A prefetch request circuit is provided in a processor device. The processor device has hierarchized storage areas and can prefetch data of address to be used between appropriate storage areas among the storage areas, when executing respective instruction flows obtained by multi-flow expansion for one instruction at a time of decoding of the instruction. The prefetch request circuit includes a latch unit to hold, when a state in which the respective instruction flows to access the storage area are executed with a maximum specifiable data transfer volume is specified, the state during a time period of the multi-flow expansion; and a prefetch request signal output unit to output a prefetch request signal to request the prefetch every time when the instruction flow is executed, based on an output signal of the latch unit and a signal indicating an execution timing of the respective instruction flows.
    Type: Application
    Filed: August 29, 2011
    Publication date: December 22, 2011
    Applicant: FUJITSU LIMITED
    Inventors: Atsushi FUSEJIMA, Norihito Gomyo
  • Patent number: 8079031
    Abstract: A discussion of a dynamic configuration for a prefetcher is proposed. For example, a thread specific latency metric is calculated and provides dynamic feedback to the software on a per thread basis via the configuration and status registers. Likewise, the software can optionally use the information from the registers to dynamically configure the prefetching behavior and allows the software to be able to both query the performance and configure the prefetcher.
    Type: Grant
    Filed: October 21, 2005
    Date of Patent: December 13, 2011
    Assignee: Intel Corporation
    Inventors: Geeyarpuram N. Santhanakrishnan, Michael F. Cole, Mark Rowland, Ganapati Srinivasa
  • Patent number: 8078806
    Abstract: A microprocessor coupled to a system memory by a bus includes an instruction decode unit that decodes an instruction that specifies a data stream in the system memory and a stream prefetch priority. The microprocessor also includes a load/store unit that generates load/store requests to transfer data between the system memory and the microprocessor. The microprocessor also includes a stream prefetch unit that generates a plurality of prefetch requests to prefetch the data stream from the system memory into the microprocessor. The prefetch requests specify the stream prefetch priority. The microprocessor also includes a bus interface unit (BIU) that generates transaction requests on the bus to transfer data between the system memory and the microprocessor in response to the load/store requests and the prefetch requests. The BIU prioritizes the bus transaction requests for the prefetch requests relative to the bus transaction requests for the load/store requests based on the stream prefetch priority.
    Type: Grant
    Filed: October 25, 2010
    Date of Patent: December 13, 2011
    Assignee: MIPS Technologies, Inc.
    Inventor: Keith E. Diefendorff
  • Patent number: 8069336
    Abstract: Various embodiments of methods and systems for implementing a microprocessor that includes a trace cache and attempts to transition fetching from instruction cache to trace cache only on label boundaries are disclosed. In one embodiment, a microprocessor may include an instruction cache, a branch prediction unit, and a trace cache. The prefetch unit may fetch instructions from the instruction cache until the branch prediction unit outputs a predicted target address for a branch instruction. When the branch prediction unit outputs a predicted target address, the prefetch unit may check for an entry matching the predicted target address in the trace cache. If a match is found, the prefetch unit may fetch one or more traces from the trace cache in lieu of fetching instructions from the instruction cache.
    Type: Grant
    Filed: December 3, 2003
    Date of Patent: November 29, 2011
    Assignee: Globalfoundries Inc.
    Inventors: Mitchell Alsup, Gregory William Smaus
  • Patent number: 8069270
    Abstract: According to the present invention, methods and apparatus are provided improving reading of a remote tape device by a host through multiple fibre channel switches. A fibre channel switch preemptively sends read requests to a tape device before read requests are received from a host. Flow control, buffer management, and error handling mechanisms are implemented to allow accelerated tape back up restoration while working to prevent buffer overflow and underflow.
    Type: Grant
    Filed: September 6, 2005
    Date of Patent: November 29, 2011
    Assignee: Cisco Technology, Inc.
    Inventors: Manali Nambiar, Arpakorn Boonkongchuen, P. Murali Basavaiah
  • Patent number: 8060701
    Abstract: When misses occur in an instruction cache, prefetching techniques are used that minimize miss rates, memory access bandwidth, and power use. One of the prefetching techniques operates when a miss occurs. A notification that a fetch address missed in an instruction cache is received. The fetch address that caused the miss is analyzed to determine an attribute of the fetch address and based on the attribute a line of instructions is prefetched. The attribute may indicate that the fetch address is a target address of a non-sequential operation. Another attribute may indicate that the fetch address is a target address of a non-sequential operation and the target address is more than X % into a cache line. A further attribute may indicate that the fetch address is an even address in the instruction cache. Such attributes may be combined to determine whether to prefetch.
    Type: Grant
    Filed: December 8, 2006
    Date of Patent: November 15, 2011
    Assignee: QUALCOMM Incorporated
    Inventors: Michael William Morrow, James Norris Dieffenderfer
  • Publication number: 20110276786
    Abstract: Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
    Type: Application
    Filed: May 4, 2010
    Publication date: November 10, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, John A. Gunnels
  • Patent number: 8055849
    Abstract: Reducing cache pollution of a software controlled cache is provided. A request is received to prefetch data into the software controlled cache. A first designator is set for a first cache access to a first value. If there is the second cache access to prefetch, a determination is made as to whether data associated with the second cache access exists in the software controlled cache. If the data is in the software controlled cache, a determination is made as to whether a second value of a second designator is greater than the first value of the first cache access. If the second value fails to be greater than the first value, the position of the first cache access and the second cache access in a cache line is swapped. The first value is decremented by a predetermined amount and the second value is replaced to equal the first value.
    Type: Grant
    Filed: April 4, 2008
    Date of Patent: November 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, Marc Gonzalez tallada, Zehra N. Sura, Tao Zhang
  • Publication number: 20110264894
    Abstract: A method is provided for controlling a pipeline operation of a processor. The processor is coupled to a memory containing executable computer instructions. The method includes determining a branch instruction to be executed by the processor, and providing both an address of a branch target instruction of the branch instruction and an address of a next instruction following the branch instruction in a program sequence. The method also includes determining a branch decision with respect to the branch instruction based on at least the address of the branch target instruction provided, and selecting at least one of the branch target instruction and the next instruction as a proper instruction to be executed by an execution unit of the processor, based on the branch decision and before the branch instruction is executed by the execution unit, such that the pipeline operation is not stalled whether or not a branch is taken with respect to the branch instruction.
    Type: Application
    Filed: December 31, 2010
    Publication date: October 27, 2011
    Inventor: Kenneth Chenghao Lin
  • Patent number: 8046752
    Abstract: A method and system for creating and injecting code into a running program that identifies a hot data stream, and prefetching data elements in the stream so they are available when needed by the processor. The injected code identifies the first few elements in a hot data stream (i.e. the prefix), and prefetches the balance of the elements in the stream (i.e., the suffix). Since the hot data stream identification code and prefetch code is injected at run time, pointer related time-dependencies inherent in earlier prefetch systems are eliminated. A global deterministic finite state machine (DFSM) is used to help create conceptual logic used to generate the code injected into the program for prefix detection.
    Type: Grant
    Filed: November 15, 2005
    Date of Patent: October 25, 2011
    Assignee: Microsoft Corporation
    Inventors: Trishul Chilimbi, Martin Hirzel
  • Patent number: 8032713
    Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design is provided. The design structure generally includes a computer system that includes a CPU, a storage device, circuitry for providing a speculative access threshold corresponding to a selected percentage of the total number of accesses to the storage device that can be speculatively issued, and circuitry for intermixing demand accesses and speculative accesses in accordance with the speculative access threshold.
    Type: Grant
    Filed: May 5, 2008
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: James J. Allen, Jr., Steven K. Jenkins, James A. Mossman, Michael R. Trombley
  • Publication number: 20110238953
    Abstract: An instruction fetch apparatus is disclosed which includes: a detection state setting section configured to set the execution state of a program of which an instruction prefetch timing is to be detected; a program execution state generation section configured to generate the current execution state of the program; an instruction prefetch timing detection section configured to detect the instruction prefetch timing in the case of a match between the current execution state of the program and the set execution state thereof upon comparison therebetween; and an instruction prefetch section configured to prefetch the next instruction upon detection of the instruction prefetch timing.
    Type: Application
    Filed: February 16, 2011
    Publication date: September 29, 2011
    Applicant: Sony Corporation
    Inventors: Katsuhiko METSUGI, Hiroaki SAKAGUCHI, Hiroshi KOBAYASHI, Hitoshi KAI, Haruhisa YAMAMOTO, Taichi HIRAO, Yousuke MORITA, Koichi HASEGAWA
  • Publication number: 20110231612
    Abstract: One embodiment provides a system that pre-fetches into a sibling cache. During operation, a first thread executes in a first processor core associated with a first cache, while a second thread associated with the first thread simultaneously executes in a second processor core associated with a second cache. During execution, the second thread encounters an instruction that triggers a request to a lower-level cache which is shared by the first cache and the second cache. The system responds to this request by directing a load fill which returns from the lower-level cache in response to the request to the first cache, thereby reducing cache misses for the first thread.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 22, 2011
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Martin R. Karlsson, Shailender Chaudhry, Robert E. Cypher
  • Patent number: 8015379
    Abstract: A wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates the wake-and-go storage array with the target address. In response to the comparison resulting in a determination that the event has occurred, the wake-and-go engine issues a load command on the system bus to read the data value from the target address with data exclusivity and determines whether the wake-and-go engine obtains a lock for the target address. Responsive to obtaining the lock for the target address, the wake-and-go engine holds the lock for the thread.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: September 6, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
  • Publication number: 20110213951
    Abstract: Methods for storing branch information in an address table of a processor are disclosed. A processor of the disclosed embodiments may generally include an instruction fetch unit connected to an instruction cache, a branch execution unit, and an address table being connected to the instruction fetch unit and the branch execution unit. The address table may generally be adapted to store a plurality of entries with each entry of the address table being adapted to store a base address and a base instruction tag. In a further embodiment, the branch execution unit may be adapted to determine the address of a branch instruction having an instruction tag based on the base address and the base instruction tag of an entry of the address table associated with the instruction tag. In some embodiments, the address table may further be adapted to store branch information.
    Type: Application
    Filed: May 5, 2011
    Publication date: September 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian R. Konigsburg, David Stephen Levitan, Wolfram M. Sauer, Samuel Jonathan Thomas
  • Patent number: 8006041
    Abstract: A prefetch processing apparatus includes a central-processing-unit monitor unit that monitors processing states of the central processing unit in association with time elapsed from start time of executing a program. A cache-miss-data address obtaining unit obtains cache-miss-data addresses in association with the time elapsed from the start time of executing the program, and a cycle determining unit determines a cycle of time required for executing the program. An identifying unit identifies a prefetch position in a cycle in which a prefetch-target address is to be prefetched by associating the cycle determined by the cycle determining unit with the cache-miss data addresses obtained by the cache-miss-data address obtaining unit. The prefetch-target address is an address of data on which prefetch processing is to be performed.
    Type: Grant
    Filed: March 5, 2008
    Date of Patent: August 23, 2011
    Assignee: Fujitsu Limited
    Inventors: Shuji Yamamura, Takashi Aoki
  • Patent number: 7996203
    Abstract: A method, system, and computer program product are provided for verifying out of order instruction address (IA) stride prefetch performance in a processor design having more than one level of cache hierarchies. Multiple instruction streams are generated and the instructions loop back to corresponding instruction addresses. The multiple instruction streams are dispatched to a processor and simulation application to process. When a particular instruction is being dispatched, the particular instruction's instruction address and operand address are recorded in the queue. The processor is monitored to determine if the processor executes fetch and prefetch commands in accordance with the simulation application. It is checked to determine if prefetch commands are issued for instructions having three or more strides.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventors: Wei-Yi Xiao, Dean G. Bair, Christopher A. Krygowski, Chung-Lung K. Shum
  • Patent number: 7996445
    Abstract: A data storage system pre-fetches data blocks from a mass storage device, then determines whether reallocation of the pre-fetched blocks would improve access to them. If access would be improved, the pre-fetched blocks are written to different areas of the mass storage device. Several different implementations of such data storage systems are described.
    Type: Grant
    Filed: April 27, 2007
    Date of Patent: August 9, 2011
    Assignee: Network Appliance, Inc.
    Inventors: Robert L. Fair, Robert M. English
  • Publication number: 20110179255
    Abstract: A processor 4 is provided with reset circuitry 48 which generates a reset signal to reset a plurality of state parameters. Partial reset circuitry 50 is additionally provided to reset a proper subset of this plurality of state parameters. The reset circuitry triggers a redirection of program flow. The partial reset circuitry permits a continuation of program flow. The partial reset circuitry may be used to place processors into a known state with a low latency before switching from a split mode of operation into a locked mode of operation.
    Type: Application
    Filed: January 21, 2010
    Publication date: July 21, 2011
    Applicant: ARM Limited
    Inventors: Chiloda Ashan Senerath Pathirane, Antony John Penton, Andrew Christopher Rose
  • Publication number: 20110173417
    Abstract: A wake-and-go mechanism may be a programming idiom accelerator. As a processor fetches instructions, the programming idiom accelerator may look ahead to determine whether a programming idiom is coming up in the instruction stream. If the programming idiom accelerator recognizes a programming idiom, the programming idiom accelerator may perform an action to accelerate execution of the programming idiom. In the case of a wake-and-go programming idiom, the programming idiom accelerator may record an entry in a wake-and-go array, for example.
    Type: Application
    Filed: February 1, 2008
    Publication date: July 14, 2011
    Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
  • Publication number: 20110167243
    Abstract: Techniques and structures are disclosed for a processor supporting checkpointing to operate effectively in scouting mode while a maximum number of supported checkpoints are active. Operation in scouting mode may include using bypass logic and a set of register storage locations to store and/or forward in-flight instruction results that were calculated during scouting mode. These forwarded results may be used during scouting mode to calculate memory load addresses for yet other in-flight instructions, and the processor may accordingly cause data to be prefetched from these calculated memory load addresses. The set of register storage locations may comprise a working register file or an active portion of a multiported register file.
    Type: Application
    Filed: January 5, 2010
    Publication date: July 7, 2011
    Inventors: Sherman H. Yip, Paul Caprioli
  • Publication number: 20110161631
    Abstract: According to an aspect of an embodiment of the invention, an arithmetic processing unit includes a first cache memory unit that holds a part of data stored in a storage device; an address register that holds an address; a flag register that stores flag information; and a decoder that decodes a prefetch instruction for acquiring data stored at the address in the storage device. The arithmetic processing unit further includes an instruction execution unit that executes a cache hit check instruction instead of the prefetch instruction on the basis of a decoded result when the flag information is held, the cache hit check instruction allowing for searching the first cache memory unit with the address to thereby make a first cache hit determination that the first cache memory unit holds the data stored at the address in the storage device.
    Type: Application
    Filed: December 22, 2010
    Publication date: June 30, 2011
    Applicant: FUJITSU LIMITED
    Inventors: Iwao Yamazaki, Hiroyuki Imai
  • Patent number: 7971031
    Abstract: A method, system and computer program for modifying an executing application, comprising monitoring the executing application to identify at least one of a hot load instruction, a hot store instruction and an active prefetch instruction that contributes to cache congestion; where the monitoring identifies a hot load instruction, enabling at least one prefetch associated with the hot load instruction; where the monitoring identifies a hot store instruction, enabling at least one prefetch associated with the hot store instruction; and where the monitoring identifies an active prefetch instruction that contributes to cache congestion, one of disabling the active prefetch instruction and reducing the effectiveness of the active prefetch instruction.
    Type: Grant
    Filed: May 29, 2007
    Date of Patent: June 28, 2011
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Sujoy Saraswati, Teresa Johnson
  • Publication number: 20110153987
    Abstract: A multi-core processor system supporting simultaneous thread sharing across execution resources of multiple processor cores is provided. The multi-core processor system includes a first processor core with a first instruction queue and dispatch logic in communication with a first execution resource of the first processor core. The multi-core processor system also includes a second processor core with a second instruction queue and dispatch logic in communication with a second execution resource of the second processor core. A high-speed execution resource bus couples the first and second processor cores. The first instruction queue and dispatch logic is configured to issue a first instruction of a thread to the first execution resource and issue a second instruction of the thread over the high-speed execution resource bus to the second execution resource for simultaneous execution of the first and second instruction of the thread on the first and second processor cores.
    Type: Application
    Filed: December 17, 2009
    Publication date: June 23, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shawn M. Luke, John Sargis, JR., Daneyand J. Singley
  • Publication number: 20110153988
    Abstract: Methods and apparatus to perform adaptive pre-fetch operations in managed runtime environments are disclosed herein. An example pre-fetch unit for use with a pre-fetch operation includes a first size function executed by a processor to determine a size of an object associated with a pre-fetch operation; a comparator to compare the size of the object associated with the pre-fetch operation to a first one of a plurality of thresholds; and a fetcher to pre-fetch an amount of memory defined by a first one of a plurality of size definitions corresponding to the first threshold when the comparator determines that the size of the object is less than the first threshold.
    Type: Application
    Filed: December 22, 2009
    Publication date: June 23, 2011
    Inventor: Mingqiu Sun
  • Patent number: 7962724
    Abstract: A system and method for management of resource allocation for speculative fetched instructions following small backward branch instructions. An instruction fetch unit speculatively prefetches a memory line for each fetched memory line. Each memory line may have a small backward branch instruction, which is a backward branch instruction that has a target instruction within the same memory line. For each fetched memory line, the instruction fetch unit determines if a small backward branch instruction exists among the instructions within the memory line. If a small backward branch instruction is found and predicted taken, then the instruction fetch unit inhibits the speculative prefetch for that particular thread. The speculative prefetch may resume for that thread after the branch loop is completed. System resources may be better allocated during the iterations of the small backward branch loop.
    Type: Grant
    Filed: September 28, 2007
    Date of Patent: June 14, 2011
    Assignee: Oracle America, Inc.
    Inventor: Abid Ali
  • Patent number: 7949830
    Abstract: A system and method for handling speculative read requests for a memory controller in a computer system are provided. In one example, a method includes the steps of providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and intermixing demand reads and speculative reads in accordance with the speculative read threshold. In another example, a computer system includes a CPU, a memory controller, memory, a bus connecting the CPU, memory controller and memory, circuitry for providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and circuitry for intermixing demand reads and speculative reads in accordance with the speculative read threshold.
    Type: Grant
    Filed: December 10, 2007
    Date of Patent: May 24, 2011
    Assignee: International Business Machines Corporation
    Inventors: James Johnson Allen, Jr., Steven Kenneth Jenkins, James A. Mossman, Michael Raymond Trombley
  • Patent number: 7950012
    Abstract: One embodiment of the present invention provides a system for communicating and performing synchronization operations between a main thread and a helper-thread. The system starts by executing a program in a main thread. Upon encountering a loop which has associated helper-thread code, the system commences the execution of the code by the helper-thread separately and in parallel with the main thread. While executing the code by the helper-thread, the system periodically checks the progress of the main thread and deactivates the helper-thread if the code being executed by the helper-thread is no longer performing useful work. Hence, the helper-thread is executes in advance of where the main thread is executing to prefetch data items for the main thread without unnecessarily consuming processor resources or hampering the execution of the main thread.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: May 24, 2011
    Assignee: Oracle America, Inc.
    Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
  • Patent number: 7941638
    Abstract: One embodiment of the present invention provides a system that performs a fast-scanning operation to generate fetch bundles within an instruction fetch unit (IFU) of a processor. During operation, the system obtains a cache line containing instructions at the IFU. Next, the system performs a complete-scanning operation on the cache line to identify control transfer instructions (CTIs) in the cache line. At the same time, the system performs a fast-scanning operation to identify CTIs in a group of initial instructions in the cache line, wherein the initial instructions are executed before other instructions in the cache line. Next, the system obtains results from the fast-scanning operation before results of the complete-scanning operation are available. The system then uses results from the fast-scanning operation to form an initial fetch bundle containing initial instructions, and sends the initial fetch bundle to the instruction-issue unit.
    Type: Grant
    Filed: June 15, 2006
    Date of Patent: May 10, 2011
    Assignee: Oracle America, Inc.
    Inventors: Abid Ali, Andrew T. Ewoldt
  • Patent number: 7941639
    Abstract: A method for protecting the execution of a main program against possible traps, including, on occurrence of an instruction from the main program, starting a time counter of a given count according to next instructions of the main program, and executing, once the counter has reached its count, at least one instruction of a secondary program from which the result of the main program depends.
    Type: Grant
    Filed: July 5, 2006
    Date of Patent: May 10, 2011
    Assignee: STMicroelectronics S.A.
    Inventors: Yannick Teglia, Pierre-Yvan Liardet, Alain Pomet
  • Patent number: 7937533
    Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design is provided. The design structure generally includes a computer system that includes a CPU, a memory controller, memory, a bus connecting the CPU, memory controller and memory, circuitry for providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and circuitry for intermixing demand reads and speculative reads in accordance with the speculative read threshold.
    Type: Grant
    Filed: May 4, 2008
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: James J. Allen, Jr., Steven K. Jenkins, James A. Mossman, Michael R. Trombley
  • Patent number: 7930485
    Abstract: A system and method for pre-fetching data from system memory. A multi-core processor accesses a cache hit predictor concurrently with sending a memory request to a cache subsystem. The predictor has two tables. The first table is indexed by a portion of a memory address and provides a hit prediction based on a first counter value. The second table is indexed by a core number and provides a hit prediction based on a second counter value. If neither table predicts a hit, a pre-fetch request is sent to memory. In response to detecting said hit prediction is incorrect, the pre-fetch is cancelled.
    Type: Grant
    Filed: July 19, 2007
    Date of Patent: April 19, 2011
    Assignee: Globalfoundries Inc.
    Inventors: Michael K Fertig, Patrick Conway, Kevin Michael Lepak, Cissy Xumin Yuan
  • Patent number: 7925865
    Abstract: In the described embodiments, a method for prefetching data and/or instructions may include generating control flow information for each retired branch instruction. A correlation table may be maintained based on the generated control flow information and cache miss addresses for each retired instruction that incurs one or more cache misses. Each correlation table entry may correspond to an index, and may contain a tag and a correlation list. The correlation list may consist of a specified number of cache miss addresses that most frequently follow the cache miss address for the index. A prefetch operation may be performed for each cache miss based on the contents of the correlation table entry corresponding to the index. The index may generated using a combination of bits of a given cache miss address and one or more bits of the program control flow information for the given cache miss address.
    Type: Grant
    Filed: June 2, 2008
    Date of Patent: April 12, 2011
    Assignee: Oracle America, Inc.
    Inventors: Yuan C. Chou, Yasuko Watanabe
  • Publication number: 20110078413
    Abstract: An arithmetic processing apparatus includes an arithmetic circuit; a first memory configured to store data to be processed in the arithmetic circuit; a second memory configured to be accessed through a first path by the arithmetic circuit; a preloader configured to preload the data from the second memory into the first memory through a second path; a memory controller configured to arbitrate between a first access by the arithmetic circuit using the first path and a second access by the preloader using the second path; and a scheduler configured to control the memory controller.
    Type: Application
    Filed: September 28, 2010
    Publication date: March 31, 2011
    Applicant: FUJITSU LIMITED
    Inventors: Hiromasa YAMAUCHI, Koichiro Yamashita
  • Patent number: 7917731
    Abstract: A processor performs a prefetch operation on non-sequential instruction addresses. If a first instruction address misses in an instruction cache and accesses a higher-order memory as part of a fetch operation, and a branch instruction associated with the first instruction address or an address following the first instruction address is detected and predicted taken, a prefetch operation is performed using a predicted branch target address, during the higher-order memory access. If the predicted branch target address hits in the instruction cache during the prefetch operation, associated instructions are not retrieved, to conserve power. If the predicted branch target address misses in the instruction cache during the prefetch operation, a higher-order memory access may be launched, using the predicted branch instruction address. In either case, the first instruction address is re-loaded into the fetch stage pipeline to await the return of instructions from its higher-order memory access.
    Type: Grant
    Filed: August 2, 2006
    Date of Patent: March 29, 2011
    Assignee: QUALCOMM Incorporated
    Inventors: Brian Michael Stempel, Thomas Andrew Sartorius, Rodney Wayne Smith
  • Patent number: 7913064
    Abstract: The present subject matter relates to operation frame filtering, building, and execution. Some embodiments include identifying a frame signature, counting a number of execution occurrences of the frame signature, and building a frame of operations to execute instead of operations identified by the frame signature.
    Type: Grant
    Filed: March 30, 2009
    Date of Patent: March 22, 2011
    Assignee: Intel Corporation
    Inventors: Stephan Jourdan, Per Hammarlund, Alexandre Farcy, John Alan Miller
  • Patent number: 7904660
    Abstract: A computer system and a method for enhancing the cache prefetch behavior. A computer system including a processor, a main memory, a prefetch controller, a cache memory, a prefetch buffer, and a main memory, wherein each page in the main memory has associated with it a tag, which is used for controlling the prefetching of a variable subset of lines from this page as well as lines from at least one other page. And, coupled to the processor is a prefetch controller, wherein the prefetch controller responds to the processor determining a fault (or miss) occurred to a line of data by fetching a corresponding line of data with the corresponding tag, with the corresponding tag to be stored in the prefetch buffer, and sending the corresponding line of data to the cache memory.
    Type: Grant
    Filed: August 23, 2007
    Date of Patent: March 8, 2011
    Assignee: International Business Machines Corporation
    Inventor: Peter Franaszek
  • Patent number: 7899996
    Abstract: Adaptively pre-fetching data includes collecting a first set of statistics based on a number of avoidable read-misses in which data exists that is prior to data being read, collecting a second set of statistics based on a number of avoidable read-misses in which data exists that follows data being read, and collecting a third set of statistics based on said first and second sets of statistics. On the basis of the second set of statistics, a pre-fetch technique is selected from a first technique that pre-fetches data following data being read and a second technique that pre-fetches data before and following the data being read. The first and third set of statistics may be used to determine when to pre-fetch data.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: March 1, 2011
    Assignee: EMC Corporation
    Inventor: Orit Levin-Michael
  • Patent number: 7895399
    Abstract: A processor reads a program including a prefetch command and a load command and data from a main memory, and executes the program. The processor includes: a processor core that executes the program; a L2 cache that stores data on the main memory for each predetermined unit of data storage; and a prefetch unit that pre-reads the data into the L2 cache from the main memory on the basis of a request for prefetch from the processor core. The prefetch unit includes: a L2 cache management table including an area in which a storage state is held for each position in the unit of data storage of the L2 cache and an area in which a request for prefetch is reserved; and a prefetch control unit that instructs, the L2 cache to perform the request for prefetch reserved or the request for prefetch from the processor core.
    Type: Grant
    Filed: February 13, 2007
    Date of Patent: February 22, 2011
    Assignee: Hitachi, Ltd.
    Inventors: Aki Tomita, Naonobu Sukegawa
  • Publication number: 20110035551
    Abstract: A microprocessor includes an instruction decoder for decoding a repeat prefetch indirect instruction that includes address operands used to calculate an address of a first entry in a prefetch table having a plurality of entries, each including a prefetch address. The repeat prefetch indirect instruction also includes a count specifying a number of cache lines to be prefetched. The memory address of each of the cache lines is specified by the prefetch address in one of the entries in the prefetch table. A count register, initially loaded with the count specified in the prefetch instruction, stores a remaining count of the cache lines to be prefetched. Control logic fetches the prefetch addresses of the cache lines from the table into the microprocessor and prefetches the cache lines from the system memory into a cache memory of the microprocessor using the count register and the prefetch addresses fetched from the table.
    Type: Application
    Filed: October 15, 2009
    Publication date: February 10, 2011
    Inventors: Rodney E. Hooker, John Michael Greer
  • Publication number: 20110010506
    Abstract: A data prefetcher includes a table of entries to maintain a history of load operations. Each entry stores a tag and a corresponding next stride. The tag comprises a concatenation of first and second strides. The next stride comprises the first stride. The first stride comprises a first cache line address subtracted from a second cache line address. The second stride comprises the second cache line address subtracted from a third cache line address. The first, second and third cache line addresses each comprise a memory address of a cache line implicated by respective first, second and third temporally preceding load operations. Control logic calculates a current stride by subtracting a previous cache line address from a new load cache line address, looks up in the table a concatenation of a previous stride and the current stride, and prefetches a cache line using the hitting table entry next stride.
    Type: Application
    Filed: October 5, 2009
    Publication date: January 13, 2011
    Applicant: VIA Technologies, Inc.
    Inventors: John Michael Greer, Rodney E. Hooker, Albert J. Loper, JR.
  • Patent number: 7870369
    Abstract: A method of determining a reason for a trace to be aborted includes receiving at least two incoming indications of occurrences of abort triggers stemming from the execution of at least two of the operations that are different from each other, where each of the abort triggers has an associated abort priority level, and where the trace represents multiple instructions. The method further includes prioritizing among the abort triggers for the trace based on the abort priority level of each abort trigger, where the prioritizing does not take into account a correspondence between operations and instructions and where the prioritizing selects as a pending abort reason one or more of the abort triggers that have the same abort priority level, and where that abort priority level is the highest among the abort priority levels of the abort triggers for the trace.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: January 11, 2011
    Assignee: Oracle America, Inc.
    Inventors: Christopher Patrick Nelson, John Gregory Favor, Richard Win Thaik
  • Publication number: 20100332800
    Abstract: An instruction control device connects to a cache memory that stores data frequently used among data stored in a main memory. The instruction control device includes: a first free-space determining unit that determines whether there is free space in an instruction buffer; a second free-space determining unit that manages an instruction fetch request queue that stores an instruction fetch data to be sent from the cache memory to the main memory, and determines whether a move-in buffer in the cache memory has free space for at least two entries if the first free-space determining unit determines that there is free space; and an instruction control unit that outputs an instruction prefetch request to the cache memory in accordance with an address boundary corresponding to a line size of the cache line, if the second free-space determining unit determines that the move-in buffer has free space.
    Type: Application
    Filed: June 29, 2010
    Publication date: December 30, 2010
    Applicant: Fujitsu Limited
    Inventor: Ryuichi Sunayama
  • Publication number: 20100318707
    Abstract: An external device access apparatus according to the present invention includes: an address control unit that accepts a prefetch request and a prefetch data readout request from a master and performs a prefetch operation and a prefetch data readout operation; a readout data storage unit that stores data read out through the prefetch operation; a storage operation status holding unit that holds a prefetch operation status indicating whether or not the prefetch operation has been completed; and an acceptance signal generation unit that outputs, to the master, an acceptance signal indicating that the prefetch data readout request has been accepted from the master. First information indicating a status of the prefetch operation is outputted to the master based on the prefetch operation status.
    Type: Application
    Filed: August 13, 2008
    Publication date: December 16, 2010
    Applicant: PANASONIC CORPORATION
    Inventors: Tsuyoshi Tanaka, Nobuo Higaki, Takasi Inoue, Yosuke Kudo, Kazushi Kurata
  • Publication number: 20100306503
    Abstract: A microprocessor includes a cache memory, an instruction set having first and second prefetch instructions each configured to instruct the microprocessor to prefetch a cache line of data from a system memory into the cache memory, and a memory subsystem configured to execute the first and second prefetch instructions. For the first prefetch instruction the memory subsystem is configured to forego prefetching the cache line of data from the system memory into the cache memory in response to a predetermined set of conditions. For the second prefetch instruction the memory subsystem is configured to complete prefetching the cache line of data from the system memory into the cache memory in response to the predetermined set of conditions.
    Type: Application
    Filed: May 17, 2010
    Publication date: December 2, 2010
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Colin Eddy, Rodney E. Hooker
  • Patent number: 7840761
    Abstract: A processor executes one or more prefetch threads and one or more main computing threads. Each prefetch thread executes instructions ahead of a main computing thread to retrieve data for the main computing thread, such as data that the main computing thread may use in the immediate future. Data is retrieved for the prefetch thread and stored in a memory, such as data fetched from an external memory and stored in a buffer. A prefetch controller determines whether the memory is full. If the memory is full, a cache controller stalls at least one prefetch thread. The stall may continue until at least some of the data is transferred from the memory to a cache for use by at least one main computing thread. The stalled prefetch thread or threads are then reactivated.
    Type: Grant
    Filed: April 1, 2005
    Date of Patent: November 23, 2010
    Assignee: STMicroelectronics, Inc.
    Inventors: Osvaldo M. Colavin, Davide Rizzo
  • Patent number: 7836262
    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: November 16, 2010
    Assignee: Apple Inc.
    Inventors: Ramesh Gunna, Sudarshan Kadambi
  • Patent number: 7836281
    Abstract: A system that facilitates improving performance of a processor during scout mode. During a normal-execution mode, the system executes instructions for using main thread. Upon encountering a stall condition during execution of the main thread, the system generates a checkpoint. The system then enters a scout mode, wherein instructions are speculatively executed by a speculative thread to prefetch future memory references, but results are not committed to the architectural state of the processor. Upon encountering a memory reference during scout mode, the system issues a prefetch for the memory reference. If the stall condition that caused the processor to enter scout mode is resolved, the system uses the checkpoint to resume execution of the main thread from the instruction that caused the stall condition, and simultaneously continues executing instructions in scout mode using the speculative thread from the point where the speculative thread left off.
    Type: Grant
    Filed: October 6, 2005
    Date of Patent: November 16, 2010
    Assignee: Oracle America, Inc.
    Inventors: Marc Tremblay, Shailender Chaudhry