Prefetching Patents (Class 712/207)
  • Patent number: 8484437
    Abstract: A data processing apparatus includes a pre-fetch unit configured to divide and store data, a validation setting unit configured to store information regarding whether or not the data stored in the pre-fetch unit are valid, an address generation unit configured to generate an address for reading/storing the data from/in the pre-fetch unit, and a pre-fetch control unit configured to control a storage position of the data in the pre-fetch unit by using the address and information of the address generation unit and the validation setting unit.
    Type: Grant
    Filed: September 7, 2010
    Date of Patent: July 9, 2013
    Assignee: Hynix Semiconductor
    Inventor: Seok-In Kim
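As an aside, a minimal C sketch of the mechanism this abstract describes: a divided prefetch buffer whose per-slot valid bits gate reads. The slot-selection rule, sizes, and all names below are illustrative assumptions, not details from the patent.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SLOTS 8  /* hypothetical number of divided storage slots */

typedef struct {
    uint32_t data[SLOTS];  /* pre-fetch unit: divided data storage */
    bool     valid[SLOTS]; /* validation setting unit: per-slot valid bits */
} prefetch_buffer;

/* Hypothetical address generation: low-order address bits pick the slot. */
static unsigned slot_for(uint32_t addr) { return (addr / 4) % SLOTS; }

/* Prefetch control: store the data and mark the slot valid. */
static void pb_store(prefetch_buffer *pb, uint32_t addr, uint32_t value) {
    unsigned s = slot_for(addr);
    pb->data[s] = value;
    pb->valid[s] = true;
}

/* A read succeeds only when the validation unit says the slot is valid. */
static bool pb_read(const prefetch_buffer *pb, uint32_t addr, uint32_t *out) {
    unsigned s = slot_for(addr);
    if (!pb->valid[s]) return false;
    *out = pb->data[s];
    return true;
}

int main(void) {
    prefetch_buffer pb = {0};
    uint32_t v;
    pb_store(&pb, 0x100, 0xDEADBEEF);
    printf("hit=%d\n", pb_read(&pb, 0x100, &v)); /* hit=1 */
    printf("hit=%d\n", pb_read(&pb, 0x104, &v)); /* hit=0: slot not valid */
    return 0;
}
```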
  • Patent number: 8484436
    Abstract: A memory controller is configured to receive read requests from a processor and return memory words from memory. The memory controller comprises an address comparator and a loop entry cache. The address comparator is configured to determine a difference between a previous read request address and a current read request address. The address comparator is also configured to determine whether the difference is positive and less than a certain address difference and, if so, indicate a limited backwards jump. The loop entry cache is configured to store a current memory word for the current read request address when the address comparator indicates a limited backwards jump.
    Type: Grant
    Filed: September 2, 2010
    Date of Patent: July 9, 2013
    Assignee: Atmel Corporation
    Inventors: Franck Lunadier, Frédéric Schumacher
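The address-comparator test in this abstract is compact enough to sketch directly in C; the jump threshold and names below are assumptions, not values from the patent.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_BACK_JUMP 64  /* hypothetical "certain address difference" */

/* prev - curr positive and small => limited backwards jump, i.e. the
 * processor likely branched back to the top of a short loop, so the
 * loop entry cache should store the current memory word. */
static bool limited_backwards_jump(uint32_t prev, uint32_t curr) {
    int64_t diff = (int64_t)prev - (int64_t)curr;
    return diff > 0 && diff < MAX_BACK_JUMP;
}

int main(void) {
    /* Small backwards jump: cache the word for the loop entry. */
    printf("loop entry: %d\n", limited_backwards_jump(0x1020, 0x1000));
    /* Sequential access: no caching. */
    printf("sequential: %d\n", limited_backwards_jump(0x1000, 0x1004));
    return 0;
}
```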
  • Patent number: 8462789
    Abstract: A network processor of an embodiment includes a packet classification engine, a processing pipeline, and a controller. The packet classification engine allows for classifying each of a plurality of packets according to packet type. The processing pipeline has a plurality of stages for processing each of the plurality of packets in a pipelined manner, where each stage includes one or more processors. The controller allows for providing the plurality of packets to the processing pipeline in an order that is based at least partially on: (i) packet types of the plurality of packets as classified by the packet classification engine and (ii) estimates of processing times for processing packets of the packet types at each stage of the plurality of stages of the processing pipeline. A method in a network processor allows for prefetching instructions into a cache for processing a packet based on a packet type of the packet.
    Type: Grant
    Filed: May 9, 2012
    Date of Patent: June 11, 2013
    Inventor: Justin Mark Sobaje
  • Publication number: 20130138922
    Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
    Type: Application
    Filed: November 29, 2011
    Publication date: May 30, 2013
    Applicant: International Business Machines Corporation
    Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
  • Patent number: 8453161
    Abstract: This disclosure describes a method and system that may enable fast, hardware-assisted, producer-consumer style communication of values between threads. The method, in one aspect, uses a dedicated hardware buffer as an intermediary storage for transferring values from registers in one thread to registers in another thread. The method may provide a generic, programmable solution that can transfer any subset of register values between threads in any given order, where the source and target registers may or may not be correlated. The method also may allow for determinate access times, since it completely bypasses the memory hierarchy. Also, the method is designed to be lightweight, focusing on communication, and keeping synchronization facilities orthogonal to the communication mechanism. It may be used by a helper thread that performs data prefetching for an application thread, for example, to initialize the upward-exposed reads in the address computation slice of the helper thread code.
    Type: Grant
    Filed: May 25, 2010
    Date of Patent: May 28, 2013
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
  • Publication number: 20130124828
    Abstract: The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. When an instruction retires during the lookahead mode, a working register which serves as a destination register for the instruction is not copied to a corresponding architectural register. Instead the architectural register is marked as invalid. Note that by not updating architectural registers during lookahead mode, the system eliminates the need to checkpoint the architectural registers prior to entering lookahead mode.
    Type: Application
    Filed: November 10, 2011
    Publication date: May 16, 2013
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Yuan C. Chou, Eric W. Mahurin
  • Publication number: 20130124829
    Abstract: The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. While executing in the lookahead mode, if the processor determines that the lookahead mode is unlikely to uncover any additional outer-level cache misses, the system terminates the lookahead mode. Then, after the unresolved data dependency is resolved, the system recommences execution in the normal-execution mode from the instruction that triggered the lookahead mode.
    Type: Application
    Filed: November 10, 2011
    Publication date: May 16, 2013
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Yuan C. Chou, Eric W. Mahurin
  • Patent number: 8443171
    Abstract: The present invention provides a system and method for runtime updating of hints in program instructions. The invention also provides for programs of instructions that include hint performance data. Also, the invention provides an instruction cache that modifies hints and writes them back. As runtime hint updates are stored in instructions, the impact of the updates is not limited by the limited memory capacity local to a processor. Also, there is no conflict between hardware and software hints, as they can share a common encoding in the program instructions.
    Type: Grant
    Filed: July 30, 2004
    Date of Patent: May 14, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Dale Morris, James E. McCormick
  • Patent number: 8443151
    Abstract: An apparatus and method are described herein for optimizing prefetch throttling, which potentially enhances performance, reduces power consumption, and maintains positive gain for workloads that benefit from prefetching. More specifically, the optimizations described herein allow bandwidth congestion and prefetch accuracy to be taken into account as feedback for throttling at the source of prefetch generation. As a result, when congestion is low, full prefetch generation is allowed, even if the prefetches are inaccurate, since bandwidth is available. However, when congestion is high, the throttling decision falls to prefetch accuracy: if accuracy is high (the miss rate is low), less throttling is needed, because the prefetches are being utilized and performance is being enhanced.
    Type: Grant
    Filed: November 9, 2009
    Date of Patent: May 14, 2013
    Assignee: Intel Corporation
    Inventors: Puqi P. Tang, Hemant G. Rotithor, Ryan L. Carlson, Nagi Aboulenein
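A short C sketch of the two-feedback throttling policy the abstract describes; the thresholds and function names are hypothetical, not values from the patent.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical thresholds; the patent does not publish concrete values. */
#define CONGESTION_HIGH 0.75 /* fraction of memory bandwidth in use */
#define ACCURACY_HIGH   0.50 /* fraction of prefetches that were used */

/* Throttle at the source of prefetch generation: with spare bandwidth,
 * allow full prefetching even if inaccurate; under congestion, let
 * prefetch accuracy decide whether to keep generating. */
static bool allow_prefetch(double congestion, double accuracy) {
    if (congestion < CONGESTION_HIGH)
        return true;                  /* low congestion: always generate */
    return accuracy >= ACCURACY_HIGH; /* high congestion: only if useful */
}

int main(void) {
    printf("%d\n", allow_prefetch(0.30, 0.10)); /* 1: bandwidth to spare */
    printf("%d\n", allow_prefetch(0.90, 0.80)); /* 1: congested but useful */
    printf("%d\n", allow_prefetch(0.90, 0.10)); /* 0: congested and wasted */
    return 0;
}
```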
  • Patent number: 8423720
    Abstract: A computer system having a main unit and an expansion unit connected by an interface arrangement. The expansion unit includes at least one connector for receiving an input/output component, so that additional input/output components can be added to the computer system. The interface arrangement includes at least one cache controller and at least one cache memory for monitoring and predicting requests exchanged between the main unit and the expansion unit. A method of caching and processing input/output requests and a storage medium are also provided.
    Type: Grant
    Filed: May 5, 2008
    Date of Patent: April 16, 2013
    Assignee: International Business Machines Corporation
    Inventor: Andreas Christian Döring
  • Patent number: 8413127
    Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: April 2, 2013
    Assignee: International Business Machines Corporation
    Inventors: Roch G. Archambault, Robert J. Blainey, Yaoqing Gao, Allan R. Martin, James L. McInnes, Francis Patrick O'Connell
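The flavor of code such a compiler pass might emit can be sketched with the GCC/Clang __builtin_prefetch intrinsic; the prefetch distance below is an illustrative assumption, and the patent's actual stream classification and scheduling are far more involved.

```c
#include <stdio.h>

#define N 1024
#define PREFETCH_DISTANCE 16 /* hypothetical: chosen by the cost analysis */

/* The kind of loop a compiler might emit for a stream it classified as
 * likely to miss: a software prefetch runs a fixed distance ahead of the
 * demand loads, using the GCC/Clang __builtin_prefetch intrinsic. */
static long sum_stream(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        s += a[i];
    }
    return s;
}

int main(void) {
    static long a[N];
    for (int i = 0; i < N; i++) a[i] = i;
    printf("%ld\n", sum_stream(a, N));
    return 0;
}
```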
  • Patent number: 8387053
    Abstract: A method of performing operations in a computer system, a computer system, and a related method of compilation are disclosed. In one embodiment, the method includes providing compiled code having at least one thread, where each thread includes a respective plurality of blocks and each block includes a respective pre-fetch component and a respective execute component. The method also includes performing a first pre-fetch component from a first block of a first thread, performing a first additional component after the first pre-fetch component has been performed, and performing a first execute component from the first block of the first thread. The first execute component is performed after the first additional component has been performed, and the first additional component is from either a second thread or another block of the first thread other than the first block.
    Type: Grant
    Filed: January 25, 2007
    Date of Patent: February 26, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Blaine D. Gaither, Verna Knapp, Jerome Huck, Benjamin D. Osecky
  • Patent number: 8364907
    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
    Type: Grant
    Filed: January 27, 2012
    Date of Patent: January 29, 2013
    Assignee: Apple Inc.
    Inventors: Ramesh Gunna, Sudarshan Kadambi
  • Patent number: 8364902
    Abstract: A microprocessor includes an instruction decoder for decoding a repeat prefetch indirect instruction that includes address operands used to calculate an address of a first entry in a prefetch table having a plurality of entries, each including a prefetch address. The repeat prefetch indirect instruction also includes a count specifying a number of cache lines to be prefetched. The memory address of each of the cache lines is specified by the prefetch address in one of the entries in the prefetch table. A count register, initially loaded with the count specified in the prefetch instruction, stores a remaining count of the cache lines to be prefetched. Control logic fetches the prefetch addresses of the cache lines from the table into the microprocessor and prefetches the cache lines from the system memory into a cache memory of the microprocessor using the count register and the prefetch addresses fetched from the table.
    Type: Grant
    Filed: October 15, 2009
    Date of Patent: January 29, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: Rodney E. Hooker, John Michael Greer
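A software model of the repeat prefetch indirect semantics: walk a table of prefetch addresses while a count register decrements, prefetching the line named by each entry. The GCC/Clang __builtin_prefetch intrinsic stands in for the hardware prefetch, and the names are hypothetical.

```c
#include <stdio.h>

/* Walk the prefetch table, decrementing the count register and issuing
 * one cache-line prefetch per table entry. */
static void repeat_prefetch_indirect(const void *const *table,
                                     unsigned count) {
    while (count-- > 0) {
        __builtin_prefetch(*table, 0, 3); /* prefetch line at *table */
        table++;
    }
}

int main(void) {
    static int block_a[64], block_b[64], block_c[64];
    const void *table[] = { block_a, block_b, block_c };
    repeat_prefetch_indirect(table, 3); /* count = 3 lines */
    printf("issued 3 indirect prefetches\n");
    return 0;
}
```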
  • Patent number: 8341357
    Abstract: One embodiment provides a system that pre-fetches into a sibling cache. During operation, a first thread executes in a first processor core associated with a first cache, while a second thread associated with the first thread simultaneously executes in a second processor core associated with a second cache. During execution, the second thread encounters an instruction that triggers a request to a lower-level cache which is shared by the first cache and the second cache. The system responds by directing the load fill that returns from the lower-level cache to the first cache, thereby reducing cache misses for the first thread.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: December 25, 2012
    Assignee: Oracle America, Inc.
    Inventors: Martin R. Karlsson, Shailender Chaudhry, Robert E. Cypher
  • Publication number: 20120297169
    Abstract: A data processing apparatus which sequentially executes a verification process so as to recognize a target object, comprising: an obtaining unit configured to obtain dictionary data to be referred to in the verification process; a holding unit configured to hold a plurality of dictionary data; a verification unit configured to execute the verification process for the input data by referring to one dictionary data; a history holding unit configured to hold a verification result; and a prefetch determination unit configured to determine based on the verification result whether to execute prefetch processing in which the obtaining unit obtains in advance dictionary data to be referred to by the verification unit in a succeeding verification process, and holds the dictionary data in the holding unit before the succeeding verification process.
    Type: Application
    Filed: May 10, 2012
    Publication date: November 22, 2012
    Applicant: CANON KABUSHIKI KAISHA
    Inventor: Akiyoshi Momoi
  • Patent number: 8291202
    Abstract: Techniques for interrupt processing are described. An exceptional condition is detected in one or more stages of an instruction pipeline in a processor. In response to the detected exceptional condition, and prior to the processor accepting an interrupt in response to it, an instruction cache is checked for the presence of the instruction at the starting address of the interrupt handler. When that instruction is not present in the instruction cache, it is prefetched from storage above the instruction cache and loaded into the instruction cache, whereby the instruction is made available in the instruction cache by the time the processor accepts the interrupt.
    Type: Grant
    Filed: August 8, 2008
    Date of Patent: October 16, 2012
    Inventors: Daren Eugene Streett, Brian Michael Stempel
  • Patent number: 8271750
    Abstract: A data processing system includes a data store having storage locations storing entries which can be used for a variety of purposes, such as operand value prediction, branch prediction, etc. An entry profile store stores profile data for more candidate entries than there are storage locations within the data store. The profile data is used to determine replacement policy for entries within the data store. The profile data can include hash values used to determine whether predictions associated with candidate entries were correct without having to store the full predictions within the profile data.
    Type: Grant
    Filed: January 18, 2008
    Date of Patent: September 18, 2012
    Assignee: ARM Limited
    Inventors: Sami Yehia, Marios Kleanthous
  • Publication number: 20120233442
    Abstract: Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The first request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries.
    Type: Application
    Filed: March 11, 2011
    Publication date: September 13, 2012
    Inventors: Manish K. Shah, Gregory F. Grohoski, Zeid H. Samoail
  • Patent number: 8266381
    Abstract: In at least one embodiment, a processor detects during execution of program code whether a load instruction within the program code is associated with a hint. In response to detecting that the load instruction is not associated with a hint, the processor retrieves a full cache line of data from the memory hierarchy into the processor in response to the load instruction. In response to detecting that the load instruction is associated with a hint, a processor retrieves a partial cache line of data into the processor from the memory hierarchy in response to the load instruction.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: September 11, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Gheorghe C. Cascaval, Balaram Sinharoy, William E. Speight, Lixin Zhang
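The fetch-size decision here reduces to a one-line policy; the line and granule sizes below are illustrative assumptions, not values from the patent.

```c
#include <stdbool.h>
#include <stdio.h>

#define FULL_LINE    128 /* hypothetical cache line size in bytes */
#define PARTIAL_LINE 32  /* hypothetical partial-line granule */

/* Hint-directed fetch size: an unhinted load pulls in a full cache line;
 * a hinted load pulls in only a partial line, saving memory bandwidth
 * when the rest of the line would go unused. */
static unsigned bytes_to_fetch(bool has_hint) {
    return has_hint ? PARTIAL_LINE : FULL_LINE;
}

int main(void) {
    printf("unhinted load fetches %u bytes\n", bytes_to_fetch(false));
    printf("hinted load fetches %u bytes\n", bytes_to_fetch(true));
    return 0;
}
```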
  • Publication number: 20120226892
    Abstract: One embodiment of the present invention provides a system that generates code for a scout thread to prefetch data values for a main thread. During operation, the system compiles source code for a program to produce executable code for the program. This compilation process involves performing reuse analysis to identify prefetch candidates which are likely to be touched during execution of the program. Additionally, this compilation process produces executable code for the scout thread which contains prefetch instructions to prefetch the identified prefetch candidates for the main thread. In this way, the scout thread can subsequently be executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.
    Type: Application
    Filed: March 16, 2005
    Publication date: September 6, 2012
    Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
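A rough pthread sketch of the scout-thread pattern, with the scout prefetching ahead of the main thread via __builtin_prefetch; in the patent the scout code is compiler-generated and prefetches only the identified candidates. Compile with -pthread.

```c
#include <pthread.h>
#include <stdio.h>

#define N (1 << 20)
static int data[N];

/* Scout thread: runs ahead of the main thread, touching the data the
 * main thread will need so its demand loads hit in cache. */
static void *scout(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i += 16)  /* roughly one touch per 64B line */
        __builtin_prefetch(&data[i], 0, 1);
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1;
    pthread_t t;
    pthread_create(&t, NULL, scout, NULL); /* scout runs in parallel */
    long long sum = 0;                     /* main thread's real work */
    for (int i = 0; i < N; i++) sum += data[i];
    pthread_join(t, NULL);
    printf("%lld\n", sum);                 /* prints 1048576 */
    return 0;
}
```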
  • Patent number: 8260990
    Abstract: A system and method for selective preclusion of bus access requests are disclosed. In an embodiment, a method includes determining a bus unit access setting at a logic circuit of a processor. The method also includes selectively precluding a bus unit access request based on the bus unit access setting.
    Type: Grant
    Filed: November 19, 2007
    Date of Patent: September 4, 2012
    Assignee: QUALCOMM Incorporated
    Inventors: Lucian Codrescu, Ajay Anant Ingle, Christopher Edward Koob, Erich James Plondke
  • Patent number: 8250307
    Abstract: According to a method of data processing, a memory controller receives a prefetch load request from a processor core of a data processing system. The prefetch load request specifies a requested line of data. In response to receipt of the prefetch load request, the memory controller determines by reference to a stream of demand requests how much data is to be supplied to the processor core in response to the prefetch load request. In response to the memory controller determining to provide less than all of the requested line of data, the memory controller provides less than all of the requested line of data to the processor core.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Gheorghe C. Cascaval, Balaram Sinharoy, William E. Speight, Lixin Zhang
  • Patent number: 8250231
    Abstract: A method to reduce buffer capacity in a processor includes giving the data packets admittance to the processor through at least one interface, storing the data packets in at least one input buffer, and using a packet rate shaper outside of a processing pipeline to control flow of the data packets to the pipeline before the data packets enter the pipeline. First and second data packets are given admittance to the pipeline in dependence on cost information per packet that is dependent upon an expected time period of residence of the first data packet in the pipeline. Cost information dependent upon an expected time period of residence of the second data packet in the pipeline differs from said cost information dependent upon the expected time period of residence of the first data packet in the pipeline.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: August 21, 2012
    Assignee: Marvell International Ltd.
    Inventors: Thomas Bodén, Jakob Carlström
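One plausible reading of the cost-based shaper, sketched as a credit scheme in C: a packet is admitted to the pipeline only when accumulated credit covers its cost, where cost models the packet's expected residence time in the pipeline. The credit mechanism and all names are assumptions, not details from the patent.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    long credit;          /* accumulated admission credit */
    long credit_per_tick; /* refill rate */
} shaper;

/* Admit a packet only if the shaper can pay its residence-time cost;
 * otherwise it waits in the input buffer. */
static bool admit(shaper *s, long cost) {
    if (s->credit < cost) return false;
    s->credit -= cost;
    return true;
}

static void tick(shaper *s) { s->credit += s->credit_per_tick; }

int main(void) {
    shaper s = { 100, 50 };
    printf("pkt A (cost 80): %s\n", admit(&s, 80) ? "enter" : "wait");
    printf("pkt B (cost 80): %s\n", admit(&s, 80) ? "enter" : "wait");
    tick(&s); tick(&s); /* credit accrues while pkt B waits */
    printf("pkt B retry:     %s\n", admit(&s, 80) ? "enter" : "wait");
    return 0;
}
```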
  • Patent number: 8250344
    Abstract: A method, storage medium, processor instruction, and processor are provided for specifying a value in a first portion of a conditional pre-fetch instruction associated with a branch instruction used for effectuating a branch operation, specifying a target instruction address in a second portion of the instruction, evaluating the value to determine whether a condition is met, and pre-fetching one or more instructions starting at the target instruction address into an instruction buffer of the processor when the condition is met.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: August 21, 2012
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Masahiro Yasue, Akiyuki Hatakeyama
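A C sketch of the instruction's semantics as the abstract states them, with a hypothetical encoding and an assumed condition test:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoding: a condition value in one field, a target
 * instruction address in another, plus a count of instructions to pull
 * into the instruction buffer. */
typedef struct {
    int      cond_value;  /* first portion: value to evaluate */
    uint32_t target_addr; /* second portion: target instruction address */
    unsigned count;       /* number of instructions to pre-fetch */
} cond_prefetch;

static bool condition_met(int value) { return value != 0; } /* assumed */

static void execute_cond_prefetch(const cond_prefetch *cp) {
    if (!condition_met(cp->cond_value))
        return; /* condition not met: no prefetch */
    for (unsigned i = 0; i < cp->count; i++) /* fill instruction buffer */
        printf("prefetch insn @ 0x%x\n", cp->target_addr + 4 * i);
}

int main(void) {
    cond_prefetch cp = { 1, 0x2000, 4 };
    execute_cond_prefetch(&cp); /* condition met: four instructions */
    return 0;
}
```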
  • Publication number: 20120173850
    Abstract: A high-performance information processing technique is provided at low cost, permitting an instruction buffer to be updated for effective prefetching of branch targets and subroutine returns with a small amount of hardware. An information processing apparatus is equipped with a CPU, a memory, prefetch means, and the like, wherein a prefetch address generator unit in the prefetch means decodes, out of a current instruction buffer storing the series of instructions currently accessed by the CPU, a branching series of instructions including at least one branch-address calculating instruction and a branch instruction to the branched address, and thereby looks ahead to the branch destination address. The information processing apparatus further comprises an RTS instruction buffer for storing the series of instructions at the return destinations of RTS instructions; the series of instructions stored in the current instruction buffer is saved into the RTS instruction buffer.
    Type: Application
    Filed: March 16, 2012
    Publication date: July 5, 2012
    Applicant: RENESAS ELECTRONICS CORPORATION
    Inventors: Teppei HIROTSU, Yuuichi ABE, Takeshi KATAOKA, Yasuhiro NAKATSUKA
  • Patent number: 8209520
    Abstract: An apparatus for executing fixed width instructions in a multiple execution unit system has a device for fetching instructions from a memory, and a decoder for decoding each fetched instruction in turn. A determination is made as to whether each decoded instruction includes a portion to fetch a locally stored instruction from a local store. If it does, the locally stored instruction is fetched and executed.
    Type: Grant
    Filed: March 20, 2007
    Date of Patent: June 26, 2012
    Assignee: Imagination Technologies Limited
    Inventor: Andrew Webber
  • Publication number: 20120159126
    Abstract: A programming language may include hint instructions that may notify a programming idiom accelerator that a programming idiom is coming. An idiom begin hint exposes the programming idiom to the programming idiom accelerator. Thus, the programming idiom accelerator need not perform pattern matching or other forms of analysis to recognize a sequence of instructions. Rather, the programmer may insert idiom hint instructions, such as an idiom begin hint, to expose the idiom to the programming idiom accelerator. Similarly, an idiom end hint may mark the end of the programming idiom.
    Type: Application
    Filed: February 1, 2008
    Publication date: June 21, 2012
    Inventors: Ravi K Arimilli, Satya P. Sharma, Randal C. Swanberg
  • Patent number: 8195888
    Abstract: Technologies are generally described for allocating available prefetch bandwidth among processor cores in a multiprocessor computing system. The prefetch bandwidth associated with an off-chip memory interface of the multiprocessor may be determined, partitioned, and allocated across multiple processor cores.
    Type: Grant
    Filed: March 20, 2009
    Date of Patent: June 5, 2012
    Assignee: Empire Technology Development LLC
    Inventor: Yan Solihin
  • Publication number: 20120131311
    Abstract: The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access.
    Type: Application
    Filed: November 23, 2010
    Publication date: May 24, 2012
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventor: Yuan C. Chou
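A C sketch of one way the discontinuity test and next-line prediction could fit together; the table organization and sizes are assumptions, not details from the disclosure.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE  64u  /* hypothetical instruction cache line size */
#define TABLE 256u /* hypothetical prediction table size */

/* Maps a cache line to the discontinuous line that followed it last
 * time; trained on discontinuous accesses, consulted on a miss. */
static uint64_t next_line[TABLE];

static uint64_t line_of(uint64_t pc) { return pc / LINE; }

/* Discontinuous: not in the same line or the sequentially next line
 * relative to the preceding instruction cache access. */
static bool discontinuous(uint64_t prev_pc, uint64_t cur_pc) {
    uint64_t p = line_of(prev_pc), c = line_of(cur_pc);
    return c != p && c != p + 1;
}

static void on_icache_miss(uint64_t prev_pc, uint64_t cur_pc) {
    if (!discontinuous(prev_pc, cur_pc))
        return;
    next_line[line_of(prev_pc) % TABLE] = line_of(cur_pc); /* train */
    uint64_t pred = next_line[line_of(cur_pc) % TABLE];    /* predict */
    if (pred)
        printf("prefetch predicted discontinuous line %llu\n",
               (unsigned long long)pred);
}

int main(void) {
    on_icache_miss(0x1000, 0x8000); /* first trip: train only */
    on_icache_miss(0x8000, 0x1000); /* back edge: train, then predict */
    return 0;
}
```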
  • Patent number: 8185721
    Abstract: A system including a dual function adder is described. In one embodiment, the system includes an adder. The adder is configured for a first instruction to determine an address for a hardware prefetch if the first instruction is a hardware prefetch instruction. The adder is further configured for the first instruction to determine a value from an arithmetic operation if the first instruction is an arithmetic operation instruction.
    Type: Grant
    Filed: March 4, 2008
    Date of Patent: May 22, 2012
    Assignee: QUALCOMM Incorporated
    Inventors: Ajay Anant Ingle, Erich James Plondke, Lucian Codrescu
  • Publication number: 20120124336
    Abstract: A signal processing system comprising at least one master device, at least one memory element, and a prefetch module arranged to perform prefetching from the at least one memory element upon a memory access request to the at least one memory element from the at least one master device. Upon receiving a memory access request from the at least one master device, the prefetch module is arranged to configure the enabling of prefetching of at least one of instruction information and data information in relation to that memory access request based at least partly on an address to which the memory access request relates.
    Type: Application
    Filed: July 20, 2009
    Publication date: May 17, 2012
    Applicant: Freescale Semiconductor, Inc.
    Inventors: Alistair Robertson, Joseph Circello, Mark Maiolani
  • Patent number: 8166259
    Abstract: A memory control apparatus, a memory control method and an information processing system are disclosed. Fetch response data retrieved from a main storage unit is received, while bypassing a storage unit, by a first port in which the received fetch response data can be set. The fetch response data retrieved from the main storage unit, if it cannot be set in the first port, is set in a second port through the storage unit. A transmission control unit performs priority control to send out, in accordance with a predetermined priority, the fetch response data set in the first port or the second port to the processor. As a result, the latency from the time the fetch response data arrives until it is sent out toward the processor in response to a fetch request from the processor is shortened.
    Type: Grant
    Filed: March 26, 2009
    Date of Patent: April 24, 2012
    Assignee: Fujitsu Limited
    Inventor: Souta Kusachi
  • Publication number: 20120096240
    Abstract: A method, system and computer-usable medium are disclosed for managing prefetch streams in a virtual machine environment. Compiled application code in a first core, which comprises a Special Purpose Register (SPR) and a plurality of first prefetch engines, initiates a prefetch stream request. If the prefetch stream request cannot be initiated due to unavailability of a first prefetch engine, then an indicator bit indicating a Prefetch Stream Dispatch Fault is set in the SPR, causing a Hypervisor to interrupt the execution of the prefetch stream request. The Hypervisor then calls its associated operating system (OS), which determines prefetch engine availability for a second core comprising a plurality of second prefetch engines. If a second prefetch engine is available, then the OS migrates the prefetch stream request from the first core to the second core, where it is initiated on an available second prefetch engine.
    Type: Application
    Filed: October 15, 2010
    Publication date: April 19, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Matthew Accapadi, Robert H. Bell, JR., Hong L. Hua, Ram Raghavan, Mysore S. Srinivas
  • Publication number: 20120096241
    Abstract: A method, system and computer-usable medium are disclosed for managing transient instruction streams. Transient flags are defined in Branch-and-Link (BRL) instructions that are known to be infrequently executed. A bit is likewise set in a Special Purpose Register (SPR) of the hardware (e.g., a core) that is executing an instruction request thread. Subsequent fetches or prefetches in the request thread are treated as transient and are not written to lower-level caches. If an instruction is non-transient, and if a lower-level cache is non-inclusive of the L1 instruction cache, a fetch or prefetch miss that is obtained from memory may be written in both the L1 and the lower-level cache. If it is not inclusive, a cast-out from the L1 instruction cache may be written in the lower-level cache.
    Type: Application
    Filed: October 15, 2010
    Publication date: April 19, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert H. Bell, JR., Hong L. Hua, Ram Raghavan, Mysore S. Srinivas
  • Publication number: 20120096039
    Abstract: Database elements are inserted into a database object by processing each of a plurality of operations in a sequential order within a first processing round to insert the database elements into the database objects, where processing for at least one operation in the order becomes suspended due to a resource request, and where at least one successive operation is initiated in response to suspension of one or more prior operations to enable prefetching of information for processing the operations. Each suspended operation is re-processed with the prefetched information in one or more additional processing rounds until processing of the operations is completed.
    Type: Application
    Filed: October 18, 2010
    Publication date: April 19, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert W. Lyle, Ping Wang
  • Patent number: 8161245
    Abstract: A method and apparatus for performing data prefetch in a multiprocessor system are disclosed. The multiprocessor system includes multiple processors, each having a cache memory. The cache memory is subdivided into multiple slices. A group of prefetch requests is initially issued by a requesting processor in the multiprocessor system. Each prefetch request is intended for one of the respective slices of the cache memory of the requesting processor. In response to the prefetch requests being missed in the cache memory of the requesting processor, the prefetch requests are merged into one combined prefetch request. The combined prefetch request is then sent to the cache memories of all the non-requesting processors within the multiprocessor system. In response to a combined clean response from the cache memories of all the non-requesting processors, data are then obtained for the combined prefetch request from a system memory.
    Type: Grant
    Filed: February 9, 2005
    Date of Patent: April 17, 2012
    Assignee: International Business Machines Corporation
    Inventors: James S. Fields, Jr., Benjiman L. Goodman, Guy L. Guthrie, Jeffrey A. Stuecheli
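The merge step can be sketched as collapsing per-slice misses into a single request carrying a slice bitmask; this representation is an assumption, not the patent's encoding.

```c
#include <stdint.h>
#include <stdio.h>

#define SLICES 4 /* hypothetical number of cache slices */

/* Combined prefetch request: one base address plus a bitmask of the
 * slices whose individual prefetch requests missed locally. */
typedef struct {
    uint64_t base_addr;
    uint8_t  slice_mask; /* bit i set => slice i still needs data */
} combined_prefetch;

static combined_prefetch merge(uint64_t base, const int missed[SLICES]) {
    combined_prefetch c = { base, 0 };
    for (int i = 0; i < SLICES; i++)
        if (missed[i]) c.slice_mask |= (uint8_t)(1u << i);
    return c;
}

int main(void) {
    int missed[SLICES] = { 1, 0, 1, 1 }; /* three slices missed */
    combined_prefetch c = merge(0x40000, missed);
    printf("one bus request: base=0x%llx mask=0x%x\n",
           (unsigned long long)c.base_addr, c.slice_mask);
    return 0;
}
```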
  • Publication number: 20120084511
    Abstract: A processor of an information handling system (IHS) initiates an L3 cache prefetch operation in response to a demand load during instruction processing. The processor selects an L3 cache prefetch at random for tracking as a target prefetched instruction. The processor initiates an L1 cache target prefetch operation and stores the resultant target prefetched instruction in the L1 cache. If a demand load arrives, the processor analyzes the target prefetched instruction for effectiveness and determines the source of the prefetch data. If a demand does not arrive, the processor tests to determine whether the particular prefetched instruction timed out in the cache and identifies the ineffectiveness of the prefetch operation. The processor samples multiple prefetch operations at random and generates a history of prefetch effectiveness and other useful prefetch information. The processor stores the prefetch effectiveness information to enable reduction or removal of ineffective prefetch operations.
    Type: Application
    Filed: October 4, 2010
    Publication date: April 5, 2012
    Applicant: International Business Machines Corporation
    Inventors: Miles R. Dooley, Venkat R. Indukuru, Alex E. Mericas, Francis P. O'Connell
  • Publication number: 20120084532
    Abstract: A microcontroller using an optimized buffer replacement strategy comprises a memory configured to store instructions, a processor configured to execute said program instructions, and a memory accelerator operatively coupled between the processor and the memory. The memory accelerator is configured to receive an information request and overwrite the buffer from which the prefetch was initiated with the requested information when the request is fulfilled by a previously initiated prefetch operation.
    Type: Application
    Filed: September 30, 2010
    Publication date: April 5, 2012
    Applicant: NXP B.V.
    Inventors: Craig MacKenna, Richard N. Varney, Gregory K. Goodhue
  • Publication number: 20120079242
    Abstract: A processor and method are disclosed. In one embodiment the processor includes a prefetch buffer that stores macro instructions. The processor also includes a clock circuit that can provide a clock signal for at least some of the functional units within the processor. The processor additionally includes macro instruction decode logic that can determine a class of each macro instruction. The processor also includes a clock management unit that can cause the clock signal to remain in a steady state entering at least one of the units in the processor that do not operate on a current macro instruction being decoded. Finally, the processor also includes at least one instruction decoder unit that can decode the first macro instruction into one or more opcodes.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventors: Venkateswara R. Madduri, Jonathan Y. Tong, Hoichi Cheong
  • Publication number: 20120079243
    Abstract: A graphics processing unit core 26 includes a plurality of processing pipelines 38, 40, 42, 44. A program instruction of a thread of program instructions being executed by a processing pipeline includes a next-instruction-type field 36 indicating the instruction type of the next program instruction following the current program instruction within the processing thread concerned. This next-instruction-type field is used to select the processing pipeline to which the next instruction is issued, before that next instruction has been fetched and decoded. The next-instruction-type field may be passed along the processing pipeline as the least significant four bits within a program counter value associated with a current program instruction 32. The next-instruction-type field may also be used to control the forwarding of thread state variables between processing pipelines when a thread migrates between processing pipelines prior to the next program instruction being fetched or decoded.
    Type: Application
    Filed: September 1, 2011
    Publication date: March 29, 2012
    Applicant: ARM Limited
    Inventor: Jorn Nystad
  • Patent number: 8145887
    Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. A processing unit detects if a long-latency miss associated with a load instruction has been encountered. Responsive to a long-latency miss, the processing unit enters a load lookahead mode. Responsive to entering the load lookahead mode, the processing unit dispatches each instruction from a first set of instructions from a first buffer with an associated vector. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to completed execution of the first set of instructions from the first buffer, the processing unit copies the set of vectors from a first vector array to a second vector array. Then the processing unit dispatches a second set of instructions from a second buffer with an associated vector from the second vector array.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Hung Q. Le, Dung Q. Nguyen
  • Patent number: 8145849
    Abstract: A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates a wake-and-go storage array with the target address and snooping the target address on the system bus without data exclusivity. In response to the comparison resulting in a determination that the event has occurred, the wake-and-go engine issues a load command on the system bus to read the data value from the target address with data exclusivity.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
  • Patent number: 8145883
    Abstract: A preload instruction in a first instruction set is executed at a processor. The preload instruction causes the processor to preload one or more instructions into an instruction cache. The pre-loaded instructions are pre-decoded according to a second instruction set that is different from the first instruction set. The preloaded instructions are pre-decoded according to the second instruction set in response to an instruction set preload indicator (ISPI).
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: March 27, 2012
    Assignee: QUALCOMM Incorporated
    Inventors: Thomas Andrew Sartorius, Brian Michael Stempel, Rodney Wayne Smith
  • Patent number: 8145879
    Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.
    Type: Grant
    Filed: March 10, 2010
    Date of Patent: March 27, 2012
    Assignee: XMTT Inc.
    Inventor: Uzi Y. Vishkin
  • Publication number: 20120072702
    Abstract: A prefetch cancelation arbiter improves access to a shared memory resource by arbitrarily canceling speculative prefetches. The prefetch cancelation arbiter applies a set of arbitrary policies to speculative prefetches to select one or more of the received speculative prefetches to cancel. The selected speculative prefetches are canceled and a cancelation notification of each canceled speculative prefetch is sent to a higher-level memory component such as a prefetch unit or a local memory arbiter that is local to the processor associated with the canceled speculative prefetch. The set of arbitrary policies is used to reduce memory accesses to the shared memory resource.
    Type: Application
    Filed: September 15, 2011
    Publication date: March 22, 2012
    Inventors: Matthew D. Pierson, Joseph R.M. Zbiciak, Kai Chirca, Amitabh Menon, Timothy D. Anderson
  • Patent number: 8141098
    Abstract: An apparatus initiates, in connection with a context switch operation, a prefetch of data likely to be used by a thread prior to resuming execution of that thread. As a result, once it is known that a context switch will be performed to a particular thread, data may be prefetched on behalf of that thread so that when execution of the thread is resumed, more of the working state for the thread is likely to be cached, or at least in the process of being retrieved into cache memory, thus reducing cache-related performance penalties associated with context switching.
    Type: Grant
    Filed: January 16, 2009
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey Powers Bradford, Harold F. Kossman, Timothy John Mullins
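A C sketch of the idea: once the scheduler knows which thread runs next, prefetch that thread's recorded working set before execution resumes. The per-thread working-set record and all names are illustrative assumptions, not details from the patent.

```c
#include <stdio.h>

#define WSET 4 /* hypothetical tracked working-set entries per thread */

/* Per-thread record of addresses the thread was using when it was
 * switched out. */
typedef struct {
    int         id;
    const void *working_set[WSET];
} thread_ctx;

/* Called once the next thread is known: warm the cache before the
 * thread actually resumes, hiding part of the miss latency. */
static void prefetch_for_resume(const thread_ctx *next) {
    for (int i = 0; i < WSET; i++)
        if (next->working_set[i])
            __builtin_prefetch(next->working_set[i], 0, 3);
}

int main(void) {
    static long heap_a[64], heap_b[64];
    thread_ctx t = { 7, { heap_a, heap_b, NULL, NULL } };
    prefetch_for_resume(&t); /* issued before the switch completes */
    printf("thread %d resumed with warmed cache\n", t.id);
    return 0;
}
```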
  • Patent number: 8131946
    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
    Type: Grant
    Filed: October 20, 2010
    Date of Patent: March 6, 2012
    Assignee: Apple Inc.
    Inventors: Ramesh Gunna, Sudarshan Kadambi
  • Patent number: 8127080
    Abstract: A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates the wake-and-go storage array with the target address and snoops the target address on the system bus.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: February 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
  • Patent number: 8108652
    Abstract: The claimed invention is an efficient, high-performance vector processor. By minimizing the use of multiple banks of memory and/or multi-ported memory blocks to reduce implementation cost, the vector memory 450 provides abundant memory bandwidth and enables sustained low-delay memory operations for a large number of SIMD (Single Instruction, Multiple Data) or vector operators simultaneously.
    Type: Grant
    Filed: September 10, 2008
    Date of Patent: January 31, 2012
    Inventor: Ronald Chi-Chun Hui