Prefetching Patents (Class 712/207)
  • Patent number: 7836277
    Abstract: A method of managing an instruction cache and a processor using the method are provided. The processor may comprise a processor core which is operated either during an active mode or during an inactive mode wherein the processor core performs at least one instruction during the active mode, an instruction cache which pre-traces a first instruction and determines, during the inactive mode, whether the processor core will meet a cache miss with regard to the first instruction, wherein the first instruction is to be performed by the processor core during the active mode, a coarse-grained array which performs a second instruction during the inactive mode, and a configuration memory which stores configuration information of the coarse-grained array, wherein the coarse-grained array performs the second instruction using the configuration information.
    Type: Grant
    Filed: March 5, 2008
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Il Hyun Park, Dong-Hoon Yoo, Dong Kwan Suh, Soojung Ryu, Jeongwook Kim
  • Publication number: 20100287357
    Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.
    Type: Application
    Filed: March 10, 2010
    Publication date: November 11, 2010
    Applicant: XMTT Inc.
    Inventor: Uzi Y. Vishkin
  • Patent number: 7831806
    Abstract: A data processing apparatus comprises a processor for executing a stream of instructions, and a prefetch unit for prefetching instructions from a memory prior to sending those instructions to the processor for execution. The prefetch unit receives from the memory a plurality of prefetched instructions from sequential addresses in memory, detects whether any of the prefetched instructions is an instruction flow changing instruction, and outputs a fetch address for a next instruction to be prefetched by the prefetch unit. Address generation logic is also provided which, for a selected prefetched instruction that is detected to be an instruction flow changing instruction, determines a target address to be output as the fetch address. The address generation logic has a first address generation path and a further address generation path for determining the target address. The first address generation path generates the target address more quickly than the further address generation path.
    Type: Grant
    Filed: February 18, 2004
    Date of Patent: November 9, 2010
    Assignee: ARM Limited
    Inventor: Paul Anthony Gilkerson
  • Patent number: 7827389
    Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. The processing unit dispatches a first set of instructions in order from a first buffer for execution. The processing unit receives updated results from the execution of the first set of instructions. The processing unit updates, in a first register, at least one register entry associated with each instruction in the first set of instructions, with the updated results. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to the completed execution of the first set of instructions from the first buffer, the processing unit copies the set of entries from the first register to a second register.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: November 2, 2010
    Assignee: International Business Machines Corporation
    Inventors: Hung Q. Le, Dung Q. Nguyen
  • Patent number: 7822943
    Abstract: Systems, methods and computer program products for improving data stream prefetching in a microprocessor are described herein.
    Type: Grant
    Filed: August 4, 2008
    Date of Patent: October 26, 2010
    Assignee: MIPS Technologies, Inc.
    Inventor: Keith E. Diefendorff
  • Patent number: 7814298
    Abstract: A method, system and computer program product for promoting a trace in an instruction processing circuit is disclosed. They comprise determining if a current trace is promotable and determining if a next trace is appendable to the current trace. They include promoting the current trace and the next trace if the current trace is promotable and the next trace is appendable.
    Type: Grant
    Filed: November 16, 2007
    Date of Patent: October 12, 2010
    Assignee: Oracle America, Inc.
    Inventors: Richard Thaik, John Gregory Favor, Joseph Rowlands, Leonard Eric Shar, Matthew Ashcraft
  • Patent number: 7814247
    Abstract: A pre-fetch circuit of a semiconductor memory apparatus can carry out a high-frequency operating test through a low-frequency channel of a test equipment. The pre-fetch circuit of a semiconductor memory apparatus can include: a pre-fetch unit for pre-fetching data bits in a first predetermined number; a plurality of registers provided in the first predetermined number, each of which latches a data in order or a data out of order of the pre-fetched data in response to different control signals; and a control unit for selectively activating the different control signals in response to a test mode signal, whereby some of the registers latch the data out of order.
    Type: Grant
    Filed: July 18, 2008
    Date of Patent: October 12, 2010
    Assignee: Hynix Semiconductor Inc.
    Inventor: Young-Ju Kim
  • Publication number: 20100250854
    Abstract: An efficient and effective compiler data prefetching technique is disclosed in which memory accesses that may be prefetched are represented as linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 30, 2010
    Inventor: Dz-ching Ju
  • Patent number: 7805592
    Abstract: Techniques are disclosed for handling control transfer instructions in pipelined processors. Such instructions may cause the sequence of subsequent instructions to change, and thus may require subsequent instructions to be deleted from the processor's pipeline. Pre-decode means (110) are provided for at least partially decoding control transfer instructions early in the pipeline. Subsequent instructions can then be prevented from progressing through the pipeline. The mechanism required to delete unwanted instructions is thereby simplified.
    Type: Grant
    Filed: October 7, 2002
    Date of Patent: September 28, 2010
    Assignee: Altera Corporation
    Inventors: Nicholas Paul Joyce, Nigel Peter Topham
  • Publication number: 20100241811
    Abstract: Technologies are generally described for allocating available prefetch bandwidth among processor cores in a multiprocessor computing system. The prefetch bandwidth associated with an off-chip memory interface of the multiprocessor may be determined, partitioned, and allocated across multiple processor cores.
    Type: Application
    Filed: March 20, 2009
    Publication date: September 23, 2010
    Inventor: Yan Solihin
  • Patent number: 7802077
    Abstract: A new class of traces for a processing engine, called “extended blocks,” possesses an architecture that permits possibly many entry points but only a single exit point. These extended blocks may be indexed based upon the address of the last instruction therein. Use of the new trace architecture provides several advantages, including reduction of instruction redundancies, dynamic block extension and a sharing of instructions among various extended blocks.
    Type: Grant
    Filed: June 30, 2000
    Date of Patent: September 21, 2010
    Assignee: Intel Corporation
    Inventors: Stephen J. Jourdan, Lihu Rappoport, Ronny Ronen, Adi Yoaz
  • Patent number: 7793085
    Abstract: A small-circuit-size memory control circuit capable of reducing a branch penalty during the execution of a branch instruction in a CPU is provided. A branch-destination buffer caches a branch-destination instruction and a branch-destination-instruction address determined by a branch instruction executed by the CPU. When the CPU executes a branch instruction thereafter, if the branch-destination-instruction address output from the CPU matches an instruction address in the branch-destination buffer, the corresponding branch-destination instruction stored in the branch-destination buffer is sent to the CPU. When a branch instruction is executed, an address comparison circuit compares the branch-destination-instruction address with the branch-source-instruction address.
    Type: Grant
    Filed: December 30, 2004
    Date of Patent: September 7, 2010
    Assignee: Fujitsu Semiconductor Limited
    Inventor: Kenji Furuya
  • Patent number: 7783863
    Abstract: A method of handling a trace to be aborted includes receiving an indication of a trace to be aborted and an indication of an abort reason corresponding to an execution of the trace to be aborted. The trace to be aborted has a trace type associated therewith and includes a sequence of the operations, and represents a sequence of at least two of the instructions. The method further includes identifying a corrective action based at least in part on the type of the trace to be aborted and on the abort reason, not taking into account a correspondence between the at least one operation that caused the execution to be aborted and the at least one instruction that the at least one operation at least in part represents. A next trace and its trace type are determined for execution, where the determining is based on the trace to be aborted and on the corrective action.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: August 24, 2010
    Assignee: Oracle America, Inc.
    Inventors: Christopher Patrick Nelson, John Gregory Favor, Richard Win Thaik, Matthew William Ashcraft
  • Patent number: 7779232
    Abstract: A method and apparatus for dynamically managing instruction buffer depths for non-predicted branches reduces wasted energy and resources associated with low confidence branch prediction conditions. A portion of the instruction buffer for an instruction thread is allocated for storing predicted branch instruction streams and another portion, which may be zero-sized during high prediction confidence conditions, is allocated to the non-predicted branch instruction stream. The size of the buffers is adjusted dynamically in conformity with an on-going prediction confidence that provides a measure of how well branch prediction mechanisms are working for a given instruction thread. An alternate instruction fetch address table can be maintained and multiplexed with the main fetch address register for addressing the instruction cache, so that the instruction stream can be quickly shifted to the non-predicted path when a branch instruction is resolved to the non-predicted path.
    Type: Grant
    Filed: August 28, 2007
    Date of Patent: August 17, 2010
    Assignee: International Business Machines Corporation
    Inventors: Richard W. Doing, Michael O. Klett, Kevin N. Magill, Brian R. Mestan, David Mui, Balaram Sinharoy, Jeffrey R. Summers
  • Patent number: 7779234
    Abstract: The present invention includes a system and method for implementing a hardware-supported thread assist under load lookahead mechanism for a microprocessor. According to an embodiment of the present invention, hardware thread-assist mode can be activated when one thread of the microprocessor is in a sleep mode. When load lookahead mode is activated, the fixed point unit copies the content of one or more architected facilities from an active thread to corresponding architected facilities in the first inactive thread. The load-store unit performs at least one speculative load in load lookahead mode and writes the results of the at least one speculative load to a duplicated architected facility in the first inactive thread.
    Type: Grant
    Filed: October 23, 2007
    Date of Patent: August 17, 2010
    Assignee: International Business Machines Corporation
    Inventors: James W. Bishop, Hung Q. Le, Dung Q. Nguyen, Wolfram Sauer, Benjamin W. Stolt, Michael T. Vaden
  • Patent number: 7779233
    Abstract: A system and computer-implementable method for implementing software-supported thread assist within a data processing system, wherein the data processing system supports processing instructions within at least a first thread and a second thread. An instruction dispatch unit (IDU) places the first thread into a sleep mode. The IDU separates an instruction stream for the second thread into at least a first independent instruction stream and a second independent instruction stream. The first independent instruction stream is processed utilizing facilities allocated to the first thread and the second independent instruction stream is processed utilizing facilities allocated to the second thread.
    Type: Grant
    Filed: October 23, 2007
    Date of Patent: August 17, 2010
    Assignee: International Business Machines Corporation
    Inventors: Hung Q. Le, Dung Q. Nguyen
  • Patent number: 7769954
    Abstract: A data processing system includes: a cache memory comprising a plurality of ways, each of which stores a data line including a data and address information of the data; an analysis module that analyzes whether or not a data requested in a read instruction is to be used in a subsequent instruction to be executed within a predetermined time period after the execution of the read instruction is started; a mode selection module that selects one of a plurality of access modes for accessing the cache memory based on a result of the analysis module; and an access unit that accesses the cache memory in the selected one of the access modes when the read instruction is executed.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: August 3, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Kenta Yasufuku
  • Patent number: 7761667
    Abstract: A mechanism is provided that identifies instructions that access storage and may be candidates for cache prefetching. The mechanism augments these instructions so that any given instance of the instruction operates in one of four modes, namely normal, unexecuted, data gathering, and validation. In the normal mode, the instruction merely performs the function specified in the software runtime environment. An instruction in unexecuted mode, upon the next execution, is placed in data gathering mode. When an instruction in the data gathering mode is encountered, the mechanism of the present invention collects data to discover potential fixed storage access patterns. When an instruction is in validation mode, the mechanism of the present invention validates the presumed fixed storage access patterns.
    Type: Grant
    Filed: August 12, 2008
    Date of Patent: July 20, 2010
    Assignee: International Business Machines Corporation
    Inventors: Christopher Michael Donawa, Allan Henry Kielstra
  • Publication number: 20100153689
    Abstract: Instruction execution includes fetching an instruction that comprises a first set of one or more bits identifying the instruction, and a second set of one or more bits associated with a first address value. It further includes executing the instruction to determine whether to perform a trap, wherein executing the instruction includes selecting from a plurality of tests at least one test for determining whether to perform a trap and carrying out the at least one test.
    Type: Application
    Filed: February 12, 2010
    Publication date: June 17, 2010
    Inventors: Jack Choquette, Gil Tene, Michael A. Wolf
  • Patent number: 7730288
    Abstract: A method and apparatus for executing instructions. The method includes receiving a first load instruction and a second load instruction. The method also includes issuing the first load instruction and the second load instruction to a cascaded delayed execution pipeline unit having at least a first execution pipeline and a second execution pipeline, wherein the second execution pipeline executes an instruction in a common issue group in a delayed manner relative to another instruction in the common issue group executed in the first execution pipeline. The method also includes accessing a cache by executing the first load instruction and the second load instruction. A delay between execution of the first load instruction and the second load instruction allows the cache to complete the access with the first load instruction before beginning the access with the second load instruction.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: June 1, 2010
    Assignee: International Business Machines Corporation
    Inventor: David Arnold Luick
  • Patent number: 7730289
    Abstract: A method for preloading data in a CPU pipeline is provided, which includes the following steps. When a hint instruction is executed, allocate and initiate an entry in a preload table. When a load instruction is fetched, load a piece of data from a memory into the entry according to the entry. When a use instruction which uses the data loaded by the load instruction is executed, forward the data for the use instruction from the entry instead of from the memory. When the load instruction is executed, update the entry according to the load instruction.
    Type: Grant
    Filed: September 27, 2007
    Date of Patent: June 1, 2010
    Assignee: Faraday Technology Corp.
    Inventors: I-Jui Sung, Ming-Chung Kao
  • Patent number: 7725659
    Abstract: A method of obtaining data, comprising at least one sector, for use by at least a first thread wherein each processor cycle is allocated to at least one thread, includes the steps of: requesting data for at least a first thread; upon receipt of at least a first sector of the data, determining whether the at least first sector is aligned with the at least first thread, wherein a given sector is aligned with a given thread when a processor cycle in which the given sector will be written is allocated to the given thread; responsive to a determination that the at least first sector is aligned with the at least first thread, bypassing the at least first sector, wherein bypassing a sector comprises reading the sector while it is being written; and responsive to a determination that the at least first sector is not aligned with the at least first thread, delaying the writing of the at least first sector until the occurrence of a processor cycle allocated to the at least first thread by retaining the at least first s
    Type: Grant
    Filed: September 5, 2007
    Date of Patent: May 25, 2010
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Hans Mikael Jacobson, Robert Alan Philhower
  • Patent number: 7721070
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: September 22, 2008
    Date of Patent: May 18, 2010
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20100122064
    Abstract: A device may include a data processing logic cell field and one or more sequential CPUs. The logic cell field and the CPUs may be configured to be coupled to each other for data exchange. The data exchange may be in block form using lines leading to a cache memory. In a method for operating a reconfigurable unit having runtime-limited configurations, the configurations may be able to increase their maximum allowed runtime, e.g., by triggering a parallel counter. An increase in configuration runtime by the configurations may be suppressed in response to an interrupt.
    Type: Application
    Filed: September 30, 2009
    Publication date: May 13, 2010
    Inventor: Martin Vorbach
  • Patent number: 7716427
    Abstract: In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing entries in the queue to a window of addresses encompassing multiple cache blocks, where the window of addresses is derived from the received address. The prefetch engine compares entries in the prefetch queue to a window of 2M contiguous cache blocks. The prefetch engine suppresses allocation of a new entry when any entry in the prefetch queue is within the address window. The prefetch engine further suppresses allocation of a new entry when the data address of the store instruction is equal to an address in a border area of the address window.
    Type: Grant
    Filed: January 4, 2008
    Date of Patent: May 11, 2010
    Assignee: International Business Machines Corporation
    Inventors: John Barry Griswell, Jr., Hung Qui Le, Francis Patrick O'Connell, William J. Starke, Jeffrey Adam Stuecheli, Albert Thomas Williams
  • Publication number: 20100115513
    Abstract: Provided is a virtual machine including a first virtualization module operating on a physical CPU, for providing a first CPU, and a second virtualization module operating on the first CPU, for providing a second CPU. The second virtualization module includes first processor control information holding a state of the first CPU obtained at a time of execution of the user program. The first virtualization module includes second processor control information containing a state of the physical CPU obtained at the time of the execution of the second virtualization module, third processor control information containing a state of the physical CPU obtained at the time of the execution of the user program, and prefetch entry information in which information to be prefetched from the third processor control information is set, and, upon detection of an event, the information set in the prefetch entry information is reflected to the first processor control information.
    Type: Application
    Filed: October 30, 2009
    Publication date: May 6, 2010
    Inventors: Toshiomi Moriki, Naoya Hattori, Yuji Tsushima
  • Patent number: 7711927
    Abstract: An instruction preload instruction executed in a first processor instruction set operating mode is operative to correctly preload instructions in a different, second instruction set. The instructions are pre-decoded according to the second instruction set encoding in response to an instruction set preload indicator (ISPI). In various embodiments, the ISPI may be set prior to executing the preload instruction, or may comprise part of the preload instruction or the preload target address.
    Type: Grant
    Filed: March 14, 2007
    Date of Patent: May 4, 2010
    Assignee: QUALCOMM Incorporated
    Inventors: Thomas Andrew Sartorius, Brian Michael Stempel, Rodney Wayne Smith
  • Patent number: 7707388
    Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.
    Type: Grant
    Filed: November 29, 2006
    Date of Patent: April 27, 2010
    Assignee: XMTT Inc.
    Inventor: Uzi Vishkin
  • Patent number: 7702856
    Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: April 20, 2010
    Assignee: Intel Corporation
    Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
  • Patent number: 7702888
    Abstract: An apparatus for executing branch predictor directed prefetch operations. During operation, a branch prediction unit may provide an address of a first instruction to the fetch unit. The fetch unit may send a fetch request for the first instruction to the instruction cache to perform a fetch operation. In response to detecting a cache miss corresponding to the first instruction, the fetch unit may execute one or more prefetch operations while the cache miss corresponding to the first instruction is being serviced. The branch prediction unit may provide an address of a predicted next instruction in the instruction stream to the fetch unit. The fetch unit may send a prefetch request for the predicted next instruction to the instruction cache to execute the prefetch operation. The fetch unit may store prefetched instruction data obtained from a next level of memory in the instruction cache or in a prefetch buffer.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: April 20, 2010
    Assignee: GlobalFoundries Inc.
    Inventors: Marius Evers, Trivikram Krishnamurthy
  • Publication number: 20100082948
    Abstract: In a CCW fetching section, for each input/output device being a control objective, a result prediction table, which holds prediction values of the status values to be returned from an input/output device as execution results of CCW commands, is referred to. Then, based on the prediction values, commands being pre-fetching objectives are pre-fetched from a CCW program stored in a memory, and transmitted to a CCW executing section. On the other hand, in the CCW executing section, the pre-fetched commands are sequentially executed, and the actual status values as the execution results are received from the input/output device. Then, when the received actual status values are not the same as the predicted status values, success or failure in prediction is notified to the CCW fetching section, and also, the result prediction table is updated in the CCW fetching section.
    Type: Application
    Filed: June 29, 2009
    Publication date: April 1, 2010
    Applicant: Fujitsu Limited
    Inventors: Tsukasa Matsuda, Hideki Yamanaka
  • Patent number: 7689775
    Abstract: Computer implemented method, system and computer program product for prefetching data in a data processing system. A computer implemented method for prefetching data in a data processing system includes generating attribute information of prior data streams by associating attributes of each prior data stream with a storage access instruction which caused allocation of the data stream, and then recording the generated attribute information. The recorded attribute information is accessed, and a behavior of a new data stream is modified using the accessed recorded attribute information.
    Type: Grant
    Filed: March 9, 2009
    Date of Patent: March 30, 2010
    Assignee: International Business Machines Corporation
    Inventors: John Barry Griswell, Jr., Francis Patrick O'Connell
  • Patent number: 7689774
    Abstract: A system and method for improving the page crossing performance of a data prefetcher is presented. A prefetch engine tracks times at which a data stream terminates due to a page boundary. When a certain percentage of data streams terminate at page boundaries, the prefetch engine sets an aggressive profile flag. In turn, when the data prefetch engine receives a real address that corresponds to the beginning/end of a new page, and the aggressive profile flag is set, the prefetch engine uses an aggressive startup profile to generate and schedule prefetches on the assumption that the real address is highly likely to be the continuation of a long data stream. As a result, the system and method minimize latency when crossing real page boundaries when a program is predominately accessing long streams.
    Type: Grant
    Filed: April 6, 2007
    Date of Patent: March 30, 2010
    Assignee: International Business Machines Corporation
    Inventors: Francis Patrick O'Connell, Jeffrey A. Stuecheli
  • Patent number: 7681188
    Abstract: One embodiment of the present invention provides a system that facilitates locked prefetch scheduling in general cyclic regions of a computer program. The system operates by first receiving a source code for the computer program and compiling the source code into intermediate code. The system then performs a trace detection on the intermediate code. Next, the system inserts prefetch instructions and corresponding locks into the intermediate code. Finally, the system generates executable code from the intermediate code, wherein a lock for a given prefetch instruction prevents subsequent prefetches from being issued until the data value returns for the given prefetch instruction.
    Type: Grant
    Filed: April 29, 2005
    Date of Patent: March 16, 2010
    Assignee: Sun Microsystems, Inc.
    Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Yonghong Song
  • Patent number: 7676659
    Abstract: In a processor executing instructions from a variable-length instruction set, a preload instruction is operative to retrieve from memory a data block corresponding to an instruction cache line, pre-decode instructions from a variable-length instruction set in the data block, and load the instructions and pre-decode information into the instruction cache. An instruction execution unit indicates to a pre-decoder the position within the data block of a first valid instruction. The pre-decoder successively determines the length of each instruction and hence the instruction boundaries. An instruction cache line offset indicator that identifies the position of the first valid instruction may be generated and provided to the pre-decoder in a variety of ways.
    Type: Grant
    Filed: April 4, 2007
    Date of Patent: March 9, 2010
    Assignee: QUALCOMM Incorporated
    Inventors: Brian Michael Stempel, Thomas Andrew Sartorius, Rodney Wayne Smith
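    Illustrative sketch (not part of the patent text): the successive length determination described above can be modelled as a walk over the bytes of a cache line, starting at the offset of the first valid instruction. The length encoding used here (low two bits of the first byte select 2 or 4 bytes) is a made-up encoding for illustration, not the one used by any particular instruction set in the patent.

```python
def instruction_length(first_byte):
    # Hypothetical encoding: low two bits == 0b11 means a 4-byte instruction,
    # anything else means a 2-byte instruction.
    return 4 if (first_byte & 0b11) == 0b11 else 2


def predecode_line(line, first_valid_offset):
    """Return start offsets of the instructions that fit wholly in the line."""
    starts = []
    pos = first_valid_offset
    while pos < len(line):
        length = instruction_length(line[pos])
        if pos + length > len(line):
            break  # instruction spills past the end of this cache line
        starts.append(pos)
        pos += length  # each boundary depends on the previous length
    return starts
```

    Starting the walk at the wrong offset would misidentify every later boundary, which is why the abstract has the execution unit tell the pre-decoder where the first valid instruction begins.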
  • Publication number: 20100049947
    Abstract: A processor and an early-load method thereof are provided. In the early-load method, an instruction is fetched and determined in an instruction fetch stage to obtain a determination result. Whether to early-load an early-loaded data corresponding to the instruction is determined according to the determination result. A target data is fetched according to the instruction in an instruction execution stage if the early-loaded data is not loaded correctly. The early-loaded data serves as the target data if the early-loaded data is loaded correctly.
    Type: Application
    Filed: August 22, 2008
    Publication date: February 25, 2010
    Applicant: FARADAY TECHNOLOGY CORP.
    Inventors: Shun-Chieh Chang, Yuan-Hwa Li, Yuan-Jung Kuo, Chin-Ling Huang, Chung-Ping Chung
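    Illustrative sketch (not part of the patent text): one way to picture the early-load flow is a load that is issued speculatively at fetch time and then verified at execute time. The abstract does not say how correctness is checked; comparing the speculative address with the final address is an assumption made here, as are the function and parameter names.

```python
def run_load(memory, regs_at_fetch, regs_at_execute, base, offset):
    """Model one load: early-load at fetch, verify (and maybe re-fetch) at execute."""
    # Fetch stage: speculatively load using the base register value seen so far.
    early_addr = regs_at_fetch[base] + offset
    early_data = memory[early_addr]

    # Execute stage: treat the early load as correct only if the address
    # computed from the final register value is unchanged (assumed check).
    target_addr = regs_at_execute[base] + offset
    if early_addr == target_addr:
        return early_data, "early"          # early-loaded data is the target data
    return memory[target_addr], "refetched"  # fall back to a normal load
```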
  • Patent number: 7669194
    Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
    Type: Grant
    Filed: August 26, 2004
    Date of Patent: February 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell
  • Patent number: 7664942
    Abstract: Embodiments of the present invention provide a system that executes program code in a processor. The system starts by executing the program code in a normal mode using a primary strand while concurrently executing the program code ahead of the primary strand using a subordinate strand in a scout mode. Upon resolving a branch using the subordinate strand, the system records a resolution for the branch in a speculative branch resolution table. Upon subsequently encountering the branch using the primary strand, the system uses the recorded resolution from the speculative branch resolution table to predict a resolution for the branch for the primary strand. Upon determining that the resolution of the branch was mispredicted for the primary strand, the system determines that the subordinate strand mispredicted the branch. The system then recovers the subordinate strand to the branch and restarts the subordinate strand executing the program code.
    Type: Grant
    Filed: August 25, 2008
    Date of Patent: February 16, 2010
    Assignee: Sun Microsystems, Inc.
    Inventors: Marc Tremblay, Shailender Chaudhry
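    Illustrative sketch (not part of the patent text): the speculative branch resolution table can be pictured as a map from branch address to the outcome the scout (subordinate) strand observed. The class and function names, the returned status strings, and the choice to discard an entry after a misprediction are assumptions for illustration.

```python
class BranchResolutionTable:
    """Outcomes recorded by the subordinate strand, keyed by branch address."""

    def __init__(self):
        self._entries = {}

    def record(self, pc, taken):
        # Called when the subordinate (scout) strand resolves a branch.
        self._entries[pc] = taken

    def predict(self, pc):
        # Returns the recorded outcome, or None if the scout never got here.
        return self._entries.get(pc)

    def discard(self, pc):
        self._entries.pop(pc, None)


def primary_encounters(table, pc, actual_taken):
    """What the primary strand does when it reaches the branch at pc."""
    predicted = table.predict(pc)
    if predicted is None:
        return "use_normal_predictor"
    if predicted != actual_taken:
        # The scout strand went the wrong way: drop the stale entry (assumed)
        # and restart the scout strand from this branch.
        table.discard(pc)
        return "restart_scout_at_branch"
    return "continue"
```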
  • Patent number: 7664920
    Abstract: A microprocessor includes a hierarchical memory subsystem, an instruction decoder, and a stream prefetch unit. The decoder decodes an instruction that specifies a locality characteristic parameter. In one embodiment, the parameter specifies a relative urgency with which a data stream specified by the instruction is needed rather than specifying exactly which of the cache memories in the hierarchy to prefetch the data stream into. The prefetch unit selects one of the cache memory levels in the hierarchy for prefetching the data stream into based on the memory subsystem configuration and on the relative urgency. In another embodiment, the prefetch unit instructs the memory subsystem to mark the prefetched cache line for early, late, or normal eviction according to its cache line replacement policy based on the parameter value.
    Type: Grant
    Filed: August 11, 2006
    Date of Patent: February 16, 2010
    Assignee: MIPS Technologies, Inc.
    Inventor: Keith E. Diefendorff
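    Illustrative sketch (not part of the patent text): the first embodiment above lets the instruction state how urgently a stream is needed and leaves the choice of cache level to the prefetch unit, which knows the actual hierarchy. The numeric urgency scale and the linear mapping below are assumptions; the patent abstract does not specify either.

```python
def select_cache_level(urgency, cache_levels):
    """Pick a prefetch destination from a relative urgency.

    urgency: 0.0 (needed much later) to 1.0 (needed soonest); assumed scale.
    cache_levels: level names ordered from closest to the core to farthest.
    """
    urgency = min(max(urgency, 0.0), 1.0)
    n = len(cache_levels)
    # More urgent streams go closer to the core; the same instruction adapts
    # automatically to hierarchies with different numbers of levels.
    index = min(int((1.0 - urgency) * n), n - 1)
    return cache_levels[index]
```

    Because the program names an urgency rather than a cache level, the same binary can run on a two-level or a three-level hierarchy without change.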
  • Publication number: 20100036987
    Abstract: Techniques for interrupt processing are described. An exceptional condition is detected in one or more stages of an instruction pipeline in a processor. In response to the detected exceptional condition and prior to the processor accepting an interrupt in response to the detected exceptional condition, an instruction cache is checked for the presence of an instruction at a starting address of an interrupt handler. The instruction at the starting address of the interrupt handler is prefetched from storage above the instruction cache when the instruction is not present in the instruction cache to load the instruction in the instruction cache, whereby the instruction is made available in the instruction cache by the time the processor accepts the interrupt in response to the detected exceptional condition.
    Type: Application
    Filed: August 8, 2008
    Publication date: February 11, 2010
    Applicant: QUALCOMM INCORPORATED
    Inventors: Daren Eugene Streett, Brian Michael Stempel
  • Patent number: 7657883
    Abstract: A dispatch scheduler in a multithreading microprocessor is disclosed. Each of N concurrently executing threads has one of P priorities. P N-bit round-robin vectors are generated, each being a 1-bit left-rotated and subsequently sign-extended version of an N-bit 1-hot input vector indicating the last thread selected for dispatching at the priority. N P-input muxes each receive a corresponding one of the N bits of each of the P round-robin vectors and select the input specified by the thread priority. Selection logic selects an instruction for dispatching from the thread having a dispatch value greater than or equal to any of the threads left thereof in the N-bit input vectors. The dispatch value of each of the threads comprises a least-significant bit equal to the corresponding P-input mux output, a most-significant bit that is true if the instruction is dispatchable, and middle bits comprising the priority of the thread.
    Type: Grant
    Filed: March 22, 2005
    Date of Patent: February 2, 2010
    Assignee: MIPS Technologies, Inc.
    Inventor: Michael Gottlieb Jensen
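    Illustrative sketch (not part of the patent text): the dispatch-value construction above can be modelled with integer bit operations. Thread 0 is taken to be the rightmost bit, "sign-extended" is read as setting the rotated bit and every bit to its left, and ties go to the rightmost thread; these readings, and all names below, are assumptions for illustration.

```python
def round_robin_vector(last_onehot, n):
    """Rotate the 1-hot 'last selected' vector left by one bit, then set the
    rotated bit and every bit to its left."""
    mask = (1 << n) - 1
    rotated = ((last_onehot << 1) | (last_onehot >> (n - 1))) & mask
    return mask & ~(rotated - 1)


def select_thread(last_onehot_by_prio, priorities, dispatchable):
    """Return the index of the thread to dispatch, or None if none can."""
    if not any(dispatchable):
        return None
    n = len(priorities)
    prio_bits = max(1, (len(last_onehot_by_prio) - 1).bit_length())
    best, best_value = None, -1
    for i in range(n):  # thread 0 is the rightmost bit
        # The per-thread mux picks bit i of the vector for this thread's priority.
        rr_bit = (round_robin_vector(last_onehot_by_prio[priorities[i]], n) >> i) & 1
        # Dispatch value: MSB = dispatchable, middle bits = priority, LSB = rr bit.
        value = (int(dispatchable[i]) << (prio_bits + 1)) | (priorities[i] << 1) | rr_bit
        if value > best_value:  # strict compare: on ties the rightmost thread wins
            best, best_value = i, value
    return best
```

    With four equal-priority threads and thread 1 selected last, the vector is 0b1100, so thread 2 wins; if threads 2 and 3 cannot dispatch, selection wraps around to thread 0.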
  • Patent number: 7657723
    Abstract: A system and method are described for a memory management processor which, using a table of reference addresses embedded in the object code, can open the appropriate memory pages to expedite the retrieval of information from memory referenced by instructions in the execution pipeline. A suitable compiler parses the source code and collects references to branch addresses, calls to other routines, or data references, and creates reference tables listing the addresses for these references at the beginning of each routine. These tables are received by the memory management processor as the instructions of the routine are beginning to be loaded into the execution pipeline, so that the memory management processor can begin opening memory pages where the referenced information is stored. Opening the memory pages where the referenced information is located before the instructions reach the instruction processor helps lessen memory latency delays which can greatly impede processing performance.
    Type: Grant
    Filed: January 28, 2009
    Date of Patent: February 2, 2010
    Assignee: Micron Technology, Inc.
    Inventor: Dean A. Klein
  • Patent number: 7647477
    Abstract: Inspecting a currently fetched instruction group and determining branching behavior of the currently fetched instruction group allows for intelligent instruction prefetching. A currently fetched instruction group is predecoded and, assuming the currently fetched instruction group includes a branch type instruction, a branch target is characterized in relation to a fetch boundary, which delimits a memory region contiguous with the memory region that hosts the currently fetched instruction group. Instruction prefetching is included based, at least in part, on the predecoded characterization of the branch target.
    Type: Grant
    Filed: November 23, 2004
    Date of Patent: January 12, 2010
    Assignee: Sun Microsystems, Inc.
    Inventors: Paul Caprioli, Shailender Chaudhry
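    Illustrative sketch (not part of the patent text): characterizing a branch target against a fetch boundary can be modelled as a range check. The region size, the placement of the boundary at the end of the next contiguous region, and the policy of prefetching the target's region when it lies beyond that boundary are all assumptions; the abstract says only that prefetching depends on the characterization.

```python
GROUP_BYTES = 64  # assumed size of one fetch group / memory region


def target_within_fetch_boundary(group_base, target):
    # The fetch boundary is taken to be the end of the region that immediately
    # follows the currently fetched group.
    boundary = group_base + 2 * GROUP_BYTES
    return group_base <= target < boundary


def choose_prefetch_address(group_base, branch_target=None):
    """Pick the base address of the next region to prefetch."""
    sequential = group_base + GROUP_BYTES
    if branch_target is None or target_within_fetch_boundary(group_base, branch_target):
        return sequential  # a sequential prefetch already covers the target
    return branch_target - (branch_target % GROUP_BYTES)  # region holding the target
```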
  • Publication number: 20100005251
    Abstract: The memory unit is compatible with a plurality of operation modes. The plurality of operation modes includes a normal mode allowing access and a standby mode consuming lower power than the normal mode. The branch detection section detects a branch instruction from an instruction fetched from the memory unit by the CPU. The mode control section changes an operation mode of the memory unit according to a detection result by the branch detection section.
    Type: Application
    Filed: December 23, 2008
    Publication date: January 7, 2010
    Applicant: NEC ELECTRONICS CORPORATION
    Inventor: Kiminari Yamazoe
  • Publication number: 20090313455
    Abstract: A multithreaded processor is provided with a saturating counter which serves to generate a thread preference signal to steer selection of which program thread operations are taken from for issue into the multiple processor pipelines. The counter is updated based upon the selections made for issue. The counter is a saturating counter and its sign bit may be used as a thread preference signal when discriminating between two threads. The update made to the count value can be weighted depending upon programmable priorities associated with the respective threads as well as a weighting based upon the time taken to execute the type of operation selected.
    Type: Application
    Filed: December 15, 2005
    Publication date: December 17, 2009
    Inventors: David Hennah Mansell, Stuart David Biles
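    Illustrative sketch (not part of the patent text): a signed saturating counter whose sign selects between two threads can be modelled as below. The counter width, the direction of the updates, the way priority weights and operation cost are combined, and the convention that a clear sign bit prefers thread 1 are assumptions for illustration.

```python
class ThreadPreference:
    """Signed saturating counter; its sign steers selection between two threads."""

    def __init__(self, bits=4, priority_weights=(1, 1)):
        self.min = -(1 << (bits - 1))
        self.max = (1 << (bits - 1)) - 1
        self.count = 0
        self.priority_weights = priority_weights  # assumed programmable weights

    def preferred_thread(self):
        # Sign bit set (negative count) means thread 1 has issued more
        # weighted work recently, so prefer thread 0; otherwise prefer thread 1.
        return 0 if self.count < 0 else 1

    def record_issue(self, thread, op_cost=1):
        # Weight the update by thread priority and by how long the issued
        # operation takes to execute.
        delta = self.priority_weights[thread] * op_cost
        if thread == 1:
            delta = -delta
        # Saturate rather than wrap, so a long burst cannot flip the sign.
        self.count = max(self.min, min(self.max, self.count + delta))
```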
  • Publication number: 20090313456
    Abstract: A method, storage medium, processor instruction and processor for specifying a value in a first portion of a conditional pre-fetch instruction associated with a branch instruction used for effectuating a branch operation, specifying a target instruction address in a second portion of the instruction, evaluating the value to determine whether a condition is met, and pre-fetching one or more instructions starting at the target instruction address into an instruction buffer of the processor when the condition is met, is provided.
    Type: Application
    Filed: August 13, 2009
    Publication date: December 17, 2009
    Applicant: SONY COMPUTER ENTERTAINMENT INC.
    Inventors: Masahiro Yasue, Akiyuki Hatakeyama
  • Patent number: 7627740
    Abstract: A method, storage medium, processor instruction and processor for specifying a value in a first portion of a conditional pre-fetch instruction associated with a branch instruction used for effectuating a branch operation, specifying a target instruction address in a second portion of the instruction, evaluating the value to determine whether a condition is met, and pre-fetching one or more instructions starting at the target instruction address into an instruction buffer of the processor when the condition is met, is provided.
    Type: Grant
    Filed: January 31, 2006
    Date of Patent: December 1, 2009
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Masahiro Yasue, Akiyuki Hatakeyama
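    Illustrative sketch (not part of the patent text): a conditional pre-fetch instruction with a condition portion and a target-address portion can be modelled as a small record plus an execute function. Treating the first portion as a register selector, treating a nonzero register as "condition met", and the explicit instruction count are assumptions; the abstract does not define these details.

```python
from collections import namedtuple

# Hypothetical instruction layout: first portion selects the condition value,
# second portion holds the target instruction address.
CondPrefetch = namedtuple("CondPrefetch", ["condition_reg", "target", "count"])


def execute_cond_prefetch(insn, regs, program_memory, instruction_buffer):
    """Copy `count` instructions starting at `target` into the buffer if the
    condition holds. Returns True if a pre-fetch was performed."""
    if regs[insn.condition_reg] == 0:  # assumed: nonzero means condition met
        return False
    for addr in range(insn.target, insn.target + insn.count):
        instruction_buffer[addr] = program_memory[addr]
    return True
```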
  • Patent number: 7620749
    Abstract: A DMA device prefetches descriptors into a descriptor prefetch buffer. The descriptor prefetch buffer is sized to hold an appropriate number of descriptors for a given latency environment. To support a linked list of descriptors, the DMA engine prefetches descriptors based on the assumption that they are sequential in memory and discards any descriptors that are found to violate this assumption. The DMA engine seeks to keep the descriptor prefetch buffer full by requesting multiple descriptors per transaction whenever possible. The bus engine fetches these descriptors from system memory and writes them to the prefetch buffer. The DMA engine may also use an aggressive prefetch where the bus engine requests the maximum number of descriptors that the buffer will support whenever there is any space in the descriptor prefetch buffer. The DMA device discards any remaining descriptors that cannot be stored.
    Type: Grant
    Filed: January 10, 2007
    Date of Patent: November 17, 2009
    Assignee: International Business Machines Corporation
    Inventors: Giora Biran, Luis E. De la Torre, Bernard C. Drerup, Jyoti Gupta, Richard Nicholas
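    Illustrative sketch (not part of the patent text): prefetching linked-list descriptors under a sequential-layout assumption can be modelled as fetching a contiguous run and keeping entries only while each descriptor's next pointer matches the address that was fetched after it. The descriptor size, the dictionary representation of memory, and the `next` field name are assumptions for illustration.

```python
DESC_SIZE = 16  # assumed descriptor size in bytes


def prefetch_descriptors(memory, start_addr, buffer, capacity):
    """Fill `buffer` with sequentially fetched descriptors.

    memory maps address -> descriptor dict with a "next" address field.
    Returns the address of the next descriptor the chain actually needs.
    """
    free = capacity - len(buffer)
    # One transaction requests as many descriptors as the buffer has room for,
    # assuming they sit back to back in memory.
    fetched_addrs = [start_addr + i * DESC_SIZE for i in range(free)]
    expected = start_addr
    for addr in fetched_addrs:
        if addr != expected or addr not in memory:
            break  # chain left the sequential run: discard the rest
        buffer.append(memory[addr])
        expected = memory[addr]["next"]
    return expected
```

    In the test data the third descriptor links to address 256 instead of 48, so the fourth fetched descriptor is discarded and the next transaction starts at 256.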
  • Publication number: 20090276576
    Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.
    Type: Application
    Filed: July 9, 2009
    Publication date: November 5, 2009
    Applicant: Altera Corporation
    Inventors: Gerald George Pechanek, Stamatis Vassiliadis
  • Patent number: RE41012
    Abstract: A double indirect method of accessing a block of data in a register file is used to allow efficient implementations without the use of specialized vector processing hardware. In addition, the automatic modification of the register addressing is not tied to a single vector instruction nor to repeat or loop instructions. Rather, the technique, termed register file indexing (RFI) allows full programmer flexibility in control of the block data operational facility and provides the capability to mix non-RFI instructions with RFI instructions. The block-data operation facility is embedded in the iVLIW ManArray architecture allowing its generalized use across the instruction set architecture without specialized vector instructions or being limited in use only with repeat or loop instructions.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: November 24, 2009
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand