Prefetching Patents (Class 712/207)
  • Patent number: 9304773
    Abstract: A data processor (102) includes a prefetch buffer (112) and a fetch control unit (116). The prefetch buffer (112) has a plurality of lines. The prefetch buffer (112) has a variable maximum depth that defines a number of lines of the plurality of lines that are capable of storing instructions. The fetch control unit (116) is coupled to the prefetch buffer to monitor at least one of the plurality of lines of the prefetch buffer (112) and to adjust the variable maximum depth of the prefetch buffer (112) in response to a state of the data processor (102).
    Type: Grant
    Filed: March 21, 2006
    Date of Patent: April 5, 2016
    Assignee: FREESCALE SEMICONDUCTOR, INC.
    Inventors: Jeffrey W. Scott, William C. Moyer
  • Patent number: 9298644
    Abstract: Provided are method and device for managing a memory in a data stream management system (DSMS) of a portable device. The method includes moving data of a selected memory region that has a low priority to a secondary storage and storing a received data stream in the selected memory region.
    Type: Grant
    Filed: March 15, 2012
    Date of Patent: March 29, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Seung-woo Ryu, Seok-jin Hong, Keun-joo Kwon
  • Patent number: 9286224
    Abstract: In an embodiment, a processor includes at least one core having one or more execution units, a first cache memory and a first cache control logic. The first cache control logic may be configured to generate a first prefetch request to prefetch first data, where this request is to be aborted if the first data is not present in a second cache memory coupled to the first cache memory. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 26, 2013
    Date of Patent: March 15, 2016
    Assignee: Intel Corporation
    Inventors: Seth H. Pugsley, Robert L. Scott, Zeshan A. Chishti, Peng-Fei Chuang, Khun Ban, Christopher B. Wilkerson, Shih-Lien L. Lu, Kingsum Chow
  • Patent number: 9280351
    Abstract: Embodiments relate to second-level branch target buffer bulk transfer filtering. An aspect includes a system for second-level branch target buffer bulk transfer filtering. The system includes a first-level branch target buffer and a second-level branch target buffer coupled to a processing circuit. The processing circuit is configured to perform a method. The method includes receiving branch target buffer miss indicators, receiving instruction cache miss indicators, and recording information about the branch target buffer miss indicators and the instruction cache miss indicators in search trackers. Based on detecting, by the processing circuit, a search tracker representing a correlated pair of the branch target buffer miss indicators and the instruction cache miss indicators, the search tracker is activated by the processing circuit to perform a bulk transfer from the second-level branch target buffer to the first-level branch target buffer.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: March 8, 2016
    Assignee: International Business Machines Corporation
    Inventors: James J. Bonanno, Ulrich Mayer, Brian R. Prasky
  • Patent number: 9268572
    Abstract: An modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).
    Type: Grant
    Filed: December 11, 2012
    Date of Patent: February 23, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael K Gschwind, Eric M Schwarz
  • Patent number: 9250904
    Abstract: An modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).
    Type: Grant
    Filed: December 24, 2013
    Date of Patent: February 2, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael K Gschwind, Eric M Schwarz
  • Patent number: 9251117
    Abstract: A reconfigurable circuit includes a reconfigurable arithmetic execution unit array including a plurality of arithmetic execution units and a network circuit to provide reconfigurable connections between the arithmetic execution units, a suspension control circuit configured to control suspension and resumption of operation of the reconfigurable arithmetic execution unit array, and a buffer circuit configured to temporarily store data supplied from an external source upon suspension of the operation of the reconfigurable arithmetic execution unit array and to supply the stored data to the reconfigurable arithmetic execution unit array upon resumption of the operation of the reconfigurable arithmetic execution unit array.
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: February 2, 2016
    Assignee: CYPRESS SEMICONDUCTOR CORPORATION
    Inventors: Takashi Hanai, Shinichi Sutou
  • Patent number: 9207938
    Abstract: Embodiments herein relate to forwarding an instruction based on predication criteria. A predicate state associated with a packet of data is to be compared to an instruction associated with the predication criteria. The instruction is to be forwarded to an execution unit if the predication criteria includes or matches the predicate state of the packet.
    Type: Grant
    Filed: August 29, 2012
    Date of Patent: December 8, 2015
    Assignee: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
    Inventors: David A Warren, Thomas A Keaveny
  • Patent number: 9201798
    Abstract: A computer implemented method for prefetching data. The method includes: receiving one or more addresses by a prefetching unit upon execution of an enqueuing command in a first piece of program logic; enqueuing each of the received addresses to a recording-list; identifying one of the positions in the recording-list as jump position; providing the identified jump position to a frame-shifter; using a sub-list of the recording-list defined by a shiftable frame as a playback-list; executing a frame-shift command which triggers the frame-shifter to shift the frame in dependence on the jump position to provide an updated playback-list; fetching data identified by the updated playback-list from a second memory; and transferring the fetched data to a first memory.
    Type: Grant
    Filed: October 9, 2013
    Date of Patent: December 1, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Hans Boettiger, Thilo Maurer
  • Patent number: 9189242
    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
    Type: Grant
    Filed: September 17, 2010
    Date of Patent: November 17, 2015
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
  • Patent number: 9177878
    Abstract: Indexing a plurality of die obtainable from a material wafer comprising a plurality of stacked material layers. Each die is obtained in a respective position of the wafer. A manufacturing stage comprises at least two steps for treating a respective superficial portion of the material wafer that corresponds to a subset of said plurality of dies using the at least one lithographic mask through the exposition to the proper radiation in temporal succession. The method may include providing a die index on each die which is indicative of the position of the respective die by forming an external index indicative of the position of the superficial portion of the material wafer corresponding to the subset of the plurality of dies including said die and may comprise a plurality of electronic components electrically coupled to each other by means of a respective common control line.
    Type: Grant
    Filed: July 2, 2014
    Date of Patent: November 3, 2015
    Assignee: STMicroelectronics S.r.l.
    Inventors: Daniele Alfredo Brambilla, Fausto Redigolo
  • Patent number: 9152482
    Abstract: The present invention provides a multi-core processor system, including: multiple central processor units and multiple groups of level-one hardware message queues. Each central processor unit is separately connected to a group of level-one hardware message queues and is configured to process messages in the level-one hardware message queues. Each group of level-one hardware message queues includes multiple level-one hardware message queues. Moreover, in each group of level-one hardware message queues, a level-one hardware message queue having a higher priority is scheduled preferentially, and level-one hardware message queues having the same priority are scheduled in a round-robin manner according to round robin scheduling weights. Through the multi-core processor system provided in the present invention, the efficiency and performance of the multi-core processor system are improved.
    Type: Grant
    Filed: April 28, 2014
    Date of Patent: October 6, 2015
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Weiguo Zhang, Libo Wu
  • Patent number: 9146740
    Abstract: Embodiments relate to branch prediction preloading. A method for branch prediction preloading includes fetching a plurality of instructions in an instruction stream, and decoding a branch prediction preload instruction in the instruction stream. The method also includes determining, by a processing circuit, an address of a predicted branch instruction based on the branch prediction preload instruction, and determining, by the processing circuit, a predicted target address of the predicted branch instruction based on the branch prediction preload instruction. The method further includes identifying a mask field in the branch prediction preload instruction, and determining, by the processing circuit, a branch instruction length of the predicted branch instruction based on the mask field.
    Type: Grant
    Filed: March 5, 2013
    Date of Patent: September 29, 2015
    Assignee: International Business Machines Corporation
    Inventors: James J. Bonanno, Marcel Mitran, Brian R. Prasky, Joran Siu, Timothy J. Slegel, Alexander Vasilevskiy
  • Patent number: 9135011
    Abstract: A data processing system 2 is provided with branch prediction circuitry 20 for performing branch prediction operations. Next branch table circuitry 22 stores data identifying from a given branch instruction what will be the address of the next branch instruction to be encountered within the program flow. This next branch instruction address is supplied to the branch prediction circuitry 20 which uses it to form its prediction prior to that next branch instruction being identified as such by the instruction decoder 16. This permits branch prediction to commence earlier in the branch prediction circuitry 20 than would otherwise be the case.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: September 15, 2015
    Assignee: The Regents of the University of Michigan
    Inventors: David Thomas Manville, Trevor Nigel Mudge
  • Patent number: 9069675
    Abstract: Systems and Program Products are created to execute a prefetch data machine instruction having an M field performs a function on a cache line of data specifying an address of an operand. The operation comprises either prefetching a cache line of data from memory to a cache or reducing the access ownership of store and fetch or fetch only of the cache line in the cache or a combination thereof. The address of the operand is either based on a register value or the program counter value pointing to the prefetch data machine instruction.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: June 30, 2015
    Assignee: International Business Machines Corporation
    Inventors: Dan F Greiner, Timothy J Slegel
  • Patent number: 9043579
    Abstract: A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth.
    Type: Grant
    Filed: January 10, 2012
    Date of Patent: May 26, 2015
    Assignee: International Business Machines Corporation
    Inventor: Randall Ray Heisch
  • Publication number: 20150143084
    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
    Type: Application
    Filed: December 12, 2014
    Publication date: May 21, 2015
    Applicant: INTEL CORPORATION
    Inventors: Maxim Loktyukhin, Eric W. Mahurin, Bret L. Toll, Martin G. Dixon, Sean P. Mirkes, David L. Kreitzer, ELMOUSTAPHA OULD-AHMED-VALL, Vinodh Gopal
  • Patent number: 9037835
    Abstract: A data processing device includes processing circuitry 20 for executing a first memory access instruction to a first address of a memory device 40 and a second memory access instruction to a second address of the memory device 40, the first address being different from the second address. The data processing device also includes prefetching circuitry 30 for prefetching data from the memory device 40 based on a stride length 70 and instruction analysis circuitry 50 for determining a difference between the first address and the second address. Stride refining circuitry 60 is also provided to refine the stride length based on factors of the stride length and factors of the difference calculated by the instruction analysis circuitry 50.
    Type: Grant
    Filed: October 24, 2013
    Date of Patent: May 19, 2015
    Assignee: ARM Limited
    Inventors: Ganesh Suryanarayan Dasika, Rune Holm
  • Publication number: 20150134933
    Abstract: A data processing apparatus and method of data processing are disclosed. An instruction execution unit executes a sequence of program instructions, wherein execution of at least some of the program instructions initiates memory access requests to retrieve data values from a memory. A prefetch unit prefetches data values from the memory for storage in a cache unit before they are requested by the instruction execution unit. The prefetch unit is configured to perform a miss response comprising increasing a number of the future data values which it prefetches, when a memory access request specifies a pending data value which is already subject to prefetching but is not yet stored in the cache unit. The prefetch unit is also configured, in response to an inhibition condition being met, to temporarily inhibit the miss response for an inhibition period.
    Type: Application
    Filed: November 14, 2013
    Publication date: May 14, 2015
    Applicant: ARM Limited
    Inventors: Rune HOLM, Ganesh Suryanarayan Dasika
  • Publication number: 20150121039
    Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set.
    Type: Application
    Filed: December 30, 2014
    Publication date: April 30, 2015
    Applicant: Intel Corporation
    Inventors: William W. Macy, JR., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
  • Publication number: 20150121038
    Abstract: A single instruction multiple thread (SIMT) processor 2 includes execution circuitry 6, prefetch circuitry 12 and prefetch strategy selection circuitry 14. The prefetch strategy selection circuitry serves to detect one or more characteristics of a stream of program instructions that are being executed to identify whether or not a given data access instruction within a program will be executed a plurality of times. The prefetch strategy to use is selected from a plurality of selectable prefetch strategy in dependence upon the detection of such characteristics.
    Type: Application
    Filed: October 24, 2013
    Publication date: April 30, 2015
    Applicant: ARM LIMITED
    Inventors: Ganesh Suryanarayan DASIKA, Rune HOLM, David Hennah MANSELL
  • Patent number: 9020211
    Abstract: A data processing apparatus which sequentially executes a verification process so as to recognize a target object, comprising: an obtaining unit configured to obtain dictionary data to be referred to in the verification process; a holding unit configured to hold a plurality of dictionary data; a verification unit configured to execute the verification process for the input data by referring to one dictionary data; a history holding unit configured to hold a verification result; and a prefetch determination unit configured to determine based on the verification result whether to execute prefetch processing in which the obtaining unit obtains in advance dictionary data to be referred to by the verification unit in a succeeding verification process, and holds the dictionary data in the holding unit before the succeeding verification process.
    Type: Grant
    Filed: May 10, 2012
    Date of Patent: April 28, 2015
    Assignee: Canon Kabushiki Kaisha
    Inventor: Akiyoshi Momoi
  • Publication number: 20150113249
    Abstract: Methods and apparatus to perform adaptive pre-fetch operations in managed runtime environments are disclosed herein. An example disclosed method includes determining an object size associated with a pre-fetch operation; comparing the object size to a first one of a series of thresholds having increasing respective values; when the object size is less than the first one of the series of thresholds, pre-fetching a first amount of stored data assigned to the first one of the series of thresholds; and when the object size is greater than the first one of the plurality of thresholds, comparing the object size to a next one of the series of thresholds.
    Type: Application
    Filed: December 30, 2014
    Publication date: April 23, 2015
    Inventor: Mingqiu Sun
  • Patent number: 9015720
    Abstract: A system and method to optimize processor performance and minimizing average thread latency by selectively loading a cache when a program state, resources required for execution of a program or the program itself change, is described. An embodiment of the invention supports a “cache priming program” that is selectively executed for a first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in cache(s), and/or whenever resources required for program execution or the program itself changes. By pre-loading the cache with two resources required for two instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.
    Type: Grant
    Filed: January 6, 2009
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
  • Publication number: 20150106590
    Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.
    Type: Application
    Filed: October 14, 2013
    Publication date: April 16, 2015
    Applicant: Oracle International Corporation
    Inventor: Yuan C. Chou
  • Patent number: 9009734
    Abstract: One or more embodiments of the invention is a computer-implemented method for speculatively executing application event responses. The method includes the steps of identifying one or more event responses that could be issued for execution by an application being executed by a master process, for each event response, generating a child process to execute the event response, determining that a first event response included in the one or more event responses has been issued for execution by the application, committing the child process associated with the first event response as a new master process, and aborting the master process and all child processes other than the child process associated with the first event response.
    Type: Grant
    Filed: March 6, 2012
    Date of Patent: April 14, 2015
    Assignee: AUTODESK, Inc.
    Inventor: Francesco Iorio
  • Patent number: 9009449
    Abstract: A system that executes program instructions on a processor is described. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. While executing in the lookahead mode, if the processor determines that the lookahead mode is unlikely to uncover any additional outer-level cache misses, the system terminates the lookahead mode. Then, after the unresolved data dependency is resolved, the system recommences execution in the normal-execution mode from the instruction that triggered the lookahead mode.
    Type: Grant
    Filed: November 10, 2011
    Date of Patent: April 14, 2015
    Assignee: Oracle International Corporation
    Inventors: Yuan C. Chou, Eric W. Mahurin
  • Publication number: 20150100762
    Abstract: A processor includes an instruction fetch unit and an execution unit. The instruction fetch unit retrieves instructions from memory to be executed by the execution unit. The instruction fetch unit includes a branch prediction unit which is configured to predict whether a branch instruction is likely to be executed. The memory includes an instruction cache comprising a portion of the fetch blocks available in the memory. The instruction fetch unit may use a combination of way prediction and serialized access to retrieve instructions from the instruction cache. The instruction fetch unit initially accesses the instruction cache to retrieve the predicted fetch block associated with a way prediction. The instruction fetch unit compares a cache tag associated with the way prediction with the address of the cache line that includes the predicted fetch block. If the tag matches, then the way prediction is correct and the retrieved fetch block is valid.
    Type: Application
    Filed: October 6, 2014
    Publication date: April 9, 2015
    Inventor: Eino Jacobs
  • Patent number: 9003421
    Abstract: Disclosed are embodiments of a system, methods and mechanism for using idle thread units to perform acceleration threads that are transparent to the operating system. When the operating system scheduler has no work to schedule on the idle thread units, the operating system may issue a halt or monitor/mwait or other instruction to place the thread unit into an idle state. While the thread unit is idle, from the operating system perspective, the thread unit may be utilized to perform speculative acceleration threads in order to accelerate threads running on non-idle thread units. The context of the idle thread unit is saved prior to execution of the acceleration thread and is restored when the operating system requires use of the thread unit. The acceleration threads are transparent to the operating system. Other embodiments are also described and claimed.
    Type: Grant
    Filed: November 28, 2005
    Date of Patent: April 7, 2015
    Assignee: Intel Corporation
    Inventors: Ron Gabor, Gad Sheaffer, Avi Mendelson, Uri C. Weiser, Hong Wang
  • Publication number: 20150095616
    Abstract: The present invention realizes an efficient superscalar instruction issue and low power consumption at an instruction set including instructions with prefixes. An instruction fetch unit is adopted which determines whether an instruction code is of a prefix code or an instruction code other than it, and outputs the result of determination and the 16-bit instruction code. Along with it, decoders each of which decodes the instruction code, based on the result of determination, and decoders each of which decodes the prefix code, are disposed separately. Further, a prefix is supplied to each decoder prior to a fixed-length instruction code like 16 bits modified with it. A fixed-length instruction code following the prefix code is supplied to each decoder of the same pipeline as the decoder for the prefix code.
    Type: Application
    Filed: December 9, 2014
    Publication date: April 2, 2015
    Inventors: Hiroaki Nakaya, Yuki Kondoh, Makoto Ishikawa
  • Publication number: 20150089196
    Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
    Type: Application
    Filed: December 1, 2014
    Publication date: March 26, 2015
    Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
  • Publication number: 20150089195
    Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
    Type: Application
    Filed: December 1, 2014
    Publication date: March 26, 2015
    Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
  • Publication number: 20150089197
    Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.
    Type: Application
    Filed: December 1, 2014
    Publication date: March 26, 2015
    Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
  • Patent number: 8978022
    Abstract: Embodiments include systems and methods for reducing instruction cache miss penalties during application execution. Application code is profiled to determine “hot” code regions likely to experience instruction cache miss penalties. The application code can be linearized into a set of traces that include the hot code regions. Embodiments traverse the traces in reverse, keeping track of instruction scheduling information, to determine where an accumulated instruction latency covered by the code blocks exceeds an amount of latency that can be covered by prefetching. Each time the accumulated latency exceeds the amount of latency that can be covered by prefetching, a prefetch instruction can be scheduled in the application code. Some embodiments insert additional prefetches, merge prefetches, and/or adjust placement of prefetches to account for scenarios, such as loops, merging or forking branches, edge confidence values, etc.
    Type: Grant
    Filed: January 10, 2013
    Date of Patent: March 10, 2015
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha Tirumalai
  • Patent number: 8977815
    Abstract: A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.
    Type: Grant
    Filed: November 29, 2010
    Date of Patent: March 10, 2015
    Assignee: ARM Limited
    Inventors: Frode Heggelund, Rune Holm, Andreas Due Engh-Halstvedt, Edvard Feilding
  • Publication number: 20150067263
    Abstract: A microprocessor includes a plurality of processing cores, a service processing unit and a memory accessible by both the service processing unit and the plurality of processing cores. At least one of the plurality of processing cores is configured to write a patch to the memory. The patch comprises one or more instructions to be fetched from the memory and executed by the service processing unit after written to the memory by the at least one of the plurality of processing cores.
    Type: Application
    Filed: May 19, 2014
    Publication date: March 5, 2015
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Stephan Gaskins
  • Publication number: 20150058600
    Abstract: A data processor of an embodiment includes a memory, an instruction cache, a processing unit (CPU), and a fetch process control unit. The memory stores a program in which a plurality of instructions are written. The instruction cache operates only when a branch instruction included in the program is executed, and data of a greater capacity than a width of a bus of the memory is read from the memory and stored in the instruction cache in advance. The processing unit accesses both the memory and the instruction cache and executes, in a pipelined manner, instructions read from the memory or the instruction cache. The fetch process control unit generates, in response to a branch instruction executed by the processing unit, a stop signal for stopping a fetch process of reading an instruction from the memory, and outputs the stop signal to the memory.
    Type: Application
    Filed: February 14, 2012
    Publication date: February 26, 2015
    Applicant: RENESAS ELECTRONICS CORPORATION
    Inventor: Masakatsu Ishizaki
  • Patent number: 8954714
    Abstract: An apparatus includes a processor. The processor includes two memories. The first memory stores one set of instructions. The second memory stores another set of instructions that are longer than the set of instructions in the first memory. An instruction in the set of instructions in the first memory is used as a pointer to a corresponding instruction in the set of instructions in the second memory.
    Type: Grant
    Filed: February 1, 2010
    Date of Patent: February 10, 2015
    Assignee: Altera Corporation
    Inventor: Steven Perry
  • Patent number: 8949579
    Abstract: A processor of an information handling system (IHS) initiates an L3 cache prefetch operation in response to a demand load during instruction processing. The processor selects an L3 cache prefetch at random for tracking as a target prefetched instruction. The processor initiates an L1 cache target prefetch operation and stores the resultant target prefetched instruction in the L1 cache. If a demand load arrives, the processor analyzes the target prefetched instruction for effectiveness and determines the source of the prefetch data. If a demand does not arrive, the processor tests to determine if the particular prefetched instruction timed out in the cache and identifies the ineffectiveness of the prefetch operation. The processor samples multiple prefetch operations at random and generates a history of prefetch effectiveness and other useful prefetch information. The processor stores the prefetch effectiveness information to enable reduction or removal of ineffective prefetch operations.
    Type: Grant
    Filed: October 4, 2010
    Date of Patent: February 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Miles R. Dooley, Venkat R. Indukuru, Alex E. Mericas, Francis P. O'Connell
  • Publication number: 20150019841
    Abstract: Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. A method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing a fetch table that is indicated by the fetch table address by the scout core. The fetch table indicates how many of pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes based on the fetch table indicating, fetching the pieces of content by the scout core. The method includes returning the fetched pieces of content to the parent core.
    Type: Application
    Filed: September 30, 2014
    Publication date: January 15, 2015
    Inventors: Brian R. Prasky, Fadi Y. Busaba, Steven R. Carlough, Christopher A. Krygowski, Chung-lung K. Shum
  • Publication number: 20150006855
    Abstract: Predictive fetching and decoding for selected instructions (e.g., operating system instructions, hypervisor instructions or other such instructions). A determination is made that a selected instruction, such as a system call instruction, an asynchronous interrupt, a return from system call instruction or return from asynchronous interrupt, is to be executed. Based on determining that such an instruction is to be executed, a predicted address is determined for the selected instruction, which is the address to which processing transfers in order to provide the requested services. Then, fetching of instructions beginning at the predicted address prior to execution of the selected instruction is commenced. Further, speculative state relating to a selected instruction, including, for instance, an indication of the privilege level of the selected instruction or instructions executed on behalf of the selected instruction, is predicted and maintained.
    Type: Application
    Filed: June 28, 2013
    Publication date: January 1, 2015
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Publication number: 20150006854
    Abstract: Predictive fetching and decoding for selected instructions. A determination is made as to whether an instruction to be executed in a pipelined processor is a selected return instruction, the pipelined processor having a plurality of stages including an execute stage. Based on the instruction being the selected return instruction, obtaining from a data structure a predicted return address, the predicted return address being an address of an instruction to which it is predicted that processing is to be returned. Additionally, based on the instruction being the selected return instruction, operating state for the instruction at the predicted return address is predicted. The instruction is fetched at the predicted return address, prior to the selected return instruction reaching the execute stage, and decoding of the fetched instruction is initiated based on the predicted operating state.
    Type: Application
    Filed: June 28, 2013
    Publication date: January 1, 2015
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 8924651
    Abstract: An apparatus and method is described herein for optimization to prefetch throttling, which potentially enhances performance, reduces power consumption, and maintains positive gain for workloads that benefit from prefetching. More specifically, the optimizations described herein allow for bandwidth congestion and prefetch accuracy to be taken into account as feedbacks for throttling at the source of prefetch generation. As a result, when there is low congestion, full prefetch generation is allowed, even if the prefetch is inaccurate, since there is available bandwidth. However, when congestion is high, the determination of throttling falls to prefetch accuracy. If accuracy is high—miss rate is low—then less throttling is needed, because the prefetches are being utilized—performance is being enhanced.
    Type: Grant
    Filed: April 16, 2013
    Date of Patent: December 30, 2014
    Assignee: Intel Corporation
    Inventors: Perry P. Tang, Hemant G. Rotithor, Ryan L. Carlson, Nagi Aboulenein
  • Patent number: 8918626
    Abstract: The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. When an instruction retires during the lookahead mode, a working register which serves as a destination register for the instruction is not copied to a corresponding architectural register. Instead the architectural register is marked as invalid. Note that by not updating architectural registers during lookahead mode, the system eliminates the need to checkpoint the architectural registers prior to entering lookahead mode.
    Type: Grant
    Filed: November 10, 2011
    Date of Patent: December 23, 2014
    Assignee: Oracle International Corporation
    Inventors: Yuan C. Chou, Eric W. Mahurin
  • Publication number: 20140372731
    Abstract: A data processing system includes an execution pipeline that includes one or more programmable execution stages which execute execution threads to execute instructions to perform data processing operations. Instructions to be executed by a group of execution threads are first fetched into an instruction cache and then read from the instruction cache for execution by the thread group. When an instruction to be executed by a thread group is present in a cache line in the instruction cache, or is to be fetched into an allocated cache line in the instruction cache, a pointer to the location of the instruction in the instruction cache is stored for the thread group. This stored pointer is then used to retrieve the instruction for execution by the thread group from the instruction cache.
    Type: Application
    Filed: June 14, 2013
    Publication date: December 18, 2014
    Inventors: Jorn Nystad, Andreas Engh-Halstvedt
  • Publication number: 20140372730
    Abstract: The invention relates to the method of prefetching data in micro-processor buffer under software controls.
    Type: Application
    Filed: June 14, 2013
    Publication date: December 18, 2014
    Inventors: Muhammad Yasir Qadri, Nadia Nawaz Qadri, Klaus Dieter McDonald-Maier
  • Publication number: 20140344551
    Abstract: The dual-mode instruction fetching apparatus includes a mode register, a branch prediction unit, a Program Counter (PC) calculator, an Instruction Queue (IQ), and a fetch multiplexer. The mode register is set to one of normal mode and line mode. The PC calculator accesses a tag in which the address indices of instructions have been stored or a line in which the instructions have been grouped and then outputs an instruction, or accesses only the line and then outputs an instruction depending on the type of set mode. The IQ stores instructions selected by an instruction selector from among the instructions grouped in the line. The fetch multiplexer fetches the instructions stored in the IQ if the normal mode has been set, and fetches instructions read from the line of an instruction cache if the line mode has been set.
    Type: Application
    Filed: April 21, 2014
    Publication date: November 20, 2014
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Young-Su KWON
  • Patent number: 8850162
    Abstract: A method and system for implementing vector prefetch with streaming access detection is contemplated in which an execution unit such as a vector execution unit, for example, executes a vector memory access instruction that references an associated vector of effective addresses. The vector of effective addresses includes a number of elements, each of which includes a memory pointer. The vector memory access instruction is executable to perform multiple independent memory access operations using at least some of the memory pointers of the vector of effective addresses. A prefetch unit, for example, may detect a memory access streaming pattern based upon the vector of effective addresses, and in response to detecting the memory access streaming pattern, the prefetch unit may calculate one or more prefetch memory addresses based upon the memory access streaming pattern. Lastly, the prefetch unit may prefetch the one or more prefetch memory addresses into a memory.
    Type: Grant
    Filed: May 22, 2012
    Date of Patent: September 30, 2014
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20140281390
    Abstract: A data processor includes a packet selector. The packet selector creates an ordered list of packets, each packet corresponding to a respective communication flow, determines whether each packet in the ordered list of packets is eligible for transfer to a prefetch unit based on whether a preceding packet in the same communication flow has been transferred to the prefetch unit, and sets a selection priority for each packet based on start time constraints for the respective communication flow, and based on a processing status of a preceding packet in the communication flow.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: FREESCALE SEMICONDUCTOR, INC.
    Inventors: Timothy G. Boland, Anne C. Harris, Steven D. Millman
  • Publication number: 20140258682
    Abstract: Provided is a processor with a multi-pipeline fetch structure or a multi-cycle cache structure, including: an integer core which reads instruction transmitted from a lower block, executes an operation corresponding to the instruction, and transmits an instruction address to the lower block; an instruction buffer which stores instruction data which are requested by the integer core by using the instruction address and transmits the instruction data in response to the request of the integer core; and an instruction cache which stores a portion of data of a program memory and transmit the data to the instruction buffer in response to the request of the instruction buffer.
    Type: Application
    Filed: May 16, 2013
    Publication date: September 11, 2014
    Applicant: Advanced Digital Chips Inc.
    Inventors: YOUNG HO CHA, KWANG HO LEE, KWAN YOUNG KIM, BYUNG GUEON MIN