Prefetching Patents (Class 712/207)
-
Patent number: 9304773Abstract: A data processor (102) includes a prefetch buffer (112) and a fetch control unit (116). The prefetch buffer (112) has a plurality of lines. The prefetch buffer (112) has a variable maximum depth that defines a number of lines of the plurality of lines that are capable of storing instructions. The fetch control unit (116) is coupled to the prefetch buffer to monitor at least one of the plurality of lines of the prefetch buffer (112) and to adjust the variable maximum depth of the prefetch buffer (112) in response to a state of the data processor (102).Type: GrantFiled: March 21, 2006Date of Patent: April 5, 2016Assignee: FREESCALE SEMICONDUCTOR, INC.Inventors: Jeffrey W. Scott, William C. Moyer
-
Patent number: 9298644Abstract: Provided are method and device for managing a memory in a data stream management system (DSMS) of a portable device. The method includes moving data of a selected memory region that has a low priority to a secondary storage and storing a received data stream in the selected memory region.Type: GrantFiled: March 15, 2012Date of Patent: March 29, 2016Assignee: Samsung Electronics Co., Ltd.Inventors: Seung-woo Ryu, Seok-jin Hong, Keun-joo Kwon
-
Patent number: 9286224Abstract: In an embodiment, a processor includes at least one core having one or more execution units, a first cache memory and a first cache control logic. The first cache control logic may be configured to generate a first prefetch request to prefetch first data, where this request is to be aborted if the first data is not present in a second cache memory coupled to the first cache memory. Other embodiments are described and claimed.Type: GrantFiled: November 26, 2013Date of Patent: March 15, 2016Assignee: Intel CorporationInventors: Seth H. Pugsley, Robert L. Scott, Zeshan A. Chishti, Peng-Fei Chuang, Khun Ban, Christopher B. Wilkerson, Shih-Lien L. Lu, Kingsum Chow
-
Patent number: 9280351Abstract: Embodiments relate to second-level branch target buffer bulk transfer filtering. An aspect includes a system for second-level branch target buffer bulk transfer filtering. The system includes a first-level branch target buffer and a second-level branch target buffer coupled to a processing circuit. The processing circuit is configured to perform a method. The method includes receiving branch target buffer miss indicators, receiving instruction cache miss indicators, and recording information about the branch target buffer miss indicators and the instruction cache miss indicators in search trackers. Based on detecting, by the processing circuit, a search tracker representing a correlated pair of the branch target buffer miss indicators and the instruction cache miss indicators, the search tracker is activated by the processing circuit to perform a bulk transfer from the second-level branch target buffer to the first-level branch target buffer.Type: GrantFiled: June 15, 2012Date of Patent: March 8, 2016Assignee: International Business Machines CorporationInventors: James J. Bonanno, Ulrich Mayer, Brian R. Prasky
-
Patent number: 9268572Abstract: An modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).Type: GrantFiled: December 11, 2012Date of Patent: February 23, 2016Assignee: International Business Machines CorporationInventors: Michael K Gschwind, Eric M Schwarz
-
Patent number: 9250904Abstract: An modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).Type: GrantFiled: December 24, 2013Date of Patent: February 2, 2016Assignee: International Business Machines CorporationInventors: Michael K Gschwind, Eric M Schwarz
-
Patent number: 9251117Abstract: A reconfigurable circuit includes a reconfigurable arithmetic execution unit array including a plurality of arithmetic execution units and a network circuit to provide reconfigurable connections between the arithmetic execution units, a suspension control circuit configured to control suspension and resumption of operation of the reconfigurable arithmetic execution unit array, and a buffer circuit configured to temporarily store data supplied from an external source upon suspension of the operation of the reconfigurable arithmetic execution unit array and to supply the stored data to the reconfigurable arithmetic execution unit array upon resumption of the operation of the reconfigurable arithmetic execution unit array.Type: GrantFiled: March 12, 2010Date of Patent: February 2, 2016Assignee: CYPRESS SEMICONDUCTOR CORPORATIONInventors: Takashi Hanai, Shinichi Sutou
-
Patent number: 9207938Abstract: Embodiments herein relate to forwarding an instruction based on predication criteria. A predicate state associated with a packet of data is to be compared to an instruction associated with the predication criteria. The instruction is to be forwarded to an execution unit if the predication criteria includes or matches the predicate state of the packet.Type: GrantFiled: August 29, 2012Date of Patent: December 8, 2015Assignee: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.Inventors: David A Warren, Thomas A Keaveny
-
Patent number: 9201798Abstract: A computer implemented method for prefetching data. The method includes: receiving one or more addresses by a prefetching unit upon execution of an enqueuing command in a first piece of program logic; enqueuing each of the received addresses to a recording-list; identifying one of the positions in the recording-list as jump position; providing the identified jump position to a frame-shifter; using a sub-list of the recording-list defined by a shiftable frame as a playback-list; executing a frame-shift command which triggers the frame-shifter to shift the frame in dependence on the jump position to provide an updated playback-list; fetching data identified by the updated playback-list from a second memory; and transferring the fetched data to a first memory.Type: GrantFiled: October 9, 2013Date of Patent: December 1, 2015Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Hans Boettiger, Thilo Maurer
-
Patent number: 9189242Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.Type: GrantFiled: September 17, 2010Date of Patent: November 17, 2015Assignee: NVIDIA CorporationInventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
-
Patent number: 9177878Abstract: Indexing a plurality of die obtainable from a material wafer comprising a plurality of stacked material layers. Each die is obtained in a respective position of the wafer. A manufacturing stage comprises at least two steps for treating a respective superficial portion of the material wafer that corresponds to a subset of said plurality of dies using the at least one lithographic mask through the exposition to the proper radiation in temporal succession. The method may include providing a die index on each die which is indicative of the position of the respective die by forming an external index indicative of the position of the superficial portion of the material wafer corresponding to the subset of the plurality of dies including said die and may comprise a plurality of electronic components electrically coupled to each other by means of a respective common control line.Type: GrantFiled: July 2, 2014Date of Patent: November 3, 2015Assignee: STMicroelectronics S.r.l.Inventors: Daniele Alfredo Brambilla, Fausto Redigolo
-
Patent number: 9152482Abstract: The present invention provides a multi-core processor system, including: multiple central processor units and multiple groups of level-one hardware message queues. Each central processor unit is separately connected to a group of level-one hardware message queues and is configured to process messages in the level-one hardware message queues. Each group of level-one hardware message queues includes multiple level-one hardware message queues. Moreover, in each group of level-one hardware message queues, a level-one hardware message queue having a higher priority is scheduled preferentially, and level-one hardware message queues having the same priority are scheduled in a round-robin manner according to round robin scheduling weights. Through the multi-core processor system provided in the present invention, the efficiency and performance of the multi-core processor system are improved.Type: GrantFiled: April 28, 2014Date of Patent: October 6, 2015Assignee: Huawei Technologies Co., Ltd.Inventors: Weiguo Zhang, Libo Wu
-
Patent number: 9146740Abstract: Embodiments relate to branch prediction preloading. A method for branch prediction preloading includes fetching a plurality of instructions in an instruction stream, and decoding a branch prediction preload instruction in the instruction stream. The method also includes determining, by a processing circuit, an address of a predicted branch instruction based on the branch prediction preload instruction, and determining, by the processing circuit, a predicted target address of the predicted branch instruction based on the branch prediction preload instruction. The method further includes identifying a mask field in the branch prediction preload instruction, and determining, by the processing circuit, a branch instruction length of the predicted branch instruction based on the mask field.Type: GrantFiled: March 5, 2013Date of Patent: September 29, 2015Assignee: International Business Machines CorporationInventors: James J. Bonanno, Marcel Mitran, Brian R. Prasky, Joran Siu, Timothy J. Slegel, Alexander Vasilevskiy
-
Patent number: 9135011Abstract: A data processing system 2 is provided with branch prediction circuitry 20 for performing branch prediction operations. Next branch table circuitry 22 stores data identifying from a given branch instruction what will be the address of the next branch instruction to be encountered within the program flow. This next branch instruction address is supplied to the branch prediction circuitry 20 which uses it to form its prediction prior to that next branch instruction being identified as such by the instruction decoder 16. This permits branch prediction to commence earlier in the branch prediction circuitry 20 than would otherwise be the case.Type: GrantFiled: April 30, 2012Date of Patent: September 15, 2015Assignee: The Regents of the University of MichiganInventors: David Thomas Manville, Trevor Nigel Mudge
-
Patent number: 9069675Abstract: Systems and Program Products are created to execute a prefetch data machine instruction having an M field performs a function on a cache line of data specifying an address of an operand. The operation comprises either prefetching a cache line of data from memory to a cache or reducing the access ownership of store and fetch or fetch only of the cache line in the cache or a combination thereof. The address of the operand is either based on a register value or the program counter value pointing to the prefetch data machine instruction.Type: GrantFiled: March 21, 2014Date of Patent: June 30, 2015Assignee: International Business Machines CorporationInventors: Dan F Greiner, Timothy J Slegel
-
Patent number: 9043579Abstract: A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth.Type: GrantFiled: January 10, 2012Date of Patent: May 26, 2015Assignee: International Business Machines CorporationInventor: Randall Ray Heisch
-
Publication number: 20150143084Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.Type: ApplicationFiled: December 12, 2014Publication date: May 21, 2015Applicant: INTEL CORPORATIONInventors: Maxim Loktyukhin, Eric W. Mahurin, Bret L. Toll, Martin G. Dixon, Sean P. Mirkes, David L. Kreitzer, ELMOUSTAPHA OULD-AHMED-VALL, Vinodh Gopal
-
Patent number: 9037835Abstract: A data processing device includes processing circuitry 20 for executing a first memory access instruction to a first address of a memory device 40 and a second memory access instruction to a second address of the memory device 40, the first address being different from the second address. The data processing device also includes prefetching circuitry 30 for prefetching data from the memory device 40 based on a stride length 70 and instruction analysis circuitry 50 for determining a difference between the first address and the second address. Stride refining circuitry 60 is also provided to refine the stride length based on factors of the stride length and factors of the difference calculated by the instruction analysis circuitry 50.Type: GrantFiled: October 24, 2013Date of Patent: May 19, 2015Assignee: ARM LimitedInventors: Ganesh Suryanarayan Dasika, Rune Holm
-
Publication number: 20150134933Abstract: A data processing apparatus and method of data processing are disclosed. An instruction execution unit executes a sequence of program instructions, wherein execution of at least some of the program instructions initiates memory access requests to retrieve data values from a memory. A prefetch unit prefetches data values from the memory for storage in a cache unit before they are requested by the instruction execution unit. The prefetch unit is configured to perform a miss response comprising increasing a number of the future data values which it prefetches, when a memory access request specifies a pending data value which is already subject to prefetching but is not yet stored in the cache unit. The prefetch unit is also configured, in response to an inhibition condition being met, to temporarily inhibit the miss response for an inhibition period.Type: ApplicationFiled: November 14, 2013Publication date: May 14, 2015Applicant: ARM LimitedInventors: Rune HOLM, Ganesh Suryanarayan Dasika
-
Publication number: 20150121039Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set.Type: ApplicationFiled: December 30, 2014Publication date: April 30, 2015Applicant: Intel CorporationInventors: William W. Macy, JR., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
-
Publication number: 20150121038Abstract: A single instruction multiple thread (SIMT) processor 2 includes execution circuitry 6, prefetch circuitry 12 and prefetch strategy selection circuitry 14. The prefetch strategy selection circuitry serves to detect one or more characteristics of a stream of program instructions that are being executed to identify whether or not a given data access instruction within a program will be executed a plurality of times. The prefetch strategy to use is selected from a plurality of selectable prefetch strategy in dependence upon the detection of such characteristics.Type: ApplicationFiled: October 24, 2013Publication date: April 30, 2015Applicant: ARM LIMITEDInventors: Ganesh Suryanarayan DASIKA, Rune HOLM, David Hennah MANSELL
-
Patent number: 9020211Abstract: A data processing apparatus which sequentially executes a verification process so as to recognize a target object, comprising: an obtaining unit configured to obtain dictionary data to be referred to in the verification process; a holding unit configured to hold a plurality of dictionary data; a verification unit configured to execute the verification process for the input data by referring to one dictionary data; a history holding unit configured to hold a verification result; and a prefetch determination unit configured to determine based on the verification result whether to execute prefetch processing in which the obtaining unit obtains in advance dictionary data to be referred to by the verification unit in a succeeding verification process, and holds the dictionary data in the holding unit before the succeeding verification process.Type: GrantFiled: May 10, 2012Date of Patent: April 28, 2015Assignee: Canon Kabushiki KaishaInventor: Akiyoshi Momoi
-
Publication number: 20150113249Abstract: Methods and apparatus to perform adaptive pre-fetch operations in managed runtime environments are disclosed herein. An example disclosed method includes determining an object size associated with a pre-fetch operation; comparing the object size to a first one of a series of thresholds having increasing respective values; when the object size is less than the first one of the series of thresholds, pre-fetching a first amount of stored data assigned to the first one of the series of thresholds; and when the object size is greater than the first one of the plurality of thresholds, comparing the object size to a next one of the series of thresholds.Type: ApplicationFiled: December 30, 2014Publication date: April 23, 2015Inventor: Mingqiu Sun
-
Patent number: 9015720Abstract: A system and method to optimize processor performance and minimizing average thread latency by selectively loading a cache when a program state, resources required for execution of a program or the program itself change, is described. An embodiment of the invention supports a “cache priming program” that is selectively executed for a first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in cache(s), and/or whenever resources required for program execution or the program itself changes. By pre-loading the cache with two resources required for two instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.Type: GrantFiled: January 6, 2009Date of Patent: April 21, 2015Assignee: Advanced Micro Devices, Inc.Inventors: Andrew Brown, Brian Emberling
-
Publication number: 20150106590Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.Type: ApplicationFiled: October 14, 2013Publication date: April 16, 2015Applicant: Oracle International CorporationInventor: Yuan C. Chou
-
Patent number: 9009734Abstract: One or more embodiments of the invention is a computer-implemented method for speculatively executing application event responses. The method includes the steps of identifying one or more event responses that could be issued for execution by an application being executed by a master process, for each event response, generating a child process to execute the event response, determining that a first event response included in the one or more event responses has been issued for execution by the application, committing the child process associated with the first event response as a new master process, and aborting the master process and all child processes other than the child process associated with the first event response.Type: GrantFiled: March 6, 2012Date of Patent: April 14, 2015Assignee: AUTODESK, Inc.Inventor: Francesco Iorio
-
Patent number: 9009449Abstract: A system that executes program instructions on a processor is described. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. While executing in the lookahead mode, if the processor determines that the lookahead mode is unlikely to uncover any additional outer-level cache misses, the system terminates the lookahead mode. Then, after the unresolved data dependency is resolved, the system recommences execution in the normal-execution mode from the instruction that triggered the lookahead mode.Type: GrantFiled: November 10, 2011Date of Patent: April 14, 2015Assignee: Oracle International CorporationInventors: Yuan C. Chou, Eric W. Mahurin
-
Publication number: 20150100762Abstract: A processor includes an instruction fetch unit and an execution unit. The instruction fetch unit retrieves instructions from memory to be executed by the execution unit. The instruction fetch unit includes a branch prediction unit which is configured to predict whether a branch instruction is likely to be executed. The memory includes an instruction cache comprising a portion of the fetch blocks available in the memory. The instruction fetch unit may use a combination of way prediction and serialized access to retrieve instructions from the instruction cache. The instruction fetch unit initially accesses the instruction cache to retrieve the predicted fetch block associated with a way prediction. The instruction fetch unit compares a cache tag associated with the way prediction with the address of the cache line that includes the predicted fetch block. If the tag matches, then the way prediction is correct and the retrieved fetch block is valid.Type: ApplicationFiled: October 6, 2014Publication date: April 9, 2015Inventor: Eino Jacobs
-
Patent number: 9003421Abstract: Disclosed are embodiments of a system, methods and mechanism for using idle thread units to perform acceleration threads that are transparent to the operating system. When the operating system scheduler has no work to schedule on the idle thread units, the operating system may issue a halt or monitor/mwait or other instruction to place the thread unit into an idle state. While the thread unit is idle, from the operating system perspective, the thread unit may be utilized to perform speculative acceleration threads in order to accelerate threads running on non-idle thread units. The context of the idle thread unit is saved prior to execution of the acceleration thread and is restored when the operating system requires use of the thread unit. The acceleration threads are transparent to the operating system. Other embodiments are also described and claimed.Type: GrantFiled: November 28, 2005Date of Patent: April 7, 2015Assignee: Intel CorporationInventors: Ron Gabor, Gad Sheaffer, Avi Mendelson, Uri C. Weiser, Hong Wang
-
Publication number: 20150095616Abstract: The present invention realizes an efficient superscalar instruction issue and low power consumption at an instruction set including instructions with prefixes. An instruction fetch unit is adopted which determines whether an instruction code is of a prefix code or an instruction code other than it, and outputs the result of determination and the 16-bit instruction code. Along with it, decoders each of which decodes the instruction code, based on the result of determination, and decoders each of which decodes the prefix code, are disposed separately. Further, a prefix is supplied to each decoder prior to a fixed-length instruction code like 16 bits modified with it. A fixed-length instruction code following the prefix code is supplied to each decoder of the same pipeline as the decoder for the prefix code.Type: ApplicationFiled: December 9, 2014Publication date: April 2, 2015Inventors: Hiroaki Nakaya, Yuki Kondoh, Makoto Ishikawa
-
Publication number: 20150089196Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.Type: ApplicationFiled: December 1, 2014Publication date: March 26, 2015Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
-
Publication number: 20150089195Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.Type: ApplicationFiled: December 1, 2014Publication date: March 26, 2015Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
-
Publication number: 20150089197Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.Type: ApplicationFiled: December 1, 2014Publication date: March 26, 2015Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
-
Patent number: 8978022Abstract: Embodiments include systems and methods for reducing instruction cache miss penalties during application execution. Application code is profiled to determine “hot” code regions likely to experience instruction cache miss penalties. The application code can be linearized into a set of traces that include the hot code regions. Embodiments traverse the traces in reverse, keeping track of instruction scheduling information, to determine where an accumulated instruction latency covered by the code blocks exceeds an amount of latency that can be covered by prefetching. Each time the accumulated latency exceeds the amount of latency that can be covered by prefetching, a prefetch instruction can be scheduled in the application code. Some embodiments insert additional prefetches, merge prefetches, and/or adjust placement of prefetches to account for scenarios, such as loops, merging or forking branches, edge confidence values, etc.Type: GrantFiled: January 10, 2013Date of Patent: March 10, 2015Assignee: Oracle International CorporationInventors: Spiros Kalogeropulos, Partha Tirumalai
-
Patent number: 8977815Abstract: A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.Type: GrantFiled: November 29, 2010Date of Patent: March 10, 2015Assignee: ARM LimitedInventors: Frode Heggelund, Rune Holm, Andreas Due Engh-Halstvedt, Edvard Feilding
-
Publication number: 20150067263Abstract: A microprocessor includes a plurality of processing cores, a service processing unit and a memory accessible by both the service processing unit and the plurality of processing cores. At least one of the plurality of processing cores is configured to write a patch to the memory. The patch comprises one or more instructions to be fetched from the memory and executed by the service processing unit after written to the memory by the at least one of the plurality of processing cores.Type: ApplicationFiled: May 19, 2014Publication date: March 5, 2015Applicant: VIA TECHNOLOGIES, INC.Inventors: G. Glenn Henry, Stephan Gaskins
-
Publication number: 20150058600Abstract: A data processor of an embodiment includes a memory, an instruction cache, a processing unit (CPU), and a fetch process control unit. The memory stores a program in which a plurality of instructions are written. The instruction cache operates only when a branch instruction included in the program is executed, and data of a greater capacity than a width of a bus of the memory is read from the memory and stored in the instruction cache in advance. The processing unit accesses both the memory and the instruction cache and executes, in a pipelined manner, instructions read from the memory or the instruction cache. The fetch process control unit generates, in response to a branch instruction executed by the processing unit, a stop signal for stopping a fetch process of reading an instruction from the memory, and outputs the stop signal to the memory.Type: ApplicationFiled: February 14, 2012Publication date: February 26, 2015Applicant: RENESAS ELECTRONICS CORPORATIONInventor: Masakatsu Ishizaki
-
Patent number: 8954714Abstract: An apparatus includes a processor. The processor includes two memories. The first memory stores one set of instructions. The second memory stores another set of instructions that are longer than the set of instructions in the first memory. An instruction in the set of instructions in the first memory is used as a pointer to a corresponding instruction in the set of instructions in the second memory.Type: GrantFiled: February 1, 2010Date of Patent: February 10, 2015Assignee: Altera CorporationInventor: Steven Perry
-
Patent number: 8949579Abstract: A processor of an information handling system (IHS) initiates an L3 cache prefetch operation in response to a demand load during instruction processing. The processor selects an L3 cache prefetch at random for tracking as a target prefetched instruction. The processor initiates an L1 cache target prefetch operation and stores the resultant target prefetched instruction in the L1 cache. If a demand load arrives, the processor analyzes the target prefetched instruction for effectiveness and determines the source of the prefetch data. If a demand does not arrive, the processor tests to determine if the particular prefetched instruction timed out in the cache and identifies the ineffectiveness of the prefetch operation. The processor samples multiple prefetch operations at random and generates a history of prefetch effectiveness and other useful prefetch information. The processor stores the prefetch effectiveness information to enable reduction or removal of ineffective prefetch operations.Type: GrantFiled: October 4, 2010Date of Patent: February 3, 2015Assignee: International Business Machines CorporationInventors: Miles R. Dooley, Venkat R. Indukuru, Alex E. Mericas, Francis P. O'Connell
-
Publication number: 20150019841Abstract: Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. A method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing a fetch table that is indicated by the fetch table address by the scout core. The fetch table indicates how many of pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes based on the fetch table indicating, fetching the pieces of content by the scout core. The method includes returning the fetched pieces of content to the parent core.Type: ApplicationFiled: September 30, 2014Publication date: January 15, 2015Inventors: Brian R. Prasky, Fadi Y. Busaba, Steven R. Carlough, Christopher A. Krygowski, Chung-lung K. Shum
-
Publication number: 20150006855Abstract: Predictive fetching and decoding for selected instructions (e.g., operating system instructions, hypervisor instructions or other such instructions). A determination is made that a selected instruction, such as a system call instruction, an asynchronous interrupt, a return from system call instruction or return from asynchronous interrupt, is to be executed. Based on determining that such an instruction is to be executed, a predicted address is determined for the selected instruction, which is the address to which processing transfers in order to provide the requested services. Then, fetching of instructions beginning at the predicted address prior to execution of the selected instruction is commenced. Further, speculative state relating to a selected instruction, including, for instance, an indication of the privilege level of the selected instruction or instructions executed on behalf of the selected instruction, is predicted and maintained.Type: ApplicationFiled: June 28, 2013Publication date: January 1, 2015Inventors: Michael K. Gschwind, Valentina Salapura
-
Publication number: 20150006854Abstract: Predictive fetching and decoding for selected instructions. A determination is made as to whether an instruction to be executed in a pipelined processor is a selected return instruction, the pipelined processor having a plurality of stages including an execute stage. Based on the instruction being the selected return instruction, obtaining from a data structure a predicted return address, the predicted return address being an address of an instruction to which it is predicted that processing is to be returned. Additionally, based on the instruction being the selected return instruction, operating state for the instruction at the predicted return address is predicted. The instruction is fetched at the predicted return address, prior to the selected return instruction reaching the execute stage, and decoding of the fetched instruction is initiated based on the predicted operating state.Type: ApplicationFiled: June 28, 2013Publication date: January 1, 2015Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 8924651Abstract: An apparatus and method is described herein for optimization to prefetch throttling, which potentially enhances performance, reduces power consumption, and maintains positive gain for workloads that benefit from prefetching. More specifically, the optimizations described herein allow for bandwidth congestion and prefetch accuracy to be taken into account as feedbacks for throttling at the source of prefetch generation. As a result, when there is low congestion, full prefetch generation is allowed, even if the prefetch is inaccurate, since there is available bandwidth. However, when congestion is high, the determination of throttling falls to prefetch accuracy. If accuracy is high—miss rate is low—then less throttling is needed, because the prefetches are being utilized—performance is being enhanced.Type: GrantFiled: April 16, 2013Date of Patent: December 30, 2014Assignee: Intel CorporationInventors: Perry P. Tang, Hemant G. Rotithor, Ryan L. Carlson, Nagi Aboulenein
-
Patent number: 8918626Abstract: The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. When an instruction retires during the lookahead mode, a working register which serves as a destination register for the instruction is not copied to a corresponding architectural register. Instead the architectural register is marked as invalid. Note that by not updating architectural registers during lookahead mode, the system eliminates the need to checkpoint the architectural registers prior to entering lookahead mode.Type: GrantFiled: November 10, 2011Date of Patent: December 23, 2014Assignee: Oracle International CorporationInventors: Yuan C. Chou, Eric W. Mahurin
-
Publication number: 20140372731Abstract: A data processing system includes an execution pipeline that includes one or more programmable execution stages which execute execution threads to execute instructions to perform data processing operations. Instructions to be executed by a group of execution threads are first fetched into an instruction cache and then read from the instruction cache for execution by the thread group. When an instruction to be executed by a thread group is present in a cache line in the instruction cache, or is to be fetched into an allocated cache line in the instruction cache, a pointer to the location of the instruction in the instruction cache is stored for the thread group. This stored pointer is then used to retrieve the instruction for execution by the thread group from the instruction cache.Type: ApplicationFiled: June 14, 2013Publication date: December 18, 2014Inventors: Jorn Nystad, Andreas Engh-Halstvedt
-
Publication number: 20140372730Abstract: The invention relates to the method of prefetching data in micro-processor buffer under software controls.Type: ApplicationFiled: June 14, 2013Publication date: December 18, 2014Inventors: Muhammad Yasir Qadri, Nadia Nawaz Qadri, Klaus Dieter McDonald-Maier
-
Publication number: 20140344551Abstract: The dual-mode instruction fetching apparatus includes a mode register, a branch prediction unit, a Program Counter (PC) calculator, an Instruction Queue (IQ), and a fetch multiplexer. The mode register is set to one of normal mode and line mode. The PC calculator accesses a tag in which the address indices of instructions have been stored or a line in which the instructions have been grouped and then outputs an instruction, or accesses only the line and then outputs an instruction depending on the type of set mode. The IQ stores instructions selected by an instruction selector from among the instructions grouped in the line. The fetch multiplexer fetches the instructions stored in the IQ if the normal mode has been set, and fetches instructions read from the line of an instruction cache if the line mode has been set.Type: ApplicationFiled: April 21, 2014Publication date: November 20, 2014Applicant: Electronics and Telecommunications Research InstituteInventor: Young-Su KWON
-
Patent number: 8850162Abstract: A method and system for implementing vector prefetch with streaming access detection is contemplated in which an execution unit such as a vector execution unit, for example, executes a vector memory access instruction that references an associated vector of effective addresses. The vector of effective addresses includes a number of elements, each of which includes a memory pointer. The vector memory access instruction is executable to perform multiple independent memory access operations using at least some of the memory pointers of the vector of effective addresses. A prefetch unit, for example, may detect a memory access streaming pattern based upon the vector of effective addresses, and in response to detecting the memory access streaming pattern, the prefetch unit may calculate one or more prefetch memory addresses based upon the memory access streaming pattern. Lastly, the prefetch unit may prefetch the one or more prefetch memory addresses into a memory.Type: GrantFiled: May 22, 2012Date of Patent: September 30, 2014Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Publication number: 20140281390Abstract: A data processor includes a packet selector. The packet selector creates an ordered list of packets, each packet corresponding to a respective communication flow, determines whether each packet in the ordered list of packets is eligible for transfer to a prefetch unit based on whether a preceding packet in the same communication flow has been transferred to the prefetch unit, and sets a selection priority for each packet based on start time constraints for the respective communication flow, and based on a processing status of a preceding packet in the communication flow.Type: ApplicationFiled: March 13, 2013Publication date: September 18, 2014Applicant: FREESCALE SEMICONDUCTOR, INC.Inventors: Timothy G. Boland, Anne C. Harris, Steven D. Millman
-
Publication number: 20140258682Abstract: Provided is a processor with a multi-pipeline fetch structure or a multi-cycle cache structure, including: an integer core which reads instruction transmitted from a lower block, executes an operation corresponding to the instruction, and transmits an instruction address to the lower block; an instruction buffer which stores instruction data which are requested by the integer core by using the instruction address and transmits the instruction data in response to the request of the integer core; and an instruction cache which stores a portion of data of a program memory and transmit the data to the instruction buffer in response to the request of the instruction buffer.Type: ApplicationFiled: May 16, 2013Publication date: September 11, 2014Applicant: Advanced Digital Chips Inc.Inventors: YOUNG HO CHA, KWANG HO LEE, KWAN YOUNG KIM, BYUNG GUEON MIN