Prefetching Patents (Class 712/207)

Data processor having dynamic control of instruction prefetch buffer depth and method therefor

Patent number: 9304773

Abstract: A data processor (102) includes a prefetch buffer (112) and a fetch control unit (116). The prefetch buffer (112) has a plurality of lines. The prefetch buffer (112) has a variable maximum depth that defines a number of lines of the plurality of lines that are capable of storing instructions. The fetch control unit (116) is coupled to the prefetch buffer to monitor at least one of the plurality of lines of the prefetch buffer (112) and to adjust the variable maximum depth of the prefetch buffer (112) in response to a state of the data processor (102).

Type: Grant

Filed: March 21, 2006

Date of Patent: April 5, 2016

Assignee: FREESCALE SEMICONDUCTOR, INC.

Inventors: Jeffrey W. Scott, William C. Moyer
Method and portable device for managing memory in a data stream management system using priority information

Patent number: 9298644

Abstract: Provided are a method and a device for managing a memory in a data stream management system (DSMS) of a portable device. The method includes moving data of a selected memory region that has a low priority to a secondary storage and storing a received data stream in the selected memory region.

Type: Grant

Filed: March 15, 2012

Date of Patent: March 29, 2016

Assignee: Samsung Electronics Co., Ltd.

Inventors: Seung-woo Ryu, Seok-jin Hong, Keun-joo Kwon
Constraining prefetch requests to a processor socket

Patent number: 9286224

Abstract: In an embodiment, a processor includes at least one core having one or more execution units, a first cache memory and a first cache control logic. The first cache control logic may be configured to generate a first prefetch request to prefetch first data, where this request is to be aborted if the first data is not present in a second cache memory coupled to the first cache memory. Other embodiments are described and claimed.

Type: Grant

Filed: November 26, 2013

Date of Patent: March 15, 2016

Assignee: Intel Corporation

Inventors: Seth H. Pugsley, Robert L. Scott, Zeshan A. Chishti, Peng-Fei Chuang, Khun Ban, Christopher B. Wilkerson, Shih-Lien L. Lu, Kingsum Chow
Second-level branch target buffer bulk transfer filtering

Patent number: 9280351

Abstract: Embodiments relate to second-level branch target buffer bulk transfer filtering. An aspect includes a system for second-level branch target buffer bulk transfer filtering. The system includes a first-level branch target buffer and a second-level branch target buffer coupled to a processing circuit. The processing circuit is configured to perform a method. The method includes receiving branch target buffer miss indicators, receiving instruction cache miss indicators, and recording information about the branch target buffer miss indicators and the instruction cache miss indicators in search trackers. Based on detecting, by the processing circuit, a search tracker representing a correlated pair of the branch target buffer miss indicators and the instruction cache miss indicators, the search tracker is activated by the processing circuit to perform a bulk transfer from the second-level branch target buffer to the first-level branch target buffer.

Type: Grant

Filed: June 15, 2012

Date of Patent: March 8, 2016

Assignee: International Business Machines Corporation

Inventors: James J. Bonanno, Ulrich Mayer, Brian R. Prasky
Modify and execute next sequential instruction facility and instructions therefor

Patent number: 9268572

Abstract: A modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).

Type: Grant

Filed: December 11, 2012

Date of Patent: February 23, 2016

Assignee: International Business Machines Corporation

Inventors: Michael K Gschwind, Eric M Schwarz
Modify and execute sequential instruction facility and instructions therefor

Patent number: 9250904

Abstract: A modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA).

Type: Grant

Filed: December 24, 2013

Date of Patent: February 2, 2016

Assignee: International Business Machines Corporation

Inventors: Michael K Gschwind, Eric M Schwarz
Reconfigurable circuit with suspension control circuit

Patent number: 9251117

Abstract: A reconfigurable circuit includes a reconfigurable arithmetic execution unit array including a plurality of arithmetic execution units and a network circuit to provide reconfigurable connections between the arithmetic execution units, a suspension control circuit configured to control suspension and resumption of operation of the reconfigurable arithmetic execution unit array, and a buffer circuit configured to temporarily store data supplied from an external source upon suspension of the operation of the reconfigurable arithmetic execution unit array and to supply the stored data to the reconfigurable arithmetic execution unit array upon resumption of the operation of the reconfigurable arithmetic execution unit array.

Type: Grant

Filed: March 12, 2010

Date of Patent: February 2, 2016

Assignee: CYPRESS SEMICONDUCTOR CORPORATION

Inventors: Takashi Hanai, Shinichi Sutou
Instruction forwarding based on predication criteria

Patent number: 9207938

Abstract: Embodiments herein relate to forwarding an instruction based on predication criteria. A predicate state associated with a packet of data is to be compared to an instruction associated with the predication criteria. The instruction is to be forwarded to an execution unit if the predication criteria includes or matches the predicate state of the packet.

Type: Grant

Filed: August 29, 2012

Date of Patent: December 8, 2015

Assignee: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.

Inventors: David A Warren, Thomas A Keaveny
Processor instruction based data prefetching

Patent number: 9201798

Abstract: A computer implemented method for prefetching data. The method includes: receiving one or more addresses by a prefetching unit upon execution of an enqueuing command in a first piece of program logic; enqueuing each of the received addresses to a recording-list; identifying one of the positions in the recording-list as jump position; providing the identified jump position to a frame-shifter; using a sub-list of the recording-list defined by a shiftable frame as a playback-list; executing a frame-shift command which triggers the frame-shifter to shift the frame in dependence on the jump position to provide an updated playback-list; fetching data identified by the updated playback-list from a second memory; and transferring the fetched data to a first memory.

Type: Grant

Filed: October 9, 2013

Date of Patent: December 1, 2015

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Hans Boettiger, Thilo Maurer
Credit-based streaming multiprocessor warp scheduling

Patent number: 9189242

Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.

Type: Grant

Filed: September 17, 2010

Date of Patent: November 17, 2015

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
Method for indexing dies comprising integrated circuits

Patent number: 9177878

Abstract: Indexing a plurality of die obtainable from a material wafer comprising a plurality of stacked material layers. Each die is obtained in a respective position of the wafer. A manufacturing stage comprises at least two steps for treating a respective superficial portion of the material wafer that corresponds to a subset of said plurality of dies using the at least one lithographic mask through the exposition to the proper radiation in temporal succession. The method may include providing a die index on each die which is indicative of the position of the respective die by forming an external index indicative of the position of the superficial portion of the material wafer corresponding to the subset of the plurality of dies including said die and may comprise a plurality of electronic components electrically coupled to each other by means of a respective common control line.

Type: Grant

Filed: July 2, 2014

Date of Patent: November 3, 2015

Assignee: STMicroelectronics S.r.l.

Inventors: Daniele Alfredo Brambilla, Fausto Redigolo
Multi-core processor system

Patent number: 9152482

Abstract: The present invention provides a multi-core processor system, including: multiple central processor units and multiple groups of level-one hardware message queues. Each central processor unit is separately connected to a group of level-one hardware message queues and is configured to process messages in the level-one hardware message queues. Each group of level-one hardware message queues includes multiple level-one hardware message queues. Moreover, in each group of level-one hardware message queues, a level-one hardware message queue having a higher priority is scheduled preferentially, and level-one hardware message queues having the same priority are scheduled in a round-robin manner according to round robin scheduling weights. Through the multi-core processor system provided in the present invention, the efficiency and performance of the multi-core processor system are improved.

Type: Grant

Filed: April 28, 2014

Date of Patent: October 6, 2015

Assignee: Huawei Technologies Co., Ltd.

Inventors: Weiguo Zhang, Libo Wu
Branch prediction preloading

Patent number: 9146740

Abstract: Embodiments relate to branch prediction preloading. A method for branch prediction preloading includes fetching a plurality of instructions in an instruction stream, and decoding a branch prediction preload instruction in the instruction stream. The method also includes determining, by a processing circuit, an address of a predicted branch instruction based on the branch prediction preload instruction, and determining, by the processing circuit, a predicted target address of the predicted branch instruction based on the branch prediction preload instruction. The method further includes identifying a mask field in the branch prediction preload instruction, and determining, by the processing circuit, a branch instruction length of the predicted branch instruction based on the mask field.

Type: Grant

Filed: March 5, 2013

Date of Patent: September 29, 2015

Assignee: International Business Machines Corporation

Inventors: James J. Bonanno, Marcel Mitran, Brian R. Prasky, Joran Siu, Timothy J. Slegel, Alexander Vasilevskiy
Next branch table for use with a branch predictor

Patent number: 9135011

Abstract: A data processing system 2 is provided with branch prediction circuitry 20 for performing branch prediction operations. Next branch table circuitry 22 stores data identifying from a given branch instruction what will be the address of the next branch instruction to be encountered within the program flow. This next branch instruction address is supplied to the branch prediction circuitry 20 which uses it to form its prediction prior to that next branch instruction being identified as such by the instruction decoder 16. This permits branch prediction to commence earlier in the branch prediction circuitry 20 than would otherwise be the case.

Type: Grant

Filed: April 30, 2012

Date of Patent: September 15, 2015

Assignee: The Regents of the University of Michigan

Inventors: David Thomas Manville, Trevor Nigel Mudge
Creating a program product or system for executing an instruction for pre-fetching data and releasing cache lines

Patent number: 9069675

Abstract: Systems and program products are created to execute a prefetch data machine instruction. The instruction has an M field, specifies an address of an operand, and performs a function on a cache line of data. The operation comprises either prefetching a cache line of data from memory to a cache or reducing the access ownership of store and fetch or fetch only of the cache line in the cache or a combination thereof. The address of the operand is either based on a register value or the program counter value pointing to the prefetch data machine instruction.

Type: Grant

Filed: March 21, 2014

Date of Patent: June 30, 2015

Assignee: International Business Machines Corporation

Inventors: Dan F Greiner, Timothy J Slegel
Prefetch optimizer measuring execution time of instruction sequence cycling through each selectable hardware prefetch depth and cycling through disabling each software prefetch instruction of an instruction sequence of interest

Patent number: 9043579

Abstract: A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth.

Type: Grant

Filed: January 10, 2012

Date of Patent: May 26, 2015

Assignee: International Business Machines Corporation

Inventor: Randall Ray Heisch
HAND HELD DEVICE TO PERFORM A BIT RANGE ISOLATION INSTRUCTION

Publication number: 20150143084

Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result may have: (1) a first range of bits, having a first end explicitly specified by the instruction, in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable media storing such an instruction are also disclosed.

Type: Application

Filed: December 12, 2014

Publication date: May 21, 2015

Applicant: INTEL CORPORATION

Inventors: Maxim Loktyukhin, Eric W. Mahurin, Bret L. Toll, Martin G. Dixon, Sean P. Mirkes, David L. Kreitzer, ELMOUSTAPHA OULD-AHMED-VALL, Vinodh Gopal
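The bit range isolation described in the abstract of publication 20150143084 can be illustrated with a minimal sketch. The function name `isolate_low_bits`, the 32-bit default width, the choice of bit 0 as the fixed end of the kept range, and zero as the fill value are all assumptions made for this example; the abstract only states that one end of the range is specified by the instruction and that the remaining bits share a single value.

```python
def isolate_low_bits(source: int, end: int, width: int = 32) -> int:
    """Keep bits [0, end) of source in place and clear every bit at or above end.

    Hypothetical model: the kept range is anchored at bit 0 and the
    remaining bits are filled with zero. Neither choice comes from the patent.
    """
    if end < 0:
        raise ValueError("end must be non-negative")
    full_mask = (1 << width) - 1
    # An end at or beyond the operand width keeps the whole operand.
    keep_mask = full_mask if end >= width else (1 << end) - 1
    # The kept bits are masked, never shifted, so they stay in their
    # original positions relative to the source operand.
    return source & keep_mask


print(bin(isolate_low_bits(0b10110110, 4)))
```

Because the kept bits are only masked, the result needs no shift to realign them with the source, which matches the property the abstract emphasizes.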
Data processing method and apparatus for prefetching

Patent number: 9037835

Abstract: A data processing device includes processing circuitry 20 for executing a first memory access instruction to a first address of a memory device 40 and a second memory access instruction to a second address of the memory device 40, the first address being different from the second address. The data processing device also includes prefetching circuitry 30 for prefetching data from the memory device 40 based on a stride length 70 and instruction analysis circuitry 50 for determining a difference between the first address and the second address. Stride refining circuitry 60 is also provided to refine the stride length based on factors of the stride length and factors of the difference calculated by the instruction analysis circuitry 50.

Type: Grant

Filed: October 24, 2013

Date of Patent: May 19, 2015

Assignee: ARM Limited

Inventors: Ganesh Suryanarayan Dasika, Rune Holm
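One way to read the stride refinement in the abstract of patent 9037835 is as a greatest-common-factor computation. The sketch below assumes exactly that: the refined stride is the largest factor shared by the current stride and the observed address difference. The function name `refine_stride` is hypothetical, and the patent may combine the factors differently.

```python
from math import gcd


def refine_stride(stride: int, first_address: int, second_address: int) -> int:
    """Return a refined stride under the greatest-common-factor assumption."""
    difference = abs(second_address - first_address)
    if stride <= 0 or difference == 0:
        raise ValueError("stride must be positive and the addresses must differ")
    # Any stride that explains both the current stride and the observed
    # difference must divide both, so take their largest shared factor.
    return gcd(stride, difference)


print(refine_stride(12, 0x1000, 0x1008))
```

With a current stride of 12 and two accesses 8 bytes apart, this sketch settles on a stride of 4, the largest step consistent with both observations.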
ADAPTIVE PREFETCHING IN A DATA PROCESSING APPARATUS

Publication number: 20150134933

Abstract: A data processing apparatus and method of data processing are disclosed. An instruction execution unit executes a sequence of program instructions, wherein execution of at least some of the program instructions initiates memory access requests to retrieve data values from a memory. A prefetch unit prefetches data values from the memory for storage in a cache unit before they are requested by the instruction execution unit. The prefetch unit is configured to perform a miss response comprising increasing a number of the future data values which it prefetches, when a memory access request specifies a pending data value which is already subject to prefetching but is not yet stored in the cache unit. The prefetch unit is also configured, in response to an inhibition condition being met, to temporarily inhibit the miss response for an inhibition period.

Type: Application

Filed: November 14, 2013

Publication date: May 14, 2015

Applicant: ARM Limited

Inventors: Rune HOLM, Ganesh Suryanarayan Dasika
METHOD AND APPARATUS FOR SHUFFLING DATA

Publication number: 20150121039

Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is set.

Type: Application

Filed: December 30, 2014

Publication date: April 30, 2015

Applicant: Intel Corporation

Inventors: William W. Macy, JR., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
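The shuffle described in the abstract of publication 20150121039 can be sketched in a few lines. The control-element layout used here (bit 7 as the flush-to-zero flag, the low bits as the source index, taken modulo L) is an assumption chosen for illustration, and the function name `shuffle_elements` is hypothetical.

```python
FLUSH_TO_ZERO = 0x80  # assumed position of the flush-to-zero flag
INDEX_MASK = 0x7F     # assumed position of the source-index field


def shuffle_elements(data: list[int], controls: list[int]) -> list[int]:
    """Build a result with one element per control element."""
    if len(data) != len(controls):
        raise ValueError("both operands must hold the same number of elements")
    count = len(data)
    result = []
    for control in controls:
        if control & FLUSH_TO_ZERO:
            # Flag set: write zero regardless of the index field.
            result.append(0)
        else:
            # Flag clear: copy the designated source element.
            result.append(data[(control & INDEX_MASK) % count])
    return result


print(shuffle_elements([10, 20, 30, 40], [3, 0x80, 0, 1]))
```

Each result position depends only on its own control element, so the same source element may be copied to several positions or to none.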
PREFETCH STRATEGY CONTROL

Publication number: 20150121038

Abstract: A single instruction multiple thread (SIMT) processor 2 includes execution circuitry 6, prefetch circuitry 12 and prefetch strategy selection circuitry 14. The prefetch strategy selection circuitry serves to detect one or more characteristics of a stream of program instructions that are being executed to identify whether or not a given data access instruction within a program will be executed a plurality of times. The prefetch strategy to use is selected from a plurality of selectable prefetch strategies in dependence upon the detection of such characteristics.

Type: Application

Filed: October 24, 2013

Publication date: April 30, 2015

Applicant: ARM LIMITED

Inventors: Ganesh Suryanarayan DASIKA, Rune HOLM, David Hennah MANSELL
Data processing apparatus, control method therefor, and non-transitory computer-readable storage medium

Patent number: 9020211

Abstract: A data processing apparatus which sequentially executes a verification process so as to recognize a target object, comprising: an obtaining unit configured to obtain dictionary data to be referred to in the verification process; a holding unit configured to hold a plurality of dictionary data; a verification unit configured to execute the verification process for the input data by referring to one dictionary data; a history holding unit configured to hold a verification result; and a prefetch determination unit configured to determine based on the verification result whether to execute prefetch processing in which the obtaining unit obtains in advance dictionary data to be referred to by the verification unit in a succeeding verification process, and holds the dictionary data in the holding unit before the succeeding verification process.

Type: Grant

Filed: May 10, 2012

Date of Patent: April 28, 2015

Assignee: Canon Kabushiki Kaisha

Inventor: Akiyoshi Momoi
METHODS AND APPARATUS TO PERFORM ADAPTIVE PRE-FETCH OPERATIONS IN MANAGED RUNTIME ENVIRONMENTS

Publication number: 20150113249

Abstract: Methods and apparatus to perform adaptive pre-fetch operations in managed runtime environments are disclosed herein. An example disclosed method includes determining an object size associated with a pre-fetch operation; comparing the object size to a first one of a series of thresholds having increasing respective values; when the object size is less than the first one of the series of thresholds, pre-fetching a first amount of stored data assigned to the first one of the series of thresholds; and when the object size is greater than the first one of the series of thresholds, comparing the object size to a next one of the series of thresholds.

Type: Application

Filed: December 30, 2014

Publication date: April 23, 2015

Inventor: Mingqiu Sun
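The threshold walk described in the abstract of publication 20150113249 can be sketched as a simple loop. The function name `select_prefetch_amount`, the example threshold values, the fallback amount used when the object exceeds every threshold, and the handling of an object size exactly equal to a threshold are all assumptions; the abstract does not specify them.

```python
def select_prefetch_amount(object_size, thresholds, fallback_amount):
    """Pick a pre-fetch amount by walking thresholds in increasing order.

    thresholds is a list of (limit, amount) pairs sorted by increasing limit.
    An object size equal to a limit moves on to the next threshold (assumed).
    """
    for limit, amount in thresholds:
        if object_size < limit:
            # First threshold that the object fits under decides the amount.
            return amount
    # Larger than every threshold: use the caller-supplied fallback (assumed).
    return fallback_amount


# Hypothetical thresholds and amounts, in bytes.
EXAMPLE_THRESHOLDS = [(64, 64), (256, 128), (1024, 512)]
print(select_prefetch_amount(300, EXAMPLE_THRESHOLDS, fallback_amount=0))
```

Keeping the thresholds in a sorted list means adding a size class only requires another pair, not another comparison branch.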
Efficient state transition among multiple programs on multi-threaded processors by executing cache priming program

Patent number: 9015720

Abstract: A system and method to optimize processor performance and minimize average thread latency by selectively loading a cache when a program state, resources required for execution of a program or the program itself change, is described. An embodiment of the invention supports a “cache priming program” that is selectively executed for a first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in cache(s), and/or whenever resources required for program execution or the program itself changes. By pre-loading the cache with two resources required for two instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.

Type: Grant

Filed: January 6, 2009

Date of Patent: April 21, 2015

Assignee: Advanced Micro Devices, Inc.

Inventors: Andrew Brown, Brian Emberling
FILTERING OUT REDUNDANT SOFTWARE PREFETCH INSTRUCTIONS

Publication number: 20150106590

Abstract: The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor.

Type: Application

Filed: October 14, 2013

Publication date: April 16, 2015

Applicant: Oracle International Corporation

Inventor: Yuan C. Chou
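The filtering scheme in the abstract of publication 20150106590 collects per-instruction statistics and then suppresses prefetches that are likely redundant. A minimal software model of that idea follows. The class name `PrefetchFilter`, the 0.9 redundancy threshold, and the minimum sample count are assumptions for this sketch, and the real mechanism presumably operates in hardware on tables indexed by instruction address.

```python
class PrefetchFilter:
    """Tracks how often each software prefetch hits an already-cached line."""

    def __init__(self, redundancy_threshold=0.9, min_samples=8):
        self.redundancy_threshold = redundancy_threshold  # assumed value
        self.min_samples = min_samples                    # assumed value
        self._stats = {}  # prefetch instruction address -> (redundant, total)

    def record(self, pc, line_already_cached):
        """Record one executed prefetch and whether its line was already cached."""
        redundant, total = self._stats.get(pc, (0, 0))
        self._stats[pc] = (redundant + int(line_already_cached), total + 1)

    def should_filter(self, pc):
        """Return True when the prefetch at pc is likely redundant."""
        redundant, total = self._stats.get(pc, (0, 0))
        if total < self.min_samples:
            return False  # too little history: let the prefetch execute
        return redundant / total >= self.redundancy_threshold


prefetch_filter = PrefetchFilter()
for _ in range(10):
    prefetch_filter.record(0x400, line_already_cached=True)
print(prefetch_filter.should_filter(0x400))
```

In this sketch a filtered prefetch stops contributing samples, so a practical design would likely let an occasional filtered prefetch through to refresh its statistics.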
Application level speculative processing

Patent number: 9009734

Abstract: One or more embodiments of the invention is a computer-implemented method for speculatively executing application event responses. The method includes the steps of identifying one or more event responses that could be issued for execution by an application being executed by a master process, for each event response, generating a child process to execute the event response, determining that a first event response included in the one or more event responses has been issued for execution by the application, committing the child process associated with the first event response as a new master process, and aborting the master process and all child processes other than the child process associated with the first event response.

Type: Grant

Filed: March 6, 2012

Date of Patent: April 14, 2015

Assignee: AUTODESK, Inc.

Inventor: Francesco Iorio
Reducing power consumption and resource utilization during miss lookahead

Patent number: 9009449

Abstract: A system that executes program instructions on a processor is described. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. While executing in the lookahead mode, if the processor determines that the lookahead mode is unlikely to uncover any additional outer-level cache misses, the system terminates the lookahead mode. Then, after the unresolved data dependency is resolved, the system recommences execution in the normal-execution mode from the instruction that triggered the lookahead mode.

Type: Grant

Filed: November 10, 2011

Date of Patent: April 14, 2015

Assignee: Oracle International Corporation

Inventors: Yuan C. Chou, Eric W. Mahurin
INSTRUCTION CACHE WITH WAY PREDICTION

Publication number: 20150100762

Abstract: A processor includes an instruction fetch unit and an execution unit. The instruction fetch unit retrieves instructions from memory to be executed by the execution unit. The instruction fetch unit includes a branch prediction unit which is configured to predict whether a branch instruction is likely to be executed. The memory includes an instruction cache comprising a portion of the fetch blocks available in the memory. The instruction fetch unit may use a combination of way prediction and serialized access to retrieve instructions from the instruction cache. The instruction fetch unit initially accesses the instruction cache to retrieve the predicted fetch block associated with a way prediction. The instruction fetch unit compares a cache tag associated with the way prediction with the address of the cache line that includes the predicted fetch block. If the tag matches, then the way prediction is correct and the retrieved fetch block is valid.

Type: Application

Filed: October 6, 2014

Publication date: April 9, 2015

Inventor: Eino Jacobs
Acceleration threads on idle OS-visible thread execution units

Patent number: 9003421

Abstract: Disclosed are embodiments of a system, methods and mechanism for using idle thread units to perform acceleration threads that are transparent to the operating system. When the operating system scheduler has no work to schedule on the idle thread units, the operating system may issue a halt or monitor/mwait or other instruction to place the thread unit into an idle state. While the thread unit is idle, from the operating system perspective, the thread unit may be utilized to perform speculative acceleration threads in order to accelerate threads running on non-idle thread units. The context of the idle thread unit is saved prior to execution of the acceleration thread and is restored when the operating system requires use of the thread unit. The acceleration threads are transparent to the operating system. Other embodiments are also described and claimed.

Type: Grant

Filed: November 28, 2005

Date of Patent: April 7, 2015

Assignee: Intel Corporation

Inventors: Ron Gabor, Gad Sheaffer, Avi Mendelson, Uri C. Weiser, Hong Wang
DATA PROCESSOR

Publication number: 20150095616

Abstract: The present invention realizes efficient superscalar instruction issue and low power consumption in an instruction set including instructions with prefixes. An instruction fetch unit is adopted which determines whether an instruction code is a prefix code or some other instruction code, and outputs the result of determination and the 16-bit instruction code. In addition, decoders each of which decodes the instruction code, based on the result of determination, and decoders each of which decodes the prefix code, are disposed separately. Further, a prefix is supplied to each decoder prior to the fixed-length (for example, 16-bit) instruction code that it modifies. A fixed-length instruction code following the prefix code is supplied to each decoder of the same pipeline as the decoder for the prefix code.

Type: Application

Filed: December 9, 2014

Publication date: April 2, 2015

Inventors: Hiroaki Nakaya, Yuki Kondoh, Makoto Ishikawa
METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION

Publication number: 20150089196

Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.

Type: Application

Filed: December 1, 2014

Publication date: March 26, 2015

Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
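The combined shift and XOR operation named in the abstract of publication 20150089196 can be modeled in software as follows. The right-shift direction, the 64-bit default width, the second operand, and the function name `shift_xor` are assumptions for this sketch; the abstract only states that a single instruction performs a shift and an XOR on at least one value.

```python
def shift_xor(value: int, shift: int, operand: int, width: int = 64) -> int:
    """Shift value right by shift bits, then XOR the result with operand."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    mask = (1 << width) - 1
    # Both inputs are truncated to the assumed register width first.
    return ((value & mask) >> shift) ^ (operand & mask)


print(bin(shift_xor(0b1100, 2, 0b0101)))
```

Fusing the two steps matters mainly in hashing and checksum style loops, where a shift followed by an XOR would otherwise cost two dependent instructions per iteration.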
METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION

Publication number: 20150089195

Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.

Type: Application

Filed: December 1, 2014

Publication date: March 26, 2015

Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION

Publication number: 20150089197

Abstract: Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value.

Type: Application

Filed: December 1, 2014

Publication date: March 26, 2015

Inventors: Vinodh Gopal, James D. Guilford, Erdinc Ozturk, Wajdi K. Feghali, Gilbert M. Wolrich, Martin G. Dixon
Reducing instruction miss penalties in applications

Patent number: 8978022

Abstract: Embodiments include systems and methods for reducing instruction cache miss penalties during application execution. Application code is profiled to determine “hot” code regions likely to experience instruction cache miss penalties. The application code can be linearized into a set of traces that include the hot code regions. Embodiments traverse the traces in reverse, keeping track of instruction scheduling information, to determine where an accumulated instruction latency covered by the code blocks exceeds an amount of latency that can be covered by prefetching. Each time the accumulated latency exceeds the amount of latency that can be covered by prefetching, a prefetch instruction can be scheduled in the application code. Some embodiments insert additional prefetches, merge prefetches, and/or adjust placement of prefetches to account for scenarios, such as loops, merging or forking branches, edge confidence values, etc.

Type: Grant

Filed: January 10, 2013

Date of Patent: March 10, 2015

Assignee: Oracle International Corporation

Inventors: Spiros Kalogeropulos, Partha Tirumalai
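
Illustration (not taken from the patent text): the reverse traversal described in the abstract of Patent 8978022 can be pictured roughly as follows. The sketch walks a linearized trace from its last block to its first, accumulates per-block latency, and records a prefetch point each time the running total exceeds what one prefetch can cover. The function name, the (name, latency) block format, and the choice to reset the count to zero after each prefetch are assumptions made for this example.

```python
def schedule_prefetches(trace, coverable_latency):
    """Return names of blocks where a prefetch would be placed.

    trace: list of (block_name, latency_cycles) in program order
           (hypothetical format, not from the patent).
    coverable_latency: cycles one prefetch can hide (assumed constant).
    """
    prefetch_points = []
    accumulated = 0
    # Traverse the trace in reverse, from the last block to the first.
    for name, latency in reversed(trace):
        accumulated += latency
        if accumulated > coverable_latency:
            # Accumulated latency exceeds what a prefetch can cover:
            # record a prefetch here and restart the count (assumed policy).
            prefetch_points.append(name)
            accumulated = 0
    # Report the prefetch points in program order.
    prefetch_points.reverse()
    return prefetch_points


trace = [("A", 10), ("B", 25), ("C", 5), ("D", 30), ("E", 8)]
print(schedule_prefetches(trace, coverable_latency=30))
```

The abstract also mentions merging prefetches and handling loops and branches; none of that is modeled here.
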
Control of entry of program instructions to a fetch stage within a processing pipeline

Patent number: 8977815

Abstract: A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.

Type: Grant

Filed: November 29, 2010

Date of Patent: March 10, 2015

Assignee: ARM Limited

Inventors: Frode Heggelund, Rune Holm, Andreas Due Engh-Halstvedt, Edvard Feilding
SERVICE PROCESSOR PATCH MECHANISM

Publication number: 20150067263

Abstract: A microprocessor includes a plurality of processing cores, a service processing unit and a memory accessible by both the service processing unit and the plurality of processing cores. At least one of the plurality of processing cores is configured to write a patch to the memory. The patch comprises one or more instructions to be fetched from the memory and executed by the service processing unit after being written to the memory by the at least one of the plurality of processing cores.

Type: Application

Filed: May 19, 2014

Publication date: March 5, 2015

Applicant: VIA TECHNOLOGIES, INC.

Inventors: G. Glenn Henry, Stephan Gaskins
DATA PROCESSOR

Publication number: 20150058600

Abstract: A data processor of an embodiment includes a memory, an instruction cache, a processing unit (CPU), and a fetch process control unit. The memory stores a program in which a plurality of instructions are written. The instruction cache operates only when a branch instruction included in the program is executed, and data of a greater capacity than a width of a bus of the memory is read from the memory and stored in the instruction cache in advance. The processing unit accesses both the memory and the instruction cache and executes, in a pipelined manner, instructions read from the memory or the instruction cache. The fetch process control unit generates, in response to a branch instruction executed by the processing unit, a stop signal for stopping a fetch process of reading an instruction from the memory, and outputs the stop signal to the memory.

Type: Application

Filed: February 14, 2012

Publication date: February 26, 2015

Applicant: RENESAS ELECTRONICS CORPORATION

Inventor: Masakatsu Ishizaki
Processor with cycle offsets and delay lines to allow scheduling of instructions through time

Patent number: 8954714

Abstract: An apparatus includes a processor. The processor includes two memories. The first memory stores one set of instructions. The second memory stores another set of instructions that are longer than the set of instructions in the first memory. An instruction in the set of instructions in the first memory is used as a pointer to a corresponding instruction in the set of instructions in the second memory.

Type: Grant

Filed: February 1, 2010

Date of Patent: February 10, 2015

Assignee: Altera Corporation

Inventor: Steven Perry
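
Illustration (not taken from the patent text): the two-memory arrangement in the abstract of Patent 8954714 resembles an indirection table, where a compact instruction acts as a pointer into a second memory of longer instructions. The toy model below shows only that lookup. The memory contents, the textual instruction encodings and the `fetch_long` helper are invented for this example.

```python
# Second memory: longer instructions (hypothetical textual encodings).
LONG_MEMORY = [
    "ADD r1, r2, r3",
    "MUL r4, r1, r1",
    "STORE r4, [r5]",
]

# First memory: short instructions, each used as an index into LONG_MEMORY.
SHORT_MEMORY = [0, 1, 1, 2]


def fetch_long(pc):
    """Resolve the short instruction at `pc` to its longer instruction."""
    pointer = SHORT_MEMORY[pc]
    return LONG_MEMORY[pointer]


program = [fetch_long(pc) for pc in range(len(SHORT_MEMORY))]
print(program)
```

In this model a long instruction that is used repeatedly is stored once and referenced by several short entries.
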
Ineffective prefetch determination and latency optimization

Patent number: 8949579

Abstract: A processor of an information handling system (IHS) initiates an L3 cache prefetch operation in response to a demand load during instruction processing. The processor selects an L3 cache prefetch at random for tracking as a target prefetched instruction. The processor initiates an L1 cache target prefetch operation and stores the resultant target prefetched instruction in the L1 cache. If a demand load arrives, the processor analyzes the target prefetched instruction for effectiveness and determines the source of the prefetch data. If a demand does not arrive, the processor tests to determine if the particular prefetched instruction timed out in the cache and identifies the ineffectiveness of the prefetch operation. The processor samples multiple prefetch operations at random and generates a history of prefetch effectiveness and other useful prefetch information. The processor stores the prefetch effectiveness information to enable reduction or removal of ineffective prefetch operations.

Type: Grant

Filed: October 4, 2010

Date of Patent: February 3, 2015

Assignee: International Business Machines Corporation

Inventors: Miles R. Dooley, Venkat R. Indukuru, Alex E. Mericas, Francis P. O'Connell
ANTICIPATED PREFETCHING FOR A PARENT CORE IN A MULTI-CORE CHIP

Publication number: 20150019841

Abstract: Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. A method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing, by the scout core, a fetch table that is indicated by the fetch table address. The fetch table indicates how many pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes fetching the pieces of content by the scout core based on the fetch table. The method includes returning the fetched pieces of content to the parent core.

Type: Application

Filed: September 30, 2014

Publication date: January 15, 2015

Inventors: Brian R. Prasky, Fadi Y. Busaba, Steven R. Carlough, Christopher A. Krygowski, Chung-lung K. Shum
PREDICTIVE FETCHING AND DECODING FOR SELECTED INSTRUCTIONS

Publication number: 20150006855

Abstract: Predictive fetching and decoding for selected instructions (e.g., operating system instructions, hypervisor instructions or other such instructions). A determination is made that a selected instruction, such as a system call instruction, an asynchronous interrupt, a return from system call instruction or return from asynchronous interrupt, is to be executed. Based on determining that such an instruction is to be executed, a predicted address is determined for the selected instruction, which is the address to which processing transfers in order to provide the requested services. Then, fetching of instructions beginning at the predicted address prior to execution of the selected instruction is commenced. Further, speculative state relating to a selected instruction, including, for instance, an indication of the privilege level of the selected instruction or instructions executed on behalf of the selected instruction, is predicted and maintained.

Type: Application

Filed: June 28, 2013

Publication date: January 1, 2015

Inventors: Michael K. Gschwind, Valentina Salapura
PREDICTIVE FETCHING AND DECODING FOR SELECTED RETURN INSTRUCTIONS

Publication number: 20150006854

Abstract: Predictive fetching and decoding for selected instructions. A determination is made as to whether an instruction to be executed in a pipelined processor is a selected return instruction, the pipelined processor having a plurality of stages including an execute stage. Based on the instruction being the selected return instruction, obtaining from a data structure a predicted return address, the predicted return address being an address of an instruction to which it is predicted that processing is to be returned. Additionally, based on the instruction being the selected return instruction, operating state for the instruction at the predicted return address is predicted. The instruction is fetched at the predicted return address, prior to the selected return instruction reaching the execute stage, and decoding of the fetched instruction is initiated based on the predicted operating state.

Type: Application

Filed: June 28, 2013

Publication date: January 1, 2015

Inventors: Michael K. Gschwind, Valentina Salapura
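
Illustration (not taken from the application text): the abstract of Publication 20150006854 refers to obtaining a predicted return address from "a data structure" and predicting the operating state at that address. One simple way to picture this is a stack of (return address, operating state) pairs, as sketched below. The use of a stack, the class name and the state labels are assumptions for this example; the abstract does not specify them.

```python
class ReturnPredictor:
    """Toy predictor: a stack of (return_address, operating_state) pairs."""

    def __init__(self):
        self._stack = []

    def on_call(self, return_address, operating_state):
        # Remember where, and in which state, execution is expected to resume.
        self._stack.append((return_address, operating_state))

    def predict_return(self):
        # Return the (address, state) at which to start fetching and decoding,
        # or None when no prediction is available.
        if not self._stack:
            return None
        return self._stack.pop()


predictor = ReturnPredictor()
predictor.on_call(0x4000, "user")
predictor.on_call(0x8000, "supervisor")
print(predictor.predict_return())
```

A fetch unit modeled this way could begin fetching at the predicted address before the return instruction reaches the execute stage, which is the timing the abstract describes.
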
Prefetch optimization in shared resource multi-core systems

Patent number: 8924651

Abstract: An apparatus and method are described herein for optimizing prefetch throttling, which potentially enhances performance, reduces power consumption, and maintains positive gain for workloads that benefit from prefetching. More specifically, the optimizations described herein allow for bandwidth congestion and prefetch accuracy to be taken into account as feedback for throttling at the source of prefetch generation. As a result, when there is low congestion, full prefetch generation is allowed, even if the prefetch is inaccurate, since there is available bandwidth. However, when congestion is high, the determination of throttling falls to prefetch accuracy. If accuracy is high (miss rate is low), then less throttling is needed, because the prefetches are being utilized and performance is being enhanced.

Type: Grant

Filed: April 16, 2013

Date of Patent: December 30, 2014

Assignee: Intel Corporation

Inventors: Perry P. Tang, Hemant G. Rotithor, Ryan L. Carlson, Nagi Aboulenein
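
Illustration (not taken from the patent text): the decision logic in the abstract of Patent 8924651 can be summarized as a two-level check, first on congestion and then on accuracy. The sketch below is a minimal reading of that description. The threshold values, the three throttle levels and the function name are assumptions for this example.

```python
def prefetch_throttle_level(congestion, accuracy,
                            congestion_threshold=0.7,
                            accuracy_threshold=0.5):
    """Return 'none', 'light' or 'heavy' throttling.

    congestion, accuracy: fractions in [0, 1].
    Thresholds are assumed values, not taken from the patent.
    """
    if congestion < congestion_threshold:
        # Low congestion: allow full prefetch generation, even if inaccurate.
        return "none"
    if accuracy >= accuracy_threshold:
        # High congestion but useful prefetches: throttle less.
        return "light"
    # High congestion and inaccurate prefetches: throttle more.
    return "heavy"


for congestion, accuracy in [(0.2, 0.1), (0.9, 0.8), (0.9, 0.2)]:
    print(congestion, accuracy, prefetch_throttle_level(congestion, accuracy))
```
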
Prefetching load data in lookahead mode and invalidating architectural registers instead of writing results for retiring instructions

Patent number: 8918626

Abstract: The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. When an instruction retires during the lookahead mode, a working register which serves as a destination register for the instruction is not copied to a corresponding architectural register. Instead the architectural register is marked as invalid. Note that by not updating architectural registers during lookahead mode, the system eliminates the need to checkpoint the architectural registers prior to entering lookahead mode.

Type: Grant

Filed: November 10, 2011

Date of Patent: December 23, 2014

Assignee: Oracle International Corporation

Inventors: Yuan C. Chou, Eric W. Mahurin
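
Illustration (not taken from the patent text): the retirement rule in the abstract of Patent 8918626 can be modeled with a register file that carries a validity flag per architectural register. The sketch below shows only that rule. The class and method names are hypothetical, and how a real design leaves lookahead mode is not modeled.

```python
class RegisterFile:
    """Toy architectural register file with per-register valid bits."""

    def __init__(self, num_regs):
        self.arch = [0] * num_regs
        self.valid = [True] * num_regs
        self.lookahead = False

    def retire(self, dest, working_value):
        if self.lookahead:
            # Lookahead mode: do not copy the working register's value;
            # mark the architectural register as invalid instead.
            self.valid[dest] = False
        else:
            # Normal mode: copy the result into the architectural register.
            self.arch[dest] = working_value
            self.valid[dest] = True


regs = RegisterFile(4)
regs.retire(1, 42)      # normal-execution mode
regs.lookahead = True
regs.retire(2, 99)      # lookahead mode
print(regs.arch, regs.valid)
```

Because lookahead retirement never overwrites `arch`, the values written in normal mode remain intact, which matches the abstract's point that no checkpoint of the architectural registers is needed.
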
DATA PROCESSING SYSTEMS

Publication number: 20140372731

Abstract: A data processing system includes an execution pipeline that includes one or more programmable execution stages which execute execution threads to execute instructions to perform data processing operations. Instructions to be executed by a group of execution threads are first fetched into an instruction cache and then read from the instruction cache for execution by the thread group. When an instruction to be executed by a thread group is present in a cache line in the instruction cache, or is to be fetched into an allocated cache line in the instruction cache, a pointer to the location of the instruction in the instruction cache is stored for the thread group. This stored pointer is then used to retrieve the instruction for execution by the thread group from the instruction cache.

Type: Application

Filed: June 14, 2013

Publication date: December 18, 2014

Inventors: Jorn Nystad, Andreas Engh-Halstvedt
SOFTWARE CONTROLLED DATA PREFETCH BUFFERING

Publication number: 20140372730

Abstract: The invention relates to a method of prefetching data into a microprocessor buffer under software control.

Type: Application

Filed: June 14, 2013

Publication date: December 18, 2014

Inventors: Muhammad Yasir Qadri, Nadia Nawaz Qadri, Klaus Dieter McDonald-Maier
DUAL-MODE INSTRUCTION FETCHING APPARATUS AND METHOD

Publication number: 20140344551

Abstract: The dual-mode instruction fetching apparatus includes a mode register, a branch prediction unit, a Program Counter (PC) calculator, an Instruction Queue (IQ), and a fetch multiplexer. The mode register is set to one of normal mode and line mode. The PC calculator accesses a tag in which the address indices of instructions have been stored or a line in which the instructions have been grouped and then outputs an instruction, or accesses only the line and then outputs an instruction depending on the type of set mode. The IQ stores instructions selected by an instruction selector from among the instructions grouped in the line. The fetch multiplexer fetches the instructions stored in the IQ if the normal mode has been set, and fetches instructions read from the line of an instruction cache if the line mode has been set.

Type: Application

Filed: April 21, 2014

Publication date: November 20, 2014

Applicant: Electronics and Telecommunications Research Institute

Inventor: Young-Su KWON
Macroscalar vector prefetch with streaming access detection

Patent number: 8850162

Abstract: A method and system for implementing vector prefetch with streaming access detection is contemplated in which an execution unit such as a vector execution unit, for example, executes a vector memory access instruction that references an associated vector of effective addresses. The vector of effective addresses includes a number of elements, each of which includes a memory pointer. The vector memory access instruction is executable to perform multiple independent memory access operations using at least some of the memory pointers of the vector of effective addresses. A prefetch unit, for example, may detect a memory access streaming pattern based upon the vector of effective addresses, and in response to detecting the memory access streaming pattern, the prefetch unit may calculate one or more prefetch memory addresses based upon the memory access streaming pattern. Lastly, the prefetch unit may prefetch the one or more prefetch memory addresses into a memory.

Type: Grant

Filed: May 22, 2012

Date of Patent: September 30, 2014

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
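
Illustration (not taken from the patent text): the abstract of Patent 8850162 describes detecting a streaming pattern from a vector of effective addresses and computing prefetch addresses from it. The sketch below uses the simplest such pattern, a constant non-zero stride between consecutive elements. The constant-stride test, the function names and the default prefetch count are assumptions for this example.

```python
def detect_stride(addresses):
    """Return the constant stride between consecutive addresses, or None."""
    if len(addresses) < 3:
        return None  # too few elements to call it a stream (assumed rule)
    stride = addresses[1] - addresses[0]
    if stride == 0:
        return None
    for prev, cur in zip(addresses, addresses[1:]):
        if cur - prev != stride:
            return None
    return stride


def prefetch_addresses(addresses, count=2):
    """Extrapolate `count` addresses past the end of a detected stream."""
    stride = detect_stride(addresses)
    if stride is None:
        return []
    last = addresses[-1]
    return [last + stride * i for i in range(1, count + 1)]


vector = [0x1000, 0x1040, 0x1080, 0x10C0]
print([hex(a) for a in prefetch_addresses(vector)])
```
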
SYSTEM AND METHOD FOR ORDERING PACKET TRANSFERS IN A DATA PROCESSOR

Publication number: 20140281390

Abstract: A data processor includes a packet selector. The packet selector creates an ordered list of packets, each packet corresponding to a respective communication flow, determines whether each packet in the ordered list of packets is eligible for transfer to a prefetch unit based on whether a preceding packet in the same communication flow has been transferred to the prefetch unit, and sets a selection priority for each packet based on start time constraints for the respective communication flow, and based on a processing status of a preceding packet in the communication flow.

Type: Application

Filed: March 13, 2013

Publication date: September 18, 2014

Applicant: FREESCALE SEMICONDUCTOR, INC.

Inventors: Timothy G. Boland, Anne C. Harris, Steven D. Millman
PIPELINED PROCESSOR

Publication number: 20140258682

Abstract: Provided is a processor with a multi-pipeline fetch structure or a multi-cycle cache structure, including: an integer core which reads an instruction transmitted from a lower block, executes an operation corresponding to the instruction, and transmits an instruction address to the lower block; an instruction buffer which stores instruction data requested by the integer core by using the instruction address and transmits the instruction data in response to the request of the integer core; and an instruction cache which stores a portion of data of a program memory and transmits the data to the instruction buffer in response to the request of the instruction buffer.

Type: Application

Filed: May 16, 2013

Publication date: September 11, 2014

Applicant: Advanced Digital Chips Inc.

Inventors: YOUNG HO CHA, KWANG HO LEE, KWAN YOUNG KIM, BYUNG GUEON MIN
