Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices coupled via a results bus includes: retrieving, from the results bus into an entry of a register file of an execution slice, speculative result data of a load instruction generated by a load/store slice; and determining, from the load/store slice after expiration of a predetermined period of time, whether the result data is valid.
Type:
Grant
Filed:
December 15, 2015
Date of Patent:
March 20, 2018
Assignee:
International Business Machines Corporation
Inventors:
Joshua W. Bowman, Sundeep Chadha, Michael J. Genden, Dhivya Jeganathan, Dung Q. Nguyen, David R. Terry, Eula A. Tolentino
Abstract: There is a need to provide a microcomputer capable of eliminating an external terminal for endian selection. Flash memory includes a user boot area for storing a program executed in user boot mode and corresponding endian information and a user area for storing a program executed in user mode and corresponding endian information. A data transfer circuit reads endian information stored in the user boot area or the user area in accordance with operation mode and supplies the endian information to a CPU before reset release of the CPU. Accordingly, an external terminal for endian selection can be eliminated.
Abstract: Embodiments relate to selectively blocking branch instruction predictions. An aspect includes computer implemented method for performing selective branch prediction. The method includes detecting, by a processor, a branch-prediction blocking instruction in a stream of instructions and blocking, by the processor, branch prediction of a predetermined number of branch instructions following the branch-prediction blocking instruction based on the detecting the branch-prediction blocking instruction.
Type:
Grant
Filed:
March 14, 2013
Date of Patent:
February 20, 2018
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
James J. Bonanno, Ulrich Mayer, Anthony Saporito, Chung-Lung K. Shum, Timothy Slegel
Abstract: A system and method of executing a plurality of threads, including a first thread and a set of remaining threads, on a computer processor core. The system and method includes determining that a start interpretive execution exit condition exists; determining that the computer processor core is within a grace period; and entering by the first thread a start interpretive execution exit sync loop without signaling to any of the set of remaining threads. In turn, the first thread remains in the start interpretive execution exit sync loop until the grace period expires or each of the remaining threads enters a corresponding start interpretive execution exit sync loop.
Type:
Grant
Filed:
September 3, 2015
Date of Patent:
February 20, 2018
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Jonathan D. Bradbury, Fadi Y. Busaba, Mark S. Farrell, Charles W. Gainey, Dan F. Greiner, Lisa C. Heller, Jeffrey P. Kubala, Damian L. Osisek, Donald W. Schmidt, Timothy J. Slegel
Abstract: A system and method of executing a plurality of threads, including a first thread and a set of remaining threads, on a computer processor core. The system and method includes determining that a start interpretive execution exit condition exists; determining that the computer processor core is within a grace period; and entering by the first thread a start interpretive execution exit sync loop without signaling to any of the set of remaining threads. In turn, the first thread remains in the start interpretive execution exit sync loop until the grace period expires or each of the remaining threads enters a corresponding start interpretive execution exit sync loop.
Type:
Grant
Filed:
October 20, 2014
Date of Patent:
February 20, 2018
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Jonathan D. Bradbury, Fadi Y. Busaba, Mark S. Farrell, Charles W. Gainey, Jr., Dan F. Greiner, Lisa C. Heller, Jeffrey P. Kubala, Damian L. Osisek, Donald W. Schmidt, Timothy J. Slegel
Abstract: Embodiments relate to selectively blocking branch instruction predictions. An aspect includes a computer system for performing selective branch prediction. The system includes memory and a processor, and the system is configured to perform a method. The method includes detecting a branch-prediction blocking instruction in a stream of instructions and blocking branch prediction of a predetermined number of branch instructions following the branch-prediction blocking instruction based on the detecting the branch-prediction blocking instruction.
Type:
Grant
Filed:
June 15, 2012
Date of Patent:
February 13, 2018
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
James J. Bonanno, Ulrich Mayer, Anthony Saporito, Chung-Lung K. Shum, Timothy J. Slegel
Abstract: An array processor includes a managing element having a load streaming unit coupled to multiple processing elements. The load streaming unit provides input data portions to each of a first subset of processing elements and receives output data from each of a second subset of the processing elements based on a comparatively sorted combination of the input data portions. Each processing element is configurable by the managing element to compare input data portions received from the load streaming unit or two or more of the other processing elements. Each processing unit can further select an input data portion to be output data based on the comparison, and in response to selecting the input data portion, remove a queue entry corresponding to the selected input data portion. Each processing element can provide the selected output data portion to the managing element or as an input to one of the processing elements.
Type:
Grant
Filed:
October 31, 2014
Date of Patent:
February 13, 2018
Assignee:
International Business Machines Corporation
Inventors:
Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
Abstract: A processor including a register file having a plurality of registers, and configured for out-of-order instruction execution, further includes a renamer unit that produces generation numbers that are associated with register file addresses to provide a renamed version of a register that is temporally offset from an existing version of that register rather than assigning a non-programmer-visible physical register as the renamed register.
Type:
Grant
Filed:
October 31, 2014
Date of Patent:
December 12, 2017
Assignee:
Avago Technologies General IP (Singapore) Pte. Ltd.
Inventors:
Sophie Wilson, John Redford, Tariq Kurd
Abstract: A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding.
Abstract: A technique includes receiving a request from a processor to retrieve a first instruction from a memory for a staged execution pipeline. The technique includes selectively retrieving the first instruction from the memory in response to the request based on a determination of whether the processor will execute the first instruction.
Abstract: An instruction sequencing unit in an out-of-order (OOO) processor includes a Most Favored Instruction (MFI) mechanism that designates an instruction as an MFI. The processing queues in the processor identify when they contain the MFI, and assures processing the MFI. The MFI remains the MFI until it is completed or is flushed, and which time the MFI mechanism selects the next MFI.
Type:
Grant
Filed:
October 31, 2016
Date of Patent:
October 24, 2017
Assignee:
International Business Machines Corporation
Inventors:
Maarten J. Boersma, Robert A. Cordes, David A. Hrusecky, Jennifer L. Molnar, Brian W. Thompto, Albert J. Van Norstrand, Jr., Kenneth L. Ward
Abstract: A method and apparatus for zero overheard loops is provided herein. The method includes the steps of identifying, by a decoder, a loop instruction and identifying, by the decoder, a last instruction in a loop body that corresponds to the loop instruction. The method further includes the steps of generating, by the decoder, a branch instruction that returns execution to a beginning of the loop body, and enqueing, by the decoder, the branch instruction into a branch reservation queue concurrently with an enqueing of the last instruction in a reservation queue.
Type:
Grant
Filed:
October 31, 2014
Date of Patent:
October 24, 2017
Assignee:
Avago Technologies General IP (Singapore) Pte. Ltd.
Inventors:
Tariq Kurd, John Redford, Geoffrey Barrett
Abstract: A system, method, and device for stochastically processing data. There is an architect module operating on a processor configured to manage and control stochastic processing of data, a non-deterministic data pool module configured to provide a stream of non-deterministic values that are not derived from a function, a plurality of functionally equivalent data processing modules each configured to stochastically process data as called upon by the architect module, a data feed configured to feed a data set desired to be stochastically processed, and a structure memory module including a memory storage device and configured to provide sufficient information for the architect module to duplicate a predefined processing architecture and to record a utilized processing architecture.
Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.
Type:
Grant
Filed:
September 28, 2012
Date of Patent:
August 22, 2017
Assignee:
Intel Corporation
Inventors:
Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin
Abstract: A processor includes a processing unit including a storage module having stored thereon a physical reference list for storing identifications of physical registers that have been referenced by multiple logical registers, and a reclamation module for reclaiming physical registers to a free list based on a count of each of the physical registers on the physical reference list.
Type:
Grant
Filed:
September 28, 2012
Date of Patent:
August 15, 2017
Assignee:
Intel Corporation
Inventors:
Vijaykumar Balaram Kadgi, James D. Hadley, Avinash Sodani, Matthew C. Merten, Morris Marden, Joseph A. McMahon, Grace C. Lee, Laura A. Knauth, Robert S. Chappell, Fariborz Tabesh
Abstract: Provided is a reconfigurable processor capable of reducing the routing processing time of routing nodes by driving the routing nodes at a greater frequency than a driving frequency of the processing elements. The reconfigurable processor includes one or more processing elements configured to be driven at a first driving frequency, and one or more routing nodes configured to be provided on paths that are formed between the processing elements, and to be driven at a second driving frequency that is greater than the first driving frequency.
Type:
Grant
Filed:
July 7, 2011
Date of Patent:
August 8, 2017
Assignee:
Samsung Electronics Co., Ltd.
Inventors:
Bernhard Egger, Taisong Jin, Won-Sub Kim
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
Type:
Grant
Filed:
March 17, 2017
Date of Patent:
July 18, 2017
Assignee:
Google Inc.
Inventors:
Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
Abstract: Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.
Type:
Grant
Filed:
March 25, 2015
Date of Patent:
July 11, 2017
Assignee:
The Board of Regents of the University of Texas System
Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
Abstract: Embodiments include processing systems that determine, based on an instruction address range indicator stored in a first register, whether a next instruction fetch address corresponds to a location within a first memory region associated with a current privilege state or within a second memory region associated with a different privilege state. When the next instruction fetch address is not within the first memory region, the next instruction is allowed to be fetched only when a transition to the different privilege state is legal. In a further embodiment, when a data access address is generated for an instruction, a determination is made, based on a data address range indicator stored in a second register, whether access to a memory location corresponding to the data access address is allowed. The access is allowed when the current privilege state is a privilege state in which access to the memory location is allowed.
Type:
Grant
Filed:
May 31, 2012
Date of Patent:
June 6, 2017
Assignee:
NXP USA, INC.
Inventors:
Daniel M. McCarthy, Joseph C. Circello, Kristen A. Hausman