Abstract: A processor including a register file having a plurality of registers, and configured for out-of-order instruction execution, further includes a renamer unit that produces generation numbers that are associated with register file addresses to provide a renamed version of a register that is temporally offset from an existing version of that register rather than assigning a non-programmer-visible physical register as the renamed register.
Type:
Grant
Filed:
October 31, 2014
Date of Patent:
December 12, 2017
Assignee:
Avago Technologies General IP (Singapore) Pte. Ltd.
Inventors:
Sophie Wilson, John Redford, Tariq Kurd
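The generation-number renaming described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how generation numbers are sized or recycled, so the unbounded per-register counter, the `GenerationRenamer` class, and the `(dest, src1, src2)` instruction tuples below are all assumptions made for illustration.

```python
class GenerationRenamer:
    """Toy renamer: a renamed register is (register address, generation number)."""

    def __init__(self, num_registers):
        # Assumption: one unbounded generation counter per register address.
        self.current_gen = [0] * num_registers

    def read(self, reg):
        # A source operand refers to the newest existing version of the register.
        return (reg, self.current_gen[reg])

    def write(self, reg):
        # A destination becomes the next generation of the same register address,
        # rather than a separate non-programmer-visible physical register.
        self.current_gen[reg] += 1
        return (reg, self.current_gen[reg])


def rename(renamer, program):
    """Rename a list of hypothetical (dest, src1, src2) instructions."""
    renamed = []
    for dest, src1, src2 in program:
        # Sources are looked up before the destination generation is bumped.
        sources = (renamer.read(src1), renamer.read(src2))
        renamed.append((renamer.write(dest),) + sources)
    return renamed


program = [(1, 2, 3), (1, 1, 4), (2, 1, 1)]
renamed = rename(GenerationRenamer(num_registers=8), program)
```

In this sketch the second write to register 1 yields generation 2 of the same address, so an older instruction can still name generation 1 without conflict.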
Abstract: A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergence point within an identified loop, a compiler arranges instructions within the identified loop into very large instruction words (VLIWs). At least one VLIW includes instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The compiler generates code that, when executed, assigns instructions within a given VLIW to multiple parallel execution lanes within a target processor at runtime. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on the branch direction found at runtime for the given lane at the given divergence point. The target processor includes a vector register for storing indications of which instruction within a fetched VLIW an associated lane is to execute.
Abstract: A processor includes a queue for storing instructions processed within the context of a current value of a register field, where for some embodiments the instruction is undefined or defined, depending upon the register field at time of processing. After a write instruction (an instruction that writes to the register field) executes, the queue is searched for any entries that contain instructions that depend upon the executed write instruction. Each such entry stores the value of the register field at the time the instruction in the entry was processed. If such an entry is found in the queue and its stored value of the register field does not match the value that the write instruction wrote to the register field, then the processor flushes the pipeline and restarts at a state so as to correctly execute the instruction.
Type:
Grant
Filed:
March 15, 2013
Date of Patent:
November 21, 2017
Assignee:
QUALCOMM Incorporated
Inventors:
Daren Eugene Streett, Brian Michael Stempel, Thomas Philip Speier, Rodney Wayne Smith, Michael Scott McIlvaine, Kenneth Alan Dockser, James Norris Dieffenderfer
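The queue search described in the abstract above can be sketched roughly as follows. The `Entry` fields, the `depends_on` tag used to link an entry to a write instruction, and the function name `needs_flush` are hypothetical; the abstract does not specify how dependence is tracked.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    instruction: str   # mnemonic, for illustration only
    depends_on: int    # hypothetical id of the write instruction this entry depends on
    field_value: int   # register field value seen when the instruction was processed


def needs_flush(queue, write_id, written_value):
    """Return True if any dependent entry was processed under a stale field value."""
    return any(
        entry.depends_on == write_id and entry.field_value != written_value
        for entry in queue
    )


queue = [Entry("vadd", depends_on=7, field_value=4),
         Entry("vmul", depends_on=9, field_value=4)]

flush_on_match = needs_flush(queue, write_id=7, written_value=4)     # value agrees
flush_on_mismatch = needs_flush(queue, write_id=7, written_value=8)  # value differs
```

Only entries that depend on the executed write are compared, so a mismatch in an unrelated entry does not trigger a flush in this sketch.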
Abstract: Methods and apparatus for a low energy accelerator processor architecture with short parallel instruction word. An integrated circuit includes a system bus having a data width N, where N is a positive integer; a central processor unit coupled to the system bus and configured to execute instructions retrieved from a memory coupled to the system bus; and a low energy accelerator processor coupled to the system bus and configured to execute instruction words retrieved from a low energy accelerator code memory, the low energy accelerator processor having a plurality of execution units including a load store unit, a load coefficient unit, a multiply unit, and a butterfly/adder ALU unit, each of the execution units configured to perform operations responsive to op-codes decoded from the retrieved instruction words, wherein the width of the instruction words is equal to the data width N. Additional methods and apparatus are disclosed.
Type:
Grant
Filed:
April 4, 2015
Date of Patent:
November 14, 2017
Assignees:
TEXAS INSTRUMENTS INCORPORATED, TEXAS INSTRUMENTS DEUTSCHLAND GMBH
Inventors:
Srinivas Lingam, Seok-Jun Lee, Johann Zipperer, Manish Goel
Abstract: The apparatus and method for calculating and retaining a bound on error during floating point operations inserts an additional bounding field into the standard floating-point format that records the retained significant bits of the calculation with notification upon insufficient retention. The bounding field, which accounts for both rounding and cancellation errors, has two parts, the lost bits D Field and the accumulated rounding error R Field. The D Field states the number of bits in the floating point representation that are no longer meaningful. The bounds on the real value represented are determined from the truncated floating point value (first bound) and the addition of the error determined by the number of lost bits (second bound). The true, real value is absolutely contained by the first and second bounds. The allowed loss (optionally programmable) of significant bits provides a fail-safe, real-time notification of loss of significant bits.
Abstract: In one embodiment, a processor includes an instruction decoder to receive and decode an instruction having a prefix and an opcode, an execution unit to execute the instruction based on the opcode, and flag modification override logic to prevent the execution unit from modifying a flag register of the processor based on the prefix of the instruction.
Type:
Grant
Filed:
November 4, 2011
Date of Patent:
November 7, 2017
Assignee:
Intel Corporation
Inventors:
Jonathan D. Combs, Jason W. Brandt, Robert Valentine
Abstract: In one embodiment, a processor includes an instruction decoder to receive a first instruction having a prefix and an opcode and to generate, by an instruction decoder of the processor, a second instruction executable based on a condition determined based on the prefix, and an execution unit to conditionally execute the second instruction based on the condition determined based on the prefix.
Type:
Grant
Filed:
November 30, 2011
Date of Patent:
October 31, 2017
Assignee:
Intel Corporation
Inventors:
Jonathan D. Combs, Jason W. Brandt, Robert Valentine, Kevin B. Smith, Zia Ansari, Maxim Loktyukhin
Abstract: A method and apparatus for zero-overhead loops is provided herein. The method includes the steps of identifying, by a decoder, a loop instruction and identifying, by the decoder, a last instruction in a loop body that corresponds to the loop instruction. The method further includes the steps of generating, by the decoder, a branch instruction that returns execution to a beginning of the loop body, and enqueuing, by the decoder, the branch instruction into a branch reservation queue concurrently with an enqueuing of the last instruction in a reservation queue.
Type:
Grant
Filed:
October 31, 2014
Date of Patent:
October 24, 2017
Assignee:
Avago Technologies General IP (Singapore) Pte. Ltd.
Inventors:
Tariq Kurd, John Redford, Geoffrey Barrett
Abstract: An instruction sequencing unit in an out-of-order (OOO) processor includes a Most Favored Instruction (MFI) mechanism that designates an instruction as an MFI. The processing queues in the processor identify when they contain the MFI and assure processing of the MFI. The MFI remains the MFI until it is completed or flushed, at which time the MFI mechanism selects the next MFI.
Type:
Grant
Filed:
October 31, 2016
Date of Patent:
October 24, 2017
Assignee:
International Business Machines Corporation
Inventors:
Maarten J. Boersma, Robert A. Cordes, David A. Hrusecky, Jennifer L. Molnar, Brian W. Thompto, Albert J. Van Norstrand, Jr., Kenneth L. Ward
Abstract: A TRANSACTION BEGIN instruction begins execution of a transaction and includes a general register save mask having bits, that when set, indicate registers to be saved in the event the transaction is aborted. At the beginning of the transaction, contents of the registers are saved in memory not accessible to the program, and if the transaction is aborted, the saved contents are copied to the registers.
Type:
Grant
Filed:
May 20, 2016
Date of Patent:
October 17, 2017
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Dan F. Greiner, Christian Jacobi, Timothy J. Slegel
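The save-and-restore behaviour described in the TRANSACTION BEGIN abstract above might be modelled as in the sketch below. The mask layout (bit i covers register i), the dictionary standing in for memory not accessible to the program, and both function names are assumptions; the abstract does not give the actual mask encoding.

```python
def transaction_begin(registers, save_mask):
    """Save the registers selected by save_mask.

    Assumption: bit i of save_mask, counted from the least significant bit,
    selects register i. The returned dict stands in for the memory that is
    not accessible to the program.
    """
    return {
        index: value
        for index, value in enumerate(registers)
        if (save_mask >> index) & 1
    }


def transaction_abort(registers, saved):
    """Copy the saved contents back into the registers on abort."""
    for index, value in saved.items():
        registers[index] = value


registers = [10, 11, 12, 13]
saved = transaction_begin(registers, save_mask=0b0101)  # save registers 0 and 2

# The transaction modifies several registers, then aborts.
registers[0], registers[1], registers[2] = 99, 98, 97
transaction_abort(registers, saved)
```

After the abort, registers 0 and 2 hold their original values, while register 1, which the mask did not select, keeps the value written inside the transaction.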
Abstract: An engine architecture for processing finite automata includes a hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing. The HNA processor includes a plurality of super-clusters and an HNA scheduler. Each super-cluster includes a plurality of clusters. Each cluster of the plurality of clusters includes a plurality of HNA processing units (HPUs). A corresponding plurality of HPUs of a corresponding plurality of clusters of at least one selected super-cluster is available as a resource pool of HPUs to the HNA scheduler for assignment of at least one HNA instruction to enable acceleration of a match of at least one regular expression pattern in an input stream received from a network.
Type:
Grant
Filed:
July 8, 2014
Date of Patent:
October 10, 2017
Assignee:
Cavium, Inc.
Inventors:
Rajan Goyal, Satyanarayana Lakshmipathi Billa, Yossef Shanava, Gregg A. Bouchard, Timothy Toshio Nakada
Abstract: A method for operating a control unit, the control unit including a software-controlled main processing unit, a strictly hardware-based model calculation unit for calculating an algorithm, for carrying out a Bayesian regression method, based on configuration data, and a memory unit, a model memory area being defined in the memory unit to which a configuration register block for providing the configuration data in the model calculation unit is assigned, a calculation start-configuration register being assigned the highest address in the configuration register block into which configuration data are written, the writing into of which starts the calculation in the model calculation unit, the configuration data being written in a memory area of the memory unit from the model memory area into the configuration register block with an incremental copying process, the addresses being copied in the incremental copying process in ascending order.
Type:
Grant
Filed:
July 1, 2014
Date of Patent:
October 10, 2017
Assignee:
ROBERT BOSCH GMBH
Inventors:
Heiner Markert, Wolfgang Fischer, Nico Bannow, Andre Guntoro, Michael Hanselmann
Abstract: A system, method, and device for stochastically processing data. There is an architect module operating on a processor configured to manage and control stochastic processing of data, a non-deterministic data pool module configured to provide a stream of non-deterministic values that are not derived from a function, a plurality of functionally equivalent data processing modules each configured to stochastically process data as called upon by the architect module, a data feed configured to feed a data set desired to be stochastically processed, and a structure memory module including a memory storage device and configured to provide sufficient information for the architect module to duplicate a predefined processing architecture and to record a utilized processing architecture.
Abstract: An apparatus for frequency measurement (1ODM™) which provides precise and accurate measurement of a single input tone frequency and/or multiple separable input tone frequencies. Tone separability can be achieved by proper selection of the parameter N, the sample length of the DFT/FFT.
Type:
Grant
Filed:
June 10, 2015
Date of Patent:
October 3, 2017
Assignee:
The United States of America as represented by the Secretary of the Air Force
Abstract: An apparatus is provided. The apparatus comprises a polynomial register having a plurality of bits, a first bus, a second bus, and a transceiver that is coupled to the first bus, the second bus, and the polynomial register. The polynomial register is configured to store a user-defined polynomial, and the transceiver includes a pseudorandom bit sequence (PRBS) generator that is configured to generate a scrambled signal from the user-defined polynomial and a PRBS checker that is configured to generate a descrambled signal from a second signal using the user-defined polynomial.
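As a rough illustration of scrambling and descrambling with a user-defined polynomial, the sketch below uses a standard self-synchronizing (multiplicative) scrambler. This is a swapped-in textbook technique, not necessarily the circuit claimed in the patent; the tap representation (the polynomial x^7 + x^6 + 1 is written as `(7, 6)`), the all-zero initial state, and the function names are assumptions.

```python
def scramble(bits, taps):
    """Each output bit is the input XORed with earlier outputs at the tap delays."""
    state = [0] * max(taps)  # state[k - 1] holds the output bit from k steps ago
    out = []
    for bit in bits:
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        scrambled = bit ^ feedback
        out.append(scrambled)
        state = [scrambled] + state[:-1]  # shift the newest output into the register
    return out


def descramble(bits, taps):
    """Undo scramble() by XORing each received bit with earlier received bits."""
    state = [0] * max(taps)  # state[k - 1] holds the received bit from k steps ago
    out = []
    for bit in bits:
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        out.append(bit ^ feedback)
        state = [bit] + state[:-1]
    return out


polynomial_taps = (7, 6)  # hypothetical user-defined polynomial x^7 + x^6 + 1
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
scrambled = scramble(data, polynomial_taps)
recovered = descramble(scrambled, polynomial_taps)
```

Both sides only need to agree on the polynomial, which is why storing it in a single register shared by generator and checker is convenient in this model.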
Abstract: Circuitry operating under a floating-point mode or a fixed-point mode includes a first circuit accepting a first data input and generating a first data output. The first circuit includes a first arithmetic element accepting the first data input, a plurality of pipeline registers disposed in connection with the first arithmetic element, and a cascade register that outputs the first data output. The circuitry further includes a second circuit accepting a second data input and generating a second data output. The second circuit is cascaded to the first circuit such that the first data output is connected to the second data input via the cascade register. The cascade register is selectively bypassed when the first circuit is operated under the fixed-point mode.
Abstract: In one example, a method includes, responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR, copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register, copying, by the initial logic unit and during a second clock cycle, the second value to the initial pipeline register, copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR, and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
Type:
Grant
Filed:
May 12, 2014
Date of Patent:
August 29, 2017
Assignee:
QUALCOMM Incorporated
Inventors:
Lin Chen, Yun Du, Sumesh Udayakumaran, Chihong Zhang, Andrew Evan Gruber
Abstract: Disclosed are an opportunistic multi-thread method and processor, the method comprising the following steps: if a zeroth thread, a first thread, a second thread and a third thread all have instructions ready to be executed, then a zeroth clock period, a first clock period, a second clock period and a third clock period are respectively allocated to the zeroth thread, the first thread, the second thread and the third thread; if one of the threads cannot issue an instruction within a specified clock period because the instruction is not ready, and the previous thread still has an instruction ready to be executed after issuing certain instructions in the previous specified clock period, then the previous thread will take the specified clock period.
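The slot-borrowing rule in the abstract above can be sketched as a small scheduler model. The fixed count of ready instructions per thread, the function name `schedule`, and the use of `None` for an idle clock period are simplifying assumptions; real readiness would change from cycle to cycle.

```python
def schedule(ready_counts, num_cycles):
    """Return, per clock period, the thread that issues (or None if idle).

    ready_counts[t] is the number of instructions thread t has ready.
    Assumption: this count is fixed up front and only decreases on issue.
    """
    remaining = list(ready_counts)
    num_threads = len(remaining)
    issued = []
    last_issuer = None
    for cycle in range(num_cycles):
        owner = cycle % num_threads           # round-robin owner of this period
        previous = (owner - 1) % num_threads  # thread that owned the prior period
        if remaining[owner] > 0:
            chosen = owner
        elif last_issuer == previous and remaining[previous] > 0:
            # The owner is not ready; the previous thread issued in the prior
            # period and still has work, so it takes this period as well.
            chosen = previous
        else:
            chosen = None
        if chosen is not None:
            remaining[chosen] -= 1
        issued.append(chosen)
        last_issuer = chosen
    return issued


borrowed = schedule([2, 0, 1, 1], num_cycles=4)
idle = schedule([0, 0, 1, 1], num_cycles=4)
```

In the first call thread 1 has nothing ready, so thread 0 issues in both the zeroth and first clock periods; in the second call nobody issued before thread 1's period, so that period stays idle.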
Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the set of iterations of the loop according to the first control mask.
Type:
Grant
Filed:
September 28, 2012
Date of Patent:
August 22, 2017
Assignee:
Intel Corporation
Inventors:
Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin
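The control mask and index compression described in the loop vectorization abstract above can be sketched in scalar Python. The choice of 1 and 0 as the first and second mask values, the example condition `v > 0`, and the function names are assumptions; actual hardware would use vector mask and compress instructions.

```python
def control_mask(values, condition):
    """Set a mask bit to 1 where the loop operation should run, else 0."""
    return [1 if condition(value) else 0 for value in values]


def compress_indexes(mask):
    """Pack the indexes of the active iterations contiguously."""
    return [index for index, bit in enumerate(mask) if bit == 1]


values = [3, -1, 4, -1, 5, -9, 2, 6]
mask = control_mask(values, lambda v: v > 0)  # hypothetical loop condition
active = compress_indexes(mask)
```

Compressing the indexes lets a later vector operation work on a dense group of active iterations instead of skipping masked-off lanes one at a time.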
Abstract: A processor includes a processing unit including a storage module having stored thereon a physical reference list for storing identifications of physical registers that have been referenced by multiple logical registers, and a reclamation module for reclaiming physical registers to a free list based on a count of each of the physical registers on the physical reference list.
Type:
Grant
Filed:
September 28, 2012
Date of Patent:
August 15, 2017
Assignee:
Intel Corporation
Inventors:
Vijaykumar Balaram Kadgi, James D. Hadley, Avinash Sodani, Matthew C. Merten, Morris Marden, Joseph A. McMahon, Grace C. Lee, Laura A. Knauth, Robert S. Chappell, Fariborz Tabesh
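The count-based reclamation described in the abstract above might look like the sketch below. It assumes that sharing arises when one logical register is made an alias of another (for example by eliminating a register-to-register move), which the abstract does not state; the `Reclaimer` class, its method names, and the register names are hypothetical.

```python
from collections import Counter


class Reclaimer:
    def __init__(self, num_physical):
        self.free_list = list(range(num_physical))
        self.mapping = {}            # logical register name -> physical register id
        self.extra_refs = Counter()  # physical id -> references beyond the first

    def allocate(self, logical):
        """Map a logical register to a fresh physical register."""
        self.release(logical)
        physical = self.free_list.pop(0)
        self.mapping[logical] = physical
        return physical

    def alias(self, dst, src):
        """Make dst share src's physical register (assumed source of sharing)."""
        physical = self.mapping[src]
        if self.mapping.get(dst) == physical:
            return  # already shared; nothing to count
        self.release(dst)
        self.mapping[dst] = physical
        self.extra_refs[physical] += 1

    def release(self, logical):
        """Drop one reference; reclaim the register only when none remain."""
        physical = self.mapping.pop(logical, None)
        if physical is None:
            return
        if self.extra_refs[physical] > 0:
            self.extra_refs[physical] -= 1
        else:
            self.free_list.append(physical)


reclaimer = Reclaimer(num_physical=4)
shared = reclaimer.allocate("rax")
reclaimer.alias("rbx", "rax")
reclaimer.release("rax")
still_in_use = shared not in reclaimer.free_list
reclaimer.release("rbx")
```

The first release only decrements the count because `rbx` still refers to the shared register; the second release returns it to the free list.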