Abstract: A branch processing unit (BPU) is used, in an exemplary embodiment, in a superscalar, superpipelined microprocessor compatible with the x86 instruction set architecture. The BPU implements a branch prediction scheme using a target cache and a separate history cache. The target cache stores target addressing information and history information for predicted taken branches. The history cache stores history information only for predicted not-taken branches. The exemplary embodiment uses a two-bit prediction algorithm such that the target cache and the history cache need only store a single history bit (to differentiate between strong and weak states of respectively predicted taken and not-taken branches).
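The split of a two-bit predictor across two caches can be sketched as follows. This is a minimal illustration, not the patented design: the class name `SplitPredictor`, the dictionary-based caches, the saturating-counter transitions, and the policy of allocating new branches in a weak state are all assumptions. The only point taken from the abstract is that cache membership encodes taken versus not-taken, so each entry needs just one strong/weak bit.

```python
class SplitPredictor:
    """Two-bit prediction stored as (which cache) + (one strong/weak bit)."""

    def __init__(self):
        self.target_cache = {}   # branch address -> (target, strong bit); predicted taken
        self.history_cache = {}  # branch address -> strong bit; predicted not-taken

    def predict(self, addr):
        """Return the predicted target, or None when predicted not-taken."""
        entry = self.target_cache.get(addr)
        return entry[0] if entry else None

    def state(self, addr):
        if addr in self.target_cache:
            return "strong taken" if self.target_cache[addr][1] else "weak taken"
        if addr in self.history_cache:
            return "strong not-taken" if self.history_cache[addr] else "weak not-taken"
        return None

    def update(self, addr, taken, target=None):
        if addr in self.target_cache:
            tgt, strong = self.target_cache[addr]
            if taken:
                # Taken again: move to (or stay in) the strong state.
                self.target_cache[addr] = (target if target is not None else tgt, True)
            elif strong:
                self.target_cache[addr] = (tgt, False)
            else:
                # Weak taken + not taken: entry migrates to the history cache.
                del self.target_cache[addr]
                self.history_cache[addr] = False
        elif addr in self.history_cache:
            strong = self.history_cache[addr]
            if not taken:
                self.history_cache[addr] = True
            elif strong:
                self.history_cache[addr] = False
            else:
                # Weak not-taken + taken: entry migrates to the target cache.
                del self.history_cache[addr]
                self.target_cache[addr] = (target, False)
        elif taken:
            self.target_cache[addr] = (target, False)  # assumed: allocate as weak taken
        else:
            self.history_cache[addr] = False           # assumed: allocate as weak not-taken
```

For example, a branch that is taken twice and then not taken twice moves from weak taken to strong taken, back to weak taken, and then into the history cache as weak not-taken.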
Abstract: A branch processing unit (BPU) is used, in an exemplary embodiment, in a superscalar, superpipelined microprocessor compatible with the x86 instruction set architecture. The BPU includes a target cache organized in banks to support split prefetching. Prefetch requests (addressing a prefetch block of 16 bytes) are separated into low and high block addresses (addressing split blocks of 8 bytes). The low and high block addresses differ in bit position [3] designated a bank select bit, where the low block address of an associated prefetch request may be designated by a [1 or 0] such that a split block associated with a low block address may be allocated into either bank of the target cache (i.e., the low block of a prefetch request can start on an 8 byte alignment rather than the 16 byte alignment).
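The address split can be illustrated with a few lines of bit arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the high block address is assumed to be the low block address plus 8 bytes, which always flips bit 3 (and may carry into higher bits). The abstract itself only states that the two addresses differ in the bank select bit.

```python
SPLIT_BLOCK_BYTES = 8   # each split block is 8 bytes
BANK_SELECT_BIT = 3     # bit [3] of the block address selects the bank

def split_prefetch(addr):
    """Return (low, high) 8-byte block addresses for a 16-byte prefetch."""
    low = addr & ~(SPLIT_BLOCK_BYTES - 1)   # align down to 8 bytes, not 16
    high = low + SPLIT_BLOCK_BYTES          # assumed: next sequential split block
    return low, high

def bank_of(block_addr):
    """Return the target cache bank (0 or 1) for a split block address."""
    return (block_addr >> BANK_SELECT_BIT) & 1
```

With this model, a prefetch starting at 0x1008 yields a low block in bank 1 and a high block in bank 0, so both halves can be looked up in the same access because they never share a bank.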
Abstract: A method of data communication between asynchronous processes of a computer system is disclosed in connection with a cache coherency system for a processor-cache used in a multi-master computer system in which bus arbitration signals either are not available to the processor-cache, or are not exclusively relied on by the processor-cache to assure validity of the data in the cache (e.g., a 386-bus compatible computer system using an external secondary cache in which bus arbitration signals are only connected to and used by the secondary cache controller). In an exemplary external-chip implementation, the cache coherency system (120) comprises two PLAs--a FLUSH module (122) and a WAVESHAPING module (124). The FLUSH module (a) receives selected bus cycle definition and control signals from a microprocessor (110), (b) detects FLUSH (cache invalidation) conditions, i.e., bus master synchronization events, and for each such FLUSH condition, (c) provides a FLUSH output signal.
Type:
Grant
Filed:
October 1, 1993
Date of Patent:
March 3, 1998
Assignee:
Cyrix Corporation
Inventors:
Thomas D. Selgas, Thomas B. Brightman, William C. Patton, Jr.
Abstract: A branch processing unit (BPU) is used, in an exemplary embodiment, in a superscalar, superpipelined microprocessor compatible with the x86 instruction set architecture. The BPU includes a return stack for call/returns, including return stack pointer repair in the case of the failure of a call/return to confirm (decode) or resolve. Return stack control logic maintains a return stack pointer, incrementing and decrementing the return stack pointer respectively for call/return pairs that hit in the target cache--in addition, the return stack control logic maintains two additional stack pointers used for repair: (a) a confirmation pointer that is incremented when a call is decoded and decremented when a return is decoded; and (b) a resolution pointer that is incremented when a call resolves, and decremented when a return resolves.
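A small model can make the three-pointer bookkeeping concrete. This is a sketch only: the class and method names, the circular buffer, the stack depth, and the exact repair actions (copying the confirmation or resolution pointer back into the return stack pointer) are assumptions. The abstract states only that the two extra pointers track decoded and resolved call/returns and are used for repair.

```python
class ReturnStack:
    """Return stack with confirmation and resolution pointers for repair."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = [None] * depth
        self.sp = 0       # speculative pointer: moves on target cache hits
        self.confirm = 0  # moves when a call/return is decoded
        self.resolve = 0  # moves when a call/return resolves

    def predict_call(self, return_addr):
        self.entries[self.sp % self.depth] = return_addr
        self.sp += 1

    def predict_return(self):
        self.sp -= 1
        return self.entries[self.sp % self.depth]

    def decode_call(self):
        self.confirm += 1

    def decode_return(self):
        self.confirm -= 1

    def resolve_call(self):
        self.resolve += 1

    def resolve_return(self):
        self.resolve -= 1

    def repair_on_confirm_failure(self):
        # Assumed repair: discard pushes/pops that were never decoded.
        self.sp = self.confirm

    def repair_on_resolve_failure(self):
        # Assumed repair: fall back to the last architecturally resolved depth.
        self.sp = self.confirm = self.resolve
```

In this model, a predicted call that never reaches decode leaves the return stack pointer one ahead of the confirmation pointer, and the repair step simply drops the stale entry.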
Abstract: A pipelined 32 bit x86 processor including a prefetch unit and a branch unit. During sequential prefetching, the prefetch unit increments a prefetch physical address PFPA and a corresponding prefetch linear address PFLA--for each prefetch address, the PFLA is compared with the code segment limit linear address CSLA to determine if the corresponding prefetch block of 16 instruction bytes (cache line) contains the segment limit. If a COF hits in the branch unit, it outputs corresponding target address information used to generate a prefetch address--this target address information includes bits [11:0] of the target address (which are the same for the target physical address), i.e., the branch unit does not provide a full PFLA for comparison with the CSLA.
Abstract: A prefetch unit includes flow control for controlling the transfer of instruction bytes from a prefetch buffer to a decoder where the prefetch buffer includes predicted change of flow instructions. Instruction bytes in the prefetch buffer are arranged in prefetch blocks--associated with each prefetch block is a flow control bit. When the transfer of instruction bytes from a current prefetch block is complete, the flow control bit is checked--if the flow control bit is set to indicate that the prefetch block includes a predicted COF instruction, instruction bytes will not be transferred from the next prefetch block unless the predicted COF instruction is confirmed as having been decoded. This flow control avoids the complexity of maintaining information to repair the prefetcher and decoder if the predicted COF instruction is not decoded.
Abstract: Circuitry and methodology for pulse capture employs S-R latch, precharge, and switch circuitries for quickly sensing and capturing a logic pulse from dynamic logic circuitry. The present invention, while having general application to any dynamic logic circuitry, has particular application to random access memory (RAM), content addressable memory (CAM), and adder circuitries.
Abstract: A method of detecting anomalous overflow conditions is used, in an exemplary embodiment, in implementing in a 486-type microprocessor, nonrestoring two's complement division for negative quotients using 2n bit dividends and n bit divisors. In each iterative division step, an adder/subtractor is used to add/subtract the properly aligned divisor to/from the left shifted dividend, to produce a partial remainder and a carry out bit Cout. Complement Cout is assumed to be the same as the most significant bit of the partial remainder PR(MSB), such that PR(MSB) is used as the sign bit in further computations, with complement Cout being used to control quotient generation according to DVRS XOR Cout. The anomalous overflow test signals overflow when complement Cout is different from the most significant bit of the first partial remainder PR1(MSB), such that the anomalous overflow test is implemented according to the logic equation: Cout XNOR PR1(MSB).
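The closing logic equation follows directly from the stated condition, which a truth-table check makes explicit. The sketch below uses hypothetical function names and single-bit integers (0 or 1) for Cout and PR1(MSB). It shows that "complement Cout differs from PR1(MSB)" is the same predicate as Cout XNOR PR1(MSB).

```python
def xnor(a, b):
    """Single-bit XNOR: 1 when the two bits are equal."""
    return 1 - (a ^ b)

def anomalous_overflow(cout, pr1_msb):
    """Overflow test as given by the logic equation Cout XNOR PR1(MSB)."""
    return xnor(cout, pr1_msb)

def complement_cout_differs(cout, pr1_msb):
    """Overflow test as worded: complement Cout differs from PR1(MSB)."""
    return int((1 - cout) != pr1_msb)

# The two formulations agree for every combination of input bits.
for cout in (0, 1):
    for pr1_msb in (0, 1):
        assert anomalous_overflow(cout, pr1_msb) == complement_cout_differs(cout, pr1_msb)
```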
Abstract: A programmable phase shift clock generator is disclosed including a phase comparator, an up-down counter, a ring oscillator, and an adjustable delay line for determining a digital signature of an input clock and precisely generating a phase shifted clock signal.
Abstract: A system for the early detection of overflow or exceptional quotient/remainder pairs is used in conjunction with performing nonrestoring division using two's complement 2n bit dividends N and two's complement n bit divisors D--if early overflow is not signaled, and if an exceptional quotient/remainder pair is not detected, a quotient Q and remainder R are obtained by successive iterative partial remainder computations, which may be performed with no possibility of overflow. The detection system uses only the divisor, dividend, and first partial remainder. Early overflow detection uses three tests (FIGS. 2a, 2b, 2c): an exceptional divisor test, an exceptional dividend test, and an exceptional quotient test. Early exceptional quotient/remainder pair detection provides, when overflow is not signaled, exceptional quotient/remainder pairs using the exceptional divisor test for the exceptional divisor -2^(n-1) (FIG. 2c) and the exceptional quotient test for the exceptional quotient -2^(n-1) (FIG. 2b).
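For orientation, the overflow condition that such early tests must anticipate can be written as a plain reference check. The sketch below is not the three tests of the abstract, which inspect only the divisor, dividend, and first partial remainder. It computes the full quotient and checks whether it fits in n bits. The function names are hypothetical, and truncation toward zero (as in x86 signed division) is an assumption.

```python
def truncating_divide(dividend, divisor):
    """Signed division truncating toward zero; returns (quotient, remainder)."""
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    return q, dividend - q * divisor

def division_overflows(dividend, divisor, n_bits):
    """True if the quotient is not representable as an n-bit two's complement value."""
    if divisor == 0:
        return True
    q, _ = truncating_divide(dividend, divisor)
    return not (-(1 << (n_bits - 1)) <= q <= (1 << (n_bits - 1)) - 1)
```

With n = 8, dividing 16384 by -128 gives the quotient -128, which is representable and therefore not an overflow, while dividing 16384 by 128 gives 128, which is.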
Abstract: A write-back coherency system, including FLUSH/INVAL and LOCK protocols, is used, in an exemplary embodiment, in a microprocessor used in a computer system that selectively provides to the processor FLUSH and INVAL signals to implement a limited write-back protocol. The FLUSH/INVAL protocol is used by the computer system to control export and invalidate operations. In response to a FLUSH signal, the microprocessor exports dirty data from the cache. If INVAL is also asserted, the cache is also invalidated (i.e., if FLUSH is asserted and INVAL is not asserted, no invalidation is performed). With the LOCK protocol, LOCKed reads are serviced out of the cache for read hits--however, to maintain compatibility with computer systems that expect a LOCK operation to involve a read followed by a write access to external memory, the microprocessor will still run the external LOCKed read cycle, ignoring the returned data.
Type:
Grant
Filed:
November 12, 1993
Date of Patent:
September 2, 1997
Assignee:
Cyrix Corporation
Inventors:
Marvin Wayne Martinez, Jr., Mark W. Bluhm, Jeffrey S. Byrne, David A. Courtright, Douglas Ewing Duschatko, Raul A. Garibay, Jr., Margaret R. Herubin
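The FLUSH/INVAL behavior described in the write-back coherency abstract above can be modeled in a few lines. This is a behavioral sketch with hypothetical names (`WriteBackCache`, `flush`, a dictionary standing in for external memory). It captures only what the abstract states: FLUSH exports dirty data, and the cache is invalidated only when INVAL is asserted as well.

```python
class WriteBackCache:
    """Toy write-back cache: address -> (data, dirty flag)."""

    def __init__(self):
        self.lines = {}

    def write(self, addr, data):
        self.lines[addr] = (data, True)   # write hits mark the line dirty

    def flush(self, memory, inval=False):
        """Model a FLUSH signal, optionally with INVAL asserted."""
        for addr, (data, dirty) in list(self.lines.items()):
            if dirty:
                memory[addr] = data               # export dirty data
                self.lines[addr] = (data, False)  # line is now clean
        if inval:
            self.lines.clear()                    # invalidate only with INVAL
```

Calling `flush(memory)` leaves clean copies in the cache, while `flush(memory, inval=True)` writes back and then empties it.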
Abstract: A numeric processor includes a multiply-add circuit with redundant value interface circuitry for performing mathematical function computations as a succession of product sums using redundant binary format values (such as signed digit) as the multiplicand and/or the addend inputs to the multiply-add circuit. The redundant value interface circuitry (i) extracts a predetermined number of bits from a redundant product sum to form a redundant truncated product sum, and (ii) couples the redundant truncated product sum to either, or both, multiplicand and addend inputs. In this manner, successive redundant product sums are calculated without conversion to nonredundant binary format. In a preferred embodiment, the numeric processor includes a single multiply-add circuit, with redundant truncated product sum values being fed back to the multiplicand and/or addend inputs.
Type:
Grant
Filed:
July 11, 1994
Date of Patent:
August 19, 1997
Assignee:
Cyrix Corporation
Inventors:
Willard Stuart Briggs, David William Matula
Abstract: A processor includes storage circuitry for storing an instruction and memory circuitry addressable by a microaddress for outputting a microinstruction in response to the microaddress. The processor further includes sequencing circuitry coupled to provide the microaddress to the memory circuitry. Finally, the processor includes decode circuitry coupled to the storage circuitry for detecting whether the instruction stored in the storage circuitry comprises a single clock instruction before the memory circuit outputs the microinstruction, and for indicating to the sequencing circuitry in response to detecting whether the instruction stored in the storage circuitry comprises a single clock instruction.
Type:
Grant
Filed:
October 18, 1993
Date of Patent:
July 1, 1997
Assignee:
Cyrix Corporation
Inventors:
Mark W. Bluhm, Mark W. Hervin, Steven C. McMahan, Raul A. Garibay, Jr.
Abstract: Burst ordering logic is used, in an exemplary embodiment, to implement an ascending only burst ordering for cache line fills in 486 computer systems while maintaining compatibility with the conventional 486 burst ordering which uses both ascending and descending burst orders depending upon the position of the requested address (critical Dword) within a cache line (conventional 486 burst ordering is illustrated in Table 1 in the Background). The burst ordering logic (60) implements a 1+4 burst ordering for requested addresses that, for conventional 486 burst ordering, would result in a descending burst order (the exemplary 1+4 burst ordering is illustrated in Table 2 in the Specification). The burst ordering logic includes request modification circuitry (64), address modification circuitry (66), and cacheability modification circuitry (68).
Type:
Grant
Filed:
October 28, 1994
Date of Patent:
July 1, 1997
Assignee:
Cyrix Corporation
Inventors:
David A. Courtright, Douglas Ewing Duschatko
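The conventional ordering that the burst ordering abstract above refers to can be generated with an XOR over the Dword offset. The sketch below reflects the commonly documented 486 burst order, not Table 1 of the patent, which is not reproduced here. The function names are hypothetical, and the helper only reports whether a sequence is strictly ascending. It does not implement the 1+4 ordering, whose details the abstract leaves to Table 2.

```python
LINE_OFFSETS = (0x0, 0x4, 0x8, 0xC)   # four Dwords in a 16-byte cache line

def conventional_486_burst(requested_addr):
    """Return the four Dword addresses in conventional 486 burst order."""
    base = requested_addr & ~0xF      # start of the cache line
    first = requested_addr & 0xC      # offset of the critical Dword
    return [base | (first ^ step) for step in LINE_OFFSETS]

def is_ascending(addresses):
    """True if every address is higher than the one before it."""
    return all(a < b for a, b in zip(addresses, addresses[1:]))
```

A request for offset 0 produces the ascending sequence 0, 4, 8, C, while a request for offset 4 produces 4, 0, C, 8, which is the kind of sequence an ascending-only scheme has to replace.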
Abstract: An adjustable duty cycle clock generator has first and second delay lines coupled to receive an input clock and cascaded to first and second edge detectors, respectively. The second delay line has a programmable delay and the first and second edge detectors are further coupled to set and reset inputs on an S-R latch to generate an adjustable duty cycle clock with independently adjustable high and low times proportional to the induced delays of the first and second delay lines.
Abstract: A processing unit includes a plurality of subcircuits and circuitry for generating clock signals thereto. Detecting circuitry detects the assertion of a first signal indicative of a request for suspending operation of the processing unit and the assertion of a second signal indicating the state of operation of a coprocessing unit. Disabling circuitry is operable to disable clock signals to one or more of the subcircuits responsive to the first and second control signals.
Type:
Grant
Filed:
March 27, 1992
Date of Patent:
May 20, 1997
Assignee:
Cyrix Corporation
Inventors:
Robert Maher, Raul A. Garibay, Jr., Margaret R. Herubin, Mark Bluhm
Abstract: A microprocessor comprises one or more instruction pipelines having a plurality of stages for processing a stream of instructions, wherein one or more of the instructions reference a defined set of logical registers having multiple addressable sizes as sources and destinations of operands for the instruction. A plurality of physical registers are provided in excess of the number of defined set of logical registers. Physical registers are selectively allocated to one of said defined set of logical registers responsive to an instruction for writing to said one of said logical registers and the size associated with the logical register.
Abstract: A processing unit includes a plurality of subcircuits and circuitry for generating clock signals thereto. Detection circuitry detects the assertion of a control signal and disabling circuitry is operable to disable the clock signals to one or more of the subcircuits responsive to the control signal.
Type:
Grant
Filed:
September 22, 1994
Date of Patent:
May 13, 1997
Assignee:
Cyrix Corporation
Inventors:
Robert Maher, Raul A. Garibay, Jr., Margaret R. Herubin, Mark Bluhm
Abstract: An integrated circuit extraction tool for extracting sockets or microprocessors having a staggered pin grid array (SPGA) pin arrangement. Such tool includes an elongated base having a first end and a second end, each end forming a set of teeth that permit entry and extension of the teeth, diagonally, through the staggered pins of the socket or microprocessor. In the preferred embodiment, the first end is disposed at ninety degrees with respect to the elongated base. Further, the elongated base is formed with a curvature to enhance the leverage action necessary for an extraction operation.
Type:
Grant
Filed:
December 16, 1994
Date of Patent:
April 8, 1997
Assignee:
Cyrix Corporation
Inventors:
Stanley D. Harder, Thomas D. Selgas, Jr.