Processor micro-architecture for compute, save or restore multiple registers, devices, systems, methods and processes of manufacture
An electronic circuit (4000) includes a bias value generator circuit (3900) operable to supply a varying bias value in a programmable range, and an instruction circuit (3625, 4010) responsive to a first instruction to program the range of the bias value generator circuit (3900) and further responsive to a second instruction having an operand to repeatedly issue the second instruction with the operand varied in an operand value range determined as a function of the varying bias value.
This application is a divisional of U.S. patent application Ser. No. 15/379,515 filed on Dec. 15, 2016, which is a divisional of U.S. patent application Ser. No. 14/215,412 filed on Mar. 17, 2014 (now U.S. Pat. No. 9,557,992), which is a divisional of U.S. patent application Ser. No. 13/247,101 filed on Sep. 28, 2011 (now U.S. Pat. No. 8,713,293), which is a divisional of U.S. patent application Ser. No. 12/125,431 filed on May 22, 2008 (now U.S. Pat. No. 8,055,886), which claims priority to U.S. Provisional Patent Application No. 60/949,426, filed on Jul. 12, 2007, titled “Processor Micro-Architecture for Compute, Save or Restore Multiple Registers, Devices, Systems, Methods and Processes of Manufacture,” all of which are incorporated by reference herein.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
COPYRIGHT NOTIFICATION
Portions of this patent application contain materials that are subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document, or the patent disclosure, as it appears in the United States Patent and Trademark Office, but otherwise reserves all copyright rights whatsoever.
BACKGROUND
This invention is in the field of electronic computing hardware and software and communications, and is more specifically directed to improved circuits, devices, and systems for power management and information and communication processing, and processes of operating and making them. Without limitation, the background is further described in connection with communications processing.
Wireline and wireless communications, of many types, have gained increasing popularity in recent years. The personal computer with a wireline modem such as DSL (digital subscriber line) modem or cable modem communicates with other computers over networks. The mobile wireless (or cellular) telephone has become ubiquitous around the world. Mobile telephony has recently begun to communicate video and digital data, and voice over packet (VoP or VoIP), in addition to cellular voice. Wireless modems for communicating computer data over a wide area network are also available.
Mobile video on cellular telephones and other mobile platforms is increasing in popularity. It is desirable that many streams of information such as video, voice and data should be flexibly handled by such mobile devices and platforms under power management.
Wireless data communications in wireless mesh networks, such as those operating according to the IEEE 802.16 standard or “WiMax,” are increasing over a widening installed base. The wireless mesh networks offer wideband multi-media transmission and reception that also appear to call for substantial computing power and hardware. Numerous other wireless technologies exist and are emerging about which various burdens and demands for power management exist and will arise.
Security techniques are used to improve the security of retail and other business commercial transactions in electronic commerce and to improve the security of communications wherever personal and/or commercial privacy is desirable. Security is important in both wireline and wireless communications and apparently imposes still further demands for computing power and hardware and compatible power management.
Processors of various types, including DSP (digital signal processing) chips, RISC (reduced instruction set computing), information storage memories and/or other integrated circuit blocks and devices are important to these systems and applications. Containing or reducing energy dissipation and the cost of manufacture and providing a variety of circuit and system products with performance features for different market segments are important goals in DSPs, integrated circuits generally and system-on-a-chip (SOC) design.
Further advantageous solutions and alternative solutions would, accordingly, be desirable in the art.
SUMMARY
Generally and in one form of the invention, an electronic circuit includes a bias value generator circuit operable to supply a varying bias value in a programmable range, and an instruction circuit responsive to a first instruction to program the range of the bias value generator circuit and further responsive to a second instruction having an operand to repeatedly issue the second instruction with the operand varied in an operand value range determined as a function of the varying bias value.
Generally and in another form of the invention, a processor for electronic computing includes an instruction register, an instruction decoder having a decoded instruction output with an instruction operand output, the instruction decoder operable to successively decode a repeat instruction and a repeated instruction having an operand, a pipeline having pipestages including a particular pipestage coupled to the decoded instruction output, and a repeating instruction circuit coupled between the instruction decoder and the particular pipestage, the repeating instruction circuit responsive to the repeat instruction to program an operand value range and also responsive to the repeated instruction and its operand to vary the value of the operand over the operand value range and deliver the varying value of the operand to the particular pipestage.
Generally and in a further form of the invention, an electronic circuit includes an instruction circuit operable to provide a push instruction having an immediate constant, a count register operable to hold a changing count, a destination stack, and push instruction execution circuitry operable to dynamically push data to the destination stack in response to the immediate constant from the instruction circuit biased with the changing count from the count register.
Generally and in a process form of the invention, a process of operating an electronic circuit, includes supplying a varying counter value in a programmable range, and responding to a first instruction to program the range and responding to a second instruction having an associated operand to repeatedly vary the operand in an operand value range determined as a function of the counter value varying in the programmable range.
Generally and in another process form of the invention, a process of operating a processor having a pipeline for electronic computing, includes successively delivering a repeat instruction and a repeated instruction having an operand, responding to the repeat instruction to program an operand value range, and responding to the repeated instruction and its operand to repeatedly vary the value of the operand in the operand value range and to deliver the repeatedly varied value of the operand to the pipeline.
Generally and in yet another form of the invention, an electronic circuit includes a memory, a set of longer width and shorter width storage elements, an instruction operand value generating circuit operable to generate a succession of values in an operand value range, an address pipeline coupled to the instruction operand value generating circuit and operable to use the succession of values to access a succession of memory locations in the memory, and selection circuitry also coupled to the instruction operand value generating circuit and operable to concurrently use the same succession of values to access the set of longer width and shorter width storage elements and thereby effectuate transfers of information between the set and the memory.
Generally and in an additional form of the invention, a processing system includes a printed circuit board, a volatile memory, a processor on the printed circuit board for electronic computing coupled to the volatile memory and the processor including a pipeline and a set of longer width and shorter width storage elements, a nonvolatile memory elsewhere on the printed circuit board and coupled to the processor, for holding representations of instructions for the instruction register to save and restore the set of longer width and shorter width storage elements to the volatile memory, the instructions including a repeat instruction as well as a repeated instruction having an operand, the processor further including an instruction operand value generating circuit operable to generate values varying in an operand value range and biasedly related to the operand of the repeated instruction represented in the nonvolatile memory, and selection circuitry in the pipeline coupled to the instruction operand value generating circuit and operable to use the values to access the set of longer width and shorter width storage elements, and thereby facilitate transfers of information between the set and the volatile memory.
Generally and in yet another form of the invention, an electronic debugging circuit includes a bias value generator circuit operable to supply a varying bias value in a programmable range and having a counter register, a pipeline register, an instruction circuit responsive to a first instruction to program the range of the bias value generator circuit and further responsive to a second instruction having an operand to repeatedly issue the second instruction to the pipeline register with the operand varied in an operand value range determined as a function of the varying bias value, and a scan controller having at least one scan path linking the counter register and the pipeline register to the scan controller.
Generally and in another further process form of the invention, a process of manufacturing includes fabricating structures on an integrated circuit wafer defining both a bias value generator circuit having a programmable range and an instruction circuit coupled to the bias value generator circuit, and electrically testing the structures to verify that the instruction circuit is responsive to a first instruction to program the range of the bias value generator circuit and that the bias value generator circuit supplies a varying bias value in the programmed range and that the instruction circuit is further responsive to a second instruction having an operand to repeatedly issue the second instruction with the operand varied in an operand value range determined as a function of the varying bias value.
These and other circuit, device, system, apparatus, process, and other forms of the invention are disclosed and claimed.
Corresponding numerals in different figures indicate corresponding parts except where the context indicates otherwise. Some otherwise-identical designations may inadvertently have different characters or portions upper case or lower case in different parts of the description and drawings, and such otherwise-identical designations indicate the corresponding parts except where the context indicates otherwise.
DETAILED DESCRIPTION OF EMBODIMENTS
In
In this way, advanced networking capability for services, software, and content, such as cellular telephony and data, audio, music, voice, video, e-mail, gaming, security, e-commerce, file transfer and other data services, internet, world wide web browsing, TCP/IP (transmission control protocol/Internet protocol), voice over packet and voice over Internet protocol (VoP/VoIP), and other services accommodates and provides security for secure utilization and entertainment appropriate to the just-listed and other particular applications.
The embodiments, applications and system blocks disclosed herein are suitably implemented in fixed, portable, mobile, automotive, seaborne, and airborne, communications, control, set top box 2092, television 2094 (receiver or two-way TV), and other apparatus. The personal computer (PC) 2070 is suitably implemented in any form factor such as desktop, laptop, palmtop, organizer, mobile phone handset, PDA personal digital assistant 2096, internet appliance, wearable computer, content player, personal area network, or other type.
For example, handset 2010 is improved for selectively determinable functionality, performance, security and economy when manufactured. Handset 2010 is interoperable and able to communicate with all other similarly improved and unimproved system blocks of communications system 2000. Camera 1490 provides video pickup for cell phone 1020 to send over the internet to cell phone 2010′, PDA 2096, TV 2094, and to a monitor of PC 2070 via any one, some or all of cellular base station 2050, DVB station 2020, WLAN AP 2060, STB 2092, and WLAN gateway 2080. Handset 2010 has a video storage, such as hard drive, high density memory, and/or compact disk (CD) in the handset for digital video recording (DVR) such as for delayed reproduction, transcoding, and retransmission of video to other handsets and other destinations.
On a cell phone printed circuit board (PCB) 1020 in handset 2010 are provided a higher-security processor integrated circuit 1022, an external flash memory 1025 and SDRAM 1024, and a serial interface 1026. Serial interface 1026 is suitably a wireline interface, such as a USB interface connected by a USB line to the personal computer 1070 and magnetic and/or optical media 2075 when the user desires and for reception of software intercommunication and updating of information between the personal computer 2070 (or other originating sources external to the handset 2010) and the handset 2010. Such intercommunication and updating also occur via a processor in the cell phone 2010 itself such as for cellular modem, WLAN, Bluetooth from a website 2055 or 2065, or other circuitry 1028 for wireless or wireline modem processor, digital television and physical layer (PHY).
In
The words “internal” and “external” as applied to a circuit or chip respectively refer to being on-chip or off-chip of the applications processor chip 1022. All items are assumed to be internal to an apparatus (such as a handset, base station, access point, gateway, PC, or other apparatus) except where the words “external to” are used with the name of the apparatus, such as “external to the handset.”
ROM 1032 provides a boot storage having boot code that is executable in at least one type of boot sequence. One or more of RAM 1034, internal flash 1036, and external flash 1024 are also suitably used to supplement ROM 1032 for boot storage purposes.
It is contemplated that the skilled worker uses each of the integrated circuits shown in
In
Digital circuitry 1150 on integrated circuit 1100 supports and provides wireless interfaces for any one or more of GSM, GPRS, EDGE, UMTS, and OFDMA/MIMO (Global System for Mobile communications, General Packet Radio Service, Enhanced Data Rates for Global Evolution, Universal Mobile Telecommunications System, Orthogonal Frequency Division Multiple Access and Multiple Input Multiple Output Antennas) wireless, with or without high speed digital data service, via an analog baseband chip 1200 and GSM/CDMA transmit/receive chip 1300. Digital circuitry 1150 includes a ciphering processor CRYPT for GSM ciphering and/or other encryption/decryption purposes. Blocks TPU (Time Processing Unit real-time sequencer), TSP (Time Serial Port), GEA (GPRS Encryption Algorithm block for ciphering at LLC logical link layer), RIF (Radio Interface), and SPI (Serial Port Interface) are included in digital circuitry 1150.
Digital circuitry 1160 provides codec for CDMA (Code Division Multiple Access), CDMA2000, and/or WCDMA (wideband CDMA or UMTS) wireless suitably with HSDPA/HSUPA (High Speed Downlink Packet Access, High Speed Uplink Packet Access) (or 1×EV-DV, 1×EV-DO or 3×EV-DV) data feature via the analog baseband chip 1200 and RF GSM/CDMA chip 1300. Digital circuitry 1160 includes blocks MRC (maximal ratio combiner for multipath symbol combining), ENC (encryption/decryption), RX (downlink receive channel decoding, de-interleaving, Viterbi decoding and turbo decoding) and TX (uplink transmit convolutional encoding, turbo encoding, interleaving and channelizing). Blocks for uplink and downlink processes of WCDMA are provided.
Audio/voice block 1170 supports audio and voice functions and interfacing. Speech/voice codec(s) are suitably provided in memory space in audio/voice block 1170 for processing by processor(s) 1110. An applications interface block 1180 couples the digital baseband chip 1100 to an applications processor 1400. Also, a serial interface in block 1180 interfaces from parallel digital busses on chip 1100 to USB (Universal Serial Bus) of PC (personal computer) 2070. The serial interface includes UARTs (universal asynchronous receiver/transmitter circuit) for performing the conversion of data between parallel and serial lines. A power resets and control module 1185 provides power management circuitry for chip 1100. Chip 1100 is coupled to location-determining circuitry 1190 for GPS (Global Positioning System). Chip 1100 is also coupled to a USIM (UMTS Subscriber Identity Module) 1195 or other SIM for user insertion of an identifying plastic card, or other storage element, or for sensing biometric information to identify the user and activate features.
In
An audio block 1220 has audio I/O (input/output) circuits to a speaker 1222, a microphone 1224, and headphones (not shown). Audio block 1220 has an analog-to-digital converter (ADC) coupled to the voice codec and a stereo DAC (digital to analog converter) for a signal path to the baseband block 1210 including audio/voice block 1170, and with suitable encryption/decryption activated.
A control interface 1230 has a primary host interface (I/F) and a secondary host interface to DBB-related integrated circuit 1100 of
A power conversion block 1240 includes buck voltage conversion circuitry for DC-to-DC conversion, and low-dropout (LDO) voltage regulators for power management/sleep mode of respective parts of the chip regulated by the LDOs. Power conversion block 1240 provides information to and is responsive to a power control state machine between the power conversion block 1240 and circuits 1250.
Circuits 1250 provide oscillator circuitry for clocking chip 1200. The oscillators have frequencies determined by one or more crystals. Circuits 1250 include a RTC real time clock (time/date functions), general purpose I/O, a vibrator drive (supplement to cell phone ringing features), and a USB On-The-Go (OTG) transceiver. A touch screen interface 1260 is coupled to a touch screen XY 1266 off-chip.
Batteries such as a lithium-ion battery 1280 and backup battery provide power to the system and battery data to circuit 1250 on suitably provided separate lines from the battery pack. When needed, the battery 1280 also receives charging current from a Charge Controller in analog circuit 1250 which includes MADC (Monitoring ADC and analog input multiplexer such as for on-chip charging voltage and current, and battery voltage lines, and off-chip battery voltage, current, temperature) under control of the power control state machine. Battery monitoring is provided by either or both of 1-Wire and an interface called HDQ.
In
Further in
The RISC processor 1420 and the DSP 1424 in section 1420 have access via an on-chip extended memory interface (EMIF/CF) to off-chip memory resources 1435 including as appropriate, mobile DDR (double data rate) DRAM, and flash memory of any of NAND Flash, NOR Flash, and Compact Flash. On chip 1400, the shared memory controller 1426 in circuitry 1420 interfaces the RISC processor 1420 and the DSP 1424 via an on-chip bus to on-chip memory 1440 with RAM and ROM. A 2D graphic accelerator is coupled to frame buffer internal SRAM (static random access memory) in block 1440. A security block 1450 in security logic 1038 of
Security logic 1038 of
On-chip peripherals and additional interfaces 1410 include UART data interface and MCSI (Multi-Channel Serial Interface) voice wireless interface for an off-chip IEEE 802.15 (Bluetooth and low and high rate piconet and personal network communications) wireless circuit 1430. Debug messaging and serial interfacing are also available through the UART. A JTAG emulation interface couples to an off-chip emulator Debugger for test and debug. Further in peripherals 1410 are an I2C interface to analog baseband ABB chip 1200, and an interface to applications interface 1180 of integrated circuit chip 1100 having digital baseband DBB.
Interface 1410 includes a MCSI voice interface, a UART interface for controls, and a multi-channel buffered serial port (McBSP) for data. Timers, interrupt controller, and RTC (real time clock) circuitry are provided in chip 1400. Further in peripherals 1410 are a MicroWire (u-wire 4 channel serial port) and multi-channel buffered serial port (McBSP) to Audio codec, a touch-screen controller, and audio amplifier 1480 to stereo speakers.
External audio content and touch screen (in/out) and LCD (liquid crystal display), organic semiconductor display, and DLP™ digital light processor display from Texas Instruments Incorporated, are suitably provided in various embodiments and coupled to interface 1410. In vehicular use, the display is suitably any of these types provided in the vehicle, and sound is provided through loudspeakers, headphones or other audio transducers provided in the vehicle. In some vehicles a transparent organic semiconductor display 2095 of
Interface 1410 additionally has an on-chip USB OTG interface that couples to off-chip Host and Client devices. These USB communications are suitably directed outside handset 1010 such as to PC 1070 (personal computer) and/or from PC 1070 to update the handset 1010.
An on-chip UART/IrDA (infrared data) interface in interfaces 1410 couples to off-chip GPS (global positioning system block cooperating with or instead of GPS 1190) and Fast IrDA infrared wireless communications device. An interface provides EMT9 and Camera interfacing to one or more off-chip still cameras or video cameras 1490, and/or to a CMOS sensor of radiant energy. Such cameras and other apparatus all have additional processing performed with greater speed and efficiency in the cameras and apparatus and in mobile devices coupled to them with improvements as described herein. Further in
Further, on-chip interfaces 1410 are respectively provided for off-chip keypad and GPIO (general purpose input/output). On-chip LPG (LED Pulse Generator) and PWT (Pulse-Width Tone) interfaces are respectively provided for off-chip LED and buzzer peripherals. On-chip MMC/SD multimedia and flash interfaces are provided for off-chip MMC Flash card, SD flash card and SDIO peripherals.
In
Still other additional wireless interfaces such as for wideband wireless such as IEEE 802.16 WiMAX mesh networking and other standards are suitably provided and coupled to the applications processor integrated circuit 1400 and other processors in the system. WiMax has MAC and PHY processes and the illustration of blocks 1510 and 1520 for WLAN indicates the relative positions of the MAC and PHY blocks for WiMax.
In
TABLE 1 provides a list of some of the abbreviations used in this document.
In
Data exchange between the peripheral subsystem and the memory subsystem and general system transactions from memory to memory are handled by the System SDMA. Data exchanges within a DSP subsystem 3510.2 are handled by the DSP DMA 3518.2. Data exchange to refresh a display is handled in display subsystem 3510.4 using a DISP DMA 3518.4 (numeral omitted). This subsystem 3510.4, for instance, includes a dual output three layer display processor for 1× Graphics and 2× Video, temporal dithering (turning pixels on and off to produce grays or intermediate colors) and SDTV to QCIF video format and translation between other video format pairs. The Display block 3510.4 feeds an LCD panel using either a serial or parallel interface. Also television output TV and Amp provide CVBS or S-Video output and other television output types. Data exchange to store camera capture is handled using a Camera DMA 3518.3 in camera subsystem CAM 3510.3. The CAM subsystem 3510.3 suitably handles one or two camera inputs of either serial or parallel data transfer types, and provides image capture hardware image pipeline and preview.
A hardware security architecture including SSM 2460 propagates qualifiers on the interconnect 3521 and 3534 as shown in
Firewall protection by firewalls 3522.i is provided for various system blocks 3520.i, such as GPMC to Flash memory 3520.1, ROM 3520.2, on-chip RAM 3520.3, Video Codec 3520.4, WCDMA/HSDPA 3520.6, MAD2D 3520.7 to Modem chip 1100, and a DSP 3528.8. Various initiators in the system are given 4-bit identifying codes designated ConnID. Some Initiators and their buses in one example are Processor Core MPU 2610 [RD, WR, INSTR Buses], digital signal processor direct memory access DSP DMA 3510 [RD, WR], system direct memory access SDMA 3510.1 [RD, WR], Universal Serial Bus USB HS, virtual processor PROC_VIRTUAL [RD, WR, INSTR], virtual system direct memory access SDMA_VIRTUAL [RD, WR], display 3510.4 such as LCD, memory management for digital signal processor DSP MMU, camera CAMERA 3510.3 [CAMERA, MMU], and a secure debug access port DAP.
The DMA channels support interconnect qualifiers collectively designated MreqInfo, such as MreqSecure, MreqPrivilege, MreqSystem in order to regulate access to different protected memory spaces. The system configures and generates these different access qualifiers in a security robust way and delivers them to hardware firewalls 3512.1, 3512.2, etc. and 3522.1, 3522.2, etc. associated with some or all of the targets. The improved hardware firewalls protect the targets according to different access rights of initiators. Some background on hardware firewalls is provided in incorporated patent application TI-38804, “Method And System For A Multi-Sharing Security Firewall,” Ser. No. 11/272,532 filed Nov. 10, 2005, which is hereby incorporated herein by reference.
The DMA channels 3515.1, .2, etc. are configurable through the L4 Interconnect 3534 by the MPU 2610. A circuitry example provides a Firewall configuration on a DMA L4 Interconnect interface that restricts different DMA channels according to the configuration previously written to configuration register fields. This Firewall configuration implements hardware security architecture rules in place to allow and restrict usage of the DMA channel qualifiers used in attempted accesses to various targets.
When an attempt to configure access for DMA channels in a disallowed way is detected, in-band errors are sent back to the initiator that made the accesses and out-band errors are generated to the Control Module 2765 and converted into an MPU Interrupt. Some background on security attack detection and neutralization is described in the incorporated patent application TI-37338, “System and Method of Identifying and Preventing Security Violations Within a Computing System,” Ser. No. 10/961,344 filed Oct. 8, 2004, which is hereby incorporated herein by reference.
In
A signal ConnID is issued onto the various buses by each initiator in the system 3500. The signal ConnID is coded with the 4-bit identifying code pertaining to the initiator originating that ConnID signal. System Memory Interface 3555 in some embodiments also has an adjustment made to ConnID initiator code so that if incoming ConnID=MPU AND MreqSystem=‘1’, then ConnID=MPU_Virtual. If incoming ConnID=SDMA AND MreqSystem=‘1’, then ConnID=SDMA_Virtual. In this way the special signal MreqSystem identifies a virtual world for these initiators to protect their real time operation. For background on these initiators and identifiers, see for instance incorporated patent application TI-61985, “Virtual Cores And Hardware-Supported Hypervisor Integrated Circuits, Systems, Methods and Processes of Manufacture,” Ser. No. 11/671,752, filed Feb. 6, 2007, which is hereby incorporated herein by reference.
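By way of illustration and not limitation, the ConnID adjustment just described can be sketched in Python as follows. The function name adjust_conn_id and the use of strings in place of the 4-bit identifying codes are assumptions made for readability only; the actual code values are not given here.

```python
def adjust_conn_id(conn_id: str, mreq_system: int) -> str:
    """Remap the initiator code when the MreqSystem qualifier is set.

    Hypothetical helper: strings stand in for the 4-bit ConnID codes,
    whose numeric values are not specified in the text.
    """
    if mreq_system == 1:
        if conn_id == "MPU":
            return "MPU_Virtual"    # MPU access in the virtual world
        if conn_id == "SDMA":
            return "SDMA_Virtual"   # SDMA access in the virtual world
    # All other initiators, and accesses with MreqSystem = 0, pass unchanged.
    return conn_id


if __name__ == "__main__":
    for initiator in ("MPU", "SDMA", "CAMERA"):
        print(initiator, "->", adjust_conn_id(initiator, 1))
```

In this sketch only the MPU and SDMA initiators are remapped, and only when MreqSystem is one, which mirrors the two conditions stated above.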
The System Memory Interface SMS with SMS Firewall 3555 is coupled to SRAM Refresh Controller SDRC 3552.1 and to system SRAM 3550. A new ConnID is suitably generated each time the processor core MPU 2610 or system SDMA 3530.1, 3535.1 performs an access in the case when the MreqSystem qualifier is one (1).
In
In
The modem enters the deep sleep state by acknowledging the D2D idle request by asserting the signal MODEM_IDLEACK. The PRCM will gate the modem functional clock upon assertion of the D2D Idle Acknowledge. The modem exits this deep sleep state by asserting a D2D wakeup signal MODEM_SWAKEUP. The SAD2D OCP interface clock and modem functional clock are each restarted by the PRCM upon assertion of the D2D wakeup.
Numerous operations involving context switching, interrupts and various computations used in the circuits, blocks and systems of
In
In
In
I unit 3620 receives instructions on a wide bus and stores some or many lines of instructions in a multi-word-wide Instruction Buffer Queue (IBQ) 3622. Instructions are transferred as needed to an Instruction Decoder Controller 3624 with associated Instruction Register 3626 having sections or slots 3626.1 for Instruction 1 and 3626.2 for Instruction 2.
A Unit 3630 has a block of Address Registers 3632 and a block of Data Registers 3634. An arithmetic logic unit ALU 3636 supports data address generator DAG functions. Storage blocks 3638 for Xmem, Ymem, Zmem are coupled to ALU 3636. A Stack unit 3639 holds context-specific register contents and supports multiple push and multiple pop operations as taught herein.
D Unit 3640 has multiply-accumulate units MAC1 3642.1 and MAC2 3642.2, each coupled to receive Data read Data such as from any one or more of buses BB, CB, DB. A set of Accumulator registers 3644 are coupled to the MACs 3642.1 and 3642.2, as well as to a pair of arithmetic logic units ALU1 3646.1, 3646.2 with associated shifters 3647.1, 3647.2, and to a Bit Operations Unit 3648. D Unit 3640 is divided into execute pipe stages as described later hereinbelow. D Unit 3640 is coupled to and supplies Data Write Data to buses EB and FB.
An Interrupt Control circuit 3629 vectors operations in response to any of plural interrupt inputs so that the Program Counter PC is loaded with the address of the initial instruction in the applicable interrupt service routine corresponding to the particular interrupt, and so that the Instruction Register(s) have the initial instruction itself entered (jammed) therein so that the applicable interrupt service routine commences.
In
D Unit 3640 register block 3644 has a set of accumulator registers, e.g., designated AC0-AC15 coupled to ALUs 3646.i and shifters 3647.i and Bit Operations 3648 as well as to the Data Write Buses EB and FB. Bit Operations 3648 perform any of various logic operations on a bit-wise basis.
In
In
In
The Instruction Register IR 3626.i is controlled by the STOP so that the IR continues to hold a given instruction, such as Push or Pop, that is subject to the Repeat. Because the given instruction remains in the IR, the Instruction Decoder 3625 continues to output the same uOPcode corresponding to the given instruction which is to be repeated as long as Repeat Counter RPTC 3830 is down counting. When zero is reached in RPTC, the STOP from AND-gate 3860 is terminated, and the Instruction Register IR receives a subsequent instruction, and Instruction Decoder 3625 provides a subsequent corresponding uOPcode. The selector control associated with mux 3820 is then ready to detect that subsequent uOPcode. If the uOPcode is a Single Repeat at that time or at some later time, then circuits 3820, 3830, 3840, 3850 again cooperate and respond as just described.
In
For example, a signal STOP is applied to the IR when both the instruction decoder has a flag active indicative of a type of instruction from which update may need to be stopped (e.g. Single Repeat Action Flag SRAF) and the value in repeat counter RPTC is not zero. Concurrently, the instruction is repeatedly delivered to the instruction pipe register. Repeat counter RPTC is decremented at each cycle until it reaches zero, whereupon the STOP signal goes inactive and the IR is updated with a new instruction because the repeat instruction RPT generation is completed.
Execute time of the repeat instruction RPT is proportional or equal to the repeat number plus one. For example, in the case of a multiple push the execute time is proportional to (number_of_push+1). Stack size is user defined (software stack), not limited by hardware. Some embodiments are provided, situated, and operated near the circuitry where the instruction is decoded.
A single repeat instruction saves program space when a loop iterates one instruction, such as a computation in a digital filter or initialization (e.g., zero-filling) of some memory region.
- repeat(#count)
- AC0=AC0+(*AR0+ * *AR1+); a multiply-and-accumulate instruction, fetching data from memory pointed to (register prefix *) by AR0 (address reg 0) and by AR1, then AR0 and AR1 are auto-post-incremented (register suffix +).
- is almost identical to the train of the repeated:
- AC0=AC0+(*AR0+ * *AR1+); 1st time
- AC0=AC0+(*AR0+ * *AR1+); 2nd
- :
- AC0=AC0+(*AR0+ * *AR1+); last
As between those two listings, when executed in the processor, the repeat instruction itself consumes a cycle. Summarizing, the repeat instruction saves many bytes in the code size, and acceptably incurs a cost of one execution cycle.
A memory-fill loop is represented by:
- repeat(#n)
- *AR2+=#0; store value 0 to n+1 successive memory spaces starting from an address to which register AR2 points and auto-post-incrementing AR2.
Save/restore of a set of CPU registers to the stack could laboriously be coded as follows, at considerable cost in code size:
-
- dbl(push(AC0)); push to stack a longword(32 bit) which is accumulator 0
- dbl(push(AC1))
- :
- dbl(push(AC15)).
Suppose registers were mapped on the data memory space as Memory Mapped Registers (MMR) and sequential access to CPU registers were realized by using memory addressing mode. However, using such a memory addressing mode would take up one of the data address registers AR of
PSH AR0; Save address register 0 (AR0) at first
AR0=*(AR15); Load the address of AR15 to data address register AR0
RPT #14; Repeat next instruction 15 times
PSH *AR0-; Decremented addressing points to AR15, AR14, . . . AR1.
MMR mapping as described presents an inevitable difficulty of expandability, namely the issue of increasing the space for new registers. Mapping additional CPU registers onto data memory space can force a change in data memory allocation policy. This policy change may in turn force old code developed under the old policy to be modified. If some old CPU registers cannot be mapped, then upgrading software code in accordance with the policy change necessarily entails a tedious and burdensome revision of the software code so that those registers are still saved or restored one by one by corresponding upgrade instructions.
Some processors have a few scoreboard style multiple registers such as for a Load/Store instruction. But other processors have many more registers that vary generation by generation of processors. In one particular example of a processor addressed by some of the embodiments herein, more than 100 registers are to be saved, and the embodiments are applicable to smaller or larger numbers of registers.
By contrast, a code example using hardware of an embodiment can accomplish the same thing with the code below.
RPT #15; Repeat next instruction 16 times
PSH AR0; Saving AR15˜AR0
In words, some embodiments herein single-repeat a push instruction (and pop also), which takes the register ID as an operand. Using an embodiment, suppose this piece of code is executed as follows, saving a great deal of code size:
- repeat(#15)
- dbl(push(AC0)); repeated
However, without more, the repeated push instruction supported by the circuit of
To solve this problem, some embodiments provide and execute a single-repeat-of-a-push wherein the static operand of the push instruction is automatically and sequentially offset with a decrementing value 15, 14, . . . , 2, 1, 0 during repetition. A repeat counter register RPTC in the CPU is augmented with hardware including Register ID Generation Logic 4010 of
Using the code “repeat (#15) dbl (push (AC0))”, now supported by the hardware of
-
- repeat(#15)
- dbl(push(AC15)); the first repetition; the original operand in instruction
- which is AC0 when offset by 15 delivers results into AC15
- dbl(push(AC14))
- :
- dbl(push(AC0))
A given one instance of push instruction generated from this stored instruction code “repeat (#15) dbl (push (AC0))” in the code memory is thus repeated in actual operation by being replicated into multiple issued instructions supplied into the pipeline for execution. The repeat instruction provides the number 15 to program the range of counting for register RPTC. The dbl push instruction has an associated operand AC0 that is used together with the varying counter value of RPTC to vary the operand of instruction dbl push over its desired operand value range AC15, . . . AC0.
In
The processor has an instruction to store register content to top-of-stack (PSH) and an instruction to load a register from top-of-stack (POP). The processor also has an instruction (RPT) that repeats the next instruction N+1 times. At the repeat, RPTC (a dedicated register for single repeat) is initialized to N and decrements once for each instruction execution in the repeat sequence.
The concept is here introduced of Register Space which contains some set of the CPU registers, or all CPU registers, regardless of its register group. Any CPU register is assigned its own Register ID (RegID) and mapped onto the Register Space.
Also introduced are remarkable PSH and POP instructions that can take any of the CPU registers as their source or destination field (PSH RegID, POP RegID). When such an instruction is repeated, the source or destination RegID field of that instruction is modified at the instruction decode phase by adding or subtracting the value of RPTC. In an example of the repeat process and structure, the syntax
PSH RegID actually works as PSH RegID plus RPTC value, PSH(RegID+RPTC), and
POP RegID actually works as POP RegID minus RPTC value, POP(RegID−RPTC).
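By way of illustration only, the decode-phase modification just described can be sketched in Python as a behavioral model. This is not the hardware of any embodiment. The function names `effective_reg_id` and `expand_repeat`, and the use of plain integers for RegID, are assumptions introduced here for illustration.

```python
def effective_reg_id(opcode: str, reg_id: int, rptc: int) -> int:
    """Return the RegID actually issued for a repeated PSH or POP.

    Hypothetical helper modeling PSH(RegID+RPTC) and POP(RegID-RPTC).
    Any other opcode passes its operand through unchanged.
    """
    if opcode == "PSH":
        return reg_id + rptc
    if opcode == "POP":
        return reg_id - rptc
    return reg_id


def expand_repeat(opcode: str, reg_id: int, n: int) -> list:
    """Expand 'RPT #n; <opcode> reg_id' into the n+1 issued RegIDs.

    RPTC is initialized to n and decrements once per issued instruction.
    """
    return [effective_reg_id(opcode, reg_id, rptc) for rptc in range(n, -1, -1)]
```

Under these assumptions, `expand_repeat("PSH", 0, 15)` yields the RegIDs 15 down to 0, and `expand_repeat("POP", 15, 15)` yields 0 up to 15.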
In
In
In
Mux 4050 has a selector control responsive so that if the repeated instruction opcode uOPcode is either Push OR Pop, then mux 4050 couples mux 4040 via first input lines 4054 to an output 4056 of mux 4050 coupled to Instruction pipe register 3810. In this way Instruction pipe register 3810 receives either a Source Identification SrcID value if the instruction is a Push or receives a Destination Identification DstID value if the instruction is a Pop. If the instruction is neither a Push nor a Pop, nor any other instruction improved by the teachings herein to which the mux 4040 output is relevant, then the selector controls for mux 4050 perform default selection and couple the Operand input 4052 through mux 4050 to the Instruction pipe register 3810, as if the rest of the RegisterID Generation Logic 4010 were absent.
Also in
Further in
Another AND-gate 3960 further controls clock and ends decrementing by decrementor 3840. AND-gate 3960 operates so that if any of the following conditions occurs, decrementor 3840 operation is suspended or terminated: 1) Not-Zero detector 3850 detects that Repeat Counter RPTC value has reached zero, 2) a processor Break signal is active, or 3) a low-active signal INWHILE/ is in its active state, that signal being generated elsewhere in the processor in response to duration of a predetermined condition in a status register or otherwise.
In
RPT #15; Repeat next instruction 16 times
PSH AR0; Push data address register 0, RegID=x0h
where x is a predetermined bit field that depends on implementation and represents the particular register.
The above set of instructions executes the instruction “PSH AR0” differently each time for 16 times and produces a succession of sixteen PSH instructions. Here, the operand RegID field of “PSH AR0” is x0h. At the first iteration, RPTC holds decimal number 15. The RegID field of the instruction is modified as RegID+RPTC, which generates RegID=xFh, where xFh is the hexadecimal RegID of AR15.
Then the example instructions “RPT #15, PSH AR0” use the circuitry of
PSH AR15; RegID field=x0h and RPTC=15, generates RegID=xFh
PSH AR14; RegID field=x0h and RPTC=14, generates RegID=xEh
:
PSH AR1; RegID field=x0h and RPTC=1, generates RegID=x1h
PSH AR0; RegID field=x0h and RPTC=0, generates RegID=x0h
In
RPT #15
POP AR15; Pop data address register 15, RegID=xFh
The stack operates as a Last In First Out (LIFO) memory so the operation is done in the reversed order. The operand RegID field is modified as RegID−RPTC.
Then, above set of instructions “RPT #15, POP AR15” uses the circuitry of
POP AR0; RegID field=xFh and RPTC=15, generates RegID=x0h
POP AR1; RegID field=xFh and RPTC=14, generates RegID=x1h
:
POP AR14; RegID field=xFh and RPTC=1, generates RegID=xEh
POP AR15; RegID field=xFh and RPTC=0, generates RegID=xFh
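The complementary push and pop sequences above can be checked with a small Python model, offered only as a sketch. The dictionary `regs` standing in for AR0 to AR15, the list used as a LIFO stack, and the helper names `save_registers` and `restore_registers` are all assumptions made for this illustration.

```python
def save_registers(regs: dict, stack: list, base_id: int, n: int) -> None:
    """Model 'RPT #n; PSH base_id': push regs[base_id+n] down to regs[base_id]."""
    for rptc in range(n, -1, -1):
        stack.append(regs[base_id + rptc])


def restore_registers(regs: dict, stack: list, top_id: int, n: int) -> None:
    """Model 'RPT #n; POP top_id': pop into regs[top_id-n] up to regs[top_id]."""
    for rptc in range(n, -1, -1):
        regs[top_id - rptc] = stack.pop()


regs = {i: 0x1000 + i for i in range(16)}          # stand-ins for AR0..AR15
saved = dict(regs)
stack = []
save_registers(regs, stack, base_id=0, n=15)       # RPT #15 / PSH AR0
for i in regs:
    regs[i] = 0                                    # clobber the context
restore_registers(regs, stack, top_id=15, n=15)    # RPT #15 / POP AR15
```

Because the last register pushed (AR0) is the first popped, the additive push offset and the subtractive pop offset together return every value to the register it came from.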
Stack size is defined by an allocation of adequate memory bytes for data, memory bytes for program code, and memory bytes for stack. Some embodiments e.g., “repeat (#n) dbl (push/pop (AC0))” desirably provide a compression of program code in the program memory compared to the amount of program memory bytes that would otherwise be used for an explicitly-lengthy block of code such as “dbl (push/pop (AC15)), dbl (push/pop (AC14)), . . . dbl (push/pop (AC0))”. In some embodiments, the repeat parameter n can be revised in a parameter memory referenced by the repeat instruction RPT, and such revision inexpensively and effectively accommodates system upgrades.
In some other embodiments tabulated in TABLES 3 and 4 herein below, the scope of the registers intended to be covered by a multi-push or multi-pop instruction is abstractly represented by mnemonics like ALL, RLH, or XR, etc. instead of using a repeat number n. The decoding hardware in each hardware upgrade or generation of a processor automatically executes the tabulated instruction syntax to cover the scope of registers applicable to that generation of the processor.
The multiple push and multiple pop and other multiple instructions herein are applicable to data unit DU, address unit AU, memory spaces and to all other units and pipeline stages to which their advantages make them applicable.
A second input 4224 of mux 4220 receives the current contents of Instruction Register IR 3626.i. In an aspect of the
Further in
Context change logic 4255 pulses the stack for pushing and popping the stack in response to inputs such as Call, Return, and Interrupt Request as shown in
In this way, the instruction circuit is operable over a time interval to repeatedly issue the repeated instruction with its Operand thus varied, and the instruction circuit is interruptible prior to completion of the time interval to issue an interrupt instruction and further operable to subsequently resume from the interruption and complete the repeatedly issuing of the second instruction with the Operand varied in an operand value range determined as a function of the varying bias value.
Once a single-repeat instruction is decoded, the processor then freezes instruction register IR 3626.1, for instance, by holding the repeated instruction content in the IR 3626.1 for multiple cycles. This freeze operation is symbolized by the feedback path from IR via line 4224, mux 4220, mux 4210, and back to IR. During this repeat process, RPTC is decremented toward zero (0). A logic gate 3860 performs an AND function represented by
SRAF (single repeat active flag) AND (RPTC>0)
When that logic function is True (AND-gate 3860 output active), the circuit thereby determines if repeat is ongoing and should continue. The AND circuit 3860 supplies STOP to AND-gate 4230 that controls multiplexer 4220 coupled after the instruction FIFO. The multiplexer 4220 selectively controls and delivers either a new instruction from IBQ 3622 or delivers a repeated instruction, when that logic function is true, to mux 4210 to feed the instruction register IR.
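The freeze of the instruction register under the STOP condition can be sketched cycle by cycle in Python. This is a simplified behavioral model under stated assumptions: the function name `run_single_repeat` is hypothetical, a plain list stands in for the instruction buffer queue, and pipeline timing is ignored.

```python
def run_single_repeat(instr_queue: list, repeated: str, n: int) -> list:
    """Cycle-by-cycle sketch of the IR freeze during a single repeat.

    `repeated` is held in the IR while STOP = SRAF AND (RPTC > 0) is true;
    afterwards the IR takes fresh instructions from `instr_queue`.
    """
    issued = []
    ir = repeated
    rptc = n
    sraf = True                      # single repeat active flag
    queue = list(instr_queue)
    while True:
        issued.append(ir)
        stop = sraf and rptc > 0     # models the AND function of gate 3860
        if stop:
            rptc -= 1                # IR frozen: same instruction next cycle
            continue
        sraf = False                 # repeat complete
        if not queue:
            break
        ir = queue.pop(0)            # IR updated with a new instruction
    return issued
```

In this model a repeat count of n issues the repeated instruction n+1 times before the next instruction enters the IR, consistent with the execute time stated earlier.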
Now suppose an interrupt request is presented at mux 4210. The processor desirably hangs up the repeat process in the sense of interrupting execution of the repeat process and saving its context for resumption later. The processor then serves the interrupt by coupling an interrupt-related instruction from interrupt control circuit 3629 via mux 4210 to IR and executing the associated interrupt service routine ISR. Then when a return from interrupt is executed, the processor restores the context of the repeat process and resumes the repeat process. (It should be understood that some embodiments alternatively flush IBQ 3622 and load the ISR through IBQ 3622.)
The interrupt request de-freezes IR using mux 4210. At the same time, the interrupt request loads specific instruction(s) designated INTR into instruction register IR. Instruction(s) INTR saves a return context for the interrupt software, then saves SRAF and PC to RETurn Address register RETA, and invokes a branch to an interrupt service routine. (INTR itself can include a multiple push as taught herein.)
At this point the value in register SRAF 3864 representing repeat-active (e.g. a one bit) is packed into the return context. At the same time SRAF itself is cleared to prevent further decrementing of RPTC.
The interrupt service routine ends with a RET_INT instruction, with which SRAF is restored. The first instruction then loaded into IR (which is the very instruction that was repeated) is again repetitively processed until RPTC reaches 0. While the CPU executes the interrupt service routine, SRAF is 0 and thus RPTC is not decremented. If some instruction is repeated in the interrupt service routine, then SRAF is set and the repeat instruction in the ISR loads the RPTC. A repeat multiple pop can be used to restore the context of the interrupted code as well.
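The interrupt and resume behavior can be illustrated with the following Python sketch. It is a simplified model, not the circuit: the function name and the parameter `interrupt_after` (the number of pushes issued before the interrupt is taken) are hypothetical, and the model assumes the interrupt service routine leaves RPTC unchanged or restores it before returning.

```python
def repeat_push_with_interrupt(base_id: int, n: int, interrupt_after: int) -> list:
    """Sketch: a repeated push is interrupted, then resumed from saved state.

    Returns a trace of the pushes issued, with "ISR" marking where the
    interrupt service routine ran.
    """
    trace = []
    rptc, sraf = n, True
    issued = 0
    while sraf:
        if issued == interrupt_after:
            saved_sraf = sraf        # SRAF packed into the return context
            sraf = False             # cleared: RPTC is not decremented in the ISR
            trace.append("ISR")      # interrupt service routine runs here
            sraf = saved_sraf        # RET_INT restores SRAF
        trace.append(f"push({base_id + rptc})")
        issued += 1
        if rptc == 0:
            sraf = False             # repeat complete
        else:
            rptc -= 1
    return trace
```

Because the register index is regenerated from the static RegID and the current RPTC value, the preserved RPTC is enough to resume at the correct register.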
In TABLE 3, the instructions perform a respective Push to Top Of Stack operation, and have a word pointer mode and a byte pointer mode as alternative modes, for instance. In the operations represented next, XSP is the extended data stack pointer (position), and *XSP is the stack space at the position to which pointer XSP points. HI and LO represent high and low words or the first and second halves of a long word.
When in the word pointer mode of PUSH, some embodiments operate as shown in TABLE 3A, see corresponding enumeration in the Syntax TABLE 3 above.
When in the byte pointer mode of PUSH, the pointer value XSP is twice as large and the decrements are twice as large as in word mode. This is because a word is twice as large as a byte here. The corresponding operations on the same operands are as shown in TABLE 3B:
The instructions of TABLES 3, 3A, 3B perform various forms of a PUSH operation. Operand(s) such as a CPU register (e.g., ALLx, RLHx, XRx) or a data memory location addressed by Smem is moved to a data memory location addressed by XSP (and XSSP). If the source is a member of ALLa (which includes, e.g., RLHa, XRa), a memory store is performed that is the same as a Store instruction. When instruction #1 or #2 is used in the single repeat loop, multiple CPU registers are pushed sequentially.
An instruction push(regID) when repeated works additively as pseudocode “push(regID+RPTC)” in a repeat loop, and uses adder “(+)” of
Example:
- repeat(#15)
- dbl(push(AC0))
- In first iteration, AC0+#15 is AC15, thus AC15 is pushed.
- In second iteration, AC0+#14 is AC14, thus AC14 is pushed.
- :
- In last iteration, AC0+#0 is AC0, thus AC0 is pushed.
Some processor embodiments are dual issue as in
Different dual issue processor embodiments can utilize different embodiments of circuitry as regards the matter of entering the multi-push or multi-pop instruction into the wide instruction register and whether to enter it if another type of instruction occupies instruction slot 1. Multi-push and multi-pop instructions (instruction #1 and #2) in this particular example are not used as the slot 2 instruction in the single repeated instructions having a wide instruction register for plural instructions held in slots of the wide instruction register, although alternative embodiments can be arranged to operate differently. Some embodiments replicate the circuitry of
For instruction type #7, when in the byte pointer mode, operation is the same as for instruction #1 or #2. When the stack configuration is 32-bit stack mode, then for instructions #1, #2, #3, #4, #5 and #6, the same amount of decrement is applied to XSSP. For instruction #7, when in the byte pointer mode, the same amount of decrement (−4) is applied to XSSP.
These instructions in TABLE 4 perform Pop from Top Of Stack operation in a single cycle and have a word pointer mode and a byte pointer mode analogous to such modes for the Push to Top of Stack operation of TABLE 3 but performing operations in reverse.
When in the word pointer mode of POP, some Pop embodiments operate as shown in TABLE 4A. See corresponding enumerated operations in the Syntax TABLE 4 above.
When in the byte pointer mode of POP, some other Pop embodiments operate as shown in TABLE 4B:
The instruction types of TABLES 4, 4A, 4B perform a POP operation. A data memory location *XSP addressed by pointer XSP (or *XSSP by XSSP) is moved to a CPU register or data memory location addressed by Smem.
If the destination is a member of register group ALLa (includes RLHa, XRa), then a register update is performed and is the same as a Logical load (copy) instruction.
When instruction #1 or #2 is used in the single repeat loop, multiple CPU registers are popped sequentially.
Syntax “regID=pop( )” works subtractively as “regID−RPTC=pop( )” in the loop. Expressed in other symbolism, an instruction pop(regID) when repeated works as pseudocode “pop(regID−RPTC)” in a repeat loop, and uses subtractor “(−)” of
Example:
- repeat(#15)
- AC15=dbl(pop( ))
- In first iteration, AC15-#15 is AC0, thus AC0 is popped.
- In second iteration, AC15-#14 is AC1, thus AC1 is popped.
- :
- In last iteration, AC15-#0 is AC15, thus AC15 is popped.
This multi-pop instruction is applicable when the instruction is in the instruction slot 1. During multi-pop, generated register identification regID remains within the boundary between single word register and long word register. For instruction #7, when in the byte pointer mode, operation is the same as for instruction #1 or #2. When the stack configuration is 32-bit stack mode, then for instructions #1, #2, #3, #4, #5 and #6, the same amount of increment is applied to XSSP. And for instruction #7, when in the byte pointer mode, the same amount of increment (+4) is applied to XSSP.
In the multi-push/pop, using some other register besides AC0 as base for repeating works just as well. For example,
- repeat(#14)
- dbl(push(AC1)); pushes AC15, AC14, . . . AC1
Any register which is in sequential order in the ALLx register ID can be pushed or popped sequentially by single repeat. For example, in an embodiment herein, the repeat push instruction could be:
- repeat(#3)
- push(AC4).
Then, the order of push is push(AC7), push(AC6), push(AC5), push(AC4). The corresponding repeat pop instruction is:
- repeat(#3)
- pop(AC7).
That repeat pop instruction then pops in the order AC4, AC5, AC6, AC7.
Even if the interrupt contains its own sequence like single repeat on push from AC0, the register index is generated from register ID in the instruction and the RPTC value. In this way, the RPTC is saved on interrupt and that is sufficient information for restoring the repeat instruction at the point at which the repeat instruction was interrupted. For example, let a repeat push instruction be:
- repeat(#3)
- push(AC0)
In operation, the sequence of pushes and corresponding RPTC contents are:
-
- push(AC3); RPTC=3
- push(AC2); RPTC=2
- push(AC1); RPTC=1
- push(AC0); RPTC=0
Suppose Reg ID of AC0 is x00. Then RPTC value is added to regID of AC0 to generate register index with which to restore a point in the sequence after an interrupt and then resume pushes.
The assembler is suitably structured to check for repeat instructions that are incompatible with the hardware architecture of the processor and to flag an error. For example, suppose there are 16 accumulator registers in the hardware but the repeat instruction calls for a push/pop relating to more accumulator registers than exist in the hardware.
-
- repeat(#15)
- dbl(push(AC1));
Push AC16, AC15, . . . AC1 is being requested, and results in an error.
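An assembler-side check of this kind might be sketched in Python as follows. This is an illustration only: the constant `NUM_ACCUMULATORS`, the function name `check_repeat_push`, and the use of `ValueError` to represent the flagged error are assumptions, not part of any actual assembler.

```python
NUM_ACCUMULATORS = 16  # assumed: AC0..AC15, as in the example above


def check_repeat_push(base_index: int, n: int, num_regs: int = NUM_ACCUMULATORS) -> None:
    """Raise ValueError if 'repeat(#n); push(AC<base_index>)' leaves the register file."""
    highest = base_index + n         # first register pushed is AC(base_index + n)
    if base_index < 0 or highest >= num_regs:
        raise ValueError(
            f"repeat(#{n}) push(AC{base_index}) requests AC{highest}, "
            f"but only AC0..AC{num_regs - 1} exist"
        )
```

With these assumptions, `check_repeat_push(1, 14)` passes (AC15 down to AC1), while `check_repeat_push(1, 15)` raises because it would request AC16.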
These various embodiments of repeat instructions operating on the circuitry of
For the repeat instruction #1 and #2 of TABLE 5, in the decode phase of the pipeline of
In
In
Expanded Push/Pop and Load/Store instructions are now described using TABLE 5A, which tabulates each of several types of repeated instructions that are repeated by application of any given repeat instruction of TABLE 5. Push/Pop instructions and supporting hardware embodiments are expanded to support all CPU architecture registers including any exception registers that might exist in a given processor architecture. Also, Load/Store instructions LD/ST that support all CPU architecture registers are added as embodiments to unify load/store instructions for particular registers.
TABLES 6 and 7 respectively show an example sequence of context save and context restore for use in interrupt processing and return. The tabulated code saves a very substantial percentage of code storage space compared to register-by-register instructions pushing/popping, and results will vary depending on embodiment and application. The code sequence of TABLE 7 effectively undoes or reverses the operations of TABLE 6.
Notice that the assembler conveniently responds to register mnemonics in TABLES 6 and 7, and the repetition number #n covers a set of registers over a contiguous set of pointer positions in Register Space. One example in TABLE 6 is “repeat(#3); dbl(push(RSA0))” which pushes four registers REA1, REA0, RSA1, RSA0 in decreasing underlying numerical order in Register Space and completes the operation by pushing the register (e.g., RSA0) that is explicitly specified in the repeat push instruction. The corresponding repeated pop in TABLE 7 is “repeat(#3); REA1=dbl(pop( ))” which pops those four registers RSA0, RSA1, REA0, REA1 in increasing underlying numerical order in Register Space (the reverse of the push order), completing the operation by popping the register (e.g., REA1) that is explicitly specified in the repeat pop instruction.
A still more complicated operational example in TABLE 6 is given by the remarkably uncomplicated instructions “repeat(#24); push(PDP)”. These instructions are decoded, whereupon a whole panoply of 25 contiguous registers (the repeat count of 24 plus one) in Register Space is pushed in decreasing underlying numerical order in Register Space, operationally ending with register PDP. The panoply of registers includes sixteen sequentially numbered registers AC15.G, AC14.G, . . . AC0.G, as well as BK47, . . . , BKC circular buffer size register, BOFC, . . . , BOF01 buffer offset, and finally the PDP peripheral data page pointer that is literally specified in the repeat push syntax. Conversely, the context restore repeat pop syntax is “repeat(#24); AC15.G=pop( )”.
In other words, the repeat pop syntax uses the circuitry of
In a particular processor and outside of the context save of TABLE 6, status registers ST0_55, ST1_55, ST2 and RETA (with SRAF and PC) are automatically saved. Certain other registers IIR, BER, BIOS, IFRx, IERx, DBGIERx, IVPx, SP and SSP do not need to be saved in some embodiments.
Depending on various considerations and type of embodiment, save/restore operations on registers according to teachings herein may be performed using a set of different multiple repeat instructions as in TABLES 6 and 7 supported by the hardware of
1) A machine context may involve information stored in types of registers having different register lengths, e.g., a word (16 bit) register and a longword (32 bit) register. In a processor that has distinct instructions to support different register lengths (a single-word push then pop, and a longword push then pop), it is advisable to use different multiple repeat instructions to save and restore the machine context. Dynamic computing of the register identification RegID in Register Space using adder 4020 or subtracter 4030 is associated with a repeated push/pop instruction operating on one length or type of register throughout the counting process in RPTC 3830 established by a given repeat(#n) instruction.
2) If a machine context involves information stored in a subset of particular registers that are sparsely or not contiguously mapped among the RegIDs comprising Register Space, then it may be more convenient to save/restore the machine context by using different multiple repeat instructions to piecewise save/restore only the particular registers. However, some other embodiments can be prepared to store a contiguous set of registers that includes the subset of the particular registers, and then to ignore some of the registers in the contiguous set in the restoring process.
3) In some embodiments, some registers are seen twice, reflecting a capability of the processor to access some registers or part of them. Thus, one register can be seen twice, with “full” form and with “divided” form. An example of such is address registers. In
-
- x001x AR0 [15:0]<-lower 16 bits of XAR0
- x100x XAR0 [23:0]<-full form
- x101x AR0H [7:0]<-upper 8 bits of XAR0 [23:16].
Notice that Register Space in FIG. 9C does not necessarily resemble either a Physical Space of a register or a Memory Address Space of a physically regular structure like a memory. The selection circuits 4520 and 4540 of FIGS. 9A and 9B are suitably arranged in this example just above to respond to widely different RegID values in Register Space to access different parts of the same register. Conversely, closely spaced RegID values in Register Space may access operationally distinct and physically quite separate structures on the processor semiconductor chip layout.
4) Some processor embodiments may have one or more RegIDs that are reserved in the sense that no corresponding actual register is implemented in the hardware of the processor. In such case, the actual registers holding information representing a machine context are not contiguous in Register Space, and different multiple repeat instructions are suitably used to save/restore the actual registers.
Turning to a further consideration of TABLE 4, the instruction types #1, #3, #5, #6 of TABLE 4 perform a multiple or single 16-bit word Pop from top of Stack, and they move one, two, or multiple data memory locations addressed by XSP to the 16-bit destination operand. The destination operand may be: 1) a 16-bit data memory operand (Smem), 2) an accumulator low part, an accumulator high part, an auxiliary register, or a temporary register, or 3) any 16-bit CPU register having a register ID symbol within the defined Register Space. Some registers may be excluded either from the Register Space or from the instruction operations as desired. These instructions use a dedicated datapath independent of the Address Unit AU ALU 3636 and independent of the Data Unit DU operations to perform the specified instruction operation.
Instruction #1 performs a single 16-bit word pop from the top of the stack. The content of the 16-bit data memory location addressed by XSP is moved to the 16-bit data memory location Smem. XSP is incremented to address the following 16-bit word.
Instruction #2 performs two 16-bit word pops from the top of the stack. The content of the 16-bit data memory location addressed by XSP is moved to the 16-bit destination register RLHa. XSP is incremented to address the following 16-bit word. The content of the 16-bit data memory location addressed by XSP is moved to the 16-bit data memory location Smem. XSP is again incremented to address the next following 16-bit word.
Instruction #3 performs two 16-bit word pops from the top of the stack. The content of the 16-bit data memory location addressed by XSP is moved to the 16-bit destination register RLHa. XSP is incremented to address the following 16-bit word. The content of the 16-bit data memory location addressed by XSP is moved to the 16-bit destination register RLHb. XSP is again incremented to address the next following 16-bit word. Instruction #4 performs either a single 16-bit word pop from the top of the stack, or multiple 16-bit pops from the top of the stack.
When executed out of an unconditional repeat single structure, this instruction #3 performs a single 16-bit word pop from the top of the stack as follows. The content of the 16-bit data memory location addressed by XSP is moved to the 16-bit register ALLa. XSP is incremented to address the following 16-bit word. The user designates the 16-bit ALLa registers by using the valid register ID symbols (register names). When accumulator high parts (ACx.H) are referenced as the destination operand, the 16-bit data memory location addressed by XSP is loaded to bits 16-31 of ACx. When accumulator low parts (ACx.L) are referenced as the destination operand, the 16-bit data memory location addressed by XSP is loaded to bits 0-15 of ACx. When XARx.H, XSSP.H, XSP.H, XDP.H, or ACx.G are referenced as the destination operand, the eight lowest bits of the 16-bit data memory location addressed by XSP are loaded to the destination register. When peripheral data page register (PDP) is referenced as the destination operand, the nine lowest bits of the 16-bit data memory location addressed by XSP are loaded to the destination register.
When Block Repeat Counter BRC1 is loaded with the content of a data memory location addressed by XSP, the block repeat save register (BRS1) is also loaded with the same value. Therefore, when performing a CPU register context save with push( ) instructions, instructions are coded to save the BRS1 register to the stack before BRC1. At context restore with pop( ) instructions, the BRS1 register is restored after BRC1.
When executed inside an unconditional repeat single structure, this instruction performs a sequence of pops from the top of the stack to a 16-bit ALLx register with the registerID of the popped register incrementing along the iterations of the single repeat structure.
Consider an example using the instruction in the repeat single structure below:
repeat(#(NB_REG_TO_POP−1))
ALLa=pop( ).
The register ID (regIDa) of the selected 16-bit ALLa register references another 16-bit CPU register ALLb with a register ID regIDb equal to (regIDa−NB_REG_TO_POP+1). This reference is made by subtracter 4030 for pop subtraction. At the first iteration of the repeat single structure, the following operations occur. ALLb register is popped from the top of the stack. XSP is incremented to address the following 16-bit word. At the next iteration, the 16-bit register with the register ID (regIDb+1) is popped, XSP is again incremented to address the next following 16-bit word, and so on, until, at the last iteration the 16-bit register (ALLa) is popped and XSP is again incremented to address the next following 16-bit word.
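The relationship between regIDa, regIDb, and the order of pops can be sketched in Python. The function name `multi_pop_order` and the integer representation of register IDs are assumptions made for illustration.

```python
def multi_pop_order(reg_id_a: int, nb_reg_to_pop: int) -> list:
    """Return register IDs in the order popped by
    'repeat(#(NB_REG_TO_POP-1)); ALLa = pop()'.

    The first register popped is regIDb = regIDa - NB_REG_TO_POP + 1,
    and the last is regIDa itself.
    """
    if nb_reg_to_pop < 1:
        raise ValueError("at least one register must be popped")
    reg_id_b = reg_id_a - nb_reg_to_pop + 1
    return [reg_id_b + i for i in range(nb_reg_to_pop)]
```

For instance, under these assumptions `multi_pop_order(7, 4)` gives the IDs 4, 5, 6, 7, matching the pop order AC4, AC5, AC6, AC7 of the earlier repeat(#3) example.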
Note that a dual issue embodiment might not execute another instruction in parallel with this instruction when used in an unconditional repeat single structure. The set of registers popped by this multiple pop structure are of the same type (16-bit). Also, note that when XSP is incremented to address the following 16-bit word, this means that in word-pointer mode, XSP is incremented by 1, and in byte-pointer mode, XSP is incremented by 2. In byte-pointer mode, the software code is written to ensure that the Smem address and XSP are aligned on a multiple of two bytes. If not, then the CPU generates a bus error in one example processor embodiment.
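The pointer arithmetic for the two pointer modes can be illustrated with a short Python sketch. The function name `next_xsp_after_pop16` is hypothetical, and a Python `RuntimeError` merely stands in for the bus error of the example embodiment.

```python
def next_xsp_after_pop16(xsp: int, byte_pointer_mode: bool) -> int:
    """Advance XSP past one 16-bit word.

    Word-pointer mode: XSP counts 16-bit words, so it advances by 1.
    Byte-pointer mode: XSP counts bytes, so it advances by 2 and must be
    aligned on a multiple of two bytes.
    """
    if byte_pointer_mode:
        if xsp % 2 != 0:
            # Stand-in for the bus error of the example embodiment.
            raise RuntimeError("bus error: XSP not aligned on two bytes")
        return xsp + 2
    return xsp + 1
```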
When stack configuration is 32-bit stack mode, XSSP is incremented by the same amount as XSP. The registers modified by these instructions are updated in the execute2 pipeline phase (X2). The increment operations performed on XSP (and XSSP in 32-bit stack mode) are performed by the AU DAGEN S dedicated to the stack addressing management. XSP and XSSP registers are read in the address1 pipeline phase (AD1) and are updated in the address2 pipeline phase (AD2). Note that there may be a latency between PDP, SP, SSP, ARx, BSAxx, BKxx, BRCx, BRS1, and CSR write by these instructions and their subsequent read in the AD1 phase by the AU DAGENs or by the P-unit loop control management.
Consider the following example syntax: AC0.L, AC1.L=pop( ) The content of the memory location addressed by the data stack pointer (XSP) is copied to AC0[15-0] and the content of the memory location addressed by XSP+1 is copied to AC1[15-0]. The XSP register is incremented by 2. SP and SP+1 are unchanged.
Execution of the syntax AC8.H, *AR3=pop( ) involves the following operations. The content of the memory location addressed by the data stack pointer (XSP) is copied to AC8[31-16], and the content of the memory location addressed by XSP+1 is copied to the location addressed by XAR3. The XSP is incremented by 2.
Instruction types #2 and #4 of TABLE 4 perform multiple or single 32-bit word pop from the top of stack. In TABLE 4B, these instructions move one or multiple data memory locations addressed by XSP to the 32-bit destination operand. The destination operand may be a 32-bit data memory operand (dbl(Smem)), or any 32-bit CPU register having a register ID symbol. These instructions use a dedicated datapath independent of the AU ALU and the DU operators to perform the operation.
Instruction #4 of TABLE 4 performs a single 32-bit word pop from the top of the stack. The content of the 16-bit data memory location addressed by XSP is moved to the higher 16 bits of the 32-bit data memory operand dbl(Smem). XSP is incremented to address the following 16-bit word. The content of the 16-bit data memory location addressed by XSP is moved to the lower 16 bits of the 32-bit data memory operand dbl(Smem). XSP is again incremented to address the next following 16-bit word.
Instruction #2 of TABLE 4 performs either a single 32-bit word pop from the top of the stack, or multiple 32-bit pops from the top of the stack. When executed out of an unconditional repeat single structure, this instruction #2 performs a single 32-bit word pop from the top of the stack as follows. The content of the 16-bit data memory location addressed by XSP is moved to the higher 16 bits of the 32-bit register ALLa. XSP is incremented to address the following 16-bit word. The content of the 16-bit data memory location addressed by XSP is moved to the lower 16 bits of the 32-bit register ALLa. XSP is incremented to address the following 16-bit word. The user designates the 32-bit ALLa registers by using valid register ID symbols.
When accumulators (ACx) are referenced as the destination operand, the 32-bit words popped from the stack (as described previously) are loaded to bits 0-31 of ACx. When a particular width register (XARx, XSSP, XSP, XDP, RSAx, or REAx) is referenced as the destination operand, the corresponding part of the width of the 32-bit word popped from the stack is loaded to the destination register.
When RETA register is referenced as the destination operand, the 32-bit word popped from the stack is loaded to the width of RETA register content (the return address of the calling subroutine) and the balance of the content to a CFCT register having active control flow execution context flags of the calling subroutine.
When executed inside an unconditional repeat single structure, this instruction #2 performs a sequence of pops from the top of the stack to a 32-bit ALLx register with the registerID of the popped register incrementing along the iterations of the single repeat structure.
Consider a process example using the following instruction in a repeat single structure:
repeat(#(NB_REG_TO_POP−1));
ALLa=dbl(pop( )).
The register ID (RegIDa) of the selected 32-bit ALLa register references another 32-bit CPU register ALLb with a register ID regIDb equal to (RegIDa−NB_REG_TO_POP+1). At the first iteration of the repeat single structure the ALLb register is popped from the top of the stack. XSP is incremented to address the following 32-bit word. At the next iteration the 32-bit register with the register ID (RegIDb+1) is popped, and XSP is again incremented to address the next following 32-bit word, and so on. At the last iteration, the 32-bit register (ALLa) is popped, and XSP is again incremented to address the next following 32-bit word. Note that a dual issue embodiment might not execute another instruction in parallel with this instruction when used in an unconditional repeat single structure. The set of registers popped by this multiple pop structure are of the same type (32-bit). Also, note that when XSP is incremented to address the following 16-bit word, this means the following: In word-pointer mode, XSP is incremented by 1. In byte-pointer mode, XSP is incremented by 2. In byte-pointer mode, ensure the dbl(Smem) address is aligned on a multiple of four bytes. If not, then the CPU generates a bus error. Similarly, the code is written to ensure that XSP is aligned on a multiple of two bytes. If not, then the CPU generates a bus error. When the stack configuration is 32-bit stack mode, XSSP is incremented by the same amount as XSP.
For instruction #4 of TABLE 4 in word-pointer mode, when dbl(Smem) is at an even address, the two 16-bit values popped from the stack are stored in memory in the same order as they are stored at memory location dbl(Smem). When dbl(Smem) is at an odd address, the two 16-bit values popped from the stack are stored in the reverse order of the one at memory location dbl(Smem). Regarding pipeline operations, the registers modified by these instructions are updated in the execute2 pipeline phase (X2). The increment operations performed on XSP (and XSSP in 32-bit stack mode) are performed by the AU DAGEN S dedicated to the stack addressing management. The XSP and XSSP registers are read in the address1 pipeline phase (AD1) and are updated in the address2 pipeline phase (AD2). Note that a latency may exist between XDP, XSP, XSSP, and XARx write by these instructions and their subsequent read in the AD1 phase by the AU DAGENs or by the P-unit loop control management. When executing a block-repeat loop, registers RSAx and REAx are not modified by these instructions #4 and #2.
Consider this example syntax: dbl(*AR2+)=pop( ). The content of the memory location addressed by the data stack pointer XSP is stored at the address pointed to by XAR2. If the address pointed to by XAR2 is even, the content of the memory location addressed by XSP+1 is stored at the address pointed to by XAR2+1. If the address pointed to by XAR2 is odd, the content of the memory location addressed by XSP+1 is stored at the address pointed to by XAR2−1. The XSP register is incremented by 2. XAR2 is incremented by 2. When *AR[0-15]+ is used with dbl( ), XAR[0-15] is incremented by 2.
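The even/odd address behavior of the dbl(*AR2+)=pop( ) example can be sketched as follows. This is a hedged word-pointer-mode model only: the function name `pop_dbl_to_memory`, the dictionary used for data memory, and the list used for the stack are assumptions made for illustration.

```python
def pop_dbl_to_memory(mem, stack_words, xsp, addr):
    """Behavioral model of dbl(*ARn+) = pop() in word-pointer mode.

    mem         -- hypothetical data memory: dict of word address -> 16-bit value
    stack_words -- hypothetical stack memory as a list of 16-bit words
    Returns the updated (XSP, XARn) pair.
    """
    first = stack_words[xsp]        # word addressed by XSP
    second = stack_words[xsp + 1]   # word addressed by XSP+1
    mem[addr] = first
    if addr % 2 == 0:
        mem[addr + 1] = second      # even address: second word goes to XARn+1
    else:
        mem[addr - 1] = second      # odd address: second word goes to XARn-1
    return xsp + 2, addr + 2        # XSP and XARn are both incremented by 2

even_mem, odd_mem = {}, {}
even_result = pop_dbl_to_memory(even_mem, [0xAAAA, 0xBBBB], 0, 0x100)
odd_result = pop_dbl_to_memory(odd_mem, [0xAAAA, 0xBBBB], 0, 0x101)
```

With an even destination address the two words land at 0x100 and 0x101; with an odd destination address they land at 0x101 and 0x100, which is the reversed placement described in the text.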
Regarding the syntax AC2=dbl(pop( )), the content of the memory location addressed by the data stack pointer XSP is copied to AC2[31-16]. The content of the memory location addressed by XSP+1 is copied to AC2[15-0]. The XSP register is incremented by 2.
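A short sketch of the single 32-bit pop to a register may help fix the word ordering. The function name `pop32` and the list model of the stack are hypothetical; the sketch assumes word-pointer mode.

```python
def pop32(stack_words, xsp):
    """Behavioral model of a single 32-bit pop to a register (word-pointer mode).

    The word at XSP supplies bits 31-16 and the word at XSP+1 supplies bits 15-0.
    Returns (32-bit value, updated XSP).
    """
    high = stack_words[xsp] & 0xFFFF
    low = stack_words[xsp + 1] & 0xFFFF
    return (high << 16) | low, xsp + 2

ac2, xsp_after = pop32([0x1234, 0x5678], 0)
```

Here `ac2` holds 0x12345678 and XSP has advanced by two words.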
Register Space is independent from the other spaces in the processor so as to permit easily expanding the number of registers in the future without losing upward compatibility. A repeated instruction is generated dynamically in every instruction decode stage. A new instruction is dynamically generated each time simply by referring to the base instruction being repeated and to the repeat counter RPTC. Real estate is conserved in some embodiments as shown. Some embodiments use a state machine to perform the dynamically repeated multi-cycle instruction.
Some other embodiments repeatedly issue the same instruction down the pipe and then vary its effect at the point somewhere down in the pipe where Source ID SID is used by Source selection block 4520 in
Some of the embodiments remarkably provide compatibility with interrupts asserted during the repeat process. An additional register is unnecessary here to save instruction state of the repeated instruction. Since a dynamically repeated version of the instruction is generated each time, this sequence is interruptible without an additional register.
Some of the embodiments can provide any one or more of the following desirable features and/or other desirable features: smaller code size, easily expandable number of registers in processor upgrades, unnecessary to assign new instruction opcode as number of registers is expanded, unnecessary to introduce new CPU register, unnecessary to provide new mode bit or status bit, interrupt response time remains undiminished. Dynamic instruction modification at decode stage is also applied in some embodiments.
In some embodiments, the code size reduction saves more real estate than the adder, subtracter, mux and selector circuitry 4010 of
as more multiple repeat instructions and larger repeat number n(i)=1+#n in the argument of each repeat instruction i are used. Since the real estate expense for the circuitry appears to be fixed by the structure of any particular embodiment, the code savings and convenience of the various embodiments appear to easily justify their use.
In
A decoder analyzes the instruction(s) and interprets each one into an internal expression or machine language that is implementation-dependent. The decoder also activates a data address generator DAgen when desired. The decoder activates the data address generator in the case of a push/pop, using the stack pointer SP to produce a write-to/read-from memory operation.
For address generation, the register file is read in Address1 stage then processed into effective address in Addr2 stage, which is then sent off-the-CPU to memory for a read operation/operand or pipelined to a later stage for a write operation/operand. In one example, a so-called memory-operand pipeline is used wherein memory access is intimately, closely or tightly combined into the processor pipeline.
Following such memory read-request issuance, when MPU pipelines an instruction to Execute stage, the MPU activates a math-operating unit named DU (data unit) for some sort of computing. The DU has operational units (ALU or MAC) inside which the units take operand(s) from memory(s) and from registers and compute as the instruction specifies (e.g., add, compare or multiply).
Here a push instruction acts as a store-to-memory instruction, for which the selected register is read in Execute1 stage then finally passed to the memory interface to be stored, coupling with a corresponding address. A pop instruction acts as a load-from-memory for which no computation is performed and the value from memory, which was once pushed to the stack, is retrieved. A stack is a specific region in the memory, pointed to by SP (stack pointer) register. The stack temporarily preserves the MPU register contents, which are later retrieved by writing them back to the destination register.
Some embodiments provide a remarkable operation that dynamically produces the source/destination register for a push/pop instruction under single-repeat. The register value is embedded in the instruction as immediate constant, which is intentionally biased with RPTC (single repeat count) register.
Instruction pipe register 3810 of
Sourcing and reading of the register file registers is performed using the source/destination selection block in
As shown in an
As further shown in
For context changing purposes, register file 4544 in this description suitably also is meant, in addition to those registers in a physically regular register file structure, to stand for all the registers which are used to specify a processor context even though some of these registers may be operationally non-analogous and physically quite separate or different structures on the chip real estate. The use of register identification RegID values in Register Space (
The architecture of
The address generator, if used to sum the Operand as an offset to a base address, may deliver a succession of memory address values in non-contiguous portions of Memory Address Space in response to a succession of a repeat multiple instructions that operate through the hardware of
In
A simple example of contiguous ranges of numbers is that a range 1-5 (decimal) is noncontiguous with a range 8-12. By contrast a range 8-12 is contiguous with a range 13-14. Non-contiguous ranges are such that when range end and start values are subtracted from each other, the differences are all at least two (2). Contiguous ranges have at least one difference of range end and start values that exactly equals one (1).
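The subtraction test for contiguity stated above can be written out as a minimal sketch. The function name `ranges_contiguous` and the representation of a range as an inclusive (start, end) pair are assumptions for illustration.

```python
def ranges_contiguous(range_a, range_b):
    """Return True when two inclusive (start, end) ranges are contiguous.

    Two ranges are treated as contiguous when the start of one minus the end
    of the other is exactly one, per the subtraction definition in the text.
    """
    start_a, end_a = range_a
    start_b, end_b = range_b
    return (start_b - end_a == 1) or (start_a - end_b == 1)

noncontiguous_example = ranges_contiguous((1, 5), (8, 12))    # difference is 3
contiguous_example = ranges_contiguous((8, 12), (13, 14))     # difference is 1
```

The first call reproduces the 1-5 versus 8-12 example and the second reproduces the 8-12 versus 13-14 example.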
A refinement of the contiguousness concept is that byte ranges are bytewise contiguous when the foregoing numerical subtraction definition pertains at the byte level, such as when all bytes in a series of 32-bit registers have contents full. Word ranges are wordwise contiguous when the foregoing numerical subtraction definition pertains at the word level even though the word may have only one byte of content. Longword ranges are longword-wise contiguous when the foregoing numerical subtraction definition pertains at the longword level even though the longword may be missing one, two or three bytes of content, as illustrated in
In the Middle area of main pipeline 4410, a first Push PSH in a series of pushes makes a Source selection using Source selector 4520 and the actual source register in Register File 4544 is just updated by execution of one or more previous instructions farther down in the Execute pipestage(s). The selected part of Register File 4544 is muxed out and piped down to the End area of the main pipeline 4410. Concurrently, the address from Address Generator of address pipeline 4420 is piped down correspondingly to the End area of address pipeline 4420 before assertion as a memory address PUSH ADDR for the Push to access memory 4480 and write the data PUSH DATA from the End area of main pipeline 4410 to memory 4480. In this way the data PUSH DATA is fully updated with any pertinent results of execution of the previous instruction(s) that were farther down in the Execute pipestage(s) of main pipeline 4410 when the Source selector 4520 was operated as part of the overall operation of Push.
By contrast, the last POP in a series of pops makes a Destination selection using Destination selector 4540, also in the Middle area of main pipeline 4410. Destination selector 4540 loads Register File 4544 in the Middle of the pipeline 4410. A new non-Pop instruction is likely to be right behind the last POP in the pipeline. In this way, the new non-Pop instruction is able to immediately use the restored contents of Register File 4544 in the Execute stages thereafter. Thus, Pop operates conversely to Push in the sense that restore is the opposite of save, but the location and timing of the Pop operation in the pipeline is not simply a reverse operation in the same place. In
From a pipeline architecture viewpoint, RegisterID generation logic 4010 of
The selection circuits 4520 and 4540 of
The selection circuits 4520 and 4540 have some circuitry for decoding the operand (RegID) onto access signal lines that enable the access and that physically realize and correspond to the organization of Register Space, i.e., the correspondences of various RegID values in Register Space to each respective actual register or storage element in the processor hardware that is needed to define the context or is otherwise pertinent to a given transfer of information that is to be effectuated. The organization of Register Space and the circuitry of the selection circuits 4520 and 4540 that implement Register Space are suitably arranged or designed by the skilled worker in accordance with the teachings herein so that the amount of context save/restore software, an example of which is shown in TABLES 6 and 7, operates on few enough sets of contiguous RegID values so that the number of operand value ranges (indexed i, not n, in the Savings equation elsewhere herein) is small enough to be convenient for purposes of a given system and its foreseeable upgrades. A nonvolatile memory such as a flash memory in the system, or boot flash space in the processor core or other suitably located nonvolatile memory, is programmed with a plurality of repeat and repeated instructions as sequential instructions defining plural operand value ranges indexed i that can be non-contiguous, for specifying operations of an instruction operand value generating circuit.
A first example of an instruction operand value generating circuit is the combination of bias value generator circuit 3900 with RegisterID generation logic 4010 of
In some embodiments as illustrated in
Some embodiments also utilize register access by RegID asserted by multiple repeat of the repeated instruction in plural non-contiguous operand value ranges for information transfer between each accessed register and a hardware stack. The hardware stack automatically responds to each Push and Pop without need of address generation to push and pop the hardware stack.
Parallelizing execution of the Repeat instruction is also contemplated by using plural-ported memory for memory 4480 in some embodiments, performing wide accesses to register file 4544, and using the address pipeline or associated circuitry to do concurrent accesses to the plural ports of the plural-ported memories. Source selection circuit 4520 and Destination selection circuit 4540 are hardwired or configured to respond to each RegID identifying a given shorter or wider width portion of a context register (like AR0H and AR0) or the entire shorter or wider width context register itself (like XAR0 and registers 4580) to apply appropriate byte enable(s) to access the corresponding portion of that register or the entire register. The circuitry accommodates various types of memory caching and caches with cache line access. For instance, access to a memory cache in some embodiments transfers an entire wide cache line of several words between cache and a cache line wide register for quick access and the appropriate byte enables are applied at both the context register and the cache access bus and/or the cache line wide register to transfer one or more bytes therebetween.
This approach also confers flexibility to software to retrieve context in pieces, if desired, and execute some application code right away that may only depend on part of the context information. Thus, some application code may be executed in between the execution of pieces of software that retrieve parts of a given context for effectively-faster context switches or returns.
In
Memory Address Space usefully accommodates information from registers that describes each of several contexts, wherein respective context saves of information in context registers specified by the RegID values in Register Space are performed as the processor goes through operations in different contexts and switches between contexts. In some embodiments, Register Space is independent of and separate from Memory Address Space. For example, when Source selection 4520 and Destination selection 4540 are not directly accessible by asserting a memory address on a memory address bus, then Register Space is independent of and separate from Memory Address Space. Security of Register Space is enhanced and pipeline operation does not involve accesses to Register Space by memory addresses.
The circuitry of
Register Space can be separate and independent from Memory Address Space, or may partially overlap Memory Address Space. Register Space pertains to all registers which the skilled worker designs to include and in some embodiments suitably includes all context-defining registers of a processor.
“ALLa” herein means a register that belongs to the ALLx register group, see TABLE 2 Glossary. The instruction format dbl(push(ALLa)) is decoded to deliver a register identification RegID value as operand on line 4022 of
-
- repeat(#n)
- dbl(push(AC0)).
“ALLa” is also used as a generalized expression of “a register” in processor assembly language, similar to expressing a concept in algebra, to which concrete numbers are applied later. ALLa and ALLb are analogous to pronouns of a language. ALLa can be used to indicate the register which is literally named in a given instruction, and ALLb can be used to indicate the register which is actually indicated in any given instance of successive generation of different instances of a repeated instruction.
The same encoding is assigned for “ALLa” and register identification RegID. Alphabetic “ALLa” is encoded at assembly time. ALLx when it is first register operand in the instruction, is written ALLa. ALLx when it is second register operand in the instruction, is written ALLb.
In the generalized use herein, a push instruction is represented (on documents, or in generic form) as “push(ALLa)” and then used in the computer program code with actual register selection dbl(push(AC0)); push to stack accumulator0 32 bit value, or
push(AC1.L); push to stack the lower 16 bits of accumulator1.
Data access is suitably any appropriate width, and in one example the register file RF registers are accessed register by register when reading from or writing to memory.
An RPT instruction followed by a PUSH/POP instruction results in a multi-cycle instruction that does not pre-establish or limit operation to a fixed range of registers. Instead, a number N of registers to save and identification of which registers to save are both user defined.
Further Embodiments
In
- RPT #n; Repeat next instruction n+1 times, initialize RPTC to n.
- PSH AC0; Push sequence starts at RegID of register AC0 plus RPTC repeat #n
- and decrements RPTC, ending at RegID of AC0 itself.
When restoring an accumulator context, the following code is used:
- RPT #n; Repeat next instruction n+1 times, initialize RPTC to 0.
- POP AC0; Pop sequence starts at RegID of register AC0 plus RPTC=0
- and increments RPTC, ending at RegID plus repeat number #n.
The stack operates as a Last In First Out (LIFO) memory so the operation is done in the reverse order. The operand RegID field is modified as RegID plus RPTC for both Push and Pop.
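The save and restore listings above can be modeled with a minimal sketch in which the operand RegID is biased by RPTC. Everything named here is an assumption for illustration: the functions `repeat_push` and `repeat_pop`, the dictionary of registers keyed by integer RegID, the choice of RegID 0 for AC0, and the Python list standing in for the stack.

```python
def repeat_push(regs, stack, base_reg_id, n):
    """Model RPT #n; PSH <base>: RPTC starts at n and decrements to 0."""
    for rptc in range(n, -1, -1):
        stack.append(regs[base_reg_id + rptc])   # operand = RegID plus RPTC

def repeat_pop(regs, stack, base_reg_id, n):
    """Model RPT #n; POP <base>: RPTC starts at 0 and increments to n."""
    for rptc in range(0, n + 1):
        regs[base_reg_id + rptc] = stack.pop()   # operand = RegID plus RPTC

AC0 = 0                                  # hypothetical RegID of AC0
saved = {0: 10, 1: 11, 2: 12, 3: 13}     # contents of AC0..AC3
stack = []
repeat_push(saved, stack, AC0, 3)        # pushes AC3, AC2, AC1, AC0
stack_after_push = list(stack)

restored = {}
repeat_pop(restored, stack, AC0, 3)      # pops AC0, AC1, AC2, AC3
```

Because the push counts RPTC down and the pop counts it up, the last register pushed is the first one popped, so the LIFO stack returns each value to the register it came from.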
In
In
Instruction Register IR 3626.i is frozen by the STOP signal from AND-gate 3860 during the down counting. The down counting RPTC value is successively summed by arithmetic element 4020 with the Operand value for RegID (e.g. of AC0) provided by Instruction Decoder 3625 on line 4022. The output 4854 of arithmetic element 4020 operating as an adder is coupled by mux 4050 output 4056 to an operand portion of Instruction Pipe Register 3810. Comparator 4850 detects when the RPTC value on line 3832 equals zero, the value stored in CONST register 4935 for push. Then comparator 4850 disables decrementing by decrementor/incrementor circuit 4840 and the repeated Push is complete.
Conversely, in
Notice that for either Push or Pop, decrementor/incrementor circuit 4840 selectively establishes the direction of counting depending on the nature of the repeated instruction as Push or Pop, Store or Load, or otherwise. Also, notice that for either Push or Pop, comparator 4850 determines when register RPTC has reached an opposite end of the programmable range of bias values from which counting began.
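The interplay of the counting direction and the end-of-range comparison can be sketched as a small behavioral model. The function name `rptc_sequence` is hypothetical, and the use of n as the CONST value for pop is an assumption inferred from the pop listing (RPTC starting at 0 and ending at #n), not a statement about the actual register contents.

```python
def rptc_sequence(start, const, decrement):
    """Return the successive RPTC values for one repeated instruction.

    start     -- initial RPTC value from which counting begins
    const     -- end-of-range value held for comparison (models CONST)
    decrement -- True to count down (as for push), False to count up (as for pop)
    """
    step = -1 if decrement else 1
    if (const - start) * step < 0:
        raise ValueError("CONST is not reachable in the selected direction")
    rptc = start
    values = [rptc]
    while rptc != const:      # models the comparator stopping the counter
        rptc += step
        values.append(rptc)
    return values

n = 3
push_biases = rptc_sequence(start=n, const=0, decrement=True)
pop_biases = rptc_sequence(start=0, const=n, decrement=False)
```

The push sequence runs from n down to the end value 0, and the pop sequence runs from 0 up to the assumed end value n, so each reaches the opposite end of the range from which it began.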
As in
In
Scan controller 3990 is operable to probe, debug, and verify this circuitry along at least one scan path linking the following registers to the scan controller by serial scanning in and scanning out bits in register SRAF, the Configuration Register in block 4980, the CSR register 3945, CONST register 4935, Instruction Pipe Register 3810, and register RPTC 3830.
Examples of a set of configuration codes for a first code field are shown in TABLE 8.1, with xxx in the second code field:
Examples of a set of configuration codes for a second code field are shown in TABLE 8.2, with xxx in the first code field. The terminology uOPcode1 refers to a first operation that generates data or sets up a first transition of location of data, such as PSH, ST, etc; and uOPcode2 refers to a second reverse operation that restores things as they were before the application of uOPcode1 or reverses the first transition of location of data, such as POP, LD, etc. The symbolism <RegID> means an alphanumeric register name (e.g., AR6, AC0, PDP, etc.) having a register identification RegID in Register Space. CONST refers to register 4935 value for comparison with RPTC for Not-Equal detector 4850. RPTC in this TABLE 8.2 refers to the initial value that is supplied by mux 4930 output 4932 to register 3830 and from which counting begins. Dec or Inc refers to mode of operation of decrementor/incrementor 4840. Add or Subtract refers to mode of operation of arithmetic element 4020. In TABLE 8.2, a respective such list {CONST, RPTC, Inc/Dec, Add/Subtract} is provided underneath each corresponding uOPcode1 and uOPcode2.
A first form of reconfiguration changes the mode of operation of adder 4020 to provide a subtracting input mode for a line 3932. Then, for example, when saving/restoring n accumulator registers to the stack, the following code is used:
- RPT #n; Repeat next instruction n+1 times, initialize RPTC to n.
- PSH ACn; Push sequence starts at RegID of register ACn minus
- RPTC repeat number #n (that is, AC0) and decrements RPTC, ending at RegID
- of ACn.
- . . .
- RPT #n; Repeat next instruction n+1 times, initialize RPTC to 0.
- POP ACn; Pop sequence starts at RegID of register ACn minus RPTC=0
- and increments RPTC, ending at RegID minus repeat number #n (that is, AC0).
Assembler syntax in another example has a listing as follows.
-
- RPT #15
- PUSH ARx; push AR0˜AR15, Assembler encodes the operand field as AR0
- RPT #15
- POP ARx; pop AR15˜AR0, Assembler encodes the operand field as AR15.
Some further embodiments prepare an assembler macro like push (AC15-AC0) and it is encoded as repeat+push.
Some other further embodiments pack “RPT #15” and “PUSH ARx” as one instruction symbol like “MPUSH ARx,” for instance. In such embodiments, a further code packing advantage is obtained by packing a repeat instruction and a push or pop instruction together.
Another application of an embodiment uses the example below.
-
- ADD AC0 AC1; AC0=AC0+AC1
When preceded by RPT,
-
- RPT #5
- ADD AC0 AC1; Accumulate AC1, AC2, AC3, AC4, AC5 and AC6
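The repeated ADD can be illustrated with a minimal sketch. It assumes that only the second register operand is biased by RPTC while the destination AC0 stays fixed, which is what the listing comment suggests; the function name `repeat_add` and the list of accumulator values indexed by register number are hypothetical. Overflow and saturation are ignored.

```python
def repeat_add(acc, dst, src, n):
    """Model RPT #n; ADD ACdst ACsrc with the source operand biased by RPTC.

    acc -- hypothetical list of accumulator values indexed by register number
    The counting direction of RPTC does not affect the sum, so the loop
    simply visits every bias value from 0 to n.
    """
    for rptc in range(n + 1):
        acc[dst] += acc[src + rptc]

acc = [0, 1, 2, 3, 4, 5, 6, 7]   # AC0..AC7
repeat_add(acc, dst=0, src=1, n=5)
ac0_total = acc[0]
```

With RPT #5 the instruction issues six times, so AC0 accumulates AC1 through AC6 and AC7 is left untouched.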
Some other embodiments apply not only to the operand field but also to the opcode field of an instruction. Operations are suitably performed sequentially on one register and/or memory space at a time or on plural registers and/or memory spaces at a time. In such case, consider the multiple repeat instruction
-
- RPT #8
- Push (AC0, AC1)
This multiple repeat instruction pushes AC0 and AC1 in a first push, then AC2 and AC3 in a second push, . . . and finally AC14 and AC15 in a last push. Besides pairs of registers of this example, other numbers of registers can be concurrently repeat-pushed/popped.
Still further embodiments provide a useful instruction sequence by assigning a sequential sub-opcode field for a given instruction. Repeat counter RPTC modifies the sub-opcode field (and perhaps operand field also) of the given instruction and thereby realizes that instruction sequence. Some of these embodiments also have Repeat counter RPTC modify the operand field of the given instruction and thereby realize a further type of instruction sequence.
A repeat instruction in yet further embodiments is applied to a block of instructions thereafter. For instance, in such an embodiment with a block of just two instructions held in parallel in Instruction Registers IR1 and IR2 respectively for execution down a pair of superscalar pipes, an example of the code is written
-
- RPT #n
- PSH(AC0), PSH(PDP)
- . . .
- RPT #n
- POP(AC0), POP(PDP)
Each of the instructions in the block has the same repeat number #n applicable to it, so the Repeat Counter RPTC circuitry of
-
- RPT #n
- PSH(AC0)
- RPT #n
- PSH(PDP)
- . . .
- RPT #n
- POP(AC0)
- RPT #n
- POP(PDP).
The order of the saving of the registers to the memory 4480 presents no difficulty for a multiple repeat Push operation like context save because the reverse operation of multiple repeat Pop performs context restore into the original register locations in Register Space.
Some embodiments have a multiple repeat instruction of any of the foregoing types that is made to be a conditional instruction that operates on a built-in condition such as IF, WHILE, etc., involving status bits or status register bit fields for statuses such as carry, less than zero, equal to zero, etc. The instruction evaluates a condition defined by its condition field and as long as the condition is true, the repeat instruction is repeatedly executed. In the decode pipeline, the SRAF and a While Repeat Active Flag WRAF are set active. At each repeat operation, the condition defined in the condition field of the instruction is tested in an execute pipe stage, and when the condition becomes false, the repeat operation is stopped. RPTC shows how many iterations remain to be performed. In a pipeline structure wherein the condition is evaluated in an execute pipestage, when the condition tests false, some of the succeeding iterations of that repeated instruction may already be in address generation or read pipestages. When the while repeat structure is exited, reading the computed single repeat (CSR) content enables a determination of how many instructions have gone through the address generation phase of the pipeline. An unconditional single repeat instruction is used to rewind the pointer registers if a false condition has been met inside the while repeat structure. An interrupt can be serviced during conditional repeating. SRAF and WRAF are saved to the stack along with the return address and then recovered upon the return.
Some embodiments have one or more types of macro-instruction that includes multiple micro-instructions, one or more of which micro-instructions includes a multiple repeat instruction.
Some other embodiments program the counter and the counter counts to some end-of-range value other than #n or zero (0). Both ends of the range are programmed by configuration of plural register values for start and end of the range in some embodiments.
Still other embodiments use some other function for value V besides an addition
V=Op+RPTC
to vary the operand. For instance, another contemplated function is a more complicated linear function wherein either or both of the operand Op and the counter value RPTC have multiplicative constants or coefficients associated with them according to the relationship
V=c1×Op+c2×RPTC.
In
Some further embodiments use a nonlinear function. One simple example of a nonlinear function is a multiplicative product of the operand Op times the counter value RPTC according to the relationship
V=c1×Op×RPTC.
Other further embodiments vary the values and cover the programmable range in some manner such as
Op+(n,n−2,n−1,n−3, . . . 0),
or in a pseudorandom manner in the programmable range, or otherwise.
Put another way, the RPTC register in some embodiments is not used as a counter and instead holds successive values that are not all in a decrementing or incrementing order of counting. The successive values result from operation of any suitable circuit for generating them. Some embodiments do not wholly use the operand value range and/or do not fill up or cover the programmable range with RPTC values. The phrase “bias value generator circuit” is expansively used herein to refer to all counting and non-counting types of embodiments because both generate bias values with which to bias the operand. Thus many embodiments are contemplated.
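The alternative operand-varying functions discussed above can be compared in a minimal sketch. The helper name `operand_values`, the coefficient values, and the sample operand and bias values are all illustrative assumptions; the sketch only shows how a sequence of bias values and a chosen function together determine the operand values issued.

```python
def operand_values(op, biases, func=lambda op, bias: op + bias):
    """Return the operand value V for each successive bias value.

    op     -- operand Op taken from the repeated instruction
    biases -- successive bias values (counting or non-counting)
    func   -- function combining Op and the bias; default is V = Op + RPTC
    """
    return [func(op, bias) for bias in biases]

op = 4
counting = [3, 2, 1, 0]                      # decrementing RPTC for n = 3

additive = operand_values(op, counting)                            # V = Op + RPTC
linear = operand_values(op, counting, lambda o, b: 1 * o + 2 * b)  # c1 = 1, c2 = 2
product = operand_values(op, counting, lambda o, b: 1 * o * b)     # c1 = 1
permuted = operand_values(op, [3, 1, 2, 0])  # Op + (n, n-2, n-1, n-3) for n = 3
```

The same helper accepts any bias sequence, which mirrors the point that the bias values need not arrive in counting order and need not cover the whole programmable range.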
The results of scan/test 4630 are evaluated at a step 4635, and if corrections are needed, then operations loop back to step 4610. Otherwise operations proceed to system integration step 4640 wherein one or more processor integrated circuits are stuffed onto printed wiring board(s).
In a step 4645, a flash memory is programmed with system parameters, boot configuration, and data for configuration register 4980 for the circuitry of
A step 4650 tests the multiple push/pop or other repeat multiple instructions for correct operation of the processor and in the system. An evaluation step 4655 determines whether the test results are all right, and if not, operations of a step 4660 adjust the parameters and loop back to step 4645 or back to step 4610 if need be. If the test results are all right, operations proceed to a step 4670 to assemble telecommunications units or other products for sale and consumption, whereupon an End 4675 is reached.
Various embodiments are used with one or more microprocessors, each microprocessor having a pipeline selected from the group consisting of 1) reduced instruction set computing (RISC), 2) digital signal processing (DSP), 3) complex instruction set computing (CISC), 4) superscalar, 5) skewed pipelines, 6) in-order, 7) out-of-order, 8) very long instruction word (VLIW), 9) single instruction multiple data (SIMD), 10) multiple instruction multiple data (MIMD), 11) multiple-core using any one or more of the foregoing, and 12) microcontroller pipelines, control peripherals, and other micro-control blocks using any one or more of the foregoing.
Various embodiments are implemented in any integrated circuit manufacturing process such as different types of CMOS (complementary metal oxide semiconductor), SOI (silicon on insulator), SiGe (silicon germanium), organic transistors, and with various types of transistors such as single-gate and multiple-gate (MUGFET) field effect transistors, and with single-electron transistors and other structures. Photonic integrated circuit blocks, components, and interconnects are also suitably applied in various embodiments.
While some embodiments may have an entire feature totally absent or totally present, other embodiments, such as those performing the blocks and steps of the Figures of drawing, have more or less complex arrangements that execute some process portions, selectively bypass others, and have some operations running concurrently or sequentially regardless. Accordingly, words such as “enable,” “disable,” “operative,” “inoperative” are to be interpreted relative to the code and circuitry they describe. For instance, disabling (or making inoperative) a second function by bypassing a first function can establish the first function and modify the second function. Conversely, making a first function inoperative includes embodiments where a portion of the first function is bypassed or modified as well as embodiments where the second function is removed entirely. Bypassing or modifying code increases function in some embodiments and decreases function in other embodiments.
A few preferred embodiments have been described in detail hereinabove. It is to be understood that the scope of the invention comprehends embodiments different from those described yet within the inventive scope. Microprocessor and microcomputer are synonymous herein. Processing circuitry comprehends digital, analog and mixed signal (digital/analog) integrated circuits, ASIC circuits, PALs, PLAs, decoders, memories, non-software based processors, microcontrollers and other circuitry, and digital computers including microprocessors and microcomputers of any architecture, or combinations thereof. Internal and external couplings and connections can be ohmic, capacitive, inductive, photonic, and direct or indirect via intervening circuits or otherwise as desirable. Implementation is contemplated in discrete components or fully integrated circuits in any materials family and combinations thereof. Various embodiments of the invention employ hardware, software or firmware. Process diagrams herein are representative of flow diagrams for operations of any embodiments whether of hardware, software, or firmware, and processes of manufacture thereof.
While this invention has been described with reference to illustrative embodiments, this description is not to be construed in a limiting sense. Various modifications and combinations of the illustrative embodiments, as well as other embodiments of the invention, may be made. The terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in the detailed description and/or the claims to denote non-exhaustive inclusion in a manner similar to the term “comprising”. It is therefore contemplated that the appended claims and their equivalents cover any such modifications and embodiments as fall within the true scope of the invention.
Claims
1. A processing system comprising:
- a printed circuit board;
- a volatile memory;
- a processor arranged on the printed circuit board and coupled to the volatile memory, wherein the processor includes a pipeline, an instruction register, a set of first storage elements having a first width, and a set of second storage elements having a second width, the first width being greater than the second width; and
- a non-volatile memory that is separate from the processor and arranged on the printed circuit board and coupled to the processor, the non-volatile memory being configured to hold representations of instructions for the instruction register to save and restore contents of the first and second sets of storage elements to the volatile memory, the instructions including a repeat instruction as well as a repeated instruction having an operand;
- wherein the processor further includes: an instruction operand value generating circuit configured to generate values varying in an operand value range and biasedly related to the operand of the repeated instruction represented in the non-volatile memory; and selection circuitry in the pipeline coupled to the instruction operand value generating circuit and configured to use the values to access the sets of first and second storage elements, and thereby facilitate transfers of information between the sets of first and second storage elements and the volatile memory.
2. The processing system as claimed in claim 1, wherein the volatile memory has a memory address space and the selection circuitry is responsive to the values to support information transfers from the sets of first and second storage elements corresponding to values in noncontiguous operand value ranges to contiguous spaces in the memory address space of the volatile memory.
3. The processing system as claimed in claim 1, further comprising a wireless modem and a user interface coupled to the processor on the printed circuit board, whereby a mobile telecommunications apparatus is provided.
4. The processing system as claimed in claim 1, wherein:
- the volatile memory has a memory address bus coupled to the pipeline;
- the selection circuitry is separate from the memory address bus; and
- a register space for the sets of first and second storage elements is separate from a memory address space for the volatile memory.
5. The processing system as claimed in claim 1, wherein the non-volatile memory is configured to be programmed with a plurality of sequential instructions defining plural non-contiguous operand value ranges.
Type: Grant
Filed: Nov 19, 2018
Date of Patent: Feb 18, 2020
Patent Publication Number: 20190102171
Assignee: TEXAS INSTRUMENTS INCORPORATED (Dallas, TX)
Inventors: Kenichi Tashiro (Tsukuba), Hiroyuki Mizuno (Kashiwa), Yuji Umemoto (Tsuchiura)
Primary Examiner: William B Partridge
Application Number: 16/194,668
International Classification: G06F 9/30 (20180101);