METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR WITH A TOKEN RING BASED PARALLEL PROCESSOR SCHEDULER
A method of operating a clock-less asynchronous processing system comprising a plurality of successive asynchronous processing components. The method comprises providing a first token signal path in the plurality of processing components to allow propagation of a token through the processing components. Possession of the token by one of the processing components enables the processing component to conduct a transaction with a resource component that is shared among the processing components. The method comprises propagating the token from one processing component to another processing component along the token signal path.
This application claims priority under 35 USC 119(e) to U.S. Provisional Application Ser. Nos. 61/874,794, 61/874,810, 61/874,856, 61/874,914, 61/874,880, 61/874,889, and 61/874,866, all filed on Sep. 6, 2013, and all of which are incorporated herein by reference.
This application is related to:
U.S. patent application Ser. No. ______ entitled “METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR WITH FAST AND SLOW MODE” and filed on the same date herewith, and identified by attorney docket number HUAW07-06583, and which is incorporated herein by reference;
U.S. patent application Ser. No. ______ entitled “METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR REMOVAL OF META-STABILITY” and filed on the same date herewith, and identified by attorney docket number HUAW07-06400, and which is incorporated herein by reference;
U.S. patent application Ser. No. ______ entitled “METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR WITH A TOKEN RING BASED PARALLEL PROCESSOR SCHEDULER” and filed on the same date herewith, and identified by attorney docket number HUAW07-06376, and which is incorporated herein by reference;
U.S. patent application Ser. No. ______ entitled “METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR PIPELINE AND BYPASS PASSING” and filed on the same date herewith, and identified by attorney docket number HUAW07-06364, and which is incorporated herein by reference; and
U.S. patent application Ser. No. ______ entitled “METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR BASED ON CLOCK DELAY ADJUSTMENT” and filed on the same date herewith, and identified by attorney docket number HUAW07-06351, and which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates generally to asynchronous processors, and more particularly to an asynchronous processor with a token ring based parallel processor scheduler.
BACKGROUND
High-performance synchronous digital processing systems utilize pipelining to increase parallel performance and throughput. In synchronous systems, pipelining results in many partitioned or subdivided smaller blocks or stages, and a system clock is applied to registers between the blocks/stages. The system clock initiates movement of the processing and data from one stage to the next, and the processing in each stage must be completed during one fixed clock cycle. When certain stages take less time than a clock cycle to complete processing, the next processing stages must wait, increasing processing delays (which are additive).
In contrast, asynchronous (i.e., clockless) systems do not utilize a system clock, and each processing stage is intended, in general terms, to begin its processing upon completion of processing in the prior stage. Asynchronous processing systems offer several benefits: each processing stage can have a different processing delay, input data can be processed upon arrival, and power is consumed only on demand.
Now turning to
Accordingly, there are needed asynchronous processing systems, asynchronous processors, and methods of asynchronous processing that are stable and detect and resolve potential hazards.
SUMMARY
According to one embodiment, there is provided a method of operating a clock-less asynchronous processing system comprising a plurality of successive asynchronous processing components. The method comprises providing a first token signal path in the plurality of processing components to allow propagation of a token through the processing components. Possession of the token by one of the processing components enables the processing component to conduct a transaction with a resource component that is shared among the processing components. The method comprises propagating the token from one processing component to another processing component along the token signal path.
In another embodiment, there is provided a clock-less asynchronous processing system. The processing system comprises a plurality of successive asynchronous processing components, each processing component comprising token processing logic configured to receive, hold and pass a token from a given processing component to another processing component. The token processing logic comprises a token signal path in the plurality of processing components to allow propagation of the token through the processing components. Possession of the token by one of the processing components enables the processing component to conduct a transaction with a resource component that is shared among the processing components.
For a more complete understanding of the present disclosure, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, wherein like numbers designate like objects, and in which:
Asynchronous technology seeks to eliminate the need of synchronous technology for a global clock-tree, which not only consumes a significant portion of the chip power and die area, but also reduces the speed(s) of the faster parts of the circuit to match the slower parts (i.e., the final clock-tree rate derives from the slowest part of a circuit). To remove the clock-tree (or minimize the clock-tree), asynchronous technology requires special logic to realize a handshaking protocol between two consecutive clock-less processing circuits. Once a clock-less processing circuit finishes its operation and enters into a stable state, a signal (e.g., a “Request” signal) is triggered and issued to its ensuing circuit. If the ensuing circuit is ready to receive the data, the ensuing circuit sends a signal (e.g., an “ACK” signal) to the preceding circuit. Although the processing latencies of the two circuits differ and vary with time, the handshaking protocol ensures the correctness of a circuit or a cascade of circuits.
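The Request/ACK handshake described above can be illustrated with a small software model. The following Python sketch is only an illustration under stated assumptions, not circuitry from this disclosure: the names Stage and run_pipeline, the integer time steps, and the rule that each stage holds at most one data item are all hypothetical. A stage raises its request once it holds data and its own latency has elapsed, and the transfer occurs only when the ensuing stage is empty and therefore able to acknowledge.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """Hypothetical clock-less stage that holds at most one data item."""
    name: str
    latency: int                  # abstract time units; may differ per stage
    data: Optional[int] = None    # item currently held, if any
    busy_until: int = 0           # time at which the current work completes

    def ready(self) -> bool:
        # The stage can acknowledge new data only when it holds nothing.
        return self.data is None

    def done(self, now: int) -> bool:
        # The stage raises "Request" once it holds data and has finished.
        return self.data is not None and now >= self.busy_until


def run_pipeline(stages: List[Stage], inputs: List[int]) -> List[int]:
    """Push inputs through the stages using request/acknowledge transfers."""
    pending = list(inputs)
    outputs: List[int] = []
    now = 0
    while pending or any(s.data is not None for s in stages):
        # The last stage hands its finished item to the output.
        last = stages[-1]
        if last.done(now):
            outputs.append(last.data)
            last.data = None
        # Walk backwards so a slot freed downstream can be refilled
        # from upstream within the same time step.
        for i in range(len(stages) - 2, -1, -1):
            sender, receiver = stages[i], stages[i + 1]
            request = sender.done(now)
            ack = request and receiver.ready()
            if ack:
                receiver.data = sender.data
                receiver.busy_until = now + receiver.latency
                sender.data = None
        # Feed the first stage whenever it is empty.
        first = stages[0]
        if pending and first.ready():
            first.data = pending.pop(0)
            first.busy_until = now + first.latency
        now += 1
    return outputs
```

In this model, a fast stage that finishes early simply keeps its request asserted until the slower ensuing stage becomes ready, so the data order is preserved even though no stage shares a clock with its neighbor.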
Hennessy and Patterson coined the term “hazard” for situations in which instructions in a pipeline would produce wrong answers. A structural hazard occurs when two instructions might attempt to use the same resources at the same time. A data hazard occurs when an instruction, scheduled blindly, would attempt to use data before the data is available in the register file.
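The two hazard types defined above can be expressed as simple predicates. The following Python sketch is a simplified illustration, assuming a hypothetical Instr record with one destination register, a tuple of source registers, and the name of the functional unit it needs; the register and unit names are placeholders and do not come from this disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction
    unit: str               # resource (functional unit) the instruction needs


def data_hazard(first: Instr, second: Instr) -> bool:
    # The later instruction reads a register the earlier one has yet to write.
    return first.dest in second.srcs


def structural_hazard(first: Instr, second: Instr) -> bool:
    # Both instructions would need the same resource if issued together.
    return first.unit == second.unit
```

For example, with a = Instr("r1", ("r2", "r3"), "alu") and b = Instr("r4", ("r1", "r5"), "alu"), both data_hazard(a, b) and structural_hazard(a, b) return True.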
With reference to
The L1/L2 cache memory 340 may be subdivided into L1 and L2 cache, and may also be subdivided into instruction cache and data cache. Likewise, the cache controller 320 may be functionally subdivided.
Aspects of the present disclosure provide architectures and techniques for a clock-less asynchronous processor architecture that utilizes a token ring based parallel processor scheduler. A token system is a two-dimensional system. Within a functional unit, tokens gate each other to form a closed loop. Across functional units, a token signal is delayed “deliberately” to avoid a structural hazard. A token-based asynchronous processor uses a token system to “emulate” a pipeline, yielding instruction-level parallelism (ILP) while preserving the program order and avoiding data, structural, and control hazards.
As described above with respect to
Designated tokens are used to gate other designated tokens in a given order of the pipeline. This means that when a designated token passes through an ALU, a second designated token is then allowed to be processed and passed by the same ALU in the token ring architecture. In other words, releasing one token by the ALU becomes a condition to consume (process) another token in that ALU in that given order.
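The gating condition described above can be modeled in a few lines of software. The following Python sketch is an illustration under stated assumptions: the class name GatedAlu and the token names used with it are hypothetical, and the model covers a single pass through the given order within one ALU. A token can be consumed only after the token that precedes it in the order has been released by the same ALU.

```python
from typing import List, Set


class GatedAlu:
    """Hypothetical model of intra-ALU token gating in a fixed order."""

    def __init__(self, order: List[str]) -> None:
        self.order = list(order)       # the given order of designated tokens
        self.held: Set[str] = set()    # tokens consumed and not yet released
        self.released: Set[str] = set()

    def can_consume(self, token: str) -> bool:
        if token in self.held or token in self.released:
            return False
        idx = self.order.index(token)
        # The first token is ungated; every other token is gated by the
        # release of its predecessor in the order.
        return idx == 0 or self.order[idx - 1] in self.released

    def consume(self, token: str) -> bool:
        if not self.can_consume(token):
            return False
        self.held.add(token)
        return True

    def release(self, token: str) -> bool:
        if token not in self.held:
            return False
        self.held.remove(token)
        self.released.add(token)
        return True
```

With GatedAlu(["token_a", "token_b", "token_c"]), an attempt to consume token_b fails until token_a has been both consumed and released, which mirrors the condition that releasing one token enables consuming the next.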
A particular example of a token-gating relationship is illustrated in
The token ring 2704 allows propagation of a token through the ALUs 2702. Token processing logic is provided (not shown) for propagating the token from one ALU to another ALU amongst the ALUs 2702 along the token ring 2704. The token processing logic is configured to propagate the token between the ALUs 2702 at a propagation rate that is related to a transaction rate of the shared external resource 2708. For example, the rate at which the ALU completes a transaction may vary depending on the specific transaction requested.
Each token in the token ring 2704 is a signal indicator for the availability of one or more of the external resources 2708. The token is such that only one ALU amongst the ALUs 2702 can possess it at any given time. In a specific example of implementation, possession of the token by a given ALU enables the given ALU to conduct a transaction with the shared external resource 2708. Conversely, lack of possession of the token by the given ALU prevents the given ALU from conducting a transaction with the shared external resource 2708. In this manner, the token prevents more than one ALU from conducting a transaction with the external resource 2708 at a given time. After a given ALU conducts a transaction with the shared external resource 2708, or if the given ALU does not wish to conduct a transaction with the shared external resource 2708, the ALU releases or “passes” the token to the next ALU. Serialized in this way, multiple ALUs can share a common external resource. As illustrated, multiple tokens may be required to control access to the shared external resources 2708 via an N-bit selection control signal 2712 and the multiplexor 2706.
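The serialization described above can be sketched as a short simulation. The following Python example is an illustration only, with several assumptions: the function name run_token_ring is hypothetical, each ALU performs at most one transaction per possession of the token, transactions are represented as plain strings, and the simulation stops after one full lap in which no ALU wanted the resource. Only the current token holder may append to the transaction log, which stands in for the shared external resource.

```python
from collections import deque
from typing import Deque, List, Tuple


def run_token_ring(requests: List[Deque[str]]) -> List[Tuple[int, str]]:
    """Circulate a single token; requests[i] queues what ALU i wants to do."""
    log: List[Tuple[int, str]] = []   # (ALU index, transaction) in access order
    n = len(requests)
    holder = 0                        # index of the ALU that holds the token
    idle_passes = 0                   # consecutive passes with no transaction
    while idle_passes < n:
        queue = requests[holder]
        if queue:
            # Possession of the token permits one transaction (assumption).
            log.append((holder, queue.popleft()))
            idle_passes = 0
        else:
            # No transaction wanted: release the token immediately.
            idle_passes += 1
        holder = (holder + 1) % n     # pass the token to the next ALU
    return log
```

For three ALUs with requests [deque(["load x"]), deque(), deque(["store y", "load z"])], the log records ALU 0 first, then ALU 2 on each of the next two laps, while ALU 1 passes the token every time without touching the resource.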
In this example, the communication system 1400 includes user equipment (UE) 1410a-1410c, radio access networks (RANs) 1420a-1420b, a core network 1430, a public switched telephone network (PSTN) 1440, the Internet 1450, and other networks 1460. While certain numbers of these components or elements are shown in
The UEs 1410a-1410c are configured to operate and/or communicate in the system 1400. For example, the UEs 1410a-1410c are configured to transmit and/or receive wireless signals or wired signals. Each UE 1410a-1410c represents any suitable end user device and may include such devices (or may be referred to) as a user equipment/device (UE), wireless transmit/receive unit (WTRU), mobile station, fixed or mobile subscriber unit, pager, cellular telephone, personal digital assistant (PDA), smartphone, laptop, computer, touchpad, wireless sensor, or consumer electronics device.
The RANs 1420a-1420b here include base stations 1470a-1470b, respectively. Each base station 1470a-1470b is configured to wirelessly interface with one or more of the UEs 1410a-1410c to enable access to the core network 1430, the PSTN 1440, the Internet 1450, and/or the other networks 1460. For example, the base stations 1470a-1470b may include (or be) one or more of several well-known devices, such as a base transceiver station (BTS), a Node-B (NodeB), an evolved NodeB (eNodeB), a Home NodeB, a Home eNodeB, a site controller, an access point (AP), or a wireless router, or a server, router, switch, or other processing entity with a wired or wireless network.
In the embodiment shown in
The base stations 1470a-1470b communicate with one or more of the UEs 1410a-1410c over one or more air interfaces 1490 using wireless communication links. The air interfaces 1490 may utilize any suitable radio access technology.
It is contemplated that the system 1400 may use multiple channel access functionality, including such schemes as described above. In particular embodiments, the base stations and UEs implement LTE, LTE-A, and/or LTE-B. Of course, other multiple access schemes and wireless protocols may be utilized.
The RANs 1420a-1420b are in communication with the core network 1430 to provide the UEs 1410a-1410c with voice, data, application, Voice over Internet Protocol (VoIP), or other services. Understandably, the RANs 1420a-1420b and/or the core network 1430 may be in direct or indirect communication with one or more other RANs (not shown). The core network 1430 may also serve as a gateway access for other networks (such as PSTN 1440, Internet 1450, and other networks 1460). In addition, some or all of the UEs 1410a-1410c may include functionality for communicating with different wireless networks over different wireless links using different wireless technologies and/or protocols.
Although
As shown in
The UE 1410 also includes at least one transceiver 1502. The transceiver 1502 is configured to modulate data or other content for transmission by at least one antenna 1504. The transceiver 1502 is also configured to demodulate data or other content received by the at least one antenna 1504. Each transceiver 1502 includes any suitable structure for generating signals for wireless transmission and/or processing signals received wirelessly. Each antenna 1504 includes any suitable structure for transmitting and/or receiving wireless signals. One or multiple transceivers 1502 could be used in the UE 1410, and one or multiple antennas 1504 could be used in the UE 1410. Although shown as a single functional unit, a transceiver 1502 could also be implemented using at least one transmitter and at least one separate receiver.
The UE 1410 further includes one or more input/output devices 1506. The input/output devices 1506 facilitate interaction with a user. Each input/output device 1506 includes any suitable structure for providing information to or receiving information from a user, such as a speaker, microphone, keypad, keyboard, display, or touch screen.
In addition, the UE 1410 includes at least one memory 1508. The memory 1508 stores instructions and data used, generated, or collected by the UE 1410. For example, the memory 1508 could store software or firmware instructions executed by the processing unit(s) 1500 and data used to reduce or eliminate interference in incoming signals. Each memory 1508 includes any suitable volatile and/or non-volatile storage and retrieval device(s). Any suitable type of memory may be used, such as random access memory (RAM), read only memory (ROM), hard disk, optical disc, subscriber identity module (SIM) card, memory stick, secure digital (SD) memory card, and the like.
As shown in
Each transmitter 1552 includes any suitable structure for generating signals for wireless transmission to one or more UEs or other devices. Each receiver 1554 includes any suitable structure for processing signals received wirelessly from one or more UEs or other devices. Although shown as separate components, at least one transmitter 1552 and at least one receiver 1554 could be combined into a transceiver. Each antenna 1556 includes any suitable structure for transmitting and/or receiving wireless signals. While a common antenna 1556 is shown here as being coupled to both the transmitter 1552 and the receiver 1554, one or more antennas 1556 could be coupled to the transmitter(s) 1552, and one or more separate antennas 1556 could be coupled to the receiver(s) 1554. Each memory 1558 includes any suitable volatile and/or non-volatile storage and retrieval device(s).
Additional details regarding UEs 1410 and base stations 1470 are known to those of skill in the art. As such, these details are omitted here for clarity.
In some embodiments, some or all of the functions or processes of the one or more of the devices are implemented or supported by a computer program that is formed from computer readable program code and that is embodied in a computer readable medium. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory.
It may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrases “associated with” and “associated therewith,” as well as derivatives thereof, mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.
While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.
Claims
1. A method of operating a clock-less asynchronous processing system comprising a plurality of successive asynchronous processing components, the method comprising:
- providing a first token signal path in the plurality of processing components to allow propagation of a token through the processing components, wherein possession of the token by one of the processing components enables the processing component to conduct a transaction with a resource component that is shared among the processing components; and
- propagating the token from one processing component to another processing component along the first token signal path.
2. The method in accordance with claim 1, wherein propagating the token is performed at a propagation rate that is related to a latency associated with the processing component.
3. The method in accordance with claim 2, wherein the latency is variable and is based on an operation to be conducted by the processing component.
4. The method in accordance with claim 1, wherein propagating the token is performed at a propagation rate that is related to a transaction rate associated with the shared resource component.
5. The method as defined in claim 4, wherein the transaction rate is variable and is based on the transaction to be conducted with the shared resource component.
6. The method in accordance with claim 1, wherein lack of possession of the token by the processing component prevents the processing component from conducting a transaction with the shared resource component.
7. The method in accordance with claim 1, further comprising:
- in response to determining that the processing component desires no transaction with the shared resource component, releasing the token so that the token is propagated along the token signal path to another processing component.
8. The method in accordance with claim 1, further comprising:
- providing a second token signal path in the plurality of processing components separate and distinct from the first token signal path to allow propagation of a second token through the processing components, wherein the first token signal path and the second token signal path form a multi-token ring.
9. The method in accordance with claim 8, further comprising:
- providing an intra-processing component gating system, wherein a first designated token of a plurality of tokens is used to gate other designated tokens in a given order.
10. The method in accordance with claim 9, wherein releasing the designated token by the processing component becomes a condition to consume another token in the processing component in the given order.
11. The method in accordance with claim 8, further comprising:
- providing an inter-processing component passing system, wherein the first token is delayed from passing from a first processing component to a second processing component to avoid a structural hazard.
12. The method in accordance with claim 11, further comprising:
- providing an intra-processing component gating system, wherein a first designated token of a plurality of tokens is used to gate other designated tokens in a given order;
- wherein the inter-processing component passing system and the intra-processing component gating system form a pipeline with different stages.
13. A clock-less asynchronous processing system comprising:
- a plurality of successive asynchronous processing components, each processing component comprising token processing logic configured to receive, hold and pass a token from a given processing component to another processing component;
- wherein the token processing logic comprises a token signal path in the plurality of processing components to allow propagation of the token through the processing components, wherein possession of the token by one of the processing components enables the processing component to conduct a transaction with a resource component that is shared among the processing components.
14. The processing system in accordance with claim 13, wherein the token processing logic is configured to propagate the token at a propagation rate that is related to a latency associated with the processing component.
15. The processing system in accordance with claim 14, wherein the latency is variable and is based on an operation to be conducted by the processing component.
16. The processing system in accordance with claim 13, wherein lack of possession of the token by the processing component prevents the processing component from conducting a transaction with the shared resource component.
17. The processing system in accordance with claim 13, wherein the token processing circuitry further comprises intra-processing component gating circuitry, where a first designated token of a plurality of tokens is used to gate other designated tokens in a given order.
18. The processing system in accordance with claim 17, wherein releasing the first designated token by the processing component becomes a condition to consume another token in the processing component in the given order.
19. The processing system in accordance with claim 13, wherein the token processing circuitry further comprises inter-processing component passing circuitry, wherein the token is delayed from passing from a first processing component to a second processing component to avoid a structural hazard.
20. The processing system in accordance with claim 19, wherein the token processing circuitry further comprises intra-processing component gating circuitry, wherein a first designated token of a plurality of tokens is used to gate other designated tokens in a given order;
- wherein the inter-processing component passing circuitry and the intra-processing component gating circuitry form a pipeline with different stages.
Type: Application
Filed: Sep 8, 2014
Publication Date: Mar 12, 2015
Inventors: Qifan Zhang (Lachine), Yiqun Ge (Ottawa), Wuxian Shi (Ottawa), Tao Huang (Ottawa), Wen Tong (Ottawa)
Application Number: 14/480,561
International Classification: G06F 9/50 (20060101); G06F 9/38 (20060101);