Multi-service system-on-chip including on-chip memory with multiple access path

Info

Publication number: 20020161978
Type: Application
Filed: Feb 28, 2002
Publication Date: Oct 31, 2002
Inventors: George Apostol (Santa Clara, CA), Jeffrey S. Earl (San Jose, CA)
Application Number: 10086938

Abstract

In an integrated circuit, a memory unit includes a first and a second data transfer interface. The first data interface services successive first accesses by a processor and subsystem of the IC, whereas the second data interface services second accesses by at least the processor in parallel. In one embodiment, the accesses are properly sequenced and responded to.

Description

Description

RELATED APPLICATION

[0001] This application claims priority to U.S. Provisional Application No. 60/272,439, entitled “MUILTI-SERVICE PROCESSOR INCLUDING A MULTI-SERVICE BUS”, filed Feb. 28, 2001, the specification of which is hereby fully incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to the field of integrated circuit. More specifically, the present invention relates to inter-subsystem communication between subsystems on an integrated circuit device.

[0004] 2. Background Information

[0005] Advances in integrated circuit technology have led to the birth and proliferation of a wide variety of integrated circuits, including but not limited to application specific integrated circuits, micro-controllers, digital signal processors, general purpose microprocessors, and network processors. Recent advances have also led to the birth of what's known as “system on a chip” or SOC. Typically, a SOC includes multiple “tightly coupled” subsystems performing very different functions. These subsystems often have a need to communicate and cooperate with each other on a regular basis.

[0006] U.S. Pat. No. 6,122,690 discloses an on-chip bus architecture that is both processor independent and scalable. The '690 patent discloses a bus that uses “standardized” bus interfaces to couple functional blocks to the on-chip bus. The “standardized” bus interfaces include embodiments for bus master functional blocks, slave functional blocks, or either. The '690 bus suffers from at least one disadvantage in that it does not offer rich functionalities for prioritizing interactions or transactions between the subsystems, which are needed for a SOC with subsystems performing a wide range of very different functions.

[0007] Accordingly, a more flexible approach to facilitate inter-subsystem communication between subsystems on a chip is desired.

BRIEF DESCRIPTION OF DRAWINGS

[0008] The present invention will be described by way of exemplary embodiments, but not limitations, illustrated in the accompanying drawings in which like references denote similar elements, and in which:

[0009] FIG. 1 illustrates an overview of a system on-chip including an on-chip bus and a number of subsystems coupled to the on-chip bus, in accordance with one embodiment;

[0010] FIG. 2 illustrates the method of the present invention, in accordance with one embodiment;

[0011] FIGS. 3a-3b illustrate a request and a reply transaction between two subsystems, in accordance with one embodiment;

[0012] FIG. 4 illustrates data transfer unit of FIG. 1 in further detail, in accordance with one embodiment;

[0013] FIGS. 5a-5b illustrate the operational states and the transition rules for the state machines of FIG. 4, in accordance with one embodiment;

[0014] FIGS. 6a-6c are timing diagrams for practicing the present invention, in accordance with one implementation;

[0015] FIG. 7 illustrates another overview of a system on-chip further including a data traffic router, in accordance with an alternate embodiment;

[0016] FIG. 8 illustrates the data aspect of the data traffic router of FIG. 7 in further details, in accordance with one embodiment; and

[0017] FIG. 9 illustrates the control aspect of the data traffic router of FIG. 7 in further details, in accordance with one embodiment;

[0018] FIG. 10 illustrates another overview of a system on-chip further including a direct communication path between two subsystems, in accordance with an alternate embodiment; and

[0019] FIG. 11 illustrates an exemplary subsystem having a first data transfer interface interfacing with the data traffic router, and a second data transfer interface interfacing with another subsystem directly, in accordance with one embodiment.

DETAILED DESCRIPTION OF THE INVENTION

[0020] The present invention includes interface units and operational methods for flexibly facilitating inter-subsystem communication between subsystems of a SOC. In the following description, various features and arrangements will be described, to provide a thorough understanding of the present invention. However, the present invention may be practiced without some of the specific details or with alternate features/arrangement. In other instances, well-known features are omitted or simplified in order not to obscure the present invention.

[0021] The description to follow repeatedly uses the phrase “in one embodiment”, which ordinarily does not refer to the same embodiment, although it may. The terms “comprising”, “having”, “including” and the like, as used in the present application, including in the claims, are synonymous.

Overview

[0022] Referring now to FIG. 1, wherein a block diagram illustrating an overview of a SOC 100 with subsystems 102a-102d incorporated with the teachings of the present invention for inter-subsystem communication, in accordance with one embodiment, is shown. As illustrated, for the embodiment, SOC 100 includes on-chip bus 104 and subsystems 102a-102d coupled to each other through bus 104. Moreover, each of subsystems 102a-102d includes data transfer unit or interface (DTU) 108a-108d incorporated with teachings of the present invention, correspondingly coupling the subsystems 102a-102d to bus 104. SOC 100 also includes arbiter 106, which is also coupled to bus 104.

[0023] In one embodiment, bus 104 includes a number of sets of request lines (one set per subsystem), a number of sets of grant lines (one set per subsystem), and a number of shared control and data/address lines. Included among the shared control lines is a first control line for a subsystem granted access to the bus (grantee subsystem, also referred to as the master subsystem) to assert a control signal to denote the beginning of a transaction cycle, and to de-assert the control signal to denote the end of the transaction cycle; and a second control line for a subsystem addressed by the grantee/master subsystem (also referred to as the slave subsystem) to assert a control signal to inform the grantee/master subsystem that the addressee/slave subsystem is busy (also referred to as “re-trying” the master system).

[0024] As a result of the facilities advantageously provided by DTU 108a-108d, and the teachings incorporated in subsystem 102a-102d, subsystems 102a-102d are able to flexibly communicate and cooperate with each other, allowing subsystems 102a-102d to handle a wide range of different functions having different needs. More specifically, as will be described in more detail below, in one embodiment, subsystems 102a-102d communicate with each other via transactions conducted across bus 104. Subsystems 102a-102d, by virtue of the facilities advantageously provided by DTU 108a-108d, are able to locally prioritize the order in which its transactions are to be serviced by the corresponding DTU 108a-108d to arbitrate for access to bus 104. Further, in one embodiment, by virtue of the architecture of the transactions, subsystems 102a-102d are also able to flexibly control the priorities on which the corresponding DTU 108a-108d are to use to arbitrate for bus 104 with other contending transactions of other subsystems 102a-102d.

[0025] Arbiter 106 is employed to arbitrate access to bus 104. That is, arbiter 106 is employed to determine which of the contending transactions on whose behalf the DTU 108a-108d are requesting for access (through e.g. the request lines of the earlier described embodiment), are to be granted access to bus 104 (through e.g. the grant lines of the earlier described embodiment).

[0026] SOC 100 is intended to represent a broad range of SOC, including multi-service ASIC. In particular, in various embodiments, subsystems 102a-102d may be one or more of a memory controller, a security engine, a voice processor, a collection of peripheral device controllers, a framer processor, and a network media access controller. Moreover, by virtue of the advantageous employment of DTU 108a-108d to interface subsystems 102a-102d to on-chip bus 104, with DTU 108a-108d and on-chip bus operating on the same clock speed, the core logic of subsystems 102a-102d may operate in different clock speeds, including clock speeds that are different from the clock speed of non-chip bus 104 and DTU 108 a-108d. In one embodiment, one or more subsystems 102a-102d may be a multi-function subsystems, in particular, with the functions identified by identifiers. Except for the teachings of the present invention incorporated into subsystems 102a-102d, the exact constitution and the exact manner their core logic operate in providing the functions/services the subsystems are immaterial to the present invention. While for ease of understanding, SOC 100 is illustrated as having only four subsystems 102a-102d, in practice, SOC 100 may have more or less subsystems. In particular, by virtue of the advantageous employment of DTU 108a-108d to interface subsystems 102a-102d to on-chip bus 104, zero or more selected ones of subsystems 102a-102d may be removed, while other subsystems 102a-102d may be flexibly added to SOC 100.

[0027] Similarly, arbiter 106 may be any one of a number of bus arbiters known in the art. The facilities of DTU 108a-108d and the teachings incorporated into the core logic of subsystems 102a-102d to practice the present invention will be described in turn below.

Method

[0028] Referring now to FIG. 2, wherein a flow chart illustrating a method of the present invention, in accordance with one embodiment, is shown. As illustrated, in accordance with the present invention, subsystems 102a-102d initiate transactions with one another, using the facilities of their corresponding DTU 108a-108d to locally prioritize the order the transactions of the corresponding subsystems are to be serviced, for arbitration for access to bus 104, block 202.

[0029] Further, for the embodiment, for each transaction, each subsystem 102a-102d also includes as part of the transaction the bus arbitration priority the corresponding DTU 108a-108d is to use to arbitrate for access to bus 104, when servicing the transaction in the prioritized manner.

[0030] In response, DTU 108a-108d service the transactions of the respective subsystems 102a-102d accordingly, and arbitrating for access to bus 104, using the bus arbitration priorities included among the transactions. Arbiter 106 in turn grants accesses to bus 104 based on the bus arbitration priorities of the contending transactions, block 204.

[0031] In one embodiment, arbiter 106 grants access strictly by the transaction priorities, e.g. in a three priority implementation, all high priority transactions will be granted access first, before the medium priority transactions are granted access, and finally the low priority transactions are granted access. In another embodiment, arbiter 106 further employs certain non-starvation techniques, to ensure the medium and/or low priority transactions will also be granted access to bus 104. The non-starvation techniques may be any one of a number of such techniques known in the art.

[0032] Still referring to FIG. 2, once granted access to bus 104, the grantee DTU 108* places the grantee transaction on bus 104 (through e.g. the shared data/address lines of the earlier described embodiment). In one embodiment, the transaction includes address of the targeted subsystem 102*. In response, once placed onto bus 104, the addressee subsystem 102* claims the transaction, and acknowledges the transaction, or if the subsystem 102* is busy, instructs the requesting subsystem 102* to retry later, block 206. If acknowledgement is given and a reply is due (as in the case of a read request), the reply is later initiated as a reply transaction. In other words, for the embodiment, “read” transactions are accomplished in a “split” manner.

[0033] In the present application, for ease of designation, the trailing “*” of a reference number denotes one of the instances of reference. For example, 108* means either 108a, 108b, 108c or 108d.

Exemplary Transaction Formats

[0034] FIGS. 3a-3b illustrate two exemplary transaction formats, a request format and a reply format, suitable for use to practice the present invention, in accordance with one embodiment. As illustrated in FIG. 3a, exemplary request transaction 302 includes three parts, first header 301a, second header 301b, and optional data portion 312. First header 301 a includes in particular, a command or request code 304, which for the embodiment, includes the bus arbitration priority, and address 306 of the target subsystem 102*. The various subsystems 102a-102d of SOC 100 are assumed to be memory mapped. Arbitration is initiated by a DTU 108* requesting arbiter 106 for access (through e.g. the earlier described subsystem based request lines), including with the request the included bus arbitration priority in the command portion 304 of first header 301a. Second header 301b includes an identifier identifying a function of the originating subsystem 102*, allowing subsystem 102* to be a multifunction subsystem and be able to associate transactions with the various functions. Second header 301b also includes size 310. For write transactions, size 310 denotes the size of the write data to follow (the “optional” data portion), in number of bytes. For read transactions, size 310 denotes the size of the data being accessed (i.e. read), also in number of bytes.

[0035] As illustrated in FIG. 3b, exemplary reply transaction 322 also includes three parts, first header 321a, second header 321b and data 332. First header 321a includes in particular, a command or request code 324, which includes the bus arbitration priority, identifier 328 which identifies the subsystem and its function, and low order byte of targeted address 326a of the replying subsystem 102*. As alluded earlier, data 332 includes the data being accessed/read by the original read request. Again, arbitration is initiated by a DTU 108* requesting arbiter 106 for access (through e.g. the earlier described subsystem based request lines), including with the request the included bus arbitration priority in the command portion 324 of first header 321a. Second header 321b includes the remaining high order bytes targeted address 326a of the replying subsystem 102*. Accordingly, employment of these transaction formats enables a subsystem 102* to communicate with another subsystem 102* at any byte position, reducing the number of operations for unaligned data transfers.

[0036] In one embodiment, different commands are supported for “conventional” data versus control, and voice data. More specifically, for the embodiment, the commands supported are: 1 Command Code Command Description 000 Reserved Reserved 001 DRead Data Read Request 010 CRead Control Read Request 011 VRead Voice Read Request 100 DWrite Data Write Request 101 CWrite Control Write Request 110 VWrite Voice Write Request 111 Reply Read Reply

Data Transfer Units

[0037] FIG. 4 illustrates DTU 108* in further details, in accordance with one embodiment. As illustrated, DTU 108* includes a number of pairs of outbound and inbound transaction queues 402* and 404*, one pair each for each priority level. For example, in one embodiment where DTU 108* supports three levels of priority, high, medium and low, DTU 108* includes three pairs of outbound and inbound transaction queues 402a and 404a, 402b and 404b, and 402c and 404c, one each for the high, medium and low priorities. In another embodiment, DTU 108* supports two levels of priority, high and low, DTU 108* includes two pairs of outbound and inbound transaction queues 402a and 404a, and 402b and 404b, one each for the high and low priorities. Of course, in other embodiments, DTU 108* may support more than three levels of priority or less than two levels of priority, i.e. no prioritization.

[0038] Additionally, DTU 108* includes outbound transaction queue service state machine 406 and inbound transaction queue service state machine 408, coupled to the transaction queues 402* and 404* as shown. Outbound transaction queue service state machine 406 services, i.e. processes, the transactions placed into the outbound queues 402* in order of the assigned priorities of the queues 402* and 404*, i.e. with the transactions queued in the highest priority queue being serviced first, then the transaction queued in the next highest priority queue next, and so forth.

[0039] For each of the transactions being serviced, outbound transaction queue service state machine 406 provides the control signals to the corresponding outbound queue 402* to output on the subsystem's request lines, the included bus arbitration priority of the first header of the “oldest” (in turns of time queued) transaction of the queue 402*, to arbitrate and compete for access to bus 104 with other contending transactions of other subsystems 102*. Upon being granted access to bus; 104 (per the state of the subsystem's grant lines), for the embodiment, outbound transaction queue service state machine 406 provides the control signals to the queue 402* to output the remainder of the transaction, e.g. for the earlier described transaction format, the first header, the second header and optionally, the trailing data.

[0040] Similarly, inbound transaction queue service state machine 408 provides the control signals to the corresponding inbound queue 402* to claim a transaction on bus 104, if it is determined that the transaction is a new request transaction of the subsystem 102* or a reply transaction to an earlier request transaction of the subsystem 102*. Additionally, in one embodiment, if the claiming of a transaction changes the state of the queue 404* from empty to non-empty, inbound transaction queue service state machine 408 also asserts a “non-empty” signal for the core logic (not shown) of the subsystem 102*.

[0041] In due course, the core logic, in view of the “non-empty” signal, requests for the inbound transactions queued. In response, inbound transaction queue service state machine 408 provides the control signals to the highest priority non-empty inbound queue to cause the queue to output the “oldest” (in turns of time queued) transaction of the queue 404*. If all inbound queues 404* become empty after the output of the transaction, inbound transaction queue service state machine 408 de-asserts the “non-empty” signal for the core logic of the subsystem 102*.

[0042] Thus, under the present invention, a core logic of a subsystem 102* is not only able to influence the order its transactions are being granted access to bus 104, relatively to transactions of other subsystems 102*, through specification of the bus arbitration priorities in the transactions' headers, a core logic of a subsystem 102*, by selectively placing transactions into the various outbound queues 402* of its DTU 108*, may also utilize the facilities of DTU 108* to locally prioritize the order in which its transactions are to be serviced to arbitrate for access for bus 104.

[0043] Queue pair 402* and 404* may be implemented via any one of a number of “queue” circuitry known in the art. Similarly, state machines 406-408, to be described more fully below, may be implemented using any one of a number programmable or combinatory circuitry known in the art. In one embodiment, assignment of priorities to the queues pairs 402* and 404* are made by programming a configuration register (not shown) of DTU 108*. Likewise, such configuration register may be implemented in any one of a number of known techniques.

State Machines

[0044] Referring now to FIGS. 5s and 5b wherein two state diagrams illustrating the operating states and transitional rules of state machines 406 and 408 respectively, in accordance with one embodiment, are shown. The embodiment assumes three pairs of queues 402a and 404a, 402b and 404b, and 402c and 404c, having three corresponding level of assign priorities, high, medium and low.

[0045] As illustrated in FIG. 5a, initially, for the embodiment, state machine 406 is in idle state 502. If state machine 406 detects that the high priority queue 402 a is non-empty, it transitions to first arbitrate state 504, and arbitrate for access to bus 104 for the “oldest” (in terms of time queued) transaction queued in the high priority queue 402a. However, if while in idle state 502, state machine 406 detects that the high priority queue 420a is empty and the medium priority queue 402b is not empty, it transitions to second arbitrate state 508, and arbitrate for access to bus 104 for the “oldest” (in terms of time queued) transaction queued in the medium priority queue 402b. Similarly, if while in idle state 502, state machine 406 detects that both the high and medium priority queues 402a-402b are empty, and the low priority queue 402c is not empty, it transitions to third arbitrate state 512, and arbitrate for access to bus 104 for the “oldest” (in terms of time queued) transaction queued in the low priority queue 402c. If none of these transition conditions are detected, state machine 406 remains in idle state 502.

[0046] Upon arbitrating for access to bus 104 for the “oldest” (in terms of time queued) transaction queued in the highest priority queue 402a after entering first arbitrate state 504, state machine 406 remains in first arbitrate state 504 until the bus access request is granted. At such time, it transitions to first placement state 506, where it causes the granted transaction in the high priority queue 404a to be placed onto bus 104.

[0047] From first placement state 506, state machine 406 returns to one of the three arbitrate states 504, 508 and 512 or idle state 502, depending on whether the high priority queue 402a is empty, if yes, whether the medium priority queue 402b is empty, and if yes, whether the low priority queue 402c is also empty.

[0048] Similarly, upon arbitrating for access to bus 104 for the “oldest” (in terms of time queued) transaction queued in the medium priority queue 402b after entering second arbitrate state 508, state machine 406 remains in second arbitrate state 508 until the bus access request is granted. At such time, it transitions to second placement state 510, where it causes the granted transaction in medium priority queue 402b to be placed onto bus 104.

[0049] From second placement state 510, state machine 406 returns to one of the three arbitrate states 504, 508 and 512 or idle state 502, depending on whether the high priority queue 402a is empty, if yes, whether the medium priority queue 402b is empty, and if yes, whether the low priority queue 402c is also empty.

[0050] Likewise, upon arbitrating for access to bus 104 for the “oldest” (in terms of time queued) transaction queued in the low priority queue 402c, state machine 406 remains in third arbitrate state 512 until the bus access request is granted. At such time, it transitions to third placement state 514, where it causes the granted transaction in low priority queue 402b to be placed onto bus 104.

[0051] From third placement state 514, state machine 406 returns to one of the three arbitrate states 504, 508 and 512 or idle state 502, depending on whether the high priority queue 402a is empty, if yes, whether the medium priority queue 402b is empty, and if yes, whether the low priority queue 402c is also empty.

[0052] As illustrated in FIG. 5b, initially, for the embodiment, state machine 408 is also in idle state 602. While at idle state 602, if no transaction on bus 104 is addressed to the subsystem 102* (or one of the functions of the subsystem 102*, in the case of a reply transaction), nor are there any pending request for data from the core logic of the subsystem 102*, state machine 408 remains in idle state 602.

[0053] However, if the presence of a transaction on bus 104 addressed to the subsystem 102* (or one of the functions of the subsystem 102*, in the case of a reply transaction) is detected, state machine 408 transitions to claim state 604, where it provides control signals to the appropriate queue 404* to claim the transaction, and acknowledges the transaction.

[0054] If claiming of the transaction changes the state of the queues from all empty to at least one queue not empty, state machine 408 transitions to the notify state 606, in which it asserts the “non-empty” signal for the core logic of subsystem 102*, as earlier described.

[0055] From notify state 606, state machine 408 transitions to either claim state 604 if there is another transaction on bus 104 addressed to the subsystem 102* (or a function of the subsystem 102*, in case of a reply), or output state 608, if there is a pending request for data from the core logic of the subsystem 102*. From output state 608, state machine 408 either transitions to claim state 604 another transaction on bus 104 addressed to the subsystem 102* (or a function of the subsystem 102*, in case of a reply) is detected, remains in output state 608 if there is no applicable transaction on bus 104, but request for data from the core logic is outstanding, or returns to idle state 602, if neither of those two conditions are true. 2 Bus Signals, Timing and Rules In one embodiment, the bus signals supported are as follows: Signal Name Signal Width Description MSCLK 1 Bus Clock (e.g. 25-100 MHz) MSRST 1 System Bus Reset MSAD[31:0] 32 Address/Data (tri-state, bi-directional) MSCYC 1 Shared among subsystems to denote master bus cycles MSREQ-1:0] pair for each Bus request, 2 per subsystem to gain owner- subsystem ship of bus MSGNT # of Bus grant-signifies subsystem own the bus subsystems MSSEL 1 Slave select-signifies subsystem has been selected (tri-state) MSRDY 1 Master ready-signifies master data ready on the bus (tri-state) MSBSY 1 Slave Busy-signifies selected device is busy (tri-state) MSINT # of Interrupt request subsystems

[0056] In one embodiment, the request codes supported are as follows: 3 Req[1:0] Request Type 00 Idle-no request 01 Low priority (“conventional” Data) 10 Medium priority (Control) 11 High priority (Voice and Replies)

[0057] FIGS. 6a-6c are three timing diagrams illustrating the timings of the various signals of the above described embodiment, for burst write timing, write followed by read timing and read followed by write timing (different subsystems) respectively.

[0058] In one embodiment, the maximum burst transfer size is 64-bytes of data (+8 bytes for the transaction header). The subsystems guarantee the burst transfers to be within a page. The slave devices would accept the maximum sized transfer (64 bytes +header) before generating the above described MSSEL signal.

[0059] In one embodiment, each data transfer unit would permit only one Read request to be outstanding. If a Read request is pending, the subsystem would not accept requests from other masters until the reply to the outstanding Read request has been received. This advantageously prevents a deadlock condition. The subsystem may, however, continue to generate write requests.

[0060] In alternate embodiments, the present invention may be practiced with other approaches being employed to address these and other operational details.

Alternate Embodiment with Data Traffic Router

[0061] Referring now to FIG. 7, wherein another overview of SOC 100, including the employment of data traffic router 110, in accordance with another embodiment is shown. As will be described in more detail below, data traffic router 110 advantageously enables selected combinations of the attached subsystems, such as subsystem A 102a, subsystem B 102b, subsystem C 102c, and the subsystem among subsystems D-G 102d-102g granted access to on-chip bus 104 to concurrently communicate with one another. For example, subsystem A 102a may be communicating with subsystem B 102b, while at the same time subsystem B 102b may be communicating with subsystem C 102c, subsystem C communicating with one of subsystem D-G 102d-102g having been granted access to on-chip bus 104, and so forth. Thus, data traffic router 110 is particularly useful for embodiments of SOC 100 where a subset of the subsystems 102 a-102g has a relatively higher volume of communications with other subsystems. For example, in one embodiment of SOC 100 comprising a processor controlling the overall operation of SOC 100, an on-chip memory, and an external device controller having multiple external devices attached to them, these subsystems, i.e. the processor, the on-chip memory, and the external device controller all have relatively more communication needs than other subsystems 102* of SOC 100.

Data Traffic Router

[0062] FIG. 8 illustrates the data paths 110a of data traffic router 110 in accordance with one embodiment. As illustrated, for the embodiment, multiplexor 802a is coupled to subsystem B 102b, subsystem C 102c and at any instance in time, one of subsystems D-G 102d-102g, the current grantee subsystem of on-chip bus 104, to select and output one of the outputs of these subsystems for subsystem A 102a. Similarly, multiplexor 802bis coupled to subsystem A 102a, subsystem C 102c and at any instance in time, one of subsystems D-G 102d-102g, the current grantee subsystem of on-chip bus 104, to select and output one of the outputs of these subsystems for subsystem B 102b. For the embodiment, multiplexor 802b may also select the output of subsystem B 102b and output it back for subsystem B 102b, thereby enabling two external devices attached to subsystem B 102b to communicate with one another.

[0063] In like manner, multiplexor 802c is coupled to subsystem A 102a, subsystem B 102b, and at any instance in time, one of subsystems D-G 102d-102g, grantee subsystem of on-chip bus 104, to select and output one of the outputs of these subsystems for subsystem C 102b, and multiplexor 802d is coupled to subsystem A 102a, subsystem B 102b, and subsystem C 802c to select and output one of the outputs of these subsystems for one of subsystem D-G 102d-102g, the current grantee subsystem of on-chip bus 104.

[0064] Thus, it can be seen by selectively configuring multiplexors 802a-802d, communications between a subset of selected combinations of subsystem A-G 102a-102g may be facilitated concurrently.

[0065] FIG. 9 illustrates the control aspect 110b of data traffic router 110, in accordance with one embodiment. As illustrated, for the embodiment, the control aspect 110b of data traffic router 110 includes configurator 902 and configuration states storage unit 904 coupled to one another. Further, configurator 902 is coupled to the various subsystems, i.e. subsystem A 102a, subsystem B 102b, subsystem C 102c, and in any instance in time, one of subsystems D-G 102d-102g, the grantee subsystem of on-chip bus 104, and receiving the outputs of these subsystems. More specifically, for the embodiment, recall, upon granted access to on-chip bus 104, each of these subsystems outputs the header of a transaction, which includes an address of the addressee subsystem, denoting the destination of the output.

[0066] Thus, based at least on the headers of the transactions, configurator 902 is able to discern the desired destinations of the outputs of the various subsystems. In response, configurator 902 generates and provides control signals to multiplexors 802a-802d to configure multiplexors 802a-802d to correspondingly select the appropriate inputs of the multiplexors 802a-802d to provide the appropriate data paths for the data to reach the desired destinations, if configurator 902 is able to do so. That is, if the required multiplexors 802a-802d have not already been configured to provide data paths for other communications first.

[0067] Thus, configurator 902 generates and provides control signals to multiplexors 802a-802d to configure multiplexors 802a-802d based at least in part on the current configuration states of multiplexors 802a-802d. If a needed multiplexor is idle, and may be configured to provide the desired path. Configurator 902 generates and outputs the appropriate control signal for the multipelxor to configure the multiplexor to provide the appropriate path. Moreover, the new configuration state of the multiplexor is tracked and stored in configuration state storage unit 904 for use in subsequent configuration decisions. On the other hand, if the multiplexor is busy, that is it has already been configured to facilitate another communication, for the embodiment, configurator 902 notifies the subsystem accordingly. More specifically, for the embodiment, configurator 902 “retries” the subsystem, notifying the source subsystem that the destination subsystem is busy.

[0068] For the embodiment, as described earlier, upon granted access to on-chip bus 104, each subsystem drives a control signal denoting the beginning of a transaction, and at the end of the transaction, deasserts the control signal, denoting the end of the transaction. Thus, responsive to the deassertion of the control signal denoting the end of the transaction, configurator 902 returns the corresponding multiplexor to the idle state, making the multiplexor available to be configured in a next cycle to facilitate another communication.

Subsystem Providing Multiple Access Paths-Memory

[0069] FIG. 10 illustrates yet another overview of SOC 100, including a private/unshared access path between two most frequently communicated subsystems, in accordance with one embodiment. As illustrated, among the subsystems With relatively higher communication needs, such as subsystems A-C 102a-102c, a couple of these subsystems have the highest need for timely high priority communications. Thus, it would be advantageous to provide a private/unshared communication link between these two most frequently communicated subsystems. Examples of two such subsystems are processor of SOC 100 responsible for controlling the overall operation of SOC 100, and on-chip memory of SOC 100.

[0070] FIG. 11 illustrates an exemplary memory subsystem equipped with multiple data transfer interfaces 1106a and 1106b, one data transfer interface 1106a to operate and facilitate communication with other subsystems as earlier described, and another data transfer interface 1106b to operate and facilitate communication with the other most frequently communicated subsystem, such as the earlier described processor, in a complementary manner.

[0071] In addition to the two data transfer interfaces 1106a-1106b, memory 1100 further includes memory array 1102, a number of storage structures 1114 -1118, a number of multiplexors 1112a-1112c and two control blocks 1104a-1104b, coupled to each other as shown. First data transfer interface 1106a, for the embodiment, includes two inbound queues 1108a and 1108b and an outbound queue 1110a, whereas second data transfer interface 1106b includes one each, an inbound queue 1108c and an outbound queue 1110b.

[0072] During operation, memory accesses from processor as well as other subsystems accessing memory 1100 in the earlier described manner are queued in inbound queues 1108a-1108b in accordance with their priorities. However, super high priority memory accesses from processor accessing memory 1100 through the private non-shared communication link are queued in inbound queue 1108c. Multiplexor 1112a coupled to these queues 1108a-1108c, under the control of access control 1104a, selects memory accesses queued in these queues in order of their priorities and sequences the memory accesses into sequential storage structure 1114. The sequenced memory accesses are released in turn, accessing memory array 1102, and causing the data in the desired locations to be outputted.

[0073] In addition to controlling the operation of multiplexor 1112a and sequential storage structure 1114, for each memory access, access control 1104a also causes the header of the reply transaction to the memory access be formed and staged in header storage structure 1116. Output responses to memory accesses from memory array 1102 are stored in output storage structure 1118.

[0074] Multiplexor 1112b under the control of response control 1104b selectively selects either headers queued in header storage structure 1116 and output data stored in output storage structure 1118, and outputs them for multiplexor 1112c. Multiplexor 1112c in turn, under the control of response control 1104b outputs the header/data to either outbound queue 1110a or outbound queue 1110.

[0075] Thus, it can be seen that memory 1110 is advantageously equipped to provide an additional private/unshared access path for a frequent access subsystem, such as a processor. Resultantly, inter-subsystem communication between subsystems of SOC 100 is further improved.

[0076] In various embodiments, to ensure data coherency, if subsystem A 102a (e.g. a system processor) has been notified (e.g. via interrupt) by one of the subsystems 102*, e.g. subsystem D 102d (such as a network media access controller) of the fact that subsystem D 102d has placed certain data into subsystem B 102b (a memory unit), and subsystem A 102a has a need to access the certain data, subsystem A 102a would ensure the certain data have actually been placed into subsystem B 102b (the memory unit) before accessing for that certain data through the direct unshared parallel access path. Subsystem A 102 (control processor) assumes that for efficiency of operation, subsystem D 102d places the certain data in subsystem B 102b (the memory unit) without requesting for confirmation, and notifies subsystem A 102a upon initiating the placement transaction. Thus, accessing the certain data via the unshared direct parallel access path immediately without the assurance may result in invalid data.

[0077] In one embodiment, subsystem A 102a (control processor) ensures that the certain data have actually been placed into subsystem B 102b (the memory unit) by first making “accesses” to subsystem B 102b (the memory unit) and subsystem D 102d via the shared access path, awaits for replies to these “accesses”, and defers making the access to subsystem B 102b (the memory unit) for the certain data via the unshared direct parallel access path until receiving the replies to these special “accesses”. The “accesses” to subsystem B 102b (the memory unit) and subsystem D 102d are made with appropriate priorities, equal to or less than the priority employed by subsystem D 102d in placing the certain data in subsystem B 102d (the memory unit). Accordingly, receipt of the replies by subsystem A 102a (control processor) ensures for subsystem A 102a that the certain data has indeed been placed into subsystem B 102b (the memory unit), by virtue of the manner transactions are processed through the respective DTU 108a-108d and 108d.

Conclusion and Epilogue

[0078] Thus, it can be seen from the above descriptions, an improved method and apparatus for inter-subsystem communication between subsystems of a SOC has been described. The novel scheme includes features incorporable in data transfer interfaces to facilitate additional private channels of communications between subsystems frequently communicate with one another. While the present invention has been described in terms of the foregoing embodiments, those skilled in the art will recognize that the invention is not limited to these embodiments. The present invention may be practiced with modification and alteration within the spirit and scope of the appended claims. Thus, the description is to be regarded as illustrative instead of restrictive on the present invention.

Claims

1. A memory unitfor use in an integrated circuit (IC) comprising:

an array of memory cells;

a first data transfer interface coupled to the array of memory cells to provide a first access path for a selected one of a processor and a plurality of subsystems of the IC to access said array of memory cells;

a second data transfer interface coupled to the array of memory cells to provide a second access path for said processor to access said array of memory cells; and

a controller coupled to the array of memory cells and the first and second data transfer interfaces to control said array of memory cells and said first and second data transfer interfaces to facilitate concurrent accesses of said memory unit by said processor and said subsystems.

2. The memory unit of claim 1, wherein said first data transfer interface comprises

a first inbound queue coupled to said array of memory cells for queuing a first plurality of memory accesses of said processor and said subsystem of a first priority;

a second inbound queue coupled to said array of memory cells for queuing a second plurality of memory accesses of said processor and said subsystem of a second priority; and

an outbound queue coupled to said array of memory cells for queuing output responses to said first and second plurality of memory accesses of said processor and said subsystem of said first and second priorities accessed through said first and second inbound queues.

3. The memory unit of claim 1, wherein said second data transfer interface comprises

an inbound queue coupled to said array of memory cells for queuing a first plurality of memory accesses of said processor; and

an outbound queue coupled to said array of memory cells for queuing output responses to said first plurality of memory accesses of said processor.

4. The memory unit of claim 1, wherein said controller comprises

a sequential storage structure coupled to said array of memory cells;

a multiplexor coupled to inbound queues of said first and second data transfer units and said sequential storage structure to sequence memory accesses queued in said inbound queues into said sequential storage structure; and

a state machine coupled to said sequential storage structure, said multiplexor, and said inbound queues of said first and second data transfer units to control their operation.

5. The memory unit of claim 4, wherein said state machine controls said multiplexor to sequence memory accesses queued in said second data transfer interface into said sequential storage structure before sequencing any memory access queued in said first data transfer interface into said sequential storage structure.

6. The memory unit of claim 4, wherein said state machine controls said multiplexor to sequence memory accesses queued in inbound queues of said first data transfer interface into said sequential storage structure, in accordance with assigned priorities of said inbound queues.

7. The memory unit of claim 1, wherein said controller comprises

a first sequential storage structure to stage headers for output responses to memory accesses;

a second sequential storage structure coupled to said array of memory cells to stage output responses to memory accesses;

a first multiplexor coupled to said first and second sequential storage structures to selective output one of said staged headers of output responses to memory accesses and said staged output responses to memory accesses;

a second multiplexor coupled to said first multiplexor and outbound queues of said first and second data transfer units to selective output the selected output of said first multiplexor to a selected one of said outbound queues of said first and second data transfer unit; and

a state machine coupled to said first and second sequential storage structures, said first and second multiplexors, and said outbound queues of said first and second data transfer units to control their operation.

8. In a memory unit of an integrated circuit (IC), a method of operation comprising:

queuing first memory accesses of a processor and a plurality of subsystems of the IC in inbound queues of a first data transfer interface;

queuing second memory accesses of the processor in an inbound queue of a second data transfer interface;

sequencing said first and second memory accesses into a single sequence of memory accesses; and

servicing said first and second memory accesses in accordance with their sequence order.

9. The method of claim 8, wherein said queuing of said first memory accesses of a processor and a plurality of subsystems of the IC in inbound queues of a first data transfer interface comprises queuing said first memory accesses into inbound queues of said first data transfer interface having associated priorities, in accordance with priorities of said first memory accesses.

10. The method of claim 8, wherein said sequencing comprises sequencing second memory accesses queued in said second data transfer interface before sequencing any memory access queued in said first data transfer interface.

11. The method of claim 8, wherein said sequencing comprises sequencing first memory accesses queued in inbound queues of said first data transfer interface, in accordance with assigned priorities of the inbound queues.

12. The method of claim 8, wherein said servicing comprises generating and queuing headers for output responses to said first and second memory accesses.

13. The method of claim 8, wherein said servicing comprises queuing output responses to said first and second memory accesses.

14. The method of claim 8, wherein said servicing comprises merging headers for output responses to said first and second memory accesses and output responses to said first and second memory accesses.

15. The method of claim 8, wherein said servicing comprises selectively outputting headers for output responses to said first and second memory accesses and output responses to said first and second memory accesses to a selected one of said first and said second data transfer interfaces.

16. An integrated circuit comprising:

a processor;

a plurality of subsystems; and

a memory unit coupled to said processor and said subsystems having at least a first access path to facilitate access by a selected one of said processor and said subsystem to access said memory unit and a second access path to facilitate access by said processor.

17. The integrated circuit of claim 16, wherein said memory unit comprises

an array of memory cells;

a first data transfer interface coupled to the array of memory cells to provide said first access path for said selected one of said processor and said subsystems;

a second data transfer interface coupled to the array of memory cells to provide said second access path for said processor; and

a controller coupled to the array of memory cells and the first and second data transfer interfaces to control said array of memory cells and said first and second data transfer interfaces.

18. The integrated circuit of claim 17, wherein said first data transfer interface comprises

a first inbound queue coupled to said array of memory cells for queuing a first plurality of memory accesses of said processor and said subsystem of a first priority;

a second inbound queue coupled to said array of memory cells for queuing a second plurality of memory accesses of said processor and said subsystem of a second priority; and

an outbound queue coupled to said array of memory cells for queuing output responses to said first and second plurality of memory accesses of said processor and said subsystem of said first and second priorities accessed through said first and second inbound queues.

19. The integrated circuit of claim 17, wherein said second data transfer interface comprises

an inbound queue coupled to said array of memory cells for queuing a first plurality of memory accesses of said processor; and

an outbound queue coupled to said array of memory cells for queuing output responses to said first plurality of memory accesses of said processor.

20. The integrated circuit of claim 17, wherein said controller comprises

a sequential storage structure coupled to said array of memory cells;

a multiplexor coupled to inbound queues of said first and second data transfer units and said sequential storage structure to sequence memory accesses queued in said inbound queues into said sequential storage structure; and

a state machine coupled to said sequential storage structure, said multiplexor, and said inbound queues of said first and second data transfer units to control their operation.

21. The integrated of claim 20, wherein said state machine controls said multiplexor to sequence memory accesses queued in said second data transfer interface into said sequential storage structure before sequencing any memory access queued in said first data transfer interface into said sequential storage structure.

22. The integrated circuit of claim 20, wherein said state machine controls said multiplexor to sequence memory accesses queued in inbound queues of said first data transfer interface into said sequential storage structure, in accordance with assigned priorities of said inbound queues.

23. The integrated circuit of claim 17, wherein said controller comprises

a first sequential storage structure to stage headers for output responses to memory accesses;

a second sequential storage structure coupled to said array of memory cells to stage output responses to memory accesses;

a first multiplexor coupled to said first and second sequential storage structures to selective output one of said staged headers of output responses to memory accesses and said staged output responses to memory accesses;

a second multiplexor coupled to said first multiplexor and outbound queues of said first and second data transfer units to selective output the selected output of said first multiplexor to a selected one of said outbound queues of said first and second data transfer unit; and

a state machine coupled to said first and second sequential storage structures, said first and second multiplexors, and said outbound queues of said first and second data transfer units to control their operation.

24. The integrated circuit of claim 16, wherein said integrated circuit further comprises an on-chip bus to which said memory unit and at least a subset of said subsystems are attached.

25. The integrated circuit of claim 16, wherein said integrated circuit further comprises a data traffic router to which said memory unit, said processor, and at least one of said subsystems is attached, said data traffic router facilitating concurrent communication between selected combinations of said memory unit, said processor and said at least one subsystem.

26. The integrated circuit of claim 16, wherein each of said subsystems is a selected one of a security engine, a voice processor, a collection of peripheral device controllers, a framer processor, a network media access controller, and an external device controller.

27. In an integrated circuit (IC), a method of operation comprising:

a processor and a plurality of subsystem of the IC successively making first memory accesses of a memory unit of the IC via a first access path in turn;

the processor also successively making second memory accesses to said memory unit via a second access path in parallel; and

the memory unit servicing said first and second memory accesses made through said first and second access paths in parallel.

28. The method of claim 27, wherein said servicing comprises

queuing said first memory accesses of said processor and said plurality of subsystems of the IC in inbound queues of a first data transfer interface of said memory unit;

queuing said second memory accesses of the processor in an inbound queue of a second data transfer interface of said memory unit;

sequencing said first and second memory accesses into a single sequence of memory accesses; and

servicing said first and second memory accesses in accordance with their sequence order.

29. The method of claim 28, wherein said queuing of said first memory accesses of said processor and said plurality of subsystems of the IC in inbound queues of a first data transfer interface comprises queuing said first memory accesses into inbound queues of said first data transfer interface having associated priorities, in accordance with priorities of said first memory accesses.

30. The method of claim 28, wherein said sequencing comprises sequencing second memory accesses queued in said second data transfer interface before sequencing any memory access queued in said first data transfer interface.

31. The method of claim 28, wherein said sequencing comprises sequencing first memory accesses queued in inbound queues of said first data transfer interface, in accordance with assigned priorities of the inbound queues.

32. The method of claim 28, wherein said servicing comprises generating and queuing headers for output responses to said first and second memory accesses.

33. The method of claim 28, wherein said servicing comprises queuing output responses to said first and second memory accesses.

34. The method of claim 28, wherein said servicing comprises merging headers for output responses to said first and second memory accesses and output responses to said first and second memory accesses.

35. The method of claim 28, wherein said servicing comprises selectively outputting headers for output responses to said first and second memory accesses and output responses to said first and second memory accesses to a selected one of said first and said second data transfer interfaces.

36. The method of claim 27, wherein the method further comprises

a first of said subsystem notifying said processor with respect to said first subsystem having placed a first data in said memory unit;

said processor making a first access of said first subsystem, and awaiting for a first reply to said first access of said first subsystem; and

said processor making a second access of said memory unit via said second access path for said first data placed into said memory unit by said first subsystem after at least receiving said first reply for said first access of said first subsystem.

37. The method of claim 38, wherein the method further comprises

said processor making a second access of said memory unit via said first access path, and awaiting for a second reply to said second access of said memory unit; and

said processor making said first access of said memory unit via said second access path for said first data placed into said memory unit by said first subsystem after also receiving said second reply for said second access of said memory unit.

38. The method of claim 27, wherein the method further comprises

a first of said subsystem notifying said processor with respect to said first subsystem having placed a first data in said memory unit;

said processor making a first access of said memory unit via said first access path, and awaiting for a first reply to said first access of said memory unit; and

said processor making a second access of said memory unit via said second access path for said first data placed into said memory unit by said first subsystem after at least receiving said first reply for said first access of said memory unit.