Abstract: Computing apparatus (11) includes a plurality of processor cores (12) and a cache (10), which is shared by and accessible simultaneously to the plurality of processor cores. The cache includes a shared memory (16), including multiple block frames of data imported from a level-two (L2) memory (14) in response to requests by the processor cores, and a shared tag table (18), which is separate from the shared memory and includes table entries that correspond to the block frames and hold respective information regarding the data in the block frames.
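The abstract above can be illustrated with a minimal software model of the arrangement it describes: a shared memory of block frames, a separate tag table with one entry per frame, and a backing L2 memory from which blocks are imported on a miss. All names, sizes, and the direct-mapped placement policy below are illustrative assumptions, not details taken from the patent.

```python
BLOCK_SIZE = 4   # words per block frame (assumed)
NUM_FRAMES = 8   # number of block frames in the shared memory (assumed)

class L2Memory:
    """Backing level-two memory returning whole blocks."""
    def fetch_block(self, block):
        base = block * BLOCK_SIZE
        return [base + i for i in range(BLOCK_SIZE)]  # word value == its address

class SharedCache:
    """Shared memory of block frames plus a separate tag table."""
    def __init__(self, l2):
        self.l2 = l2
        self.frames = [[0] * BLOCK_SIZE for _ in range(NUM_FRAMES)]  # shared memory (16)
        self.tags = [None] * NUM_FRAMES  # separate tag table (18): one entry per frame

    def read(self, address):
        block, offset = divmod(address, BLOCK_SIZE)
        frame = block % NUM_FRAMES  # direct-mapped placement (assumed)
        if self.tags[frame] != block:        # miss: import the block from L2
            self.frames[frame] = self.l2.fetch_block(block)
            self.tags[frame] = block
        return self.frames[frame][offset]

cache = SharedCache(L2Memory())
print(cache.read(10))  # miss: block 2 is imported, word 10 returned
print(cache.read(11))  # hit in the same block frame, word 11 returned
```

Keeping the tag table in a structure separate from the data frames, as in the model, lets lookups consult only the small tag array while the larger data store is touched only on a confirmed hit or fill.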
Abstract: An apparatus (10) includes a first plurality of processor cores (200) and a Central Scheduling/Synchronization Unit (CSU, 110), which is coupled to allocate computing tasks for execution by the processor cores. A second plurality of Distribution Units (DUs, 2000) is arranged in a logarithmic network (1000) between the CSU and the processor cores and configured to distribute the computing tasks from the CSU among the processor cores. Each DU includes an associative task registry (2200) for storing information with regard to the computing tasks distributed to the processor cores by the DU.
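The distribution scheme in the abstract above can be sketched as a central scheduler feeding a small logarithmic tree of distribution units, each of which records the tasks it forwards in its own registry. The class names, the binary tree shape, and the round-robin forwarding policy are illustrative assumptions only.

```python
class Core:
    """Processor core that simply records the tasks it executes."""
    def __init__(self, core_id):
        self.core_id = core_id
        self.executed = []
    def run(self, task):
        self.executed.append(task)

class DU:
    """Distribution unit: forwards tasks toward cores, logging them in a registry."""
    def __init__(self, children):
        self.children = children   # child DUs or Cores
        self.registry = {}         # associative task registry (2200)
        self.next_child = 0
    def distribute(self, task_id, task):
        self.registry[task_id] = task                  # remember the task we handled
        child = self.children[self.next_child]
        self.next_child = (self.next_child + 1) % len(self.children)
        if isinstance(child, Core):
            child.run(task)
        else:
            child.distribute(task_id, task)

class CSU:
    """Central scheduling/synchronization unit feeding the root DU."""
    def __init__(self, root):
        self.root = root
        self.counter = 0
    def schedule(self, task):
        self.root.distribute(self.counter, task)
        self.counter += 1

cores = [Core(i) for i in range(4)]
leaves = [DU(cores[0:2]), DU(cores[2:4])]   # two-level logarithmic tree (assumed depth)
root = DU(leaves)
csu = CSU(root)
for t in ["t0", "t1", "t2", "t3"]:
    csu.schedule(t)
# Each DU's registry now records exactly the tasks that passed through it.
```

Because every DU keeps its own registry of the tasks it distributed, status queries or synchronization messages can be resolved at the first DU on the path that knows the task, without involving the central unit.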
Abstract: A shared memory system for a multicore computer system utilizes an interconnection network that enables tens of processing cores or more to refer concurrently to random addresses in a shared memory space with efficiency comparable to that typically achieved when referring to private memories. The network is essentially a lean and lightweight combinational circuit, although it may also contain shallow pipelining. The network is generally composed of a sub-network for writing and a separate multicasting sub-network for reading, whose topologies are based on multiple logarithmic multistage networks, e.g. Baseline Networks, connected in parallel. The shared memory system computes paths between processing cores and memory banks anew at every clock cycle, without rearrangement.
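The per-cycle path computation described above can be sketched as a toy model of one sub-network: requests from cores to memory banks traverse log2(N) switching stages, each stage steered by one bit of the destination bank address; when two requests contend for the same switch output, one is dropped and must retry on a later cycle. The function name, the bit-per-stage butterfly-style routing, and the drop-on-conflict policy are assumptions for illustration; the multicasting read sub-network is not modeled.

```python
def route_cycle(requests, num_banks):
    """One clock cycle of routing: requests maps core_id -> destination bank.

    Returns (served, dropped): the requests that reached their bank this
    cycle, and the core ids whose requests lost an output-port conflict.
    """
    stages = num_banks.bit_length() - 1   # log2(num_banks) switching stages
    # Each in-flight request is (core_id, bank, current_port).
    inflight = [(core, bank, core) for core, bank in requests.items()]
    dropped = []
    for s in range(stages):
        taken = set()
        nxt = []
        for core, bank, port in inflight:
            # Stage s sets bit s of the port to bit s of the destination bank.
            bit = (bank >> s) & 1
            out_port = (port & ~(1 << s)) | (bit << s)
            if (s, out_port) in taken:
                dropped.append(core)   # output conflict: drop, retry next cycle
            else:
                taken.add((s, out_port))
                nxt.append((core, bank, out_port))
        inflight = nxt
    served = {core: bank for core, bank, port in inflight}
    return served, dropped

# Four cores, four banks; cores 0 and 1 both target bank 2 and conflict.
served, dropped = route_cycle({0: 2, 1: 2, 2: 1, 3: 0}, num_banks=4)
print(served)   # {0: 2, 2: 1, 3: 0}
print(dropped)  # [1]
```

Recomputing all paths from scratch each cycle, as the model does, avoids any rearrangement of previously established connections: the network holds no routing state between cycles, which is what allows it to remain a lean combinational circuit.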