Abstract: A plurality of chiplets may be used to multiply two matrices A and B. Matrix A may be decomposed into horizontal stripes and matrix B may be decomposed into vertical stripes. Each of the horizontal stripes may be multiplied by each of the vertical stripes to form the output matrix C. Specifically, the horizontal stripes of matrix A may be stored in a stationary, distributed manner across the chiplets, while the vertical stripes (or sub-vertical stripes) of matrix B may be passed between respective pairs of the chiplets until each of the vertical stripes (or sub-vertical stripes) has been received and processed by each of the chiplets. The vertical stripes may be passed along one or more paths that interconnect the chiplets. Similar techniques can be applied to an arrangement in which the vertical stripes are stationary and the horizontal stripes (or sub-horizontal stripes) are passed between respective pairs of the chiplets.
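The stripe-passing scheme described above can be illustrated with a small software simulation. This is a minimal sketch, not the claimed hardware. It assumes a single ring path connecting the chiplets, one horizontal stripe and one vertical stripe per chiplet, and matrix dimensions that divide evenly by the chiplet count. The function name `ring_stripe_matmul` and its parameters are hypothetical.

```python
def ring_stripe_matmul(A, B, num_chiplets):
    """Simulate C = A x B with stationary horizontal stripes of A and
    vertical stripes of B passed around a ring of chiplets.

    Assumptions (not from the source): one ring path, one stripe of each
    kind per chiplet, and dimensions divisible by num_chiplets.
    """
    n_rows, inner, n_cols = len(A), len(B), len(B[0])
    assert n_rows % num_chiplets == 0 and n_cols % num_chiplets == 0
    stripe_h = n_rows // num_chiplets  # rows per horizontal stripe of A
    stripe_w = n_cols // num_chiplets  # columns per vertical stripe of B

    # Stationary data: chiplet p keeps horizontal stripe p of A.
    a_stripes = [A[p * stripe_h:(p + 1) * stripe_h] for p in range(num_chiplets)]

    # Moving data: chiplet p starts with vertical stripe p of B, tagged with
    # the stripe index so the result lands in the right columns of C.
    b_held = [(p, [row[p * stripe_w:(p + 1) * stripe_w] for row in B])
              for p in range(num_chiplets)]

    C = [[0] * n_cols for _ in range(n_rows)]
    for _ in range(num_chiplets):
        # Every chiplet multiplies its stationary stripe by the stripe it holds.
        for p in range(num_chiplets):
            q, b_stripe = b_held[p]
            for i in range(stripe_h):
                for j in range(stripe_w):
                    C[p * stripe_h + i][q * stripe_w + j] = sum(
                        a_stripes[p][i][k] * b_stripe[k][j] for k in range(inner)
                    )
        # Each chiplet passes its vertical stripe to the next chiplet in the ring.
        b_held = [b_held[(p - 1) % num_chiplets] for p in range(num_chiplets)]
    return C
```

After `num_chiplets` steps, every vertical stripe has visited every chiplet, so each block of C has been computed exactly once. The mirrored arrangement, with stationary vertical stripes and moving horizontal stripes, would swap the roles of `a_stripes` and `b_held`.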
Abstract: A first host device and a second host device are configured to communicate data via a data channel (e.g., a half-duplex channel). A control channel (e.g., a full-duplex channel) may be used to arbitrate the dynamic assignment of the respective roles of master and slave to the first and second host devices. The arbitration procedure may include transmitting a first message from the first host device to the second host device that specifies information for allocating one or more time slots to the first host device, and transmitting a second message from the second host device to the first host device that specifies information for allocating one or more time slots to the second host device. At each of the first and second host devices, a communication schedule for the data channel may be determined based on a rule set, the first message, and the second message.
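The schedule determination described above can be sketched as a deterministic function that both host devices evaluate on the same inputs. This is a minimal illustration under stated assumptions: each message is modeled as a set of requested slot indices, and the rule set is a made-up tie-break that alternates priority by slot parity. The abstract does not specify the message contents or the rules, so the name `determine_schedule` and everything inside it are hypothetical.

```python
def determine_schedule(first_msg, second_msg, num_slots):
    """Return, for each time slot, which host acts as master on the data channel.

    first_msg / second_msg: sets of slot indices requested by each host
    (an assumed message format). The rule set below is hypothetical.
    """
    schedule = []
    for slot in range(num_slots):
        wants_first = slot in first_msg
        wants_second = slot in second_msg
        if wants_first and wants_second:
            # Hypothetical tie-break rule: even slots favor the first host,
            # odd slots favor the second host.
            master = "first" if slot % 2 == 0 else "second"
        elif wants_first:
            master = "first"
        elif wants_second:
            master = "second"
        else:
            master = None  # no host requested this slot; the channel stays idle
        schedule.append(master)
    return schedule


# Both hosts hold both messages after the exchange on the control channel,
# so each computes the same schedule independently.
schedule_at_first_host = determine_schedule({0, 1, 2}, {1, 2, 3}, 5)
schedule_at_second_host = determine_schedule({0, 1, 2}, {1, 2, 3}, 5)
```

Because the function depends only on the rule set and the two messages, the hosts arrive at identical schedules without further negotiation, which is what lets a half-duplex data channel be shared without collisions in this sketch.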