Abstract: An example embodiment of the present invention provides processes relating to a cache coherence protocol for distributed shared memory. In one process, a DSM-management chip receives a request to modify a block of memory stored on a node that includes the chip and one or more CPUs, which request is marked for fast invalidation and comes from one of the CPUs. The DSM-management chip sends probes, also marked for fast invalidation, to DSM-management chips on other nodes where the block of memory is cached and responds to the original probe, allowing the requested modification to proceed without waiting for responses from the probes. Then the DSM-management chip delays for a pre-determined time period before incrementing the value of a serial counter which operates in connection with another serial counter to prevent data from leaving the node's CPUs over the network until responses to the probes have been received.