Abstract: A system and method for efficiently processing messages stored in multiple message stores is described. Metadata identifying a range of topically identical messages, extracted from a plurality of message stores storing a multiplicity of messages to be processed, is iteratively copied. The metadata for the extracted range of topically identical messages is categorized. Messages within the extracted range that contain substantially duplicative content are identified as duplicate messages. The non-duplicate messages within the extracted range are tallied into an ordering by conversation thread length. Messages whose content is recursively included within another of the tallied non-duplicate messages are classified as near-duplicate messages. The remaining messages are designated as unique messages containing substantially non-duplicative content.
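The categorization described above can be illustrated with a minimal Python sketch for a single range of topically identical messages. It is not the claimed implementation. The `Message` type, the `normalize` and `classify` helpers, the use of a SHA-256 digest over whitespace- and case-normalized text to detect duplicative content, the use of normalized body length as a stand-in for conversation thread length, and substring containment as a stand-in for recursively included content are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical metadata record copied from a message store."""
    msg_id: str
    store: str
    body: str


def normalize(body: str) -> str:
    # Assumption: lowercasing and collapsing whitespace is enough to make
    # "substantially duplicative" copies compare equal.
    return " ".join(body.lower().split())


def classify(messages):
    """Label each message in one topic range as duplicate, near-duplicate, or unique."""
    labels = {}
    seen_digests = set()
    non_duplicates = []

    # Step 1: identify duplicates by a digest of the normalized content.
    for m in messages:
        digest = hashlib.sha256(normalize(m.body).encode("utf-8")).hexdigest()
        if digest in seen_digests:
            labels[m.msg_id] = "duplicate"
        else:
            seen_digests.add(digest)
            non_duplicates.append(m)

    # Step 2: order non-duplicates longest first (length is a proxy for
    # conversation thread length in this sketch).
    non_duplicates.sort(key=lambda m: len(normalize(m.body)), reverse=True)

    # Step 3: a message wholly contained in an already kept, longer message
    # is a near-duplicate; anything left over is unique.
    kept = []
    for m in non_duplicates:
        text = normalize(m.body)
        if any(text in normalize(k.body) for k in kept):
            labels[m.msg_id] = "near-duplicate"
        else:
            labels[m.msg_id] = "unique"
            kept.append(m)
    return labels


# Example range drawn from two hypothetical stores.
messages = [
    Message("a", "store1", "Lunch at noon?"),
    Message("b", "store1", "Sure, see you then.\n> Lunch at noon?"),
    Message("c", "store2", "lunch at  noon?"),
    Message("d", "store2", "Can we move lunch to 1pm?"),
]
labels = classify(messages)
```

In this example, message `c` is labeled a duplicate of `a`, message `a` is labeled a near-duplicate because its text is quoted in full inside the longer reply `b`, and messages `b` and `d` are labeled unique. Processing the longest messages first means each shorter message only needs to be compared against the messages already kept as unique.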