Abstract: Data set discovery is disclosed, including: identifying first file metadata elements for a first file associated with a node in a hierarchy of data; identifying second file metadata elements for a second file associated with the node; identifying common file metadata elements among the first file metadata elements and the second file metadata elements; and determining that the common file metadata elements represent a data set comprising at least the first file and the second file.
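The four steps recited above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each file's metadata elements are modeled as key/value pairs, "common" means an identical key with an identical value, and a non-empty intersection is enough to declare a data set. The names `FileEntry`, `DataSet`, `metadata_elements`, and `discover_data_set` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# A metadata element is modeled here as a (key, value) pair (assumption).
Element = Tuple[str, str]


@dataclass
class FileEntry:
    """A file associated with a node in the hierarchy (hypothetical type)."""
    name: str
    metadata: Dict[str, str]


@dataclass
class DataSet:
    """A discovered data set: the shared elements plus its member files."""
    common_elements: FrozenSet[Element]
    files: List[str]


def metadata_elements(entry: FileEntry) -> FrozenSet[Element]:
    """Identify the file metadata elements for one file."""
    return frozenset(entry.metadata.items())


def discover_data_set(first: FileEntry, second: FileEntry) -> Optional[DataSet]:
    """Return a DataSet if the two files share any metadata elements."""
    first_elements = metadata_elements(first)
    second_elements = metadata_elements(second)

    # Common elements: same key and same value in both files.
    common = first_elements & second_elements
    if not common:
        return None  # nothing shared, so no data set is inferred

    return DataSet(common_elements=common, files=[first.name, second.name])


if __name__ == "__main__":
    # Two files under the same node; the values are made up for the example.
    a = FileEntry("sales_2023_01.csv",
                  {"extension": "csv", "prefix": "sales", "size": "1024"})
    b = FileEntry("sales_2023_02.csv",
                  {"extension": "csv", "prefix": "sales", "size": "2048"})
    result = discover_data_set(a, b)
    print(sorted(result.common_elements))
    print(result.files)
```

In this sketch the differing `size` values fall out of the intersection, while the shared `extension` and `prefix` elements are what characterize the data set. A real system would likely apply a stricter criterion than "any overlap"; that threshold is left open here.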