Abstract: In embodiments of the present invention improved capabilities are described for developing, training, validating and deploying discovery avatars embodying mathematical models that may be used for document and data discovery and deployed within large data repositories.