Abstract: Methods and systems are provided for identifying on-topic sources of media content. According to one embodiment, candidate seed sites are identified, from which current seeds are selected for deep crawling. The current seeds are identified by correlating relevancy scores or keyword search results from multiple search engines and by selecting the current seeds based on on-topic scores of the candidate seeds. Periodically, a topic net associated with the topic area of interest is executed to locate relevant sources of media content by (i) building a graph, in which nodes represent pages and edges represent links among pages, by performing an iterative 360 crawl starting from the seeds; (ii) assigning initial node graph scores; (iii) computing final node graph scores by performing link analysis; (iv) computing site graph scores by aggregating and averaging the corresponding node graph scores; and (v) configuring the sites with the highest site graph scores to be scraped.
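
Steps (ii) through (v) of the topic net can be illustrated with a minimal sketch. The abstract does not say which link analysis is used, so the example below assumes a personalized-PageRank-style propagation as a stand-in. It also assumes that a site is identified by the host name of each page URL, and that the crawl of step (i) has already produced the page and link lists. All function names, parameters (`damping`, `iterations`, `k`), and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def propagate_scores(initial, links, damping=0.85, iterations=50):
    """Step (iii): derive final node scores from initial ones.

    Assumption: a personalized-PageRank-style power iteration stands in
    for the unspecified link analysis.
    """
    total = sum(initial.values())
    if total <= 0:
        raise ValueError("at least one page needs a positive initial score")
    # Normalize the initial scores into a teleport distribution.
    base = {page: score / total for page, score in initial.items()}
    out_links = defaultdict(list)
    for src, dst in links:
        if src in base and dst in base:
            out_links[src].append(dst)
    scores = dict(base)
    for _ in range(iterations):
        nxt = {page: (1.0 - damping) * base[page] for page in base}
        for page, score in scores.items():
            targets = out_links.get(page)
            if targets:
                share = damping * score / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Dangling page: hand its mass back to the teleport distribution.
                for other in base:
                    nxt[other] += damping * score * base[other]
        scores = nxt
    return scores


def site_scores(node_scores):
    """Step (iv): average the node scores of all pages on the same host."""
    by_site = defaultdict(list)
    for url, score in node_scores.items():
        by_site[urlsplit(url).netloc].append(score)
    return {site: sum(vals) / len(vals) for site, vals in by_site.items()}


def top_sites(scores, k):
    """Step (v): pick the k highest-scoring sites as scrape targets."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical crawl result: step (ii) initial scores and the page links.
initial = {
    "http://a.example/1": 1.0,
    "http://a.example/2": 0.5,
    "http://b.example/1": 0.2,
    "http://c.example/1": 0.0,
}
links = [
    ("http://a.example/1", "http://a.example/2"),
    ("http://a.example/2", "http://a.example/1"),
    ("http://b.example/1", "http://a.example/1"),
    ("http://c.example/1", "http://b.example/1"),
]
final = propagate_scores(initial, links)
print(top_sites(site_scores(final), k=2))
```

In this sketch, averaging rather than summing keeps a site with many weakly scored pages from outranking a smaller site whose pages are consistently on-topic, which matches the "aggregating and averaging" wording of step (iv).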