Abstract: A web crawling solution is presented that automatically prioritizes crawl targets according to crawling policies. Web page wrappers are automatically created and updated using XPath expressions and web page analysis algorithms. Crawling is implemented using parallel queues that converge into a single prioritized queue, which takes into account web site reputation and influence and also exploits content, comments, and metadata from social media, blogs, and other sources. The crawled news content is clustered by similarity, and thematic summaries are created before the results are served.
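The convergence of parallel queues into a single prioritized queue might be sketched as follows. This is a minimal illustration and not the disclosed implementation: the names `CrawlTask`, `priority_score`, `merge_queues`, and `drain`, the linear weighting of reputation and influence, and the default score of zero for unknown sites are all assumptions introduced for this example.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class CrawlTask:
    # heapq is a min-heap, so the score is stored negated:
    # a higher score then pops earlier.
    sort_key: float
    seq: int
    url: str = field(compare=False)


def priority_score(reputation, influence, w_reputation=0.6, w_influence=0.4):
    """Hypothetical linear score; the weights are illustrative assumptions."""
    return w_reputation * reputation + w_influence * influence


def merge_queues(source_queues, site_signals):
    """Merge several per-source queues into one prioritized heap.

    source_queues: dict mapping a source name to a list of (site, url) pairs.
    site_signals: dict mapping a site to (reputation, influence) in [0, 1].
    """
    heap = []
    seq = count()  # tie-breaker that preserves insertion order
    for urls in source_queues.values():
        for site, url in urls:
            # Assumption: sites without signals get the lowest priority.
            reputation, influence = site_signals.get(site, (0.0, 0.0))
            score = priority_score(reputation, influence)
            heapq.heappush(heap, CrawlTask(-score, next(seq), url))
    return heap


def drain(heap):
    """Pop URLs from highest to lowest priority."""
    order = []
    while heap:
        order.append(heapq.heappop(heap).url)
    return order
```

Under these assumptions, each source (for example a news site list or a social media feed) keeps its own queue, and only the merged heap decides crawl order. A real system would likely replace the fixed weights with values drawn from the crawling policies.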