Patents Assigned to COLLECTIVE INTELLECT, INC.
  • Publication number: 20080114755
    Abstract: Methods and systems are provided for identifying on-topic sources of media content. According to one embodiment, candidate seed sites are identified from which current seeds are selected for deep crawling. The current seeds are identified by correlating relevancy scores or key-word search results from multiple search engines and selecting the current seeds based on on-topic scores of the candidate seeds. Periodically, a topic net associated with the topic area of interest is executed to locate relevant sources of media content by (i) building a graph in which nodes represent pages and edges represent links among pages by performing an iterative 360 crawl starting from the seeds; (ii) assigning initial node graph scores; (iii) computing final node graph scores by performing link analysis; (iv) computing site graph scores by aggregating and averaging corresponding node graph scores; and (v) configuring sites with the highest site graph scores to be scraped.
    Type: Application
    Filed: November 12, 2007
    Publication date: May 15, 2008
    Applicant: COLLECTIVE INTELLECT, INC.
    Inventors: Timothy J. Wolters, Mehrshad Setayesh
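
Steps (ii) through (v) of the abstract above can be illustrated with a minimal Python sketch. The abstract does not say which link analysis is used, so this sketch assumes a PageRank-style propagation seeded by the initial scores; the function names, the damping factor, the iteration count, and the use of the URL host as the "site" are all hypothetical choices made for illustration, not details taken from the application. The crawl in step (i) is not shown; the link graph is passed in as a plain dictionary.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url):
    """Return the host part of a URL (assumed here to identify the site)."""
    return urlparse(url).netloc


def link_analysis(links, initial, damping=0.85, iterations=50):
    """Assumed PageRank-style link analysis, personalized by initial scores.

    links:   dict mapping a page URL to the list of page URLs it links to.
    initial: dict mapping a page URL to its initial node graph score.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    total = sum(initial.get(n, 0.0) for n in nodes) or 1.0
    # Normalize the initial scores into a base distribution over pages.
    base = {n: initial.get(n, 0.0) / total for n in nodes}
    scores = dict(base)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * base[n] for n in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                # Split the page's score evenly across its outgoing links.
                share = damping * scores[src] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Page with no outgoing links: return its score to the base.
                for n in nodes:
                    nxt[n] += damping * scores[src] * base[n]
        scores = nxt
    return scores


def site_scores(node_scores):
    """Aggregate node graph scores per site and average them."""
    buckets = defaultdict(list)
    for url, score in node_scores.items():
        buckets[site_of(url)].append(score)
    return {site: sum(vals) / len(vals) for site, vals in buckets.items()}


def top_sites(links, initial, k=2):
    """Return the k sites with the highest site graph scores."""
    averaged = site_scores(link_analysis(links, initial))
    ranked = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
    return [site for site, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy graph with made-up URLs and initial scores.
    links = {
        "http://a.example/1": ["http://b.example/1"],
        "http://b.example/1": ["http://a.example/1"],
        "http://c.example/1": ["http://a.example/1"],
    }
    initial = {
        "http://a.example/1": 1.0,
        "http://b.example/1": 0.5,
        "http://c.example/1": 0.1,
    }
    print(top_sites(links, initial, k=2))
```

In this toy graph the page on c.example has a low initial score and no inbound links, so it falls out of the top two sites. A real system would presumably replace the toy dictionary with the graph produced by the crawl.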