Abstract: A method of generating information relating to a web page. The method comprises applying a transformation (S2) to a page representation (S1) of a first web page to determine a first feature vector. The first feature vector is compared (S3) to a plurality of feature vectors for other web pages. A subset of feature vectors from the plurality of feature vectors is selected based upon said comparison. Data is extracted (S4) from the subset of feature vectors. A method of generating a transformation for generating a feature vector from a page representation of a web page is also provided.
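
The steps S1 to S4 can be illustrated with a minimal sketch. The abstract does not specify the page representation, the transformation, the comparison measure, or the kind of data extracted, so everything below is an assumption made for illustration only: a term-count representation over a hypothetical vocabulary (`VOCAB`), L2 normalisation standing in for the transformation, cosine similarity as the comparison, and a majority label as the extracted data. All function names and the `label` field are hypothetical.

```python
import math
from collections import Counter

# Hypothetical fixed vocabulary; the source does not define the representation.
VOCAB = ["price", "cart", "news", "sport", "login"]


def page_representation(text):
    """S1 (assumed form): term counts of the page text over VOCAB."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in VOCAB]


def transform(representation):
    """S2 (stand-in): L2-normalise the representation into a feature vector."""
    norm = math.sqrt(sum(x * x for x in representation))
    return [x / norm for x in representation] if norm else list(representation)


def similarity(a, b):
    """S3 (assumed measure): dot product, i.e. cosine similarity of unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_subset(query, indexed, k):
    """Select the k stored entries whose feature vectors best match the query."""
    ranked = sorted(indexed, key=lambda e: similarity(query, e["vector"]), reverse=True)
    return ranked[:k]


def extract_data(subset):
    """S4 (assumed form): return the most common label among the selected entries."""
    return Counter(entry["label"] for entry in subset).most_common(1)[0][0]


# Feature vectors for other web pages, each with a hypothetical label.
indexed = [
    {"vector": transform(page_representation("price cart cart login")), "label": "shop"},
    {"vector": transform(page_representation("price cart")), "label": "shop"},
    {"vector": transform(page_representation("news sport sport")), "label": "media"},
    {"vector": transform(page_representation("news login")), "label": "media"},
]

# First web page: build its feature vector, compare, select, extract.
first_vector = transform(page_representation("cart price price"))
subset = select_subset(first_vector, indexed, k=2)
category = extract_data(subset)
```

In this sketch the two nearest stored vectors both carry the label "shop", so that label is the information generated for the first web page. A learned transformation, as mentioned in the final sentence of the abstract, would replace the `transform` stand-in.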