Abstract: For each newspaper page represented in the PostScript data, the PostScript data are parsed to extract therefrom text data, text position data, font information data, image position data and, preferably, a bitmap of the page. Furthermore, each occurrence of a “page refer,” a URL or an electronic mail address on the page as described by the PostScript data is identified and the location of same on the page is extracted. Also, the PostScript data are processed to identify the story locations and image/advertisement locations on the page. Finally, the PostScript data are processed to identify bookmark data thereon. All extracted information concerning the page is stored in a current page information database. The current page information database for each page of the newspaper is thereafter used together with a predefined page type information database that includes default data that varies depending upon the particular type of newspaper page to be represented.
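The step of identifying each "page refer," URL or electronic mail address, together with its location on the page, can be illustrated with a minimal sketch. It assumes that the PostScript data have already been parsed into text runs, each carrying a page position. The `TextRun` and `LinkHit` types, the regular expressions, and the "See page A5" / "Continued on Page 12" form of a page refer are all hypothetical illustrations, not taken from the source. Each hit is attributed to the origin of the run that contains it, which is coarser than per-glyph placement. Merging the page-type defaults with the per-page values is likewise only one assumed way of using the two databases together.

```python
import re
from dataclasses import dataclass

# Hypothetical: one run of text already extracted from the PostScript
# page description, with its origin in page coordinates (points).
@dataclass
class TextRun:
    text: str
    x: float
    y: float

# Hypothetical record for one identified URL, e-mail address or page refer.
@dataclass
class LinkHit:
    kind: str   # "url", "email" or "page_refer"
    value: str
    x: float
    y: float

# Assumed patterns; a real system would need more robust rules.
URL_RE = re.compile(r"\b(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Hypothetical page-refer form, e.g. "See page A5" or "Continued on Page 12".
PAGE_REFER_RE = re.compile(
    r"\b(?:see|continued on)\s+page\s+([A-Z]?\d+)\b", re.IGNORECASE
)

PATTERNS = (("url", URL_RE), ("email", EMAIL_RE), ("page_refer", PAGE_REFER_RE))


def find_links(runs: list[TextRun]) -> list[LinkHit]:
    """Return every URL, e-mail address and page refer, located by its run."""
    hits = []
    for run in runs:
        for kind, pattern in PATTERNS:
            for m in pattern.finditer(run.text):
                if kind == "page_refer":
                    value = m.group(1)  # the referenced page, e.g. "A5"
                else:
                    # Drop sentence punctuation swallowed at the end of a URL.
                    value = m.group(0).rstrip(".)")
                hits.append(LinkHit(kind, value, run.x, run.y))
    return hits


def build_page_record(page_no: int, runs: list[TextRun], page_type_defaults: dict) -> dict:
    """Assumed merge: start from page-type defaults, then add per-page data."""
    record = dict(page_type_defaults)
    record.update({
        "page": page_no,
        "text": " ".join(r.text for r in runs),
        "links": find_links(runs),
    })
    return record
```

For example, a run reading "Continued on Page A5" at (300, 100) yields a `page_refer` hit with value `"A5"` at that position. Keeping the defaults in a separate dictionary means a single page-type entry can be shared by every page of that type.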