Patents Assigned to Lixto Software GmbH
  • Patent number: 8719291
    Abstract: A method for extracting tabular information from a web source by determining a plurality of coordinates for a plurality of visualized element nodes on the web source; determining a subset of the plurality of visualized element nodes based on the plurality of coordinates to obtain a candidate web table, wherein each of the subset of the plurality of visualized element nodes constitutes a logical cell of the candidate web table; determining textual content corresponding to the subset of the plurality of visualized element nodes as the textual content would appear after rendering the web source in a browser; and transforming the candidate web table into an explicit representation of relative spatial relation between at least one of the logical cell; and saving the explicit representation in a structured document format.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: May 6, 2014
    Assignee: Lixto Software GmbH
    Inventors: Wolfgang Gatterbauer, Bernhard Kruepl, Paul Bohunsky, Marcus Herzog
  • Patent number: 7581170
    Abstract: A method and a system for information extraction from Web pages formatted with markup languages such as HTML [8]. A method and system for interactively and visually describing information patterns of interest based on visualized sample Web pages [5,6,16-29]. A method and data structure for representing and storing these patterns [1]. A method and system for extracting information corresponding to a set of previously defined patterns from Web pages [2], and a method for transforming the extracted data into XML is described. Each pattern is defined via the (interactive) specification of one or more filters. Two or more filters for the same pattern contribute disjunctively to the pattern definition [3], that is, an actual pattern describes the set of all targets specified by any of its filters.
    Type: Grant
    Filed: May 28, 2002
    Date of Patent: August 25, 2009
    Assignee: Lixto Software GmbH
    Inventors: Robert Baumgartner, Sergio Flesca, Georg Gottlob, Marcus Herzog
  • Publication number: 20080294679
    Abstract: A method for extracting tabular information from a web source by determining a plurality of coordinates for a plurality of visualized element nodes on the web source; determining a subset of the plurality of visualized element nodes based on the plurality of coordinates to obtain a candidate web table, wherein each of the subset of the plurality of visualized element nodes constitutes a logical cell of the candidate web table; determining textual content corresponding to the subset of the plurality of visualized element nodes as the textual content would appear after rendering the web source in a browser; and transforming the candidate web table into an explicit representation of relative spatial relation between at least one of the logical cell; and saving the explicit representation in a structured document format.
    Type: Application
    Filed: April 24, 2008
    Publication date: November 27, 2008
    Applicant: LIXTO SOFTWARE GMBH
    Inventors: Wolfgang Gatterbauer, Bernhard Kruepl, Paul Bohunsky, Marcus Herzog
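
The table-extraction abstract above describes grouping rendered element nodes into logical cells by their coordinates. The following is a minimal illustrative sketch of that general idea, not the patented method: the `Box` type, the `to_grid` function, and the `tol` pixel tolerance are hypothetical names chosen for this example, and the sketch assumes each node is already reduced to its rendered text plus a top-left coordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a visualized element node."""
    text: str    # text as it would appear after rendering
    left: float  # x coordinate of the node's left edge
    top: float   # y coordinate of the node's top edge


def to_grid(boxes, tol=2.0):
    """Arrange boxes into a row/column grid using only their coordinates."""

    def bands(values):
        # Collapse sorted coordinates into bands; a new band starts when a
        # value is more than `tol` away from the start of the last band.
        out = []
        for v in sorted(values):
            if not out or v - out[-1] > tol:
                out.append(v)
        return out

    def nearest(band_list, v):
        # Index of the band whose start is closest to v.
        return min(range(len(band_list)), key=lambda i: abs(band_list[i] - v))

    rows = bands(b.top for b in boxes)
    cols = bands(b.left for b in boxes)

    # Empty string marks a grid position with no node.
    grid = [[""] * len(cols) for _ in rows]
    for b in boxes:
        grid[nearest(rows, b.top)][nearest(cols, b.left)] = b.text
    return grid


sample = [
    Box("Name", 10, 10), Box("Price", 110, 11),
    Box("Tea", 10, 40), Box("3.50", 111, 40),
]
table = to_grid(sample)
```

Here `table` holds one list per row, so the relative position of each cell is explicit and could then be serialized to a structured format such as XML or JSON. Real pages would also need handling for spanning cells and overlapping nodes, which this sketch ignores.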