Patents by Inventor Serge Bronstein

Serge Bronstein has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7705848
    Abstract: A method of identifying semantic units in an electronic document includes the steps of: providing an electronic document being described in a page description language, the document having at least one page having a plurality of text fragments, each text fragment including a plurality of glyphs that have not been identified as semantic units, the document further including geometric information and page description language parameters; determining strips of at least one glyph by comparing the geometric position of subsequent glyphs; determining zones of at least one strip wherein a zone is defined by the combined area of strips, the geometrical areas of which overlap with each other; determining a boundary between two semantic units in a zone based on the geometric properties of the glyphs; sorting the identified semantic units in the zone in a sorted list; and, combining subsequent semantic units in the sorted list according to geometric considerations.
    Type: Grant
    Filed: April 18, 2006
    Date of Patent: April 27, 2010
    Assignee: PDFlib GmbH
    Inventor: Serge Bronstein
  • Patent number: 7643682
    Abstract: A method of identifying redundant text fragments, which create artificial artifacts only, in an electronic page description language document includes a) providing a page having a plurality of text fragments, each text fragment comprising at least one glyph, the document including Unicode values for all glyphs and geometric information of all text fragments on the page and page description language parameters of all glyphs, b) identifying two text fragments as redundant candidates, if the Unicode sequence of the text fragments have identical corresponding Unicode sequences, c) defining a bounding box of quadrangular shape for each of the two redundant candidates according to their font characteristics, d) calculating the overlapping area of the two bounding boxes, and e) determining whether the two candidates form redundant text fragments by comparing the ratio of the overlapping area to the area of the smaller bounding box of both text fragments with a predetermined threshold.
    Type: Grant
    Filed: April 18, 2006
    Date of Patent: January 5, 2010
    Assignee: PDFlib GmbH
    Inventor: Serge Bronstein
  • Publication number: 20070002054
    Abstract: A method of identifying semantic units in an electronic document includes the steps of: providing an electronic document being described in a page description language, the document having at least one page having a plurality of text fragments, each text fragment including a plurality of glyphs that have not been identified as semantic units, the document further including geometric information and page description language parameters; determining strips of at least one glyph by comparing the geometric position of subsequent glyphs; determining zones of at least one strip wherein a zone is defined by the combined area of strips, the geometrical areas of which overlap with each other; determining a boundary between two semantic units in a zone based on the geometric properties of the glyphs; sorting the identified semantic units in the zone in a sorted list; and, combining subsequent semantic units in the sorted list according to geometric considerations.
    Type: Application
    Filed: April 18, 2006
    Publication date: January 4, 2007
    Inventor: Serge Bronstein
  • Publication number: 20060282769
    Abstract: A method of identifying redundant text fragments, which create artificial artifacts only, in an electronic page description language document includes a) providing a page having a plurality of text fragments, each text fragment comprising at least one glyph, the document including Unicode values for all glyphs and geometric information of all text fragments on the page and page description language parameters of all glyphs, b) identifying two text fragments as redundant candidates, if the Unicode sequence of the text fragments have identical corresponding Unicode sequences, c) defining a bounding box of quadrangular shape for each of the two redundant candidates according to their font characteristics, d) calculating the overlapping area of the two bounding boxes, and e) determining whether the two candidates form redundant text fragments by comparing the ratio of the overlapping area to the area of the smaller bounding box of both text fragments with a predetermined threshold.
    Type: Application
    Filed: April 18, 2006
    Publication date: December 14, 2006
    Inventor: Serge Bronstein
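
The redundancy test in steps b) through e) of the abstract for patent 7643682 (publication 20060282769) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes axis-aligned rectangular bounding boxes, whereas the abstract only requires a quadrangular shape derived from font characteristics. The `Fragment` class, the function names, and the threshold value of 0.8 are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A text fragment with its Unicode text and an axis-aligned bounding box.

    Hypothetical structure: the abstract describes quadrangular boxes derived
    from font characteristics; plain rectangles are a simplifying assumption.
    """
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_area(a: Fragment, b: Fragment) -> float:
    """Area of the intersection of the two bounding boxes (step d)."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return width * height if width > 0 and height > 0 else 0.0


def is_redundant(a: Fragment, b: Fragment, threshold: float = 0.8) -> bool:
    """Apply steps b) to e): same Unicode sequence, then overlap-ratio test.

    The threshold of 0.8 is an assumed value, not one taken from the patent.
    """
    # Step b: only fragments with identical Unicode sequences are candidates.
    if a.text != b.text:
        return False
    smaller = min(a.area(), b.area())
    if smaller <= 0:
        return False
    # Step e: ratio of overlap to the smaller box, compared with a threshold.
    return overlap_area(a, b) / smaller >= threshold


# Usage: a title drawn twice with a one-unit offset, as in a drop-shadow effect.
title = Fragment("Title", 10, 10, 60, 22)
shadow = Fragment("Title", 11, 9, 61, 21)
far_copy = Fragment("Title", 100, 100, 150, 112)
print(is_redundant(title, shadow))
print(is_redundant(title, far_copy))
```

Dividing by the area of the smaller box, as the abstract specifies, keeps the ratio meaningful when one copy of the text is rendered slightly larger than the other.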