Method and apparatus for adapting web contents to different display area dimensions
A method is disclosed to generate, while preserving text, image, transactional and embedded presentation constraint information, a minimum set of simplified and navigable web contents from a single web document that is oversized for targeted smaller devices. The method includes a parser, a content tree builder, a document tree builder, a document simplifier, a virtual layout engine, a document partitioner, a content scalar and a markup generator. The parser generates markup and data tags from an HTML source document. The builder constructs a content tree. The simplifier transforms the document tree into an intermediate one defined by a subset of XHTML tags and attributes. Layout constraints, including size, area, placement order, and column/row relationships, are calculated for partitioning and scaling the document tree into sub document trees with assigned navigation order and hierarchical hyperlinks. A simplified HTML document is then generated with the markup generator.
This application is a continuation of co-pending U.S. patent application Ser. No. 10/757,840, filed Jan. 14, 2004, which claims the benefit of U.S. Provisional Application No. 60/442,873, filed Jan. 27, 2003. The disclosure of the above identified applications is hereby incorporated herein in its entirety by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to automatic transcoding of markup language based digital content and, more specifically, to a method to simplify, split, scale and hyperlink HTML web content in order to repurpose legacy web content authored for desktop viewing, so that it supports smaller devices with limited network bandwidth, such as wirelessly connected palmtops, PDAs and data-enabled cell phones with small display areas and processing capacities.
2. Description of the Related Art
With the popular use of the Internet, a vast and still growing amount of content has been made available through typical desktop browsers such as Internet Explorer (from Microsoft), Navigator (from AOL), and Opera (from Opera). This content is coded in standard markup and scripting languages such as HTML and JavaScript. However, the majority of it has been authored to fit regular desktop or notebook computers with large screens and large processing capacity, connected to high speed networks.
As the web steadily extends its reach beyond the desktop to devices such as mobile phones, palmtops, PDAs and domestic appliances, problems in accessing legacy web content have started to surface. Constraints of form factor and processing capacity render such content practically useless on these devices. To solve this device dependency problem, one of the most cost effective approaches is to provide intermediary adaptation in the content delivery chain.
Examples such as transcoding proxies can transform markup languages by removing HTML tags, reformatting table cells as text, converting image file formats, reducing image size, reducing image color depths, and translating HTML into other markup languages, e.g. WML, CHTML, and HDML. More involved approaches extract subsets of original content, either automatically or manually, or employ text summarizing techniques to condense the target content. Even more elaborated systems include client components using proprietary protocols between intermediaries and corresponding programs running in client devices to emulate standard browser interfaces, such as Zframeworks from Zframe Inc.
The main problem with conventional markup content transcoding is its inability to handle the sheer volume of content, both text and images, inside a document intended for small devices. An arbitrary linear approach that partitions the content based on markup language codes often leaves the results unorganized, with the original presentation intent lost. Summarization techniques second guess the author's intent and cannot always satisfy the user's needs.
Another problem with conventional markup content transcoding is its inability to handle common hidden semantics inside web documents, such as HTML tables. Authors increasingly mark up content with presentation rather than semantic information, which renders the adapted content unusable.
Another problem with conventional markup content transcoding is its complexity in supporting new devices with different form factors. Instead of gracefully scaling the target transcoding result from small to large display devices, it relies on case-by-case settings requiring expensive development effort to support new devices.
Another problem with some conventional markup content transcoding is its reliance on manual customization to edit, select or annotate original content to assist the adaptation process, which tends to be costly, error prone and not readily scalable.
Another problem with some conventional markup content transcoding is its dependency on specialized client software. Both deploying proprietary software to various client devices and administrating/configuring server adaptation engine increase cost significantly. This defies the original purpose of automatic content adaptation in place of adopting complete content re-authoring.
In these respects, Content Divide & Condense, the method to generate and scale document partitions with navigational links from single web content according to the present invention substantially departs from the conventional concepts and designs of the prior art, and in so doing provides an apparatus primarily developed for the purpose of providing a new method to transcode web content authored for desk top viewing into smaller ones to accommodate small display areas and capacities in mobile devices.
SUMMARY OF THE INVENTION
In view of the foregoing disadvantages inherent in the known types of markup content transcoding now present in the prior art, the present invention provides a new method, hereby named Content Divide & Condense, to simplify, partition, scale, and structure single content page onto hyperlinked and ordered set of content pages suitable for small device viewing before direct transcoding from HTML to the target markup language is applied, wherein the same can be utilized for providing a new method to transcode web content authored for desk top viewing into smaller ones to accommodate small display areas and capacities in mobile devices.
The general purpose of the present invention, which will be described subsequently in greater detail, is to provide a method to generate a minimum set of simplified and easily navigable web contents from a single web document, oversized for targeted small devices, while preserving all text, image, transactional as well as embedded presentation constraint information. Each of the simplified web content fits in display size and processing/networking capacity constraints of the target device. The whole set of generated pages are hyperlinked and ordered according to the intended two dimensional navigation semantics embedded inside the original content. A subset of XHTML is adopted to define the kind of content to be extracted from the original document. With the reduced content complexity in each partitioned page and the preserved navigational organization from original content, final set of documents after applying direct transcoding from each HTML partition to target markup language represent a much more accurate presentation with respect to the original content yet suitable for small device viewing.
To attain this, the present invention, named Content Divide & Condense, generally comprises an HTML parser, a content tree builder, a document tree builder, a document simplifier, a virtual layout engine, a document partitioner, a content scalar, and a markup generator. The parser generates a list of markup and data tags out of the HTML source document. It handles script-generated content on the fly and fetches redirected content, similar to how common web browsers behave. Based on a specific set of layout tags, the content tree builder constructs a content tree out of the markup and data tags. It interprets loosely composed HTML documents following a set of heuristic rules, to be compatible with how standard browsers work. The document tree builder completes the document tree from the remaining markup and data tags on top of the content element tree. It also adjusts the tree structure to be compliant with the XML specification without changing the rendering semantics of the source HTML document as interpreted by common browsers. The simplifier transforms the document tree into an intermediate one defined by a subset of XHTML tags and attributes through filtering and mapping operations on tree nodes. Spatial layout constraints are heuristically estimated and calculated for data and image content embedded inside the document tree according to the semantics of HTML tags. Layout constraints include size, area, placement order, and column/row relationships. Based on the display size and rendering/network capacity constraints, the document tree is partitioned into a set of sub document trees with added hyperlinks and order according to the layout order and content structure. Under the target device display size constraint, each sub document tree is scaled individually by adjusting height and width attributes through the scalar. Source image references are modified if needed to ensure that server side image transcoding capability is leveraged.
Each document tree defines a simplified HTML document which is generated during the markup generation step. Navigation order and hierarchical hyperlinks are assigned at the same time. The original content is thus represented by the set of smaller documents with hyperlinks and order defined between each other. Additional files such as catalog file indicating network bandwidth required for each document or text only document partitions can be generated and hyperlinked together in the same manner. Each simplified document can be transcoded onto target markup languages such as WML and cached by applying available direct transcoding technique.
There has thus been outlined, rather broadly, the more important features of the invention in order that the detailed description thereof may be better understood, and in order that the present contribution to the art may be better appreciated. There are additional features of the invention that will be described hereinafter.
In this respect, before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting.
A primary object of the present invention is to provide a method to simplify, split, scale, and structure web content for small devices that will overcome the shortcomings of the prior art devices.
An object of the present invention is to provide a method to simplify web content to contain only the most primitive parts such as texts, images, forms, hyperlinks, and layout presentation arrangements etc., supported by standard markup language browsers for small devices.
An object of the present invention is to provide a method to extract web content to contain only the selected parts, such as text only with images as text links, or forms only, while preserving layout presentation arrangements etc. supported by standard markup language browsers for small devices.
Another object is to provide a method to split two dimensional layout arrangement such as tables, framesets and alignment to fit content display to the screen width constraint of the target device.
Another object is to provide a method to partition web content along both logical and embedded layout structure according to display area and capacity constraints of the target client device.
Another object is to provide a method to apply minimal scaling to each document partition individually to fit in target device display width constraint.
Another object is to provide a method to present the original web content by a set of hyperlinked and ordered document partitions according to the two-dimensional navigation order embedded inside the original document.
Another object is to provide a method to utilize target device display size and resource capacities to partition the document by conducting virtual layout against the original content represented by a markup language.
Another object is to provide a method to present hyperlinked catalog content indicating the network bandwidth required for accessing each document partition from the target device.
Other objects and advantages of the present invention will become obvious to the reader and it is intended that these objects and advantages be within the scope of the present invention.
To the accomplishment of the above and related objects, this invention may be embodied in the form illustrated in the accompanying drawings, attention being called to the fact, however, that the drawings are illustrative only, and that changes may be made in the specific construction illustrated.
BRIEF DESCRIPTION OF THE DRAWINGS
Various other objects, features and attendant advantages of the present invention will become fully appreciated as the same becomes better understood when considered in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the several views, and wherein:
Turning now descriptively to the drawings, the attached figures illustrate a method to generate and scale document partitions with navigation links from single web content for small device viewing, as shown in
The engine can process the same document with more than one setting at the same time. For example, it can generate partitions both with and without images, giving the flexibility to turn image content on or off while ensuring device capacity is fully utilized. The same text paragraph could appear in two partitions, one consisting of only text data and the other also containing image links. Because the capacity otherwise used by images is filled with text data, these two partition documents cannot be transcoded directly into each other by adding or removing image links. However, cross links can be inserted so that the text data can be accessed as a preview and the full version with embedded images retrieved when of interest.
Systems with Content Divide & Condense working together with client device and other servers are shown in
The HTML parser translates input HTML document into a list of markup and tags similar to what common browsers do. Each element of the list is either a markup with its attributes or a block of raw data, such as text data or script codes. An example of HTML code sample 77a and its corresponding markup and data list 77b is shown in
When the parser 78 encounters <FRAME> tag, source links inside the tag are resolved and corresponding document fetched 80a/parsed 78 on the fly. <FRAME> source is inserted into the original tag list right after the corresponding <FRAME> tag with an added </FRAME> tag at the end to enclose it. The process continues recursively as shown in
When the parser 78 encounters <SCRIPT> tag, JavaScript source codes are executed by a JavaScript engine 86 with a simplified document object model 88. Source links are followed to fetch remote codes 82a, if there is any. The simplified document object model 88 supports both document.write and document.writeln functions and is capable of generating HTML content 86a on the fly. The in-line generated codes, if there is any, are parsed by the parser 78 and the resulting tag list is inserted right after the corresponding <SCRIPT> tag. This process runs recursively as shown in
After parser 78 exhausts all input sources, HTML tags requiring exclusive-or selection or filtering are handled before final list 90 is generated. They include <SCRIPT> vs. <NOSCRIPT>, <FRAME> vs. <NOFRAME>, and <EMBED> vs. <NOEMBED>, <LAYER> vs. <NOLAYER>. The parser tag selection 84 ignores <NOSCRIPT>, <NOFRAME>, and <EMBED> tags. These tags and all source markup data enclosed are left out from final tag list. Capability for the parser tag selection 84 to select an intended subset of tag list from the source document could readily be added. Depending on target client device context and document semantics, the parser might have an option to choose <NOFRAME> instead of <FRAME>, <NOSCRIPT> instead of <SCRIPT>, <EMBED> instead of <NOEMBED>, <LAYER> instead of <NOLAYER>, etc. Additional tags accepted as standards moving forward could also be supported in the similar manner.
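The default tag selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple-based tag list, the names `DROPPED` and `select_tags`, and the requirement that every dropped tag has an explicit closing tag in the list are all hypothetical and not taken from the specification.

```python
# Hypothetical tag-list entries: ("open", NAME), ("close", NAME) or ("data", text).
# Tags ignored by default according to the description above.
DROPPED = {"NOSCRIPT", "NOFRAME", "EMBED"}

def select_tags(tag_list, dropped=DROPPED):
    """Return tag_list without the dropped tags and everything they enclose.

    Assumes each dropped tag has a matching close entry in the list.
    """
    result = []
    skipping = None  # name of the dropped tag currently being skipped
    depth = 0        # nesting depth of that tag
    for kind, value in tag_list:
        if skipping is not None:
            if kind == "open" and value == skipping:
                depth += 1
            elif kind == "close" and value == skipping:
                depth -= 1
                if depth == 0:
                    skipping = None
            continue  # entries inside a dropped tag are left out
        if kind == "open" and value in dropped:
            skipping, depth = value, 1
            continue
        result.append((kind, value))
    return result
```

Passing a different `dropped` set, for example `{"SCRIPT", "FRAME"}`, would model the optional selection of the alternative tags mentioned above.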
The content tree builder constructs a tree out of the set of content markup elements based on the tag list generated by the parser. An HTML tag is considered content element if it designates directly an actual layout area when the content is rendered. The set of HTML tags considered content elements are listed in Table 1(a). These tags are different from those specifying mainly display styles, user interface context, or executable script codes such as those shown in Table 2(b). The set of content tags are focused first to simplify handling of many loosely composed HTML documents where style and context tags are not required to follow strict XML structures.
The steps to build the content tree 104 are shown in
Implicit tags are generated on the fly as shown in
The document tree is built during popping tags from the stack. A tree node is defined after the top element is popped from the stack. There are three possible kinds of nodes, as shown in
The set of tags considered content elements and the set of rules for determining the existence of implicit tags are expected to be updated and to evolve. As this design is meant to support legacy web content, it needs to be as lenient as common browsers are toward documents that do not exactly follow the HTML specifications. Evolution of browser markup languages will also force updates, and hence changes in the rules and settings discussed here.
Based on content element tree 132, the remaining non-content markup and data tags are handled to complete the document tree 134. Firstly, these tags are visited following the steps shown in
Based on the content element tree 132, each node is visited following a depth first order 132a and sub document trees, based on non-content tags, are built 142 and inserted 144 onto the content element tree 132 to form a preliminary document tree 134. The steps shown in
The tree building steps are shown in
Steps to rectify preliminary document tree 136 to be XML compliant are shown in
There are four tree insertion operations employed in the handler as shown in
Detailed steps of text style tag handler are shown in
The form handler follows the steps shown in
A sample document tree 183 built by the above stated steps from sample content element tree 131b in
The simplifier transforms the document tree onto an intermediate one defined by a subset of XHTML tags and attributes through filtering and mapping operations on tree node. A document tree is condensed and simplified based on a subset of XHTML 1.0 markup tag list specified in Table 2. The main objective of this design is to render the content in terms of document tree while preserving as much as possible the intended content, style, hyperlinks and form interactions. Markup tag associated with original document tree node could belong to HTML, XHTML, or even generic XML. The simplification process goes through each node and performs transformation or filtering against a node or a sub tree. Semantics of HTML and XHTML tags are embodied in these transformation rules.
The simplification steps are shown in
For <META> tags 200, only those with the presence of HTTP-EQUIV attribute are retained 202. Other Meta tags used for naming, keywords or other purposes are removed 204, as they do not have significance either on content or how content is fetched. Response information is extracted from the HTTP attribute value pair denoted by the values of HTTP-EQUIV and CONTENT attributes, and stored as part of document context information 206, such as document encoding and language set specification.
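A minimal Python sketch of the <META> rule follows. It assumes, purely for illustration, that each <META> node is available as a dictionary of upper-case attribute names and that the document context is a plain dictionary; `simplify_meta` is a hypothetical name.

```python
def simplify_meta(meta_nodes, context):
    """Keep only <META> nodes carrying HTTP-EQUIV and record their values.

    meta_nodes: list of attribute dictionaries (hypothetical representation).
    context:    dictionary standing in for the document context information.
    """
    kept = []
    for attrs in meta_nodes:
        if "HTTP-EQUIV" in attrs:
            kept.append(attrs)
            # Store the header name/value pair, e.g. content-type and charset.
            context[attrs["HTTP-EQUIV"].lower()] = attrs.get("CONTENT", "")
        # Nodes without HTTP-EQUIV (naming, keywords, ...) are dropped.
    return kept
```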
Table simplifier 208 is applied for <TABLE> nodes as described in
A node belonging to any of four tag types is replaced by a <DIV> node 210 to keep the structure in place while retaining the enclosed data. <ILAYER> 212 and <LAYER> 214 are used for positioning a block of content, which is no longer appropriate after the content has been split and scaled. <MARQUEE> 216 is used for animating a block of content and is not supported by most browsers. <OBJECT> 218 activates an embedded client application and is not handled by the simplification process; alternate text enclosed by the <OBJECT> tags is preserved. The simplification process ignores the presentation and functional controls intended by these tags and keeps only the content data as a division block.
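The replacement by <DIV> can be illustrated with a small Python sketch. The `Node` class, the `"#data"` marker for text nodes, and the decision to drop all attributes of a replaced node are assumptions made for this example only.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str                                   # upper-case tag name, "#data" for text
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""

# The four tag types that are turned into plain division blocks.
BLOCK_TO_DIV = {"ILAYER", "LAYER", "MARQUEE", "OBJECT"}

def to_div(node):
    """Walk the tree and turn positioning/animation/object nodes into <DIV>."""
    if node.tag in BLOCK_TO_DIV:
        node.tag = "DIV"
        node.attrs = {}  # assumption: positioning and control attributes are dropped
    for child in node.children:
        to_div(child)    # enclosed data and alternate text stay in place
    return node
```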
When <BASE> node is encountered 220, the document context is updated 222 on the originating source URL. This node is removed afterwards 224, as the resulting content would be sent from servers of different URL.
An <INPUT> node with type attribute value FILE or IMAGE is removed 226. Image based input button might require client side image mapping capability which would be distorted during scaling.
<FRAMESET> node is handled by Frameset simplifier 228 as shown in
<TR> node is handled by TR simplifier as shown in
<MAP> node is handled by the map simplifier as shown in
<IMG> node 267 is handled by the img simplifier as shown in
<IFRAME> node 273 is handled by IFrame simplifier as shown in
If a node does not match any of the tags considered above, it is checked against the list in Table 2. Nodes with tag names not present in this table are removed from the document tree 278. Then its attributes are updated 280 as shown in
After walking through the whole document tree nodes, each <IMG> node with USEMAP attribute indexed by a map name, is further condensed 282 as shown in
Changes in the target tag and attribute list as well as how different types of document nodes are handled would result in variations of document tree reduction. For example, the data filter could employ a scheme to retain only content for hyperlinks or form interface but removing all others. Another example is the support of <STYLE> tags for getting more precise information and better control on how document would be rendered at client devices. Yet another example is support for international language attributes inside markup tags in addition to those from HTTP headers. As standards of markup language evolve, changes are expected to accommodate new developments.
Spatial layout constraints are heuristically estimated and calculated for text and image content embedded inside the document tree according to the semantics of HTML tags. Layout constraints include size, area, placement order, and column/row relationships. Display size and client capacity requirements are estimated for the simplified document through virtual layout on the underlying document tree. These parameters are used to determine how the document should be partitioned and scaled to accommodate a target client device. The process of virtual layout includes assigning placement constraints and calculating layout sizing information for each content node based on the constraints and a set of layout parameter settings.
Given a document tree, virtual layout determines the set of content children for each content node and assigns it placement constraint among these children nodes. A set of nodes C1, C2, . . . Cn form content children set S of a node N if 1. N is ancestor node of each node Ci in S and 2. each node Ci in S is either content node or data node and 3. for all leaf nodes under N rooted tree, there exists one and only one node in S as its ancestor node. By default, the content children set of a content node is defined as the collection of highest-level offspring content/data nodes. Virtual layout assigns placement constraint to document tree nodes such that 1. every leaf node of the document tree belongs to one and only one content children set and 2. each content node belongs to at most one content children set.
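The default content children set, i.e. the highest-level offspring content/data nodes, can be collected with a short recursive walk. In this Python sketch the `(tag, children)` tuples, the `"#data"` marker and the `CONTENT_TAGS` set are illustrative assumptions; the actual content tags are those of Table 1(a).

```python
# Illustrative subset only; not the content tag list of Table 1(a).
CONTENT_TAGS = {"TABLE", "TR", "TD", "P", "DIV", "IMG"}

def content_children(node):
    """Return the highest-level content/data descendants of node.

    node is a (tag, children) tuple; "#data" marks a data node.
    """
    found = []
    for child in node[1]:
        tag = child[0]
        if tag == "#data" or tag in CONTENT_TAGS:
            found.append(child)                    # stop at the first content/data node
        else:
            found.extend(content_children(child))  # look through non-content nodes
    return found
```

Because the walk stops at the first content or data node on each path, every leaf below the starting node ends up under exactly one member of the returned set.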
To estimate the minimum display width needed for content rendering, placement constraint is designated to content nodes. Placement constraints adapted here are either table with rows/columns or simply a single column. Steps to assign placement constraint are illustrated in
Four sizing parameters could be derived from the document tree with placement constraints assigned and display font sizes selected for the target client device. They are scalable width (W) in pixel, minimum width (M) in pixel, image area (A) in square pixel, and total number of characters (N). W represents size required for scalable layout components such as <IMG> and <TEXTAREA>, for example. M characterizes the minimum fixed layout component needed. It is typically the width of the longest word in the document text. A is the total area of all images in the document. N is the number of all display characters inside the document, symbolizing the amount of text information carried. The minimum display width D required for rendering a document rooted at a node with W and M will be W+M.
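As a rough Python illustration of the four sizing parameters, the sketch below sizes an image leaf and a text leaf. It assumes a single average character width in pixels and counts every character of the text, including spaces, toward N; `Sizing`, `size_image`, `size_text` and `min_display_width` are hypothetical names.

```python
from typing import NamedTuple

class Sizing(NamedTuple):
    W: int  # scalable width in pixels
    A: int  # image area in square pixels
    M: int  # minimum fixed width in pixels
    N: int  # number of display characters

def size_image(width, height):
    """An image contributes scalable width and area only."""
    return Sizing(W=width, A=width * height, M=0, N=0)

def size_text(text, char_width):
    """Text contributes the width of its longest word (M) and its length (N)."""
    longest = max((len(word) for word in text.split()), default=0)
    return Sizing(W=0, A=0, M=longest * char_width, N=len(text))

def min_display_width(sizing):
    """Minimum display width D = W + M."""
    return sizing.W + sizing.M
```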
Font size and language settings are needed to calculate layout sizing information. Character and word boundaries are determined by language encoding for the content text data. Average width of character is dependent on the specified font family and font size. To simplify the layout process, a single font family with minimum and default font size is indexed by the client agent and language code. For example, English content from IPAQ IE browser would use Times Roman font with minimum font size 2 and default font size 3. Selection of these parameters is to be as realistic as possible and depends on the settings of specific user agent.
A layout context is referenced and updated when visiting each node. Included in this context are the current font size, the layout sizing constraint (Nmax, MWmax, Amax), the NoFlow flag, the Atomic flag, etc. Nmax is the maximum value of N allowed for the whole document. MWmax is the maximum (W+M) value for the whole document. Amax is the maximum image area allowed. The NoFlow flag is used when text characters are to be laid out in one line, and the Atomic flag means no partition is allowed. Style nodes such as the <FONT> node affect the font size. A <FORM> node enables the Atomic flag, meaning elements of <FORM> tags should belong to the same document. A <SELECT> node enables the NoFlow flag to indicate that text in a data node, mainly under an <OPTION> node, should be shown in one single line.
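One way to picture the layout context is as an immutable record that is copied with changes when a node is entered. The Python sketch below assumes that representation and handles only absolute <FONT> sizes; the class and function names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LayoutContext:
    font_size: int
    n_max: int             # Nmax: maximum character count
    mw_max: int            # MWmax: maximum (W + M)
    a_max: int             # Amax: maximum image area
    no_flow: bool = False  # text must stay on one line
    atomic: bool = False   # no partition allowed

def enter(ctx, tag, attrs=None):
    """Return the context that applies inside a node with the given tag."""
    attrs = attrs or {}
    if tag == "FONT" and "SIZE" in attrs:
        # Assumption: SIZE is an absolute value such as "2", not "+1".
        return replace(ctx, font_size=int(attrs["SIZE"]))
    if tag == "FORM":
        return replace(ctx, atomic=True)
    if tag == "SELECT":
        return replace(ctx, no_flow=True)
    return ctx
```

Returning a new context instead of mutating the old one means the parent's settings are restored automatically when the walk leaves the node.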
Steps to calculate sizing parameter values for a document node associated with placement constraint are shown in
After all children of a content node have been sized, the associated placement constraint is applied to obtain sizing information 300 for this node. Generic steps to calculate these parameter values according to the constraint are shown in
Propagation function could be node specific. For <SELECT> node, the minimum of all M values among all its <OPTION> child nodes is assigned as <SELECT> node's M value. Based on a simplified document tree, the virtual layout engine derives document layout parameters without conducting actual document rendering. Final result depends on the set of sizing parameter used, placement constraints applied to each node, constraint propagation functions adopted, text layout style context employed, and the global display size setting including language encoding and user agent font families. Variation of these parameters is expected as additional aspects of document layout are considered.
Based on the display size and rendering/network capacity constraints, the document tree is partitioned into a set of sub document trees with added hyperlinks according to the layout order and content structure. Based on the sizing estimation from virtual layout, a document is partitioned and/or split according to user agent size constraints. Partitioning applies to a document and creates new documents while split operates on a document node, generating new nodes but not additional document. Partitioning and split operations are applied in accordance with the document tree to preserve the original content structure as much as possible.
Virtual layout and document partitioning are interweaved together in a bottom up process from leaf tree nodes to arrive at a set of documents where each one satisfies the user agent constraint. The steps of this process are shown in
Leaf node considered for sizing is either an <IMG> node 306 or a data node 308. <IMG> node 306 cannot be split or partitioned but a scaling factor could always be found to satisfy the sizing constraint. With NoFlow flag on in the associated layout context 310, a data node 308 cannot be split nor partitioned. Its sizing parameters are adjusted artificially 312 to satisfy the layout constraint with an assumption that the user agent would be able to make proper adjustment on the client side.
(W,A,M,N) adjustment updates the sizing parameter values directly without changing the document tree. If an <IMG> node has original sizing data (W,A,0,0) where W>MWmax or A>Amax, the sizing parameters are adjusted through a scaling factor r=min(MWmax/W, sqrt(Amax/A)), so that r is below 1 and both constraints hold after scaling. The adjusted set of sizing parameters would be (r*W, r*r*A, 0, 0). A data node under the NoFlow flag with original sizing parameters (0, 0, M, N) exceeding the sizing constraints would be adjusted to (0, 0, min(M, MWmax), min(N, Nmax)).
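A minimal Python sketch of this adjustment follows. It assumes the scaling factor is meant to shrink the node, so r is taken as the smaller of MWmax/W and sqrt(Amax/A) over the constraints that are actually violated; tuples are ordered (W, A, M, N) and the function names are hypothetical.

```python
import math

def adjust_image(w, a, mw_max, a_max):
    """Scale an <IMG> node's (W, A, 0, 0) so that W <= MWmax and A <= Amax."""
    ratios = [1.0]                      # no scaling if nothing is violated
    if w > mw_max:
        ratios.append(mw_max / w)       # width shrinks linearly with r
    if a > a_max:
        ratios.append(math.sqrt(a_max / a))  # area shrinks with r squared
    r = min(ratios)
    return (r * w, r * r * a, 0, 0)

def adjust_data(m, n, mw_max, n_max):
    """Clamp a NoFlow data node's (0, 0, M, N) to the sizing constraints."""
    return (0, 0, min(m, mw_max), min(n, n_max))
```

For a 480 by 320 pixel image with MWmax of 240 and Amax of 76800, the width constraint gives the smaller ratio (0.5), so the adjusted parameters are (240.0, 38400.0, 0, 0).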
Once the sizing parameters (W,A,M,N) of a node are obtained 316, the constraint MWmax is checked and the split operation 318 is applied if (W+M)>MWmax until the constraint is satisfied. Then both the Amax and Nmax constraints are checked 320 and the partition operation is applied if (N>Nmax) or (A>Amax) until both are satisfied. Document partition 322 is based on node split but creates a new document tree.
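The order in which the constraints are checked can be summarized by a small decision function. This Python sketch only selects the next operation; the split and partition operations themselves are not modeled, and `next_operation` is a hypothetical name.

```python
def next_operation(w, a, m, n, mw_max, a_max, n_max):
    """Pick the next operation for a node with sizing parameters (W, A, M, N).

    The width constraint is handled first; area and character count
    are considered only once the node fits MWmax.
    """
    if w + m > mw_max:
        return "split"
    if n > n_max or a > a_max:
        return "partition"
    return None  # all constraints satisfied
```

A caller would re-size the node after each operation and call the function again until it returns `None`.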
To split a data node, an attempt is made to insert breaks in the longest word to bring the width requirement under the MWmax constraint. This is an update of the node without adding new ones. In the case no such break is possible, M value is artificially adjusted to MWmax with an intent for user client to handle and leave the node unchanged.
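The data node split can be sketched in Python as below. The specification does not say what form a break takes, so this example assumes a space is inserted and that every character has the same pixel width; both are illustrative assumptions, as is the name `break_long_words`.

```python
def break_long_words(text, char_width, mw_max):
    """Insert breaks so that no word is wider than mw_max pixels.

    Returns (new_text, ok). ok is False when not even one character fits,
    in which case the text is left unchanged and M would be set to MWmax.
    """
    max_chars = mw_max // char_width
    if max_chars < 1:
        return text, False
    out = []
    for word in text.split(" "):
        # Cut the word into pieces of at most max_chars characters.
        pieces = [word[i:i + max_chars] for i in range(0, len(word), max_chars)]
        out.append(" ".join(pieces) if pieces else word)
    return " ".join(out), True
```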
Split of non-data node T separates the original T rooted sub tree into two separate ones. This operation, denoted as split (T, N0, N1, . . . Nk), requires the target node T and a set of descendant content nodes, N0, N1, . . . , Nk, from its associated placement constraint. The steps are shown in
A non-data node T with (W+M)>MWmax needs to be split based on columns in the associated placement constraint, as shown in
After MWmax constraint is handled, Amax and Nmax are considered as shown in
Steps to partition a document are shown in
A document partition on a node is accomplished by cloning its ancestor nodes and a node split on itself, as shown in
Data node and non-data node are handled differently. For a data node 366, its clone T′ is created 366a and the set of data from first characters up to the cut word identified is moved from the original node to the cloned one 366b. An example is shown in
Based on target device display size constraint, each sub document tree is scaled individually by adjusting height and width attributes through the scalar. Source image references are modified, if needed, to assure server side image transcoding capabilities, including, for example, image format change, color depth adjustment, and width/height scaling, are leveraged. Scaling process is applied to each partitioned document as well as the updated original one to change tag attributes and perform tree optimization at the same time. Scaling factor is calculated according to estimated document node layout sizing information and the target client display width available.
Overall steps for scaling are shown in
Given M, W, and Dw, the scaling factor S is calculated as (Dw−M)/W if (Dw<(W+M)) and (W>0). Otherwise, S is set to 1, i.e. the content fits the screen without the need for scaling. As M represents non-scalable sizing information such as minimum word length, only W, usually the minimum image width, can be scaled.
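The calculation can be illustrated with a short sketch. It assumes that scaling is applied only when the content is wider than the display, i.e. when W+M exceeds Dw, so that S never exceeds 1; this is the reading consistent with S being 1 when the content fits the screen. It also assumes Dw is at least M. The function name scaling_factor is hypothetical.

```python
def scaling_factor(W: float, M: float, Dw: float) -> float:
    """Return S, the factor applied to the scalable width W.

    Assumes Dw >= M, so the result is never negative.
    """
    if W > 0 and Dw < W + M:
        # Only W is scalable; M is subtracted from the display width first.
        return (Dw - M) / W
    # The content fits, or there is nothing scalable: no scaling needed.
    return 1.0
```

For example, with W=400, M=40 and Dw=240, the scalable part must shrink to 200, giving S=0.5; with W=100 and M=20 the content already fits and S stays at 1.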
Sizing information for a document node is updated 384 and optimized 386 after scaling operations have been performed on all descendant content child nodes according to its placement constraint. The optimization removes content nodes with empty A and N. Non-content nodes without any content offspring nodes are also deleted. The placement constraint is also simplified by removing rows and columns without any descendant content nodes. Column and row span values are updated accordingly. Whether to allow ALIGN right or left for a child <IMG> node, when present, can be determined from the available display width for the current node and the minimum width needed for the rest of the child nodes. Additional constraints could be employed to eliminate document nodes that do not satisfy minimum height, width or maximum scaling factor values, for example.
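The two removal rules can be sketched as a recursive pruning pass. The Node class and the prune function are hypothetical names introduced for illustration, and the simplification of rows, columns and span values in the placement constraint is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical document tree node carrying only what the rules need."""
    name: str
    is_content: bool
    A: int = 0
    N: int = 0
    children: List["Node"] = field(default_factory=list)


def prune(node: Node) -> Optional[Node]:
    """Return the optimized subtree, or None if the node should be removed."""
    # A content node with empty A and N is removed with its whole rooted tree.
    if node.is_content and node.A == 0 and node.N == 0:
        return None
    kept = [prune(child) for child in node.children]
    node.children = [child for child in kept if child is not None]
    # A non-content node left without any content offspring is deleted too.
    if not node.is_content and not node.children:
        return None
    return node
```

Pruning bottom-up means a chain of non-content nodes that only wraps empty content collapses in a single pass.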
Steps to assign Dw to each descendant content child node of a placement constraint are shown in
The objective of this algorithm is to find a set of values for all column widths such that each node in the placement constraint can be accommodated and the sum of all column widths equals Dw. Because of the way Dw is calculated, such a set of values always exists. The algorithm first considers the subset of nodes with a single column span. It establishes the minimum column width Dm(Ci) for each column Ci. Cw, by definition, is no less than the sum of these minimum widths. The difference, if any, is distributed among the columns Ci as D(Ci).
It then iterates through all other nodes, those with multiple column span, and adjusts the column widths for each new node constraint while maintaining the minimum column widths originally assigned. Because this assignment converges, it is expected to settle on a solution after a certain number of steps. However, a maximum number of iteration cycles over the nodes is set 394 so that an acceptable solution is reached without much cost.
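One possible shape of such an assignment is sketched below. The specification does not state how the difference is distributed or how widths are adjusted for multi-column nodes, so the even distribution and the rule of borrowing width from columns outside the span are assumptions made for illustration. The function name and the (first_col, span, min_width) tuple layout are likewise hypothetical.

```python
from typing import List, Tuple


def assign_column_widths(n_cols: int, nodes: List[Tuple[int, int, float]],
                         dw: float, max_iter: int = 10) -> List[float]:
    """Return one width per column; the widths always sum to dw."""
    # Pass 1: single-span nodes establish the minimum width of each column.
    dm = [0.0] * n_cols
    for first, span, need in nodes:
        if span == 1:
            dm[first] = max(dm[first], need)
    slack = dw - sum(dm)
    if slack < 0:
        raise ValueError("dw is smaller than the sum of minimum column widths")
    # Assumption: the remaining difference is shared evenly among columns.
    width = [m + slack / n_cols for m in dm]

    # Pass 2: adjust for multi-span nodes, bounded by max_iter cycles.
    multi = [n for n in nodes if n[1] > 1]
    for _ in range(max_iter):
        changed = False
        for first, span, need in multi:
            cols = range(first, first + span)
            deficit = need - sum(width[c] for c in cols)
            if deficit <= 1e-9:
                continue
            # Borrow only from outside columns that are above their minimum.
            donors = [c for c in range(n_cols)
                      if c not in cols and width[c] > dm[c]]
            spare = sum(width[c] - dm[c] for c in donors)
            take = min(deficit, spare)
            if take <= 0:
                continue
            for c in donors:
                width[c] -= take * (width[c] - dm[c]) / spare
            for c in cols:
                width[c] += take / span
            changed = True
        if not changed:
            break
    return width
```

Because every adjustment moves width from one group of columns to another, the total stays equal to Dw and no column drops below its minimum.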
Several additional notations used in
After a node is scaled, optimization rules are applied 386 to remove either the node or its whole rooted tree. A content document node that does not have any content size, i.e. A=0 and N=0, is removed together with its rooted tree. In addition, an <HTML> node that is not the document root, created because of <FRAMESET> handling, is removed along with its child <HEAD> node rooted tree.
Content scalar as in
Based on the set of partitioned and scaled document trees, corresponding markup files are generated according to the subset of XHTML specs defined in Table 2, along with the navigational relationships among them. The document partition operation defines a hyperlinked relationship between the document tree containing the split node and the one partitioned out. Additional ordering relationships are established for accessing one document from another in a linear manner, based on the original document source text order.
Steps to calculate order for each document are shown in
Sample hyperlinks and navigation order so constructed are illustrated in FIG. 41. Six document partitions are ordered D0 398, D1 400, . . . , D5 402. Hyperlink [−] points to the previous document, [+] to the next, and [ˆ] to its parent. If the first page selected to send back to the client is based on navigation order only, the client receives document D0 398. The user could either click on [+] 404 from D0 398 to go to the next page, D1 400, or go back to its hierarchical parent page, D5 402, through [ˆ] 406. From page D5 402, the root page, four partitions D0 398, D1 400, D3 408, and D4 410 are directly linked as its child document pages. Although D2 412 follows D1 400 in order, it is also linked under D3 408 hierarchically. Such a hierarchy is built during document partitions, reflecting the original document layout semantics.
The first page returned to the client after partitioning varies depending on the need. It could be the first one based on navigation order, the root page of the partition hierarchy, or a separate page built from these partitions for a special purpose. One such example is a catalog page with simple summary information on bandwidth requirements and on the navigation and hierarchy relationships among the pages, connected with hyperlinks. This gives the user an overview of the target document without costing too much bandwidth before proceeding further.
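The three kinds of hyperlink in this example can be derived from the navigation order and the partition hierarchy, as in the sketch below. The function name build_links and the dictionary layout are hypothetical; the order and parent relationships are those of the six-partition example above, with the parent link written as "^" in the code.

```python
from typing import Dict, List, Optional


def build_links(order: List[str],
                parent: Dict[str, str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Return, for each document, its previous, next and parent targets."""
    links: Dict[str, Dict[str, Optional[str]]] = {}
    for i, doc in enumerate(order):
        links[doc] = {
            "-": order[i - 1] if i > 0 else None,               # previous in order
            "+": order[i + 1] if i + 1 < len(order) else None,  # next in order
            "^": parent.get(doc),                               # hierarchical parent
        }
    return links


# Navigation order and hierarchy taken from the example: D5 is the root page.
order = ["D0", "D1", "D2", "D3", "D4", "D5"]
parent = {"D0": "D5", "D1": "D5", "D2": "D3", "D3": "D5", "D4": "D5"}
links = build_links(order, parent)
```

Here the entry for D0 has no previous document, D1 as its next document, and D5 as its parent, while D2 follows D1 in order but has D3 as its parent.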
As to a further discussion of the manner of usage and operation of the present invention, the same should be apparent from the above description. Accordingly, no further discussion relating to the manner of usage and operation will be provided.
With respect to the above description then, it is to be realized that variations and extensions of the embodiment are deemed readily apparent and obvious to one skilled in the art, and all equivalent relationships to those illustrated in the drawings and described in the specification are intended to be encompassed by the present invention.
Therefore, the foregoing is considered as illustrative only of the principles of the invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation shown and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
Claims
1. A method for rectifying a markup document having one or more elements comprising:
- selecting a first portion of the markup document;
- partitioning a second portion of the markup document into one or more parts according to the first portion, the first portion being separate from the second portion;
- building a document tree according to a content tree and one or more element trees, wherein the content tree corresponds to the first portion, and wherein each of the one or more element trees corresponds to one of the one or more parts; and
- generating a rectified markup document in compliance with a markup language based on the document tree for a user interface presentation.
2. The method of claim 1, wherein the one or more elements include data elements, wherein each of the data elements corresponds to a physical partition of the presentation in the user interface.
3. The method of claim 2, wherein the selection comprises:
- determining if an element of the one or more elements is a data element, wherein each element of the first portion of the markup document is a data element.
4. The method of claim 2, wherein the data elements include a table element corresponding to a table in the user interface presentation.
5. The method of claim 2, wherein the first portion of the markup document includes a first data element associated with a tag name, wherein building the document tree comprises building the content tree based on the first portion of the markup document, wherein the content tree includes a first content tree node corresponding to the first data element and a first matching data element associated with the tag name, and wherein building the content tree comprises:
- comparing the first data element with one or more of the data elements in the first portion of the markup document; and
- determining the first matching data element based on the comparison.
6. The method of claim 5, wherein the first portion of the markup document includes a second data element, wherein the content tree includes a second content tree node corresponding to the second data element, wherein one of the one or more parts is partitioned from the one or more elements of the markup document according to the first matching data element and the second data element.
7. The method of claim 6, wherein the one or more elements are associated with an order, wherein each of the one or more elements is associated with a position in the order, wherein the first matching data element is associated with a first position in the order, wherein the second data element is associated with a second position in the order, and wherein each element in the one of the one or more parts is positioned between the first position and the second position in the order.
8. The method of claim 7, wherein building the document tree further comprises:
- building one of the element trees according to elements positioned between the first position and the second position in the order in the one or more elements.
9. The method of claim 5, wherein the one or more elements include the first matching data element.
10. The method of claim 5, wherein the one or more elements do not include the first matching data element and wherein the determination comprises:
- generating the first matching data element; and
- positioning the first matching data element among the one or more elements according to the order.
11. A machine-readable storage medium having instructions therein, which when executed by a machine, cause the machine to perform a method, the method comprising:
- selecting a first portion of the markup document;
- partitioning a second portion of the markup document into one or more parts according to the first portion, the first portion being separate from the second portion;
- building a document tree according to a content tree and one or more element trees, wherein the content tree corresponds to the first portion, and wherein each of the one or more element trees corresponds to one of the one or more parts; and
- generating a rectified markup document in compliance with a markup language based on the document tree for a user interface presentation.
12. The machine-readable storage medium of claim 11, wherein the one or more elements include data elements, wherein each of the data elements corresponds to a physical partition of the presentation in the user interface.
13. The machine-readable storage medium of claim 12, wherein the selection comprises:
- determining if an element of the one or more elements is a data element, wherein each element of the first portion of the markup document is a data element.
14. The machine-readable storage medium of claim 13, wherein the data elements include an image element corresponding to an image in the user interface presentation.
15. The machine-readable storage medium of claim 12, wherein the first portion of the markup document includes a first data element associated with a tag name, wherein building the document tree comprises building the content tree based on the first portion of the markup document, wherein the content tree includes a first content tree node corresponding to the first data element and a first matching data element associated with the tag name, and wherein building the content tree comprises:
- comparing the first data element with one or more of the data elements in the first portion of the markup document; and
- determining the first matching data element based on the comparison.
16. The machine-readable storage medium of claim 15, wherein the first portion of the markup document includes a second data element, wherein the content tree includes a second content tree node corresponding to the second data element, wherein one of the one or more parts is partitioned from the one or more elements of the markup document according to the first matching data element and the second data element.
17. The machine-readable storage medium of claim 15, wherein the one or more elements are associated with an order, wherein each of the one or more elements is associated with a position in the order, wherein the first matching data element is associated with a first position in the order, wherein the second data element is associated with a second position in the order, and wherein each element of the one of the one or more parts is positioned between the first position and the second position in the order.
18. The machine-readable storage medium of claim 15, wherein the one or more elements include the first matching data element.
19. The machine-readable storage medium of claim 15, wherein the one or more elements do not include the first matching data element and wherein the determination comprises:
- generating the first matching data element; and
- positioning the first matching data element among the one or more elements according to the order.
20. An apparatus for rectifying a markup document having one or more elements, the apparatus comprising:
- means for selecting a first portion of the markup document;
- means for partitioning a second portion of the markup document into one or more parts according to the first portion, the first portion being separate from the second portion;
- means for building a document tree according to a content tree and one or more element trees, wherein the content tree corresponds to the first portion, and wherein each of the one or more element trees corresponds to one of the one or more parts; and
- means for generating a rectified markup document in compliance with a markup language based on the document tree for a user interface presentation.
Type: Application
Filed: Dec 27, 2007
Publication Date: May 8, 2008
Inventor: Vincent Lue (Sunnyvale, CA)
Application Number: 12/005,589
International Classification: G06F 17/30 (20060101);