System and method for proofing individual documents of variable information document runs using document quality measurements
Text, images, and/or graphics of variable content documents should be organized and laid out by a template to provide effective and quality documents. The best such template depends upon the variable content interaction with the template for each individual document. To analyze the qualitative nature of the template in quantifiable terms, each variable content document is measured using various quantifiable factors, such as balance, uniformity, white space management, alignment, consistency, and legibility, that impact the qualitative nature of a document. Such quantifiable factors are then used to quantize the aesthetics, ease of use, eye-catching ability, interest, communicability, comfort, and convenience of the document, thereby giving a designer a measure of the quality of the template in the variable content document context.
When documents are created, many decisions must be made as to style, content, layout, and the like. The text, images, and graphics must be organized and laid out in a two-dimensional format with the intention of providing a presentation to the viewer which will capture and preferably maintain their attention for the time sufficient to get the intended message across. Different style options are available for the various content elements and choices must be made. The best choices for style and layout depend upon content, intent, viewer interests, etc. In order to tell if a set of choices made as to the look and feel of the final version of the document were good or bad, one might request feedback from a set of viewers after viewing the document and compile the feedback into something meaningful from which the document's creators or developers can make alterations, changes, or other improvements. This cycle repeats until the document's owners are satisfied that the final version achieves the intended result.
This method of designing a document may work well with a single, non-variable document, but it can be very labor intensive and/or time consuming to apply such a process to variable information documents.
Variable Information documents are documents that are personalized or tailored in some way to the particular user of the document. In traditional variable information applications, a graphic artist creates a template for the document, which describes the overall layout for the content items, the holes into which the variable content should be placed, and rules for how to fill in the variable slots with the content, or links to particular fields in a database. The variable data application then creates a document for each customer by inserting the data for the customer into its linked slot. The resulting set of documents can contain instances which don't work well with the designed template's desired quality and/or effectiveness.
Factors that contribute to the quality and effectiveness of a document are the document's layout and style. Conventionally, these factors have been measured using subjective measures, thereby adding to the labor and time needed to fully evaluate a document.
This may not be a significant problem when evaluating a production run of a non-variable content document because an evaluator needs only to look at a sample or proof to make a determination if the non-variable content documents, generated by the production run, will have the desired quality and effectiveness. However, if the need is to evaluate a production run of variable content documents, an evaluator would need to look at all the variable content documents on an individual basis because although each document may have started with a common template, the inclusion of the variable content into the documents makes each document unique. By requiring an evaluator to review each individual document to determine effectiveness and quality, one could not ever have an effective quality control process with respect to production runs of variable content documents.
Therefore, it is desirable to measure a variable-content document's effectiveness and quality without relying upon an evaluator's subjective evaluation. Moreover, it is desirable to provide a methodology to measure the quality of a variable-content document in a quantifiable way. It is also desirable to provide a quantifiable measurement of quality which is usable in evaluating a run of variable-content documents and in making time-effective decisions, on an individual document basis, as to whether a particular variable-content document of a run of variable-content documents meets the desired effectiveness and quality criteria.
A method for automatically identifying an unacceptable variable content document within a set of variable content documents generates a set of variable content documents using a pre-designed template having a desired layout and quality; measures a predetermined set of characteristics for each variable content document within the set of variable content documents; quantizes the measured predetermined set of characteristics for each variable content document within the set of variable content documents; generates a quantized quality score for each variable content document within the set of variable content documents; and identifies a variable content document as having an unacceptable quality when the quantized quality score of the variable content document is outside a predetermined range of values.
A method for automatically identifying an unacceptable variable content document within a set of variable content documents generates a set of variable content documents using a pre-designed template having a desired layout and quality; measures a predetermined set of characteristics for each variable content document within the set of variable content documents; quantizes the measured predetermined set of characteristics for each variable content document within the set of variable content documents; generates a quantized quality score for each variable content document within the set of variable content documents; and identifies a variable content document as having an unacceptable quality when the quantized quality score of the variable content document is statistically different from other quantized quality scores of the set of variable content documents.
A method for automatically identifying an unacceptable template to be used in creating a set of variable content documents generates a template to generate a set of variable content documents having a desired layout and quality; generates a set of variable content documents using the generated template and a pre-determined database; measures a predetermined set of characteristics for each variable content document within the set of variable content documents; quantizes the measured predetermined set of characteristics for each variable content document within the set of variable content documents; generates a quantized quality score for each variable content document within the set of variable content documents; identifies a variable content document having a worst unacceptable quality based upon the quantized quality scores; modifies the generated template; re-generates the variable content document having a worst unacceptable quality using the modified template; and determines if the re-generated variable content document has an acceptable quality.
The drawings are only for purposes of illustrating and are not to be construed as limiting, wherein:
For a general understanding, reference is made to the drawings. In the drawings, like references have been used throughout to designate identical or equivalent elements. It is also noted that the various drawings are not drawn to scale and that certain regions may have been purposely drawn disproportionately so that the features and concepts could be properly illustrated.
As discussed above, variable Information documents are documents that are personalized or tailored in some way to the particular user of the document. In conventional variable information applications, a graphic artist creates a template 1000, as illustrated in
A conventional variable data application then, using the designed template, creates a document for each customer by inserting the data for the customer into its linked window. However, the resulting set of documents, because each document is individualized due to the inserting of the variable content, can contain instances which don't work well with the designed template.
For instance, longer texts than anticipated may cause overlaps. Many conventional variable information applications have a “proofing” step to test out such situations. This conventional feature allows the document's creator to look at the document when instanced with the extreme case, the document that would have the longest text. The creator might then fix the template based on this proof in order to ensure no document would generate an undesired overlap.
Although such “proofing” may eliminate the overlap problem, the conventional “proofing” methods lack the capability to automatically identify how the various instances of the variable content relate to a quality standard based on design rules. In other words, the conventional “proofing” methods lack the capability to automatically detect whether the instance with the long text violates the intended design qualities of the template, such as balance, effectiveness, aesthetics, comfort, and/or eye catchability. Without a capability to automatically analyze and detect problems, a graphic artist is limited to looking at each individual instance to determine potential or actual quality problems.
As noted above, it is desirable to provide a proofing mechanism for variable data documents that can identify which instances of a template will have poor design qualities, and that also provides a means by which to suggest alternatives that improve those qualities. It is also desirable to provide a proofing mechanism for variable data documents that can automatically identify which documents of a production run will have poor design qualities and possibly tag or set those documents aside for a special run and/or template modification. It is also desirable to provide a proofing mechanism for variable data documents that can automatically identify which documents of a production run will have poor design qualities and automatically adjust the layout to drive the created document towards a higher quality.
A database containing records having the first names, last names, and appropriate image to insert into each instance of the document is accessed during the production run that creates the variable content documents. As noted above, certain records will work nicely with the template 1000, as originally designed, such as the example illustrated in
On the other hand, other records will not work well at all, such as illustrated in
In the example of
In a non-illustrated example, the individual's name could be much longer. In this example, the first name data could be larger than the first name slot, such that the first name data overextends the first name slot and overlaps the last name data within the last name slot. In such a situation, the rendering of the person's name becomes muddled, thereby creating a variable content document of an unacceptable quality.
One solution to the above situation is a fitness function to determine quantifiable measures of a document's quality, thereby enabling the scoring of each individual instance of the document against predetermined qualities so as to find the outliers, unacceptable quality documents, of the document set. In this solution, each member of the document set is automatically proofed and given a score. The document(s) with the worst score(s) is(are) identified to the user, along with the scores for each design quality.
More specifically, the method specifies a custom document as a constraint optimization problem and automatically creates the specified document using one of a set of many existing constraint optimization algorithms. The document is modeled as a constraint optimization problem which combines both required constraints with non-required design constraints that act as optimization criteria. One of a set of many existing constraint optimization algorithms is then used to solve the problem, resulting in an automatically generated document that is well designed because it has optimized some specified design criteria. In particular, a document template is represented as a constraint optimization problem, and therefore contains a set of variables, a value domain for each variable, a set of required constraints, and a set of desired constraints (i.e. optimization functions).
The areas of the document to be filled with content are modeled as problem variables, as are any parameters of the document that can be changed. The template specifies that there are two areas that should be filled with content: areaA and areaB. The template also specifies that the positions and sizes of areaA and areaB can be changed. Thus, the problem variables for this example are: areaA, areaB, areaA-topLeftX, areaA-topLeftY, areaB-topLeftX, areaB-topLeftY, areaA-width, areaA-height, areaB-width, areaB-height.
The constraint optimization formulation further specifies that each problem variable has a value domain consisting of the possible values to assign to that variable. For variables that are document areas to be filled with content, the value domains are the content pieces that are applicable to each area. For variables that are document parameters, the value domains are discretized ranges for those parameters, so that each potential value for the parameter appears in the value domain. For variables whose value domains are content pieces, the default domain is set up to be all possible content pieces in the associated content database, which is specified in the document template.
The required constraints specify relationships between variables and/or values that must hold in order for the resulting document to be valid. The desired constraints specify relationships between variables and/or values that we would like to satisfy, but aren't required in order for the resulting document to be valid. Constraints may be unary (apply to one value/variable), binary (apply to two values/variables), or n-ary (apply to n values/variables), and are entered by the user as part of the document template. An example of a required unary constraint in the document domain is: areaA must contain an image of a castle. An example of a required binary constraint is: areaA-topLeftY+areaA-height<areaB-topLeftY. If the process had another variable (areaC), an example of a required 3-ary constraint is: areaA-width+areaB-width>areaC-width. In a variable data application, the constraints could also refer to customer attributes (e.g., areaA must contain an image that is appropriate for customer1.age).
Desired constraints are represented as objective functions to maximize or minimize. For example, a desired binary constraint might be the objective function: f=areaA-width*areaA-height, to be maximized. If more than one objective function is defined for the problem, the problem becomes a multi-criteria optimization problem. In that case, the individual objective function scores are summed to produce the overall optimization score for a particular solution. Furthermore, each of the desired constraints can be weighted with a priority, so that the overall optimization score then becomes a weighted sum of the individual objective function scores.
Any one of the known existing constraint optimization algorithms is then applied to create the final output document. A genetic algorithm (one of the many possible constraint optimization algorithms) can be used for doing the constraint optimization and thereby automatically creating a final output document that adheres not only to the required constraints, but also to a set of desired constraints.
In a genetic algorithm formulation of constraint optimization for document creation, the genome is built such that each gene in the genome is a variable of the constraint problem. The unary constraints are used to set up the allowable value domains for each gene. These can be some default range, or input by the user.
The fitness function is defined such that it returns a fitness of 0 for any population members that do not meet the required constraints, and for the members that do meet the required constraints, it returns a fitness score that is a sum of the scores of the individual desired constraints. For instance, if the required constraints are:
C1: areaA-width≤300
C2: areaB-width≤300
And the desired constraints are:
C3: areaA-width=areaB-width, to be maximized (ranges from 0 to 1)
C4: areaA-height=areaB-height, to be maximized (ranges from 0 to 1)
Examples of fitness functions for these desired constraints could be
f3=1−|areaA-width−areaB-width|/(areaA-width+areaB-width)
f4=1−|areaA-height−areaB-height|/(areaA-height+areaB-height)
If a population member has areaA-width=350, areaA-height=350, areaB-width=400, areaB-height=200, the fitness function returns a score of 0 because the required width constraints are violated. If, however, a population member has areaA-width=300, areaA-height=200, areaB-width=300, areaB-height=200, the fitness function returns a score of 2. If a population member has areaA-width=225, areaA-height=200, areaB-width=300, areaB-height=200, then f3=1−75/525≈0.857 and f4=1, so the fitness function returns a score of approximately 1.857.
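The fitness function described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys and the names `similarity` and `fitness` are hypothetical, the height term is normalized by the sum of the two heights, and the width limit of 300 is treated as inclusive because the worked examples score members whose width is exactly 300.

```python
def similarity(a, b):
    """Score in [0, 1]: 1 when a == b, lower as the two sizes diverge."""
    return 1 - abs(a - b) / (a + b)


def fitness(member, max_width=300):
    """Return 0 if a required constraint fails, else the sum f3 + f4."""
    # Required constraints C1 and C2. Assumption: the limit is inclusive,
    # since the worked examples accept a width of exactly 300.
    if member["areaA_width"] > max_width or member["areaB_width"] > max_width:
        return 0.0
    f3 = similarity(member["areaA_width"], member["areaB_width"])    # desired C3
    f4 = similarity(member["areaA_height"], member["areaB_height"])  # desired C4
    return f3 + f4


population = [
    {"areaA_width": 350, "areaA_height": 350, "areaB_width": 400, "areaB_height": 200},
    {"areaA_width": 300, "areaA_height": 200, "areaB_width": 300, "areaB_height": 200},
    {"areaA_width": 225, "areaA_height": 200, "areaB_width": 300, "areaB_height": 200},
]
scores = [fitness(m) for m in population]
```

Under these assumptions the three members evaluate to 0, 2, and approximately 1.857, respectively.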
The formulation may also be extended to allow weighting of the various desired constraints. Thus, the document creator can specify that certain desired constraints are more important than others. For instance, the constraint C3 could be weighted with an importance of 1.5, and C4 weighted with an importance of 0.5, meaning that the two objects having the same width is more important than the two objects having the same height. The fitness function's overall score is then computed as a weighted sum of the individual desired constraints.
For instance, if a population member has areaA-width=225, areaA-height=200, areaB-width=300, areaB-height=200; desired constraint C3 returns 0.857, which is multiplied by C3's weight of 1.5, to get 1.286. Desired constraint C4 returns 1, which is multiplied by C4's weight of 0.5, to get 0.5. The overall fitness score is then 1.286+0.5=1.786.
If, on the other hand, a population member has areaA-width=300, areaA-height=200, areaB-width=300, areaB-height=150; desired constraint C3 returns 1, which is multiplied by C3's weight of 1.5 to get 1.5. Desired constraint C4 returns 0.857, which is multiplied by C4's weight of 0.5, to get 0.429. The overall fitness score is then 1.5+0.429=1.929, thereby preferring the solution that violates C3 the least.
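The weighted variant can be sketched the same way. The function and parameter names below (`weighted_fitness`, `w3`, `w4`) are hypothetical, the height term is assumed to be normalized by the sum of the two heights, and the width limit is assumed to be inclusive.

```python
def similarity(a, b):
    """Score in [0, 1]: 1 when a == b, lower as the two sizes diverge."""
    return 1 - abs(a - b) / (a + b)


def weighted_fitness(member, w3=1.5, w4=0.5, max_width=300):
    """Weighted sum of the desired constraints, or 0 if a required one fails."""
    # Assumption: widths of exactly max_width satisfy C1 and C2.
    if member["areaA_width"] > max_width or member["areaB_width"] > max_width:
        return 0.0
    f3 = similarity(member["areaA_width"], member["areaB_width"])    # equal widths
    f4 = similarity(member["areaA_height"], member["areaB_height"])  # equal heights
    return w3 * f3 + w4 * f4


width_mismatch = {"areaA_width": 225, "areaA_height": 200,
                  "areaB_width": 300, "areaB_height": 200}
height_mismatch = {"areaA_width": 300, "areaA_height": 200,
                   "areaB_width": 300, "areaB_height": 150}

score_width_mismatch = weighted_fitness(width_mismatch)
score_height_mismatch = weighted_fitness(height_mismatch)
```

With weights of 1.5 and 0.5, the member whose widths match scores higher (about 1.929) than the member whose heights match (about 1.786), which reflects the stated priority of C3 over C4.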
One solution to the situation illustrated in
A further solution to the situation illustrated in
In this example, the thresholds may be absolute quality measures, or alternatively, the document instances can be compared to one another, rather than to the global limits, in order to determine which documents are statistically significant outliers of the population. This would alleviate the need for a human to determine which set of qualities are the most desired in the output document.
Rather, the system would simply measure each document instance, and find those variable content documents that are statistically different than the rest, indicating a potential problem. These statistically significant variable content documents would then be reported to the user, and used as starting points for iterating through new options.
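Both ways of flagging documents, against absolute limits and against the rest of the run, can be sketched as follows. The function names, the sample scores, and the z-score cutoff of 2.0 are illustrative assumptions; the text does not prescribe a particular statistical test.

```python
from statistics import mean, pstdev


def outside_range(scores, low, high):
    """Indices of documents whose score falls outside absolute limits."""
    return [i for i, s in enumerate(scores) if not (low <= s <= high)]


def statistical_outliers(scores, z_limit=2.0):
    """Indices of documents whose score deviates strongly from the run.

    Assumption: 'statistically different' is approximated by a z-score
    test against the population mean and standard deviation.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:  # every document scored the same, so nothing stands out
        return []
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > z_limit]


# Hypothetical quality scores for a run of eight document instances.
run_scores = [0.82, 0.85, 0.80, 0.84, 0.83, 0.81, 0.35, 0.86]
flagged_absolute = outside_range(run_scores, 0.5, 1.0)
flagged_relative = statistical_outliers(run_scores)
```

In this made-up run both tests flag only the seventh document (index 6). The relative test needs no hand-chosen limits, which is the advantage noted above, but it can miss problems when most of the run is poor.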
It is noted that the basic quality measures can be combined into more comprehensive scores. Combined scores such as these can be presented to the user in addition to, or as an alternative to, the basic quality measures. The methodologies used to measure qualities such as aesthetics, ease of use, convenience, interest, communicability, comfort, and eye-catching ability in a quantifiable manner will be discussed below.
Options for improving the document can also be suggested in addition to reporting quality scores. This can be realized by using the genetic fitness function algorithm described above. The genetic fitness function algorithm iterates through options for variations on the worst case document instance until it finds one with a better overall score than the original and presents the better one(s) as new options. Note that the variations might be applied to all instances of the document to ensure that one has not created a worse score for one of the other instances in the effort to improve the current worst case.
Another possible methodology is to realize a proofing mechanism for variable data documents that can automatically identify which documents of a production run will have poor design qualities by automatically measuring each instance of the template against a set of design criteria and determining which instances are “bad” according to those criteria. Once identified, the bad instances could be automatically “fixed.”
Various methods for quantifying various document properties to assist document developers in determining document quality will be discussed below. Quality can have several competing aspects and the overall quality can depend not only on the absolute properties of the document, but also on the relative importance of these properties to the beholder. One aspect or class of document quality is its aesthetics, which is its beauty, the degree to which pleasure can be derived from its appearance. Often this property is manifested in the degree of displeasure generated by an ugly layout.
Another aspect or class contributing to the quality of a document is the effectiveness with which it communicates information to the user. Documents are vessels of information, and the ease with which the viewer can gather and understand the information can be an important factor in how well the document does its job. A third aspect or class that contributes to the quality of a document is its ease of use. A factor that contributes to the ease of use is how convenient the document is, that is, whether it can be used with a minimum of effort. A second factor contributing to overall ease of use is content grouping. Information often has some logical organization and documents can reflect this organization by grouping the content. The effectiveness with which the document conveys this grouping and enables the viewer to capitalize on it contributes to the ease of use. A fourth aspect or class that enters into document quality is the degree to which the user is comfortable with it. Documents that create anxiety are generally not as desirable as those that the viewer finds soothing and familiar. A fifth aspect or class that is an important contributor to the quality of some documents is the degree to which they can catch the eye of the viewer. Advertisements, for example, strive to capture the attention and not to be easily overlooked. A sixth aspect or class that is similar is the ability of the document to maintain interest. It is one thing to capture the attention, but another to hold it and to avoid boredom as the document is used. A seventh aspect or class of quality can be the economy of the document, both to the creator and to the viewer. If the other contributors to quality are the same, then a lower cost version of a document is generally considered better than a more expensive one. While other factors may also contribute to document quality, the measuring of these seven aspects or classes provides a good basis for evaluating document quality.
The aspects or classes listed as contributing to document quality (with the exception of economy) are usually considered soft and ill-defined concepts; however, these properties can be quantified. The method for measuring and quantifying these attributes is to first identify document features that contribute to the property. Quantifiable measures of the individual features are then devised. And finally, the individual feature values are combined to form an overall score for the more abstract property. A full discussion of the quantization of document quality is set forth in co-pending U.S. patent application Ser. No. 10/881,157, filed Jun. 30, 2004. The entire content of co-pending U.S. patent application Ser. No. 10/881,157 is hereby incorporated by reference.
The display 93 may display the document or portion thereof that is being quantized with respect to quality. The display 93 may also display the various options that a user can choose through the user interface 94 with respect to the classes that the user wishes to quantize or the various parameters that a user can choose through the user interface 94, which are to be measured within the chosen quantization class.
The quantization architecture of
On the other hand
Each value thereof is based on properties inherent in the document itself. The values are individually combined into an overall value or score for the document. Other methods for measuring, assigning, or otherwise associating a quantifiable value for document quality should be considered for determining a value for document quality. Each rule may be defined to produce a value ranging between 0 and 1 such that 0 means low value and 1 means high value. This enables quantized quality values to be calculated and combined to form the overall document quality measure. If Vi is the value calculated for the ith rule, the document quality measure VQ is formed as a function E of these contributions such that: VQ=E(V1, V2, . . . VN). The combining function E can be as simple as a weighted average of the contributions. However, because any bad contributor can ruin the document quality no matter how good the others are, a linear combination is not preferred.
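The difference between a linear and a non-linear combining function E can be illustrated with a short sketch. The weighted average follows the description above. The weighted geometric mean is only one possible non-linear choice, introduced here as an assumption for illustration and not taken from the text. Both function names are hypothetical.

```python
import math


def weighted_average(values, weights):
    """Linear combination: a single bad value is diluted by the good ones."""
    total = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total


def weighted_geometric_mean(values, weights):
    """Assumed non-linear alternative: a value near 0 pulls the result down."""
    total = sum(weights)
    return math.prod(v ** (w / total) for v, w in zip(values, weights))


# Three good rule values and one bad one, equally weighted (made-up numbers).
rule_values = [0.9, 0.9, 0.9, 0.1]
rule_weights = [1, 1, 1, 1]

linear_score = weighted_average(rule_values, rule_weights)
nonlinear_score = weighted_geometric_mean(rule_values, rule_weights)
```

Here the linear score stays near 0.7 despite the bad contributor, while the geometric mean drops to roughly 0.52 and falls to 0 if any rule value is 0, which is the kind of behavior the preference against a linear combination is aiming for.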
For the case of document aesthetics, the methods herein are used to generate quantifiable values for the contributing features of: balance, uniformity, white-space fraction, white-space free-flow, alignment, regularity, page security, and/or aspect ratio (optimal proportionality). As illustrated in
As illustrated in
If the visual center of the page 116 is at $(x_c, y_c)$ and the maximum x and y distances (117 shows the x distance) an object can be from the visual center 102 are $d_x$ and $d_y$, a balance value can be calculated as: $V_{OB} = 1 - \left[\left(\left((x_m - x_c)/d_x\right)^2 + \left((y_m - y_c)/d_y\right)^2\right)/2\right]^{1/2}$. Note that one can, in a similar way, compute the balance of subclasses of objects by considering only objects belonging to the subclasses. For example, one could compute the visual balance of all pictorial images on the page, or the visual balance of all text blocks. For left-right balance, the center of visual weight (118 of
If a content object spans both the left and right sides of the page, for the purposes of this calculation, the object is divided along the vertical centerline of the page. The left and right divisions of the object are then entered into the left and right sums, respectively. If the page height is $d_h$, a left-right balance value is: $V_{LR} = 1 - \left[\left(\left((x_m - x_c)/d_x\right)^2 + \left((y_L - y_R)/d_h\right)^2\right)/2\right]^{1/2}$. It is noted that other definitions are possible. One might, for example, raise these balance values to powers in order to express the idea that balance is non-linear. Ideally, one would perform the psychophysical experiments to measure human response to balance and define a function that matches that response. The above expressions make use of the visual weight of an object. To first order, this can be defined as the object's area times its optical density. However, other psychological effects can also be included. Examples include color carrying more weight than gray, round shapes carrying more weight than rectangular ones, and positioning at the top of the page giving more weight than positioning at the bottom.
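The overall balance value can be sketched as follows. The sketch assumes that the center of visual weight (xm, ym) is the weight-averaged position of the object centers, and it uses the first-order visual weight (area times optical density) described above. The function names and the sample page are hypothetical.

```python
import math


def visual_weight(area, optical_density):
    """First-order visual weight: the object's area times its optical density."""
    return area * optical_density


def center_of_weight(objects):
    """Weighted centroid of (x_center, y_center, area, optical_density) tuples.

    Assumption: the center of visual weight is the weight-averaged position
    of the object centers.
    """
    total = sum(visual_weight(a, d) for _, _, a, d in objects)
    xm = sum(x * visual_weight(a, d) for x, _, a, d in objects) / total
    ym = sum(y * visual_weight(a, d) for _, y, a, d in objects) / total
    return xm, ym


def overall_balance(objects, xc, yc, dx, dy):
    """Balance value: 1 minus the normalized RMS offset from the visual center."""
    xm, ym = center_of_weight(objects)
    return 1 - math.sqrt((((xm - xc) / dx) ** 2 + ((ym - yc) / dy) ** 2) / 2)


# Hypothetical 100 x 100 page whose visual center is taken to be (50, 50).
symmetric = [(25, 50, 100, 0.5), (75, 50, 100, 0.5)]
lopsided = [(25, 25, 100, 1.0)]
balanced_value = overall_balance(symmetric, 50, 50, 50, 50)
lopsided_value = overall_balance(lopsided, 50, 50, 50, 50)
```

Two equal objects placed symmetrically about the visual center give a balance of 1, while a single object pushed into one corner region gives 0.5 in this example.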
As illustrated in
As illustrated in
A non-uniformity value can be calculated as the difference between the visual density for the portion of the page and the average page density, which is squared and weighted by the portion's area. Subtracting this from 1 gives a uniformity value. In other words, the uniformity value can be defined as $V_{NU} = 1 - \left(\sum_i (D_i - D_{av})^2 A_i\right) / \sum_i A_i$, where $D_i$ and $A_i$ are the visual density and area of the i-th portion and $D_{av}$ is the average page density. The average page density can be calculated for each page individually, or an overall average page density can be determined from the visual weight of all objects on portions of all pages and the area of all pages.
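The uniformity calculation can be sketched as below. The sketch assumes the average density is computed per page as total visual weight divided by total area, with each portion's visual weight taken as its density times its area. The function name and the sample densities are illustrative.

```python
def uniformity(portions):
    """Uniformity of a page given (visual_density, area) pairs for its portions.

    Assumption: the average density D_av is the total visual weight
    (density times area, summed) divided by the total area of the page.
    """
    total_area = sum(area for _, area in portions)
    d_av = sum(density * area for density, area in portions) / total_area
    non_uniformity = sum((density - d_av) ** 2 * area
                         for density, area in portions) / total_area
    return 1 - non_uniformity


even_page = [(0.5, 10), (0.5, 10)]    # both halves equally dense
uneven_page = [(0.0, 10), (1.0, 10)]  # one half empty, one half solid
even_value = uniformity(even_page)
uneven_value = uniformity(uneven_page)
```

An evenly covered page scores 1, and the half-empty, half-solid page scores 0.75 because each portion deviates from the average density by 0.5.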
As illustrated in
In
As illustrated in
The class of trapped white space is primarily concerned with relatively large blocks of white space. One way that efficiency, as used herein, can be improved is by performing a trapped white space analysis at a coarse resolution. The approach taken is to determine the area of all white space that can be accessed directly from the margins. This area then gets added to the area of the content objects (110 of
For points from the left to right boundary, the value stored in the TopProf array is compared to the top boundary and the array value is replaced with the top value if top is smaller. The difference between the bottom boundary and the page height is compared to the BottomProf array value and updated with the smaller result. Total white space area (125 of
As illustrated in
In other words, if a position has n edges contributing, n−1 separations exist between edges of distance zero. As such, there should be a contribution of n−1 from an entry count of n as well as the contribution from the separations between neighboring entry positions. If the total number of components were NumberOfObjects, the maximum contribution, if they were all perfectly aligned, would be NumberOfObjects−1. Divide by this value to normalize the score so that the final result ranges between 0 and 1.
The alignment, as illustrated in
As illustrated in
Advantageously, these regularity measures can be combined into the document quality measure as: VRH and VRV where VRH=preg calculated when EdgeCount is filled with left edge positions and VRV=preg calculated when EdgeCount is filled with top edge position. An overall position regularity value can be defined as a weighted sum of the horizontal and vertical contributions. Other measures of the effect of position regularity on document aesthetics and on document quality are envisioned, for example, a function of measured responses to differing position regularities. A uniform separation between objects can also be calculated to determine document quality. This is a measure of spacing regularity preferably calculated in a manner similar to alignment and positional regularity. However, in this instance, the array of data values corresponding to EdgeCount, contains the histogram of spacing values between objects. To determine spacing values for horizontal spacing regularity for each object, first determine the closest object (if any) that lies to the right and which overlaps in the vertical direction. The spacing then becomes the distance from the right edge of the current object and the left edge of that object's neighbor. A similar calculation determines separations for the vertical direction. If performance is an issue, an approximation of spacing can be created without the cost of identifying object neighbors by examining arrays of edge positions (as were generated for the alignment calculation). For horizontal spacing, step through the array of right edge positions. For each position determine the first left edge to the right of this location from the left edge array. The separation value becomes the distance between the right and left edge positions. To account for the possibility that more than one object may have an edge at these locations, enter into the histogram the product of the count of edges from the right and left edge histograms at these locations. 
The sum of these products is then used to normalize the final result instead of NumberOfObjects as in the alignment calculation. For vertical separations the calculation is analogous with the use of top and bottom edge values. An approximation of the vertical spacing histogram is determined in the same manner using the top and bottom edge-position arrays. Advantageously, regularity measures can be combined into the document quality measure as: VSH and VSV where VSH=sreg when SpacSepCount is computed from left and right edges, while VSV=sreg when SpacSepCount is computed from top and bottom edges. An overall separation regularity measure can be defined as the weighted sum of the horizontal and vertical contributions. Other measures of the effect of spacing regularity on document aesthetics and on document quality are envisioned, for example, a function of measured responses to differing spacing regularities.
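The edge-array approximation of horizontal spacing described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of per-object edge lists as input, and the reading of "to the right" as strictly greater are not taken from the source.

```python
from bisect import bisect_right
from collections import Counter


def approx_spacing_histogram(right_edges, left_edges):
    """Approximate the horizontal spacing histogram from edge positions.

    right_edges and left_edges hold one position per object (hypothetical
    input format). Returns (histogram, total), where total is the sum of
    the entered products and serves as the normalizing value.
    """
    right_counts = Counter(right_edges)
    left_counts = Counter(left_edges)
    left_positions = sorted(left_counts)

    histogram = Counter()
    for r, r_count in right_counts.items():
        # Assumption: "first left edge to the right" means strictly greater.
        i = bisect_right(left_positions, r)
        if i == len(left_positions):
            continue  # no object lies to the right of this edge
        nearest_left = left_positions[i]
        # Enter the product of the edge counts at the two locations.
        histogram[nearest_left - r] += r_count * left_counts[nearest_left]

    total = sum(histogram.values())
    return histogram, total
```

The vertical case would pass bottom and top edge positions in place of right and left edge positions.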
As illustrated in
As illustrated in
For the case of document ease of use, the methods herein are used to generate quantifiable values for the contributing features of: separability, distinguishability, locatability, searchability, and/or group identity. As illustrated in
Once the document content of interest has been identified, content needs to be characterized, as illustrated in
First a neighbor list associated with content group G is initialized to an empty list. The content tree is traversed upward to identify branches neighboring content group G. The content tree is then traversed downward such that elements of the identified content branches can be examined. Branches are pruned that are considered to exceed a predetermined distance from the node of the group G. Only branches considered as ‘nearby’ are recursively analyzed. Although the process described herein involves identifying neighbors N of group G, it should be understood that nothing requires group G to actually comprise a group of content as group G can be a single element (paragraphs, images, etc.) of content. The procedure IsNeighbor(G,N) is used herein to ascertain whether or not a node N is within a threshold distance of content group G, such that node N is to be considered a neighbor N of group G. This can be readily effectuated by calculating a distance between group G and neighbor N and comparing that distance to a threshold variable CloseEnough so as to determine whether Distance(G,N)<CloseEnough.
Distance can be the distance between content borders or alternatively the distance between content centers. With respect to the former, if the content centers of group G are $(x_G, y_G)$ and of neighbor N are $(x_N, y_N)$, and the widths and heights of group G and neighbor N are $(w_G, h_G)$ and $(w_N, h_N)$ respectively, then distance can be readily computed by the relationship: $\max(|x_G - x_N| - (w_G + w_N)/2,\ 0) + \max(|y_G - y_N| - (h_G + h_N)/2,\ 0)$. More complex distance calculations such as minimum Euclidean distance between corners can also be used. The threshold CloseEnough can either be a constant or be adjustable with respect to content size. One can use the square root of the area of object G to determine a threshold value such that: $\mathrm{CloseEnough} = (\mathrm{Area}(G))^{1/2}$. This also can be scaled by factor S where S is typically close to 1 such that: $\mathrm{CloseEnough} = S \cdot (\mathrm{Area}(G))^{1/2}$. The methods provided for evaluating distance or determining threshold are not to be considered as limiting in scope. Other methods for determining a distance measure between content objects may also be used in the context of evaluating document quality.
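As a minimal sketch of the border-distance test, the following assumes each content object is described by a hypothetical tuple of center coordinates, width, and height, and that Area(G) is the area of that bounding rectangle. The function names are illustrative only.

```python
import math


def border_distance(xg, yg, wg, hg, xn, yn, wn, hn):
    """Distance between the borders of G and N, from centers and sizes."""
    dx = max(abs(xg - xn) - (wg + wn) / 2, 0)
    dy = max(abs(yg - yn) - (hg + hn) / 2, 0)
    return dx + dy


def close_enough(wg, hg, s=1.0):
    """Threshold S * sqrt(Area(G)); area taken as width * height (assumption)."""
    return s * math.sqrt(wg * hg)


def is_neighbor(g, n, s=1.0):
    """g and n are (x_center, y_center, width, height) tuples."""
    return border_distance(*g, *n) < close_enough(g[2], g[3], s)
```

For example, a 10 by 10 object centered at the origin and a 4 by 10 object centered at (12, 0) are 5 units apart at their borders, which is under the threshold of 10.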
The depth in the tree of neighbor node N relative to content group G can be obtained by adding a depth d parameter wherein d+1 is passed in the recursive call to TraverseUp and wherein depth d−1 is passed in the recursive call to TraverseDown. The initial value of depth for d would be zero, i.e., TraverseUp(G, G, 0). Depth can be stored along with other information on the previously described list of neighbor nodes of group G. Once the document's content has been parsed and neighboring content has been identified for all content objects of interest, various properties respecting content separation can then be determined which will be subsequently used to quantify document quality.
As illustrated in
More specifically, the effective separability, as illustrated in
As illustrated in
As illustrated in
As illustrated in
When the two objects are both the same type, then one can compare the style values of one object to the corresponding style value of the other. For each style value pair one can calculate a style difference. For numeric parameters such as font size or line spacing, the style difference can be calculated as just the absolute difference of the size values. For multidimensional values such as color, the style difference can be the distance between the values. For enumerated values such as quadding, font family or font style one can use a two-dimensional look-up table indexed by the enumerated values for the two objects to retrieve the difference. An overall style separation difference becomes the weighted sum of the various style differences available for the object type. For example: $\mathrm{StyleSep} = \sum_i w_i\, d_i(G, N)$; where the sum is over available style parameters i, $w_i$ is the weight of the ith style parameter, and $d_i$ is the difference measure for the ith style parameter.
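The weighted sum of style differences can be sketched as below. The parameter names, the weights, and the contents of the look-up table are all hypothetical; only the three kinds of difference (absolute, distance, table look-up) come from the text.

```python
import math

# Hypothetical two-dimensional look-up table for an enumerated parameter.
FONT_FAMILY_DIFF = {
    ("serif", "serif"): 0.0,
    ("serif", "sans"): 1.0,
    ("sans", "serif"): 1.0,
    ("sans", "sans"): 0.0,
}

# One difference function per style parameter.
DIFF_FNS = {
    "font_size": lambda a, b: abs(a - b),            # numeric value
    "color": lambda a, b: math.dist(a, b),           # multidimensional value
    "font_family": lambda a, b: FONT_FAMILY_DIFF[(a, b)],  # enumerated value
}


def style_sep(style_g, style_n, weights):
    """StyleSep as the weighted sum of per-parameter style differences."""
    return sum(
        w * DIFF_FNS[name](style_g[name], style_n[name])
        for name, w in weights.items()
    )
```

In practice the numeric differences would likely be normalized so that no single parameter dominates the sum; that choice is left open here.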
As illustrated in
As illustrated in
One of these InherentSep values may be more appropriate for neighbor N depending upon whether N is mostly above, below, left, or right of object G. Note that neighbor N will also have an inherent separation. Thus, the complementary inherent separations from both object G and neighbor N can be combined as well. For example, if neighbor N is substantially above object G, then use the sum of InherentSepTop of G and InherentSepBottom of N.
As illustrated in
In this embodiment, this means combining separation values for each neighbor. Total separation can be given by: $\mathrm{TotalSep} = \min_i(\mathrm{EffectiveSep}_i)$; where $\mathrm{EffectiveSep}_i$ is the EffectiveSep value for the ith neighbor, and the minimum is taken over all neighbors. An overall separability measure for a document is determined by combining total separations for all document content objects and groups. This can be done by a straight average. However, any object or group with a low separability value may adversely impact the value for the entire document and therefore should be given a higher weight, for example by combining as the root of powers. Separability may vary with the level in the content tree hierarchy at which an object exists. An algorithm for computing separability by recursively traversing the content tree is provided herein which calculates a weighted average using weights $w_L$ which vary with the content's tree level L.
As illustrated in
On the other hand, a measure of distinguishability of these two would be low because absent neighboring objects, providing a frame of reference, few clues are provided as to which of the two paragraphs are actually being looked at. A heading can distinguish the content that follows, as illustrated in
Object G and neighbor N should be distinguishable based on content type and value, as illustrated in
For example, for paragraphs, the number of words or characters thereof can be counted. For lists, the number of list elements can be compared. For tables, the number of rows and columns can be compared. For graphic objects, size and shape can be compared. Since some object types may have several properties by which differences are measured, an overall difference is preferably calculated as a weighted sum of the various content differences for an object type. For example, $\mathrm{ContentDistinguish} = \sum_i w_i\, cd_i(G, N)$, where the sum is over the available content difference measures i, $w_i$ is the weight for the ith content difference measure, and $cd_i$ is the actual ith difference measure. Furthermore, objects can be distinguished by their position on their respective pages, as illustrated in
For example: $\mathrm{PositionDistinguish} = \left(\left((x_G - x_N)^2 + (y_G - y_N)^2\right) / \left(W_P^2 + H_P^2\right)\right)^{1/2}$. This can be further limited by only considering nearby neighbors on the same page. The same list of neighbors generated for separability can then be utilized. The cost in limiting comparisons to objects on a page, however, is the failure to recognize cases where objects on different pages are indistinguishable. If any of AlignmentSep, StyleSep, BackgroundSep and ContentDistinguish measures, (described above), provides a strong difference, then the overall effective distinguishability should be high. The closer the neighbor is to the object, the easier it should be to observe their differences. The end result should receive a boost from the SpatialSep. The value of PositionDistinguish can be a further differentiator. If boost b is defined by: $b = d/(d + \mathrm{SpatialSep})$; where the d parameter controls the strength of the boost effect of spatial nearness, then: $\mathrm{EffectiveDistinguish} = c - \left[w_a (c - b \cdot \mathrm{AlignmentSep})^{-p} + w_s (c - b \cdot \mathrm{StyleSep})^{-p} + w_b (c - b \cdot \mathrm{BackgroundSep})^{-p} + w_c (c - b \cdot \mathrm{ContentDistinguish})^{-p} + w_p (c - \mathrm{PositionDistinguish})^{-p}\right]^{-1/p}$; where $w_a$, $w_s$, $w_b$, $w_c$ and $w_p$ are weighting values that give the relative importance of the alignment, style, background, content and position differences respectively and should sum to 1. The constant c is slightly larger than 1 to prevent division by zero. Note that this is the effective distinguishability between an object and one of its neighbors. To quantify the total distinguishability of a content object, it must be distinguished from all neighbors. In addition, any inherent features such as headers must also be considered. Total distinguishability can be determined by taking the minimum of all EffectiveDistinguish values for all neighbors. A combination of distinguishability measures, as illustrated in
As illustrated in
As illustrated in
If colors are specified in red, green and blue (R,G,B) coordinates normalized to range between 0 and 1 then luminance can be given by: Y=yr R+yg G+yb B; where yr, yg and yb are the luminance values for the red, green and blue primary colors respectively. The yr, yg and yb values depend upon the details of the color space actually used but typical values are 0.25, 0.68 and 0.07 respectively. Contrast is calculated from the luminance of the foreground Yf and that of the background Yb such that: Contrast=2|Yb−Yf|/(Yb+Yf). It should be pointed out that since both contrast and size affect visibility, these values are combined by multiplying them together. While contrast ranges between 0 and 1, size can be unbounded. For a size to be bounded by 0 and 1, the object size is normalized by dividing it by the maximum size it can be. For example: visibility=contrast*(object area)/(maximum area). In general, this is the area of the document. But, if objects are restricted to a page, the page size can be used.
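The luminance, contrast, and visibility relationships above can be sketched as follows. The default primary luminances are the typical values quoted in the text; the function names and the guard for two black colors are assumptions.

```python
def luminance(r, g, b, yr=0.25, yg=0.68, yb=0.07):
    """Luminance Y from (R, G, B) components normalized to the range 0..1."""
    return yr * r + yg * g + yb * b


def contrast(y_foreground, y_background):
    """Contrast = 2|Yb - Yf| / (Yb + Yf)."""
    total = y_foreground + y_background
    if total == 0:
        return 0.0  # assumption: two black colors have no contrast
    return 2 * abs(y_background - y_foreground) / total


def visibility(y_foreground, y_background, object_area, maximum_area):
    """Contrast multiplied by the object's size normalized to 0..1."""
    return contrast(y_foreground, y_background) * object_area / maximum_area
```

As written, the contrast expression can exceed 1 when one of the two luminances is close to zero. The sketch does not clamp the value.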
As illustrated in
The structural contribution to locating a group member is combined with the distinguishability contribution. A weighted sum of the two contributions can be used, where the weights determine the relative importance of the two factors. However, it can be argued that if either contribution allows one to locate the element, then the overall result should be high, regardless of the other contribution. The combined result should also be reduced according to the size of the group. This can be achieved by: $\mathrm{MemberLocate} = \left(c - \left[w_m (c - \mathrm{StructLocate})^{-p} + (1 - w_m)(c - \mathrm{DistinguishLocate})^{-p}\right]^{-1/p}\right) \cdot \mathrm{GroupSizeFactor}$; where $w_m$ is the weight of the structural contribution relative to the distinguishability contribution, c is a constant slightly larger than 1 and p is a number greater than 1.
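A short sketch shows how the root-of-powers combination behaves. The default values chosen for wm, c, and p are illustrative assumptions, as is the function name.

```python
def member_locate(struct_locate, distinguish_locate, group_size_factor,
                  wm=0.5, c=1.01, p=2.0):
    """Combine two 0..1 contributions so that a high value in either dominates.

    c is slightly larger than 1 so that (c - value) never reaches zero.
    """
    inner = (wm * (c - struct_locate) ** -p
             + (1 - wm) * (c - distinguish_locate) ** -p)
    return (c - inner ** (-1 / p)) * group_size_factor
```

When both contributions are equal the result equals that common value. When one contribution is 1 and the other is 0, the result stays close to 1 rather than falling to the simple average of 0.5.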
A combination of locatability measures, as illustrated in
A further combination of locatability measures, as illustrated in
As illustrated in
An overall locatability for a document is determined by combining the total locatability for all document content objects and groups. The simplest way to combine these values is a straight average. Just as for separability and distinguishability, one might argue that any object or group with a low locatability value strongly impacts the entire document and should be given higher weight, such as by combining as the root of powers. The document's overall locatability gives an overall feel for how easy it is to locate items in a document by calculating and combining measures of how easy it is to locate each and every document component. An algorithm for computing document locatability is provided herein which recursively traverses the content tree to calculate a weighted average, where the weights $w_L$ can vary with tree level L.
A combination of locatability measures, as illustrated in
A document's degree of searchability can be determined by first determining a value for strength of searchability of the document, and then determining the document's search density relative to the strength of searchability. The search density is mapped to a value that ranges between 0 and 1 and in one embodiment consists of evaluating the relationship given by: 1−c/(c+Search Density); where c is a constant which is the size of the typical search density and P determines how quickly searchability approaches 1 with increasing search density. The strength of searchability is determined by features of the document intended to aid in searching. Features include at least one of the number of table elements, the number of list elements, the number of list bullets, and the number of list element numbers or the number of other reference terminals, a reference terminal being a position indicator that can be used by a reference; such as a label, a chapter number for a textual reference, or an anchor for a hyperlink.
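The mapping of search density onto the range 0 to 1 can be sketched directly from the relationship as it is written above. The parameter P mentioned in the text does not appear in that relationship, so it is left out here rather than guessed at; the function name and default value of c are assumptions.

```python
def searchability(search_density, c=1.0):
    """Map a non-negative search density to 0..1 using 1 - c / (c + density).

    c is the size of a typical search density, so a document with typical
    density scores 0.5.
    """
    return 1 - c / (c + search_density)
```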
As illustrated in
As illustrated in
A combination of measures, as illustrated in
A combination of ease of use measures, as illustrated in
A combination of measures, as illustrated in
When multiple colors are present on a page, it is not only the amount of color saturation present that is important, but also how harmonious those colors are. For example, pink and green go together much more harmoniously than pink and orange. Colors that clash will catch the eye. A contributor to the eye-catching property is therefore the color dissonance. In the following discussion, the calculation of color dissonance is described for the objects that can be seen together (i.e. the objects on a page). If the document has multiple pages, then an average color dissonance value for all pages can be determined. The color dissonance (or harmony) between two colors is largely determined by their hue difference (although the colors should have sufficient saturation and area to be noteworthy).
There are several methods known in the art for calculating an approximate hue value as an angle for the chrominance components. For example, using the E and S values described above one can define the hue as: h=arctan(S/E). As is well known in the art, the case E=0 needs special handling, and the signs of E and S should be checked to determine the quadrant in order to avoid confusing S/E with (−S)/(−E). The result can also be divided by 2π to yield a value between 0 and 1. In order to calculate the color dissonance one must first determine which hues, as illustrated in
Using a table allows any desired function shape to be used; however direct calculation of the dissonance value is also possible. The dissonance table captures the model of color harmony and dissonance. A simple model is that the harmony of colors only depends on their hue difference and not the absolute hues themselves. Using this model, the dissonance table need only be indexed with the hue difference. An example of such a model is colors with hue angles that are similar (near 0 degrees apart) or opposite (180 degrees apart) or a third of the way round the hue circle (120 degrees apart) are considered harmonious while other hue angle differences are dissonant. The values stored in the dissonance table would look similar to those depicted graphically in
Another mechanism for catching the eye is to use large fonts. This makes the text readable from a distance and gives it a feeling of importance. This mechanism can be used when the document is presented in black and white. It is the maximum font size that is important here (not the average). It can be found by stepping through all the fonts used (or stepping through all the text and finding the fonts) and keeping track of the largest. The maximum font size found should be converted to a number between 0 and 1 for combination with the other measures.
A way to do this is as follows: Vf=f/(fn+f) where f is the maximum font size found and fn is close to the typical font size found in documents (e.g. 8 or 10 point). One can also consider weighting the largest font by a function of the number of characters. However, while increasing the number of characters may make the document more eye-catching when only a few characters are present, the effect may diminish for large numbers of characters. The impact of font size can be calculated by considering all of the fonts within a document simultaneously; however, an alternative would be to determine the impact of each page separately and then to combine the results of the pages. Combining page results could be done by a simple average, and this may be appropriate for documents such as presentations. However, for many documents it is sufficient for only one page to be eye-catching (e.g. the cover page) and it may be better to employ a non-linear combining method that gives a high score if any of the individual page contributions are high. Alternatively, one might use a weighted average where the first page is weighted higher than the others.
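The font-size measure can be sketched in a few lines. The function name, the list-of-sizes input, and the default typical size of 10 point are assumptions made for illustration.

```python
def font_size_impact(font_sizes, f_n=10.0):
    """Vf = f / (fn + f), where f is the largest font size in use."""
    if not font_sizes:
        return 0.0  # assumption: no text contributes nothing
    f = max(font_sizes)
    return f / (f_n + f)
```

With a typical size of 10 point, a document whose largest font is also 10 point scores 0.5, and one with a 30-point heading scores 0.75.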
A page that is densely packed with information will typically require that information to be small and uniform, and it is therefore unlikely to catch the eye. This is not as hard-and-fast an indicator as color or font size because the information might, for example, be presented as a mixture of easy-to-ignore small black text and eye-catching large colored text. Nevertheless, one can use the information lightness (the inverse of information density) as another clue as to the document's eye-catching behavior. For text, a rough measure of the information present is just the number of characters Nc used to encode the information. One might also consider alternative measures such as a count of the number of words. For graphic figures, one can count the number of primitive graphical constructs (lines, rectangles, circles, arcs, strokes, triangles, polygons, etc.) used to build the figures. The count of graphic constructs Ng may be multiplied by a scaling value to normalize it with respect to the text measure.
Estimating the information content of pictorial images Np is more problematic. One simple approach is to just include a constant information estimation value for each image. Pictures are more eye-catching than pure text. That is why there are pictures on paperback-book covers that are intended to attract viewers to purchase them, but only simple text inside to convey the story. Of course, not all pictures are equally interesting, and for a true measure of a picture's eye-catching ability, some analysis of the picture content would be necessary. Still, the mere presence of any pictures in a document is generally an indicator of greater eye-catching ability. A simple measure of this is the fraction of the document area devoted to pictorial images. A normalized measure is: Vp=Ap/Ad where Ap is the area devoted to pictorial images and Ad is the total area of the document.
Another indicator of a document's eye-catching ability is its novelty, that is, the presence of the unexpected or unconventional. Of course, to tell if something is unexpected or unconventional, one must first have some model of what is expected or conventional. Such models can be quite sophisticated and can include such factors as the type of document and its anticipated use. Here, however, the use of novelty is illustrated with a simple model: a single typical value expected for each style parameter.
Style parameters are the available choices that govern the appearance and presentation of the document. They can include the presence of backgrounds and borders, the thickness of borders and rules, paragraph indentation and separation, list indentation, list bulleting, font style, font weight and so on. Style parameters also include font size and color selections, which were considered separately above. It is believed that it is proper to include color and font size in the estimation of novelty for completeness, but that they should also be singled out in the calculation of eye-catching ability since their contribution in this respect is much greater than would be explained by unconventionality alone. In the simple model each style parameter Pi has an anticipated value P0i. For any style parameter, but particularly for parameters with binary (or enumerated) choices, one can simply add in a constant novelty contribution ni if the actual style Pi does not match the expected value P0i. More sophisticated calculations are possible; for example, when the style parameter can vary continuously from the expected value (as perhaps in the case of rule width or font size). A function of the style difference can be calculated as the novelty contribution: ni=F(Pi−P0i)
For enumerated style values one can employ a table look-up to yield more flexibility and control over the novelty contribution: ni=T[Pi]. The overall document novelty can be found by taking the average of the novel contributions for all style settings. Thus if the document had m style choices, the average novelty would be: Vn=Σni/m. The expected values P0i can be set a priori, or preferably can be found by examining the style settings of typical documents. If they are determined by analyzing documents, the analysis can be conducted on an on-going basis and they can be allowed to adapt to the current typical document style.
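The simple novelty model can be sketched as below. The three helper constructors correspond to the constant contribution, the function F of the style difference, and the table T; their names, the particular saturating form chosen for F, and the example parameters are all hypothetical.

```python
def constant_if_different(n):
    """Constant novelty n when the actual style differs from the expected one."""
    return lambda actual, expected: n if actual != expected else 0.0


def saturating_difference(scale):
    """One possible F(Pi - P0i): grows with the difference, bounded below 1."""
    def contribution(actual, expected):
        d = abs(actual - expected)
        return d / (scale + d)
    return contribution


def table_lookup(table):
    """Novelty read from a table indexed by the actual enumerated value."""
    return lambda actual, expected: table[actual]


def novelty(styles, expected, rules):
    """Vn: average of the novelty contributions over the m style choices."""
    if not styles:
        return 0.0
    total = sum(rules[name](value, expected[name])
                for name, value in styles.items())
    return total / len(styles)
```

A usage example with three style parameters: a border that is present when none is expected, a rule width of 3 against an expected 1, and an italic font style whose table entry is 0.6.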
In more sophisticated models, the expected style value may depend upon the location of the content item within the document's logical structure. Thus, the expected font style for a heading might be weighted differently from the expected setting for the body text. But however it is calculated, novelty can provide a clue as to the document's ability to catch the eye.
A property of a document contributing to its quality that is similar to its eye-catching ability is the ability of the document to hold attention and interest. While a major contributor to the interest of a document is its subject matter, the presentation of that subject matter (the style and format) can affect the interest level as well. This method calculates an interest measure for the style and format decisions, calculated as a combination of simpler factors that contribute to interest. If any of the simpler interest factors is strongly present, then the overall effect is an interesting document. Factors can include variety, change rate, emphasis, graphic fraction, colorfulness, color dissonance, picture fraction, and/or novelty. Calculation methods are defined for each of these factors and each is designed to produce a value ranging between 0 and 1, such that 0 means low or bad interest value, and 1 means high or good interest value. These (and possibly other such factors) can be calculated and combined to form an overall interest measure Vi. The separate factors can be combined by a method similar to that described above for the eye-catching ability property.
As illustrated in
For example, if a document contains a 12-point bold weight font and a 10-point normal weight font, is that four styles (two sizes plus two weights) or just two styles (two fonts)? The answer for the preferred embodiment is two and the styles should be considered in combination. But this still leaves the question of what combinations should be considered. If the 12-point bold is used in a list without bullets, and the 10-point normal is used in a list with bullets, is this still only two styles, or should the list styles and font styles be considered independently? This answer is less clear. But, if one considers the correct grouping to be the entire set of style parameters so that whenever any style parameter changes a new overall style is generated, there is the potential of a combinatorial explosion of style instances. While this approach is not ruled out, the preferred method is to group the style parameters according to their associated content type (i.e. text styles, paragraph styles, graphic styles, list styles, table styles, content element background styles etc.).
Thus, in the above example, one would have two text styles and two list styles for four style choices in the document. This approach also avoids the problems arising from the growth of style parameters from the hierarchical structure of a document. If the document contains lists of lists of lists, the preferred approach gives three instances of the simple list style group instead of some new large group containing all the style choices of the structure.
To estimate the style variety, first decide what style parameters and parameter groups to include in the analysis. For example, one might decide to consider just the text, paragraph, and graphic styles. For text, consider font family, size, weight, style and color. For graphics, consider fill color, edge color and edge thickness. For paragraphs, consider line length, line spacing, quadding, and first-line indentation. Three lists are constructed, one for each type of style group. The list elements contain the values of the style parameters for that group. One then steps through the document's logical structure, examining each logical element being analyzed for the style setting (in this example each text segment, graphic element and paragraph.) One considers the style parameter settings of each logical content element and checks the corresponding list to see if an entry has been made with a matching set of values. If a matching list entry is found, nothing more need be done for this content element. If, however, the list does not contain a match, a new list element containing the new set of style values should be constructed and added to the list.
At the end of the document analysis, the lists should contain all of the style parameter combinations that were discovered. One can then simply count the number of list elements to determine the number of styles used. The sizes of all the lists should be combined into an overall style count. One can weight the list sizes when adding them together if one wishes to make the variety of one form of content count more than that of another (for example, one might make variety in paragraph style count more than variety in graphics). The result would be an overall weighted count of style changes s: s=Σwx sx where sx is the size of the xth style list and wx is the weight. In order to combine the style variety measure with the other contributions to interest, this weighted count should be converted to a number ranging between 0 and 1. This can be done as follows: Vv=s/(as+s) where Vv is the variety measure and as is a constant value about the size of the expected number of styles in a typical document.
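The style variety calculation can be sketched as follows. It assumes the document has already been flattened into a sequence of hypothetical (group, style values) pairs, and it uses a set per style group in place of the lists described above, which gives the same distinct count.

```python
def style_variety(elements, weights, a_s):
    """Vv = s / (a_s + s), with s the weighted count of distinct styles.

    elements: iterable of (group_name, style_tuple) pairs, one per logical
    content element. weights: per-group weights w_x (default 1.0).
    a_s: expected number of styles in a typical document.
    """
    seen = {}
    for group, style in elements:
        # Adding to a set is a no-op when a matching entry already exists.
        seen.setdefault(group, set()).add(style)
    s = sum(weights.get(group, 1.0) * len(styles)
            for group, styles in seen.items())
    return s / (a_s + s)
```

For a document with two distinct text styles and one paragraph style, unit weights give s = 3, and with a typical count of 3 the variety measure is 0.5.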
As illustrated in
Calculating the style change rate is similar to calculating the style variety as described above, and uses the same style parameters and groupings. However, one need only maintain a single description of the most recently encountered style parameter set for each group (instead of a list of all previously encountered sets). For example, there would be a single set of the most recently encountered text style parameters, a single set of the most recently encountered graphic style parameters and a single set of the most recently encountered paragraph parameters. Step through the document's logical description and examine the style settings. Whenever a content element has style parameters that differ from those seen most recently, a count of the changes for that style group is incremented, and the new set of style values is remembered for use with the next content element. In a manner similar to the variety calculation, the change counts can be weighted and combined to form a total weighted change count c: c=Σwx cx where cx is the change count for the xth style group and wx is the weight.
In order to combine the style change rate measure with the other contributions to interest, this weighted count should be converted to a number ranging between 0 and 1. This can be done as follows: Vch=c/(ach+c) where Vch is the change rate measure and ach is a constant value about the size of the expected number of style changes in a typical document.
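The change rate calculation differs from the variety calculation only in what is remembered. The sketch below keeps the most recent style per group, under the same hypothetical (group, style values) input format, and assumes the first element seen in a group does not count as a change.

```python
def style_change_rate(elements, weights, a_ch):
    """Vch = c / (a_ch + c), with c the weighted count of style changes.

    elements: iterable of (group_name, style_tuple) pairs in document order.
    weights: per-group weights w_x (default 1.0).
    a_ch: expected number of style changes in a typical document.
    """
    last_seen = {}
    change_counts = {}
    for group, style in elements:
        # Assumption: the first style in a group is not counted as a change.
        if group in last_seen and last_seen[group] != style:
            change_counts[group] = change_counts.get(group, 0) + 1
        last_seen[group] = style
    c = sum(weights.get(group, 1.0) * count
            for group, count in change_counts.items())
    return c / (a_ch + c)
```

Unlike variety, switching back and forth between two styles keeps increasing this measure even though no new style is introduced.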
Some font styles are chosen to emphasize the text. Large text, bold text, and underscored text all have an implied importance over the normal text presentation. This implied importance tells the reader to wake up and pay attention. As such, it has a special contribution to the maintenance of viewer interest. One can calculate an average emphasis measure for the text in a document by summing an emphasis value for each character and then dividing by the total number of characters. Ve=Σe(t)/nc where Ve is the emphasis measure, e is the emphasis function for character t, the sum is over all characters and nc is the total number of characters. The function e(t) should include factors for the size of the text, its weight, its variant and its contrast (other factors such as font style might also be included). The larger the font size, the greater the emphasis, but one would like to have a factor that ranges between 0 and 1. An expression such as size(t)/(afs+size(t)), where afs is a constant about the size of a typical font, will do this. The font weight (e.g. light, normal, bold, heavy) is typically an enumerated value and a table of suitable emphasis factors for each weight ew[weight(t)] can be used in the emphasis function. Similarly, the font variant (e.g. normal, underlined, strikethrough, outlined) can be handled as a table look-up such as ev[variant(t)].
Contrast also plays a role in the strength of text emphasis. Text with low contrast to the background will not have the same degree of impact as high contrast text. The luminance contrast can be calculated as described above as 2|Yb−Yf|/(Yb+Yf) where Yb is the luminance of the background and Yf=Lum(t) is the luminance of the text. An example of an emphasis function is then: e(t)=(size(t)/(afs+size(t))) ew[weight(t)] ev[variant(t)] (2|Yb−Lum(t)|/(Yb+Lum(t))). Note that one might also include other characteristics such as the font style (e.g. italic).
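The example emphasis function and the average emphasis measure can be sketched as follows. The entries in the two look-up tables are invented for illustration, as are the function names, the per-character tuple format, and the default typical font size.

```python
# Hypothetical emphasis factors for enumerated font properties.
WEIGHT_EMPHASIS = {"light": 0.25, "normal": 0.5, "bold": 1.0, "heavy": 1.0}
VARIANT_EMPHASIS = {"normal": 0.5, "underlined": 1.0,
                    "strikethrough": 0.5, "outlined": 0.75}


def emphasis(size, weight, variant, y_text, y_background, a_fs=10.0):
    """e(t): size factor * weight factor * variant factor * contrast."""
    size_factor = size / (a_fs + size)
    total = y_background + y_text
    # Assumption: black text on a black background has no contrast.
    contrast = 2 * abs(y_background - y_text) / total if total else 0.0
    return (size_factor * WEIGHT_EMPHASIS[weight]
            * VARIANT_EMPHASIS[variant] * contrast)


def average_emphasis(characters, y_background, a_fs=10.0):
    """Ve: mean of e(t) over characters given as (size, weight, variant, Y)."""
    if not characters:
        return 0.0
    total = sum(emphasis(size, weight, variant, y_text, y_background, a_fs)
                for size, weight, variant, y_text in characters)
    return total / len(characters)
```

A practical implementation would more likely iterate over runs of identically styled text and weight each run by its character count, which yields the same average.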
As illustrated in
Several of the factors that attract attention and catch the viewer's eye, will also serve to hold the attention and interest. One can list the properties of colorfulness, color dissonance, picture fraction, and novelty as examples of this joint use. The difference in behavior between attention and interest is one of relative importance or weight. Colorfulness, for example, can be very important in catching the eye, but less important in maintaining interest. Novelty, on the other hand, can be more important to maintaining interest than it is to capturing attention. Methods for estimating the strength of these four measures were described above.
A combination of measures, as illustrated in
As with aesthetics and ease-of-use, the approach to quantifying communicability is to evaluate factors identified as contributing to the effectiveness of the communication. These factors are then combined to form a composite measure. The factors contribute to the quality of the document design. If any of the simpler communicability factors is absent, then the overall ability of the document to communicate is reduced.
Component factors can include legibility, information lightness, technical level, text and image balance, red-green friendliness, ease of progression, and/or ease of navigation. Each factor can be defined such as to produce a value ranging between 0 and 1, where 0 means low or bad communicability value and 1 means high or good communicability value. These, (and possibly other such factors), can be calculated and combined to form an overall communicability measure in a manner similar to that described above for aesthetics.
A combination of measures, as illustrated in
It is further noted that a combination of measures, as illustrated in
A combination of measures, as illustrated in
The properties of the display device and the font may often be considered together; that is, one determines how decipherable a particular font is on a particular device. For example, fonts with serifs are, as a rule, easier to decipher than sans serif fonts; but on a device that cannot effectively produce serifs, this may not be true. The font family, font size, font weight, font style, and font variant all can contribute to the decipherability.
An approach to dealing with the effect of font specification and device choice is to measure by experiment the decipherability (the ability to correctly determine the character presented) for a fully specified font on a particular device. This measurement can then be handled as a font property. Given the font specification one can then look up the font's decipherability contribution in a font table (df=DF[font specification]). If the font is to be displayed on the same type of device as was used for the measurement, the font contribution will not require further adjustment for the device. However, if a different display device type is used, then some sort of adjustment is needed. For example, fonts are, in general, much more decipherable when printed on paper than when presented on a CRT display. An example of an adjustment to the font decipherability is to multiply it by an adjustment factor ad for the display device.
One way to determine the adjustment factor is as a function of the smallest font size that the device is capable of effectively presenting. The function could, for example, be the ratio of the smallest effective text size for the device used in measuring the font decipherability to the smallest effective text size for the display to actually be used. For example, if the font properties were measured on a CRT that could effectively display only 8-point or larger fonts, but the document is to be printed on paper that can support 4-point or larger fonts, then the device adjustment factor should be 8/4=2. One may wish to adjust this factor according to the font size actually used because the effect of the display may be less important for large text.
The ease in correctly deciphering a character depends upon the familiarity with it. Reading all caps is harder than reading normal text. Numbers and punctuation characters each have their own degree of difficulty. Thus, another adjustment factor ac for the familiarity of a character should be multiplied in. This adjustment factor can be found from a table indexed by the character code. The contrast of the character with the background also contributes to the decipherability. It is harder to decipher light yellow characters on a white background than to decipher black ones. A third adjustment factor is the luminance contrast that can be calculated as was described above for locatability: al=2|Yb−Yt|/(Yb+Yt) where Yb is the luminance of the background and Yt is the luminance of the text.
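As a minimal sketch, assuming the font contribution df and the three adjustment factors are simply multiplied together, the per-character decipherability might be computed as shown below. The function and parameter names are illustrative; note that the raw product is not confined to the range 0 to 1, and no normalization is applied here.

```python
def device_adjustment(min_size_measured, min_size_target):
    """ad: ratio of smallest effective text size on the measuring device
    to the smallest effective text size on the device actually used."""
    return min_size_measured / min_size_target


def luminance_adjustment(y_background, y_text):
    """al = 2|Yb - Yt| / (Yb + Yt)."""
    total = y_background + y_text
    if total == 0:
        return 0.0  # guard for black on black (an assumption)
    return 2 * abs(y_background - y_text) / total


def character_decipherability(df, ad, ac, y_background, y_text):
    """Product of the font table value df with the device (ad),
    character familiarity (ac), and luminance contrast (al) factors.
    Multiplying all four together is an assumption of this sketch."""
    al = luminance_adjustment(y_background, y_text)
    return df * ad * ac * al
```

For the CRT-to-paper example, device_adjustment(8, 4) yields the factor of 2, and light yellow text on white scores far lower than black text on white because of the contrast factor.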
As illustrated in
As illustrated in
As illustrated in
A combination of measures, as illustrated in
Reading ease is a well-known measure of a document's text. An example of a reading ease algorithm is: RE=206.835−0.846 Sy−1.015 W where Sy is the average number of syllables per 100 words and W is the average number of words per sentence. For the calculation of technical level one wants a reading difficulty measure, which can be roughly calculated as: Rd=0.85 Sy+W. Words are easier to comprehend than numbers; a large table of numbers is typically much more difficult to grasp than an equal quantity of words. To capture this, calculate the number fraction Fn, which is the ratio of the count of numbers to the total count of numbers and words. Pictures are used to aid understanding. The use of pictures reduces the technical level measure. Picture fraction was defined above as: Fp=Ap/Ad where Ap is the area of the pictures and Ad is the total area of the document. One actually needs the inverse behavior of the picture fraction, so that as Fp increases, the technical level decreases. Using Fnp=1−Fp is possible, but a few images can make a big difference in the technical level, while as more images are added, the benefits may fall off. Thus a better choice is a nonlinear function such as: Fnp=1/(ap+Fp) where ap is a constant near 1. The technical level measure can then be computed as: Tl=Rd Fn Fnp. However, Rd (and therefore Tl) is not limited to the range between 0 and 1. This can be remedied by the function: Vtl=Tl/(atl+Tl) where atl is a positive constant.
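The chain of expressions above can be sketched as follows. The default values for ap and atl are assumptions chosen only to make the example concrete, and the handling of a document with no words or numbers is likewise an assumption.

```python
def reading_ease(sy, w):
    """RE = 206.835 - 0.846*Sy - 1.015*W (Sy: syllables per 100 words)."""
    return 206.835 - 0.846 * sy - 1.015 * w


def technical_level(sy, w, number_count, word_count,
                    picture_area, document_area, ap=1.0, atl=50.0):
    """Vtl = Tl/(atl+Tl) with Tl = Rd * Fn * Fnp."""
    rd = 0.85 * sy + w                      # reading difficulty
    total = number_count + word_count
    fn = number_count / total if total else 0.0   # number fraction
    fp = picture_area / document_area       # picture fraction
    fnp = 1.0 / (ap + fp)                   # nonlinear inverse of Fp
    tl = rd * fn * fnp
    return tl / (atl + tl)                  # squeeze into [0, 1)
```

As written, Tl is a product, so a document containing no numbers at all (Fn=0) receives a technical level of 0 regardless of its reading difficulty.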
As illustrated in
The blue-yellow contrast can be calculated from the S chrominance component, defined as: S=(R+G)/2−B. The blue-yellow contrast is calculated similarly to the luminance case as: Cby=2|Sf−Sb|/(2+Sf+Sb) where Sf and Sb are the foreground and background S chrominance components respectively. The red-green friendliness of an object can be estimated by combining the luminance and blue-yellow chrominance contrast components: Frg=(CY+Cby)/2. A weighted average can also be used to combine the contrast components.
For the entire document some mechanism is needed for combining the red-green friendliness values for all document objects. One way to do this is to average the values weighted by the corresponding object areas. If Frgi is the red-green friendliness of the ith object and Ai is its area, then the average would be given by: Vrg=(ΣFrgi Ai)/ΣAi where the sums are over all objects. However, a single small object or set of objects that are difficult to decipher can have a large impact on the overall understanding of the document. Thus, some method other than weighting by area may be preferred for combining friendliness values. Other methods of combining the friendliness values are also possible.
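A minimal sketch of the blue-yellow contrast, the per-object friendliness, and the area-weighted document average is given below. It assumes R, G, and B values scaled to the range 0 to 1 and takes the luminance contrast CY as an already computed input; the function names are illustrative.

```python
def s_chrominance(r, g, b):
    """S = (R + G)/2 - B, with R, G, B assumed to lie in [0, 1]."""
    return (r + g) / 2 - b


def blue_yellow_contrast(foreground, background):
    """Cby = 2|Sf - Sb| / (2 + Sf + Sb)."""
    sf = s_chrominance(*foreground)
    sb = s_chrominance(*background)
    denominator = 2 + sf + sb
    if denominator == 0:
        return 0.0  # both colors pure blue: no contrast (assumed guard)
    return 2 * abs(sf - sb) / denominator


def red_green_friendliness(cy, cby):
    """Frg = (CY + Cby)/2, the unweighted combination."""
    return (cy + cby) / 2


def document_friendliness(objects):
    """Vrg = sum(Frg_i * A_i) / sum(A_i); objects is a list of (Frg, area)."""
    total_area = sum(area for _, area in objects)
    if total_area == 0:
        raise ValueError("at least one object with nonzero area is required")
    return sum(frg * area for frg, area in objects) / total_area
```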
One more property that has a bearing on the communicability of a document is the ease of progression, as illustrated in
These contributing factors are combined using a weighted average since they are not all equally important. Vep=wds Vds+wgi Vgi+wsc Vsc+wlb Vlb+wplk Vplk+whd Vhd+wal Val+wws Vws+wcs Vcs+wco Vco where the w's are the weights and the V's are the contributing factors. A combination of measures, as illustrated in
The distinguishability indicating how well one can distinguish an element from its neighbors, the group identity property indicating how easy it is to tell which objects belong as part of a logical group and which do not, the spatial coherence property that measures how closely packed together the members of a group are, and headings that describe the logical structure, were defined above in the discussion of the group contribution to ease of use. These factors also contribute to how well the document communicates, but with weights to reflect different relative importance. Spatial coherence is singled out here because it has particular relevance to ease of progression and one may wish to give its contribution a different weight from that entering via group identity. The discussion of headings above combined headings, list bullets and list numbers all as one measure, but one can leave out the checks for list bullets and numbers and adapt the method to look at headings alone. This could allow headings and list bullets to be calculated separately and weighted independently. Bullets and numbers in lists help to identify the list elements and to progress through them. Documents that use bulleted and/or numbered lists should be easier to progress through than those that do not. A method to calculate a measure for this property is to count the total number of list bullets Nlb or numbers Nln and divide by the total number of list elements Nle. Vlb=(Nlb+Nln)/Nle
Since there is less chance of confusing two list numbers than confusing two list bullets, one may wish to weight the benefits of list numbers higher than bullets. This is easily done by weighting the counts of bullets and numbers differently when they are combined into the numerator of the ratio to total list elements. Vlb=(alb Nlb+aln Nln)/Nle where alb and aln are the constant weights applied to the count of bullets and count of list numbers. Internal references (such as “continued on page 7”) serve to guide the reader when the intended progression differs from basic convention. Electronic documents can include hyperlink forms that perform the same function of guiding the reader. A simple measure of how helpful the document is in guiding the reader is just a count of such hyperlinks and/or references NL. This count should be divided by some measure of the size of the document (such as the number of content objects NO) in order to get a link density. Vplk=NL/NO
A better measure may be obtained by dividing the count of the references by a count of all the points at which the progression does not follow the typical scan order NSO. The conventional western scan order is that the next logical content element should be aligned with and to the right or below the current object. One can examine the positions of the content elements in their logical order and count the instances when this rule is not followed. These are the cases where a reference to redirect the reader would be most helpful and one can calculate the ratio of references to breaks in scan order. This will typically be a number between 0 and 1, but is not guaranteed to be confined to values of 1 or less. To restrict the range, functions such as those used above for confining the range can be used, but in this case simply clamping the value to 1 should be sufficient. Vplk=MINIMUM(1, NL/NSO). It is easier to follow the conventional rules of progression (e.g. the next logical element is located directly below the current element) if the elements are aligned. This makes it clear just which element is below and which is to the right of the current element. A measure of the document alignment Val was described above in the discussion of document aesthetics.
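The list-marker and progressive-link measures can be sketched as below. The default weights and the treatment of the degenerate cases (no list elements, no breaks in scan order) are assumptions of the sketch rather than rules stated in this description.

```python
def list_marker_measure(n_bullets, n_numbers, n_elements, alb=1.0, aln=1.0):
    """Vlb = (alb*Nlb + aln*Nln) / Nle.

    With alb = aln = 1 this reduces to the unweighted form.
    """
    if n_elements == 0:
        return 0.0  # assumption: a document without lists earns no credit
    return (alb * n_bullets + aln * n_numbers) / n_elements


def progressive_link_measure(n_links, n_scan_breaks):
    """Vplk = MINIMUM(1, NL / NSO)."""
    if n_scan_breaks == 0:
        return 1.0  # assumption: no breaks means no redirection is needed
    return min(1.0, n_links / n_scan_breaks)
```

If the weights alb and aln are chosen greater than 1, the list-marker value can exceed 1, so the weights would need to be scaled (or the result clamped) to keep the measure in range.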
Documents with lots of white space typically are less crowded. It is easier to distinguish and follow the elements. Thus, a high white space amount can provide a small contribution to the overall ease of progression. The non-white space area can be estimated by totaling the areas of the content objects (Ai for content object i). The total object area can be scaled by the total document area Ad. Vws=(Ad−ΣAi)/Ad. One of the conventions for progression through western documents is the scan positioning of left to right, top to bottom. This is the convention followed by text, but it can also be applied to other objects (such as the panes in a comic book). For this convention, one expects the items to have about the same height and to be aligned in rows. The left edge of the rows should be vertically aligned. One can construct a measure that indicates the deviation from this rule. The inverse of this deviation measure then gives the adherence to the rule.
Step through the document elements in their logical order. For each element find a bounding box that contains the object and indicates the position of its top yt, bottom yb, left side xl and right side xr. As one steps through the objects, the vertical position of the new object (ytn, ybn) is compared with that of the old object (yto, ybo). Objects should be placed to the right and below, but not above, so a deviation amount should be added to a deviation accumulation dcs for the degree to which the new object is above the old. If the new object is vertically in the same row as the old object, then one expects it to be located to the right of the old object. The degree to which it is left of the old object is the amount by which it deviates from the scan order model. These calculations are carried out for each consecutive pair of content elements as one steps through the document in logical order. The result is then normalized by dividing by the number of pair comparisons (the number of elements minus 1) and clamped to 1. The inverse is then returned.
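The stepping procedure above leaves the exact deviation amounts open. The sketch below makes several assumptions that are not stated in this description: bounding boxes are given as (xl, yt, xr, yb) in coordinates normalized to the page with y increasing downward, the vertical deviation is the distance by which the new top edge lies above the old top edge, two objects share a row when their vertical extents overlap, the horizontal deviation is the distance by which the new left edge lies left of the old right edge, and the inverse is taken as 1 minus the clamped deviation.

```python
def scan_consistency(boxes):
    """Adherence to left-to-right, top-to-bottom scan order.

    boxes: list of (xl, yt, xr, yb) tuples in logical order, in page
    coordinates normalized to [0, 1] with y growing downward (assumed).
    Returns a value in [0, 1]; 1 means no deviation was found.
    """
    if len(boxes) < 2:
        return 1.0  # nothing to compare
    dcs = 0.0  # deviation accumulation
    for old, new in zip(boxes, boxes[1:]):
        xlo, yto, xro, ybo = old
        xln, ytn, xrn, ybn = new
        # Degree to which the new object is above the old one.
        dcs += max(0.0, yto - ytn)
        # Same row (assumed test): the vertical extents overlap.
        same_row = ytn < ybo and ybn > yto
        if same_row:
            # Degree to which the new object is left of the old one.
            dcs += max(0.0, xro - xln)
    # Normalize by the number of pair comparisons and clamp to 1.
    deviation = min(1.0, dcs / (len(boxes) - 1))
    return 1.0 - deviation
```

Three boxes laid out as two items in a first row followed by one item in a second row score 1 under these assumptions, while the same boxes visited in reverse order score well below 1.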
A combination of measures, as illustrated in
Ease of navigation is strongly related to the locatability property for group elements that was described above in the discussion on the ease of use of groups. The measures of headings, list bullets and numbers and internal links can be captured as described. In the discussion on ease of progression one measured the fraction of progressive links. For ease of navigation one wants to count the total number of internal links or references (not just the progressive ones). This will include the entries in a table of contents and in an index as well as references or links within the main body of the document. As suggested above, one can normalize the count by dividing by the number of content objects: Vlnk=MINIMUM(1, NLT/NO) where NLT is the total number of internal links and NO is the number of content objects.
In trying to find one's way around in a document it is helpful to know when one group of content ends and another begins. Thus, there should be a contribution to the ease of navigation from the group identity measure. This is another measure that is also used in the ease of progression estimation. A measure of group identity was described in the above discussion of ease of use of groups. Group identity is calculated from other measures such as spatial coherence, the presence of borders or backgrounds, style uniformity, and alignment of elements. Another property that contributes to the quality of a document is the comfort level at which the document is perceived. A method for quantifying the document comfort level will be described next.
Comfort is calculated as a combination of simpler properties or rules. Violating any of the component rules can result in discomfort and ruin the overall comfort of the document layout. Component rules can include limitation of font forms, limitation of colors, grouping number, neatness, decipherability, non-intimidating, conventionality, color harmony, color appropriateness, consistency of luminance, and/or consistency of size. Each rule is defined to produce a value ranging between 0 and 1 such that 0 means low or bad comfort value and 1 means high or good comfort value. These (and possibly other such rules) can be calculated and combined to form an overall comfort measure. Note that the set of rules chosen is illustrative of how a comfort measure can be constructed. Other factors contributing to comfort exist and could certainly be included in a more sophisticated quantification of comfort. A combination of measures, as illustrated in
Fonts have many properties that can be selected to achieve different effects. Font families can be chosen to give the document different feelings, from formal to playful, light to serious, modern to classical. Font size can affect the cost and legibility. Font weights such as bold, can convey importance; font styles, such as italic, can indicate that it is special. Font variants such as strikethrough or outlined can add further meaning. If, however, a single document contains too many different font forms, the result is disquieting. Such “ransom note” documents are considered bad style because they lead to discomfort in the reader. The first factor that shall be considered as contributing to viewer comfort is the limitation of the number of font forms. Any change in the font specification (family, size, weight, style or variant) yields a new form. The document can be examined, and the number of distinct font forms Nf can be counted. This can be converted to a number ranging from near 0 (for the case of many font forms) to 1 (for when there is no more than a single font form) by the expression: Vlt=1/MAXIMUM(1,Nf)
However, more sophisticated measures are possible. One can, for example, include as part of the measure just how different the fonts are from one another. This can be done by first constructing a list, F, of all the font forms that appear in the document. One can then compare every font form in the list to every other font form and accumulate a measure of their differences. For fonts of different sizes, one can make the measure a function of the size difference (such as its absolute value). For font weights, one can add to the measure a function of the weight difference. Since weights are usually limited to a small set of choices, tables FW[weight(f1), weight(f2)] can be used to describe the weight difference function. Contributions due to differences in family style and variant can also be captured in tables, or a single constant amount can be added whenever any difference in any of these properties occurs. Comparing every font form to every other font form results in differences accumulating on the order of the square of the number of fonts.
Just as too many fonts are considered to be poor style, so are too many colors. A document with lots of colors is considered garish. The viewer tries to make sense of the colors and a large number makes this a difficult and uncomfortable task. A large number of colors will tire the eye. A simple measure of the effect is just a count of the number of different colors found within the document. This can be determined by stepping through the document, identifying the colors and saving them in a list (or other data structure such as a tree or hash table). As each color is encountered it can be compared to the colors already in the list to determine whether or not it has been seen before. If it is a new color then it is added to the list. After the document has been processed, the number of entries in the list can be counted to give the total number of colors Nc. This can be converted to a number ranging from near 0 (for many colors) to 1 (for no more than a single color) by the expression: Vlc=1/MAXIMUM(1, Nc)
The above scheme works for constant, uniform colors such as typically used in graphics, but does not address how to handle color sweeps or the huge number of colors seen in pictorial images. For color sweeps one can restrict the list entry to only the first and last colors of the sweep. For pictorial images, one can ignore them altogether, or extract a few colors from the image by subsampling, or extract a few colors by a cluster analysis of the image values in color space. The test for whether a color is already in the list does not have to be a strict match. One can compare colors by computing the distance between them in color space and comparing the distance to a threshold. If the distance is below the threshold, the colors can be considered close enough to match, and a new color list entry is not needed.
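A sketch of the color list with a distance threshold follows. Colors are assumed to be tuples of coordinates in some color space, Euclidean distance is used as the comparison, and the threshold value is a free parameter; none of these choices is prescribed by this description.

```python
import math


def count_distinct_colors(colors, threshold=0.0):
    """Count colors, treating colors within `threshold` as the same.

    The comparison uses <= so that a threshold of 0 gives strict matching.
    """
    seen = []
    for color in colors:
        already_listed = any(math.dist(color, other) <= threshold
                             for other in seen)
        if not already_listed:
            seen.append(color)  # new color: add it to the list
    return len(seen)


def limitation_of_colors(n_colors):
    """Vlc = 1 / MAXIMUM(1, Nc)."""
    return 1.0 / max(1, n_colors)
```

A linear scan of the list is used for clarity; for documents with many colors, a spatial index or hash of quantized colors would serve the same purpose.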
The comfort can depend on the choice of colors as well as the number of them. One might therefore compare the colors of the document pair-wise and accumulate a measure of their compatibility. A simple value to accumulate would be the distance between the colors in a color space, but a better measure of the effect on comfort would be the color dissonance of the pair. Since comparing colors pair-wise accumulates values as the square of the number of colors, one can divide the total by the number of colors in the document to get a measure that varies linearly with the number of colors. Not every color is equally tiring on the eye and more sophisticated measures can take this into account. Strongly saturated colors have more of an effect than neutral ones. There are several possible ways to calculate an approximate saturation value that can be used in augmenting its discomfort contribution. These were described in the above discussion on colorfulness under the eye-catching ability property.
For each color in the list, one can add a contribution to a total color discomfort measure. The contribution can be a function of the saturation. For example, for the ith color with saturation ci, the contribution might be ac+ci where ac is a constant value representing the effect of just having another color, and ci is the additional discomfort due to that color's saturation. dc=ac Nc+Σci where dc is the color discomfort measure. It is also possible to keep track of the total document area rendered in each color and include a function of both the saturation and the area in the augmentation of the discomfort calculation. The idea here is that the effect of a large colored area is stronger than the effect of a small one. An expression such as: Vlc=1/(bc+dc) where bc is a small positive constant, can be used to convert the discomfort measure into a limitation of color measure that varies between 0 and 1.
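The saturation-based discomfort calculation can be sketched as follows; the default value of ac is an arbitrary assumption, and bc is left as a required argument because no value is given here.

```python
def color_discomfort(saturations, ac=0.25):
    """dc = ac*Nc + sum(c_i), one saturation value per listed color."""
    return ac * len(saturations) + sum(saturations)


def limitation_of_colors_from_discomfort(dc, bc):
    """Vlc = 1 / (bc + dc), with bc a positive constant."""
    if bc <= 0:
        raise ValueError("bc must be positive")
    return 1.0 / (bc + dc)
```

Note that the result stays at or below 1 only when bc+dc is at least 1, so in practice bc and ac would have to be chosen with that in mind.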
People are more comfortable with some group sizes than others. A group should not have too many or too few elements, and odd numbers are preferred over even. The best size for a group is 3 elements. A simple expression for the comfort of a group number is: Gc=1/(eg+ag (1−MOD2(eg))) where eg is the number of elements in the group, ag is a constant that gives the added discomfort of an even number of elements, and MOD2 is a function that gives 0 if its argument is even and 1 if it is odd. For an entire document, one needs some method of averaging the grouping number comfort values over all groups. For example, if there are Ng groups in the document and the comfort value of the ith group is Gci, then the simple average over all groups yields: Vgn=ΣGci/Ng. More complex averaging schemes are possible. For example, one could weight the effect of the grouping number comfort differently depending on the placement of the group within the hierarchy of the document's logical structure tree.
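A minimal sketch of the group comfort expression and the simple document average is shown below; the default value of ag is an assumption.

```python
def group_comfort(eg, ag=1.0):
    """Gc = 1 / (eg + ag*(1 - MOD2(eg))) for a group of eg elements."""
    if eg < 1:
        raise ValueError("a group needs at least one element")
    mod2 = eg % 2  # 0 for even, 1 for odd
    return 1.0 / (eg + ag * (1 - mod2))


def grouping_number_measure(group_sizes, ag=1.0):
    """Vgn = sum(Gc_i) / Ng, the simple average over all groups."""
    if not group_sizes:
        raise ValueError("at least one group is required")
    return sum(group_comfort(eg, ag) for eg in group_sizes) / len(group_sizes)
```

As the simple expression is written, the value falls as the group grows and is largest for a single-element group, so a refinement would be needed if a preference for groups of 3 is to be reflected.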
People are generally more comfortable with a neat document than with a messy one. One can quantify neatness as a combination of contributing factors. In many cases it is easier to identify a factor that makes a document messy and use the inverse of such factors. An example of a neatness measure is offered based on the text neatness, border and background presence, alignment, and/or regularity. Neatness estimates that employ additional factors are possible. In combining the component neatness measures, assume that any source of messiness will destroy the overall neatness (just as was argued for overall comfort). A similar combining formula can be used. Vnt=[Σwi(d+Vi)^(−p)]^(−1/p)−d only now the Vi are taken from the set Vtn, Vbb, Val and Vrg for the text neatness, border/background, alignment and regularity. The weights wi, and parameters p and d can be different from those used in calculating comfort.
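Reading the combining formula as a weighted power mean with a negative exponent, that is, each term (d+Vi) raised to the power −p, the weighted sum raised to the power −1/p, and d subtracted at the end, a sketch is given below. It assumes the weights sum to 1; the default values of p and d are illustrative only.

```python
def combine_penalizing(values, weights, p=1.0, d=0.01):
    """V = [sum(w_i * (d + V_i)**(-p))]**(-1/p) - d.

    Assumes the weights sum to 1 and that d > 0, so that a component
    value of 0 does not cause a division by zero.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(w * (d + v) ** (-p) for v, w in zip(values, weights))
    return total ** (-1.0 / p) - d
```

With equal component values the formula returns that common value, while a single component near 0 pulls the result close to 0 even when the others are 1; for example, values of 1, 1, 1 and 0 with equal weights give roughly 0.03 rather than the arithmetic mean of 0.75.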
A combination of measures, as illustrated in
For punctuation, look for quotation marks and add an extra contribution for the quotation. In general one can add a contribution based on the character code c and a table Tc can store the contribution amounts. This can apply to spaces, letters and numbers as well as punctuation. mt=mt+Tc[c]. The contributions from font and character can be chosen such that the total messiness contribution for a character never exceeds 1. To get an average value for text messiness sum the messiness value for each character (mti for the ith character) and divide by the total number of characters Nch. The text neatness is the inverse of the messiness. Vtn=1−Σmti/Nch.
An important contributor to neatness is the impression that the document components are aligned and regularly positioned. These factors were described above in the discussion on document aesthetics. Using the techniques described, measures Val and Vrg for document alignment and regularity can be calculated. Note that the weighting factors for their contribution to neatness are likely to be different from the factors used in their contribution to aesthetics. Some text takes more work to decipher and understand than others do. Text printed in italics or using an abnormal font variant is harder to read. Light colored text on a light background, or dark text on a dark background takes an effort to decipher. This work will tire the reader and make the document uncomfortable to use. A method for estimating the average decipherability of a document Vdc was described above in the discussion on how well a document communicates. Some document constructs can act to intimidate the reader. By noting the degree to which these factors are present, one can form an intimidation measure. Intimidation acts against comfort, so the inverse of the intimidation factor should contribute to the comfort estimation. Factors that intimidate include a low amount of white space, high information density, low legibility, bold text, a low picture fraction, line use, and/or a high technical level. Many of the factors are familiar from IRS forms. A non-intimidation measure is actually calculated by combining the inverses of the factors that intimidate. To combine the various contributions to the document's non-intimidation factor, a simple weighted average is used, although more complex combination schemes are possible. Vin=ΣwiVi where wi are the weights and the Vi are the non-intimidation component values Vws, Vil, Vlg, Vdc, Vnb, Vpf, Vnl, Vlt corresponding to the above list of factors.
A combination of measures, as illustrated in
The non-white-space area can be estimated by totaling the areas of the content objects. The total object area can be scaled by the total document area Ad. Vws=(Ad−ΣAi)/Ad. Densely packed information is intimidating and so the inverse of the information density can contribute to the non-intimidation measure. Such an information lightness measure was described above in the discussion of a document's eye-catching ability. An illegible document is intimidating, so legibility should contribute to the non-intimidation measure. A method for estimating legibility was described in the above discussion of a document's ability to communicate. The use of bold or heavy weight text is intimidating. Since a non-intimidation measure is desired, one would like to have a text lightness measure (high values associated with light text weights). A method for determining such a measure is straightforward. Step through the document and examine the text to see what fonts are used. One can use a table TI to look up a lightness value tl for the weight of the font f. tl=TI[weight(f)].
If tli is the lightness value for the ith character, then one can find an average lightness (non-boldness) value by summing the lightness values and dividing by the total number of characters Nch. Vnb=Σtli/Nch. The presence of vertical lines can be intimidating, especially thick ones with high contrast. A method for quantifying the effect of vertical lines is to first step through the document and find them. This includes vertical lines that are part of borders and also rectangles with the ratio of width to height less than a threshold value. For each line discovered, multiply its area Al by its luminance contrast cl. Sum all the weighted areas and divide by the area of the document Ad to get a value between 0 and 1. Since the area devoted to vertical lines is typically small this expression understates the effect, but raising it to a fractional power can boost its strength. One then needs to invert the result to get the non-intimidation contribution. Vnl=1−(ΣcliAli/Ad)^(1/p)
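The non-boldness and vertical-line measures can be sketched as follows, reading the line measure as 1 minus the contrast-weighted area fraction raised to the power 1/p. The lightness table entries, the default p, and the clamping of the area fraction to 1 are assumptions of the sketch.

```python
# Hypothetical lightness table standing in for TI[weight(f)].
TEXT_LIGHTNESS = {"light": 1.0, "normal": 0.75, "bold": 0.25, "heavy": 0.0}


def non_boldness(character_weights):
    """Vnb = sum(tl_i) / Nch, one font weight name per character."""
    if not character_weights:
        return 1.0  # assumption: no text means nothing bold
    total = sum(TEXT_LIGHTNESS[weight] for weight in character_weights)
    return total / len(character_weights)


def non_line_measure(lines, document_area, p=2.0):
    """Vnl = 1 - (sum(cl_i * Al_i) / Ad)**(1/p).

    lines: list of (area, contrast) pairs, contrast assumed in [0, 1].
    """
    fraction = sum(area * contrast for area, contrast in lines) / document_area
    fraction = min(1.0, fraction)  # clamp in case contrast exceeds 1
    return 1.0 - fraction ** (1.0 / p)
```

With p=2, lines covering only 1 percent of the page at full contrast already reduce the measure to 0.9, which illustrates how the fractional power boosts the effect of small areas.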
Highly technical material is intimidating. The measure of technical level includes such things as reading ease, the presence of numbers, and the absence of pictures. A definition of an example technical level measure is given above in the discussion of how well a document communicates. The technical level Vtl can be inverted for a measure of non-technical level that can be used in the non-intimidating calculation. Vnt=1−Vtl. People have certain expectations about document styles. There are conventions that they are accustomed to. Violating such customs may yield some benefits (such as attracting attention) and incur costs (such as reduced ease of use). Violating convention almost always creates a little discomfort. Conventionality is defined as the inverse of novelty. A measure of novelty was presented above in the discussion of how well a document holds interest.
Some combinations of colors fit harmoniously together while others clash. Clashing or dissonant colors tire the eye and cause discomfort while harmonious colors can soothe the viewer. Color harmony is defined as the inverse of color dissonance, Vd, which was described above in the discussion of a document's eye-catching ability. The color harmony is then: Vch=1−Vd. Another aspect of what is expected is the appropriateness of the color choices. The document design rule is that large background areas should use desaturated colors while small foreground objects should use saturated colors. One can form a measure of the color inappropriateness by multiplying each object's area by its saturation. Actually the area should be measured as a fraction of the total document area Ad in order to restrict the result to the range of 0 to 1. A large result comes from a large area with a high saturation (which is inappropriate). For an average value for the entire document, one must combine the values from all objects, and with a simple weighting of saturation by area it would be possible to get a measure of inappropriate color use from many small saturated foreground objects, when this may actually be appropriate. A better measure is to raise the area fraction to a power. This further reduces the influence of small objects. This leads to a color appropriateness measure that looks as follows: Vca=1−Σci(Ai/Ad)^p where p is a value greater than 1.
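A sketch of the color appropriateness measure follows, with the area fraction of each object raised to a power p greater than 1. The default p and the final clamp at 0 (which guards against overlapping objects whose areas sum to more than the page) are assumptions.

```python
def color_appropriateness(objects, document_area, p=2.0):
    """Vca = 1 - sum(c_i * (A_i / Ad)**p).

    objects: list of (saturation, area) pairs, saturation in [0, 1].
    """
    inappropriateness = sum(saturation * (area / document_area) ** p
                            for saturation, area in objects)
    return max(0.0, 1.0 - inappropriateness)
```

A fully saturated object covering the whole page scores 0, whereas ten fully saturated objects each covering 1 percent of the page leave the measure near 1 because each small area fraction is squared.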
The rule for consistency of luminance states that for a group of content elements, the dark elements should come first and the lighter elements should follow. Note, however, that the logical structure of a document is typically a tree with each branch node representing a group. Thus the members of a group are often other groups. The content elements may not be simple objects with a single color and luminance. The consistency of luminance rule can still be applied, but the luminance used should be the average luminance of the subtree group member. To determine the average luminance of an object, get the luminance of the object Lf, the luminance of the background Lb, the area with the foreground color Af and the bounding area of the object Ao. The average luminance Lav is then: Lav=(Lf Af+Lb(Ao−Af))/Ao. The average luminance for a group of objects is the sum of the average luminance values for its members weighted by their areas plus the contribution from the background. If Ag is the bounding area of the group, Lavi is the average luminance for the ith group member and Ai is the area of that member then the average luminance for the group Lavg is: Lavg=(ΣLaviAi+Lb(Ag−ΣAi))/Ag.
To find a measure of the consistency of luminance for a group, step through the members of the group and find the average luminance of each member. Compare that luminance to the previous member's luminance and if the new luminance is darker than the old then collect the difference. This actually gives a measure of the inconsistency and one can use a reciprocal function to convert it to a consistency value ranging between 0 and 1. The above method indicates how to calculate a measure for each node in the content tree, but does not say how to obtain a collective value for the tree as a whole. One method for doing this is to form a weighted average of all the tree node values, where the weight is a function of the depth of the tree. One can also raise the values being combined to a negative power such that a bad consistency value carries the impact of many good values. This can be summarized as: Vcl=((Σwi(dcl+Vcli)^(−p))/Σwi)^(−1/p)−dcl where the sums are over all group nodes in the content tree, wi is the node depth, Vcli is the consistency of luminance of the node, dcl is a small positive constant, and p is a positive value such as 1.
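The luminance averaging and consistency steps can be sketched as follows. The sketch assumes luminance values in which larger means lighter, takes the group average as the area-weighted member sum plus the background contribution, all divided by Ag, uses 1/(1+x) as the reciprocal function, and reads the tree formula as a weighted power mean with exponent −p; these readings and the default constants are assumptions.

```python
def average_luminance(lf, lb, af, ao):
    """Lav = (Lf*Af + Lb*(Ao - Af)) / Ao for a single object."""
    return (lf * af + lb * (ao - af)) / ao


def group_average_luminance(members, lb, ag):
    """Lavg = (sum(Lav_i*A_i) + Lb*(Ag - sum(A_i))) / Ag.

    members: list of (average_luminance, area) pairs.
    """
    covered = sum(area for _, area in members)
    weighted = sum(lav * area for lav, area in members)
    return (weighted + lb * (ag - covered)) / ag


def luminance_consistency(member_luminances):
    """Collect the drops where a member is darker than its predecessor,
    then convert the inconsistency x to a consistency 1/(1 + x)."""
    pairs = zip(member_luminances, member_luminances[1:])
    inconsistency = sum(max(0.0, old - new) for old, new in pairs)
    return 1.0 / (1.0 + inconsistency)


def tree_consistency(node_values, node_weights, p=1.0, dcl=0.01):
    """Vcl = (sum(w_i*(dcl+V_i)**(-p)) / sum(w_i))**(-1/p) - dcl."""
    weight_sum = sum(node_weights)
    mean = sum(w * (dcl + v) ** (-p)
               for v, w in zip(node_values, node_weights)) / weight_sum
    return mean ** (-1.0 / p) - dcl
```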
A combination of measures, as illustrated in
In graphic design there are many consistency rules. Consistency helps people build an internal model of the document that, in turn, makes it easier to use. Some of the contributing rules or factors to consistency and how factors can be combined into an overall consistency measure will now be described. The example consistency measure will include position order, luminance, size, and/or style. The methods for calculating measures for these factors have been described above and will not be repeated in detail here. In combining the component consistency measures, assume that any source of inconsistency will destroy the overall consistency. A combining formula that can be used is as follows. Vnt=[Σwi(d+Vi)^(−p)]^(−1/p)−d where the Vi are taken from the set Vcp, Vcl, Vcsz and Vcst. The weights wi indicate the relative importance of the different measures. The parameter p is a number 1 or larger and d is a value slightly larger than 0.
A combination of measures, as illustrated in
A combination of measures, as illustrated in
A document that is difficult to read is often difficult to use. A measure of legibility Vlg was defined above as a contributor to a document's communicability. It can contribute to convenience as well as communicability but with a different weight. In fact, one could argue that communicability, as a whole, should be used as a contributor to convenience. While this is not ruled out, the example here will just include a few of the components of communicability that have particular bearing on convenience. Considering them separately allows one to give them different weights when contributing to convenience than those used for the contribution to communicability. In general, disability proof refers to how well the document can serve people with handicaps. For example, a document of only text can be read to someone who is blind, but a document with images would be much harder to convey. Another example of a contributor to a disability proof measure is the red-green friendliness property that was defined in the above discussion on how well a document communicates. The idea behind the measure is that there should be either luminance contrast or blue-yellow contrast between foreground and background colors in order to be red-green friendly. Without this contrast it would be difficult for a colorblind person to distinguish foreground object from background. This measure will be used as an example of a simple disability proof function, Vdp. Additional functions for other handicaps are certainly possible and could be combined into a more sophisticated measure.
Methods for estimating the ease of navigation Ven and ease of progression Vep were also described above in the discussion of how well a document communicates. They contribute to convenience as well as communicability, and, in fact, are more important (and have larger weights) as convenience measures than as communicability measures. The idea behind the calculation of these properties is to estimate and combine contributing features such as distinguishability, group identity, spatial coherence, list bullets, headings, internal links, alignment and others. Two other related concepts are the searchability Vsh and the locatability Vlo. Locatability is a measure of how easy it is to find a document object (whereas ease of navigation is how easy it is to find a document location). Searchability is a rougher measure that looks for the presence of document features that aid in locating document objects. These measures have been described above in the discussion of measures for the ease of use of content groups.
When a document is broken into pages, some content groups may get spread over two or more pages. If the document is displayed on a workstation, some entire content groups may not fit completely into the display window. This inability to view the logical group as a unit can be a hindrance and should reduce the document's convenience measure. To estimate the viewable fraction for a group displayed on a workstation, first find the bounding size of the group (width and height, wg and hg). Next find the size of the typical display window (wp, hp). The viewable width and height are the minimum of the group and window dimensions:
wv=MINIMUM(wg, wp)
hv=MINIMUM(hg, hp)
The measure of unity of display for the group is then given by the ratio of the visible area to the group area: U=(wv hv)/(wg hg).
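The calculation above can be sketched in Python as follows. The function name and the tuple-based interface are assumptions for illustration, and the example dimensions are hypothetical.

```python
def unity_of_display(group_size, window_size):
    """Return U = (wv * hv) / (wg * hg) for a group shown in a display window.

    group_size  -- (wg, hg): bounding width and height of the content group
    window_size -- (wp, hp): width and height of the typical display window
    """
    wg, hg = group_size
    wp, hp = window_size
    wv = min(wg, wp)  # viewable width
    hv = min(hg, hp)  # viewable height
    return (wv * hv) / (wg * hg)


# Hypothetical example: an 800 x 1200 group in a 1024 x 768 window.
u = unity_of_display((800, 1200), (1024, 768))
```

A group that fits entirely inside the window yields 1, and the value falls toward 0 as less of the group can be seen at once.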
For the case where the group has been split over pages, one can construct a measure by first finding the area of the group elements on each page (e.g. $A_{gp}$ for page $p$). Next find the maximum area among the pieces and divide it by the total group area: $U = \max_p(A_{gp}) / \sum_p A_{gp}$. While this provides a measure for any particular group within a document, one still has to combine these group measures to achieve an overall measure of the document's viewable fraction. Recognize that the level of the group within the document's logical tree structure should make a difference. One would be much less likely to expect or need high-level groups to be seen as a unit than the low-level groups near the bottom of the tree. First sort the groups by their tree level and find a simple average value for each level (i.e. $U_{av,L}$). Then combine the average values for the levels, weighted by a function of the level: $V_{vf} = \sum_L w(L)\, U_{av,L} / \sum_L w(L)$.
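The per-group and per-document steps above can be sketched in Python as follows. The function names, the (level, U) pair representation, and the default weighting w(L) = L are assumptions for illustration; the sketch also assumes levels are numbered from 1 at the root and increase toward the bottom of the tree, so that every weight is positive.

```python
from collections import defaultdict


def unity_over_pages(areas_by_page):
    """U = max_p(A_gp) / sum_p(A_gp) for one group split across pages."""
    return max(areas_by_page) / sum(areas_by_page)


def viewable_fraction(groups, weight=lambda level: float(level)):
    """Combine group measures into V_vf, weighting per-level averages.

    groups -- iterable of (level, U) pairs, one per content group
    weight -- w(L); the default w(L) = L is only an assumed example
    """
    by_level = defaultdict(list)
    for level, u in groups:
        by_level[level].append(u)

    num = den = 0.0
    for level, us in by_level.items():
        w = weight(level)
        num += w * (sum(us) / len(us))  # w(L) * U_av,L
        den += w
    return num / den
```

For example, a group whose pieces cover areas 30 and 10 on two pages gets U = 0.75, and `viewable_fraction([(1, 0.5), (2, 1.0), (2, 0.5)])` weights the level-2 average of 0.75 twice as heavily as the level-1 value of 0.5.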
The weighting function w(L) should increase with increasing level, for example w(L)=a L for a constant a. While the viewable fraction measure gives some indication of whether document components can be seen in their entirety, there is a special advantage in being able to see the entire document in a single window or page. A simple calculation can be used to create this measure. It is the same as for viewable fraction, only it uses the area of the entire document. If the width and height of the document are wd, hd and the width and height of the display or page are wp, hp, then calculate:
wv=MINIMUM(wd, wp)
hv=MINIMUM(hd, hp)
And set the single window display measure to: Vswd=(wv hv)/(wd hd)
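As a hypothetical worked example (the dimensions are illustrative and do not come from the text), consider a document 8.5 units wide and 22 units tall shown on a page 8.5 units wide and 11 units tall:

$$
w_v = \min(8.5,\ 8.5) = 8.5, \qquad h_v = \min(22,\ 11) = 11, \qquad V_{swd} = \frac{8.5 \times 11}{8.5 \times 22} = 0.5
$$

Half of the document's area is visible at once, so the measure is 0.5. A document that fits entirely within the window or page scores 1.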
One other dimension by which the quality of a document may be judged is by the costs that it incurs. Costs arise in several ways. For printed documents, there is the cost of the materials required (the paper and the ink). There is also a cost in the effort required to print the document (labor and press time). Material cost may not apply to documents viewed on electronic displays, but there is the cost to transmit and store the document. There is also the cost in the time the viewer spends waiting while the document is transmitted, or while it is being processed for display. Many of these costs depend upon the size of the document (such as described above for transmission and processing time). However, other properties can also have an effect. For example, the size of the fonts can affect the amount of paper needed for printing, and the presence of color can affect the cost of the ink.
Claims
1. A method for automatically identifying an unacceptable variable content document within a set of variable content documents, comprising:
- (a) generating a set of variable content documents using a pre-designed template having a desired layout and quality;
- (b) measuring a predetermined set of characteristics for each variable content document within the set of variable content documents;
- (c) quantizing the measured predetermined set of characteristics for each variable content document within the set of variable content documents;
- (d) generating a quantized quality score for each variable content document within the set of variable content documents; and
- (e) identifying a variable content document as having an unacceptable quality when the quantized quality score of the variable content document is outside a predetermined range of values.
2. The method as claimed in claim 1, further comprising:
- (f) fixing each variable content document identified as having an unacceptable quality to improve a quality thereof.
3. The method as claimed in claim 1, further comprising:
- (f) modifying each variable content document identified as having an unacceptable quality;
- (g) measuring a predetermined set of characteristics for each modified variable content document within the set of variable content documents;
- (c) quantizing the measured predetermined set of characteristics for each modified variable content document within the set of variable content documents;
- (d) generating a quantized quality score for each modified variable content document within the set of variable content documents; and
- (e) identifying a modified variable content document as having acceptable quality when the quantized quality score of the variable content document is within the predetermined range of values.
4. The method as claimed in claim 1, further comprising:
- (f) generating a second set of variable content documents, one variable content document for each variable content document identified as having an unacceptable quality, using a second pre-designed template having a different layout.
5. The method as claimed in claim 1, further comprising:
- (f) generating a second set of variable content documents, one variable content document for each variable content document identified as having an unacceptable quality, using a second pre-designed template having a different layout;
- (g) measuring a predetermined set of characteristics for each modified variable content document within the second set of variable content documents;
- (c) quantizing the measured predetermined set of characteristics for each modified variable content document within the second set of variable content documents;
- (d) generating a quantized quality score for each modified variable content document within the second set of variable content documents; and
- (e) identifying a variable content document within the second set of variable content documents as having acceptable quality when the quantized quality score of the variable content document within the second set of variable content documents is within the predetermined range of values.
6. A method for automatically identifying an unacceptable variable content document within a set of variable content documents, comprising:
- (a) generating a set of variable content documents using a pre-designed template having a desired layout and quality;
- (b) measuring a predetermined set of characteristics for each variable content document within the set of variable content documents;
- (c) quantizing the measured predetermined set of characteristics for each variable content document within the set of variable content documents;
- (d) generating a quantized quality score for each variable content document within the set of variable content documents; and
- (e) identifying a variable content document as having an unacceptable quality when the quantized quality score of the variable content document is statistically different from other quantized quality scores of the set of variable content documents.
7. The method as claimed in claim 6, further comprising:
- (f) fixing each variable content document identified as having an unacceptable quality to improve a quality thereof.
8. The method as claimed in claim 6, further comprising:
- (f) modifying each variable content document identified as having an unacceptable quality;
- (g) measuring a predetermined set of characteristics for each modified variable content document within the set of variable content documents;
- (c) quantizing the measured predetermined set of characteristics for each modified variable content document within the set of variable content documents;
- (d) generating a quantized quality score for each modified variable content document within the set of variable content documents; and
- (e) identifying a modified variable content document as having acceptable quality when the quantized quality score of the variable content document is statistically equal to the other quantized quality scores of the set of variable content documents.
9. The method as claimed in claim 6, further comprising:
- (f) generating a second set of variable content documents, one variable content document for each variable content document identified as having an unacceptable quality, using a second pre-designed template having a different layout.
10. The method as claimed in claim 6, further comprising:
- (f) generating a second set of variable content documents, one variable content document for each variable content document identified as having an unacceptable quality, using a second pre-designed template having a different layout;
- (g) measuring a predetermined set of characteristics for each modified variable content document within the second set of variable content documents;
- (c) quantizing the measured predetermined set of characteristics for each modified variable content document within the second set of variable content documents;
- (d) generating a quantized quality score for each modified variable content document within the second set of variable content documents; and
- (e) identifying a variable content document within the second set of variable content documents as having acceptable quality when the quantized quality score of the variable content document within the second set of variable content documents is statistically equal to the other quantized quality scores of the set of variable content documents.
11. A method for automatically identifying an unacceptable template to be used in creating a set of variable content documents, comprising:
- (a) generating a template to generate a set of variable content documents having a desired layout and quality;
- (b) generating a set of variable content documents using the generated template and a pre-determined database;
- (c) measuring a predetermined set of characteristics for each variable content document within the set of variable content documents;
- (d) quantizing the measured predetermined set of characteristics for each variable content document within the set of variable content documents;
- (e) generating a quantized quality score for each variable content document within the set of variable content documents;
- (f) identifying a variable content document having a worst unacceptable quality based upon the quantized quality scores;
- (g) modifying the generated template;
- (h) re-generating the variable content document having a worst unacceptable quality using the modified template; and
- (i) determining if the re-generated variable content document has an acceptable quality.
12. The method as claimed in claim 11, further comprising:
- (j) repeating the modification of the generated template and re-generation of the variable content document having a worst unacceptable quality until the re-generated variable content document has been determined as having an acceptable quality.
13. The method as claimed in claim 11, further comprising:
- (j) using the modified template, when the re-generated variable content document has been determined as having an acceptable quality, to generate a proofed set of variable content documents.
14. The method as claimed in claim 12, further comprising:
- (k) using the modified template, when the re-generated variable content document has been determined as having an acceptable quality, to generate a proofed set of variable content documents.
15. The method as claimed in claim 11, further comprising:
- (j) re-generating each variable content document identified as having acceptable quality using the modified template; and
- (k) determining if any of the re-generated variable content documents, originally having an acceptable quality, using the modified template have unacceptable quality.
16. The method as claimed in claim 15, further comprising:
- (l) repeating the modification of the generated template and re-generation of all the variable content documents until all the re-generated variable content documents have been determined as having an acceptable quality.
17. The method as claimed in claim 11, wherein the variable content document is identified as having a worst unacceptable quality based upon the quantized quality score of the variable content document being most statistically different from other quantized quality scores of the set of variable content documents.
18. The method as claimed in claim 11, wherein the variable content document is identified as having a worst unacceptable quality based upon the quantized quality score of the variable content document has a greatest difference from a predetermined range of values.
Type: Grant
Filed: Jan 11, 2005
Date of Patent: Dec 18, 2007
Patent Publication Number: 20060155699
Assignee: Xerox Corporation (Norwalk, CT)
Inventors: Lisa S. Purvis (Fairport, NY), Steven J. Harrington (Webster, NY), Robert J. Rolleston (Rochester, NY), Jean M. Ellefson (Fairport, NY)
Primary Examiner: Diane Mizrahi
Attorney: Basch & Nickerson LLP
Application Number: 11/032,746
International Classification: G06F 17/30 (20060101);