SYSTEM AND METHOD FOR MARK-UP LANGUAGE DOCUMENT RANK ANALYSIS
A system and method for mark-up language document rank analysis that may be performed automatically and that may also determine one or more differences between mark-up language documents with regard to their relative rank.
This application claims priority from U.S. Provisional Application No. 61/356,607, filed on Jun. 20, 2010, and from U.S. Provisional Application No. 61/394,350, filed on Oct. 19, 2010, both of which are hereby incorporated by reference as if fully set forth herein.
FIELD OF THE INVENTION
The present invention is of a system and method for mark-up language document rank analysis, and in particular but not exclusively, of such a system and method that is useful for determining one or more differences between mark-up language documents with regard to their relative rank.
BACKGROUND OF THE INVENTION
Search engines play important roles for supporting user interactions with the Internet. Search engines often act as a “gateway” to the Internet for many users, who use them to locate information of interest as a first resource. They are practically indispensable for negotiating the many thousands of web pages that form the World Wide Web.
Many users typically review only the first page or first few pages of search results that are provided by a search engine. For this reason, owners of web sites alter their web pages to increase their rank, whether by making the pages more “friendly” to spiders or by altering content, layout, tags and so forth. This process of changing a web page to increase its rank is known as SEO or “search engine optimization”.
Currently search engine optimization is typically performed manually. Search engines carefully guard their rules and algorithms for determining rank, both against competitors and also to avoid “spam” web pages which do not provide useful content but which seek only to have a high ranking, for example to attract advertisers. However, manual analysis and adjustments are highly limited and may miss many important improvements to web pages that could raise their rank in search engine results.
SUMMARY OF AT LEAST SOME ASPECTS OF THE INVENTION
The background art does not teach or suggest a system and method for mark-up language document rank analysis that may be performed automatically and that may also determine one or more differences between mark-up language documents with regard to their relative rank.
The present invention overcomes these drawbacks of the background art by providing, in at least some embodiments, a system and method for mark-up language document rank analysis that may be performed automatically and that may also determine one or more differences between mark-up language documents with regard to their relative rank.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.
Implementation of the method and system of the present invention involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.
Although the present invention is described with regard to a “computer” on a “computer network”, it should be noted that optionally any device featuring a data processor and the ability to execute one or more instructions may be described as a computer, including but not limited to any type of personal computer (PC), a server, a cellular telephone, an IP telephone, a smart phone, a PDA (personal digital assistant), or a pager. Any two or more of such devices in communication with each other may optionally comprise a “computer network”.
The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in order to provide what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
In the drawings:
The present invention is, in at least some embodiments, of a system and method for mark-up language document rank analysis that may be performed automatically and that may also determine one or more differences between mark-up language documents with regard to their relative rank.
Referring now to the drawings,
Analysis subsystem 106 optionally and preferably receives such search results 104 in response to a query, which is preferably formatted as for any search engine query (for example, containing one or more keywords). The query is preferably generated and transmitted by a data collector 110, which also receives search results 104.
Data collector 110 also preferably obtains the mark-up language documents associated with search results 104, for example by downloading such documents from a server. As non-limiting examples, data collector 110 is shown as being in communication with a plurality of mark-up language document servers 112 through a computer network 114, which may optionally also be the Internet and/or otherwise the same computer network as computer network 108. Data collector 110 preferably receives one or more mark-up language documents 116 according to the search results 104, for example according to a URL or other address for a particular mark-up language document server 112, which is supplied with search results 104. Data collector 110 may optionally retrieve or “pull” a mark-up language document 116 or alternatively may have such a mark-up language document 116 “pushed” or sent to data collector 110.
Each mark-up language document server 112 is shown as providing a different type of mark-up language document 116 (although of course each server 112 may or may not be limited to a particular type of mark-up language document 116), with non-limiting examples including a static mark-up language document A 116, a dynamic mark-up language document B 116 or a mark-up language document C 116. Each mark-up language document server 112 optionally retrieves each such mark-up language document 116 from a database 118 as shown.
Data collector 110 then preferably passes these results and one or more of the above described mark-up language documents 116 to a prediction engine 120, which as shown is also part of analysis subsystem 106. As described in greater detail below, prediction engine 120 then analyzes the received search results 104 and also the corresponding mark-up language documents 116 with regard to the relative ranking of a plurality of mark-up language documents 116, and also by comparing one or more features within the plurality of mark-up language documents 116 according to their relative rank.
Additionally or alternatively, prediction engine 120 may also optionally compare one or more features of a target mark-up language document 122 to such one or more features in mark-up language documents 116, with regard to a relative rank of target mark-up language document 122 in comparison to mark-up language documents 116, as determined in search results 104.
Target mark-up language document 122 is preferably provided by a target mark-up language document source 119, which preferably comprises a target mark-up language document server 124. Target mark-up language document server 124 is preferably in communication with data collector 110, preferably through an API (application programming interface) 128, and also optionally through any computer network 106 as previously described (alternatively, target mark-up language document server 124 may optionally be in direct communication with data collector 110, for example through an internal network and/or as part of a particular computational hardware installation). Data collector 110 may optionally “pull” target mark-up language document 122 from target mark-up language document server 124 or alternatively may have target mark-up language document 122 “pushed” by target mark-up language document server 124.
The comparative analysis of target mark-up language document 122 with regard to mark-up language documents 116 is described in greater detail below, but preferably includes determining at least one difference between target mark-up language document 122 and mark-up language documents 116 with regard to relative rank. Optionally such a difference could for example explain a relatively lower rank of target mark-up language document 122 with regard to one or more mark-up language documents 116.
The results of the analysis may optionally be adjusted according to feedback from a user, which is provided through a UI feedback and guidance module 126.
Analysis subsystem 106 is optionally in communication with one or more additional external computers or systems, which is preferably performed through one or more APIs (application programming interfaces) 128. In this exemplary system 100, API 128 supports communication between UI feedback and guidance module 126 and an application layer 130, which for example may optionally support a user interface (UI, not shown) for communication with UI feedback and guidance module 126.
Target mark-up language document source 119 also preferably features a mark-up language document editor 132, which may optionally perform one or more changes on target mark-up language document 122 either automatically or alternatively (or additionally) according to one or more user inputs, for example through application layer 130. For example, UI feedback and guidance module 126 may also optionally provide inputs as to one or more proposed changes to target mark-up language document 122 to increase the relative rank of target mark-up language document 122 with regard to the plurality of mark-up language documents 116 obtained in the search results. Such inputs are preferably provided to application layer 130, whether for user approval or for automatic implementation by mark-up language document editor 132.
Alternatively or additionally, the user may perform one or more changes to target mark-up language document 122, whether through application layer 130 or directly through mark-up language document editor 132, after which the changed document is reanalyzed by prediction engine 120, to see whether the expected relative rank would be higher or lower, as described in greater detail below.
Stages 3-7 are then performed by the prediction engine. In stage 3, the prediction engine extracts one or more features from the web pages as described in greater detail below. In stage 4, the prediction engine preferably performs supervised training of an analysis algorithm with regard to such features.
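As a non-limiting illustration only, the feature extraction of stage 3 might be sketched as follows. The particular features chosen (keyword use in the title tag, keyword use in the h1 tag, keyword density) are taken from the parameters recited in the claims; the function and class names, the single-word keyword, and the tokenization rule are assumptions made for the sketch and are not part of the described system.

```python
from html.parser import HTMLParser
import re


class _TextCollector(HTMLParser):
    """Hypothetical helper: gathers text from <title>, <h1> and the page body."""

    TRACKED = ("title", "h1", "script", "style")

    def __init__(self):
        super().__init__()
        self.open = {tag: 0 for tag in self.TRACKED}   # how many of each tag are open
        self.text = {"title": [], "h1": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.open:
            self.open[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.open and self.open[tag] > 0:
            self.open[tag] -= 1

    def handle_data(self, data):
        if self.open["script"] or self.open["style"]:
            return                                     # ignore code and styling
        if self.open["title"]:
            self.text["title"].append(data)
            return
        self.text["body"].append(data)
        if self.open["h1"]:
            self.text["h1"].append(data)


def _words(chunks):
    # Assumed tokenization: lower-case runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", " ".join(chunks).lower())


def extract_features(html, keyword):
    """Return a small feature dictionary for one page and one single-word keyword."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    kw = keyword.lower()
    title = _words(parser.text["title"])
    h1 = _words(parser.text["h1"])
    body = _words(parser.text["body"])
    return {
        "keyword_in_title": kw in title,
        "keyword_first_in_title": bool(title) and title[0] == kw,
        "keyword_in_h1": kw in h1,
        "keyword_density": body.count(kw) / len(body) if body else 0.0,
    }
```

The resulting dictionary can be converted to a numeric vector (for example, treating True as 1.0 and False as 0.0) before being passed to the training stage.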
Supervised training is a machine learning methodology whereby examples from a known set of classes are fed into a system together with their class identifiers. Often the input samples are in the form of N-dimensional feature vectors. The system is trained with these samples and class identifiers, and the resultant model is called a classifier.
Ideally, the classifier should be able to classify the entire training set (now without the given class identifiers) correctly. The entire process of learning from a set of sample feature vectors is called “training the classifier”.
Once training is complete, the classifier is then used to classify unlabeled data into classes. This can be done through a variety of methods that typically rely on determining relative similarities between classes (as determined during training) and the new input vectors.
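By way of a minimal, non-limiting sketch, training and classification as just described could be illustrated with a nearest-centroid classifier. This particular classifier is an assumption chosen for brevity; the description does not name a specific algorithm. The function names and the class labels in the usage note are likewise hypothetical.

```python
import math


def train_classifier(samples, labels):
    """Return one centroid (mean feature vector) per class label.

    samples: list of equal-length numeric feature vectors
    labels:  list of class identifiers, one per sample
    """
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, vec):
    """Assign an unlabeled vector to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))
```

For example, vectors of (keyword density, keyword in title) extracted from pages labeled "top" or "lower" according to their position in the search results could be used for training, after which an unlabeled target page is assigned to whichever class it most resembles.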
A simple example of supervised training is the ability to distinguish between males and females based on just two features. The first feature is height and the second feature is hair color. Clearly from a priori knowledge, it is known that height is more likely to be a usefully distinguishing feature than is hair color. The process starts by obtaining training samples from a selected and known training set of male and female participants. A feature vector (2-dimensional) is extracted from each of the training samples and plotted in a two-dimensional feature space, with one dimension for each feature. As seen from the example (
The main advantage of supervised training is that the construction of the classifier is often more accurate and reliable than for unsupervised training, because the training set has a known set of class identifiers. For the presently described method, it is possible to leverage supervised training methods because the search engines provide the rankings in the Search Engine Result Pages. The supervised training is not limited to training by search engine rankings but may instead optionally include other classification information for training purposes.
In stage 5, the prediction engine optionally performs feature space reduction, to locate one or more features considered to be of particular importance in determining the relative rank of the target after the supervised training. Therefore, subsequent stages may optionally be performed with fewer features. Non-limiting examples of algorithms for feature space reduction include PCA (principal component analysis).
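As a hedged illustration of feature space reduction, the sketch below keeps only those features whose values correlate most strongly with the observed rank. Note that this is a simple correlation-based feature selection substituted for the sake of a short, dependency-free example; it is not PCA, and the function names and the choice of Pearson correlation are assumptions.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(rows, ranks, names, keep):
    """Return the `keep` feature names most correlated (in absolute value) with rank.

    rows:  one feature vector per ranked page
    ranks: the rank of each page in the search results (1 = first result)
    names: the name of each feature column
    """
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scored.append((abs(pearson(column, ranks)), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:keep]]
```

A feature that takes the same value on every page (and so cannot explain any difference in rank) receives a score of zero and is dropped first.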
In stage 6, the prediction engine classifies the target web page according to the N-dimensional feature space and according to the respective decision boundary for each feature. Optionally one or more features are each weighted with regard to the respective decision boundary, such that in cases where the classification of the target web page with regard to that feature is not clear, the decision may optionally be weighted toward a particular side of the boundary. In stage 7 the prediction engine then performs feature space expansion, in which the engine determines which features have the most effect on altering the rank of the target web page with regard to the other ranked web pages.
Optionally stages 5 and 6 are not performed, for example if the method is not to be performed in real time, in which case the method optionally proceeds from stage 4 directly to stage 6A as described below.
From stage 6 the process may also optionally be performed by the UI feedback and guidance module in stage 6A, which may optionally perform real time reclassification of the target web page according to input through the web page editor. Also from stage 7, the process may also optionally be performed by the UI feedback and guidance module in stage 7A, which may optionally provide guidance to the user (or to an automated web page editor) with regard to whether one or more changes are likely to improve or reduce the rank of the web page with regard to the other analyzed web pages.
In stage 8, optionally such information is provided to the user and/or through the web; for example, optionally the altered webpage is published to the Internet by being uploaded to a web server.
Both feature extraction module 200 and supervised training module 202 preferably communicate with a feature space reduction and classification module 204. Feature space reduction and classification module 204 is optionally provided to increase the rapidity of calculations, by reducing the number of features initially considered for classifying the target web page with regard to its relative rank in the search results. Feature space reduction and classification module 204 also classifies the target web page with regard to the results determined through supervised training from supervised training module 202 and also according to the features extracted by feature extraction module 200.
The classification of the target web page according to the reduced features is then passed from prediction engine 120 to UI feedback and guidance module 126 through API 128 as previously described. Within UI feedback and guidance module 126, a feature space expansion and distance measure module 206 preferably first expands the feature space again to the full set of features that provide the best discrimination in terms of classification, and then calculates a distance between the target web page and the received web pages from the search engine results.
In addition, feature space expansion and distance measure module 206 may perform feature space expansion to determine which features have the most effect on altering the rank of the target web page with regard to the other ranked web pages. A heuristics module 208 may also optionally be used to provide guidance to the above process through one or more heuristically determined rules.
Also, feature space expansion and distance measure module 206 may determine the distance measure for a target web page that has been altered, to determine the potential effect of such alteration on the relative rank of the target web page within a set of received, ranked web pages (i.e. the search engine results).
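A minimal sketch of such a distance measure is given below for illustration only. The description does not specify the metric; Euclidean distance to the mean feature vector of the top-ranked pages is assumed here, and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Mean feature vector of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance_to_top(target, top_ranked):
    """Assumed measure: Euclidean distance from the target to the top-ranked centroid."""
    return math.dist(target, centroid(top_ranked))


def alteration_effect(before, after, top_ranked):
    """Change in distance caused by an edit to the target page.

    A negative value means the altered page lies closer to the top-ranked pages
    in feature space; a positive value means it has moved further away.
    """
    return distance_to_top(after, top_ranked) - distance_to_top(before, top_ranked)
```

Under these assumptions, the sign of the returned value could serve as the basis for guidance as to whether a proposed change is likely to improve or reduce the relative rank.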
The internet spider also obtains mark-up language documents according to the search results as specified by the SERPs (402A). Both the search results and the actual mark-up language documents are stored in a storage cache (404); the search engine results are then stored in a search engine results module (406) within a database (410), which may optionally correspond to the database of analysis subsystem (
As shown, a target web page is edited, such that at least one change is made (stage 1). Such editing may optionally be performed manually by a user, automatically by an editing software, or a combination thereof. To assist in performing the editing process, preferably textual guidance for improvements is received (stage 2) and/or graphical guidance for improvements (stage 3). More preferably, stages 2 and 3 are performed in a feedback cycle with stage 1 at least once, and most preferably a plurality of times, such that textual and/or graphical guidance from stages 2 and 3 is then input to the editing process of stage 1, for manual and/or automatic performance.
In stage 4, the suggested changes to the target web page are approved to improve the relative ranking of the target web page. Such approval may optionally be performed for each cycle of stages 1-3 or may optionally be performed once after all cycle or cycles of stages 1-3 have been performed.
Locality related server 702 preferably features a lexicon generator service 704 and a crawler service 706. Lexicon generator service 704 provides a lexicon for the specific locality, which as described above is a combination of language and cultural factors. Lexicon generator service 704 preferably constructs the lexicon. For the purpose of discussion only and without any intention of being limiting, it is assumed that lexicon generator service 704 generates the lexicon at least partially based upon search engine ranking results. By “topic modeling” it is meant any type of statistically based analysis of language related to a particular subject area or topic. The subject area may optionally be defined narrowly or broadly, but to the extent that the subject area or topic is defined more specifically, it is expected that the resultant model would capture more features of the language and/or capture them more precisely.
Without wishing to be limited in any way, optionally lexicon generator service 704 generates the lexicon by first obtaining a word count of each word in a collection of related documents; in this non-limiting example, the search engine ranking results serve to determine the extent to which the documents are related (and also which documents are related), such that the training process is supervised training. Optionally and preferably, every word appearing at least once in any document has a database entry and the number of times the word appears is also recorded.
Once the collection of words has been established, preferably any stop words are eliminated. Stop words are those words appearing frequently in all documents, regardless of topic (“and”, “the”, “a”, “an”, “is”, and so forth). The determination of which words are “stop words” is typically language dependent; for example, the stop words may optionally be taken from a list of known stop words in a particular language. Alternatively or additionally, a list of stop words may optionally be determined from the collection of documents itself, for example by determining which words appear with a statistical frequency that is greater than a threshold. Optionally phrases comprising such stop words (“for sale”) are not eliminated if the phrase itself is determined to be important.
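The word counting and stop word elimination described above might be sketched as follows, purely as an illustration. The tokenization rule, the default threshold value, and the function names are assumptions; the treatment of phrases such as “for sale” is omitted for brevity.

```python
from collections import Counter
import re


def count_words(documents):
    """Count every word appearing at least once in any document of the collection."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def find_stop_words(counts, known=frozenset(), max_share=0.2):
    """Combine a known stop word list with words whose share of all word
    occurrences exceeds `max_share` (an arbitrary, assumed threshold)."""
    total = sum(counts.values())
    frequent = {w for w, c in counts.items() if total and c / total > max_share}
    return set(known) | frequent


def strip_stop_words(counts, stop_words):
    """Return the word counts with all stop words removed."""
    return Counter({w: c for w, c in counts.items() if w not in stop_words})
```

In practice the threshold would need to be tuned to the size of the collection, since in a very small collection even a genuine topic word may exceed it.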
After stop words are removed, the most frequently appearing terms for this specific topic, preferably which do not appear frequently for other topics, form the lexicon for the topic. For example, optionally a scoring system may be used to determine which words appear in the lexicon, and optionally and preferably also determines the ordering of the words in the lexicon.
Such a scoring system may optionally comprise determining the number of documents in which the lexicon term appears for the topic under consideration (“NumDocs”) and multiplying by the average number of occurrences of this term per document (again, within the context of this topic; “AvgOccur”). However, such a simple calculation could enable a frequently occurring (but otherwise irrelevant) word to be selected. To help prevent such an artifact, preferably the highest ranking document in which the term occurs is determined (“HighRank”) and the score is adjusted accordingly: Score = (NumDocs * AvgOccur) / HighRank.
The division by the HighRank ensures that the rank or relevancy of the document is also considered, thereby preventing a non-relevant word that appears more frequently in low ranking documents from being selected.
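The scoring calculation may be illustrated with the following non-limiting sketch. It assumes that rank 1 denotes the first search result, so that HighRank is the numerically smallest rank among the documents containing the term; the function name and the input layout are hypothetical.

```python
def lexicon_score(term, ranked_docs):
    """Score = (NumDocs * AvgOccur) / HighRank for one candidate lexicon term.

    ranked_docs: list of (rank, words) pairs, where rank 1 is the top search
                 result and words is the list of words in that document.
    """
    # (rank, occurrences) for every document that contains the term
    hits = [(rank, words.count(term)) for rank, words in ranked_docs if term in words]
    if not hits:
        return 0.0
    num_docs = len(hits)
    avg_occur = sum(count for _, count in hits) / num_docs
    high_rank = min(rank for rank, _ in hits)   # best (smallest) rank, by assumption
    return num_docs * avg_occur / high_rank
```

For instance, a term appearing a total of three times across the documents ranked 1 and 2 scores 2 * 1.5 / 1 = 3.0, whereas a term appearing six times but only in the documents ranked 9 and 10 scores 2 * 3 / 9, or approximately 0.67.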
Lexicon generator service 704 preferably receives web pages and also search engine ranking results from crawler service 706 in order to analyze the search engine results as described above. Crawler service 706 optionally operates similarly to data collector 110, in that crawler service 706 at least requests and receives search engine ranking results; crawler service 706 may also optionally retrieve one or more mark-up language documents according to the search engine ranking results.
Lexicon generator service 704 then generates the lexicon according to these search engine results and also according to a topic model generated by a training engine 708. Training engine 708 optionally and preferably models a topic or subject area based upon an analysis of the language used, particularly with regard to the words selected, word frequency and also optionally with regard to word constructs (for example, having a plurality of words featured in the same sentence, same paragraph etc). Other types of language analysis may also optionally be performed as previously described. The language analysis also preferably relates to the effect of such language on search engine ranking results as previously described. Training engine 708 may therefore optionally have a crawler service (not shown) or alternatively may optionally use crawler service 706.
Once the lexicon has been generated by lexicon generator service 704, a suggestion server 712 uses the lexicon to provide one or more language adjustment suggestions to a document as previously described, for example through a client (not shown). The lexicon may optionally be saved locally at a lexicon database 714; alternatively, suggestion server 712 may communicate with lexicon generator service 704 for each suggestion. Suggestion server 712 optionally and preferably communicates with lexicon generator service 704 to determine the efficacy of suggestions provided, such that lexicon generator service 704 optionally determines the actual search engine ranking of a mark-up language document that has been adjusted according to one or more suggestions from suggestion server 712.
Training engine 708 preferably operates at least once to provide the topic model for lexicon generator service 704, but may also optionally be invoked again, one or more times, to adjust the topic model, by a watchdog 710 according to at least some embodiments of the present invention. Watchdog 710 preferably samples at least a portion of search engine results, such that for example, such an adjustment may be invoked according to a comparison of the actual and predicted search engine rankings; if the predicted rankings are too distant from the actual rankings, then watchdog 710 may optionally activate training engine 708. The actual and predicted search engine rankings are optionally compared by watchdog 710. If the predicted values differ from the actual values by more than some specified tolerance, then the topic model is preferably reviewed and if necessary adjusted, more preferably through invoking the training engine 708 as noted previously.
By “distant” it is meant that the numerical difference between the predicted and actual search engine rankings is greater than a threshold level.
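The comparison performed by the watchdog might be sketched as follows, without any intention of being limiting. The description states only that the numerical difference is compared to a threshold; aggregating the per-document differences by their mean absolute value is an assumption made for this sketch, as are the function name and the dictionary-based inputs.

```python
def rankings_too_distant(predicted, actual, threshold):
    """Return True if predicted and actual rankings differ by more than `threshold`.

    predicted, actual: dictionaries mapping a document identifier (for example
                       a URL) to its rank. Only documents present in both are
                       compared.
    """
    diffs = [abs(predicted[doc] - actual[doc]) for doc in predicted if doc in actual]
    if not diffs:
        return False            # nothing to compare, so no retraining is triggered
    mean_abs_diff = sum(diffs) / len(diffs)   # assumed aggregation
    return mean_abs_diff > threshold
```

A True result would correspond to the condition under which the training engine is activated again to adjust the topic model.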
Once watchdog 710 has activated training engine 708, the above process for generating the topic model is preferably repeated, after which lexicon generator service 704 receives the new model and generates a new lexicon based upon this model.
As shown with regard to a system 800 of
As non-limiting examples, two types of such software are shown: an agent 804 (of which three are shown for the purpose of illustration and without any intention of being limiting) and an authoring system 806 (of which two are shown for the purpose of illustration and without any intention of being limiting). Agent 804 optionally operates with any type of document generation and/or editing software as an “add on” to such software as previously described. Session manager 802 may also optionally communicate directly with authoring system 806, such that the suggestions are provided through authoring system 806 in an integrated manner that is optionally and preferably transparent to the end user.
If a keyword is not known to suggestion server 712, then optionally a request is sent from suggestion server 712 to a request dispatcher 810. Request dispatcher 810 then preferably communicates with locality related server 702 to analyze the unknown keyword. If the keyword is not part of the lexicon generated by lexicon generator service 704, then optionally and preferably crawler service 706 is invoked to determine a ranking based upon this keyword, after which search engine rankings and optionally any synonyms are provided to suggestion server 712. Optionally another request dispatcher 812 handles requests made by training engine 708 as previously described.
Also as shown, training engine 708 optionally features watchdog 710, which may also be implemented separately (not shown). Watchdog 710 preferably also receives the features from feature extraction module 1100 and compares them to predicted ranking results; as previously described, if too great a distance is found between the predicted and actual ranking results, watchdog 710 preferably activates supervised training module 1002 in order to generate a new or adjusted set of rules by rule formulation module 1006.
The process optionally and preferably starts with crawler service 706 being activated by a control request from supervised training module 1002 (arrow 1). Crawler service 706 may optionally be directly invoked by watchdog 710 as shown (arrow 8) or by feature extraction module 1100 (not shown). After being invoked, crawler service 706 provides search results (arrow 2), more preferably in the form of search engine rankings and also mark-up language documents ranked in such rankings, to feature extraction module 1100.
Feature extraction module 1100 analyzes the search results with regard to both the rankings and also the mark-up language documents to extract one or more features, which are then provided to supervised training module 1002 (arrow 3). Training module 1002 may optionally request further and/or repeated feature extraction one or more times (arrow 4).
Once supervised training module 1002 has obtained sufficient features, supervised training module 1002 then analyzes these features in order to determine which ones are important; the relative importance and also optionally a reduced feature space (preferably only including features that are deemed to have at least a threshold level of importance as previously described) are provided to regression module 1004 as a set of rules (arrow 5). Optionally, in order to determine whether the rules accurately predict search engine ranking behavior, crawler service 706 provides additional search results to feature extraction module 1100 (arrow 6), whether automatically or through a control request (not shown). Feature extraction module 1100 then extracts one or more features and compares actual to expected results. This information is then provided to regression module 1004 (arrow 7).
Based upon this information, regression module 1004 selects and/or determines one or more rules, for example for constructing the lexicon as previously described.
Optionally at least once (and preferably repeatedly), verification of these rules is performed by watchdog 710 in response to information provided by regression module 1004 (arrow 8).
Watchdog 710 may optionally invoke crawler service 706 again to restart the process as previously described (arrow 9).
As shown in
Claims
1. A method for analyzing a mark-up language document that is indexable by an internet based indexing computer program, the method being performed by a computer, the method comprising: inputting at least one search keyword to the internet based indexing computer program through the internet; receiving a response to said inputting, said response including at least one returned mark-up language document; analyzing said response according to a supervised training procedure; and analyzing the mark-up language document according to said at least one search keyword and said analysis of said response according to said supervised training procedure.
2. The method of claim 1, wherein said inputting said at least one search keyword comprises inputting a plurality of search keywords related to a specific subject, and wherein said analyzing said response comprises determining a difference between the different search keywords in said response by the internet based indexing computer program.
3. The method of claim 1, wherein said analyzing said response according to said supervised training procedure comprises receiving a plurality of returned mark-up language documents, including the target mark-up language document, and a relative rank of each returned mark-up language document; determining a relative rank of the target mark-up language document with regard to said plurality of returned mark-up language documents; and analyzing at least one feature of the target mark-up language document in comparison to said plurality of returned mark-up language documents and said relative rank of the target mark-up language document.
4. The method of claim 3, wherein said feature is selected from the group consisting of content, metadata and structure.
5. The method of claim 4, wherein said content is selected from the group consisting of javascript, text, images, any type of media including multimedia, and any other suitable type of content.
6. The method of claim 5, wherein said analyzing said content of the target mark-up language document comprises analyzing said returned mark-up language documents to determine a placement of said search keyword therein.
7. The method of claim 6, wherein said analyzing said content further comprises comparing a keyword density of said search keyword in said returned mark-up language documents to a keyword density in the target mark-up language document with regard to said relative rank.
8. The method of claim 6, wherein said analyzing said content further comprises comparing a keyword location of said search keyword in said returned mark-up language documents to a keyword location in the target mark-up language document with regard to said relative rank.
9. The method of claim 5, wherein the mark-up language document is a web page and said analyzing said content further comprises analyzing said content according to a parameter including one or more of keyword use anywhere in the title tag, keyword use as the first word(s) of the title tag, keyword use in the root domain name in the url, keyword use anywhere in the h1 headline tag, keyword use in internal link anchor text on the page, keyword use in external link anchor text on the page, keyword use as the first word(s) in the h1 tag, keyword use in the first 50-100 text words in the document, keyword use in the subdomain name of the url, keyword use in the page name url, keyword use in the page folder url, keyword use in other headline tags (<h2>-<h6>), keyword use in image alternative text, keyword use in image names, keyword use in <b> or <strong> tags, keyword use in list items <li> on the page, keyword use in the page's query parameters, keyword use in <i> or <em> tags, keyword use in the meta description tag, keyword use in the page's file extension, keyword use in comment tags in the web page, keyword use in the meta keywords tag, freshness of page creation, use of links on the page that point to other urls on this domain, frequency of updating page content, use of external-pointing links on the page, query parameters in the url vs. static url format, ratio of code to text in html, existence of a meta description tag, html validation to W3C standards, use of flash elements (or other plug-in content), or use of advertising on the page.
10. The method of claim 3, wherein said metadata is selected from the group consisting of a mark-up tag and a description of a mark-up tag.
11. The method of claim 9, wherein said mark-up tag is selected from the group consisting of a metatag, a page title and a section title.
12. The method of claim 10, wherein said analyzing said feature comprises analyzing said metadata of said returned mark-up language documents in comparison to the target mark-up language document with regard to said relative rank.
13. The method of claim 11, wherein said metadata comprises a mark-up language tag or description of said tag, and wherein said analyzing said mark-up language tag or description of said tag further comprises comparing said mark-up language tag or description of said tag in said returned mark-up language documents in comparison to the target mark-up language document with regard to said relative rank.
14. The method of claim 12, wherein said analyzing said mark-up language tag or description of said tag further comprises providing a plurality of mark-up language tag keywords; searching said mark-up language tag or description of said tag in said returned mark-up language documents for said plurality of mark-up language tag keywords; searching the target mark-up language document for said plurality of mark-up language tag keywords; and comparing the target mark-up language document and said returned mark-up language documents according to said relative rank.
15. The method of claim 10, wherein said analyzing said returned mark-up language documents further comprises determining a location of each mark-up language tag keyword in said returned mark-up language documents; determining a location of each mark-up language tag keyword in the target mark-up language document; and comparing said respective locations.
16. The method of claim 4, wherein said structure is selected from the group consisting of location of a plurality of components, use of containers, rules of dynamic web pages, and URL.
17. The method of claim 3, wherein said analyzing said plurality of returned mark-up language documents further comprises determining at least one difference in metadata keywords between a lower ranked returned mark-up language document and a higher ranked returned mark-up language document.
18. The method of claim 16, wherein said analyzing said plurality of returned mark-up language documents further comprises determining at least one difference in structure between a lower ranked returned mark-up language document and a higher ranked returned mark-up language document.
19. The method of claim 16, wherein said analyzing said response according to said supervised training procedure further comprises training said supervised training procedure according to a plurality of returned mark-up language documents and according to a relative rank of said plurality of returned mark-up language documents.
20. The method of claim 16, wherein said supervised training procedure comprises, but is not limited to, one or more of the following approaches and methods: analytical learning, artificial neural network, Backpropagation, Bayesian analysis, Decision Trees, Case Based Reasoning, Inductive Logic Programming, Gaussian process regression, Kernel estimators, Learning Automata, Minimum message length (decision trees, decision graphs, etc.), Naive Bayes classifier, Nearest Neighbor Algorithm, Probably approximately correct learning, Ripple down rules, Symbolic machine learning algorithms, Subsymbolic machine learning algorithms, Support vector machines, Random Forests, Ensembles of Classifiers, Ordinal Classification, Data Pre-processing, Handling imbalanced datasets, Statistical relational learning.
21. The method of claim 1, further comprising determining at least one change to the mark-up language document according to said analyzed response.
22. The method of claim 20, wherein said determining said at least one change comprises one or more of determining a changed content, a changed structure or a changed metadata.
23. The method of claim 21, wherein said determining said at least one change comprises increasing keyword density of at least one keyword in said target mark-up language document.
24. The method of claim 21, further comprising changing the mark-up language document with at least one change by a user; and displaying a result of said at least one change to the mark-up language document to the user.
25. The method of claim 23, wherein said displaying said result to the user comprises indicating an increase or decrease in potential rank of the mark-up language document by the internet based indexing computer program.
26. The method of claim 3, wherein the internet based indexing program comprises a plurality of programs and wherein the method is performed for each of said programs.
27. The method of claim 25, wherein each of said plurality of programs has a separate geographical location and the method is performed separately for each geographical location.
28. The method of claim 3, wherein said feature comprises a plurality of features, the method further comprising performing PCA to reduce a number of features before said analyzing and said comparing are performed.
29. The method of claim 3, wherein said comparing comprises performing a distance measurement; and comparing the target mark-up language document to said plurality of received mark-up language documents according to said distance measurement.
30. The method of claim 28, wherein said distance measurement is selected from the group consisting of L1, LDA (Latent Dirichlet Allocation) and L2.
31. The method of claim 1, further comprising generating a lexicon according to said supervised training.
32. The method of claim 21, further comprising determining an effect of said at least one change on a search engine ranking of said mark-up language document; and comparing actual and predicted search engine rankings to determine whether said effect is expected.
33. The method of claim 32, wherein, if said effect differs sufficiently from an expected effect, said supervised training is performed again.
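By way of non-limiting illustration, the keyword density comparison of claim 7 and the L1 and L2 distance measurements of claims 29 and 30 may be sketched as follows. The sketch assumes a single-word keyword, a simple alphanumeric tokenizer, and feature vectors of equal length; the function names and the example feature vectors are hypothetical and are not taken from the specification.

```python
import math
import re
from typing import Sequence


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of words in `text` equal to `keyword` (case-insensitive).

    Assumes a single-word keyword and a simple alphanumeric tokenizer.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def _check_lengths(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two feature vectors."""
    _check_lengths(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 (Euclidean) distance between two feature vectors."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical feature vectors: [keyword density, keyword in title (0 or 1)]
    target = [keyword_density("Cheap flights and cheap hotels", "cheap"), 1.0]
    higher_ranked = [0.1, 0.0]
    print(l1_distance(target, higher_ranked))
    print(l2_distance(target, higher_ranked))
```

In such a sketch, a smaller distance between the target document and the higher ranked returned documents would indicate that the target document resembles them more closely with regard to the selected features.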
Type: Application
Filed: Jun 3, 2014
Publication Date: Apr 2, 2015
Applicant: REMEZTECH LTD. (Zichron Yaakov)
Inventors: Haim BARAD (Zichron Yaakov), Marina GRECHUHIN (Ganei Tikva), Jonathan SKELKER (Zichron Yaakov)
Application Number: 14/294,635
International Classification: G06F 17/30 (20060101); G06N 99/00 (20060101); G06F 17/22 (20060101);