Identifying and evaluating online references
One example embodiment includes a method for indexing online references of an entity. The method includes identifying one or more channels of the Internet to be searched for references to an entity and identifying one or more signals to be evaluated within each of the one or more channels. The method also includes crawling the Internet for online references to the entity, wherein crawling the Internet comprises searching the one or more channels of the Internet for references to the entity and evaluating the one or more signals. The method further includes constructing a reverse index of the references, wherein the reverse index is based on each channel in which a reference is found and the one or more signals evaluated for the reference.
This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 61/060,033, entitled “COLLECTING AND SCORING ONLINE ADVERTISEMENTS, ONLINE MARKETING CHANNELS, AND ORGANIC SEARCH,” filed on Jun. 9, 2008, which application is incorporated herein by reference in its entirety.
BACKGROUND OF THE INVENTION
Search engine optimization, in general, is a process web-masters apply to improve the volume and quality of traffic to a given Web Page or other Internet site. Typical techniques include keywords in title tags, keywords in meta tags, keywords in body text, anchor text in inbound links, age of site, site structure, link popularity in a site's internal link structure, amount of indexable text/page content, number of links to a site, popularity/relevance of links to site and topical relevance of inbound link tags. Additional techniques are sometimes employed based on the search engine for which the webmaster is attempting to optimize. Since search engine algorithms and metrics are proprietary, search engine optimization techniques are widely used to improve visibility of a Web Page or other Internet site on search engine result pages.
Search engine marketing is a form of Internet marketing that includes search engine optimization (SEO), paid inclusion, and paid placement. Paid inclusion and paid placement are forms of paid Internet advertising that place advertisements on the result page of a particular keyword search. Paid inclusion and paid placement vary in price based on factors such as keyword or search term.
Online advertising is a form of advertising that leverages the Internet or World Wide Web to convey a message. Online advertisements include text advertisements, banner advertisements, skyscraper advertisements, floating advertisements, expanding advertisements, polite advertisements, wallpaper advertisements, trick banner advertisements, pop-up advertisements, pop-under advertisements, video advertisements, map advertisements, mobile advertisements and many other forms of online advertisement.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
BRIEF SUMMARY OF SOME EXAMPLE EMBODIMENTS
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential characteristics of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In general, example embodiments of the invention relate to collecting and scoring online references of an entity. One example embodiment includes a method for indexing online references of an entity. The method includes identifying one or more channels of the Internet to be searched for references to an entity and identifying one or more signals to be evaluated within each of the one or more channels. The method also includes crawling the Internet for online references to the entity, wherein crawling the Internet comprises searching the one or more channels of the Internet for references to the entity and evaluating the one or more signals. The method further includes constructing a reverse index of the references, wherein the reverse index is based on each channel in which a reference is found and the one or more signals evaluated for the reference.
Another example embodiment includes a system for indexing online references of an entity. The system includes a deep index engine, wherein the deep index engine is configured to assemble parameters for crawling the Internet and to insert crawls to be performed into a job queue. The system also includes one or more worker nodes, wherein the worker nodes are configured to perform the Internet crawls assembled by the deep index engine. The system further includes one or more coordinators, wherein the coordinators are configured to launch jobs for the one or more worker nodes from the job queue.
These and other aspects of example embodiments of the invention will become more fully apparent from the following description and appended claims.
To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Reference will now be made to the figures wherein like structures will be provided with like reference designations. It is understood that the figures are diagrammatic and schematic representations of some embodiments of the invention, and are not limiting of the present invention, nor are they necessarily drawn to scale.
Reference is first made to
The system 105 includes a deep index engine 110. The deep index engine 110 is configured to assemble the parameters for crawling a network 112 into a search job. The network 112 exemplarily includes the Internet, comprising a global internetwork formed by logical and physical connections between multiple wide area networks and/or local area networks and can optionally include the World Wide Web (“Web”), comprising a system of interlinked hypertext documents accessed via the Internet. Alternately or additionally, the network 112 includes one or more cellular RF networks and/or one or more wired and/or wireless networks such as, but not limited to, 802.xx networks, Bluetooth access points, wireless access points, IP-based networks, or the like. The network 112 also includes servers that enable one type of network to interface with another type of network.
The parameters assembled by the deep index engine 110 can include one or more channels. These channels are the particular media within the Internet/network 112 that are to be searched. In some embodiments, channels can include organic searches, page searches, linked advertisement networks, banner advertisements, contextual advertisements, e-mail, blogs, social networks, social news, affiliate marketing, mobile advertisements, media advertisements, video advertisements, discussion forums, news sites, rich media, social bookmarks, paid searches and in-game advertisements. Nevertheless, the channels are not limited to those mentioned but can include any relevant areas of the Internet to be searched whether now existing or created in the future.
Organic searches refer to those listings in search engine result pages that appear by virtue of their relevance to the search terms, as opposed to their being advertisements. Page searches refer to the listings in search engine result pages regardless of the reason for appearing. Linked advertisement networks are advertisements that are automatically inserted into Web Pages if they contain relevant subject matter. Banner advertisements are advertisements that are placed on a particular Web Page in a particular location. Contextual advertisements are advertisements that are placed when certain key words or other identifiers are present, e.g., keyword advertisements. E-mail—electronic mail or email—is any method of creating, transmitting, or storing primarily text-based human communications with digital communications systems. Blogs—weblogs—are a type of Web Page, usually maintained by an individual with regular entries of commentary, descriptions of events, or other material such as graphics or video. Social networks are a social structure made of nodes (which are generally individuals or organizations) that are tied by one or more specific types of interdependency, such as values, visions, ideas, financial exchange, friendship, kinship, dislike, conflict or trade. Social news refers to Web Pages where users submit and vote on news stories or other links, thus determining which links are presented. Affiliate marketing includes using a Web Page to drive traffic to a different Web Page maintained by an affiliate of the first Web Page's owner. Mobile advertisements include personalized advertisements presented on wireless devices. Media advertisements include advertisements placed within a type of media, or means of communication, whether online, in print, in video or any other format. Video advertisements are advertisements presented in video format. Discussion forums—or message boards—are online discussion sites featuring user-generated content. 
News sites are Web Pages with the primary purpose of reporting news, including both general news and subject-specific news. Rich media—or interactive media—is media that allows for active participation by the recipient. Social bookmarks relate to a method for network users to store, organize, search, and manage bookmarks of Web Pages on the network and save the bookmarks privately, share the bookmarks with the general public, share the bookmarks with specified people or groups, share the bookmarks within certain networks or share the bookmarks with any other combination of private and public access. Paid searches are a type of contextual advertising where Web Site owners pay an advertising fee, usually based on click-throughs or ad views, to have their Web Site search results shown in top placement on search engine result pages. In-game advertisements are advertisements placed within a video game either online or on a game console.
Returning to
With continued reference to
Jobs in the job queue 115 include, but are not limited to, search jobs, e.g., crawling the Internet. In some embodiments, once the Internet has been crawled data is obtained. In general, data refers to any information that the deep index engine has specified as relevant. In some embodiments, data can include information regarding the channels searched and the signals evaluated. In other embodiments, data can include downloading the Web Page for further processing, as discussed below. In further embodiments, data can include search results to be parsed, as discussed below.
In some embodiments, once data has been obtained, it must be processed. The deep index engine 110 can insert such processing jobs into the job queue 115. In some embodiments, processing the data can include evaluating the signals. In other embodiments, processing the data can include parsing search results, as discussed below. In further embodiments, processing can include evaluating the reference for positive or negative connotations. For example, a blog entry about a product can be processed to determine whether the entry is generally positive or negative with regard to the product.
In other embodiments, once the data has been obtained, the data may need to be compressed, which is another job that can be inserted by the deep index engine 110 into the job queue 115. In some embodiments, compressing the data can include saving the data for later processing. In other embodiments, compressing the data can include parsing the Web Page for the relevant signals and saving only the portion of the Web Page which relates to the relevant signals. It will be appreciated, with the benefit of the present disclosure, that the deep index engine 110 may insert any job that needs to be performed, including collecting and/or processing data, into the job queue 115.
In some embodiments, the system 105 includes worker nodes 125. Worker nodes 125 comprise nodes that perform the jobs that have been inserted by the deep index engine 110 into the job queue 115. In some embodiments, jobs performed by the worker nodes 125 include: crawling the Web and performing the relevant search, compressing the data, processing the data, constructing a reverse index, calculating a search engine optimization score or any other job that has been inserted into the job queue 115. In some embodiments, every worker node 125 can be a general worker node that is configured to perform any job inserted into the job queue 115. In other embodiments, worker nodes 125 can be specialized worker nodes that are each configured to perform a single job. In further embodiments, worker nodes 125 can be any combination of general worker nodes and specialized worker nodes.
In some embodiments, the worker nodes 125 are further configured to simulate the activities of a human user of the Internet. In some embodiments, simulating the activities of a human user of the Internet includes mimicking and/or providing one or more attributes typically associated with a human user, including one or more of: a geographic location, a particular time of browsing, an age, an income level, an e-mail address, or other demographics of human users. For example, the worker nodes 125 can be configured to connect to the Internet through multiple Internet service providers to simulate human users of the Internet in different geographic locations. Alternately or additionally, the worker nodes 125 can be configured to connect to the Internet at a particular time of day. Alternately or additionally, the worker nodes 125 can be configured to input, in certain websites, an age, income level, or the like, corresponding to a particular demographic of human users. Alternately or additionally, the worker nodes 125 can be configured to input, in certain websites, an e-mail address. In some embodiments of the invention, simulating the activities of a human user of the Internet allows for more relevant search results as the search references concern how such references would be presented to a user of the Internet.
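By way of illustration only, the attributes that a worker node presents while simulating a human user could be gathered into a single profile that is merged with the keyword for each crawl. The following Python sketch is one possible arrangement and is not taken from the described embodiments; the names UserProfile and crawl_parameters and all field names are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class UserProfile:
    """Attributes a worker node presents while crawling (all fields hypothetical)."""
    geo_location: str                  # e.g. a region reached through a particular ISP
    browse_hour: int                   # local hour of day at which to crawl (0-23)
    age: Optional[int] = None
    income_level: Optional[str] = None
    email: Optional[str] = None

    def __post_init__(self):
        if not 0 <= self.browse_hour <= 23:
            raise ValueError("browse_hour must be in 0..23")


def crawl_parameters(keyword: str, profile: UserProfile) -> dict:
    """Merge a keyword with the profile attributes that were actually supplied."""
    attrs = {k: v for k, v in asdict(profile).items() if v is not None}
    return {"keyword": keyword, **attrs}
```

A worker node could then run the same keyword under several profiles, for example one per geographic location, and record the profile alongside the results so that differences in presentation can be attributed to the simulated user.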
Returning to
The modules, or individual components, of the system 105, including the deep index engine 110, the job queue 115, the worker nodes 125 and the coordinators 120, can be implemented in hardware, software or any combination thereof. If implemented in software, the modules of the system 105 are stored in a computer-readable medium, to be accessed as needed to perform their functions. Additionally, if implemented in software, the tasks assigned to each module can be carried out by a processor, field-programmable gate array (FPGA) or any other logic device capable of carrying out software instructions or other logic functions.
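As a minimal software sketch of how these modules might cooperate, the following Python example has a deep index engine insert search jobs into a queue, and a coordinator launch those jobs on a small pool of general worker nodes. It assumes an in-process queue and threads purely for brevity; the class names, the job fields and the placeholder worker are assumptions made for this example and do not describe a required implementation.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Job:
    kind: str                                   # e.g. "search", "process", "compress"
    payload: dict = field(default_factory=dict)


class DeepIndexEngine:
    """Assembles crawl parameters into jobs and inserts them into the job queue."""

    def __init__(self, job_queue):
        self.job_queue = job_queue

    def submit_search(self, entity, channels, signals):
        # One search job per channel to be crawled.
        for channel in channels:
            payload = {"entity": entity, "channel": channel, "signals": list(signals)}
            self.job_queue.put(Job("search", payload))


def general_worker(job):
    """A general worker node: accepts any job kind (the real work is a placeholder)."""
    return f"{job.kind}:{job.payload.get('channel', '-')}"


class Coordinator:
    """Launches jobs from the queue on a fixed pool of worker threads."""

    def __init__(self, job_queue, worker, n_workers=2):
        self.job_queue = job_queue
        self.worker = worker
        self.n_workers = n_workers
        self.results = []
        self._lock = threading.Lock()

    def _run(self):
        while True:
            try:
                job = self.job_queue.get_nowait()
            except queue.Empty:
                return                          # queue drained; this worker stops
            outcome = self.worker(job)
            with self._lock:
                self.results.append(outcome)
            self.job_queue.task_done()

    def drain(self):
        threads = [threading.Thread(target=self._run) for _ in range(self.n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sorted(self.results)
```

Specialized worker nodes could be modeled in the same sketch by giving each worker a set of job kinds it accepts and having the coordinator route jobs accordingly.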
Returning to
Once the search is performed, the worker node 305 collects 325 the search engine result page. The result page can be collected 325 as text to be processed by the worker node 305 or to be inserted into the job queue for processing by other worker nodes. The search engine result page can also be collected 325 in the original format or only the links themselves preserved with the links inserted into the job queue for additional Web crawls by the worker nodes. Nevertheless, any method that collects the search engine result page, either now existing or created in the future, is contemplated for collecting the search engine result page 325.
After the search engine result page is collected, the search engine result page is parsed 330 for relevant information. The result page can be parsed 330 by worker node 305 or can be inserted into a job queue for parsing by other worker nodes. The information that is considered relevant can be determined by the parameters assembled previously by the deep index engine 110 of
The method of
The worker node can also parse 425 the paid advertisement results to identify 430 one or more signals in the paid advertisement results of the search engine result page that refer to the entity, the one or more signals including, for example, placement and/or URL of a corresponding paid advertisement in the search engine result page that refers to the entity. Prominent placement is often considered more effective and, therefore, will normally cost more, than less prominent placement of a paid advertisement. Therefore, placement of the paid advertisement within a search engine result page and/or other Web Page gives an indication of how much was paid for an advertisement and the relevance that is placed on the correlation between the keyword searched and the marketer placing the advertisement. As with the organic search results, a Web Page 435 pointed to by a paid advertisement can be identified and itself parsed for additional references to the entity.
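The parsing described above can be sketched as follows, assuming the search engine result page has already been reduced to a list of entries, each with a URL, a title, a description and a flag marking paid results. That input format, the Signal record and the function extract_signals are hypothetical and serve only to illustrate separating organic from paid results and recording placement and URL as signals.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    rank: int          # 1-based placement within its own section (organic or paid)
    url: str
    title: str
    description: str
    paid: bool


def extract_signals(results, entity):
    """Split entries into organic and paid results, rank each section separately,
    and keep only the entries whose title or description mentions the entity."""
    organic, paid = [], []
    organic_rank = paid_rank = 0
    needle = entity.lower()
    for r in results:
        is_paid = r.get("paid", False)
        if is_paid:
            paid_rank += 1
            rank = paid_rank
        else:
            organic_rank += 1
            rank = organic_rank
        text = f"{r['title']} {r['description']}".lower()
        if needle in text:
            sig = Signal(rank, r["url"], r["title"], r["description"], is_paid)
            (paid if is_paid else organic).append(sig)
    return organic, paid
```

The URLs collected in this way could be inserted into the job queue as further crawl jobs, so that the Web Pages they point to are themselves parsed for additional references to the entity.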
With combined reference to
Returning to
In other embodiments, the results can be presented as raw data. For example, the results could be presented as the number of hits on a particular Web Page, i.e., the traffic history of the Web Page, or as the organic search result rank for a particular keyword or set of keywords. In further embodiments, the results can be presented as mentions in a particular medium. For example, the results can be presented as the number of mentions within blogs. Alternately or additionally, the results can be further broken down. For example, blog mentions can be broken down into positive mentions and negative mentions.
Alternatively or additionally, a reverse index generated according to the method of
The method of
Once the keyword(s) has been searched, the organic rank for the Web Page is identified 520. A weighted multiplier is then applied 525 to the organic rank, where the weighted multiplier can be based on the organic rank. That is, the weighted multiplier is different for each ranking (i.e., not a constant). In some embodiments, the weighted multiplier considers 530 the distribution of click analysis of the organic rank. That is, the multiplier takes into account the number of users that follow the link to the URL. For example, a search may turn up a result that is irrelevant to the majority of users for whatever reason. Even if the ranking of the result is high, the multiplier can be adjusted to reflect the low number of users who follow the link. From the weighted multiplier and the organic rank, an SEO score can be generated 535. The SEO score allows an analysis of the relevance of the references based on the predefined criteria.
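The description does not fix a particular formula, so the following Python sketch shows only one way a rank-dependent multiplier and an organic rank could be combined into a score. The click-share table, the default share, the maximum rank and the base value (maximum rank plus one, minus the rank) are all assumed values chosen for illustration.

```python
# Hypothetical share of clicks by organic rank; illustrative values only.
CLICK_SHARE = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}
DEFAULT_SHARE = 0.01   # assumed share for ranks not listed above
MAX_RANK = 100         # assumed deepest rank that still earns any score


def weighted_multiplier(rank, observed_ctr=None):
    """Return a multiplier that differs by rank. An observed click-through
    rate, when available, replaces the assumed distribution."""
    if observed_ctr is not None:
        return observed_ctr
    return CLICK_SHARE.get(rank, DEFAULT_SHARE)


def seo_score(rank, observed_ctr=None):
    """Combine the organic rank with its weighted multiplier into one score."""
    if rank is None or rank < 1 or rank > MAX_RANK:
        return 0.0
    base = MAX_RANK + 1 - rank          # a better rank gives a larger base value
    return base * weighted_multiplier(rank, observed_ctr)
```

Under these assumed values, a result ranked first that few users follow, for example seo_score(1, observed_ctr=0.02), scores lower than a result ranked second with a typical click share, which reflects the adjustment for low click-through described above.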
In some embodiments, the chart 615 can be limited to the organic rank history of the client. In other embodiments, the chart 615 can be limited to the organic rank history of competitors and can exclude the organic rank history of the client. The chart 615 can include the organic rank history of more or less than two competitors, as specified by the client. Additionally, competitors can be identified in any manner. For example, only the largest competitors could be shown or certain competitors could be identified which are of particular interest.
The first region 675 of the chart 670 indicates premium backlinks, or backlinks from Web Pages with a pagerank value of 7 to 10. The second region 680 of the chart 670 indicates quality backlinks, or backlinks from Web Pages with a pagerank value of 3 to 6. The third region 685 of the chart 670 indicates regular backlinks, or backlinks from Web Pages with a pagerank value of 0 to 2. Quality of backlinks can be evaluated using other methods and is not limited to pagerank.
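The three regions can be expressed as a simple classification over pagerank values. The following Python sketch uses the thresholds stated above; the function names backlink_tier and tier_counts are hypothetical.

```python
from collections import Counter


def backlink_tier(pagerank):
    """Classify a backlink by the pagerank (0 to 10) of the linking Web Page."""
    if not 0 <= pagerank <= 10:
        raise ValueError("pagerank must be between 0 and 10")
    if pagerank >= 7:
        return "premium"    # pagerank 7 to 10
    if pagerank >= 3:
        return "quality"    # pagerank 3 to 6
    return "regular"        # pagerank 0 to 2


def tier_counts(pageranks):
    """Count the backlinks in each tier, e.g. to size the regions of a chart."""
    return Counter(backlink_tier(p) for p in pageranks)
```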
It is appreciated that the charts of
With additional reference to
The method 700 includes identifying 705 the channel or channels to be searched. As explained above, channels are the particular media within the Internet that are to be searched. In some embodiments, channels can include organic searches, page searches, linked advertisement networks, banner advertisements, contextual advertisements, e-mail, blogs, social networks, social news, affiliate marketing, mobile advertisements, media advertisements, video advertisements, discussion forums, news sites, rich media, social bookmarks, paid searches and in-game advertisements. Nevertheless, the channels are not limited to those mentioned but can include any relevant areas of the Internet to be searched, whether now existing or created in the future.
The method 700 further includes identifying 710 signals to be evaluated. The signals include relevant information about the references to the entity. For example, advertisements placed at the top of a Web Page are much more visible, and therefore, are generally more expensive and are considered more effective. Therefore, if the references to be indexed include online advertisements, advertisement placement is a signal that can be identified for the indexing. Signals to be evaluated can alternately or additionally include frequency of the reference on a given Web Page, location of the reference on the Web Page, calendar date of the crawl, calendar date of Web Page posting, time of day of the crawl, time of day of Web Page posting, context-driven Web indexing, time to download the Web Page, Web browser compatibility of the Web Page, Web plug-in compatibility of the Web Page, or the like. Additionally or alternatively, signals within an e-mail message to be evaluated can include frequency of the e-mail message received, outbound links on the e-mail message, calendar date of the e-mail message received, time of day of the e-mail message received, or the like. Context-driven Web indexing can further include links within the Web Page, current events surrounding the posting and the topic of the Web Page. Nevertheless, the signals to be evaluated are not limited to those mentioned but can include any relevant information about the references to the entity, whether now existing or created in the future.
The method 700 also includes crawling 715 the Web, the Internet, or other network, such as the network 112 of
The method 700 further includes constructing 720 a reverse index of the results. For example, a reverse index can be constructed 720 that references the online references to the entity and ranks them according to a set of predetermined criteria. Constructing 720 a reverse index can optionally include performing a trend analysis. A trend analysis shows how the online references have changed over time. For instance, the chart 605 of
Constructing 720 a reverse index can alternately or additionally include generating an SEO score. The SEO score allows a user, such as the customer that requested the index, to see the relevance of the references based on one or more predetermined criteria, e.g., cost-effectiveness.
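As one possible illustration, a reverse index keyed by entity and channel, with the references in each channel ordered by an evaluated signal, could be built as in the following Python sketch. The record layout (entity, channel, url and a signals mapping containing rank and crawl_date) is an assumption made for this example, as are the function names.

```python
from collections import defaultdict


def build_reverse_index(references, sort_key="rank"):
    """Map entity -> channel -> list of references, ordered by one evaluated signal."""
    index = defaultdict(lambda: defaultdict(list))
    for ref in references:
        index[ref["entity"]][ref["channel"]].append(ref)
    for channels in index.values():
        for refs in channels.values():
            refs.sort(key=lambda r: r["signals"][sort_key])
    # Return plain dictionaries so that later lookups do not create empty entries.
    return {entity: dict(channels) for entity, channels in index.items()}


def mentions_by_date(index, entity, channel):
    """Simple trend analysis: count references per crawl date for one channel."""
    counts = defaultdict(int)
    for ref in index.get(entity, {}).get(channel, []):
        counts[ref["signals"]["crawl_date"]] += 1
    return dict(sorted(counts.items()))
```

Ordering by a different predetermined criterion only requires passing another signal name as sort_key, and a score such as an SEO score could be stored as one more entry in each reference's signals.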
In some embodiments, the method 700 may be performed using a system, such as the system described in
The embodiments described herein may include the use of a special purpose or general purpose computer including various computer hardware and/or software modules, as discussed in greater detail below.
Embodiments within the scope of the present invention may also include physical computer-readable media and/or intangible computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such physical computer-readable media and/or intangible computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such physical computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other physical medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Within a general purpose or special purpose computer, intangible computer-readable media can include electromagnetic means for conveying a data signal from one part of the computer to another, such as through circuitry residing in the computer.
When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, hardwired devices for sending and receiving computer-executable instructions, data structures, and/or data signals (e.g., wires, cables, optical fibers, electronic circuitry, chemical, and the like) should properly be viewed as physical computer-readable mediums while wireless carriers or wireless mediums for sending and/or receiving computer-executable instructions, data structures, and/or data signals (e.g., radio communications, satellite communications, infra-red communications, and the like) should properly be viewed as intangible computer-readable mediums. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions, data, and/or data signals which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although not required, aspects of the invention have been described herein in the general context of computer-executable instructions, such as program modules, being executed by computers, in network environments and/or non-network environments. Generally, program modules include routines, programs, objects, components, and content structures that perform particular tasks or implement particular abstract content types. Computer-executable instructions, associated content structures, and program modules represent examples of program code for executing aspects of the methods disclosed herein.
Embodiments may also include computer program products for use in the systems of the present invention, the computer program product having a physical computer-readable medium having computer-readable program code stored thereon, the computer-readable program code comprising computer-executable instructions that, when executed by a processor, cause the system to perform the methods of the present invention.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims
1. A method of indexing online references of an entity, the method comprising:
- identifying an entity, wherein the entity is an individual, corporation, brand, or product;
- crawling the Internet for online references to the entity after identifying the entity, wherein crawling the Internet comprises: querying a first search engine for a first search results page using a keyword; parsing the first search result pages into organic search results; identifying an organic online reference to the entity based on the organic search results; parsing the organic search results to identify one or more organic signals that include information about the organic online reference to the entity; searching social networks for social network search results that refer to the entity; identifying a social online reference to the entity based on the social network search results; parsing the social network search results to identify one or more social signals that include information about the social online reference to the entity; evaluating the social signals and organic signals; and
- constructing, using a processor, a reverse index of the online references based on the evaluated social signals and the evaluated organic signals.
2. The method of claim 1, further comprising computing a search engine optimization score for the identified organic online reference to the entity based on the reverse index.
3. The method of claim 1, wherein the reverse index lists the social online reference and the organic online reference.
4. The method of claim 1, wherein the reverse index lists the organic online reference with respect to the keyword.
5. The method of claim 1, wherein the social signals comprise a rank, a uniform resource locator (URL), a title, or a description of the social online reference.
6. The method of claim 1, wherein the social signals are evaluated to identify the relevance of the social online reference to the entity.
7. The method of claim 1, wherein the organic signals comprise a rank, a uniform resource locator (URL), a title, or a description of the organic online reference.
8. The method of claim 1, wherein the organic signals are evaluated to identify the relevance of the organic online reference to the entity.
9. The method of claim 1, wherein the social networks comprise a social structure of nodes that are tied by one or more specific types of interdependency.
10. The method of claim 1, wherein the organic search results are search results in a search engine results page that appear based on their relevance to a search term used to generate the search results.
11. A method of indexing online references of an entity, the method comprising, by a system comprising a plurality of worker nodes:
- crawling the Internet for the online references to the entity by querying, using a keyword, one or more search engines to retrieve search results pages, wherein the entity is an individual, corporation, brand, or product, wherein the search results pages comprise one or more search results identified by the respective search engine as responsive to the keyword, and wherein the worker nodes are configured to simulate activities of a human user of the Internet by providing the one or more search engines with information that affects how the search results would be presented to the human user;
- parsing the search results pages to identify organic search results, wherein the organic search results appear in the search results pages based on relevance of webpages corresponding to the organic search results to the keyword;
- parsing the organic search results to identify organic online references to entities in the organic search results and one or more signals to be evaluated that include information about the organic online references;
- evaluating the signals to identify relevance of the organic online references to the entities in the organic search results;
- constructing a reverse index of the online references based on the evaluated signals;
- in response to receiving a query related to the entity, computing, using the reverse index, statistics regarding the organic online references to the identified entity, wherein computing the statistics comprises attributing an organic online reference to the identified entity identified via a first search engine of the one or more search engines with a weight greater than an organic online reference to the entity identified via a second search engine; and
- presenting the computed statistics regarding the organic online references to the identified entity.
12. The method of claim 11, wherein the information that affects how the search results would be presented to the human user comprises:
- the keyword;
- attributes of the human user;
- attributes of a computing device used by the human user to perform the activities; or
- contextual information related to the crawling.
13. The method of claim 12, wherein the information that affects how the search results would be presented to the human user comprises a geographic location, demographic information of the human user, group characteristics of the human user, software installed on the computing device, or technical features of the computing device.
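As an illustration of claims 12 and 13, the sketch below shows how a worker node might assemble the information supplied to a search engine so that results resemble what a particular human user would see. The `SimulatedUser` fields, the `q` and `loc` query parameter names, and the example host are placeholders chosen for this sketch; they are not the interface of any real search engine.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class SimulatedUser:
    """Attributes a worker node presents to a search engine (hypothetical fields)."""
    keyword: str
    location: str    # geographic location of the simulated user
    language: str    # attribute of the human user
    user_agent: str  # software installed on the simulated computing device


def build_request(base_url, user):
    """Return the query URL and HTTP headers for one simulated search."""
    # "q" and "loc" are placeholder parameter names, not a real engine's API.
    query = urlencode({"q": user.keyword, "loc": user.location})
    headers = {
        "User-Agent": user.user_agent,
        "Accept-Language": user.language,
    }
    return f"{base_url}?{query}", headers
```

Varying the location, language, or user agent across worker nodes would let the same keyword be crawled under several simulated user contexts.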
14. The method of claim 11, wherein the evaluated signals comprise placement of the organic online references in the search results pages, contextual information related to the crawl, time to load a web page, contextual information related to the organic search results, contextual information related to the organic online references, technical information related to the organic online references, technical information related to the organic search results, or the information provided during the crawl that affects how the search results would be presented to the human user.
15. The method of claim 11, wherein organic search results are associated with different categories, and wherein each of the categories represents a type of online reference.
16. The method of claim 15, wherein the type of online reference comprises one of web page content, advertisements, content referenced in social networks, social news, content referenced in discussion forums or blogs, content referenced in news sites, rich media, images, video, social bookmarks, game content, and subtypes thereof.
17. The method of claim 11, wherein the computing the statistics regarding the organic online references to the identified entity further comprises computing a number of hits for a web page containing the organic online reference to the identified entity, statistics related to mentions of the identified entity within a particular type of the organic online references, a click analysis based on a number of users that follow a link to a uniform resource locator (URL), or a characteristic of the mentions of the identified entity.
18. The method of claim 11, wherein the computing the statistics regarding the organic online references to the identified entity further comprises computing a search engine optimization score for the identified entity based on the organic online references in the reverse index.
19. The method of claim 11, wherein the computing the statistics regarding the organic online references to the identified entity further comprises identifying backlinks to the organic online references based on the search results.
20. The method of claim 11, further comprising performing a trend analysis to determine how the organic online references have changed over time.
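One simple form of the trend analysis in claim 20 is the change in reference count between successive crawls. The sketch below assumes each crawl has been reduced to a single count per date; the function name and the input shape are illustrative assumptions, not part of the claimed method.

```python
from datetime import date


def reference_trend(snapshots):
    """Given {crawl_date: reference_count}, return (date, change) pairs.

    Each change is the count on that date minus the count on the
    previous crawl date, in chronological order.
    """
    ordered = sorted(snapshots.items())
    return [
        (later_day, later_count - earlier_count)
        for (_, earlier_count), (later_day, later_count) in zip(ordered, ordered[1:])
    ]
```

A positive change indicates that more organic online references were found than in the previous crawl, and a negative change indicates fewer.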
21. A method of indexing online references of an entity, the method comprising, by a system comprising a plurality of worker nodes:
- crawling the Internet for the online references to the entity by querying, using a keyword, one or more search engines to retrieve search results pages, wherein the entity is an individual, corporation, brand, or product, wherein the search results pages comprise one or more search results identified by the respective search engine as responsive to the keyword, and wherein the worker nodes are configured to simulate the activities of a human user of the Internet by providing the one or more search engines with information that affects how the search results would be presented to the human user;
- parsing the search results pages to identify paid content based on placement or inclusion;
- parsing the paid content to identify paid online references to entities and one or more signals to be evaluated that include information about the paid online references;
- evaluating the signals to identify relevance of the paid online references to the entities;
- constructing a reverse index of the online references based on the evaluated signals;
- in response to receiving a query related to the entity, computing, using the reverse index, statistics regarding the paid online references to the identified entity, wherein computing the statistics comprises attributing a paid online reference to the identified entity identified via a first search engine of the one or more search engines with a weight greater than a paid online reference to the entity identified via a second search engine; and
- presenting the computed statistics regarding the paid online references to the identified entity.
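The paid-content parsing steps above can be sketched as follows, assuming the results page has already been converted into a list of dictionaries with `slot`, `text`, and `url` keys. The slot labels in `PAID_SLOTS` and the function name are hypothetical; real results pages mark paid placements in engine-specific ways that this sketch does not model.

```python
# Assumed labels for page regions that hold paid placements.
PAID_SLOTS = {"top_ad", "side_ad", "sponsored"}


def paid_references(results, entities):
    """Find mentions of known entities inside paid content.

    Returns one record per (paid result, entity) match, keeping the
    result position and slot as signals for later evaluation.
    """
    refs = []
    for position, result in enumerate(results, start=1):
        if result["slot"] not in PAID_SLOTS:
            continue  # organic or other content is skipped here
        text = result["text"].lower()
        for entity in entities:
            if entity.lower() in text:
                refs.append({
                    "entity": entity,
                    "url": result["url"],
                    "position": position,
                    "slot": result["slot"],
                })
    return refs
```

Substring matching is used only to keep the example short; a production parser would likely need tokenization or entity resolution to avoid false matches.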
22. The method of claim 21, wherein the information that affects how the search results would be presented to the human user comprises:
- the keyword;
- attributes of the human user;
- attributes of a computing device used by the human user to perform the activities; or
- contextual information related to the crawling.
23. The method of claim 22, wherein the information that affects how the search results would be presented to the human user comprises a geographic location, demographic information of the human user, group characteristics of the human user, software installed on the computing device, or technical features of the computing device.
24. The method of claim 21, wherein parsing the search results pages to identify paid content further comprises parsing the search results pages to identify results associated with different categories, and wherein each of the categories represents a type of online reference.
25. The method of claim 24, wherein the type of online reference comprises one of web page content, advertisements, content referenced in social networks, social news, content referenced in discussion forums or blogs, content referenced in news sites, rich media, images, video, social bookmarks, game content, and subtypes thereof.
26. The method of claim 21, wherein the evaluated signals comprise placement of the paid online references in the search results pages, contextual information related to the crawl, contextual information related to the search results, contextual information related to the paid online references, technical information related to the paid online references, technical information related to the search results, or the information provided during the crawl that affects how the search results would be presented to the human user.

27. The method of claim 21, wherein the computing the statistics regarding the paid online references to the identified entity further comprises computing a paid search optimization score for the identified entity based on the paid online references in the reverse index.
28. The method of claim 21, wherein the computing the statistics regarding the paid online references to the identified entity further comprises computing a number of hits for a web page containing the paid online reference to the identified entity, a number of mentions of the identified entity within a particular type of the paid online references, a click analysis based on a number of users that follow a link to a uniform resource locator (URL), or a characteristic of the mentions of the identified entity.
29. The method of claim 21, wherein the computing the statistics regarding the paid online references to the identified entity further comprises identifying backlinks to the paid online references based on the search results pages.
30. The method of claim 21, further comprising performing a trend analysis to determine how the paid online references have changed over time.
31. A method of indexing online references of an entity, the method comprising, by a system comprising a plurality of worker nodes:
- crawling the Internet for the online references to the entity by querying, using a keyword, one or more search engines to retrieve search results pages, wherein the entity is an individual, corporation, brand, or product, wherein the search results pages comprise one or more search results identified by the respective search engine as responsive to the keyword, and wherein the worker nodes are configured to simulate the activities of a human user of the Internet by providing the one or more search engines with information that affects how the search results would be presented to the human user;
- parsing the search results pages to identify organic search results and paid content, wherein the organic search results appear in the search results pages based on relevance of webpages corresponding to the organic search results to the keyword, and wherein the paid content appears in the search results pages based on placement or inclusion;
- parsing the organic search results to identify organic online references to entities in the organic search results and one or more organic signals to be evaluated that include information about the organic online references;
- parsing the paid content to identify paid online references to entities and one or more paid signals to be evaluated that include information about the paid online references;
- evaluating the organic signals and the paid signals to identify relevance of the organic online references and relevance of the paid online references to the entities in the search results;
- constructing a reverse index of the online references based on the evaluated signals;
- in response to receiving a query related to the entity, computing, using the reverse index, statistics regarding the organic online references to the identified entity and statistics regarding the paid online references to the identified entity, wherein computing the statistics regarding the organic online references comprises attributing an organic online reference to the identified entity identified via a first search engine of the one or more search engines with a first weight greater than an organic online reference to the entity identified via a second search engine, and wherein computing the statistics regarding the paid online references comprises attributing a paid online reference to the identified entity identified via the first search engine of the one or more search engines with a second weight greater than a paid online reference to the entity identified via the second search engine; and
- presenting the computed statistics regarding the organic online references to the identified entity and the computed statistics regarding the paid online references to the identified entity.
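Claim 31 applies one weighting to organic references and a separate weighting to paid references. A minimal sketch of that idea follows, assuming each reference record carries `entity`, `kind`, and `engine` keys. The two weight tables, their values, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed weights; in both tables a reference from "engine_a" outweighs "engine_b".
ORGANIC_WEIGHTS = {"engine_a": 3.0, "engine_b": 1.0}  # the "first weight"
PAID_WEIGHTS = {"engine_a": 1.5, "engine_b": 1.0}     # the "second weight"


def summarize(references, entity):
    """Return separate weighted totals for organic and paid references."""
    totals = defaultdict(float)
    for ref in references:
        if ref["entity"] != entity:
            continue
        table = ORGANIC_WEIGHTS if ref["kind"] == "organic" else PAID_WEIGHTS
        totals[ref["kind"]] += table.get(ref["engine"], 1.0)
    return {"organic": totals["organic"], "paid": totals["paid"]}
```

Keeping the two tables separate lets the relative importance of a search engine differ between organic and paid statistics.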
32. The method of claim 31, wherein the information that affects how the search results would be presented to the human user comprises:
- the keyword;
- attributes of the human user;
- attributes of a computing device used by the human user to perform the activities; or
- contextual information related to the crawling.
33. The method of claim 32, wherein the information that affects how the search results would be presented to the human user comprises a geographic location, demographic information of the human user, group characteristics of the human user, software installed on the computing device, or technical features of the computing device.
34. The method of claim 31, wherein the organic search results and the paid content are associated with different categories, and wherein each of the categories represents a type of online reference.
35. The method of claim 34, wherein the type of online reference comprises one of web page content, advertisements, content referenced in social networks, social news, content referenced in discussion forums or blogs, content referenced in news sites, rich media, images, video, social bookmarks, game content, and subtypes thereof.
36. The method of claim 35, wherein the type of the organic online references comprises content related to either social networks or blogs, and wherein the computed statistics comprise a number of mentions of the identified entity, a number of a particular different type of mentions of the identified entity, or a comparative measure of the number of mentions of the identified entity with respect to a second identified entity.
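The mention counts and comparative measure of claim 36 could be computed along the following lines. The sketch assumes each reference record has `entity` and `type` keys; the type labels in `SOCIAL_TYPES`, the function names, and the choice of a share-of-mentions ratio as the comparative measure are all illustrative assumptions.

```python
from collections import Counter

# Assumed labels for social network and blog reference types.
SOCIAL_TYPES = {"social_network", "blog"}


def mention_counts(references, entity):
    """Count mentions of the entity per social or blog reference type."""
    return Counter(
        ref["type"]
        for ref in references
        if ref["entity"] == entity and ref["type"] in SOCIAL_TYPES
    )


def comparative_measure(references, entity, other):
    """Share of social and blog mentions held by entity versus a second entity."""
    a = sum(mention_counts(references, entity).values())
    b = sum(mention_counts(references, other).values())
    return a / (a + b) if (a + b) else 0.0
```

A value above 0.5 would indicate that the first entity is mentioned more often than the second within the selected reference types.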
37. The method of claim 31, wherein the computing the statistics regarding the organic online references to the identified entity further comprises computing a number of hits for a web page containing the organic online reference to the identified entity, statistics related to mentions of the identified entity within a particular type of the organic online references, a click analysis based on a number of users that follow a link to a uniform resource locator (URL), or a characteristic of the mentions of the identified entity.
38. The method of claim 31, wherein the computing the statistics regarding the organic online references to the identified entity and the statistics regarding the paid online references to the identified entity further comprises determining a particular weight to be applied to a particular type of the online references or a particular grouping of types of the online references.
39. The method of claim 31, wherein parsing the search results pages to identify organic search results and paid content further comprises parsing the search results pages to identify results associated with different categories, and wherein each of the categories represents a type of online reference.
40. The method of claim 39, wherein the type of online reference comprises one of web page content, advertisements, content referenced in social networks, social news, content referenced in discussion forums or blogs, content referenced in news sites, rich media, images, video, social bookmarks, or game content.
41. The method of claim 31, wherein the evaluated signals comprise contextual information related to the crawl, time to load a web page, placement of the organic online references in the search results pages, contextual information related to the organic search results, contextual information related to the organic online references, technical information related to the organic online references, technical information related to the organic search results, placement of the paid online references in the search results pages, contextual information related to the paid content, contextual information related to the paid online references, technical information related to the paid content, technical information related to the paid online references, or the information provided during the crawl that affects how the search results would be presented to the human user.
42. The method of claim 31, wherein the computing the statistics regarding the organic online references to the identified entity further comprises computing a search engine optimization score for the identified entity based on the organic online references in the reverse index.
43. The method of claim 31, wherein the computing the statistics regarding the paid online references to the identified entity further comprises computing a search engine optimization score for the identified entity based on the paid online references in the reverse index.
44. The method of claim 31, wherein the computing the statistics regarding the organic online references to the identified entity further comprises computing a number of hits for a web page containing the organic online reference to the identified entity, a number of mentions of the identified entity in a channel-related grouping, a click analysis based on a number of users that follow a link to a uniform resource locator (URL), or a characteristic of the mentions of the identified entity.
45. The method of claim 31, wherein the computing the statistics regarding the paid online references to the identified entity further comprises computing a number of hits for a web page containing the paid online reference to the identified entity, a number of mentions of the identified entity within a particular type of the paid online references, a click analysis based on a number of users that follow a link to a uniform resource locator (URL), or a characteristic of the mentions of the identified entity.
46. The method of claim 31, wherein the computing the statistics regarding the organic online references to the identified entity further comprises identifying backlinks to the organic online references based on the search results.
47. The method of claim 31, wherein the computing the statistics regarding the paid online references to the identified entity further comprises identifying backlinks to the paid online references based on the search results.
48. The method of claim 31, further comprising performing a trend analysis to determine how the organic online references and the paid online references have changed over time.
Type: Grant
Filed: Feb 16, 2021
Date of Patent: Apr 16, 2024
Assignee: Brightedge Technologies, Inc. (Foster City, CA)
Inventors: Lemuel S. Park (Foster City, CA), Jimmy Yu (Foster City, CA)
Primary Examiner: Peng Ke
Application Number: 17/176,586
International Classification: H04L 65/40 (20220101); G06F 8/30 (20180101); G06F 9/44 (20180101); G06F 9/54 (20060101); G06Q 50/00 (20120101);