METHODS AND SYSTEMS FOR A CONTENT DEVELOPMENT AND MANAGEMENT PLATFORM

The present system and method relate to enhancements to online search engines and search result rankings, which can benefit from large scale analysis of online content (e.g., Web pages) and vast amounts of information kept and processed from prior searches to develop intelligent associations among various items of content. Some aspects employ machine learning systems and methods to further enhance the present goals. Other aspects employ novel arrangements of data in data stores to extract best associations and deliver greater search engine rankings to users in an increasingly context-based or personalized type of searching environment.

Description
CLAIM TO PRIORITY

This application is based upon, claims priority to, and incorporates by reference U.S. Provisional Patent Application No. 62/419,772, entitled “System and Method for Topical Machine Search Handling,” filed on Nov. 9, 2016.

TECHNICAL FIELD

The present application relates to the fields of automated computer searching and ranking of search results, and to electronic data stores used in the same, including for generating outputs and data used to determine online subject matter in a topic-driven and context-sensitive search environment.

BACKGROUND

Millions of users turn to the Internet through client browser applications such as Firefox, Safari, Explorer, Chrome and others in search of information and other content stored on the global network. Perhaps the most popular use of the Internet and the World Wide Web is through search engines that provide results in response to user searches entered into the search engine interfaces. Each of the major browser providers (and others) has developed widely used search engines that reap significant economic returns for their providers. A search engine is usually presented to a user as a simple user entry field into which a user types a word, phrase, or natural language question, or into which the user enters an image file or other search query. The search engines, typically within seconds, carry out proprietary and highly complex search algorithms to sort or rank a response to the user's query, which is typically presented in the browser interface as a list of ranked search results. The lists may preferentially present paid content or sites in some search engines. In others, the search results are presented in descending order of relevance. Search results are sometimes referred to as “hits” and a typical search may return thousands of hits sorted by the search engine according to the data available to it and according to the results of the search engine's processing of said data. The Internet is a repository of such a large amount of data, of widely variable quality and value, that a poorly designed search engine would be of little or no use to its users. Poor search methods can inundate a user with irrelevant or useless responses to the user's query and therefore bad search engines are quickly relegated to oblivion by negative publicity and non-use. Search engines are recognized by their ability to quickly find the most relevant and useful search results to user queries.
The best search engines, based on the quality, depth and speed of their operation, are very highly used by all segments of society, and as such command lucrative advertising and placement fees for content such as banner advertising. In addition, search engines generate enormous value through their ability to track and retain historical data generated in the course of billions of searches continuously taking place in said search engines. This ability makes search engines very valuable for predictive analytics, marketing and personalization of services including targeted communications and advertising at the mass and individual levels. In particular, marketers have used Internet searching and search engine functionality to deliver favorable content to potential consumers. Marketers or promoters have been known to generate content relating to a subject of interest (a market segment, product line, idea being promoted) and to then generate corresponding Web and social media content directed to an aspect of the subject of interest so as to drive up interest in the subject of interest. In other words, secondary conversations, postings or content directed to a sub-space of the subject of interest can mutually drive up search engine rankings of both the sub-space and the subject of interest, to the benefit of the marketers or their clients.

Enterprises increasingly rely on their online presence for success, including content websites (including mobile sites), mobile applications, social networks, and electronic commerce sites. As these forms of online presence grow across a wide range of enterprises, a consistent challenge for each enterprise has been how to be discovered by relevant parties seeking engagement with the enterprise or its online resources. Search engines are still the dominant solution for most users seeking content; accordingly, in addition to bidding for presence in the “sponsored” portion of search results, enterprises have historically sought to optimize their content to increase the likelihood of being presented at or near the top of the search results for relevant searches. Major search engines, like Google™, have historically used ranking algorithms that ascribe importance to relevant keywords occurring in content and also ascribe weight to the linking of others to the content; therefore, content publishers and advertisers have typically sought to include relevant keywords in the content, metadata, advertisements, and the like that comprise their online presence and have sought to encourage linking to the content. The prevalence of keyword-based approaches has limited their effectiveness, as many parties have sought to load their content with the same keywords, often in situations where the keywords do not bear a close relationship to the actual content of a site. In response, search engines have begun to seek alternatives to keywords in ranking content, and some search engines have, in an effort to discourage misleading use of keywords, ceased to expose their ranking methodologies, making it increasingly difficult for enterprises to manage activities that promote relevant discovery. A need exists for methods and systems that enable enterprises to develop content for their online presence that is highly relevant, without dependence on conventional keyword-driven approaches.

With many metrics available and varying, obscure techniques for ranking within search engines, it is difficult for marketers and other workers for an enterprise to know (a) around what topics content should be created and (b) whether a selected content approach is truly performing well. At every stage of the game, it is difficult to know what content is or will be effective. One problem contributing to this confusion is the fact that many marketers choose to focus on certain keywords around which to craft content. Historically, marketers could strive to have their content rank highly (e.g., on search engines like Google™, etc.), and use search engine optimization (SEO) techniques to make their pages as appealing as possible within the ranking procedure of the search engine (e.g., based on the engine's attributing value to a page's authority, relevance, etc.). This approach has several challenges. First, it is often hard to select which keywords to target. Also, ranking highly for a term might not necessarily correlate to attracting more customers. Further, search engines are moving to a more “personalized” search results mode based on a searcher's location, history, and the like that makes the concept of “rank” an inherently troublesome metric on which to base efforts for success.

A number of different enterprise functions, and individuals involved in various roles within those functions, depend on identifying content for communications, including marketing functions, sales functions, services functions, public relations functions, investor relations functions, and others. For example, a marketing professional inside an enterprise or an agency working for an enterprise may need to determine favorable topics for website content or for an advertising campaign, while a sales professional may wish to engage with a customer and wonder what topic would be best to encourage engagement. Customers increasingly expect more personalized engagement with an enterprise, reflected by the emergence of chat functions, conversational agents, and bots, all of which tune communications more closely to the situation of a specific individual. As the number of such conversations increases rapidly in scale, it is increasingly challenging to meet such needs solely by human workers. For marketers who don't know what to write about, current solutions involve extremely metrics-heavy keyword research tools to choose these keywords. For marketers who can't evaluate their content's performance, current solutions involve looking at the content's rank, which is rapidly becoming less and less relevant, and perhaps using an analytics tool to map traffic. Neither of these solutions helps guide marketers on how to fix potential problems, and both solutions can even give an incorrect perspective on content's success. A need exists for automated methods and systems that assist various functions and roles within an enterprise in finding appropriate topics to draw customers into relevant conversations and to extend the conversations in a way that is relevant to the enterprise and to each customer.

A proliferation of metrics is used to rank potential search results by a search engine according to the art. Successful promoters (whether promoting products, ideas or other notions) are constantly looking for ways to improve the positive exposure of their subjects of interest, leveraging the search engine and ranking metrics. However, the over-abundance of options and strategies for generating search results and sorting the same makes it challenging for promoters to determine what topics and related sub-topics of interest to offer so as to maximize their overall desired impact. Also, it is complicated and difficult for such promoters to determine whether their content is in fact having a positive impact on the promoted subject, e.g., marketing campaigns, political campaigns, etc. An entire industry catering to “search engine optimization” (SEO) has emerged, which purports to optimize a customer's online efforts for maximum search engine yield. Some of the promoters' challenges are caused by their focus on the use of key words in their promotional campaigns and online promotional content. But it is far from a known process how to best select the optimum sets of key words to achieve maximum search result ranking with the various major search engines. These challenges have recently been compounded by the move to “personalize” search results, whereby major search engines are not limiting their rankings to objective or absolute metrics; instead, a personalized and context-based ranking based on numerous subjective factors is being employed. Therefore, there may not be an ideal or known optimized methodology by which to carry out SEO as the field develops and becomes more sophisticated.

SUMMARY

The present disclosure is directed to various ways of improving the functioning of computer systems, information networks, data stores, search engine systems and methods, and other advantages. As stated above, more personalized, context-dependent search engine operations are departing from traditional absolute metrics for ranking search results. Instead, search results may be further dependent on the identity, demographic, location and online history of a person making a search query. More sophisticated search engine systems and methods are required to give an advantage to a promoter or marketer in such a search environment so as to push the marketer or promoter's subject matter to the top of the search engine rankings.

Among other features and advantages, the present system and method can benefit from large scale analysis of online content (e.g., Web pages) and vast amounts of information kept and processed from prior searches to develop intelligent associations among various items of content. Some aspects employ machine learning systems and methods to further enhance the present goals including assisting users to develop successful content strategy for online content generation. Other aspects employ novel arrangements of data in data stores to extract best associations and deliver greater search engine rankings to users in an increasingly context-based or personalized type of searching environment. In particular, this system and method can improve the architecture that guides promoters to understanding what online subject matter to direct their efforts at for maximum effect on the target marketplace.

In embodiments of the present disclosure, a platform is provided for enabling automated development of content, typically for an enterprise, that is adapted to support a variety of enterprise functions, including marketing strategy and communications, website development, search engine optimization, sales force management, electronic commerce, social networking, and others. Among other benefits, the content development platform uses a range of automated processes to extract and analyze existing online content of an enterprise, parse and analyze the content, and develop a cluster of additional content that is highly relevant to the enterprise, without reliance on conventional keyword-based techniques.

In embodiments, the platform integrates functions of a content management system (CMS) with functions of a customer relationship management (CRM) system, including sharing access to database records and other information that is typically stored in and/or accessed by either.

In embodiments, methods and systems are provided herein for a platform for generating a cluster of correlated content from a primary online content object, the methods and systems including an automated crawler for crawling the primary online content object and storing a set of results from the crawling in a data storage facility; a parser for parsing the stored content from the crawling to generate a plurality of key phrases and to generate a content corpus from the primary online content object; a plurality of models for processing at least one of the key phrases and the corpus, the models comprising at least two of a word2vec model, a doc2vec model, a latent semantic analysis (LSA) extraction model, and a key phrase logistic regression model, wherein the processing results in a plurality of content clusters representing topics within the primary online content object; a content cluster data store for storing the content clusters; and a suggestion generator for generating, using output from at least one of the models, a suggested topic that is similar to at least one topic among the content clusters and for storing the suggested topic and information regarding the similarity of the suggested topic to at least one content cluster in the content cluster data store.
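
By way of non-limiting illustration, the flow described above, from parsed content to key phrases, to content clusters, to suggested topics, may be sketched in Python as follows. The sketch assumes the page text has already been crawled. The function names (extract_key_phrases, cluster_phrases, suggest_topics), the stop-word list, the thresholds, and the token-overlap (Jaccard) similarity are all hypothetical stand-ins chosen for brevity; they are not the word2vec, doc2vec, LSA, or key phrase logistic regression models recited above.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real parser would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "our", "with"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def extract_key_phrases(text, top_n=5):
    """Return the most frequent two-word phrases among the remaining tokens."""
    tokens = tokenize(text)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return [" ".join(pair) for pair, _ in bigrams.most_common(top_n)]

def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for a learned model."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cluster_phrases(phrases, threshold=0.3):
    """Greedy clustering: join the first cluster whose seed phrase is similar enough."""
    clusters = []
    for phrase in phrases:
        for cluster in clusters:
            if jaccard(phrase, cluster[0]) >= threshold:
                cluster.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters

def suggest_topics(clusters, candidates, top_n=3):
    """Score each candidate by its best similarity to any clustered phrase."""
    scored = []
    for cand in candidates:
        best = max((jaccard(cand, p) for c in clusters for p in c), default=0.0)
        scored.append((cand, round(best, 2)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

page = ("Inbound marketing software for growing teams. "
        "Our inbound marketing tools and marketing software help sales teams.")
phrases = extract_key_phrases(page)
clusters = cluster_phrases(phrases)
suggestions = suggest_topics(clusters, ["inbound marketing strategy", "tax law"])
```

In this sketch each suggestion is returned together with its similarity value, mirroring the storage of a suggested topic alongside information regarding its similarity to a content cluster.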

In embodiments, the plurality of models used by the platform may comprise at least one of a word2vec model, a doc2vec model, a latent semantic analysis extraction model, a latent semantic indexing model, a principal component analysis model, and a key phrase logistic regression model. In embodiments, the parser uses a machine learning system to parse the crawled content. In embodiments, the machine learning system iteratively applies a set of weights to input data, wherein the weights are adjusted based on a parameter of success, wherein the parameter of success is based on the success of suggested topics in the online presence of an enterprise. In embodiments, the machine learning system is provided with a training data set that is created based on human analysis of the crawled content.

In embodiments, at least one of the plurality of models used in the platform uses a machine learning system to cluster content. In embodiments, the machine learning system iteratively applies a set of weights to input data, wherein the weights are adjusted based on a parameter of success, wherein the parameter of success is based on the success of suggested topics in the online presence of an enterprise. In embodiments, the machine learning system is provided with a training data set that is created based on human clustering of a set of content topics.

In embodiments, the suggestion generator uses machine learning to suggest topics. In embodiments, the machine learning system iteratively applies a set of weights to input data, wherein the weights are adjusted based on a parameter of success, wherein the parameter of success is based on the success of suggested topics in the online presence of an enterprise. In embodiments, the machine learning system is provided with a training data set that is created based on human creation of a set of suggested topics.
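
As a non-limiting sketch of how a set of weights may be iteratively applied to input data and adjusted based on a parameter of success, the following Python example trains a small logistic regression by stochastic gradient updates. The two input features (similarity to a core topic and normalized prior traffic), the binary "succeeded" label used as the parameter of success, the learning rate, and the training data are all assumptions made for illustration and are not taken from the disclosure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Estimated probability that a suggested topic succeeds."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(examples, epochs=200, lr=0.5):
    """examples: list of (features, succeeded) pairs, with succeeded in {0, 1}."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, succeeded in examples:
            # The error between observed success and the prediction drives
            # the adjustment of every weight.
            error = succeeded - predict(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

# Hypothetical training set: [similarity to core topic, normalized prior traffic]
training = [
    ([0.9, 0.8], 1),
    ([0.8, 0.6], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
]
weights, bias = train(training)
likely = predict(weights, bias, [0.85, 0.7])
unlikely = predict(weights, bias, [0.15, 0.2])
```

A training set created by human analysis, clustering, or topic creation, as described above, would take the place of the hypothetical list shown here.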

In embodiments, the methods and systems disclosed herein may further include an application for developing a strategy for development of online presence content, the application accessing the content cluster data store and having a set of tools for exploring and selecting suggested topics for online presence content generation. In embodiments, the application provides a list of topics that are of highest semantic relevance for an enterprise based on the parsing of the primary online content object. In embodiments, the methods and systems may further include a user interface of the application for presenting a suggestion, wherein the generated suggestion is presented with an indicator of the similarity of the suggested topic to a content cluster topic as calculated by at least one of the models. In embodiments, the methods and systems may further include a user interface of the application for presenting a suggested topic, wherein the user interface facilitates generation of content related to the suggested topic. In embodiments, the user interface includes at least one of key words and key phrases that represent the suggested topic. In embodiments, the at least one of key words and key phrases are used to prompt the user with content for generation of online presence content. In embodiments, the online presence content is at least one of website content, mobile application content, a social media post, a customer chat, a frequently asked question item, a product description, a service description and a marketing message. In embodiments, the user interface for generation of content includes a plurality of suggested topics, each associated with an indicator of the similarity of a given suggested topic to a content cluster topic as calculated by at least one of the models.

In embodiments, the data storage facility is a cloud-based storage facility. In embodiments, the data storage facility is a distributed data storage facility.

In embodiments, the primary online content is a web page of an enterprise. In embodiments, the primary online content is a social media page of an enterprise.

In embodiments, the methods and systems may further include an application for developing a strategy for development of online presence content, the application accessing the content cluster data store and having a set of tools for exploring and selecting suggested topics for online presence content generation, wherein the application further accesses the content of a customer relationship management system. In embodiments, the application includes a user interface for developing content regarding a suggested topic for presentation in a communication to a customer, wherein selection of a suggested topic for presentation to a customer is based at least in part on a semantic relationship between the suggested topic as determined by at least one of the models and at least one data record relating to the customer stored in the customer relationship management system.

Also provided herein are methods and systems for automated discovery of topics for interactions with customers of an enterprise, including methods and systems that assist various functions and roles within an enterprise in finding appropriate topics to draw customers into relevant conversations and to extend the conversations in a way that is relevant to the enterprise and to each customer. Automated discovery of relevant content topics may support processes and workflows that require insight into what topics should be written about, such as during conversations with customers. Such processes and workflows may include development of content by human workers, as well as automated generation of content, such as within automated conversational agents, bots, and the like. Automated discovery may include identifying concepts that are related by using a combination of analysis of a relevant item of text (such as core content of a website, or the content of an ongoing conversation) with an analysis of linking (such as linking of related content). In embodiments, this may be performed with awareness at a broad scale of the nature of content on the Internet, such that new, related topics can be automatically discovered that further differentiate an enterprise, while remaining relevant to its primary content. The new topics can be used within a wide range of enterprise functions, such as marketing, sales, services, public relations, investor relations and other functions, including functions that involve the entire lifecycle of the engagement of a customer with an enterprise.

These and other systems, methods, objects, features, and advantages of the present disclosure will be apparent to those skilled in the art from the following detailed description of the preferred embodiment and the drawings.

All documents mentioned herein are hereby incorporated in their entirety by reference. References to items in the singular should be understood to include items in the plural, and vice versa, unless explicitly stated otherwise or clear from the text. Grammatical conjunctions are intended to express any and all disjunctive and conjunctive combinations of conjoined clauses, sentences, words, and the like, unless otherwise stated or clear from the context.

BRIEF DESCRIPTION OF THE FIGURES

The disclosure and the following detailed description of certain embodiments thereof may be understood by reference to the following figures:

FIG. 1 depicts a high-level flow in which a content platform is used to process online content, identify a cluster of semantically relevant topics and produce generated online presence content involving the semantically relevant topics.

FIG. 2 provides a functional block diagram of certain components and elements of a content development platform, including elements for extracting key phrases from a primary online content object, a content cluster data store for storing clusters of topics and a content development and management application having a user interface for developing content.

FIGS. 3, 4, and 5 show examples of user interface elements for presenting suggested topics and related information.

FIG. 6 provides a functional block diagram of certain components and elements of a content development platform, including integration of a customer relationship management system with other elements of the platform.

FIG. 7 provides a detailed functional block diagram of components and elements of a content development platform.

FIG. 8 illustrates a user interface for reporting information relating to online content generated using the content development and management platform.

FIG. 9 depicts a user interface in which activity resulting from the use of the platform is reported to a marketer or other user.

DETAILED DESCRIPTION

Embodiments of the present disclosure are directed to computers, computer systems, networks and data storage arrangements comprising digitally encoded information and machine-readable instructions. The systems are configured and arranged so as to accomplish the present methods, including by transforming given inputs according to said instructions to yield new and useful outputs determining behaviors and physical outcomes. Users of the present system and method will gain new and commercially significant abilities to convey ideas and to promote, create, sell and control articles of manufacture, goods, and other products. The machinery in which the present system and method are implemented will therefore comprise novel and useful devices and architectures of computing and processing equipment for achieving the present objectives.

With reference to FIG. 1, in embodiments of the present disclosure, a platform is provided having a variety of methods, systems, components, services, interfaces, processes, data structures and other elements (collectively referred to as the “content development platform 100” except where context indicates otherwise), which enable automated development, deployment and management of content, typically for an enterprise, that is adapted to support a variety of enterprise functions, including marketing strategy and communications, website development, search engine optimization, sales force management, electronic commerce, social networking, and others. Among other benefits, the content development platform 100 uses a range of automated processes to extract and analyze existing online content of an enterprise, parse and analyze the content, and develop a cluster of additional content that is highly relevant to the enterprise, without reliance on conventional keyword-based techniques. Referring to FIG. 1, the content development platform 100 may generally facilitate processing of a primary online content object 102, such as a main web page of an enterprise, to establish a topic cluster 168 of topics that are relevant to one or more core topics 106 that are found in or closely related to the content of the primary online content object 102, such as based on semantic similarity of the topics in the topic cluster 168, including core topics 106, to content within the primary online content object 102. The platform 100 may further enable generation of generated online presence content 160, such as reflecting various topics in the topic cluster 168, for use by marketers, sales people, and other writers or content creators on behalf of the enterprise.

In embodiments, the content development platform 100 includes methods and systems for generating a cluster of correlated content from the primary online content object 102. In embodiments, the primary online content object 102 is a web page of an enterprise. In embodiments, the primary online content object 102 is a social media page of an enterprise. In the embodiments described throughout this disclosure, the main web page of an enterprise, or of a business unit of an enterprise, is provided as an example of a primary online content object 102 and in some cases herein is described as a “pillar” of content, reflecting that the web page is an important driver of business for the enterprise, such as for delivering marketing messages, managing public relations, attracting talent, and routing or orienting customers to relevant products and other information. References to a web page or the like herein should be understood to apply to other types of primary online content objects 102, except where context indicates otherwise. An objective of the content development platform 100 may be to drive traffic to a targeted web page, in particular by increasing the likelihood that the web page may be found in search engines, or by users following links to the web page that may be contained in other content, such as content developed using the content development platform 100.

In an aspect, the present systems, data configuration architectures and methods allow an improvement over conventional online content generation schemes. As stated before, traditional online promotional content relied on key word placement and on sympathetic authorship of a main subject (e.g., a web site) and corresponding secondary publications (e.g., blogs and sub-topical content related to the web site), which methods rely on known objective and absolute ranking criteria to successfully promote and rank the web site and sub-topical content. In an increasingly subjective, personalized and context-sensitive search environment, the present systems and methods develop canonical value around a primary online content object such as a web site. In an aspect, a cluster of supportive and correlated content is intelligently generated or indicated so as to optimize and promote the online work product of a promoter (e.g., in support of an agenda or marketing effort). In an example, large numbers of online pages are taken as inputs to the present system and method (e.g., using a crawling, parallel or sequential page processing machine and software).

As shown in simplified FIG. 1, a “core topic” 106 or main subject for a promotional or marketing effort, related to one or more topics, phrases, or the like extracted based on the methods and systems described herein from a primary online content object 102, may be linked to a plurality of supporting and related other topics, such as sub-topics. The core topic 106 may comprise, for example, a canonical source of information on that general subject matter, and preferably be a subject supporting or justifying links with other information on the general topic of a primary online content object 102. In embodiments, visitors to a site where generated online content 160 is located can start at a hyperlinked sub-topic of content and be directed to a core topic 106 within a page, such as a page linked to a primary online content object 102 or to the primary online content object 102 itself. In an example, a core topic 106 can be linked to several (e.g., three to eight, or more) sub-topics. A recommendation or suggestion tool, to be described further below, can recommend or suggest sub-topics, or conversely, it can dissuade or suggest avoidance of sub-topics based on automated logic, which can be enabled by a machine learned process. As will be discussed herein, a content strategy may be employed in developing the overall family of linked content, and the content strategy may supersede conventional key word based strategies according to some or all embodiments hereof.
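
The relationship between a core topic and its linked sub-topics may be pictured, purely by way of example, with the following Python data structure. The class names, field names, URLs, similarity values, and the 0.5 acceptance threshold are hypothetical, and the choice to link each sub-topic page and the pillar page in both directions is an assumption of this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class SubTopic:
    title: str
    url: str
    similarity: float  # assumed similarity to the core topic, in [0, 1]

@dataclass
class TopicCluster:
    core_topic: str
    pillar_url: str
    sub_topics: list = field(default_factory=list)

    def add_sub_topic(self, sub_topic, min_similarity=0.5):
        """Accept a sub-topic only if it is close enough to the core topic."""
        if sub_topic.similarity < min_similarity:
            return False  # a suggestion tool would advise against this sub-topic
        self.sub_topics.append(sub_topic)
        return True

    def links(self):
        """Hyperlinks between each sub-topic page and the pillar page, both ways."""
        pairs = []
        for sub in self.sub_topics:
            pairs.append((sub.url, self.pillar_url))
            pairs.append((self.pillar_url, sub.url))
        return pairs

cluster = TopicCluster("inbound marketing", "https://example.com/inbound")
accepted = cluster.add_sub_topic(
    SubTopic("email nurturing", "https://example.com/blog/email-nurturing", 0.8))
rejected = cluster.add_sub_topic(
    SubTopic("office furniture", "https://example.com/blog/furniture", 0.1))
```

Here the rejected sub-topic illustrates the case in which the tool dissuades the user from a sub-topic that is not sufficiently related to the core topic.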

In embodiments, the system and method analyze, store and process information available from a crawling step, including for a given promoter's web site (e.g., one having a plurality of online pages) so as to determine a salient subject matter and potential sub-topics related to said subject matter of the site. Associations derived from this processing and analysis are stored and further used in subsequent machine learning based analyses of other sites. Data derived from the analysis and storage of the above pages, content and extracted analytics may be organized in an electronic data store, which is preferably a large aggregated database and which may be organized for example using MySQL or a similar format.
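
One possible, non-limiting organization of such a data store is sketched below in Python. The example uses the standard-library sqlite3 module with an in-memory database as a stand-in for a relational store; the table names, column names, and sample rows are hypothetical and are not a schema given in the disclosure.

```python
import sqlite3

# In-memory SQLite database used here only as a stand-in for a relational store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (
    id   INTEGER PRIMARY KEY,
    url  TEXT UNIQUE NOT NULL,
    body TEXT NOT NULL
);
CREATE TABLE topics (
    id      INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL REFERENCES pages(id),
    phrase  TEXT NOT NULL,
    is_core INTEGER NOT NULL DEFAULT 0
);
""")

# Store one crawled page and the topics derived from it.
page_id = conn.execute(
    "INSERT INTO pages (url, body) VALUES (?, ?)",
    ("https://example.com/", "Inbound marketing software for growing teams."),
).lastrowid
conn.executemany(
    "INSERT INTO topics (page_id, phrase, is_core) VALUES (?, ?, ?)",
    [(page_id, "inbound marketing", 1), (page_id, "marketing software", 0)],
)

# Look up the salient (core) subject matter recorded for a given site.
core_topics = [row[0] for row in conn.execute(
    "SELECT t.phrase FROM topics t JOIN pages p ON p.id = t.page_id "
    "WHERE p.url = ? AND t.is_core = 1",
    ("https://example.com/",),
)]
```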

FIG. 2 provides a detailed functional block diagram of certain components and elements of a content development platform, including elements for extracting key phrases from a primary online content object, a content cluster data store for storing clusters of topics and a content development and management application 150 having a user interface for developing content. Within the platform 100, key phrases 112 are extracted from the primary online content object 102 and are processed, such as using a variety of models 118, resulting in one or more content clusters 130 that are stored in a content cluster data store 132. The clusters may comprise the topic clusters 168 that are semantically relevant to core topics reflected in the primary online content object 102, as indicated by the key phrases. The models 118, which may access a corpus of content extracted by crawling a relevant set of pages on the Internet, are applied to the key phrases 112 to establish the clusters, which arrange topics around a core topic based on semantic similarity. From the content clusters 130 a suggestion generator 134 may generate one or more suggested topics 138, which may be presented in a user interface 152 of a content development and management application 150 within which an agent of an enterprise, such as a marketer, a sales person, or the like may view the suggested topic 138 and relevant information about it (such as indicators of its similarity or relevancy as described elsewhere herein) and create content, such as web pages, emails, customer chats, and other online presence content 160 on behalf of the enterprise.
Within the interface 152, the resulting generated online presence content 160 may be linked to the primary online content object 102, such that the primary online content object 102 and one or more generated online presence objects 160 form a cluster of semantically related content, such that visitors to any one of the objects 102, 160 may be driven, including by the links, to the other objects 102, 160. In particular, the platform 100 enables driving viewers who are interested in the topics that differentiate the enterprise to the online presence content, such as the main web pages, of the enterprise. Performance of the topics may be tracked, such as in a reporting and analytics system 180, such that performance-based suggestions may be provided by the suggestion generator 134, such as by suggesting more suggested topics 138 that are similar to ones that have driven increases in traffic to the primary online content object 102.

The system and method are then capable of projection of the crawled, stored and processed information, using the present processing hardware, networking and computing infrastructure so as to generate specially-formatted vectors, e.g., a single vector. The vector or vectors can be according to a Word2vec model used to produce word embeddings in a multi-layer neural network or similar arrangement. Those skilled in the art may appreciate that further reconstruction of linguistic contexts of words is possible by taking a body of content (e.g., language words) to generate such vector(s) in a suitable vector space. Said vectors may further indicate useful associations of words and topical information based on their proximity to one another in said vector space. Vectors based on other content information (e.g., phrases or documents, which can be referred to as Phrase2vec or Document2vec herein) may also be employed in some embodiments. Documents or pages having similar semantic meaning would be conceptually proximal to one another according to the present model. In this way, new terms or phrases or documents may be compared against known data in the data store of the system to generate a similarity, relevance or nearness quantitative metric. Cosine similarity or other methods can be employed as part of this nearness determination. The similarity may be translated into a corresponding score in some embodiments. In other aspects, said score may be used as an input to another process or another optional part of the present system. In yet other aspects, the output may be presented in a user interface presented to a human or machine. The score can further be presented as a “relevance” metric. Human-readable suggestions may be automatically generated by the system and method and provided as outputs, output data or output signals in a processor-driven environment such as a modern computing architecture. 
The suggestions may in some aspects provide a content context model for guiding promoters (e.g., marketers) towards a best choice of topical content to prepare and put up on their web sites, including suitable and relevant recommendations for work product such as articles and blog posts and social media materials that would promote the promoters' main topics or subjects of interest or sell the products and services of the marketers using the system and method.
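
By way of non-limiting illustration, the nearness determination described above may be sketched as follows. The sketch assumes that each topic has already been projected into a vector; the three-dimensional vectors, the topic names, and the function names `cosine_similarity` and `rank_by_relevance` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as unrelated
    return dot / (norm_a * norm_b)


def rank_by_relevance(core_vector, candidates):
    """Return (name, similarity) pairs, nearest to the core topic first."""
    scored = [
        (name, cosine_similarity(core_vector, vector))
        for name, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings, made up for illustration only.
core = [0.9, 0.1, 0.0]
candidates = {
    "weight lifting routines": [0.8, 0.2, 0.1],
    "tax preparation": [0.0, 0.1, 0.9],
}
ranking = rank_by_relevance(core, candidates)
```

In this sketch the candidate whose vector points in nearly the same direction as the core topic vector is ranked first, which corresponds to the proximity-based association described above.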

In an aspect, the present system and method allows for effective recommendations to promoters that improve the link structure between existing content materials such as online pages, articles and posts. In another aspect, this allows for better targeting of efforts of a promoter based on the desired audience of the efforts, including large groups, small groups or even individuals.

Implementations of the present system and method can vary as would be appreciated by those skilled in the art. For example, the system and method can be used to create a content strategy tool using processing hardware and special machine-readable instructions executing thereon. Consider as a simple illustrative example that a promoter desires to best market a fitness product, service or informational topic. This can be considered as a primary or “core topic” about which other secondary topics can be generated, which are in turn coupled to or related to the core topic. For example, weight lifting, dieting, exercise or other secondary topics may be determined to have a favorable context-based relevance to the core topic. Specific secondary sub-topics about weight lifting routines, entitled, e.g., ‘Best weight lifting routines for men’ or ‘How to improve your training form’ (and so on) may be each turned into a blog post that links back to the core topic web page.

When a user uses the content strategy tool of the present system and method, the user may in some embodiments be prompted to select or enter a core (primary) topic based on the user's own knowledge or the user's field of business. The tool may then use this, along with a large amount of crawled online content that was analyzed, or along with extracted information resulting from such crawling of online content and prior stored search criteria and results, which is now context-based, to validate a topic against various criteria.

In an example, topics are suggested (or entered topics are rated) based on the topics' competitiveness, popularity and relevance. Those skilled in the art may appreciate other similar criteria which can be used as metrics in the suggestion or evaluation of a topic.

Competitiveness can comprise a measure of how likely a domain (Web domain) would be ranked on “Page 1” for a particular term or phrase. The lower the percentile ranking, the more difficult it is to rank for that term or phrase (e.g., as determined by a Moz rank indicating a site's authority).

Popularity as a metric is a general measure of a term or phrase's periodic (e.g., monthly) search volume from various major search engines. The greater this percentage, the more popular the term or phrase is.

Relevance as a metric generally indicates how close a term or phrase is to other content put up on the user's site or domain. The lower the relevance, the further away the term or phrase is from what the core topic of the site or domain is. This can be automatically determined by a crawler that crawls the site or domain to determine its main or core topic of interest to consumers. If relevance is offered as a service by the present system and method a score can be presented through a user or machine interface indicating how relevant the new input text is to an existing content pool.

Timeliness of the content is another aspect that could be used to drive content suggestions or ratings with respect to a core topic. For example, a recent-ness (recency) metric may be used in addition to those given above for the sake of illustration of embodiments of the system and method.
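
As a simplified, non-limiting sketch of how a topic could be rated against the competitiveness, popularity and relevance metrics, consider the following. The 0 to 100 scale, the minimum threshold of 30, and the unweighted mean are assumptions made only for this example and are not values specified herein.

```python
from dataclasses import dataclass


@dataclass
class TopicMetrics:
    competitiveness: float  # 0-100; lower means harder to rank for the term
    popularity: float       # 0-100; higher means greater search volume
    relevance: float        # 0-100; higher means closer to the site's core topic


def validate_topic(metrics, min_each=30.0):
    """Return (passes, combined_score).

    Assumption: a topic passes when every metric meets min_each, and the
    combined score is a plain unweighted mean of the three metrics.
    """
    values = (metrics.competitiveness, metrics.popularity, metrics.relevance)
    combined = sum(values) / len(values)
    passes = all(value >= min_each for value in values)
    return passes, combined
```

A recency metric, where used, could be added as a further field and included in the same check.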

Therefore, analysis and presentation of information indicating cross relationships between topics becomes effective under the present scheme. These principles may further be applied to email marketing or promotional campaigns to aid in decision making as to the content of emails sent to respective recipients so as to maximally engage the recipients in the given promotion.

Other possible features include question classification; document retrieval; passage retrieval; answer processing; and factoid question answering.

Note that the present concepts can be carried across languages insofar as an aspect hereof provides for manual or automated translation from a first language to a second language, and that inputs, results and outputs of the system can be processed in one or another language, or in a plurality of languages as desired.

FIG. 3, FIG. 4, and FIG. 5 are illustrative depictions of exemplary simplified aspects of the present system, method and tools. These depictions are not meant to be exhaustive or limiting, but are merely examples of how some features could be provided to a user of the system and method.

Some embodiments hereof employ a latent semantic analysis (LSA) model, encoded using data in a data store and programmed instructions and/or processing circuitry to generate an output comprising an association between various content by the promoter user of the system and method. LSA is applied here to analyze relationships between a (large) set of documents and the data contained therein. In one embodiment machine learning may be used to develop said association output or outputs.

FIG. 6 provides a functional block diagram of certain additional optional components and elements of the content development platform 100, including integration of a customer relationship management system 158 with other elements of the platform. In embodiments, the generated online content object 160 may comprise messaging content for a customer interaction that is managed via a customer relationship management system 158. In embodiments, the customer relationship management system 158 may include one or more customer data records 164, such as reflecting data on groups of customers or individual customers, including demographic data, geographic data, psychographic data, data relating to one or more transactions, data indicating topics of interest to the customers, data relating to conversations between agents of the enterprise and the customers, data indicating past purchases, interest in particular products, brands, or categories, and other customer relationship data. The customer data records 164 may be used by the platform 100 to provide additional suggested topics 138, to select among suggested topics 138, to modify suggested topics 138, or the like. In embodiments, the CRM system 158 may support interactions with a customer, such as through a customer chat 184, which in embodiments may be edited in the user interface 152 of the content development and management application 150, such as to allow a writer, such as an inside sales person or marketer who is engaging in the customer chat 184 with the customer to see suggested topics 138 that may be of interest to the customer, such as based on the customer data records 164 and based on relevancy of the topics to the main differentiators of the enterprise. In embodiments, a conversational agent 182 may be provided within or integrated with the platform 100, such as for automating one or more conversations between the enterprise and a customer. 
The conversational agent 182 may take suggested topics from the suggestion generator 134 to facilitate initiation of conversations with customers around topics that differentiate the enterprise, such as topics that are semantically relevant to key phrases found in the primary online content object 102. In embodiments, the conversational agent 182 may populate a customer chat 184 in the user interface 152, such as providing seed or draft content that a writer for the enterprise can edit.

FIG. 7 provides a detailed functional block diagram of components and elements of a content development platform. The methods and systems may include an automated crawler 104 for crawling the primary online content object 102 and storing a set of results from the crawling in a data storage facility 108. In embodiments, the data storage facility is a cloud-based storage facility, such as a simple storage facility, such as an S3™ bucket provided by Amazon™, such as on a web service platform, such as the Amazon Web Services™ (AWS) platform. In embodiments, the data storage facility is a distributed data storage facility. In embodiments, the automated crawler 104 crawls one or more domains associated with an enterprise customer's content, such as the customer's portal, main web page, or the like, as the primary online content object 102, in order to identify topics already in use on those sites and stores the pages in S3™ storage, with metadata in a database, such as a MySQL database. The content development platform 100 may include a parser 110 for parsing the stored content from the crawling activity of the automated crawler 104 to generate a plurality of key phrases 112 and to generate a content corpus 114 from the primary online content object 102. The content development platform 100 may include, use or integrate with one or more of a plurality of models 118 for processing at least one of the key phrases 112 and the corpus 114.

The models 118 may include one or more of a word2vec model 120, a doc2vec model 122, a latent semantic analysis (LSA) extraction model, or LSA model 124, and a key phrase logistic regression model 128, wherein the processing results in a plurality of content clusters 130 representing topics within the primary online content object 102. In embodiments, the platform 100 may take content for a primary content object 102, such as a website, and extract a number of phrases, such as a number of co-located phrases, based on processing the n-grams present in the content (e.g., unigrams, bi-grams, tri-grams, tetra-grams, and so on), which may, in the LSA model 124, be ranked based on the extent of presence in the content and based on a vocabulary that is more broadly used across a more general body of content, such as a broad set of Internet content. This provides a vector representation of a website within the LSA model 124. Based on crawling with the automated crawler 104 of over 619 million pages on the public internet (seeking to ignore those pages that are light on content), an LSA model 124 has been trained using machine learning, using a training set of more than 250 million pages, such that the LSA model 124 is trained to understand associations between elements of content.
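
The n-gram extraction and ranking step may be illustrated, in a simplified and non-limiting manner, as follows. This sketch is not the LSA model 124 itself; it uses a plain frequency ratio as a stand-in for ranking phrases by their presence in the content relative to a broader vocabulary. The tokenization rule, the scoring formula, and the `background_counts` mapping are assumptions introduced for this example.

```python
import re
from collections import Counter


def ngrams(text, max_n=3):
    """Yield every n-gram of length 1..max_n from lower-cased word tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def rank_phrases(page_text, background_counts, max_n=3):
    """Rank phrases by on-page count relative to a broader body of content.

    Assumed score: count on the page / (1 + count in the background corpus),
    so phrases that are common everywhere are pushed down the ranking.
    """
    counts = Counter(ngrams(page_text, max_n))
    scored = {
        phrase: count / (1 + background_counts.get(phrase, 0))
        for phrase, count in counts.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))
```

For example, given hypothetical background counts in which "the" is very common, `rank_phrases("weight lifting weight lifting the the", {"the": 1000, "weight": 5, "lifting": 5}, max_n=2)` places the bi-gram "weight lifting" ahead of the unigram "the".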

In embodiments, the one or more models 118 include the word2vec model 120 or other model (e.g., doc2vec 122 or phrase2vec) that projects crawled-domain primary online object content 102, such as from customers' domains, into a single vector. In embodiments, the vector space is such that documents that contain similar semantic meaning are close together. The application of the word2vec model 120 and the doc2vec model 122 to the vector representation of primary content object 102 (e.g., website) to draw vectors may result in a content-context model based on co-located phrases. This allows new terms to be compared against that content context database to determine how near they are to the enterprise's existing primary online content objects 102 (e.g., webpages), such as using cosine similarity. That similarity may then be converted into a score and displayed through the UI, such as displaying it as a “Relevancy” score. Ultimately, the content context model may be used to give recommendations and guidance for how individuals can choose good topics to write about, improve the link structure of existing content, and target marketing and other efforts based on their audiences' individual topic groups of interest. In embodiments, the plurality of models 118 used by the platform may comprise other forms of model for clustering documents and other content based on similarity, such as a latent semantic indexing model, a principal component analysis model, or the like. In embodiments other similar models may be used, such as a phrase2vec model, or the like.

An objective of the various models 118 is to enable clustering of content, or “topic clusters 168” around relevant key phrases, where the topic clusters 168 include semantically similar words and phrases (rather than simply linking content elements that share exactly matching keywords). Semantic similarity can be determined by calculating vector similarity around key phrases appearing in two elements of content. In embodiments, topic clusters may be automatically clustered, such as by an auto-clustering engine 172 that manages a set of software jobs that take web pages from the primary content object 102, use a model 118, such as the LSA model 124 to turn the primary content object 102 into a vector representation, project the vector representation on to a space (e.g., a two-dimensional space), perform an affinity propagation that seeks to find natural groupings among the vectors (representing clusters of ideas within the content), and show the groupings as clusters of content. Once groups are created, a reviewer, such as a marketer or other content developer, can select one or more “centers” within the clusters, such as recognizing a core topic within the marketer's “pillar” content (such as a main web page), which may correspond to the primary content object 102. Nodes in the cluster that are in close proximity to the identified centers may represent good additional topics about which to develop content or to which to establish links; for example, topic clusters can suggest an appropriate link structure among content objects managed by an enterprise and with external content objects, such as third-party objects, where the link structure is based on building an understanding of a semantic organization of cluster of topics and mirroring the other content and architecture of links surrounding a primary content object 102 based on the semantic organization.

The content development platform 100 may include a content cluster data store 132 for storing the content clusters 130. The content cluster data store 132 may comprise a MySQL database or other type of database. The content cluster data store 132 may store mathematical relationships, based on the various models 118, between content objects, such as the primary content object 102 and various other content objects or topics, which, among other things, may be used to determine what pages should be in the same cluster of pages (and accordingly should be linked to each other). In embodiments, clusters are based on matching semantics between phrases, not just matching exact phrases. Thus, new topics can be discovered by observing topics or subtopics within semantically similar content objects in a cluster that are not already covered in a primary content object 102. In embodiments, an auto-discovery engine 170 may process a set of topics in a cluster to automatically discover additional topics that may be of relevance to parties interested in the content of the primary content object 102.

In embodiments, topics within a cluster in the content cluster data store 132 may be associated with a relevancy score 174 (built from the models 118), which in embodiments may be normalized to a single number that represents the calculated extent of semantic similarity of a different topic to the core topic (e.g., the center of a cluster, such as reflecting the core topic of a primary content object 102, such as a main web page of an enterprise). The relevancy score 174 may be used to facilitate recommendations or suggestions about additional topics within a cluster that may be relevant for content development.

The content development platform may include a suggestion generator 134 for generating, using output from at least one of the models, a suggested topic 138 that is similar to at least one topic among the content clusters and for storing the suggested topic 138 and information regarding the similarity of the suggested topic 138 to at least one content cluster 130 in the content cluster data store 132. Suggested topics 138 may include sub-topic suggestions, suggestions for additional core topics and the like, each based on semantic similarity (such as using a relevancy score 174 or similar calculation) to content in the primary content object 102, such as content identified as being at the center of a cluster of topics. Suggestions may be generated by using the keyphrase logistic regression model 128 on the primary content object 102, which, among other things, determines, for a given phrase that is similar to the content in a cluster, how relatively unique the phrase is relative to a wider body of content, such as all of the websites that have been crawled across the broader Internet. Thus, through a combination of identifying semantically similar topics in a cluster (e.g., using the word2vec model 120, doc2vec model 122, and LSA model 124) and identifying which of those are relatively differentiated (using the keyphrase logistic regression model 128), a set of highly relevant, well differentiated topics may be generated, which the suggestion generator 134 may process for production of one or more suggested topics 138.
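
The combination of semantic similarity and differentiation described above may be sketched, without limitation, as follows. The keyphrase logistic regression model 128 is not reproduced here; as a labeled stand-in, uniqueness is approximated as the share of crawled sites that do not contain the phrase. The thresholds, the product-based ordering, and the function name `suggest_topics` are assumptions for illustration only.

```python
def suggest_topics(candidates, total_sites, min_relevancy=0.6,
                   min_uniqueness=0.5, limit=5):
    """Pick phrases that are both relevant and relatively differentiated.

    candidates maps phrase -> (relevancy in 0..1, number of crawled sites
    containing the phrase). Uniqueness is an assumed proxy:
    1 - sites_with_phrase / total_sites.
    """
    suggestions = []
    for phrase, (relevancy, sites_with_phrase) in candidates.items():
        uniqueness = 1.0 - sites_with_phrase / total_sites
        if relevancy >= min_relevancy and uniqueness >= min_uniqueness:
            suggestions.append((phrase, relevancy * uniqueness))
    suggestions.sort(key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in suggestions[:limit]]
```

In this sketch a phrase that is highly similar to the core topic but appears on nearly every crawled site is filtered out, as is a rare phrase with little semantic relation to the core topic.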

In embodiments, the parser 110 uses a parsing machine learning system 140 to parse the crawled content. In embodiments, the machine learning system 140 iteratively applies a set of weights to input data, wherein the weights are adjusted based on a parameter of success, wherein the parameter of success is based on the success of suggested topics 138 in the online presence of an enterprise. In embodiments, the machine learning system is provided with a parser training data set 142 that is created based on human analysis of the crawled content.

In embodiments, at least one of the plurality of models used in the platform uses a clustering machine learning system 144 to cluster content into the content clusters 130. In embodiments, the clustering machine learning system 144 iteratively applies a set of weights to input data, wherein the weights are adjusted based on a parameter of success, wherein the parameter of success is based on the success of suggested topics in the online presence of an enterprise. In embodiments, the machine learning system is provided with a training data set that is created based on human clustering of a set of content topics.

In embodiments, the suggestion generator 134 uses a suggestion machine learning system 148 to suggest topics. In embodiments, the suggestion machine learning system 148 iteratively applies a set of weights to input data, wherein the weights are adjusted based on a parameter of success, wherein the parameter of success is based on the success of suggested topics in the online presence of an enterprise. In embodiments, the suggestion machine learning system 148 is provided with a training data set that is created based on human creation of a set of suggested topics.
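
The iterative adjustment of weights based on a parameter of success, as described for the machine learning systems 140, 144 and 148, may be illustrated with the following minimal sketch. The particular learning method is not specified herein; this example assumes a simple logistic model trained by stochastic gradient updates, in which each example pairs hypothetical input features of a suggested topic with a 1 or 0 indicating whether that topic succeeded in the online presence of the enterprise.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that a suggested topic succeeds."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(examples, epochs=200, learning_rate=0.5):
    """Iteratively adjust weights from observed success.

    examples: list of (features, succeeded) with succeeded in {0, 1}.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, succeeded in examples:
            # The error between observed success and the current prediction
            # drives the weight adjustment.
            error = succeeded - predict(weights, bias, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical features: (relevancy, popularity), both scaled to 0..1.
history = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
]
weights, bias = train(history)
```

After training on such a history, `predict` returns a higher value for topics resembling those that previously succeeded than for topics resembling those that did not.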

In embodiments, the methods and systems disclosed herein may further include a content development and management application 150 for developing a strategy for development of online presence content, the application 150 accessing the content cluster data store 132 and having a set of tools for exploring and selecting suggested topics 138 for online presence content generation. In embodiments, the application 150 provides a list of suggested topics 138 that are of highest semantic relevance for an enterprise based on the parsing of the primary online content object. In embodiments, the methods and systems may further include a user interface 152 of the application 150 for presenting a suggestion, wherein the generated suggestion is presented with an indicator of the similarity 154 of the suggested topic 138 to a topic in the content cluster 130 as calculated by at least one of the models 118.

In embodiments, the content development and management application 150 may include a cluster user interface 178 portion of the user interface 152 in which, after a primary content object 102 has been brought on board to the content development platform 100, a cluster of linked topics can be observed, including core topics in the primary content object 102 and various related topics. The cluster user interface 178 may allow a user, such as a sales or marketing professional, to explore a set of topics, such as seeing topics that are highly relevant to a brand of the enterprise and related topics, which, in embodiments, may be presented with a relevancy score 174 or other measure of similarity, as well as with other information, such as search volume information and the like. In embodiments, the cluster user interface 178 or other portion of the user interface 152 may allow a user to select and attach one or more topics or content objects, such as indicating which topics should be considered at the core for the enterprise, for a brand, or for a particular project. Thus, the cluster framework embodied in the cluster user interface 178 allows a party to frame the context of what topics the enterprise wishes to be known for online (such as for the enterprise as a whole or for a brand of the enterprise).

The content development and management application 150 may comprise a content strategy tool that encourages users to structure content in clusters based on the notion that topics are increasingly more relevant than keywords, so that enterprises should focus on owning a content topic, rather than going after individual keywords. Each topic cluster 168 may have a “core topic,” such as implemented as a web page on that core topic. For example, on a personal trainer's website, the core topic might be “weightlifting.” Around those core topics 106 should be subtopics (in this example, this might include things like “best weightlifting routines” or “how to improve your weightlifting form”), each of which should be made into a blog post that links back to the core topic page.

When users use the content development and management application 150, or content strategy tool, the user may be prompted to enter a topic based on the user's own knowledge of the enterprise. The content development and management application 150 or tool may also use information gleaned by crawling domains of the enterprise with the automated crawler 104, such as to identify existing topic clusters on their site (i.e., the primary online content object 102). For each identified core topic, the topic may be validated based on one or more metrics or criteria, such as competitiveness, popularity, relevancy, or the like, such as reflected by relevancy based on cosine similarity between a topic and the core topic, or based on various other sources of website analytics data. Competitiveness may comprise a measure of how likely a domain or primary online content object 102 is to rank highly, such as on a first page of search engine results, for a particular word, phrase, or term. The lower the percentage on this metric, the harder it will be to achieve a high rank for that term. This may be determined by a source like MozRank™ (provided by Moz™), a PageRank™ (provided by Google™), or other ranking metric, reflecting the primary online content object's 102 domain authority, absent other factors. Popularity may comprise a general measure of a topic's monthly search volume or similar activity level, such as from various search engines. The higher the percentage, the more popular the term. This may be obtained from a source like SEMRush™, such as with data in broad ranges of 1-1000, 1000-10000, etc. Relevancy may comprise a metric of how close a topic, phrase, term or the like is to other content, such as topics already covered in other domains of a user, or the like. The lower the relevancy, the further away a given term is from what an enterprise is known for, such as based on comparison to a crawl by the automated crawler 104 of the enterprise's website and other domains. 
Relevancy may be provided or supported by the content context models 118 as noted throughout this disclosure.

As the models 118 analyze more topics, the models learn and improve, such that increasingly accurate measures may be provided as relevancy and the like. Once the user has selected a topic, the user may be prompted to identify subtopics related to that topic. Also, the platform 100 may recommend or auto-fill subtopics that have been validated based on their similarity to the core topic and based on other scoring metrics. When the user has filled out a cluster of topics, the platform 100 may alert the user to suggested links connecting each subtopic page to a topic page, including recommending adding links where they are currently absent. The content development and management application 150 may also allow customers to track the performance of each cluster, including reporting on various metrics used by customers to analyze individual page performance. The content development and management application 150 or tool may thus provide several major improvements over our current tools, including a better “information architecture” to understand the relationship between pieces of content, built-in keyword validation, and holistic analysis of how each cluster of topics performs.
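
The recommendation of links where they are currently absent may be sketched, in a simplified and non-limiting way, as follows. The sketch assumes the crawl has produced a mapping from each page to the set of pages it links to; the URL paths and the function name `missing_links` are hypothetical.

```python
def missing_links(core_url, subtopic_urls, existing_links):
    """Return (from_url, to_url) pairs for subtopic pages that do not yet
    link back to the core topic page.

    existing_links maps a page URL to the set of URLs that page links to.
    """
    return [
        (subtopic, core_url)
        for subtopic in subtopic_urls
        if core_url not in existing_links.get(subtopic, set())
    ]


# Hypothetical crawl result for a personal trainer's site.
crawled = {
    "/blog/routines": {"/weightlifting"},
    "/blog/form": set(),
}
to_add = missing_links(
    "/weightlifting",
    ["/blog/routines", "/blog/form", "/blog/diet"],
    crawled,
)
```

Here the page that already links to the core topic page is left alone, while the remaining subtopic pages, including one not yet seen by the crawler, are reported as candidates for a new link.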

In embodiments, the user interface 152 facilitates generation of generated online presence content 160 related to the suggested topic 138. In embodiments, the user interface 152 includes at least one of key words and key phrases that represent the suggested topic 138, which may be used to prompt the user with content for generation of online presence content. In embodiments, the generated online presence content is at least one of website content, mobile application content, a social media post, a customer chat, a frequently asked question item, a product description, a service description and a marketing message. In embodiments, the generated online presence content may be linked to the primary content object 102, such as to facilitate traffic between the generated online presence content and the primary content object 102 and to facilitate discovery of the primary content object 102 and the generated online presence content 160 by search engines 162. The user interface 152 for generating content may include a function for exploring phrases for potential inclusion in generated online presence content 160; for example, a user may input a phrase, and the platform 100 may use a relevancy score 174 or other calculation to indicate a degree of similarity. For example, if a topic is only 58% similar to a core topic, then a user might wish to find something more similar. User interface elements, such as colors, icons, animated elements and the like may help orient a user to favorable topics and help avoid unfavorable topics.
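
As one hypothetical illustration of how a degree of similarity could be surfaced through user interface elements, the following sketch maps a similarity value to a label and a color. The color names and the cut-off values of 0.75 and 0.5 are assumptions for this example and are not prescribed herein.

```python
def relevancy_indicator(similarity, good=0.75, fair=0.5):
    """Map a similarity in 0..1 to a display label and a color hint."""
    label = f"{round(similarity * 100)}% similar"
    if similarity >= good:
        color = "green"
    elif similarity >= fair:
        color = "yellow"
    else:
        color = "red"
    return label, color
```

Under these assumed cut-offs, a topic that is 58% similar to the core topic would be shown with a cautionary color, which may prompt the user to look for something more similar.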

In embodiments, the application 150 may facilitate creation and editing of content, such as blog posts, chats, conversations, messages, website content, and the like, and the platform may parse the phrases written in the content to provide a relevancy score 174 as the content is written. For example, as a blog is being written, the marketer may see whether phrases that are being written are more or less relevant to a primary content object 102 that has been selected and attached to an enterprise, a project, or a brand within the platform 100. Thus, the content development and management application 150 may steer the content creator toward more relevant topics, and phrases that represent those topics. This may include prompts and suggestions from the suggestion generator 134. The user interface 152 may include elements for assisting the user to optimize content, such as optimizing for a given reading level and the like. The user interface 152 may provide feedback, such as confirming that the right key phrases are contained in a post, so that it is ready to be posted.

In embodiments, the application 150 for developing a strategy for development of generated online presence content 160 may access content cluster data store 132 and may include various tools for exploring and selecting suggested topics 138 for generating the generated online presence content 160. In embodiments, the application 150 may further access the content of the customer relationship management (CRM) system 158. In embodiments, the application 150 includes a user interface 152 for developing content regarding a suggested topic 138 for presentation in a communication to a customer, wherein selection of a suggested topic 138 for presentation to a customer is based at least in part on a semantic relationship between the suggested topic as determined by at least one of the models 118 and at least one customer data record 164 relating to the customer stored in the customer relationship management system 158.

The platform 100 may include, be integrated with, or feed a reporting and analytics system 180 that may provide, such as in a dashboard or other user interface, such as, in a non-limiting example, in the user interface 152 of the content development and management application 150, various reports and analytics 188, such as various measures of performance of the platform 100 and of the generated online content object 160 produced using the platform 100, such as prompted by suggestions of topics. As search engines have increasingly obscured information about how sites and other content objects are ranked (such as by declining to provide keywords), it has become very important to develop alternative measures of engagement. In embodiments, the platform 100 may track interactions across the life cycle of engagement of an enterprise with a customer, such as during an initial phase of attracting interest, such as through marketing or advertising that may lead to a visit to a website or other primary online content object 102, during a process of lead generation, during conversations or engagement with the customer (such as by chat functions, conversational agents, or the like), during the process of identifying relevant needs and products that may meet those needs, during the delivery or fulfillment of orders and the provision of related services, and during any post-sale follow-up, including to initiate further interactions. By integration with the CRM system 158 of an enterprise, the platform 100 may provide measures that indicate what other activities of or relating to customers, such as generation of leads, visits to web pages, traffic and clickstream data relating to activity on a web page, links to content, e-commerce and other revenue generated from a page, and the like, were related to a topic, such as a topic for which a generated online content object 160 was created based on a suggestion generated in the platform 100. 
Thus, by integration of a content development and management application 150 and a CRM system 158, revenue can be linked to generated content 160 and presented in the reporting and analytics system 180.

FIG. 8 shows an example of a user interface of the reporting and analytics system 180.

In general, a wide range of analytics may be aggregated by topic cluster (such as a core topic and related topics linked to the core topic in the cluster), rather than by web page, so that activities involved in generating the content in the cluster can be attributed with the revenue and other benefits that are generated as a result. Among these are elements tracked in a CRM system 158, such as contact events, customers (such as prospective customers, leads, actual customers, and the like), deals, revenue, profit, and tasks.
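By way of a non-limiting sketch, aggregation by topic cluster rather than by web page may be illustrated as follows. The record layout (a `page` and a `revenue` field per deal), the page-to-cluster mapping, and the function name `revenue_by_cluster` are hypothetical and are not taken from the CRM system 158 or the reporting and analytics system 180.

```python
from collections import defaultdict


def revenue_by_cluster(deals, page_to_cluster):
    """Sum deal revenue per topic cluster instead of per web page.

    deals: iterable of dicts with 'page' and 'revenue' keys (assumed layout).
    page_to_cluster: dict mapping a page URL to the core topic of its cluster.
    Pages that belong to no cluster are grouped under 'unclustered'.
    """
    totals = defaultdict(float)
    for deal in deals:
        cluster = page_to_cluster.get(deal["page"], "unclustered")
        totals[cluster] += deal["revenue"]
    return dict(totals)


# Hypothetical data: two pages belong to the same "cardio exercise" cluster.
page_to_cluster = {
    "/blog/running-basics": "cardio exercise",
    "/blog/cycling-intervals": "cardio exercise",
    "/blog/deadlift-form": "weight lifting",
}
deals = [
    {"page": "/blog/running-basics", "revenue": 100.0},
    {"page": "/blog/cycling-intervals", "revenue": 250.0},
    {"page": "/blog/deadlift-form", "revenue": 80.0},
]
print(revenue_by_cluster(deals, page_to_cluster))
```

The same grouping may be applied to other elements tracked in a CRM system, such as contact events or tasks, by substituting a count or another measure for the revenue field.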

In embodiments, the platform 100 may proactively recommend core topics, such as based on crawling and scraping existing site content of an enterprise. Thus, also provided herein is the auto-discovery engine 170, including various methods, systems, components, modules, services, processes, applications, interfaces and other elements for automated discovery of topics for interactions with customers of an enterprise, including methods and systems that assist various functions and roles within an enterprise in finding appropriate topics to draw customers into relevant conversations and to extend the conversations in a way that is relevant to the enterprise and to each customer. Automated discovery of relevant content topics may support processes and workflows that require insight into what topics should be written about, such as during conversations with customers. Such processes and workflows may include development of content by human workers, as well as automated generation of content, such as within automated conversational agents, bots, and the like. Automated discovery may include identifying concepts that are related by using a combination of analysis of a relevant item of text (such as core content of a website, or the content of an ongoing conversation) with an analysis of linking (such as linking of related content). In embodiments, this may be performed with awareness at a broad scale of the nature of content on the Internet, such that new, related topics can be automatically discovered that further differentiate an enterprise, while remaining relevant to its primary content. The new topics can be used within a wide range of enterprise functions, such as marketing, sales, services, public relations, investor relations and other functions, including functions that involve the entire lifecycle of the engagement of a customer with an enterprise.

As noted above, customers increasingly expect more personalized interactions with enterprises, such as via context-relevant chats that properly reflect the history of a customer's relationship with the enterprise. Chats, whether undertaken by human workers, or increasingly by intelligent conversational agents, are involved across all of the customer-facing activities of an enterprise, including marketing, sales, public relations, services, and others. Content development and strategy is relevant to all of those activities, and effective conversational content, such as managed in a chat or by a conversational agent 182, needs to relate to relevant topics while also reflecting information about the customer, such as demographic, psychographic and geographic information, as well as information about past interactions with the enterprise. Thus, integration of the content development and management platform 100 with the CRM system 158 may produce appropriate topics within the historical context of the customer and the customer's engagement with the enterprise. For example, in embodiments, tickets or tasks may be opened in a CRM system 158, such as prompting creation of content, such as based on customer-relevant suggestions, via the content development and management application 150, such as content for a conversation or chat with a customer (including one that may be managed by a conversational agent 182 or bot), content for a marketing message or offer to the customer, content to drive customer interest in a web page, or the like. In embodiments, a customer conversation or customer chat 184 may be managed through the content development and management application 150, such as by having the chat occur within the user interface 152, such that an agent of the enterprise, like an inside sales person, can engage in the chat by writing content, while seeing suggested topics 138, indicators of relevance or similarity 154 and the like. 
In this context, relevance indicators can be based on scores noted above (such as reflecting the extent of relevance to core topics that differentiate the enterprise), as well as topics that are of interest to the customer, such as determined by processing information, such as on historical conversations, transactions, or the like, stored in the CRM system 158. In embodiments, the customer chat 184 may be populated with seed or draft content created by an automated conversational agent 182, so that a human agent can edit the content into a final version for the customer interaction.

In embodiments, the models 118 (collectively referred to as one or more content context models), and the platform 100 more generally, may enable a number of capabilities and benefits, including helping users come up with ideas of new topics to write about based on a combination of the content cluster data store 132, a graph of topics for the site or other content of the enterprise, and one or more analytics. This may help writers find gaps in content that should be effective, but that are not currently written about. The models 118 and the platform 100 may also enable users to come up with ideas about new articles, white papers and other content based on effective topics. The models 118 and the platform 100 may also enable users to understand effectiveness of content at the topic level, so that a user can understand which topics are engaging people and which are not. This may be analyzed for trends over time, so a user can see if a topic is getting more or less engagement. The models 118 and the platform 100 may also enable users to apply information about topics at the level of the individual contact record, such as in the customer relationship management system 158, to help users understand with what content a specific person engages. For example, for a user “Joe,” the platform 100, by combining content development and management with customer relationship management, may understand whether Joe is engaging more in “cardio exercise” or “weight lifting.” Rather than only looking at the aggregate level, a user may look at the individual level for relevant topics. Development of content targeted to an individual's topics of interest may be time-based, such as understanding what content has recently been engaged with and whether preferences are changing over time.
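The individual-level, time-based view described above may be sketched, in a non-limiting manner, as a per-contact count of engagement events by topic with an optional date cutoff. The event layout (`contact`, `topic`, `day`) and the function name `topic_engagement` are assumptions for illustration only and do not reflect an actual schema of the customer relationship management system 158.

```python
from collections import Counter
from datetime import date


def topic_engagement(events, contact, since=None):
    """Count engagement events per topic for one contact.

    events: iterable of dicts with 'contact', 'topic' and 'day' keys
            (hypothetical layout; 'day' is a datetime.date).
    since:  if given, only events on or after this date are counted,
            which exposes recent interests rather than all-time totals.
    """
    counts = Counter()
    for event in events:
        if event["contact"] != contact:
            continue
        if since is not None and event["day"] < since:
            continue
        counts[event["topic"]] += 1
    return counts


# Hypothetical engagement history for the contact "Joe".
events = [
    {"contact": "Joe", "topic": "cardio exercise", "day": date(2024, 1, 5)},
    {"contact": "Joe", "topic": "cardio exercise", "day": date(2024, 1, 12)},
    {"contact": "Joe", "topic": "cardio exercise", "day": date(2024, 1, 20)},
    {"contact": "Joe", "topic": "weight lifting", "day": date(2024, 3, 2)},
    {"contact": "Joe", "topic": "weight lifting", "day": date(2024, 3, 10)},
]
all_time = topic_engagement(events, "Joe")
recent = topic_engagement(events, "Joe", since=date(2024, 3, 1))
print(all_time.most_common(1), recent.most_common(1))
```

Comparing the all-time counts with the recent counts is one simple way to detect that a contact's preferences may be shifting from one topic to another.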

The models 118 and the platform 100 may also enable looking at cross relationships between topics. For example, analytics within the platform 100 and on engagement of content generated using the platform 100 may indicate that people who engage frequently with a “cardio” topic also engage frequently with a “running” topic. If so, the platform 100 may offer suggested topics that are interesting to a specific person based on identifying interest in one topic and inferring interest in others.
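One non-limiting way to sketch such cross-topic inference is a simple co-engagement count: for each pair of topics, count the contacts who engaged with both, and suggest to a given contact the topics that co-occur with topics the contact already engages with. The disclosure does not prescribe this technique; the binary co-occurrence measure, the `min_support` threshold, and the function names below are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def co_engagement(topics_by_contact):
    """Count, for each unordered topic pair, the contacts engaged with both."""
    pairs = Counter()
    for topics in topics_by_contact.values():
        for a, b in combinations(sorted(set(topics)), 2):
            pairs[(a, b)] += 1
    return pairs


def suggest_topics(contact, topics_by_contact, min_support=2):
    """Suggest unseen topics that co-occur with the contact's known topics.

    min_support is an assumed threshold: a topic pair is used only if at
    least that many contacts engaged with both topics of the pair.
    """
    seen = set(topics_by_contact[contact])
    scores = Counter()
    for (a, b), n in co_engagement(topics_by_contact).items():
        if n < min_support:
            continue
        if a in seen and b not in seen:
            scores[b] += n
        elif b in seen and a not in seen:
            scores[a] += n
    return [topic for topic, _ in scores.most_common()]


# Hypothetical engagement data keyed by contact.
topics_by_contact = {
    "Ann": ["cardio", "running"],
    "Bob": ["cardio", "running"],
    "Cat": ["cardio", "yoga"],
    "Joe": ["cardio"],
}
print(suggest_topics("Joe", topics_by_contact))
```

In this hypothetical data, "running" is suggested to Joe because two other contacts engaged with both "cardio" and "running", whereas "yoga" falls below the assumed support threshold.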

The models 118 and the platform 100 may also enable development of email content, such as based on understanding the topic of the content of an email, an email campaign, or the like. This may include understanding which users are engaging with which content, and using that information to determine which emails, or which elements of content within emails, are most likely to be engaging to specific users.

FIG. 8 illustrates a user interface for reporting information relating to online content generated using the content development and management platform. Various indicators of success, as noted throughout this disclosure, may be presented, such as generated by the reporting and analytics systems 180.

FIG. 9 depicts an embodiment of a user interface in which activity resulting from the use of the platform is reported to a marketer or other user. Among other metrics that are described herein, the user interface can report on what customers, such as ones to be entered into or already tracked in the CRM system, have had a first session of engagement with content, such as a web page, as a result of the content strategy, such as where the customers arrive via a link contained in a sub-topic or other topic linked to a core topic as described herein.

The present concepts can be applied to modern sophisticated searching methods and systems with improved success. For example, in a context-sensitive or personalized search request, the results may be influenced by one or more of the following: location, time of day, format of query, device type from which the request is made, and contextual cues.

In an embodiment, a topical cluster comprising a core topic and several sub-topics can be defined and refined using the following generalized process: 1. Map out several (e.g., five to ten) of the topics that a target person (e.g., customer) is interested in; 2. Group the topics into one or more generalized (core) topics into which the sub-topics can be fit; 3. Build out each of the core topics with corresponding sub-topics using keywords or other methods; 4. Map out content ideas that align with each of the core topics and corresponding sub-topics; 5. Validate each idea with industry and competitive research; and 6. Create, measure and refine the data, models and content discovered from the above process. These steps are not intended to be limiting or exhaustive, as those skilled in the art might appreciate alternate or additional steps suiting a given application. Some of the above steps may also be omitted or combined into one step, again, to suit a given application at hand.
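The topical cluster built up by the above process may be represented, in one non-limiting sketch, as a simple data structure holding a core topic, its sub-topics, and the content ideas mapped to each sub-topic. The class name `TopicCluster`, its fields, and the `gaps` helper are hypothetical and are offered only to illustrate steps 2 through 4; they are not the structure of the content cluster data store 132.

```python
from dataclasses import dataclass, field


@dataclass
class TopicCluster:
    """A core topic with linked sub-topics and content ideas (illustrative)."""

    core_topic: str
    sub_topics: list = field(default_factory=list)
    content_ideas: dict = field(default_factory=dict)  # sub-topic -> ideas

    def add_sub_topic(self, sub_topic):
        """Step 3: build out the core topic with a sub-topic (no duplicates)."""
        if sub_topic not in self.sub_topics:
            self.sub_topics.append(sub_topic)
            self.content_ideas.setdefault(sub_topic, [])

    def add_idea(self, sub_topic, idea):
        """Step 4: map a content idea to a sub-topic of this cluster."""
        self.add_sub_topic(sub_topic)
        self.content_ideas[sub_topic].append(idea)

    def gaps(self):
        """Sub-topics that do not yet have any content idea mapped to them."""
        return [s for s in self.sub_topics if not self.content_ideas.get(s)]


# Hypothetical cluster for a fitness-related enterprise.
cluster = TopicCluster("cardio exercise")
cluster.add_sub_topic("running")
cluster.add_sub_topic("cycling")
cluster.add_idea("running", "Beginner 5k training plan")
print(cluster.gaps())
```

Listing the sub-topics without mapped ideas is one simple way to surface content gaps before the validation and refinement steps.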

In some embodiments, a system and method are provided that can be used to provide relevancy scores (or quantitative metrics) as a service. Content generation suggestions can also be offered as a service using the present system and method, including synonyms, long-tail keywords and enrichment by visitor analytics in some instances.

Having thus described several aspects and embodiments of the technology of this application, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those of ordinary skill in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the technology described in the application. For example, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the embodiments described herein.

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described. In addition, any combination of two or more features, systems, articles, materials, kits, and/or methods described herein, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.

The above-described embodiments may be implemented in any of numerous ways. One or more aspects and embodiments of the present application involving the performance of processes or methods may utilize program instructions executable by a device (e.g., a computer, a processor, or other device) to perform, or control performance of, the processes or methods.

In this respect, various inventive concepts may be embodied as a computer readable storage medium (or multiple computer readable storage media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, flash memories, circuit configurations in Field Programmable Gate Arrays or other semiconductor devices, or other tangible computer storage medium) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement one or more of the various embodiments described above.

The computer readable medium or media may be transportable, such that the program or programs stored thereon may be loaded onto one or more different computers or other processors to implement various ones of the aspects described above. In some embodiments, computer readable media may be non-transitory media.

The terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that may be employed to program a computer or other processor to implement various aspects as described above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of the present application need not reside on a single computer or processor, but may be distributed in a modular fashion among a number of different computers or processors to implement various aspects of the present application.

Computer-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

Also, data structures may be stored in computer-readable media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure. Such relationships may likewise be achieved by assigning storage for the fields with locations in a computer-readable medium that convey relationship between the fields. However, any suitable mechanism may be used to establish a relationship between information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationship between data elements.

Also, as described, some aspects may be embodied as one or more methods. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

The present disclosure should therefore not be considered limited to the particular embodiments described above. Various modifications, equivalent processes, as well as numerous structures to which the present disclosure may be applicable, will be readily apparent to those skilled in the art to which the present disclosure is directed upon review of the present disclosure.

Detailed embodiments of the present disclosure are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present disclosure in virtually any appropriately detailed structure.

The terms “a” or “an,” as used herein, are defined as one or more than one. The term “another,” as used herein, is defined as at least a second or more. The terms “including” and/or “having”, as used herein, are defined as comprising (i.e., open transition).

While only a few embodiments of the present disclosure have been shown and described, it will be obvious to those skilled in the art that many changes and modifications may be made thereunto without departing from the spirit and scope of the present disclosure as described in the following claims. All patent applications and patents, both foreign and domestic, and all other publications referenced herein are incorporated herein in their entireties to the full extent permitted by law.

The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software, program codes, and/or instructions on a processor. The present disclosure may be implemented as a method on the machine, as a system or apparatus as part of or in relation to the machine, or as a computer program product embodied in a computer readable medium executing on one or more of the machines. In embodiments, the processor may be part of a server, cloud server, client, network infrastructure, mobile computing platform, stationary computing platform, or other computing platform. A processor may be any kind of computational or processing device capable of executing program instructions, codes, binary instructions and the like. The processor may be or may include a signal processor, digital processor, embedded processor, microprocessor or any variant such as a co-processor (math co-processor, graphic co-processor, communication co-processor and the like) and the like that may directly or indirectly facilitate execution of program code or program instructions stored thereon. In addition, the processor may enable execution of multiple programs, threads, and codes. The threads may be executed simultaneously to enhance the performance of the processor and to facilitate simultaneous operations of the application. By way of implementation, methods, program codes, program instructions and the like described herein may be implemented in one or more threads. A thread may spawn other threads that may have assigned priorities associated with them; the processor may execute these threads based on priority or any other order based on instructions provided in the program code. The processor, or any machine utilizing one, may include non-transitory memory that stores methods, codes, instructions and programs as described herein and elsewhere.
The processor may access a non-transitory storage medium through an interface that may store methods, codes, and instructions as described herein and elsewhere. The storage medium associated with the processor for storing methods, programs, codes, program instructions or other type of instructions capable of being executed by the computing or processing device may include but may not be limited to one or more of a CD-ROM, DVD, memory, hard disk, flash drive, RAM, ROM, cache and the like.

A processor may include one or more cores that may enhance speed and performance of a multiprocessor. In embodiments, the processor may be a dual core processor, quad core processor, other chip-level multiprocessor and the like that combine two or more independent cores (called a die).

The methods and systems described herein may be deployed in part or in whole through a machine that executes computer software on a server, client, firewall, gateway, hub, router, or other such computer and/or networking hardware. The software program may be associated with a server that may include a file server, print server, domain server, internet server, intranet server, cloud server, and other variants such as secondary server, host server, distributed server and the like. The server may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other servers, clients, machines, and devices through a wired or a wireless medium, and the like. The methods, programs, or codes as described herein and elsewhere may be executed by the server. In addition, other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the server.

The server may provide an interface to other devices including, without limitation, clients, other servers, printers, database servers, print servers, file servers, communication servers, distributed servers, social networks, and the like. Additionally, this coupling and/or connection may facilitate remote execution of a program across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the disclosure. In addition, any of the devices attached to the server through an interface may include at least one storage medium capable of storing methods, programs, code and/or instructions. A central repository may provide program instructions to be executed on different devices. In this implementation, the repository may act as a storage medium for program code, instructions, and programs.

The software program may be associated with a client that may include a file client, print client, domain client, internet client, intranet client and other variants such as secondary client, host client, distributed client and the like. The client may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other clients, servers, machines, and devices through a wired or a wireless medium, and the like. The methods, programs, or codes as described herein and elsewhere may be executed by the client. In addition, other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the client.

The client may provide an interface to other devices including, without limitation, servers, other clients, printers, database servers, print servers, file servers, communication servers, distributed servers and the like. Additionally, this coupling and/or connection may facilitate remote execution of a program across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the disclosure. In addition, any of the devices attached to the client through an interface may include at least one storage medium capable of storing methods, programs, applications, code and/or instructions. A central repository may provide program instructions to be executed on different devices. In this implementation, the repository may act as a storage medium for program code, instructions, and programs.

The methods and systems described herein may be deployed in part or in whole through network infrastructures. The network infrastructure may include elements such as computing devices, servers, routers, hubs, firewalls, clients, personal computers, communication devices, routing devices and other active and passive devices, modules and/or components as known in the art. The computing and/or non-computing device(s) associated with the network infrastructure may include, apart from other components, a storage medium such as flash memory, buffer, stack, RAM, ROM and the like. The processes, methods, program codes, instructions described herein and elsewhere may be executed by one or more of the network infrastructural elements. The methods and systems described herein may be adapted for use with any kind of private, community, or hybrid cloud computing network or cloud computing environment, including those which involve features of software as a service (SaaS), platform as a service (PaaS), and/or infrastructure as a service (IaaS).

The methods, program codes, and instructions described herein and elsewhere may be implemented on or through mobile devices. The mobile devices may include navigation devices, cell phones, mobile phones, mobile personal digital assistants, laptops, palmtops, netbooks, pagers, electronic book readers, music players and the like. These devices may include, apart from other components, a storage medium such as a flash memory, buffer, RAM, ROM and one or more computing devices. The computing devices associated with mobile devices may be enabled to execute program codes, methods, and instructions stored thereon. Alternatively, the mobile devices may be configured to execute instructions in collaboration with other devices. The mobile devices may communicate with base stations interfaced with servers and configured to execute program codes. The mobile devices may communicate on a peer-to-peer network, mesh network, or other communications network. The program code may be stored on the storage medium associated with the server and executed by a computing device embedded within the server. The base station may include a computing device and a storage medium. The storage device may store program codes and instructions executed by the computing devices associated with the base station.

The computer software, program codes, and/or instructions may be stored and/or accessed on machine readable media that may include: computer components, devices, and recording media that retain digital data used for computing for some interval of time; semiconductor storage known as random access memory (RAM); mass storage typically for more permanent storage, such as optical discs, forms of magnetic storage like hard disks, tapes, drums, cards and other types; processor registers, cache memory, volatile memory, non-volatile memory; optical storage such as CD, DVD; removable media such as flash memory (e.g. USB sticks or keys), floppy disks, magnetic tape, paper tape, punch cards, standalone RAM disks, Zip drives, removable mass storage, off-line, and the like; other computer memory such as dynamic memory, static memory, read/write storage, mutable storage, read only, random access, sequential access, location addressable, file addressable, content addressable, network attached storage, storage area network, bar codes, magnetic ink, and the like.

The methods and systems described herein may transform physical and/or intangible items from one state to another. The methods and systems described herein may also transform data representing physical and/or intangible items from one state to another.

The elements described and depicted herein, including in flowcharts and block diagrams throughout the figures, imply logical boundaries between the elements. However, according to software or hardware engineering practices, the depicted elements and the functions thereof may be implemented on machines through computer executable media having a processor capable of executing program instructions stored thereon as a monolithic software structure, as standalone software modules, or as modules that employ external routines, code, services, and so forth, or any combination of these, and all such implementations may be within the scope of the present disclosure. Examples of such machines may include, but may not be limited to, personal digital assistants, laptops, personal computers, mobile phones, other handheld computing devices, medical equipment, wired or wireless communication devices, transducers, chips, calculators, satellites, tablet PCs, electronic books, gadgets, electronic devices, devices having artificial intelligence, computing devices, networking equipment, servers, routers and the like. Furthermore, the elements depicted in the flowchart and block diagrams or any other logical component may be implemented on a machine capable of executing program instructions. Thus, while the foregoing drawings and descriptions set forth functional aspects of the disclosed systems, no particular arrangement of software for implementing these functional aspects should be inferred from these descriptions unless explicitly stated or otherwise clear from the context. Similarly, it will be appreciated that the various steps identified and described above may be varied, and that the order of steps may be adapted to particular applications of the techniques disclosed herein. All such variations and modifications are intended to fall within the scope of this disclosure. 
As such, the depiction and/or description of an order for various steps should not be understood to require a particular order of execution for those steps, unless required by a particular application, or explicitly stated or otherwise clear from the context.

The methods and/or processes described above, and steps associated therewith, may be realized in hardware, software or any combination of hardware and software suitable for a particular application. The hardware may include a general-purpose computer and/or dedicated computing device or specific computing device or particular aspect or component of a specific computing device. The processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable device, along with internal and/or external memory. The processes may also, or instead, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals. It will further be appreciated that one or more of the processes may be realized as a computer executable code capable of being executed on a machine-readable medium.

The computer executable code may be created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software, or any other machine capable of executing program instructions.

Thus, in one aspect, methods described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices, performs the steps thereof. In another aspect, the methods may be embodied in systems that perform the steps thereof, and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware. In another aspect, the means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.

While the disclosure has been described in connection with the preferred embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present disclosure are not to be limited by the foregoing examples, but are to be understood in the broadest sense allowable by law.

The use of the terms “a” and “an” and “the” and similar referents in the context of describing the disclosure (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the disclosure.

While the foregoing written description enables one of ordinary skill to make and use what is considered presently to be the best mode thereof, those of ordinary skill will understand and appreciate the existence of variations, combinations, and equivalents of the specific embodiment, method, and examples herein. The disclosure should therefore not be limited by the above described embodiment, method, and examples, but by all embodiments and methods within the scope and spirit of the disclosure.

Any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specified function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. § 112(f). In particular, any use of “step of” in the claims is not intended to invoke the provision of 35 U.S.C. § 112(f).

Persons of ordinary skill in the art may appreciate that numerous design configurations may be possible to enjoy the functional benefits of the inventive systems. Thus, given the wide variety of configurations and arrangements of embodiments of the present disclosure, the scope of the disclosure is reflected by the breadth of the claims below rather than narrowed by the embodiments described above.

Claims

1. A system for generating a cluster of correlated content from a primary online content object, comprising:

an automated crawler for crawling the primary online content object and storing a set of results from the crawling in a data storage facility;
a parser for parsing the stored set of results from the crawling to generate a plurality of key phrases and to generate a content corpus from the primary online content object;
a plurality of models for processing at least one of the plurality of key phrases and the content corpus, the plurality of models comprising at least two of a word2vec model, a doc2vec model, a latent semantic analysis (LSA) extraction model, and a key phrase logistic regression model, wherein the processing results in a plurality of content clusters representing topics within the primary online content object;
a content cluster data store for storing the plurality of content clusters; and
a suggestion generator for generating, using output from at least one of the plurality of models, a suggested topic that is similar to at least one topic among the plurality of content clusters and for storing the suggested topic and information regarding a similarity of the suggested topic to at least one content cluster in the content cluster data store.
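As a rough illustration of the suggestion generator recited in claim 1, the following Python sketch scores candidate topics against stored content clusters and keeps the similarity alongside each suggestion. It is only a sketch under stated assumptions: a bag-of-words cosine similarity stands in for the word2vec, doc2vec, LSA, or logistic regression models named in the claim, and the function names (`vectorize`, `cosine`, `suggest_topics`), the cluster labels, and the candidate topics are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Lower-cased bag-of-words counts (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def suggest_topics(clusters, candidates, top_n=3):
    """Return (candidate, closest_cluster, similarity), most similar first."""
    cluster_vecs = {
        name: vectorize(" ".join(phrases)) for name, phrases in clusters.items()
    }
    scored = []
    for cand in candidates:
        cvec = vectorize(cand)
        # Find the cluster this candidate is most similar to.
        best_name, best_sim = max(
            ((name, cosine(cvec, vec)) for name, vec in cluster_vecs.items()),
            key=lambda pair: pair[1],
        )
        scored.append((cand, best_name, best_sim))
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:top_n]


# Hypothetical content clusters (topic -> key phrases) and candidate topics.
clusters = {
    "email marketing": ["email marketing campaigns", "email newsletter design"],
    "search ranking": ["search engine ranking", "keyword search results"],
}
candidates = ["email campaigns tips", "search ranking factors", "office furniture"]

suggestions = suggest_topics(clusters, candidates)
for topic, cluster, similarity in suggestions:
    print(f"{topic!r} -> {cluster!r} (similarity {similarity:.2f})")
```

In a fuller system the similarity would presumably come from one of the trained models, and each returned tuple would be written to the content cluster data store rather than printed.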

2. The system of claim 1, wherein the plurality of models comprises at least one of the word2vec model, the doc2vec model, the latent semantic analysis extraction model, the latent semantic indexing model, the principal component analysis model, and the key phrase logistic regression model.

3. The system of claim 1, wherein the parser uses a machine learning system to parse the crawled content.

4. The system of claim 3, wherein the machine learning system iteratively applies a set of weights to input data, wherein the set of weights are adjusted based on a parameter of success, wherein the parameter of success is based on a success of suggested topics in an online presence of an enterprise.
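The iterative weight adjustment recited in claim 4 can be pictured with a small logistic regression trained by gradient descent. The Python sketch below is an assumption-laden illustration, not the claimed implementation: the two input features (similarity to a cluster and a normalized search volume), the training rows, the binary "success" labels, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability that a suggested topic succeeds."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, epochs=500, lr=0.5):
    """Repeatedly apply the weights to the inputs and adjust them by the error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # The error is the gap between observed success and the prediction.
            err = y - predict(weights, bias, x)
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias


# Hypothetical features: [similarity to a cluster, normalized search volume].
samples = [[0.9, 0.8], [0.8, 0.6], [0.2, 0.3], [0.1, 0.1]]
# Hypothetical success parameter: 1 if the suggested topic performed well.
labels = [1, 1, 0, 0]

weights, bias = train(samples, labels)
high = predict(weights, bias, [0.85, 0.7])
low = predict(weights, bias, [0.15, 0.2])
print(f"likely success: {high:.2f}, unlikely success: {low:.2f}")
```

Here the success parameter is reduced to a binary label per topic. A real deployment would more plausibly derive it from measured outcomes in the enterprise's online presence.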

5. The system of claim 4, wherein the machine learning system is provided with a training data set that is created based on human analysis of the crawled content.

6. The system of claim 1, wherein at least one of the plurality of models uses a machine learning system to cluster content.

7. The system of claim 6, wherein the machine learning system iteratively applies a set of weights to input data, wherein the set of weights are adjusted based on a parameter of success, wherein the parameter of success is based on a success of suggested topics in an online presence of an enterprise.

8. The system of claim 7, wherein the machine learning system is provided with a training data set that is created based on human clustering of a set of content topics.

9. The system of claim 1, wherein the suggestion generator uses a machine learning system to suggest topics.

10. The system of claim 9, wherein the machine learning system iteratively applies a set of weights to input data, wherein the set of weights are adjusted based on a parameter of success, wherein the parameter of success is based on a success of suggested topics in an online presence of an enterprise.

11. The system of claim 10, wherein the machine learning system is provided with a training data set that is created based on human creation of a set of suggested topics.

12. The system of claim 1, further comprising an application for developing a strategy for development of online presence content, the application accessing the content cluster data store and having a set of tools for exploring and selecting suggested topics for online presence content generation.

13. The system of claim 12, wherein the application provides a list of topics that are of highest semantic relevance for an enterprise based on the parsing of the primary online content object.

14. The system of claim 13, further comprising a user interface of the application for presenting the suggested topic, wherein the suggested topic is presented with an indicator of the similarity of the suggested topic to a content cluster topic as calculated by at least one of the plurality of models.

15. The system of claim 13, further comprising a user interface of the application for presenting the suggested topic, wherein the user interface facilitates generation of content related to the suggested topic.

16. The system of claim 15, wherein the user interface includes at least one of key words and key phrases that represent the suggested topic.

17. The system of claim 16, wherein the at least one of key words and key phrases are used to prompt a user with content for generation of online presence content.

18. The system of claim 17, wherein the online presence content is at least one of website content, mobile application content, a social media post, a customer chat, a frequently asked question item, a product description, a service description and a marketing message.

19. The system of claim 18, wherein the user interface for generation of content includes a plurality of suggested topics, each associated with an indicator of the similarity of a given suggested topic to a content cluster topic as calculated by at least one of the plurality of models.

20. The system of claim 1, wherein the data storage facility is a cloud-based storage facility.

21. The system of claim 1, wherein the data storage facility is a distributed data storage facility.

22. The system of claim 1, wherein the primary online content object is a web page of an enterprise.

23. The system of claim 1, wherein the primary online content object is a social media page of an enterprise.

24. The system of claim 1, further comprising an application for developing a strategy for development of online presence content, the application accessing the content cluster data store and having a set of tools for exploring and selecting suggested topics for online presence content generation, wherein the application further accesses the content of a customer relationship management system.

25. The system of claim 24, wherein the application comprises a user interface for developing content regarding the suggested topic for presentation in a communication to a customer, wherein selection of the suggested topic for presentation to the customer is based at least in part on a semantic relationship between the suggested topic as determined by at least one of the plurality of models and at least one data record relating to the customer stored in the customer relationship management system.

Patent History
Publication number: 20180137203
Type: Application
Filed: Nov 9, 2017
Publication Date: May 17, 2018
Inventors: Tim Hennekey (Cambridge, MA), Asher Krim (Cambridge, MA), Angela DeFranco (Cambridge, MA), Theodore Cleveland (Cambridge, MA), Jonathan Meharry (Cambridge, MA), Matthew Barby (Cambridge, MA)
Application Number: 15/807,869
Classifications
International Classification: G06F 17/30 (20060101); G06N 99/00 (20060101);