Learning System for Pangenetic-Based Recommendations
An embodiment may involve storing, by a computing device and in a database, a set of pangenetic attributes of a set of individuals, wherein the pangenetic attributes of the set are respectively and statistically associated with products; based on the statistical associations between the pangenetic attributes and the products, determining, by the computing device, product recommendations for a second set of individuals; receiving, by the computing device and from the second set of individuals, a plurality of measures of satisfaction with the product recommendations; based on the plurality of measures of satisfaction, learning, by the computing device, an association between a subset of the pangenetic attributes and a particular product; and storing, by the computing device and in the database, the learned association, wherein the learned association provides a basis for subsequent recommendations of the particular product when a subsequent individual exhibits the subset of the pangenetic attributes.
This application is a continuation of and claims priority to U.S. patent application Ser. No. 17/212,906, filed Mar. 25, 2021 and hereby incorporated by reference in its entirety.
U.S. patent application Ser. No. 17/212,906 is a continuation of and claims priority to U.S. patent application Ser. No. 15/999,198, filed Aug. 17, 2018 and hereby incorporated by reference in its entirety.
U.S. patent application Ser. No. 15/999,198 is a continuation of and claims priority to U.S. patent application Ser. No. 14/708,415 (now abandoned), filed May 11, 2015 and hereby incorporated by reference in its entirety.
U.S. patent application Ser. No. 14/708,415 is a continuation of and claims priority to U.S. patent application Ser. No. 13/361,533 (now U.S. Pat. No. 9,031,870), filed Jan. 30, 2012 and hereby incorporated by reference in its entirety.
U.S. patent application Ser. No. 13/361,533 is a continuation of and claims priority to U.S. patent application Ser. No. 12/346,738 (now U.S. Pat. No. 8,108,406), filed Dec. 30, 2008 and hereby incorporated by reference in its entirety.
BRIEF DESCRIPTION OF THE DRAWINGS
The following detailed description will be better understood when read in conjunction with the appended drawings, in which there is shown one or more of the multiple embodiments of the present invention. It should be understood, however, that the various embodiments are not limited to the precise arrangements and instrumentalities shown in the drawings.
With the recent introduction and successes of single nucleotide polymorphism (SNP) sequencing, full genomic sequencing and epigenetic sequencing in humans, wide ranging applications that utilize the pangenetic attributes (genetic and epigenetic attributes) of individuals become possible. Herein we disclose methods, systems, software and databases for delivering personalized web search results and online recommendations based on the pangenetic attributes of individuals. These approaches rely on correlations determined between specific pangenetic attributes—also referred to in this disclosure as pangenetic data—and historical online behavior and preferences of users with respect to information and offerings contained in webpages. These correlations can be used to predict the future behavior and preferences of users. By linking pangenetic attributes to webpages as metadata, for example, and then comparing that metadata to the pangenetic profile of a user, web search engines can be enabled to retrieve information and offerings that better satisfy the user's interests, preferences and needs.
In one embodiment, the present invention is designed to utilize correlations between pangenetic attributes of users of the World Wide Web (WWW or web) and the feedback and behaviors they express with respect to web items (objects and content of the web) to improve the relevancy of web items retrieved and/or recommended for future users. More specific applications include those within the healthcare field involving medical information retrieval for diagnosis and treatment of patients whose pangenetic attributes are known. Personalization of information retrieval using pangenetic attributes of individuals has the potential to greatly increase efficiency and accuracy by minimizing resources that are spent retrieving less relevant results.
In another embodiment, a pangenetic based search and recommendation system has potential benefits for many applications, not the least of which is in providing user recommendations for online shopping. Take, for example, a search for music earphones. The human ear exhibits great variability from individual to individual with respect to internal ear canal size and shape, external ear size and shape, and perception of sound frequencies across the audible range. Consequently, user ratings and preferences of earphones vary greatly, so that while many individuals may give the highest possible rating to a particular make and model of earphone, other individuals may find the frequency response and/or physical fit of that earphone to be unacceptable. So despite the availability of user feedback through existing online rating and recommendation systems, a future consumer (i.e., user) may be unable to identify the best product for themselves based on existing search and recommendation systems because they have little or no information regarding how similar they are to other consumers that rated the product highly. Since the individual characteristics of each person's hearing response (in the normal undamaged state) and ear structure are dictated predominantly by information encoded in their genome, a comparison of the relevant genetic and epigenetic attributes responsible for particular variations in ear morphology and frequency sensitivity of a current consumer with that of past consumers who found particular earphones to be outstanding can enable a much more reliable recommendation to guide the consumer directly to those earphones that will provide them with the highest level of satisfaction in terms of sound quality and fit. While the user may direct a search using keywords that specify what type of earphone is desired (earbud vs. in-ear canal earphone vs. ear-clip earphone vs. neck-band earphone vs. head-band earphone, etc.), incorporating a pangenetic similarity comparison between the current consumer and past consumers who found particular types of earphones most satisfactory can dramatically narrow down the selection of possible recommendations within any particular earphone category.
This approach helps ensure that the best choices for an individual consumer are recommended and also enables avoiding choices which would likely prove unsatisfactory. Benefits extend to others including product sellers who typically lose both time and money when a consumer purchases a product based on current recommender systems, is dissatisfied with the product, and then returns the product for a refund. Many other aspects of human perception and sensory preferences are dictated at least in part by individual pangenetic characteristics. Individual differences in taste, smell, and color perception, as well as preferences for certain types of melodies and instrument tonalities in music and particular thematic subject matter in movies and books, are associated with and can be extracted from our genetic and epigenetic makeups. Consequently, web based search and recommendation of a wide variety of items including foods, wines, perfumes, colognes, music, movies and books can be significantly enhanced with respect to both efficiency and consumer satisfaction by evaluating consumers' pangenetic attributes. We envision a Pangenetic World Wide Web, or simply Pangenetic Web, in which search, navigation, online user behavior, item recommendation, and social networking are all guided by the pangenetic profiles of users.
Existing internet search engines rely on the preprocessing of webpage information prior to performing a user specified web search, in which nearly the entire content of the WWW is crawled by a ‘spider’ module (web crawler) which logs and retrieves webpages while an indexer module analyzes the word and syntactic content of each webpage in order to index and store that content in various datasets for rapid access during a user query. Words occurring in a webpage can be represented as word_IDs (word identifiers) which can be linked (using a lexicon hash table, for example) to doc IDs (document identifiers) that represent the webpage documents in which those words occur. The doc IDs may be stored in a doclist index containing additional information which identifies the total number of occurrences of a word within a webpage and the context of each occurrence. The web search engine can then retrieve and rank webpages in part by matching user queried keywords to the respective word_IDs and following pointers (i.e., links) into the doclist index which contains word hitlists providing the number and context of occurrences of each keyword within each webpage document that is a hit for (i.e., contains) that keyword. The higher the number of occurrences and the more significant the context of each occurrence of a keyword in a webpage, the higher the relevancy score computed for the webpage, which can be referred to as an Information Retrieval (IR) score. Also, webpages that contain hits for a greater number of the user's query keywords receive a higher IR score than those that hit on fewer keywords. While the term webpage is used, the above and following concepts apply more broadly to web items that may not be webpages, such as indexes, data files and other documents. The term ‘web items’ refers to data contents of the internet and WWW.
One prominent internet search engine design can store a lexicon dataset representing millions of words using word_IDs and a hash table of pointers indicating which webpage documents each of the words occurs in. The search engine has access to forward index and inverted index datasets which record the total number of occurrences of each of the words in the respective webpages, as well as hitlist datasets which contain context information indicating the type of word occurrence in addition to the number of hits. Type of occurrence includes information such as whether the word occurs in the URL, title, body, or anchor hypertext of a particular webpage, as well as position of occurrence, font style, and relative font size of each occurrence of the word on the webpage. These context attributes are incorporated into a computation of a type-weight for each occurrence of a word. The type-weights make up a vector that is indexed by type. Also, the search engine counts the number of hits (i.e., number of occurrences) of each type in the hit list and then converts every count into a count-weight. Count-weights increase linearly with counts at first but quickly taper off, so that beyond a certain point increasing counts no longer contribute to the count-weight. The IR score for the document is computed as the dot product between the vector of count-weights and the vector of type-weights.
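By way of illustration only, the IR score computation described above can be sketched in Python as follows. The occurrence types, the type-weight values, the cap value and the function names are all hypothetical and are not taken from any particular search engine. The count-weight here is simplified to a linear ramp that goes flat at an assumed cap, which is only one possible way of modeling a weight that tapers off.

```python
# Hypothetical occurrence types and type-weights (illustrative values only).
TYPE_WEIGHTS = {"url": 4.0, "title": 3.0, "anchor": 2.0, "body": 1.0}

# Assumed count beyond which additional hits no longer add to the count-weight.
COUNT_CAP = 8


def count_weight(count: int, cap: int = COUNT_CAP) -> float:
    """Grow linearly with the hit count, then stay flat once the cap is reached."""
    return float(min(count, cap))


def ir_score(hit_counts: dict[str, int]) -> float:
    """Dot product of the count-weight vector and the type-weight vector."""
    return sum(
        weight * count_weight(hit_counts.get(hit_type, 0))
        for hit_type, weight in TYPE_WEIGHTS.items()
    )


# One hit in the title and three in the body.
score = ir_score({"title": 1, "body": 3})
```

In this sketch a page with 100 body hits scores the same as a page with 8 body hits, reflecting the statement that increasing counts eventually stop contributing to the count-weight.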
In addition to an IR score, the above search engine can compute a page ranking score using an algorithm which evaluates the quantity and quality of inbound hyperlinks of each webpage. The higher the quality and quantity of the inbound hyperlinks pointing to a webpage, the higher the page ranking score will be for that webpage. The search engine combines the hyperlink-based page ranking score with the IR score to derive a final rank for a webpage which determines whether that webpage will be listed in the Search Engine Results Page (SERP), and where in the listing it will appear based on its rank relative to other webpages listed in the SERP.
Herein we disclose that information retrieval systems, methods, software and databases, especially those involving web search engines, can be enhanced by incorporating an individual's pangenetic attributes to personalize results, thereby providing greater relevancy and accuracy of results for a particular user. The methods and systems disclosed herein can be used as stand alone methods and systems for pangenetic based web searching, or alternatively, as complementary methods and systems to more traditional methods and systems, such as those described above, to enable incorporation of pangenetic based web search as an add-on functionality. Pangenetic attributes can be contained within the source code of a webpage, or they may be externally associated with a webpage by storing them within a search engine lexicon and linking them to the webpage. The latter can require the parsing and indexing of a webpage in a first step, comparing the content of the compiled index from the webpage with a pangenetic correlation table to determine pangenetic attributes that should be linked to the webpage in a second step, and storing the relevant pangenetic attributes from the correlation table in association with the webpage in a third step.
Within this disclosure, the term ‘attribute’ refers to a quality, trait, characteristic, feature relationship, property, factor, object, or data associated with or possessed by an individual, a group of individuals, an activity, a state, or datum. The term ‘pangenetic attribute’ refers to genetic and epigenetic attributes. The term ‘non-pangenetic attribute’ refers to attributes other than genetic or epigenetic attributes. In one embodiment, non-pangenetic attributes can be selected from the group consisting of physical attributes (i.e., attributes describing any material quality, trait, characteristic, property or factor of an individual present at the atomic, molecular, cellular, tissue, organ or organism level, excluding genetic and epigenetic attributes), behavioral attributes (i.e., attributes describing any singular, periodic, or aperiodic response, action, opinion or habit of an individual with respect to internal or external stimuli, including but not limited to an action, reflex, emotion or psychological state that is controlled or created by the nervous system on either a conscious or subconscious level), and situational attributes (i.e., attributes describing any object, condition, influence, or milieu that surrounds, impacts or contacts an individual). Examples of non-pangenetic attributes of a user include demographics such as their age, gender, ethnicity, marital status, and zip code.
Within this disclosure, the term ‘genetic attribute’ refers to attributes relating to a genome, genotype, haplotype, chromatin, chromosome, chromosome locus, chromosomal material, deoxyribonucleic acid (DNA), allele, gene, gene cluster, gene locus, genetic polymorphism, genetic mutation, genetic mutation rate, nucleotide, nucleotide base pair, single nucleotide polymorphism (SNP), restriction fragment length polymorphism (RFLP), variable tandem repeat (VTR), microsatellite sequence, genetic marker, sequence marker, sequence tagged site (STS), plasmid, transcription unit, transcription product, gene expression level, genetic expression (i.e., transcription) state, ribonucleic acid (RNA), or copy DNA (cDNA), including the nucleotide sequence and encoded amino acid sequence associated with any of the above.
Within this disclosure, the term ‘epigenetic attribute’ refers to attributes relating to modifications of genetic material that affect gene expression in a manner that is heritable during somatic cell divisions and sometimes heritable in germline transmission, but that is nonmutational to the DNA sequence and is therefore fundamentally reversible, including but not limited to methylation of DNA nucleotides and acetylation of chromatin-associated histone proteins.
The attribute profile of an individual, which can be a pangenetic profile, a non-pangenetic profile or a hybrid (combined) attribute profile containing both pangenetic and non-pangenetic attributes, is preferably provided to embodiments of the present invention as a dataset record whose association with the individual can be indicated by a unique identifier contained in the dataset record. An actual attribute of an individual can be represented in data form as an attribute descriptor in attribute profiles, records, datasets, and databases. Herein, both actual attributes and attribute descriptors may be referred to simply as attributes. In one embodiment, statistical relationships and associations between pangenetic and non-pangenetic attributes as determined by the methods disclosed herein are a direct result of relationships and associations between actual attributes of an individual, including behavioral attributes they exhibit (e.g., online computing and web surfing behaviors). Individuals' attribute profiles and attributes can be real and/or measurable, or they may be hypothetical and/or not directly observable.
To provide the pangenetic data needed for pangenetic based web searching, genetic and/or epigenetic sequencing of an individual can be performed, typically through SNP sequencing or genomic sequencing methods, and the pangenetic data obtained through sequencing can be associated with the individual as a pangenetic data profile (pangenetic profile), for example, that can be subsequently accessed by web search engines during a search query. Access and reading of an individual's pangenetic profile may involve various security measures such as authentication verification, as well as masking of certain pangenetic attributes to maintain anonymity of the individual with respect to identification by third parties or to maintain privacy with respect to particular pangenetic attributes which could reveal health conditions or traits that the individual desires to keep confidential.
Additionally, pangenetic attributes need to be linked or associated with webpages to enable retrieval of webpages that best match the individual's pangenetic profile. More specifically, in one embodiment pangenetic attributes can be linked to a webpage as a whole, based on the categories, topics or product offerings of the webpage. In another embodiment, pangenetic attributes can be linked to a webpage through associations with particular words or phrases in the text of a webpage. For example, the specific gene mutation responsible for the majority of cystic fibrosis disease cases is the ‘CFTR gene F508 mutation’ which can be linked to the phrase ‘cystic fibrosis’ appearing in text content of web pages. Similarly, other pangenetic attributes known to cause cystic fibrosis can simultaneously be linked to the same ‘cystic fibrosis’ phrase. While pangenetic attributes can exist as text on a webpage, it is expected that pangenetic attributes will be linked to webpages as hidden attributes in the form of metadata, such as meta-tags and meta-keywords that provide an additional layer of meaning and interpretation to the explicit content of webpages, consistent with visions for a semantic web. The pangenetic metadata associated with a webpage can be used to indicate that a user sharing some or all of those pangenetic attributes will be more likely to benefit or be satisfied with the content offered by that webpage, and it should therefore receive a higher rank or higher listing position in the search results presented to the user.
As an example, where a particular combination of pangenetic attributes is found to be causally associated with a subtype of multiple sclerosis (MS), each of those pangenetic attributes can be stored as meta-keywords linked to websites providing information about that MS subtype, healthcare provider websites that advertise specialized treatment for that MS subtype, pharmacy websites that offer medications for treating that MS subtype, and website support groups that offer help and information for people suffering with that MS subtype. Despite the existence of several subtypes of the disease, when a user performs a web search regarding MS, the particular pangenetic attributes of the user (or an individual represented by a user, such as a patient represented by a healthcare professional who acts as the user) can be utilized by the search engine to ensure that the subset of websites offering information, products and services associated with the pertinent genetic subtype of MS are retrieved and presented with higher rank and listing position, regardless of whether the user knows or is even aware of the relevant subtype of the disease. In one embodiment, the search results listed on a SERP can include the pangenetic attributes of the user that were a match for each of the webpage documents listed in the SERP.
In one embodiment, knowing which specific pangenetic attributes should be linked to a webpage requires knowing which pangenetic attributes historically correlate with satisfaction and/or utility (i.e., relevance) of the webpage's content offerings for at least one subgroup of users. Data for correlations between consumers' pangenetic attributes and their preferences and satisfaction with webpage content offerings can be obtained through at least two approaches. One approach is to obtain the data by monitoring and recording the behaviors and feedback of consumers and then determining correlations of those behaviors and feedback ratings with pangenetic attributes of the consumers using pattern finding methods known to those of skill in the art. Passive collaborative filtering methods can be used to monitor the online behavior of users and then determine correlations between subsets of their pangenetic attributes and particular behaviors, while active collaborative filtering methods can be used to record feedback from users and then determine correlations between subsets of their pangenetic attributes and their self-reported feedback (e.g., preferences and satisfaction levels) with respect to online information and offerings. Data for determining correlations can also be derived from consumer purchasing behavior at bricks-and-mortar stores by analyzing frequent shopper (club member) card data and/or credit card purchase history data, also through passive filtering. Once correlations are determined, they can be stored in a database and later accessed to extract information that can be used to predict an individual consumer's online behavior, preferences, and feedback based on their pangenetic attributes alone or in combination with non-pangenetic attributes of the consumer such as demographics. 
The population of consumers from which this data is obtained can be a designated test population, or it can be a group of individuals in a user population that have consented to having at least a portion of their pangenetic data accessed for the purpose of receiving personalized information search capabilities and content recommendations in the future. Cross-system collaborative filtering can be used to combine user behavior and preference data compiled across multiple recommender systems in a privacy preserving manner.
A second approach for acquiring pangenetic based correlations is to obtain the correlation data from professionals such as scientists, researchers, and healthcare providers who evaluate and publish associations between pangenetic data and health conditions, behaviors, products, and services for purposes such as disease diagnosis and treatment, scientific research, and product development (e.g., pharmaceutical development). Data from these and similar sources can be further analyzed and refined for extracting information for web search applications. In certain instances, third parties may have collections of pangenetic and non-pangenetic information, without having attempted to determine correlations between the data. Such data can be subsequently processed with pattern finding methods to derive correlations that can be also used for web search based information retrieval. The correlations acquired by any of the above approaches can be derived from either rigorous statistical associations, or less desirably from non-statistical (i.e., informal) trends and inferences.
Many of the embodiments of the inventions of the present disclosure involve the comparison of pangenetic data, often the pairwise comparison of individual genetic attributes, to determine pangenetic matches, overall quantity of pangenetic matches between pangenetic datasets, and pangenetic similarity scores. In one or more embodiments, pangenetic data can be identified as being a match (i.e., equivalent) if they are identical. In one or more embodiments, pangenetic data can be identified as being a match if they are different pangenetic attributes known to be statistically associated with the same item or item preference (e.g., the same level of satisfaction with a particular item). In one or more embodiments, pangenetic data can be identified as being a match if they differ only with respect to one or more silent pangenetic variations (i.e., pangenetic variations that do not impact a phenotype, outcome or item preference).
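The three matching criteria above can be sketched, for illustration only, as a single pairwise test in Python. The function names, the attribute labels, and the two lookup tables (one mapping an attribute to the items it is statistically associated with, and one mapping silent variants to a shared canonical form) are hypothetical. A given embodiment may apply only one of the criteria, whereas this sketch simply tries all three.

```python
def is_pangenetic_match(
    a: str,
    b: str,
    item_associations: dict[str, set[str]],
    canonical_form: dict[str, str],
) -> bool:
    """Return True if attributes a and b count as a match under any criterion."""
    # Criterion 1: the attributes are identical.
    if a == b:
        return True
    # Criterion 3: the attributes differ only by silent variation, modeled here
    # (as an assumption) by both mapping to the same canonical form.
    if canonical_form.get(a, a) == canonical_form.get(b, b):
        return True
    # Criterion 2: different attributes associated with at least one common item.
    return bool(item_associations.get(a, set()) & item_associations.get(b, set()))


def count_matches(
    profile: set[str],
    page_attributes: set[str],
    item_associations: dict[str, set[str]],
    canonical_form: dict[str, str],
) -> int:
    """Count page attributes matched by at least one attribute in the profile."""
    return sum(
        any(
            is_pangenetic_match(p, q, item_associations, canonical_form)
            for p in profile
        )
        for q in page_attributes
    )


# Hypothetical lookup tables with placeholder attribute labels.
associations = {"attr_A": {"item_1"}, "attr_B": {"item_1"}, "attr_C": {"item_2"}}
silent = {"attr_D_variant": "attr_D"}
```

For example, with these tables "attr_A" and "attr_B" match because both are associated with "item_1", and "attr_D_variant" matches "attr_D" because the variation between them is treated as silent.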
In order to link pangenetic attributes to webpage content, pangenetic based correlations can be processed by one or more software modules designed to recognize webpages containing informational content represented by the correlations and then store links between those webpages and the respective pangenetic attributes represented by the correlations. Linking can be accomplished by storing word_IDs representing the pangenetic attributes within datasets accessed by search engines, such as the lexicon dataset compiled from webpages and later read by the search engine upon receiving a user query, and then creating pointers from the word_IDs to the doc IDs, contained within document index datasets (i.e., indexes), which represent the webpages that contain the content or concepts represented by those word_IDs. So while the pangenetic data can be external metadata that is not contained within the webpage document itself, it can be represented and stored for utilization by search engines in the same manner as both visible webpage text and non-displayed internal metadata contained within the webpage document source code. This allows pangenetic attributes to be incorporated into existing search engine systems used by Google, Yahoo!, Microsoft Network and others. In one embodiment, pangenetic attributes can be represented with word_IDs in a single lexicon dataset which also contains word_IDs representing non-pangenetic words, wherein the word_IDs can be hash values. In one embodiment, pangenetic attributes can be represented with word_IDs in a separate lexicon dataset devoted solely to pangenetic attributes, wherein the word_IDs representing the pangenetic attributes can be hash values. In one or more embodiments, word_IDs representing pangenetic attributes can be referred to as pangenetic IDs.
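As a minimal sketch of the separate-lexicon embodiment, the following Python stores hash-valued pangenetic IDs with pointers to doc IDs, and keeps a reverse index so that the pangenetic IDs linked to a given webpage can be looked up later. The class name, method names, doc ID strings and the choice of a truncated SHA-256 digest as the hash value are assumptions made for illustration; the disclosure does not specify a hash function.

```python
import hashlib
from collections import defaultdict


def pangenetic_id(attribute: str) -> str:
    """Derive a hash-valued word_ID for a pangenetic attribute (assumed scheme)."""
    return hashlib.sha256(attribute.encode("utf-8")).hexdigest()[:16]


class PangeneticLexicon:
    """Hypothetical lexicon devoted solely to pangenetic attributes."""

    def __init__(self) -> None:
        self._id_to_docs: dict[str, set[str]] = defaultdict(set)
        self._doc_to_ids: dict[str, set[str]] = defaultdict(set)

    def link(self, attribute: str, doc_id: str) -> str:
        """Store a pointer from the attribute's pangenetic ID to a doc ID."""
        pid = pangenetic_id(attribute)
        self._id_to_docs[pid].add(doc_id)
        self._doc_to_ids[doc_id].add(pid)
        return pid

    def docs_for(self, attribute: str) -> set[str]:
        """Doc IDs of the webpages linked to the given attribute."""
        return set(self._id_to_docs.get(pangenetic_id(attribute), set()))

    def ids_for_doc(self, doc_id: str) -> set[str]:
        """Reverse lookup: pangenetic IDs linked to the given webpage."""
        return set(self._doc_to_ids.get(doc_id, set()))


lexicon = PangeneticLexicon()
# Link the attribute to two hypothetical pages about cystic fibrosis.
cf_id = lexicon.link("CFTR gene F508 mutation", "doc_17")
lexicon.link("CFTR gene F508 mutation", "doc_42")
```

Because the links live in the lexicon rather than in the webpage source, the pangenetic data in this sketch remains external metadata, consistent with the description above.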
Typically, the user would not be expected to enter pangenetic data into their search query as keywords. The search engine can have the ability to identify and/or authenticate the user and then read at least a portion of their pangenetic profile (masked or otherwise). As such, the pangenetic attributes can be hidden from view as metadata associated with the user and as metadata associated with webpages. These pangenetic attributes can be treated as secondary keywords by the search engine. In one embodiment, pangenetic attributes of a user (or another individual on whose behalf the user is performing a web search) are used as a secondary means of selecting and ranking webpages. In this particular approach, webpages can be initially retrieved based on user queried keywords or topics, and then pangenetic IDs that were previously stored in the lexicon and document index datasets can be used to compute pangenetic based scores for the respective webpages with which they are associated, based on matches with the pangenetic profile of the user. More specifically, once a set of webpages has been retrieved based on user query keywords, the doc IDs of the retrieved webpages can be used for a reverse lookup of pangenetic IDs associated with those webpages. The associated pangenetic IDs can be identified and counted for each webpage and then compared to the pangenetic profile of the user to determine the number of pangenetic hits (i.e., quantity of pangenetic matches) that the pangenetic profile has for each webpage. The total number of pangenetic hits recorded between the user's pangenetic profile and a webpage can be divided by the total count of pangenetic word_IDs associated with the webpage to produce a pangenetic score in the form of percent match, for example. The pangenetic score can then be normalized to any scale, for example, a scale of 1 to 10 as used by the ranking system of one prominent web search engine.
Following normalization, it is possible to generate a consolidated score by combining the pangenetic score with an IR score, a PageRank or a final SERP rank by averaging, weighted averaging or other mathematical computations known to those of skill in the art. In one embodiment, the resulting composite score can be used as a final rank for determining the selection and ordering of one or more webpages in the SERP.
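The scoring steps described above can be sketched in Python as follows, for illustration only. The function names and the sample pangenetic IDs are hypothetical, the IR score is assumed to already be expressed on the same 1 to 10 scale, and the equal weighting is an arbitrary choice; weighted averaging is only one of the combinations contemplated.

```python
def pangenetic_score(profile_ids: set[str], page_ids: set[str]) -> float:
    """Fraction of a page's pangenetic IDs that are hit by the user's profile."""
    if not page_ids:
        return 0.0  # a page with no linked pangenetic IDs cannot be matched
    hits = len(profile_ids & page_ids)
    return hits / len(page_ids)


def normalize(fraction: float, low: float = 1.0, high: float = 10.0) -> float:
    """Map a fraction in [0, 1] onto an arbitrary scale, here 1 to 10."""
    return low + fraction * (high - low)


def composite_score(pangenetic: float, ir: float, weight: float = 0.5) -> float:
    """Weighted average of a normalized pangenetic score and an IR score."""
    return weight * pangenetic + (1.0 - weight) * ir


# Hypothetical pangenetic IDs for a user and for one retrieved webpage.
user_profile = {"pid_a", "pid_b", "pid_c"}
page_ids = {"pid_a", "pid_b", "pid_x", "pid_y"}

fraction = pangenetic_score(user_profile, page_ids)  # 2 hits out of 4 IDs
scaled = normalize(fraction)
final = composite_score(scaled, ir=7.5)
```

Here two of the page's four pangenetic IDs are hits, giving a 50 percent match, a normalized score of 5.5, and a composite score of 6.5 when averaged equally with an IR score of 7.5.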
In one embodiment, the results presented on a SERP can be grouped into separate areas to allow the user to delineate between those results that were selected and ranked based on pangenetic data and those results which were not derived based on pangenetic data. By creating separate groups of results in the SERP, the user is able to save time that would otherwise be spent sifting through less relevant results by focusing their attention on the group of results that best satisfies their needs. In one embodiment, the user is able to indicate to the search engine which group of results in the SERP they are more satisfied with. This user feedback can be used by the search engine in subsequent searches to further refine the results by learning what best meets the needs of the user. For example, if the user prefers the results obtained using pangenetic attribute matching, then the search engine can forego the presentation of webpages based solely on non-pangenetic keywords and only present webpage links on the SERP that were selected and ranked, at least in part, based on pangenetic attributes associated with the user and the pangenetic attributes associated with webpages.
As mentioned previously, determining correlations between pangenetic attributes and webpage content can be based on recording the online behaviors and feedback of users whose pangenetic attributes are accessible to a search engine. In one embodiment, a user can log in to a search engine which either has access to a stored copy of their pangenetic profile in an associated database server or can be authorized to access the pangenetic data on another database server dedicated to storing pangenetic data of individuals (e.g., a pangenetic server). In another embodiment, users can store a copy of the pangenetic profile as a secure file on the desktop or storage device of a computing device that was used to connect to the web search engine, and the file can be uploaded or accessed by the web search engine upon receiving authorization by the user through the computing device.
Active collaborative filtering can then be used to provide a peer-to-peer approach for deriving correlations between user satisfaction with online content and one or more pangenetic attributes by first gathering explicit feedback from users. Explicit feedback can be obtained by recording the rating of a webpage by a group of users and then correlating rating scores one at a time with the pangenetic attributes that statistically segregate with each score. For example, if users having a particular combination of pangenetic attributes are observed to predominantly rate a particular webpage as a score of 5 on a scale of 1-5, then that combination of pangenetic attributes can be linked to that webpage so it will be more highly ranked and/or more frequently recommended to a user who possesses some or all of those particular pangenetic attributes. Active feedback for the purpose of developing correlations can also be collected by asking a user to rank a collection of webpage items on a qualitative scale (e.g., favorite to least favorite), presenting a user with two or more webpage offerings and asking the user to choose the best one, or asking a user to choose a list of webpage items that they like, for example. Software methods and systems designed for active collaborative filtering to collect explicit feedback from users can incorporate feedback input fields on the webpages in which the pertinent web content appears, interactive pop-up windows, or questionnaires integrated into the web browser.
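As a deliberately simple illustration of correlating explicit ratings with pangenetic attributes, the following Python flags each attribute whose carriers predominantly gave a webpage the top score. It is a stand-in for the pattern finding methods referred to in this disclosure, not a statistically rigorous method: it evaluates single attributes rather than combinations, and the function name, thresholds and attribute labels are hypothetical.

```python
from collections import Counter


def attributes_to_link(
    ratings: list[tuple[set[str], int]],
    top_score: int = 5,
    min_fraction: float = 0.8,
    min_users: int = 2,
) -> set[str]:
    """Return attributes whose carriers predominantly gave the top score.

    Each rating pairs a user's pangenetic attributes with that user's score
    for one webpage. The thresholds are assumed, illustrative values.
    """
    carriers: Counter[str] = Counter()
    top_raters: Counter[str] = Counter()
    for attributes, score in ratings:
        for attribute in attributes:
            carriers[attribute] += 1
            if score == top_score:
                top_raters[attribute] += 1
    return {
        attribute
        for attribute, n in carriers.items()
        if n >= min_users and top_raters[attribute] / n >= min_fraction
    }


# Hypothetical ratings of one webpage on a scale of 1-5.
ratings = [
    ({"attr_A", "attr_B"}, 5),
    ({"attr_A"}, 5),
    ({"attr_B"}, 2),
    ({"attr_A", "attr_C"}, 5),
]
linked = attributes_to_link(ratings)
```

In this example only "attr_A" is linked to the webpage: all three of its carriers gave a score of 5, whereas "attr_B" carriers were split and "attr_C" was observed in too few users to support a link.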
Passive collaborative filtering is an alternative to active filtering for collecting data on user behavior and preferences that can be used to derive correlations between pangenetic attributes of users and relevant webpage offerings. Passive filtering is based on the assumption that the preferences and opinions of users can be implied by their actions and requires observing and recording online user behavior to determine user feedback implicitly without necessitating user inputs to acquire feedback ratings and opinions. This has the result of reducing demands on the user while reducing variability and information biases that afflict other types of feedback systems, such as surveillance bias (e.g., only certain types of people are willing to take the time to provide active feedback, thereby potentially skewing feedback data so that it may be unrepresentative of the general population of users as a whole) and reporting bias (e.g., users may provide insincere or inaccurate feedback in an active peer-to-peer system where they are aware that others can view their feedback). More specifically, passive feedback can be obtained by recording what webpages and content a user viewed, listened to, or otherwise interacted with; how long a user viewed, listened to or interacted with a webpage or specific content (i.e., user dwell time); how much scrolling a user did on a webpage; what items a user bookmarked, printed out or saved (e.g., in a shopping cart) for later consideration; what items a user purchased; what items a user recommended to others; the number of times a user queried particular topics or clicked on particular links; and details of a user's social network to discover interests, likes and dislikes. Methods for collecting implicit feedback can utilize software operating through a web browser to record the above behaviors as well as for collecting characteristics of the user's social network.
In one embodiment, the software for passively recording user behaviors and/or social network characteristics can be applets running in the web browser and communicating with an external or remote database server.
Both active and passive collaborative filtering can be implemented through social networking applications and websites. A version of social networking can be provided to enable participants to share their pangenetic data with others in the network, or designated subgroups within the network such as friends, friends of friends, or business contacts. The system can correlate patterns of those users' pangenetic attributes with their behaviors, interests, needs and goals as expressed through the network. Subsequently, the identified pangenetic patterns can be used as the basis for inviting new friends or contacts into a user's network or group of friends, for example, under the premise that possession of certain pangenetic attribute patterns will help ensure that the newly invited friend or contact will have compatible behaviors, interests, needs and goals. The pangenetic associated information collected from social networks can be used to provide necessary data to enable web searching systems and item recommender and prediction systems.
Web based recommender systems can be enabled using the same basic principles as web search methods and systems. However, instead of linking pangenetic data in association with webpages through a document index, as in a pangenetic web search system, in pangenetic based recommender and prediction systems the pangenetic data can be associated with specific items within an item feedback matrix. While some of the items represented in the matrix may be webpage links or webpage information content, at least some of the represented items can be physical products, establishments, or tangible services indicated by descriptors. The matrix can also contain feedback data (e.g., scores, ratings, preferences) derived from explicit or implicit user feedback. Feedback data contained in the matrix can be represented as values which are consistent with various kinds of rating scales and scoring systems that provide an indication of the level of user satisfaction, interest or preference for the items represented in the matrix. Feedback data can include item descriptors and item identifiers in addition to item ratings. Feedback data can also include non-pangenetic attribute descriptors that provide an indication of user behaviors, such as whether a link or ad was clicked on, whether an item was placed in a shopping cart or purchased by the user, or how long a user spent interacting with (i.e., dwelling on) a particular web based item. All of the above feedback data can be referred to collectively as ‘item preferences’. 
Within this disclosure, the phrase ‘item preferences’ also refers to indications of item type, item category, item class, item manufacturer, item name, item brand, item model designation, item size, item shape, item color, item usage, an item feature, an item function, an item design, an item accessory, item price, item vendor, item return policy, item warranty, an item advertisement, an item promotion, a website, a webpage, a document, and a level of satisfaction with respect to any of the above.
In one embodiment, an item preference can, either implicitly or explicitly, provide an indication of the user's attitude, interest, opinion, relationship, or behavior toward the corresponding web based item. For example, an item preference can potentially be positive (e.g., long dwell time on webpage X), negative (e.g., short dwell time on webpage X) or neutral (e.g., average dwell time on webpage X). Alternatively, an item preference may provide no indication of the user's attitude, interest, opinion, relationship, or behavior toward the corresponding web based item, so that the item feedback table simply indicates the existence (or absence) of correlations between web items and users, or between web items and pangenetic data associated with users, without indicating the underlying basis of the correlations.
Initially an item preference or a query request for a particular item or type of item (category of item) can be received as input from the user or, alternatively, provided by the system from a stored dataset such as a non-pangenetic profile of the user or the user's saved shopping cart, for example. The system can then access a separate table, such as an item index or classification table, to identify a set of items that are similar or related to the item preference of the user (e.g., fall into the same item category). Information contained in the item table which enables identification of items that are similar/related as well as which items fit into particular categories can be implemented in the form of keys, references, pointers, associated data links, lists, or hashes. The relationships between items can be previously determined by a variety of methods, and can even be based on correlations and data collected by an item recommender system such as those disclosed herein. In one embodiment, an item feedback matrix can serve as an item index by containing keys, references, pointers, associated data links, lists, or hashes that indicate the identities of similar and related items and even which item classes or item categories they fall into. Once a set of items has been identified using either the item feedback matrix or a dedicated item index, those items can be looked up in the feedback matrix to retrieve corresponding ratings and correlated pangenetic attributes.
The above predictions based on
Once similar rating records have been clustered, pattern finding methods known to those of skill in the art can be used to determine correlations between each rating pattern and one or more combinations of pangenetic attributes. This approach creates the pangenetic clusters illustrated in
Similar individuals share greater similarity of preferences and opinions (i.e., ratings) with respect to particular items as well as a higher degree of similarity at the pangenetic level, and a comparison of a new user's pangenetic attributes and previous item ratings with those of each of the clusters contained in the feedback matrix can be performed to identify the particular cluster that is most similar to the new user and will provide the greatest accuracy and certainty in predicting their preferences and satisfaction with other items. It should be noted that determination of clusters (subgroups) can be performed in steps, each step involving either clustering based on rating similarities or clustering based on pangenetic similarities. Each step refines the results, creating clusters that are more homogeneous with respect to the individual records they contain. And the order of the clustering steps can be varied when involving selection based on pangenetics versus selection based on item rating patterns, so as to either place priority on creating clusters having greater internal pangenetic similarity, or alternatively, creating clusters having greater internal item ratings similarity. For example, as described in the example with respect to
The approaches described herein enable greater certainty in making predictions about what items users will prefer in the future by forming clusters of similar individuals from which to derive those predictions, the clustering being based on pangenetic similarities as well as previous item preference/rating similarities. With respect to predicting satisfaction with products and services offered online, this enables both item-centric and user-centric approaches for application to item selection, rating and recommendation for a user (e.g., a consumer). An item-centric approach predicts a user's level of satisfaction with a particular item that the user indicated. A user-centric approach recommends, based on a first item indicated by the user, additional items that are likely to satisfy the user.
An item-centric method of web based item rating and recommendation relies on selection of a specific product by a user, either directly through a keyword query input, selection from a product listing, or through a series of dropdown menus (i.e., pull-down menus) which guide the user to select a particular product. Based at least in part on a comparison of the user's relevant pangenetic attributes with those of other users that have provided feedback directly or indirectly for the item, the system can predict 1) the level of satisfaction the user will experience with the item, and 2) the probability or likelihood that the user will achieve that level of satisfaction. More specifically, the system receives at least one item preference of the user and accesses their pangenetic profile (i.e., pangenetic data associated with the user). The system can then access a dataset (e.g., a feedback matrix dataset) containing one or more satisfaction levels associated with the item along with pangenetic data corresponding to each of the one or more satisfaction levels, where the pangenetic data is derived from a plurality of consumers that indicated their level of satisfaction with the item (e.g., relevant pangenetic attributes of consumers that aggregate (co-occur) with a high level of satisfaction are linked in association with that level of satisfaction in a pangenetic based item feedback matrix). A comparison is performed between the pangenetic profile of the user and the pangenetic data corresponding to each of the one or more satisfaction levels (e.g., contained within the pangenetic based item feedback matrix). To determine the level of satisfaction that the user will most likely experience with the item, probabilities for each of the satisfaction levels can be computed and the satisfaction level corresponding to the highest probability can be selected. 
For example, past users sharing relevant pangenetic attributes with the user are identified and then partitioned into clusters containing users who experienced a particular satisfaction level with the item, one cluster for each possible satisfaction level. To compute each of the probabilities, the numerical count of users in a particular satisfaction level cluster is divided by the total number of pangenetically matched users (i.e., the sum of all individuals in all satisfaction level groups associated with the set of relevant pangenetic attributes). At an extreme where only a single satisfaction level is correlated with the relevant pangenetic attributes, the probability that the user will also experience that level of satisfaction with the item will be 1.0 (i.e., 100% chance). The system can transmit an indication that the user will have a 100% chance of experiencing that satisfaction level. In most cases due to real world variability, there will likely be two or more possible satisfaction levels that the user may experience. In those cases, the system can transmit output indicating that the user will experience the satisfaction level corresponding with the highest probability, along with that numerical probability or another useful statistical measure result that provides an indication of the degree of certainty of that outcome. In another embodiment, a plurality of satisfaction levels can be output along with numerical probabilities or other statistical measure results that provide an indication of the degree of certainty of each of those potential outcomes. The output can be transmitted to at least one destination selected from the group consisting of a user, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, and a wireless receiver.
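As a non-limiting illustration of the probability computation described above, the following Python sketch counts pangenetically matched past users per satisfaction level and divides each count by the total number of matched users. The function names, the representation of each past user as an (attribute set, satisfaction level) pair, and the rule that a matched user must possess every relevant attribute are assumptions made for illustration only and are not prescribed by this disclosure.

```python
from collections import Counter


def satisfaction_probabilities(past_users, relevant_attributes):
    """Return {satisfaction_level: probability} over pangenetically matched users.

    past_users: iterable of (attributes, satisfaction_level) pairs (hypothetical layout).
    relevant_attributes: attributes of the current user deemed relevant to the item.
    """
    required = set(relevant_attributes)
    # Assumption: a past user "matches" if they possess every relevant attribute.
    matched_levels = [level for attrs, level in past_users if required <= set(attrs)]
    if not matched_levels:
        return {}
    total = len(matched_levels)
    # One cluster per satisfaction level; probability = cluster count / total matched.
    return {level: n / total for level, n in Counter(matched_levels).items()}


def most_likely_level(probabilities):
    """Return (level, probability) for the highest probability, or None if empty."""
    if not probabilities:
        return None
    return max(probabilities.items(), key=lambda kv: kv[1])


# Hypothetical data: attribute labels and ratings are placeholders.
past_users = [
    ({"attr_A", "attr_B", "attr_C"}, 5),
    ({"attr_A", "attr_B"}, 5),
    ({"attr_A", "attr_B", "attr_D"}, 3),
    ({"attr_A"}, 1),  # lacks attr_B, so not pangenetically matched
]
probs = satisfaction_probabilities(past_users, {"attr_A", "attr_B"})
best = most_likely_level(probs)  # level 5 with probability 2/3
```

When every matched user falls into a single cluster, the returned probability for that level is 1.0, which corresponds to the 100% case noted above.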
A user-centric method of web based item recommendation relies on specification of a product, product class or product category by a user (e.g., consumer), either directly through a keyword query input, a recommendation from a social network or traditional recommender system, selection from a product listing, or selection from a series of dropdown menus (i.e., pull-down menus) which guide the user to make the selection. Based at least in part on a comparison of the user's relevant pangenetic attributes against those of other users that have provided direct or indirect feedback for items similar to the one indicated by the user, the system can 1) identify one or more specific items for consideration by the user, and 2) indicate the likely satisfaction level that the user will experience with each item as well as the associated probabilities, likelihoods, or percent chance that the user will achieve those satisfaction levels. An example of a suitable application for a user-centric item recommendation system is recommendation of music earphones as disclosed previously.
Another exemplary application is a web based restaurant recommendation guide which provides personalized restaurant recommendations based on, for example, both a user's query for a certain type of cuisine (e.g., Chinese, Cuban, French, Italian, Mexican, etc.) and their pangenetic attributes which inherently determine their preferences for certain tastes and smells that at least partially dictate the overall experience that an individual has at a restaurant. The feedback that users provide can even be linked in association with specific dishes on the menus of those restaurants to further refine the recommendation system. By incorporating or interfacing with a social network system that permits the feedback and recommendation system to access the pangenetic profiles of friends and acquaintances, a pangenetic based online restaurant guide can be enabled that is capable of making restaurant recommendations based on the pangenetic traits of all of the individuals in a dining party, thereby arriving at a restaurant recommendation that will best satisfy the innate preferences of all of the members of that party. In one embodiment, the system can accomplish this task by first accessing a pangenetics-item feedback matrix for restaurant and food preferences in order to identify corresponding pangenetic attributes that are relevant to restaurant and food preferences. The system can then access the pangenetic profiles of the individuals of the dining party to derive a consensus set of pangenetic attributes constituting the intersection of relevant pangenetic attributes for restaurant and food preferences that are shared among the individuals in the dining party. 
The pangenetic consensus set of attributes for the dining party is then compared with the pangenetic based item feedback matrix to identify the restaurant having associated pangenetic data that best matches the pangenetic consensus of the dining party, thereby resulting in recommendation of a restaurant that will best satisfy the dining party as a whole. Essentially the same approach can be used in the online selection and/or recommendation of numerous products and services including, but not limited to, alcoholic beverages, music, movies, vacation packages, hobbies and gift selection.
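The consensus step for a dining party can be sketched as follows. This is a minimal illustration only: the function names, the use of Python sets to hold attributes, and in particular the use of Jaccard overlap as the measure of "best match" are assumptions; the disclosure does not specify a particular matching measure.

```python
def consensus_attributes(party_profiles, relevant_attributes):
    """Intersect the relevant attributes with every party member's profile."""
    consensus = set(relevant_attributes)
    for profile in party_profiles:
        consensus &= set(profile)
    return consensus


def best_restaurant(consensus, restaurant_attributes):
    """Return the restaurant whose linked attributes best overlap the consensus.

    restaurant_attributes: {name: attributes linked to high satisfaction}
    (hypothetical layout). Overlap is scored with Jaccard similarity, which is
    an assumed choice of matching measure.
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    if not restaurant_attributes:
        return None
    return max(
        restaurant_attributes,
        key=lambda name: jaccard(consensus, set(restaurant_attributes[name])),
    )


# Hypothetical party of three and two candidate restaurants.
party = [
    {"taste_1", "taste_2", "smell_1", "other_x"},
    {"taste_1", "taste_2", "other_y"},
    {"taste_1", "taste_2", "smell_1"},
]
relevant = {"taste_1", "taste_2", "smell_1"}
shared = consensus_attributes(party, relevant)  # {"taste_1", "taste_2"}
choice = best_restaurant(
    shared,
    {"Restaurant P": {"taste_1", "taste_2"}, "Restaurant Q": {"taste_1", "smell_1"}},
)
```

In this example "Restaurant P" is selected because its linked attributes coincide exactly with the consensus set of the party.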
In one embodiment of a user-centric approach to web based item recommendation, the specific items identified for the user can include just the best choices, or a full range of choices including those identified as inappropriate for the user. By indicating corresponding satisfaction levels to the user and delineating good, average, and poor choices from each other, a user can clearly and quickly see what items will best meet their needs and which will not. Further groupings can be created based on such parameters as price, availability, and retailer rating/reliability. More specifically, a user-centric system receives at least one item preference of the user and accesses the pangenetic profile of the user (i.e., pangenetic data associated with the user). The system then accesses a dataset (e.g., an item feedback matrix dataset) containing a plurality of items matching the at least one item preference of the user, for example, a variety of brands and models of items falling within the broader item category indicated directly or indirectly by the user. Each of the plurality of items can be associated with (correlated with) pangenetic data derived from previous users that had experience with the items (e.g., pangenetic data correlating with good experiences and/or opinions of each of the items). The system performs a comparison between the pangenetic profile of the user and the pangenetic data corresponding to each of the plurality of items (contained within the pangenetic based item feedback matrix) to identify pangenetic matches. Particular items associated with pangenetic data that best matches the pangenetic data of the user can be transmitted as output, and can include associated probable satisfaction levels. The items can be ordered or ranked based on degree of pangenetic match and/or the relative magnitudes of the associated satisfaction levels. 
If one or more of the associated satisfaction levels indicate average or poor satisfaction, for example, the items corresponding to those lower satisfaction levels can be delineated from items predicted to provide high levels of satisfaction using visual or localization cues, such as different locations on a SERP, different coloration, highlighting, or symbols (i.e., markers) such as icons or flags. The output can be transmitted to at least one destination selected from the group consisting of a user, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, and a wireless receiver.
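The ordering and delineation of items in a user-centric result list can be sketched as shown below. The data layout, the definition of match degree as the fraction of an item's linked attributes possessed by the user, and the threshold separating high from lower satisfaction levels are all hypothetical choices made for illustration.

```python
def rank_items(user_profile, items, high_threshold=4):
    """Rank items by degree of pangenetic match, then by satisfaction level.

    items: {name: (linked_attributes, probable_satisfaction_level)} (assumed layout).
    high_threshold: assumed cutoff at or above which an item is marked "recommended".
    """
    user = set(user_profile)
    ranked = []
    for name, (attrs, level) in items.items():
        attrs = set(attrs)
        # Assumed match measure: share of the item's linked attributes the user has.
        match = len(user & attrs) / len(attrs) if attrs else 0.0
        flag = "recommended" if level >= high_threshold else "lower"
        ranked.append({"item": name, "match": match, "level": level, "flag": flag})
    # Highest match first; ties broken by higher satisfaction level.
    ranked.sort(key=lambda r: (r["match"], r["level"]), reverse=True)
    return ranked


# Hypothetical items within one category.
results = rank_items(
    {"a", "b", "c"},
    {"X": ({"a", "b"}, 5), "Y": ({"a", "d"}, 5), "Z": ({"a", "b", "c"}, 2)},
)
order = [r["item"] for r in results]  # ["X", "Z", "Y"]
```

The "flag" field stands in for the visual or localization cues described above, such as separate placement, coloration, or icons.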
In addition to being used for providing item recommendations to users, the disclosed inventions can also be used to predict which online offerings (i.e., webpage items) a user will ultimately choose to interact with or purchase. As such, the methods, systems, databases and software of the instant disclosure can be used for generating predictions of user behavior and user purchases. As previously described, the items represented in a user based item feedback matrix such as that of
When passive data gathering is used to collect data for a behavioral item feedback matrix—wherein passive data gathering entails monitoring users' online behavior to track and record what each user clicks on, opens, reads, plays, views, prints, purchases, recommends, and shares online through the internet—and that data is then correlated with users' pangenetic attributes, a number of different types of predictions can be made about pangenetically similar users including their likelihood of visiting a particular webpage; likelihood of clicking on a hyperlink on a particular webpage; likelihood of clicking on an advertisement on a particular webpage; likelihood of drilling deeper into a website from a landing webpage; likelihood of interacting with audio or video content on a webpage; likelihood of purchasing a product or service offered by a webpage; and likelihood of recommending or forwarding an online offering to someone else. While the term likelihood is used, a variety of statistical association measures can be used for determining level (degree) of certainty or strength of association values including, but not limited to, probability (a.k.a. absolute risk), relative risk, odds (a.k.a. likelihood), and odds ratio (a.k.a. likelihood ratio). Statistical significance of values computed for statistical associations can also be obtained using other statistical measures such as standard error, standard deviation and confidence intervals. Predetermined threshold values can be applied to any of the above in order to limit correlations stored in an item feedback matrix to those that are deemed to have an acceptable or high degree of strength, certainty, and/or statistical significance. 
Additionally, mathematical measures such as the cosine similarity measure, linear regression and slope one regression can be used to identify the most appropriate items to recommend to an individual based on data contained in a behavioral item feedback matrix (i.e., an item feedback matrix).
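For illustration, several of the measures named above can be computed as in the following sketch. The association measures use the standard definitions derived from a 2×2 table of counts (attribute present or absent versus behavior observed or not observed); the function names, argument order, and example counts are hypothetical, and nonzero counts are assumed so that no division by zero occurs.

```python
import math


def association_measures(a, b, c, d):
    """Standard 2x2 association measures (all counts assumed nonzero).

    a: attribute present, behavior observed
    b: attribute present, behavior not observed
    c: attribute absent,  behavior observed
    d: attribute absent,  behavior not observed
    """
    p_with = a / (a + b)      # probability (absolute risk) among attribute carriers
    p_without = c / (c + d)   # probability among non-carriers
    return {
        "probability": p_with,
        "relative_risk": p_with / p_without,
        "odds": a / b,
        "odds_ratio": (a * d) / (b * c),
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical counts: 30 of 40 carriers clicked, 20 of 60 non-carriers clicked.
measures = association_measures(30, 10, 20, 40)
# Hypothetical rating vectors over the same three items.
similarity = cosine_similarity([5, 3, 1], [4, 3, 2])
```

A predetermined threshold can then be applied to any of these values before a correlation is stored in the item feedback matrix.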
Items that are predicted to be of interest to an individual based on the results of one or more of the recommender methods disclosed herein can be used as the basis for going back and selecting pangenetic attributes from the item feedback matrix (those that are correlated with the items of interest), and then associating (linking) those pangenetic attributes with webpages that contain one or more of the items of interest. In one embodiment, correlations between item preferences and pangenetic attributes from an item feedback matrix can be used as the basis for selecting pangenetic attributes for incorporation into web based search indexes and hitlists containing entries that point to webpages containing the items of interest. In one embodiment, a personalized webpage search index can be generated for a user in real time or near real time, upon receiving a user query, by using data and/or results derived from an item feedback matrix. This approach, when conducted with the most recent data available for the current user as well as previous users whose behaviors and preferences comprise the item feedback matrix, has the potential to provide the most relevant and targeted web search results for the current user. As a result, recent trends that cause shifts in correlations between pangenetic makeup and web content can be rapidly detected, predicted and incorporated into personalized webpage searches to generate up-to-date search results having the highest relevance for the user.
One approach for determining pangenetic attributes that correlate (i.e., co-associate, co-occur) with particular web based items, item ratings, and online user behaviors to generate an item feedback matrix can initially involve determining the intersection of pangenetic attributes for every possible combination of pangenetic profiles that can be formed from a set of pangenetic profiles. Briefly, this method requires forming all possible 2-tuple combinations of pangenetic profiles from the set of pangenetic profiles and comparing the pangenetic profiles within each 2-tuple. The largest combination of attributes that occurs within both pangenetic profiles of each 2-tuple is identified and stored as the largest pangenetic attribute combination co-occurring in that 2-tuple. Next, all possible 3-tuple combinations of the pangenetic profiles are formed. For each 3-tuple, the largest pangenetic attribute combination occurring within all three pangenetic profiles of that 3-tuple is identified and stored as the largest pangenetic attribute combination co-occurring in that 3-tuple. Next, 4-tuples are formed and the largest co-occurring pangenetic attribute combination within each 4-tuple is identified. This approach is repeated for progressively larger tuples by simply increasing the n-tuple size by one at each step. Computational burden can be reduced in part by incorporating a requirement that prevents the formation of any (n+1)-tuple combination from an n-tuple combination for which no co-occurring pangenetic attribute combination was identified. With this requirement, the identification of pangenetic combinations is terminated at the point when every n-tuple generated at a particular step is null for possession of at least one co-occurring pangenetic attribute combination (i.e., not one of the newly generated n-tuple combinations contains pangenetic profiles which share at least one pangenetic attribute combination in common).
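The n-tuple method described above, including the requirement that no (n+1)-tuple be formed from an n-tuple having no co-occurring attributes, can be sketched as follows. The function name, the dictionary-of-sets representation of pangenetic profiles, and the strategy of extending each tuple only with higher-numbered profile identifiers (so that each combination is generated once) are assumptions made for illustration.

```python
def largest_shared_combinations(profiles):
    """Map each n-tuple of profile ids (n >= 2) to its largest shared attribute set.

    profiles: {profile_id: set_of_attributes}; ids must be sortable (assumed layout).
    Tuples whose profiles share no attributes are omitted and never extended.
    """
    ids = sorted(profiles)
    sets = {i: frozenset(profiles[i]) for i in ids}
    results = {}
    # Start from 1-tuples so the first pass of the loop forms all 2-tuples.
    current = {(i,): sets[i] for i in ids}
    while current:
        extended = {}
        for members, shared in current.items():
            for j in ids:
                if j <= members[-1]:
                    continue  # keep ids increasing so each combination appears once
                common = shared & sets[j]
                if common:  # pruning: tuples with an empty intersection are dropped
                    extended[members + (j,)] = common
        results.update(extended)
        current = extended  # loop ends when no tuple of the next size is non-empty
    return results


# Hypothetical profiles with placeholder attribute labels.
example = largest_shared_combinations(
    {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"b", "d"}, 4: {"e"}}
)
```

In this example the 2-tuple (1, 2) shares {"a", "b"}, the 3-tuple (1, 2, 3) shares {"b"}, and no 4-tuple is recorded because profile 4 shares nothing with the others.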
The shortcomings of the immediately previous method are two-fold. The first shortcoming relates to the very large number of pangenetic comparisons that may be required in the initial step alone. For example, when comparing 1,000 pangenetic profiles comprising 1 million SNPs per pangenetic profile, 5×10¹¹ individual pangenetic attribute comparisons are required just for the initial step of comparing all possible pairs of the 1,000 pangenetic profiles ((5×10⁵ possible pairings of pangenetic profiles)×(10⁶ attributes per pangenetic profile)=5×10¹¹ individual pangenetic attribute comparisons). If each pangenetic profile contained the full complement of 3 billion nucleotides of whole genomic sequence, then 1.5×10¹⁵ individual pangenetic attribute comparisons would be required in the first step of comparing all possible pairs of pangenetic profiles, resulting in a computationally intensive method requiring a supercomputer. The second shortcoming of this particular method is that it only identifies the largest pangenetic combination that is shared within each n-tuple combination of pangenetic profiles. The method does not enable identification of smaller pangenetic combinations, contained within each largest identified pangenetic combination, which may be responsible for the bulk of the strength of association between the larger pangenetic combinations and an indicated item preference of a user. A smaller pangenetic combination would not be identified by this particular method unless there is at least one individual that possesses only that smaller pangenetic combination in their pangenetic profile without having any of the other attributes that are present in the larger pangenetic combination. The above shortcomings limit the usefulness of this approach for determining pangenetic attribute combinations associated with one or more non-pangenetic attributes and make it a nonpreferred method.
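The comparison counts given above can be checked with a short calculation. The function name below is hypothetical. The exact number of pairings of 1,000 profiles is 499,500, which the text rounds to 5×10⁵, so the exact products come out slightly below the rounded figures of 5×10¹¹ and 1.5×10¹⁵.

```python
from math import comb


def pairwise_attribute_comparisons(n_profiles, attributes_per_profile):
    """Attribute comparisons needed to compare every pair of profiles once."""
    return comb(n_profiles, 2) * attributes_per_profile


pairs = comb(1_000, 2)                                                 # 499,500 pairings
snp_case = pairwise_attribute_comparisons(1_000, 10**6)                # about 5 x 10^11
genome_case = pairwise_attribute_comparisons(1_000, 3 * 10**9)         # about 1.5 x 10^15
```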
It is therefore desirable that a method for determining combinations of pangenetic attributes that correlate with particular items or item ratings be able to identify not only the largest pangenetic combinations shared by pangenetic profiles, but also smaller pangenetic combinations, to determine the smallest and most strongly associated core pangenetic combinations that co-associate with a particular item, item rating, or item rating pattern (i.e., item preferences). A core pangenetic combination can, for example, be defined as the smallest subset of attributes having a statistically significant association with one of those entities. An alternative definition of a core pangenetic combination can be the smallest subset of pangenetic attributes that confers an absolute risk of association above a predetermined threshold. Other definitions of a core pangenetic combination can be formulated, for example, based on needs arising from user implementation, population and sample sizes, statistical constraints, or available computing power. Identification of this core pangenetic combination and its pangenetic attribute content is of great importance because a core pangenetic combination should contain pangenetic attributes that directly correlate with (i.e., are strongly associated with) a particular preference or rating pattern for one or more items.
In one embodiment of a computationally efficient method for determining combinations of pangenetic attributes that correlate with particular items, item ratings, or online user behaviors, the pangenetic attribute combinations are identified without the need for supercomputing, even when evaluating populations comprising millions of individuals and pangenetic profiles each comprising billions of attributes. To help reduce computational burden, a representative subset of pangenetic profiles can be selected from a larger set of profiles. The representative subset of pangenetic profiles can be used to identify candidate pangenetic attribute combinations associated with an item or item rating pattern much more efficiently when the full set of pangenetic profiles being considered is large (e.g., thousands or millions of pangenetic profiles). The selection of a subset of pangenetic profiles can be a random selection or another appropriate and/or statistically valid method of selection. The size of this subset can vary, but for example, can comprise as few as 10 or as many as 100 or more pangenetic profiles. There may be several different core pangenetic attribute combinations associated with a particular item preference or rating pattern for a group of items, for example. In a case where three or fewer core pangenetic attribute combinations are expected to be associated with an item or item rating pattern, as few as 10 randomly selected pangenetic profiles may enable the identification of those pangenetic attribute combinations. If it is expected that more than three core pangenetic attribute combinations are associated with an item or item rating pattern, then selecting a higher number of pangenetic profiles for the subset may be advisable.
In one embodiment of a computationally efficient method for determining pangenetic attribute combinations that correlate with a particular item preference, a beneficial step involves eliminating from consideration those pangenetic attributes which show association with both satisfaction and dissatisfaction for the item, and therefore cannot specifically correlate with item satisfaction over item dissatisfaction. This can be accomplished by comparing a subset of pangenetic profiles associated with item satisfaction to an appropriately selected (e.g., randomly selected) subset of pangenetic profiles associated with item dissatisfaction to eliminate pangenetic attributes that co-occur at a high frequency in association with item dissatisfaction (at a frequency of 80% or greater, for example) and are therefore unlikely to have a direct positive correlation with the desired item or rating pattern. Failure to eliminate these pangenetic attributes may add complexity to a pangenetic attribute combination without increasing its strength of correlation with the desired item or rating pattern, thereby reducing the certainty and accuracy of predictions and recommendations that are based on those pangenetic attribute combinations. It is therefore advantageous to eliminate these pangenetic attributes in an initial step so that the core pangenetic attribute combinations can be determined as quickly, efficiently and accurately as possible. While not absolutely required, this approach greatly increases efficiency when comparing numerous pangenetic profiles each containing large numbers of attributes, as for example when processing whole genomic attribute profiles of a large population where each pangenetic profile can contain 6 billion nucleotide attributes which on average will be 99.9% identical between any given pair of individuals. The subset of pangenetic attributes identified by this approach can be referred to as a set of candidate pangenetic attributes. 
A set of candidate pangenetic attributes can be further processed to identify combinations of the candidate pangenetic attributes that correlate with the item or rating pattern of interest as described below.
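The elimination step described above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: it assumes each pangenetic profile is represented as a set of attribute identifiers, and the function name `candidate_attributes` and the parameter `max_negative_freq` are hypothetical names chosen for this example (the 80% default follows the example frequency given above).

```python
from collections import Counter


def candidate_attributes(satisfied, dissatisfied, max_negative_freq=0.8):
    """Return attributes seen in the satisfied profiles, excluding any that
    occur in at least `max_negative_freq` of the dissatisfied profiles.

    Assumption: each profile is a set of hashable attribute identifiers.
    """
    # Count in how many dissatisfied profiles each attribute appears.
    counts = Counter(attr for profile in dissatisfied for attr in profile)
    total = len(dissatisfied)
    common_in_negative = {
        attr for attr, count in counts.items()
        if total and count / total >= max_negative_freq
    }
    seen_in_positive = set().union(*satisfied) if satisfied else set()
    return seen_in_positive - common_in_negative


satisfied = [{"a", "b", "c"}, {"a", "b", "d"}]
dissatisfied = [{"a", "x"}, {"a", "y"}, {"a", "b"}, {"a"}, {"a", "c"}]
print(sorted(candidate_attributes(satisfied, dissatisfied)))
```

In this toy data, attribute "a" occurs in every dissatisfied profile and is therefore dropped, while "b", "c" and "d" remain as candidate attributes.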
In a further embodiment of a computationally efficient method for compiling co-associating attributes, a divide-and-conquer approach can be used to greatly increase the efficiency of identifying pangenetic attribute combinations that are associated with an item preference. This approach partitions (i.e., subdivides, divides, or segments) a set of pangenetic profiles into subsets of pangenetic profiles, each subset comprising those pangenetic profiles that share the most pangenetic attributes in common. Each iteration of the divide-and-conquer approach partitions the set (or subset) of pangenetic profiles associated with the item preference of interest into at least two subsets, and multiple iterations can be used to generate additional subsets. The pangenetic profiles that comprise each subset are evaluated to identify the largest pangenetic attribute combination that they share in common. Initially a first pangenetic profile is selected from the set of pangenetic profiles associated with the item preference of interest. As an example using a set of 10 pangenetic profiles, a first pangenetic profile is selected from the set of 10 pangenetic profiles. This first pangenetic profile, pangenetic profile #1, can then be used in a series of pairwise comparisons with each of the other pangenetic profiles in the set. In a preferred embodiment, all possible pairwise comparisons of the first pangenetic profile with the other pangenetic profiles are performed. In this example, the possible pairings are {#1, #2}, {#1, #3}, {#1, #4}, {#1, #5}, {#1, #6}, {#1, #7}, {#1, #8}, {#1, #9}, and {#1, #10}, for a total of nine pairwise pangenetic profile comparisons. If each of the 10 individuals has an associated pangenetic profile consisting of 10^6 pangenetic attributes, then this example would require performing 9×10^6 individual attribute comparisons (9 paired pangenetic profiles×10^6 attributes per pangenetic profile).
Sets of attributes (i.e., pangenetic attribute combinations) constituting the intersection in content between the two pangenetic profiles of each pairwise comparison can be stored to generate a first set of pangenetic attribute combinations, wherein each pangenetic attribute combination can be stored in association with the pair of pangenetic profiles from which it was generated. The largest pangenetic attribute combination occurring in the first set of pangenetic attribute combinations can be identified and referred to as the primary pangenetic attribute combination. As an example, if the largest intersection of attributes occurs in the paired comparison {#1, #4}, then this intersection produces the primary pangenetic attribute combination for the set of pangenetic profiles #1-#10 under consideration. This primary pangenetic attribute combination can serve as the basis for partitioning the set of pangenetic profiles into subsets of pangenetic profiles, one of which can include pangenetic profiles that are most similar to #1 and #4. This is achieved by using the primary pangenetic attribute combination in a series of pairwise comparisons with each of the other pangenetic attribute combinations previously stored in the first set of pangenetic attribute combinations. Sets of attributes constituting the intersection in content between the two pangenetic attribute combinations of each pairwise comparison are stored to generate a second set of pangenetic attribute combinations, wherein each pangenetic attribute combination is stored in association with the three corresponding pangenetic profiles from which it was generated.
Continuing from the example above, by using the primary pangenetic attribute combination corresponding to {#1, #4} in pairwise comparisons with each of the other pangenetic attribute combinations in the first set corresponding to {#1, #2}, {#1, #3}, {#1, #5}, {#1, #6}, {#1, #7}, {#1, #8}, {#1, #9}, and {#1, #10}, the resulting eight intersections of attributes corresponding to the triplets of pangenetic profiles {#1, #2, #4}, {#1, #3, #4}, {#1, #4, #5}, {#1, #4, #6}, {#1, #4, #7}, {#1, #4, #8}, {#1, #4, #9}, and {#1, #4, #10} can be stored as a second set of pangenetic attribute combinations. The set of 10 pangenetic profiles can then be divided (i.e., partitioned) into at least two pangenetic profile subsets based on the sizes of the pangenetic attribute combinations in the second set as compared with the size of the primary pangenetic attribute combination. More specifically, the pangenetic profiles which correspond to pangenetic attribute combinations in the second set of pangenetic attribute combinations that are equal to or larger than a predetermined fraction of the size of the primary pangenetic attribute combination, for example those that are at least 50% of the size of the primary pangenetic attribute combination, can be assigned to a first subset of pangenetic profiles, while the pangenetic profiles corresponding to the remaining pangenetic attribute combinations which are less than the predetermined fraction of the size of the primary pangenetic attribute combination, for example those that are less than 50% of the size of the primary pangenetic attribute combination, can be assigned to a second subset of pangenetic profiles. By doing this, the pangenetic profiles that are most similar to the two pangenetic profiles which generated the primary pangenetic attribute combination in the current iteration are clustered together into the first subset of pangenetic profiles. 
The choice of 50% as the predetermined fraction of the size of the primary pangenetic attribute combination is arbitrary in these examples, and can be adjusted higher or lower to respectively increase or decrease the degree of similarity desired of pangenetic profiles that are partitioned into a subset. As such, the predetermined fraction of the size of the primary pangenetic attribute combination essentially acts as a stringency parameter for including and excluding pangenetic profiles from the subsets, and it can have substantial influence on the number of pangenetic profiles partitioned into each subset, as well as the number of subsets that will ultimately be formed.
Continuing with the above example in which the primary pangenetic attribute combination was derived from comparison of pangenetic profiles #1 and #4, the first subset will include pangenetic profiles #1 and #4 as well as any other pangenetic profiles that correspond with pangenetic attribute combinations in the second set that are at least 50% of the size of that primary pangenetic attribute combination. For this example, assume that pangenetic profile triplets {#1, #4, #6} and {#1, #4, #9} are associated with pangenetic attribute combinations in the second set that are equal to or greater than 50% of the size of the primary pangenetic attribute combination. Pangenetic profiles #6 and #9 would therefore be included in the first subset of pangenetic profiles along with pangenetic profiles #1 and #4 (first subset={#1, #4, #6, #9}). Pangenetic profiles #2, #3, #5, #7, #8, and #10 on the other hand are assigned to the second subset because they each share less than 50% of the attributes in common with the primary pangenetic attribute combination. The above is illustrated graphically in
The pangenetic profiles in the second subset can then be processed through a reiteration of the method, where the second subset can be redesignated as the subset of pangenetic profiles, a new first pangenetic profile can be selected from this subset of pangenetic profiles, a new first set of pangenetic attribute combinations can be generated from pairwise comparison of the first pangenetic profile with the other pangenetic profiles of this subset, a new primary pangenetic attribute combination can be determined, a new second set of pangenetic attribute combinations can be generated from the pairwise comparison of the primary pangenetic attribute combination with the other pangenetic attribute combinations in the first set of pangenetic attribute combinations, and the current subset of pangenetic profiles can be divided into a new first subset and a new second subset based on the comparison of each of the pangenetic attribute combinations in the second set with the primary pangenetic attribute combination. The largest pangenetic attribute combination occurring in all the pangenetic profiles of the new first subset can be stored as a candidate pangenetic attribute combination in the set of candidate pangenetic attribute combinations. Reiteration can continue in this manner, beginning with the current second subset redesignated as the subset of pangenetic profiles, until an iteration is reached where a new second subset containing one or more pangenetic profiles cannot be formed (i.e., the new second subset formed is an empty/null set).
To exemplify this reiteration process continuing with the pangenetic profiles from the above example, the second subset comprising pangenetic profiles #2, #3, #5, #7, #8, and #10 is redesignated as the subset of pangenetic profiles, and pangenetic profile #2 can be selected as a first pangenetic profile for this subset. The selected pangenetic profile #2 is then used to determine the attribute intersections of the five pairwise pangenetic profile comparisons corresponding to {#2, #3}, {#2, #5}, {#2, #7}, {#2, #8}, and {#2, #10}. Assuming pangenetic profiles #5 and #10 are found to cluster with pangenetic profile #2 as a result of evaluating the intersection in attribute content of the pairwise comparisons as described above, partition of this subset of pangenetic profiles creates a new first subset containing pangenetic profiles #2, #5 and #10, and a new second subset containing pangenetic profiles #3, #7, and #8. The largest pangenetic attribute combination corresponding to the intersection of pangenetic profiles #2, #5 and #10 is stored as a candidate pangenetic attribute combination in the set of candidate pangenetic attribute combinations. Reiterative processing of the second subset comprising pangenetic profiles #3, #7 and #8 proceeds with pangenetic profile #3 selected as the first pangenetic profile, which is then used to perform the two pairwise comparisons {#3, #7} and {#3, #8}. Assuming a comparison finds these three pangenetic profiles to cluster together, no new second subset can be generated. The largest pangenetic attribute combination corresponding to the intersection of pangenetic profiles #3, #7 and #8 is stored as a candidate pangenetic attribute combination in the set of candidate pangenetic attribute combinations. 
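The partitioning and reiteration described above can be sketched in Python. This is an illustrative sketch under stated assumptions rather than a definitive implementation: profiles are assumed to be sets of attribute identifiers keyed by a profile identifier, the first profile in iteration order is used as the selected first profile, and the names `partition_once`, `divide_and_conquer` and `fraction` are hypothetical.

```python
def partition_once(profiles, fraction=0.5):
    """One iteration: split `profiles` (dict of id -> set of attributes) into
    a first subset clustered around the primary combination and a second
    subset holding the remaining profiles."""
    ids = list(profiles)
    first = ids[0]
    if len(ids) == 1:
        return [first], [], set(profiles[first])
    # First set: intersection of the first profile with every other profile.
    pairwise = {other: profiles[first] & profiles[other] for other in ids[1:]}
    # Primary combination: the largest of those pairwise intersections.
    partner = max(pairwise, key=lambda other: len(pairwise[other]))
    primary = pairwise[partner]
    first_subset, second_subset = [first, partner], []
    for other, combo in pairwise.items():
        if other == partner:
            continue
        # Second set: intersection of the primary combination with each
        # remaining pairwise combination (a triplet of profiles).
        triplet = primary & combo
        if len(triplet) >= fraction * len(primary):
            first_subset.append(other)
        else:
            second_subset.append(other)
    # Candidate: largest combination shared by all profiles of the first subset.
    candidate = set.intersection(*(profiles[i] for i in first_subset))
    return first_subset, second_subset, candidate


def divide_and_conquer(profiles, fraction=0.5):
    """Reiterate on the second subset until no second subset can be formed."""
    remaining = dict(profiles)
    results = []
    while remaining:
        first_subset, second_subset, candidate = partition_once(remaining, fraction)
        results.append((first_subset, candidate))
        remaining = {i: remaining[i] for i in second_subset}
    return results


toy_profiles = {
    1: {"a", "b", "c", "d", "p1"},
    2: {"e", "f", "g", "a"},
    3: {"h", "i", "p3"},
    4: {"a", "b", "c", "d", "p4"},
    5: {"e", "f", "g", "p5"},
    6: {"a", "b", "c", "p6"},
    7: {"h", "i", "p7"},
}
for subset, candidate in divide_and_conquer(toy_profiles):
    print(subset, sorted(candidate))
```

With this toy data the profiles fall into three subsets, each yielding one candidate attribute combination. The `fraction` parameter plays the role of the stringency parameter discussed above.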
Frequencies of occurrence of each of the candidate pangenetic attribute combinations that were generated and stored in the set of candidate pangenetic attribute combinations can be determined for a set of pangenetic profiles associated with a particular item preference (i.e., a query-attribute-positive set) and in a set of pangenetic profiles that are not associated with a particular item preference (i.e., a query-attribute-negative set) so that strength of association of the candidate pangenetic attribute combinations with the item preference (i.e., the query attribute) can be determined and used as desired for other methods.
By clustering the pangenetic profiles into subsets, the divide-and-conquer approach substantially increases efficiency because no comparisons of pangenetic profiles are performed across subsets. Consequently, the number of pangenetic profile comparisons required by the divide-and-conquer approach is much less than that required by just the first step of the nonpreferred method described previously which compares all possible combinations of pangenetic profiles that can be formed from a set of pangenetic profiles. To demonstrate this, consider again the above example which used the divide-and-conquer approach to partition a set of 10 pangenetic profiles into three nearly equally sized subsets of pangenetic profiles to generate three candidate pangenetic attribute combinations. That example required a total of 16 pairwise comparisons of pangenetic profiles over three iterations (9+5+2=16). In contrast, the nonpreferred method would require 45 pairwise comparisons of pangenetic profiles in its first step (10 choose 2=45). When processing a much larger set, for example a set of 1,000 pangenetic profiles, the divide-and-conquer approach would require 1,996 pairwise pangenetic profile comparisons in a scenario in which the 1,000 pangenetic profiles cluster into three nearly equally sized subsets of pangenetic profiles (999+665+332=1,996), while the nonpreferred method would require 499,500 pairwise comparisons in its first step (1,000 choose 2=499,500). Therefore, as the number of pangenetic profiles in the initial set increases, the computational burden of the divide-and-conquer approach increases linearly, while the computational burden of the nonpreferred method increases exponentially. This represents a tremendous advantage in computational efficiency of the divide-and-conquer approach. 
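The comparison counts quoted above can be reproduced with a short Python check. The function name `divide_and_conquer_comparisons` and the exact subset sizes (4, 3, 3 and 334, 333, 333) are assumptions chosen to match the "nearly equally sized" subsets of the examples.

```python
from math import comb


def divide_and_conquer_comparisons(subset_sizes):
    """Pairwise profile comparisons needed when the profiles cluster into
    subsets of the given sizes, peeled off one subset per iteration."""
    remaining = sum(subset_sizes)
    total = 0
    for size in subset_sizes:
        # The selected first profile is compared with every other remaining profile.
        total += remaining - 1
        remaining -= size
    return total


print(divide_and_conquer_comparisons([4, 3, 3]), comb(10, 2))
print(divide_and_conquer_comparisons([334, 333, 333]), comb(1000, 2))
```

For a fixed number of subsets the divide-and-conquer count grows in proportion to the number of profiles, whereas the all-pairs count n(n-1)/2 of the first step of the nonpreferred method grows with the square of n.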
While methods for determining co-occurring attribute combinations are primarily described herein with respect to pangenetic attributes and pangenetic profiles, they equally apply to non-pangenetic attributes and non-pangenetic attribute profiles, as well as attribute profiles that contain both non-pangenetic attributes and pangenetic attributes.
In one embodiment, a plurality of sets of attributes (e.g., pangenetic profiles) are evaluated and clustered into subsets according to the divide-and-conquer approach described herein, wherein the subsets formed can be mapped to a first half and second half of the plurality of sets of attributes by clustering the two most similar attribute sets with other attribute sets that are highly similar to those two. Alternatively, other clustering methods which look for similarities and which provide a basis for aggregation of attributes can be used (e.g., seeding). In one embodiment all attributes are given binary values (present or not present) and the clustering is performed based on the presence of combinations of attributes within the group of pangenetic profiles associated with the item preference specified. In an alternate embodiment some attributes are continuous or multi-valued (e.g. obesity) and described on a continuous value or discrete multi-valued basis. A number of clustering algorithms, including but not limited to K-means clustering, as well as determination of similarity measures including geometric distance or angles can be used to determine one or more of the subsets. Additionally, seeding techniques can be used to generate subsets, for example by requiring that one or more pangenetic profiles that nucleate formation of one or more subsets contain a minimal specified or predetermined set of attributes (i.e., a core pangenetic attribute combination). In one embodiment, if a particular attribute or set of attributes is known to be causally associated with a particular outcome (i.e., an item preference), that attribute or set of attributes can be used as the basis for clustering attributes, pangenetic profiles, and/or individuals into subsets (clusters).
Each candidate pangenetic attribute combination generated by the divide-and-conquer approach constitutes the largest combination of attributes occurring within all of the pangenetic profiles of a particular subset of pangenetic profiles. As explained previously, the largest pangenetic attribute combination identified may contain smaller combinations of attributes (i.e., core pangenetic attribute combinations) that also co-associate with specified item preference. A further embodiment of a computationally efficient method for compiling co-associating attributes is able to identify core pangenetic attribute combinations, contained within a larger candidate pangenetic attribute combination for example, using a top-down approach. These smaller core pangenetic attribute combinations, by virtue of the way in which they are identified, can contain attributes which are the most essential attributes for contributing to co-association with the item preference. Candidate pangenetic attribute combinations determined by the divide-and-conquer approach are preferably used as the starting point for identifying core pangenetic attribute combinations. The following top-down approach to identifying a core pangenetic attribute combination begins with generating subcombinations of attributes selected from a candidate pangenetic attribute combination, the number of attributes in each subcombination being less than that of the candidate pangenetic attribute combination. In one embodiment, the number of attributes in each attribute subcombination is one less than the candidate pangenetic attribute combination from which the attributes are selected. In a further embodiment, all possible attribute subcombinations containing one less attribute than the candidate pangenetic attribute combination are generated, so that for every attribute comprising the candidate pangenetic attribute combination there will be exactly one attribute subcombination generated which lacks that attribute. 
The frequencies of occurrence of each of the candidate pangenetic attribute combinations and attribute subcombinations can be determined in the set of pangenetic profiles associated with the specified item preference (i.e., the query-attribute-positive group) and in the set of pangenetic profiles that are not associated with specified item preference (i.e., the query-attribute-negative group), and based on the frequencies of occurrence, each subcombination having a lower strength of association with the specified item preference than the candidate pangenetic attribute combination from which it was generated is identified. A lower strength of association would be expected to result from an increased frequency of occurrence, in the query-attribute-negative set of pangenetic profiles, of the attribute subcombination relative to the candidate pangenetic attribute combination from which it was generated. Because each attribute subcombination is missing at least one attribute relative to the candidate pangenetic attribute combination from which it was generated, a missing attribute can be readily identified as a core attribute responsible for the lower strength of association since it constitutes the only difference between the attribute subcombination and the candidate pangenetic attribute combination. By evaluating all of the attribute subcombinations that are generated from a particular candidate pangenetic attribute combination with respect to strength of association with the specified item preference as above, a set of attributes constituting a core pangenetic attribute combination can be identified. The identified core attributes can be stored as candidate attributes, or as a combination of candidate attributes (i.e., a candidate pangenetic attribute combination). 
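The top-down identification of core attributes can be sketched as follows. This is a minimal sketch under assumptions: profiles are sets of attribute identifiers, and strength of association is approximated by the difference between the frequency of occurrence in the query-attribute-positive group and in the query-attribute-negative group. That measure, and the names `frequency`, `strength` and `core_attributes`, are illustrative choices; the specification does not prescribe a particular statistic here.

```python
def frequency(combo, profiles):
    """Fraction of profiles that contain every attribute in `combo`."""
    if not profiles:
        return 0.0
    return sum(combo <= profile for profile in profiles) / len(profiles)


def strength(combo, positive, negative):
    """Illustrative strength of association (an assumption of this sketch):
    positive-group frequency minus negative-group frequency."""
    return frequency(combo, positive) - frequency(combo, negative)


def core_attributes(candidate, positive, negative):
    """Attributes whose removal from `candidate` lowers the strength of
    association, using subcombinations that each lack exactly one attribute."""
    base = strength(candidate, positive, negative)
    return {
        attr for attr in candidate
        if strength(candidate - {attr}, positive, negative) < base
    }


candidate = {"a", "b", "c"}
positive = [{"a", "b", "c"}, {"a", "b", "c", "x"}, {"a", "b", "c"}]
negative = [{"a", "c"}, {"c", "y"}, {"a", "c", "z"}, {"b", "c"}]
print(sorted(core_attributes(candidate, positive, negative)))
```

Here removing "a" or "b" lets the subcombination match negative-group profiles, lowering its strength, so both are flagged as core attributes. Removing "c" changes nothing, so "c" is not.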
Various combinations of the core attributes can be independently evaluated for frequencies of occurrence and strength of association with the specified item preference to determine a set containing even smaller pangenetic attribute combinations comprised of subsets of core attributes, each of these even smaller core pangenetic attribute combinations potentially having very different strengths of association with the specified item preference. When compiled into pangenetic attribute combination databases, these numerous small core pangenetic attribute combinations can enable methods of predisposition prediction and predisposition modification to provide considerably more accurate, comprehensive, flexible and insightful results.
In another embodiment of a computationally efficient method for compiling co-associating attributes, a bottom-up approach is used for determining pangenetic attribute combinations that are associated with an item preference. This bottom-up approach generates sets of attributes in stages, starting with small pangenetic attribute combinations and progressively building on those to generate larger and larger pangenetic attribute combinations. At each stage, only the pangenetic attribute combinations that are determined to be statistically associated with the specified item preference are used as building blocks for the next stage of generating larger pangenetic attribute combinations. The attributes used for generating these pangenetic attribute combinations can be selected from a pangenetic profile, from a pangenetic attribute combination, from a set of candidate attributes, or from a candidate pangenetic attribute combination, for example. At each stage, all of the pangenetic attribute combinations that are generated contain the same number of attributes, and can therefore be referred to as a set of n-tuple combinations of attributes, where n is a specified positive integer value designating the number of attributes in each n-tuple combination of attributes. This method can be used for de novo identification of pangenetic attribute combinations that are statistically associated with an item preference, as well as for identifying one or more core pangenetic attribute combinations from a previously identified candidate pangenetic attribute combination. The method can begin by generating n-tuples of any chosen size, size being limited only by the number of attributes present in the pangenetic profile, pangenetic attribute combination, or set of attributes from which attributes are selected for generating the n-tuple combinations.
However, it is preferable to begin with small size n-tuple combinations if using this bottom-up approach for the de novo identification of pangenetic attribute combinations because this method typically requires generating all possible n-tuple combinations for the chosen starting value of n in the first step. If the n-tuple size chosen is too large, an unmanageable computational problem can be created. For example, if n=50 is chosen as the starting n-tuple size with a set of 100 attributes, all possible 50-tuple combinations from the 100 attributes would number approximately 1×10^29, which is unmanageable even with current supercomputing power. Therefore, it is more reasonable to choose 2-tuple, 3-tuple, 4-tuple, or 5-tuple sized combinations to start with, depending on the size of the set of attributes from which the n-tuple combinations will be generated and the amount of computing time and computer processor speed available. Once a first set of n-tuple combinations of attributes is generated, frequencies of occurrence are determined for each n-tuple combination in a set of pangenetic profiles associated with the specified item preference and in a set of pangenetic profiles that is not associated with the specified item preference. Each n-tuple combination that is statistically associated with the specified item preference is identified based on the frequencies of occurrence and stored in a compilation containing pangenetic attribute combinations that are associated with that item preference. If no n-tuple combinations are determined to be statistically associated with the item preference specified, the value of n can be incremented by one and the method can be reiterated, beginning at the first step, for the larger n-tuple size.
If, on the other hand, at least one n-tuple was determined to be statistically associated with the specified item preference and stored in the compilation, a set of (n+1)-tuple combinations is generated by combining each stored n-tuple combination with each attribute in the set of attributes that does not already occur in that n-tuple (combining an n-tuple with an attribute from the set that already occurs in that n-tuple would create an (n+1)-tuple containing an attribute redundancy, which is undesirable). Next, frequencies of occurrence of the (n+1)-tuple combinations are determined and those (n+1)-tuple combinations which have a higher strength of association with the specified item preference than the n-tuple combinations from which they were generated are stored in the compilation containing pangenetic attribute combinations that are associated with the specified item preference. Storing an (n+1)-tuple combination that does not have a higher strength of association with the specified item preference than the n-tuple combination from which it is generated effectively adds a pangenetic attribute combination to the compilation which contains an additional attribute that is not positively associated with the specified item preference, something that is undesirable. Provided at least one (n+1)-tuple combination has a stronger statistical association with the specified item preference and was stored, the value of n is incremented by one and a next iteration of the method is performed, so that the (n+1)-tuple combinations generated during the current iteration become the n-tuple combinations of the next iteration.
By generating progressively larger n-tuple combinations at each iteration and storing those that have increasingly stronger statistical association with the specified item preference than the ones before, a compilation of pangenetic combinations that are associated with the specified item preference is generated which can be used effectively for methods of web search, web item recommendation, and user satisfaction and behavior prediction.
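The bottom-up approach can be sketched in Python as follows. The sketch rests on several assumptions that are not part of the specification: profiles are sets of attribute identifiers, strength of association is approximated by a difference of frequencies between the positive and negative groups, and "statistically associated" is approximated by a hypothetical threshold `min_strength`. The helper `count_n_tuples` simply reports how many n-tuples the first step must generate.

```python
from itertools import combinations
from math import comb


def count_n_tuples(num_attributes, n):
    """Number of distinct n-tuple combinations the first step must generate."""
    return comb(num_attributes, n)


def frequency(combo, profiles):
    """Fraction of profiles that contain every attribute in `combo`."""
    return sum(combo <= p for p in profiles) / len(profiles) if profiles else 0.0


def strength(combo, positive, negative):
    """Illustrative measure (an assumption): frequency difference between groups."""
    return frequency(combo, positive) - frequency(combo, negative)


def bottom_up(attributes, positive, negative, start_n=2, min_strength=0.5):
    """Build progressively larger combinations, keeping only those with a
    higher strength of association than the combination they grew from."""
    attributes = sorted(attributes)
    n, current = start_n, set()
    # De novo step: increment n until at least one n-tuple is associated.
    while n <= len(attributes) and not current:
        current = {
            frozenset(c) for c in combinations(attributes, n)
            if strength(frozenset(c), positive, negative) >= min_strength
        }
        n += 1
    compilation = set(current)
    while current:
        larger = set()
        for combo in current:
            base = strength(combo, positive, negative)
            for attr in attributes:
                if attr in combo:  # skip attributes already in the n-tuple
                    continue
                grown = combo | {attr}
                if strength(grown, positive, negative) > base:
                    larger.add(grown)
        compilation |= larger
        current = larger  # the (n+1)-tuples become the next iteration's n-tuples
    return compilation


positive = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b"}]
negative = [{"a"}, {"b"}, {"a", "b"}, {"c", "d"}, {"a", "c"}]
found = bottom_up("abcd", positive, negative)
print(sorted("".join(sorted(c)) for c in found))
print(count_n_tuples(100, 50), count_n_tuples(100, 3))
```

Starting with 50-tuples from 100 attributes would require on the order of 10^29 combinations, whereas starting with 3-tuples requires only 161,700, which illustrates why small starting sizes are preferred.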
Confidentiality with respect to personal pangenetic data can be a major concern to individuals that submit their data for use in the disclosed inventions. Embodiments exist in which the identity of an individual can be linked directly or indirectly to their data, masked, anonymized, or provided only by privileged access or through authorization procedures, including but not limited to the embodiments which follow.
In one embodiment the identities of individuals are linked to their pangenetic profiles. In one embodiment the identities of individuals are linked directly to their pangenetic profiles. In one embodiment the identities of individuals are linked indirectly to their pangenetic profiles. In one embodiment the identities of individuals are anonymously linked to their pangenetic profiles. In one embodiment the identities of individuals are linked to their pangenetic profiles using a nondescriptive alphanumeric identifier. In one embodiment the identities of individuals are linked to their pangenetic profiles using a nondescriptive non-alphanumeric identifier. In one embodiment the identities of individuals are linked to the pangenetic attributes they possess as stored in one or more datasets of the methods. In one embodiment the linkage of identity is direct. In one embodiment the linkage of identity is indirect. In one embodiment the linkage of identity requires anonymizing or masking the identity of the individual. In one embodiment the linkage of identity requires use of a nondescriptive alphanumeric or non-alphanumeric identifier.
In one embodiment, an authorization granting access to the pangenetic data can be generated, transmitted and/or authenticated if user input is supplied in the form of at least one combination of characters that matches at least one combination of characters (e.g., a user_ID, password, passphrase, passcode, or PIN) previously stored in association with the user, each of the characters being selected from the group consisting of alphanumeric characters and non-alphanumeric characters. For additional security, the combination of characters stored in association with the user can be stored as a cryptographic hash. In another embodiment, the authorization granting access to the pangenetic data can be generated if user input is supplied in the form of at least one combination of characters that matches at least one combination of randomly selected characters (e.g., automatically generated single-use passwords, and CAPTCHA and reCAPTCHA passwords) generated by software that interacts with the authorization interface, each of the characters being selected from the group consisting of alphanumeric characters and non-alphanumeric characters. In another embodiment, the authorization granting access to the pangenetic data can be generated if user input is supplied in the form of biometric data that matches biometric data previously stored in association with the user.
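Matching user input against a combination of characters stored as a cryptographic hash can be sketched with the Python standard library. This is one possible approach under assumptions, not the specified mechanism: the choice of PBKDF2 with SHA-256, the per-user random salt, the iteration count, and the names `make_record` and `is_authorized` are all illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def make_record(secret: str):
    """Store a salted hash of the secret instead of the secret itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def is_authorized(supplied: str, record) -> bool:
    """Hash the supplied characters with the stored salt and compare in
    constant time against the stored hash."""
    salt, digest = record
    attempt = hashlib.pbkdf2_hmac("sha256", supplied.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt, digest)


record = make_record("correct horse battery staple")
print(is_authorized("correct horse battery staple", record))
print(is_authorized("wrong passphrase", record))
```

Only the salt and hash are retained, so the stored record does not reveal the user's password, passphrase or PIN.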
In one or more embodiments, data masks can be used in the present inventions to block access, reading and/or transmission of at least a portion of the data (i.e., data profile) associated with one or more users. Any type of pangenetic and non-pangenetic data can potentially be masked using data masks. Pangenetic data that can be masked includes, but is not limited to, individual attributes such as nucleotide identities within full or partial genomic sequence, SNP identities contained in genome scans, individual epigenetic modifications, epigenetic patterns (i.e., motifs), genetic or epigenetic regulated gene expression patterns (which can be tissue specific), individual genetic mutations, genetic mutation rates, telomere length (a marker of age and the rate of senescence), and occurrences of genome integrated viruses and virus sequences (such as occurrences of integration of HIV virus into the human genome). A user may want portions of their pangenetic data to be masked to ensure that certain confidential regions cannot be accessed or read by the other users and entities, including the pangenetic web search, recommendation, and prediction system. Confidential regions may include, for example, particular genetic sequences or epigenetic patterns that can reveal the individual's present health conditions, their susceptibilities toward acquiring particular diseases in the future (i.e., disease predispositions), or their predicted lifespan (i.e., longevity predisposition). Also, in instances where a consumer appoints someone else as a user to employ applications of the disclosed invention which use the consumer's pangenetic data, the consumer may want to keep the majority of their pangenetic information inaccessible and only permit access to the minimum amount of pangenetic data necessary for the particular application (e.g., an insurer or administrator looking up information on behalf of the consumer or requesting recommendations for the consumer). 
However, it should be noted that increased masking of pangenetic attributes may result in decreased certainty and accuracy of search results, recommendations and predictions by the pangenetic based web system.
To enable both individualized and application dependent control of pangenetic data access, one or more data masks (i.e., pangenetic data masks, non-pangenetic data masks) can be used to control access, reading and/or transmission of certain data attributes as specified by an authorized user. In one embodiment, one or more data masks can be associated with (i.e., linked to) one or more sets of data or a data profile (i.e., a pangenetic profile or a non-pangenetic profile) associated with a user. The data masks can be further linked to identifiers of other particular users, such as individuals (e.g., friends, acquaintances, business contacts, secondary users) and organizations (e.g., product and service providers) interacting with or acting on behalf of the primary user, and/or they can be associated with particular queries or particular applications (certain web search engine sites or online shopping websites, for example). The data mask can be pre-approved by the consumer associated with the pangenetic data being masked, or the data mask can be pre-approved by a pangenetic based system that had previously identified a minimum set of pangenetic attributes required for accurate and reliable pangenetic based search, recommendation or prediction. When a user, application, website or system attempts to access the consumer's data, the appropriate mask will be applied to ensure access or transmission of only those portions of the consumer's data for which permission is granted. In another embodiment, data masks can be applied selectively in association with particular queries or applications, without regard to the particular entity (e.g., user, organization, computer system) that is accessing the consumer's data to implement those queries or applications.
Generally, pangenetic data masks that are associated with particular users or applications can provide the added benefit of increasing processing efficiency of the disclosed methods by streamlining access and/or reading of consumer data attributes to only the designated portions of their data considered relevant to the particular user, query or application. In one embodiment, a data mask associated with a particular user and a data mask associated with a particular query or application can be applied simultaneously when accessing a consumer's data profile (and can span one or more data records of a data profile). In one or more embodiments, the user approves the data masks that are applied to their pangenetic and/or non-pangenetic data.
In one or more embodiments, a consensus mask (consensus data mask) can be generated from two or more data masks and used to limit access to a portion of the data represented by the intersection between those two or more data masks. In one embodiment, the consensus mask can be a data mask representing a consensus between a plurality of data masks with respect to which data should be unmasked. In another embodiment, a consensus mask can be a data mask that represents a set of attributes (i.e., attribute positions or identifiers, data record positions or identifiers) that a plurality of data masks all agree are permissible for access, reading and/or transmission. In the embodiment disclosed above which describes the simultaneous application of two or more data masks (at least one data mask associated with a consumer or user, and at least one data mask associated with a query or application), a consensus mask can be generated from the intersection of those two or more data masks and applied when accessing and/or transmitting the individual's data, effectively achieving the same result as the simultaneous application of the two or more separate data masks. In one embodiment, the simultaneous application of two or more data masks comprises the generation and application of a consensus mask. Consensus data masks can be applied to the pangenetic and non-pangenetic profiles of an individual.
A consensus mask can also be generated and used in circumstances of pangenetic profiling where, for example, two or more individuals have chosen to make at least a portion of their pangenetic data inaccessible using pangenetic data masks, but those pangenetic masks differ from each other. A consensus mask can be generated from the intersection of the differing data masks and then applied to the data profiles of all of the individuals being considered in that particular instance. With respect to pangenetic data for example, this ensures that the same set of pangenetic attributes, a minimal shared set of attributes, will be accessed for all of the pangenetic profiles associated with a group of individuals. So, by generating and using a consensus mask with respect to a group, inadvertent access to confidential pangenetic data can be prevented for the entire group while at the same time ensuring uniform access to exactly the same pangenetic attributes within each individual's pangenetic profile, thereby providing consistent and valid results when determining statistical association values, as may be required when determining correlations between pangenetic attributes and web items and/or item ratings.
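The consensus mask described above can be illustrated as a set intersection. The following is a minimal sketch, assuming each data mask is represented as the set of attribute identifiers it leaves unmasked; the function name `consensus_mask` and the identifiers such as `snp_1` are hypothetical and are used only for illustration.

```python
from functools import reduce

def consensus_mask(masks):
    """Return the attributes that every mask in `masks` leaves unmasked.

    Assumption: each mask is a set of unmasked attribute identifiers.
    """
    if not masks:
        return set()  # no masks supplied: nothing is approved for access
    return reduce(lambda a, b: a & b, masks)

# Hypothetical masks: one chosen by the user, one tied to an application.
user_mask = {"snp_1", "snp_2", "snp_3"}
app_mask = {"snp_2", "snp_3", "snp_4"}
shared = consensus_mask([user_mask, app_mask])
```

In this representation, only attributes unmasked by all contributing masks survive, so applying `shared` to every profile in a group yields the same minimal attribute set for each individual.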
Referring again to
Both data masks and consensus data masks should align appropriately to the respective data profiles of the individuals, to ensure that each attribute associated with each of the individuals is handled as masked or unmasked in accordance with the corresponding data mask. In one embodiment, this can be achieved by generating and using data masks (and consensus data masks) that cover the entire data profile of an individual, from beginning to end, such that every attribute or attribute group (an associated set of attributes treated as a single unit) present within the data profile of the individual has a corresponding indicator in the mask (e.g., either an ‘M’ or a ‘U’ character) which indicates whether that attribute is to be treated as a masked attribute or an unmasked attribute with respect to access and/or transmission. In an alternative embodiment, a data mask does not cover the entire pangenetic or non-pangenetic profile of an individual, but rather, is mapped to corresponding attributes in the profile of the individual using attribute identifiers, indices, addresses, pointers or keys which ensure that the masked and unmasked attribute indicators point to (i.e., map to) the appropriate attributes (i.e., corresponding attribute values) contained in the individual's data profile. In one embodiment, only masked attribute positions are represented in the data mask using attribute identifiers, indices, addresses, pointers or keys which point to the corresponding attributes of the individual's data profile, the unmasked attributes being absent from the data mask. In another embodiment, only the unmasked attribute positions are represented in the data mask using attribute identifiers, indices, addresses, pointers or keys which point to the corresponding attributes of the individual's data profile, the masked attributes being absent from the data mask.
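The two mask layouts described above (a full-coverage string of ‘M’ and ‘U’ indicators, and a sparse mask listing only unmasked positions) can be converted into one another. The sketch below is illustrative only; it assumes attributes are addressed by zero-based position, and the function names are hypothetical.

```python
def dense_to_sparse(dense_mask):
    """Convert a full-coverage mask string such as 'UMMUU' into the set of
    unmasked positions (the 'unmasked only' sparse form)."""
    return {i for i, flag in enumerate(dense_mask) if flag == "U"}

def sparse_to_dense(unmasked_positions, profile_length):
    """Rebuild the full-coverage 'M'/'U' string from the sparse form."""
    return "".join(
        "U" if i in unmasked_positions else "M" for i in range(profile_length)
    )

dense = "UMMUU"
sparse = dense_to_sparse(dense)
```

The sparse form needs the profile length to be rebuilt into the dense form, which is one reason a key- or index-based mapping must stay aligned with the profile it refers to.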
There are several different methods by which to apply a data mask to a data profile. In one embodiment, a data mask is merged with a data profile of an individual to generate a temporary data profile (a masked hybrid data profile) of the individual. This can be accomplished by generating a copy of a data profile of the individual and replacing those attribute values which the data mask indicates need to be masked with, for example, nondescriptive placeholders such as an alphanumeric character or a symbol (e.g., ‘X’, ‘#’, ‘*’, or ‘$’), or alternatively, deleting the masked attribute values from the temporary data profile. The temporary data profile can then be made available in its entirety for reading or transmission without having to block access or transmission of any of the attributes it contains.
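The merge-based approach above can be sketched as follows. This is a minimal illustration, assuming the profile is a list of attribute values and the mask is an equal-length string of ‘M’ and ‘U’ indicators; the function name `masked_copy` and its parameters are hypothetical.

```python
def masked_copy(profile, mask, placeholder="X", delete_masked=False):
    """Build a temporary profile from `profile` according to `mask`.

    `profile` is a sequence of attribute values and `mask` an equal-length
    string of 'M' (masked) and 'U' (unmasked) indicators. The original
    profile is left unchanged.
    """
    if len(profile) != len(mask):
        raise ValueError("mask must cover the entire profile")
    if delete_masked:
        # Alternative from the text: drop masked values entirely.
        return [v for v, flag in zip(profile, mask) if flag == "U"]
    # Default: replace masked values with a nondescriptive placeholder.
    return [v if flag == "U" else placeholder for v, flag in zip(profile, mask)]

profile = ["A", "G", "T", "C"]
hybrid = masked_copy(profile, "UMUM")
trimmed = masked_copy(profile, "UMUM", delete_masked=True)
```

Because the temporary profile contains no masked values, it can be handed over as a whole; note that deleting masked values changes attribute positions, whereas placeholders preserve them.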
In a different embodiment, a data mask can be applied to a data profile by accessing, reading or transmitting data from the data profile in accordance with the pattern of mask and unmask indicators contained in the data mask. As such, the data mask is executed as a set of instructions, wherein each unmask attribute indicator is interpreted as a read/transmit (i.e., process attribute) instruction with respect to the corresponding attribute value in the individual's data profile, and wherein each mask attribute indicator is interpreted as a non-read/non-transmit (i.e., skip attribute) instruction with respect to the corresponding attribute value in the individual's data profile. In one embodiment, the data mask contains only unmask attribute indicators that provide read/transmit instructions with respect to the corresponding attribute values in the individual's data profile, wherein the unmask attribute indicators are mapped to the corresponding attributes of the individual's data profile using attribute identifiers, indices, addresses, pointers or keys. In another embodiment, the data mask contains only mask attribute indicators that provide non-read/non-transmit instructions with respect to the corresponding attribute values in the individual's data profile, wherein the mask attribute indicators are mapped to the corresponding attributes of the individual's data profile using attribute identifiers, indices, addresses, pointers or keys.
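The instruction-style application of a mask can be sketched as a reader that only ever touches unmasked attributes. The example below assumes the "unmask indicators only" variant, with the profile stored as a key-to-value mapping; the function name `read_unmasked` and the keys are hypothetical.

```python
def read_unmasked(profile, unmask_keys):
    """Yield (key, value) pairs only for attributes the mask unmasks.

    `profile` maps attribute keys to values; `unmask_keys` is a mask that
    lists only unmasked attributes. Values of all other attributes are
    skipped and never read.
    """
    for key in unmask_keys:
        if key in profile:  # ignore mask entries with no matching attribute
            yield key, profile[key]

profile = {"snp_1": "AA", "snp_2": "AG", "snp_3": "GG"}
visible = dict(read_unmasked(profile, ["snp_1", "snp_3", "snp_9"]))
```

Unlike the merge-based approach, no temporary copy of the profile is created here; the mask itself drives which values are read or transmitted.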
As can be seen from
As further illustrated in
As further illustrated in
As further illustrated in
Further with respect to
As previously disclosed, a completely different mask may be applied to the user's pangenetic data depending on who the user is, and whether the request results are to be transmitted as output to the user or a different user or entity such as a website. The nature of the request can also determine the application of additional masks, for example, a mask associated with item type, item provider type or request type which reduces the number of pangenetic attributes of the user that need to be read, so that those which are considered by the system to be irrelevant are masked. With respect to
In one embodiment, the unmasked pangenetic attributes associated with the user are compared with the pangenetic data combinations by determining the percent match (one type of pangenetic similarity measure) between each pangenetic data combination and the pangenetic data of the user, and then ranking the pangenetic data combinations based on the percent matching achieved relative to one another. In one embodiment, the rank is also based on satisfaction levels, so that both satisfaction level and percent match are used to determine rank in a concurrent evaluation in which a pangenetic combination associated with a higher satisfaction level than another pangenetic combination will receive the higher rank when both have the same degree of pangenetic similarity to the user. In another embodiment, the percent match and the satisfaction level associated with a correlation are both used to determine rank of the correlation, but are differentially weighted for the purpose of making the determination. With respect to
In one embodiment, a computer based method for generating a pangenetic based item feedback matrix is provided comprising i) accessing item feedback data from a plurality of individuals with respect to one or more web items; ii) accessing pangenetic data associated with the plurality of individuals; iii) determining, by statistical association based on the item feedback data, correlations between the web items and combinations of the pangenetic data; and iv) storing the correlations between the web items and the combinations of pangenetic data to generate a pangenetic based item feedback matrix. The method can further comprise a step of transmitting one or more of the correlations from the pangenetic based item feedback matrix to at least one destination selected from the group consisting of the user, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, and a wireless receiver. The method can also further comprise acts of i) transmitting at least one authorization request for access to the pangenetic data associated with the plurality of individuals, and ii) receiving an authorization granting access to the pangenetic data associated with the plurality of individuals.
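Steps i) to iv) above can be sketched as follows. The text does not fix a particular statistical association measure, so this illustration assumes a simple lift ratio (like-rate among carriers of a combination divided by the overall like-rate) and a hypothetical threshold `min_lift`; the function name, the data layouts, and the candidate combinations are likewise assumptions.

```python
def build_feedback_matrix(feedback, profiles, combinations, min_lift=1.5):
    """Map (item, combination) -> lift, keeping entries with lift >= min_lift.

    feedback:     {individual: {item: True/False}} (liked or not)
    profiles:     {individual: set of pangenetic attributes}
    combinations: iterable of frozensets of attributes to test
    Lift = like-rate among carriers of the combination / overall like-rate.
    """
    matrix = {}
    items = {item for ratings in feedback.values() for item in ratings}
    for item in items:
        raters = [p for p in feedback if item in feedback[p]]
        overall = sum(feedback[p][item] for p in raters) / len(raters)
        if overall == 0:
            continue  # nobody liked the item; lift is undefined
        for combo in combinations:
            carriers = [p for p in raters if combo <= profiles.get(p, set())]
            if not carriers:
                continue
            rate = sum(feedback[p][item] for p in carriers) / len(carriers)
            lift = rate / overall
            if lift >= min_lift:
                matrix[(item, combo)] = lift  # step iv: store the correlation
    return matrix

feedback = {"p1": {"tea": True}, "p2": {"tea": True},
            "p3": {"tea": False}, "p4": {"tea": False}}
profiles = {"p1": {"a", "b"}, "p2": {"a", "b"}, "p3": {"a"}, "p4": {"c"}}
combos = [frozenset({"a", "b"}), frozenset({"a"})]
matrix = build_feedback_matrix(feedback, profiles, combos)
```

In this toy data the combination of attributes `a` and `b` is kept because all of its carriers liked the item, while attribute `a` alone falls below the threshold. A production system would likely use a measure with a significance test rather than a bare ratio.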
In one embodiment of a computer based method for generating a pangenetic based item feedback matrix, the pangenetic data is pangenetic metadata. In one embodiment, the content of the item feedback matrix is stored within a dataset selected from the group consisting of an internet search engine document index, an internet search engine hitlist, and an internet search engine lexicon. In one embodiment, the determined correlations are used to generate a dataset selected from the group consisting of an internet search engine document index, an internet search engine hitlist, and an internet search engine lexicon. In one embodiment, the pangenetic data associated with the plurality of individuals constitute a plurality of pangenetic profiles of the individuals.
In one embodiment of a computer based method for generating a pangenetic based item feedback matrix, at least a portion of the correlations stored in the pangenetic based item feedback matrix are used for a method of providing internet search results for a user. In one embodiment, at least a portion of the correlations stored in the pangenetic based item feedback matrix are used for a method of online recommendation of items for a user. In one embodiment at least a portion of the correlations stored in the pangenetic based item feedback matrix are used for a method of online prediction of user satisfaction with an item. In one embodiment at least a portion of the correlations stored in the pangenetic based item feedback matrix are used for a method of predicting user behavior.
In one embodiment of a computer based method for generating a pangenetic based item feedback matrix, the plurality of individuals share one or more non-pangenetic attributes in common. In one embodiment, each correlation stored in the item feedback matrix indicates an association between one of the web items and one of the combinations of pangenetic data. In one embodiment, the correlations that are selected for being stored have one or more corresponding statistical association values, as determined by statistical association, that meet one or more predetermined threshold values, where for example, the statistical association values can indicate a minimum level of statistical significance or a minimum level of statistical certainty. In one embodiment, each correlation stored in the item feedback matrix can include at least one statistical association value, as determined by statistical association, which indicates strength of the association between one of the web items and one of the combinations of pangenetic data. In one embodiment, the correlations that are stored have one or more corresponding statistical association values, as determined by statistical association, which are used to rank web items correlating with the same combination of pangenetic data so that the pangenetic combinations having the strongest association with the web items can be readily identified. In one embodiment, the correlations are indicated by scores derived from the feedback data. In one embodiment, the correlations are indicated by ratings derived from the feedback data. In one embodiment, the correlations are indicated using binary indicators such as {like, dislike}.
In one embodiment of a computer based method for generating a pangenetic based item feedback matrix, accessing of the pangenetic data of the individuals is performed in accordance with at least one data mask applied to the pangenetic data. In one embodiment, a different data mask that can be specified by each of the plurality of individuals can be applied to their respective pangenetic profiles (i.e., pangenetic data). In an alternative embodiment, the at least one data mask is a consensus data mask derived from a plurality of data masks and then applied uniformly to each of the plurality of pangenetic profiles of the plurality of individuals. In one embodiment, accessing of the pangenetic data of the plurality of individuals is performed in accordance with the steps of i) transmitting an authorization request for access to the pangenetic data associated with the plurality of individuals; ii) receiving an authorization which grants access to the pangenetic data; iii) accessing a data mask, wherein the data mask's parameters are associated with the authorization; and iv) applying the data mask to the pangenetic data. In one embodiment, the identities of the individuals are masked or anonymized. In one embodiment, non-pangenetic data associated with the individuals is masked.
In one embodiment, a program storage device is provided that is readable by a machine and contains a set of instructions which, when read by the machine, causes execution of a computer based method for generating a pangenetic based item feedback matrix, the method comprising i) receiving item feedback data from a plurality of individuals with respect to one or more web items; ii) accessing pangenetic data associated with the plurality of individuals; iii) determining, by statistical association based on the item feedback data, correlations between the web items and combinations of the pangenetic data; and iv) storing the correlations between the web items and the combinations of pangenetic data to generate a pangenetic based item feedback matrix.
One embodiment of a computer database system for providing internet search results for a user comprises 1) a memory containing a first data structure containing item feedback data from a plurality of individuals with respect to one or more web items, and a second data structure containing pangenetic data associated with the plurality of individuals; and 2) a processor for: i) accessing the first data structure; ii) accessing the second data structure; iii) determining, by statistical association based on the item feedback data, correlations between the web items and combinations of the pangenetic data; and iv) storing the correlations between the web items and the combinations of pangenetic data to generate a pangenetic based item feedback matrix.
In one embodiment, a method for pangenetic based web search can comprise i) receiving non-pangenetic data associated with a user query; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing pangenetic data and non-pangenetic data correlated with web items; iv) determining for each web item, the quantity of non-pangenetic matches between the non-pangenetic data correlated with that web item and the non-pangenetic data associated with the user query and the quantity of pangenetic matches between the pangenetic data correlated with that web item and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of non-pangenetic matches and the quantity of pangenetic matches determined for each web item, a listing of at least a portion of the web items as internet search results for the user. In addition to transmitting a listing of the one or more web items to the user, the system can transmit the listing to one or more other users, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, and a wireless receiver.
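The match counting in step iv) and the listing in step v) can be sketched with set intersections. This is an illustrative sketch only: the dataset layout, the function names, the rule that a listed item needs at least one query match, and the ordering by total match count are assumptions rather than requirements of the method.

```python
def count_matches(web_items, query_terms, user_attributes):
    """Return {item: (non_pangenetic_matches, pangenetic_matches)}.

    web_items maps an item to a pair (terms, attributes) correlated with it.
    """
    query_terms, user_attributes = set(query_terms), set(user_attributes)
    return {
        item: (len(query_terms & set(terms)), len(user_attributes & set(attrs)))
        for item, (terms, attrs) in web_items.items()
    }

def search(web_items, query_terms, user_attributes):
    """List items with at least one query match, best total match count first."""
    counts = count_matches(web_items, query_terms, user_attributes)
    hits = [item for item, (n, _) in counts.items() if n > 0]
    return sorted(hits, key=lambda item: sum(counts[item]), reverse=True)

web_items = {
    "page_a": ({"running", "shoes"}, {"snp_1", "snp_2"}),
    "page_b": ({"running"}, set()),
    "page_c": ({"cooking"}, {"snp_1"}),
}
results = search(web_items, ["running", "shoes"], ["snp_1", "snp_2"])
```

Here `page_c` is excluded despite a pangenetic match because it matches none of the query terms, so pangenetic data refines the ordering of relevant results rather than introducing unrelated ones.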
In one embodiment of a method for pangenetic based web search, the method can further comprise acts of transmitting an authorization request for access to the pangenetic data associated with the user, and receiving an authorization granting access to the pangenetic data associated with the user. In one embodiment, the pangenetic data associated with the user constitutes a pangenetic profile of the user. In one embodiment, the pangenetic data correlated with the web items can be pangenetic metadata. In one embodiment, the dataset containing pangenetic data and non-pangenetic data can be selected from the group consisting of an internet search engine document index, an internet search engine hitlist, and an internet search engine lexicon dataset. In one embodiment, the dataset containing pangenetic data and non-pangenetic data is a lexicon dataset with pointers to entries in an internet search engine document index containing a hitlist, wherein determining the quantity of matches comprises identifying, from the hitlist, the quantity of non-pangenetic hits and the quantity of pangenetic hits for each web item with respect to the non-pangenetic data associated with the user query and the pangenetic data associated with the user, wherein hits are matches.
In one embodiment of a method for pangenetic based web search, the portion of the web items transmitted as output in the listing is determined by one or more predetermined thresholds applied to the quantity of non-pangenetic matches and the quantity of pangenetic matches determined for each web item. In one embodiment, each web item represented in the listing was determined to have at least one non-pangenetic match. In one embodiment, the listing is a rank listing wherein the rank of each web item in the rank listing is based on the quantity of non-pangenetic matches and the quantity of pangenetic matches determined for each web item. In one embodiment, the portion of the web items transmitted as output consists of web items having a rank within a range defined by at least one predetermined threshold applied to rank. In one embodiment, the rank listing contains two sets of ranks for the web items in the rank listing, the first set of ranks being based on the quantity of non-pangenetic matches, and the second set of ranks being based on the quantity of non-pangenetic matches and the quantity of pangenetic matches. In one embodiment, the rank of each web item in the rank listing is determined by a score computed for each web item based on the quantity of non-pangenetic matches and the quantity of pangenetic matches for each web item. In one embodiment, a score for a web item is computed by using a quantitative similarity measure to determine a non-pangenetic similarity value based on the quantity of non-pangenetic matches and a pangenetic similarity value based on the quantity of pangenetic matches, and then averaging the non-pangenetic similarity value with the pangenetic similarity value to generate the score for the web item. 
The averaging can be a weighted averaging computation in which a higher weight is given to either the non-pangenetic similarity value or the pangenetic similarity value depending on the type of search, the particular query terms, or the relative importance of non-pangenetic factors versus pangenetic factors in selecting the most relevant results for a user, which can be based on or learned from user feedback regarding satisfaction with past search results.
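The weighted averaging described above can be sketched as follows. The similarity measure used here (matches divided by the number of possible matches) and the parameter `pan_weight` are assumptions chosen for illustration; the text leaves both the measure and the weights open.

```python
def similarity(matches, possible):
    """Fraction of possible matches achieved (0.0 when nothing could match)."""
    return matches / possible if possible else 0.0

def item_score(non_pan_matches, non_pan_total, pan_matches, pan_total,
               pan_weight=0.5):
    """Weighted average of the two similarity values; weights sum to 1."""
    if not 0.0 <= pan_weight <= 1.0:
        raise ValueError("pan_weight must be between 0 and 1")
    non_pan = similarity(non_pan_matches, non_pan_total)
    pan = similarity(pan_matches, pan_total)
    return (1.0 - pan_weight) * non_pan + pan_weight * pan

# Equal weighting versus a query type where pangenetic factors matter less.
equal = item_score(2, 2, 1, 4)
query_heavy = item_score(2, 2, 1, 4, pan_weight=0.25)
```

With `pan_weight=0.5` the result is the plain average; a value learned from user feedback on past results could replace the fixed default.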
In one embodiment of a method for pangenetic based web search, the dataset containing pangenetic data and non-pangenetic data correlated with web items also contains context of occurrence values for the pangenetic data and non-pangenetic data correlated with each web item, and the method further comprises steps of i) identifying, with respect to a web item, the non-pangenetic context of occurrence values for each of the non-pangenetic data correlated with the web item which match non-pangenetic data associated with the user query; ii) computing a non-pangenetic score for the web item by combining the non-pangenetic context of occurrence values with the quantity of matches determined for the corresponding non-pangenetic data; iii) identifying, with respect to the web item, the pangenetic context of occurrence values for each of the pangenetic data correlated with the web item which match pangenetic data associated with the user query; iv) computing a pangenetic score for the web item by combining the pangenetic context of occurrence values with the quantity of matches determined for the corresponding pangenetic data; v) determining a final score for the web item by averaging the non-pangenetic score with the pangenetic score; vi) repeating steps (i) to (v) for each of the web items; and vii) determining the rank of each web item based on the final scores determined for the web items.
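Steps i) to vii) above can be sketched as follows. The text does not specify how context of occurrence values are combined with match quantities, so this sketch assumes a sum of products (context value times match count); the index layout and function names are also hypothetical.

```python
def channel_score(hits, wanted):
    """Sum of context value x match count over the entries in `wanted`.

    hits maps a datum to (context_value, match_count) for one web item.
    """
    return sum(ctx * n for key, (ctx, n) in hits.items() if key in wanted)

def rank_items(index, query_terms, user_attributes):
    """Return (ranked item list, final scores) using averaged channel scores."""
    terms, attrs = set(query_terms), set(user_attributes)
    finals = {}
    for item, (term_hits, attr_hits) in index.items():   # step vi: every item
        non_pan = channel_score(term_hits, terms)        # steps i and ii
        pan = channel_score(attr_hits, attrs)            # steps iii and iv
        finals[item] = (non_pan + pan) / 2               # step v
    ranked = sorted(finals, key=finals.get, reverse=True)  # step vii
    return ranked, finals

index = {
    "page_a": ({"shoes": (2.0, 3)}, {"snp_1": (1.0, 2)}),
    "page_b": ({"shoes": (1.0, 1)}, {"snp_9": (5.0, 1)}),
}
ranked, finals = rank_items(index, ["shoes"], ["snp_1"])
```

A context value might reflect, for example, where in a document a term occurs; here it simply scales the contribution of each match.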
In one embodiment of a method for pangenetic based web search, the pangenetic data correlated with the web items are derived from statistical associations between item preferences and pangenetic data associated with a group of individuals. In one embodiment, the pangenetic data correlated with the web items are derived by computing statistical associations which indicate the strength of association between the item preferences and pangenetic data associated with a group of individuals. In one embodiment, the pangenetic data correlated with the web items are derived from statistical associations between pangenetic data associated with individuals and online behaviors the individuals exhibit while interacting with the web items. In one embodiment, the pangenetic data correlated with the web items are derived from an item feedback matrix containing correlations between item preferences and pangenetic data associated with a group of individuals.
In one embodiment of a method for pangenetic based web search, the accessing of pangenetic data of the user is in accordance with an applied data mask, the method further comprising i) transmitting an authorization request for access to the pangenetic data associated with the user; ii) receiving an authorization which grants access to the pangenetic data; iii) accessing a data mask, wherein the data mask's parameters are associated with the authorization; and iv) applying the data mask to the pangenetic data.
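The four acts above can be sketched as a single access routine. Everything about the interfaces here is assumed for illustration: the `authorize` callable, the shape of the grant (a dictionary carrying a `mask_id`), and the mask store are hypothetical and do not describe any particular system.

```python
def access_with_mask(profile, authorize, mask_store):
    """Follow the four acts: request, receive grant, fetch mask, apply it.

    authorize:  callable returning a grant dict such as {"mask_id": ...},
                or None when access is refused (hypothetical interface)
    mask_store: {mask_id: set of unmasked attribute keys}
    """
    grant = authorize("pangenetic-data-access")   # acts i and ii
    if grant is None:
        raise PermissionError("access to pangenetic data was not authorized")
    unmasked = mask_store[grant["mask_id"]]       # act iii
    return {k: v for k, v in profile.items() if k in unmasked}  # act iv

profile = {"snp_1": "AA", "snp_2": "AG"}
mask_store = {"m1": {"snp_2"}}
granted = access_with_mask(profile, lambda purpose: {"mask_id": "m1"}, mask_store)
```

Tying the mask to the authorization means that the scope of what was approved, not the identity of the caller alone, determines which attributes are returned.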
In one embodiment, a program storage device is provided that is readable by a machine and contains a set of instructions which, when read by the machine, causes execution of a computer based method for providing internet search results for a user, wherein the method comprises i) receiving non-pangenetic data associated with a user query; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing pangenetic data and non-pangenetic data correlated with web items; iv) determining for each web item, the quantity of non-pangenetic matches between the non-pangenetic data correlated with that web item and the non-pangenetic data associated with the user query and the quantity of pangenetic matches between the pangenetic data correlated with that web item and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of non-pangenetic matches and the quantity of pangenetic matches determined for each web item, a listing of at least a portion of the web items as internet search results for the user.
In one embodiment, a computer database system for providing internet search results for a user comprises 1) a memory containing a first data structure containing pangenetic data associated with the user, and a second data structure containing pangenetic data and non-pangenetic data correlated with web items; and 2) a processor for: i) receiving non-pangenetic data associated with a user query; ii) accessing the first data structure; iii) accessing the second data structure; iv) determining for each web item, the quantity of non-pangenetic matches between the non-pangenetic data correlated with that web item and the non-pangenetic data associated with the user query and the quantity of pangenetic matches between the pangenetic data correlated with that web item and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of non-pangenetic matches and the quantity of pangenetic matches determined for each web item, a listing of at least a portion of the web items as internet search results for the user.
In one embodiment, a method for pangenetic based online recommendation of items comprises i) receiving at least one item preference associated with the user; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing item preferences of individuals who also share the at least one item preference associated with the user, wherein pangenetic data of the individuals are correlated with the item preferences; iv) determining for each item preference, the quantity of matches between the pangenetic data associated with that item preference and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each item preference, a listing of at least a portion of the item preferences to indicate recommended items for the user. In addition to transmitting a listing of item preferences to the user, the system can transmit the listing to one or more other users, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, and a wireless receiver.
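Steps iv) and v) above can be sketched as counting attribute overlaps and listing the best-matching item preferences. The sketch assumes the dataset has already been narrowed to individuals who share the user's stated preference; the index layout, the function name, and the `min_matches` threshold are hypothetical.

```python
def recommend(user_attributes, preference_index, min_matches=1):
    """Rank item preferences by how many correlated attributes the user has.

    preference_index maps an item preference to the set of pangenetic
    attributes correlated with it (e.g. drawn from an item feedback matrix).
    """
    user_attributes = set(user_attributes)
    scored = [
        (len(user_attributes & attrs), item)
        for item, attrs in preference_index.items()
    ]
    kept = [(n, item) for n, item in scored if n >= min_matches]
    # Sort by match count (descending), then by name for a stable listing.
    return [item for n, item in sorted(kept, key=lambda t: (-t[0], t[1]))]

preference_index = {
    "green tea": {"snp_1", "snp_2"},
    "espresso": {"snp_3"},
    "cola": {"snp_2"},
}
listing = recommend({"snp_1", "snp_2"}, preference_index)
```

Items falling below the threshold could equally be reported separately as non-recommended items, as a later embodiment describes.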
In one embodiment, a method for pangenetic based online recommendation of items for a user can further comprise transmitting, as output, at least a portion of the item preferences to indicate non-recommended items for the user. In one embodiment, the pangenetic data associated with the user constitutes a pangenetic profile of the user. In one embodiment, the pangenetic data correlated with the item preferences are combinations of pangenetic data selected from pangenetic profiles of the individuals. In one embodiment, the pangenetic data correlated with the item preferences are pangenetic metadata. In one embodiment, the item preferences are ratings that indicate levels of satisfaction with the items indicated by the item preferences. In one embodiment, the ratings are average ratings of the items by the individuals. In one embodiment, the method can further comprise receiving one or more non-pangenetic attributes associated with the user, wherein the one or more non-pangenetic attributes associated with the user match one or more non-pangenetic attributes associated with the individuals. In one embodiment, the method can further comprise the steps of i) transmitting an authorization request for access to the pangenetic data associated with the user, and ii) receiving an authorization granting access to the pangenetic data associated with the user.
In one embodiment of a method for pangenetic based online recommendation of items for a user, the portion of the item preferences transmitted as output can be determined by a predetermined threshold applied to the quantity of matches determined for each item preference. In one embodiment, the listing is a rank listing, wherein the rank of each item preference in the rank listing is based on the quantity of matches determined for each item preference. In one embodiment, the item preferences transmitted as output consist of item preferences having a rank within a range defined by at least one predetermined threshold applied to rank. In one embodiment, the rank of each item preference represented in the rank listing is determined by a score computed for each item preference based on the quantity of matches determined for each item preference. In one embodiment, the score for each item preference is computed using a quantitative similarity measure applied to the pangenetic data.
In one embodiment of a method for pangenetic based online recommendation of items for a user, the correlations between the pangenetic data and the item preferences contained in the dataset are previously determined based on statistical associations between item preferences and pangenetic data associated with the individuals. In one embodiment, the correlations between the pangenetic data and the item preferences contained in the dataset are determined by computing statistical associations which indicate the strength of association between item preferences and pangenetic data associated with the individuals. In one embodiment, the correlations between the pangenetic data and the item preferences contained in the dataset are determined by computing statistical associations between pangenetic data of individuals and online behaviors which indicate the item preferences of the individuals. In one embodiment, the dataset is an item feedback matrix.
In one embodiment of a method for pangenetic based online recommendation of items for a user, the method further comprises acts of i) receiving item preference data associated with the individuals, wherein the item preference data indicates item preferences of the individuals; ii) accessing pangenetic data associated with the individuals; iii) determining correlations between the item preference data and the pangenetic data associated with the individuals; and iv) storing the correlations between the item preference data and the pangenetic data to generate an item feedback matrix.
In one embodiment of a method for pangenetic based online recommendation of items for a user, the method further comprises acts of i) transmitting an authorization request for access to the pangenetic data associated with the user; ii) receiving an authorization which grants access to the pangenetic data; iii) accessing a data mask, wherein the data mask's parameters are associated with the authorization; and iv) applying the data mask to the pangenetic data.
In one embodiment of a method for pangenetic based online recommendation of items for a user, wherein the dataset comprises data records containing the item preferences of the individuals, the method further comprises acts of i) identifying one or more clusters of data records, wherein within each cluster the data records share a similar pattern of item preferences as determined by a quantitative similarity measure; ii) determining, by statistical association, pangenetic data that correlate with each of the one or more clusters; and iii) identifying, by using a quantitative similarity measure, the cluster having the highest pangenetic similarity to the user to provide the portion of the item preferences to be transmitted as output. In a further embodiment, the item preferences of the identified cluster comprise item rating values that are averaged prior to transmission as output. In another embodiment, the item preferences identified for transmission as output are a subset of item preferences selected from the identified cluster based on an item category relationship with the at least one item preference associated with the user.
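Act iii) and the averaging of item rating values can be sketched as follows. The sketch assumes acts i) and ii) have already produced clusters annotated with their correlated pangenetic attributes, and it uses Jaccard similarity as one possible quantitative similarity measure; the data layout and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_from_clusters(user_attributes, clusters):
    """Pick the cluster most pangenetically similar to the user and return
    its per-item average ratings.

    clusters: list of dicts with keys "attributes" (set correlated with the
    cluster) and "records" (list of {item: rating} dicts, one per member).
    """
    best = max(clusters, key=lambda c: jaccard(user_attributes, c["attributes"]))
    totals, counts = {}, {}
    for record in best["records"]:
        for item, rating in record.items():
            totals[item] = totals.get(item, 0) + rating
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}

clusters = [
    {"attributes": {"snp_1", "snp_2"},
     "records": [{"tea": 5, "cola": 1}, {"tea": 4}]},
    {"attributes": {"snp_7"}, "records": [{"cola": 5}]},
]
averages = recommend_from_clusters({"snp_1", "snp_2", "snp_3"}, clusters)
```

The averaged ratings of the selected cluster can then be filtered by item category before transmission, as the further embodiment above suggests.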
In one embodiment of a method for pangenetic based online recommendation of items for a user, wherein the dataset comprises data records containing the item preferences of the individuals, and wherein the item preferences comprise item rating values, the method further comprises acts of i) identifying one or more clusters of data records, wherein within each cluster the data records share a similar pattern of item preferences as determined by a quantitative similarity measure; ii) determining, by statistical association, pangenetic data that correlate with each of the one or more clusters; iii) identifying, by using a quantitative similarity measure, the cluster having the highest pangenetic similarity to the user; and iv) identifying, by using a quantitative similarity measure within the cluster having the highest pangenetic similarity to the user, a subcluster of data records having the most similar pattern of item preferences to the user to provide the portion of the item preferences to be transmitted as output. In a further embodiment, the item preferences of the identified subcluster comprise item rating values that are averaged prior to transmission as output.
In one embodiment, a program storage device is provided that is readable by a machine and contains a set of instructions which, when read by the machine, causes execution of a computer based method for online recommendation of items for a user, wherein the method comprises i) receiving at least one item preference associated with the user; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing item preferences of individuals who also share the at least one item preference associated with the user, wherein pangenetic data of the individuals are correlated with the item preferences; iv) determining for each item preference, the quantity of matches between the pangenetic data correlated with that item preference and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each item preference, a listing of at least a portion of the item preferences to indicate recommended items for the user.
In one embodiment, a computer database system for online recommendation of items for a user can comprise 1) a memory containing a first data structure containing pangenetic data associated with the user, and a second data structure containing item preferences of individuals who also share at least one item preference associated with the user, wherein pangenetic data of the individuals are correlated with the item preferences; and 2) a processor for i) receiving the at least one item preference associated with the user; ii) accessing the first data structure; iii) accessing the second data structure; iv) determining for each item preference, the quantity of matches between the pangenetic data correlated with that item preference and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each item preference, a listing of at least a portion of the item preferences to indicate recommended items for the user.
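As a non-limiting sketch of steps iv) and v) above, the quantity of matches can be computed as the size of a set intersection and used to order the output listing. The item names, attribute labels, `top_n` cutoff, and tie-breaking rule are assumptions made for the example.

```python
def rank_item_preferences(user_attrs, item_attrs, top_n=3):
    """Count, for each item preference, how many of its correlated pangenetic
    attributes the user also has, and list the best-matching items first."""
    match_counts = {
        item: len(user_attrs & attrs) for item, attrs in item_attrs.items()
    }
    # Sort by descending match count; break ties alphabetically (an assumption).
    ranked = sorted(match_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Items with no matches are left out of the listing.
    return [item for item, count in ranked[:top_n] if count > 0]

# Hypothetical second data structure: item preference -> correlated attributes.
item_attrs = {
    "tea": {"a", "b"},
    "coffee": {"a", "b", "c"},
    "soda": {"z"},
}
listing = rank_item_preferences({"a", "b", "c"}, item_attrs)
```

Here `coffee` (three matches) is listed ahead of `tea` (two matches), and `soda` is omitted because it has no matches.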
In one embodiment, a method for online prediction of user satisfaction with an item comprises i) receiving at least one item preference associated with a user; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing one or more levels of satisfaction associated with the at least one item preference, wherein pangenetic data are correlated with the one or more levels of satisfaction; iv) determining for each level of satisfaction, the quantity of matches between the pangenetic data correlated with that level of satisfaction and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each level of satisfaction, a level of satisfaction the user is predicted to experience with respect to the at least one item preference. In addition to transmitting the predicted satisfaction level to the user, the system can transmit the predicted satisfaction level to one or more other users, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, and a wireless receiver.
In one embodiment of a method for online prediction of user satisfaction with an item, the level of satisfaction for which the largest quantity of matches is determined is the satisfaction level the user is predicted to experience. In one embodiment, the method further comprises computing a score for each level of satisfaction using a quantitative similarity measure that processes the quantity of matches, and selecting the level of satisfaction having the highest score as the level of satisfaction the user is predicted to experience. In one embodiment, the pangenetic data correlated with the one or more levels of satisfaction are pangenetic metadata. In one embodiment, the pangenetic data associated with the user constitutes a pangenetic profile of the user. In one embodiment, the pangenetic data correlated with the one or more levels of satisfaction are combinations of pangenetic data selected from pangenetic profiles associated with a group of individuals. In one embodiment, the levels of satisfaction are the average levels of satisfaction of a group of individuals. In one embodiment, the method further comprises receiving one or more non-pangenetic attributes associated with the user, wherein the one or more non-pangenetic attributes associated with the user match one or more non-pangenetic attributes associated with the group of individuals. In one embodiment, the method further comprises the steps of transmitting an authorization request for access to the pangenetic data associated with the user, and receiving an authorization granting access to the pangenetic data associated with the user.
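The score-based selection described above can be illustrated with a minimal sketch. The score used here, the fraction of a level's correlated attributes that the user carries, is only one possible quantitative similarity measure and is an assumption of the example, as are the level names and attribute labels.

```python
def predict_satisfaction(user_attrs, level_attrs):
    """Score each satisfaction level by the fraction of its correlated
    pangenetic attributes found in the user's data; return the best level."""
    scores = {}
    for level, attrs in level_attrs.items():
        matches = len(user_attrs & attrs)  # quantity of matches for this level
        scores[level] = matches / len(attrs) if attrs else 0.0
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical dataset: satisfaction level -> correlated attributes.
level_attrs = {
    "high": {"a", "b", "c", "d"},
    "low": {"a", "b"},
}
predicted, scores = predict_satisfaction({"a", "b"}, level_attrs)
```

Both levels have two raw matches in this example, so the normalized score is what separates them: `low` scores 1.0 and `high` scores 0.5.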
In one embodiment of a method for online prediction of user satisfaction with an item, the correlations between the pangenetic data and the one or more levels of satisfaction contained in the dataset are previously determined based on statistical associations between levels of satisfaction and pangenetic data associated with a group of individuals. In one embodiment, the correlations between the pangenetic data and the one or more levels of satisfaction contained in the dataset are determined by computing statistical associations which indicate the strength of association between levels of satisfaction and pangenetic data associated with a group of individuals. In one embodiment, the correlations between the pangenetic data and the one or more levels of satisfaction contained in the dataset are determined by computing statistical associations between pangenetic data of individuals and online behaviors which indicate levels of satisfaction of the individuals. In one embodiment, the correlations between the pangenetic data and the one or more levels of satisfaction contained in the dataset comprise statistical associations indicating level of certainty, and wherein a level of certainty that the user will experience the predicted level of satisfaction is also transmitted as output.
In one embodiment of a method for online prediction of user satisfaction with an item, the dataset is an item feedback matrix and the method further comprises i) receiving level of satisfaction data associated with a group of individuals, wherein the level of satisfaction data indicates levels of satisfaction of the individuals with the at least one item preference; ii) accessing pangenetic data associated with the individuals; iii) determining correlations between the levels of satisfaction of the individuals and the pangenetic data associated with the individuals; and iv) storing the correlations between the levels of satisfaction and the pangenetic data to generate an item feedback matrix.
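One way the generation of an item feedback matrix might be sketched is shown below. The disclosure does not fix a particular statistic, so this example substitutes a simple within-group frequency cutoff for a true statistical association test; the `min_fraction` parameter, the record layout, and the attribute labels are all hypothetical.

```python
from collections import Counter, defaultdict

def build_feedback_matrix(records, min_fraction=0.75):
    """Group individuals by reported satisfaction level and keep, for each
    level, the attributes carried by at least min_fraction of that group.
    The frequency cutoff is a stand-in for a statistical association."""
    by_level = defaultdict(list)
    for attrs, level in records:
        by_level[level].append(attrs)
    matrix = {}
    for level, groups in by_level.items():
        counts = Counter(a for attrs in groups for a in attrs)
        matrix[level] = {
            a for a, n in counts.items() if n / len(groups) >= min_fraction
        }
    return matrix

# Hypothetical feedback: (pangenetic attributes of an individual, level reported).
records = [
    ({"a", "b"}, "high"),
    ({"a", "c"}, "high"),
    ({"d"}, "low"),
]
matrix = build_feedback_matrix(records)
```

Only attribute `a` is carried by every individual reporting `high`, so it is the only attribute stored against that level.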
In one embodiment of a method for online prediction of user satisfaction with an item, accessing of the pangenetic data associated with the user is in accordance with an applied data mask and the method further comprises i) transmitting an authorization request for access to the pangenetic data associated with the user; ii) receiving an authorization which grants access to the pangenetic data; iii) accessing a data mask, wherein the data mask's parameters are associated with the authorization; and iv) applying the data mask to the pangenetic data.
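The masking steps above can be sketched, under stated assumptions, as a filter over a user's profile. The mask is modeled here as the set of attribute identifiers that a given authorization exposes; that representation, the `MASKS_BY_AUTHORIZATION` table, and the identifiers are hypothetical.

```python
# Hypothetical table associating each authorization with its mask parameters.
MASKS_BY_AUTHORIZATION = {
    "auth-shopping": {"rs1", "rs2"},
}

def apply_data_mask(profile, mask):
    """Return only the portion of the profile that the mask exposes."""
    return {locus: value for locus, value in profile.items() if locus in mask}

def access_pangenetic_data(profile, authorization_id):
    """Look up the mask tied to the authorization and apply it to the profile."""
    mask = MASKS_BY_AUTHORIZATION.get(authorization_id)
    if mask is None:
        raise PermissionError("no authorization on file for this request")
    return apply_data_mask(profile, mask)

profile = {"rs1": "A", "rs2": "G", "rs3": "T"}
visible = access_pangenetic_data(profile, "auth-shopping")
```

In this sketch the requester sees `rs1` and `rs2` only, and a request without a recorded authorization is refused rather than served unmasked data.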
In one embodiment, a program storage device is provided that is readable by a machine and contains a set of instructions which, when read by the machine, causes execution of a computer based method for online prediction of user satisfaction with an item, wherein the method comprises i) receiving at least one item preference associated with a user; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing one or more levels of satisfaction associated with the at least one item preference, wherein pangenetic data are correlated with the one or more levels of satisfaction; iv) determining for each level of satisfaction, the quantity of matches between the pangenetic data correlated with that level of satisfaction and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each level of satisfaction, a level of satisfaction the user is predicted to experience with respect to the at least one item preference.
In one embodiment, a computer database system for online prediction of user satisfaction with an item comprises 1) a memory containing a first data structure containing pangenetic data associated with the user, and a second data structure containing one or more levels of satisfaction associated with at least one item preference associated with the user, wherein pangenetic data are correlated with the one or more levels of satisfaction; and 2) a processor for i) receiving the at least one item preference associated with the user; ii) accessing the first data structure; iii) accessing the second data structure; iv) determining for each level of satisfaction, the quantity of matches between the pangenetic data correlated with that level of satisfaction and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each level of satisfaction, a level of satisfaction the user is predicted to experience with respect to the at least one item preference.
In one embodiment, a method for pangenetic web based prediction of user behavior comprises i) receiving at least one item preference of a user; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing one or more non-pangenetic attributes associated with the at least one item preference of the user, wherein pangenetic data are correlated with the one or more non-pangenetic attributes and each non-pangenetic attribute indicates a user behavior; iv) determining for each non-pangenetic attribute, the quantity of matches between the pangenetic data correlated with that non-pangenetic attribute and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each non-pangenetic attribute, at least one non-pangenetic attribute to indicate at least one behavior predicted for the user. The transmission can be to any of several destinations including the user, one or more other users, a database, a dataset, a computer readable memory, a computer readable medium, a computer processor, a computer network, a printout device, a visual display, and a wireless receiver. In one embodiment, the at least one non-pangenetic attribute transmitted as output is used for a task selected from the group consisting of selecting data for retrieval, selecting data for visual display, selecting the locations of data in a visual display, formulating an internet search query, and selecting web based items for recommendation to a user.
In one embodiment of a method for pangenetic web based prediction of user behavior, the non-pangenetic attribute having the largest quantity of pangenetic matches with the user is the at least one non-pangenetic attribute transmitted as output. In one embodiment, the method further comprises computing a score for each non-pangenetic attribute using a quantitative similarity measure that processes the quantity of matches, and selecting the non-pangenetic attribute having the highest score for transmission as output.
In one embodiment of a method for pangenetic web based prediction of user behavior, the pangenetic data correlated with the one or more non-pangenetic attributes are pangenetic metadata. In one embodiment, the pangenetic data associated with the user constitutes a pangenetic profile of the user. In one embodiment, the pangenetic data correlated with the one or more non-pangenetic attributes are combinations of pangenetic data selected from pangenetic profiles associated with a group of individuals. In one embodiment, the method further comprises receiving one or more non-pangenetic attributes associated with the user, wherein the one or more non-pangenetic attributes associated with the user match one or more non-pangenetic attributes associated with the group of individuals. In one embodiment, the quantity of matches determined for each non-pangenetic attribute is used to compute a pangenetic similarity value for each non-pangenetic attribute, wherein non-pangenetic attributes having pangenetic similarity values meeting a predetermined threshold value are transmitted as output. In one embodiment, the method further comprises transmitting an authorization request for access to the pangenetic data associated with the user, and receiving an authorization granting access to the pangenetic data associated with the user. In one embodiment, the accessing of pangenetic data associated with the user is performed in accordance with an applied data mask, wherein the method further comprises i) transmitting an authorization request for access to the pangenetic data associated with the user; ii) receiving an authorization which grants access to the pangenetic data; iii) accessing a data mask, wherein the data mask's parameters are associated with the authorization; and iv) applying the data mask to the pangenetic data.
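The threshold-based embodiment above might be sketched as follows. The similarity value used here (matches divided by the number of attributes correlated with the behavior), the threshold of 0.5, and the behavior names are assumptions chosen only to make the example concrete.

```python
def behaviors_meeting_threshold(user_attrs, behavior_attrs, threshold=0.5):
    """Compute a pangenetic similarity value per behavior from the quantity of
    matches and keep the behaviors whose value meets the threshold."""
    selected = []
    for behavior, attrs in behavior_attrs.items():
        similarity = len(user_attrs & attrs) / len(attrs) if attrs else 0.0
        if similarity >= threshold:
            selected.append((behavior, similarity))
    # Highest similarity first; ties broken by name (an assumption).
    return sorted(selected, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical dataset: behavior (non-pangenetic attribute) -> correlated attributes.
behavior_attrs = {
    "reads_reviews": {"a", "b"},
    "buys_in_bulk": {"a", "x", "y", "z"},
    "shares_links": {"b", "c", "q"},
}
predicted_behaviors = behaviors_meeting_threshold({"a", "b", "c"}, behavior_attrs)
```

With these inputs `reads_reviews` and `shares_links` meet the threshold, while `buys_in_bulk` (one match out of four) does not.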
In one embodiment of a method for pangenetic web based prediction of user behavior, the associations between the pangenetic data and the one or more non-pangenetic attributes contained in the dataset are previously determined based on statistical associations between non-pangenetic attributes and pangenetic data associated with a group of individuals. In one embodiment, the correlations between the pangenetic data and the one or more non-pangenetic attributes contained in the dataset are determined by the results of computing statistical associations which indicate the strength of association between non-pangenetic attributes and pangenetic data associated with a group of individuals. In one embodiment, the correlations between the pangenetic data and the one or more non-pangenetic attributes contained in the dataset comprise statistical associations indicating level of certainty, and a level of certainty that the user will exhibit the predicted behavior is also transmitted as output.
In one embodiment of a method for pangenetic web based prediction of user behavior, the dataset is an item feedback matrix and the method further comprises i) receiving non-pangenetic attribute data associated with a group of individuals, wherein the non-pangenetic attribute data indicate behaviors of the individuals with respect to the at least one item preference; ii) accessing pangenetic data associated with the individuals; iii) determining correlations between the non-pangenetic attribute data and the pangenetic data associated with the individuals; and iv) storing the correlations between the non-pangenetic attribute data and the pangenetic data to generate an item feedback matrix.
In one embodiment, a program storage device is provided that is readable by a machine and contains a set of instructions which, when read by the machine, causes execution of a computer based method for predicting user behavior, wherein the method comprises i) receiving at least one item preference of a user; ii) accessing pangenetic data associated with the user; iii) accessing a dataset containing one or more non-pangenetic attributes associated with the at least one item preference of the user, wherein pangenetic data are correlated with the one or more non-pangenetic attributes and each non-pangenetic attribute indicates a user behavior; iv) determining for each non-pangenetic attribute, the quantity of matches between the pangenetic data correlated with that non-pangenetic attribute and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each non-pangenetic attribute, at least one non-pangenetic attribute to indicate at least one behavior predicted for the user.
In one embodiment, a computer database system for predicting user behavior comprises 1) a memory containing a first data structure containing pangenetic data associated with a user, and a second data structure containing one or more non-pangenetic attributes associated with at least one item preference of the user, wherein pangenetic data are correlated with the one or more non-pangenetic attributes and each non-pangenetic attribute indicates a user behavior; and 2) a processor for i) receiving the at least one item preference associated with the user; ii) accessing the first data structure; iii) accessing the second data structure; iv) determining for each non-pangenetic attribute, the quantity of matches between the pangenetic data correlated with that non-pangenetic attribute and the pangenetic data associated with the user; and v) transmitting as output, based on the quantity of matches determined for each non-pangenetic attribute, at least one non-pangenetic attribute to indicate at least one behavior predicted for the user.
Mobile devices (i.e., wireless computing and communications devices) can be utilized advantageously by consumers and other users for web based pangenetic data transactions because such devices allow users to immediately request access to pangenetic information, authenticate themselves on the system, approve access to the pangenetic information, and receive transmitted authorizations, approvals or denials with respect to, for example, selection of and payment for various products and services. However, use of mobile devices places additional requirements on the system due to security concerns and memory limitations.
In terms of security and authentication, the mobile device may use any number of encryption and authentication techniques including but not limited to Wired Equivalent Privacy (WEP) encryption, Wi-Fi Protected Access (WPA), Temporal Key Integrity Protocol (TKIP), Lightweight Extensible Authentication Protocol (LEAP), Remote Authentication Dial In User Service (RADIUS), and WLAN Authentication and Privacy Infrastructure. In addition, the mobile devices may use one or more physical types of security including but not limited to smart cards and/or USB tokens. Software tokens may also be used as a form of security.
Additionally, with respect to authentication, the mobile device may base authentication on simple password based authentication, biometric identification (e.g., fingerprint recognition or retinal scan) or combinations thereof. Additionally, hardware type solutions may be used in which smart cards, identification chips, or other devices personally associated with the user are utilized in part or wholly for identification and/or authentication. The authorization interface in the mobile device provides the appropriate combination of authentication protocols and procedures to ensure that only an authorized individual is authenticated.
In addition to the secure connections, which may be established between the wireless devices and access nodes, pangenetic servers or web service provider servers, Virtual Private Networks (VPNs) can be used to establish secure end-to-end connections between devices. In one embodiment, wireless security is utilized to establish a secure connection to a server, and a VPN is subsequently established to ensure secure transmission along the entire data path. Similarly, a VPN may be established between the user mobile device and a web server, and a VPN may be established between the web server and a pangenetic data server.
In order to minimize data storage requirements at the mobile devices as well as to limit the amount of pangenetic data that is exposed to the wireless link, in one embodiment little or no pangenetic data is transmitted to the mobile units, but rather is transferred, after appropriate masking, from the pangenetic database server to the web server. In a further embodiment, a second “wireless mask” is utilized to allow the transmission of small amounts of critical pangenetic data to a mobile device. In one embodiment, key segments of the pangenetic information can be viewed through an appropriate presentation or Graphical User Interface (GUI). For example, a consumer or their physician may be seeking web based treatment information for a particular ailment and want to know the overlap of key pangenetic data with other individuals having the ailment. In one embodiment, a comparison of a large amount of masked pangenetic data is performed and used by a web search system to determine the appropriateness of web based information and/or item offerings for a consumer. The consumer may then receive, on their wireless device, a transmission of the key overlapping pangenetic attributes that represent the particular pangenetic attributes shared in common between the consumer making the inquiry (i.e., query, or request) and other consumers who found the information or item offers to be satisfactory. In one embodiment, a second wireless mask is used to reduce the amount of data transmitted. In an alternate embodiment, a mathematical or statistical method is used to determine what subset of pangenetic data should be transmitted to the mobile units. The above functionalities also apply to non-medical applications of the system.
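A minimal sketch of the wireless-mask idea above follows: the comparison over the larger masked data is performed server-side, and only the shared attributes permitted by a second mask, capped at a small count, are sent to the mobile device. The function name, the `max_items` cap, and the attribute labels are hypothetical.

```python
def overlap_for_mobile(user_attrs, satisfied_profiles, wireless_mask, max_items=5):
    """Find the attributes the inquiring consumer shares with every consumer
    who found the offering satisfactory, then keep only those attributes the
    wireless mask permits, limited to max_items entries."""
    shared = set(user_attrs)
    for profile in satisfied_profiles:
        shared &= profile
    return sorted(shared & wireless_mask)[:max_items]

# Hypothetical server-side data; only `to_send` would cross the wireless link.
to_send = overlap_for_mobile(
    user_attrs={"a", "b", "c"},
    satisfied_profiles=[{"a", "b", "z"}, {"a", "b", "c"}],
    wireless_mask={"a", "c"},
)
```

The consumer shares `a` and `b` with both satisfied consumers, and the wireless mask then reduces the transmitted data to `a` alone.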
In one embodiment, a mobile computing device for providing internet search results to a user comprises 1) a transmitter for sending, to a second computing device via a network, one or more transmissions of non-pangenetic data associated with a user query and an authorization granting access to pangenetic data associated with the user, whereupon receipt of the one or more transmissions via the network causes the second computing device to execute steps of i) accessing the pangenetic data associated with the user; ii) accessing a dataset (e.g., an item feedback matrix) containing pangenetic data and non-pangenetic data correlated with web items; iii) determining for each web item, the quantity of non-pangenetic matches between the non-pangenetic data correlated with that web item and the non-pangenetic data associated with the user query and the quantity of pangenetic matches between the pangenetic data correlated with that web item and the pangenetic data associated with the user; and 2) a receiver for receiving from the network, based on the quantity of non-pangenetic matches and the quantity of pangenetic matches determined for each web item, output comprising a listing of at least a portion of the web items as internet search results for the user.
In one embodiment, a mobile computing device for online recommendation of items for a user comprises 1) a transmitter for sending, to a second computing device via a network, one or more transmissions of at least one item preference associated with the user and an authorization granting access to pangenetic data associated with the user, whereupon receipt of the one or more transmissions via the network causes the second computing device to execute steps of i) accessing the pangenetic data associated with the user; ii) accessing a dataset (e.g., an item feedback matrix) containing item preferences of individuals who also share the at least one item preference associated with the user, wherein pangenetic data of the individuals are correlated with the item preferences; and iii) determining for each item preference, the quantity of matches between the pangenetic data correlated with that item preference and the pangenetic data associated with the user; and 2) a receiver for receiving from the network, based on the quantity of matches determined for each item preference, output comprising a listing of at least a portion of the item preferences to indicate recommended items for the user.
In one embodiment, a mobile computing device for online prediction of user satisfaction with an item comprises 1) a transmitter for sending, to a second computing device via a network, one or more transmissions of at least one item preference associated with a user and an authorization granting access to pangenetic data associated with the user, whereupon receipt of the one or more transmissions via the network causes the second computing device to execute steps of i) accessing pangenetic data associated with the user; ii) accessing a dataset (e.g., an item feedback matrix) containing one or more levels of satisfaction correlated with the at least one item preference, wherein pangenetic data are associated with the one or more levels of satisfaction; and iii) determining for each level of satisfaction, the quantity of matches between the pangenetic data associated with that level of satisfaction and the pangenetic data associated with the user; and 2) a receiver for receiving from the network, based on the quantity of matches determined for each level of satisfaction, output indicating a level of satisfaction the user is predicted to experience with respect to the at least one item preference.
In one or more of the embodiments of a mobile computing device as disclosed above, the receiver of the mobile computing device is also for receiving, from the second computing device via the network, an authorization request for access to the pangenetic data associated with the user, and wherein the transmitter of the mobile computing device is also for sending, to the second computing device via the network, an authorization granting access to the pangenetic data associated with the user.
It will be appreciated by one of skill in the art that the present methods, systems, software and databases can be implemented on a number of computing platforms, and that
As illustrated in an embodiment depicted by
The methods, systems, software and databases described herein can also be implemented on one or more specialized computing platforms, those platforms having been customized to provide the capabilities described herein. The specialized computing platforms may have specialized operating systems, database tools, graphical user interfaces, communications facilities and other customized hardware and/or software which allow use for the specific application which could not be run on a general purpose computing platform.
Although the systems and methods described herein are frequently described in reference to one or more computers owned and operated by the actors in the system (e.g., users, a pangenetic database administrator), the determination of web search results, item recommendations and user related predictions can be achieved through use of distributed computing systems or cloud computing, wherein the actor requests an action through an interface (typically a webpage) and the determination is made using computing resources at one or more server farms, those resources obtaining the appropriate information (pangenetic data, non-pangenetic data) from a variety of sources, and combining that information to make the required calculations and determinations. When using a cloud computing system, the subsequent calculations may be performed at alternate locations.
Pangenetic information may be stored in a number of formats, on a variety of media, and in a centralized or distributed manner. In one embodiment, the data is stored in one location with a label associating that data with a particular user, and one or more indices marking or identifying segments of pangenetic data. In an alternate embodiment, the pangenetic data is stored at a plurality of locations with one or more identifiers or labels associating that information with a particular user. In this embodiment, secure communications protocols can be used to allow the system to access all necessary portions of the data and to compile the data in a way that allows the determination of correspondences and applicability to be made. For example, a website or web application may be authorized to compile certain segments of genetic or epigenetic sequences stored in one location with demographic or lifestyle information stored in another location to determine web items or recommendations that are most appropriate for a particular user. By collecting the relevant information from a plurality of sources, the system is able to construct an appropriate file for making the determination. In one embodiment, the datasets of the methods of the present invention may be combined into a single dataset. In another embodiment the datasets may be kept separated. Separate datasets may be stored on a single computing device or distributed across a plurality of devices. As such, a memory for storing such datasets, while referred to as a singular memory, may in reality be a distributed memory comprising a plurality of separate physical or virtual memory locations distributed over a plurality of devices such as over a computer network. Data, datasets, databases, methods and software of the present invention can be embodied on a computer-readable media (medium), computer-readable memory (including computer readable memory devices), and program storage devices readable by a machine.
In one embodiment, at least a portion of the data for one or more individuals is obtained from medical records, such as a Personal Health Record (PHR), Electronic Health Record (EHR) or Electronic Medical Record (EMR). In one embodiment, at least a portion of the data for one or more individuals is accessed, retrieved or obtained (directly or indirectly) from a centralized medical records database. In one embodiment, at least a portion of the data for one or more individuals is accessed or retrieved from a centralized medical records database over a computer network.
A number of interfaces can be used to support access by different users and other parties, including computer systems, requiring access to the system. In one embodiment an interface is presented over the web, using protocols such as http and https in combination with Hypertext Markup Language (HTML), Java, and other programming and data description/presentation tools which allow information to be presented to and received from the user or users. The interface may contain a number of active elements such as applets or other code which actively constructs display elements and which prompts the user for specific information and which actively creates queries or formulates or formats results for presentation, transmission (e.g., downloading), or storage. In one embodiment the interface allows users to sort data such that products, services and providers can be listed by a particular parameter or sets of parameters. For example, in one embodiment the user can request a presentation of most appropriate (highly matched) web items which are sub-ranked according to appropriateness for the age and/or gender of the user. In an alternate embodiment, a graphical presentation (map) is presented which indicates the most appropriate web items by color or icon. The interface can allow authorized queries to the different databases in the system, and within the constraints of the authorizations and permissions, make the determinations of applicability (appropriateness) of web items based on the pangenetic data of the user. In one embodiment, the user interface at one location (e.g., subscriber location) works in conjunction with a user interface in another location (e.g., medical provider, healthcare provider) to allow pangenetic data to be accessed for making a determination of appropriateness of web based information or product/service offerings.
The embodiments of the present invention may be implemented with any combination of hardware and software. If implemented as a computer-implemented apparatus, the present invention is implemented using means for performing all of the steps and functions disclosed above.
The embodiments of the present invention can be included in an article of manufacture (e.g., one or more computer program products) having, for instance, computer useable (i.e., readable) media. The media has embodied therein, for instance, computer readable program code means for providing and facilitating the mechanisms of the present invention. The article of manufacture can be included as part of a computer system or sold separately.
While specific embodiments have been described in detail in the foregoing detailed description and illustrated in the accompanying drawings, it will be appreciated by those skilled in the art that various modifications and alternatives to those details could be developed in light of the overall teachings of the disclosure and the broad inventive concepts thereof. It is understood, therefore, that the scope of the present invention is not limited to the particular examples and implementations disclosed herein, but is intended to cover modifications within the spirit and scope thereof as defined by the appended claims and any and all equivalents thereof.
Claims
1. A computer-implemented method comprising:
- obtaining genetic attributes of a plurality of individuals;
- based on the genetic attributes, presenting, by way of a computer-mediated interface, a questionnaire to each of the individuals, wherein the questionnaire relates to health-related content;
- receiving, by way of the computer-mediated interface, respective responses corresponding to the questionnaire from the individuals;
- storing, in a database structure, associations between the genetic attributes of the individuals and their respective responses corresponding to the questionnaire;
- determining, from the associations, correlations between the health-related content and patterns within the genetic attributes; and
- based on the correlations, making recommendations regarding the health-related content to one or more of the plurality of individuals.
2. The computer-implemented method of claim 1, wherein the computer-mediated interface is a web-based interface, and wherein the questionnaire is presented as web content items.
3. The computer-implemented method of claim 1, wherein making recommendations regarding the health-related content to one or more of the plurality of individuals comprises:
- accessing genetic attributes of a further individual;
- determining, from the correlations, aspects of the health-related content relevant to the further individual; and
- providing, to the further individual, the aspects of the health-related content relevant to the further individual.
4. The computer-implemented method of claim 3, wherein the further individual has opted-in to using the genetic attributes to determine the aspects of the health-related content relevant to the further individual.
5. The computer-implemented method of claim 3, further comprising:
- receiving, from the further individual, one or more ratings of the aspects of the health-related content relevant to the further individual.
6. The computer-implemented method of claim 1, further comprising:
- receiving behavioral attributes of the plurality of individuals;
- storing, in a further database structure, further associations between the genetic attributes of the individuals and their respective behavioral attributes; and
- determining, from the further associations, further correlations between the respective behavioral attributes and patterns within the genetic attributes, wherein making the recommendations is also based on the further correlations.
7. The computer-implemented method of claim 1, further comprising:
- receiving behavioral attributes of the plurality of individuals;
- storing, in a further database structure, further associations between the behavioral attributes of the individuals and their respective responses corresponding to the questionnaire; and
- determining, from the further associations, further correlations between the respective behavioral attributes and the respective responses corresponding to the questionnaire, wherein making the recommendations is also based on the further correlations.
8. The computer-implemented method of claim 1, wherein the genetic attributes comprise one or more single nucleotide polymorphisms (SNPs).
9. The computer-implemented method of claim 1, wherein one or more of the genetic attributes are stored in a masked fashion to prevent access by unauthorized parties.
10. The computer-implemented method of claim 1, wherein at least some of the respective responses corresponding to the questionnaire from the individuals are from an ordered scale of two or more values.
11. A non-transitory computer-readable medium storing program instructions that, when executed by one or more processors of a computing system, cause the computing system to perform operations comprising:
- obtaining genetic attributes of a plurality of individuals;
- based on the genetic attributes, presenting, by way of a computer-mediated interface, a questionnaire to each of the individuals, wherein the questionnaire relates to health-related content;
- receiving, by way of the computer-mediated interface, respective responses corresponding to the questionnaire from the individuals;
- storing, in a database structure, associations between the genetic attributes of the individuals and their respective responses corresponding to the questionnaire;
- determining, from the associations, correlations between the health-related content and patterns within the genetic attributes; and
- based on the correlations, making recommendations regarding the health-related content to one or more of the plurality of individuals.
12. The non-transitory computer-readable medium of claim 11, wherein making recommendations regarding the health-related content to one or more of the plurality of individuals comprises:
- accessing genetic attributes of a further individual;
- determining, from the correlations, aspects of the health-related content relevant to the further individual; and
- providing, to the further individual, the aspects of the health-related content relevant to the further individual.
13. The non-transitory computer-readable medium of claim 12, wherein the further individual has opted-in to using the genetic attributes to determine the aspects of the health-related content relevant to the further individual.
14. The non-transitory computer-readable medium of claim 13, the operations further comprising:
- receiving, from the further individual, one or more ratings of the aspects of the health-related content relevant to the further individual.
15. The non-transitory computer-readable medium of claim 11, the operations further comprising:
- receiving behavioral attributes of the plurality of individuals;
- storing, in a further database structure, further associations between the genetic attributes of the individuals and their respective behavioral attributes; and
- determining, from the further associations, further correlations between the respective behavioral attributes and patterns within the genetic attributes, wherein making the recommendations is also based on the further correlations.
16. The non-transitory computer-readable medium of claim 11, the operations further comprising:
- receiving behavioral attributes of the plurality of individuals;
- storing, in a further database structure, further associations between the behavioral attributes of the individuals and their respective responses corresponding to the questionnaire; and
- determining, from the further associations, further correlations between the respective behavioral attributes and the respective responses corresponding to the questionnaire, wherein making the recommendations is also based on the further correlations.
17. The non-transitory computer-readable medium of claim 11, wherein the genetic attributes comprise one or more single nucleotide polymorphisms (SNPs).
18. The non-transitory computer-readable medium of claim 11, wherein one or more of the genetic attributes are stored in a masked fashion to prevent access by unauthorized parties.
19. The non-transitory computer-readable medium of claim 11, wherein at least some of the respective responses corresponding to the questionnaire from the individuals are from an ordered scale of two or more values.
20. A computing system comprising:
- one or more processors;
- memory; and
- program instructions, stored in the memory, that upon execution by the one or more processors cause the computing system to perform operations comprising: obtaining genetic attributes of a plurality of individuals; based on the genetic attributes, presenting, by way of a computer-mediated interface, a questionnaire to each of the individuals, wherein the questionnaire relates to health-related content; receiving, by way of the computer-mediated interface, respective responses corresponding to the questionnaire from the individuals; storing, in a database structure, associations between the genetic attributes of the individuals and their respective responses corresponding to the questionnaire; determining, from the associations, correlations between the health-related content and patterns within the genetic attributes; and based on the correlations, making recommendations regarding the health-related content to one or more of the plurality of individuals.
Type: Application
Filed: Nov 3, 2022
Publication Date: Mar 2, 2023
Inventors: Andrew Alexander Kenedy (Sugar Land, TX), Charles Anthony Eldering (Furlong, PA)
Application Number: 17/980,024