SYSTEM AND METHOD FOR DETERMINING PREFERENCES FROM INFORMATION MASHUPS

- IBM

A system and method for determining preferences from information mashups and, in particular, for determining preferences from cross-modality information based on a social welfare function is disclosed. An exemplary embodiment of the invention uses a social welfare function (SWF) to identify a vote computing method from among a group of vote computing methods. The SWF embodies subjective values, e.g. business objectives. The embodiment uses the SWF to identify the vote computing method that combines cross-modality information into a single information mashup in a manner that is most congruent with the subjective values relative to the other vote computing methods. The information mashup may be in the form of a single, merged ranked list.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of and incorporates by reference in its entirety U.S. provisional application No. 61/041,128, which was filed on Mar. 31, 2008.

FIELD OF INVENTION

The present invention relates to information mashups, and in particular to a system and method for determining preferences from cross-modality information mashups.

BACKGROUND

Through the advances of technology, today's world has become inundated with information. One continuing technological and societal challenge is finding methods and systems to extract and combine useful data, knowledge, and understanding from a pool of information that is constantly growing in quantity and increasing in granularity.

Even when we narrow our analysis to one domain of interest, e.g. ranking wines, how do we combine all the information indicating preferences within the domain when the information is available from multiple sources and the sources differ in modality? For example, how do we combine multiple lists of preferences, e.g., from different online communities, sales numbers from different stores, etc.? How do we combine the information in a manner that will reveal the aspects of that information that are important, valuable, or significant to an entity (e.g., a machine, business, customer, end-user, etc.) requesting the results? And how do we enable tuning of the outcome, e.g., at the touch of a button, to target certain characteristics and elevate those characteristics to the forefront?

SUMMARY OF THE INVENTION

A computer-implemented method for determining preferences from cross-modality information mashups is provided. The method includes receiving a social welfare function (SWF) and identifying two or more vote computing methods. For each of the two or more vote computing methods, the method uses the vote computing method to combine information on preferences into a combined list ranking the preferences. The information is from a set of two or more sources. The set is heterogeneous in modality. For each combined list, the method inputs the combined list into the SWF to compute a score. The method outputs the combined list of the vote computing method associated with the highest score. The set of two or more sources may include data from websites indicating preferences within a certain domain of interest. The information from the set of two or more sources may include structured data from a first source and unstructured data from a second source. The number of preferences being ranked may be at least an order of magnitude more in number than the number of sources.

A computer program product for determining preferences from cross-modality information is also provided. The computer program product includes a computer readable medium and program instructions. The program instructions include first program instructions to identify two or more vote computing methods and second program instructions to, for each of the two or more vote computing methods, use the vote computing method to combine information on preferences into a combined list ranking the preferences. The information is from a set of two or more sources. The set is heterogeneous in modality. The program instructions further include third program instructions to compute a score, and fourth program instructions to output the vote computing method associated with the highest score. The program instructions may also include fifth program instructions to output the combined list of the vote computing method associated with the highest score. The two or more sources may include an online blog, an online forum, and/or an online social networking website. The social welfare function may be selected from the group consisting of: Bergson-Samuelson, Precision Optimal Aggregation, and Spearman Footrule.

A system for determining preferences from cross-modality information is further provided. The system includes a communications interface, memory storing computer usable program code; and a processor coupled to the communications interface to receive information on preferences from an external device and coupled to the memory to execute the computer usable program code stored on the memory. The computer usable program code includes computer usable program code configured to identify two or more vote computing methods; computer usable program code configured to, for each of the two or more vote computing methods, use the vote computing method to combine the information on preferences into a combined list ranking the preferences, wherein the information is from a set of two or more sources, and wherein the set is heterogeneous in modality; computer usable program code configured to, for each combined list, input the combined list into a social welfare function to compute a score; and computer usable program code configured to identify the vote computing method associated with the highest score. The computer usable program code may further include computer usable program code configured to output the combined list of the vote computing method associated with the highest score.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows a method of combining data on preferences according to one embodiment of the present invention.

FIGS. 2A-2D show sample tables of the top-10 artists resulting from merging preference information from various sources using different vote computing methods.

FIG. 3 shows a table of four top-10 lists.

FIG. 4 shows a flow diagram of a method for combining data on preferences in accordance with an embodiment of the invention.

FIG. 5 shows a flow diagram of another method for combining data on preferences in accordance with an embodiment of the invention.

FIG. 6 is a high level block diagram showing an information processing system useful for implementing one embodiment of the present invention.

FIG. 7 represents an exemplary distributed data processing system in which aspects of the illustrative embodiments may be implemented.

DESCRIPTION

Overview

The present invention provides a system and method for determining preferences from information mashups. An information mashup combines or mixes information or data from a multitude of often-conflicting sources into a single representation. For example, for any given domain of interest, opinions can be expressed in many places and collected by many sources. Online sources for people's opinion on a wide range of topics include, for example, blogs, discussion forums and social networking sites. Embodiments of the invention combine information gathered from across different sources, including in one application various online sources, to form a unified, focused view of a community's interests regarding that domain.

An exemplary embodiment of the invention determines preferences from cross-modality information mashups. In more traditional information integration scenarios, systems compare things with identical modalities, such as number of sales from different sources. However, there are many domains of interest (e.g., patient preferences, drugs for certain medical conditions, cars, wine, financial products (stocks, bonds, etc.), consumer goods, cameras, computers, books, etc.) where information is available from many different modalities (e.g., comments, passive listens, sales, hits on a website, creation of new website, views on television, etc.). In the domain of books, the following information may be available for determining book preferences: book sales and returns, lists of books read, library checkouts, comments on books read (e.g., online, in newspapers, in magazines, on television or radio), etc. An exemplary embodiment of the invention determines preferences from information mashups constructed from information and data from a set of sources heterogeneous in modality. For example, say we want to combine different on-line data to generate a list of wines. One source of preferences may be generated from sales numbers of wines. Another source may be a list generated from wine tasters. Yet another may be generated by professionals at a wine magazine. Yet another may be generated from counts of comments users post on a wine aficionado site. There are many more sales of wines than posts on a website. Many people buy wines, whereas composing a review takes more time and may indicate more interest in a particular vintage. Ultimately, a good cross-modality mashup combines these multiple sources, which all indicate interest in the same underlying subject matter, without allowing one source to unduly influence the combined/consensus list.

Yet, how can one combine the data from the various sources when they are heterogeneous in modality? Comparing different modalities is akin to comparing apples and oranges. How does one determine overall rankings for certain wines, for example, based on the combination of data on sales, written reviews, returns, website polls, etc.? How do you combine data indicating that the reviewer loves a certain wine (glowing reviews), but the public hates it (e.g., by ranking it low on wine.com or low sales)? Do we decide that ten times as many posts on a website reflect ten times as much interest in an event or item? There is a fair amount of subjectivity in how these combinations occur and it is not typically clear how to combine all these sources.

In systems that compare things with identical modalities, using a plurality type voting system makes sense. Plurality type voting systems are those that add together the number of votes from each source and simply adjudicate the winner based on whomever or whichever candidate has the most votes. Plurality type voting systems include systems in which votes are weighted. However, plurality type voting systems have deficiencies when combining information gathered from multiple sources with differing modalities. This can occur, for example, when there are large differences in the numbers returned by sources or when the values measured to derive those numbers indicate very different things.
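The deficiency described above can be illustrated with a minimal sketch. The item names and counts are invented for illustration, and the helper name total_votes is hypothetical. Because sales counts are orders of magnitude larger than review counts, a simple summation reproduces the sales ranking and the review source has no visible effect on the outcome.

```python
from collections import Counter

# Hypothetical raw counts per source; all numbers are invented for illustration.
sales = {"Wine A": 12000, "Wine B": 9000, "Wine C": 300}   # purchases
reviews = {"Wine A": 2, "Wine B": 5, "Wine C": 40}          # written reviews

def total_votes(*sources):
    """Plurality-style merge: add raw counts across sources, rank by total."""
    totals = Counter()
    for source in sources:
        totals.update(source)          # Counter.update adds counts per key
    return [item for item, _ in totals.most_common()]

merged = total_votes(sales, reviews)
print(merged)                 # same order as the sales source alone
print(total_votes(reviews))   # the review source alone prefers the opposite order
```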

To identify which of a multitude of combination techniques (including plurality type voting techniques) is optimal for combining data from various sources in a certain instance, embodiments of the invention use a construct known as a social welfare function (SWF). An SWF is a mapping from allocations of goods or rights among people to real numbers. The SWF construct was a tool introduced by Abram Bergson in 1938. The SWF construct allows for the determination of a society's taste for different economic states. There are two features to the SWF construct: first, it imposes a structure and second, it devises a single constitutional/voting system that changes the rankings of the individual into a single society ranking. An SWF might describe, for example, the preferences of an individual over social states, or might describe, as another example, outcomes of an allocation process, whether or not individuals had preferences over those outcomes. Examples of SWFs are the Bergson-Samuelson, Precision Optimal Aggregation, and Spearman Footrule. Thus, using an SWF, a method is supplied for embodying subjectivity, such as that described above, into one function. Using a custom constructed or selected SWF, embodiments of the invention can capture, for example, business goals in a semi-heuristic way, objectively evaluate various preference combination techniques, and identify which of the combination techniques to use in a specific instance.

In one exemplary application, the combination techniques include techniques that originate from vote computing or vote counting systems, such as a Borda count method or the Nauru method. Embodiments of the invention may supplement or modify a vote computing or vote counting technique depending on whether the original information expressing the preferences is, for example, structured or unstructured, numerical or textual, etc.

In one exemplary embodiment, the combination technique used is as described in co-pending U.S. patent application Ser. No. ______ (having attorney docket number ARC920080029US2), and filed on ______. Accordingly, the system and method for determining preferences from information mashups described in detail herein complements the system and method described in the co-pending U.S. patent application Ser. No. ______ (having attorney docket number ARC920080029US2). In one use, the system or method described in detail herein may be used in conjunction with the system or method described in detail in the co-pending application. In another use, the first system and method may be used separately from the latter.

In embodiments of the invention, the SWF takes as input a “final” ranked list generated from each of the various vote counting/computing methods and/or systems, and the preferences of each source. The “final” ranked list may be generated using, for example, weighted voting systems, semi-proportional methods, delegates, Borda Count, inverted rank, run off, round robin, and/or a ranking method described in U.S. patent application Ser. No. ______ (having attorney docket number ARC920080029US2). The SWF outputs a number that indicates how happy or satisfied the “society” of sources is with the results. Thus, in one application, multiple methods of combining are examined and evaluated, and the combining method that returns the highest SWF value is considered the “best” method. That combining method is then established as the combining method that the application will use when determining preferences from future mashups that combine information from those sources for those business purposes, for example. As discussed below, reevaluation of the combining method may be done periodically to optimize the quality of the results.

The present disclosure differs from traditional work in the field in several ways. For example, the disclosure addresses situations where, as noted, people are providing preferences in non-uniform ways (complaints, purchase, opinion posted, time, etc.). In such situations, ad hoc weights don't work well because ad hoc weights can only adjust for the deficiencies that exist at a single point in time. Consider the use of ad hoc weights in combining top-10 lists from Amazon.com and Barnes & Noble in 1995. In that year, Amazon.com ranks would be weighted lower (having less weight in the overall scheme of the analysis) than Barnes & Noble ranks because Amazon.com opened its online store in July 1995. If the top-10 lists were compared today, the weights would differ. Thus, although ad hoc weights are useful when combining lists of preferences at one point in time, they need adjusting each time new data from the sources are recombined to account for, e.g., changes in the market, business cycles, seasons, time of day, new product releases (which could, for example, skew statistics for a few days), blitz marketing campaigns, events (e.g., Olympics® or Super Bowl®), etc. These real world changes have the potential of causing dramatic shifts in the rankings being reported. The ad hoc weight adjustments are time-dependent: if the rankings are calculated at a different point in time, the weights must be reconsidered and retuned. This can be particularly onerous depending on how often the combined rankings are calculated (in real time, daily, weekly, monthly, quarterly, annually, etc.), particularly if the tuning is done without the assistance of any computer-implemented algorithms.

In contrast, embodiments of the invention identify the most appropriate method to combine preferences from sources of different modalities by using a SWF appropriate for predefined objectives (e.g., business requirements). Thus, in analyzing and combining information on preferences, exemplary embodiments take into account, for example, business requirements to a level of granularity that ad hoc weights cannot.

Additionally, embodiments of the invention examine domains with orders of magnitude more “candidates” than “voters”, the reverse of most elections. Conventional voting techniques do not examine scenarios in which the number of “candidates” is orders of magnitude more than the number of “voters.” For example, the Borda function is intended for use in situations when there are a large number of voters and a small number of candidates, such as in a presidential election. Accordingly, embodiments of the invention examine vote computing techniques that are intended for use in scenarios in which the number of items being ranked (or “candidates”) is orders of magnitude more than the number of sources ranking the items (“voters”), the opposite of conventional elections. Thus, such a vote computing technique may combine the information on preferences into a combined list ranking the preferences in an application in which the number of preferences being ranked is at least an order of magnitude more in number than the number of sources.

FEATURES OF EXEMPLARY EMBODIMENT

Exemplary embodiments of the invention determine preferences from cross-modality information mashups based on a constructed or selected social welfare function (SWF). FIG. 1 illustrates a method of combining data on preferences according to one embodiment of the invention. FIG. 1 shows information 1010 from multiple sources (e.g., Source1, Source2, Source3, etc.), a set of vote computing methods 1020, a social welfare function (SWF) 1030, and a set of social welfare function (SWF) scores 1040.

The information 1010 includes information from multiple sources (e.g., Source1, Source2, Source3, etc.). The multiple sources are of varying modalities. Modalities may be expressed as having two major dimensions: intentional versus unintentional, and consuming (passive) versus producing (creative). Intentional activities are those where a user, for example, has had to take steps to “make their mark.” Examples in the online arena would be navigating to a particular page or typing a name into a search bar. Intentional activities are stronger indicators of interest than unintentional activities. Creative, producing activities are, for example, those where the user takes the time to author a post or compose a response. Passive, consuming activities may involve watching or reading something created by someone else. Creative activities, taking more time and attention, indicate more interest than passive activities.

In FIG. 1, each source is either a list of preferences (e.g. a ranked list) or provides data which is converted into a list of preferences. The converting may, for example, process user posts from a social networking site, such as by employing a series of unstructured information management architecture (UIMA) annotators driven off of entity spotting, using Information Extraction (IE) techniques, and/or using natural language mining (NLM) techniques. Depending on the application, an embodiment of the invention may additionally or alternatively request that a source convert the source's data into a list of preferences, instead of converting the data itself. In an example application, Source1 may be sales numbers of wines, Source2 may be an online list generated by wine tasters, and Source3 may be a ranking based analysis of various blogs on wines. The analysis may have employed a series of UIMA annotators. Each source expresses or reflects opinions on the same underlying subject matter, phenomenon, or domain of interest.
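As a minimal sketch of converting a source's data into a list of preferences, the following orders items by a raw count (for example, the number of posts mentioning each item). The function name to_ranked_list, the sample data, and the alphabetical tie-break are assumptions made for illustration; the annotator-based extraction that would produce such counts is not shown.

```python
def to_ranked_list(counts, top_n=10):
    """Order items by descending count and keep the first top_n.

    Ties are broken alphabetically, which is an arbitrary choice made here
    only so that the result is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:top_n]]

# Hypothetical counts of blog posts mentioning each wine.
comment_counts = {"Merlot X": 14, "Rioja Y": 31, "Syrah Z": 14}
ranked = to_ranked_list(comment_counts)
print(ranked)  # ['Rioja Y', 'Merlot X', 'Syrah Z']
```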

The information 1010 is communicated to each of the vote computing methods (e.g., Vote computing method1, Vote computing method2, etc.). In FIG. 1, each vote computing method is a different combining technique. For example, Vote computing method1 may be a Borda count technique, Vote computing method2 may be an inverted ranking technique, Vote computing method3 may be a round robin technique, and Vote computing method4 may be the technique described in U.S. patent application Ser. No. ______ (having attorney docket number ARC920080029US2).
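One of the combining techniques named above, the Borda count, can be sketched as follows. This is a simplified illustration rather than the embodiment's implementation: it assumes each source supplies a ranked list, awards an item in a list of length n a score of n minus its zero-based position, and gives zero points to items a source does not rank. The function name borda_merge and the sample lists are hypothetical.

```python
from collections import defaultdict

def borda_merge(ranked_lists, top_n=10):
    """Merge ranked lists with a Borda-style positional score."""
    scores = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position   # first place earns n points, last earns 1
    # Highest score first; alphabetical tie-break keeps the result deterministic.
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered[:top_n]

sources = [["A", "B", "C"], ["B", "A", "D"], ["B", "C", "A"]]
print(borda_merge(sources))  # ['B', 'A', 'C', 'D']
```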

In FIG. 1, the output of each vote computing method is a “final” ranked list, sometimes referred to as a single, merged list. The final list is provided to the SWF, along with the sources' preferences. From one perspective, the SWF is a mathematical criterion for the success of a voting system based on some desired characteristics. Accordingly, in certain embodiments, the SWF is constructed to capture characteristics which are considered valuable. The value may be determined from a business standpoint if business objectives are driving the undertaking. For example, in determining preferences for wine, it may be considered valuable for each source to see at least ½ of its top 10 list in the overall top 10 list. How well the SWF embodies the subjective values driving the undertaking and incorporates those values into an objective function affects the appropriateness of the actual combined list outputted by embodiments of the invention.

Accordingly, the SWF may be selected or custom constructed to fit the situation. In one embodiment, the SWF is selected from among a set of SWFs, e.g., a set including the Precision Optimal Aggregation SWF (Pswf) and the Spearman Footrule SWF (Sswf). The Pswf measures how many items from each source's top-n list are in the “final” ranked list (the single list which merges ranked items from each source). For example, in one application, the Pswf measures how many artists from each source's top-10 list are in an overall top-10 list created using a Borda count technique. One exemplary embodiment uses a Precision Optimal Aggregation SWF defined as:


$$\mathrm{Pswf} = \sum_{S} \min\left(2\,|T_S \cap T|,\ 10\right),$$

for top-10 lists $T_S$ for each source $S$ and top-10 list $T$ overall.
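The Precision Optimal Aggregation SWF defined above can be sketched as follows: each source earns two points for every item its top-10 list shares with the overall top-10 list, capped at 10 points per source. The function name pswf and the integer item IDs are hypothetical and chosen only for illustration.

```python
def pswf(source_top10s, overall_top10):
    """Sum over sources of min(2 * |T_S intersect T|, 10)."""
    overall = set(overall_top10)
    return sum(min(2 * len(set(ts) & overall), 10) for ts in source_top10s)

# Hypothetical item IDs; the overall list holds items 0..9.
overall = list(range(10))
source_lists = [
    list(range(10)),      # shares 10 items -> capped at 10 points
    list(range(5, 15)),   # shares 5 items  -> 10 points
    list(range(8, 18)),   # shares 2 items  -> 4 points
]
print(pswf(source_lists, overall))  # 24
```

With four sources, as in the example of FIGS. 2A-2D, the maximum value of this function is 40.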

The Spearman Footrule SWF (Sswf) emphasizes preservation of position in the rankings. The Sswf is an approximation of a related SWF, the Kendall tau distance. The Sswf is less computationally intensive (minutes versus days) relative to the Kendall tau distance. One exemplary embodiment uses a Spearman Footrule SWF defined as:


$$\mathrm{Sswf} = \sum_{S} \sum_{a=1}^{10} \max\left(10 - |r_a - r_a^{S}|,\ 0\right).$$
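A minimal sketch of the Spearman Footrule SWF follows. It rests on an interpretation that the text does not spell out: the index a is taken to run over the items of the overall top-10 list, the first rank term is the item's position in the overall list, and the second is its position in the list of source S. An item that a source does not rank is assumed to contribute zero points. The function name sswf and the sample lists are hypothetical.

```python
def sswf(source_top10s, overall_top10):
    """Sum over sources and overall items of max(10 - |rank difference|, 0)."""
    total = 0
    for source in source_top10s:
        # 1-based position of each item within this source's list.
        source_rank = {item: pos for pos, item in enumerate(source, start=1)}
        for overall_pos, item in enumerate(overall_top10, start=1):
            if item in source_rank:   # assumption: unranked items score 0
                total += max(10 - abs(overall_pos - source_rank[item]), 0)
    return total

overall = list("ABCDEFGHIJ")
identical = list(overall)             # every position preserved: 10 * 10 points
reversed_list = list(reversed(overall))
print(sswf([identical], overall))                  # 100
print(sswf([identical, reversed_list], overall))   # 150
```

Under this reading, each source contributes at most 100 points, which matches the per-source maximum stated for the Sswf in connection with FIGS. 2A-2D.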

In use, the SWF takes as input a “final” ranked list and the preferences of each source. The outcome is a score where points are awarded for increased social welfare of a ranking system. In this way, embodiments quantitatively measure the “happiness” of each contributing source with the overall “final” ranking. As shown in FIG. 1, an SWF score is calculated for each vote computing method (e.g., SWF score1, SWF score2, SWF score3, etc.).

FIGS. 2A-2D show sample tables of the top-10 artists resulting from merging preference information from various sources using different vote computing methods. In FIGS. 2A-2D, the results of four combination techniques are shown to illustrate how each technique merges the four top-10 lists shown in FIG. 3. In each of FIGS. 2A-2D, a “final” top-10 computed using the corresponding vote counting technique is shown in the first column. Specifically, in FIG. 2A, a “final” top-10 ranking computed using Total Votes is shown; in FIG. 2B, a “final” top-10 ranking computed using Weighted Votes is shown; in FIG. 2C, a “final” top-10 ranking computed using Semi-Proportional methods is shown; and in FIG. 2D, a “final” top-10 ranking computed using Delegates is shown.

In each of FIGS. 2A-2D, SWF scores computed using two different SWF (Pswf and Sswf) are shown. The Pswf column shows the contribution of each artist to the overall Precision Optimal Aggregation SWF for that source. The Sswf column shows the contribution of each artist to the overall Spearman Footrule SWF for that source. In the columns labeled Pswf and Sswf, from top to bottom in each of those table cells, the bars correspond to the sources in FIG. 3 in this order: Bebo, LastFM, MySpace, and YouTube. The bars represent how “happy” each source is with the artist being ranked at this position.

The graphs in the CS column express the contribution to the combined ranking for the artist from each source. In the columns labeled CS, the bars from left to right in each of those cells correspond to the sources in FIG. 3 in the order: Bebo, LastFM, MySpace, and YouTube. The greater a source's contribution to the combined ranking for the artist, the longer the bar. For example, in FIG. 2A, the rankings shown in the first column were produced by merging the rankings from Bebo, LastFM, MySpace, and YouTube shown in FIG. 3 through simple summation of the votes for each artist. In FIG. 2A, the last bar, corresponding to YouTube, is the longest because YouTube had more data points than the other sources. In this example, the “number of votes” is dominated by YouTube.

The bottom of each table in FIGS. 2A-2D shows the total SWF for Pswf and Sswf expressed as a raw score. For Pswf, each source contributes up to 10 points, for a maximum score of 40 (best). For Sswf, each source contributes up to 100 points, for a maximum of 400. The total influence each source had on the top-10 list is also seen at the bottom of each table in the last row of the CS column.

The examples illustrated by FIGS. 2A-2D can be understood in the context of FIG. 1 as follows. FIGS. 2A-2D illustrate a scenario with four sources: Source1=ranking from Bebo, Source2=ranking from LastFM, Source3=ranking from MySpace, and Source4=ranking from YouTube. FIGS. 2A-2D also illustrate a scenario with four vote counting methods: Vote computing method1=Total Votes (see FIG. 2A), Vote computing method2=Weighted Votes (see FIG. 2B), Vote computing method3=Semi-Proportional (see FIG. 2C), and Vote computing method4=Delegates (see FIG. 2D).

FIGS. 2A-2D also illustrate a scenario when the SWF is a Precision Optimal Aggregation SWF (second column of each table). For each vote counting method, the Precision Optimal Aggregation SWF score is shown at the bottom of the corresponding table: SWF score1=22 (see FIG. 2A), SWF score2=28 (see FIG. 2B), SWF score3=30 (see FIG. 2C), and SWF score4=26 (see FIG. 2D). Table 1 below shows the results converted to percentage based on a maximum score of 40 for the Precision Optimal Aggregation SWF.

TABLE 1
Precision Optimal Aggregation SWF scores

Vote counting method    Raw Precision Optimal Aggregation SWF score    Percentage
Total Votes             22                                             55%
Weighted Votes          28                                             70%
Semi-Proportional       30                                             75%
Delegates               26                                             65%

Accordingly, using the Precision Optimal Aggregation SWF, the Semi-Proportional vote counting method is identified among the four as the combining technique that produces a combined list most congruent with the subjective values embodied by the Precision Optimal Aggregation SWF.

FIGS. 2A-2D also illustrate a scenario when the SWF is a Spearman Footrule SWF (third column of each table). For each vote counting method, the Spearman Footrule SWF score is shown at the bottom of each table: SWF score1=149 (see FIG. 2A), SWF score2=153 (see FIG. 2B), SWF score3=146 (see FIG. 2C), and SWF score4=151 (see FIG. 2D). Table 2 below shows the results converted to percentage based on a maximum score of 400 for the Spearman Footrule SWF.

TABLE 2
Spearman Footrule SWF scores

Vote counting method    Raw Spearman Footrule SWF score    Percentage
Total Votes             149                                37.25%
Weighted Votes          153                                38.25%
Semi-Proportional       146                                36.50%
Delegates               151                                37.75%

Accordingly, using the Spearman Footrule SWF, the Weighted Votes vote counting method is identified as the combining technique among the four that produces a combined list most congruent with the subjective values embodied by the Spearman Footrule SWF.
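The selection step in Tables 1 and 2 amounts to converting each raw score to a percentage of the maximum and taking the method with the highest score. A minimal sketch using the raw scores reported above follows; the function name best_method is hypothetical.

```python
# Raw SWF scores as reported in Tables 1 and 2.
pswf_scores = {"Total Votes": 22, "Weighted Votes": 28,
               "Semi-Proportional": 30, "Delegates": 26}
sswf_scores = {"Total Votes": 149, "Weighted Votes": 153,
               "Semi-Proportional": 146, "Delegates": 151}

def best_method(scores, max_score):
    """Return the highest-scoring method and each method's percentage of max_score."""
    percentages = {m: 100.0 * s / max_score for m, s in scores.items()}
    winner = max(scores, key=scores.get)
    return winner, percentages

pswf_winner, pswf_pct = best_method(pswf_scores, max_score=40)
sswf_winner, sswf_pct = best_method(sswf_scores, max_score=400)
print(pswf_winner, pswf_pct[pswf_winner])  # Semi-Proportional 75.0
print(sswf_winner, sswf_pct[sswf_winner])  # Weighted Votes 38.25
```

The two SWFs select different methods from the same four candidates, which illustrates why the choice of SWF should reflect the objectives driving the undertaking.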

Accordingly, embodiments of the invention include a method that includes identifying a vote computing method that produces the highest SWF score. The evaluation of which vote computing method is most appropriate for a given set of objectives (e.g., business objectives) is performed by the SWF. The SWF takes the lists of the voters' preferences (the lists from the various sources), along with the outcome of the vote (the combined/consensus list), and produces for each vote computing method a “score” indicating the “satisfaction” in the outcome. The highest score indicates the highest satisfaction. That is, the vote computing method that elevates/accounts for/values those characteristics that are valued by the business objectives (as modeled using the SWF) in an optimal fashion is the vote computing method that gets the highest score from the SWF. Since an embodiment of the invention will output a final combined list based on the SWF, the quality of the output is affected by how well the SWF embodies the subjective values driving the undertaking and incorporates those values into an objective function.

In an exemplary embodiment, to improve the quality of the system or method's output, a large collection of voting methods or combination techniques is enumerated and examined, over multiple time periods of sample data, to identify which voting method or combination technique produces the highest SWF score. An exemplary embodiment examines the results of various “voting” methods using several weeks' or months' worth of data.

For parameterized voting techniques (such as the technique described in U.S. patent application Ser. No. ______ (having attorney docket number ARC920080029US2)), parameter(s) may also be optimized to improve the quality of the system. For example, in use, embodiments may determine (e.g., by searching for or computing) a parameter value that is most congruent with enabling the parameterized voting method to output a combined list that reflects the values, e.g., the business objectives.
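Searching for a parameter value can be sketched as a simple grid search scored by an SWF. Everything in this sketch is a stand-in chosen for illustration: positional_merge is a hypothetical parameterized method (it is not the technique of the co-pending application), overlap_swf is a Pswf-style score generalized to top-n lists by assumption, and the candidate values and sample lists are invented.

```python
def positional_merge(ranked_lists, decay, top_n):
    """Hypothetical parameterized method: position p (0-based) earns decay**p points."""
    scores = {}
    for ranking in ranked_lists:
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0.0) + decay ** position
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

def overlap_swf(ranked_lists, merged, top_n):
    """Pswf-style score (assumed generalization): each source earns up to top_n points."""
    merged_set = set(merged)
    return sum(min(2 * len(set(r[:top_n]) & merged_set), top_n)
               for r in ranked_lists)

def tune_decay(ranked_lists, candidates, top_n):
    """Return the candidate decay value whose merged list scores highest."""
    scored = [(overlap_swf(ranked_lists,
                           positional_merge(ranked_lists, d, top_n), top_n), d)
              for d in candidates]
    best_score, best_decay = max(scored)   # ties go to the larger decay value
    return best_decay, best_score

sources = [["A", "B", "C", "D"], ["C", "D", "E", "F"], ["E", "A", "G", "H"]]
candidates = [0.1, 0.5, 0.9]
best_decay, best_score = tune_decay(sources, candidates, top_n=4)
print(best_decay, best_score)
```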

Moreover, the characteristics of many sources change over time. Thus, in an exemplary embodiment, even after a vote counting method is established, the congruency of the method to the business objectives (as embodied by the SWF) is revisited periodically (e.g., quarterly) to make sure that changes in the underlying data sources have not reduced the quality of the results. In certain applications, the additional optimization techniques described are also repeated periodically.

Thus, an exemplary embodiment of the invention applies voting theory to cross-modality information mashups to construct a combined list ranking preferences. An SWF is used to select from various voting methods based on data from various cross-modality sources. In use, the sources and associated data are dependent on the domain. For example, in the domain of interest of wine, the source or associated data may be results of wine tasting parties, professional reviews (e.g., scores from 1-10 in different categories), sales, change in sales, comments posted by average users, and mentions in mass media.

FIG. 4 shows a flow diagram of a method 4000 for combining data on preferences in accordance with an embodiment of the invention. At 4010, a social welfare function (SWF) is received (e.g., by communications interface 66 or from memory or storage, as described below). In an exemplary embodiment, the SWF embodies a business objective. At 4020, sources that provide perspective on opinions on the subject area are identified. The sources may be, for example, sales, comments, etc. At 4030, data from these sources are gathered and normalized to create ranked lists of preferences for each source. At 4040, two or more vote computing methods are identified. At 4050, for each of the two or more vote computing methods, the vote computing method is used to combine data on preferences into a combined list ranking the preferences. The data being combined is from a set of two or more sources. In an exemplary embodiment, the set is heterogeneous in modality. In embodiments in which a delegate allocation vote computing method is used, a preliminary distribution of delegates among the sources is determined. For example, in the example shown in FIG. 2D, the following delegate numbers based roughly on population were distributed to the sources: 300 to Bebo, 500 to LastFM, 1000 to MySpace, and 500 to YouTube.

At 4060, for each combined list, the combined list is inputted into the SWF to compute a score. The score indicates congruency between the combined list and value(s) embodied by the SWF, e.g., a business objective. At 4070, the combined list of the vote computing method associated with the highest score is outputted. In one embodiment, additionally or alternatively, the vote computing method associated with the highest score is outputted.
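The selection at 4040 through 4070 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: Borda count and plurality stand in for the two vote computing methods, and the SWF is assumed to be the negated sum of Spearman footrule distances between the combined list and each source list. All function names and the sample lists are hypothetical, and every source list is assumed to rank the same items.

```python
from collections import defaultdict


def borda(ranked_lists):
    """Borda count: position p in a list of n items earns n - p points."""
    points = defaultdict(int)
    for lst in ranked_lists:
        for pos, item in enumerate(lst):
            points[item] += len(lst) - pos
    return sorted(points, key=lambda item: (-points[item], item))


def plurality(ranked_lists):
    """Plurality: order items by number of first-place votes."""
    items = {item for lst in ranked_lists for item in lst}
    firsts = defaultdict(int)
    for lst in ranked_lists:
        firsts[lst[0]] += 1
    return sorted(items, key=lambda item: (-firsts[item], item))


def footrule_swf(combined, ranked_lists):
    """Assumed SWF: negated total footrule distance (higher is better)."""
    pos = {item: i for i, item in enumerate(combined)}
    return -sum(abs(i - pos[item])
                for lst in ranked_lists
                for i, item in enumerate(lst))


def select_method(methods, ranked_lists, swf):
    """Run every method, score each combined list, return the best."""
    combined = {name: fn(ranked_lists) for name, fn in methods.items()}
    scores = {name: swf(lst, ranked_lists) for name, lst in combined.items()}
    best = max(scores, key=scores.get)
    return best, combined[best], scores


# Hypothetical per-source ranked lists over the same three items.
sources = [["A", "B", "C"], ["A", "B", "C"], ["B", "C", "A"], ["C", "B", "A"]]
methods = {"borda": borda, "plurality": plurality}
best_method, best_list, scores = select_method(methods, sources, footrule_swf)
```

With this toy input the plurality list scores -8 and the Borda list scores -10, so the plurality list is the one that would be output at 4070. Swapping in a different SWF, for example one encoding a business objective, only requires passing a different function as `swf`.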

FIG. 5 shows a flow diagram of a method 5000 for combining data on preferences in accordance with an embodiment of the invention. At 5010, a social welfare function (SWF) is created. In an exemplary embodiment, the SWF defines business objectives. At 5020, two or more vote computing methods are identified. At 5030, for each of the two or more vote computing methods, the vote computing method is used to combine data on preferences into a combined list ranking the preferences. The data is from a set of two or more sources. In an exemplary embodiment, the set is heterogeneous in modality. At 5040, for each combined list, the combined list is inputted into the SWF to compute a score. The score indicates congruency between the combined list and, for example, the business objective. At 5050, the combined list of the vote computing method associated with the highest score is outputted.

Although labeled with the numbers above, it should be understood that embodiments of this invention may execute the method 4000 and/or the method 5000 in a non-sequential order as appropriate and still remain in accordance with the invention. For example, although numbered 4010, embodiments of the present invention may receive the social welfare function before, during, or after 4020, 4030, 4040, and/or 4050. Similarly, although numbered 5010, embodiments of the present invention may create the social welfare function before, during, or after 5020 and/or 5030.

Moreover, while ranked lists are described in detail herein, in other embodiments, the sources provide additional information on preference (such as total numbers) for input into voting systems that can make use of such additional information.
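One way a voting system might use such totals is sketched below in Python. The proportional split shown here is an assumption made for illustration and may differ from the delegate allocation used in the embodiments. The delegate counts are those given above for FIG. 2D; the item names and per-source totals are made up, and `allocate_delegates` is a hypothetical helper.

```python
def allocate_delegates(totals_by_source, delegates_by_source):
    """Split each source's delegates among items in proportion to the
    item's share of that source's total, then rank items by delegates."""
    tally = {}
    for source, totals in totals_by_source.items():
        source_total = sum(totals.values())
        if source_total == 0:
            continue  # a source with no activity casts no delegates
        for item, count in totals.items():
            share = count / source_total
            tally[item] = tally.get(item, 0.0) + share * delegates_by_source[source]
    ranking = sorted(tally, key=lambda item: (-tally[item], item))
    return ranking, tally


# Delegate counts from the FIG. 2D example.
delegates = {"Bebo": 300, "LastFM": 500, "MySpace": 1000, "YouTube": 500}

# Hypothetical per-source totals (e.g., plays or views) for two items.
totals = {
    "Bebo": {"X": 10, "Y": 30},
    "LastFM": {"X": 60, "Y": 40},
    "MySpace": {"X": 500, "Y": 500},
    "YouTube": {"X": 20, "Y": 80},
}

ranking, tally = allocate_delegates(totals, delegates)
```

Unlike a pure ranked-list input, this keeps the margin by which one item leads another within a source, so a narrow lead and a wide lead contribute different numbers of delegates.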

Embodiments of the invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, and microcode.

Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The medium can be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device). Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, or an optical disk. Current examples of optical disks include compact disk-read-only memory (CD-ROM), compact disk-read/write (CD-R/W), and DVD.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

FIG. 6 is a high level block diagram showing an information processing system useful for implementing one embodiment of the present invention. The computer system includes one or more processors, such as processor 44. The processor 44 is connected to a communication infrastructure 46 (e.g., a communications bus, cross-over bar, or network). Various software embodiments are described in terms of this exemplary computer system. After reading this description, it will become apparent to a person of ordinary skill in the relevant art(s) how to implement the invention using other computer systems and/or computer architectures.

The computer system can include a display interface 48 that forwards graphics, text, and other data from the communication infrastructure 46 (or from a frame buffer not shown) for display on a display unit 50. The computer system also includes a main memory 52, preferably random access memory (RAM), and may also include a secondary memory 54. The secondary memory 54 may include, for example, a hard disk drive 56 and/or a removable storage drive 58, representing, for example, a floppy disk drive, a magnetic tape drive, or an optical disk drive. The removable storage drive 58 reads from and/or writes to a removable storage unit 60 in a manner well known to those having ordinary skill in the art. Removable storage unit 60 represents, for example, a floppy disk, a compact disc, a magnetic tape, or an optical disk, etc. which is read by and written to by removable storage drive 58. As will be appreciated, the removable storage unit 60 includes a computer readable medium having stored therein computer software and/or data.

In alternative embodiments, the secondary memory 54 may include other similar means for allowing computer programs or other instructions to be loaded into the computer system. Such means may include, for example, a removable storage unit 62 and an interface 64. Examples of such means may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 62 and interfaces 64 which allow software and data to be transferred from the removable storage unit 62 to the computer system.

The computer system may also include a communications interface 66. Communications interface 66 allows software and data to be transferred between the computer system and external devices. Examples of communications interface 66 may include a modem, a network interface (such as an Ethernet card), a communications port, or a PCMCIA slot and card, etc. Software and data transferred via communications interface 66 are in the form of signals which may be, for example, electronic, electromagnetic, optical, or other signals capable of being received by communications interface 66. These signals are provided to communications interface 66 via a communications path (i.e., channel) 68. This channel 68 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link, and/or other communications channels.

In this document, the terms “computer program medium,” “computer usable medium,” and “computer readable medium” are used to generally refer to media such as main memory 52 and secondary memory 54, removable storage drive 58, and a hard disk installed in hard disk drive 56.

Computer programs (also called computer control logic) are stored in main memory 52 and/or secondary memory 54. Computer programs may also be received via communications interface 66. Such computer programs, when executed, enable the computer system to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 44 to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.

FIG. 7 represents an exemplary distributed data processing system in which aspects of the illustrative embodiments may be implemented. Distributed data processing system 100 may include a network of computers in which aspects of the illustrative embodiments may be implemented. The distributed data processing system 100 contains at least one network 102, which is the medium used to provide communication links between various devices and computers connected together within distributed data processing system 100. The network 102 may include connections, such as wire, wireless communication links, or fiber optic cables.

In the depicted example, server 104 and server 106 are connected to network 102 along with storage unit 108. In addition, clients 110, 112, and 114 are also connected to network 102. These clients 110, 112, and 114 may be, for example, personal computers, network computers, or the like. In the depicted example, server 104 provides data, such as boot files, operating system images, and applications to clients 110, 112, and 114. Clients 110, 112, and 114 are clients to server 104 in the depicted example. Distributed data processing system 100 may include additional servers, clients, and other devices not shown.

In the depicted example, distributed data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational and other computer systems that route data and messages. Of course, the distributed data processing system 100 may also be implemented to include a number of different types of networks, such as for example, an intranet, a local area network (LAN), a wide area network (WAN), or the like. As stated above, FIG. 7 is intended as an example, not as an architectural limitation for different embodiments of the present invention, and therefore, the particular elements shown in FIG. 7 should not be considered limiting with regard to the environments in which the illustrative embodiments of the present invention may be implemented.

In one use, as an example, clients 110 and 112 collect information (e.g., from user input) and provide it to server 104. Server 104 stores the information in storage 108. Server 106 contains hardware devices and software tools to combine the information (e.g., into information mashups and/or combined/consensus lists) according to the present invention. Server 106 transmits the combined information to server 104 and/or clients 110, 112, and/or 114, for example.

In use, client 114 may provide the server with business requirements embodied in an SWF. The server determines the best vote counting method to use for that particular application based on the SWF and the information stored, e.g., in storage 108. The server may transmit an identification of the vote counting method to the client.

References in the claims to an element in the singular are not intended to mean “one and only” unless explicitly so stated, but rather “one or more.” All structural and functional equivalents to the elements of the above-described exemplary embodiment that are currently known or later come to be known to those of ordinary skill in the art are intended to be encompassed by the present claims. No claim element herein is to be construed under the provisions of 35 U.S.C. section 112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or “step for.”

Thus, a system and method for determining preferences from information mashups and, in particular, for determining preferences from cross-modality information mashups based on a social welfare function is disclosed. While the preferred embodiments of the present invention have been described, it will be understood that modifications and adaptations to the embodiments shown may occur to one of ordinary skill in the art without departing from the scope of the present invention as set forth in the claims. Thus, the scope of this invention is to be construed according to the claims and not limited by the specific details disclosed in the exemplary embodiments.

Claims

1. A computer-implemented method for determining preferences from cross-modality information mashups, the method comprising:

receiving a social welfare function (SWF);
identifying two or more vote computing methods;
for each of the two or more vote computing methods, using the vote computing method to combine information on preferences into a combined list ranking the preferences, wherein the information is from a set of two or more sources, and wherein the set is heterogeneous in modality;
for each combined list, inputting the combined list into the SWF to compute a score; and
outputting the combined list of the vote computing method associated with the highest score.

2. The method of claim 1, wherein the set of two or more sources includes data from websites indicating preferences within a certain domain of interest.

3. The method of claim 1, wherein the information from the set of two or more sources includes structured data from a first source and unstructured data from a second source.

4. The method of claim 3, further comprising processing the unstructured data using natural language mining.

5. The method of claim 1, wherein receiving the SWF comprises receiving a custom constructed SWF based on a set of business objectives.

6. The method of claim 1, wherein the two or more vote computing methods include a parameterized vote computing method.

7. The method of claim 1, wherein the number of preferences being ranked is at least an order of magnitude more in number than the number of sources.

8. A computer program product for determining preferences from cross-modality information, said computer program product comprising:

a computer readable medium;
first program instructions stored on the computer readable medium, the first program instructions to identify two or more vote computing methods;
second program instructions stored on the computer readable medium, the second program instructions to, for each of the two or more vote computing methods, use the vote computing method to combine information on preferences into a combined list ranking the preferences, wherein the information is from a set of two or more sources, and wherein the set is heterogeneous in modality;
third program instructions stored on the computer readable medium, the third program instructions to, for each combined list, input the combined list into a social welfare function to compute a score; and
fourth program instructions stored on the computer readable medium, the fourth program instructions to output the vote computing method associated with the highest score.

9. The computer program product of claim 8, further comprising:

fifth program instructions stored on the computer readable medium, the fifth program instructions to output the combined list of the vote computing method associated with the highest score.

10. The computer program product of claim 8, wherein the information from the set of two or more sources includes structured data from a first source and unstructured data from a second source.

11. The computer program product of claim 10, wherein the second source is selected from the group consisting of: an online blog, an online forum, and an online social networking website.

12. The computer program product of claim 8, wherein the social welfare function is selected from the group consisting of: Bergson-Samuelson, Precision Optimal Aggregation, and Spearman Footrule.

13. The computer program product of claim 8, wherein the social welfare function is a custom constructed social welfare function.

14. The computer program product of claim 8, wherein the two or more vote computing methods include a parameterized vote computing method.

15. The computer program product of claim 8, wherein the number of preferences being ranked is at least an order of magnitude more in number than the number of sources.

16. A system for determining preferences from cross-modality information, the system comprising:

a communications interface;
memory storing computer usable program code; and
a processor coupled to the communications interface to receive information on preferences from an external device and coupled to the memory to execute the computer usable program code stored on the memory; wherein the computer usable program code comprises: computer usable program code configured to identify two or more vote computing methods; computer usable program code configured to, for each of the two or more vote computing methods, use the vote computing method to combine the information on preferences into a combined list ranking the preferences, wherein the information is from a set of two or more sources, and wherein the set is heterogeneous in modality; computer usable program code configured to, for each combined list, input the combined list into a social welfare function to compute a score; and computer usable program code configured to identify the vote computing method associated with the highest score.

17. The system of claim 16, wherein the computer usable program code further comprises:

computer usable program code configured to output the combined list of the vote computing method associated with the highest score.

18. The system of claim 16, wherein the set of two or more sources includes data from websites indicating preferences within a certain domain of interest.

19. The system of claim 16, wherein the information from the set of two or more sources includes structured data from a first source and unstructured data from a second source.

20. The system of claim 16, wherein the processor is coupled to the communications interface to receive the information on preferences from a server.

Patent History
Publication number: 20090248690
Type: Application
Filed: Aug 20, 2008
Publication Date: Oct 1, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Inventors: Varun Bhagwan (San Jose, CA), Tyrone Wilberforce Andre Grandison (San Jose, CA), Daniel Frederick Gruhl (San Jose, CA), Jan Hendrik Pieper (San Jose, CA)
Application Number: 12/195,126
Classifications
Current U.S. Class: 707/7; Data Indexing; Abstracting; Data Reduction (epo) (707/E17.002)
International Classification: G06F 17/30 (20060101)