Clustering Or Classification (epo) Patents (Class 707/E17.046)
-
Publication number: 20120005210Abstract: A method of structuring a database of objects, the objects each comprising one or more attributes, the attributes being ordered, the method being executed by at least one computer processor connected to a memory, the method classifying in memory the objects in a structure composed of a list CL of sets of formal concepts Ci, includes at least the following steps: create several groups of attributes SAi; for each of said groups SAi, construct a closed set Pi composed of all the attributes common to the objects comprising at least the attributes of said group SAi; determine the list CL of formal concepts Ci ordered in the lexicographic order, by successively determining the formal concepts in order of increasing intent, the intent F of a formal concept Ci being formed by a set of closed sets Pi.Type: ApplicationFiled: November 18, 2009Publication date: January 5, 2012Applicant: THALESInventors: Cédric Tavernier, Jean-Luc Rogier
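The closed-set step described in this abstract (all attributes common to the objects that contain a given attribute group) can be sketched as below. This is a minimal illustration, not the claimed method: the function name `closure`, the dictionary layout, and the convention used when no object matches are all assumptions made for the example.

```python
def closure(group, objects):
    """Return the attributes shared by every object that contains `group`.

    `objects` maps an object name to its set of attributes (assumed layout).
    """
    matching = [attrs for attrs in objects.values() if group <= attrs]
    if not matching:
        # Assumed convention: if no object contains the group,
        # the closure is the set of all known attributes.
        return set().union(*objects.values()) if objects else set()
    return set.intersection(*matching)


# Hypothetical toy data: three objects over attributes a, b, c.
objects = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"b", "c"},
}

closed_a = closure({"a"}, objects)  # attributes common to o1 and o2
closed_c = closure({"c"}, objects)  # attributes common to o1 and o3
```

Every object containing "a" also contains "b", so the closed set for the group {a} is {a, b}.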
-
Publication number: 20110320454Abstract: A system and method for constructing a hierarchical multi-faceted classification structure includes organizing a plurality of visual categories into a multi-relational reference ontology that accounts for a plurality of different types of relationships. Media artifacts are categorized into the plurality of visual categories. The categories of artifacts are refined based on faceted ontology relationships or constraints from the multi-relational reference ontology. The multi-relational reference ontology and the one or more media artifacts with relationships are stored as the hierarchical multi-faceted classification structure in computer readable memory storage.Type: ApplicationFiled: June 29, 2010Publication date: December 29, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: MATTHEW HILL, JOHN R. KENDER, APOSTOL NATSEV, QUOC-BAO NGUYEN, JOHN R. SMITH, JELENA TESIC, LEXING XIE, RONG YAN
-
Publication number: 20110320395 Abstract: Content provided by a decision engine system is described. Content, stored in a server system, is provided to a plurality of display units at a plurality of touch point devices. One or more features are determined to optimize the content provided to the plurality of display units. The content is updated and syndicated across the plurality of display units at the plurality of touch point devices based on the determination. Type: Application Filed: June 29, 2010 Publication date: December 29, 2011 Inventors: Uzair Dada, Jason Kobilka, Michael Krol, Adeeb Ashraf, Abe Mammen, Omer Saeed
-
Publication number: 20110320447Abstract: In one aspect, a processing device of an information processing system is operative to perform high-dimensional stratified sampling of a database comprising a plurality of records arranged in overlapping sub-groups. For a given record, the processing device determines which of the sub-groups the given record is associated with, and for each of the sub-groups associated with the given record, checks if a sampling rate of the sub-group is less than a specified sampling rate. If the sampling rate of each of the sub-groups is less than the specified sampling rate, the processing device samples the given record, and otherwise does not sample the given record. The determine, check and sample operations are repeated for additional records, and samples resulting from the sample operations are processed to generate information characterizing the database.Type: ApplicationFiled: June 28, 2010Publication date: December 29, 2011Inventors: Aiyou Chen, Ming Xiong
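The determine, check and sample loop in this abstract can be sketched as follows. This is a hedged illustration only: the abstract does not say how a sub-group's sampling rate is measured, so the sketch assumes it is the running ratio of sampled records to records seen so far in that sub-group. The names `stratified_sample`, `groups_of` and `target_rate` are hypothetical.

```python
from collections import defaultdict


def stratified_sample(records, groups_of, target_rate):
    """Sample a record only if every sub-group it belongs to is under target_rate."""
    seen = defaultdict(int)      # records observed per sub-group
    sampled = defaultdict(int)   # records sampled per sub-group
    out = []
    for rec in records:
        groups = groups_of(rec)  # determine: which sub-groups contain this record
        for g in groups:
            seen[g] += 1
        # check: assumed rate = sampled / seen, counting the current record as seen
        if all(sampled[g] / seen[g] < target_rate for g in groups):
            out.append(rec)      # sample
            for g in groups:
                sampled[g] += 1
    return out


# Hypothetical records: (id, overlapping sub-groups).
records = [(1, {"A"}), (2, {"A", "B"}), (3, {"B"}), (4, {"A"})]
picked = stratified_sample(records, lambda r: r[1], target_rate=0.5)
picked_ids = [r[0] for r in picked]
```

Record 2 is skipped because sub-group A has already reached the assumed rate of 0.5 when it arrives, even though sub-group B has not.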
-
Publication number: 20110314018 Abstract: Summaries of entities (e.g., people, places, things, concepts, etc.) may provide additional useful information to a user. For example, a search engine may provide a summary of an entity within search results. A category (e.g., “writer”, “politician”, etc.) of the entity that is short and concise may be advantageous to provide within a summary of the entity. The category may allow a user to quickly determine whether the information of the entity relates to the intended entity (e.g., search results of an entity as “a writer” vs. search results of an entity as “a politician”). Potential categories and summary text may be extracted from pre-labeled data. The potential categories and summary text may be intersected to determine a set of candidate categories that may be ranked. An entity category having a desired rank may be determined as the entity category that describes the entity in a desired way. Type: Application Filed: June 22, 2010 Publication date: December 22, 2011 Applicant: Microsoft Corporation Inventors: Michael Bieniosek, Franco Salvetti, Giovanni Lorenzo Thione
-
Publication number: 20110307487Abstract: A system for obtaining data from various sources. The data may be organized into cluster sets of related items. Elements of various kinds may be pulled from the data. The elements may be put together into sets of clusters for each kind of elements. The clusters may be refined relative to one another and in view of integrated properties of the cluster sets. Elements may be added or removed from the clusters during refinement. Examples of the elements may be people and events. Examples of cluster sets of such elements may be groups and goals, respectively.Type: ApplicationFiled: June 15, 2010Publication date: December 15, 2011Inventors: Valerie Guralnik, Kirk Schloegel
-
Publication number: 20110302170Abstract: Methods for factoring search and browse policies and content preferences into Web search results are provided. Such search and browse policies and/or content preferences generally are provided by a parent, an employer, or other company representative and specify to whom they apply. Upon receiving a search query from a particular user, it is determined whether one or more search and browse policies and/or content preferences apply to the received search query. Upon determining that one or more search and browse policies and/or content preferences apply to the received search query, at least one of the received search query and any search results determined as satisfying the search query are analyzed in accordance with the one or more applicable search and browse policies and/or content preferences applying to the user. Any necessary modifications are made to the search results before the results are presented to the user.Type: ApplicationFiled: June 3, 2010Publication date: December 8, 2011Applicant: MICROSOFT CORPORATIONInventor: VLADIMIR HOLOSTOV
-
Publication number: 20110302147 Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a sequence of tokens via a walk algorithm. The sequence is fingerprinted to form a set of shingles. The shingles are compared to shingles for other web graphs in order to determine similarity between web graphs. Actions are then carried out to remove anomalous web graphs and modify parameters governing web mapping in order to decrease the likelihood of future anomalous web graphs being built. Type: Application Filed: May 2, 2011 Publication date: December 8, 2011 Applicant: Yahoo! Inc. Inventors: Ali Dasdan, Panagiotis Papadimitriou
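Shingle-based comparison of two token sequences can be sketched as below. This is a generic illustration under stated assumptions: the shingles here are raw windows of k tokens rather than fingerprints, Jaccard overlap is an assumed similarity measure, and the token sequences stand in for the output of a graph walk that is not shown. None of this is taken from the application itself.

```python
def shingles(tokens, k=3):
    """Return the set of all k-token windows in the sequence."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical token sequences, as if produced by walking two web graphs.
walk_a = ["home", "news", "sports", "home", "about"]
walk_b = ["home", "news", "sports", "home", "contact"]

similarity = jaccard(shingles(walk_a), shingles(walk_b))
```

The two walks share two of four distinct shingles, so a low threshold on this score would flag a graph that differs sharply from its predecessors.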
-
Publication number: 20110302166 Abstract: The present invention provides a search system and a search method that make it easy to find the document that is truly required among the documents of a search result. This search system includes a division unit that divides a document to be searched into a plurality of blocks in accordance with designated division information, a calculation unit that calculates a hash value of each block by applying a hash function to a character string included in each block, a storage unit that stores the calculated hash value together with positional information on the block in the document, and a document grouping unit that fetches, for each document obtained by searching based on the search word, a corresponding hash value from the storage unit in accordance with positional information on a block including the search word to group documents having the same hash value into one group and output the grouped documents as the search result. Type: Application Filed: October 16, 2009 Publication date: December 8, 2011 Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: Yutaka Moriya, Fumihiko Terui
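The idea of grouping search hits by the hash of the block that contains the search word can be sketched as follows. This is a simplified illustration: it hashes blocks on the fly instead of reading stored hashes by position, it assumes blank-line-delimited blocks as the division rule, and it uses SHA-256 from Python's standard `hashlib` as the hash function. The function names and sample documents are hypothetical.

```python
import hashlib
from collections import defaultdict


def block_hashes(text, delimiter="\n\n"):
    """Split text into blocks and return (position, block, hash) triples."""
    return [
        (pos, block, hashlib.sha256(block.encode("utf-8")).hexdigest())
        for pos, block in enumerate(text.split(delimiter))
    ]


def group_hits(docs, search_word):
    """Group documents whose first block containing search_word hashes identically."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for _pos, block, digest in block_hashes(text):
            if search_word in block:
                groups[digest].append(doc_id)
                break  # assumption: only the first matching block is considered
    return sorted(groups.values())


# Hypothetical documents: d1 and d2 share an identical block mentioning "refunds".
docs = {
    "d1": "Intro A\n\nshared clause about refunds",
    "d2": "Intro B\n\nshared clause about refunds",
    "d3": "Intro C\n\ndifferent text on refunds",
}
grouped = group_hits(docs, "refunds")
```

Documents d1 and d2 land in one group because the block containing the search word is byte-for-byte identical, while d3 forms its own group.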
-
Publication number: 20110302168Abstract: In a method for representing a text document with a graphical model, a document including a plurality of ordered words is received and a graph data structure for the document is created. The graph data structure includes a plurality of nodes and edges, with each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other. The graph data structure is stored in an information repository.Type: ApplicationFiled: June 8, 2010Publication date: December 8, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Charu Aggarwal
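The graph described in this abstract (one node per distinct word, edges counting how often two words fall within a set distance) can be sketched as below. This is a minimal sketch with assumptions: edges are treated as undirected, distance is counted in word positions, and the name `cooccurrence_graph` is made up for the example.

```python
from collections import Counter


def cooccurrence_graph(words, distance=2):
    """Build (nodes, edges): edges count word pairs at most `distance` positions apart."""
    nodes = set(words)
    edges = Counter()
    for i, w in enumerate(words):
        # Look ahead up to `distance` positions, staying inside the document.
        for j in range(i + 1, min(i + distance + 1, len(words))):
            v = words[j]
            if v != w:
                # Assumed undirected: store each pair in sorted order.
                edges[tuple(sorted((w, v)))] += 1
    return nodes, edges


words = "the cat sat on the mat".split()
nodes, edges = cooccurrence_graph(words, distance=2)
```

Here "sat" and "the" occur within two positions of each other twice, so that edge carries a count of 2.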
-
Publication number: 20110295856 Abstract: Techniques for grouping related objects such as documents and files using quantum clustering are disclosed. A method may include constructing a feature-object database of multiple objects. The feature-object database may have quantized selected features as keys. A connected objects database may be built. Clusters of connected objects may be identified in the connected objects database. The clusters of identified objects may be evaluated to determine groups of related objects. The method may be implemented on a computing device. Type: Application Filed: August 8, 2011 Publication date: December 1, 2011 Inventors: Herbert L. Roitblat, Brian Golbére
-
Publication number: 20110289086Abstract: A system and method for searching a database for multiple entries in the database that contain similar data, in which some embodiments of the method include collating data on physical sites from at least one database source to form a collation of site data, assigning a unique entry identifier to each entry of the site data in the collation, performing a lexical analysis of the site data and assigning a similarity metric(s) to each entry of the site data, sorting site data into at least one group with similar lexical content based on a metric threshold difference analysis of the similarity metric(s), to thereby provide at least one group, having at least one site data entry therein, and wherein where there are two or more site data entries in the at least one group, preferably they refer to the same site or to sites having a similar physical address.Type: ApplicationFiled: May 20, 2011Publication date: November 24, 2011Inventors: Philip Martin Jordan, Vilosh Marion Brito
-
Publication number: 20110282872Abstract: Categorizing data in an on-demand database environment is provided. The categorized data is accessed to provide results based on statistical likelihood that records provide a desired result of a query. The categorization of the data includes organizing queries based on semantic terms, with categorization based on a multidimensional categorization of data in the database environment. The generating of results includes accessing relationship metadata both for individual records and for categories. Relationships along the same category, or among categories can provide records that may answer the query. The relationships and statistics are updated based on usage of the results data. Records and relationships identified as being used to solve the query, or being a desired solution to the query, can be weighted more heavily, thus increasing the likelihood of providing the most relevant data for subsequent queries.Type: ApplicationFiled: May 11, 2011Publication date: November 17, 2011Applicant: salesforce.com, incInventors: Eugene Oksman, Alexandre Hersans
-
Publication number: 20110282875Abstract: A method, system, and computer program for processing records is disclosed. The records are associated with record sets. Record sets are associated with processor sets, which include one or more processors. Records are routed to associated processor sets for processing, based on the record set associated with the record. Records are processed on processors in the processor sets. Furthermore, various localized affinities can be established. Process affinity can link server processes with processor sets. Cache affinity can link database caches with processor sets. Data affinity can link incoming data to processor sets.Type: ApplicationFiled: April 8, 2011Publication date: November 17, 2011Applicant: UNITED STATES POSTAL SERVICEInventors: C. Scot Atkins, Joseph Conway
-
Publication number: 20110276552Abstract: In a dynamic information delivery context, a system collects data regarding transient information accessed by a user. The user can then query the stored data to reconstruct transient information. The system uses heuristics to help reconstruct transient information. The heuristics include user profile, time stamps, metadata, and indexing.Type: ApplicationFiled: May 7, 2010Publication date: November 10, 2011Applicant: TELCORDIA TECHNOLOGIES, INC.Inventors: Shoshana K. Loeb, Euthimios Panagos
-
Publication number: 20110276553Abstract: One embodiment is a computer-implemented method for classifying documents in a collection of documents according to their intended readerships. The method comprises using a computer to select a document in the collection of documents; and using a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. The method further includes using a computer to classify the selected document as misleading, commercial, or personal according to its determined characteristic; and using a computer to repeat the steps of select document, determine a characteristic of the selected document, and classify the selected document for additional documents in the collection.Type: ApplicationFiled: May 10, 2010Publication date: November 10, 2011Applicant: International Business Machines CorporationInventors: Ying Chen, Bin He, W. Scott Spangler
-
Publication number: 20110270819Abstract: Query classification techniques attempt to classify user search queries in order to better understand user search intent. Understanding a user's search intent allows search engines to provide relevant content tailored to the user's interest. Unfortunately, current classification techniques do not take into account contextual information. Accordingly, as provided herein, a target query may be classified based upon contextual information. In particular, features may be extracted from contextual information and/or other sources. For example, features may be extracted from the target query, related queries, and/or invoked search results of the related queries. In this way, the target query may be classified based upon other queries performed by the user and/or search results of the queries the user found interesting. In addition, a CRF model may be utilized in classifying the target query by providing generalized parameters learned from labeled query sessions.Type: ApplicationFiled: April 30, 2010Publication date: November 3, 2011Applicant: Microsoft CorporationInventors: Dou Shen, Daxin Jiang, Jian-Tao Sun
-
Publication number: 20110270808Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.Type: ApplicationFiled: April 30, 2010Publication date: November 3, 2011Applicant: International Business Machines CorporationInventors: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah
-
Publication number: 20110270826 Abstract: A document analysis system includes a database that stores documents, a document evaluation module that evaluates the documents by using features of the documents, and a user interface (UI) output unit that provides an evaluation result of the documents, which is produced by the document evaluation module, upon call of the documents. Type: Application Filed: October 27, 2009 Publication date: November 3, 2011 Inventors: Wan-Kyu Cha, Mi-Kyung Jung, Han-Joon Ahn, Jeong-Joong Kim, Sung-Ho Choi
-
Patent number: 8051084Abstract: Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.Type: GrantFiled: June 25, 2008Date of Patent: November 1, 2011Assignee: Endeca Technologies, Inc.Inventors: Daniel Tunkelang, Joyce Jeanpin Wang, Vladimir Zelevinsky, Paul Alexander Wehner
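Relative entropy of a record subset against the overall collection can be sketched as below. This is an illustrative sketch only: it computes plain Kullback-Leibler divergence over a single attribute and leaves out the normalization the abstract mentions, because the abstract does not say how it is done. It also assumes the subset is drawn from the collection, so every subset value appears in the collection.

```python
import math
from collections import Counter


def relative_entropy(subset_values, collection_values):
    """KL divergence (in bits) of the subset's value distribution from the collection's."""
    p = Counter(subset_values)
    q = Counter(collection_values)
    n_p, n_q = len(subset_values), len(collection_values)
    total = 0.0
    for value, count in p.items():
        pv = count / n_p
        qv = q[value] / n_q  # assumed > 0 because the subset comes from the collection
        total += pv * math.log2(pv / qv)
    return total


# Hypothetical attribute values for a collection and one subset of it.
collection = ["red", "red", "blue", "blue"]
subset = ["red", "red"]

score = relative_entropy(subset, collection)
baseline = relative_entropy(collection, collection)
```

A subset distributed exactly like the collection scores zero, and the score grows as the subset becomes more skewed.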
-
Publication number: 20110258173 Abstract: A computerized system and method of constructing and expanding search queries for conducting searches through information sources. The system enables retrieving a category options tree, allowing a user to define a category route by selecting a category-node, which defines a search-category. The system may further enable retrieving a query scenario tree, having a hierarchal structure comprising query nodes, where the retrieved query scenario tree is associated with an initial input query, inputted by a user. Each query node defines a query route enabling to construct the content and structure of an expanded search query. The system enables selecting a query node of the retrieved query scenario tree, according to an online decision making process, which analyses the search-category in relation to available query routes in order to allow selecting a query node from the retrieved scenario tree that is most compatible with the search-category. Type: Application Filed: June 28, 2011 Publication date: October 20, 2011 Inventors: Michael RATINER, Dmitry KUHARENKO, Alexander RUBINOV
-
Publication number: 20110258192Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for question and answer services. In one aspect, a method combines receiving a plurality of questions from a plurality of different servers according to a protocol that defines services for submitting questions and obtaining answers to questions. Each received question is analyzed and associated with one or more labels based on the analysis. A request from a server is received according to the protocol to obtain questions related to one or more labels. Questions associated with one or more of the labels are identified and provided in response to the request.Type: ApplicationFiled: November 29, 2010Publication date: October 20, 2011Applicant: GOOGLE INC.Inventors: Jun Yao, Jinhui Du
-
Publication number: 20110258193 Abstract: One embodiment of the present invention provides a system for estimating a similarity level between semantic entities. During operation, the system selects two or more semantic entities associated with a number of documents. The system subsequently parses the documents into sub-parts, and calculates the similarity level between the semantic entities based on occurrences of the semantic entities within the sub-parts of the documents. Type: Application Filed: April 15, 2010 Publication date: October 20, 2011 Applicant: PALO ALTO RESEARCH CENTER INCORPORATED Inventors: Oliver Brdiczka, Petro Hizalev
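One way to score entity similarity from occurrences within document sub-parts is sketched below. The specifics are assumptions, not details from the abstract: sub-parts are taken to be sentences, an entity "occurs" when its name is a substring of the sentence, and the score is the Jaccard overlap of the sentence sets. The function names are hypothetical.

```python
import re


def subparts(documents):
    """Split each document into lower-cased sentences (assumed sub-part granularity)."""
    parts = []
    for doc in documents:
        parts.extend(
            p.strip().lower() for p in re.split(r"[.!?]\s*", doc) if p.strip()
        )
    return parts


def entity_similarity(entity_a, entity_b, documents):
    """Jaccard overlap of the sub-parts mentioning each entity."""
    parts = subparts(documents)
    in_a = {i for i, p in enumerate(parts) if entity_a.lower() in p}
    in_b = {i for i, p in enumerate(parts) if entity_b.lower() in p}
    union = in_a | in_b
    return len(in_a & in_b) / len(union) if union else 0.0


# Hypothetical documents mentioning three entities.
documents = [
    "Alice met Bob. Alice emailed Carol.",
    "Bob called Alice. Carol left.",
]
alice_bob = entity_similarity("Alice", "Bob", documents)
alice_carol = entity_similarity("Alice", "Carol", documents)
```

Alice and Bob share two of the three sentences that mention either of them, so they score higher than Alice and Carol, who share one of four.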
-
Publication number: 20110246462Abstract: A method and system for prompting changes of electronic document content. The method includes the steps of: determining a first relation information from a first document where the first relation information includes: a first named entity, a second named entity, and a first relationship between the first named entity and the second named entity, storing the first relation information in a database, determining a second relation information from a second document, where the second relation information includes: a third named entity, a fourth named entity, and a second relationship between the third named entity and the fourth named entity, retrieving the first relation information from a database, and sending the first relation information to a client, if the first relation information is different from the second relation information, where at least one step is performed using a computer device.Type: ApplicationFiled: March 29, 2011Publication date: October 6, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Xian Wu, Quan Yuan, Xia Tian Zhang, Shiwan Zhao
-
Publication number: 20110231400Abstract: Disclosed herein are a document manipulating method, a document managerial system, and an electronic device using the same. The electronic device includes the system, an activating unit, a determining unit and a placing unit. The system includes at least one label of a searchable and classifiable format, a database accessible by the electronic device, and a searching and classifying engine. The method includes the steps of activating a document, determining a labeling location and a labeling size within the document, placing the label at the labeling location to record a document description, and saving the label and a part of the document in the database.Type: ApplicationFiled: June 9, 2010Publication date: September 22, 2011Applicant: Compal Electronics, Inc.Inventors: Yi-Chen Sung, Chien-Yuan Chen, Fei Wu
-
Publication number: 20110231387 Abstract: A model is created that, from seed trivia facts, builds a database of pruned and ranked trivia facts and associated trigger terms. Search, email, or other information provider systems are configured to detect usage of the trigger terms and provide relevant trivia facts in response to the usage. Type: Application Filed: March 22, 2010 Publication date: September 22, 2011 Applicant: YAHOO! INC. Inventors: Alpa Jain, Gilad Mishne
-
Publication number: 20110219002Abstract: A computer-implemented method for determining similarities between system executable objects includes the steps of determining with one or more computing systems a plurality of subsequences of operation codes in a plurality of disassembled system executable objects, for each subsequence, determining with the one or more computing systems a first set of system executable objects associated with the subsequence, with the computing systems, clustering the first set of system executable objects with a cluster. The cluster includes a set of system executable objects. The step of clustering the first set of system executable objects and the cluster includes the steps of determining with the computing systems the relative similarity between the first set of system executable objects and the cluster, and if the first set of system executable objects is similar to the cluster, adding with the computing systems the system executable objects to the cluster.Type: ApplicationFiled: March 5, 2010Publication date: September 8, 2011Applicant: MCAFEE, INC.Inventors: Anthony Vaughan Bartram, Adrian M. Dunbar
-
Publication number: 20110219000Abstract: Provided is a search apparatus, a search method, and a program that can improve search speed for a document set even when an object to be searched is a large-scale document set.Type: ApplicationFiled: November 6, 2009Publication date: September 8, 2011Inventor: Yukitaka Kusumura
-
Publication number: 20110219005Abstract: Methods and computer-readable media are provided for performing a federated search using a library description file to locate multiple data sources. For a federated search, a library description can be used to describe a set of data sources searched, and may further be used to describe how search results should be presented to a user. The format of such a library description file can include multiple elements, some of which provide information on how to display the library and others that define which data sources are included in the library. The library description file can be created according to library description template.Type: ApplicationFiled: May 12, 2011Publication date: September 8, 2011Applicant: MICROSOFT CORPORATIONInventors: Carlos Brito, Christopher Clayton McConnell, Shannon Scott Hysom, Paolo Marcucci, Tyler Kien Beam
-
Publication number: 20110218999 Abstract: The index update unit analyses the information stored in a document repository to create an index for search and stores the index in a time-series divisional index storage unit and creates, from an ACL repository, an access control entry ACE in association with the index for search, which is a correlation of information to be searched with access right of at least a group to which the user belongs. The ACL cache generation unit creates ACL cache data that correlates the user with access right to the information to be searched, from the ACE, and registers the ACL cache data created in an ACL cache. A search processing unit searches for an index for search in response to a request for search from said user. In case the ACL cache data correlating the user with the index for search is registered in the ACL cache, the search processing unit takes, from among the information searched, the information, reference to which is allowed for the user as a search result, based on information in the ACL cache. Type: Application Filed: November 13, 2009 Publication date: September 8, 2011 Inventors: Masaki Kan, Yoshihiro Kajiki
-
Publication number: 20110218947Abstract: Electronic documents are analyzed to identify assertions, which are inverted to generate questions that may be answered by the assertions. A document or a corpus of electronic documents may be analyzed to identify entities and relationships among entities within the text of the document(s). Assertions are identified based on the entities and relationships among the entities. Each assertion represents a fact about an entity, and a group of assertions represents a summary of the document or document corpus. The assertions are inverted to generate questions that may be answered by the assertions. The questions may be further analyzed to identify relevant concepts and topics and to cluster the questions around the concepts and topics. A combined graph may also be generated that facilitates traversal among topics, concepts, questions, assertions, document summaries, and documents.Type: ApplicationFiled: March 8, 2010Publication date: September 8, 2011Applicant: MICROSOFT CORPORATIONInventors: VISWANATH VADLAMANI, ABHINAI SRIVASTAVA, TAREK NAJM, MUNIRATHNAM SRIKANTH, PHANI VADDADI, ARUNGUNRAM CHANDRASEKARAN SURENDRAN
-
METHOD AND SYSTEM FOR BROWSING, SEARCHING AND SHARING OF PERSONAL VIDEO BY A NON-PARAMETRIC APPROACH
Publication number: 20110218997 Abstract: A method for determining a predictability of a media entity portion, the method includes: receiving or generating (a) reference media descriptors, and (b) probability estimations of descriptor space representatives given the reference media descriptors; wherein the descriptor space representatives are representative of a set of media entities; and calculating a predictability score of the media entity portion based on at least (a) the probability estimations of the descriptor space representatives given the reference media descriptors, and (b) relationships between the media entity portion descriptors and the descriptor space representatives. A method for processing media streams, the method may include: applying probabilistic non-parametric process on the media stream to locate media portions of interest; and generating metadata indicative of the media portions of interest. Type: Application Filed: March 7, 2011 Publication date: September 8, 2011 Inventors: Oren Boiman, Alex Rav-Acha
-
Publication number: 20110202528Abstract: A method of identifying a fresh document in a document set is provided. The method may include obtaining a query document that is included in a document set comprising a plurality of documents. The method may also include grouping the plurality of documents into a plurality of fine clusters based on a textual similarity between the plurality of documents. The method may also include identifying a target fine cluster within the plurality of fine clusters, the target fine cluster including the query document. The method may also include ordering the documents included in the target fine cluster by time to identify the fresh document. The method may also include generating a query response that includes the fresh document.Type: ApplicationFiled: February 13, 2010Publication date: August 18, 2011Inventors: Vinay Deolalikar, Hernan Laffitte
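The flow in this abstract (find the documents textually close to the query document, then order them by time) can be sketched as below. Note the substitution: instead of clustering the whole document set into fine clusters, this sketch approximates the target cluster as every document whose word-set Jaccard similarity to the query document meets a threshold. The threshold, field names and sample data are all hypothetical.

```python
from datetime import date


def word_set(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fresh_document(query_id, docs, threshold=0.5):
    """Return the most recently modified document among those similar to the query."""
    q = word_set(docs[query_id]["text"])
    # Stand-in for the target fine cluster: documents similar enough to the query.
    cluster = [
        doc_id for doc_id, d in docs.items()
        if jaccard(q, word_set(d["text"])) >= threshold
    ]
    # Order by time and take the newest member.
    return max(cluster, key=lambda doc_id: docs[doc_id]["modified"])


# Hypothetical document set with modification dates.
docs = {
    "v1": {"text": "quarterly sales report draft", "modified": date(2010, 1, 5)},
    "v2": {"text": "quarterly sales report final", "modified": date(2010, 2, 1)},
    "other": {"text": "holiday party invitation", "modified": date(2010, 3, 1)},
}
freshest = fresh_document("v1", docs)
```

The unrelated document is newer than both report versions but is excluded by the similarity test, so the later report version is returned.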
-
Publication number: 20110202534Abstract: In an embodiment, a method is provided for storing information related to a decision making process. In this method, data items that are associated with a choice, a fact, and/or a decision are accessed. These data items are used in an application that provides a functionality associated with the decision making process. A relationship between the data items is then created based on a context in which the data items are used in the application. The data items and the relationship are stored in a common data structure that is accessible by a different application that provides a different functionality associated with the decision making process.Type: ApplicationFiled: February 18, 2010Publication date: August 18, 2011Applicant: Business Objects Software Ltd.Inventor: Mark Allerton
-
Publication number: 20110196870Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., associated with legal discovery are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.Type: ApplicationFiled: April 19, 2011Publication date: August 11, 2011Applicant: KOFAX, INC.Inventors: Mauritius A.R. Schmidtler, Roland Borrey, Anthony Sarah
-
Publication number: 20110196871Abstract: A method and a system are provided for targeting online ads by grouping and mapping user properties. In one example, the system receives user data associated with one or more users. The system identifies user properties for a user. The system eliminates unacceptable user properties associated with the user. The system identifies permutations of the user properties associated with the user. The system eliminates unacceptable permutations of the user properties associated with the user. Valid permutations remain. The system attaches a weight of importance to each valid permutation. A weight quantifies a level of importance of a valid permutation for the user with respect to buckets. A bucket is an ad category. The system grades each valid permutation relative to a bucket. The system calculates a final grade for each bucket. The system then assigns the user to zero or more buckets based on the final grade for each bucket.Type: ApplicationFiled: February 5, 2010Publication date: August 11, 2011Inventors: Jonathan Kilroy, Dale Nussel, Allie K. Watfa
-
Publication number: 20110191343Abstract: A computer research tool for inputting, searching, displaying, and analyzing metabolic-related clinical data utilizing a novel graphical user interface (GUI) for visual-statistical data analysis and insight generation and method thereof are disclosed.Type: ApplicationFiled: November 18, 2010Publication date: August 4, 2011Applicant: ROCHE DIAGNOSTICS INTERNATIONAL LTD.Inventors: Kelly Heaton, Amy Killoren Clark, Luc Girardin, Dominik Brodbeck
-
Publication number: 20110184949Abstract: A method for recommending places to visit, including using a processor to provide the following steps: assembling a collection of images, wherein each image has first and second tags with the first tag corresponding to the location where the image was taken, and the second tag corresponding to subject matter of the image; clustering the images in response to the first tags into a plurality of locations; using the images in each location to produce at least one representative image of the location; using the second tags of images of each location to produce a list of representative keywords for each location; providing a query in the form of an image or subject matter, or both; and using the query in the form of an image to search among the representative images to recommend a location to visit, or using the query in the form of subject matter to search among the keywords to recommend a location to visit.Type: ApplicationFiled: January 25, 2010Publication date: July 28, 2011Inventor: Jiebo Luo
-
Publication number: 20110184914Abstract: A technique for archiving a relational database having tables of rows may use clusters. Transaction identifiers may be assigned to each of the rows in each of the tables such that all rows belonging to the same application transaction share a unique transaction identifier. Plural hierarchies may be determined, each hierarchy having high level nodes corresponding to the rows in a single table and dependent nodes corresponding to rows in other tables to which the rows in the single table are related in the database. The plural hierarchies may be merged to form plural clusters, one cluster for each unique transaction identifier. Each cluster may have high level nodes corresponding to the plural hierarchies but only those dependent nodes from the plural hierarchies whose transaction identifiers correspond to that of the cluster. The clusters may be stored in one or more files to form an archive.Type: ApplicationFiled: January 28, 2010Publication date: July 28, 2011Inventor: Jeff Gong
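The grouping step in the abstract above can be illustrated with a minimal Python sketch. It only shows rows being collected into one cluster per transaction identifier; the parent/dependent hierarchy between tables is deliberately left out, and the function name `build_clusters` and the `(table, row_id, txn_id)` row layout are hypothetical choices made for illustration, not taken from the application.

```python
from collections import defaultdict


def build_clusters(rows):
    """Group rows into one cluster per transaction identifier.

    `rows` is an iterable of (table, row_id, txn_id) tuples. This layout is
    an assumption for the sketch; the abstract does not specify one.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for table, row_id, txn_id in rows:
        # Every row that shares a transaction identifier lands in the same cluster.
        clusters[txn_id][table].append(row_id)
    # Convert the nested defaultdicts to plain dicts for easier storage.
    return {txn: dict(tables) for txn, tables in clusters.items()}


rows = [
    ("orders", 1, "t1"),
    ("order_items", 10, "t1"),
    ("orders", 2, "t2"),
]
archive = build_clusters(rows)
```

Each value in `archive` could then be written to a file as one archive unit, in line with the abstract's final step.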
-
Publication number: 20110184955Abstract: Organizing video data [110] is described. Video data [110] comprising metadata is received [205], wherein the metadata [120] provides an intra-video tag of the video data [110]. The metadata [120] is compared [210] with a plurality of video profiles [130]. Based on the comparing [210], the video data [110] is associated [215] with a corresponding one of the plurality of video profiles [130].Type: ApplicationFiled: October 31, 2008Publication date: July 28, 2011Inventors: April Sleyoen Mitchell, Mitchell Trott, W. Alex Vorbau
-
Publication number: 20110179028Abstract: One or more techniques and/or systems are disclosed herein for aggregating web-based data stored in a distributed data store so that it can be retrieved in a first-in, first-out (FIFO) manner. A unique aggregation key is generated for respective one or more data generated from a web-based event, where the one or more data are added to the distributed data store, and the aggregation key corresponds merely to the data generated from the web-based event. The one or more data from the web based event is aggregated in a FIFO queue and stored in a same partition of the distributed data store, based on the aggregation key.Type: ApplicationFiled: January 15, 2010Publication date: July 21, 2011Applicant: Microsoft CorporationInventors: Andrew Ness, Alexander Mallet, Bruce Copeland, Christopher Rickman, Rajesh Viswanathan
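A rough in-memory Python sketch of the idea described above: each web-based event gets its own aggregation key, the key decides the partition, and the data for that key is kept in a FIFO queue. The class name `PartitionedFifoStore`, the use of a random UUID as the key, and the CRC32-based partition choice are all assumptions made for illustration; the abstract does not say how keys are generated or how partitions are selected.

```python
import uuid
import zlib
from collections import defaultdict, deque


class PartitionedFifoStore:
    """Toy stand-in for a distributed store with a fixed number of partitions."""

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # Each partition maps an aggregation key to a FIFO queue of data items.
        self.partitions = [defaultdict(deque) for _ in range(num_partitions)]

    def new_aggregation_key(self):
        # Assumption: a random UUID serves as the unique per-event key.
        return uuid.uuid4().hex

    def _partition_for(self, key):
        # The same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def add(self, key, item):
        self.partitions[self._partition_for(key)][key].append(item)

    def pop_oldest(self, key):
        # First-in, first-out retrieval for the event identified by `key`.
        return self.partitions[self._partition_for(key)][key].popleft()


store = PartitionedFifoStore()
event_key = store.new_aggregation_key()
store.add(event_key, "page_view")
store.add(event_key, "click")
first = store.pop_oldest(event_key)
```

Because the partition is derived from the key alone, all data for one event stays together, which is what makes the FIFO read cheap.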
-
Publication number: 20110179037Abstract: A data classifier system of the present invention selects a plurality of classifications correlated to data groups so as to output classification axes based on hierarchical classifications and data groups. The data classifier system includes a basic category accumulation means, a classification axis candidate creation means and a priority calculation means. The basic category accumulation means accumulates classifications serving as basic categories used for selecting desired classifications in advance. The classification axis candidate creation means creates classification axis candidates based on combinations of classifications each correlated to at least one data among descendant classifications of each basic category. The priority calculation means calculates priorities with respect to the classification axis candidates created by the classification axis candidate creation means based on hierarchical distances of classifications in the classified hierarchy.Type: ApplicationFiled: July 29, 2009Publication date: July 21, 2011Inventors: Hironori Mizuguchi, Kenji Tateishi, Itaru Hosomi, Dai Kusui
-
Publication number: 20110178844Abstract: The present invention improves upon existing systems and methods by providing a passive profile creation method. The data accessible to a financial processor, such as spend level data, is leveraged using sophisticated data clustering and/or data appending techniques. Associations are established among entities (e.g., consumers), among merchants, and between entities and merchants. In one embodiment, a system and method for passively collecting spend level data for a transaction of a first entity, aggregating the collected spend level data for a plurality of entities; and clustering the first entity with a subset of the plurality of entities, based on aggregated spend level data of the first entity is provided.Type: ApplicationFiled: January 20, 2010Publication date: July 21, 2011Applicant: American Express Travel Related Services Company, Inc.Inventors: Rajendra R. Rane, Melissa Schwartz
-
Publication number: 20110179033Abstract: A method and a system to organize a data set into groups of data subsets in multiple passes using different parameters and to automatically name the groups is disclosed. For example, a data set is retrieved in accordance with a search query submitted by a user. The data set is organized into clusters based on a statistic(s) of the data set. The data set is then organized into groups of data subsets based on an attribute(s) indicated by the data set. Each of the groups are automatically named based on a property shared by data units of the group. The name(s) of a group may be mined from the data units of the group, retrieved from a structure that maps to attribute values indicated by the data units of the group, etc.Type: ApplicationFiled: March 14, 2011Publication date: July 21, 2011Applicant: eBay Inc.Inventors: John A. Mount, Badrul M. Sarwar
-
Publication number: 20110173201Abstract: This invention relates to a method and an apparatus for determining a reliability indicator for at least one set of signatures obtained from clinical data collected from a group of samples. The signatures are obtained by detecting characteristics in the clinical data from the group of samples, and each of the signatures generates a first set of stratification values that stratify the group of samples. At least one additional and parallel stratification source to the signatures obtained from the group of samples is provided, the at least one additional and parallel stratification source being independent from the signatures and generating a second set of stratification values. A comparison is done for each respective sample, where the first stratification values are compared with true reference stratification values, and where the second stratification values are compared with the true reference stratification values.Type: ApplicationFiled: September 24, 2009Publication date: July 14, 2011Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.Inventors: Angel Janevski, Nilanjana Banerjee, Yasser Alsafadi, Vinay Varadan
-
Publication number: 20110173202Abstract: Systems and methods for classifying a document are provided. In exemplary embodiments, an organization specific classification code (OSCC) is used to classify the document or data. The OSCC is a classification code based on an information type and an organization. In some embodiments, one or more policies may be associated with the OSCC.Type: ApplicationFiled: August 16, 2006Publication date: July 14, 2011Inventors: Deidre Paknad, Puttappaiah Muniyappa
-
Publication number: 20110173197Abstract: Exemplary methods and apparatuses are provided which may be implemented using one or more computing devices to allow for super clustering of clusters of electronic documents based, at least in part, on structural and static content features.Type: ApplicationFiled: January 12, 2010Publication date: July 14, 2011Applicant: Yahoo! Inc.Inventors: Rupesh R. Mehta, Srinivasan H. Sengamedu, Rajeev R. Rastogi
-
Publication number: 20110167065Abstract: A data generating apparatus includes an acquiring unit that acquires text data (name data) related to a name associated with position information; a classifying unit that using the acquired position data, classifies the name data according to given regions; an integrating unit that integrates neighboring regions such that the total data size of the name data included in regions to be integrated does not exceed a predetermined given data size; a storage unit that groups the name data according to integrated regions and stores the grouped name data as a name dictionary to be used in both a facility search process and a map display process; and an extracting unit that from the classified name data, extracts the name data common to regions of a given number or more, where the storage unit groups and stores the common name data as a common name dictionary different from the name dictionary.Type: ApplicationFiled: June 17, 2008Publication date: July 7, 2011Applicants: Pioneer Corporation, Increment P CorporationInventors: Shunsaku Toyoda, Takashi Hanyuda, Takashi Hashimoto, Ippei Nambata, Hajime Adachi
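The extraction step at the end of the abstract above (names shared by a given number of regions or more go into a separate common dictionary) can be sketched in Python as follows. The function name `split_common_names`, the dict-of-lists input, and the decision to remove common names from the per-region dictionaries are assumptions for illustration; the region classification and size-limited integration steps are not covered.

```python
from collections import Counter


def split_common_names(names_by_region, min_regions):
    """Separate names that occur in at least `min_regions` regions.

    Returns (common_names, per_region_names). Dropping the common names from
    the per-region lists is an assumption; the abstract only says they are
    stored in a dictionary different from the name dictionary.
    """
    # Count each name once per region, regardless of repeats inside a region.
    counts = Counter(
        name for names in names_by_region.values() for name in set(names)
    )
    common = {name for name, count in counts.items() if count >= min_regions}
    per_region = {
        region: sorted(set(names) - common)
        for region, names in names_by_region.items()
    }
    return sorted(common), per_region


names_by_region = {
    "north": ["City Hall", "Post Office", "Harbor Park"],
    "south": ["Post Office", "Old Mill"],
    "east": ["Post Office", "City Hall"],
}
common, regional = split_common_names(names_by_region, min_regions=3)
```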
-
Publication number: 20110167064Abstract: A system and associated method for evaluating cross-domain clusterability upon a target domain and a source domain. The cross-domain clusterability is calculated as a linear combination of a target clusterability and a source-target pair matchability, by use of a trade-off parameter that determines relative contribution of the target clusterability and the source-target pair matchability. The target clusterability quantifies how clusterable the target domain is. The source-target pair matchability is calculated as an average of a target-side matchability and a source-side matchability, which quantifies how well target centroids of the target domain are aligned with the source centroids and how well source centroids of the source domain are aligned with the target centroids, respectively.Type: ApplicationFiled: January 6, 2010Publication date: July 7, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: JEFFREY M. ACHTERMANN, INDRAJIT BHATTACHARYA, KEVIN W. ENGLISH, Jr., SHANTANU R. GODBOLE, SACHINDRA JOSHI, ASHWIN SRINIVASAN, ASHISH VERMA
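The arithmetic described above can be written out as a short Python sketch. The abstract states only that the score is a linear combination governed by a trade-off parameter; weighting the two terms by `trade_off` and `1 - trade_off` is an assumption made here, as are the function names. How the individual clusterability and matchability values are computed is not shown.

```python
def source_target_matchability(target_side, source_side):
    """Average of the target-side and source-side matchability values."""
    return (target_side + source_side) / 2.0


def cross_domain_clusterability(target_clusterability, target_side,
                                source_side, trade_off):
    """Combine target clusterability with source-target pair matchability.

    Assumption: a convex combination with weights `trade_off` and
    `1 - trade_off`. The abstract only specifies a linear combination.
    """
    if not 0.0 <= trade_off <= 1.0:
        raise ValueError("trade_off must lie in [0, 1]")
    matchability = source_target_matchability(target_side, source_side)
    return trade_off * target_clusterability + (1.0 - trade_off) * matchability


score = cross_domain_clusterability(
    target_clusterability=0.8, target_side=0.6, source_side=0.4, trade_off=0.5
)
```

With `trade_off` at 1 the score depends only on how clusterable the target domain is; at 0 it depends only on how well the two sets of centroids align.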
-
Publication number: 20110161312Abstract: Mechanisms are provided for integration of Web information architecture taxonomy and Web metrics taxonomy. When the author creates source content, the mechanism classifies the content using a rich taxonomy. The mechanism also adds unique identifiers into the source content pages as tags. The mechanism may then transform the source content into Web content that contains the identifiers in the tags. When users view the Web content, the tags generate usage data, which contain the identifiers. A Web metrics mechanism generates a Web metrics report from the usage data. The page tags are the identifiers from the source content. The Web metrics report associates each page of Web content with the rich taxonomy available in the source content.Type: ApplicationFiled: December 28, 2009Publication date: June 30, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Tracy H. Wallman