Clustering Or Classification (epo) Patents (Class 707/E17.046)

E Subclasses

Including cluster or class visualization or browsing (epo) (Class 707/E17.047)

Method of Structuring a Database of Objects

Publication number: 20120005210

Abstract: A method of structuring a database of objects, the objects each comprising one or more attributes, the attributes being ordered, the method being executed by at least one computer processor connected to a memory, the method classifying in memory the objects in a structure composed of a list CL of sets of formal concepts Ci, includes at least the following steps: create several groups of attributes SAi; for each of said groups SAi, construct a closed set Pi composed of all the attributes common to the objects comprising at least the attributes of said group SAi; determine the list CL of formal concepts Ci ordered in the lexicographic order, by successively determining the formal concepts in order of increasing intent, the intent F of a formal concept Ci being formed by a set of closed sets Pi.
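
The closure step in this abstract (the closed set Pi of every attribute shared by all objects that have at least the attributes in SAi) and the lexicographic listing of concept intents are standard Formal Concept Analysis operations. The Python sketch below shows them with Ganter's textbook NextClosure algorithm, which enumerates all concept intents in lectic (lexicographic) order. It is not the patented method, which builds intents from sets of closed sets Pi. The function names and the toy context are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Context = Dict[str, Set[str]]  # object id -> set of attributes the object has


def closure(attrs: FrozenSet[str], context: Context, attributes: List[str]) -> FrozenSet[str]:
    """Return every attribute shared by all objects that have at least `attrs`."""
    result = set(attributes)  # no matching object => closure is the full attribute set
    for obj_attrs in context.values():
        if attrs <= obj_attrs:
            result &= obj_attrs
    return frozenset(result)


def concept_intents(context: Context, attributes: List[str]) -> List[FrozenSet[str]]:
    """Enumerate all concept intents in lectic order with NextClosure.

    The order of `attributes` fixes the order used for the lectic comparison.
    """
    index = {a: i for i, a in enumerate(attributes)}
    current = closure(frozenset(), context, attributes)
    intents = [current]
    while len(current) < len(attributes):
        for i in range(len(attributes) - 1, -1, -1):
            m = attributes[i]
            if m in current:
                continue
            prefix = {a for a in current if index[a] < i}
            candidate = closure(frozenset(prefix | {m}), context, attributes)
            # Accept only if closing added no attribute that precedes m.
            if all(index[a] >= i for a in candidate - prefix):
                current = candidate
                intents.append(current)
                break
        else:  # defensive: NextClosure always finds a successor before the full set
            break
    return intents


context = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"b"}}
for intent in concept_intents(context, ["a", "b", "c"]):
    print(sorted(intent))
```

The naive closure scans every object, which is fine for illustration but not for large contexts.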

Type: Application

Filed: November 18, 2009

Publication date: January 5, 2012

Applicant: THALES

Inventors: Cédric Tavernier, Jean-Luc Rogier

MULTI-FACET CLASSIFICATION SCHEME FOR CATALOGING OF INFORMATION ARTIFACTS

Publication number: 20110320454

Abstract: A system and method for constructing a hierarchical multi-faceted classification structure includes organizing a plurality of visual categories into a multi-relational reference ontology that accounts for a plurality of different types of relationships. Media artifacts are categorized into the plurality of visual categories. The categories of artifacts are refined based on faceted ontology relationships or constraints from the multi-relational reference ontology. The multi-relational reference ontology and the one or more media artifacts with relationships are stored as the hierarchical multi-faceted classification structure in computer readable memory storage.

Type: Application

Filed: June 29, 2010

Publication date: December 29, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: MATTHEW HILL, JOHN R. KENDER, APOSTOL NATSEV, QUOC-BAO NGUYEN, JOHN R. SMITH, JELENA TESIC, LEXING XIE, RONG YAN

Optimization of Multi-channel Commerce

Publication number: 20110320395

Abstract: Content provided by a decision engine system is described. Content, stored in a server system, is provided to a plurality of display units at a plurality of touch point devices. One or more features are determined to optimize the content provided to the plurality of display units. The content is updated and syndicated across the plurality of display units at the plurality of touch point devices based on the determination.

Type: Application

Filed: June 29, 2010

Publication date: December 29, 2011

Inventors: Uzair Dada, Jason Kobilka, Michael Krol, Adeeb Ashraf, Abe Mammen, Omer Saeed

High-Dimensional Stratified Sampling

Publication number: 20110320447

Abstract: In one aspect, a processing device of an information processing system is operative to perform high-dimensional stratified sampling of a database comprising a plurality of records arranged in overlapping sub-groups. For a given record, the processing device determines which of the sub-groups the given record is associated with, and for each of the sub-groups associated with the given record, checks if a sampling rate of the sub-group is less than a specified sampling rate. If the sampling rate of each of the sub-groups is less than the specified sampling rate, the processing device samples the given record, and otherwise does not sample the given record. The determine, check and sample operations are repeated for additional records, and samples resulting from the sample operations are processed to generate information characterizing the database.
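
The per-record decision described above can be sketched in Python as follows. It assumes a sub-group's sampling rate is the number of its records sampled so far divided by the number seen so far, with the current record counted as seen; the abstract does not define the rate precisely, and the function name and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Set, Tuple


def stratified_sample(
    records: Iterable[Tuple[Hashable, Set[Hashable]]], target_rate: float
) -> List[Hashable]:
    """Sample a record only if every sub-group it belongs to is below target_rate.

    Each record is (record_id, set of sub-group ids); sub-groups may overlap.
    """
    seen = defaultdict(int)     # records observed per sub-group
    sampled = defaultdict(int)  # records sampled per sub-group
    sample = []
    for record_id, groups in records:
        for g in groups:
            seen[g] += 1
        # Assumed rate definition: sampled / seen, counting the current record as seen.
        if all(sampled[g] / seen[g] < target_rate for g in groups):
            sample.append(record_id)
            for g in groups:
                sampled[g] += 1
    return sample


records = [("r1", {"A"}), ("r2", {"A"}), ("r3", {"A", "B"}), ("r4", {"B"})]
print(stratified_sample(records, 0.5))
```

Because a record is taken only when all of its sub-groups are under the target, the overlap between sub-groups keeps any single sub-group from being oversampled. Note that a record with no sub-groups is always sampled here, since `all()` of an empty sequence is true.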

Type: Application

Filed: June 28, 2010

Publication date: December 29, 2011

Inventors: Aiyou Chen, Ming Xiong

ENTITY CATEGORY DETERMINATION

Publication number: 20110314018

Abstract: Summaries of entities (e.g., people, places, things, concepts, etc.) may provide additional useful information to a user. For example, a search engine may provide a summary of an entity within search results. A short, concise category (e.g., “writer”, “politician”, etc.) of the entity may be useful to include within a summary of the entity. The category may allow a user to quickly determine whether the information about the entity relates to the intended entity (e.g., search results for an entity as “a writer” vs. search results for an entity as “a politician”). Potential categories and summary text may be extracted from pre-labeled data. The potential categories and summary text may be intersected to determine a set of candidate categories that may be ranked. An entity category having a desired rank may be selected as the entity category that best describes the entity.

Type: Application

Filed: June 22, 2010

Publication date: December 22, 2011

Applicant: Microsoft Corporation

Inventors: Michael Bieniosek, Franco Salvetti, Giovanni Lorenzo Thione

SYSTEM FOR MULTI-MODAL DATA MINING AND ORGANIZATION VIA ELEMENTS CLUSTERING AND REFINEMENT

Publication number: 20110307487

Abstract: A system for obtaining data from various sources. The data may be organized into cluster sets of related items. Elements of various kinds may be pulled from the data. The elements may be put together into sets of clusters for each kind of element. The clusters may be refined relative to one another and in view of integrated properties of the cluster sets. Elements may be added to or removed from the clusters during refinement. Examples of the elements may be people and events. Examples of cluster sets of such elements may be groups and goals, respectively.

Type: Application

Filed: June 15, 2010

Publication date: December 15, 2011

Inventors: Valerie Guralnik, Kirk Schloegel

UTILIZING SEARCH POLICIES TO DETERMINE SEARCH RESULTS

Publication number: 20110302170

Abstract: Methods for factoring search and browse policies and content preferences into Web search results are provided. Such search and browse policies and/or content preferences generally are provided by a parent, an employer, or other company representative and specify to whom they apply. Upon receiving a search query from a particular user, it is determined whether one or more search and browse policies and/or content preferences apply to the received search query. Upon determining that one or more search and browse policies and/or content preferences apply to the received search query, at least one of the received search query and any search results determined as satisfying the search query are analyzed in accordance with the one or more applicable search and browse policies and/or content preferences applying to the user. Any necessary modifications are made to the search results before the results are presented to the user.

Type: Application

Filed: June 3, 2010

Publication date: December 8, 2011

Applicant: MICROSOFT CORPORATION

Inventor: VLADIMIR HOLOSTOV

METHODS AND APPARATUS FOR COMPUTING GRAPH SIMILARITY VIA SEQUENCE SIMILARITY

Publication number: 20110302147

Abstract: This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a sequence of tokens via a walk algorithm. The sequence is fingerprinted to form a set of shingles. The shingles are compared to shingles for other web graphs in order to determine similarity between web graphs. Actions are then carried out to remove anomalous web graphs and modify parameters governing web mapping in order to decrease the likelihood of future anomalous web graphs being built.
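
The graph-to-sequence-to-shingles pipeline can be sketched in Python. The walk here is an assumed depth-first traversal with neighbours visited in sorted order (the abstract does not name the walk algorithm), shingles are runs of k consecutive node labels, and similarity is the Jaccard index of the shingle sets. Practical systems usually hash shingles into fingerprints; raw tuples are compared here for clarity. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # node -> outgoing neighbours


def walk_tokens(graph: Graph, start: str) -> List[str]:
    """Depth-first walk from `start`, visiting neighbours in sorted order."""
    order: List[str] = []
    visited: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push in reverse so the smallest neighbour is popped first.
        for neighbour in sorted(graph.get(node, []), reverse=True):
            if neighbour not in visited:
                stack.append(neighbour)
    return order


def shingles(tokens: List[str], k: int = 2) -> Set[Tuple[str, ...]]:
    """All runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity of two shingle sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


g1 = {"a": ["b", "c"], "b": ["d"]}
g2 = {"a": ["b"], "b": ["d"]}
s1 = shingles(walk_tokens(g1, "a"))
s2 = shingles(walk_tokens(g2, "a"))
print(jaccard(s1, s2))
```

A deterministic walk order matters: two identical graphs must produce identical token sequences, or their shingle sets will differ spuriously.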

Type: Application

Filed: May 2, 2011

Publication date: December 8, 2011

Applicant: Yahoo! Inc.

Inventors: Ali Dasdan, Panagiotis Papadimitriou

SEARCH SYSTEM, SEARCH METHOD, AND PROGRAM

Publication number: 20110302166

Abstract: The present invention provides a search system and a search method that make it easy to find the document actually needed among the documents in a search result. This search system includes a division unit that divides a document to be searched into a plurality of blocks in accordance with designated division information, a calculation unit that calculates a hash value of each block by applying a hash function to a character string included in each block, a storage unit that stores the calculated hash value together with positional information on the block in the document, and a document grouping unit that, for each document obtained by searching based on the search word, fetches a corresponding hash value from the storage unit 545 in accordance with positional information on a block including the search word, groups documents having the same hash value into one group, and outputs the grouped documents as the search result.
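
A minimal Python sketch of the block-hash grouping described above. The division rule (blank-line-separated paragraphs), the use of SHA-1 from `hashlib`, the plain substring match, and the function names are all assumptions for illustration; the abstract leaves the division information and hash function unspecified.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def split_blocks(text: str) -> List[str]:
    """Hypothetical division rule: blocks are blank-line-separated paragraphs."""
    return [b.strip() for b in text.split("\n\n") if b.strip()]


def build_hash_store(docs: Dict[str, str]) -> Dict[Tuple[str, int], str]:
    """Map (document id, block position) -> SHA-1 hash of the block text."""
    store = {}
    for doc_id, text in docs.items():
        for pos, block in enumerate(split_blocks(text)):
            store[(doc_id, pos)] = hashlib.sha1(block.encode("utf-8")).hexdigest()
    return store


def group_hits(docs: Dict[str, str], store: Dict[Tuple[str, int], str], word: str) -> List[List[str]]:
    """Group documents whose first block containing `word` has the same hash."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, block in enumerate(split_blocks(text)):
            if word in block:  # plain substring match, for illustration only
                groups[store[(doc_id, pos)]].append(doc_id)
                break
    return list(groups.values())


docs = {
    "d1": "Cover page A\n\nRefunds are issued within 30 days.",
    "d2": "Cover page B\n\nRefunds are issued within 30 days.",
    "d3": "Cover page C\n\nRefunds are not available.",
}
store = build_hash_store(docs)
print(group_hits(docs, store, "Refunds"))
```

Hashing only the block that contains the search word lets documents that differ elsewhere (here, the cover pages) still collapse into one group when the relevant passage is identical.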

Type: Application

Filed: October 16, 2009

Publication date: December 8, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yutaka Moriya, Fumihiko Terui

GRAPHICAL MODELS FOR REPRESENTING TEXT DOCUMENTS FOR COMPUTER ANALYSIS

Publication number: 20110302168

Abstract: In a method for representing a text document with a graphical model, a document including a plurality of ordered words is received and a graph data structure for the document is created. The graph data structure includes a plurality of nodes and edges, with each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other. The graph data structure is stored in an information repository.
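
The graph structure described above can be sketched in Python as a set of distinct-word nodes plus an undirected edge counter. The sketch assumes "within a predetermined distance" means at most `window` positions apart and skips self-loops for repeated words; both are assumptions, as are the function and parameter names.

```python
from collections import Counter
from typing import List, Set, Tuple


def cooccurrence_graph(words: List[str], window: int) -> Tuple[Set[str], Counter]:
    """Nodes are distinct words; edge (u, v) counts co-occurrences within `window` positions."""
    nodes = set(words)
    edges: Counter = Counter()
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if v != w:  # assumption: no self-loops for repeated words
                edges[tuple(sorted((w, v)))] += 1  # undirected: store pairs in sorted order
    return nodes, edges


nodes, edges = cooccurrence_graph("the cat sat on the mat".split(), window=2)
print(len(nodes), edges[("sat", "the")])
```

Storing each pair in sorted order makes the graph undirected, so "the ... sat" and "sat ... the" accumulate on the same edge.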

Type: Application

Filed: June 8, 2010

Publication date: December 8, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Charu Aggarwal

IDENTIFYING RELATED OBJECTS USING QUANTUM CLUSTERING

Publication number: 20110295856

Abstract: Techniques for grouping related objects such as documents and files using quantum clustering are disclosed. A method may include constructing a feature-object database of multiple objects. The feature-object database may have quantized selected features as keys. A connected objects database may be built. Clusters of connected objects may be identified in the connected objects database. The clusters of identified objects may be evaluated to determine groups of related objects. The method may be implemented on a computing device.
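
Reading "quantum" here as quantized feature keys, which is what the abstract describes, the idea can be sketched in Python. Numeric features are bucketed, objects sharing any (feature, bucket) key are linked, and connected components found with union-find become clusters. The bucket width, the any-shared-key linking rule, and the names are assumptions, and the final evaluation step that turns clusters into groups of related objects is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def _find(parent: Dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_objects(objects: Dict[str, Dict[str, float]], step: float) -> List[Set[str]]:
    """Cluster objects that share at least one quantized feature key."""
    # Feature-object database: (feature name, bucket) -> objects with that key.
    feature_db: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for obj_id, features in objects.items():
        for name, value in features.items():
            feature_db[(name, int(value // step))].append(obj_id)

    # Connected objects: link every object that shares a key.
    parent = {obj_id: obj_id for obj_id in objects}
    for members in feature_db.values():
        root = _find(parent, members[0])
        for other in members[1:]:
            other_root = _find(parent, other)
            if other_root != root:
                parent[other_root] = root

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for obj_id in objects:
        clusters[_find(parent, obj_id)].add(obj_id)
    return list(clusters.values())


objects = {
    "a": {"size_kb": 10.2},
    "b": {"size_kb": 10.8},
    "c": {"size_kb": 55.0},
    "d": {"size_kb": 57.0, "pages": 3},
}
print(cluster_objects(objects, step=5.0))
```

Linking on any single shared key is coarse: with a wide bucket every object can collapse into one cluster, which is why a later evaluation step is needed.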

Type: Application

Filed: August 8, 2011

Publication date: December 1, 2011

Inventors: Herbert L. Roitblat, Brian Golbére

SYSTEM, METHOD AND APPARATUS FOR DATA ANALYSIS

Publication number: 20110289086

Abstract: A system and method for searching a database for multiple entries that contain similar data. Some embodiments of the method include collating data on physical sites from at least one database source to form a collation of site data, assigning a unique entry identifier to each entry of the site data in the collation, performing a lexical analysis of the site data and assigning one or more similarity metrics to each entry, and sorting the site data into at least one group with similar lexical content based on a metric threshold difference analysis of the similarity metrics, so that each group contains at least one site data entry. Where a group contains two or more site data entries, they preferably refer to the same site or to sites having a similar physical address.

Type: Application

Filed: May 20, 2011

Publication date: November 24, 2011

Inventors: Philip Martin Jordan, Vilosh Marion Brito

Methods and Systems for Categorizing Data in an On-Demand Database Environment

Publication number: 20110282872

Abstract: Categorizing data in an on-demand database environment is provided. The categorized data is accessed to provide results based on statistical likelihood that records provide a desired result of a query. The categorization of the data includes organizing queries based on semantic terms, with categorization based on a multidimensional categorization of data in the database environment. The generating of results includes accessing relationship metadata both for individual records and for categories. Relationships along the same category, or among categories can provide records that may answer the query. The relationships and statistics are updated based on usage of the results data. Records and relationships identified as being used to solve the query, or being a desired solution to the query, can be weighted more heavily, thus increasing the likelihood of providing the most relevant data for subsequent queries.

Type: Application

Filed: May 11, 2011

Publication date: November 17, 2011

Applicant: salesforce.com, inc

Inventors: Eugene Oksman, Alexandre Hersans

LOCALIZED DATA AFFINITY SYSTEM AND HYBRID METHOD

Publication number: 20110282875

Abstract: A method, system, and computer program for processing records is disclosed. The records are associated with record sets. Record sets are associated with processor sets, which include one or more processors. Records are routed to associated processor sets for processing, based on the record set associated with the record. Records are processed on processors in the processor sets. Furthermore, various localized affinities can be established. Process affinity can link server processes with processor sets. Cache affinity can link database caches with processor sets. Data affinity can link incoming data to processor sets.
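
The routing step can be sketched in Python as two lookups, record to record set and record set to processor set, producing one work queue per processor set. The key function, the record ids, and the processor-set names are hypothetical; the abstract does not say how record sets are defined, and the process, cache, and data affinities are not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def route_records(
    records: Iterable[str],
    record_set_of: Callable[[str], Hashable],
    processor_set_of: Dict[Hashable, str],
) -> Dict[str, List[str]]:
    """Queue each record for the processor set linked to its record set."""
    queues: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        queues[processor_set_of[record_set_of(record)]].append(record)
    return dict(queues)


# Hypothetical: record sets keyed by the first three characters of a record id.
records = ["20001-1", "20002-7", "94105-3"]
queues = route_records(records, lambda r: r[:3], {"200": "ps0", "941": "ps1"})
print(queues)
```

Keeping every record of a record set on the same processor set is what makes the cache and data affinities in the abstract possible: the data a processor set needs stays local to it.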

Type: Application

Filed: April 8, 2011

Publication date: November 17, 2011

Applicant: UNITED STATES POSTAL SERVICE

Inventors: C. Scot Atkins, Joseph Conway

RECONSTRUCTION OF TRANSIENT INFORMATION IN INFORMATION DELIVERY SYSTEMS

Publication number: 20110276552

Abstract: In a dynamic information delivery context, a system collects data regarding transient information accessed by a user. The user can then query the stored data to reconstruct transient information. The system uses heuristics to help reconstruct transient information. The heuristics include user profile, time stamps, metadata, and indexing.

Type: Application

Filed: May 7, 2010

Publication date: November 10, 2011

Applicant: TELCORDIA TECHNOLOGIES, INC.

Inventors: Shoshana K. Loeb, Euthimios Panagos

CLASSIFYING DOCUMENTS ACCORDING TO READERSHIP

Publication number: 20110276553

Abstract: One embodiment is a computer-implemented method for classifying documents in a collection of documents according to their intended readerships. The method comprises using a computer to select a document in the collection of documents; and using a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. The method further includes using a computer to classify the selected document as misleading, commercial, or personal according to its determined characteristic; and using a computer to repeat the steps of select document, determine a characteristic of the selected document, and classify the selected document for additional documents in the collection.

Type: Application

Filed: May 10, 2010

Publication date: November 10, 2011

Applicant: International Business Machines Corporation

Inventors: Ying Chen, Bin He, W. Scott Spangler

CONTEXT-AWARE QUERY CLASSIFICATION

Publication number: 20110270819

Abstract: Query classification techniques attempt to classify user search queries in order to better understand user search intent. Understanding a user's search intent allows search engines to provide relevant content tailored to the user's interest. Unfortunately, current classification techniques do not take into account contextual information. Accordingly, as provided herein, a target query may be classified based upon contextual information. In particular, features may be extracted from contextual information and/or other sources. For example, features may be extracted from the target query, related queries, and/or invoked search results of the related queries. In this way, the target query may be classified based upon other queries performed by the user and/or search results of the queries the user found interesting. In addition, a CRF model may be utilized in classifying the target query by providing generalized parameters learned from labeled query sessions.

Type: Application

Filed: April 30, 2010

Publication date: November 3, 2011

Applicant: Microsoft Corporation

Inventors: Dou Shen, Daxin Jiang, Jian-Tao Sun

Systems and Methods for Discovering Synonymous Elements Using Context Over Multiple Similar Addresses

Publication number: 20110270808

Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.

Type: Application

Filed: April 30, 2010

Publication date: November 3, 2011

Applicant: International Business Machines Corporation

Inventors: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah

DOCUMENT ANALYSIS SYSTEM

Publication number: 20110270826

Abstract: A document analysis system includes a database that stores documents, a document evaluation module that evaluates the documents by using features of the documents, and a user interface (UI) output unit that provides an evaluation result of the documents, which is produced by the document evaluation module, upon call of the documents.

Type: Application

Filed: October 27, 2009

Publication date: November 3, 2011

Inventors: Wan-Kyu Cha, Mi-Kyung Jung, Han-Joon Ahn, Jeong-Joong Kim, Sung-Ho Choi

System and method for measuring the quality of document sets

Patent number: 8051084

Abstract: Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.

Type: Grant

Filed: June 25, 2008

Date of Patent: November 1, 2011

Assignee: Endeca Technologies, Inc.

Inventors: Daniel Tunkelang, Joyce Jeanpin Wang, Vladimir Zelevinsky, Paul Alexander Wehner

EXPANSION OF SEARCH QUERIES USING INFORMATION CATEGORIZATION

Publication number: 20110258173

Abstract: A computerized system and method of constructing and expanding search queries for conducting searches through information sources. The system enables retrieving a category options tree, allowing a user to define a category route by selecting a category-node, which defines a search-category. The system may further enable retrieving a query scenario tree having a hierarchical structure of query nodes, where the retrieved query scenario tree is associated with an initial query entered by a user. Each query node defines a query route that enables construction of the content and structure of an expanded search query. The system enables selecting a query node of the retrieved query scenario tree according to an online decision-making process, which analyzes the search-category in relation to the available query routes in order to select the query node of the retrieved scenario tree that is most compatible with the search-category.

Type: Application

Filed: June 28, 2011

Publication date: October 20, 2011

Inventors: Michael RATINER, Dmitry KUHARENKO, Alexander RUBINOV

PROVIDING QUESTION AND ANSWER SERVICES

Publication number: 20110258192

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for question and answer services. In one aspect, a method includes receiving a plurality of questions from a plurality of different servers according to a protocol that defines services for submitting questions and obtaining answers to questions. Each received question is analyzed and associated with one or more labels based on the analysis. A request from a server is received according to the protocol to obtain questions related to one or more labels. Questions associated with one or more of the labels are identified and provided in response to the request.

Type: Application

Filed: November 29, 2010

Publication date: October 20, 2011

Applicant: GOOGLE INC.

Inventors: Jun Yao, Jinhui Du

METHOD FOR CALCULATING ENTITY SIMILARITIES

Publication number: 20110258193

Abstract: One embodiment of the present invention provides a system for estimating a similarity level between semantic entities. During operation, the system selects two or more semantic entities associated with a number of documents. The system subsequently parses the documents into sub-parts, and calculates the similarity level between the semantic entities based on occurrences of the semantic entities within the sub-parts of the documents.
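
One simple way to turn sub-part co-occurrence into a similarity level is sketched below in Python: split each document into sentences, record which sentences mention each entity, and take the Jaccard overlap of those sentence sets. The sentence splitter, the case-insensitive substring matching, the Jaccard measure, and the names are assumptions; the abstract does not give the formula.

```python
import re
from typing import List, Set, Tuple


def split_subparts(doc: str) -> List[str]:
    """Hypothetical sub-part rule: sentences split on ., ! or ?."""
    return [s.strip() for s in re.split(r"[.!?]+", doc) if s.strip()]


def occurrences(entity: str, docs: List[str]) -> Set[Tuple[int, int]]:
    """(document index, sub-part index) pairs whose text mentions the entity."""
    needle = entity.lower()
    return {
        (d, s)
        for d, doc in enumerate(docs)
        for s, part in enumerate(split_subparts(doc))
        if needle in part.lower()  # naive substring match
    }


def entity_similarity(e1: str, e2: str, docs: List[str]) -> float:
    """Jaccard overlap of the sub-parts in which the two entities occur."""
    a, b = occurrences(e1, docs), occurrences(e2, docs)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


docs = ["Alice met Bob. Alice left.", "Bob stayed. Carol arrived."]
print(entity_similarity("Alice", "Bob", docs))
```

Counting at the sub-part level rather than the whole-document level means two entities that merely appear in the same long document do not score as strongly related.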

Type: Application

Filed: April 15, 2010

Publication date: October 20, 2011

Applicant: PALO ALTO RESEARCH CENTER INCORPORATED

Inventors: Oliver Brdiczka, Petro Hizalev

Method and System for Prompting Changes of Electronic Document Content

Publication number: 20110246462

Abstract: A method and system for prompting changes of electronic document content. The method includes the steps of: determining a first relation information from a first document where the first relation information includes: a first named entity, a second named entity, and a first relationship between the first named entity and the second named entity, storing the first relation information in a database, determining a second relation information from a second document, where the second relation information includes: a third named entity, a fourth named entity, and a second relationship between the third named entity and the fourth named entity, retrieving the first relation information from a database, and sending the first relation information to a client, if the first relation information is different from the second relation information, where at least one step is performed using a computer device.

Type: Application

Filed: March 29, 2011

Publication date: October 6, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Xian Wu, Quan Yuan, Xia Tian Zhang, Shiwan Zhao

Method of Editing Document and Document Managerial System and Electronic Device Using the Same

Publication number: 20110231400

Abstract: Disclosed herein are a document manipulating method, a document managerial system, and an electronic device using the same. The electronic device includes the system, an activating unit, a determining unit and a placing unit. The system includes at least one label of a searchable and classifiable format, a database accessible by the electronic device, and a searching and classifying engine. The method includes the steps of activating a document, determining a labeling location and a labeling size within the document, placing the label at the labeling location to record a document description, and saving the label and a part of the document in the database.

Type: Application

Filed: June 9, 2010

Publication date: September 22, 2011

Applicant: Compal Electronics, Inc.

Inventors: Yi-Chen Sung, Chien-Yuan Chen, Fei Wu

ENGAGING CONTENT PROVISION

Publication number: 20110231387

Abstract: A model is created that, starting from seed trivia facts, builds a database of pruned and ranked trivia facts and associated trigger terms. Search, email, or other information provider systems are configured to detect usage of the trigger terms and provide relevant trivia facts in response to the usage.

Type: Application

Filed: March 22, 2010

Publication date: September 22, 2011

Applicant: YAHOO! INC.

Inventors: Alpa Jain, Gilad Mishne

METHOD AND SYSTEM FOR DISCOVERING LARGE CLUSTERS OF FILES THAT SHARE SIMILAR CODE TO DEVELOP GENERIC DETECTIONS OF MALWARE

Publication number: 20110219002

Abstract: A computer-implemented method for determining similarities between system executable objects includes the steps of determining with one or more computing systems a plurality of subsequences of operation codes in a plurality of disassembled system executable objects, for each subsequence, determining with the one or more computing systems a first set of system executable objects associated with the subsequence, with the computing systems, clustering the first set of system executable objects with a cluster. The cluster includes a set of system executable objects. The step of clustering the first set of system executable objects and the cluster includes the steps of determining with the computing systems the relative similarity between the first set of system executable objects and the cluster, and if the first set of system executable objects is similar to the cluster, adding with the computing systems the system executable objects to the cluster.

Type: Application

Filed: March 5, 2010

Publication date: September 8, 2011

Applicant: MCAFEE, INC.

Inventors: Anthony Vaughan Bartram, Adrian M. Dunbar

SEARCH APPARATUS, SEARCH METHOD, AND RECORDING MEDIUM STORING PROGRAM

Publication number: 20110219000

Abstract: Provided is a search apparatus, a search method, and a program that can improve search speed for a document set even when an object to be searched is a large-scale document set.

Type: Application

Filed: November 6, 2009

Publication date: September 8, 2011

Inventor: Yukitaka Kusumura

LIBRARY DESCRIPTION OF THE USER INTERFACE FOR FEDERATED SEARCH RESULTS

Publication number: 20110219005

Abstract: Methods and computer-readable media are provided for performing a federated search using a library description file to locate multiple data sources. For a federated search, a library description can be used to describe a set of data sources searched, and may further be used to describe how search results should be presented to a user. The format of such a library description file can include multiple elements, some of which provide information on how to display the library and others that define which data sources are included in the library. The library description file can be created according to a library description template.

Type: Application

Filed: May 12, 2011

Publication date: September 8, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Carlos Brito, Christopher Clayton McConnell, Shannon Scott Hysom, Paolo Marcucci, Tyler Kien Beam

SYSTEM, METHOD AND PROGRAM FOR INFORMATION PROCESSING

Publication number: 20110218999

Abstract: The index update unit analyzes the information stored in a document repository to create an index for search, stores the index in a time-series divisional index storage unit, and creates, from an ACL repository, an access control entry (ACE) associated with the index for search; the ACE correlates the information to be searched with the access rights of at least a group to which the user belongs. The ACL cache generation unit creates, from the ACE, ACL cache data that correlates the user with access rights to the information to be searched, and registers the created ACL cache data in an ACL cache. A search processing unit searches the index for search in response to a search request from the user. If ACL cache data correlating the user with the index for search is registered in the ACL cache, the search processing unit takes, from among the information found, the information that the user is allowed to reference as the search result, based on the information in the ACL cache.

Type: Application

Filed: November 13, 2009

Publication date: September 8, 2011

Inventors: Masaki Kan, Yoshihiro Kajiki

ONTOLOGICAL CATEGORIZATION OF QUESTION CONCEPTS FROM DOCUMENT SUMMARIES

Publication number: 20110218947

Abstract: Electronic documents are analyzed to identify assertions, which are inverted to generate questions that may be answered by the assertions. A document or a corpus of electronic documents may be analyzed to identify entities and relationships among entities within the text of the document(s). Assertions are identified based on the entities and relationships among the entities. Each assertion represents a fact about an entity, and a group of assertions represents a summary of the document or document corpus. The assertions are inverted to generate questions that may be answered by the assertions. The questions may be further analyzed to identify relevant concepts and topics and to cluster the questions around the concepts and topics. A combined graph may also be generated that facilitates traversal among topics, concepts, questions, assertions, document summaries, and documents.

Type: Application

Filed: March 8, 2010

Publication date: September 8, 2011

Applicant: MICROSOFT CORPORATION

Inventors: VISWANATH VADLAMANI, ABHINAI SRIVASTAVA, TAREK NAJM, MUNIRATHNAM SRIKANTH, PHANI VADDADI, ARUNGUNRAM CHANDRASEKARAN SURENDRAN

METHOD AND SYSTEM FOR BROWSING, SEARCHING AND SHARING OF PERSONAL VIDEO BY A NON-PARAMETRIC APPROACH

Publication number: 20110218997

Abstract: A method for determining a predictability of a media entity portion, the method includes: receiving or generating (a) reference media descriptors, and (b) probability estimations of descriptor space representatives given the reference media descriptors; wherein the descriptor space representatives are representative of a set of media entities; and calculating a predictability score of the media entity portion based on at least (a) the probability estimations of the descriptor space representatives given the reference media descriptors, and (b) relationships between the media entity portion descriptors and the descriptor space representatives. A method for processing media streams, the method may include: applying a probabilistic non-parametric process to the media stream to locate media portions of interest; and generating metadata indicative of the media portions of interest.

Type: Application

Filed: March 7, 2011

Publication date: September 8, 2011

Inventors: Oren Boiman, Alex Rav-Acha

SYSTEM AND METHOD FOR IDENTIFYING FRESH INFORMATION IN A DOCUMENT SET

Publication number: 20110202528

Abstract: A method of identifying a fresh document in a document set is provided. The method may include obtaining a query document that is included in a document set comprising a plurality of documents. The method may also include grouping the plurality of documents into a plurality of fine clusters based on a textual similarity between the plurality of documents. The method may also include identifying a target fine cluster within the plurality of fine clusters, the target fine cluster including the query document. The method may also include ordering the documents included in the target fine cluster by time to identify the fresh document. The method may also include generating a query response that includes the fresh document.
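
The steps above can be sketched in Python: cluster documents by textual similarity, find the fine cluster containing the query document, and return its most recent member. The abstract only says "textual similarity" and "fine clusters", so Jaccard similarity over lowercase word sets, a greedy seed-based clustering, the 0.5 threshold, and all names are assumptions.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase word set used as a crude textual fingerprint."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def fine_clusters(docs: Dict[str, dict], threshold: float) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose seed document is similar enough."""
    clusters: List[List[str]] = []
    for doc_id, doc in docs.items():
        for cluster in clusters:
            if jaccard(tokens(docs[cluster[0]]["text"]), tokens(doc["text"])) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def freshest(docs: Dict[str, dict], query_id: str, threshold: float = 0.5) -> str:
    """Return the most recent document in the query document's fine cluster."""
    for cluster in fine_clusters(docs, threshold):
        if query_id in cluster:
            return max(cluster, key=lambda d: docs[d]["time"])
    raise KeyError(query_id)


docs = {
    "q": {"text": "storm hits coast tonight", "time": 1},
    "u": {"text": "storm hits coast tonight update", "time": 3},
    "x": {"text": "stock market rallies", "time": 5},
}
print(freshest(docs, "q"))
```

A tight threshold keeps clusters "fine", so only near-duplicates of the query compete on recency; the unrelated but newer document `x` is never returned for `q`.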

Type: Application

Filed: February 13, 2010

Publication date: August 18, 2011

Inventors: Vinay Deolalikar, Hernan Laffitte
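The abstract above describes a three-step pipeline: group documents into fine clusters by textual similarity, find the cluster that contains the query document, and order that cluster by time. The Python sketch below illustrates that pipeline under assumptions that are ours, not the applicant's: similarity is Jaccard overlap of lowercase word sets, fine clusters are single-link groups above a hypothetical `threshold`, and each document carries a timestamp. `Doc`, `fine_clusters` and `freshest_in_cluster` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    timestamp: datetime


def jaccard(a: set, b: set) -> float:
    """Textual similarity as word-set overlap (an assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fine_clusters(docs: list, threshold: float = 0.5) -> list:
    """Single-link grouping: two documents share a cluster if any chain of
    pairwise similarities >= threshold connects them (union-find)."""
    words = [set(d.text.lower().split()) for d in docs]
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(words[i], words[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict = {}
    for i, doc in enumerate(docs):
        groups.setdefault(find(i), []).append(doc)
    return list(groups.values())


def freshest_in_cluster(query: Doc, docs: list, threshold: float = 0.5) -> Doc:
    """Locate the fine cluster holding the query and return its newest document."""
    for cluster in fine_clusters(docs, threshold):
        if query in cluster:
            return max(cluster, key=lambda d: d.timestamp)
    raise ValueError("query document is not in the document set")


docs = [
    Doc("a", "quarterly sales report for europe", datetime(2010, 1, 1)),
    Doc("b", "quarterly sales report for europe revised", datetime(2010, 3, 1)),
    Doc("c", "holiday party schedule", datetime(2010, 5, 1)),
]
print(freshest_in_cluster(docs[0], docs).doc_id)  # b
```

Pairwise comparison is quadratic in the number of documents; a real system would likely use an index or hashing scheme, which the abstract does not specify.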
Storage Model for Information Related to Decision Making Process

Publication number: 20110202534

Abstract: In an embodiment, a method is provided for storing information related to a decision making process. In this method, data items that are associated with a choice, a fact, and/or a decision are accessed. These data items are used in an application that provides a functionality associated with the decision making process. A relationship between the data items is then created based on a context in which the data items are used in the application. The data items and the relationship are stored in a common data structure that is accessible by a different application that provides a different functionality associated with the decision making process.

Type: Application

Filed: February 18, 2010

Publication date: August 18, 2011

Applicant: Business Objects Software Ltd.

Inventor: Mark Allerton
DATA CLASSIFICATION USING MACHINE LEARNING TECHNIQUES

Publication number: 20110196870

Abstract: Systems, methods and computer program products for classifying documents are presented. Systems, methods and computer program products for analyzing documents, e.g., documents associated with legal discovery, are also presented. Systems, methods and computer program products for cleaning up data are also presented. Systems, methods and computer program products for verifying an association of an invoice with an entity are also presented. Systems, methods and computer program products for managing medical records are presented. Systems, methods and computer program products for face recognition are presented.

Type: Application

Filed: April 19, 2011

Publication date: August 11, 2011

Applicant: KOFAX, INC.

Inventors: Mauritius A.R. Schmidtler, Roland Borrey, Anthony Sarah
Targeting Online Ads by Grouping and Mapping User Properties

Publication number: 20110196871

Abstract: A method and a system are provided for targeting online ads by grouping and mapping user properties. In one example, the system receives user data associated with one or more users. The system identifies user properties for a user. The system eliminates unacceptable user properties associated with the user. The system identifies permutations of the user properties associated with the user. The system eliminates unacceptable permutations of the user properties associated with the user. Valid permutations remain. The system attaches a weight of importance to each valid permutation. A weight quantifies a level of importance of a valid permutation for the user with respect to buckets. A bucket is an ad category. The system grades each valid permutation relative to a bucket. The system calculates a final grade for each bucket. The system then assigns the user to zero or more buckets based on the final grade for each bucket.

Type: Application

Filed: February 5, 2010

Publication date: August 11, 2011

Inventors: Jonathan Kilroy, Dale Nussel, Allie K. Watfa
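The ad-targeting steps above (filter properties, enumerate and filter permutations, weight and grade them per bucket, then assign the user to zero or more buckets) can be sketched as follows. Everything concrete here is an assumption for illustration: "permutations" are treated as order-insensitive property combinations up to size 2, the weight of a combination is its size, its grade for a bucket is the share of its properties found in that bucket's keywords, and `threshold` is a hypothetical cut-off.

```python
from itertools import combinations


def valid_permutations(properties, blocked_props, blocked_combos, max_size=2):
    """Drop unacceptable properties, enumerate property combinations up to
    max_size, and drop unacceptable combinations."""
    props = sorted(p for p in properties if p not in blocked_props)
    valid = []
    for k in range(1, max_size + 1):
        for combo in combinations(props, k):
            combo_set = frozenset(combo)
            if combo_set not in blocked_combos:
                valid.append(combo_set)
    return valid


def assign_buckets(properties, buckets, blocked_props=frozenset(),
                   blocked_combos=frozenset(), threshold=0.5):
    """Return the (possibly empty) list of buckets (ad categories) for a user."""
    perms = valid_permutations(properties, blocked_props, blocked_combos)
    if not perms:
        return []
    # Hypothetical weight: larger combinations are treated as more specific.
    weights = {p: float(len(p)) for p in perms}
    total_weight = sum(weights.values())
    assigned = []
    for bucket, keywords in buckets.items():
        # Grade of a combination: fraction of its properties matching the bucket.
        final_grade = sum(
            weights[p] * len(p & keywords) / len(p) for p in perms
        ) / total_weight
        if final_grade >= threshold:
            assigned.append(bucket)
    return assigned


buckets = {"sporting_goods": {"sports", "male"}, "cosmetics": {"female"}}
user = {"sports", "male", "age_25_34", "ssn_present"}
print(assign_buckets(user, buckets, blocked_props={"ssn_present"}))  # ['sporting_goods']
```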
Computer Research Tool For The Organization, Visualization And Analysis Of Metabolic-Related Clinical Data And Method Thereof

Publication number: 20110191343

Abstract: A computer research tool for inputting, searching, displaying, and analyzing metabolic-related clinical data utilizing a novel graphical user interface (GUI) for visual-statistical data analysis and insight generation and method thereof are disclosed.

Type: Application

Filed: November 18, 2010

Publication date: August 4, 2011

Applicant: ROCHE DIAGNOSTICS INTERNATIONAL LTD.

Inventors: Kelly Heaton, Amy Killoren Clark, Luc Girardin, Dominik Brodbeck
RECOMMENDING PLACES TO VISIT

Publication number: 20110184949

Abstract: A method for recommending places to visit, including using a processor to perform the following steps: assembling a collection of images, wherein each image has first and second tags with the first tag corresponding to the location where the image was taken, and the second tag corresponding to subject matter of the image; clustering the images in response to the first tags into a plurality of locations; using the images in each location to produce at least one representative image of the location; using the second tags of images of each location to produce a list of representative keywords for each location; providing a query in the form of an image or subject matter, or both; and using the query in the form of an image to search among the representative images to recommend a location to visit, or using the query in the form of subject matter to search among the keywords to recommend a location to visit.

Type: Application

Filed: January 25, 2010

Publication date: July 28, 2011

Inventor: Jiebo Luo
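Only the subject-matter path of the recommendation method lends itself to a short sketch; the image-query path would need visual features that the abstract does not describe. The code below assumes each image is a `(lat, lon, subject_tag)` tuple, clusters by rounding coordinates into grid cells of a hypothetical size `cell_deg` (a simple stand-in for whatever location clustering the application uses), keeps the most frequent subject tags per cell as representative keywords, and recommends the cell whose keywords best overlap the query.

```python
from collections import Counter, defaultdict


def cluster_by_location(images, cell_deg=0.01):
    """Group (lat, lon, subject_tag) images into grid cells of cell_deg degrees."""
    clusters = defaultdict(list)
    for lat, lon, subject in images:
        cell = (round(lat / cell_deg), round(lon / cell_deg))
        clusters[cell].append(subject)
    return dict(clusters)


def representative_keywords(clusters, top_k=3):
    """Most frequent subject tags for each location cluster."""
    return {cell: [tag for tag, _ in Counter(tags).most_common(top_k)]
            for cell, tags in clusters.items()}


def recommend(keywords_by_location, query_terms):
    """Location whose keywords overlap the query most, or None if none overlap."""
    query = set(query_terms)
    best, best_score = None, 0
    for cell, keywords in keywords_by_location.items():
        score = len(query & set(keywords))
        if score > best_score:
            best, best_score = cell, score
    return best


images = [
    (48.8584, 2.2945, "tower"),
    (48.8585, 2.2946, "tower"),
    (48.8583, 2.2944, "night"),
    (40.6892, -74.0445, "statue"),
    (40.6893, -74.0444, "harbor"),
]
keywords = representative_keywords(cluster_by_location(images))
print(keywords[recommend(keywords, ["statue"])])  # ['statue', 'harbor']
```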
Database Archiving Using Clusters

Publication number: 20110184914

Abstract: A technique for archiving a relational database having tables of rows may use clusters. Transaction identifiers may be assigned to each of the rows in each of the tables such that all rows belonging to the same application transaction share a unique transaction identifier. Plural hierarchies may be determined, each hierarchy having high level nodes corresponding to the rows in a single table and dependent nodes corresponding to rows in other tables to which the rows in the single table are related in the database. The plural hierarchies may be merged to form plural clusters, one cluster for each unique transaction identifier. Each cluster may have high level nodes corresponding to the plural hierarchies but only those dependent nodes from the plural hierarchies whose transaction identifiers correspond to that of the cluster. The clusters may be stored in one or more files to form an archive.

Type: Application

Filed: January 28, 2010

Publication date: July 28, 2011

Inventor: Jeff Gong
ORGANIZING DATA

Publication number: 20110184955

Abstract: Organizing video data [110] is described. Video data [110] comprising metadata is received [205], wherein the metadata [120] provides an intra-video tag of the video data [110]. The metadata [120] is compared [210] with a plurality of video profiles [130]. Based on the comparing [210], the video data [110] is associated [215] with a corresponding one of the plurality of video profiles [130].

Type: Application

Filed: October 31, 2008

Publication date: July 28, 2011

Inventors: April Sleyoen Mitchell, Mitchell Trott, W. Alex Vorbau
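The compare-and-associate step above can be illustrated with a small sketch. It assumes, for illustration only, that the intra-video metadata and each video profile are plain sets of tags and that "comparing" means Jaccard overlap; the function names are hypothetical.

```python
from typing import Optional


def tag_similarity(a: set, b: set) -> float:
    """Jaccard overlap between two tag sets (an assumed comparison measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def associate(video_tags: set, profiles: dict) -> Optional[str]:
    """Return the name of the best-matching profile, or None if nothing overlaps."""
    best_name, best_score = None, 0.0
    for name, profile_tags in profiles.items():
        score = tag_similarity(video_tags, profile_tags)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


profiles = {
    "family_events": {"birthday", "wedding", "kids"},
    "sports": {"soccer", "goal"},
}
print(associate({"birthday", "cake", "kids"}, profiles))  # family_events
```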
AGGREGATING DATA FROM A WORK QUEUE

Publication number: 20110179028

Abstract: One or more techniques and/or systems are disclosed herein for aggregating web-based data stored in a distributed data store so that it can be retrieved in a first-in, first-out (FIFO) manner. A unique aggregation key is generated for the one or more data items produced by a web-based event, where the data items are added to the distributed data store and the aggregation key corresponds only to the data generated by that web-based event. The data items from the web-based event are aggregated in a FIFO queue and stored in the same partition of the distributed data store, based on the aggregation key.

Type: Application

Filed: January 15, 2010

Publication date: July 21, 2011

Applicant: Microsoft Corporation

Inventors: Andrew Ness, Alexander Mallet, Bruce Copeland, Christopher Rickman, Rajesh Viswanathan
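A minimal in-memory sketch of the aggregation idea above: one unique key per web event, a stable hash of that key to pick a partition, and a FIFO queue per key inside that partition. The `PartitionedStore` class is a hypothetical stand-in for a distributed data store, and SHA-256-based partitioning is our assumption; Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import hashlib
import uuid
from collections import defaultdict, deque


class PartitionedStore:
    """Toy stand-in for a distributed data store with a fixed number of partitions."""

    def __init__(self, num_partitions: int = 4):
        self.num_partitions = num_partitions
        # partitions[p][aggregation_key] -> FIFO queue of data items
        self.partitions = [defaultdict(deque) for _ in range(num_partitions)]

    @staticmethod
    def new_aggregation_key() -> str:
        """Unique key for all data produced by one web-based event."""
        return uuid.uuid4().hex

    def partition_for(self, key: str) -> int:
        """Stable hash, so every item with the same key lands in the same partition."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def append(self, key: str, item) -> None:
        self.partitions[self.partition_for(key)][key].append(item)

    def drain(self, key: str) -> list:
        """Remove and return all items for the key in first-in, first-out order."""
        queue = self.partitions[self.partition_for(key)][key]
        items = []
        while queue:
            items.append(queue.popleft())
        return items


store = PartitionedStore()
event_key = store.new_aggregation_key()
for item in ["click", "view", "purchase"]:
    store.append(event_key, item)
print(store.drain(event_key))  # ['click', 'view', 'purchase']
```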
DATA CLASSIFIER SYSTEM, DATA CLASSIFIER METHOD AND DATA CLASSIFIER PROGRAM

Publication number: 20110179037

Abstract: A data classifier system of the present invention selects a plurality of classifications correlated to data groups so as to output classification axes based on hierarchical classifications and data groups. The data classifier system includes a basic category accumulation means, a classification axis candidate creation means and a priority calculation means. The basic category accumulation means accumulates classifications serving as basic categories used for selecting desired classifications in advance. The classification axis candidate creation means creates classification axis candidates based on combinations of classifications each correlated to at least one data among descendant classifications of each basic category. The priority calculation means calculates priorities with respect to the classification axis candidates created by the classification axis candidate creation means based on hierarchical distances of classifications in the classified hierarchy.

Type: Application

Filed: July 29, 2009

Publication date: July 21, 2011

Inventors: Hironori Mizuguchi, Kenji Tateishi, Itaru Hosomi, Dai Kusui
SYSTEM AND METHOD FOR USING SPEND BEHAVIOR TO IDENTIFY A POPULATION OF MERCHANTS

Publication number: 20110178844

Abstract: The present invention improves upon existing systems and methods by providing a passive profile creation method. The data accessible to a financial processor, such as spend level data, is leveraged using sophisticated data clustering and/or data appending techniques. Associations are established among entities (e.g., consumers), among merchants, and between entities and merchants. In one embodiment, a system and method are provided for passively collecting spend level data for a transaction of a first entity, aggregating the collected spend level data for a plurality of entities, and clustering the first entity with a subset of the plurality of entities based on the aggregated spend level data of the first entity.

Type: Application

Filed: January 20, 2010

Publication date: July 21, 2011

Applicant: American Express Travel Related Services Company, Inc.

Inventors: Rajendra R. Rane, Melissa Schwartz
MULTI-PASS DATA ORGANIZATION AND AUTOMATIC NAMING

Publication number: 20110179033

Abstract: A method and a system to organize a data set into groups of data subsets in multiple passes using different parameters and to automatically name the groups is disclosed. For example, a data set is retrieved in accordance with a search query submitted by a user. The data set is organized into clusters based on one or more statistics of the data set. The data set is then organized into groups of data subsets based on one or more attributes indicated by the data set. Each of the groups is automatically named based on a property shared by the data units of the group. The name(s) of a group may be mined from the data units of the group, retrieved from a structure that maps to attribute values indicated by the data units of the group, etc.

Type: Application

Filed: March 14, 2011

Publication date: July 21, 2011

Applicant: eBay Inc.

Inventors: John A. Mount, Badrul M. Sarwar
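The two passes and the automatic naming described above can be sketched as follows, under illustrative assumptions: items are dicts with a numeric `price`, the first pass splits on the median price (one possible "statistic"), the second pass groups by a caller-chosen attribute, and each group's name combines the attribute value and price band its items share. None of these specifics come from the application.

```python
from collections import defaultdict
from statistics import median


def organize(items, attribute):
    """Pass 1: split by price relative to the median. Pass 2: group by attribute.
    Each group is named after the attribute value and price band its items share."""
    if not items:
        return {}
    cutoff = median(item["price"] for item in items)
    groups = defaultdict(list)
    for item in items:
        band = f"under {cutoff:g}" if item["price"] < cutoff else f"{cutoff:g} and up"
        groups[f"{item[attribute]}, {band}"].append(item)
    return dict(groups)


listings = [
    {"price": 10, "brand": "Acme"},
    {"price": 12, "brand": "Acme"},
    {"price": 50, "brand": "Zeta"},
    {"price": 55, "brand": "Acme"},
]
for name, members in organize(listings, "brand").items():
    print(name, len(members))
# Acme, under 31 2
# Zeta, 31 and up 1
# Acme, 31 and up 1
```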
METHOD OF DETERMINING A RELIABILITY INDICATOR FOR SIGNATURES OBTAINED FROM CLINICAL DATA AND USE OF THE RELIABILITY INDICATOR FOR FAVORING ONE SIGNATURE OVER THE OTHER

Publication number: 20110173201

Abstract: This invention relates to a method and an apparatus for determining a reliability indicator for at least one set of signatures obtained from clinical data collected from a group of samples. The signatures are obtained by detecting characteristics in the clinical data from the group of samples, and each of the signatures generates a first set of stratification values that stratify the group of samples. At least one additional stratification source, parallel to and independent of the signatures obtained from the group of samples, is provided and generates a second set of stratification values. A comparison is made for each respective sample, where the first stratification values are compared with true reference stratification values, and the second stratification values are compared with the same true reference stratification values.

Type: Application

Filed: September 24, 2009

Publication date: July 14, 2011

Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.

Inventors: Angel Janevski, Nilanjana Banerjee, Yasser Alsafadi, Vinay Varadan
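The abstract describes comparing both the signature's stratification and an independent source's stratification against the true reference, but not how the comparisons are combined. The sketch below assumes per-sample agreement rates and defines the indicator as the signature's agreement minus the independent source's agreement; it then favors the signature with the highest indicator, as the title suggests. The formula and all names are hypothetical.

```python
def agreement(predicted, reference):
    """Fraction of samples whose stratification value matches the reference."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("need equal-length, non-empty stratifications")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def reliability_indicator(signature, independent, reference):
    """Assumed indicator: how much better the signature agrees with the
    reference than the independent parallel source does."""
    return agreement(signature, reference) - agreement(independent, reference)


def favor(signatures, independent, reference):
    """Name of the signature with the highest reliability indicator."""
    return max(signatures,
               key=lambda name: reliability_indicator(signatures[name], independent, reference))


reference = ["high", "low", "low", "high"]
independent = ["high", "low", "high", "high"]
signatures = {
    "A": ["high", "low", "low", "high"],
    "B": ["low", "low", "low", "high"],
}
print(favor(signatures, independent, reference))  # A
```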
Systems and methods for utilizing organization-specific classification codes

Publication number: 20110173202

Abstract: Systems and methods for classifying a document are provided. In exemplary embodiments, an organization specific classification code (OSCC) is used to classify the document or data. The OSCC is a classification code based on an information type and an organization. In some embodiments, one or more policies may be associated with the OSCC.

Type: Application

Filed: August 16, 2006

Publication date: July 14, 2011

Inventors: Deidre Paknad, Puttappaiah Muniyappa
METHODS AND APPARATUSES FOR CLUSTERING ELECTRONIC DOCUMENTS BASED ON STRUCTURAL FEATURES AND STATIC CONTENT FEATURES

Publication number: 20110173197

Abstract: Exemplary methods and apparatuses are provided which may be implemented using one or more computing devices to allow for super clustering of clusters of electronic documents based, at least in part, on structural and static content features.

Type: Application

Filed: January 12, 2010

Publication date: July 14, 2011

Applicant: Yahoo! Inc.

Inventors: Rupesh R. Mehta, Srinivasan H. Sengamedu, Rajeev R. Rastogi
DATA GENERATING APPARATUS, INFORMATION PROCESSING APPARATUS, DATA GENERATING METHOD, INFORMATION PROCESSING METHOD, DATA GENERATING PROGRAM, INFORMATION PROCESSING PROGRAM AND RECORDING MEDIUM

Publication number: 20110167065

Abstract: A data generating apparatus includes an acquiring unit that acquires text data (name data) related to a name associated with position information; a classifying unit that, using the acquired position data, classifies the name data according to given regions; an integrating unit that integrates neighboring regions such that the total data size of the name data included in the regions to be integrated does not exceed a predetermined data size; a storage unit that groups the name data according to the integrated regions and stores the grouped name data as a name dictionary to be used in both a facility search process and a map display process; and an extracting unit that, from the classified name data, extracts the name data common to a given number of regions or more, where the storage unit groups and stores the common name data as a common name dictionary separate from the name dictionary.

Type: Application

Filed: June 17, 2008

Publication date: July 7, 2011

Applicants: Pioneer Corporation, Increment P Corporation

Inventors: Shunsaku Toyoda, Takashi Hanyuda, Takashi Hashimoto, Ippei Nambata, Hajime Adachi
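Two of the units above, the integrating unit and the extracting unit, can be sketched in a few lines. This is a simplification under stated assumptions: regions are taken to be in a one-dimensional order so that "neighboring" means consecutive (real map regions are two-dimensional), merging is greedy, and a single region already larger than the limit is simply kept on its own.

```python
from collections import Counter


def integrate_regions(region_sizes, max_size):
    """Greedily merge consecutive (neighboring) regions while the combined
    name-data size stays within max_size. Returns lists of region indices."""
    groups, current, current_size = [], [], 0
    for index, size in enumerate(region_sizes):
        if current and current_size + size > max_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


def common_names(names_by_region, min_regions):
    """Names occurring in at least min_regions distinct regions
    (candidates for the separate common name dictionary)."""
    counts = Counter(name for names in names_by_region for name in set(names))
    return {name for name, count in counts.items() if count >= min_regions}


print(integrate_regions([3, 4, 2, 6, 1], max_size=7))  # [[0, 1], [2], [3, 4]]
print(sorted(common_names(
    [["Station", "Cafe A"], ["Station", "Park"], ["Station", "Cafe A"]], 2)))
# ['Cafe A', 'Station']
```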
CROSS-DOMAIN CLUSTERABILITY EVALUATION FOR CROSS-GUIDED DATA CLUSTERING BASED ON ALIGNMENT BETWEEN DATA DOMAINS

Publication number: 20110167064

Abstract: A system and associated method for evaluating cross-domain clusterability upon a target domain and a source domain. The cross-domain clusterability is calculated as a linear combination of a target clusterability and a source-target pair matchability, by use of a trade-off parameter that determines the relative contribution of the target clusterability and the source-target pair matchability. The target clusterability quantifies how clusterable the target domain is. The source-target pair matchability is calculated as an average of a target-side matchability and a source-side matchability, which quantify, respectively, how well the target centroids of the target domain are aligned with the source centroids and how well the source centroids of the source domain are aligned with the target centroids.

Type: Application

Filed: January 6, 2010

Publication date: July 7, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: JEFFREY M. ACHTERMANN, INDRAJIT BHATTACHARYA, KEVIN W. ENGLISH, Jr., SHANTANU R. GODBOLE, SACHINDRA JOSHI, ASHWIN SRINIVASAN, ASHISH VERMA
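Written as formulas, the combination described in the abstract might look as follows. The symbols are ours, and writing the linear combination as a convex mix with weights $\lambda$ and $1-\lambda$ is an assumption; the abstract only states that a trade-off parameter sets the relative contribution. The simple average of the two one-sided matchabilities is taken directly from the abstract.

```latex
\mathrm{CDC}(T,S) = \lambda\,\mathrm{Clus}(T) + (1-\lambda)\,\mathrm{Match}(T,S),
\qquad 0 \le \lambda \le 1,
\\[4pt]
\mathrm{Match}(T,S) = \tfrac{1}{2}\bigl(\mathrm{Match}_{T \to S} + \mathrm{Match}_{S \to T}\bigr),
```

where $\mathrm{Clus}(T)$ is the target clusterability, $\mathrm{Match}_{T \to S}$ measures how well the target centroids align with the source centroids, and $\mathrm{Match}_{S \to T}$ measures the reverse alignment.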
Integration of Web Information Architecture Taxonomy and Web Metrics Taxonomy

Publication number: 20110161312

Abstract: Mechanisms are provided for integration of Web information architecture taxonomy and Web metrics taxonomy. When an author creates source content, the mechanism classifies the content using a rich taxonomy. The mechanism also adds unique identifiers into the source content pages as tags. The mechanism may then transform the source content into Web content that contains the identifiers in the tags. When users view the Web content, the tags generate usage data, which contain the identifiers. A Web metrics mechanism generates a Web metrics report from the usage data. The page tags are the identifiers from the source content. The Web metrics report associates each page of Web content with the rich taxonomy available in the source content.

Type: Application

Filed: December 28, 2009

Publication date: June 30, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Tracy H. Wallman
