Clustering Or Classification (epo) Patents (Class 707/E17.089)

E Subclasses

Into predefined classes (epo) (Class 707/E17.09)

Including class or cluster creation or modification (epo) (Class 707/E17.091)

Including cluster or class visualization or browsing (epo) (Class 707/E17.092)

SYSTEM AND METHOD FOR AUTOMATIC WEIGHT GENERATION FOR PROBABILISTIC MATCHING

Publication number: 20100175024

Abstract: Embodiments of the invention provide a system and method of automatically generating weights for matching data records. Each field of a record may be compared by an exact match and/or close matches and each comparison can result in a mathematical score which is the sum of the field comparisons. To sum up the field scores accurately, the automatic weight generation process comprises an iterative process. In one embodiment, initial weights are computed based upon unmatched-set probabilities and default discrepancy weights associated with attributes in the comparison algorithm. A bulk cross-match is performed across the records using the initial weights and a candidate matched set is computed for updating the discrepancy probabilities. New weights are computed based upon the unmatched probabilities and the updated discrepancy probabilities. The new weights are then tested for convergence against the old weights, and the process is repeated with the new weight table until the weights converge to their final values.

Type: Application

Filed: March 19, 2010

Publication date: July 8, 2010

Inventors: Scott Schumacher, Scott Ellard, Norman S. Adams
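
The iterative loop described in the abstract of publication 20100175024 above can be illustrated with a small sketch. It is not the patented method: it assumes exact-match field comparisons only, log-likelihood-ratio weights in the style of classical probabilistic record linkage, and hypothetical parameters (`threshold`, `default_m`, `tol`) that do not appear in the abstract.

```python
import math
from itertools import combinations

def clamp(p, eps=1e-3):
    # Keep probabilities away from 0 and 1 so the logarithms stay finite.
    return min(max(p, eps), 1.0 - eps)

def agreement(a, b):
    # Exact-match comparison of each field (close matches are omitted here).
    return tuple(x == y for x, y in zip(a, b))

def weights_from(m, u):
    # Per field: (weight when the field agrees, weight when it disagrees).
    return [(math.log(mi / ui), math.log((1 - mi) / (1 - ui)))
            for mi, ui in zip(m, u)]

def score(pattern, weights):
    # A pair's score is the sum of its field scores.
    return sum(w[0] if agree else w[1] for agree, w in zip(pattern, weights))

def generate_weights(records, threshold=0.0, default_m=0.9, tol=1e-6, max_iter=50):
    # Bulk cross-match: compare every record with every other record.
    pairs = [agreement(a, b) for a, b in combinations(records, 2)]
    n_fields = len(records[0])
    # Assumption: agreement rates over all pairs approximate the unmatched set.
    u = [clamp(sum(p[i] for p in pairs) / len(pairs)) for i in range(n_fields)]
    m = [default_m] * n_fields  # default starting values (hypothetical)
    weights = weights_from(m, u)
    for _ in range(max_iter):
        # Candidate matched set under the current weights.
        matched = [p for p in pairs if score(p, weights) > threshold]
        if not matched:
            break
        m = [clamp(sum(p[i] for p in matched) / len(matched)) for i in range(n_fields)]
        new_weights = weights_from(m, u)
        delta = max(abs(a - b) for old, new in zip(weights, new_weights)
                    for a, b in zip(old, new))
        weights = new_weights
        if delta < tol:  # convergence test between new and old weights
            break
    return weights

records = [
    ("ann", "smith", "1980"), ("ann", "smith", "1980"),
    ("bob", "jones", "1975"), ("bob", "jones", "1975"),
    ("ann", "jones", "1990"), ("cara", "lee", "1980"),
]
for agree_w, disagree_w in generate_weights(records):
    print(round(agree_w, 3), round(disagree_w, 3))
```

In this toy data set the two duplicated records form the candidate matched set, so agreement weights come out positive and disagreement weights negative for every field.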
DATA MINING USING AN INDEX TREE CREATED BY RECURSIVE PROJECTION OF DATA POINTS ON RANDOM LINES

Publication number: 20100174714

Abstract: The present invention relates to a method and computer program product for data mining with constant search time. The method and computer program product comprise the steps of: traversing a search tree to a leaf; retrieving one or more data store identifiers from said leaf; reading the data pointed to by said data store identifiers; locating one or more values in said data; referencing one or more data descriptors; retrieving the n nearest data descriptor neighbors; and terminating said search.

Type: Application

Filed: June 6, 2007

Publication date: July 8, 2010

Applicant: HASKOLINN I REYKJAVIK

Inventors: Fridrik Heidar Asmundsson, Herwig Lejsek, Bjorn Thor Jonsson
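
As a rough illustration of the title and abstract of publication 20100174714 above, the sketch below builds a tree by recursively projecting points onto random lines and answers a query by walking to a single leaf. The median split rule, the `leaf_size` parameter, and the dictionary-based node layout are assumptions made for the example; the abstract does not specify them.

```python
import random

def project(point, line):
    # Scalar projection of a point onto a direction vector.
    return sum(p * d for p, d in zip(point, line))

def build_tree(points, ids, leaf_size=2, rng=None):
    rng = rng or random.Random(0)
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    dim = len(points[ids[0]])
    line = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random line
    projections = sorted(project(points[i], line) for i in ids)
    split = projections[len(projections) // 2]        # median (assumed rule)
    left = [i for i in ids if project(points[i], line) < split]
    right = [i for i in ids if project(points[i], line) >= split]
    if not left or not right:                         # degenerate split
        return {"leaf": ids}
    return {"line": line, "split": split,
            "left": build_tree(points, left, leaf_size, rng),
            "right": build_tree(points, right, leaf_size, rng)}

def find_leaf(tree, query):
    # Traverse the tree to a leaf and return the identifiers stored there.
    while "leaf" not in tree:
        side = "left" if project(query, tree["line"]) < tree["split"] else "right"
        tree = tree[side]
    return tree["leaf"]

def nearest(points, tree, query, n=1):
    # Rank only the candidates in the reached leaf (an approximate search).
    def dist2(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], query))
    return sorted(find_leaf(tree, query), key=dist2)[:n]

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9),
          (9.0, 1.0), (8.8, 1.3), (2.0, 7.0), (2.1, 7.4)]
tree = build_tree(points, list(range(len(points))))
print(nearest(points, tree, (5.1, 5.0), n=1))
```

Because only one leaf is inspected, the result is approximate: a true nearest neighbor that fell on the other side of a split is missed.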
Network System, Network Household Appliance, Content/Metadata Synchronous Processing Method, and Computer Program

Publication number: 20100174680

Abstract: [Object] To provide a network system capable of efficiently carrying out synchronous processing by positively notifying a home network appliance of an update content of a content or metadata. [Solving Means] A subscriber in a home network appliance acquires via a network an update notification message that stores update information of a content or metadata and in which a filter attribute for categorizing the update notification message is set. The appliance includes a service client that updates, by an application corresponding to a specific service, the content or the metadata in a local content/metadata database using the update information within the update notification message. The subscriber manages a correspondence between the service client and the filter attribute and specifies the service client that provides the update information within the update notification message based on the filter attribute set in the update notification message and the correspondence.

Type: Application

Filed: October 17, 2008

Publication date: July 8, 2010

Inventors: Yasuaki Yamagishi, Yasuhiro Yukawa
GENERATING DOCUMENT TEMPLATES THAT ARE ROBUST TO STRUCTURAL VARIATIONS

Publication number: 20100174715

Abstract: A template or wrapper tree for a document such as a web page is generalized from the bottom up (from leaf toward root of a logical tree structure of the template). At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized, and the process is repeated at a next higher level in the tree, resulting in a generalized template or wrapper tree. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on the nested pattern regular expression, and then replacing sub-trees in a tree-based regular expression of the template or wrapper at the given level with the merged sub-trees. This process is repeated at a next higher level of the tree (progressing from leaf towards root) until the wrapper or tree-based regular expression that represents the template is fully generalized.

Type: Application

Filed: February 22, 2010

Publication date: July 8, 2010

Applicant: YAHOO! INC.

Inventors: Charu Tiwari, V.G. Vinod Vydiswaran
Multi-level fraud check with dynamic feedback for internet business transaction processor

Publication number: 20100169163

Abstract: An Internet business transaction processor of the present invention has a distributed processing architecture which allows the processing load to be distributed among multiple parallel servers. The transaction processor of the present invention provides a virtual store front utilizing an “other people's warehouse” approach by using a dynamic distributor selection processing system to select among a plurality of distributors based on a flexible rule-based algorithm.

Type: Application

Filed: October 26, 2009

Publication date: July 1, 2010

Inventor: Robert S. Alvin
METHOD AND SYSTEM FOR EMAIL SEARCH

Publication number: 20100169320

Abstract: A method and system for performing email search. The method comprises enabling the user to find relations between emails, build network relations, and retrieve groups based on those relations (and intersections of relations) as the user chooses. The system presents predetermined search options for the user to select, lets the user “drill down” into the results with filters to view further emails or results, supports searching within search results, and stores user searches.

Type: Application

Filed: December 22, 2009

Publication date: July 1, 2010

Inventors: Prashanth PATNAM, Ajay Deshpande, Jitendra Gokhale, Vinod Kulkarni
Method and device for clustering categorical data and identifying anomalies, outliers, and exemplars

Publication number: 20100161609

Abstract: One aspect of the invention is a method for assigning categorical data to a plurality of clusters. An example of the method includes identifying a plurality of categories associated with the data. This example also includes, for each category in the plurality of categories, identifying at least one element associated with the category. This example also includes specifying a number of clusters to which the data may be assigned. This example additionally includes assigning at least some of the data, wherein each assigned datum is assigned to a respective one of the clusters. This example further includes, for at least one of the clusters, determining, for at least one category, the frequency in data assigned to the cluster of at least one element associated with the category. Further, some examples of the invention provide for detecting outliers, anomalies, and exemplars in the categorical data.

Type: Application

Filed: February 27, 2010

Publication date: June 24, 2010

Inventor: David B. Fogel
CONTEXT TRANSFER IN SEARCH ADVERTISING

Publication number: 20100161605

Abstract: A computer-implemented method is disclosed for determining a type of landing page to which to transfer web searchers that enter a particular query, the method comprising: classifying a landing page as one of a plurality of landing page classes with a trained classifier of a computer based on textual content of the landing page; determining, by the computer, characteristics of one or more query to be associated with the landing page; and choosing, with the computer, whether to retain or to change classification of the landing page to be associated with the one or more query based on relative average conversion rates of advertisements on a plurality of manually-classified landing pages when associated with the characteristics of the one or more query.

Type: Application

Filed: December 23, 2008

Publication date: June 24, 2010

Applicant: Yahoo! Inc.

Inventors: Evgeniy Gabrilovich, Andrei Broder, Bo Pang, Vanja Josifovski, Hila Becker
SYSTEM AND METHOD FOR GENERATING DISPLAY ADVERTISEMENTS FROM SEARCH BASED KEYWORD ADVERTISEMENTS

Publication number: 20100161411

Abstract: There is provided a system and method for generating display advertisements from search based keyword advertisements. The system includes a keyword generation unit for generating one or more advertising keywords from a received category profile defining a classification hierarchy, for use in selecting one or more candidate advertisement messages from a plurality of advertisement messages, an advertisement selection unit for receiving one or more candidate advertisement messages comprising a text message selected from the plurality of advertisement messages and selecting one advertisement message from the one or more received candidate advertisement messages based upon one or more characteristics associated with the received one or more candidate advertisement messages and a creative advertisement assembly unit for generating an advertisement image based on the text advertisement of the selected one advertisement message for display in network based content.

Type: Application

Filed: December 22, 2009

Publication date: June 24, 2010

Applicant: KINDSIGHT

Inventors: Wang Wu, Dorothy Tse, Michael Gassewitz, Sitao Yang
Method and Apparatus For Track and Track Subset Grouping

Publication number: 20100153395

Abstract: A method comprises storing real-time multimedia data in a plurality of tracks and/or track subsets; and identifying one or more multi-track groups, each multi-track group being associated with a relationship among one or more of the plurality of tracks and/or track subsets.

Type: Application

Filed: July 16, 2009

Publication date: June 17, 2010

Applicant: NOKIA CORPORATION

Inventors: Miska Matias Hannuksela, Ye-Kui Wang
NAME INDEXING FOR NAME MATCHING SYSTEMS

Publication number: 20100153396

Abstract: Methods, systems and computer software program code products enabling the matching of a large number of names across any of a range of different languages comprise: receiving incoming names in any of a set of languages or scripts; generating high-recall keys based on the received incoming names; executing a full-text index process based on the generated high-recall keys; and looking up candidates for matching.

Type: Application

Filed: February 26, 2008

Publication date: June 17, 2010

Inventors: Benson Margulies, David Murgatroyd, Bernard Greenberg, Zhaohui Li
MAINTAINING A RELATIONSHIP BETWEEN TWO DIFFERENT ITEMS OF DATA

Publication number: 20100153397

Abstract: Data is stored persistently. At least two different items of the data are stored in two different non-conflicting regions or two different physical clusters. A relationship is maintained between the two different items of data. The relationship enables a process to reach any one of the data items from the other data item. Consistency of the relationship is maintained notwithstanding updates of either or both of the items.

Type: Application

Filed: August 5, 2009

Publication date: June 17, 2010

Applicant: Miosoft Corporation

Inventors: Albert B. Barabas, Ernst M. Siepmann, Mark D.A. van Gulik
WINDOW GROUPING

Publication number: 20100153399

Abstract: A framework is provided for obtaining window information. The window information can be applied to different assignment models to assign windows to different groups. A group may correspond to a task being performed by a user. The window information can be semantic or temporal information captured as window events and properties of windows whose events are captured. Temporal information can be information about switches between windows. Semantic information can be window titles. Temporal information, semantic information, or both, can be used to assign windows to groups.

Type: Application

Filed: February 26, 2010

Publication date: June 17, 2010

Applicant: MICROSOFT CORPORATION

Inventors: Nuria M. Oliver, Arungunram C. Surendran, Chintan S. Thakkar, Gregory R. Smith
SYSTEMS AND METHODS FOR RATIONAL SELECTION OF CONTEXT SEQUENCES AND SEQUENCE TEMPLATES

Publication number: 20100153400

Abstract: Provided are systems and methods for rational selection of context sequences and sequence templates including a computer implemented method for obtaining a repository of attributes sets where the attributes sets are statistically associated with a sequence template representing two or more context sequences.

Type: Application

Filed: August 20, 2008

Publication date: June 17, 2010

Inventor: Yoav Namir
METHODS AND SYSTEMS FOR MANAGING DATA

Publication number: 20100145949

Abstract: Systems and methods for managing data, such as metadata or indexes of content of files. In one exemplary method, notifications to update a metadata database or an index database are combined into a combined notification. According to other aspects, an order among logical locations on a storage device is determined in order to specify a sequence for scanning for files to be indexed. According to another aspect, a method includes determining whether to index a file based on a path name of the file relative to a plurality of predetermined path names.

Type: Application

Filed: December 11, 2009

Publication date: June 10, 2010

Inventors: Yan Arrouye, Dominic Giampaolo, Andrew Carol
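
The last aspect in the abstract of publication 20100145949 above, deciding whether to index a file from its path name relative to a set of predetermined path names, might look like the following sketch. The excluded directories and the rule "skip anything under an excluded root" are hypothetical; the abstract does not say which paths are predetermined or how they are applied.

```python
from pathlib import PurePosixPath

# Hypothetical predetermined path names whose contents are not indexed.
EXCLUDED_ROOTS = [PurePosixPath("/tmp"), PurePosixPath("/private/var/cache")]

def should_index(path, excluded_roots=EXCLUDED_ROOTS):
    # Index the file unless it is, or lies beneath, an excluded root.
    p = PurePosixPath(path)
    return not any(p == root or root in p.parents for root in excluded_roots)

for candidate in ["/Users/a/notes.txt", "/tmp/build/out.log", "/tmpfiles/a.txt"]:
    print(candidate, should_index(candidate))
```

Comparing path components rather than string prefixes keeps `/tmpfiles/a.txt` from being mistaken for something under `/tmp`.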
SYSTEM, METHOD AND PROGRAM PRODUCT FOR CLASSIFYING DATA ELEMENTS INTO DIFFERENT LEVELS OF A BUSINESS HIERARCHY

Publication number: 20100145945

Abstract: A method, system and program product for classifying data elements into different levels of a business hierarchy. The method includes identifying data elements to be classified into one or more levels of a business hierarchy, selecting a first logic decision tree for evaluating the data elements identified for classification into the hierarchy and executing the first tree for recursively evaluating each data element identified until the first tree has been traversed. Further, the method includes dynamically creating configurable anchor point classifications for the data elements evaluated through the first tree and assigning a respective anchor point classification to each data element evaluated, such that, a respective anchor point classification assigned to a data element evaluated links the data element to a lowest level of the hierarchy, and where the anchor point classification conveys classification information as to each higher level of the hierarchy that the data element belongs to.

Type: Application

Filed: December 10, 2008

Publication date: June 10, 2010

Applicant: International Business Machines Corporation

Inventors: James D. Episale, Mark A. Musa, David G. Ruest
Image processing system, particularly for use with diagnostic images

Publication number: 20100138422

Abstract: An image processing system particularly for use with diagnostic images, comprises at least one processing unit, which receives digital images acquired by one or more imaging apparatus and provides output images, processed by means of an image processing program loaded in the memory of the processing unit and executed thereby, characterized in that it consists of a central service unit, comprising an interface to be accessed by remote users, which connect to said central unit by remote communication means.

Type: Application

Filed: October 24, 2006

Publication date: June 3, 2010

Applicant: BRACCO IMAGING S.P.A.

Inventor: Marco Mattiuzzi
IDENTIFYING INADEQUATE SEARCH CONTENT

Publication number: 20100138421

Abstract: Systems and methods for identifying inadequate search content are provided. Inadequate search content, for example, can be identified based on statistics associated with the search queries related to the content.

Type: Application

Filed: February 3, 2010

Publication date: June 3, 2010

Applicant: GOOGLE INC.

Inventors: Jeffrey David Oldham, Hal R. Varian, Matthew D. Cutts, Matt Rosencrantz
SYSTEMS AND METHODS FOR CLASSIFYING AND TRANSFERRING INFORMATION IN A STORAGE NETWORK

Publication number: 20100131467

Abstract: Systems and methods for data classification to facilitate and improve data management within an enterprise are described. The disclosed systems and methods evaluate and define data management operations based on data characteristics rather than data location, among other things. Also provided are methods for generating a data structure of metadata that describes system data and storage operations. This data structure may be consulted to determine changes in system data rather than scanning the data files themselves.

Type: Application

Filed: January 28, 2010

Publication date: May 27, 2010

Applicant: CommVault Systems, Inc.

Inventors: Anand Prahlad, Jeremy A. Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
PERSONALIZATION ENGINE FOR BUILDING A DYNAMIC CLASSIFICATION DICTIONARY

Publication number: 20100131507

Abstract: A dynamic classification dictionary is built for use in profiling and targeting users for additional relevant content. Behavioral data is gathered from user activity, and user documents and actions are categorized. Author-generated document classification information is analyzed and assigned a first taxonomic noun to characterize the document. User-generated tags characterizing a portion of the document are assigned a second taxonomic noun. Search terms that resulted in the user accessing the document are identified and assigned a third taxonomic noun. Attributes related to the manner in which the document was accessed are evaluated and assigned a fourth taxonomic noun. The document is processed using pattern rules to extract a fifth taxonomic noun. The taxonomic nouns are aggregated into a composite set of taxonomic nouns, and the dynamic classification dictionary is built by storing the composite set of taxonomic nouns.

Type: Application

Filed: January 29, 2010

Publication date: May 27, 2010

Applicant: CBS INTERACTIVE, INC.

Inventors: Tushar PRADHAN, Thomas OSBORNE, John POTTER
SYSTEM FOR ADVERTISING USING META-BLOG WEB PAGE AND PROFIT CREATING METHOD WITH IT

Publication number: 20100121711

Abstract: The present invention relates to an advertising system and profit creation method using a metablog web page, in which the personal website and the display target entity of a member are posted on a personal metablog web page, managed by a metablog server 200, when the member subscribes to the metablog server as a member while operating the personal website managed by a web server 100, and which provide some of advertising fees, incurred by posting the display target entity and paid by an advertiser client 300, to the member. Therefore, the present invention is advantageous in that, since all members, who operate personal websites through different web servers, use a metablog web page, new profit can be created in such a way that the members can be paid some of the advertising fees paid by an advertiser client.

Type: Application

Filed: May 30, 2007

Publication date: May 13, 2010

Applicant: NR SYSTEMS, INC.

Inventor: Yon-Ho Park
QUERY GENERATION FOR A CAPTURE SYSTEM

Publication number: 20100121853

Abstract: A document accessible over a network can be registered. A registered document, and the content contained therein, is not transmitted undetected over and off of the network. In one embodiment, the invention includes a manager agent to maintain signatures of registered documents and a match agent to detect the unauthorized transmission of the content of registered documents.

Type: Application

Filed: January 20, 2010

Publication date: May 13, 2010

Inventors: Erik de la Iglesia, William Deninger, Ratinder Paul Singh Ahuja
DATA COLLECTION SYSTEM, DATA COLLECTION METHOD AND DATA COLLECTION PROGRAM

Publication number: 20100121826

Abstract: It is an object to provide a data collection system that is configured to reduce a communication amount, etc. at the time when data are collected from a plurality of devices, so as to reduce a communication amount attended by the collection of data without increasing processing loads imposed on devices. A symbol classifying unit of a data relay device classifies received data that have been already compressed. A data recompressing unit replaces codes contained in the classified already compressed data with other codes, so as to recompress the already compressed data. A symbol set clustering unit sends a transfer destination renewal device a communication speed at the time when the recompressed data are transferred to other devices, a processing speed at the recompressing time, etc. The transfer destination renewal device generates transfer destination information on the basis of the communication speed, the processing speed, etc.

Type: Application

Filed: February 26, 2008

Publication date: May 13, 2010

Inventor: Akitake Mitsuhashi
EVENT SEARCHING

Publication number: 20100114893

Abstract: Events can be searched by identifying a query that includes a time interval and a search component, determining a time increment associated with the time interval, and partitioning the time interval into partitions based on the time increment. For each partition, a relevance of each event in a collection of events that occur at a time in the partition is determined based on the query. A pre-determined number of the relevant events are displayed.

Type: Application

Filed: January 11, 2010

Publication date: May 6, 2010

Applicant: GOOGLE INC.

Inventors: Nikhil Chandhok, Peter Solderitsch, Michael Gordon, Philo Juang
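
The partitioned search in the abstract of publication 20100114893 above can be sketched as follows. The event dictionaries, the keyword-overlap relevance function, and the `top_k` parameter are assumptions chosen for illustration; the abstract does not define how relevance is computed.

```python
def keyword_relevance(query_terms):
    # Hypothetical relevance: number of query terms found in the event title.
    terms = set(query_terms)
    return lambda event: len(terms & set(event["title"].lower().split()))

def search_events(events, start, end, increment, relevance, top_k=2):
    results = []
    t = start
    while t < end:
        upper = min(t + increment, end)
        # Events whose time falls inside this partition and match the query.
        relevant = [e for e in events
                    if t <= e["time"] < upper and relevance(e) > 0]
        best = sorted(relevant, key=relevance, reverse=True)[:top_k]
        results.append(((t, upper), best))
        t = upper
    return results

events = [
    {"time": 1, "title": "Jazz concert"},
    {"time": 2, "title": "Rock concert downtown"},
    {"time": 3, "title": "Farmers market"},
    {"time": 12, "title": "Jazz brunch"},
    {"time": 15, "title": "Jazz concert night"},
]
query = keyword_relevance(["jazz", "concert"])
for partition, best in search_events(events, 0, 20, 10, query, top_k=1):
    print(partition, [e["title"] for e in best])
```

Ranking within each partition, rather than across the whole interval, keeps one busy period from crowding out every other part of the requested time range.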
ADAPTIVE RADAR

Publication number: 20100109938

Abstract: A method of classifying items from reflected signals returned from said items is disclosed, the method comprising: processing said return signals to discriminate between a first set of signals indicative of items of interest and a further set of signals indicative of clutter; identifying items from said first set of signals and classifying them as a first class of item; processing said further set of signals to identify a second set of signals indicative of further items of interest; identifying items from said second set of signals and classifying them as a second class of item.

Type: Application

Filed: January 31, 2008

Publication date: May 6, 2010

Inventors: Gordon Kenneth Andrew Oswald, Edwin Christopher Carter, Per Arne Vincent Utsi, Samuel Julius Pumphrey, Desmond Keith Phillips, Michael Hugh Burchett, Allan Geoffrey Smithson, Jonathan Peter Edgecombe
INTRODUCING SYSTEM, INTRODUCING METHOD, INFORMATION RECORDING MEDIUM, AND PROGRAM

Publication number: 20100114892

Abstract: Provided is an introducing system in which a server device introduces users of terminal devices to each other while motivating the users to join the system by presenting appropriate information to them during a wait time before they receive introduction. When a terminal device requests introduction of another terminal device during a time slot (303) shifted from a time period (304) between times t_{i-1} and t_i by a margin time period (305), the server device assigns an introduction time t_i to the terminal device. The terminal device displays the difference between the assigned introduction time and the current time on a screen as a remaining wait time and the number of terminal devices in an introduction waiting list. When the time t_i comes, the server device groups terminal devices that are assigned the introduction time t_i to match an introduction target, and notifies the introduction target to each terminal device.

Type: Application

Filed: April 17, 2008

Publication date: May 6, 2010

Inventors: Hiromasa Kaneko, Hideo Ueda
PATENT EVALUATING DEVICE

Publication number: 20100114587

Abstract: A patent evaluating device comprises a data acquiring section (105) for acquiring items of patent data and patent attribute information on each item of patent data in a predetermined technical field from a patent database, a data classifying section (115) for classifying the acquired items of patent data into groups within a predetermined period of time, and an evaluation value calculating section (120) for calculating the evaluation value of each item of the patent data by using the patent attribute information on each of the patent data belonging to each group and by using the value determined for each group. With this, the value of a patent application or a patent right is adequately evaluated according to numerical information objectively determined and according to the progress information of the patent application or the patent right or the content information.

Type: Application

Filed: November 2, 2007

Publication date: May 6, 2010

Inventors: Hiroaki Masuyama, Toshiro Ohsaki, Kazumi Hasuko
Method and system for business intelligence analytics on unstructured data

Publication number: 20100114899

Abstract: Various embodiments of the present invention disclose a method for Business Intelligence (BI) metrics on unstructured data. Unstructured data is collected from numerous data sources that include unstructured data as ingested data. The ingested data is indexed and represents hyperlink and extracted data and metadata for each document. Thereafter, the ingested data is automatically classified into one or more relevance classes. Further, numerous analytics are performed on the classified data to generate business intelligence metrics that may be presented on an access device operated by a user.

Type: Application

Filed: October 7, 2009

Publication date: May 6, 2010

Inventors: Aloke Guha, Joan Wrabetz, Shumin Wu, Venky Madireddi
SYSTEM AND METHOD FOR DYNAMIC AND REAL-TIME CATEGORIZATION OF WEBPAGES

Publication number: 20100115615

Abstract: A system and method for categorizing content on a webpage is disclosed. The method comprises receiving a request for a webpage from a user's computer. Next, the system determines whether there is dynamic content on the webpage by analyzing the address, links, reputation, type, style and other indicators of being able to easily change the webpage. If the webpage contains content that can be changed, then the webpage is analyzed to determine a current categorization thereof. If the webpage does not have dynamic content then the categorization of the webpage will remain the same thereby freeing system resources by only analyzing dynamic webpages.

Type: Application

Filed: June 29, 2009

Publication date: May 6, 2010

Applicant: WEBSENSE, INC.

Inventors: Daniel Lyle Hubbard, Dan Ruskin
Information classifying device, information classifying method, information classifying program, information classifying system

Patent number: 7693683

Abstract: An information classifying device calculates, for a plurality of populations containing pieces of sample information, evaluation distance between a center of gravity of the pieces of sample information belonging to each population and a piece of sample information as an object of classification (object sample), calculates statistical information such as mean, variance and standard deviation of the evaluation distance for each population, evaluates the evaluation distance of the sample information to the population based on the evaluation distance and the statistical information and evaluates degree of assignment relevancy of the object sample to the population, determines to which population the object sample is to be assigned in accordance with the degree of assignment relevancy, and assigns the object sample to the population. Evaluation distance between the center of gravity of each updated population and the object sample belonging to each population is calculated.

Type: Grant

Filed: November 17, 2005

Date of Patent: April 6, 2010

Assignee: Sharp Kabushiki Kaisha

Inventor: Masayoshi Ihara
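
The assignment step in the abstract of patent 7693683 above can be sketched with a simple standardized distance. Using Euclidean distance as the evaluation distance and a z-score as the "degree of assignment relevancy" are assumptions; the abstract names mean, variance, and standard deviation but not the exact formula.

```python
import math
from statistics import mean, pstdev

def centroid(samples):
    # Center of gravity of the samples in one population.
    return [mean(column) for column in zip(*samples)]

def assign(populations, obj):
    best_name, best_z = None, math.inf
    for name, samples in populations.items():
        center = centroid(samples)
        distances = [math.dist(s, center) for s in samples]
        mu, sigma = mean(distances), pstdev(distances)
        # How unusual is the object's distance for this population?
        z = (math.dist(obj, center) - mu) / (sigma or 1e-9)
        if z < best_z:
            best_name, best_z = name, z
    # Assigning the object updates the population, so its center of
    # gravity and distance statistics change on the next call.
    populations[best_name].append(obj)
    return best_name

populations = {
    "A": [(0, 0), (1, 0), (0, 1), (2, 2)],
    "B": [(10, 10), (11, 10), (10, 11), (12, 12)],
}
print(assign(populations, (1, 1)))
print(assign(populations, (11, 11)))
```

Standardizing by each population's own spread lets a loose population accept a sample that a tight population at the same raw distance would reject.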
CLUSTERED SEARCH PROCESSING

Publication number: 20100082618

Abstract: Methods and apparatus for searching data and grouping search results into clusters that are ordered according to search relevance. Each cluster comprises one or more data types, such as images, web pages, local information, news, advertisements, and the like. In one embodiment, a search term is evaluated for related concepts indicating categories of data sources to search. Data sources may also be identified by context information such as a location of a client device, a currently running application, and the like. Search results in each cluster are ordered by relevance and each cluster is given a score based on an aggregate of the relevance within the cluster. Each cluster score may be modified based on one or more corresponding concepts and/or context information. The clusters are ordered based on the modified scores. Content, including advertisements, may also be added to the ordered list to appear as another cluster.

Type: Application

Filed: October 30, 2009

Publication date: April 1, 2010

Applicant: Yahoo! Inc.

Inventors: Edward Stanley Ott, IV, Keith David Saft, Marco Boerries, Meher Tendjoukian, Paul Yiu
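
The cluster ordering in the abstract of publication 20100082618 above can be sketched as below. Taking the mean relevance as the aggregate and applying a multiplicative `boosts` factor for concept or context information are assumptions; the abstract only says that a score is based on an aggregate and may be modified.

```python
def order_clusters(results, boosts=None):
    boosts = boosts or {}
    clusters = {}
    for result in results:
        # Group results by data type (images, news, local, ...).
        clusters.setdefault(result["type"], []).append(result)
    scored = []
    for kind, items in clusters.items():
        # Order results inside the cluster by relevance.
        items.sort(key=lambda r: r["relevance"], reverse=True)
        aggregate = sum(r["relevance"] for r in items) / len(items)
        # Modify the cluster score with a context/concept boost (assumed form).
        scored.append((aggregate * boosts.get(kind, 1.0), kind, items))
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return [(kind, items) for _, kind, items in scored]

results = [
    {"type": "images", "relevance": 0.5, "id": "img-2"},
    {"type": "images", "relevance": 0.9, "id": "img-1"},
    {"type": "news", "relevance": 0.8, "id": "news-1"},
    {"type": "local", "relevance": 0.4, "id": "local-1"},
]
print([kind for kind, _ in order_clusters(results)])
print([kind for kind, _ in order_clusters(results, boosts={"local": 3.0})])
```

With the boost, the "local" cluster moves to the front, which mirrors the abstract's example of context such as a client device's location influencing the order.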
METHODS AND APPARATUS FOR SEARCHING AND ACCESSING MULTIMEDIA CONTENT

Publication number: 20090254578

Abstract: A method of synchronising a multimedia content file with an associated text file includes subdividing the text file into one or more samples, where each sample includes zero or more consecutive characters of the text file. The samples are associated with a corresponding contiguous time interval of the multimedia content file. For each sample, a corresponding consumption rate value is determined, which represents a use ratio of characters of the sample within the associated time interval of the multimedia content file. The consumption rate values are then stored, so that they may subsequently be used to compute time positions within the multimedia content file associated with corresponding text characters within the text file. Additional information, such as time cues and interlude intervals, may also be recorded in order to improve the accuracy of synchronisation.

Type: Application

Filed: April 1, 2009

Publication date: October 8, 2009

Inventor: Michael Andrew Hall
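
The consumption-rate idea in the abstract of publication 20090254578 above can be worked through with a short sketch. It assumes each sample is a pair of (character count, duration in seconds), that the consumption rate is characters per second, and that characters are consumed evenly within a sample; none of these details are fixed by the abstract.

```python
def consumption_rates(samples):
    # One rate per sample; a sample with zero characters (an interlude) has rate 0.
    return [n_chars / duration for n_chars, duration in samples]

def time_of_character(samples, char_index):
    # Compute the time position in the media at which a character is reached.
    elapsed, seen = 0.0, 0
    for (n_chars, duration), rate in zip(samples, consumption_rates(samples)):
        if char_index < seen + n_chars:
            return elapsed + (char_index - seen) / rate
        seen += n_chars
        elapsed += duration
    raise IndexError("character index is beyond the end of the text")

# Interlude (5 s), 20 characters over 10 s, interlude (2 s), 30 characters over 10 s.
samples = [(0, 5.0), (20, 10.0), (0, 2.0), (30, 10.0)]
print(consumption_rates(samples))
print(time_of_character(samples, 10))
print(time_of_character(samples, 35))
```

Storing one rate per sample instead of a timestamp per character keeps the stored data small while still allowing a time position to be computed for any character.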
Company Technical Document Group Analysis Supporting Device

Publication number: 20090234688

Abstract: A company technical document group analysis supporting device comprises index term extracting means for extracting an index term from a group of documents of a subject company including a technical document group, clustering means for classifying the document group of the subject company under given conditions to acquire multiple clusters, number-of-documents determination means for determining the number of documents belonging to each cluster, appearance frequency calculating means for calculating a function value of an appearance frequency of each extracted index term in each cluster, per-cluster keyword point calculating means for calculating the keyword point in each cluster by dividing the function value of the appearance frequency of each index term in each cluster by the number of documents belonging to each cluster, and entire-cluster keyword point calculating means for calculating the total for the entire clusters of the results of the calculation by the per-cluster keyword point calculating means for e

Type: Application

Filed: October 11, 2006

Publication date: September 17, 2009

Inventors: Hiroaki Masuyama, Makoto Asada, Kazumi Hasuko, Hideaki Hotta, Norio Araki
METHOD AND SYSTEM FOR CLUSTERING IDENTIFIED FORMS

Publication number: 20090210406

Abstract: A method is provided for organizing a plurality of documents that include forms. An initial set of clusters is defined for the plurality of documents. The initial set of clusters is reclustered based on similarity values calculated in multiple feature spaces. For example, a first feature space may be associated with a content of a document while a second feature space may be associated with a content of a form associated with the document. Each cluster has an associated centroid vector in each feature space that is used to represent the cluster. The similarity between the document and each cluster is calculated in both feature spaces. Each document is assigned to the cluster whose centroid is most similar. The cluster centroids may be recalculated and the process repeated until the cluster assignments become stable.

Type: Application

Filed: February 15, 2008

Publication date: August 20, 2009

Inventors: Juliana Freire, Luciano Barbosa
ADAPTIVE DATA CLASSIFICATION FOR DATA MINING

Publication number: 20090164416

Abstract: A method and system for adaptive classification during information retrieval from unstructured data are provided. The method includes receiving input from a user defining a classification. A sample set of unstructured data is determined based on the user-defined classification. The sample set of unstructured data is analyzed to determine a classification mapping that maps attributes of the sample set of unstructured data to class labels for the classification. The attributes of a set of data objects in a second set of unstructured data are indexed and one or more data objects in the set of data objects are mapped to the class label based on the classification mapping. Feedback based on the user's response to an interaction with results is determined using the class label. Finally, adaptive classification mapping is performed based on analysis of feedback by adjusting the sample set of data objects.

Type: Application

Filed: December 9, 2008

Publication date: June 25, 2009

Applicant: Aumni Data Inc.

Inventor: Aloke Guha
METHOD AND SYSTEM FOR CATEGORIZING TOPIC DATA WITH CHANGING SUBTOPICS

Publication number: 20090150436

Abstract: The embodiments of the invention provide a method for the automatic identification of changing subtopics within topics. The method begins by receiving customer satisfaction data having unstructured data objects. Next, the data objects are automatically categorized into pre-defined topics, wherein the pre-defined topics do not change throughout the customer satisfaction analysis. The pre-defined topics can be automatically defined based on a history of customer satisfaction data. Following this, a clustering analysis is automatically performed to identify subtopics of the data objects within the pre-defined topics. The subtopics are more specific than the pre-defined topics, and the subtopics can change. Further, the clustering analysis can include extracting features from the data objects and grouping the features into the subtopics. Each of the subtopics includes features having a predetermined degree of similarity.

Type: Application

Filed: December 10, 2007

Publication date: June 11, 2009

Applicant: International Business Machines Corporation

Inventors: Shantanu Godbole, Raghuram Krishnapuram, Shourya Roy
LINK-BASED CLASSIFICATION OF GRAPH NODES

Publication number: 20090132561

Abstract: A method of labeling unlabeled nodes in a graph that represents objects that have an explicit structure between them. A computing device can use a labeling engine to identify nodes in a graph that are labeled and can identify an unlabeled node in the graph that is structurally associated with the labeled nodes. The labeling engine can label the unlabeled node with the label of the labeled node based on the structural association between the unlabeled node and the labeled node.
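
One simple way to read "structurally associated" is direct adjacency. The sketch below labels each unlabeled node with the most common label among its labeled neighbors; the majority vote and the name label_unlabeled are assumptions for illustration, not the rule claimed in the application.

```python
from collections import Counter

def label_unlabeled(edges, labels):
    """edges: iterable of undirected (u, v) pairs.
    labels: dict mapping already-labeled nodes to their labels.
    Returns a new dict that also covers nodes adjacent to a labeled node."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    result = dict(labels)
    for node, adjacent in neighbors.items():
        if node in labels:
            continue
        # Only the originally labeled neighbors vote (single pass, no propagation).
        votes = Counter(labels[n] for n in adjacent if n in labels)
        if votes:
            result[node] = votes.most_common(1)[0][0]
    return result

edges = [("a", "x"), ("b", "x"), ("c", "x"), ("c", "y"), ("y", "z")]
labels = {"a": "sports", "b": "sports", "c": "news"}
result = label_unlabeled(edges, labels)
```

Node z stays unlabeled because its only neighbor had no label at the start; repeating the pass would propagate labels further.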

Type: Application

Filed: November 21, 2007

Publication date: May 21, 2009

Applicant: AT&T LABS, INC.

Inventors: Graham Cormode, Smriti Bhagat, Irina Rozenbaum
Method and apparatus for evaluating the closeness of items in a recommender of such items

Patent number: 7533093

Abstract: A method and apparatus are disclosed for recommending items of interest to a user, such as television program recommendations, before a viewing history or purchase history of the user is available. A third party viewing or purchase history is processed to generate stereotype profiles that reflect the typical patterns of items selected by representative viewers. A user can select the most relevant stereotype(s) from the generated stereotype profiles and thereby initialize his or her profile with the items that are closest to his or her own interests. A clustering routine partitions the third party viewing or purchase history (the data set) into clusters, such that points (e.g., television programs) in one cluster are closer to the mean of that cluster than any other cluster. A distance computation routine evaluates the closeness of a television program to each cluster based on the distance between a given television program and the mean of a given cluster.
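
The distance computation in the last sentence can be sketched as below. This assumes each program is already encoded as a numeric feature vector and uses Euclidean distance; the patent may use a different encoding and metric, and the names cluster_means and closeness are hypothetical.

```python
import math

def cluster_means(points, assignment):
    """Mean vector of each cluster, given one cluster id per point."""
    groups = {}
    for point, cluster in zip(points, assignment):
        groups.setdefault(cluster, []).append(point)
    return {c: tuple(sum(col) / len(members) for col in zip(*members))
            for c, members in groups.items()}

def closeness(item, means):
    """Euclidean distance from one item to every cluster mean."""
    return {c: math.dist(item, mean) for c, mean in means.items()}

points = [(0, 0), (2, 0), (10, 10), (12, 10)]
means = cluster_means(points, [0, 0, 1, 1])
distances = closeness((1, 1), means)
nearest = min(distances, key=distances.get)
```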

Type: Grant

Filed: November 13, 2001

Date of Patent: May 12, 2009

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Srinivas Gutta, Kaushal Kurapati
Identifying Clusters Of Words According To Word Affinities

Publication number: 20090094207

Abstract: In one embodiment, identifying clusters of words includes accessing a record that records affinities. An affinity between a first and second word describes a quantitative relationship between the first and second word. Clusters of words are identified according to the affinities. A cluster comprises words that are sufficiently affine with each other. A first word is sufficiently affine with a second word if the affinity between the first and second word satisfies one or more affinity criteria. A clustering analysis is performed using the clusters.
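
A minimal sketch of the "sufficiently affine" test follows. It assumes a single affinity criterion (a fixed threshold), symmetric affinities stored as a dict keyed by word pairs, and a greedy one-pass grouping; none of these details are given in the abstract, and the name affinity_clusters is hypothetical.

```python
def affinity_clusters(words, affinity, threshold):
    """Greedily group words so that every pair inside a cluster
    has an affinity of at least threshold."""
    def affine(a, b):
        # Look the pair up in either order; missing pairs count as 0.0.
        return affinity.get((a, b), affinity.get((b, a), 0.0)) >= threshold

    clusters = []
    for word in words:
        for cluster in clusters:
            if all(affine(word, member) for member in cluster):
                cluster.append(word)
                break
        else:
            # No existing cluster accepts the word, so start a new one.
            clusters.append([word])
    return clusters

affinity = {("dog", "cat"): 0.8, ("dog", "pet"): 0.9, ("cat", "pet"): 0.7,
            ("stock", "bond"): 0.85, ("pet", "stock"): 0.1}
result = affinity_clusters(["dog", "cat", "pet", "stock", "bond"], affinity, 0.5)
```

The greedy pass depends on word order; it is meant only to make the pairwise criterion concrete.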

Type: Application

Filed: October 1, 2008

Publication date: April 9, 2009

Applicant: Fujitsu Limited

Inventors: David L. Marvit, Jawahar Jain, Stergios Stergiou, Alex Gilman, B. Thomas Adler, John J. Sidorowich
Information Processing Device and Method, and Program

Publication number: 20090077132

Abstract: The present invention relates to an information processing device, an information processing method, and a program that make it possible to prevent recommendation in a CF method from concentrating on a part of contents, and recommend a content to a user with little history information. In step S11, another user X having the most similar history information to that of a user A to whom to recommend a musical piece is detected. In step S12, a musical piece a that the user X has and which the user A does not have is detected. In step S13, a cluster in each cluster layer to which the musical piece a belongs is identified. Then, in step S14, common musical pieces classified into all the clusters identified are extracted and set as recommendation candidates. Further, in step S15, the one musical piece that has the most similar cluster information to that of the musical piece a is selected from among the recommendation candidates. The musical piece selected in this step is recommended to the user A.

Type: Application

Filed: September 15, 2006

Publication date: March 19, 2009

Applicant: SONY CORPORATION

Inventors: Noriyuki Yamamoto, Kei Tateno, Mari Saito, Tomohiro Tsunoda, Mitsuhiro Miyazaki
Bayesian Surety Check to Reduce False Positives in Filtering of Content in Non-Trained Languages

Publication number: 20090055412

Abstract: A Bayesian spam filter determines an amount of content in incoming email messages that it knows from training. If the filter is familiar with a threshold amount of the content, then the filter proceeds to classify the email message as being spam or legitimate. On the other hand, if not enough of the words in the email are known to the filter from training, then the filter cannot accurately determine whether or not the message is spam. In this case, the filter classifies the message as being of type unknown. Different threshold metrics can be used, such as the percentage of known words, and the percentage of maximum correction value used during processing. This greatly improves the processing of emails in languages on which the filter was not trained.
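
The surety check can be sketched as a gate in front of an ordinary Bayesian score. The thresholds min_known and spam_cutoff, the per-word probability table, and the naive combination with equal priors are assumptions for illustration, not values or formulas from the application.

```python
def classify(words, spam_prob, min_known=0.5, spam_cutoff=0.5):
    """words: tokens of one message.
    spam_prob: dict of word -> P(spam | word) learned in training (hypothetical)."""
    known = [w for w in words if w in spam_prob]
    # Surety check: refuse to classify when too few words are known from training.
    if not words or len(known) / len(words) < min_known:
        return "unknown"
    p_spam = p_ham = 1.0
    for w in known:
        p_spam *= spam_prob[w]
        p_ham *= 1.0 - spam_prob[w]
    total = p_spam + p_ham
    if total == 0.0:  # contradictory certainties; no usable score
        return "unknown"
    return "spam" if p_spam / total >= spam_cutoff else "legitimate"

spam_prob = {"free": 0.9, "winner": 0.95, "meeting": 0.1, "agenda": 0.05}
a = classify(["free", "winner", "now"], spam_prob)
b = classify(["meeting", "agenda"], spam_prob)
c = classify(["hola", "amigo", "free", "gratis"], spam_prob)
```

In the third call only one of four words is known, so the message is reported as unknown rather than scored on that single word.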

Type: Application

Filed: August 24, 2007

Publication date: February 26, 2009

Inventor: Shaun Cooley
Multiscale detection of local image structures

Patent number: 7474790

Abstract: A method and apparatus for the detection of local image structures represented as clusters in a joint-spatial range domain where the method comprises receiving an input image having one or more clusters in a joint-spatial range domain, and each of the one or more clusters having a corresponding mode. Receiving a set of analysis matrices and selecting through each one of the analysis matrices. Using the selected analysis matrix to partition the input image into the one or more clusters and their corresponding modes, and computing a mean, μ, and a local covariance matrix Σ for each of the corresponding modes of each of the one or more clusters. Selecting at least one of the one or more clusters, where each selected cluster has a stable mean and stable covariance matrix across the set of analysis matrices, whereby each of the selected clusters is indicative of a local image structure.

Type: Grant

Filed: September 29, 2004

Date of Patent: January 6, 2009

Assignee: Siemens Medical Solutions USA, Inc.

Inventors: Navneet Dalal, Dorin Comaniciu
System and Method of Uniformly Classifying Information Objects with Metadata Across Heterogeneous Data Stores

Publication number: 20080270462

Abstract: Described are a system and method for classifying information objects with metadata across heterogeneous data stores. A metadata model includes a plurality of interconnected nodes. At least one of the nodes corresponds to a metadata instance and at least one of the nodes corresponds to a metadata category. Information related to an information object maintained in a data store is acquired. A look up of the metadata model finds one or more metadata instances and metadata categories based on the acquired information related to the information object. One or more of the found metadata instances and metadata categories are associated with the information object maintained in the data store.

Type: Application

Filed: November 6, 2007

Publication date: October 30, 2008

Applicant: INTERSE A/S

Inventor: Dan Thomsen
Information delivery system, information delivery method, delivery device, node device, and the like

Publication number: 20080201371

Abstract: The present invention is to provide, for example, a node device that receives content catalog information having attribute information and delivered from a delivery device in an information delivery system, the nodes being mutually communicable through a network and divided into a plurality of groups, the node device including: a new content catalog receiving means for receiving new content catalog information, delivered from the delivery device and having attribute information; a new content catalog saving means; a condition information saving means for saving the grouping condition and presentation time information; a group judgment means for judging on the basis of the grouping condition; a presentation time judgment means; and a content catalog presentation setting means for presenting the new content catalog information after the presentation time arrives.

Type: Application

Filed: January 22, 2008

Publication date: August 21, 2008

Applicant: BROTHER KOGYO KABUSHIKI KAISHA

Inventor: Atsushi Murakami
Hardware and Software Identifier Categorization and Review

Publication number: 20080172403

Abstract: A method for updating a catalog of hardware device and software object identifiers by identifying unknown identifiers and categorizing each of the unknown identifiers. The method further provides the categorized identifiers to a community of users for review and receives comments from the community of users on the provided categorization. The method further determines if the categorized identifiers should be recategorized based upon the received comments. Another method performs a search for an entity associated with an unknown identifier, determines a likely entity associated with the unknown identifier, and verifies the correctness of such determined likely entity. Another method generates a catalog of computer system components, receives information regarding the identity of a computer system component from at least two different sources, and determines the identity of the computer system component based upon the reputation of the sources of the received information.

Type: Application

Filed: January 15, 2007

Publication date: July 17, 2008

Applicant: MICROSOFT CORPORATION

Inventors: Ram P. Papatla, John Leo Ellis, Mario Hewardt, David James Armour
SYSTEM AND METHOD FOR SEARCHING INFORMATION AND DISPLAYING SEARCH RESULTS

Publication number: 20080168054

Abstract: A method for searching information and displaying search results is disclosed. The method includes: receiving one or more keywords; obtaining search results according to the one or more keywords, the search results comprising one or more documents; confirming at least one cluster name according to the search results, and clustering each of the one or more documents in the search results into a corresponding cluster name; classifying each document in the search results into a corresponding field, and thus obtaining classified search results; generating a cluster diagram according to the at least one cluster name and the clustered documents, and generating a cluster-classification diagram according to the classified search results and the generated cluster diagram; and outputting the generated cluster-classification diagram. A related system is also disclosed.

Type: Application

Filed: August 21, 2007

Publication date: July 10, 2008

Applicant: HON HAI PRECISION INDUSTRY CO., LTD.

Inventors: Chung-I Lee, Chien-Fa Yeh, Yao-Huei Sie
SYSTEM AND METHOD FOR DOCUMENT SECTION SEGMENTATION

Publication number: 20080059498

Abstract: A system and method for facilitating the processing and the use of documents by providing a system for categorizing document section headings under a set of canonical section headings. In the method for categorizing section headings, there may be a process of training a database and matching methods to categorize different but equivalent document section headings under canonical headings and categories. Once trained, the system may match and categorize the document sections with little to no supervision of the categorization for large sets of documents.

Type: Application

Filed: September 7, 2007

Publication date: March 6, 2008

Applicant: Nuance Communications, Inc.

Inventors: Alwin CARUS, Melissa MACPHERSON, Stefaan Heyvaert, Cornelia Parkes
Systems and methods for employing an orthogonal corpus for document indexing

Publication number: 20080010311

Abstract: Methods and systems for processing a body of reference material to generate a directory for accessing information from a database.

Type: Application

Filed: February 16, 2007

Publication date: January 10, 2008

Inventors: Henry Kon, George Burch
