Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)
-
Publication number: 20120124051Abstract: An ontological information retrieval system is provided. According to an embodiment, the subject ontological information retrieval system can be utilized for computer-aided clinical Traditional Chinese Medicine (TCM) practice. In one implementation, a graphical user interface (GUI) is provided, enabling a user to input a query with symptoms determined from a patient, and the system's parser can find instances of the symptoms in a document object model (DOM) tree of the TCM ontological information. Diagnosis based upon the symptoms can be communicated to the user through the GUI. A relevance index (RI) and/or a frequency index (F1) can be further provided for evaluating a diagnosis by comparing the symptoms determined from a patient with the expected symptoms of the diagnosed illness and returning a value based on the number of matched symptoms, or a weighted index of matched symptoms.Type: ApplicationFiled: July 29, 2010Publication date: May 17, 2012Inventors: Wilfred Wan Kei Lin, Allan Kang Ying Wong, Jackei Ho Kei Wong, Jewels Chun Wing Kong
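The relevance index described in this abstract can be illustrated with a minimal sketch. The function name `relevance_index`, the normalization by the number of expected symptoms, and the default weight of 1.0 are assumptions made for illustration; the abstract only states that the value is based on the number of matched symptoms or a weighted index of them.

```python
def relevance_index(patient_symptoms, expected_symptoms, weights=None):
    """Score how well a patient's symptoms match a diagnosis.

    Hypothetical sketch: the score is the share of the illness's expected
    symptoms that the patient shows, optionally weighted per symptom.
    """
    expected = set(expected_symptoms)
    if not expected:
        return 0.0
    matched = expected & set(patient_symptoms)
    if weights is None:
        # Unweighted variant: plain count of matched symptoms, normalized.
        return len(matched) / len(expected)
    # Weighted variant: symptoms without an explicit weight count as 1.0.
    total = sum(weights.get(s, 1.0) for s in expected)
    if total == 0:
        return 0.0
    return sum(weights.get(s, 1.0) for s in matched) / total
```

For example, a patient showing two of four expected symptoms scores 0.5 in the unweighted variant.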
-
Patent number: 8180778 Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating action trails from web history are described. In one aspect, a method includes receiving a web content access history of a user, the content access history including one or more user actions, each user action being associated with a content item upon which the user action is performed; and identifying one or more action trails from the content access history, each action trail including a sequence of user actions performed on content items relating to a topic. Identifying a particular action trail includes clustering the user actions into a series of segments using temporal criteria; calculating semantic similarities between the content items; and adding a segment of the series of segments to the action trail when the semantic similarities between the segment and another segment satisfy a similarity threshold. Type: Grant Filed: February 5, 2010 Date of Patent: May 15, 2012 Assignee: Google Inc. Inventors: Elin Pedersen, Karl A. Gyllstrom, Shengyin Gu, Peter Jin Hong
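The two stages named in this abstract, temporal segmentation followed by similarity-based trail building, can be sketched roughly as follows. Everything concrete here is an assumption for illustration: actions are modeled as `(timestamp, set_of_terms)` pairs, the temporal criterion is a maximum gap between consecutive actions, and Jaccard overlap of term sets stands in for whatever semantic similarity measure the patent actually uses.

```python
def segment_by_time(actions, max_gap):
    """Split (timestamp, terms) actions into segments at gaps longer than max_gap."""
    segments = []
    for ts, terms in sorted(actions, key=lambda a: a[0]):
        if segments and ts - segments[-1][-1][0] <= max_gap:
            segments[-1].append((ts, terms))
        else:
            segments.append([(ts, terms)])
    return segments


def jaccard(a, b):
    """Set overlap, used here as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_terms(segment):
    """All terms seen in a segment."""
    return set().union(*(terms for _, terms in segment))


def build_trail(segments, seed_index, threshold):
    """Indexes of segments similar enough to the seed segment to join its trail."""
    seed = segment_terms(segments[seed_index])
    return [
        i for i, seg in enumerate(segments)
        if i == seed_index or jaccard(seed, segment_terms(seg)) >= threshold
    ]
```

With a gap of 300 seconds, two browsing sessions about the same trip that are hours apart end up in different segments but in the same trail, while an unrelated session in between is left out.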
-
Patent number: 8180777Abstract: The present invention relates in general to methods and systems for comparing and maximizing the optimal selection of a first set of one or more data objects to a set of second data objects. In one embodiment, the first set of data objects represent one or more tasks to be fulfilled by a set of capabilities represented by the second data objects. In one embodiment, methods and systems are provided that apply topic modeling and similarity metrics to determine the optimal selection. In one embodiment, methods and systems are provided to determine the appropriateness of a set of second data objects to satisfy the requirements of a first data object given interaction attributes. Embodiments may be used to compare mission requirements with potential team members to determine the appropriateness of team members and teams for a given mission based on interaction attributes of the team members and teams.Type: GrantFiled: October 24, 2010Date of Patent: May 15, 2012Assignee: Aptima, Inc.Inventors: Andrew Duchon, Kari Kelton, Pacey Foster, Kara Orvis, Robert McCormack
-
Publication number: 20120109964Abstract: A method of classifying a set of semantic concepts on a second multimedia collection based upon adapting a set of semantic concept classifiers and updating concept affinity relations that were developed to classify the set of semantic concepts for a first multimedia collection. The method comprises providing the second multimedia collection from a different domain and a processor automatically classifying the semantic concepts from the second multimedia collection by adapting the semantic concept classifiers and updating the concept affinity relations to the second multimedia collection based upon the local smoothness over the concept affinity relations and the local smoothness over data affinity relations.Type: ApplicationFiled: October 27, 2010Publication date: May 3, 2012Inventors: Wei Jiang, Alexander C. Loui
-
Patent number: 8171025 Abstract: A density-based data clustering method, comprising a parameter-setting step for setting a scanning radius and a minimum threshold value, a dividing step for dividing a space of a plurality of data points according to the scanning radius, a data-retrieving step for retrieving one data point out of the plurality of data points as a core data point, a searching step for calculating a distance between the core data point and each of the query points, and a grouping determination step for determining whether a number of the neighboring points is smaller than the minimum threshold value. Type: Grant Filed: January 6, 2010 Date of Patent: May 1, 2012 Assignee: National Pingtung University of Science & Technology Inventors: Cheng-Fa Tsai, Chien-Tsung Wu
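The dividing, searching, and grouping-determination steps can be sketched as a grid-accelerated neighbor count. This is an illustrative reading of the abstract, not the patented procedure: it assumes 2-D points, square grid cells whose side equals the scanning radius, and treats the points in the surrounding 3x3 block of cells as the query points. All function names are hypothetical.

```python
from collections import defaultdict
from math import dist, floor


def build_grid(points, radius):
    """Dividing step: bucket 2-D points into square cells of side `radius`."""
    grid = defaultdict(list)
    for p in points:
        grid[(floor(p[0] / radius), floor(p[1] / radius))].append(p)
    return grid


def neighbors(point, grid, radius):
    """Searching step: measure distances only to points in adjacent cells."""
    cx, cy = floor(point[0] / radius), floor(point[1] / radius)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for q in grid.get((cx + dx, cy + dy), []):
                if q != point and dist(point, q) <= radius:
                    found.append(q)
    return found


def is_core(point, grid, radius, min_pts):
    """Grouping determination: enough neighbors within the scanning radius?"""
    return len(neighbors(point, grid, radius)) >= min_pts
```

Because a cell's side equals the radius, any point within the radius of the core point must lie in one of the nine surrounding cells, so the search never has to scan the whole data set.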
-
Patent number: 8166033 Abstract: A system and method for matching and assembling records is provided. One embodiment of the invention assembles records by applying a method for grouping records based on matching fields, assembling a new record as a composite of the matched records, and then repeating the grouping, matching, and assembling steps in a cascade where the matching, grouping, and assembling steps are modified as a function of the cascade step and the assembled records created in earlier steps. Type: Grant Filed: February 27, 2003 Date of Patent: April 24, 2012 Assignee: Parity Computing, Inc. Inventors: Zunaid H. Kazi, Christopher D. Rosin, Ramamohan Paturi, Holden P. Robbins, Mark W. S. Land
-
Patent number: 8156123 Abstract: Methods and apparatuses for processing metadata are described herein. In one embodiment, when a file (e.g., a text, audio, and/or image file) having metadata is received, the metadata and optionally at least a portion of the content of the file are extracted from the file to generate a first set of metadata. An analysis is performed on the extracted metadata and the content to generate a second set of metadata, which may include metadata in addition to the first set of metadata. The second set of metadata may be stored in a database suitable to be searched to identify or locate the file. Other methods and apparatuses are also described. Type: Grant Filed: April 22, 2005 Date of Patent: April 10, 2012 Assignee: Apple Inc. Inventors: Guy L. Tribble, Yan Arrouye, Dominic Giampaolo
-
Publication number: 20120072423 Abstract: Particular portions of program execution data are specified and organized in semantic groups. A grouping expression written in a transformation syntax language specifies a pattern and a replacement, for grouping performance data samples. An exception to the pattern can also be specified. In response to the grouping expression, a cost accounting shows groups and their costs. The grouping expression may operate on names and/or name-associated characteristics such as private/public status, author, directory, and the like. Samples may represent nodes in a directed acyclic graph memorializing call stacks or memory allocation. Grouping expressions are used to group nodes and consolidate costs by various procedures when making modified sample stacks: clustering-by-name, entry-group-clustering, folding-by-name, and folding-by-cost. An entry group clustering shows at least one entry point name while avoiding unwanted detail. Type: Application Filed: September 20, 2010 Publication date: March 22, 2012 Applicant: MICROSOFT CORPORATION Inventors: Vance Morrison, Joshua Ryan Williams
-
Patent number: 8140515Abstract: Users of electronic documents are classified for profiling and targeting of additional relevant content. Behavioral data is gathered from user registration information and user activity, and user documents and actions are categorized. Registration information is combined with collaborative and editorial data to provide user profile information. Author-generated document classification information is analyzed and assigned a first taxonomic noun to characterize the document. User-generated tags characterizing a portion of the document are assigned a second taxonomic noun. Search terms that resulted in the user accessing the document are identified and assigned a third taxonomic noun. Attributes related to how the document was accessed are evaluated and assigned a fourth taxonomic noun. The document is processed using pattern rules to extract a fifth taxonomic noun.Type: GrantFiled: October 28, 2009Date of Patent: March 20, 2012Assignee: CBS Interactive Inc.Inventors: Tushar Pradhan, Thomas Osborne, John Potter
-
Patent number: 8131725 Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system. Type: Grant Filed: September 20, 2010 Date of Patent: March 6, 2012 Assignee: CommVault Systems, Inc. Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
-
Patent number: 8131684Abstract: In one embodiment, input is received from a user defining a classification and an analytic for the classification. Multiple classifications and analytics may be defined by a user. A definition of relevance parameters is determined that characterize the classification and a set of analytics measures associated with the analytic. The definition may be for the classification. Unstructured data and structured data are analyzed based on the definition of the relevance parameters to determine relevant data in the unstructured data and the structured data. The relevant data being data that is determined to be relevant to the classification defined by the user. An index of the terms from the relevant data is determined. The index is useable by an analytics tool to provide results for queries of the unstructured data and structured data. The query may be used within the classification such that targeted results are provided using the index and the relevant data to the classification.Type: GrantFiled: March 21, 2011Date of Patent: March 6, 2012Assignee: Aumni Data Inc.Inventors: Joan Wrabetz, Aloke Guha
-
Publication number: 20120054185Abstract: The different illustrative embodiments provide a method, a computer program product, and an apparatus for managing information. A request to store text in a table in a database is received. A determination is made as to whether a first collection of textual information having a first concept that is related to a second concept for the text is present in the database responsive to receiving the request containing the text. The text is associated with the first collection of textual information in the database responsive to a determination that the first collection of textual information in the database having the first concept that is related to the second concept for the text is present in the database. A second collection for the data with a third concept that is related to the second concept for the text within the degree of relatedness is created.Type: ApplicationFiled: August 31, 2010Publication date: March 1, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Sandra K. Johnson, Grant D. Miller, Robert F. Pryor
-
Publication number: 20120041953 Abstract: A latent topic labels text mining system and method to mine and analyze the content of textual data. Embodiments of the system and method are particularly well suited for use on microblog data to help people identify posts they want to read and to find people that they want to follow. Embodiments of the system and method use a modified Labeled LDA technique (called an L+LDA technique) that analyzes content using a combination of labeled and latent topics. The resultant data is assigned one of four labels to generate a lower-dimensional representation of the data than the individual words in a microblog post. This learned topic representation is used to characterize, summarize, filter, find, suggest, and compare the content of microblog posts. Embodiments of the system and method also include visualization techniques such as a tag cloud visualization that is used to visualize microblogging data. Type: Application Filed: August 16, 2010 Publication date: February 16, 2012 Applicant: Microsoft Corporation Inventors: Susan Theresa Dumais, Daniel Ramage, Daniel John Liebling, Steven Mark Drucker
-
Patent number: 8117205Abstract: A method and system for enhancing the quality of a bookmark or a set of bookmarks that have been organized by topic and contain information related to that topic. The method and system analyzes documents accessible by the bookmark or set of bookmarks and performs a search using key terms from that analysis in a vector called a latent similarity metric. The terms that result from this search are preferably ranked in a hierarchy or the like and utilized in a subsequent search to locate and rank additional related documents.Type: GrantFiled: July 8, 2008Date of Patent: February 14, 2012Assignee: International Business Machines CorporationInventor: Michael D. Rychener
-
Patent number: 8112436 Abstract: In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches. Type: Grant Filed: September 21, 2009 Date of Patent: February 7, 2012 Assignee: Yahoo! Inc. Inventors: Yumao Lu, Lei Duan, Fan Li, Benoit Dumoulin, Xing Wei
-
Patent number: 8108398Abstract: A system that facilitates data presentation and management includes at least one database to store a corpus of data relating to one or more topics. The system further includes a summarizer component to automatically determine a subset of the data over the corpus of data relating to at least one of the topic(s), wherein the subset forms a summary of at least one topic.Type: GrantFiled: June 29, 2007Date of Patent: January 31, 2012Assignee: Microsoft CorporationInventors: Shai Guday, Bret P. O'Rourke, John Mark Miller, James Morris Alkove, Andrew David Wilson
-
Patent number: 8108376 Abstract: A document set, and history documents including documents, etc., browsed by a user are input. The document set and the history documents are each analyzed to obtain characteristic vectors. A plurality of topic clusters and a plurality of sub-topic clusters are obtained by clustering the document set. A transition structure showing transitions of topics among the sub-topic clusters is generated, and a characteristic attribute is extracted from each topic cluster and each sub-topic cluster. A cluster-of-interest is extracted in comparison among characteristic vectors of the history documents and a characteristic vector of each document included in the document set, a sub-topic cluster having transition relations with the cluster-of-interest is obtained on the basis of a transition structure owned by the cluster-of-interest, and a document included in the sub-topic cluster is extracted as a recommended document to be presented together with the characteristic attribute. Type: Grant Filed: March 20, 2009 Date of Patent: January 31, 2012 Assignee: Kabushiki Kaisha Toshiba Inventors: Masayuki Okamoto, Masaaki Kikuchi
-
Publication number: 20120023103Abstract: In one embodiment, a method of generating annotation tags (28) for a digital image (22) includes maintaining a library (16) of human-meaningful words or phrases organized as category entries (72) according to a number of defined image description categories (70), and receiving context metadata (20) associated with the capture of a given digital image (22). The method further includes selecting particular category entries (72-1, 72-2) as vocabulary metadata (24) for the digital image (22) by mapping the context metadata (20) into the library (16), and generating annotation tags (28) for the digital image (22) by logically combining the vocabulary metadata (24) according to a defined set of deductive logic rules (30) that are predicated on the defined image description categories (70). In another embodiment, a processing apparatus (12), such as a digital processor (18, 26) and supporting memory (14), etc., is configured to carry out the above method, or to carry out variations of the above method.Type: ApplicationFiled: January 21, 2009Publication date: January 26, 2012Applicant: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)Inventors: Joakim Soderberg, Jonas Bjork, Andreas Fasbender
-
Publication number: 20120011124Abstract: According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.Type: ApplicationFiled: July 7, 2010Publication date: January 12, 2012Applicant: APPLE INC.Inventor: Jerome R. Bellegarda
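The cluster-selection step in this abstract (treat every document vector as a candidate centroid, collect the vectors within the hypersphere, keep the most populated group) can be sketched as below. The sketch assumes the LSM space has already been built, so `vectors` is simply a list of coordinate tuples, and it uses Euclidean distance as the closeness measure; both are assumptions, as is the function name.

```python
from math import dist


def largest_hypersphere_group(vectors, diameter):
    """Return (centroid_index, member_indexes) for the most populated group.

    Each vector is tried as a centroid; its group is every vector lying
    within `diameter` of it (the centroid itself included).
    """
    best_centroid, best_group = None, []
    for i, centroid in enumerate(vectors):
        group = [j for j, v in enumerate(vectors) if dist(centroid, v) <= diameter]
        if len(group) > len(best_group):
            best_centroid, best_group = i, group
    return best_centroid, best_group
```

The loop compares every pair of vectors, so this form is quadratic in the number of documents; it is meant only to make the selection rule concrete.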
-
Publication number: 20120004893Abstract: The present invention relates to a method for the automatic identification of at least one informative data filter from a data set that can be used to identify at least one relevant data subset against a target feature for subsequent hypothesis generation, model building and model testing. The present invention describes methods, and an initial implementation, for efficiently linking relevant data both within and across multiple domains and identifying informative statistical relationships across this data that can be integrated into agent-based models. The relationships, encoded by the agents, can then drive emergent behavior across the global system that is described in the integrated data environment.Type: ApplicationFiled: September 10, 2009Publication date: January 5, 2012Applicant: QUANTUM LEAP RESEARCH, INC.Inventors: Akhileswar Ganesh VAIDYANATHAN, Stephen D. PRIOR, Jijun Wang, Bin Yu
-
Patent number: 8090743Abstract: Provided are a document management system and method. The document management system including a database storing documents and a document classification unit for automatically classifying the documents stored in the database, wherein the document classification unit comprises a feature extraction module extracting features based on a keyword included in the documents and vectorizing the extracted features, a similarity judgment module judging similarity among the documents using vectors formed by the feature extraction module, and a classification system module classifying the documents stored in the database according to a preset classification system, the document classification unit performing document classification according to the classification system with respect to documents provided to the database.Type: GrantFiled: January 10, 2007Date of Patent: January 3, 2012Assignee: LG Electronics Inc.Inventors: Wan Kyu Cha, Jeong Joong Kim, Han Joon Ahn
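The feature extraction and similarity judgment modules can be illustrated with keyword-count vectors and cosine similarity. The abstract does not name the vectorization or the similarity measure, so both choices here, along with the function names and the whitespace tokenization, are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def keyword_vector(text, keywords):
    """Vectorize a document as the count of each keyword in `keywords`."""
    counts = Counter(text.lower().split())
    return [counts[k] for k in keywords]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

A classification module could then assign a new document to the category whose representative vector gives the highest cosine score.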
-
Publication number: 20110320454Abstract: A system and method for constructing a hierarchical multi-faceted classification structure includes organizing a plurality of visual categories into a multi-relational reference ontology that accounts for a plurality of different types of relationships. Media artifacts are categorized into the plurality of visual categories. The categories of artifacts are refined based on faceted ontology relationships or constraints from the multi-relational reference ontology. The multi-relational reference ontology and the one or more media artifacts with relationships are stored as the hierarchical multi-faceted classification structure in computer readable memory storage.Type: ApplicationFiled: June 29, 2010Publication date: December 29, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: MATTHEW HILL, JOHN R. KENDER, APOSTOL NATSEV, QUOC-BAO NGUYEN, JOHN R. SMITH, JELENA TESIC, LEXING XIE, RONG YAN
-
Patent number: 8086609Abstract: In a method and apparatus for analyzing nodes of a Deterministic Finite Automata (DFA), an accessibility ranking, based on a DFA graph geometrical configuration, may be determined in order to determine cacheable portions of the DFA graph in order to reduce the number of external memory accesses. A walker process may be configured to walk the graph in a graph cache as well as main memory. The graph may be generated in a manner allowing each arc to include information if the node it is pointing to is stored in the graph cache or in main memory. The walker may use this information to determine whether or not to access the next arc in the graph cache or in main memory.Type: GrantFiled: November 1, 2007Date of Patent: December 27, 2011Assignee: Cavium, Inc.Inventors: Rajan Goyal, Muhammad Raghib Hussain, Trent Parker
-
Patent number: 8086504Abstract: Tag suggestions enable a hosting entity such as a website to determine one or more tags to suggest to a user for association with a particular item within an electronic catalog. After this determination, the hosting entity may suggest the determined tags to the user. To determine these tags, the hosting entity may employ techniques to determine items related to the particular item. The hosting entity then suggests some or all of the tags associated with the related items. Additionally or alternatively, the hosting entity may determine certain metadata associated with the particular item. The entity then may suggest this metadata, or some related phrase or tag, to the user for association with the particular item. However the tag suggestions are determined, the hosting entity may rank the tag suggestions to determine which tags to present to the user or to determine an order in which to present the tags.Type: GrantFiled: September 6, 2007Date of Patent: December 27, 2011Assignee: Amazon Technologies, Inc.Inventors: Russell A. Dicker, Waqas Ahmed, Aaron D. Wilson, Scott Allen Mongrain, Florin V. Manolache, Valentin Radu Munteanu, Val Dan Dar Ion I. Rosca, Corneliu Gabriel Alexandru Rudeanu
-
Publication number: 20110314022Abstract: In a KStore having a plurality of K nodes with count fields a method for updating count fields, receiving a particle to provide a received particle, updating selected node counts of the plurality of nodes counts in response to the received particle to provide first updated K node count fields, and saving selected K node count fields for later updating to provide second updated count fields are recited. The K nodes include elemental root nodes and the second updated K node count fields include elemental root nodes of the plurality of elemental root nodes. The second updated K node count fields include only elemental root nodes of the plurality of elemental root nodes. The first updated K node count fields include no elemental root nodes. The second updated K node count fields include K nodes pointed to by the Result pointers of the first updated K node count fields.Type: ApplicationFiled: June 8, 2006Publication date: December 22, 2011Applicant: Unisys CorporationInventors: Jane Campbell Mazzagatti, Steven L. Rajcan, Robert R. Buckwalter
-
Patent number: 8082248Abstract: A classification method and system for documents containing text sentences and images having meta-data. The classification method and system categorizes document sentences into subjective and non-subjective sentences and categorizes document images into descriptive and non-descriptive. The categorization is further used to calculate subjectivity and descriptive-images classification of a document. This classification system can be used by a web search engine to filter, sort or tag a set of document references based on user selection.Type: GrantFiled: May 29, 2008Date of Patent: December 20, 2011Inventor: Rania Abouyounes
-
Publication number: 20110302168Abstract: In a method for representing a text document with a graphical model, a document including a plurality of ordered words is received and a graph data structure for the document is created. The graph data structure includes a plurality of nodes and edges, with each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other. The graph data structure is stored in an information repository.Type: ApplicationFiled: June 8, 2010Publication date: December 8, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Charu Aggarwal
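The graph described in this abstract can be sketched with a set of nodes and a counter of edges. The sketch assumes the document is already tokenized into an ordered word list, treats edges as undirected, and skips pairs of identical words; the function name and these choices are illustrative assumptions.

```python
from collections import Counter


def build_distance_graph(words, max_distance):
    """Build (nodes, edges) for an ordered word list.

    nodes: the distinct words.
    edges: Counter mapping an alphabetically sorted word pair to the number
           of times the two words occur within `max_distance` positions.
    """
    nodes = set(words)
    edges = Counter()
    for i, word in enumerate(words):
        # Look ahead only, so each co-occurrence is counted once.
        for other in words[i + 1 : i + 1 + max_distance]:
            if other != word:
                edges[tuple(sorted((word, other)))] += 1
    return nodes, edges
```

For the word list `the cat sat on the mat` with a distance of 2, the pair `("sat", "the")` gets a count of 2 because "sat" falls within two positions of both occurrences of "the".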
-
Publication number: 20110302152Abstract: Techniques that may be used for detecting a primary content (e.g., a web page) that the user is viewing and presenting one or more pieces of supplemental content (e.g., social media data) together with the primary content. The supplemental content presented to the user together with the primary content may be content that is matched to the primary content and therefore detected to be relevant to the user. Detection of primary content and matching to supplemental content may be carried out based on a comparison of entities related to the primary and supplemental content. In some embodiments, an analysis of the primary content for entities may include ordering entities according to significance in the primary content and selecting top entities for comparison. Also, in some embodiments, multiple pieces of supplemental content may be displayed to a user categorized based on entity.Type: ApplicationFiled: June 7, 2010Publication date: December 8, 2011Applicant: Microsoft CorporationInventors: danah boyd, Gilad Lotan, Paul Oka, Emre Mehmet Kiciman, Chun-Kai Wang
-
Patent number: 8073851Abstract: To provide a content searching device which can efficiently present to the user a topical related keyword.Type: GrantFiled: March 2, 2009Date of Patent: December 6, 2011Assignee: Panasonic CorporationInventors: Kazutoyo Takata, Takashi Tsuzuki, Satoshi Matsuura
-
Publication number: 20110295857 Abstract: A system and method for aligning multilingual content and indexing multilingual documents, a computer readable data storage medium having stored thereon computer code means for indexing multilingual documents, and a system for presenting multilingual content. The method for aligning multilingual content and indexing multilingual documents comprises the steps of generating multiple bilingual terminology databases, wherein each bilingual terminology database associates respective terms in a pivot language with one or more terms in another language; and combining the multiple bilingual terminology databases to form a multilingual terminology database, wherein the multilingual terminology database associates terms in different languages via the pivot language terms. Type: Application Filed: June 20, 2008 Publication date: December 1, 2011 Inventors: Ai Ti Aw, Min Zhang, Lian Hau Lee, Thuy Vu, Fon Lin Lai
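Combining bilingual terminology databases through a pivot language can be sketched with nested dictionaries. The data layout (language code mapped to a dictionary from pivot term to a list of translations) and both function names are assumptions for illustration; the abstract does not describe a storage format.

```python
def merge_via_pivot(bilingual_dbs):
    """Combine {lang: {pivot_term: [terms]}} into {pivot_term: {lang: [terms]}}."""
    multilingual = {}
    for lang, db in bilingual_dbs.items():
        for pivot_term, translations in db.items():
            multilingual.setdefault(pivot_term, {})[lang] = list(translations)
    return multilingual


def translate(multilingual, term, source_lang, target_lang):
    """Find target-language terms that share a pivot term with `term`."""
    results = []
    for entry in multilingual.values():
        if term in entry.get(source_lang, []):
            results.extend(entry.get(target_lang, []))
    return results
```

With English as the pivot, a French term and a German term become associated as soon as both appear under the same English entry, even though no French-German database was ever built.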
-
Patent number: 8065293 Abstract: An indexing system uses a graph-like data structure that clusters feature indexes together. The minimum atomic value in the data structure is represented as a leaf node which is either a single feature index or a sequence of two or more feature indexes when a minimum sequence length is imposed. Root nodes are formed as clustered collections of leaf nodes and/or other root nodes. Context nodes are formed from root nodes that are associated with content that is being indexed. Links between a root node and other nodes each include a sequence order value that is used to maintain the sequencing order for feature indexes relative to the root node. The collection of nodes forms a graph-like data structure, where each context node is indexed according to the sequenced pattern of feature indexes. Clusters can be split, merged, and promoted to increase the efficiency in searching the data structure. Type: Grant Filed: October 24, 2007 Date of Patent: November 22, 2011 Assignee: Microsoft Corporation Inventors: Kunal Mukerjee, R. Donald Thompson, III, Jeffrey Cole, Brendan Meeder
-
Patent number: 8065304Abstract: In one illustrative embodiment, a computer implemented method using asymmetric memory management is provided. The computer implemented method receives a request, containing a search key, to access an array of records in the asymmetric memory, wherein the array has a sorted prefix portion and an unsorted append portion, the append portion alternatively comprising a linked-list, and responsive to a determination that the request is an insert request, inserts the record in the request in arrival order in the unsorted append portion to form a newly inserted record. Responsive to a determination that the newly inserted record completes the group of records, stores an index, in sorted order, for the group of records.Type: GrantFiled: June 11, 2008Date of Patent: November 22, 2011Assignee: International Business Machines CorporationInventor: Kenneth Andrew Ross
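One possible reading of the sorted-prefix and unsorted-append layout is sketched below. It is a loose illustration rather than the patented method: the class name, the fixed `group_size`, the in-memory lists standing in for asymmetric memory, and the search routine are all assumptions. Inserts only append in arrival order; when an insert completes a group, a small sorted index of that group's keys is stored so later searches can binary-search it.

```python
from bisect import bisect_left


class PrefixAppendArray:
    """Records are (key, value) tuples; keys must be mutually comparable."""

    def __init__(self, prefix_records, group_size):
        self.prefix = sorted(prefix_records, key=lambda r: r[0])  # sorted prefix
        self.prefix_keys = [k for k, _ in self.prefix]
        self.tail = []            # unsorted append portion, arrival order
        self.group_size = group_size
        self.group_indexes = []   # one sorted (key, position) list per full group

    def insert(self, key, value):
        self.tail.append((key, value))
        if len(self.tail) % self.group_size == 0:
            # This insert completed a group: store its index in sorted order.
            start = len(self.tail) - self.group_size
            self.group_indexes.append(
                sorted((k, start + i) for i, (k, _) in enumerate(self.tail[start:]))
            )

    def search(self, key):
        hits = []
        i = bisect_left(self.prefix_keys, key)
        while i < len(self.prefix) and self.prefix_keys[i] == key:
            hits.append(self.prefix[i][1])
            i += 1
        for index in self.group_indexes:
            j = bisect_left(index, (key, -1))
            while j < len(index) and index[j][0] == key:
                hits.append(self.tail[index[j][1]][1])
                j += 1
        # Records in the still-incomplete last group need a linear scan.
        indexed = len(self.group_indexes) * self.group_size
        hits.extend(v for k, v in self.tail[indexed:] if k == key)
        return hits
```

The design intent suggested by the abstract is that writes stay cheap (records are never moved once appended), while the per-group indexes keep reads from degrading to a full scan of the append portion.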
-
Patent number: 8060513Abstract: A system and method for generating a frame of reference for a plurality of information, the plurality of information containing text data and obtained by a user through interaction with one or more information sources. The method and system include receiving selected information for analysis, the information including a plurality of text data and identifying a plurality of logical units of the text data. Also included are identifying a plurality of individual textual portions in each of the logical units and calculating the number of logical units associated with each of the individual textual portions of the plurality of textual portions for use in identifying a plurality of patterns including a respective pattern for each of the individual textual portions.Type: GrantFiled: July 1, 2008Date of Patent: November 15, 2011Assignee: Dossierview Inc.Inventors: Stephen Basco, Nick Foisy, Bruce Scanlan, Harsch Khandelwal
-
Patent number: 8055656Abstract: Embodiments of the invention provide techniques for searching for virtual objects of an immersive virtual environment based on user interactions within the virtual environment. Generally, embodiments provide an attribute index storing data describing attributes of virtual objects, and an interaction index storing data describing user interactions with virtual objects. Search queries may be evaluated using both the attribute index and interactions index. Thus, virtual objects may be searched in terms of object attributes as well as user interactions with the virtual objects.Type: GrantFiled: October 10, 2007Date of Patent: November 8, 2011Assignee: International Business Machines CorporationInventors: Ryan Kirk Cradick, Zachary Adam Garbow, Ryan Robert Pendergast
-
Publication number: 20110270808Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.Type: ApplicationFiled: April 30, 2010Publication date: November 3, 2011Applicant: International Business Machines CorporationInventors: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah
-
Publication number: 20110264649Abstract: Methods, systems, and apparatus, including medium-encoded computer program products, for providing an adaptive knowledge platform. In one or more aspects, a system can include a knowledge management component to acquire, classify and disseminate information of a dataset; a human-computer interaction component to visualize multiple perspectives of the dataset and to model user interactions with the multiple perspectives; and an adaptivity component to modify one or more of the multiple perspectives of the dataset based on a user-interaction model.Type: ApplicationFiled: April 28, 2009Publication date: October 27, 2011Inventors: Ruey-Lung Hsiao, Eugene B. Shirley, Jr.
-
Patent number: 8046364Abstract: A method and system for analyzing a patent disclosure is disclosed. The method and system comprise a disclosure analysis and a separate claims analysis, such that each analysis may be performed independently. Missing and incorrect reference labels are identified within the disclosure. Antecedent basis and specification support are checked for the claim elements. Terms within the specification that do not have a reference number, but may require one, are identified, provided that they fit the profile of one of a set of particular lexical patterns.Type: GrantFiled: December 2, 2007Date of Patent: October 25, 2011Assignee: Veripat, LLCInventor: Michael Robert Kahn
-
Patent number: 8046363Abstract: Provided are a system and method of clustering documents. The system includes a document DB, a document feature writing unit, a document retrieving unit, a clustering unit, and a cluster DB. The document DB stores documents. The document feature writing unit extracts attribute information of documents stored in the document database, and writes indexes with respect to the respective documents on the basis of the attribute information. The document retrieving unit retrieves documents including a query input by a user, using the indexes. The clustering unit includes a representative vector calculator calculating feature vectors and a representative vector of the retrieved documents, and a similarity calculator calculating similarities between the documents using the feature vectors and the representative vector. The cluster database stores documents clustered by the clustering unit.Type: GrantFiled: January 10, 2007Date of Patent: October 25, 2011Assignee: LG Electronics Inc.Inventors: Wan Kyu Cha, Jeong Joong Kim, Han Joon Ahn
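As a rough illustration of the representative-vector and similarity calculators named in this abstract, the sketch below uses term counts as feature vectors, their mean as the representative vector, and cosine similarity as the measure. All three choices, and the function names, are assumptions; the abstract does not say which features or similarity function the patent uses.

```python
import math
from collections import Counter


def feature_vector(text):
    # Assumed feature: raw term counts over whitespace-separated tokens.
    return Counter(text.lower().split())


def representative_vector(vectors):
    # Assumed representative vector: the mean of the feature vectors.
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


retrieved = ["apple banana", "apple cherry"]
vectors = [feature_vector(doc) for doc in retrieved]
centroid = representative_vector(vectors)
scores = [cosine(vec, centroid) for vec in vectors]
```

Documents whose score against the representative vector exceeds some threshold could then be placed in the same cluster; the threshold itself would be a tuning choice outside what the abstract specifies.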
-
Publication number: 20110258193Abstract: One embodiment of the present invention provides a system for estimating a similarity level between semantic entities. During operation, the system selects two or more semantic entities associated with a number of documents. The system subsequently parses the documents into sub-parts, and calculates the similarity level between the semantic entities based on occurrences of the semantic entities within the sub-parts of the documents.Type: ApplicationFiled: April 15, 2010Publication date: October 20, 2011Applicant: PALO ALTO RESEARCH CENTER INCORPORATEDInventors: Oliver Brdiczka, Petro Hizalev
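One way to picture the computation in this abstract is shown below. It treats blank-line-separated paragraphs as the sub-parts and scores two entities by the Jaccard overlap of the sub-parts that mention them. Both the paragraph split and the Jaccard measure are assumptions chosen for the sketch; the abstract only says the similarity is based on occurrences within sub-parts.

```python
def subparts(document):
    # Assumed sub-part: a paragraph delimited by a blank line.
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def entity_similarity(entity_a, entity_b, documents):
    # Collect the (document, sub-part) positions where each entity occurs.
    with_a, with_b = set(), set()
    for d, document in enumerate(documents):
        for s, part in enumerate(subparts(document)):
            text = part.lower()
            if entity_a.lower() in text:
                with_a.add((d, s))
            if entity_b.lower() in text:
                with_b.add((d, s))
    # Jaccard overlap: shared sub-parts divided by all sub-parts mentioning either.
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0


docs = [
    "Alice met Bob.\n\nAlice left.",
    "Bob stayed.\n\nAlice and Bob spoke.",
]
similarity = entity_similarity("Alice", "Bob", docs)
```

Here the two entities share two of the four sub-parts in which either appears. Substring matching is used only to keep the sketch short; a real system would need proper entity recognition.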
-
Publication number: 20110252036Abstract: A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.Type: ApplicationFiled: June 17, 2011Publication date: October 13, 2011Inventors: Tyler J. Neylon, Kerry L. Hannan, Ryan T. McDonald, Michael Wells, Jeffrey C. Reynar
-
Patent number: 8037069Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.Type: GrantFiled: June 3, 2008Date of Patent: October 11, 2011Assignee: Microsoft CorporationInventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
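The two-phase pattern in this abstract (cheap signature lookup to find candidates, followed by verification) can be sketched as below. The signature used here, the first letter of each token, is a deliberately crude stand-in picked for the example; the abstract does not describe the patent's actual signature scheme, and the function names are hypothetical.

```python
from collections import defaultdict


def signature(tokens):
    # Assumed signature: the first letter of each token. It is cheap to compute
    # and may collide, which is why a verification step follows.
    return tuple(token[0] for token in tokens)


def build_index(members):
    # Map each signature to the database members that produce it.
    index = defaultdict(list)
    for member in members:
        index[signature(member.lower().split())].append(member)
    return index


def find_members(document, members):
    index = build_index(members)
    tokens = document.lower().split()
    max_len = max((len(m.split()) for m in members), default=0)
    matches = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            window = tokens[start:start + length]
            if len(window) < length:
                break  # ran past the end of the document
            # Phase 1: candidates are members sharing the window's signature.
            for candidate in index.get(signature(window), []):
                # Phase 2: verify that the candidate is a true match.
                if candidate.lower().split() == window:
                    matches.append((start, candidate))
    return matches


members = ["New York", "Newark", "North Yemen"]
found = find_members("flights from new york to newark", members)
```

In this run "North Yemen" shares a signature with the sub-string "new york" and is therefore a candidate, but verification rejects it.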
-
Patent number: 8037009Abstract: An embodiment relates generally to a method of linking. The method includes receiving a message associated with at least one technical issue being resolved in a first system and containing non-confidential information and searching a knowledgebase in a second system based on the message to obtain at least one related entry. The method also includes associating at least one related entry with the non-confidential information of the message, updating at least one related entry with the non-confidential information, or creating a new entry with the non-confidential information, in the knowledgebase.Type: GrantFiled: August 27, 2007Date of Patent: October 11, 2011Assignee: Red Hat, Inc.Inventor: Jason S. Hibbets
-
Publication number: 20110246467Abstract: One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.Type: ApplicationFiled: June 13, 2011Publication date: October 6, 2011Applicant: Accenture Global Services LimitedInventors: Katharina PROBST, Rayid GHANI, Andrew E. FANO, Marko KREMA, Yan LIU
-
Patent number: 8032521Abstract: Embodiments of the present invention address deficiencies of the art in respect to structured content storage and provide a novel and non-obvious method, system and computer program product for managing structured content stored in a BLOB. In an embodiment of the invention, a performance optimized structured content management system can include a content repository, a content manager configured to provide access to structured content in the content repository and multiple different performance optimized containers disposed in the content repository. Each of the containers can store a portion of the structured content, and each of the containers can include a flattened form of original structured content in a primary binary large object (BLOB) and a parsed form of the original structured content in a secondary BLOB, the parsed form of the original structured content in the secondary BLOB indexing the flattened form of the original structured content in the primary BLOB.Type: GrantFiled: August 8, 2007Date of Patent: October 4, 2011Assignee: International Business Machines CorporationInventors: Stephen J. Garward, Mark C. Hampton, Eric Martinez de Morentin, Kenneth Sabir
-
Publication number: 20110231347Abstract: Named Entity Recognition in Query (NERQ) involves detection of a named entity in a given query and classification of the named entity into one or more predefined classes. The predefined classes may be based on a predefined taxonomy. A probabilistic approach may be taken to detecting and classifying named entities in queries, the approach using either query log data or click through data and Weakly Supervised Latent Dirichlet Allocation (WS-LDA) to construct and train a topic model.Type: ApplicationFiled: March 16, 2010Publication date: September 22, 2011Applicant: Microsoft CorporationInventors: Gu Xu, Hang Li, Jiafeng Guo
-
Patent number: 8024344Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.Type: GrantFiled: June 5, 2008Date of Patent: September 20, 2011Assignee: Content Analyst Company, LLCInventor: Roger Bradford
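A minimal sketch of the flow in this abstract follows: only term vectors are received, non-confidential documents and the query are mapped into the same space using whichever received terms they contain, and the closest document is returned. Summing term vectors to form a document vector and ranking by cosine similarity are assumptions borrowed from common LSI practice, and the toy two-dimensional vectors are invented for the example.

```python
import math


def fold_in(text, term_vectors, dims):
    # Assumed mapping: a text's vector is the sum of the received term
    # vectors for the terms it contains; unknown terms contribute nothing.
    vector = [0.0] * dims
    for token in text.lower().split():
        term_vector = term_vectors.get(token)
        if term_vector is not None:
            vector = [a + b for a, b in zip(vector, term_vector)]
    return vector


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(query, documents, term_vectors, dims):
    # Map the query into the space and return the most similar document.
    query_vector = fold_in(query, term_vectors, dims)
    return max(documents,
               key=lambda doc: cosine(query_vector, fold_in(doc, term_vectors, dims)))


# Hypothetical received term vectors (no confidential document text is shared).
term_vectors = {
    "merger": [1.0, 0.0],
    "acquisition": [0.9, 0.1],
    "picnic": [0.0, 1.0],
}
documents = ["quarterly merger update", "company picnic schedule"]
result = best_match("acquisition news", documents, term_vectors, dims=2)
```

The query shares no literal term with the first document, yet the two land close together because "acquisition" and "merger" have similar received vectors.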
-
Patent number: 8024341Abstract: An expanded queries data structure is described. The data structure is produced on the basis of a set of seed queries, and consists of entries each specifying an expanded query submitted by a user that has been determined to have a high degree of relatedness to at least a plurality of the seed queries of the set. The expanded queries specified by the entries of the expanded queries data structure can be used to define a segment of users expected to have interests characterized by the seed queries.Type: GrantFiled: July 10, 2008Date of Patent: September 20, 2011Assignee: AudienceScience Inc.Inventors: Yair Even-Zohar, Basem Nayfeh
-
Publication number: 20110225159Abstract: The disclosed embodiments provide a system and method for using modified Latent Semantic Analysis techniques to structure data for efficient search and display. The present invention creates a hierarchy of clustered documents, representing the topics of a domain corpus, through a process of optimal agglomerative clustering. The output from a search query is displayed in a fisheye view corresponding to the hierarchy of clustered documents. The fisheye view may link to a two-dimensional self-organizing map that represents semantic relationships between documents.Type: ApplicationFiled: January 27, 2011Publication date: September 15, 2011Inventor: Jonathan Murray
-
Publication number: 20110225160Abstract: A computer-readable, non-transitory medium stores therein an operation management support program that causes a computer to execute a process that includes acquiring execution history information recording for each element group included in activity diagrams expressing work procedures for operation processes executed by a system, correlations between elements and access destinations thereof; searching among elements not yet selected from among all element groups, for a second element having an access destination coinciding with that of a first element selected from among all element groups, the searching performed by referring to the acquired execution history information; setting the first and the second elements as synonymous elements, if a second element is retrieved at the searching; extracting from among the element groups included in the activity diagrams including synonymous elements, a common element string of elements common among the activity diagrams that include the synonymous elements; and outputType: ApplicationFiled: May 24, 2011Publication date: September 15, 2011Applicant: Fujitsu LimitedInventor: Masataka Sonoda
-
Patent number: 8019764Abstract: The present invention relates to a method of profiling an Internet endpoint associated with an Internet Protocol (IP) address. The method includes generating a profiling rule using an Internet search engine, obtaining a search result by inputting the IP address to the Internet search engine, and classifying the Internet endpoint based on the search result using the profiling rule.Type: GrantFiled: April 17, 2008Date of Patent: September 13, 2011Assignee: Narus Inc.Inventors: Antonio Nucci, Supranamaya Ranjan, Aleksandar Kuzmanovic