Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)
  • Publication number: 20120124051
    Abstract: An ontological information retrieval system is provided. According to an embodiment, the subject ontological information retrieval system can be utilized for computer-aided clinical Traditional Chinese Medicine (TCM) practice. In one implementation, a graphical user interface (GUI) is provided, enabling a user to input a query with symptoms determined from a patient, and the system's parser can find instances of the symptoms in a document object model (DOM) tree of the TCM ontological information. Diagnosis based upon the symptoms can be communicated to the user through the GUI. A relevance index (RI) and/or a frequency index (FI) can be further provided for evaluating a diagnosis by comparing the symptoms determined from a patient with the expected symptoms of the diagnosed illness and returning a value based on the number of matched symptoms, or a weighted index of matched symptoms.
    Type: Application
    Filed: July 29, 2010
    Publication date: May 17, 2012
    Inventors: Wilfred Wan Kei Lin, Allan Kang Ying Wong, Jackei Ho Kei Wong, Jewels Chun Wing Kong
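The matched-symptom scoring described in the abstract above can be pictured with a minimal sketch. The function name, the symptom strings, and the weights are hypothetical illustrations, not taken from the application; the abstract states only that the value is based on the number of matched symptoms or a weighted index of them.

```python
def relevance_index(patient_symptoms, expected_symptoms, weights=None):
    """Score a diagnosis by how many expected symptoms the patient shows.

    With weights=None this is a plain count of matched symptoms; otherwise
    it is the sum of the (hypothetical) per-symptom weights of the matches.
    """
    matched = set(patient_symptoms) & set(expected_symptoms)
    if weights is None:
        return len(matched)
    return sum(weights.get(symptom, 0.0) for symptom in matched)


# Hypothetical example data.
patient = ["fever", "cough", "headache"]
expected = ["fever", "cough", "sore throat"]
count_score = relevance_index(patient, expected)
weighted_score = relevance_index(patient, expected, {"fever": 0.5, "cough": 0.25})
```

Here the unweighted score counts two matches, and the weighted score sums the weights of the same two symptoms.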
  • Patent number: 8180778
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating action trails from web history are described. In one aspect, a method includes receiving a web content access history of a user, the content access history including one or more user actions, each user action being associated with a content item upon which the user action is performed, and identifying one or more action trails from the content access history, each action trail including a sequence of user actions performed on content items relating to a topic. Identifying a particular action trail includes clustering the user actions into a series of segments using temporal criteria; calculating semantic similarities between the content items; and adding a segment of the series of segments to the action trail when the semantic similarities between the segment and another segment satisfy a similarity threshold.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: May 15, 2012
    Assignee: Google Inc.
    Inventors: Elin Pedersen, Karl A. Gyllstrom, Shengyin Gu, Peter Jin Hong
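As a rough illustration of the temporal clustering step only, the sketch below splits a sorted list of action timestamps into segments wherever the gap between consecutive actions exceeds a limit. The gap rule, the function name, and the sample values are assumptions; the abstract says only that temporal criteria are used, and the semantic-similarity step is not shown.

```python
def segment_by_gap(timestamps, max_gap):
    """Split sorted timestamps into segments wherever the gap exceeds max_gap."""
    segments = []
    for t in timestamps:
        if segments and t - segments[-1][-1] <= max_gap:
            segments[-1].append(t)   # close enough: same segment
        else:
            segments.append([t])     # gap too large: start a new segment
    return segments


# Hypothetical timestamps in seconds.
segments = segment_by_gap([0, 30, 45, 600, 620], max_gap=60)
```

With these values the first three actions form one segment and the last two form another.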
  • Patent number: 8180777
    Abstract: The present invention relates in general to methods and systems for comparing and maximizing the optimal selection of a first set of one or more data objects to a set of second data objects. In one embodiment, the first set of data objects represent one or more tasks to be fulfilled by a set of capabilities represented by the second data objects. In one embodiment, methods and systems are provided that apply topic modeling and similarity metrics to determine the optimal selection. In one embodiment, methods and systems are provided to determine the appropriateness of a set of second data objects to satisfy the requirements of a first data object given interaction attributes. Embodiments may be used to compare mission requirements with potential team members to determine the appropriateness of team members and teams for a given mission based on interaction attributes of the team members and teams.
    Type: Grant
    Filed: October 24, 2010
    Date of Patent: May 15, 2012
    Assignee: Aptima, Inc.
    Inventors: Andrew Duchon, Kari Kelton, Pacey Foster, Kara Orvis, Robert McCormack
  • Publication number: 20120109964
    Abstract: A method of classifying a set of semantic concepts on a second multimedia collection based upon adapting a set of semantic concept classifiers and updating concept affinity relations that were developed to classify the set of semantic concepts for a first multimedia collection. The method comprises providing the second multimedia collection from a different domain and a processor automatically classifying the semantic concepts from the second multimedia collection by adapting the semantic concept classifiers and updating the concept affinity relations to the second multimedia collection based upon the local smoothness over the concept affinity relations and the local smoothness over data affinity relations.
    Type: Application
    Filed: October 27, 2010
    Publication date: May 3, 2012
    Inventors: Wei Jiang, Alexander C. Loui
  • Patent number: 8171025
    Abstract: A density-based data clustering method, comprising a parameter-setting step for setting a scanning radius and a minimum threshold value, a dividing step for dividing a space of a plurality of data points according to the scanning radius, a data-retrieving step for retrieving one data point out of the plurality of data points as a core data point, a searching step for calculating a distance between the core data point and each of the query points, and a grouping determination step for determining whether a number of the neighboring points is smaller than the minimum threshold value.
    Type: Grant
    Filed: January 6, 2010
    Date of Patent: May 1, 2012
    Assignee: National Pingtung University Of Science & Technology
    Inventors: Cheng-Fa Tsai, Chien-Tsung Wu
  • Patent number: 8166033
    Abstract: A system and method for matching and assembling records is provided. One embodiment of the invention assembles records by applying a method for grouping records based on matching fields, assembling a new record as a composite of the matched records, and then repeating the grouping, matching, and assembling steps in a cascade where the matching, grouping, and assembling steps are modified as a function of the cascade step and the assembled records created in earlier steps.
    Type: Grant
    Filed: February 27, 2003
    Date of Patent: April 24, 2012
    Assignee: Parity Computing, Inc.
    Inventors: Zunaid H. Kazi, Christopher D. Rosin, Ramamohan Paturi, Holden P. Robbins, Mark W. S. Land
  • Patent number: 8156123
    Abstract: Methods and apparatuses for processing metadata are described herein. In one embodiment, when a file (e.g., a text, audio, and/or image file) having metadata is received, the metadata and optionally at least a portion of the content of the file are extracted from the file to generate a first set of metadata. An analysis is performed on the extracted metadata and the content to generate a second set of metadata, which may include metadata in addition to the first set of metadata. The second set of metadata may be stored in a database suitable to be searched to identify or locate the file. Other methods and apparatuses are also described.
    Type: Grant
    Filed: April 22, 2005
    Date of Patent: April 10, 2012
    Assignee: Apple Inc.
    Inventors: Guy L. Tribble, Yan Arrouye, Dominic Giampaolo
  • Publication number: 20120072423
    Abstract: Particular portions of program execution data are specified and organized in semantic groups. A grouping expression written in a transformation syntax language specifies a pattern and a replacement, for grouping performance data samples. An exception to the pattern can also be specified. In response to the grouping expression, a cost accounting shows groups and their costs. The grouping expression may operate on names and/or name-associated characteristics such as private/public status, author, directory, and the like. Samples may represent nodes in a directed acyclic graph memorializing call stacks or memory allocation. Grouping expressions are used to group nodes and consolidate costs by various procedures when making modified sample stacks: clustering-by-name, entry-group-clustering, folding-by-name, and folding-by-cost. An entry group clustering shows at least one entry point name while avoiding unwanted detail.
    Type: Application
    Filed: September 20, 2010
    Publication date: March 22, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Vance Morrison, Joshua Ryan Williams
  • Patent number: 8140515
    Abstract: Users of electronic documents are classified for profiling and targeting of additional relevant content. Behavioral data is gathered from user registration information and user activity, and user documents and actions are categorized. Registration information is combined with collaborative and editorial data to provide user profile information. Author-generated document classification information is analyzed and assigned a first taxonomic noun to characterize the document. User-generated tags characterizing a portion of the document are assigned a second taxonomic noun. Search terms that resulted in the user accessing the document are identified and assigned a third taxonomic noun. Attributes related to how the document was accessed are evaluated and assigned a fourth taxonomic noun. The document is processed using pattern rules to extract a fifth taxonomic noun.
    Type: Grant
    Filed: October 28, 2009
    Date of Patent: March 20, 2012
    Assignee: CBS Interactive Inc.
    Inventors: Tushar Pradhan, Thomas Osborne, John Potter
  • Patent number: 8131725
    Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.
    Type: Grant
    Filed: September 20, 2010
    Date of Patent: March 6, 2012
    Assignee: CommVault Systems, Inc.
    Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
  • Patent number: 8131684
    Abstract: In one embodiment, input is received from a user defining a classification and an analytic for the classification. Multiple classifications and analytics may be defined by a user. A definition of relevance parameters is determined that characterize the classification and a set of analytics measures associated with the analytic. The definition may be for the classification. Unstructured data and structured data are analyzed based on the definition of the relevance parameters to determine relevant data in the unstructured data and the structured data. The relevant data is data that is determined to be relevant to the classification defined by the user. An index of the terms from the relevant data is determined. The index is useable by an analytics tool to provide results for queries of the unstructured data and structured data. The query may be used within the classification such that targeted results are provided using the index and the relevant data to the classification.
    Type: Grant
    Filed: March 21, 2011
    Date of Patent: March 6, 2012
    Assignee: Aumni Data Inc.
    Inventors: Joan Wrabetz, Aloke Guha
  • Publication number: 20120054185
    Abstract: The different illustrative embodiments provide a method, a computer program product, and an apparatus for managing information. A request to store text in a table in a database is received. A determination is made as to whether a first collection of textual information having a first concept that is related to a second concept for the text is present in the database responsive to receiving the request containing the text. The text is associated with the first collection of textual information in the database responsive to a determination that the first collection of textual information in the database having the first concept that is related to the second concept for the text is present in the database. A second collection for the data with a third concept that is related to the second concept for the text within the degree of relatedness is created.
    Type: Application
    Filed: August 31, 2010
    Publication date: March 1, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sandra K. Johnson, Grant D. Miller, Robert F. Pryor
  • Publication number: 20120041953
    Abstract: A latent topic labels text mining system and method to mine and analyze the content of textual data. Embodiments of the system and method are particularly well suited for use on microblog data to help people identify posts they want to read and to find people that they want to follow. Embodiments of the system and method use a modified Labeled LDA technique (called an L+LDA technique) that analyzes content using a combination of labeled and latent topics. The resultant data is assigned one of four labels to generate a lower-dimensional representation of the data than the individual words in a microblog post. This learned topic representation is used to characterize, summarize, filter, find, suggest, and compare the content of microblog posts. Embodiments of the system and method also include visualization techniques such as a tag cloud visualization that is used to visualize microblogging data.
    Type: Application
    Filed: August 16, 2010
    Publication date: February 16, 2012
    Applicant: Microsoft Corporation
    Inventors: Susan Theresa Dumais, Daniel Ramage, Daniel John Liebling, Steven Mark Drucker
  • Patent number: 8117205
    Abstract: A method and system for enhancing the quality of a bookmark or a set of bookmarks that have been organized by topic and contain information related to that topic. The method and system analyzes documents accessible by the bookmark or set of bookmarks and performs a search using key terms from that analysis in a vector called a latent similarity metric. The terms that result from this search are preferably ranked in a hierarchy or the like and utilized in a subsequent search to locate and rank additional related documents.
    Type: Grant
    Filed: July 8, 2008
    Date of Patent: February 14, 2012
    Assignee: International Business Machines Corporation
    Inventor: Michael D. Rychener
  • Patent number: 8112436
    Abstract: In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches.
    Type: Grant
    Filed: September 21, 2009
    Date of Patent: February 7, 2012
    Assignee: Yahoo! Inc.
    Inventors: Yumao Lu, Lei Duan, Fan Li, Benoit Dumoulin, Xing Wei
  • Patent number: 8108398
    Abstract: A system that facilitates data presentation and management includes at least one database to store a corpus of data relating to one or more topics. The system further includes a summarizer component to automatically determine a subset of the data over the corpus of data relating to at least one of the topic(s), wherein the subset forms a summary of at least one topic.
    Type: Grant
    Filed: June 29, 2007
    Date of Patent: January 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Shai Guday, Bret P. O'Rourke, John Mark Miller, James Morris Alkove, Andrew David Wilson
  • Patent number: 8108376
    Abstract: A document set, and history documents including documents, etc., browsed by a user are input. The document set and the history documents are each analyzed to obtain characteristic vectors. A plurality of topic clusters and a plurality of sub-topic clusters are obtained by clustering the document set. A transition structure showing transitions of topics among the sub-topic clusters is generated, and a characteristic attribute is extracted from each topic cluster and each sub-topic cluster. A cluster-of-interest is extracted in comparison among characteristic vectors of the history documents and a characteristic vector of each document included in the document set, a sub-topic cluster having transition relations with the cluster-of-interest is obtained on the basis of a transition structure owned by the cluster-of-interest, and a document included in the sub-topic cluster is extracted as a recommended document to be presented together with the characteristic attribute.
    Type: Grant
    Filed: March 20, 2009
    Date of Patent: January 31, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masayuki Okamoto, Masaaki Kikuchi
  • Publication number: 20120023103
    Abstract: In one embodiment, a method of generating annotation tags (28) for a digital image (22) includes maintaining a library (16) of human-meaningful words or phrases organized as category entries (72) according to a number of defined image description categories (70), and receiving context metadata (20) associated with the capture of a given digital image (22). The method further includes selecting particular category entries (72-1, 72-2) as vocabulary metadata (24) for the digital image (22) by mapping the context metadata (20) into the library (16), and generating annotation tags (28) for the digital image (22) by logically combining the vocabulary metadata (24) according to a defined set of deductive logic rules (30) that are predicated on the defined image description categories (70). In another embodiment, a processing apparatus (12), such as a digital processor (18, 26) and supporting memory (14), etc., is configured to carry out the above method, or to carry out variations of the above method.
    Type: Application
    Filed: January 21, 2009
    Publication date: January 26, 2012
    Applicant: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Joakim Soderberg, Jonas Bjork, Andreas Fasbender
  • Publication number: 20120011124
    Abstract: According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.
    Type: Application
    Filed: July 7, 2010
    Publication date: January 12, 2012
    Applicant: APPLE INC.
    Inventor: Jerome R. Bellegarda
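The grouping step in the abstract above can be sketched as follows: treat each document vector in turn as a centroid, collect the vectors that lie within a fixed distance of it, and keep the largest group as the cluster. Euclidean distance, the function name, and the toy two-dimensional vectors are assumptions made for illustration; the abstract speaks only of a predetermined closeness measure in the LSM space.

```python
import math


def largest_hypersphere_group(vectors, max_distance):
    """Return indices of the largest group found by centering on each vector."""
    best = []
    for centroid in vectors:
        # Collect every vector within max_distance of the current centroid.
        group = [i for i, v in enumerate(vectors)
                 if math.dist(centroid, v) <= max_distance]
        if len(group) > len(best):
            best = group
    return best


# Hypothetical document vectors; the last one is far from the others.
doc_vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
cluster = largest_hypersphere_group(doc_vectors, max_distance=0.5)
```

In this toy data the first three vectors fall inside one hypersphere and are designated as the cluster. The sketch compares all pairs of vectors, so it is only practical for small collections.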
  • Publication number: 20120004893
    Abstract: The present invention relates to a method for the automatic identification of at least one informative data filter from a data set that can be used to identify at least one relevant data subset against a target feature for subsequent hypothesis generation, model building and model testing. The present invention describes methods, and an initial implementation, for efficiently linking relevant data both within and across multiple domains and identifying informative statistical relationships across this data that can be integrated into agent-based models. The relationships, encoded by the agents, can then drive emergent behavior across the global system that is described in the integrated data environment.
    Type: Application
    Filed: September 10, 2009
    Publication date: January 5, 2012
    Applicant: QUANTUM LEAP RESEARCH, INC.
    Inventors: Akhileswar Ganesh VAIDYANATHAN, Stephen D. PRIOR, Jijun Wang, Bin Yu
  • Patent number: 8090743
    Abstract: Provided are a document management system and method. The document management system including a database storing documents and a document classification unit for automatically classifying the documents stored in the database, wherein the document classification unit comprises a feature extraction module extracting features based on a keyword included in the documents and vectorizing the extracted features, a similarity judgment module judging similarity among the documents using vectors formed by the feature extraction module, and a classification system module classifying the documents stored in the database according to a preset classification system, the document classification unit performing document classification according to the classification system with respect to documents provided to the database.
    Type: Grant
    Filed: January 10, 2007
    Date of Patent: January 3, 2012
    Assignee: LG Electronics Inc.
    Inventors: Wan Kyu Cha, Jeong Joong Kim, Han Joon Ahn
  • Publication number: 20110320454
    Abstract: A system and method for constructing a hierarchical multi-faceted classification structure includes organizing a plurality of visual categories into a multi-relational reference ontology that accounts for a plurality of different types of relationships. Media artifacts are categorized into the plurality of visual categories. The categories of artifacts are refined based on faceted ontology relationships or constraints from the multi-relational reference ontology. The multi-relational reference ontology and the one or more media artifacts with relationships are stored as the hierarchical multi-faceted classification structure in computer readable memory storage.
    Type: Application
    Filed: June 29, 2010
    Publication date: December 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: MATTHEW HILL, JOHN R. KENDER, APOSTOL NATSEV, QUOC-BAO NGUYEN, JOHN R. SMITH, JELENA TESIC, LEXING XIE, RONG YAN
  • Patent number: 8086609
    Abstract: In a method and apparatus for analyzing nodes of a Deterministic Finite Automata (DFA), an accessibility ranking, based on a DFA graph geometrical configuration, may be determined in order to determine cacheable portions of the DFA graph in order to reduce the number of external memory accesses. A walker process may be configured to walk the graph in a graph cache as well as main memory. The graph may be generated in a manner allowing each arc to include information if the node it is pointing to is stored in the graph cache or in main memory. The walker may use this information to determine whether or not to access the next arc in the graph cache or in main memory.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: December 27, 2011
    Assignee: Cavium, Inc.
    Inventors: Rajan Goyal, Muhammad Raghib Hussain, Trent Parker
  • Patent number: 8086504
    Abstract: Tag suggestions enable a hosting entity such as a website to determine one or more tags to suggest to a user for association with a particular item within an electronic catalog. After this determination, the hosting entity may suggest the determined tags to the user. To determine these tags, the hosting entity may employ techniques to determine items related to the particular item. The hosting entity then suggests some or all of the tags associated with the related items. Additionally or alternatively, the hosting entity may determine certain metadata associated with the particular item. The entity then may suggest this metadata, or some related phrase or tag, to the user for association with the particular item. However the tag suggestions are determined, the hosting entity may rank the tag suggestions to determine which tags to present to the user or to determine an order in which to present the tags.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: December 27, 2011
    Assignee: Amazon Technologies, Inc.
    Inventors: Russell A. Dicker, Waqas Ahmed, Aaron D. Wilson, Scott Allen Mongrain, Florin V. Manolache, Valentin Radu Munteanu, Val Dan Dar Ion I. Rosca, Corneliu Gabriel Alexandru Rudeanu
  • Publication number: 20110314022
    Abstract: In a KStore having a plurality of K nodes with count fields, a method for updating count fields, receiving a particle to provide a received particle, updating selected node counts of the plurality of node counts in response to the received particle to provide first updated K node count fields, and saving selected K node count fields for later updating to provide second updated count fields are recited. The K nodes include elemental root nodes and the second updated K node count fields include elemental root nodes of the plurality of elemental root nodes. The second updated K node count fields include only elemental root nodes of the plurality of elemental root nodes. The first updated K node count fields include no elemental root nodes. The second updated K node count fields include K nodes pointed to by the Result pointers of the first updated K node count fields.
    Type: Application
    Filed: June 8, 2006
    Publication date: December 22, 2011
    Applicant: Unisys Corporation
    Inventors: Jane Campbell Mazzagatti, Steven L. Rajcan, Robert R. Buckwalter
  • Patent number: 8082248
    Abstract: A classification method and system for documents containing text sentences and images having meta-data. The classification method and system categorizes document sentences into subjective and non-subjective sentences and categorizes document images into descriptive and non-descriptive. The categorization is further used to calculate subjectivity and descriptive-images classification of a document. This classification system can be used by a web search engine to filter, sort or tag a set of document references based on user selection.
    Type: Grant
    Filed: May 29, 2008
    Date of Patent: December 20, 2011
    Inventor: Rania Abouyounes
  • Publication number: 20110302168
    Abstract: In a method for representing a text document with a graphical model, a document including a plurality of ordered words is received and a graph data structure for the document is created. The graph data structure includes a plurality of nodes and edges, with each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other. The graph data structure is stored in an information repository.
    Type: Application
    Filed: June 8, 2010
    Publication date: December 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Charu Aggarwal
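A minimal sketch of the graph representation described in the abstract above: each distinct word becomes a node, and each edge carries the number of times its two words occur within a fixed distance of each other. Treating edges as undirected and measuring distance in word positions are assumptions, as are the function name and the sample sentence.

```python
from collections import Counter


def build_word_graph(words, max_distance):
    """Count, for each unordered pair of distinct words, how often they
    occur within max_distance positions of each other."""
    nodes = set(words)
    edges = Counter()
    for i, word in enumerate(words):
        # Look ahead at most max_distance positions from word i.
        for j in range(i + 1, min(i + max_distance + 1, len(words))):
            if words[j] != word:
                edges[frozenset((word, words[j]))] += 1
    return nodes, edges


# Hypothetical document of five ordered words.
nodes, edges = build_word_graph("the cat saw the dog".split(), max_distance=2)
```

For this sentence the graph has four nodes, and the edge between "the" and "cat" has weight 2 because the pair co-occurs twice within two positions.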
  • Publication number: 20110302152
    Abstract: Techniques that may be used for detecting a primary content (e.g., a web page) that the user is viewing and presenting one or more pieces of supplemental content (e.g., social media data) together with the primary content. The supplemental content presented to the user together with the primary content may be content that is matched to the primary content and therefore detected to be relevant to the user. Detection of primary content and matching to supplemental content may be carried out based on a comparison of entities related to the primary and supplemental content. In some embodiments, an analysis of the primary content for entities may include ordering entities according to significance in the primary content and selecting top entities for comparison. Also, in some embodiments, multiple pieces of supplemental content may be displayed to a user categorized based on entity.
    Type: Application
    Filed: June 7, 2010
    Publication date: December 8, 2011
    Applicant: Microsoft Corporation
    Inventors: danah boyd, Gilad Lotan, Paul Oka, Emre Mehmet Kiciman, Chun-Kai Wang
  • Patent number: 8073851
    Abstract: To provide a content searching device which can efficiently present to the user a topical related keyword.
    Type: Grant
    Filed: March 2, 2009
    Date of Patent: December 6, 2011
    Assignee: Panasonic Corporation
    Inventors: Kazutoyo Takata, Takashi Tsuzuki, Satoshi Matsuura
  • Publication number: 20110295857
    Abstract: A system and method for aligning multilingual content and indexing multilingual documents, to a computer readable data storage medium having stored thereon computer code means for indexing multilingual documents, to a system for presenting multilingual content. The method for aligning multilingual content and indexing multilingual documents comprises the steps of generating multiple bilingual terminology databases, wherein each bilingual terminology database associates respective terms in a pivot language with one or more terms in another language; and combining the multiple bilingual terminology databases to form a multilingual terminology database, wherein the multilingual terminology database associates terms in different languages via the pivot language terms.
    Type: Application
    Filed: June 20, 2008
    Publication date: December 1, 2011
    Inventors: Ai Ti Aw, Min Zhang, Lian Hau Lee, Thuy Vu, Fon Lin Lai
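The combining step in the abstract above can be illustrated with a small sketch in which each bilingual terminology database maps a pivot-language term to its terms in one other language, and the databases are merged on the shared pivot terms. The dictionary layout, the function name, and the sample terms are hypothetical; the application does not specify a storage format.

```python
from collections import defaultdict


def merge_via_pivot(bilingual_dbs):
    """Combine {language: {pivot_term: [terms]}} into
    {pivot_term: {language: [terms]}}."""
    multilingual = defaultdict(dict)
    for language, db in bilingual_dbs.items():
        for pivot_term, terms in db.items():
            multilingual[pivot_term][language] = list(terms)
    return dict(multilingual)


# Hypothetical bilingual databases with English as the pivot language.
dbs = {
    "fr": {"computer": ["ordinateur"], "keyboard": ["clavier"]},
    "de": {"computer": ["Computer", "Rechner"]},
}
table = merge_via_pivot(dbs)
```

After merging, the French and German terms for "computer" are associated with each other through the pivot term, while "keyboard" has only a French entry.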
  • Patent number: 8065293
    Abstract: An indexing system uses a graph-like data structure that clusters feature indexes together. The minimum atomic value in the data structure is represented as a leaf node which is either a single feature index or a sequence of two or more feature indexes when a minimum sequence length is imposed. Root nodes are formed as clustered collections of leaf nodes and/or other root nodes. Context nodes are formed from root nodes that are associated with content that is being indexed. Links between a root node and other nodes each include a sequence order value that is used to maintain the sequencing order for feature indexes relative to the root node. The collection of nodes forms a graph-like data structure, where each context node is indexed according to the sequenced pattern of feature indexes. Clusters can be split, merged, and promoted to increase the efficiency in searching the data structure.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: November 22, 2011
    Assignee: Microsoft Corporation
    Inventors: Kunal Mukerjee, R. Donald Thompson, III, Jeffrey Cole, Brendan Meeder
  • Patent number: 8065304
    Abstract: In one illustrative embodiment, a computer implemented method using asymmetric memory management is provided. The computer implemented method receives a request, containing a search key, to access an array of records in the asymmetric memory, wherein the array has a sorted prefix portion and an unsorted append portion, the append portion alternatively comprising a linked-list, and responsive to a determination that the request is an insert request, inserts the record in the request in arrival order in the unsorted append portion to form a newly inserted record. Responsive to a determination that the newly inserted record completes the group of records, stores an index, in sorted order, for the group of records.
    Type: Grant
    Filed: June 11, 2008
    Date of Patent: November 22, 2011
    Assignee: International Business Machines Corporation
    Inventor: Kenneth Andrew Ross
  • Patent number: 8060513
    Abstract: A system and method for generating a frame of reference for a plurality of information, the plurality of information containing text data and obtained by a user through interaction with one or more information sources. The method and system include receiving selected information for analysis, the information including a plurality of text data and identifying a plurality of logical units of the text data. Also included are identifying a plurality of individual textual portions in each of the logical units and calculating the number of logical units associated with each of the individual textual portions of the plurality of textual portions for use in identifying a plurality of patterns including a respective pattern for each of the individual textual portions.
    Type: Grant
    Filed: July 1, 2008
    Date of Patent: November 15, 2011
    Assignee: Dossierview Inc.
    Inventors: Stephen Basco, Nick Foisy, Bruce Scanlan, Harsch Khandelwal
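The counting step in this abstract (how many logical units each textual portion is associated with) can be illustrated with a short sketch. It assumes, purely for illustration, that a logical unit is a sentence ending in a period and that a textual portion is a lowercase whitespace-separated word; the function name `units_per_portion` is hypothetical.

```python
from collections import Counter


def units_per_portion(text):
    """Count, for each word, the number of sentences that contain it."""
    # Assumption: logical units are sentences split on periods.
    units = [u for u in text.split(".") if u.strip()]
    counts = Counter()
    for unit in units:
        # A set ensures each word is counted at most once per unit.
        counts.update(set(unit.lower().split()))
    return counts
```

For example, `units_per_portion("The cat sat. The dog sat.")` associates "the" and "sat" with two units each and "cat" and "dog" with one unit each.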
  • Patent number: 8055656
    Abstract: Embodiments of the invention provide techniques for searching for virtual objects of an immersive virtual environment based on user interactions within the virtual environment. Generally, embodiments provide an attribute index storing data describing attributes of virtual objects, and an interaction index storing data describing user interactions with virtual objects. Search queries may be evaluated using both the attribute index and interactions index. Thus, virtual objects may be searched in terms of object attributes as well as user interactions with the virtual objects.
    Type: Grant
    Filed: October 10, 2007
    Date of Patent: November 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ryan Kirk Cradick, Zachary Adam Garbow, Ryan Robert Pendergast
  • Publication number: 20110270808
    Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
    Type: Application
    Filed: April 30, 2010
    Publication date: November 3, 2011
    Applicant: International Business Machines Corporation
    Inventors: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah
  • Publication number: 20110264649
    Abstract: Methods, systems, and apparatus, including medium-encoded computer program products, for providing an adaptive knowledge platform. In one or more aspects, a system can include a knowledge management component to acquire, classify and disseminate information of a dataset; a human-computer interaction component to visualize multiple perspectives of the dataset and to model user interactions with the multiple perspectives; and an adaptivity component to modify one or more of the multiple perspectives of the dataset based on a user-interaction model.
    Type: Application
    Filed: April 28, 2009
    Publication date: October 27, 2011
    Inventors: Ruey-Lung Hsiao, Eugene B. Shirley, Jr.
  • Patent number: 8046364
    Abstract: A method and system for analyzing a patent disclosure is disclosed. The method and system comprise a disclosure analysis and a separate claims analysis, such that each analysis may be performed independently. Missing and incorrect reference labels are identified within the disclosure. Antecedent basis and specification support are checked for the claim elements. Terms within the specification that do not have a reference number, but may require one, are identified, provided that they fit the profile of one of a set of particular lexical patterns.
    Type: Grant
    Filed: December 2, 2007
    Date of Patent: October 25, 2011
    Assignee: Veripat, LLC
    Inventor: Michael Robert Kahn
  • Patent number: 8046363
    Abstract: Provided are a system and method of clustering documents. The system includes a document DB, a document feature writing unit, a document retrieving unit, a clustering unit, and a cluster DB. The document DB stores documents. The document feature writing unit extracts attribute information of documents stored in the document DB, and writes indexes with respect to the respective documents on the basis of the attribute information. The document retrieving unit retrieves documents including a query input by a user, using the indexes. The clustering unit includes a representative vector calculator calculating feature vectors and a representative vector of the retrieved documents, and a similarity calculator calculating similarities between the documents using the feature vectors and the representative vector. The cluster DB stores documents clustered by the clustering unit.
    Type: Grant
    Filed: January 10, 2007
    Date of Patent: October 25, 2011
    Assignee: LG Electronics Inc.
    Inventors: Wan Kyu Cha, Jeong Joong Kim, Han Joon Ahn
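The abstract does not say how the feature vectors, the representative vector, or the similarities are computed. As one plausible reading, the sketch below uses term-count vectors, their mean as the representative vector, and cosine similarity. All three choices, and the function names, are assumptions made for illustration.

```python
import math
from collections import Counter


def feature_vector(text):
    """Term-count vector for one document (assumed feature definition)."""
    return Counter(text.lower().split())


def representative_vector(vectors):
    """Mean of the feature vectors (assumed definition of 'representative')."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    n = len(vectors)
    return {term: count / n for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With these pieces, each retrieved document can be scored against the representative vector with `cosine(feature_vector(doc), representative)`, and documents with similar scores grouped together.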
  • Publication number: 20110258193
    Abstract: One embodiment of the present invention provides a system for estimating a similarity level between semantic entities. During operation, the system selects two or more semantic entities associated with a number of documents. The system subsequently parses the documents into sub-parts, and calculates the similarity level between the semantic entities based on occurrences of the semantic entities within the sub-parts of the documents.
    Type: Application
    Filed: April 15, 2010
    Publication date: October 20, 2011
    Applicant: PALO ALTO RESEARCH CENTER INCORPORATED
    Inventors: Oliver Brdiczka, Petro Hizalev
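The idea of scoring two entities by where they occur within sub-parts of documents can be sketched as below. The abstract does not name a formula, so this example assumes that sub-parts are paragraphs separated by blank lines and that similarity is the Jaccard ratio of the sub-parts mentioning each entity. The function names are hypothetical.

```python
def split_into_subparts(document):
    """Split a document into paragraphs (assumed sub-part granularity)."""
    return [p.strip().lower() for p in document.split("\n\n") if p.strip()]


def entity_similarity(entity_a, entity_b, documents):
    """Jaccard ratio of sub-parts that mention both entities vs. either."""
    a, b = entity_a.lower(), entity_b.lower()
    with_a = with_b = with_both = 0
    for doc in documents:
        for part in split_into_subparts(doc):
            has_a, has_b = a in part, b in part
            with_a += has_a
            with_b += has_b
            with_both += has_a and has_b
    union = with_a + with_b - with_both
    return with_both / union if union else 0.0
```

Counting at the sub-part level rather than per document means two entities mentioned in the same long document, but never near each other, do not inflate the score.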
  • Publication number: 20110252036
    Abstract: A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.
    Type: Application
    Filed: June 17, 2011
    Publication date: October 13, 2011
    Inventors: Tyler J. Neylon, Kerry L. Hannan, Ryan T. McDonald, Michael Wells, Jeffrey C. Reynar
  • Patent number: 8037069
    Abstract: The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: October 11, 2011
    Assignee: Microsoft Corporation
    Inventors: Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
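The two-phase pattern in this abstract, cheap signature-based candidate generation followed by verification, can be sketched as follows. The signature scheme here (the first token of each database member) is an assumption chosen for brevity, and the function names are hypothetical. The abstract does not describe the actual signatures used.

```python
from collections import defaultdict


def build_signature_index(members):
    """Map each signature (assumed: first token) to the members sharing it."""
    index = defaultdict(list)
    for member in members:
        tokens = tuple(member.lower().split())
        if tokens:
            index[tokens[0]].append(tokens)
    return index


def find_members(document, members):
    """Return (token position, member) pairs for verified matches."""
    index = build_signature_index(members)
    tokens = document.lower().split()
    matches = []
    for i, token in enumerate(tokens):
        # Candidate phase: only members whose signature equals this token.
        for candidate in index.get(token, []):
            # Verification phase: compare the full sub-string token by token.
            if tuple(tokens[i:i + len(candidate)]) == candidate:
                matches.append((i, " ".join(candidate)))
    return matches
```

In this sketch a document token such as "new" yields candidates like "new york" and "new jersey", and only those that survive the full comparison are reported as true matches.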
  • Patent number: 8037009
    Abstract: An embodiment relates generally to a method of linking. The method includes receiving a message associated with at least one technical issue being resolved in a first system and containing non-confidential information and searching a knowledgebase in a second system based on the message to obtain at least one related entry. The method also includes associating at least one related entry with the non-confidential information of the message, updating at least one related entry with the non-confidential information, or creating a new entry with the non-confidential information, in the knowledgebase.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: October 11, 2011
    Assignee: Red Hat, Inc.
    Inventor: Jason S. Hibbets
  • Publication number: 20110246467
    Abstract: One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.
    Type: Application
    Filed: June 13, 2011
    Publication date: October 6, 2011
    Applicant: Accenture Global Services Limited
    Inventors: Katharina PROBST, Rayid GHANI, Andrew E. FANO, Marko KREMA, Yan LIU
  • Patent number: 8032521
    Abstract: Embodiments of the present invention address deficiencies of the art in respect to structured content storage and provide a novel and non-obvious method, system and computer program product for managing structured content stored in a BLOB. In an embodiment of the invention, a performance optimized structured content management system can include a content repository, a content manager configured to provide access to structured content in the content repository and multiple different performance optimized containers disposed in the content repository. Each of the containers can store a portion of the structured content, and each of the containers can include a flattened form of original structured content in a primary binary large object (BLOB) and a parsed form of the original structured content in a secondary BLOB, the parsed form of the original structured content in the secondary BLOB indexing the flattened form of the original structured content in the primary BLOB.
    Type: Grant
    Filed: August 8, 2007
    Date of Patent: October 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Stephen J. Garward, Mark C. Hampton, Eric Martinez de Morentin, Kenneth Sabir
  • Publication number: 20110231347
    Abstract: Named Entity Recognition in Query (NERQ) involves detection of a named entity in a given query and classification of the named entity into one or more predefined classes. The predefined classes may be based on a predefined taxonomy. A probabilistic approach may be taken to detecting and classifying named entities in queries, the approach using either query log data or click-through data and Weakly Supervised Latent Dirichlet Allocation (WS-LDA) to construct and train a topic model.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Gu Xu, Hang Li, Jiafeng Guo
  • Patent number: 8024344
    Abstract: Presented are systems and methods for securely sharing confidential information. In such a method, term vectors corresponding to ones of a plurality of confidential terms included in a plurality of confidential documents are received. Each of the received term vectors is mapped into a vector space. Non-confidential documents are mapped into the vector space to generate a document vector corresponding to each non-confidential document, wherein the generation of each document vector is based on a subset of the received term vectors. At least one of the non-confidential documents is identified in response to a query mapped into the vector space.
    Type: Grant
    Filed: June 5, 2008
    Date of Patent: September 20, 2011
    Assignee: Content Analyst Company, LLC
    Inventor: Roger Bradford
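The flow in this abstract (receive term vectors, build document vectors from them, then match a query in the same space) can be sketched as below. The example assumes that a document vector is the sum of the received term vectors for the words it contains and that matching uses cosine similarity. Both are illustrative choices, and the function names and sample vectors are hypothetical.

```python
import math


def document_vector(text, term_vectors):
    """Sum the received term vectors for the terms present in the text."""
    dims = len(next(iter(term_vectors.values())))
    vec = [0.0] * dims
    for token in text.lower().split():
        term_vec = term_vectors.get(token)
        if term_vec is not None:  # terms without a received vector are skipped
            vec = [x + y for x, y in zip(vec, term_vec)]
    return vec


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents, term_vectors):
    """Rank documents by similarity to the query in the shared space."""
    q = document_vector(query, term_vectors)
    scored = [(cosine(q, document_vector(d, term_vectors)), d)
              for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Only the term vectors cross the boundary in this sketch, so the receiving side can position its own documents in the space without seeing the confidential documents themselves.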
  • Patent number: 8024341
    Abstract: An expanded queries data structure is described. The data structure is produced on the basis of a set of seed queries, and consists of entries each specifying an expanded query submitted by a user that has been determined to have a high degree of relatedness to at least a plurality of the seed queries of the set. The expanded queries specified by the entries of the expanded queries data structure can be used to define a segment of users expected to have interests characterized by the seed queries.
    Type: Grant
    Filed: July 10, 2008
    Date of Patent: September 20, 2011
    Assignee: AudienceScience Inc.
    Inventors: Yair Even-Zohar, Basem Nayfeh
  • Publication number: 20110225159
    Abstract: The disclosed embodiments provide a system and method for using modified Latent Semantic Analysis techniques to structure data for efficient search and display. The present invention creates a hierarchy of clustered documents, representing the topics of a domain corpus, through a process of optimal agglomerative clustering. The output from a search query is displayed in a fisheye view corresponding to the hierarchy of clustered documents. The fisheye view may link to a two-dimensional self-organizing map that represents semantic relationships between documents.
    Type: Application
    Filed: January 27, 2011
    Publication date: September 15, 2011
    Inventor: Jonathan Murray
  • Publication number: 20110225160
    Abstract: A computer-readable, non-transitory medium stores therein an operation management support program that causes a computer to execute a process that includes acquiring execution history information recording for each element group included in activity diagrams expressing work procedures for operation processes executed by a system, correlations between elements and access destinations thereof; searching among elements not yet selected from among all element groups, for a second element having an access destination coinciding with that of a first element selected from among all element groups, the searching performed by referring to the acquired execution history information; setting the first and the second elements as synonymous elements, if a second element is retrieved at the searching; extracting from among the element groups included in the activity diagrams including synonymous elements, a common element string of elements common among the activity diagrams that include the synonymous elements; and output
    Type: Application
    Filed: May 24, 2011
    Publication date: September 15, 2011
    Applicant: Fujitsu Limited
    Inventor: Masataka Sonoda
  • Patent number: 8019764
    Abstract: The present invention relates to a method of profiling an Internet endpoint associated with an Internet Protocol (IP) address. The method includes generating a profiling rule using an Internet search engine, obtaining a search result by inputting the IP address to the Internet search engine, and classifying the Internet endpoint based on the search result using the profiling rule.
    Type: Grant
    Filed: April 17, 2008
    Date of Patent: September 13, 2011
    Assignee: Narus Inc.
    Inventors: Antonio Nucci, Supranamaya Ranjan, Aleksandar Kuzmanovic