Latent Semantic Index or Analysis (LSI or LSA) Patents (Class 707/739)
  • Publication number: 20140214842
    Abstract: A method and a system for summarization of short comments are provided. The system comprises a memory to store a comments collection. The comments collection stores a plurality of comments for later access. The comments respectively include an overall rating and at least one phrase. The system also includes one or more processors to implement an aspect module to map a portion of the plurality of comments to a first aspect corresponding to an attribute of the entity. The one or more processors also implement a rating module to determine an aspect rating corresponding to the first aspect based on the respective overall ratings of the portion of the plurality of comments.
    Type: Application
    Filed: April 1, 2014
    Publication date: July 31, 2014
    Applicant: eBay Inc.
    Inventors: Yue Lu, Neelakantan Sundaresan
  • Publication number: 20140214841
    Abstract: The present disclosure extends to methods, systems, and computer program products for updating a merchant database with new product items and placing the new product items within a hierarchy of existing merchant product offerings. In operation, the new product is represented by a title and description that can be semantically classified using a plurality of classification models and reviewed by users for accuracy.
    Type: Application
    Filed: January 31, 2013
    Publication date: July 31, 2014
    Applicant: Wal-Mart Stores, Inc.
    Inventors: Nikesh Lucky Garera, Narasimhan Rampalli, Dintyala Venkata Subrahmanya Ravikant, Srikanth Subramaniam, Chong Sun, Heather Dawn Yalin
  • Patent number: 8793253
    Abstract: The present invention discloses methods, systems, and tools for unified semantic ranking of compositions of ontological subjects. The method breaks a composition into a plurality of partitions as well as its constituent ontological subjects of different orders and builds a participation matrix indicating the participation of ontological subjects of the composition in other ontological subjects, i.e. the partitions, of the composition. Using the information on how the ontological subjects participate in one another, a similarity matrix is built from which the semantic importance ranks of the partitions of the composition are calculated. The method systematically enables the calculation of the semantic ranks of the ontological subjects of different orders of the composition. Various systems for implementing the method and numerous applications and services are disclosed.
    Type: Grant
    Filed: August 8, 2013
    Date of Patent: July 29, 2014
    Assignee: Hamid Hatami-Hanza
    Inventor: Hamid Hatami-Hanza
  • Publication number: 20140207782
    Abstract: System and method for computerized identification of themes in a large data set, the system comprising reducing the number of data set members in a large data set, using at least one computerized data set member pruning technique other than random selection; and using a computerized theme identification technique for identifying a plurality of themes in the reduced data set.
    Type: Application
    Filed: January 22, 2014
    Publication date: July 24, 2014
    Inventor: Yiftach RAVID
  • Publication number: 20140207784
    Abstract: Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
    Type: Application
    Filed: January 30, 2014
    Publication date: July 24, 2014
    Applicant: SPLUNK INC.
    Inventors: R. David CARASSO, Micah James Delfino
  • Publication number: 20140207783
    Abstract: System and method for computerized identification and presentation of semantic themes occurring in a set of electronic documents, comprising performing topic modeling on the set of documents thereby to yield a set of topics and for each topic, a topic-modeling output list of words; and using a processor performing a matching algorithm to match only a subset of each topic-modeling output list of words, to the output list's corresponding topic, such that each word appears in no more than a predetermined number of subsets from among said subsets.
    Type: Application
    Filed: January 22, 2014
    Publication date: July 24, 2014
    Inventor: Yiftach RAVID
  • Publication number: 20140201185
    Abstract: Systems and methods are discussed to automatically create a domain ontology that is a combination of ontologies. Some embodiments include systems and methods for developing a combined ontology for a website that includes extracting collocations for each webpage within the website, creating first and second ontologies from the collocations, and then aggregating the ontologies into a combined ontology. Some embodiments of the invention include unique ways to calculate collocations, to develop a smaller yet meaningful document sample from a large sample, to determine webpages of interest to users interacting with a website, and to determine topics of interest of users interacting with a website. Various other embodiments of the invention are disclosed.
    Type: Application
    Filed: January 17, 2013
    Publication date: July 17, 2014
    Applicant: Adobe Systems Incorporated
    Inventors: Walter Chang, Minhoe Hur, Geoff Baum
  • Patent number: 8782042
    Abstract: Some embodiments provide a program that identifies an entity having an entity attribute. The program receives, from each of several methods, a set of candidate identity attributes, each intended to identify a particular entity having the entity attribute specified in the document. Each of the several methods generates its set of candidate identity attributes based on the entity attribute specified in a document. The program calculates a score for each candidate identity attribute in the sets of candidate identity attributes. The program identifies, based on the sets of scores, an identity attribute from the sets of candidate identity attributes that identifies the entity having the entity attribute specified in the document.
    Type: Grant
    Filed: October 14, 2011
    Date of Patent: July 15, 2014
    Assignee: Firstrain, Inc.
    Inventors: David Cooke, Martin Betz, Ashutosh Joshi, Binay Mohanty
  • Publication number: 20140195539
    Abstract: A system and method are provided for automatically generating systematic reviews of received information in a field of science and technology, such as scientific literature, where the systematic review includes a systematic review of a research field in the scientific literature. The method includes the steps of constructing time-series networks of words, passages, documents, and citations and/or co-citations within received information into a synthesized network, decomposing the networks into clusters of fields or topics, performing part-of-speech tagging of text within the received information to provide tagged text, constructing semantic structures of concepts and/or assertions extracted from the source text, generating citation-based and content-based summaries of the clusters of fields or topics and the semantic structures, and generating structured narratives of the clusters of fields or topics and the summaries of the generated semantic structures.
    Type: Application
    Filed: October 1, 2013
    Publication date: July 10, 2014
    Applicant: Drexel University
    Inventor: Chaomei Chen
  • Publication number: 20140188885
    Abstract: Methods, systems, and computer readable storage medium embodiments for hashing with improved utilization and power efficiency are disclosed. Some embodiments include inserting a key in a selected bucket in accordance with a bucket identifier generated by a hash function, wherein the selected bucket is one of a plurality of buckets of a hash table configured in at least one memory, determining respective unique bit strings based upon corresponding bit positions for a plurality of keys in the selected bucket including the inserted key, and inserting the respective unique bit strings in a table location corresponding to the bucket identifier, wherein the table location is one of a plurality of table locations in at least one control table configured in the at least one memory. Other embodiments include lookup operations in a hash table.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Applicant: Broadcom Corporation
    Inventors: Abhay Kulkarni, Bhupesh Ramchandani
  • Patent number: 8762384
    Abstract: A method and system for performing a semantic search on structured data. An unstructured search query is received from a requestor. The query is evaluated within a computer to identify a best structured request based on the unstructured search query. The selected structured request is applied to a set of structured data. The result of the application of the structured request is then returned to the requestor.
    Type: Grant
    Filed: August 19, 2010
    Date of Patent: June 24, 2014
    Assignee: SAP Aktiengesellschaft
    Inventor: Robert Heidasch
  • Patent number: 8762368
    Abstract: A server is configured to receive, from a client, a query and context information associated with a document; obtain search results, based on the query, that identify documents relevant to the query; analyze the context information to identify content; generate first scores for a hierarchy of topics that correspond to measures of relevance of the topics to the content; select a topic that is most relevant to the context information when the topic is associated with a greatest first score; generate second scores for the search results that correspond to measures of relevance of the search results to the topic; select one or more of the search results as being most relevant to the topic when the search results are associated with one or more greatest second scores; generate a search result document that includes the selected search results; and send the search result document to the client.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: June 24, 2014
    Assignee: Google Inc.
    Inventors: Sarveshwar Duddu, Kuntal Loya, Minh Tue Vo Thanh, Thorsten Brants
  • Publication number: 20140172858
    Abstract: A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of computations. The irrelevance value is based on occurrences of a token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.
    Type: Application
    Filed: December 9, 2013
    Publication date: June 19, 2014
    Applicant: eBay Inc.
    Inventors: Badrul M. Sarwar, John A. Mount
  • Patent number: 8756234
    Abstract: A system and process for automated text analysis which can be used to identify phrases in reports such as medical reports includes identifying a phrase contained within a text, extracting the phrase from the text, determining a value of the phrase and, in response to the phrase having at least a threshold value, reducing the phrase to a root meaning. In one embodiment, the value of the phrase is assigned via lexicon-based hierarchical decision trees.
    Type: Grant
    Filed: November 16, 2004
    Date of Patent: June 17, 2014
    Assignee: The General Hospital Corporation
    Inventor: Keith J. Dreyer
  • Patent number: 8756215
    Abstract: A document to be indexed is initially indexed in dependence upon language-specific rules of a single language. A success metric is used to assess the effectiveness of the single language indexing. If a threshold level of success is not attained, the document is identified as multi-lingual. In response to identifying the document as multi-lingual, the document is queued for multi-lingual indexing. A document may be fragmented into a number of smaller documents, each of which is indexed separately.
    Type: Grant
    Filed: December 2, 2009
    Date of Patent: June 17, 2014
    Assignee: International Business Machines Corporation
    Inventor: Deep Shikha
  • Patent number: 8751503
    Abstract: A computer-readable, non-transitory medium stores therein an operation management support program that causes a computer to execute a process that includes acquiring execution history information recording for each element group included in activity diagrams expressing work procedures for operation processes executed by a system, correlations between elements and access destinations thereof; searching among elements not yet selected from among all element groups, for a second element having an access destination coinciding with that of a first element selected from among all element groups, the searching performed by referring to the acquired execution history information; setting the first and the second elements as synonymous elements, if a second element is retrieved at the searching; extracting from among the element groups included in the activity diagrams including synonymous elements, a common element string of elements common among the activity diagrams that include the synonymous elements; and output
    Type: Grant
    Filed: May 24, 2011
    Date of Patent: June 10, 2014
    Assignee: Fujitsu Limited
    Inventor: Masataka Sonoda
  • Patent number: 8751502
    Abstract: When executed, a computer program product generates a graphical user interface that renders results that are responsive to a search query of a rich media file. The graphical user interface includes a chronological representation of the rich media file, one or more occurrence markers along the chronological representation corresponding to actual occurrences of a desired term at an indicated chronological location in the rich media file, and an execution icon configured to launch a rich media application that renders a relevant portion that is responsive to the search query.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: June 10, 2014
    Assignee: AOL Inc.
    Inventor: Rakesh Agrawal
  • Publication number: 20140156665
    Abstract: Techniques are disclosed for efficiently and automatically classifying textual documents or files. In some embodiments, the classification process is integrated into or otherwise made part of the storage function, such that when the user initiates a save process for a given file, the file is processed through a classifier prior to (or contemporaneously with) completing the save function. In some such embodiments, textual content of the file is analyzed using natural language processing to identify a main or substantial concept discussed in the file, and one or more corresponding tags are then assigned to that file. Subsequently, the user can access that file based on the one or more tags, for instance, through a user interface that allows the user to select one or more content categories associated with the assigned tags. The files can be text-based, but may include other content as well, such as images, video, and audio.
    Type: Application
    Filed: December 3, 2012
    Publication date: June 5, 2014
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventor: Michael Kraley
  • Patent number: 8744855
    Abstract: Architectures and techniques are described to determine a reading level of an electronic book. In particular, words, phrases, clauses, and parts of speech of an electronic book may be tagged and used to determine the reading level of the electronic book. In some cases, the reading level of the electronic book is based on a level of complexity of sentences of the electronic book and a level of complexity of words of the electronic book.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: June 3, 2014
    Assignee: Amazon Technologies, Inc.
    Inventor: Daniel B. Rausch
  • Publication number: 20140143253
    Abstract: Systems, methods, and apparatus for clustering resources using rare features are provided. For example, an environment includes an extraction module, an index module, and a cluster module. The extraction module identifies a set of resources and extracts a plurality of features from the resources. The plurality of features may be rare features. The index module identifies and generates a rare features index. The cluster module identifies at least two resources that share rare features, creates one or more clusters based on the identified at least two resources, and associates resources that share similar features with the one or more clusters. Resources that do not share similar features are not associated with the one or more clusters. Identifying at least two resources that share rare features is based at least upon a threshold.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 22, 2014
    Inventor: Joshua Powers
  • Patent number: 8732245
    Abstract: In a computer system, a system, method and computer program product for pre-selecting a folder for a current message. The system, method and computer program product involve (a) providing a folder pre-selection cache having n configurable entries, n being a predetermined positive integer greater than one, each configurable entry being configured to include an associated pre-selection criterion for matching with the current message, and an associated folder identification for identifying an associated folder in the plurality of folders; (b) for at least one entry in the folder pre-selection cache, comparing a comparison criterion, obtained from the current message, with the associated pre-selection criterion to determine a matching entry in the folder pre-selection cache; and, (c) pre-selecting the folder identified by the associated folder identification of the matching entry when the message comparison module determines the matching entry in the folder pre-selection cache.
    Type: Grant
    Filed: February 7, 2003
    Date of Patent: May 20, 2014
    Assignee: BlackBerry Limited
    Inventors: Anthony F. Scian, David P. Yach, R. Scotte Zinn, Gerhard D. Klassen
  • Patent number: 8725737
    Abstract: Systems and methods for managing electronic data are disclosed. Various data management operations can be performed based on a metabase formed from metadata. Such metadata can be identified from an index of data interactions generated by a journaling module, and obtained from their associated data objects stored in one or more storage devices. In various embodiments, such processing of the index and storing of the metadata can facilitate, for example, enhanced data management operations, enhanced data identification operations, enhanced storage operations, data classification for organizing and storing the metadata, cataloging of metadata for the stored metadata, and/or user interfaces for managing data. In various embodiments, the metabase can be configured in different ways. For example, the metabase can be stored separately from the data objects so as to allow obtaining of information about the data objects without accessing the data objects or a data structure used by a file system.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: May 13, 2014
    Assignee: CommVault Systems, Inc.
    Inventors: Anand Prahlad, Jeremy Alan Schwartz, David Ngo, Brian Brockway, Marcus S. Muller
  • Publication number: 20140129746
    Abstract: The present disclosure relates to real-time data management for a power grid and presents a real-time data management system, a system, method, apparatus and tangible computer readable medium for accessing data in a power grid, a system, method, apparatus and tangible computer readable medium for controlling a transmission delay of real-time data delivered via a real-time bus, and a system, method, apparatus and tangible computer readable medium for delivering real-time data in a power grid. In the real-time data management system of the present disclosure, a unified data model covering various organizations and various data resources is designed, and a management scheme for clustered data is used to provide transparent and high-speed data access. In addition, multi-bus collaboration and bus performance optimization approaches are utilized to improve the efficiency and performance of the buses.
    Type: Application
    Filed: February 26, 2013
    Publication date: May 8, 2014
    Applicant: ACCENTURE GLOBAL SERVICES LIMITED
    Inventors: Qin Zhou, Zhihui Yang, Xiaopei Cheng, Yan Gao, Guo Ma
  • Patent number: 8713021
    Abstract: According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.
    Type: Grant
    Filed: July 7, 2010
    Date of Patent: April 29, 2014
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
  • Publication number: 20140114978
    Abstract: The present invention is directed to a method, system, and article of manufacture for systematically and automatically identifying abnormal or collective behavior patterns in microblogging messages that produce burst phenomena, such as Twitter storms. A microblogging storm engine in a storm detection server is configured to detect and classify the volume, shape, and type of a Twitter storm when keying on topics such as, but not limited to, a brand, an event, a person, an entity, a country, or a controversial issue. The microblogging storm engine comprises a storm detection module, a storm classification module, a database interface module, and a sentiment process module. The storm detection module is configured to detect different patterns of microblogging storms by capturing the volume of a particular storm to assist in output statistical analysis. The storm classification module is configured to classify the storms into different types of a particular storm category.
    Type: Application
    Filed: October 24, 2013
    Publication date: April 24, 2014
    Applicant: METAVANA, INC.
    Inventors: Manjirnath CHATTERJEE, Rabia TURAN, Brian LUE
  • Patent number: 8706499
    Abstract: Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: April 22, 2014
    Assignee: Facebook, Inc.
    Inventors: Matthew Nicholas Papakipos, David Harry Garcia
  • Publication number: 20140101162
    Abstract: A method for recommending semantic annotations on a main document and sub documents is provided. The method includes: extracting a keyword of the main document; extracting one keyword or a set of keywords of each sub document; and generating one or a set of keyword similarities for each of the sub documents based on a degree of similarity between the keyword of the main document and the keyword of each of the sub documents. The method also includes: obtaining a plurality of words appearing in each of the sub documents and calculating a frequency of each of the words; generating a semantic capacity of each of the sub documents according to the frequencies; grouping the main document and at least one of the sub documents into a semantic document set based on the semantic capacities and the keyword similarities; and annotating the main document according to the semantic document set.
    Type: Application
    Filed: October 9, 2012
    Publication date: April 10, 2014
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Hsiang-Yuan Hsueh, Ko-Li Kan, Chi-Chou Chiang
  • Patent number: 8694504
    Abstract: Systems and methods for cladistics-based content searching, analysis, and/or diagrammatic representation of results in graphical user interface format for viewing by at least one user on a computer-type device or network, in particular for technology and patent-related content stored in at least one database.
    Type: Grant
    Filed: March 4, 2004
    Date of Patent: April 8, 2014
    Assignee: Spore, Inc.
    Inventors: Guy R. Beretich, Jr., JiNan Glasgow
  • Patent number: 8676805
    Abstract: Relational clustering has attracted more and more attention due to its phenomenal impact in various important applications which involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. A probabilistic model is presented for relational clustering, which also provides a principled framework to unify various important clustering tasks including traditional attribute-based clustering, semi-supervised clustering, co-clustering and graph clustering. The model seeks to identify cluster structures for each type of data object and interaction patterns between different types of objects. Under this model, parametric hard and soft relational clustering algorithms are provided under a large number of exponential family distributions.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: March 18, 2014
    Assignee: The Research Foundation for The State University of New York
    Inventors: Bo Long, Zhongfei Zhang
  • Publication number: 20140074845
    Abstract: Content of different formats may be sourced from various data sources such as content servers and ingested into a data integration server by an ingestion broker embodied on a non-transitory computer readable medium. The ingestion broker may normalize the content of different formats into a uniform representation that can be indexed and delivered across multiple digital channels for a variety of applications. The normalized content may be analyzed and semantic metadata may be determined from the normalized content. The normalized content can be semantically enriched by associating the semantic metadata and the like with the content. The semantic metadata can be stored in a semantic index that can be used for searching via the data integration server. During search, the semantic metadata can be instantiated as facets for user navigation and refinement of search criteria and additional semantic relationships can be assigned to the words in the normalized content.
    Type: Application
    Filed: November 13, 2013
    Publication date: March 13, 2014
    Applicant: Open Text Corporation
    Inventors: Pascal Dimassimo, Steve Pettigrew, Martin Brousseau, Charles-Olivier Simard, Eric Williams, Francis Lacroix, Alex Dowgailenko, Agostino Deligia, Jean-Michel Texier
  • Publication number: 20140074844
    Abstract: Disclosed is a method, system, and computer program product for semantically analyzing the content within an internal social network. Using the results of the analysis, executives can gain a better understanding of, and insight into, the organization and its employees. A dashboard tool may be used in some embodiments of the invention to visualize the results of the semantic analysis.
    Type: Application
    Filed: September 9, 2013
    Publication date: March 13, 2014
    Applicant: Oracle International Corporation
    Inventors: Srividhya SUBRAMANIAN, Mary E.G. BEAR, Mehrshad SETAYESH, Noah HORTON
  • Patent number: 8671078
    Abstract: Embodiments are configured to provide sharing of business logic items. A document may contain business logic items, for example, sets, members, or measures. Some business logic items may be created by a publisher who wants to make the business logic available to other users so that others can access the business logic. Embodiments provide for using an integrated server platform search component to automatically retrieve business logic items which exist in one or more documents stored in a document library. This may allow for a publisher to provide business logic to other users without having to rely on the other users to retrieve the business logic from a specific document, and without requiring the other users to know of the existence of the business logic. Restrictions may be placed so that a publisher can control what specific pieces of business logic may be made available.
    Type: Grant
    Filed: August 31, 2011
    Date of Patent: March 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Josh C. Zimmerman, David Scott Gustafson, Kurt Leonard Ziegler
  • Publication number: 20140067815
    Abstract: The present disclosure provides example methods and apparatuses of labeling product identifiers and methods of navigating products. Description information of one or more products is extracted. The description information of the products is clustered into a text. A subject analysis is applied to the text by using a text analysis method based on subject models to obtain one or more subjects and definition names for the subjects. A subject that is correlated to the description information of the product is used as an identifier of the product to label the product. The present techniques label the products with identifiers that have one or more user dimension attributes so that users may easily and intuitively find their desired products.
    Type: Application
    Filed: September 3, 2013
    Publication date: March 6, 2014
    Applicant: Alibaba Group Holding Limited
    Inventors: Changlong Sun, Anxiang Zeng
  • Patent number: 8655882
    Abstract: A system for ontology candidate selection and comparison including a microprocessor and an ontology candidate selection component executing on the microprocessor and configured to compare at least a portion of a plurality of ontology candidates based on a candidate selection rule, and based on said comparison, select from the plurality of ontology candidates a pair of ontologies. The system further includes an ontology similarity component coupled to the ontology candidate selection component and configured to generate a similarity outcome related to the pair of ontologies based on a similarity rule and evaluate at least one of: the candidate selection rule or the similarity rule based on the similarity outcome.
    Type: Grant
    Filed: August 31, 2011
    Date of Patent: February 18, 2014
    Assignee: Raytheon Company
    Inventors: Donald R. Kretz, William D. Phillips, Bruce E. Peoples, Justin W. Toennies
  • Patent number: 8650190
    Abstract: A computer-implemented system and method for generating a display of document clusters is described. Clusters of documents are presented in a multi-dimensional concept space. At least one document is selected from a collection of documents to be clustered. An angle θ of the document relative to a common origin of the multi-dimensional concept space is computed. The selected document is compared with each of the clusters. An angle σ from the common origin is determined for each cluster. A difference between the angle θ for the document and the angle σ for the cluster is determined. The difference is compared to the variance, and a new cluster is created when the difference exceeds the variance for all the clusters.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: February 11, 2014
    Assignee: FTI Technology LLC
    Inventor: Dan Gallivan
  • Publication number: 20140040270
    Abstract: Method, apparatus, and computer-readable medium are provided for analyzing a document including text. In one example, a method for identifying patterns in a document is described. The method includes identifying a plurality of candidate phrases in the document based on candidate identification criteria, grouping the candidate phrases of the plurality of candidate phrases with a phrase family based on family criteria and comparison between candidate phrases of the plurality of candidate phrases to obtain consistent phrases, and, for remaining phrases not meeting all of the candidate identification criteria, associating at least one of the remaining phrases with a phrase family based on inconsistent phrase criteria to obtain inconsistent phrases. Once identified in this manner, the inconsistent phrases may be displayed via a user interface so that a user can determine whether an inconsistent phrase requires modification.
    Type: Application
    Filed: July 31, 2012
    Publication date: February 6, 2014
    Applicant: Freedom Solutions Group, LLC, d/b/a Microsystems
    Inventors: Thomas O'Sullivan, Andrzej Jachowicz
  • Publication number: 20140033088
    Abstract: Apparatus, systems, and methods may operate to transmit and receive information, such as between a client and a server, that enables the display of a plurality of version indicators corresponding to a plurality of versions of electronic content, the plurality of versions comprising a first version newer than a second version. Further activities may include detecting selection of, and then displaying, a first selection indicator to indicate selection of the first version and a second selection indicator to indicate selection of the second version. Further activity may include communicating information to enable displaying, at substantially the same time as the first and second selection indicators, at least a portion of a plurality of changes between the first version and the second version. Additional apparatus, systems, and methods are disclosed.
    Type: Application
    Filed: October 8, 2008
    Publication date: January 30, 2014
    Inventor: Robert Shaver
  • Patent number: 8639708
    Abstract: Computer-readable media and a computer system for implementing a natural language search using fact-based structures and for generating such fact-based structures are provided. A fact-based structure is generated using a semantic structure, which represents information, such as text, from a document, such as a web page. Typically, a natural language parser is used to create a semantic structure of the information, and the parser identifies terms, as well as the relationship between the terms. A fact-based structure of a semantic structure allows for a linear structure of these terms and their relationships to be created, while also maintaining identifiers of the terms to convey the dependency of one fact-based structure on another fact-based structure. Additionally, synonyms and hypernyms are identified while generating the fact-based structure to improve the accuracy of the overall search.
    Type: Grant
    Filed: August 29, 2008
    Date of Patent: January 28, 2014
    Assignee: Microsoft Corporation
    Inventors: Martin Henk Van Den Berg, Daniel Bobrow, Robert D. Cheslow, Barney Pell, Giovanni Lorenzo Thione, Chad Walters
  • Patent number: 8639496
    Abstract: A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.
    Type: Grant
    Filed: January 2, 2013
    Date of Patent: January 28, 2014
    Assignee: PureDiscovery Corporation
    Inventor: Paul A. Jakubik
  • Patent number: 8635107
    Abstract: An extensible offer inventory database of offers in a domain is established. Further, an offer ontology is generated based on the extensible offer inventory database. The offer ontology provides an extensible vocabulary that correlates to categories in the offer inventory database. In addition, offers are automatically located. The offers are also semantically analyzed to generate semantic analysis data. Further, user data is obtained. In addition, an optimal offer match is automatically determined based upon the semantic analysis data and the user data.
    Type: Grant
    Filed: June 3, 2011
    Date of Patent: January 21, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Walter Chang, Geoff Baum
  • Patent number: 8620918
    Abstract: Among other disclosed subject matter, a computer-implemented method includes receiving a plurality of electronic documents associated with a domain at a server. Each of the plurality of electronic documents includes meta-data and textual content. The method includes identifying one or more text strings in the textual content that are to be processed differently than an identical or similar text string in other electronic documents, and associating, with the electronic document, data indicating that each of the identified text strings is to be processed differently than an identical or similar text string in other electronic documents. The method also includes performing an analysis of the electronic documents to identify one or more subsets of the electronic documents that include related subject matter. A plurality of degrees of relatedness can be associated with text strings associated with data indicating that each of the text strings is to be processed differently.
    Type: Grant
    Filed: February 1, 2012
    Date of Patent: December 31, 2013
    Assignee: Google Inc.
    Inventors: Aner Ben-Artzi, Kirill Buryak, Glenn M. Lewis, Jun Peng, Nadav Benbarak
  • Patent number: 8612445
    Abstract: The present invention discloses methods, systems, and tools for unified semantic ranking of compositions of ontological subjects. The method breaks a composition into a plurality of partitions as well as its constituent ontological subjects of different orders and builds a participation matrix indicating the participation of ontological subjects of the composition in other ontological subjects, i.e. the partitions, of the composition. Using the participation information of the ontological subjects into each other a similarity matrix is built from which the semantic importance ranks of the partitions of the composition are calculated. The method, systematically, enables the calculation of semantic ranks of the ontological subjects of different orders of the composition. Various systems for implementing the method and numerous applications and services are disclosed.
    Type: Grant
    Filed: April 7, 2010
    Date of Patent: December 17, 2013
    Inventor: Hamid Hatami-Hanza
  • Patent number: 8612411
    Abstract: Systems and methods for clustering documents, such as scientific documents, taking into account the citation patterns of the documents are disclosed. In one embodiment, the method includes locating citations to other documents, e.g., search result documents, comparing each pair of documents to be clustered for overlapping citations in a first, a more specific second, and an even more specific optional third citation generality, and determining clusters of related documents based on the comparisons. The levels of generality may be, for example, document-, paragraph-, and/or citation-level generalities. The locating may locate only citations to the other documents to be clustered. The clusters may be determined based on a weighted score of the amount of overlapping citations in the various generalities and/or by performing factor analysis using the comparison results. The clusters may be ranked to determine the dominant clusters.
    Type: Grant
    Filed: December 31, 2003
    Date of Patent: December 17, 2013
    Assignee: Google Inc.
    Inventor: Vibhu O. Mittal
  • Publication number: 20130332458
    Abstract: Generally discussed herein are systems and methods for lexically enriching structured and semi-structured data. In one or more embodiments, a method can include receiving a code, lexicalizing the code, lexically combining the lexicalized code with a lexical descriptor, and sending the lexical combination to a keyword database.
    Type: Application
    Filed: June 12, 2013
    Publication date: December 12, 2013
    Inventor: Arthur R. Culbertson
  • Patent number: 8583419
    Abstract: The present invention relates to Latent Metonymical analysis and Indexing (LMai), a novel concept for advanced or unsupervised machine learning techniques that uses a statistical approach to identify the relationships between the words in a set of given documents (unstructured data). This approach does not necessarily need training data to match related words together; it can perform the classification by itself. All that is needed is to give the algorithm a set of natural documents. The method is elegant enough to classify the relationships automatically, without any human guidance during the process, as shown in FIGS. 6 and 7.
    Type: Grant
    Filed: April 2, 2007
    Date of Patent: November 12, 2013
    Inventor: Syed Yasin
  • Patent number: 8584011
    Abstract: One or more techniques and/or systems are provided for transitioning between representations of an electronic document. Elements, such as visual elements, common between a first set of elements from a first representation of the document and a second set of elements from a second representation of the document are identified. The non-intersecting elements from the first and second sets are respectively ranked in accordance with a representation relevance. First set non-intersecting elements are removed from an intermediate representation of the document, and second set non-intersecting elements are added to the intermediate representation, while the intermediate representation is not equivalent to the second representation; and respective iterations of the intermediate representation are output, such as to a display to depict a transition from the first representation of the document to the second representation of the document.
    Type: Grant
    Filed: June 22, 2010
    Date of Patent: November 12, 2013
    Assignee: Microsoft Corporation
    Inventors: Jaime Teevan, Susan T. Dumais, Daniel J. Liebling
  • Publication number: 20130290338
    Abstract: A system (100) for generating a computer readable data file representative of a mapping between a first representation of a set of concepts or of a data structure (e.g. a database schema) and a second representation of a set of concepts or of a data structure (e.g. an ontology), each representation comprising a plurality of complex representational elements (e.g. tables in a database schema and concepts in an ontology) each of which may itself include a number of associated subordinate representational elements (e.g. columns/fields of a table in a database schema and attributes of a concept in an ontology).
    Type: Application
    Filed: December 23, 2011
    Publication date: October 31, 2013
    Applicant: BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY
    Inventors: Beum Seuk Lee, Zhan Cui
  • Patent number: 8572086
    Abstract: In one embodiment, a method of generating annotation tags (28) for a digital image (22) includes maintaining a library (16) of human-meaningful words or phrases organized as category entries (72) according to a number of defined image description categories (70), and receiving context metadata (20) associated with the capture of a given digital image (22). The method further includes selecting particular category entries (72-1, 72-2) as vocabulary metadata (24) for the digital image (22) by mapping the context metadata (20) into the library (16), and generating annotation tags (28) for the digital image (22) by logically combining the vocabulary metadata (24) according to a defined set of deductive logic rules (30) that are predicated on the defined image description categories (70). In another embodiment, a processing apparatus (12), such as a digital processor (18, 26) and supporting memory (14), etc., is configured to carry out the above method, or to carry out variations of the above method.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: October 29, 2013
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Joakim Soderberg, Jonas Bjork, Andreas Fasbender
  • Patent number: 8572089
    Abstract: A method is provided for forming an entity cluster. In this method, a plurality of entities found in one or more data sources are identified. An entity may represent a word or a phrase found in the one or more data sources. The plurality of entities may then be organized into groups, where each group has a master entity and a set of subordinate entities. The groups are formed using a first comparison criterion. Then, using a second comparison criterion, a first group is associated with a second group. The second comparison criterion may compare the master entities associated with the first and second groups. Based on the association between the first group and the second group, the method can then determine that the first entity is related to the second entity.
    Type: Grant
    Filed: December 15, 2011
    Date of Patent: October 29, 2013
    Assignee: Business Objects Software Ltd.
    Inventor: Kimberly Starks
  • Patent number: 8572088
    Abstract: Automated rich presentation of a semantic topic is described. In one aspect, respective portions of multimodal information corresponding to a semantic topic are evaluated to locate events associated with the semantic topic. The probability that a document belongs to an event is determined based on document inclusion of one or more of persons, times, locations, and keywords, and document distribution along a timeline associated with the event. For each event, one or more documents objectively determined to be substantially representative of the event are identified. One or more other types of media (e.g., video, images, etc.) related to the event are then extracted from the multimodal information. The representative documents and the other media are intended for presentation to a user in a storyboard.
    Type: Grant
    Filed: October 21, 2005
    Date of Patent: October 29, 2013
    Assignee: Microsoft Corporation
    Inventors: Lie Lu, Wei-Ying Ma, Zhiwei Li