Creation of Semantic Tools (EPO) Patents (Class 707/E17.098)
  • Publication number: 20140082022
    Abstract: Methods are disclosed for converting a directed graph to a taxonomy using guidelines from a user. An initial tree is output from a first pruning step in which subtree preferences (and other weights) are applied to preserve or remove paths from a node to one or more levels of descendant nodes. Subtree preferences (and infoboxes) may specify rules for automatically generating recommendations during application to nodes. In a second pruning step, the directed graph is again processed with additional weightings applied to edges in the graph in accordance with the recommendations. The recommendations may be human defined. Recommendations may specify a recommended ancestor for a particular node and may include a weighting to be applied to the recommendation itself, if there are multiple conflicting recommendations for the same node. Recommendations may also specify what standard weight to apply to the edge of the best parent.
    Type: Application
    Filed: September 19, 2012
    Publication date: March 20, 2014
    Applicant: Wal-Mart Stores, Inc.
    Inventors: Digvijay Singh Lamba, Omkar Deshpande
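The best-parent idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the input format, the function name `graph_to_tree`, the additive `bonus` for a recommended parent, and the alphabetical tie-break are all assumptions, and the input is assumed to be a directed acyclic graph.

```python
from collections import defaultdict

def graph_to_tree(edges, recommendations=None):
    """Keep one parent per node: the incoming edge with the highest weight.

    edges: dict mapping (parent, child) -> base weight (hypothetical format).
    recommendations: dict mapping child -> (recommended_parent, bonus),
    where bonus is added to that edge's weight (an assumed scoring rule).
    """
    recommendations = recommendations or {}
    incoming = defaultdict(dict)
    for (parent, child), weight in edges.items():
        incoming[child][parent] = weight

    tree = {}
    for child, parents in incoming.items():
        scored = dict(parents)
        if child in recommendations:
            rec_parent, bonus = recommendations[child]
            if rec_parent in scored:
                scored[rec_parent] += bonus
        # Ties are broken alphabetically so the result is deterministic.
        tree[child] = max(sorted(scored), key=lambda p: scored[p])
    return tree

# Hypothetical category graph in which "Cameras" has two candidate parents.
edges = {
    ("Electronics", "Cameras"): 1.0,
    ("Photography", "Cameras"): 1.0,
    ("Electronics", "Phones"): 2.0,
    ("Gadgets", "Phones"): 1.0,
}
recs = {"Cameras": ("Photography", 0.5)}
tree = graph_to_tree(edges, recs)
```

Here the recommendation tips "Cameras" toward "Photography"; without it, the alphabetical tie-break would choose "Electronics".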
  • Publication number: 20130198234
    Abstract: Embodiments of the present disclosure provide a method and system for defining one or more custom properties of a term in a hierarchical taxonomy. Embodiments described herein include identifying a term in a term-set using an identifier associated with the term and defining at least one new property for the term. Once the property is defined, the newly defined property is applied to the term.
    Type: Application
    Filed: September 7, 2012
    Publication date: August 1, 2013
    Applicant: Microsoft Corporation
    Inventors: Patrick Carl Miller, Daniel E. Kogan, Peter Blair Gonzalez del Solar, Qinwei Zhu
  • Publication number: 20130060816
    Abstract: Described herein are methods, systems, apparatuses and products for transforming hierarchical language data into relational form. An aspect provides for assembling at least one statistical summary of at least one hierarchical language data source responsive to execution of program instructions accessible to at least one processor operatively connected to a memory device; accessing at least one entity of interest selected from the at least one statistical summary; generating at least one target hierarchical language model based on the at least one entity of interest; and transforming data from the at least one hierarchical language data source into at least one relational form by executing transformation artifacts generated based on a type of the relational form. Other aspects are disclosed herein.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 7, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joshua Wai-ho Hui, Peter Martin Schwarz
  • Publication number: 20130041920
    Abstract: Provided are techniques for creating a hierarchy of results. An unstructured result set is received. Each result in the unstructured result set is hashed into a preliminary result set. For each hashed result, one or more related concepts are obtained using one or more taxonomies; one or more matches between the one or more related concepts and other hashed results in the preliminary result set are found; a candidate group for the hashed result is formed, wherein the candidate group includes the hashed result and one or more other hashed results based on the one or more matches; in response to determining that a frequency associated with the hashed result exceeds a threshold, the candidate group associated with that hashed result is compared with pre-existing groups that are in use; and, based on the comparing, one or more suggestions regarding the candidate group are provided.
    Type: Application
    Filed: August 8, 2011
    Publication date: February 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: John P. Bufe, Samuel A. Kaufmann, Ian W. Webster, Margaret A. Zagelow
  • Publication number: 20120310969
    Abstract: A method, machine readable storage medium, and system for generating a semantic network that utilizes existing relationships between related terms in a searchable database. Upon detection of the absence of a searched term from a database, a term data structure and indexes in a particular domain in which related terms related to the results provided by the search engine may be analyzed to determine if a new term related to the unfound search term should be created. Upon creation of the term, attributes related to the term are generated so the term may be placed in the most proper domain, and linkages to other terms in the same or different domains may be generated. All of the information is stored in the database. User input is not needed to accomplish the creation of the new term in the database.
    Type: Application
    Filed: May 31, 2011
    Publication date: December 6, 2012
    Applicant: SAP AG
    Inventor: Robert Heidasch
  • Publication number: 20120303662
    Abstract: A method is provided for enhancing service diagnostics for root cause analysis of an identified problem in a vehicle. Service repair data of previously serviced vehicles is obtained from a memory storage device. The service data is compiled based on a service repair history for each vehicle. Each vehicle within the compiled service data having at least two service repairs performed within a predetermined period of time is identified. Combinations of parts serviced during each service repair are identified. A count is determined that indicates the number of times each combination appears in the compiled service data. The combinations having counts greater than a predetermined threshold are identified. A determination is made whether any of the combinations having counts greater than the predetermined threshold are present in the structural taxonomy database.
    Type: Application
    Filed: May 24, 2011
    Publication date: November 29, 2012
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Asim Tewari, Vineet R. Khare, Sugato Chakrabarty
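The counting steps in the abstract above can be sketched as below. This is one possible reading under stated assumptions: the record format `(vehicle_id, repair_date, parts)`, the function names, and the choice to count pairs of parts (`combo_size=2`) are hypothetical and are not taken from the application.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from itertools import combinations

def has_repeat_repairs(dates, window):
    """True if at least two repairs fall within `window` of each other."""
    dates = sorted(dates)
    return any(b - a <= window for a, b in zip(dates, dates[1:]))

def frequent_part_combinations(repairs, window_days, threshold, combo_size=2):
    """Count part combinations for vehicles with repeat repairs.

    repairs: list of (vehicle_id, repair_date, set_of_parts) records.
    Returns the combinations whose count is greater than `threshold`.
    """
    window = timedelta(days=window_days)
    by_vehicle = defaultdict(list)
    for vehicle_id, repair_date, parts in repairs:
        by_vehicle[vehicle_id].append((repair_date, parts))

    counts = Counter()
    for visits in by_vehicle.values():
        if not has_repeat_repairs([d for d, _ in visits], window):
            continue  # vehicle does not have two repairs within the window
        for _, parts in visits:
            for combo in combinations(sorted(parts), combo_size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n > threshold}

# Made-up service records, for illustration only.
repairs = [
    ("V1", date(2011, 1, 5), {"battery", "alternator"}),
    ("V1", date(2011, 1, 20), {"battery", "alternator", "starter"}),
    ("V2", date(2011, 2, 1), {"battery", "alternator"}),
    ("V2", date(2011, 2, 10), {"alternator", "battery"}),
    ("V3", date(2011, 3, 1), {"battery", "alternator"}),  # single repair
]
frequent = frequent_part_combinations(repairs, window_days=30, threshold=3)

# Final step: check which frequent combinations a (hypothetical) taxonomy lacks.
known_taxonomy = set()
missing = set(frequent) - known_taxonomy
```

Sorting the parts before forming combinations makes `("alternator", "battery")` and `("battery", "alternator")` count as the same combination.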
  • Publication number: 20120239694
    Abstract: A method and system for predicting future trends of term taxonomies of user-generated content. The method comprises crawling one or more sources of user-generated content to collect phrases mentioned by users of the one or more data sources; periodically analyzing one or more term taxonomies to determine at least a trend of at least a non-sentiment phrase with respect to a plurality of sentiment phrases, wherein a term taxonomy is an association between a non-sentiment phrase and a sentiment phrase, and the non-sentiment and sentiment phrases are included in the collected phrases; and generating a prediction of future behavior of the at least one trend with respect to the one or more term taxonomies.
    Type: Application
    Filed: May 29, 2012
    Publication date: September 20, 2012
    Applicant: TAYKEY LTD.
    Inventors: Amit Avner, Omer Dror
  • Publication number: 20120173556
    Abstract: A method and system for managing semantic and syntactic metadata. Heterogeneous data is received. After the heterogeneous data is received, the semantic metadata associated with the received heterogeneous data is captured and syntactic metadata associated with the received heterogeneous data is captured. The semantic metadata describes contextually relevant or domain-specific information about data based on an industry-specific or enterprise-specific metadata model or ontology. The syntactic metadata includes grammatical rules and structural patterns governing an ordered use of formats and arrangement pertaining to specified data. The received heterogeneous data and said captured semantic metadata and said syntactic metadata are logically linked. The heterogeneous data is stored in a repository.
    Type: Application
    Filed: March 14, 2012
    Publication date: July 5, 2012
    Applicant: International Business Machines Corporation
    Inventors: Ock Kee Baek, Arti Abhay Kale, Tao Liu, Pradeep Madaiah
  • Publication number: 20120143880
    Abstract: Methods and system of searching for content in a target set of content based on a reference set of content, a reference semantic network representing knowledge associated with the reference set of content, and a target semantic network representing knowledge associated with the target set of content.
    Type: Application
    Filed: December 30, 2011
    Publication date: June 7, 2012
    Applicant: Primal Fusion Inc.
    Inventors: Peter Joseph Sweeney, Ihab Francis Ilyas, Jean-Paul Dupuis, Nadiya Yampolska
  • Publication number: 20120124098
    Abstract: Described herein are a method and system for managing complex systems knowledge. Information generated during operation of a complex system is monitored. This information is normalized to a complex system base element that is expressed according to a standardized element taxonomy. During normalization, the information inherits characteristics of the base element. Following normalization, the information is stored in an information database. This information can be used to do any one or more of design, construct, operate, automate and otherwise configure another complex system.
    Type: Application
    Filed: May 28, 2010
    Publication date: May 17, 2012
    Inventor: Mark Gordon Damm
  • Patent number: 8180777
    Abstract: The present invention relates in general to methods and systems for comparing and maximizing the optimal selection of a first set of one or more data objects to a set of second data objects. In one embodiment, the first set of data objects represent one or more tasks to be fulfilled by a set of capabilities represented by the second data objects. In one embodiment, methods and systems are provided that apply topic modeling and similarity metrics to determine the optimal selection. In one embodiment, methods and systems are provided to determine the appropriateness of a set of second data objects to satisfy the requirements of a first data object given interaction attributes. Embodiments may be used to compare mission requirements with potential team members to determine the appropriateness of team members and teams for a given mission based on interaction attributes of the team members and teams.
    Type: Grant
    Filed: October 24, 2010
    Date of Patent: May 15, 2012
    Assignee: Aptima, Inc.
    Inventors: Andrew Duchon, Kari Kelton, Pacey Foster, Kara Orvis, Robert McCormack
  • Publication number: 20120117114
    Abstract: A system for collaborative analysis from different processes on different data sources. The system uses a unique approach to lightweight temporary data structures in order to allow communication of interim results among processes, and construction of semantically appropriate reports. The data structures are generated in near real time and their lightweight nature supports massive scaling, including many diverse streaming inputs.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 10, 2012
    Inventor: Harold Theodore Goranson
  • Publication number: 20120096041
    Abstract: A data processing system used for delivering profile data structures that contain interest nodes or channels. The interest nodes include sets of targets and qualifiers that comprise attributes used in filtering information files for delivery. Targets and qualifiers are applied to the attributes and available information files to produce the filtered set. Web pages showing results include tools to assist the user in creation and editing of the information. The user may share interest nodes with other users, and other users may be granted editing capability of the interest nodes. The other users may modify the interest nodes for their own personal use. Even if modified the user may continue to receive new content or information from the original user in accordance with the unmodified interest nodes or channels.
    Type: Application
    Filed: October 7, 2011
    Publication date: April 19, 2012
    Applicant: The Washington Post
    Inventors: Ramana Rao, Brian L. Neumann, Michael J. Ferguson
  • Publication number: 20120047174
    Abstract: A system for identifying hidden connections between non-sentiment phrases. The system comprises a network interface enabling an access to one or more data sources; a data warehouse storage for at least storing a plurality of phrases including sentiment phrases and non-sentiment phrases; an analysis unit for identifying hidden connections between non-sentiment phrases based on at least one proximity rule and for generating at least an association between at least two non-sentiment phrases having a hidden connection and a sentiment phrase, wherein an association between the at least two non-sentiment phrases having the hidden connection and the corresponding sentiment phrase is a term taxonomy.
    Type: Application
    Filed: October 24, 2011
    Publication date: February 23, 2012
    Applicant: TAYKEY LTD.
    Inventors: Amit AVNER, Omer DROR, Itay BIRNBOIM
  • Publication number: 20120016909
    Abstract: A method is provided for analyzing the semantic content of network configuration files, comprising the steps of accessing configuration files associated with corresponding network components, the files containing commands that define the configuration of those components; transforming the commands into a structural database based, at least in part, on a non-grammatical analysis of the commands, wherein the structure of the commands is represented as the structural database; and constructing a semantic database of the configuration files by querying the structural database.
    Type: Application
    Filed: July 16, 2010
    Publication date: January 19, 2012
    Applicant: TELCORDIA TECHNOLOGIES, INC.
    Inventors: Sanjai Narain, Gary Levin
  • Publication number: 20120016805
    Abstract: An electronic document is accessed. A structural definition that defines a structural convention according to which information within the electronic document is arranged also is accessed. Based on the accessed structural definition, at least some of the information is extracted from the electronic document. A machine-understandable representation of the extracted information then is generated.
    Type: Application
    Filed: July 13, 2010
    Publication date: January 19, 2012
    Inventors: Sven Graupner, Hamid Reza Motahari Nezhad, Sujoy Basu
  • Publication number: 20110295902
    Abstract: Method(s) for identifying a taxon corresponding to a query sequence are described herein. The method includes selecting a target cluster, from amongst a plurality of reference clusters, corresponding to the query sequence. The target cluster may be selected based on a composition based analysis. A similarity based analysis of the query sequence is performed with respect to the target cluster. From the target cluster, the taxon corresponding to the query sequence is identified based on the similarity based analysis.
    Type: Application
    Filed: May 25, 2011
    Publication date: December 1, 2011
    Applicant: Tata Consultancy Services Limited
    Inventors: Sharmila S. Mande, Mohammed Monzoorul Haque, Tarini Shankar Ghosh, Nitin Kumar Singh
  • Publication number: 20110270883
    Abstract: The present invention uses an algorithm which evaluates learners' short free-text answers when the answer has as few as 10 words. The answer key uses only one correct answer, allowing instructors to ask learners to produce short open-ended text responses to questions. The algorithm automates the scoring of free-text answers, enabling instructors to embed such questions in online courses, and providing nearly immediate scoring and feedback on learners' responses. The algorithm is based on the semantic relatedness of the words in the learners' answer to the single correct answer. The semantic relatedness algorithm requires a dedicated domain specific index or collection of topic-focused documents (a corpus), which is created by an automated crawl mechanism that collects documents based upon descriptive domain keywords.
    Type: Application
    Filed: July 11, 2011
    Publication date: November 3, 2011
    Inventors: Ohad Lisral Bukai, Robert Pokorny, Jacqueline A. Haynes
  • Publication number: 20110270888
    Abstract: Methods and systems for searching over large (i.e., Internet scale) data to discover relevant information artifacts based on similar content and/or relationships are disclosed. Improvements over simple keyword and phrase based searching over Internet scale data are shown. Search engines providing accurate and contextually relevant search results are disclosed. Users are enabled to identify related documents and information artifacts and quickly ascertain, via visualization, which of these documents are original, which are derived (or copied) from a source document or information artifact, and which subset is independently generated (i.e., an original document or information artifact).
    Type: Application
    Filed: April 29, 2011
    Publication date: November 3, 2011
    Applicant: ORBIS TECHNOLOGIES, INC.
    Inventors: Larry CROCHET, Michael NIV
  • Publication number: 20110264699
    Abstract: A processing method for classification (300) of contents (400) in a domain (500; 501) that can be represented through a taxonomy is described, the method comprising: generating (301) a first digital mathematical representation of the taxonomy; generating (302) a second digital mathematical representation of text documents (600) different from said contents and containing keywords; processing (303) the first and second digital mathematical representations for enriching the taxonomy, by associating keywords of the text documents (600) with the first digital mathematical representation; generating (304) a third digital mathematical representation of the contents (400); processing (305) the first digital enriched mathematical representation and third mathematical representation for classifying the contents (400) in the enriched taxonomy.
    Type: Application
    Filed: December 30, 2008
    Publication date: October 27, 2011
    Applicant: TELECOM ITALIA S.p.A.
    Inventors: Fabrizio Antonelli, Marina Geymonat, Dario Mana, Rossana Simeoni, Selcuk Kasim Candan, Mario Cataldi, Luigi Di Caro, Maria Luisa Sapino
  • Patent number: 8024331
    Abstract: An apparatus and method are disclosed for producing a semantic representation of information in a semantic space. The information is first represented in a table that stores values which indicate a relationship with predetermined categories. The categories correspond to dimensions in the semantic space. The significance of the information with respect to the predetermined categories is then determined. A trainable semantic vector (TSV) is constructed to provide a semantic representation of the information. The TSV has dimensions equal to the number of predetermined categories and represents the significance of the information relative to each of the predetermined categories. Various types of manipulation and analysis, such as searching, classification, and clustering, can subsequently be performed on a semantic level.
    Type: Grant
    Filed: June 2, 2008
    Date of Patent: September 20, 2011
    Assignee: Manning & Napier Information Services, LLC.
    Inventors: Randall J. Calistri-Yeh, Bo Yuan, George B. Osborne, David L. Snyder
  • Publication number: 20110179084
    Abstract: A content device may select associated content, such as adverts, for a user selected content item based on textual characterizing data for the associated content and the user selected content item. A term set characterizing the user selected content item is expanded using semantic graphs and similarity values between the expanded term set and term sets describing associated content is calculated. A specific associated content item is then selected based on the similarity values. The semantic graph based term set expansion may allow improved accuracy in selecting appropriate associated content while providing a process that is suitable for resource constrained scenarios. In particular, communication resource, memory resource, and computational resource usage may be kept low.
    Type: Application
    Filed: August 24, 2009
    Publication date: July 21, 2011
    Applicant: MOTOROLA, INC.
    Inventors: Simon Waddington, Ben M. Bratu, Ioannis Kompatsiaris, Fotis Menemenis, Symeon Papadopoulos
  • Publication number: 20110161285
    Abstract: An approach is provided for automatic controlled value expansion of information. A value expansion controller detects a request at a device to perform an information operation on a set of data elements of an information space, wherein the request identifies the set by a name and the information operation applies to each data element within the set. The value expansion controller intercepts the request based on the detection and determines the location of the data elements within a communication network based on the name. The value expansion controller retrieves the data elements from the location and populates an expansion table with the retrieved data elements, wherein the expansion table is correlated to the set. The value expansion controller performs the information operation on each data element in the expansion table and causes transmission of one or more results of the information operation to the device.
    Type: Application
    Filed: December 30, 2009
    Publication date: June 30, 2011
    Applicant: Nokia Corporation
    Inventors: Sergey Boldyrev, Jukka Honkola, Vesa Luukkala
  • Publication number: 20110153673
    Abstract: The invention relates to topic classification systems in which text intervals are represented as proposition trees. Free-text queries and candidate responses are transformed into proposition trees, and a particular candidate response can be matched to a free-text query by transforming the proposition trees of the free-text query into the proposition trees of the candidate responses. Because proposition trees are able to capture semantic information of text intervals, the topic classification system accounts for the relative importance of topic words, for paraphrases and re-wordings, and for omissions and additions. Redundancy of two text intervals can also be identified.
    Type: Application
    Filed: January 24, 2011
    Publication date: June 23, 2011
    Applicant: RAYTHEON BBN TECHNOLOGIES CORP.
    Inventors: Elizabeth Megan Boschee, Michael Levit, Marjorie Ruth Freedman
  • Publication number: 20110119310
    Abstract: Systems, methods, and other embodiments associated with incremental inference are described. One example method includes updating existing or old triples in a semantic model with triples resulting from the addition of new triples. The updating is performed by separating inference rules into joining steps that are performed on first and second predicates for the inference rule. A first joining step joins results of execution of the first predicate on the new triples with the results of execution of the second predicate on the union of the old and new triples to produce newly inferred triples. A second joining step joins results of execution of the first predicate on the union of the old and new triples with the results of execution of the second predicate on the new triples to produce newly inferred triples.
    Type: Application
    Filed: November 18, 2009
    Publication date: May 19, 2011
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Vladimir KOLOVSKI, Zhe WU
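The two joining steps described in the abstract above can be sketched for a single two-predicate rule. This is a minimal illustration under stated assumptions: triples are plain `(subject, predicate, object)` tuples, the rule joins the object of the first pattern to the subject of the second, and the helper names and the `grandparentOf` rule are hypothetical. It performs one pass only and does not iterate to a fixpoint.

```python
def match(triples, predicate):
    """Return (subject, object) pairs for triples with the given predicate."""
    return {(s, o) for s, p, o in triples if p == predicate}

def join(left, right, head):
    """Join (x, y) pairs with (y, z) pairs into (x, head, z) triples."""
    return {(x, head, z) for x, y in left for y2, z in right if y == y2}

def incremental_infer(old, new, p1, p2, head):
    """Infer triples that depend on at least one new triple."""
    both = old | new
    # Step 1: first predicate on new triples, second on old and new together.
    step1 = join(match(new, p1), match(both, p2), head)
    # Step 2: first predicate on old and new together, second on new triples.
    step2 = join(match(both, p1), match(new, p2), head)
    return (step1 | step2) - both

# Hypothetical rule: (x parentOf y), (y parentOf z) -> (x grandparentOf z)
old = {("ann", "parentOf", "bob")}
new = {("bob", "parentOf", "cid")}
inferred = incremental_infer(old, new, "parentOf", "parentOf", "grandparentOf")

# Cross-check against a full recomputation over the union.
both = old | new
full = join(match(both, "parentOf"), match(both, "parentOf"), "grandparentOf")
already = join(match(old, "parentOf"), match(old, "parentOf"), "grandparentOf")
```

Together the two steps cover every join in which at least one side comes from the new triples, so the old-with-old join never has to be repeated.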
  • Publication number: 20110093479
    Abstract: A system and method for using semantic understanding in storing and searching data and other information. A linearized tuple-based version of a conceptual graph can be created from a user input. A plurality of conceptual graphs, or portions thereof, can be compared to determine matches. An associative database can be created and/or searched using a hierarchy of conceptual graphs in tuple format, so that the data storage and searching of such database is optimized. The associative database can be used to integrate data from multiple different sources; form part of an Internet or other search engine; or used in other implementations. Also disclosed herein is a system and method for use of semantic understanding in searching and providing of content.
    Type: Application
    Filed: October 15, 2010
    Publication date: April 21, 2011
    Applicant: VEXIGO, LTD.
    Inventor: Gil Fuchs
  • Publication number: 20110076653
    Abstract: Systems and methods for semantic knowledge assessment, instruction, and acquisition are disclosed. In one embodiment a computer-implemented method for language instruction includes determining a lexical recognition ability level of a user within a lexicon of a particular language. This method further includes, based on item recognizability, creating a target list of unknown lexical items. The target list can be sorted by ranking the importance of the unknown lexical items within the particular lexicon. The method also includes generating a personal language learning sequence for the user based, at least in part, on the target list.
    Type: Application
    Filed: April 5, 2006
    Publication date: March 31, 2011
    Inventors: Brent Culligan, Takashi Ono, Kiyoshi Nishijima, David Schaufele, Guy Cihi, Charles Browne
  • Publication number: 20110072052
    Abstract: Embodiments of the subject invention comprise a computer based system and methods to collect and compare the attributes of a group of entities using data representing topic data of the entity and interaction data between entities. Embodiments of the invention comprise using minimally invasive means to automatically collect and model both an entity's attributes such as their knowledge/work/interest as well as model the social interactions of the entity together with a means to identify opportunities to influence changes in the entity attributes. Minimally invasive means to collect and model attributes include semantic analysis and topic modeling techniques. Means to model social interactions include social network analysis techniques that can incorporate location data of the entity. Embodiments of the invention further provide a sharable index of the attributes of the entities and the group of entities.
    Type: Application
    Filed: May 27, 2009
    Publication date: March 24, 2011
    Applicant: Aptima Inc.
    Inventors: Bruce Skarin, Andrew Duchon, Paul Allopenna, Rich Dejordy
  • Publication number: 20100262620
    Abstract: In one embodiment, a method comprises defining a set of concepts based on a first set of structured and unstructured data objects, defining a business rule based on the set of concepts, applying the business rule to a second set of structured and unstructured data objects to make a determination associated with that set, and outputting to a display information associated with the determination.
    Type: Application
    Filed: April 14, 2009
    Publication date: October 14, 2010
    Inventor: Rengaswamy Mohan
  • Publication number: 20100223295
    Abstract: Novel tools and techniques for generating and/or implementing an applied semantic knowledgebase. Some tools allow for data integration into coherent, semantically connected networks and for generation of sets of query-based models describing complex functional relationships as sub-networks. In an aspect, an applied semantic knowledgebase may comprise collections of SPARQL network queries describing a specific set of sub-network relationships and their applicable ranges for each element in the query.
    Type: Application
    Filed: April 12, 2010
    Publication date: September 2, 2010
    Applicant: IO INFORMATICS, INC.
    Inventors: Robert A. Stanley, Erich A. Gombocz
  • Patent number: 7698271
    Abstract: A conceptual network generating system that generates a conceptual network showing conceptual relations between words, the conceptual network generating system including: a first searching unit that searches a knowledge source storing search sentences; a first generating unit that analyzes the retrieved first search result sentence; a holding unit that stores the generated first structure information in a memory unit; a second searching unit that searches the knowledge source; a second generating unit that analyzes the retrieved second search result sentence; a calculating unit that calculates similarity between the generated second structure information and the stored first structure information; and a setting unit that generates conceptual network information.
    Type: Grant
    Filed: March 21, 2007
    Date of Patent: April 13, 2010
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Hiroki Yoshimura, Motoyuki Takaai, Hiroshi Masuichi
  • Publication number: 20090327285
    Abstract: Determining a semantic relationship is disclosed. Source content is received. Cluster analysis is performed at least in part by using at least a portion of the source content. At least a portion of a result of the cluster analysis is used to determine the semantic relationship between two or more content elements comprising the source content.
    Type: Application
    Filed: August 31, 2009
    Publication date: December 31, 2009
    Applicant: Apple, Inc.
    Inventors: Philip Andrew Mansfield, Michael Robert Levy, Yuri Khramov, Darryl Will Fuller
  • Publication number: 20090287678
    Abstract: A system, method and computer program product for providing answers to questions based on any corpus of data. The method facilitates generating a number of candidate passages from the corpus that answer an input query, and finds the correct resulting answer by collecting supporting evidence from the multiple passages. By analyzing all retrieved passages and each passage's metadata in parallel, an output plurality of data structures including candidate answers is generated. Then, by each of a plurality of parallel operating modules, supporting passage retrieval operations are performed upon the set of candidate answers, and for each candidate answer, the data corpus is traversed to find those passages having the candidate answer in addition to query terms. All candidate answers are automatically scored using the supporting passages by a plurality of scoring modules, each producing a module score.
    Type: Application
    Filed: May 14, 2008
    Publication date: November 19, 2009
    Applicant: International Business Machines Corporation
    Inventors: Eric W. Brown, David Ferrucci, Adam Lally, Wlodek W. Zadrozny
  • Publication number: 20090276397
    Abstract: A system and method are disclosed for analyzing, deconstructing, reconstructing, and repurposing rhetorical content. A system that incorporates teachings of the present disclosure may include, for example, a content management system (400) having a database (404), and a controller (402) for managing the database. The controller can be programmed to retrieve (702) at least one of a first plurality of records, each including content, retrieve (706) rhetorical libraries, identify (714) patterns between the rhetorical libraries and the content of each record, and deconstruct (718) the content into one or more rhetorical topics according to the patterns identified. Additional embodiments are disclosed for analyzing, deconstructing, reconstructing, and repurposing content.
    Type: Application
    Filed: July 13, 2009
    Publication date: November 5, 2009
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Nabha V. Rege, John Neil Cobb, Yeow Loong Lee, Lee Alan Cobb, Kristen Jane Sebastian
  • Publication number: 20090265162
    Abstract: A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
    Type: Application
    Filed: June 30, 2009
    Publication date: October 22, 2009
    Inventors: Tony Ezzat, Evandro B. Gouvea
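The partition-and-select procedure in the abstract above can be sketched as below. This is a minimal sketch, not the application's method: the abstract does not define the cost, so `toy_cost` is a made-up stand-in that favors reusing particles already in the set, and processing the words in sorted order is likewise an assumption.

```python
def partitionings(word):
    """Yield every way to split `word` into contiguous, non-empty pieces."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        for rest in partitionings(word[i:]):
            yield [word[:i]] + rest

def toy_cost(parts, known):
    """Hypothetical cost: 10 per particle not yet known, plus 1 per piece."""
    return 10 * sum(1 for p in parts if p not in known) + len(parts)

def words_to_particles(words, cost):
    """Add the particles of each word's minimal-cost partitioning to a set."""
    particles = set()
    for word in sorted(set(words)):
        best = min(partitionings(word), key=lambda parts: cost(parts, particles))
        particles.update(best)
    return particles

particles = words_to_particles(["walk", "walking", "ing"], toy_cost)
```

A word of length n has 2^(n-1) partitionings, so exhaustive enumeration like this is only practical for short words.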
  • Publication number: 20090144277
    Abstract: Computer-storage media, computerized methods and systems for classifying character strings within electronic documents are provided. Initially, textual data, which includes one or more character strings, is extracted from an electronic version of a document, typically scanned from a physical document utilizing optical character recognition. The textual data is received at a table-of-contents (TOC) engine that extracts semantic information from the textual data. Sub-engines within the TOC engine analyze the semantic information to determine at least one appropriate classification for character strings within the textual data. Labels selected from a predetermined set of TOC-architecture labels are appended to the character strings according to the appropriate classification. The character strings, and labels appended thereto, are stored in association with each other generating an electronic document file that includes enriched textual data.
    Type: Application
    Filed: December 3, 2007
    Publication date: June 4, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: OREN TRUTNER, BODIN DRESEVIC, SASA GALIC, BOGDAN RADAKOVIC, ALEKSANDAR UZELAC, DEJAN LUKACEVIC