Of Unstructured Textual Data (epo) Patents (Class 707/E17.058)
  • Patent number: 7937243
    Abstract: Techniques for non-disruptive embedding of specialized elements are disclosed. In one aspect of the techniques, ontology is defined to specify an application domain. A program interface (API) is also provided for creating raw features by a developer. Thus a module is provided for at least one form of statistical analysis within the ontology. The module is configured automatically in a computing device with the API in response to a system consistent with the ontology, wherein the system has no substantial requirement for specialized knowledge of that form of statistical analysis, and the module has no substantial requirement for specialized knowledge of particular functions provided by the system.
    Type: Grant
    Filed: July 7, 2009
    Date of Patent: May 3, 2011
    Assignee: AiLive, Inc.
    Inventors: Wei Yen, Ian Wright, Dana Wilkinson, Xiaoyuan Tu, Stuart Reynolds, William Robert Powers, III, Charles Musick, Jr., John Funge, Daniel Dobson
  • Publication number: 20110093402
    Abstract: Techniques for ensuring acceptability of legal agreements entered into as part of a computer-facilitated workflow, e.g., for accepting a license agreement while installing software on a computer system owned by an organization. If the license has an approved status in an agreements database, the user can accept the license agreement during the software install process. If the license has a disapproved status in the agreements database, the user then rejects the license agreement during the software install process (or an install mechanism may simply abort installation of the software). The process for other computer-facilitated workflows is similar.
    Type: Application
    Filed: June 28, 2010
    Publication date: April 21, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: PRAMOD GUPTA
  • Publication number: 20110087680
    Abstract: A system for selecting electronic advertisements from an advertisement pool to match the surrounding content is disclosed. To select advertisements, the system takes an approach to content match that takes advantage of machine translation technologies. The system of the present invention implements this goal by means of simple and efficient machine translation features that are extracted from the surrounding context to match with the pool of potential advertisements. Machine translation features are used as features for training a machine learning model. In one embodiment, a ranking SVM (Support Vector Machine) is trained to identify advertisements relevant to a particular context. The trained machine learning model can then be used to rank advertisements for a particular context by supplying the machine learning model with the machine translation feature measures for the advertisements and the surrounding context.
    Type: Application
    Filed: December 16, 2010
    Publication date: April 14, 2011
    Inventors: Vanessa Murdock, Massimiliano Ciaramita, Vassilis Plachouras
  • Publication number: 20110078152
    Abstract: An exemplary embodiment of the present invention provides a method of processing an electronic text document. The method includes obtaining a character from the document. The method also includes obtaining a hash input code from a character map, the hash input code corresponding to the character. The method also includes modifying a hash value based on the hash input code if the hash input code indicates that the character is part of a token, or asserting the hash value if the hash input code indicates that the character is not part of a token.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Inventors: George Forman, Evan R. Kirshenbaum
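
The single-pass token hashing described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the character map contents, the choice of FNV-1a as the hash update, and the names `CHAR_MAP` and `token_hashes` are hypothetical and are not taken from the application.

```python
# Hypothetical character map: ASCII letters and digits map to a nonzero hash
# input code (case-folded); every other character maps to 0, which this
# sketch treats as "not part of a token".
CHAR_MAP = [0] * 256
for c in range(256):
    ch = chr(c)
    if ch.isascii() and ch.isalnum():
        CHAR_MAP[c] = ord(ch.lower())

# FNV-1a 64-bit constants (the hash function itself is an assumption).
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK = 0xFFFFFFFFFFFFFFFF


def token_hashes(text):
    """Return one hash per token, computed in a single pass over the text."""
    hashes = []
    h = FNV_OFFSET
    in_token = False
    for ch in text:
        code = CHAR_MAP[ord(ch)] if ord(ch) < 256 else 0
        if code:
            # Character is part of a token: fold its code into the hash.
            h = ((h ^ code) * FNV_PRIME) & MASK
            in_token = True
        elif in_token:
            # Character ends a token: emit ("assert") the hash and reset.
            hashes.append(h)
            h = FNV_OFFSET
            in_token = False
    if in_token:
        hashes.append(h)
    return hashes
```

Because case folding lives in the character map, `token_hashes("Hello")` and `token_hashes("HELLO!")` yield the same single hash without a separate lowercasing pass.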
  • Publication number: 20110029545
    Abstract: An improved search engine, for a computing device or computer network, utilizes search strings comprising complete words and numbers representing a syllable count for each unknown word. Pattern-matching algorithms are utilized to search a document database for documents that match the input search strings. The document database is constructed by analyzing a number of documents, utilizing document-analyzing algorithms. In one embodiment, each database record comprises a document that has been analyzed into one or more groups of word sequences. Each word sequence comprises an ordered list of words in the word sequence, as well as a corresponding ordered list of the syllable count for each word in the word sequence. The syllabic search engine can be implemented in different ways, such as through a software application, operating system, network software, or a custom software module. Improved computers and computer networks for providing a syllabic search function are also described.
    Type: Application
    Filed: October 14, 2010
    Publication date: February 3, 2011
    Inventor: Edward O. Clapper
  • Patent number: 7882153
    Abstract: A method for electronic messaging of trade data involves receiving trade data having an unknown format in an electronic message, where trade data includes individual trade data elements, processing the electronic message to resolve the unknown format into a known format, parsing trade data using the known format to obtain individual trade data elements, and storing individual trade data elements in a trade data repository.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: February 1, 2011
    Assignee: Intuit Inc.
    Inventors: Scott D. Cook, Daniel Wernikoff
  • Publication number: 20100299314
    Abstract: Methods and systems for identifying critical fields in documents, for example so that quality improvement efforts can be prioritized on the critical fields. One aspect of the invention concerns a method for improving quality of a data processing operation in a plurality of documents. A set of documents is sampled. An error rate for fields in the documents is estimated based on the sampling. Critical fields are identified based on which fields have error rates higher than a threshold.
    Type: Application
    Filed: May 25, 2010
    Publication date: November 25, 2010
    Inventors: Arijit Sengupta, Brad A. Stronger
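
A minimal sketch of the sampling step in this abstract, assuming that a verified reference value is available for each sampled field (the abstract does not say how errors are detected). The function name `critical_fields` and its parameters are hypothetical.

```python
import random


def critical_fields(documents, reference, threshold, sample_size, seed=0):
    """Estimate per-field error rates from a sample and return the fields
    whose estimated rate exceeds the threshold.

    documents and reference map doc_id -> {field: value}; reference holds
    the verified values (an assumption of this sketch).
    """
    rng = random.Random(seed)
    ids = sorted(documents)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    errors, totals = {}, {}
    for doc_id in sample:
        for field, value in documents[doc_id].items():
            totals[field] = totals.get(field, 0) + 1
            if value != reference[doc_id].get(field):
                errors[field] = errors.get(field, 0) + 1
    rates = {f: errors.get(f, 0) / totals[f] for f in totals}
    return {f: r for f, r in rates.items() if r > threshold}
```

The returned mapping pairs each critical field with its estimated error rate, so that quality efforts can be ordered by rate.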
  • Publication number: 20100268691
    Abstract: The invention relates to a framework system and methods for connecting a plurality of tools. The system comprises a plug-in mechanism configured to dynamically load the plurality of tools, a data pool having storage space configured to store data sets associated with the plurality of tools, a linking mechanism configured to establish communications links between the loaded plurality of tools to enable coordinated operation of the loaded plurality of tools, a session component configured to record the process history of the operations of the loaded plurality of tools and the system states corresponding to the process history of the operations, and an annotation module configured to associate user-provided data corresponding to one or more of the stored data sets.
    Type: Application
    Filed: June 8, 2010
    Publication date: October 21, 2010
    Applicant: UNIVERSITY OF MASSACHUSETTS
    Inventors: Georges Grinstein, Alexander Gee, Urska Cvek, Howard Goodell, Hongli Li, Min Yu, Jianping Zhou, Vivek Gupta, Mary Beth Smrtic, Christine Lawrence, Chih-Hung Chiang
  • Publication number: 20100235365
    Abstract: Systems and methods are described for sorting information in order O(n) time using O(n) space and searching for information in that sorted list in order O(1) time by using one single-dimensioned array, without the use of other data structures and techniques, parallel processing, recursion, or other sorting algorithms.
    Type: Application
    Filed: March 13, 2009
    Publication date: September 16, 2010
    Inventor: Marvon M. Newby, JR.
  • Publication number: 20100228730
    Abstract: A set of tags can be identified from a first set of tagged documents in a first repository. A set of tags can be identified from a second set of tagged documents in a second repository. Access to documents in the second repository can be more restrictive than access to documents in the first repository. For each of a subset of tags in the first set and/or the second set, a number of steps can occur. A ratio can be determined of tag instances in the second repository compared to tag instances in the first repository. It can be determined whether the ratio exceeds a previously determined threshold. When the threshold is exceeded, an indicator of at least one tagged document associated with the tag can be changed to indicate that the tagged document is likely to contain sensitive content.
    Type: Application
    Filed: March 5, 2009
    Publication date: September 9, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Muller, Tolga Oral, Andrew L. Schirmer
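
The tag-ratio test in this abstract might look something like the sketch below. The function name `flag_sensitive`, the dictionary-based repositories, and the treatment of tags that never occur in the less restrictive repository (an infinite ratio) are assumptions made for illustration only.

```python
from collections import Counter


def flag_sensitive(open_docs, restricted_docs, threshold):
    """Return the ids of documents carrying a tag whose
    restricted-to-open instance ratio exceeds the threshold.

    Both arguments map doc_id -> list of tags.
    """
    open_counts = Counter(t for tags in open_docs.values() for t in tags)
    restricted_counts = Counter(
        t for tags in restricted_docs.values() for t in tags
    )
    hot_tags = set()
    for tag, n_restricted in restricted_counts.items():
        n_open = open_counts.get(tag, 0)
        # Assumption: a tag absent from the open repository has an
        # unbounded ratio and therefore always exceeds the threshold.
        ratio = n_restricted / n_open if n_open else float("inf")
        if ratio > threshold:
            hot_tags.add(tag)
    all_docs = {**open_docs, **restricted_docs}
    return {doc for doc, tags in all_docs.items() if hot_tags & set(tags)}
```

A document in the open repository that carries a "hot" tag is flagged as well, which is the case the abstract appears to be most concerned with.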
  • Publication number: 20100228794
    Abstract: A technique for dynamic integration and semantic analysis of structured data and unstructured textual data including: defining and selecting static attributes and dynamic attributes from structured data, embedding static and dynamic views of the selected corresponding attributes in an annotated document, linking the unstructured textual data with the structured data using the defined static and dynamic attributes, populating an annotated document structure of multiple annotated documents, performing semantic analysis of a query across the unstructured textual data and structured data, querying the annotated document structure to provide query results satisfying the static part of the query, processing static and dynamic parts of the query by querying structured data and the annotated document structure, as appropriate, and providing a combined query processing result satisfying the dynamic and static parts of the query. Other embodiments are also disclosed.
    Type: Application
    Filed: February 25, 2009
    Publication date: September 9, 2010
    Applicant: International Business Machines Corporation
    Inventors: Sourashis Roy, Himanshu Gupta, Hiroki Oya, Mukesh Kumar Mohania, Inagaki Iwao
  • Publication number: 20100191782
    Abstract: A method of assigning content with an entry in a directory includes parsing the content into text phrases. Mappings between each entry in the directory and information in name fields of the directory are determined. Name proposals for a phrase are determined using the mappings. Each name proposal identifies a potential match between the content and one or more entries in the directory. The content is assigned to an entry in the directory associated with a name proposal of the one or more name proposals.
    Type: Application
    Filed: January 29, 2009
    Publication date: July 29, 2010
    Inventor: Michael J. Brzozowski
  • Publication number: 20100191749
    Abstract: A method of externally sorting large files in a computer system is presented. The contents of the input file to be sorted are investigated in order to identify presorted portions thereof. The presorted portions of the input file as thus identified are incorporated as sorted strings into an external sortwork file, by rearranging directory information rather than physically transferring data. If merging is necessary, the data may then be merged by a procedure wherein blocks of sorted data to be merged are incorporated into an output (sortout) file, by rearranging directory information rather than physically transferring sorted blocks to the sortout file. As a result of the process, portions of sorted data incorporated into the sortout file may physically remain in external storage space allocated to the input file, and/or in external space allocated to sortwork, thereby eliminating or reducing reading and writing from disk during sort-merge processing.
    Type: Application
    Filed: February 5, 2010
    Publication date: July 29, 2010
    Inventor: Peter Chi-Hsiung Liu
  • Publication number: 20100174975
    Abstract: Some embodiments provide a method for analyzing an unstructured document that includes a number of glyphs. The method identifies boundaries between sets of glyphs. The method identifies that several of the boundaries form a table. The method defines a tabular structural element based on the table. The tabular structural element includes several cells arranged in a plurality of rows and columns, each of which includes an associated set of glyphs.
    Type: Application
    Filed: June 7, 2009
    Publication date: July 8, 2010
    Inventors: Philip Andrew Mansfield, Michael Robert Levy
  • Publication number: 20100153408
    Abstract: A method of and apparatus for storage and retrieval of resume images in a manner which preserves the appearance, organization, and information content of the original document. In addition, summaries or “outlines” of resume images, broken down into multiple fields, are stored, and can be searched field by field. A user interface is provided which is based on a familiar paper-based method already in common use, thus reducing the training required to effectively use the system.
    Type: Application
    Filed: February 22, 2010
    Publication date: June 17, 2010
    Inventors: Richard L. Hartman, Mary M. Hartman, Roy P. Massena
  • Publication number: 20100114882
    Abstract: Search results may be provided to a user. A search query may be received from the user. A query feature vector may be formed for the search query. The query feature vector may be compared with news feature vectors associated with documents related to current events. An augmented query feature vector may be formed based on results of the comparison of the query feature vector with the news feature vectors. The augmented query feature vector may be compared with feature vectors related to target documents. Search results that include target documents may be identified based on results of the comparison of the augmented query feature vector with the feature vectors related to the target documents. The user may be made able to perceive at least some of the identified search results.
    Type: Application
    Filed: July 20, 2007
    Publication date: May 6, 2010
    Applicant: AOL LLC
    Inventors: Anthony Wiegering, Harmannus Vandermolen, Karen Howe, Michael Sommers
  • Patent number: 7712054
    Abstract: Methods and apparatus, including computer program products, for populating a table in a business application. A computer-implemented method of displaying information on a computer display device includes displaying a first view on the display device, the first view including dropdown values and data in a table of rows and columns, the data received from a table node data structure stored in a memory, the table node data structure including node elements and attributes, and generating a second view in response to a change in a dropdown value by repopulating the table of rows and columns using attribute identifications (IDs).
    Type: Grant
    Filed: October 14, 2005
    Date of Patent: May 4, 2010
    Assignee: SAP AG
    Inventor: Peter Vignet
  • Publication number: 20100107164
    Abstract: Process management involves facilitating the application of a user action to an electronic document that changes a state of a thread. The thread includes data that collectively describes states and relationships of interrelated tasks of a process. Metadata of the electronic document is changed to reflect the changed state of the thread. The changed metadata is communicated via an electronic messaging operation of the process to update the changed state of the thread.
    Type: Application
    Filed: October 24, 2008
    Publication date: April 29, 2010
    Inventors: OSKARI KOSKIMIES, ANSSI KARHINEN, HARRI VEIKKO HEIKKILA
  • Publication number: 20100095203
    Abstract: In one embodiment, a method includes obtaining a first document that includes at least a first section, and displaying the first document on a display screen. The method also includes determining when the first section has been consumed. Such a determination includes a determination of whether the first section has been displayed on the display screen. If the first section is determined to have been displayed on the display screen, the method also includes providing an indication that the first section has been consumed.
    Type: Application
    Filed: October 15, 2008
    Publication date: April 15, 2010
    Applicant: Cisco Technology, Inc.
    Inventors: John Toebes, Lisa Bobbitt
  • Publication number: 20100083092
    Abstract: In a database application executing on a computer system, a database table view is represented by a structured object located on a storage device coupled to the computer system. Responsive to a user pasting content (e.g., spreadsheet data) onto the view of the table, the database application determines if the content to be pasted extends beyond the number of rows (records in a database) or columns (database fields) currently displayed in the table view. If the content extends beyond the number of rows or columns currently displayed, the database application automatically adds one or more records or fields to the structured object on the storage device, and updates the table view to display one or more rows or columns corresponding to the records or fields added to the structured object.
    Type: Application
    Filed: September 30, 2008
    Publication date: April 1, 2010
    Applicant: APPLE INC.
    Inventors: Geoff Schuller, Yan Guo
  • Publication number: 20100070515
    Abstract: Any client application uses a namespace application to resolve its pathname in order to reference a computer file. Computer files are stored in a fixed-content storage cluster and are accessed by retrieving a unique identifier for the computer file using the namespace application. Any type of pathname scheme from any client application is supported by the namespace. The namespace application uses a bindings table to record bindings between objects including the start date and end date for each binding, and direction and separator data used in the pathname scheme. An attribute table in the namespace keeps track of each attribute and its value for each object of the namespace including a start date and an end date for each attribute. The namespace provides syntactic generality in that any pathname scheme of a client application can be resolved to identify a unique computer file in the storage cluster.
    Type: Application
    Filed: August 24, 2009
    Publication date: March 18, 2010
    Applicant: CARINGO, INC.
    Inventors: James E. Dutton, Laura Arbilla, James B. Casey, JR., James M. Morrison
  • Publication number: 20100017377
    Abstract: A process performs multiple evaluations of text simultaneously. There are multiple counters, each with pattern-amount pairs. The pattern-amount pairs are accumulated into a single finite-state machine, with each state having a list of (counter, value) pairs instead of a single value. While the finite-state machine is applied to text, a score for each counter is accumulated by summing values for the counter from value lists of visited states. With one state transition per character, evaluating text using one finite-state machine for multiple counters is more efficient than using separate finite-state machines for counters or patterns.
    Type: Application
    Filed: September 30, 2009
    Publication date: January 21, 2010
    Applicant: AVAYA INC.
    Inventor: ERIC THEODORE BAX
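
One way to picture the combined machine in this abstract is an Aho-Corasick automaton whose states carry lists of (counter, value) pairs. Note the substitution: the sketch below follows failure links at match time rather than precomputing exactly one transition per character as the abstract describes, and the names `build_machine` and `score` are hypothetical. Patterns are assumed to be non-empty.

```python
from collections import defaultdict, deque


def build_machine(counters):
    """counters: {counter_name: {pattern: amount}} -> (goto, fail, values)."""
    goto = [{}]    # goto[state][char] -> next state
    values = [[]]  # values[state] -> list of (counter, amount) pairs
    for counter, pairs in counters.items():
        for pattern, amount in pairs.items():
            state = 0
            for ch in pattern:
                if ch not in goto[state]:
                    goto.append({})
                    values.append([])
                    goto[state][ch] = len(goto) - 1
                state = goto[state][ch]
            values[state].append((counter, amount))
    # Breadth-first pass: compute failure links and merge value lists so
    # each state also reports patterns that end at one of its suffixes.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            values[nxt] = values[nxt] + values[fail[nxt]]
            queue.append(nxt)
    return goto, fail, values


def score(text, machine):
    """Run the text through the machine once, accumulating every counter."""
    goto, fail, values = machine
    totals = defaultdict(int)
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for counter, amount in values[state]:
            totals[counter] += amount
    return dict(totals)
```

All counters share one pass over the text, which is the efficiency argument the abstract makes against running a separate machine per counter or per pattern.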
  • Publication number: 20090313230
    Abstract: A computing job information managing device assigns computing job identification information independent of an existing computing job controller to a computing job. By a matching process, the device associates a job in which a computer executes computation with a job uniquely assigned identification information by the computing job information managing device. A terminal transmits a request to acquire data of the in-progress data of the computation or the result of the computation to the computing job information managing device, thereby acquiring a combination of the address information about a computer needed to use an interface provided by a program being executed in a computer and the number of an available port.
    Type: Application
    Filed: August 19, 2009
    Publication date: December 17, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Koichi SHIMIZU
  • Publication number: 20090259670
    Abstract: In one embodiment, the present invention includes a method for conditioning semi-structured text to enhance its use as a data source for an analytical processing tool. In general, the method involves analyzing the semi-structured text to identify portions of text (referred to herein as sub-documents) that exhibit a repetitive characteristic. Next, for each sub-document identified, the semi-structured text is integrated, for example, by filtering the text for relevant words, removing stop words, stemming certain words, adding or replacing certain words with synonyms, modifying the spelling of certain words, and/or resolving certain homonyms based on a document class assigned to the semi-structured text, and so on. Once integrated, the sub-documents are mapped to existing structures defined for the document class and/or sub-document type.
    Type: Application
    Filed: April 14, 2008
    Publication date: October 15, 2009
    Inventor: William H. Inmon
  • Publication number: 20090248726
    Abstract: A method and apparatus for accessing, processing and manipulating data in an OLAP database. According to one aspect, the present invention comprises a user interface configured for accessing, processing and manipulating data in an OLAP cube. According to another aspect, the present invention comprises a calculation engine for manipulating and managing data in the OLAP cube.
    Type: Application
    Filed: March 31, 2008
    Publication date: October 1, 2009
    Inventors: Paul Grant BARBER, Robert John WALKER
  • Publication number: 20090248762
    Abstract: Systems and methods for performing hierarchical storage operations on electronic data in a computer network are provided. In one embodiment, the present invention may store electronic data from a network device to a network attached storage (NAS) device pursuant to certain storage criteria. The data stored on the NAS may be migrated to a secondary storage and a stub file having a pointer pointing to the secondary storage may be put at the location the data was previously stored on the NAS. The stub file may redirect the network device to the secondary storage if a read request for the data is received from the network device.
    Type: Application
    Filed: May 18, 2009
    Publication date: October 1, 2009
    Applicant: CommVault Systems, Inc.
    Inventors: Anand Prahlad, Jeremy Schwartz
  • Publication number: 20090240737
    Abstract: The present invention concerns an appliance, a process and a computer programme product for the processing of unstructured or semi-structured digital data in a file system. In order to create an appliance, a process and a computer programme product which allow simple, reliable, high-performance and purpose-oriented management of every manner of digital, stored, unstructured data, it is proposed that it is functionally extended by providing a framework for further external logic to be inserted in order to modify the filesystem's behaviour and/or a structure is imposed onto unstructured or semi-structured data in real time by enhancing existing namespace semantics and/or metadata and data are processed independently by physically and logically separating namespace and block handlers.
    Type: Application
    Filed: December 12, 2005
    Publication date: September 24, 2009
    Applicant: SMAPPER TECHNOLOGIES GMBH
    Inventors: Mark Hardisty, Thiel Gunther
  • Publication number: 20090235163
    Abstract: A data communication device capable of facilitating file management compared to conventional methods is provided. The data communication device includes an image file storage portion for memorizing an image file to be sent to a user at the other end, a transmission information setting portion for setting transmission information necessary for sending the image file to the user at the other end, a file combining portion for generating a composite file by adding the transmission information to the image file and an e-mail message transmission portion for sending the generated composite file to the user at the other end.
    Type: Application
    Filed: May 15, 2009
    Publication date: September 17, 2009
    Applicant: Konica Minolta Business Technologies, Inc.
    Inventor: Kagumi Moriwaki
  • Publication number: 20090187567
    Abstract: A system and method are provided for comparing portions of document text with potential citation components, determining if individual portions correspond to a citation component, and determining if a set of portions corresponds to a valid citation pattern. A set of valid citation patterns is provided. Each citation pattern may include a specified combination of citation components. The invention further relates to identifying potential citation components from text in a document and analyzing a pattern of the identified citation components by comparing the pattern to a set of stored citation patterns to determine if the potential citation is a type of citation and, if so, whether it is a valid (and/or invalid) citation pattern. Once citation patterns have been determined in the document, annotations may be inserted into the document, and subsequent action may be taken, for example, generating a list of citations, providing research services, error-handling, and/or providing other options related to the citations.
    Type: Application
    Filed: January 18, 2008
    Publication date: July 23, 2009
    Applicant: Citation Ware LLC
    Inventor: Tony ROLLE
  • Publication number: 20090182714
    Abstract: A sorting apparatus and method for sorting units into a unit storage structure in accordance with a pre-determined order, the sorting apparatus comprising a unit search structure containing a record of units in the unit storage structure, and a unit location pointer structure containing location pointers for units in the unit storage structure, wherein the sorting apparatus receives a unit being sorted, the unit search structure reads the unit being sorted, uses its record of units in the unit storage structure to search for a closest matching unit to the unit being sorted, accesses the unit location pointer structure and retrieves a location pointer for the closest matching unit, and the sorting apparatus uses the location pointer of the closest matching unit to access the unit storage structure and to place the unit being sorted into the unit storage structure in an appropriate position in accordance with the pre-determined order.
    Type: Application
    Filed: December 5, 2006
    Publication date: July 16, 2009
    Inventors: Sakir Sezer, Kieran McLaughlin
  • Publication number: 20090182754
    Abstract: A method for parsing a text file defines a tree pattern and a plurality of character string patterns. A tree structure corresponding to the text file is determined according to the tree pattern, and the desired data are retrieved from the text file according to the character string patterns. The retrieved desired data are output into a storage system.
    Type: Application
    Filed: January 12, 2009
    Publication date: July 16, 2009
    Applicants: HONG FU JIN PRECISION INDUSTRY(ShenZhen) CO., LTD., HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: CHUNG-I LEE, CHIEN-FA YEH, CHIU-HUA LU, XIAO-DI FAN, XIAO-PING ZHANG
  • Publication number: 20090182744
    Abstract: A method of analyzing a string-pattern includes defining a minimum length (Lmin_1) of substrings (STR_A_B) to be considered; defining a maximum length (Lmax_1) of substrings (STR_A_B) to be considered; with a computer, searching the string-pattern for substrings (STR_A_B) with a length in an interval between the minimum length (Lmin_1) and the maximum length (Lmax_1); counting an occurrence (Occ_A_B) of each substring (STR_A_B) found with a length in the interval between the minimum length (Lmin_1) and the maximum length (Lmax_1); and pruning away a number of the substrings (STR_A_B) that meet one or more criteria. The criteria are selected from the group consisting of (1) being contained inside the string-pattern in a subset (SET_A) of substrings (STR_A_B), (2) being shorter than the string-pattern, (3) occurring with a same frequency as the string-pattern, and combinations thereof.
    Type: Application
    Filed: January 9, 2009
    Publication date: July 16, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Andreas Arning, Roland Seiffert
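
The count-then-prune procedure in this abstract can be illustrated with the sketch below. It rests on one reading of the pruning criteria, namely that a substring is dropped when a longer counted substring contains it and occurs equally often; that interpretation, and the name `frequent_substrings`, are assumptions rather than the application's wording.

```python
from collections import Counter


def frequent_substrings(text, lmin, lmax):
    """Count every substring with lmin <= length <= lmax, then prune the
    substrings that a longer, equally frequent substring already covers."""
    counts = Counter(
        text[i:i + n]
        for n in range(lmin, lmax + 1)
        for i in range(len(text) - n + 1)
    )
    kept = {}
    for sub, occ in counts.items():
        redundant = any(
            len(longer) > len(sub) and sub in longer and counts[longer] == occ
            for longer in counts
        )
        if not redundant:
            kept[sub] = occ
    return kept
```

For example, in `"abcabcabc"` with lengths 2 to 3, `"ab"` and `"bc"` each occur three times but only inside `"abc"`, which also occurs three times, so only the longer substring is kept.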
  • Publication number: 20090177663
    Abstract: Software, devices and methods allowing varied mobile devices to interact with server side software applications are disclosed. Data from an application executing at a computing device is presented at a remote wireless device by providing the device an application definition file, containing definitions for a user interface format for the application at the wireless device; the format of network messages for exchange of data generated by the application; and a format for storing data related to the application at the wireless device. Using these definitions, the wireless device may receive data from the application in accordance with the definition and present an interface for the application. Preferably, the application definition file is an XML file. Similarly, application specific network messages provided to the device are also formed using XML. Data from the application may be presented at the mobile device by virtual machine software that uses the application definition file.
    Type: Application
    Filed: December 31, 2008
    Publication date: July 9, 2009
    Inventors: Steven J. HULAJ, Tim Neil
  • Publication number: 20090172001
    Abstract: A method for reducing the size of a DFA associated with a regular expression separates the functions of locating subexpressions within the DFA and determining if the located subexpressions satisfy a regular expression. For example, the functions of (1) locating subexpressions in a range asserting expression and, (2) determining whether the subexpressions satisfy the range of the range asserting expression are partitioned. In one embodiment, a first component may locate the subexpressions in a data stream using one or more DFAs, while a second component determines if the located subexpressions satisfy the range. In this embodiment, because the DFAs are not configured to determine a relationship between subexpressions, such as a range between subexpressions, the size of the resultant DFA may be significantly reduced.
    Type: Application
    Filed: March 11, 2009
    Publication date: July 2, 2009
    Applicant: TARARI, INC.
    Inventor: Robert J. McMillen
  • Publication number: 20090157720
    Abstract: The claimed subject matter provides systems and/or methods for normalizing document representations for use with Naïve Bayes. The system can include devices and components that determine norms associated with documents by aggregating absolute term weight values associated with the documents, further ascertain term weights for features associated with the documents, and thereafter divide the term weights for the features associated with the documents by the norms associated with the documents to produce a normalized document representation that can be utilized by arbitrary linear classifiers.
    Type: Application
    Filed: December 12, 2007
    Publication date: June 18, 2009
    Applicant: Microsoft Corporation
    Inventors: Aleksander Kolcz, Wen-tau Yih
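
The normalization in this abstract amounts to dividing each term weight by the sum of absolute term weights for the document. A minimal sketch follows, assuming a document is represented as a term-to-weight dictionary; the name `l1_normalize` and the handling of an all-zero document are hypothetical choices.

```python
def l1_normalize(term_weights):
    """Divide each term weight by the document norm, where the norm is the
    sum of the absolute term weights."""
    norm = sum(abs(w) for w in term_weights.values())
    if norm == 0:
        # Assumption: an empty or all-zero document is returned unchanged.
        return dict(term_weights)
    return {term: w / norm for term, w in term_weights.items()}
```

After normalization the absolute weights of a non-zero document sum to 1, so long and short documents contribute on a comparable scale to a linear classifier.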
  • Publication number: 20090157735
    Abstract: A method and apparatus for obtaining access to services of service providers. In one embodiment, the method comprises requesting a desired service through a foreign service provider, generating a hash tree and generating a digital signature on a root value of the hash tree, sending the digital signature and the root value to the foreign service provider, providing one or more tokens to the foreign service provider with the next packet if the foreign service provider accepts the signature and continuing to use the service while the foreign service provider accepts tokens.
    Type: Application
    Filed: February 5, 2009
    Publication date: June 18, 2009
    Inventors: Craig B. Gentry, Zulfikar Amin Ramzan
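
    The hash-tree step can be sketched as follows, under stated assumptions: SHA-256 as the hash function, an odd node paired with a copy of itself, and a hypothetical function name `merkle_root`. The abstract does not specify the signature scheme or the token format, so both are omitted here.

```python
import hashlib

def _h(data: bytes) -> bytes:
    # SHA-256 is an assumption; the abstract does not name a hash function.
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute the root value of a binary hash tree over byte-string leaves."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: pair an odd node with a copy of itself.
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"token-1", b"token-2", b"token-3"])
```

    Only the root needs to be signed once; the other values in the tree can later be checked against that single signed root.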
  • Publication number: 20090144295
    Abstract: A computer readable storage medium includes executable instructions to receive a semantic abstraction describing at least one underlying data source. The semantic abstraction includes at least one dimension with at least one dimension value. Unstructured text is parsed into parsed text units. A dimension value is matched to a parsed text unit to form matched content. An indication of the matched content is stored.
    Type: Application
    Filed: November 30, 2007
    Publication date: June 4, 2009
    Applicant: BUSINESS OBJECTS S.A.
    Inventors: Gilles Vergnory Mion, Jean-Yves Cras
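
    A minimal sketch of the matching step, assuming the semantic abstraction can be represented as a dictionary from dimension names to lists of values and that sentences serve as the parsed text units. The function name `match_dimension_values` and the sample data are hypothetical.

```python
import re

def match_dimension_values(text, dimensions):
    """Split text into sentence-like units and record which dimension values occur.

    Returns (units, matches), where each match is (unit_index, dimension, value).
    """
    units = [u.strip() for u in re.split(r"[.!?]+", text) if u.strip()]
    matches = []
    for index, unit in enumerate(units):
        lowered = unit.lower()
        for dimension, values in dimensions.items():
            for value in values:
                # Whole-word, case-insensitive match of the dimension value.
                if re.search(r"\b" + re.escape(value.lower()) + r"\b", lowered):
                    matches.append((index, dimension, value))
    return units, matches

units, matches = match_dimension_values(
    "Sales in France rose. Germany was flat.",
    {"Country": ["France", "Germany"]},
)
```

    The `matches` list plays the role of the stored indication of matched content: it links each text unit back to a dimension of the underlying data source.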
  • Publication number: 20090112892
    Abstract: A method and system for automatically summarizing fine-grained opinions in digital text are disclosed. Accordingly, a digital text is analyzed for the purpose of extracting all opinion expressions found in the text. Next, the extracted opinion expressions (referred to herein as opinion frames) are analyzed to generate opinion summaries.
    Type: Application
    Filed: October 29, 2007
    Publication date: April 30, 2009
    Inventors: Claire Cardie, Veselin Stoyanov, Yejin Choi, Eric Breck
  • Publication number: 20090100090
    Abstract: The present invention relates to a method and device for generating an ontology instance that classifies documents into structured documents and unstructured documents and automatically generates ontology instances. The method includes collecting documents corresponding to classes of an ontology from the Web; if the collected documents are unstructured documents, extracting inter-entity relationship information from the unstructured documents; if the collected documents are structured documents, extracting inter-entity relationship information from the structured documents; generating ontology instances from the extracted inter-entity relationship information; and mapping the generated ontology instances to corresponding classes of the ontology.
    Type: Application
    Filed: June 27, 2008
    Publication date: April 16, 2009
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Changki LEE, Jihyun Wang, Miran Choi, Myunggil Jang
  • Publication number: 20090083305
    Abstract: The illustrative embodiments provide a system and method for processing a document. A data storage unit is provided to store data corresponding to the document, several documents processed at a previous time, and a set of rules. A rule in the set of rules may include a rule identifier, a directive to proceed to a second rule based on a condition, a specification of a data component, the specification configured to include a data component identifier, a data component attribute, and a directive to proceed to a second specification of a second data component based on a second condition. A rules-based engine is provided that may communicate with the data storage unit and may execute a rule in the set of rules. The set of rules may include rules for parsing, validating, identifying, relating, selecting, extracting, transforming, generating, analyzing, error correcting, reporting, and sending.
    Type: Application
    Filed: September 20, 2007
    Publication date: March 26, 2009
    Inventor: Allen F. Baker
  • Publication number: 20090070361
    Abstract: One or more methods of generating a pseudonymizable document are described. A method comprises receiving a set of subdocuments and generating a first set of random values wherein each subdocument in the document corresponds to a first set random value. A second set of values is generated based on a subdocument and a corresponding value of the first set random value. A set of pseudonyms is generated wherein each subdocument in the document corresponds to at least one pseudonym of the pseudonym set. A third set of values is generated based on the second set of values and the pseudonym set and a summary value is generated based on the third set of values.
    Type: Application
    Filed: September 12, 2007
    Publication date: March 12, 2009
    Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
    Inventors: Stuart HABER, William G. HORNE, Tomas SANDER, Danfeng YAO
  • Publication number: 20090012970
    Abstract: A computer-implemented method for processing a plurality of data items includes defining a set of one or more categories having a corresponding set of conditions that associate the data items with the categories. A sub-categorization request, requesting to divide a category from among the categories into lower-level categories, is accepted from a user. The data items associated with the category are processed responsively to the sub-categorization request, so as to automatically suggest the lower-level categories. The automatically-suggested lower-level categories are presented to the user, and direction with respect to the automatically-suggested lower-level categories is accepted from the user. A hierarchical structure representing the categories is constructed responsively to the direction, by dividing the category into the lower-level categories. Output based on the hierarchical structure is presented to the user.
    Type: Application
    Filed: July 2, 2007
    Publication date: January 8, 2009
    Inventors: Dror Daniel Ziv, Yaron Gvili, Alexander Sokolovsky, Ofer Shochet, Michael Brand
  • Publication number: 20090010542
    Abstract: A system for interactive note-taking is provided having a receiver for receiving interaction data from a note-taking device used to interact with a note-taking form having note-taking information and a plurality of coded tags printed thereon, and a processor for recording or retrieving the note-taking by identifying, from the received interaction data, at least one parameter relating to the note-taking. Each tag encodes data on an identity of the form and a location of that tag on the form. The note-taking device senses the tags and generates the interaction data with data on the sensed form identity and a position of the note-taking device relative to the sensed tags.
    Type: Application
    Filed: September 15, 2008
    Publication date: January 8, 2009
    Inventors: Paul Lapstun, Kia Silverbrook, Jacqueline Anne Lapstun
  • Publication number: 20080313103
    Abstract: A computer software system that provides a requester and a physician with information on available cost savings for therapeutic equivalents to a patient's medication is disclosed. The computer software system can include an information-gathering template, a system administrator module, a search engine, a database and a message delivery module. The information-gathering template can be accessible by a requestor that is paying for the patient's medication and have an input field for accepting a medication name. The system administrator module can receive the medication name from the information-gathering template and instruct the search engine to search the database for a cost associated with the medication name, any therapeutic equivalents to the medication name and their respective cost.
    Type: Application
    Filed: June 16, 2008
    Publication date: December 18, 2008
    Inventors: Frank Burns, Phil St. John, Ramin Kouzehkanani
  • Publication number: 20080313166
    Abstract: Systems, methods, and computer-readable media for generating a research progression summary are provided. A research progression summary provides a snapshot of documents (e.g., articles) that have had a significant impact on a particular field of research, or at least a portion thereof, over time. Research progression sorts through all accessible relevant documents, analyzes the importance of each, and summarizes for presentation only those documents determined to be of particular importance with respect to a topic of interest (i.e., the particular field of research or some portion thereof). In this manner, a researcher can readily determine how the thinking with respect to a particular topic has progressed over time. By way of example only, the research progression summary may focus on one or more of historical developments in a particular field, current developments with respect to a topic of interest, or an overall summary of a particular field/topic.
    Type: Application
    Filed: June 15, 2007
    Publication date: December 18, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: Wei Yu, Amir Padovitz
  • Publication number: 20080306981
    Abstract: In various embodiments, a method for generating documents in native application formats includes receiving a document template as a first document according to a native format. The first document is parsed to generate an Extensible Document Transformation Language (XDTL) template representing the document template. An XDTL execution document is generated based on the XDTL template. A second document is then generated according to the native format based on the XDTL execution document.
    Type: Application
    Filed: June 6, 2007
    Publication date: December 11, 2008
    Applicant: Oracle International Corporation
    Inventors: Xin Jiang, Shirley Hong Zeng, Tomoji Ashitani
  • Publication number: 20080294661
    Abstract: A computer system with a first messaging application communicates a message to another computer system with a second messaging application via a coupling facility storage device. If the message does not exceed a predetermined threshold, the message is put onto the queue in the coupling facility. If the message does exceed a predetermined threshold, the message is put onto a log associated with the first messaging application and readable by the second messaging application. A pointer to the message is put onto the queue in the coupling facility. The pointer can be used to access the message in the log.
    Type: Application
    Filed: April 24, 2008
    Publication date: November 27, 2008
    Inventors: Jose Emir Garza, Stephen James Hobson, Peter Siddall
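
    The threshold rule in this abstract can be sketched with in-memory stand-ins. The class name `SharedQueue`, the use of message length as the threshold measure, and the list-backed log are assumptions for illustration; a real coupling facility and message log are not modeled.

```python
from collections import deque

class SharedQueue:
    """Small messages go on the queue inline; large ones go to a log,
    and only a pointer to the log entry is queued."""

    def __init__(self, threshold):
        self.threshold = threshold  # maximum size (in bytes) kept inline
        self.queue = deque()        # stands in for the coupling facility queue
        self.log = []               # stands in for the sender's readable log

    def put(self, message: bytes):
        if len(message) > self.threshold:
            self.log.append(message)
            self.queue.append(("pointer", len(self.log) - 1))
        else:
            self.queue.append(("inline", message))

    def get(self) -> bytes:
        kind, payload = self.queue.popleft()
        # A pointer entry is resolved by reading the message from the log.
        return self.log[payload] if kind == "pointer" else payload

q = SharedQueue(threshold=4)
q.put(b"hi")
q.put(b"a long message")
```

    The receiver sees the same interface either way; only the storage location of the payload differs.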
  • Publication number: 20080289014
    Abstract: A method and system for efficiently and securely permitting a user to scan electronic documents from a remote multi-function device to a user's home directory. A user can be authenticated via the multi-function device and electronic credentials associated with the user generated, which are utilized to determine the user's home directory. The multi-function device can then produce a customized template that can be selected by the user when accessing rendering/scanning services. The user can then scan a document and electronically store such a document at the home directory via an SMB (Server Message Block) protocol. Home directories can either be determined via an LDAP (Lightweight Directory Access Protocol) or configured on a network interface via a default directory path and the user name.
    Type: Application
    Filed: May 15, 2007
    Publication date: November 20, 2008
    Inventors: Amanda L. Applin, Parul Patel, Michael W. Barrett, Michael Wang, Cynthia Lambert Moskal
  • Publication number: 20080288482
    Abstract: A deduplication algorithm that provides improved accuracy in data deduplication by using aggregate and/or groupwise constraints. Deduplication is accomplished by satisfying as many of these constraints as possible rather than imposing them inflexibly as hard constraints. Additionally, textual similarity between tuples is leveraged to restrict the search space. The algorithm begins with a coarse initial partition of data records and continues by raising the similarity threshold until the threshold splits a given partition. This sequence of splits defines a rich space of alternatives. Over this space, an algorithm finds a partition of the input that maximizes constraint satisfaction. In the context of groupwise aggregation constraints for deduplication, all SQL (structured query language) aggregates are allowed, including summation.
    Type: Application
    Filed: May 18, 2007
    Publication date: November 20, 2008
    Applicant: Microsoft Corporation
    Inventors: Surajit Chaudhuri, Venkatesh Ganti, Shriraghav Kaushik
  • Publication number: 20080281841
    Abstract: A system for analyzing a document in a repository is provided. The system receives a document that includes data and a document type. The document type has an associated physical structure. The system determines a logical structure of the document based in part on the data and selects a subset of the data based on at least one of the group including the associated physical structure and the logical structure. The system also stores a document segment that includes the selected subset of the data.
    Type: Application
    Filed: May 2, 2008
    Publication date: November 13, 2008
    Inventors: Kishore Swaminathan, Scott W. Kurth, William N. Milleker