Patents by Inventor Jan H. Pieper

Jan H. Pieper has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Holistic disambiguation for entity name spotting

Patent number: 8856119

Abstract: A method resolves ambiguous spotted entity names in a data corpus by determining an activation level value for each of a plurality of nodes corresponding to a single ambiguous entity name. The activation levels for each of the nodes may be modified by inputting outside domain knowledge corresponding to the nodes to increase the activation value of the nodes, spotting entity names corresponding to the nodes to increase the activation value of the nodes, searching the data corpus to spot newly posted entity names to increase the activation value of the nodes, and searching the data corpus to reduce or deactivate the activation value of the nodes by eliminating false positives. The ambiguous entity name is assigned to the node determined to have the highest activation level and is then outputted to a user.

Type: Grant

Filed: February 27, 2009

Date of Patent: October 7, 2014

Assignee: International Business Machines Corporation

Inventors: Varun Bhagwan, Tyrone W. A. Grandison, Daniel F. Gruhl, Jan H. Pieper
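
The activation-level idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the candidate names, the numeric deltas, and the function `disambiguate` are all hypothetical, and the four evidence sources named in the abstract are reduced to signed adjustments on a per-node score.

```python
def disambiguate(candidates, evidence):
    """Return (winning node, activation levels) for one ambiguous name.

    candidates: node ids that the ambiguous name could refer to.
    evidence:   (node, delta) pairs. Positive deltas stand in for domain
                knowledge or spotted related names; negative deltas stand
                in for eliminated false positives. All values are made up.
    """
    activation = {node: 0.0 for node in candidates}
    for node, delta in evidence:
        if node in activation:
            # Clamp at zero: a node can be deactivated but not go negative.
            activation[node] = max(0.0, activation[node] + delta)
    # Assign the name to the node with the highest activation level.
    winner = max(activation, key=activation.get)
    return winner, activation


candidates = ["Paris (city)", "Paris (person)"]
evidence = [
    ("Paris (city)", 1.0),     # hypothetical domain knowledge
    ("Paris (city)", 0.5),     # related entity name spotted nearby
    ("Paris (person)", 0.4),   # related entity name spotted nearby
    ("Paris (person)", -0.4),  # later judged a false positive
]
winner, levels = disambiguate(candidates, evidence)
print(winner, levels)
```

Keeping the scores in a plain dictionary makes each evidence source an independent update, which mirrors how the abstract lists them as separate ways of modifying the same activation value.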

Method and framework to support indexing and searching taxonomies in large scale full text indexes

Patent number: 8600997

Abstract: A system and method of indexing a plurality of entities located in a taxonomy, the entities comprising sets of terms, comprises receiving terms in an index structure; building a posting list for an entity with respect to the locations of the set of terms defining the entity and data associated with the respective terms; and indexing a name of a group comprising the entities within this group at the location of the entities with the data of the group comprising the name of the respective entity at each location. The building of the posting list comprises storing the location of the term and data associated with the term in an entry in the posting list for the term. The method comprises indexing aliases of the name of the group comprising the term, and using an inverted list index to associate data with each occurrence of an index term.

Type: Grant

Filed: September 30, 2005

Date of Patent: December 3, 2013

Assignee: International Business Machines Corporation

Inventors: Nadav Eiron, Daniel N. Meredith, Joerg Meyer, Jan H. Pieper, Andrew S. Tomkins
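
As a rough illustration of the indexing step described in the abstract above, the sketch below builds an inverted index in which a group name is indexed at the location of each of its entities, carrying the entity name as associated data. It simplifies heavily: every entity is assumed to be a single term, aliases are ignored, and `build_index`, the sample documents, and the sample taxonomy are hypothetical.

```python
from collections import defaultdict


def build_index(documents, taxonomy):
    """Build term -> posting list of (doc_id, position, payload).

    documents: {doc_id: [token, ...]}
    taxonomy:  {group_name: [entity_term, ...]}  (single-term entities only)
    """
    index = defaultdict(list)
    group_of = {
        entity: group
        for group, entities in taxonomy.items()
        for entity in entities
    }
    for doc_id, tokens in documents.items():
        for pos, token in enumerate(tokens):
            # Ordinary posting for the term itself, with no payload.
            index[token].append((doc_id, pos, None))
            group = group_of.get(token)
            if group is not None:
                # Index the group name at the entity's location, with the
                # entity name stored as the data for that occurrence.
                index[group].append((doc_id, pos, token))
    return index


documents = {
    "d1": ["ibm", "ships", "db2"],
    "d2": ["oracle", "responds"],
}
taxonomy = {"vendor": ["ibm", "oracle"]}
index = build_index(documents, taxonomy)
print(index["vendor"])
```

A query for the group name then returns every location where any member entity occurs, and the payload tells the caller which member matched.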

Operating system and file system independent incremental data backup

Patent number: 8595188

Abstract: Embodiments of the invention relate to creating an operating system and file system independent incremental data backup. A first data backup of a source system and a second version of the data on the source system are received. A second data backup of the second version of the data is created by determining differences between the first data backup and the second version of the data. Each portion of the second version of the data that is the same as a portion of the first data backup is referenced in the second data backup. Each portion of the second version of the data that is different from all portions of the first data backup is included in the second data backup. The second data backup is appended to the first data backup to create an incremental data backup.

Type: Grant

Filed: November 6, 2009

Date of Patent: November 26, 2013

Assignee: International Business Machines Corporation

Inventors: Daniel Gruhl, Jan H. Pieper, Mark Andrew Smith
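
The reference-or-include rule in the abstract above can be sketched on raw bytes, which is one way to stay independent of any operating system or file system. The sketch makes several assumptions that are not in the source: fixed-size blocks, a tiny `BLOCK_SIZE` chosen for demonstration, SHA-256 equality treated as content equality, and the hypothetical helpers `split`, `incremental_backup`, and `restore`.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size portions."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def incremental_backup(first_blocks, new_data):
    """Describe new_data relative to the blocks of the first backup."""
    known = {
        hashlib.sha256(block).hexdigest(): i
        for i, block in enumerate(first_blocks)
    }
    second = []
    for block in split(new_data):
        digest = hashlib.sha256(block).hexdigest()
        if digest in known:
            # Same as a portion of the first backup: store a reference.
            second.append(("ref", known[digest]))
        else:
            # Differs from every portion of the first backup: store it.
            second.append(("data", block))
    return second


def restore(first_blocks, second):
    """Rebuild the second version from the first backup plus the increment."""
    return b"".join(
        first_blocks[value] if kind == "ref" else value
        for kind, value in second
    )


first_blocks = split(b"AAAABBBBCCCC")
second = incremental_backup(first_blocks, b"AAAAXXXXCCCC")
restored = restore(first_blocks, second)
print(second, restored)
```

Only the changed block is stored in the second backup; the unchanged blocks cost one small reference each.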

Data deduplication for streaming sequential data storage applications

Patent number: 8407193

Abstract: Data deduplication compression in a streaming storage application is provided. The disclosed deduplication process provides a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium. One implementation involves compressing fully sequential data stored in a data repository to a sequential streaming storage, by: splitting fully sequential data into data blocks; hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data; for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and encoding duplicate data blocks using the in-memory lookup table into data segments.

Type: Grant

Filed: January 27, 2010

Date of Patent: March 26, 2013

Assignee: International Business Machines Corporation

Inventors: Daniel F. Gruhl, Jan H. Pieper, Mark A. Smith
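
The split, hash, and look-up loop described in the abstract above can be sketched as a single forward pass, which is what makes it suitable for sequential media. This is an illustrative sketch only: the fixed block size, the use of SHA-256, the `("new", ...)` and `("dup", ...)` record layout, and the function names `deduplicate` and `extract` are assumptions, not the archive format from the patent.

```python
import hashlib


def deduplicate(stream, block_size=4):
    """Encode a byte string as a sequence of new blocks and duplicate refs."""
    seen = {}     # in-memory lookup table: hash -> index of the unique block
    archive = []  # records are written strictly in order, never revisited
    for i in range(0, len(stream), block_size):
        block = stream[i:i + block_size]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            # Hash match: encode the block as a reference to the earlier one.
            archive.append(("dup", seen[digest]))
        else:
            # No match: remember the hash and store the block itself.
            seen[digest] = len(seen)
            archive.append(("new", block))
    return archive


def extract(archive):
    """Rebuild the original bytes by reading the archive front to back."""
    unique_blocks = []
    out = []
    for kind, value in archive:
        if kind == "new":
            unique_blocks.append(value)
            out.append(value)
        else:
            out.append(unique_blocks[value])
    return b"".join(out)


original = b"abcdabcdwxyzabcd"
archive = deduplicate(original)
print(archive)
```

Both functions read their input once from start to finish, so neither needs to seek backwards in the stored archive; the cost is that `extract` in this sketch holds the unique blocks in memory.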

Method and apparatus for data compression

Patent number: 8380688

Abstract: A method, system, and article for compressing an input stream of uncompressed data. The input stream is divided into one or more data segments. A hash is applied to a first data segment, and an offset and length are associated with this first segment. This hash, together with the offset and length data for the first segment, is stored in a hash table. Thereafter, a subsequent segment within the input stream is evaluated and compared with all other hash entries in the hash table, and a reference is written to a prior hash for an identified duplicate segment. The reference includes a new offset location for the subsequent segment. Similarly, a new hash is applied to an identified non-duplicate segment, with the new hash and its corresponding offset stored in the hash table. A compressed output stream of data is created from the hash table retained on storage media.

Type: Grant

Filed: November 6, 2009

Date of Patent: February 19, 2013

Assignee: International Business Machines Corporation

Inventors: Daniel F. Gruhl, Jan H. Pieper, Mark A. Smith

System for monitoring global online opinions via semantic extraction

Patent number: 8352412

Abstract: A system for transforming domain-specific unstructured data into structured data including an intake platform controlled by feedback from a control platform. The intake platform includes an intake acquisition module for acquiring data building baseline data related to a domain and problem of interest, an intake pre-processing module, an intake language module, an intake application descriptors module, and an intake adjudication module. The control platform includes a control data acquisition module, a control data consistency collator, a control auditor, a control event definition and policy repository, an error resolver, and an output that outputs results of the workflow into structured data enabled to be used in data analytics.

Type: Grant

Filed: February 27, 2009

Date of Patent: January 8, 2013

Assignee: International Business Machines Corporation

Inventors: Alfredo Alba, Varun Bhagwan, Tyrone W. A. Grandison, Daniel F. Gruhl, Jan H. Pieper

Data deduplication for streaming sequential data storage applications

Publication number: 20110185149

Abstract: Data deduplication compression in a streaming storage application is provided. The disclosed deduplication process provides a deduplication archive that enables storage of the archive to, and extraction from, a streaming storage medium. One implementation involves compressing fully sequential data stored in a data repository to a sequential streaming storage, by: splitting fully sequential data into data blocks; hashing content of each data block and comparing each hash to an in-memory lookup table for a match, the in-memory lookup table storing all hashes that have been encountered during the compression of the fully sequential data; for each data block without a hash match, adding the data block as a new data block for compression of fully sequential data; and encoding duplicate data blocks using the in-memory lookup table into data segments.

Type: Application

Filed: January 27, 2010

Publication date: July 28, 2011

Applicant: International Business Machines Corporation

Inventors: Daniel F. Gruhl, Jan H. Pieper, Mark A. Smith

Method and Apparatus for Data Compression

Publication number: 20110113016

Abstract: A method, system, and article for compressing an input stream of uncompressed data. The input stream is divided into one or more data segments. A hash is applied to a first data segment, and an offset and length are associated with this first segment. This hash, together with the offset and length data for the first segment, is stored in a hash table. Thereafter, a subsequent segment within the input stream is evaluated and compared with all other hash entries in the hash table, and a reference is written to a prior hash for an identified duplicate segment. The reference includes a new offset location for the subsequent segment. Similarly, a new hash is applied to an identified non-duplicate segment, with the new hash and its corresponding offset stored in the hash table. A compressed output stream of data is created from the hash table retained on storage media.

Type: Application

Filed: November 6, 2009

Publication date: May 12, 2011

Applicant: International Business Machines Corporation

Inventors: Daniel F. Gruhl, Jan H. Pieper, Mark A. Smith

Operating System and File System Independent Incremental Data Backup

Publication number: 20110113012

Abstract: Embodiments of the invention relate to creating an operating system and file system independent incremental data backup. A first data backup of a source system and a second version of the data on the source system are received. A second data backup of the second version of the data is created by determining differences between the first data backup and the second version of the data. Each portion of the second version of the data that is the same as a portion of the first data backup is referenced in the second data backup. Each portion of the second version of the data that is different from all portions of the first data backup is included in the second data backup. The second data backup is appended to the first data backup to create an incremental data backup.

Type: Application

Filed: November 6, 2009

Publication date: May 12, 2011

Applicant: International Business Machines Corporation

Inventors: Daniel Gruhl, Jan H. Pieper, Mark Andrew Smith

Holistic disambiguation for entity name spotting

Publication number: 20100223292

Abstract: A method resolves ambiguous spotted entity names in a data corpus by determining an activation level value for each of a plurality of nodes corresponding to a single ambiguous entity name. The activation levels for each of the nodes may be modified by inputting outside domain knowledge corresponding to the nodes to increase the activation value of the nodes, spotting entity names corresponding to the nodes to increase the activation value of the nodes, searching the data corpus to spot newly posted entity names to increase the activation value of the nodes, and searching the data corpus to reduce or deactivate the activation value of the nodes by eliminating false positives. The ambiguous entity name is assigned to the node determined to have the highest activation level and is then outputted to a user.

Type: Application

Filed: February 27, 2009

Publication date: September 2, 2010

Applicant: International Business Machines Corporation

Inventors: Varun Bhagwan, Tyrone W. A. Grandison, Daniel F. Gruhl, Jan H. Pieper

System for monitoring global online opinions via semantic extraction

Publication number: 20100223226

Abstract: A system for transforming domain-specific unstructured data into structured data including an intake platform controlled by feedback from a control platform. The intake platform includes an intake acquisition module for acquiring data building baseline data related to a domain and problem of interest, an intake pre-processing module, an intake language module, an intake application descriptors module, and an intake adjudication module. The control platform includes a control data acquisition module, a control data consistency collator, a control auditor, a control event definition and policy repository, an error resolver, and an output that outputs results of the workflow into structured data enabled to be used in data analytics.

Type: Application

Filed: February 27, 2009

Publication date: September 2, 2010

Applicant: International Business Machines Corporation

Inventors: Alfredo Alba, Varun Bhagwan, Tyrone W. A. Grandison, Daniel F. Gruhl, Jan H. Pieper