Patents by Inventor Thomas Hampp-Bahnmueller

Thomas Hampp-Bahnmueller has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dynamic fact contextualization in support of artificial intelligence (AI) model development

Patent number: 12682154

Abstract: Provided are techniques for dynamic fact contextualization in support of AI model development. A template from a plurality of templates is selected, where the template includes definitions for identifying facts. The facts are retrieved from a facts repository based on the definitions. It is determined that the facts are valid based on one or more policies. A FactSheet is generated using the template and the facts. A machine learning model is used to identify one or more deficient facts from the FactSheet. The FactSheet is displayed in a preview with the one or more deficient facts. One or more facts corresponding to the one or more deficient facts are located. The FactSheet is updated to correct the one or more deficient facts with the corresponding facts.

Type: Grant

Filed: March 30, 2023

Date of Patent: July 14, 2026

Assignee: International Business Machines Corporation

Inventors: John Thomas Richards, Thomas Hampp-Bahnmueller, Michael Hind, David John Piorkowski
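
The FactSheet workflow in the abstract above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: a template is assumed to be a list of fact keys, the facts repository a dictionary, and each policy a predicate. The abstract uses a machine learning model to find deficient facts; the sketch swaps in a hypothetical rule that treats missing or empty values as deficient. All function names are illustrative.

```python
from typing import Callable, Iterable

# Hypothetical policy type: returns True if a fact value is acceptable.
Policy = Callable[[str, object], bool]


def generate_factsheet(fact_keys: Iterable[str], repository: dict, policies: list) -> dict:
    """Fill each templated fact from the repository; values failing any policy become None."""
    sheet = {}
    for key in fact_keys:
        value = repository.get(key)
        valid = value is not None and all(policy(key, value) for policy in policies)
        sheet[key] = value if valid else None
    return sheet


def find_deficient(sheet: dict) -> list:
    """Stand-in for the ML model: flag facts that are missing or empty."""
    return [key for key, value in sheet.items() if value is None or value == ""]


def correct_factsheet(sheet: dict, candidate_sources: list) -> dict:
    """Locate a replacement for each deficient fact and update the sheet in place."""
    for key in find_deficient(sheet):
        for source in candidate_sources:
            replacement = source.get(key)
            if replacement is not None and replacement != "":
                sheet[key] = replacement
                break
    return sheet


if __name__ == "__main__":
    sheet = generate_factsheet(
        ["model_name", "accuracy", "training_data"],
        {"model_name": "churn-v2", "accuracy": 1.7},
        [lambda k, v: k != "accuracy" or 0.0 <= v <= 1.0],  # accuracy must lie in [0, 1]
    )
    print(find_deficient(sheet))  # ['accuracy', 'training_data']
    print(correct_factsheet(sheet, [{"accuracy": 0.91}]))
```

The out-of-range accuracy is rejected by the policy, shown as deficient in the "preview", and then replaced from a second source; the missing training-data fact stays deficient because no source supplies it.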

Smart identification of indicator text with full-text search or optimized document analysis

Patent number: 12417306

Abstract: Several aspects for optimizing unstructured document analysis comprise operating a document system, where the document system comprises a plurality of documents comprising unstructured content and a full-text index; receiving a request to identify documents comprising a type of data elements; selecting a sample out of the plurality of documents; determining data elements of the type in the sample of documents; determining an indicator context expression for the type of data elements from the determined data elements of the type; determining a query for searching the full-text index with a search engine using the indicator context expression; and determining the documents in the document system that comply with the determined query.

Type: Grant

Filed: December 19, 2022

Date of Patent: September 16, 2025

Assignee: International Business Machines Corporation

Inventors: Thomas Hampp-Bahnmueller, Michael Baessler, Yannick Saillet
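
A rough sketch of the indicator-text idea, assuming the indicator context expression is simply the set of words that most often precede a matched data element in the sample. The regular expression, the function names, and the in-memory `matching_docs` stand-in for the full-text search engine are all hypothetical.

```python
import random
import re
from collections import Counter
from typing import Optional

WORD = re.compile(r"[a-z]+")  # crude tokenizer for lower-cased text


def select_sample(docs: list, k: int, seed: Optional[int] = None) -> list:
    """Draw a random sample of k documents; only the sample is analyzed in detail."""
    return random.Random(seed).sample(docs, k)


def indicator_terms(sample: list, element_pattern: str, window: int = 1, top_k: int = 3) -> list:
    """Find data elements in the sample and return the words that most often precede them."""
    counts = Counter()
    for doc in sample:
        for match in re.finditer(element_pattern, doc):
            preceding = WORD.findall(doc[:match.start()].lower())
            counts.update(preceding[max(0, len(preceding) - window):])
    return [term for term, _ in counts.most_common(top_k)]


def build_query(terms: list) -> str:
    """Render the indicator context expression as a boolean full-text query string."""
    return " OR ".join(f'"{term}"' for term in terms)


def matching_docs(docs: list, terms: list) -> list:
    """Stand-in for the search engine: documents containing at least one indicator term."""
    wanted = set(terms)
    return [doc for doc in docs if wanted & set(WORD.findall(doc.lower()))]


if __name__ == "__main__":
    docs = ["Customer phone: 555-0100", "Call phone 555-0199 after noon",
            "Meeting notes, no contact details", "Office phone number pending"]
    terms = indicator_terms(docs[:2], r"\d{3}-\d{4}")  # fixed sample for a reproducible demo
    print(build_query(terms))  # "phone"
    print(matching_docs(docs, terms))
```

The last document is returned even though it contains no phone number: an indicator-based query narrows the candidates for full analysis rather than replacing it.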

Optimizing metadata enrichment of data assets

Patent number: 12380074

Abstract: The present disclosure relates to a method of metadata enrichment using an enrichment process comprising multiple enrichment steps. The method comprises determining, for an input data asset, a metadata value descriptive of the input data asset. Characteristics of the metadata value of the input data asset may be determined. At least one informativeness score of the metadata value of the input data asset may be computed using the determined characteristics. Execution of an enrichment step may be skipped if an input characteristic of that enrichment step is not part of the determined characteristics. If the input characteristic of the enrichment step is part of the determined characteristics, the enrichment step may either be adapted and executed or be executed without adaptation. Labels resulting from the executed enrichment steps may be combined to provide one or more labels for the data asset.

Type: Grant

Filed: January 4, 2023

Date of Patent: August 5, 2025

Assignee: International Business Machines Corporation

Inventors: Thomas Hampp-Bahnmueller, Peter Gerstl, Yannick Saillet, Michael Baessler, Albert Maier, Oliver Suhre
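
One possible reading of the skip-or-execute logic, sketched in Python. The characteristic names, the toy informativeness score, and the two example steps are hypothetical; the abstract does not say how characteristics are derived or how a step adapts, so here each step simply receives the score and may change its behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EnrichmentStep:
    name: str
    input_characteristic: str          # characteristic this step requires (hypothetical)
    run: Callable[[dict, float], set]  # (metadata, informativeness score) -> labels


def characteristics_of(metadata: dict) -> set:
    """Derive simple characteristics of an asset's metadata (illustrative rules only)."""
    chars = set()
    if metadata.get("column_name"):
        chars.add("has_column_name")
    if metadata.get("sample_values"):
        chars.add("has_values")
    return chars


def informativeness(chars: set) -> float:
    """Toy informativeness score: share of the two known characteristics that are present."""
    return len(chars) / 2


def enrich(metadata: dict, steps: list) -> tuple:
    chars = characteristics_of(metadata)
    score = informativeness(chars)
    labels, executed = set(), []
    for step in steps:
        if step.input_characteristic not in chars:
            continue  # skip: the input this step needs is not available
        executed.append(step.name)
        labels |= step.run(metadata, score)  # combine labels from all executed steps
    return labels, executed


def name_match(md: dict, score: float) -> set:
    # Adapted step: with little other evidence (low score), accept looser name matches.
    keywords = ("mail",) if score < 1 else ("email",)
    return {"Email"} if any(k in md["column_name"].lower() for k in keywords) else set()


def value_check(md: dict, score: float) -> set:
    return {"Email"} if all("@" in v for v in md["sample_values"]) else set()


STEPS = [EnrichmentStep("name_match", "has_column_name", name_match),
         EnrichmentStep("value_check", "has_values", value_check)]

if __name__ == "__main__":
    print(enrich({"column_name": "E_MAIL"}, STEPS))
```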

Random sampling from a search engine

Patent number: 12189695

Abstract: A method for providing one or more random sample documents from a corpus of documents using a search engine is provided. Providing each of the random sample documents comprises randomly selecting a time window from a set of time windows. A search query is sent to the search engine defining a search for documents of the corpus with time-stamps within the randomly selected time window. In response to the search query, a search result is received from the search engine. The search result comprises a set of the documents of the corpus with time-stamps within the time window. One of the documents in the received set is then randomly selected.

Type: Grant

Filed: September 19, 2023

Date of Patent: January 7, 2025

Assignee: International Business Machines Corporation

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Jojo Joseph, Pavlo Petrenko
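
The sampling loop can be sketched as follows, assuming a search engine that answers time-range queries (modeled here by a hypothetical in-memory class) and equal-width time windows. Retrying when a window returns no hits is an added assumption so that the sketch returns a document whenever one exists.

```python
import random
from datetime import datetime, timedelta


class InMemorySearchEngine:
    """Hypothetical stand-in for a search engine that supports time-range queries."""

    def __init__(self, docs: list):
        self.docs = docs  # list of (doc_id, timestamp) pairs

    def search_time_range(self, start: datetime, end: datetime) -> list:
        return [doc_id for doc_id, ts in self.docs if start <= ts < end]


def make_windows(start: datetime, end: datetime, width: timedelta) -> list:
    """Split [start, end) into consecutive windows of the given width."""
    windows, t = [], start
    while t < end:
        windows.append((t, min(t + width, end)))
        t += width
    return windows


def random_sample(engine, windows: list, rng: random.Random, max_tries: int = 100):
    """Pick a random window, query it, then pick a random hit; retry if the window is empty."""
    for _ in range(max_tries):
        start, end = rng.choice(windows)
        hits = engine.search_time_range(start, end)
        if hits:
            return rng.choice(hits)
    return None


if __name__ == "__main__":
    base = datetime(2020, 1, 1)
    engine = InMemorySearchEngine([(f"doc{i}", base + timedelta(hours=i)) for i in range(48)])
    windows = make_windows(base, base + timedelta(days=2), timedelta(hours=6))
    rng = random.Random(42)
    print([random_sample(engine, windows, rng) for _ in range(5)])
```

Under this scheme a document's selection probability is (1 / number of non-empty windows) × (1 / documents in its window), so the sample is uniform over the corpus only when the windows hold similar numbers of documents.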

DYNAMIC FACT CONTEXTUALIZATION IN SUPPORT OF ARTIFICIAL INTELLIGENCE (AI) MODEL DEVELOPMENT

Publication number: 20240330577

Abstract: Provided are techniques for dynamic fact contextualization in support of AI model development. A template from a plurality of templates is selected, where the template includes definitions for identifying facts. The facts are retrieved from a facts repository based on the definitions. It is determined that the facts are valid based on one or more policies. A FactSheet is generated using the template and the facts. A machine learning model is used to identify one or more deficient facts from the FactSheet. The FactSheet is displayed in a preview with the one or more deficient facts. One or more facts corresponding to the one or more deficient facts are located. The FactSheet is updated to correct the one or more deficient facts with the corresponding facts.

Type: Application

Filed: March 30, 2023

Publication date: October 3, 2024

Inventors: John Thomas Richards, Thomas Hampp-Bahnmueller, Michael Hind, David John Piorkowski

IMPACT SCORE FOR ONTOLOGY CHANGES

Publication number: 20240256591

Abstract: Described are techniques for a re-analysis of assignments of terms to assets. The techniques include detecting a change in a term ontology comprising a plurality of terms, and determining at least one selected from a group consisting of: a domain feature change vector (DFCV) for a domain of the term ontology affected by the change, and a term feature change vector (TFCV) for the term affected by the change. The techniques further include identifying assets for the re-analysis of the assignments of terms, wherein each of the identified assets is associated with an impact score value based on the DFCV and/or the TFCV, and performing the re-analysis of the assignments of terms for the identified assets ordered by the impact score value.

Type: Application

Filed: February 1, 2023

Publication date: August 1, 2024

Inventors: Oliver Suhre, Thomas Hampp-Bahnmueller, Peter Gerstl, Yannick Saillet, Albert Maier, Michael Baessler
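
The abstract does not say how the change vectors are turned into an impact score, so the sketch below assumes one simple rule: an asset's score is the magnitude of its domain's DFCV plus the magnitudes of the TFCVs of its assigned terms. Assets with a zero score are left out, and the rest are returned in the order in which they would be re-analyzed. All names are hypothetical.

```python
import math


def change_vector(old: list, new: list) -> list:
    """Feature change vector: per-feature difference between new and old feature values."""
    return [n - o for o, n in zip(old, new)]


def magnitude(vector: list) -> float:
    return math.sqrt(sum(x * x for x in vector))


def prioritize_assets(assets: dict, tfcv: dict, dfcv: dict) -> list:
    """assets maps asset id -> (domain, assigned terms); returns affected ids, highest impact first."""
    scores = {}
    for asset_id, (domain, terms) in assets.items():
        score = magnitude(dfcv.get(domain, []))
        score += sum(magnitude(tfcv[term]) for term in terms if term in tfcv)
        scores[asset_id] = score
    affected = [a for a, s in scores.items() if s > 0]
    return sorted(affected, key=lambda a: scores[a], reverse=True)


if __name__ == "__main__":
    tfcv = {"Customer ID": change_vector([1.0, 0.0], [1.0, 0.6])}
    dfcv = {"Finance": change_vector([0.2, 0.2], [0.2, 0.5])}
    assets = {"orders": ("Finance", ["Customer ID"]), "ledger": ("Finance", []),
              "staff": ("HR", ["Name"])}
    print(prioritize_assets(assets, tfcv, dfcv))  # ['orders', 'ledger']
```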

SMART IDENTIFICATION OF INDICATOR TEXT WITH FULL-TEXT SEARCH OR OPTIMIZED DOCUMENT ANALYSIS

Publication number: 20240202358

Abstract: Several aspects for optimizing unstructured document analysis comprise operating a document system, where the document system comprises a plurality of documents comprising unstructured content and a full-text index; receiving a request to identify documents comprising a type of data elements; selecting a sample out of the plurality of documents; determining data elements of the type in the sample of documents; determining an indicator context expression for the type of data elements from the determined data elements of the type; determining a query for searching the full-text index with a search engine using the indicator context expression; and determining the documents in the document system that comply with the determined query.

Type: Application

Filed: December 19, 2022

Publication date: June 20, 2024

Inventors: Thomas Hampp-Bahnmueller, Michael Baessler, Yannick Saillet

OPTIMIZING METADATA ENRICHMENT OF DATA ASSETS

Publication number: 20240152494

Abstract: The present disclosure relates to a method of metadata enrichment using an enrichment process comprising multiple enrichment steps. The method comprises determining, for an input data asset, a metadata value descriptive of the input data asset. Characteristics of the metadata value of the input data asset may be determined. At least one informativeness score of the metadata value of the input data asset may be computed using the determined characteristics. Execution of an enrichment step may be skipped if an input characteristic of that enrichment step is not part of the determined characteristics. If the input characteristic of the enrichment step is part of the determined characteristics, the enrichment step may either be adapted and executed or be executed without adaptation. Labels resulting from the executed enrichment steps may be combined to provide one or more labels for the data asset.

Type: Application

Filed: January 4, 2023

Publication date: May 9, 2024

Inventors: Thomas Hampp-Bahnmueller, Peter Gerstl, Yannick Saillet, Michael Baessler, Albert Maier, Oliver Suhre

Analyzing deduplicated data blocks associated with unstructured documents

Patent number: 11921676

Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.

Type: Grant

Filed: November 29, 2021

Date of Patent: March 5, 2024

Assignee: International Business Machines Corporation

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet
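
A minimal sketch of the frequency-ordered loop, assuming each deduplicated block is represented by its text (real deduplication typically works on hashed chunks) and that the stopping condition is a fixed budget of analyzed blocks. The `find_emails` function is a hypothetical placeholder for the text analytics step.

```python
import re
from collections import defaultdict
from typing import Callable, Optional


def analyze_blocks(doc_blocks: dict, analyze: Callable[[str], set],
                   max_blocks: Optional[int] = None) -> dict:
    """doc_blocks maps document id -> list of its content blocks."""
    docs_by_block = defaultdict(set)
    for doc_id, blocks in doc_blocks.items():
        for block in blocks:
            docs_by_block[block].add(doc_id)
    # Block frequency metric: number of documents sharing the block, sorted descending.
    order = sorted(docs_by_block, key=lambda b: len(docs_by_block[b]), reverse=True)
    results = defaultdict(set)
    for processed, block in enumerate(order):
        if max_blocks is not None and processed >= max_blocks:
            break  # stopping condition: analysis budget exhausted
        found = analyze(block)  # run text analytics once per unique block
        for doc_id in docs_by_block[block]:
            results[doc_id] |= found  # reuse the result for every document containing the block
    return dict(results)


def find_emails(text: str) -> set:
    return set(re.findall(r"\S+@\S+", text))


if __name__ == "__main__":
    docs = {"a.txt": ["Confidential: contact jane@example.com", "Q3 report"],
            "b.txt": ["Confidential: contact jane@example.com", "Q4 report"],
            "c.txt": ["Lunch menu"]}
    print(analyze_blocks(docs, find_emails, max_blocks=1))
```

Because results are reused for every document sharing a block, a block found in many documents is analyzed once instead of many times, which is why the loop starts with the most frequent blocks.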

RANDOM SAMPLING FROM A SEARCH ENGINE

Publication number: 20240004939

Abstract: A method for providing one or more random sample documents from a corpus of documents using a search engine is provided. Providing each of the random sample documents comprises randomly selecting a time window from a set of time windows. A search query is sent to the search engine defining a search for documents of the corpus with time-stamps within the randomly selected time window. In response to the search query, a search result is received from the search engine. The search result comprises a set of the documents of the corpus with time-stamps within the time window. One of the documents in the received set is then randomly selected.

Type: Application

Filed: September 19, 2023

Publication date: January 4, 2024

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Jojo Joseph, Pavlo Petrenko

Random sampling from a search engine

Patent number: 11797615

Abstract: A method for providing one or more random sample documents from a corpus of documents using a search engine is provided. Providing each of the random sample documents comprises randomly selecting a time window from a set of time windows. A search query is sent to the search engine defining a search for documents of the corpus with time-stamps within the randomly selected time window. In response to the search query, a search result is received from the search engine. The search result comprises a set of the documents of the corpus with time-stamps within the time window. One of the documents in the received set is then randomly selected.

Type: Grant

Filed: January 7, 2020

Date of Patent: October 24, 2023

Assignee: International Business Machines Corporation

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Jojo Joseph, Pavlo Petrenko

Processing electronic documents

Patent number: 11783088

Abstract: A method for processing electronic documents comprises an iteration including: (i) applying, by a computer device, a first statistical test process to a first subset of the documents, the first statistical test process estimating whether or not the content of the documents of the first subset complies with a predefined criterion; (ii) in response to a result of the first statistical test process estimating that the documents of the first subset do not comply with the criterion, selecting, by the computer device, a part of the documents of the first subset and moving, by the computer device, the part of the documents to a second subset of the documents; and (iii) applying, by the computer device, a second statistical test process to the second subset of the documents, the second statistical test process calculating at least one statistical metric related to the documents of the second subset.

Type: Grant

Filed: February 1, 2019

Date of Patent: October 10, 2023

Assignee: International Business Machines Corporation

Inventors: Michael Bässler, Amir Jaibaji, Jojo Joseph, Thomas Hampp-Bahnmueller
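
The iteration can be sketched in Python under explicit assumptions: the first statistical test is a sample-based violation-rate check, the "part" that is moved is a random fraction of the first subset, and the second test reports the compliance proportion with its standard error. The thresholds and function names are hypothetical, not the patented tests.

```python
import math
import random
from typing import Callable, Optional


def first_test(docs: list, complies: Callable, sample_size: int,
               rng: random.Random, max_violation_rate: float = 0.05) -> bool:
    """Estimate compliance from a random sample: True if the violation rate is tolerable."""
    sample = rng.sample(docs, min(sample_size, len(docs)))
    violations = sum(1 for doc in sample if not complies(doc))
    return violations / len(sample) <= max_violation_rate


def second_test(docs: list, complies: Callable) -> tuple:
    """Statistical metric on the second subset: compliance proportion and its standard error."""
    n = len(docs)
    p = sum(1 for doc in docs if complies(doc)) / n
    return p, math.sqrt(p * (1 - p) / n)


def process(docs: list, complies: Callable, sample_size: int = 10,
            move_fraction: float = 0.5, seed: int = 0, max_iter: int = 10):
    rng = random.Random(seed)
    first, second = list(docs), []
    metric: Optional[tuple] = None
    for _ in range(max_iter):
        if not first or first_test(first, complies, sample_size, rng):
            break  # first subset estimated to comply (or empty): stop iterating
        k = max(1, int(len(first) * move_fraction))
        moved = set(rng.sample(range(len(first)), k))  # indices of the part to move
        second.extend(doc for i, doc in enumerate(first) if i in moved)
        first = [doc for i, doc in enumerate(first) if i not in moved]
        metric = second_test(second, complies)
    return first, second, metric


if __name__ == "__main__":
    first, second, metric = process(list(range(200)), lambda d: d % 10 != 0)  # 10% violate
    print(len(first), len(second), metric)
```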

ANALYZING DEDUPLICATED DATA BLOCKS ASSOCIATED WITH UNSTRUCTURED DOCUMENTS

Publication number: 20230169041

Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.

Type: Application

Filed: November 29, 2021

Publication date: June 1, 2023

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet

EXTRACTION OF STRUCTURED INFORMATION FROM UNSTRUCTURED DOCUMENTS

Publication number: 20220114189

Abstract: Embodiments of the present invention provide methods, computer program products, and systems. Embodiments of the present invention can extract structured information for unstructured document analysis by identifying tables and columns of a database that correspond to business terms of a business glossary. Embodiments of the present invention can then receive a specification of business terms of interest to be recognized in an unstructured document. Embodiments of the present invention can then generate an analysis module, based on the identified tables and columns, that can identify or recognize attribute values of attributes of the tables and columns. Embodiments of the present invention can then use the analysis module to automatically extract values of at least some of the attributes from the unstructured document based on the specification of business terms of interest.

Type: Application

Filed: October 14, 2020

Publication date: April 14, 2022

Inventors: Michael Baessler, Albert Maier, Dirk Jahn, Thomas Hampp-Bahnmueller
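
One simple way to realize such an analysis module, sketched under the assumption that it is a dictionary of known values taken from the mapped columns and matched as whole words. The glossary-to-column mapping is supplied directly here; the abstract's identification of matching tables and columns is not shown, and all names are hypothetical.

```python
import re


def build_analysis_module(term_to_column: dict, tables: dict, terms_of_interest: list) -> dict:
    """For each business term of interest, collect the known values of its mapped column."""
    module = {}
    for term in terms_of_interest:
        table, column = term_to_column[term]
        module[term] = {str(row[column]) for row in tables[table]}
    return module


def extract(module: dict, text: str) -> dict:
    """Return, per business term, the known attribute values that occur as whole words in text."""
    found = {}
    for term, values in module.items():
        hits = sorted(v for v in values if re.search(r"\b" + re.escape(v) + r"\b", text))
        if hits:
            found[term] = hits
    return found


if __name__ == "__main__":
    term_to_column = {"Customer Name": ("CUSTOMERS", "NAME"), "Product": ("PRODUCTS", "TITLE")}
    tables = {"CUSTOMERS": [{"NAME": "Acme Corp"}, {"NAME": "Globex"}],
              "PRODUCTS": [{"TITLE": "Widget"}]}
    module = build_analysis_module(term_to_column, tables, ["Customer Name", "Product"])
    print(extract(module, "Globex ordered 40 units of Widget."))
    # {'Customer Name': ['Globex'], 'Product': ['Widget']}
```

A value dictionary only finds values already stored in the database; recognizing unseen values would need a pattern- or model-based module, which this sketch does not attempt.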

RANDOM SAMPLING FROM A SEARCH ENGINE

Publication number: 20210004417

Abstract: A method for providing one or more random sample documents from a corpus of documents using a search engine is provided. Providing each of the random sample documents comprises randomly selecting a time window from a set of time windows. A search query is sent to the search engine defining a search for documents of the corpus with time-stamps within the randomly selected time window. In response to the search query, a search result is received from the search engine. The search result comprises a set of the documents of the corpus with time-stamps within the time window. One of the documents in the received set is then randomly selected.

Type: Application

Filed: January 7, 2020

Publication date: January 7, 2021

Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Jojo Joseph, Pavlo Petrenko

PROCESSING ELECTRONIC DOCUMENTS

Publication number: 20200250345

Abstract: A method for processing electronic documents comprises an iteration including: (i) applying, by a computer device, a first statistical test process to a first subset of the documents, the first statistical test process estimating whether or not the content of the documents of the first subset complies with a predefined criterion; (ii) in response to a result of the first statistical test process estimating that the documents of the first subset do not comply with the criterion, selecting, by the computer device, a part of the documents of the first subset and moving, by the computer device, the part of the documents to a second subset of the documents; and (iii) applying, by the computer device, a second statistical test process to the second subset of the documents, the second statistical test process calculating at least one statistical metric related to the documents of the second subset.

Type: Application

Filed: February 1, 2019

Publication date: August 6, 2020

Inventors: Michael Bässler, Amir Jaibaji, Jojo Joseph, Thomas Hampp-Bahnmueller

Communication method and system for accessing media data

Patent number: 10152477

Abstract: Access is provided to media data shared by multiple users. A predefined edge weight is assigned to each edge of a linked data structure based on a dependency category of the edge. A first access rating value is assigned to each node. A rating residue value is calculated for each edge as the difference between the first access rating values of the two nodes it connects. The data structure is traversed from a seed node, and for each edge traversed, a second access rating value is calculated using the edge weight value and the first access rating value. These steps are repeated until the rating residue values meet a predefined convergence criterion. The nodes whose access rating values meet a predefined data removal criterion are selected from the nodes of the linked data structure. The data entities corresponding to the selected nodes are then removed.

Type: Grant

Filed: February 6, 2015

Date of Patent: December 11, 2018

Assignee: International Business Machines Corporation

Inventors: Brent Benton, Thomas Hampp-Bahnmueller, Dana W. Morris, Daniel Pittner, Thomas Schaeck, Dieter Schieber
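
The rating propagation can be sketched as below. The dependency categories and their weights, the update rule (each traversed edge moves the target node's rating toward its neighbor's in proportion to the edge weight), and the convergence test (residues change by less than a tolerance between sweeps) are assumptions chosen to illustrate the abstract, not the patented formulas.

```python
from collections import deque

# Hypothetical dependency categories mapped to predefined edge weights.
EDGE_WEIGHTS = {"embeds": 0.9, "links_to": 0.5, "mentions": 0.2}


def propagate_ratings(edges: list, ratings: dict, seed: str,
                      tol: float = 1e-6, max_sweeps: int = 1000) -> dict:
    """edges: (u, v, category) triples; ratings: first access rating for every node."""
    adjacency = {node: [] for node in ratings}
    for u, v, category in edges:
        weight = EDGE_WEIGHTS[category]
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))
    r = dict(ratings)
    residues = [abs(r[u] - r[v]) for u, v, _ in edges]
    for _ in range(max_sweeps):
        seen, queue = {seed}, deque([seed])
        while queue:  # breadth-first traversal starting at the seed node
            u = queue.popleft()
            for v, weight in adjacency[u]:
                if v not in seen:
                    # Second access rating: move v toward u in proportion to the edge weight.
                    r[v] += weight * (r[u] - r[v])
                    seen.add(v)
                    queue.append(v)
        new_residues = [abs(r[u] - r[v]) for u, v, _ in edges]
        converged = all(abs(a - b) < tol for a, b in zip(new_residues, residues))
        residues = new_residues
        if converged:
            break
    return r


def removal_candidates(ratings: dict, threshold: float) -> list:
    """Nodes whose rating falls below the removal threshold."""
    return sorted(node for node, value in ratings.items() if value < threshold)


if __name__ == "__main__":
    ratings = {"home": 1.0, "page": 0.0, "img": 0.0, "orphan": 0.0, "orphan_img": 0.0}
    edges = [("home", "page", "links_to"), ("page", "img", "embeds"),
             ("orphan", "orphan_img", "embeds")]
    final = propagate_ratings(edges, ratings, seed="home")
    print(removal_candidates(final, threshold=0.5))  # ['orphan', 'orphan_img']
```

Nodes reachable from the seed inherit a high rating, while the disconnected pair keeps its low initial rating, so only that pair falls below the removal threshold.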

Clustering a collection using an inverted index of features

Patent number: 10083230

Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature and a cluster of the data elements is created based on results returned in response to the query.

Type: Grant

Filed: December 13, 2010

Date of Patent: September 25, 2018

Assignee: International Business Machines Corporation

Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng
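
The clustering loop maps directly onto Python sets. The abstract only says that features are ranked, so ranking by document frequency (ties broken alphabetically) is an assumption; the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(elements: dict) -> dict:
    """elements maps element id -> set of features; returns feature -> set of element ids."""
    index = defaultdict(set)
    for element_id, features in elements.items():
        for feature in features:
            index[feature].add(element_id)
    return index


def cluster(elements: dict) -> list:
    index = build_inverted_index(elements)
    # Assumed ranking: most frequent features first, ties broken alphabetically.
    ranked = sorted(index, key=lambda f: (-len(index[f]), f))
    selected, clusters = [], []
    for feature in ranked:
        members = set(index[feature])       # query: elements having this feature...
        for previous in selected:
            members -= index[previous]      # ...and none of the previously selected ones
        if members:
            clusters.append((feature, sorted(members)))
        selected.append(feature)
    return clusters


if __name__ == "__main__":
    docs = {"d1": {"apple", "pie"}, "d2": {"apple", "juice"},
            "d3": {"pie", "crust"}, "d4": {"car"}}
    print(cluster(docs))  # [('apple', ['d1', 'd2']), ('pie', ['d3']), ('car', ['d4'])]
```

Because each element is claimed by the first ranked feature it has, the clusters are disjoint, and every element with at least one feature lands in exactly one cluster.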

COMMUNICATION METHOD AND SYSTEM FOR ACCESSING MEDIA DATA

Publication number: 20150263984

Abstract: Access is provided to media data shared by multiple users. A predefined edge weight is assigned to each edge of a linked data structure based on a dependency category of the edge. A first access rating value is assigned to each node. A rating residue value is calculated for each edge as the difference between the first access rating values of the two nodes it connects. The data structure is traversed from a seed node, and for each edge traversed, a second access rating value is calculated using the edge weight value and the first access rating value. These steps are repeated until the rating residue values meet a predefined convergence criterion. The nodes whose access rating values meet a predefined data removal criterion are selected from the nodes of the linked data structure. The data entities corresponding to the selected nodes are then removed.

Type: Application

Filed: February 6, 2015

Publication date: September 17, 2015

Inventors: Brent Benton, Thomas Hampp-Bahnmueller, Dana W. Morris, Daniel Pittner, Thomas Schaeck, Dieter Schieber

Enhanced content web browsing

Patent number: 8543571

Abstract: An embodiment of a method for enhanced content browsing includes loading a web page in a user interface; detecting entities of a first specified type in the web page by an analysis service; tagging the detected entities in the web page; calling an action service associated with the analysis service when a detected entity is activated; and displaying a result of the action service in the user interface. Embodiments of systems for enhanced content browsing are also provided.

Type: Grant

Filed: January 8, 2009

Date of Patent: September 24, 2013

Assignee: International Business Machines Corporation

Inventors: Michael Baessler, Andrea Elias, Thilo Goetz, Thomas Hampp-Bahnmueller, Sebastian Nelke
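
A toy version of the browsing flow, assuming the page is handled as plain text, a hypothetical analysis service that recognizes ticker symbols such as `NYSE:IBM`, and a hypothetical action-service registry keyed by entity type. In a real browser the activation would be wired to a click handler; here `on_activate` just returns the text that would be displayed.

```python
import html
import re


def analysis_service(text: str) -> list:
    """Hypothetical analysis service: finds ticker entities and returns (start, end, symbol)."""
    return [(m.start(), m.end(), m.group(1)) for m in re.finditer(r"NYSE:([A-Z]{1,5})", text)]


def tag_entities(text: str, entities: list, entity_type: str) -> str:
    """Wrap each detected entity in a span that a browser script could make clickable."""
    out, last = [], 0
    for start, end, value in entities:
        out.append(html.escape(text[last:start]))
        out.append(f'<span class="entity" data-type="{entity_type}" '
                   f'data-value="{html.escape(value)}">{html.escape(text[start:end])}</span>')
        last = end
    out.append(html.escape(text[last:]))
    return "".join(out)


# Hypothetical action services associated with each analysis service's entity type.
ACTION_SERVICES = {"ticker": lambda value: f"Quote lookup requested for {value}"}


def on_activate(entity_type: str, value: str) -> str:
    """Called when the user activates a tagged entity; the result is shown in the UI."""
    return ACTION_SERVICES[entity_type](value)


if __name__ == "__main__":
    text = "Shares of NYSE:IBM rose today."
    print(tag_entities(text, analysis_service(text), "ticker"))
    print(on_activate("ticker", "IBM"))  # Quote lookup requested for IBM
```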
