Patents by Inventor Thomas Hampp-Bahnmueller

Thomas Hampp-Bahnmueller has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240152494
    Abstract: The present disclosure relates to a method of metadata enrichment using an enrichment comprising multiple steps. The method comprises: determining for an input data asset a metadata value descriptive of the input data asset. Characteristics of the metadata value of the input data asset may be determined. At least one informativeness score of the metadata value of the input data asset may be computed using the determined characteristics. An execution of the enrichment step may be skipped in case an input characteristic of the enrichment step is not part of the determined characteristics. In case the input characteristic of the enrichment step is part of the determined characteristics, the enrichment step may be adapted and executed or the enrichment step may be executed without adaptation. Labels resulting from the executed enrichment steps may be combined for providing one or more labels of the data asset.
    Type: Application
    Filed: January 4, 2023
    Publication date: May 9, 2024
    Inventors: Thomas Hampp-Bahnmueller, Peter Gerstl, Yannick Saillet, Michael Baessler, Albert Maier, Oliver Suhre
  • Patent number: 11921676
    Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.
    Type: Grant
    Filed: November 29, 2021
    Date of Patent: March 5, 2024
    Assignee: International Business Machines Corporation
    Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet
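The abstract of patent 11921676 describes analyzing the most frequent deduplicated blocks first and sharing each result with every document that contains the block. The following is a minimal sketch of that idea, not the patented implementation. The names `analyze_blocks`, `doc_blocks`, `block_text`, and `max_blocks` are hypothetical. The frequency metric (number of documents containing a block) and the stopping condition (a fixed budget of analyzed blocks) are assumptions, since the abstract does not specify either.

```python
from collections import Counter


def analyze_blocks(doc_blocks, block_text, analyze, max_blocks):
    """Run `analyze` on the most frequent blocks first and share the results."""
    # Block frequency metric (assumed): number of documents containing the block.
    frequency = Counter(b for blocks in doc_blocks.values() for b in set(blocks))
    # Sort blocks in descending order of frequency; ties are broken by block id.
    ordered = sorted(frequency, key=lambda b: (-frequency[b], b))

    results = {doc: [] for doc in doc_blocks}
    for processed, block in enumerate(ordered):
        # Stopping condition (assumed): a fixed budget of analyzed blocks.
        if processed >= max_blocks:
            break
        result = analyze(block_text[block])
        # Apply the result to every document that includes this block.
        for doc, blocks in doc_blocks.items():
            if block in blocks:
                results[doc].append(result)
    return results


# Hypothetical data: block "b1" appears in all three documents.
doc_blocks = {"d1": ["b1", "b2"], "d2": ["b1"], "d3": ["b1", "b3"]}
block_text = {"b1": "shared disclaimer", "b2": "invoice body", "b3": "memo body"}
# str.upper stands in for a real text analytics step.
annotations = analyze_blocks(doc_blocks, block_text, str.upper, max_blocks=1)
```

With a budget of one block, only the shared block is analyzed, and its single result is attached to all three documents.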
  • Publication number: 20240004939
    Abstract: A method for providing one or more random sample documents from a corpus of documents using a search engine is provided. The providing of each of the random sample documents comprises selecting randomly a time window from a set of time windows. A search query is sent to the search engine defining a search for documents of the corpus with time-stamps within the time window defined by the randomly selected time window. In response to the sending of the search query, a search result is received from the search engine. The search result comprises a set of the documents of the corpus with time-stamps within the time window. One of the documents comprised by the received set of documents is then selected randomly.
    Type: Application
    Filed: September 19, 2023
    Publication date: January 4, 2024
    Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Jojo Joseph, Pavlo Petrenko
  • Patent number: 11797615
    Abstract: A method for providing one or more random sample documents from a corpus of documents using a search engine is provided. The providing of each of the random sample documents comprises selecting randomly a time window from a set of time windows. A search query is sent to the search engine defining a search for documents of the corpus with time-stamps within the time window defined by the randomly selected time window. In response to the sending of the search query, a search result is received from the search engine. The search result comprises a set of the documents of the corpus with time-stamps within the time window. One of the documents comprised by the received set of documents is then selected randomly.
    Type: Grant
    Filed: January 7, 2020
    Date of Patent: October 24, 2023
    Assignee: International Business Machines Corporation
    Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Jojo Joseph, Pavlo Petrenko
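The sampling procedure in the abstract of patent 11797615 (pick a random time window, query the search engine for that window, pick a random hit) can be sketched as follows. This is an illustration under stated assumptions, not the patented method. The `search` function is a hypothetical in-memory stand-in for a real search engine, and redrawing a window when a query returns no documents is an assumed way of handling empty windows.

```python
import random
from datetime import datetime, timedelta


def search(corpus, start, end):
    """Hypothetical stand-in for a search engine query over a time-stamp range."""
    return [doc for doc in corpus if start <= doc["timestamp"] < end]


def sample_document(corpus, windows, rng, max_attempts=100):
    """Pick a random time window, query it, then pick a random hit."""
    for _ in range(max_attempts):
        start, end = rng.choice(windows)
        hits = search(corpus, start, end)
        if hits:  # Assumed handling: redraw when the window is empty.
            return rng.choice(hits)
    return None  # No document found within the attempt budget.


# Hypothetical corpus: 40 documents, one every six hours, and ten one-day windows.
base = datetime(2020, 1, 1)
corpus = [{"id": i, "timestamp": base + timedelta(hours=6 * i)} for i in range(40)]
windows = [(base + timedelta(days=d), base + timedelta(days=d + 1)) for d in range(10)]
sample = sample_document(corpus, windows, random.Random(0))
```

In this sketch the sample is uniform over documents only when the windows hold equal numbers of documents, as they do here. The abstract does not say how the set of time windows is constructed.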
  • Patent number: 11783088
    Abstract: A method for processing electronic documents comprises an iteration including: (i) applying, by a computer device, a first statistical test process to a first subset of the documents, the first statistical test process estimating whether or not content of the documents of the first subset comply with a predefined criterion; (ii) in response to a result of the first statistical test process, estimating, by the computer device, that the documents of the first subset do not comply with the criterion, selecting, by the computer device, a part of the documents of the first subset, and moving, by the computer device, the part of the documents to a second subset of the documents; and (iii) applying, by the computer device, a second statistical test process to the second subset of the documents, the second statistical test process calculating at least one statistical metric related to the documents of the second subset.
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: October 10, 2023
    Assignee: International Business Machines Corporation
    Inventors: Michael Bässler, Amir Jaibaji, Jojo Joseph, Thomas Hampp-Bahnmueller
  • Publication number: 20230169041
    Abstract: Techniques are described relating to unstructured document processing. An associated computer-implemented method includes identifying a plurality of deduplicated data blocks associated with a collection of unstructured documents. The method further includes sorting the plurality of deduplicated data blocks in descending order based upon at least one block frequency metric, selecting a highest sorted unprocessed deduplicated data block, applying text analytics to the selected deduplicated data block, and applying at least one result of the text analytics to any document among the collection of unstructured documents including the selected deduplicated data block. The method is terminated responsive to satisfaction of at least one stopping condition.
    Type: Application
    Filed: November 29, 2021
    Publication date: June 1, 2023
    Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Yannick Saillet
  • Publication number: 20220114189
    Abstract: Embodiments of the present invention provide methods, computer program products, and systems. Embodiments of the present invention can extract structured information for unstructured document analysis by identifying tables and columns of a database that correspond to business terms of a business glossary. Embodiments of the present invention can then receive a specification of business terms of interest to be recognized in an unstructured document. Embodiments of the present invention can then generate an analysis module, based on the identified tables and columns, that enables identification or recognition of attribute values of attributes of the tables and columns. Embodiments of the present invention can then use the analysis module for automatic extraction of values of at least part of the attributes from the unstructured document based on the specification of business terms of interest.
    Type: Application
    Filed: October 14, 2020
    Publication date: April 14, 2022
    Inventors: Michael Baessler, Albert Maier, Dirk Jahn, Thomas Hampp-Bahnmueller
  • Publication number: 20210004417
    Abstract: A method for providing one or more random sample documents from a corpus of documents using a search engine is provided. The providing of each of the random sample documents comprises selecting randomly a time window from a set of time windows. A search query is sent to the search engine defining a search for documents of the corpus with time-stamps within the time window defined by the randomly selected time window. In response to the sending of the search query, a search result is received from the search engine. The search result comprises a set of the documents of the corpus with time-stamps within the time window. One of the documents comprised by the received set of documents is then selected randomly.
    Type: Application
    Filed: January 7, 2020
    Publication date: January 7, 2021
    Inventors: Michael Baessler, Thomas Hampp-Bahnmueller, Jojo Joseph, Pavlo Petrenko
  • Publication number: 20200250345
    Abstract: A method for processing electronic documents comprises an iteration including: (i) applying, by a computer device, a first statistical test process to a first subset of the documents, the first statistical test process estimating whether or not content of the documents of the first subset comply with a predefined criterion; (ii) in response to a result of the first statistical test process, estimating, by the computer device, that the documents of the first subset do not comply with the criterion, selecting, by the computer device, a part of the documents of the first subset, and moving, by the computer device, the part of the documents to a second subset of the documents; and (iii) applying, by the computer device, a second statistical test process to the second subset of the documents, the second statistical test process calculating at least one statistical metric related to the documents of the second subset.
    Type: Application
    Filed: February 1, 2019
    Publication date: August 6, 2020
    Inventors: Michael Bässler, Amir Jaibaji, Jojo Joseph, Thomas Hampp-Bahnmueller
  • Patent number: 10262056
    Abstract: A method and a computer-readable medium for searching a plurality of documents. Each document is structured into a set of blocks and each block is associated with a block ID. The method includes receiving a search query including a search term having at least one search term attribute; identifying at least one block ID based on a correlation between the at least one search term attribute and the set of blocks; and identifying at least one document based on a correlation between the set of blocks and the documents. Methods for generating a data structure for searching documents are also described.
    Type: Grant
    Filed: November 28, 2014
    Date of Patent: April 16, 2019
    Assignee: International Business Machines Corporation
    Inventors: Thomas A. Hampp-Bahnmueller, Peng H. Jiang, Pi J. Jiang, Yan U. Xu
  • Patent number: 10152477
    Abstract: Access is provided to media data shared by multiple users. A predefined edge weight is assigned to each edge of a linked data structure based on a dependency category of the edge. A first access rating value is assigned to each node. A rating residue value is calculated as the difference between the two first access rating values of the nodes connected by each edge. The data structure is traversed from a seed node, and for each edge traversed, a second access rating value is calculated using an edge weight value and the first access rating value. The traversal is repeated until the rating residue values meet a predefined convergence criterion. The nodes having access rating values meeting a predefined data removal criterion are selected from the nodes of the linked data structure. The data entities corresponding to the selected nodes are then removed.
    Type: Grant
    Filed: February 6, 2015
    Date of Patent: December 11, 2018
    Assignee: International Business Machines Corporation
    Inventors: Brent Benton, Thomas Hampp-Bahnmueller, Dana W. Morris, Daniel Pittner, Thomas Schaeck, Dieter Schieber
  • Patent number: 10083230
    Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature and a cluster of the data elements is created based on results returned in response to the query.
    Type: Grant
    Filed: December 13, 2010
    Date of Patent: September 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng
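The clustering loop described in the abstract of patent 10083230 can be sketched as below: build an inverted index, rank the features, and for each feature collect the elements that have it and have none of the previously selected features. This is a minimal sketch, not the patented implementation. The function name `cluster_by_features` is hypothetical, and ranking features by how many elements contain them is an assumption, since the abstract does not state the ranking criterion.

```python
from collections import defaultdict


def cluster_by_features(elements):
    """Greedy clustering driven by an inverted index of features."""
    # Inverted index: feature -> set of element ids containing that feature.
    index = defaultdict(set)
    for element_id, features in elements.items():
        for feature in features:
            index[feature].add(element_id)

    # Ranking (assumed): most frequent feature first, ties broken by name.
    ranked = sorted(index, key=lambda f: (-len(index[f]), f))

    clusters = {}
    covered = set()  # Elements that have at least one previously selected feature.
    for feature in ranked:
        # Query: elements having this feature and no previously selected feature.
        members = index[feature] - covered
        if members:
            clusters[feature] = members
        covered |= index[feature]
    return clusters


# Hypothetical data elements, each represented by a set of features.
elements = {
    "e1": {"red", "round"},
    "e2": {"red"},
    "e3": {"round", "small"},
    "e4": {"small"},
}
clusters = cluster_by_features(elements)
```

Because each element is removed from consideration once one of its features has been selected, every element lands in exactly one cluster.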
  • Publication number: 20160314183
    Abstract: Provided is a technique for matching different user representations of a person in a plurality of computer systems. The technique includes collecting information sets about user representations from a plurality of computer systems; normalizing the information sets to a unified format; grouping the information sets in the unified format into indexing buckets based on a user name using a non-phonetic algorithm; determining a similarity score for each pair of information sets in each of the indexing buckets; classifying each information set pair into a set of classes based on the similarity scores, wherein the set of classes comprise at least matches and non-matches; and using a data structure for merging information of information set pairs classified as matches.
    Type: Application
    Filed: April 21, 2015
    Publication date: October 27, 2016
    Inventors: Lars Bremer, Thomas A. Hampp-Bahnmueller, Markus Lorch, Pavlo Petrenko, Sebastian B. Schmid
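The pipeline in the abstract of publication 20160314183 (normalize, bucket, score pairs within a bucket, classify, merge) can be sketched as follows. This is an illustration only, not the claimed technique. The record fields, the blocking key (first letter of the last name token), the similarity measure (`difflib.SequenceMatcher` on names, or an exact email match), the threshold of 0.85, and the union-find structure used for merging are all assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Bring a raw record into one unified format (lowercase, single spaces)."""
    return {"name": " ".join(record["name"].lower().split()),
            "email": record.get("email", "").lower()}


def bucket_key(record):
    # Non-phonetic blocking key (assumed): first letter of the last name token.
    tokens = record["name"].split()
    return tokens[-1][0] if tokens else ""


def similarity(a, b):
    # Similarity score (assumed): name string similarity or exact email match.
    name_score = SequenceMatcher(None, a["name"], b["name"]).ratio()
    email_score = 1.0 if a["email"] and a["email"] == b["email"] else 0.0
    return max(name_score, email_score)


def match_users(raw_records, threshold=0.85):
    records = [normalize(r) for r in raw_records]
    buckets = defaultdict(list)
    for i, record in enumerate(records):
        buckets[bucket_key(record)].append(i)

    parent = list(range(len(records)))  # Union-find structure used for merging.

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # Path halving.
            i = parent[i]
        return i

    for members in buckets.values():
        for i, j in combinations(members, 2):  # Pairs within one bucket only.
            if similarity(records[i], records[j]) >= threshold:  # Class "match".
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical user representations collected from several systems.
raw = [
    {"name": "Jane Doe", "email": "jdoe@example.com"},
    {"name": "jane  doe", "email": "JDOE@example.com"},
    {"name": "John Smith", "email": "js@example.com"},
    {"name": "Jon Smith", "email": "jon@example.com"},
    {"name": "Dana Dorn", "email": "dd@example.com"},
]
groups = match_users(raw)
```

Bucketing keeps the number of pairwise comparisons small: records are compared only with others that share a blocking key, so "Dana Dorn" is scored against the "Doe" records but never against the "Smith" records.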
  • Publication number: 20150263984
    Abstract: Access is provided to media data shared by multiple users. A predefined edge weight is assigned to each edge of a linked data structure based on a dependency category of the edge. A first access rating value is assigned to each node. A rating residue value is calculated as the difference between the two first access rating values of the nodes connected by each edge. The data structure is traversed from a seed node, and for each edge traversed, a second access rating value is calculated using an edge weight value and the first access rating value. The traversal is repeated until the rating residue values meet a predefined convergence criterion. The nodes having access rating values meeting a predefined data removal criterion are selected from the nodes of the linked data structure. The data entities corresponding to the selected nodes are then removed.
    Type: Application
    Filed: February 6, 2015
    Publication date: September 17, 2015
    Inventors: Brent Benton, Thomas Hampp-Bahnmueller, Dana W. Morris, Daniel Pittner, Thomas Schaeck, Dieter Schieber
  • Publication number: 20150154253
    Abstract: A method and a computer-readable medium for searching a plurality of documents. Each document is structured into a set of blocks and each block is associated with a block ID. The method includes receiving a search query including a search term having at least one search term attribute; identifying at least one block ID based on a correlation between the at least one search term attribute and the set of blocks; and identifying at least one document based on a correlation between the set of blocks and the documents. Methods for generating a data structure for searching documents are also described.
    Type: Application
    Filed: November 28, 2014
    Publication date: June 4, 2015
    Inventors: Thomas A. Hampp-Bahnmueller, Peng H. Jiang, Pi J. Jiang, Yan U. Xu
  • Patent number: 8543571
    Abstract: An embodiment of a method for enhanced content browsing includes loading a web page in a user interface; detecting entities of a first specified type in the web page by an analysis service; tagging the detected entities in the web page; calling an action service associated with the analysis service when a detected entity is activated; and displaying a result of the action service in the user interface. Embodiments of systems for enhanced content browsing are also provided.
    Type: Grant
    Filed: January 8, 2009
    Date of Patent: September 24, 2013
    Assignee: International Business Machines Corporation
    Inventors: Michael Baessler, Andrea Elias, Thilo Goetz, Thomas Hampp-Bahnmueller, Sebastian Nelke
  • Publication number: 20120150867
    Abstract: Provided are techniques for creating an inverted index for features of a set of data elements, wherein each of the data elements is represented by a vector of features, wherein the inverted index, when queried with a feature, outputs one or more data elements containing the feature. The features of the set of data elements are ranked. For each feature in the ranked list, the inverted index is queried for data elements having the feature and not having any previously selected feature and a cluster of the data elements is created based on results returned in response to the query.
    Type: Application
    Filed: December 13, 2010
    Publication date: June 14, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Danish Contractor, Thomas Hampp-Bahnmueller, Sachindra Joshi, Raghuram Krishnapuram, Kenney Ng
  • Publication number: 20100306307
    Abstract: According to one embodiment of the present invention, a method for social bookmarking and tagging documents is provided. The method comprises receiving a new document in a tagging server having a storage unit with stored tags associated with a preexisting document and comparing the new document with the tags using a processor to find matching instances between parts of the new document and the tags. Each matching instance in the new document is marked with tag information. The marked up new document is delivered for display on a display unit.
    Type: Application
    Filed: May 31, 2009
    Publication date: December 2, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Baessler, Andrea Elias, Thilo Goetz, Thomas Hampp-Bahnmueller, Sebastian Nelke
  • Publication number: 20100174713
    Abstract: An embodiment of a method for enhanced content browsing includes loading a web page in a user interface; detecting entities of a first specified type in the web page by an analysis service; tagging the detected entities in the web page; calling an action service associated with the analysis service when a detected entity is activated; and displaying a result of the action service in the user interface. Embodiments of systems for enhanced content browsing are also provided.
    Type: Application
    Filed: January 8, 2009
    Publication date: July 8, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Baessler, Andrea Elias, Thilo Goetz, Thomas Hampp-Bahnmueller, Sebastian Nelke
  • Publication number: 20060129538
    Abstract: Techniques are provided for electronic Information Retrieval (IR) applied for an electronic search in a search environment. At indexing time, a searched document is mapped to at least one element of an organizational structure of an enterprise associated with the search environment. At query time, a querying user is associated with at least one element of the organizational structure of the enterprise. The organizational information of the searched document and that of the querying user are compared. A higher rank is provided to the searched document when the searched document has a closer organizational relation to the querying user compared to other searched documents with a less close relation to the querying user based on the compared organizational information.
    Type: Application
    Filed: December 5, 2005
    Publication date: June 15, 2006
    Inventors: Andrea Baader, Michael Baessler, Jochen Doerre, Thilo Goetz, Thomas Hampp-Bahnmueller, Alexander Lang
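The ranking idea in the abstract of publication 20060129538 (documents organizationally closer to the querying user rank higher) can be sketched with a tree distance over an organization chart. This is a minimal sketch under stated assumptions, not the described system. The `parent` map, the unit names, and the choice to sort first by organizational distance and then by text score are hypothetical; the abstract does not say how organizational closeness is measured or combined with other relevance signals.

```python
def path_to_root(unit, parent):
    """Return the chain of units from `unit` up to the root of its tree."""
    path = [unit]
    while unit in parent:
        unit = parent[unit]
        path.append(unit)
    return path


def org_distance(a, b, parent):
    """Number of edges between two units in the organization tree."""
    depth_from_a = {unit: d for d, unit in enumerate(path_to_root(a, parent))}
    for d, unit in enumerate(path_to_root(b, parent)):
        if unit in depth_from_a:  # First shared ancestor on the way up from b.
            return d + depth_from_a[unit]
    return float("inf")  # The units share no ancestor.


def rank(hits, user_unit, parent):
    """Order hits by closeness to the user's unit, then by text score."""
    # Each hit is (doc_id, doc_unit, text_score); the combination is assumed.
    return sorted(hits, key=lambda h: (org_distance(h[1], user_unit, parent), -h[2]))


# Hypothetical organization chart: child unit -> parent unit.
parent = {
    "search-team": "engineering",
    "ui-team": "engineering",
    "engineering": "company",
    "sales": "company",
}
hits = [
    ("faq", "sales", 0.9),
    ("design", "search-team", 0.7),
    ("mockup", "ui-team", 0.8),
]
ranked = rank(hits, "search-team", parent)
```

For a user in the hypothetical "search-team" unit, the document from the same unit ranks first even though its text score is the lowest, followed by the sibling team's document and then the one from the more distant unit.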