Patents by Inventor Berthold Reinwald
Berthold Reinwald has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240070522
Abstract: Providing a representative dataset from an initial dataset by accessing a dataset associated with a machine learning model, receiving input parameters associated with the representative dataset selection, the input parameters including an evaluation metric, determining a density of a plurality of datapoints associated with the dataset, training a first iteration of a machine learning model using a first data point selected according to the density, determining a first value of the evaluation metric for the first iteration of the machine learning model, generating a representative subset based on the first value of the evaluation metric, and providing the representative dataset and a final machine learning model trained using the representative dataset.
Type: Application
Filed: August 23, 2022
Publication date: February 29, 2024
Inventors: Shaikh Shahriar Quader, Aindrila Basak, Adrian Mahjour, Petr Novotny, Carlo Appugliese, Berthold Reinwald, Dheeraj Arremsetty
-
Patent number: 11520986
Abstract: Aspects of the present disclosure relate to neural-based ontology generation and refinement. A set of input data can be received. A set of entities can be extracted from the set of input data using a named-entity recognition (NER) process, each entity having a corresponding label, the corresponding labels making up a label set. The label set can be compared to concepts in a set of reference ontologies. Labels that match to concepts in the set of reference ontologies can be selected as a candidate concept set. Relations associated with the candidate concepts within the set of reference ontologies can be identified as a candidate relation set. An ontology can then be generated using the candidate concept set and candidate relation set.
Type: Grant
Filed: July 24, 2020
Date of Patent: December 6, 2022
Assignee: International Business Machines Corporation
Inventors: Balaji Ganesan, Riddhiman Dasgupta, Akshay Parekh, Hima Patel, Berthold Reinwald, Sameep Mehta
-
Publication number: 20220245425
Abstract: A knowledge graph embedding method, system, and computer program product using a computing device to embed a knowledge graph using a graph convolutional network, the method including learning, by the computing device, an embedding of the knowledge graph that includes entities, relations, and edges, weighing, by the computing device, initial feature vectors of nodes and a convolutional layer output to compute a weight and modifying the embedding based on the weight, and using, by the computing device, the modified embedding to perform a task related to the knowledge graph.
Type: Application
Filed: January 29, 2021
Publication date: August 4, 2022
Inventors: Nasrullah Sheikh, Xiao Qin, Berthold Reinwald, Christoph Adrian Miksovic Czasch, Thomas Gschwind, Paolo Scotton
-
Publication number: 20220245460
Abstract: A graph neural network (GNN) training method, system, and computer program product in a graph, include generating, by the computing device, one or more hypothetical edges between two or more nodes of a plurality of nodes of a graph neural network, testing, by the computing device, to determine whether the one or more generated hypothetical edges should be connected by using negative sampling, and permanently connecting, by the computing device, the one or more tested hypothetical edges if the negative sampling indicates the connectivity.
Type: Application
Filed: January 29, 2021
Publication date: August 4, 2022
Inventors: Xiao Qin, Nasrullah Sheikh, Berthold Reinwald, Lingfei Wu
-
Publication number: 20220197977
Abstract: A computer-implemented method is provided for predicting future data values or target labels of multivariate time series data. The method includes receiving the multivariate time series data having present values, systematic missing values, and random missing values. The method further includes masking the present values, the systematic missing values, and the random missing values using triplet encodings. The method also includes determining time intervals between current missing values, from among the systematic missing values and the random missing values, and immediately preceding ones of the present values. The method additionally includes training, by a computing device, at least one recurrent neural network with the triplet encodings, the time intervals, and multivariate time series data to perform a feedforward pass on the recurrent neural network predicting the future data values or the target labels.
Type: Application
Filed: December 22, 2020
Publication date: June 23, 2022
Inventors: Mu Qiao, Yuya Jeremy Ong, Prithviraj Sen, Berthold Reinwald
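The time-interval step in this abstract can be illustrated with a minimal sketch for a single series. This is not the patented method: the function name `intervals_to_last_present`, the use of `None` to mark a missing value, and the choice of `0.0` when no present value precedes a missing one are all assumptions made for illustration. The triplet encodings and the recurrent network are not covered.

```python
from typing import List, Optional


def intervals_to_last_present(
    timestamps: List[float], values: List[Optional[float]]
) -> List[float]:
    """For each step, return the time elapsed since the immediately
    preceding present (non-missing) value.

    Present steps get 0.0. A missing step with no earlier present value
    also gets 0.0 (an assumption; the abstract does not cover this case).
    """
    intervals: List[float] = []
    last_present_time: Optional[float] = None
    for t, v in zip(timestamps, values):
        if v is not None:
            # Present value: remember its timestamp, no gap to report.
            last_present_time = t
            intervals.append(0.0)
        elif last_present_time is None:
            # Missing value before any present value has been seen.
            intervals.append(0.0)
        else:
            # Missing value: gap back to the most recent present value.
            intervals.append(float(t - last_present_time))
    return intervals


# Hypothetical irregularly sampled series; None marks a missing value.
example = intervals_to_last_present(
    [0.0, 1.0, 3.0, 4.0, 7.0], [1.0, None, None, 2.5, None]
)
```

In a multivariate setting the same computation would run once per variable, giving one interval sequence per column.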
-
Publication number: 20220188567
Abstract: One embodiment provides a computer implemented method, including: obtaining an information document corresponding to an entity, wherein the information document includes redacted information spans; identifying an entity type for each of the redacted information spans, wherein the entity type identifies a relationship between a redacted information span and at least one other entity within the information document; replacing the redacted information spans with replacement entities corresponding to the entity type of a given redacted information span, wherein the replacing is performed in view of a frequency distribution of actual information and wherein the replacing includes maintaining relationships of the redacted information spans; and controlling bias within the replacement entities, wherein the controlling includes detecting bias within the replacement entities.
Type: Application
Filed: December 11, 2020
Publication date: June 16, 2022
Inventors: Balaji Ganesan, Kalapriya Kannan, Neeraj Ramkrishna Singh, Shettigar Parkala Srinivas, Hima Patel, Soma Shekar Naganna, Berthold Reinwald, Sameep Mehta
-
Publication number: 20220092427
Abstract: A method, a computer program product, and a system for non-obvious relationship detection. The method includes receiving a knowledge graph and inputting a first node and a second node from the knowledge graph into a twin neural network. The method also includes embedding the first node and the second node, aggregating neighborhood information and position information into the node embeddings. The method further includes concatenating the neighborhood information and the position information of the first node and the second node to produce a first output vector and a second output vector. The method also includes generating a final score by comparing the first output vector with the second output vector. The final score indicates a probability of a non-obvious relationship between the first node and the second node.
Type: Application
Filed: September 21, 2020
Publication date: March 24, 2022
Inventors: Phillipp Müller, Xiao Qin, Balaji Ganesan, Berthold Reinwald, Nasrullah Sheikh
-
Publication number: 20220058465
Abstract: In an approach for forecasting in multivariate irregularly sampled time series, a processor receives time series data having one or more missing values. A processor determines, from the time series data, non-missing values present in the time series data. A processor determines, from the time series data, zero or more mask values for the time series data. A processor determines time interval values. A processor inputs the one or more missing values, the non-missing values, the zero or more mask values, and the time interval values into a recurrent neural network. A processor determines a predicted value for the one or more missing values.
Type: Application
Filed: August 24, 2020
Publication date: February 24, 2022
Inventors: Prithviraj Sen, Berthold Reinwald, Shivam Srivastava
-
Publication number: 20220027561
Abstract: Aspects of the present disclosure relate to neural-based ontology generation and refinement. A set of input data can be received. A set of entities can be extracted from the set of input data using a named-entity recognition (NER) process, each entity having a corresponding label, the corresponding labels making up a label set. The label set can be compared to concepts in a set of reference ontologies. Labels that match to concepts in the set of reference ontologies can be selected as a candidate concept set. Relations associated with the candidate concepts within the set of reference ontologies can be identified as a candidate relation set. An ontology can then be generated using the candidate concept set and candidate relation set.
Type: Application
Filed: July 24, 2020
Publication date: January 27, 2022
Inventors: Balaji Ganesan, Riddhiman Dasgupta, Akshay Parekh, Hima Patel, Berthold Reinwald, Sameep Mehta
-
Patent number: 11194826
Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
Type: Grant
Filed: February 8, 2019
Date of Patent: December 7, 2021
Assignee: International Business Machines Corporation
Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
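The divide, sample, and combine steps in this abstract can be pictured with a small single-process sketch. It is an illustration under stated assumptions, not the patented method: blocks here are runs of consecutive rows only, each block is sampled by drawing rows without replacement, the per-block seed stands in for independent workers, and the names `split_row_blocks`, `sample_block`, and `sample_matrix` are hypothetical. A real system would run `sample_block` on different machines in parallel.

```python
import random
from typing import List

Matrix = List[List[float]]


def split_row_blocks(matrix: Matrix, rows_per_block: int) -> List[Matrix]:
    """Divide the matrix into blocks of consecutive rows."""
    return [
        matrix[start:start + rows_per_block]
        for start in range(0, len(matrix), rows_per_block)
    ]


def sample_block(block: Matrix, k: int, seed: int) -> Matrix:
    """Draw up to k rows from one block without replacement.

    Each block gets its own seeded generator, so blocks could be
    sampled independently of one another (for example on separate
    workers).
    """
    rng = random.Random(seed)
    return rng.sample(block, min(k, len(block)))


def sample_matrix(matrix: Matrix, rows_per_block: int, k: int) -> Matrix:
    """Build one sample matrix from the per-block samples."""
    sample: Matrix = []
    for block_id, block in enumerate(split_row_blocks(matrix, rows_per_block)):
        sample.extend(sample_block(block, k, seed=block_id))
    return sample


# Hypothetical 6 x 2 input matrix, split into three blocks of two rows.
data = [[float(r), float(r) * 10.0] for r in range(6)]
blocks = split_row_blocks(data, rows_per_block=2)
sampled = sample_matrix(data, rows_per_block=2, k=1)
```

Sampling per block means no step needs to hold the full matrix at once, which is the property the block layout is meant to preserve.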
-
Patent number: 10534590
Abstract: The embodiments described herein relate to recompiling an execution plan of a machine-learning program during runtime. An execution plan of a machine-learning program is compiled. In response to identifying a directed acyclic graph of high-level operations (HOP DAG) for recompilation during runtime, the execution plan is dynamically recompiled. The dynamic recompilation includes updating statistics and dynamically rewriting one or more operators of the identified HOP DAG, recomputing memory estimates of operators of the rewritten HOP DAG based on the updated statistics and rewritten operators, constructing a directed acyclic graph of low-level operations (LOP DAG) corresponding to the rewritten HOP DAG based in part on the recomputed memory estimates, and generating runtime instructions based on the LOP DAG.
Type: Grant
Filed: April 28, 2017
Date of Patent: January 14, 2020
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 10521435
Abstract: A method that includes generating, in a query pre-processor, a set of pre-computed materialized sub-graphs by executing a pre-processing dynamic random-walk based search for a bin of terms. The method also includes receiving, in a query processor, a search query having at least one search query term. In response to receiving the search query, the method includes accessing the set of pre-computed materialized sub-graphs. The accessing includes accessing a text index based on the search query term to retrieve a corresponding term group identifier and accessing the corresponding pre-computed materialized sub-graph based on the term group identifier. The method also includes executing a dynamic random-walk based search on only the corresponding pre-computed materialized sub-graph and based on the executing, retrieving nodes in the dataset and transmitting the nodes as results of the query.
Type: Grant
Filed: September 21, 2015
Date of Patent: December 31, 2019
Assignee: International Business Machines Corporation
Inventors: Andrey Balmin, Heasoo Hwang, Erik Nijkamp, Berthold Reinwald
-
Publication number: 20190171641
Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
Type: Application
Filed: February 8, 2019
Publication date: June 6, 2019
Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
-
Patent number: 10268461
Abstract: A method for global data flow optimization for machine learning (ML) programs. The method includes receiving, by a storage device, an initial plan for an ML program. A processor builds a nested global data flow graph representation using the initial plan. Operator directed acyclic graphs (DAGs) are connected using crossblock operators according to inter-block data dependencies. The initial plan for the ML program is re-written resulting in an optimized plan for the ML program with respect to its global data flow properties. The re-writing includes re-writes of: configuration dataflow properties, operator selection and structural changes.
Type: Grant
Filed: November 23, 2015
Date of Patent: April 23, 2019
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Mathias Peters, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 10228922
Abstract: Parallel execution of machine learning programs is provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent. Data required by the plurality of tasks is determined. An access pattern by the plurality of tasks of the data is determined. The data is partitioned based on the access pattern.
Type: Grant
Filed: January 12, 2016
Date of Patent: March 12, 2019
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
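The step of partitioning loop iterations into tasks can be sketched as follows. This is only one possible partitioning scheme (fixed-size contiguous chunks), chosen for illustration; the abstract does not say which scheme the execution plan uses, and the name `partition_iterations` and the `task_size` parameter are assumptions.

```python
from typing import List


def partition_iterations(num_iterations: int, task_size: int) -> List[List[int]]:
    """Group iteration indices 0..num_iterations-1 into tasks.

    Every task holds at least one iteration; the last task may be
    smaller than task_size when the count does not divide evenly.
    """
    if task_size < 1:
        raise ValueError("task_size must be at least 1")
    return [
        list(range(start, min(start + task_size, num_iterations)))
        for start in range(0, num_iterations, task_size)
    ]


# Ten independent iterations grouped into tasks of up to four.
tasks = partition_iterations(10, 4)
```

Because the iterations are independent, each task in `tasks` could be handed to a separate worker, and the data each task reads could then be partitioned along the same index ranges.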
-
Patent number: 10229168
Abstract: A computer-implemented method is provided that includes identifying an input dataset formatted as an input matrix, the input matrix including a plurality of rows and a plurality of columns. The computer-implemented method also includes dividing the input matrix into a plurality of input matrix blocks. Further, the computer-implemented method includes distributing the input matrix blocks to a plurality of different machines across a distributed filesystem, and sampling, by at least two of the different machines in parallel, at least two of the input matrix blocks. Finally, the computer-implemented method includes generating at least one sample matrix based on the sampling of the at least two of the input matrix blocks.
Type: Grant
Filed: November 20, 2015
Date of Patent: March 12, 2019
Assignee: International Business Machines Corporation
Inventors: Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Sebastian Schelter
-
Patent number: 10223762
Abstract: A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. Hierarchical aggregation spanning a memory hierarchy of the GPU for processing is performed for the identified computation including maintaining partial output vector results in shared memory of the GPU. Hierarchical aggregation for vectors is performed including performing intra-block aggregation for multiple thread blocks of a partial output vector results on GPU global memory.
Type: Grant
Filed: March 16, 2018
Date of Patent: March 5, 2019
Assignee: International Business Machines Corporation
Inventors: Arash Ashari, Matthias Boehm, Keith W. Campbell, Alexandre Evfimievski, John D. Keenleyside, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 10198291
Abstract: One embodiment provides a method for runtime piggybacking of concurrent data-parallel jobs in task-parallel machine learning (ML) programs including intercepting, by a processor, executable jobs including executable map reduce (MR) jobs and looped jobs in a job stream. The processor queues the executable jobs, and applies runtime piggybacking of multiple jobs by processing workers of different types. Runtime piggybacking for a ParFOR (parallel for) ML program is optimized including configuring the runtime piggybacking based on processing worker type, degree of parallelism and minimum time thresholds.
Type: Grant
Filed: March 7, 2017
Date of Patent: February 5, 2019
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20180260246
Abstract: One embodiment provides a method for runtime piggybacking of concurrent data-parallel jobs in task-parallel machine learning (ML) programs including intercepting, by a processor, executable jobs including executable map reduce (MR) jobs and looped jobs in a job stream. The processor queues the executable jobs, and applies runtime piggybacking of multiple jobs by processing workers of different types. Runtime piggybacking for a ParFOR (parallel for) ML program is optimized including configuring the runtime piggybacking based on processing worker type, degree of parallelism and minimum time thresholds.
Type: Application
Filed: March 7, 2017
Publication date: September 13, 2018
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20180211357
Abstract: A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. Hierarchical aggregation spanning a memory hierarchy of the GPU for processing is performed for the identified computation including maintaining partial output vector results in shared memory of the GPU. Hierarchical aggregation for vectors is performed including performing intra-block aggregation for multiple thread blocks of a partial output vector results on GPU global memory.
Type: Application
Filed: March 16, 2018
Publication date: July 26, 2018
Inventors: Arash Ashari, Matthias Boehm, Keith W. Campbell, Alexandre Evfimievski, John D. Keenleyside, Berthold Reinwald, Shirish Tatikonda