Patents by Inventor Shirish Tatikonda
Shirish Tatikonda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10534590
Abstract: The embodiments described herein relate to recompiling an execution plan of a machine-learning program during runtime. An execution plan of a machine-learning program is compiled. In response to identifying a directed acyclic graph of high-level operations (HOP DAG) for recompilation during runtime, the execution plan is dynamically recompiled. The dynamic recompilation includes updating statistics and dynamically rewriting one or more operators of the identified HOP DAG, recomputing memory estimates of operators of the rewritten HOP DAG based on the updated statistics and rewritten operators, constructing a directed acyclic graph of low-level operations (LOP DAG) corresponding to the rewritten HOP DAG based in part on the recomputed memory estimates, and generating runtime instructions based on the LOP DAG.
Type: Grant
Filed: April 28, 2017
Date of Patent: January 14, 2020
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
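The recompilation loop in the abstract — update statistics, recompute memory estimates, then emit low-level instructions — can be illustrated with a minimal sketch. All names here (`Hop`, `recompile`, the "CP"/"SPARK" backend labels, the byte budget) are illustrative assumptions, not the patented implementation.

```python
# Hedged sketch of runtime recompilation: sizes observed at runtime
# replace compile-time guesses, memory estimates are recomputed, and
# each operator is assigned a local or distributed backend.
from dataclasses import dataclass


@dataclass
class Hop:
    op: str
    rows: int
    cols: int

    def memory_estimate(self):
        # crude dense estimate: 8 bytes per cell (an assumption)
        return self.rows * self.cols * 8


def recompile(hop_dag, runtime_stats, budget_bytes):
    """Recompile a HOP DAG: update sizes from runtime statistics,
    recompute memory estimates, and emit (backend, op, rows, cols)
    runtime instructions."""
    instructions = []
    for hop in hop_dag:
        # 1) update statistics with the sizes actually observed at runtime
        if hop.op in runtime_stats:
            hop.rows, hop.cols = runtime_stats[hop.op]
        # 2) recompute the memory estimate and pick an execution backend
        backend = "CP" if hop.memory_estimate() <= budget_bytes else "SPARK"
        instructions.append((backend, hop.op, hop.rows, hop.cols))
    return instructions
```

For example, an operator whose output was estimated at a million rows but turns out to have a thousand can be re-planned onto the in-memory backend instead of the distributed one.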
-
Patent number: 10268461
Abstract: A method for global data flow optimization for machine learning (ML) programs. The method includes receiving, by a storage device, an initial plan for an ML program. A processor builds a nested global data flow graph representation using the initial plan. Operator directed acyclic graphs (DAGs) are connected using crossblock operators according to inter-block data dependencies. The initial plan for the ML program is re-written, resulting in an optimized plan for the ML program with respect to its global data flow properties. The re-writing includes re-writes of configuration dataflow properties, operator selection, and structural changes.
Type: Grant
Filed: November 23, 2015
Date of Patent: April 23, 2019
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Mathias Peters, Berthold Reinwald, Shirish Tatikonda
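The key structural idea — connecting per-block operator DAGs with cross-block edges derived from inter-block data dependencies — can be sketched as a last-writer analysis. The tuple layout and function name are assumptions for illustration, not the patented representation.

```python
# Hedged sketch: derive cross-block data-flow edges by tracking, for
# each variable, which block last wrote it; a later block that reads
# the variable gets a producer -> consumer edge.
def build_global_graph(blocks):
    """blocks: ordered list of (block_id, reads, writes) tuples.
    Returns cross-block edges as (producer_block, consumer_block, var)."""
    last_writer = {}
    edges = []
    for block_id, reads, writes in blocks:
        for var in reads:
            if var in last_writer:
                edges.append((last_writer[var], block_id, var))
        for var in writes:
            last_writer[var] = block_id
    return edges
```

The resulting edges are what a global optimizer can rewrite across — for example, keeping a matrix partitioned the same way on both sides of an edge instead of reshuffling it per block.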
-
Patent number: 10228922
Abstract: Parallel execution of machine learning programs is provided. Program code is received. The program code contains at least one parallel for statement having a plurality of iterations. A parallel execution plan is determined for the program code. According to the parallel execution plan, the plurality of iterations is partitioned into a plurality of tasks. Each task comprises at least one iteration. The iterations of each task are independent. Data required by the plurality of tasks is determined. An access pattern by the plurality of tasks of the data is determined. The data is partitioned based on the access pattern.
Type: Grant
Filed: January 12, 2016
Date of Patent: March 12, 2019
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Douglas Burdick, Berthold Reinwald, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian, Shivakumar Vaithyanathan
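Partitioning the iterations of a parallel for loop into independent tasks can be sketched with a simple fixed-size policy. The oversubscription `factor` and the function name are illustrative assumptions; the patent covers choosing such a plan, not this particular policy.

```python
# Hedged sketch: split iteration indices of a parallel for loop into
# tasks, a few tasks per worker so the scheduler has slack for load
# balancing. Each task's iterations are independent by construction.
def partition_iterations(num_iters, num_workers, factor=2):
    num_tasks = min(num_iters, num_workers * factor)
    base, rem = divmod(num_iters, num_tasks)
    tasks, start = [], 0
    for t in range(num_tasks):
        size = base + (1 if t < rem else 0)  # spread the remainder evenly
        tasks.append(list(range(start, start + size)))
        start += size
    return tasks
```

Once tasks are fixed, the data each task touches (its access pattern) determines how the input itself is partitioned, e.g. row-wise when each iteration reads one row.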
-
Patent number: 10223762
Abstract: A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. Hierarchical aggregation spanning a memory hierarchy of the GPU is performed for the identified computation, including maintaining partial output vector results in shared memory of the GPU. Hierarchical aggregation for vectors is performed, including intra-block aggregation of partial output vector results across multiple thread blocks in GPU global memory.
Type: Grant
Filed: March 16, 2018
Date of Patent: March 5, 2019
Assignee: International Business Machines Corporation
Inventors: Arash Ashari, Matthias Boehm, Keith W. Campbell, Alexandre Evfimievski, John D. Keenleyside, Berthold Reinwald, Shirish Tatikonda
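The two-stage aggregation described — per-thread-block partial results in shared memory, then a combining pass across blocks in global memory — can be shown with a CPU-side analogue in Python. The block size is an arbitrary illustrative choice, and the real technique runs as a GPU kernel, not as Python.

```python
# Hedged sketch: hierarchical reduction. Stage 1 reduces each "thread
# block" slice to a partial sum (stand-in for a shared-memory reduction);
# stage 2 combines the partials (stand-in for the inter-block pass over
# global memory).
def hierarchical_sum(vec, block_size=256):
    partials = [sum(vec[i:i + block_size])           # stage 1: per-block
                for i in range(0, len(vec), block_size)]
    return sum(partials)                             # stage 2: combine
```

The point of the hierarchy on a GPU is that stage 1 avoids contended atomic writes to global memory; only the much smaller set of partials needs a cross-block step.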
-
Patent number: 10198291
Abstract: One embodiment provides a method for runtime piggybacking of concurrent data-parallel jobs in task-parallel machine learning (ML) programs including intercepting, by a processor, executable jobs including executable map reduce (MR) jobs and looped jobs in a job stream. The processor queues the executable jobs, and applies runtime piggybacking of multiple jobs by processing workers of different types. Runtime piggybacking for a ParFOR (parallel for) ML program is optimized including configuring the runtime piggybacking based on processing worker type, degree of parallelism and minimum time thresholds.
Type: Grant
Filed: March 7, 2017
Date of Patent: February 5, 2019
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
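The piggybacking idea — intercepting queued data-parallel jobs and merging compatible ones so they share an execution — can be sketched as grouping by worker type with a degree-of-parallelism cap. The grouping policy, `max_group` parameter, and worker-type labels are assumptions for illustration only.

```python
# Hedged sketch: queued jobs of the same worker type are piggybacked
# into shared groups, capped by a degree-of-parallelism limit.
from collections import defaultdict


def piggyback(queue, max_group=4):
    """queue: list of (job_id, worker_type) in arrival order.
    Returns groups as (worker_type, [job_ids]) that run together."""
    by_type = defaultdict(list)
    for job_id, worker_type in queue:
        by_type[worker_type].append(job_id)
    groups = []
    for worker_type, jobs in by_type.items():
        for i in range(0, len(jobs), max_group):
            groups.append((worker_type, jobs[i:i + max_group]))
    return groups
```

Merging jobs this way amortizes job-launch overhead, which matters when a ParFOR loop emits many small MR jobs concurrently.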
-
Publication number: 20180260246
Abstract: One embodiment provides a method for runtime piggybacking of concurrent data-parallel jobs in task-parallel machine learning (ML) programs including intercepting, by a processor, executable jobs including executable map reduce (MR) jobs and looped jobs in a job stream. The processor queues the executable jobs, and applies runtime piggybacking of multiple jobs by processing workers of different types. Runtime piggybacking for a ParFOR (parallel for) ML program is optimized including configuring the runtime piggybacking based on processing worker type, degree of parallelism and minimum time thresholds.
Type: Application
Filed: March 7, 2017
Publication date: September 13, 2018
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20180211357
Abstract: A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. Hierarchical aggregation spanning a memory hierarchy of the GPU is performed for the identified computation, including maintaining partial output vector results in shared memory of the GPU. Hierarchical aggregation for vectors is performed, including intra-block aggregation of partial output vector results across multiple thread blocks in GPU global memory.
Type: Application
Filed: March 16, 2018
Publication date: July 26, 2018
Inventors: Arash Ashari, Matthias Boehm, Keith W. Campbell, Alexandre Evfimievski, John D. Keenleyside, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 9972063
Abstract: A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. An optimized fused GPU kernel is employed to exploit temporal locality for inherent data-flow dependencies in the identified computation. Hierarchical aggregation spanning a memory hierarchy of the GPU is performed for the identified computation. GPU kernel launch parameters are estimated following an analytical model that maximizes thread occupancy and minimizes atomic writes to GPU global memory.
Type: Grant
Filed: July 30, 2015
Date of Patent: May 15, 2018
Assignee: International Business Machines Corporation
Inventors: Arash Ashari, Matthias Boehm, Keith W. Campbell, Alexandre Evfimievski, John D. Keenleyside, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20170228222
Abstract: The embodiments described herein relate to recompiling an execution plan of a machine-learning program during runtime. An execution plan of a machine-learning program is compiled. In response to identifying a directed acyclic graph of high-level operations (HOP DAG) for recompilation during runtime, the execution plan is dynamically recompiled. The dynamic recompilation includes updating statistics and dynamically rewriting one or more operators of the identified HOP DAG, recomputing memory estimates of operators of the rewritten HOP DAG based on the updated statistics and rewritten operators, constructing a directed acyclic graph of low-level operations (LOP DAG) corresponding to the rewritten HOP DAG based in part on the recomputed memory estimates, and generating runtime instructions based on the LOP DAG.
Type: Application
Filed: April 28, 2017
Publication date: August 10, 2017
Applicant: International Business Machines Corporation
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 9715373
Abstract: The embodiments described herein relate to recompiling an execution plan of a machine-learning program during runtime. An execution plan of a machine-learning program is compiled. In response to identifying a directed acyclic graph of high-level operations (HOP DAG) for recompilation during runtime, the execution plan is dynamically recompiled. The dynamic recompilation includes updating statistics and dynamically rewriting one or more operators of the identified HOP DAG, recomputing memory estimates of operators of the rewritten HOP DAG based on the updated statistics and rewritten operators, constructing a directed acyclic graph of low-level operations (LOP DAG) corresponding to the rewritten HOP DAG based in part on the recomputed memory estimates, and generating runtime instructions based on the LOP DAG.
Type: Grant
Filed: December 18, 2015
Date of Patent: July 25, 2017
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20170177312
Abstract: The embodiments described herein relate to recompiling an execution plan of a machine-learning program during runtime. An execution plan of a machine-learning program is compiled. In response to identifying a directed acyclic graph of high-level operations (HOP DAG) for recompilation during runtime, the execution plan is dynamically recompiled. The dynamic recompilation includes updating statistics and dynamically rewriting one or more operators of the identified HOP DAG, recomputing memory estimates of operators of the rewritten HOP DAG based on the updated statistics and rewritten operators, constructing a directed acyclic graph of low-level operations (LOP DAG) corresponding to the rewritten HOP DAG based in part on the recomputed memory estimates, and generating runtime instructions based on the LOP DAG.
Type: Application
Filed: December 18, 2015
Publication date: June 22, 2017
Applicant: International Business Machines Corporation
Inventors: Matthias Boehm, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 9684493
Abstract: In a method for analyzing a large data set using a statistical computing environment language operation, a processor generates code from the statistical computing environment language operation that can be understood by a software system for processing machine learning algorithms in a MapReduce environment. A processor transfers the code to the software system for processing machine learning algorithms in a MapReduce environment. A processor invokes execution of the code with the software system for processing machine learning algorithms in a MapReduce environment.
Type: Grant
Filed: June 2, 2014
Date of Patent: June 20, 2017
Assignee: International Business Machines Corporation
Inventors: Matthias Boehm, Douglas R. Burdick, Stefan Burnicki, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20170147943
Abstract: A method for global data flow optimization for machine learning (ML) programs. The method includes receiving, by a storage device, an initial plan for an ML program. A processor builds a nested global data flow graph representation using the initial plan. Operator directed acyclic graphs (DAGs) are connected using crossblock operators according to inter-block data dependencies. The initial plan for the ML program is re-written, resulting in an optimized plan for the ML program with respect to its global data flow properties. The re-writing includes re-writes of configuration dataflow properties, operator selection, and structural changes.
Type: Application
Filed: November 23, 2015
Publication date: May 25, 2017
Inventors: Matthias Boehm, Mathias Peters, Berthold Reinwald, Shirish Tatikonda
-
Patent number: 9652374
Abstract: Embodiments of the invention relate to sparsity-driven matrix representation. In one embodiment, a sparsity of a matrix is determined and the sparsity is compared to a threshold. Computer memory is allocated to store the matrix in a first data structure format based on the sparsity being greater than the threshold. Computer memory is allocated to store the matrix in a second data structure format based on the sparsity not being greater than the threshold.
Type: Grant
Filed: August 31, 2016
Date of Patent: May 16, 2017
Assignee: International Business Machines Corporation
Inventors: Berthold Reinwald, Shirish Tatikonda, Yuanyuan Tian
-
Publication number: 20170032487
Abstract: A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. An optimized fused GPU kernel is employed to exploit temporal locality for inherent data-flow dependencies in the identified computation. Hierarchical aggregation spanning a memory hierarchy of the GPU is performed for the identified computation. GPU kernel launch parameters are estimated following an analytical model that maximizes thread occupancy and minimizes atomic writes to GPU global memory.
Type: Application
Filed: July 30, 2015
Publication date: February 2, 2017
Inventors: Arash Ashari, Matthias Boehm, Keith W. Campbell, Alexandre Evfimievski, John D. Keenleyside, Berthold Reinwald, Shirish Tatikonda
-
Publication number: 20160364327
Abstract: Embodiments of the invention relate to sparsity-driven matrix representation. In one embodiment, a sparsity of a matrix is determined and the sparsity is compared to a threshold. Computer memory is allocated to store the matrix in a first data structure format based on the sparsity being greater than the threshold.
Type: Application
Filed: August 31, 2016
Publication date: December 15, 2016
Inventors: Berthold Reinwald, Shirish Tatikonda, Yuanyuan Tian
-
Patent number: 9454472
Abstract: Embodiments of the invention relate to sparsity-driven matrix representation. In one embodiment, a sparsity of a matrix is determined and the sparsity is compared to a threshold. Computer memory is allocated to store the matrix in a first data structure format based on the sparsity being greater than the threshold. Computer memory is allocated to store the matrix in a second data structure format based on the sparsity not being greater than the threshold.
Type: Grant
Filed: April 15, 2016
Date of Patent: September 27, 2016
Assignee: International Business Machines Corporation
Inventors: Berthold Reinwald, Shirish Tatikonda, Yuanyuan Tian
-
Publication number: 20160217066
Abstract: Embodiments of the invention relate to sparsity-driven matrix representation. In one embodiment, a sparsity of a matrix is determined and the sparsity is compared to a threshold. Computer memory is allocated to store the matrix in a first data structure format based on the sparsity being greater than the threshold.
Type: Application
Filed: April 15, 2016
Publication date: July 28, 2016
Inventors: Berthold Reinwald, Shirish Tatikonda, Yuanyuan Tian
-
Patent number: 9400767
Abstract: Embodiments relate to subgraph-based distributed graph processing. An aspect includes receiving an input graph comprising a plurality of vertices. Another aspect includes partitioning the input graph into a plurality of subgraphs, each subgraph comprising internal vertices and boundary vertices. Another aspect includes assigning one or more respective subgraphs to each of a plurality of workers. Another aspect includes initiating processing of the plurality of subgraphs by performing a series of processing steps comprising: processing the internal vertices and boundary vertices internally within each of the subgraphs; detecting that a change was made to a boundary vertex of a first subgraph during the internal processing; and sending a message from a first worker to which the first subgraph is assigned to a second worker to which a second subgraph is assigned in response to detecting the change that was made to the boundary vertex of the first subgraph.
Type: Grant
Filed: December 17, 2013
Date of Patent: July 26, 2016
Assignee: International Business Machines Corporation
Inventors: Andrey Balmin, Severin A. Corsten, John A. McPherson, Jr., Shirish Tatikonda, Yuanyuan Tian
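One superstep of the subgraph-centric model in the abstract — process each subgraph internally, detect changes on boundary vertices, and message the worker owning the neighboring subgraph — can be sketched as follows. The data layout (`internal`/`boundary` dictionaries) and function names are assumptions for illustration.

```python
# Hedged sketch of one superstep: internal vertices are updated locally;
# a changed boundary vertex produces a message to the worker that owns
# the adjacent subgraph.
def superstep(subgraphs, values, update):
    """subgraphs: {worker: {"internal": set_of_vertices,
                             "boundary": {vertex: neighboring_worker}}}
    values: {vertex: value}; update(vertex, values) -> new value.
    Mutates `values`; returns [(src_worker, dst_worker, vertex, new_value)]."""
    messages = []
    for worker, sg in subgraphs.items():
        for v in sg["internal"]:
            values[v] = update(v, values)          # purely local work
        for v, other_worker in sg["boundary"].items():
            new = update(v, values)
            if new != values[v]:                   # change on a boundary vertex
                values[v] = new
                messages.append((worker, other_worker, v, new))
    return messages
```

Compared with vertex-centric models, processing whole subgraphs internally lets many updates converge locally before any cross-worker message is sent.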
-
Patent number: 9396164
Abstract: Embodiments of the invention relate to sparsity-driven matrix representation. In one embodiment, a sparsity of a matrix is determined and the sparsity is compared to a threshold. Computer memory is allocated to store the matrix in a first data structure format based on the sparsity being greater than the threshold. Computer memory is allocated to store the matrix in a second data structure format based on the sparsity not being greater than the threshold.
Type: Grant
Filed: October 21, 2013
Date of Patent: July 19, 2016
Assignee: International Business Machines Corporation
Inventors: Berthold Reinwald, Shirish Tatikonda, Yuanyuan Tian
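The threshold test in the abstract can be illustrated with a minimal sketch. The abstract does not fix which format is "first" or how sparsity is measured, so the choices below — sparsity as the fraction of zero cells, a dictionary-of-keys layout for the sparse case, and the 0.5 threshold — are all assumptions for illustration.

```python
# Hedged sketch of sparsity-driven format selection: measure the
# fraction of zero cells and allocate either a sparse {(i, j): value}
# structure or keep the dense row-major layout.
def choose_format(mat, threshold=0.5):
    """mat: non-empty list of equal-length rows.
    Returns ("sparse", coo_dict) or ("dense", mat)."""
    rows, cols = len(mat), len(mat[0])
    zeros = sum(1 for row in mat for v in row if v == 0)
    sparsity = zeros / (rows * cols)
    if sparsity > threshold:
        coo = {(i, j): v for i, row in enumerate(mat)
               for j, v in enumerate(row) if v != 0}
        return "sparse", coo
    return "dense", mat
```

The payoff is memory: a mostly-zero matrix stored sparsely costs space proportional to its nonzeros rather than to its full dimensions, while a dense matrix avoids per-entry index overhead.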