Patents by Inventor Vinod Grover

Vinod Grover has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190108006
    Abstract: System and method of compiling a program having a mixture of host code and device code to enable code coverage data collection for device code execution. An exemplary integrated compiler can compile source code programmed to be executed by a host processor (e.g., CPU) and a co-processor (e.g., a GPU) concurrently. The compilation can generate an instrumented executable code which includes: coverage instrumentation counters for the device functions; mapping information that maps the counters with the instrumented source points; and instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected code coverage information from the device memory to the host memory. Execution of the instrumented executable can yield a coverage report on the device code functions.
    Type: Application
    Filed: October 8, 2018
    Publication date: April 11, 2019
    Inventors: Hariharan Sandanagobalane, Sean Lee, Vinod Grover
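    Illustrative example: a minimal hand-written sketch of the flow this abstract describes, rather than compiler-generated output. A device-side counter array records which instrumented points execute, the host allocates and zeroes it before the run, and the counts are copied back for a report; the mapping from counter index to source point is shown only as comments, and all names are illustrative.
```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_COVERAGE_POINTS 2

// Device-side coverage counters, one per instrumented source point.
__device__ unsigned int coverage_counters[NUM_COVERAGE_POINTS];

__global__ void kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    atomicAdd(&coverage_counters[0], 1u);     // point 0: kernel body executed
    if (in[i] < 0.0f) {
        atomicAdd(&coverage_counters[1], 1u); // point 1: negative-input branch
        out[i] = 0.0f;
    } else {
        out[i] = in[i];
    }
}

int main() {
    const int n = 256;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    // Host side: initialize the counters before running the device code.
    unsigned int zeros[NUM_COVERAGE_POINTS] = {0, 0};
    cudaMemcpyToSymbol(coverage_counters, zeros, sizeof(zeros));

    kernel<<<(n + 127) / 128, 128>>>(d_in, d_out, n);

    // Host side: retrieve the collected coverage information.
    unsigned int counts[NUM_COVERAGE_POINTS];
    cudaMemcpyFromSymbol(counts, coverage_counters, sizeof(counts));
    for (int p = 0; p < NUM_COVERAGE_POINTS; ++p)
        printf("coverage point %d hit %u times\n", p, counts[p]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```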
  • Patent number: 10241761
    Abstract: A system and method for processing source code for compilation. The method includes accessing a portion of host source code and determining whether the portion of the host source code comprises a device lambda expression. The method further includes in response to the portion of host code comprising the device lambda expression, determining a unique placeholder type instantiation based on the device lambda expression and modifying the device lambda expression based on the unique placeholder type instantiation to produce modified host source code. The method further includes sending the modified host source code to a host compiler.
    Type: Grant
    Filed: December 14, 2015
    Date of Patent: March 26, 2019
    Assignee: Nvidia Corporation
    Inventors: Jaydeep Marathe, Vinod Grover
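    Illustrative example: the "device lambda expression" here corresponds to what nvcc exposes as an extended lambda, i.e. a lambda annotated __device__ inside host code. A minimal sketch, assuming compilation with nvcc and the --extended-lambda flag; the kernel and names are illustrative, not taken from the patent.
```cuda
#include <cuda_runtime.h>

// A templated kernel that applies an arbitrary callable to each element.
template <typename F>
__global__ void apply_each(float* data, int n, F f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = f(data[i]);
}

int main() {
    const int n = 128;
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    float scale = 2.0f;
    // A device lambda written in host source; the CUDA front end rewrites it
    // (the abstract's "unique placeholder type instantiation") before handing
    // the modified translation unit to the host compiler.
    auto op = [=] __device__ (float x) { return scale * x + 1.0f; };

    apply_each<<<(n + 127) / 128, 128>>>(d_data, n, op);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```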
  • Publication number: 20190087164
    Abstract: A device compiler and linker is configured to optimize program code of a co-processor enabled application by resolving generic memory access operations within that program code to target specific memory spaces. In situations where a generic memory access operation cannot be resolved and may target constant memory, constant variables associated with those generic memory access operations are transferred to reside in global memory.
    Type: Application
    Filed: November 19, 2018
    Publication date: March 21, 2019
    Inventors: Xiangyun Kong, Jian-Zhong Wang, Yuan Lin, Vinod Grover
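    Illustrative example: a small picture of the situation this abstract targets. A __device__ helper receives a generic pointer that may refer to either __constant__ or global memory, so the compiler cannot always resolve the access to one memory space; whether the constant variable is then moved to global memory is a compiler decision, noted here only in comments. Names are illustrative.
```cuda
#include <cuda_runtime.h>

__constant__ float c_table[16];   // constant-memory variable
__device__   float g_table[16];   // global-memory variable

// Generic memory access: 'p' may point into constant or global memory,
// so loads through it cannot always be resolved to a specific space.
__device__ float sum4(const float* p) {
    return p[0] + p[1] + p[2] + p[3];
}

__global__ void kernel(float* out, int use_constant) {
    // If the compiler cannot prove which table is accessed, and the generic
    // access may target constant memory, the technique described above moves
    // the constant variable to reside in global memory instead.
    const float* p = use_constant ? c_table : g_table;
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum4(p);
}
```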
  • Patent number: 10152310
    Abstract: A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations corresponds to instructions from the source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.
    Type: Grant
    Filed: May 27, 2015
    Date of Patent: December 11, 2018
    Assignee: Nvidia Corporation
    Inventors: Mahesh Ravishankar, Paulius Micikevicius, Vinod Grover
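    Illustrative example: a hand-written sketch of the effect this abstract describes (the patent's contribution is doing it automatically). Two elementwise operations are fused and computed one subdivision (tile) at a time, so the intermediate result never occupies a full-size buffer. The tile size and function names are illustrative, and the caller is assumed to have sized the output to match the input.
```cuda
#include <algorithm>
#include <cstddef>
#include <vector>

// Unfused: b = f(a); c = g(b); materializes a full-size intermediate 'b'.
// Fused with subdivisions: compute c tile by tile, keeping only a
// tile-sized intermediate buffer live at any time.
static float f(float x) { return x * 2.0f; }
static float g(float x) { return x + 1.0f; }

void fused_tiled(const std::vector<float>& a, std::vector<float>& c,
                 std::size_t tile = 256) {
    std::vector<float> tmp(tile);                  // intermediate for one tile only
    for (std::size_t base = 0; base < a.size(); base += tile) {
        std::size_t len = std::min(tile, a.size() - base);
        for (std::size_t i = 0; i < len; ++i)      // input needed for this subdivision
            tmp[i] = f(a[base + i]);               // intermediate operation
        for (std::size_t i = 0; i < len; ++i)      // final subdivision output
            c[base + i] = g(tmp[i]);
    }
}
```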
  • Patent number: 10152312
    Abstract: Compiler techniques for inline parallelism and re-targetable parallel runtime execution of logic iterators enable selection thereof from the source code or dynamically during object code execution.
    Type: Grant
    Filed: January 21, 2015
    Date of Patent: December 11, 2018
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Thibaut Lutz
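    Illustrative example: the abstract is terse; one reading is that the same iterator-driven loop can be bound to a serial or a parallel runtime, chosen in the source or at execution time. This is a minimal sketch of that idea using std::thread, not the patented mechanism; the enum, function, and parameter names are assumptions.
```cuda
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

enum class Backend { Serial, Parallel };

// Apply 'body' to every index in [0, n), on the backend selected at runtime.
void for_each_index(std::size_t n, Backend backend,
                    const std::function<void(std::size_t)>& body) {
    if (backend == Backend::Serial) {
        for (std::size_t i = 0; i < n; ++i) body(i);
        return;
    }
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([=, &body] {
            for (std::size_t i = w; i < n; i += workers) body(i);  // disjoint indices
        });
    for (auto& t : pool) t.join();
}

int main() {
    std::vector<int> out(1000);
    // The same loop body, retargeted by a runtime choice of backend.
    for_each_index(out.size(), Backend::Parallel,
                   [&](std::size_t i) { out[i] = static_cast<int>(i) * 2; });
    return 0;
}
```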
  • Patent number: 10067768
    Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
    Type: Grant
    Filed: July 13, 2015
    Date of Patent: September 4, 2018
    Assignee: NVIDIA Corporation
    Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky
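    Illustrative example: the mechanism itself lives in the compiler and the scheduler, but its effect is visible at the source level: threads diverge onto different paths and a barrier that all of them participate in brings them back together before dependent work proceeds. A minimal CUDA sketch with the correspondence to the abstract noted in comments; the kernel is illustrative and assumes a launch with 128 threads per block.
```cuda
#include <cuda_runtime.h>

__global__ void divergent_then_converge(const int* flags, float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ float partial[128];   // assumes 128 threads per block

    if (i < n) {
        // All threads of the block "participate in the convergence barrier".
        if (flags[i]) {
            partial[threadIdx.x] = data[i] * 2.0f;   // first divergent portion
        } else {
            partial[threadIdx.x] = data[i] + 1.0f;   // second divergent portion
        }
    } else {
        partial[threadIdx.x] = 0.0f;
    }

    // Convergence point: threads block here until all participating threads
    // arrive; then the barrier is cleared and they continue together.
    __syncthreads();

    if (i < n) data[i] = partial[threadIdx.x] + partial[threadIdx.x ^ 1];
}
```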
  • Publication number: 20180203673
    Abstract: A computation graph is accessed. In the computation graph, operations to be performed are represented as interior nodes, inputs to the operations are represented as leaf nodes, and a result of the operations is represented as a root. Selected sets of the operations are combined to form respective kernels of operations. Code is generated to execute the kernels of operations. The code is executed to determine the result.
    Type: Application
    Filed: January 16, 2018
    Publication date: July 19, 2018
    Inventors: Mahesh Ravishankar, Vinod Grover, Evghenii Gaburov, Alberto Magni, Sean Lee
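    Illustrative example: a tiny computation graph with inputs at the leaves, operations at interior nodes, and the result at the root. Here a whole elementwise subgraph is evaluated per element with no intermediate buffers, standing in for the code the patent would generate for a fused kernel; the Node structure and names are assumptions.
```cuda
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Leaves hold inputs, interior nodes hold elementwise operations,
// and the root is the requested result.
struct Node {
    std::function<float(float, float)> op;         // set on interior nodes
    std::shared_ptr<Node> lhs, rhs;                // children (null on leaves)
    const std::vector<float>* input = nullptr;     // set on leaf nodes

    float eval_at(std::size_t i) const {           // one fused "kernel": the
        if (input) return (*input)[i];             // subgraph is evaluated per
        return op(lhs->eval_at(i), rhs->eval_at(i)); // element, no intermediates
    }
};

std::vector<float> run(const Node& root, std::size_t n) {
    std::vector<float> result(n);
    for (std::size_t i = 0; i < n; ++i) result[i] = root.eval_at(i);
    return result;
}

int main() {
    std::vector<float> a(16, 1.0f), b(16, 2.0f), c(16, 3.0f);
    auto la = std::make_shared<Node>(); la->input = &a;
    auto lb = std::make_shared<Node>(); lb->input = &b;
    auto lc = std::make_shared<Node>(); lc->input = &c;
    auto add = std::make_shared<Node>();            // interior node: a + b
    add->op = [](float x, float y) { return x + y; };
    add->lhs = la; add->rhs = lb;
    Node root;                                      // root: (a + b) * c
    root.op = [](float x, float y) { return x * y; };
    root.lhs = add; root.rhs = lc;
    auto result = run(root, a.size());
    (void)result;
    return 0;
}
```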
  • Patent number: 10025643
    Abstract: A system and method for compiling source code (e.g., with a compiler). The method includes accessing a portion of device source code and determining whether the portion of the device source code comprises a piece of work to be launched on a device from the device. The method further includes determining a plurality of application programming interface (API) calls based on the piece of work to be launched on the device and generating compiled code based on the plurality of API calls. The compiled code comprises a first portion operable to execute on a central processing unit (CPU) and a second portion operable to execute on the device (e.g., GPU).
    Type: Grant
    Filed: January 7, 2013
    Date of Patent: July 17, 2018
    Assignee: Nvidia Corporation
    Inventors: Vinod Grover, Jaydeep Marathe, Sean Lee
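    Illustrative example: the "piece of work to be launched on a device from the device" is what CUDA exposes as dynamic parallelism, a kernel launch written inside device code that the compiler lowers to device-side runtime API calls. A minimal sketch, assuming compilation with nvcc -rdc=true for a GPU of compute capability 3.5 or newer; names are illustrative.
```cuda
#include <cuda_runtime.h>

__global__ void child(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

__global__ void parent(float* data, int n) {
    // A launch written in device source; the compiler turns this into the
    // device-side API calls the abstract refers to.
    if (threadIdx.x == 0 && blockIdx.x == 0)
        child<<<(n + 127) / 128, 128>>>(data, n);
}

int main() {
    const int n = 1024;
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));
    parent<<<1, 32>>>(d_data, n);   // the CPU portion launches the parent kernel
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```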
  • Patent number: 9952843
    Abstract: A solution is proposed for implementing staging in computer programs and code specialization at runtime. Even when values are not known at compile time, many of the values used as parameters for a code section or a function are constant, and are known prior to starting the computation of the algorithm. Embodiments of the claimed subject matter propagate these values just before execution in the same way a compiler would if they were compile time constant, resulting in improved control flow and significant simplification in the computation involved.
    Type: Grant
    Filed: May 15, 2015
    Date of Patent: April 24, 2018
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Thibaut Lutz
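    Illustrative example: a minimal picture of the idea, not the patented staging machinery. A parameter that is not a compile-time constant is nevertheless fixed before the computation starts, so a variant specialized for the common value is selected just before execution, simplifying control flow in the hot loop. Function and parameter names are assumptions.
```cuda
#include <cstddef>

// General version: the stride is a runtime parameter consulted in the loop.
void scale_general(float* data, std::size_t n, std::size_t stride, float s) {
    for (std::size_t i = 0; i < n; i += stride) data[i] *= s;
}

// Variant written as if stride == 1 were a compile-time constant.
void scale_stride1(float* data, std::size_t n, float s) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= s;   // contiguous, simpler flow
}

// "Staging": the value is known just before the computation runs, so the
// specialized variant is chosen at that point rather than at compile time.
void scale(float* data, std::size_t n, std::size_t stride, float s) {
    if (stride == 1) scale_stride1(data, n, s);
    else             scale_general(data, n, stride, s);
}
```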
  • Patent number: 9798569
    Abstract: A system for and method of retrieving values of captured local variables for a lambda function in Java. In one embodiment, the system includes: (1) a Java virtual machine and (2) a captured variable retriever that interacts with the Java virtual machine and is configured to retrieve a signature of the lambda function from a classfile of a Java class containing the lambda function, compare the signature with a declaration of the lambda function to identify arguments corresponding to the captured local variables, modify the lambda function, and cause the Java virtual machine to execute the modified lambda function.
    Type: Grant
    Filed: February 15, 2016
    Date of Patent: October 24, 2017
    Assignee: Nvidia Corporation
    Inventors: Michael Lai, Vinod Grover, Sean Lee, Jaydeep Marathe
  • Publication number: 20170235586
    Abstract: A system for and method of retrieving values of captured local variables for a lambda function in Java. In one embodiment, the system includes: (1) a Java virtual machine and (2) a captured variable retriever that interacts with the Java virtual machine and is configured to retrieve a signature of the lambda function from a classfile of a Java class containing the lambda function, compare the signature with a declaration of the lambda function to identify arguments corresponding to the captured local variables, modify the lambda function, and cause the Java virtual machine to execute the modified lambda function.
    Type: Application
    Filed: February 15, 2016
    Publication date: August 17, 2017
    Inventors: Michael Lai, Vinod Grover, Sean Lee, Jaydeep Marathe
  • Patent number: 9678775
    Abstract: Computer code written to execute on a multi-threaded computing environment is transformed into code designed to execute on a single-threaded computing environment and simulate concurrently executing threads. Optimization techniques during the transformation process are utilized to identify local variables for scalar expansion. A first set of local variables is defined that includes those local variables in the code identified as “Downward exposed Defined” (DD). A second set of local variables is defined that includes those local variables in the code identified as “Upward exposed Use” (UU). The intersection of the first set and the second set identifies local variables for scalar expansion.
    Type: Grant
    Filed: February 26, 2009
    Date of Patent: June 13, 2017
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, John A. Stratton
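    Illustrative example: a small sketch of the set computation the abstract describes, with the dataflow facts hard-coded as assumptions. A local whose definition is downward exposed in one region and whose use is upward exposed in a later region must carry its value across simulated thread loops, so DD ∩ UU is exactly the set needing scalar expansion.
```cuda
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <set>
#include <string>

int main() {
    // Dataflow facts for locals, keyed by variable name (illustrative values).
    // DD: locals whose definitions are downward exposed (reach the region exit).
    // UU: locals whose uses are upward exposed (read before any definition).
    std::set<std::string> DD = {"t", "i", "acc"};
    std::set<std::string> UU = {"t", "acc", "n"};

    // Locals needing scalar expansion are exactly DD ∩ UU: their value is
    // produced in one thread loop and consumed in a later one.
    std::set<std::string> expand;
    std::set_intersection(DD.begin(), DD.end(), UU.begin(), UU.end(),
                          std::inserter(expand, expand.begin()));

    for (const auto& v : expand)
        std::printf("expand local '%s' into a per-thread array\n", v.c_str());
    return 0;
}
```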
  • Publication number: 20170147299
    Abstract: A system and method for optimizing multiple invocations of a graphics processing unit (GPU) program in Java. In one embodiment, the system includes: (1) a frontend component in a computer system configured to compile Java bytecode associated with a class object that implements a functional interface into Intermediate Representation (IR) code and store the IR code with the associated jogArray and (2) a collector/composer component in the computer system, associated with the frontend and configured, when a result of the GPU program is explicitly requested to be transferred to a host, to traverse a tree containing the multiple invocations from that result, collect the IR code, and compose the collected IR code into aggregate IR code.
    Type: Application
    Filed: November 24, 2015
    Publication date: May 25, 2017
    Inventors: Michael Lai, Vinod Grover, Sean Lee, Jaydeep Marathe
  • Patent number: 9639336
    Abstract: One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: May 2, 2017
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Manjunath Kudlur, Michael Murphy
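    Illustrative example: the pass operates on a DAG of assembly instructions; at the source level its effect resembles replacing adjacent scalar memory operations with a single vector operation. A minimal CUDA before/after sketch of that effect, with the DAG-level rewriting described only in comments; the kernels and names are illustrative.
```cuda
#include <cuda_runtime.h>

// Before: two scalar loads and two scalar stores per pair of elements.
__global__ void copy_scalar(const float* in, float* out, int n) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * 2;
    if (i + 1 < n) {
        out[i]     = in[i];       // two adjacent scalar instructions that a
        out[i + 1] = in[i + 1];   // DAG-level pass could identify as vectorizable
    }
}

// After: one vectorized load and one vectorized store replace each pair,
// mirroring the replacement of two assembly instructions with a single
// vectorized assembly instruction.
__global__ void copy_vectorized(const float2* in, float2* out, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) out[i] = in[i];
}
```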
  • Patent number: 9612811
    Abstract: One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: April 4, 2017
    Assignee: NVIDIA Corporation
    Inventors: Amit Jayant Sabne, Yuan Lin, Vinod Grover
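    Illustrative example: the transformation operates on the compiler's control flow graph; this sketch only shows the resulting source-level shape. A predicate variable is set to true for the threads that must run the "external node", the original branch is allowed to reconverge, and a later, inserted branch (the second divergent node) guards execution of that node's control flow path. Function names are assumptions.
```cuda
#include <cuda_runtime.h>

__device__ void work_a(float* d, int i)   { d[i] *= 2.0f; }
__device__ void work_b(float* d, int i)   { d[i] += 1.0f; }
__device__ void external(float* d, int i) { d[i] -= 3.0f; }   // the "external node"

// Original shape (for reference): 'external' was reachable along only one of
// two non-overlapping paths from the divergent branch, so the branch did not
// reconverge cleanly at a post-dominator.
//
// Transformed shape: a predicate records which threads must run the external
// node, and an inserted second divergent node executes it after reconvergence.
__global__ void transformed(float* data, const int* flags, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    bool run_external = false;            // the inserted predicate variable
    if (flags[i] > 0) {
        work_a(data, i);
        run_external = true;              // set for threads that take this path
    } else {
        work_b(data, i);
    }
    // Threads reconverge here; the inserted second divergent node then guards
    // the control flow path associated with the external node.
    if (run_external) external(data, i);
}
```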
  • Patent number: 9563933
    Abstract: Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging a predecessor vertex with a successor vertex by combining the operations of the two vertices if they are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.
    Type: Grant
    Filed: January 28, 2014
    Date of Patent: February 7, 2017
    Assignee: Nvidia Corporation
    Inventors: Vinod Grover, Mahesh Ravishankar
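    Illustrative example: a small sketch of the merge rule in the abstract. A predecessor is folded into its successor when a directed edge connects them and the predecessor has exactly one outgoing edge, so the buffer that would have held that edge's output is no longer needed. The graph representation and names are assumptions.
```cuda
#include <cstdio>
#include <string>
#include <vector>

struct Vertex {
    std::string op;            // the operation this vertex performs
    std::vector<int> succs;    // outgoing directed edges (by vertex index)
    bool merged_away;          // true once folded into a successor
};

// Merge every predecessor that has exactly one outgoing edge into its
// successor; each merge removes one edge, i.e. one intermediate buffer.
void merge_single_use(std::vector<Vertex>& g) {
    for (std::size_t p = 0; p < g.size(); ++p) {
        if (g[p].merged_away || g[p].succs.size() != 1) continue;
        Vertex& s = g[g[p].succs[0]];
        s.op = g[p].op + " -> " + s.op;    // combine the two operations
        g[p].succs.clear();
        g[p].merged_away = true;
    }
}

int main() {
    // Chain load(0) -> scale(1) -> store(2); each predecessor has one edge out.
    std::vector<Vertex> g = {{"load", {1}, false},
                             {"scale", {2}, false},
                             {"store", {}, false}};
    merge_single_use(g);
    for (const auto& v : g)
        if (!v.merged_away) std::printf("merged operation: %s\n", v.op.c_str());
    return 0;
}
```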
  • Publication number: 20160350088
    Abstract: A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations corresponds to instructions from the source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.
    Type: Application
    Filed: May 27, 2015
    Publication date: December 1, 2016
    Inventors: Mahesh Ravishankar, Paulius Micikevicius, Vinod Grover
  • Patent number: 9448779
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 20, 2009
    Date of Patent: September 20, 2016
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Jayant B. Kolhe, John Bryan Pormann, Douglas Saylor
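    Illustrative example: a hand-written picture of what such a translation produces. The kernel body is partitioned into synchronization-independent regions at the barrier, and each region is wrapped in a thread loop so a single CPU thread simulates the whole block; a value that must survive across regions is kept in a per-thread array. The original kernel appears only in comments, and the names are illustrative.
```cuda
#include <vector>

const int BLOCK_SIZE = 128;   // simulated thread count for one block

// Original GPU kernel, for reference:
//   __global__ void k(const float* in, float* out) {
//     __shared__ float s[BLOCK_SIZE];
//     s[threadIdx.x] = in[threadIdx.x] * 2.0f;
//     __syncthreads();
//     out[threadIdx.x] = s[(threadIdx.x + 1) % BLOCK_SIZE];
//   }
//
// CPU translation: two regions split at the barrier, each inside a thread loop.
void k_cpu(const std::vector<float>& in, std::vector<float>& out) {
    std::vector<float> s(BLOCK_SIZE);              // shared memory of the block
    for (int tid = 0; tid < BLOCK_SIZE; ++tid)     // region 1 thread loop
        s[tid] = in[tid] * 2.0f;
    // __syncthreads() needs no code here: finishing the first loop before
    // starting the second gives the same ordering on a single CPU thread.
    for (int tid = 0; tid < BLOCK_SIZE; ++tid)     // region 2 thread loop
        out[tid] = s[(tid + 1) % BLOCK_SIZE];
}
```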
  • Patent number: 9436447
    Abstract: A device compiler and linker within a parallel processing unit (PPU) is configured to optimize program code of a co-processor enabled application by rematerializing a subset of live-in variables for a particular block in a control flow graph generated for that program code. The device compiler and linker identifies the block of the control flow graph that has the greatest number of live-in variables, then selects a subset of the live-in variables associated with the identified block for which rematerializing confers the greatest estimated profitability. The profitability of rematerializing a given subset of live-in variables is determined based on the number of live-in variables reduced, the cost of rematerialization, and the potential risk of rematerialization.
    Type: Grant
    Filed: November 5, 2012
    Date of Patent: September 6, 2016
    Assignee: NVIDIA Corporation
    Inventors: Xiangyun Kong, Jian-Zhong Wang, Yuan Lin, Vinod Grover
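    Illustrative example: the optimization runs on the compiler's control flow graph; at the source level, rematerializing a live-in means recomputing a cheap value inside a block instead of carrying it live across the block boundary. A before/after sketch; whether the trade is worthwhile is exactly the profitability estimate the abstract describes. The kernels and names are illustrative.
```cuda
#include <cuda_runtime.h>

// Each block processes one row of an n-column matrix.

// Before: 'row_start' is computed once and carried live into the loop body,
// adding one value to that block's live-in set.
__global__ void before(float* matrix, int n) {
    int row_start = blockIdx.x * n;                 // live-in to the loop block
    for (int j = threadIdx.x; j < n; j += blockDim.x)
        matrix[row_start + j] += 1.0f;
}

// After: the value is rematerialized inside the loop from blockIdx (always
// available) and n (live anyway), so the loop block has one fewer live-in at
// the cost of repeating a cheap multiply.
__global__ void after(float* matrix, int n) {
    for (int j = threadIdx.x; j < n; j += blockDim.x) {
        int row_start = blockIdx.x * n;             // recomputed, not carried in
        matrix[row_start + j] += 1.0f;
    }
}
```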
  • Patent number: 9411715
    Abstract: A system, method, and computer program product for optimizing thread stack memory allocation are disclosed. The method includes the steps of receiving source code for a program, translating the source code into an intermediate representation, analyzing the intermediate representation to identify at least two objects that could use a first allocated memory space in a thread stack memory, and modifying the intermediate representation by replacing references to a first object of the at least two objects with a reference to a second object of the at least two objects.
    Type: Grant
    Filed: December 12, 2012
    Date of Patent: August 9, 2016
    Assignee: NVIDIA Corporation
    Inventors: Adriana Maria Susnea, Vinod Grover, Sean Youngsung Lee
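    Illustrative example: the analysis and rewrite happen on the intermediate representation; the effect is equivalent to two thread-local objects with non-overlapping lifetimes sharing one stack allocation. A hand-written before/after sketch; the kernels, tile size, and names are illustrative.
```cuda
#include <cuda_runtime.h>

#define TILE 32

// Before: two thread-local arrays, used in disjoint phases, each occupy
// their own space in thread stack (local) memory.
__global__ void before(const float* in, float* out, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * TILE;
    if (base + TILE > n) return;
    float scratch_a[TILE];
    float scratch_b[TILE];
    for (int k = 0; k < TILE; ++k) scratch_a[k] = in[base + k] * 2.0f;   // phase 1
    for (int k = 0; k < TILE; ++k) out[base + k] = scratch_a[k];
    for (int k = 0; k < TILE; ++k) scratch_b[k] = in[base + k] + 1.0f;   // phase 2
    for (int k = 0; k < TILE; ++k) out[base + k] += scratch_b[k];
}

// After: because the two objects' lifetimes do not overlap, references to the
// second are replaced with references to the first, halving the stack space.
__global__ void after(const float* in, float* out, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * TILE;
    if (base + TILE > n) return;
    float scratch[TILE];                                 // shared allocation
    for (int k = 0; k < TILE; ++k) scratch[k] = in[base + k] * 2.0f;     // phase 1
    for (int k = 0; k < TILE; ++k) out[base + k] = scratch[k];
    for (int k = 0; k < TILE; ++k) scratch[k] = in[base + k] + 1.0f;     // phase 2
    for (int k = 0; k < TILE; ++k) out[base + k] += scratch[k];
}
```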