Patents by Inventor Vinod Grover

Vinod Grover has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190108006
    Abstract: System and method of compiling a program having a mixture of host code and device code to enable code coverage data collection for device code execution. An exemplary integrated compiler can compile source code programmed to be executed by a host processor (e.g., CPU) and a co-processor (e.g., a GPU) concurrently. The compilation can generate an instrumented executable code which includes: coverage instrumentation counters for the device functions; mapping information that maps the counters with the instrumented source points; and instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected code coverage information from the device memory to the host memory. Execution of the instrumented executable can yield a coverage report on the device code functions.
    Type: Application
    Filed: October 8, 2018
    Publication date: April 11, 2019
    Inventors: Hariharan Sandanagobalane, Sean Lee, Vinod Grover
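    Illustrative example: a minimal hand-written sketch of the flow this abstract describes, rather than compiler-generated output. A device-side counter array records which instrumented points execute, the host allocates and zeroes it before the run, and the counts are copied back for a report; the mapping from counter index to source point is shown only as comments, and all names are illustrative.
```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_COVERAGE_POINTS 2

// Device-side coverage counters, one per instrumented source point.
__device__ unsigned int coverage_counters[NUM_COVERAGE_POINTS];

__global__ void kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    atomicAdd(&coverage_counters[0], 1u);     // point 0: kernel body executed
    if (in[i] < 0.0f) {
        atomicAdd(&coverage_counters[1], 1u); // point 1: negative-input branch
        out[i] = 0.0f;
    } else {
        out[i] = in[i];
    }
}

int main() {
    const int n = 256;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    // Host side: initialize the counters before running the device code.
    unsigned int zeros[NUM_COVERAGE_POINTS] = {0, 0};
    cudaMemcpyToSymbol(coverage_counters, zeros, sizeof(zeros));

    kernel<<<(n + 127) / 128, 128>>>(d_in, d_out, n);

    // Host side: retrieve the collected coverage information.
    unsigned int counts[NUM_COVERAGE_POINTS];
    cudaMemcpyFromSymbol(counts, coverage_counters, sizeof(counts));
    for (int p = 0; p < NUM_COVERAGE_POINTS; ++p)
        printf("coverage point %d hit %u times\n", p, counts[p]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```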
  • Patent number: 10241761
    Abstract: A system and method for processing source code for compilation. The method includes accessing a portion of host source code and determining whether the portion of the host source code comprises a device lambda expression. The method further includes in response to the portion of host code comprising the device lambda expression, determining a unique placeholder type instantiation based on the device lambda expression and modifying the device lambda expression based on the unique placeholder type instantiation to produce modified host source code. The method further includes sending the modified host source code to a host compiler.
    Type: Grant
    Filed: December 14, 2015
    Date of Patent: March 26, 2019
    Assignee: Nvidia Corporation
    Inventors: Jaydeep Marathe, Vinod Grover
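    Illustrative example: the "device lambda expression" here corresponds to what nvcc exposes as an extended lambda, i.e. a lambda annotated __device__ inside host code. A minimal sketch, assuming compilation with nvcc and the --extended-lambda flag; the kernel and names are illustrative, not taken from the patent.
```cuda
#include <cuda_runtime.h>

// A templated kernel that applies an arbitrary callable to each element.
template <typename F>
__global__ void apply_each(float* data, int n, F f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = f(data[i]);
}

int main() {
    const int n = 128;
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    float scale = 2.0f;
    // A device lambda written in host source; the CUDA front end rewrites it
    // (the abstract's "unique placeholder type instantiation") before handing
    // the modified translation unit to the host compiler.
    auto op = [=] __device__ (float x) { return scale * x + 1.0f; };

    apply_each<<<(n + 127) / 128, 128>>>(d_data, n, op);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```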
  • Publication number: 20190087164
    Abstract: A device compiler and linker is configured to optimize program code of a co-processor enabled application by resolving generic memory access operations within that program code to target specific memory spaces. In situations where a generic memory access operation cannot be resolved and may target constant memory, constant variables associated with those generic memory access operations are transferred to reside in global memory.
    Type: Application
    Filed: November 19, 2018
    Publication date: March 21, 2019
    Inventors: Xiangyun Kong, Jian-Zhong Wang, Yuan Lin, Vinod Grover
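    Illustrative example: a small picture of the situation this abstract targets. A __device__ helper receives a generic pointer that may refer to either __constant__ or global memory, so the compiler cannot always resolve the access to one memory space; whether the constant variable is then moved to global memory is a compiler decision, noted here only in comments. Names are illustrative.
```cuda
#include <cuda_runtime.h>

__constant__ float c_table[16];   // constant-memory variable
__device__   float g_table[16];   // global-memory variable

// Generic memory access: 'p' may point into constant or global memory,
// so loads through it cannot always be resolved to a specific space.
__device__ float sum4(const float* p) {
    return p[0] + p[1] + p[2] + p[3];
}

__global__ void kernel(float* out, int use_constant) {
    // If the compiler cannot prove which table is accessed, and the generic
    // access may target constant memory, the technique described above moves
    // the constant variable to reside in global memory instead.
    const float* p = use_constant ? c_table : g_table;
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum4(p);
}
```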
  • Patent number: 10152310
    Abstract: A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations corresponds to instructions from the source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.
    Type: Grant
    Filed: May 27, 2015
    Date of Patent: December 11, 2018
    Assignee: Nvidia Corporation
    Inventors: Mahesh Ravishankar, Paulius Micikevicius, Vinod Grover
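    Illustrative example: a hand-written sketch of the effect this abstract describes (the patent's contribution is doing it automatically). Two elementwise operations are fused and computed one subdivision (tile) at a time, so the intermediate result never occupies a full-size buffer. The tile size and function names are illustrative, and the caller is assumed to have sized the output to match the input.
```cuda
#include <algorithm>
#include <cstddef>
#include <vector>

// Unfused: b = f(a); c = g(b); materializes a full-size intermediate 'b'.
// Fused with subdivisions: compute c tile by tile, keeping only a
// tile-sized intermediate buffer live at any time.
static float f(float x) { return x * 2.0f; }
static float g(float x) { return x + 1.0f; }

void fused_tiled(const std::vector<float>& a, std::vector<float>& c,
                 std::size_t tile = 256) {
    std::vector<float> tmp(tile);                  // intermediate for one tile only
    for (std::size_t base = 0; base < a.size(); base += tile) {
        std::size_t len = std::min(tile, a.size() - base);
        for (std::size_t i = 0; i < len; ++i)      // input needed for this subdivision
            tmp[i] = f(a[base + i]);               // intermediate operation
        for (std::size_t i = 0; i < len; ++i)      // final subdivision output
            c[base + i] = g(tmp[i]);
    }
}
```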
  • Patent number: 10152312
    Abstract: Compiler techniques for inline parallelism and re-targetable parallel runtime execution of logic iterators enable selection thereof from the source code or dynamically during object code execution.
    Type: Grant
    Filed: January 21, 2015
    Date of Patent: December 11, 2018
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Thibaut Lutz
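    Illustrative example: the abstract is terse; one reading is that the same iterator-driven loop can be bound to a serial or a parallel runtime, chosen in the source or at execution time. This is a minimal sketch of that idea using std::thread, not the patented mechanism; the enum, function, and parameter names are assumptions.
```cuda
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

enum class Backend { Serial, Parallel };

// Apply 'body' to every index in [0, n), on the backend selected at runtime.
void for_each_index(std::size_t n, Backend backend,
                    const std::function<void(std::size_t)>& body) {
    if (backend == Backend::Serial) {
        for (std::size_t i = 0; i < n; ++i) body(i);
        return;
    }
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([=, &body] {
            for (std::size_t i = w; i < n; i += workers) body(i);  // disjoint indices
        });
    for (auto& t : pool) t.join();
}

int main() {
    std::vector<int> out(1000);
    // The same loop body, retargeted by a runtime choice of backend.
    for_each_index(out.size(), Backend::Parallel,
                   [&](std::size_t i) { out[i] = static_cast<int>(i) * 2; });
    return 0;
}
```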
  • Patent number: 10067768
    Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
    Type: Grant
    Filed: July 13, 2015
    Date of Patent: September 4, 2018
    Assignee: NVIDIA Corporation
    Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky
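    Illustrative example: the mechanism itself lives in the compiler and the scheduler, but its effect is visible at the source level: threads diverge onto different paths and a barrier that all of them participate in brings them back together before dependent work proceeds. A minimal CUDA sketch with the correspondence to the abstract noted in comments; the kernel is illustrative and assumes a launch with 128 threads per block.
```cuda
#include <cuda_runtime.h>

__global__ void divergent_then_converge(const int* flags, float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ float partial[128];   // assumes 128 threads per block

    if (i < n) {
        // All threads of the block "participate in the convergence barrier".
        if (flags[i]) {
            partial[threadIdx.x] = data[i] * 2.0f;   // first divergent portion
        } else {
            partial[threadIdx.x] = data[i] + 1.0f;   // second divergent portion
        }
    } else {
        partial[threadIdx.x] = 0.0f;
    }

    // Convergence point: threads block here until all participating threads
    // arrive; then the barrier is cleared and they continue together.
    __syncthreads();

    if (i < n) data[i] = partial[threadIdx.x] + partial[threadIdx.x ^ 1];
}
```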
  • Publication number: 20180203673
    Abstract: A computation graph is accessed. In the computation graph, operations to be performed are represented as interior nodes, inputs to the operations are represented as leaf nodes, and a result of the operations is represented as a root. Selected sets of the operations are combined to form respective kernels of operations. Code is generated to execute the kernels of operations. The code is executed to determine the result.
    Type: Application
    Filed: January 16, 2018
    Publication date: July 19, 2018
    Inventors: Mahesh Ravishankar, Vinod Grover, Evghenii Gaburov, Alberto Magni, Sean Lee
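    Illustrative example: a tiny computation graph with inputs at the leaves, operations at interior nodes, and the result at the root. Here a whole elementwise subgraph is evaluated per element with no intermediate buffers, standing in for the code the patent would generate for a fused kernel; the Node structure and names are assumptions.
```cuda
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Leaves hold inputs, interior nodes hold elementwise operations,
// and the root is the requested result.
struct Node {
    std::function<float(float, float)> op;         // set on interior nodes
    std::shared_ptr<Node> lhs, rhs;                // children (null on leaves)
    const std::vector<float>* input = nullptr;     // set on leaf nodes

    float eval_at(std::size_t i) const {           // one fused "kernel": the
        if (input) return (*input)[i];             // subgraph is evaluated per
        return op(lhs->eval_at(i), rhs->eval_at(i)); // element, no intermediates
    }
};

std::vector<float> run(const Node& root, std::size_t n) {
    std::vector<float> result(n);
    for (std::size_t i = 0; i < n; ++i) result[i] = root.eval_at(i);
    return result;
}

int main() {
    std::vector<float> a(16, 1.0f), b(16, 2.0f), c(16, 3.0f);
    auto la = std::make_shared<Node>(); la->input = &a;
    auto lb = std::make_shared<Node>(); lb->input = &b;
    auto lc = std::make_shared<Node>(); lc->input = &c;
    auto add = std::make_shared<Node>();            // interior node: a + b
    add->op = [](float x, float y) { return x + y; };
    add->lhs = la; add->rhs = lb;
    Node root;                                      // root: (a + b) * c
    root.op = [](float x, float y) { return x * y; };
    root.lhs = add; root.rhs = lc;
    auto result = run(root, a.size());
    (void)result;
    return 0;
}
```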
  • Patent number: 10025643
    Abstract: A system and method for compiling source code (e.g., with a compiler). The method includes accessing a portion of device source code and determining whether the portion of the device source code comprises a piece of work to be launched on a device from the device. The method further includes determining a plurality of application programming interface (API) calls based on the piece of work to be launched on the device and generating compiled code based on the plurality of API calls. The compiled code comprises a first portion operable to execute on a central processing unit (CPU) and a second portion operable to execute on the device (e.g., GPU).
    Type: Grant
    Filed: January 7, 2013
    Date of Patent: July 17, 2018
    Assignee: Nvidia Corporation
    Inventors: Vinod Grover, Jaydeep Marathe, Sean Lee
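    Illustrative example: the "piece of work to be launched on a device from the device" is what CUDA exposes as dynamic parallelism, a kernel launch written inside device code that the compiler lowers to device-side runtime API calls. A minimal sketch, assuming compilation with nvcc -rdc=true for a GPU of compute capability 3.5 or newer; names are illustrative.
```cuda
#include <cuda_runtime.h>

__global__ void child(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

__global__ void parent(float* data, int n) {
    // A launch written in device source; the compiler turns this into the
    // device-side API calls the abstract refers to.
    if (threadIdx.x == 0 && blockIdx.x == 0)
        child<<<(n + 127) / 128, 128>>>(data, n);
}

int main() {
    const int n = 1024;
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));
    parent<<<1, 32>>>(d_data, n);   // the CPU portion launches the parent kernel
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```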
  • Patent number: 9952843
    Abstract: A solution is proposed for implementing staging in computer programs and code specialization at runtime. Even when values are not known at compile time, many of the values used as parameters for a code section or a function are constant, and are known prior to starting the computation of the algorithm. Embodiments of the claimed subject matter propagate these values just before execution in the same way a compiler would if they were compile time constant, resulting in improved control flow and significant simplification in the computation involved.
    Type: Grant
    Filed: May 15, 2015
    Date of Patent: April 24, 2018
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Thibaut Lutz
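    Illustrative example: a minimal picture of the idea, not the patented staging machinery. A parameter that is not a compile-time constant is nevertheless fixed before the computation starts, so a variant specialized for the common value is selected just before execution, simplifying control flow in the hot loop. Function and parameter names are assumptions.
```cuda
#include <cstddef>

// General version: the stride is a runtime parameter consulted in the loop.
void scale_general(float* data, std::size_t n, std::size_t stride, float s) {
    for (std::size_t i = 0; i < n; i += stride) data[i] *= s;
}

// Variant written as if stride == 1 were a compile-time constant.
void scale_stride1(float* data, std::size_t n, float s) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= s;   // contiguous, simpler flow
}

// "Staging": the value is known just before the computation runs, so the
// specialized variant is chosen at that point rather than at compile time.
void scale(float* data, std::size_t n, std::size_t stride, float s) {
    if (stride == 1) scale_stride1(data, n, s);
    else             scale_general(data, n, stride, s);
}
```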
  • Patent number: 9798569
    Abstract: A system for and method of retrieving values of captured local variables for a lambda function in Java. In one embodiment, the system includes: (1) a Java virtual machine and (2) a captured variable retriever that interacts with the Java virtual machine and is configured to retrieve a signature of the lambda function from a classfile of a Java class containing the lambda function, compare the signature with a declaration of the lambda function to identify arguments corresponding to the captured local variables, modify the lambda function, and cause the Java virtual machine to execute the modified lambda function.
    Type: Grant
    Filed: February 15, 2016
    Date of Patent: October 24, 2017
    Assignee: Nvidia Corporation
    Inventors: Michael Lai, Vinod Grover, Sean Lee, Jaydeep Marathe
  • Publication number: 20170235586
    Abstract: A system for and method of retrieving values of captured local variables for a lambda function in Java. In one embodiment, the system includes: (1) a Java virtual machine and (2) a captured variable retriever that interacts with the Java virtual machine and is configured to retrieve a signature of the lambda function from a classfile of a Java class containing the lambda function, compare the signature with a declaration of the lambda function to identify arguments corresponding to the captured local variables, modify the lambda function, and cause the Java virtual machine to execute the modified lambda function.
    Type: Application
    Filed: February 15, 2016
    Publication date: August 17, 2017
    Inventors: Michael Lai, Vinod Grover, Sean Lee, Jaydeep Marathe
  • Patent number: 9678775
    Abstract: Computer code written to execute on a multi-threaded computing environment is transformed into code designed to execute on a single-threaded computing environment and simulate concurrently executing threads. Optimization techniques during the transformation process are utilized to identify local variables for scalar expansion. A first set of local variables is defined that includes those local variables in the code identified as “Downward exposed Defined” (DD). A second set of local variables is defined that includes those local variables in the code identified as “Upward exposed Use” (UU). The intersection of the first set and the second set identifies local variables for scalar expansion.
    Type: Grant
    Filed: February 26, 2009
    Date of Patent: June 13, 2017
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, John A. Stratton
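    Illustrative example: a small sketch of the set computation the abstract describes, with the dataflow facts hard-coded as assumptions. A local whose definition is downward exposed in one region and whose use is upward exposed in a later region must carry its value across simulated thread loops, so DD ∩ UU is exactly the set needing scalar expansion.
```cuda
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <set>
#include <string>

int main() {
    // Dataflow facts for locals, keyed by variable name (illustrative values).
    // DD: locals whose definitions are downward exposed (reach the region exit).
    // UU: locals whose uses are upward exposed (read before any definition).
    std::set<std::string> DD = {"t", "i", "acc"};
    std::set<std::string> UU = {"t", "acc", "n"};

    // Locals needing scalar expansion are exactly DD ∩ UU: their value is
    // produced in one thread loop and consumed in a later one.
    std::set<std::string> expand;
    std::set_intersection(DD.begin(), DD.end(), UU.begin(), UU.end(),
                          std::inserter(expand, expand.begin()));

    for (const auto& v : expand)
        std::printf("expand local '%s' into a per-thread array\n", v.c_str());
    return 0;
}
```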
  • Publication number: 20170147299
    Abstract: A system and method for optimizing multiple invocations of a graphics processing unit (GPU) program in Java. In one embodiment, the system includes: (1) a frontend component in a computer system configured to compile Java bytecode associated with a class object that implements a functional interface into Intermediate Representation (IR) code and store the IR code with the associated jogArray and (2) a collector/composer component in the computer system, associated with the frontend and configured, when a result of the GPU program is explicitly requested to be transferred to a host, to traverse a tree containing the multiple invocations from that result, collect the IR code, and compose the collected IR code into aggregate IR code.
    Type: Application
    Filed: November 24, 2015
    Publication date: May 25, 2017
    Inventors: Michael Lai, Vinod Grover, Sean Lee, Jaydeep Marathe
  • Patent number: 9639336
    Abstract: One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: May 2, 2017
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Manjunath Kudlur, Michael Murphy
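    Illustrative example: the pass operates on a DAG of assembly instructions; at the source level its effect resembles replacing adjacent scalar memory operations with a single vector operation. A minimal CUDA before/after sketch of that effect, with the DAG-level rewriting described only in comments; the kernels and names are illustrative.
```cuda
#include <cuda_runtime.h>

// Before: two scalar loads and two scalar stores per pair of elements.
__global__ void copy_scalar(const float* in, float* out, int n) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * 2;
    if (i + 1 < n) {
        out[i]     = in[i];       // two adjacent scalar instructions that a
        out[i + 1] = in[i + 1];   // DAG-level pass could identify as vectorizable
    }
}

// After: one vectorized load and one vectorized store replace each pair,
// mirroring the replacement of two assembly instructions with a single
// vectorized assembly instruction.
__global__ void copy_vectorized(const float2* in, float2* out, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) out[i] = in[i];
}
```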
  • Patent number: 9612811
    Abstract: One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: April 4, 2017
    Assignee: NVIDIA Corporation
    Inventors: Amit Jayant Sabne, Yuan Lin, Vinod Grover
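    Illustrative example: the transformation operates on the compiler's control flow graph; this sketch only shows the resulting source-level shape. A predicate variable is set to true for the threads that must run the "external node", the original branch is allowed to reconverge, and a later, inserted branch (the second divergent node) guards execution of that node's control flow path. Function names are assumptions.
```cuda
#include <cuda_runtime.h>

__device__ void work_a(float* d, int i)   { d[i] *= 2.0f; }
__device__ void work_b(float* d, int i)   { d[i] += 1.0f; }
__device__ void external(float* d, int i) { d[i] -= 3.0f; }   // the "external node"

// Original shape (for reference): 'external' was reachable along only one of
// two non-overlapping paths from the divergent branch, so the branch did not
// reconverge cleanly at a post-dominator.
//
// Transformed shape: a predicate records which threads must run the external
// node, and an inserted second divergent node executes it after reconvergence.
__global__ void transformed(float* data, const int* flags, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    bool run_external = false;            // the inserted predicate variable
    if (flags[i] > 0) {
        work_a(data, i);
        run_external = true;              // set for threads that take this path
    } else {
        work_b(data, i);
    }
    // Threads reconverge here; the inserted second divergent node then guards
    // the control flow path associated with the external node.
    if (run_external) external(data, i);
}
```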
  • Patent number: 9563933
    Abstract: Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging a predecessor vertex with a successor vertex by combining the operations of the two vertices if they are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.
    Type: Grant
    Filed: January 28, 2014
    Date of Patent: February 7, 2017
    Assignee: Nvidia Corporation
    Inventors: Vinod Grover, Mahesh Ravishankar
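    Illustrative example: a small sketch of the merge rule in the abstract. A predecessor is folded into its successor when a directed edge connects them and the predecessor has exactly one outgoing edge, so the buffer that would have held that edge's output is no longer needed. The graph representation and names are assumptions.
```cuda
#include <cstdio>
#include <string>
#include <vector>

struct Vertex {
    std::string op;            // the operation this vertex performs
    std::vector<int> succs;    // outgoing directed edges (by vertex index)
    bool merged_away;          // true once folded into a successor
};

// Merge every predecessor that has exactly one outgoing edge into its
// successor; each merge removes one edge, i.e. one intermediate buffer.
void merge_single_use(std::vector<Vertex>& g) {
    for (std::size_t p = 0; p < g.size(); ++p) {
        if (g[p].merged_away || g[p].succs.size() != 1) continue;
        Vertex& s = g[g[p].succs[0]];
        s.op = g[p].op + " -> " + s.op;    // combine the two operations
        g[p].succs.clear();
        g[p].merged_away = true;
    }
}

int main() {
    // Chain load(0) -> scale(1) -> store(2); each predecessor has one edge out.
    std::vector<Vertex> g = {{"load", {1}, false},
                             {"scale", {2}, false},
                             {"store", {}, false}};
    merge_single_use(g);
    for (const auto& v : g)
        if (!v.merged_away) std::printf("merged operation: %s\n", v.op.c_str());
    return 0;
}
```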
  • Publication number: 20160350088
    Abstract: A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations corresponds to instructions from the source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.
    Type: Application
    Filed: May 27, 2015
    Publication date: December 1, 2016
    Inventors: Mahesh Ravishankar, Paulius Micikevicius, Vinod Grover
  • Patent number: 9448779
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 20, 2009
    Date of Patent: September 20, 2016
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Jayant B. Kolhe, John Bryan Pormann, Douglas Saylor
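    Illustrative example: a hand-written picture of what such a translation produces. The kernel body is partitioned into synchronization-independent regions at the barrier, and each region is wrapped in a thread loop so a single CPU thread simulates the whole block; a value that must survive across regions is kept in a per-thread array. The original kernel appears only in comments, and the names are illustrative.
```cuda
#include <vector>

const int BLOCK_SIZE = 128;   // simulated thread count for one block

// Original GPU kernel, for reference:
//   __global__ void k(const float* in, float* out) {
//     __shared__ float s[BLOCK_SIZE];
//     s[threadIdx.x] = in[threadIdx.x] * 2.0f;
//     __syncthreads();
//     out[threadIdx.x] = s[(threadIdx.x + 1) % BLOCK_SIZE];
//   }
//
// CPU translation: two regions split at the barrier, each inside a thread loop.
void k_cpu(const std::vector<float>& in, std::vector<float>& out) {
    std::vector<float> s(BLOCK_SIZE);              // shared memory of the block
    for (int tid = 0; tid < BLOCK_SIZE; ++tid)     // region 1 thread loop
        s[tid] = in[tid] * 2.0f;
    // __syncthreads() needs no code here: finishing the first loop before
    // starting the second gives the same ordering on a single CPU thread.
    for (int tid = 0; tid < BLOCK_SIZE; ++tid)     // region 2 thread loop
        out[tid] = s[(tid + 1) % BLOCK_SIZE];
}
```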
  • Patent number: 9436447
    Abstract: A device compiler and linker within a parallel processing unit (PPU) is configured to optimize program code of a co-processor enabled application by rematerializing a subset of live-in variables for a particular block in a control flow graph generated for that program code. The device compiler and linker identifies the block of the control flow graph that has the greatest number of live-in variables, then selects a subset of the live-in variables associated with the identified block for which rematerializing confers the greatest estimated profitability. The profitability of rematerializing a given subset of live-in variables is determined based on the number of live-in variables reduced, the cost of rematerialization, and the potential risk of rematerialization.
    Type: Grant
    Filed: November 5, 2012
    Date of Patent: September 6, 2016
    Assignee: NVIDIA Corporation
    Inventors: Xiangyun Kong, Jian-Zhong Wang, Yuan Lin, Vinod Grover
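    Illustrative example: the optimization runs on the compiler's control flow graph; at the source level, rematerializing a live-in means recomputing a cheap value inside a block instead of carrying it live across the block boundary. A before/after sketch; whether the trade is worthwhile is exactly the profitability estimate the abstract describes. The kernels and names are illustrative.
```cuda
#include <cuda_runtime.h>

// Each block processes one row of an n-column matrix.

// Before: 'row_start' is computed once and carried live into the loop body,
// adding one value to that block's live-in set.
__global__ void before(float* matrix, int n) {
    int row_start = blockIdx.x * n;                 // live-in to the loop block
    for (int j = threadIdx.x; j < n; j += blockDim.x)
        matrix[row_start + j] += 1.0f;
}

// After: the value is rematerialized inside the loop from blockIdx (always
// available) and n (live anyway), so the loop block has one fewer live-in at
// the cost of repeating a cheap multiply.
__global__ void after(float* matrix, int n) {
    for (int j = threadIdx.x; j < n; j += blockDim.x) {
        int row_start = blockIdx.x * n;             // recomputed, not carried in
        matrix[row_start + j] += 1.0f;
    }
}
```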
  • Patent number: 9411715
    Abstract: A system, method, and computer program product for optimizing thread stack memory allocation are disclosed. The method includes the steps of receiving source code for a program, translating the source code into an intermediate representation, analyzing the intermediate representation to identify at least two objects that could use a first allocated memory space in a thread stack memory, and modifying the intermediate representation by replacing references to a first object of the at least two objects with a reference to a second object of the at least two objects.
    Type: Grant
    Filed: December 12, 2012
    Date of Patent: August 9, 2016
    Assignee: NVIDIA Corporation
    Inventors: Adriana Maria Susnea, Vinod Grover, Sean Youngsung Lee
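    Illustrative example: the analysis and rewrite happen on the intermediate representation; the effect is equivalent to two thread-local objects with non-overlapping lifetimes sharing one stack allocation. A hand-written before/after sketch; the kernels, tile size, and names are illustrative.
```cuda
#include <cuda_runtime.h>

#define TILE 32

// Before: two thread-local arrays, used in disjoint phases, each occupy
// their own space in thread stack (local) memory.
__global__ void before(const float* in, float* out, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * TILE;
    if (base + TILE > n) return;
    float scratch_a[TILE];
    float scratch_b[TILE];
    for (int k = 0; k < TILE; ++k) scratch_a[k] = in[base + k] * 2.0f;   // phase 1
    for (int k = 0; k < TILE; ++k) out[base + k] = scratch_a[k];
    for (int k = 0; k < TILE; ++k) scratch_b[k] = in[base + k] + 1.0f;   // phase 2
    for (int k = 0; k < TILE; ++k) out[base + k] += scratch_b[k];
}

// After: because the two objects' lifetimes do not overlap, references to the
// second are replaced with references to the first, halving the stack space.
__global__ void after(const float* in, float* out, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * TILE;
    if (base + TILE > n) return;
    float scratch[TILE];                                 // shared allocation
    for (int k = 0; k < TILE; ++k) scratch[k] = in[base + k] * 2.0f;     // phase 1
    for (int k = 0; k < TILE; ++k) out[base + k] = scratch[k];
    for (int k = 0; k < TILE; ++k) scratch[k] = in[base + k] + 1.0f;     // phase 2
    for (int k = 0; k < TILE; ++k) out[base + k] += scratch[k];
}
```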