Patents by Inventor Vinod Grover

Vinod Grover has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160188352
    Abstract: A system and method for processing source code for compilation. The method includes accessing a portion of host source code and determining whether the portion of the host source code comprises a device lambda expression. The method further includes in response to the portion of host code comprising the device lambda expression, determining a unique placeholder type instantiation based on the device lambda expression and modifying the device lambda expression based on the unique placeholder type instantiation to produce modified host source code. The method further includes sending the modified host source code to a host compiler.
    Type: Application
    Filed: December 14, 2015
    Publication date: June 30, 2016
    Inventors: Jaydeep MARATHE, Vinod GROVER
  • Patent number: 9367306
    Abstract: A technique is disclosed for executing a program designed for multi-threaded operation on a general purpose processor. Original source code for the program is transformed from a multi-threaded structure into a computationally equivalent single-threaded structure. A transform operation modifies the original source code to insert code constructs for serial thread execution. The transform operation also replaces synchronization barrier constructs in the original source code with synchronization barrier code that is configured to facilitate serialization. The transformed source code may then be conventionally compiled and advantageously executed on the general purpose processor.
    Type: Grant
    Filed: March 30, 2011
    Date of Patent: June 14, 2016
    Assignee: NVIDIA CORPORATION
    Inventors: Jaydeep Marathe, Vinod Grover
  • Patent number: 9361079
    Abstract: A technique is disclosed for executing a compiled parallel application on a general purpose processor. The compiled parallel application comprises parallel thread execution code, which includes single-instruction multiple-data (SIMD) constructs, as well as references to intrinsic functions conventionally available in a graphics processing unit. The parallel thread execution code is transformed into an intermediate representation, which includes vector instruction constructs. The SIMD constructs are mapped to vector instructions available within the intermediate representation. Intrinsic functions are mapped to corresponding emulated runtime implementations. The technique advantageously enables parallel applications compiled for execution on a graphics processing unit to be executed on a general purpose central processing unit configured to support vector instructions.
    Type: Grant
    Filed: January 30, 2012
    Date of Patent: June 7, 2016
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Andrew Kerr, Sean Lee
  • Patent number: 9292265
    Abstract: Basic blocks within a thread program are characterized for convergence based on variance analysis or corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.
    Type: Grant
    Filed: May 9, 2012
    Date of Patent: March 22, 2016
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Yunsup Lee, Xiangyun Kong, Gautam Chakrabarti, Ronny M. Krashinsky
  • Publication number: 20160019066
    Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
    Type: Application
    Filed: July 13, 2015
    Publication date: January 21, 2016
    Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky
  • Publication number: 20160011857
    Abstract: Compiler techniques for inline parallelism and re-targetable parallel runtime execution of logic iterators enables selection thereof from the source code or dynamically during the object code execution.
    Type: Application
    Filed: January 21, 2015
    Publication date: January 14, 2016
    Applicant: NVIDIA CORPORATION
    Inventors: Vinod Grover, Thibaut Lutz
  • Publication number: 20150331700
    Abstract: A solution is proposed for implementing staging in computer programs and code specialization at runtime. Even when values are not known at compile time, many of the values used as parameters for a code section or a function are constant, and are known prior to starting the computation of the algorithm. Embodiments of the claimed subject matter propagate these values just before execution in the same way a compiler would if they were compile time constant, resulting in improved control flow and significant simplification in the computation involved.
    Type: Application
    Filed: May 15, 2015
    Publication date: November 19, 2015
    Inventors: Vinod Grover, Thibaut Lutz
  • Publication number: 20150212933
    Abstract: Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging one of the predecessor vertex with one of the successor vertex by combining the operations of the predecessor vertex and the successor vertex if the predecessor and successor vertices are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and the successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.
    Type: Application
    Filed: January 28, 2014
    Publication date: July 30, 2015
    Applicant: Nvidia Corporation
    Inventors: Vinod Grover, Mahesh Ravishankar
  • Publication number: 20150205590
    Abstract: One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.
    Type: Application
    Filed: January 21, 2014
    Publication date: July 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Amit Jayant SABNE, Yuan LIN, Vinod GROVER
  • Patent number: 9086933
    Abstract: A system and method are provided for launching a callable function. A processing system includes a host processor, a graphics processing unit, and a driver for launching a callable function. The driver is adapted to recognize at load time of a program that a first function within the program is a callable function. The driver is further adapted to generate a second function. The second function is adapted to receive arguments and translate the arguments from a calling convention for launching a function into a calling convention for calling a callable function. The second function is further adapted to call the first function using the translated arguments. The driver is also adapted to receive from the host processor or the GPU a procedure call representing a launch of the first function and, in response, launch the second function.
    Type: Grant
    Filed: October 1, 2012
    Date of Patent: July 21, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Bastiaan Aarts, Luke Durant, Girish Bharambe, Vinod Grover
  • Patent number: 9038080
    Abstract: A system and method for detecting, filtering, prioritizing and reporting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises initialization information for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a potential conflict by identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. First information associated with a first access and second information associated with the second access to the location is determined. Filter criteria is applied to the first and second information to determine whether the instruction causes a reportable hazard. The instruction is reported when it causes the reportable hazard.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: May 19, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Vyas Venkataraman, Manjunath Kudlur, Vinod Grover
  • Patent number: 9009686
    Abstract: One embodiment of the present invention sets forth a technique for extracting a memory address offset from a 64-bit type-conversion expression included in high-level source code of a computer program. The technique involves receiving the 64-bit type-conversion expression, where the 64-bit type-conversion expression includes one or more 32-bit expressions, determining a range for each of the one or more 32-bit expressions, calculating a total range by summing the ranges of the 32-bit expressions, determining that the total range is a subset of a range for a 32-bit unsigned integer, calculating the memory address offset based on the ranges for the one or more 32-bit expressions, and generating at least one assembly-level instruction that references the memory address offset.
    Type: Grant
    Filed: October 24, 2012
    Date of Patent: April 14, 2015
    Assignee: NVIDIA Corporation
    Inventors: Xiangyun Kong, Jian-Zhong Wang, Vinod Grover
  • Patent number: 8984498
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: March 17, 2015
    Assignee: Nvidia Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy
  • Patent number: 8776030
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: July 8, 2014
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy
  • Publication number: 20140164727
    Abstract: A system, method, and computer program product for optimizing thread stack memory allocation is disclosed. The method includes the steps of receiving source code for a program, translating the source code into an intermediate representation, analyzing the intermediate representation to identify at least two objects that could use a first allocated memory space in a thread stack memory, and modifying the intermediate representation by replacing references to a first object of the at least two objects with a reference to a second object of the at least two objects.
    Type: Application
    Filed: December 12, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Adriana Maria Susnea, Vinod Grover, Sean Youngsung Lee
  • Publication number: 20140143755
    Abstract: A system and method are provided for inserting synchronization statements into a program file to mitigate race conditions. The method includes reading a program file and determining one or more convergent statements in the program file. The method also includes inserting one or more synchronization statements in the program file between the determined convergent statements. The method further includes removing one or more of the inserted synchronization statements and writing the modified program file. The method may include, after removing the inserted synchronization statements, identifying to a user any remaining inserted synchronization statements.
    Type: Application
    Filed: November 20, 2012
    Publication date: May 22, 2014
    Applicant: Nvidia Corporation
    Inventors: Vinod Grover, Xiangyun Kong, Jae-Woo Lee, Manjunath Kudlur, Jian-Zhong Wang
  • Patent number: 8719514
    Abstract: A method and apparatus for utilizing hardware mechanisms of a transactional memory system is herein described. Various embodiments relate to software-based filtering of operations from read and write barriers and read isolation barriers during transactional execution. Other embodiments relate to software-implemented read barrier processing to accelerate strong atomicity. Other embodiments are also described and claimed.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: May 6, 2014
    Assignee: Intel Corporation
    Inventors: Ali-Reza Adl-Tabatabai, David Callahan, Jan Gray, Vinod Grover, Bratin Saha, Gad Sheaffer
  • Publication number: 20140096147
    Abstract: A system and method are provided for launching a callable function. A processing system includes a host processor, a graphics processing unit, and a driver for launching a callable function. The driver is adapted to recognize at load time of a program that a first function within the program is a callable function. The driver is further adapted to generate a second function. The second function is adapted to receive arguments and translate the arguments from a calling convention for launching a function into a calling convention for calling a callable function. The second function is further adapted to call the first function using the translated arguments. The driver is also adapted to receive from the host processor or the GPU a procedure call representing a launch of the first function and, in response, launch the second function.
    Type: Application
    Filed: October 1, 2012
    Publication date: April 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Bastiaan Aarts, Luke Durant, Girish Bharambe, Vinod Grover
  • Patent number: 8612732
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 19, 2009
    Date of Patent: December 17, 2013
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Boris Beylin, Jayant B. Kolhe, Douglas Saylor
  • Publication number: 20130305021
    Abstract: Basic blocks within a thread program are characterized for convergence based on variance analysis or corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.
    Type: Application
    Filed: May 9, 2012
    Publication date: November 14, 2013
    Inventors: Vinod GROVER, Yunsup LEE, Xiangyun KONG, Gautam CHAKRABARTI, Ronny M. KRASHINSKY