Patents by Inventor Vinod Grover

Vinod Grover has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTEM AND METHOD FOR COMPILER SUPPORT FOR COMPILE TIME CUSTOMIZATION OF CODE

Publication number: 20160188352

Abstract: A system and method for processing source code for compilation. The method includes accessing a portion of host source code and determining whether the portion of the host source code comprises a device lambda expression. The method further includes in response to the portion of host code comprising the device lambda expression, determining a unique placeholder type instantiation based on the device lambda expression and modifying the device lambda expression based on the unique placeholder type instantiation to produce modified host source code. The method further includes sending the modified host source code to a host compiler.

Type: Application

Filed: December 14, 2015

Publication date: June 30, 2016

Inventors: Jaydeep MARATHE, Vinod GROVER

Method for transforming a multithreaded program for general execution

Patent number: 9367306

Abstract: A technique is disclosed for executing a program designed for multi-threaded operation on a general purpose processor. Original source code for the program is transformed from a multi-threaded structure into a computationally equivalent single-threaded structure. A transform operation modifies the original source code to insert code constructs for serial thread execution. The transform operation also replaces synchronization barrier constructs in the original source code with synchronization barrier code that is configured to facilitate serialization. The transformed source code may then be conventionally compiled and advantageously executed on the general purpose processor.

Type: Grant

Filed: March 30, 2011

Date of Patent: June 14, 2016

Assignee: NVIDIA CORPORATION

Inventors: Jaydeep Marathe, Vinod Grover
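
Illustration: the abstract above describes turning a multi-threaded program into an equivalent single-threaded one by inserting serial thread execution constructs and replacing synchronization barriers. The following is a minimal sketch of one common way to do this, assuming a toy kernel in which each thread writes one shared value, waits at a barrier, and then reads a neighbor's value. The kernel, the function name `run_serialized`, and the data are hypothetical and are not taken from the patent.

```python
# Hypothetical per-thread kernel being serialized (not from the patent):
#     shared[tid] = data[tid] * 2
#     barrier()
#     out[tid] = shared[tid] + shared[(tid + 1) % n]

def run_serialized(data):
    """Run the toy kernel on one thread by looping over thread IDs."""
    n = len(data)
    shared = [0] * n
    out = [0] * n

    # Region before the barrier: one loop iteration per logical thread.
    for tid in range(n):
        shared[tid] = data[tid] * 2

    # The barrier becomes the boundary between the two loops: every
    # logical thread has finished region 1 before region 2 starts.
    for tid in range(n):
        out[tid] = shared[tid] + shared[(tid + 1) % n]

    return out


print(run_serialized([1, 2, 3, 4]))
```

Splitting the code at the barrier keeps the ordering guarantee of the original program: no logical thread reads `shared` before all logical threads have written it.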

Method for compiling a parallel thread execution program for general execution

Patent number: 9361079

Abstract: A technique is disclosed for executing a compiled parallel application on a general purpose processor. The compiled parallel application comprises parallel thread execution code, which includes single-instruction multiple-data (SIMD) constructs, as well as references to intrinsic functions conventionally available in a graphics processing unit. The parallel thread execution code is transformed into an intermediate representation, which includes vector instruction constructs. The SIMD constructs are mapped to vector instructions available within the intermediate representation. Intrinsic functions are mapped to corresponding emulated runtime implementations. The technique advantageously enables parallel applications compiled for execution on a graphics processing unit to be executed on a general purpose central processing unit configured to support vector instructions.

Type: Grant

Filed: January 30, 2012

Date of Patent: June 7, 2016

Assignee: NVIDIA Corporation

Inventors: Vinod Grover, Andrew Kerr, Sean Lee

Method for convergence analysis based on thread variance analysis

Patent number: 9292265

Abstract: Basic blocks within a thread program are characterized for convergence based on variance analysis of corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.

Type: Grant

Filed: May 9, 2012

Date of Patent: March 22, 2016

Assignee: NVIDIA Corporation

Inventors: Vinod Grover, Yunsup Lee, Xiangyun Kong, Gautam Chakrabarti, Ronny M. Krashinsky
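
Illustration: the marking rule in the abstract above (a block is divergent if it is transitively control dependent on a block that is divergent or has a variant branch condition) can be sketched as a fixed-point computation. This is a minimal sketch under stated assumptions: control dependences and the set of blocks with variant branch conditions are taken as precomputed inputs, and the function name, block names, and data layout are hypothetical.

```python
def find_divergent_blocks(control_deps, variant_branch_blocks):
    """Return the set of blocks marked divergent.

    control_deps maps each block to the set of blocks it is directly
    control dependent on. variant_branch_blocks is the set of blocks
    whose branch condition may differ between threads.
    """
    divergent = set()
    changed = True
    while changed:  # iterate until no new block is marked
        changed = False
        for block, deps in control_deps.items():
            if block in divergent:
                continue
            if any(d in divergent or d in variant_branch_blocks for d in deps):
                divergent.add(block)
                changed = True
    return divergent


# Hypothetical control flow: "entry" branches on a per-thread value,
# "join" branches on a value that is the same for all threads.
control_deps = {
    "entry": set(),
    "then": {"entry"},
    "else": {"entry"},
    "nested": {"then"},
    "join": set(),
    "tail": {"join"},
}
divergent = find_divergent_blocks(control_deps, {"entry"})
convergent = set(control_deps) - divergent
print(sorted(divergent), sorted(convergent))
```

In this example `nested` is marked divergent only because it depends on `then`, which shows the transitive part of the rule. The blocks left in `convergent` are the ones a compiler could then examine for scalarization.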

EXECUTION OF DIVERGENT THREADS USING A CONVERGENCE BARRIER

Publication number: 20160019066

Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.

Type: Application

Filed: July 13, 2015

Publication date: January 21, 2016

Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky

Dynamic Compiler Parallelism Techniques

Publication number: 20160011857

Abstract: Compiler techniques for inline parallelism and re-targetable parallel runtime execution of logic iterators enable selection thereof from the source code or dynamically during the object code execution.

Type: Application

Filed: January 21, 2015

Publication date: January 14, 2016

Applicant: NVIDIA CORPORATION

Inventors: Vinod Grover, Thibaut Lutz

PARTIAL PROGRAM SPECIALIZATION AT RUNTIME

Publication number: 20150331700

Abstract: A solution is proposed for implementing staging in computer programs and code specialization at runtime. Even when values are not known at compile time, many of the values used as parameters for a code section or a function are constant, and are known prior to starting the computation of the algorithm. Embodiments of the claimed subject matter propagate these values just before execution in the same way a compiler would if they were compile time constant, resulting in improved control flow and significant simplification in the computation involved.

Type: Application

Filed: May 15, 2015

Publication date: November 19, 2015

Inventors: Vinod Grover, Thibaut Lutz
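
Illustration: the abstract above describes propagating values that become known just before execution as if they were compile-time constants. The following is a minimal sketch of that general idea, assuming a power function whose exponent is only known at runtime. The function name `specialize_power` and the use of generated source text are assumptions made for illustration and are not the mechanism claimed in the application.

```python
def specialize_power(exponent):
    """Build a function computing x**exponent with the loop removed.

    The exponent is known only at runtime, but once it is known it is
    folded into the generated code, so the result has no loop or branch.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    # Unroll the multiplication loop for this particular exponent.
    expr = " * ".join(["x"] * exponent) if exponent > 0 else "1"
    source = f"def power(x):\n    return {expr}\n"
    namespace = {}
    exec(source, namespace)  # compile the specialized function
    return namespace["power"]


cube = specialize_power(3)  # generated body: return x * x * x
print(cube(2), cube(5))
```

The specialized function is built once and can then be called many times, which is where the simplified control flow pays off.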

METHODS FOR REDUCING MEMORY SPACE IN SEQUENTIAL OPERATIONS USING DIRECTED ACYCLIC GRAPHS

Publication number: 20150212933

Abstract: Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging one of the predecessor vertex with one of the successor vertex by combining the operations of the predecessor vertex and the successor vertex if the predecessor and successor vertices are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and the successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.

Type: Application

Filed: January 28, 2014

Publication date: July 30, 2015

Applicant: Nvidia Corporation

Inventors: Vinod Grover, Mahesh Ravishankar
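
Illustration: the merge rule in the abstract above (fuse a predecessor into its successor when the predecessor has exactly one outgoing edge) can be sketched on a small graph. This is a minimal sketch, assuming the DAG is stored as a dictionary from each vertex to its list of successors and that each vertex carries a list of operation names. The function name, the data layout, and the example pipeline are hypothetical.

```python
def merge_linear_vertices(successors, operations):
    """Merge every vertex with exactly one outgoing edge into its successor.

    Returns new (successors, operations) dictionaries. Inputs are not
    modified. Parallel edges, if any arise, are kept as they are.
    """
    succ = {v: list(ws) for v, ws in successors.items()}
    ops = {v: list(o) for v, o in operations.items()}
    merged = True
    while merged:
        merged = False
        for pred in list(succ):
            if len(succ[pred]) != 1:
                continue  # sink or fork: not a merge candidate
            (nxt,) = succ[pred]
            # The merged vertex runs the predecessor's operations first.
            ops[nxt] = ops[pred] + ops[nxt]
            # Edges that pointed at the predecessor now point at the merged vertex.
            for v in succ:
                succ[v] = [nxt if w == pred else w for w in succ[v]]
            del succ[pred]
            del ops[pred]
            merged = True
            break  # restart the scan on the updated graph
    return succ, ops


pipeline = {
    "load": ["scale"],
    "scale": ["blur"],
    "blur": ["save_png", "save_jpg"],
    "save_png": [],
    "save_jpg": [],
}
operations = {name: [name] for name in pipeline}
new_succ, new_ops = merge_linear_vertices(pipeline, operations)
print(new_succ)
print(new_ops["blur"])
```

In this example the graph goes from four edges to two, so two intermediate buffers (the outputs of `load` and `scale`) are no longer needed between separate steps.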

CONFLUENCE ANALYSIS AND LOOP FAST-FORWARDING FOR IMPROVING SIMD EXECUTION EFFICIENCY

Publication number: 20150205590

Abstract: One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.

Type: Application

Filed: January 21, 2014

Publication date: July 23, 2015

Applicant: NVIDIA CORPORATION

Inventors: Amit Jayant SABNE, Yuan LIN, Vinod GROVER

System and method for launching callable functions

Patent number: 9086933

Abstract: A system and method are provided for launching a callable function. A processing system includes a host processor, a graphics processing unit, and a driver for launching a callable function. The driver is adapted to recognize at load time of a program that a first function within the program is a callable function. The driver is further adapted to generate a second function. The second function is adapted to receive arguments and translate the arguments from a calling convention for launching a function into a calling convention for calling a callable function. The second function is further adapted to call the first function using the translated arguments. The driver is also adapted to receive from the host processor or the GPU a procedure call representing a launch of the first function and, in response, launch the second function.

Type: Grant

Filed: October 1, 2012

Date of Patent: July 21, 2015

Assignee: NVIDIA CORPORATION

Inventors: Bastiaan Aarts, Luke Durant, Girish Bharambe, Vinod Grover

Method and system for heterogeneous filtering framework for shared memory data access hazard reports

Patent number: 9038080

Abstract: A system and method for detecting, filtering, prioritizing and reporting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises initialization information for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a potential conflict by identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. First information associated with a first access and second information associated with the second access to the location is determined. Filter criteria are applied to the first and second information to determine whether the instruction causes a reportable hazard. The instruction is reported when it causes the reportable hazard.

Type: Grant

Filed: December 27, 2012

Date of Patent: May 19, 2015

Assignee: NVIDIA CORPORATION

Inventors: Vyas Venkataraman, Manjunath Kudlur, Vinod Grover

Algorithm for 64-bit address mode optimization

Patent number: 9009686

Abstract: One embodiment of the present invention sets forth a technique for extracting a memory address offset from a 64-bit type-conversion expression included in high-level source code of a computer program. The technique involves receiving the 64-bit type-conversion expression, where the 64-bit type-conversion expression includes one or more 32-bit expressions, determining a range for each of the one or more 32-bit expressions, calculating a total range by summing the ranges of the 32-bit expressions, determining that the total range is a subset of a range for a 32-bit unsigned integer, calculating the memory address offset based on the ranges for the one or more 32-bit expressions, and generating at least one assembly-level instruction that references the memory address offset.

Type: Grant

Filed: October 24, 2012

Date of Patent: April 14, 2015

Assignee: NVIDIA Corporation

Inventors: Xiangyun Kong, Jian-Zhong Wang, Vinod Grover
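
Illustration: the range check in the abstract above (sum the ranges of the 32-bit sub-expressions and test whether the total fits in a 32-bit unsigned integer) can be sketched with simple interval arithmetic. This is a minimal sketch, assuming each range is an inclusive `(low, high)` pair that has already been determined. The function names and example values are hypothetical.

```python
UINT32_MAX = 2**32 - 1


def total_range(ranges):
    """Sum inclusive (low, high) ranges of the 32-bit sub-expressions."""
    low = sum(r[0] for r in ranges)
    high = sum(r[1] for r in ranges)
    return low, high


def fits_in_uint32(ranges):
    """True if the summed range is a subset of [0, 2**32 - 1]."""
    low, high = total_range(ranges)
    return 0 <= low and high <= UINT32_MAX


# Hypothetical index expression: thread index plus a block offset.
thread_index = (0, 1023)
block_offset = (0, 65535 * 1024)
print(total_range([thread_index, block_offset]))
print(fits_in_uint32([thread_index, block_offset]))

# A sum that can exceed 32 bits must stay in 64-bit arithmetic.
print(fits_in_uint32([(0, UINT32_MAX), (0, 1)]))
```

When the check succeeds, the offset can be computed with 32-bit operations and only widened once, which is the kind of saving the abstract points to.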

Variance analysis for translating CUDA code for execution by a general purpose processor

Patent number: 8984498

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) for execution by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Grant

Filed: March 31, 2009

Date of Patent: March 17, 2015

Assignee: Nvidia Corporation

Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy

Partitioning CUDA code for execution by a general purpose processor

Patent number: 8776030

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) for execution by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Grant

Filed: March 31, 2009

Date of Patent: July 8, 2014

Assignee: NVIDIA Corporation

Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy

SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR OPTIMIZING THE MANAGEMENT OF THREAD STACK MEMORY

Publication number: 20140164727

Abstract: A system, method, and computer program product for optimizing thread stack memory allocation is disclosed. The method includes the steps of receiving source code for a program, translating the source code into an intermediate representation, analyzing the intermediate representation to identify at least two objects that could use a first allocated memory space in a thread stack memory, and modifying the intermediate representation by replacing references to a first object of the at least two objects with a reference to a second object of the at least two objects.

Type: Application

Filed: December 12, 2012

Publication date: June 12, 2014

Applicant: NVIDIA Corporation

Inventors: Adriana Maria Susnea, Vinod Grover, Sean Youngsung Lee

SYSTEM AND METHOD FOR INSERTING SYNCHRONIZATION STATEMENTS INTO A PROGRAM FILE TO MITIGATE RACE CONDITIONS

Publication number: 20140143755

Abstract: A system and method are provided for inserting synchronization statements into a program file to mitigate race conditions. The method includes reading a program file and determining one or more convergent statements in the program file. The method also includes inserting one or more synchronization statements in the program file between the determined convergent statements. The method further includes removing one or more of the inserted synchronization statements and writing the modified program file. The method may include, after removing the inserted synchronization statements, identifying to a user any remaining inserted synchronization statements.

Type: Application

Filed: November 20, 2012

Publication date: May 22, 2014

Applicant: Nvidia Corporation

Inventors: Vinod Grover, Xiangyun Kong, Jae-Woo Lee, Manjunath Kudlur, Jian-Zhong Wang

Software filtering in a transactional memory system

Patent number: 8719514

Abstract: A method and apparatus for utilizing hardware mechanisms of a transactional memory system is herein described. Various embodiments relate to software-based filtering of operations from read and write barriers and read isolation barriers during transactional execution. Other embodiments relate to software-implemented read barrier processing to accelerate strong atomicity. Other embodiments are also described and claimed.

Type: Grant

Filed: December 15, 2009

Date of Patent: May 6, 2014

Assignee: Intel Corporation

Inventors: Ali-Reza Adl-Tabatabai, David Callahan, Jan Gray, Vinod Grover, Bratin Saha, Gad Sheaffer

SYSTEM AND METHOD FOR LAUNCHING CALLABLE FUNCTIONS

Publication number: 20140096147

Abstract: A system and method are provided for launching a callable function. A processing system includes a host processor, a graphics processing unit, and a driver for launching a callable function. The driver is adapted to recognize at load time of a program that a first function within the program is a callable function. The driver is further adapted to generate a second function. The second function is adapted to receive arguments and translate the arguments from a calling convention for launching a function into a calling convention for calling a callable function. The second function is further adapted to call the first function using the translated arguments. The driver is also adapted to receive from the host processor or the GPU a procedure call representing a launch of the first function and, in response, launch the second function.

Type: Application

Filed: October 1, 2012

Publication date: April 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Bastiaan Aarts, Luke Durant, Girish Bharambe, Vinod Grover

Retargetting an application program for execution by a general purpose processor

Patent number: 8612732

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) for execution by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Grant

Filed: March 19, 2009

Date of Patent: December 17, 2013

Assignee: NVIDIA Corporation

Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Boris Beylin, Jayant B. Kolhe, Douglas Saylor

METHOD FOR CONVERGENCE ANALYSIS BASED ON THREAD VARIANCE ANALYSIS

Publication number: 20130305021

Abstract: Basic blocks within a thread program are characterized for convergence based on variance analysis of corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.

Type: Application

Filed: May 9, 2012

Publication date: November 14, 2013

Inventors: Vinod GROVER, Yunsup LEE, Xiangyun KONG, Gautam CHAKRABARTI, Ronny M. KRASHINSKY
