Patents by Inventor Vinod Grover

Vinod Grover has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20130300752
    Abstract: A system and method for compiling source code (e.g., with a compiler). The method includes accessing a portion of device source code and determining whether the portion of the device source code comprises a piece of work to be launched on a device from the device. The method further includes determining a plurality of application programming interface (API) calls based on the piece of work to be launched on the device and generating compiled code based on the plurality of API calls. The compiled code comprises a first portion operable to execute on a central processing unit (CPU) and a second portion operable to execute on the device (e.g., GPU).
    Type: Application
    Filed: January 7, 2013
    Publication date: November 14, 2013
    Applicant: NVIDIA CORPORATION
    Inventors: Vinod Grover, Jaydeep Marathe, Sean Lee
  • Publication number: 20130305252
    Abstract: A system and method for detecting, filtering, prioritizing and reporting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises initialization information for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a potential conflict by identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. First information associated with a first access and second information associated with the second access to the location is determined. Filter criteria is applied to the first and second information to determine whether the instruction causes a reportable hazard. The instruction is reported when it causes the reportable hazard.
    Type: Application
    Filed: December 27, 2012
    Publication date: November 14, 2013
    Applicant: NVIDIA CORPORATION
    Inventors: Vyas Venkataraman, Manjunath Kudlur, Vinod Grover
  • Publication number: 20130304996
    Abstract: A system and method for detecting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises an initialization bit as well as access type information, collectively called the state tracking bits for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. The second access is identified based on a status of the state tracking bits. The method also includes determining a hazard based on a first type of access and a second type of access to the shared memory location. Information related to the first access is provided in the table.
    Type: Application
    Filed: December 27, 2012
    Publication date: November 14, 2013
    Applicant: NVIDIA Corporation
    Inventors: Vyas Venkataraman, Jaydeep Marathe, Manjunath Kudlur, Vinod Grover, Geoffrey Gerfin, Alban Douillet, Mayank Kaushik
  • Patent number: 8572588
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: October 29, 2013
    Assignee: Nvidia Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy
  • Publication number: 20130198494
    Abstract: A technique is disclosed for executing a compiled parallel application on a general purpose processor. The compiled parallel application comprises parallel thread execution code, which includes single-instruction multiple-data (SIMD) constructs, as well as references to intrinsic functions conventionally available in a graphics processing unit. The parallel thread execution code is transformed into an intermediate representation, which includes vector instruction constructs. The SIMD constructs are mapped to vector instructions available within the intermediate representation. Intrinsic functions are mapped to corresponding emulated runtime implementations. The technique advantageously enables parallel applications compiled for execution on a graphics processing unit to be executed on a general purpose central processing unit configured to support vector instructions.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 1, 2013
    Inventors: Vinod Grover, Andrew Kerr, Sean Lee
  • Patent number: 8365016
    Abstract: In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in a unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: January 29, 2013
    Assignee: Intel Corporation
    Inventors: Jan Gray, Martin Taillefer, Yossi Levanoni, Ali-Reza Adl-Tabatabai, Dave Detlefs, Vinod Grover, Mike Magruder, Matt Tolton, Bratin Saha, Gad Sheaffer, Vadim Bassin
  • Publication number: 20120254875
    Abstract: A technique is disclosed for executing a program designed for multi-threaded operation on a general purpose processor. Original source code for the program is transformed from a multi-threaded structure into a computationally equivalent single-threaded structure. A transform operation modifies the original source code to insert code constructs for serial thread execution. The transform operation also replaces synchronization barrier constructs in the original source code with synchronization barrier code that is configured to facilitate serialization. The transformed source code may then be conventionally compiled and advantageously executed on the general purpose processor.
    Type: Application
    Filed: March 30, 2011
    Publication date: October 4, 2012
    Inventors: Jaydeep MARATHE, Vinod Grover
  • Patent number: 8266604
    Abstract: Transactional memory compatibility type attributes are associated with intermediate language code to specify, for example, that intermediate language code must be run within a transaction, or must not be run within a transaction, or may be run within a transaction. Attributes are automatically produced while generating intermediate language code from annotated source code. Default rules also generate attributes. Tools use attributes to statically or dynamically check for incompatibility between intermediate language code and a transactional memory implementation.
    Type: Grant
    Filed: January 26, 2009
    Date of Patent: September 11, 2012
    Assignee: Microsoft Corporation
    Inventors: Dana Groff, Yosseff Levanoni, Stephen Toub, Michael McKenzie Magruder, Weirong Zhu, Timothy Lawrence Harris, Christopher William Dern, John Joseph Duffy, David Detlefs, Martin Abadi, Sukhdeep Singh Sodhi, Lingli Zhang, Alexander Dadiomov, Vinod Grover
  • Publication number: 20120079215
    Abstract: In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in a unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed.
    Type: Application
    Filed: November 30, 2011
    Publication date: March 29, 2012
    Inventors: Jan Gray, Martin Taillefer, Yossi Levanoni, Ali-Reza Adl-Tabatabai, Dave Detlefs, Vinod Grover, Mike Magruder, Matt Tolton, Bratin Saha, Gad Sheaffer, Vadim Bassin
  • Patent number: 8095824
    Abstract: In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in a unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: January 10, 2012
    Assignee: Intel Corporation
    Inventors: Jan Gray, Martin Taillefer, Yossi Levanoni, Ali-Reza Adl-Tabatabai, Dave Detlefs, Vinod Grover, Mike Magruder, Matt Tolton, Bratin Saha, Gad Sheaffer, Vadim Bassin
  • Publication number: 20110145637
    Abstract: In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in a unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Inventors: Jan Gray, Martin Taillefer, Yossi Levanoni, Ali-Reza Adl-Tabatabai, Dave Detlefs, Vinod Grover, Mike Magruder, Matt Tolton, Bratin Saha, Gad Sheaffer, Vadim Bassin
  • Publication number: 20100218195
    Abstract: A method and apparatus for utilizing hardware mechanisms of a transactional memory system is herein described. Various embodiments relate to software-based filtering of operations from read and write barriers and read isolation barriers during transactional execution. Other embodiments relate to software-implemented read barrier processing to accelerate strong atomicity. Other embodiments are also described and claimed.
    Type: Application
    Filed: December 15, 2009
    Publication date: August 26, 2010
    Inventors: Ali-Reza Adl-Tabatabai, David Callahan, Jan Gray, Vinod Grover, Bratin Saha, Gad Sheaffer
  • Publication number: 20100191930
    Abstract: Transactional memory compatibility type attributes are associated with intermediate language code to specify, for example, that intermediate language code must be run within a transaction, or must not be run within a transaction, or may be run within a transaction. Attributes are automatically produced while generating intermediate language code from annotated source code. Default rules also generate attributes. Tools use attributes to statically or dynamically check for incompatibility between intermediate language code and a transactional memory implementation.
    Type: Application
    Filed: January 26, 2009
    Publication date: July 29, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Dana Groff, Yosseff Levanoni, Stephen Toub, Michael McKenzie Magruder, Weirong Zhu, Timothy Lawrence Harris, Christopher William Dern, John Joseph Duffy, David Detlefs, Martin Abadi, Sukhdeep Singh Sodhi, Lingli Zhang, Alexander Dadiomov, Vinod Grover
  • Publication number: 20090259832
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 19, 2009
    Publication date: October 15, 2009
    Inventors: Vinod GROVER, Bastiaan Joannes Matheus AARTS, Michael MURPHY, Boris BEYLIN, Jayant B. KOLHE, Douglas SAYLOR
  • Publication number: 20090259997
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 31, 2009
    Publication date: October 15, 2009
    Inventors: Vinod GROVER, Bastiaan Joannes Matheus Aarts, Michael Murphy
  • Publication number: 20090259828
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 20, 2009
    Publication date: October 15, 2009
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Jayant B. Kolhe, John Bryan Pormann, Douglas Saylor
  • Publication number: 20090259996
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 31, 2009
    Publication date: October 15, 2009
    Inventors: Vinod GROVER, Bastiaan Joannes Matheus Aarts, Michael Murphy
  • Publication number: 20090259829
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 31, 2009
    Publication date: October 15, 2009
    Inventors: Vinod GROVER, Bastiaan Joannes Matheus Aarts, Michael Murphy
  • Publication number: 20070006192
    Abstract: As described herein, an intermediate representation of a source code file may be used to explicitly express exception handling control flow prior to generating object code for the source code. As further described herein, a single uniform set of instructions of the intermediate representation may be used for expressing multiple different exception handling mechanisms related to multiple different programming languages. The intermediate form of the exception handling instructions may be generated by translating an intermediate language representation of the source code file. Representations of the source code in multiple different intermediate languages may be translated to a single uniform set of instructions of the intermediate representation. The intermediate form of the exception handling instructions may then be used by software development tools for such tasks as code generation, code optimization, code analysis etc.
    Type: Application
    Filed: August 15, 2006
    Publication date: January 4, 2007
    Applicant: Microsoft Corporation
    Inventors: Vinod Grover, Akella Sastry
  • Publication number: 20050273777
    Abstract: Intermediate representations of computer code are efficiently generated. More particularly, methods described herein may be used to construct a static single assignment representation of computer code without unnecessary phi-function nodes. Potentially necessary phi-function node assignments may be analyzed to determine whether they directly reach a non-phi use or a necessary phi-use of a corresponding variable. Those that ultimately reach such a use may be determined to be necessary and a pruned static single assignment may be constructed by including those potentially necessary phi-functions determined to be in fact necessary. Also, some phi-function nodes may be determined to be necessary based on their dependency relationship to other phi-functions previously determined to be necessary (e.g., because they directly reach a non-phi use). A phi-function dependency graph may be used to record dependency relationships between phi-function nodes.
    Type: Application
    Filed: June 7, 2004
    Publication date: December 8, 2005
    Applicant: Microsoft Corporation
    Inventors: Vinod Grover, Weiping Hu