Patents by Inventor Ram Rangan

Ram Rangan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11934867
    Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
    Type: Grant
    Filed: February 24, 2021
    Date of Patent: March 19, 2024
    Assignee: NVIDIA CORP.
    Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
  • Patent number: 11513686
    Abstract: Accesses between a processor and its external memory are reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the number of accesses between the processor and its external memory is further reduced.
    Type: Grant
    Filed: May 5, 2020
    Date of Patent: November 29, 2022
    Assignee: NVIDIA Corporation
    Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
  • Patent number: 11372548
    Abstract: Some systems compress data utilized by user-mode software without the user-mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user-mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user-mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user-mode software is unable to decipher the compression format.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: June 28, 2022
    Assignee: NVIDIA Corporation
    Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
  • Patent number: 11263051
    Abstract: Accesses between a processor and its external memory are reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the number of accesses between the processor and its external memory is further reduced.
    Type: Grant
    Filed: May 5, 2020
    Date of Patent: March 1, 2022
    Assignee: NVIDIA Corporation
    Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
  • Publication number: 20220027194
    Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
    Type: Application
    Filed: February 24, 2021
    Publication date: January 27, 2022
    Applicant: NVIDIA Corp.
    Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
  • Publication number: 20210373774
    Abstract: Some systems compress data utilized by user-mode software without the user-mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user-mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user-mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user-mode software is unable to decipher the compression format.
    Type: Application
    Filed: May 29, 2020
    Publication date: December 2, 2021
    Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
  • Publication number: 20210349761
    Abstract: Accesses between a processor and its external memory are reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the number of accesses between the processor and its external memory is further reduced.
    Type: Application
    Filed: May 5, 2020
    Publication date: November 11, 2021
    Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
  • Publication number: 20210349639
    Abstract: Accesses between a processor and its external memory are reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the number of accesses between the processor and its external memory is further reduced.
    Type: Application
    Filed: May 5, 2020
    Publication date: November 11, 2021
    Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
  • Patent number: 11138018
    Abstract: Profile-guided optimization is a technique for optimizing execution of computer programs using profile information to improve program runtime performance. Obtaining the profile information can be challenging, especially in live production environments such as high-performance gaming systems. A profiling strategy is provided herein that obtains profile information without requiring extra effort from users. The profiling strategy collects several approximate, lightweight profiles called piecemeal profiles over one or more lifetimes of a computer program, or application. The piecemeal profiles are then used to generate whole program application profiles that can then be used to improve the execution of the application. A piecemeal profile is profile information of a section or portion of an application.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: October 5, 2021
    Assignee: Nvidia Corporation
    Inventors: Marc Blackstein, Ram Rangan
  • Patent number: 11069023
    Abstract: A technique selectively avoids memory fetches for partially uniform textures in real time graphics shader programs and instead uses program paths specialized for one or more frequently occurring values. One aspect avoids memory lookups and dependent computations for partially uniform textures through use of pre-constructed coarse-grained representations called value locality maps or dirty tilemaps (DTMs). The decision to use a specialized fast path or not is made dynamically by consulting such coarse-grained dirty tilemap representations. Thread-sharing value reuse can be implemented with or instead of the DTM mechanism.
    Type: Grant
    Filed: May 24, 2019
    Date of Patent: July 20, 2021
    Assignee: NVIDIA Corporation
    Inventor: Ram Rangan
  • Publication number: 20200372603
    Abstract: A technique selectively avoids memory fetches for partially uniform textures in real time graphics shader programs and instead uses program paths specialized for one or more frequently occurring values. One aspect avoids memory lookups and dependent computations for partially uniform textures through use of pre-constructed coarse-grained representations called value locality maps or dirty tilemaps (DTMs). The decision to use a specialized fast path or not is made dynamically by consulting such coarse-grained dirty tilemap representations. Thread-sharing value reuse can be implemented with or instead of the DTM mechanism.
    Type: Application
    Filed: May 24, 2019
    Publication date: November 26, 2020
    Inventor: Ram Rangan
  • Publication number: 20200192680
    Abstract: Profile-guided optimization is a technique for optimizing execution of computer programs using profile information to improve program runtime performance. Obtaining the profile information can be challenging, especially in live production environments such as high-performance gaming systems. A profiling strategy is provided herein that obtains profile information without requiring extra effort from users. The profiling strategy collects several approximate, lightweight profiles called piecemeal profiles over one or more lifetimes of a computer program, or application. The piecemeal profiles are then used to generate whole program application profiles that can then be used to improve the execution of the application. A piecemeal profile is profile information of a section or portion of an application.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Inventors: Marc Blackstein, Ram Rangan
  • Patent number: 9946550
    Abstract: A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved.
    Type: Grant
    Filed: September 17, 2007
    Date of Patent: April 17, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ram Rangan, William E. Speight, Mark W. Stephenson, Lixin Zhang
  • Patent number: 9262140
    Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.
    Type: Grant
    Filed: May 19, 2008
    Date of Patent: February 16, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
  • Patent number: 8453129
    Abstract: A method, computer system, and computer program product for using one or more hardware interrupts to drive dynamic binary code recompilation. The execution of a plurality of instructions is monitored to detect a problematic instruction. In response to detecting the problematic instruction, a hardware interrupt is thrown to a dynamic interrupt handler. A determination is made whether a threshold for dynamic binary code recompilation is satisfied. If the threshold for dynamic code recompilation is satisfied, the dynamic interrupt handler optimizes at least one of the plurality of instructions.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: May 28, 2013
    Assignee: International Business Machines Corporation
    Inventors: Mark W. Stephenson, Ram Rangan
  • Patent number: 7886132
    Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.
    Type: Grant
    Filed: May 19, 2008
    Date of Patent: February 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
  • Publication number: 20090287908
    Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.
    Type: Application
    Filed: May 19, 2008
    Publication date: November 19, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
  • Publication number: 20090288063
    Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.
    Type: Application
    Filed: May 19, 2008
    Publication date: November 19, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
  • Publication number: 20090271772
    Abstract: A method, computer system, and computer program product for using one or more hardware interrupts to drive dynamic binary code recompilation. The execution of a plurality of instructions is monitored to detect a problematic instruction. In response to detecting the problematic instruction, a hardware interrupt is thrown to a dynamic interrupt handler. A determination is made whether a threshold for dynamic binary code recompilation is satisfied. If the threshold for dynamic code recompilation is satisfied, the dynamic interrupt handler optimizes at least one of the plurality of instructions.
    Type: Application
    Filed: April 24, 2008
    Publication date: October 29, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mark W. Stephenson, Ram Rangan
  • Publication number: 20090077354
    Abstract: A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved.
    Type: Application
    Filed: September 17, 2007
    Publication date: March 19, 2009
    Inventors: Ram Rangan, William E. Speight, Mark W. Stephenson, Lixin Zhang
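
Several of the abstracts above describe dictionary-based compression, in which a processor keeps a dictionary mapping portions of memory to a single value whenever every value in that portion is uniform. The sketch below is a toy software model of that idea only, not the patented implementation: the class name `UniformRegionDictionary`, the fixed `region_size`, the `external_reads` counter, and the choice to drop a dictionary entry on a conflicting write are all assumptions made for illustration.

```python
class UniformRegionDictionary:
    """Toy model: a dictionary mapping region index -> the one value that
    fills that whole region. All names here are hypothetical."""

    def __init__(self, memory, region_size):
        # `memory` stands in for external memory; its length is assumed
        # to be a multiple of `region_size`.
        self.memory = memory
        self.region_size = region_size
        self.uniform = {}          # region index -> uniform value
        self.external_reads = 0    # reads that had to go to external memory
        # Initial scan to find uniform regions (not counted as reads here).
        for region in range(len(memory) // region_size):
            start = region * region_size
            chunk = memory[start:start + region_size]
            if all(v == chunk[0] for v in chunk):
                self.uniform[region] = chunk[0]

    def read(self, addr):
        region = addr // self.region_size
        if region in self.uniform:
            # Served from the dictionary; external memory is not touched.
            return self.uniform[region]
        self.external_reads += 1
        return self.memory[addr]

    def write(self, addr, value):
        self.memory[addr] = value
        region = addr // self.region_size
        # A write that differs from the recorded value breaks uniformity,
        # so the entry is dropped (an assumed, simplest-possible policy).
        if region in self.uniform and self.uniform[region] != value:
            del self.uniform[region]


mem = [0, 0, 0, 0, 1, 2, 3, 4]
d = UniformRegionDictionary(mem, region_size=4)
first = d.read(2)     # region 0 is uniform: answered from the dictionary
second = d.read(5)    # region 1 is not uniform: goes to external memory
d.write(1, 7)         # region 0 is no longer uniform
third = d.read(1)     # now has to go to external memory
```

In this model, reads to a uniform region never increment `external_reads`, which is the saving the abstracts describe; a hardware design would also have to decide when to re-detect uniformity after writes, which this sketch leaves out.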