Patents by Inventor Ram Rangan
Ram Rangan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11934867Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.Type: GrantFiled: February 24, 2021Date of Patent: March 19, 2024Assignee: NVIDIA CORP.Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
-
Patent number: 11513686Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.Type: GrantFiled: May 5, 2020Date of Patent: November 29, 2022Assignee: NVIDIA CorporationInventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
-
Patent number: 11372548Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.Type: GrantFiled: May 29, 2020Date of Patent: June 28, 2022Assignee: NVIDIA CorporationInventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
-
Patent number: 11263051Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.Type: GrantFiled: May 5, 2020Date of Patent: March 1, 2022Assignee: NVIDIA CorporationInventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
-
Publication number: 20220027194Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.Type: ApplicationFiled: February 24, 2021Publication date: January 27, 2022Applicant: NVIDIA Corp.Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
-
Publication number: 20210373774Abstract: Some systems compress data utilized by a user mode software without the user mode software being aware of any compression taking place. To maintain that illusion, such systems prevent user mode software from being aware of and/or accessing the underlying compressed states of the data. While such an approach protects proprietary compression techniques used in such systems from being deciphered, such restrictions limit the ability of user mode software to use the underlying compressed forms of the data in new ways. Disclosed herein are various techniques for allowing user-mode software to access the underlying compressed states of data either directly or indirectly. Such techniques can be used, for example, to allow various user-mode software on a single system or on multiple systems to exchange data in the underlying compression format of the system(s) even when the user mode software is unable to decipher the compression format.Type: ApplicationFiled: May 29, 2020Publication date: December 2, 2021Inventors: Ram Rangan, Patrick Richard Brown, Wishwesh Anil Gandhi, Steven James Heinrich, Mathias Heyer, Emmett Michael Kilgariff, Praveen Krishnamurthy, Dong Han Ryu
-
Publication number: 20210349761Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.Type: ApplicationFiled: May 5, 2020Publication date: November 11, 2021Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
-
Publication number: 20210349639Abstract: Accesses between a processor and its external memory is reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (e.g., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the amount of accesses between the processor and its external memory are further reduced.Type: ApplicationFiled: May 5, 2020Publication date: November 11, 2021Inventors: Ram Rangan, Suryakant Patidar, Praveen Krishnamurthy, Wishwesh Anil Gandhi
-
Patent number: 11138018Abstract: Profile-guided optimization is a technique for optimizing execution of computer programs using profile information to improve program runtime performance. Obtaining the profile information can be challenging, especially in live production environments such as high-performance gaming systems. A profiling strategy is provided herein that obtains profile information without requiring extra effort from users. The profiling strategy collects several approximate, lightweight profiles called piecemeal profiles over one or more lifetimes of a computer program, or application. The piecemeal profiles are then used to generate whole program application profiles that can then be used to improve the execution of the application. A piecemeal profile is profile information of a section or portion of an application.Type: GrantFiled: December 14, 2018Date of Patent: October 5, 2021Assignee: Nvidia CorporationInventors: Marc Blackstein, Ram Rangan
-
Patent number: 11069023Abstract: A technique selectively avoids memory fetches for partially uniform textures in real time graphics shader programs and instead uses program paths specialized for one or more frequently occurring values. One aspect avoids memory lookups and dependent computations for partially uniform textures through use of pre-constructed coarse-grained representations called value locality maps or dirty tilemaps (DTMs). The decision to use a specialized fast path or not is made dynamically by consulting such coarse-grained dirty tilemap representations. Thread-sharing value reuse can be implemented with or instead of the DTM mechanism.Type: GrantFiled: May 24, 2019Date of Patent: July 20, 2021Assignee: NVIDIA CorporationInventor: Ram Rangan
-
Publication number: 20200372603Abstract: A technique selectively avoids memory fetches for partially uniform textures in real time graphics shader programs and instead uses program paths specialized for one or more frequently occurring values. One aspect avoids memory lookups and dependent computations for partially uniform textures through use of pre-constructed coarse-grained representations called value locality maps or dirty tilemaps (DTMs). The decision to use a specialized fast path or not is made dynamically by consulting such coarse-grained dirty tilemap representations. Thread-sharing value reuse can be implemented with or instead of the DTM mechanism.Type: ApplicationFiled: May 24, 2019Publication date: November 26, 2020Inventor: Ram RANGAN
-
Publication number: 20200192680Abstract: Profile-guided optimization is a technique for optimizing execution of computer programs using profile information to improve program runtime performance. Obtaining the profile information can be challenging, especially in live production environments such as high-performance gaming systems. A profiling strategy is provided herein that obtains profile information without requiring extra effort from users. The profiling strategy collects several approximate, lightweight profiles called piecemeal profiles over one or more lifetimes of a computer program, or application. The piecemeal profiles are then used to generate whole program application profiles that can then be used to improve the execution of the application. A piecemeal profile is profile information of a section or portion of an application.Type: ApplicationFiled: December 14, 2018Publication date: June 18, 2020Inventors: Marc Blackstein, Ram Rangan
-
Patent number: 9946550Abstract: A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved.Type: GrantFiled: September 17, 2007Date of Patent: April 17, 2018Assignee: International Business Machines CorporationInventors: Ram Rangan, William E. Speight, Mark W. Stephenson, Lixin Zhang
-
Patent number: 9262140Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.Type: GrantFiled: May 19, 2008Date of Patent: February 16, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
-
Patent number: 8453129Abstract: A method, computer system, and computer program product for using one or more hardware interrupts to drive dynamic binary code recompilation. The execution of a plurality of instructions is monitored to detect a problematic instruction. In response to detecting the problematic instruction, a hardware interrupt is thrown to a dynamic interrupt handler. A determination is made whether a threshold for dynamic binary code recompilation is satisfied. If the threshold for dynamic code recompilation is satisfied, the dynamic interrupt handler optimizes at least one of the plurality of instructions.Type: GrantFiled: April 24, 2008Date of Patent: May 28, 2013Assignee: International Business Machines CorporationInventors: Mark W. Stephenson, Ram Rangan
-
Patent number: 7886132Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.Type: GrantFiled: May 19, 2008Date of Patent: February 8, 2011Assignee: International Business Machines CorporationInventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
-
Publication number: 20090287908Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.Type: ApplicationFiled: May 19, 2008Publication date: November 19, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
-
Publication number: 20090288063Abstract: A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition.Type: ApplicationFiled: May 19, 2008Publication date: November 19, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ram Rangan, Mark W. Stephenson, Lixin Zhang
-
Publication number: 20090271772Abstract: A method, computer system, and computer program product for using one or more hardware interrupts to drive dynamic binary code recompilation. The execution of a plurality of instructions is monitored to detect a problematic instruction. In response to detecting the problematic instruction, a hardware interrupt is thrown to a dynamic interrupt handler. A determination is made whether a threshold for dynamic binary code recompilation is satisfied. If the threshold for dynamic code recompilation is satisfied, the dynamic interrupt handler optimizes at least one of the plurality of instructions.Type: ApplicationFiled: April 24, 2008Publication date: October 29, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Mark W. Stephenson, Ram Rangan
-
Publication number: 20090077354Abstract: A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved.Type: ApplicationFiled: September 17, 2007Publication date: March 19, 2009Inventors: Ram Rangan, William E. Speight, Mark W. Stephenson, Lixin Zhang