Patents by Inventor Ajay Sudarshan Tirumala

Ajay Sudarshan Tirumala has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques for storing sub-alignment data when accelerating Smith-Waterman sequence alignments

Patent number: 11822541

Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.

Type: Grant

Filed: September 30, 2021

Date of Patent: November 21, 2023

Assignee: NVIDIA CORPORATION

Inventors: Maciej Piotr Tyrlik, Ajay Sudarshan Tirumala, Shirish Gadre
IMPLEMENTING SPECIALIZED INSTRUCTIONS FOR ACCELERATING DYNAMIC PROGRAMMING ALGORITHMS

Publication number: 20230305844

Abstract: Various techniques for accelerating dynamic programming algorithms are provided. For example, a fused addition and comparison instruction, a three-operand comparison instruction, and a two-operand comparison instruction are used to accelerate a Needleman-Wunsch algorithm that determines an optimized global alignment of subsequences over two entire sequences. In another example, the fused addition and comparison instruction is used in an innermost loop of a Floyd-Warshall algorithm to reduce the number of instructions required to determine shortest paths between pairs of vertices in a graph. In another example, a two-way single instruction multiple data (SIMD) floating point variant of the three-operand comparison instruction is used to reduce the number of instructions required to determine the median of an array of floating point values.

Type: Application

Filed: September 28, 2022

Publication date: September 28, 2023

Inventors: Maciej Piotr TYRLIK, Ajay Sudarshan TIRUMALA, Shirish GADRE, Frank Joseph EATON, Daniel Alan STIFFLER
TECHNIQUES FOR ACCELERATING SMITH-WATERMAN SEQUENCE ALIGNMENTS

Publication number: 20230101085

Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.

Type: Application

Filed: September 30, 2021

Publication date: March 30, 2023

Inventors: Maciej Piotr TYRLIK, Ajay Sudarshan TIRUMALA, Shirish GADRE
TECHNIQUES FOR STORING SUB-ALIGNMENT DATA WHEN ACCELERATING SMITH-WATERMAN SEQUENCE ALIGNMENTS

Publication number: 20230095916

Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.

Type: Application

Filed: September 30, 2021

Publication date: March 30, 2023

Inventors: Maciej Piotr TYRLIK, Ajay Sudarshan TIRUMALA, Shirish GADRE
Implementing specialized instructions for accelerating Smith-Waterman sequence alignments

Patent number: 11550584

Abstract: Various techniques for accelerating Smith-Waterman sequence alignments are provided. For example, threads in a group of threads are employed to use an interleaved cell layout to store relevant data in registers while computing sub-alignment data for one or more local alignment problems. In another example, specialized instructions that reduce the number of cycles required to compute each sub-alignment score are utilized. In another example, threads are employed to compute sub-alignment data for a subset of columns of one or more local alignment problems while other threads begin computing sub-alignment data based on partial result data received from the preceding threads. After computing a maximum sub-alignment score, a thread stores the maximum sub-alignment score and the corresponding position in global memory.

Type: Grant

Filed: September 30, 2021

Date of Patent: January 10, 2023

Assignee: NVIDIA CORPORATION

Inventors: Maciej Piotr Tyrlik, Ajay Sudarshan Tirumala, Shirish Gadre
TECHNIQUES FOR EFFICIENTLY SYNCHRONIZING MULTIPLE PROGRAM THREADS

Publication number: 20220391264

Abstract: Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.

Type: Application

Filed: June 3, 2021

Publication date: December 8, 2022

Inventors: Ajay Sudarshan TIRUMALA, Olivier GIROUX, Peter NELSON, Gary M. TAROLLI, Ankita UPRETI
Techniques for efficiently performing data reductions in parallel processing units

Patent number: 11061741

Abstract: Techniques are disclosed for reducing the latency associated with performing data reductions in a multithreaded processor. In response to a single instruction associated with a set of threads executing in the multithreaded processor, a warp reduction unit acquires register values stored in source registers, where each register value is associated with a different thread included in the set of threads. The warp reduction unit performs operation(s) on the register values to compute an aggregate value. The warp reduction unit stores the aggregate value in a destination register that is accessible to at least one of the threads in the set of threads. Because the data reduction is performed via a single instruction using hardware specialized for data reductions, the number of cycles required to perform the data reduction is decreased relative to prior-art techniques that are performed via multiple instructions using hardware that is not specialized for data reductions.

Type: Grant

Filed: July 16, 2019

Date of Patent: July 13, 2021

Assignee: NVIDIA Corporation

Inventors: Peter Nelson, Olivier Giroux, Ajay Sudarshan Tirumala
Techniques for comprehensively synchronizing execution threads

Patent number: 10977037

Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.

Type: Grant

Filed: October 7, 2019

Date of Patent: April 13, 2021

Assignee: NVIDIA Corporation

Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette
TECHNIQUES FOR EFFICIENTLY PERFORMING DATA REDUCTIONS IN PARALLEL PROCESSING UNITS

Publication number: 20210019198

Abstract: Techniques are disclosed for reducing the latency associated with performing data reductions in a multithreaded processor. In response to a single instruction associated with a set of threads executing in the multithreaded processor, a warp reduction unit acquires register values stored in source registers, where each register value is associated with a different thread included in the set of threads. The warp reduction unit performs operation(s) on the register values to compute an aggregate value. The warp reduction unit stores the aggregate value in a destination register that is accessible to at least one of the threads in the set of threads. Because the data reduction is performed via a single instruction using hardware specialized for data reductions, the number of cycles required to perform the data reduction is decreased relative to prior-art techniques that are performed via multiple instructions using hardware that is not specialized for data reductions.

Type: Application

Filed: July 16, 2019

Publication date: January 21, 2021

Inventors: Peter NELSON, Olivier GIROUX, Ajay Sudarshan TIRUMALA
Thread-level sleep in a multithreaded architecture

Patent number: 10817295

Abstract: A streaming multiprocessor (SM) includes a nanosleep (NS) unit configured to cause individual threads executing on the SM to sleep for a programmer-specified interval of time. For a given thread, the NS unit parses a NANOSLEEP instruction and extracts a sleep time. The NS unit then maps the sleep time to a single bit of a timer and causes the thread to sleep. When the timer bit changes, the sleep time expires, and the NS unit awakens the thread. The thread may then continue executing. The SM also includes a nanotrap (NT) unit configured to issue traps using a similar timing mechanism to that described above. For a given thread, the NT unit parses a NANOTRAP instruction and extracts a trap time. The NT unit then maps the trap time to a single bit of a timer. When the timer bit changes, the NT unit issues a trap.

Type: Grant

Filed: April 28, 2017

Date of Patent: October 27, 2020

Assignee: NVIDIA Corporation

Inventors: Olivier Giroux, Peter Nelson, Jack Choquette, Ajay Sudarshan Tirumala
TECHNIQUES FOR COMPREHENSIVELY SYNCHRONIZING EXECUTION THREADS

Publication number: 20200034143

Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.

Type: Application

Filed: October 7, 2019

Publication date: January 30, 2020

Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette
Techniques for comprehensively synchronizing execution threads

Patent number: 10437593

Abstract: A synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.

Type: Grant

Filed: April 27, 2017

Date of Patent: October 8, 2019

Assignee: NVIDIA CORPORATION

Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette
THREAD-LEVEL SLEEP IN A MASSIVELY MULTITHREADED ARCHITECTURE

Publication number: 20180314522

Abstract: A streaming multiprocessor (SM) includes a nanosleep (NS) unit configured to cause individual threads executing on the SM to sleep for a programmer-specified interval of time. For a given thread, the NS unit parses a NANOSLEEP instruction and extracts a sleep time. The NS unit then maps the sleep time to a single bit of a timer and causes the thread to sleep. When the timer bit changes, the sleep time expires, and the NS unit awakens the thread. The thread may then continue executing. The SM also includes a nanotrap (NT) unit configured to issue traps using a similar timing mechanism to that described above. For a given thread, the NT unit parses a NANOTRAP instruction and extracts a trap time. The NT unit then maps the trap time to a single bit of a timer. When the timer bit changes, the NT unit issues a trap.

Type: Application

Filed: April 28, 2017

Publication date: November 1, 2018

Inventors: Olivier GIROUX, Peter NELSON, Jack CHOQUETTE, Ajay Sudarshan TIRUMALA
TECHNIQUES FOR COMPREHENSIVELY SYNCHRONIZING EXECUTION THREADS

Publication number: 20180314520

Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.

Type: Application

Filed: April 27, 2017

Publication date: November 1, 2018

Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette