Patents by Inventor Manan Patel

Manan Patel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

HARDWARE ACCELERATED SYNCHRONIZATION WITH ASYNCHRONOUS TRANSACTION SUPPORT

Publication number: 20230289242

Abstract: A new transaction barrier synchronization primitive enables executing threads and asynchronous transactions to synchronize across parallel processors. The asynchronous transactions may include transactions resulting from, for example, hardware data movement units such as direct memory units, etc. A hardware synchronization circuit may provide for the synchronization primitive to be stored in a cache memory so that barrier operations may be accelerated by the circuit. A new wait mechanism reduces software overhead associated with waiting on a barrier.

Type: Application

Filed: March 10, 2022

Publication date: September 14, 2023

Inventors: Timothy GUO, Jack CHOQUETTE, Shirish GADRE, Olivier GIROUX, Carter EDWARDS, John EDMONDSON, Manan PATEL, Raghavan MADHAVAN, JR., Jessie HUANG, Peter NELSON, Ronny KRASHINSKY

PROGRAMMATICALLY CONTROLLED DATA MULTICASTING ACROSS MULTIPLE COMPUTE ENGINES

Publication number: 20230289190

Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization enabling strong scaling and smaller tile sizes.

Type: Application

Filed: March 10, 2022

Publication date: September 14, 2023

Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin

Efficient Matrix Multiply and Add with a Group of Warps

Publication number: 20230289398

Abstract: This specification describes techniques for implementing matrix multiply and add (MMA) operations in graphics processing units (GPUs) and other processors. The implementations provide for a plurality of warps of threads to collaborate in generating the result matrix by enabling each thread to share its respective register files so that they can be accessed by the datapaths associated with other threads in the group of warps. A state machine circuit controls an MMA execution among the warps executing on asynchronous computation units. A group MMA (GMMA) instruction provides for a descriptor to be provided as a parameter, where the descriptor may include information regarding size and formats of input data to be loaded into shared memory and/or the datapath.

Type: Application

Filed: March 10, 2022

Publication date: September 14, 2023

Inventors: Jack CHOQUETTE, Manan PATEL, Matt TYRLIK, Ronny KRASHINSKY

Error containment for enabling local checkpoint and recovery

Patent number: 11720440

Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.

Type: Grant

Filed: July 12, 2021

Date of Patent: August 8, 2023

Assignee: NVIDIA CORPORATION

Inventors: Naveen Cherukuri, Saurabh Hukerikar, Paul Racunas, Nirmal Raj Saxena, David Charles Patrick, Yiyang Feng, Abhijeet Ghadge, Steven James Heinrich, Adam Hendrickson, Gentaro Hirota, Praveen Joginipally, Vaishali Kulkarni, Peter C. Mills, Sandeep Navada, Manan Patel, Liang Yin

EFFICIENTLY LAUNCHING TASKS ON A PROCESSOR

Publication number: 20230236878

Abstract: In various embodiments, scheduling dependencies associated with tasks executed on a processor are decoupled from data dependencies associated with the tasks. Before the completion of a first task that is executing in the processor, a scheduling dependency specifying that a second task is dependent on the first task is resolved based on a pre-exit trigger. In response to the resolution of the scheduling dependency, the second task is launched on the processor.

Type: Application

Filed: January 25, 2022

Publication date: July 27, 2023

Inventors: Jack Hilaire CHOQUETTE, Rajballav DASH, Shayani DEB, Gentaro HIROTA, Ronny M. KRASHINSKY, Ze LONG, Chen MEI, Manan PATEL, Ming Y. SIU

TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR

Publication number: 20230185570

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

Type: Application

Filed: February 8, 2023

Publication date: June 15, 2023

Inventors: Andrew KERR, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

Techniques for efficiently transferring data to a processor

Patent number: 11604649

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

Type: Grant

Filed: June 30, 2021

Date of Patent: March 14, 2023

Assignee: NVIDIA Corporation

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

ERROR CONTAINMENT FOR ENABLING LOCAL CHECKPOINT AND RECOVERY

Publication number: 20230011863

Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.

Type: Application

Filed: July 12, 2021

Publication date: January 12, 2023

Inventors: NAVEEN CHERUKURI, SAURABH HUKERIKAR, PAUL RACUNAS, NIRMAL RAJ SAXENA, DAVID CHARLES PATRICK, YIYANG FENG, ABHIJEET GHADGE, STEVEN JAMES HEINRICH, ADAM HENDRICKSON, GENTARO HIROTA, PRAVEEN JOGINIPALLY, VAISHALI KULKARNI, PETER C. MILLS, SANDEEP NAVADA, MANAN PATEL, LIANG YIN

INTEGRATION SCHEME FOR SHUNTED JOSEPHSON JUNCTIONS

Publication number: 20220384704

Abstract: Materials with etch selectivity with respect to one another and one or more additional etch-stop layers are used in a Josephson junction structure to allow for integration of a Josephson junction with supporting structures such as resistors. Selective etch processes compatible with high volume manufacturing are used to pattern various layers of the Josephson junction structure to provide a Josephson junction, which is electrically coupled to a support structure.

Type: Application

Filed: May 28, 2021

Publication date: December 1, 2022

Inventors: Richard P. ROUSE, Karthik JAMBUNATHAN, Susan E. SHORE, Jeremy WARREN, Manan PATEL, Brian BENTON, Kan MI, Kenneth FLUGAUR, Alexander CHOV, Bryan SMITH

Managing data sparsity for neural networks

Patent number: 11392829

Abstract: Approaches in accordance with various embodiments provide for the processing of sparse matrices for mathematical and programmatic operations. In particular, various embodiments enforce sparsity constraints for performing sparse matrix multiply-add instruction (MMA) operations. Deep neural networks can exhibit significant sparsity in the data used in operations, both in the activations and weights. The computational load can be reduced by excluding zero-valued data elements. A sparsity constraint is applied across all submatrices of a sparse matrix, providing fine-grained structured sparsity that is evenly distributed across the matrix. The matrix may then be compressed since a minimum number of elements of the matrix are known to have zero value. Matrix operations are then performed using these matrices.

Type: Grant

Filed: April 2, 2019

Date of Patent: July 19, 2022

Assignee: NVIDIA Corporation

Inventors: Jeff Pool, Ganesh Venkatesh, Jorge Albericio Latorre, Jack Choquette, Ronny Krashinsky, John Tran, Feng Xie, Ming Y. Siu, Manan Patel
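
The fine-grained structured sparsity described in the abstract of patent 11392829 can be illustrated with a small sketch. It assumes a constraint of at most 2 nonzero values in every group of 4 consecutive elements; the abstract does not state these numbers, and the function names `compress_n_of_m`, `decompress_n_of_m`, and `sparse_dot` are hypothetical. The sketch is not the patented hardware method, only one way to show why a known minimum count of zeros lets a matrix row be stored and multiplied in compressed form.

```python
from typing import List, Tuple


def compress_n_of_m(row: List[float], n: int = 2, m: int = 4) -> Tuple[List[float], List[int]]:
    """Keep the n largest-magnitude values in each group of m (assumed 2-of-4).

    Returns the kept values plus, for each kept value, its position inside
    its group. Everything not kept is treated as zero.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    values: List[float] = []
    indices: List[int] = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Rank positions by magnitude (ties go to the lower index), keep n,
        # then restore ascending position order.
        keep = sorted(sorted(range(m), key=lambda i: -abs(group[i]))[:n])
        for i in keep:
            values.append(group[i])
            indices.append(i)
    return values, indices


def decompress_n_of_m(values: List[float], indices: List[int], n: int = 2, m: int = 4) -> List[float]:
    """Rebuild the pruned dense row from the compressed form."""
    row: List[float] = []
    for g in range(0, len(values), n):
        group = [0.0] * m
        for v, i in zip(values[g:g + n], indices[g:g + n]):
            group[i] = v
        row.extend(group)
    return row


def sparse_dot(values: List[float], indices: List[int], dense: List[float], n: int = 2, m: int = 4) -> float:
    """Dot product that touches only the kept elements, skipping known zeros."""
    total = 0.0
    for k, (v, i) in enumerate(zip(values, indices)):
        group = k // n  # which group of m this kept value belongs to
        total += v * dense[group * m + i]
    return total
```

With these assumed parameters, the compressed row holds half as many values as the dense row, and `sparse_dot` performs half as many multiplications as a dense dot product.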

Techniques for efficiently operating a processing system based on energy characteristics of instructions and machine learning

Patent number: 11379708

Abstract: An integrated circuit such as, for example, a graphics processing unit (GPU), includes a dynamic power controller for adjusting operating voltage and/or frequency. The controller may receive current power used by the integrated circuit and a predicted power determined based on instructions pending in a plurality of processors. The controller determines adjustments that need to be made to the operating voltage and/or frequency to minimize the difference between the current power and the predicted power. An in-system reinforced learning mechanism is included to self-tune parameters of the controller.

Type: Grant

Filed: July 17, 2019

Date of Patent: July 5, 2022

Assignee: NVIDIA Corporation

Inventors: Sachin Idgunji, Ming Y. Siu, Alex Gu, James Reilley, Manan Patel, Rajeshwaran Selvanesan, Ewa Kubalska

Network storage failover systems and associated methods

Patent number: 11269744

Abstract: Failover methods and systems for a networked storage environment are provided. A filtering data structure and a metadata data structure are generated before starting a replay of a log stored in a non-volatile memory of a second storage node, during a failover operation initiated in response to a failure at a first storage node. The second storage node operates as a partner node of the first storage node to mirror at the log one or more write requests received by the first storage node prior to the failure, and data associated with the one or more write requests. The filtering data structure identifies each log entry and the metadata structure stores a metadata attribute of each log entry. The filtering data structure and the metadata structure are used for providing access to a logical storage object during the log replay from the second storage node.

Type: Grant

Filed: April 22, 2020

Date of Patent: March 8, 2022

Assignee: NETAPP, INC.

Inventors: Parag Sarfare, Ananthan Subramanian, Szu-Wen Kuo, Asif Imtiyaz Pathan, Santhosh Selvaraj, Nikhil Mattankot, Manan Patel, Travis Ryan Grusecki

SYSTEM AND METHOD FOR MANAGING AND PRODUCING A DATASET IMAGE ACROSS MULTIPLE STORAGE SYSTEMS

Publication number: 20220012133

Abstract: An application may store data to a dataset comprising a plurality of volumes stored on a plurality of storage systems. The application may request a dataset image of the dataset, the dataset image comprising a volume image of each volume of the dataset. A dataset image manager operates with a plurality of volume image managers in parallel to produce the dataset image, each volume image manager executing on a storage system. The plurality of volume image managers respond by performing requested operations and sending responses to the dataset image manager in parallel. Each volume image manager on a storage system may manage and produce a volume image for each volume of the dataset stored to the storage system. If a volume image for any volume of the dataset fails, or a timeout period expires, a cleanup procedure is performed to delete any successful volume images.

Type: Application

Filed: September 27, 2021

Publication date: January 13, 2022

Inventors: Stephen Wu, Prathamesh Deshpande, Manan Patel
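
The all-or-nothing behavior described in the abstract of publication 20220012133 can be sketched as follows. This is a simplified in-memory model under stated assumptions, not the patented implementation: the class `VolumeImageManager`, the function `create_dataset_image`, and the `fail_on` parameter are hypothetical names, and thread-pool tasks stand in for requests sent to separate storage systems.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Dict, Iterable, Optional


class VolumeImageManager:
    """Hypothetical per-storage-system manager that keeps volume images in memory."""

    def __init__(self, fail_on: Iterable[str] = ()) -> None:
        self.images: Dict[str, str] = {}
        self.fail_on = set(fail_on)  # volumes whose image creation is made to fail

    def create(self, volume: str) -> str:
        if volume in self.fail_on:
            raise RuntimeError(f"could not image {volume}")
        self.images[volume] = f"image-of-{volume}"
        return self.images[volume]

    def delete(self, volume: str) -> None:
        self.images.pop(volume, None)


def create_dataset_image(
    volumes: Dict[str, VolumeImageManager], timeout_s: float
) -> Optional[Dict[str, str]]:
    """Image every volume in parallel; on any failure or timeout, delete all images."""
    with ThreadPoolExecutor(max_workers=max(1, len(volumes))) as pool:
        futures = {pool.submit(mgr.create, vol): vol for vol, mgr in volumes.items()}
        done, not_done = wait(futures, timeout=timeout_s)
    # Leaving the "with" block waits for stragglers, so cleanup below also
    # removes images that finished after the timeout expired.
    succeeded = not not_done and all(f.exception() is None for f in done)
    if succeeded:
        return {futures[f]: f.result() for f in done}
    for vol, mgr in volumes.items():
        mgr.delete(vol)  # cleanup: remove any volume images that did succeed
    return None
```

A caller would pass a mapping from volume name to the manager on the storage system holding that volume; a return value of `None` means no partial dataset image was left behind.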

NETWORK STORAGE FAILOVER SYSTEMS AND ASSOCIATED METHODS

Publication number: 20210334179

Abstract: Failover methods and systems for a networked storage environment are provided. A filtering data structure and a metadata data structure are generated before starting a replay of a log stored in a non-volatile memory of a second storage node, during a failover operation initiated in response to a failure at a first storage node. The second storage node operates as a partner node of the first storage node to mirror at the log one or more write requests received by the first storage node prior to the failure, and data associated with the one or more write requests. The filtering data structure identifies each log entry and the metadata structure stores a metadata attribute of each log entry. The filtering data structure and the metadata structure are used for providing access to a logical storage object during the log replay from the second storage node.

Type: Application

Filed: April 22, 2020

Publication date: October 28, 2021

Applicant: NETAPP, INC.

Inventors: Parag Sarfare, Ananthan Subramanian, Szu-Wen Kuo, Asif Imtiyaz Pathan, Santhosh Selvaraj, Nikhil Mattankot, Manan Patel, Travis Ryan Grusecki

TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR

Publication number: 20210326137

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

Type: Application

Filed: June 30, 2021

Publication date: October 21, 2021

Inventors: Andrew KERR, Jack CHOQUETTE, Xiaogang QIU, Omkar PARANJAPE, Poornachandra RAO, Shirish GADRE, Steven J. HEINRICH, Manan PATEL, Olivier GIROUX, Alan KAATZ

System and method for enforcing a dataset timeout for generating a dataset image

Patent number: 11132262

Abstract: An application may store data to a dataset comprising a plurality of volumes stored on a plurality of storage systems. The application may request a dataset image of the dataset, the dataset image comprising a volume image of each volume of the dataset. A dataset image manager operates with a plurality of volume image managers in parallel to produce the dataset image, each volume image manager executing on a storage system. The plurality of volume image managers respond by performing requested operations and sending responses to the dataset image manager in parallel. Each volume image manager on a storage system may manage and produce a volume image for each volume of the dataset stored to the storage system. If a volume image for any volume of the dataset fails, or a timeout period expires, a cleanup procedure is performed to delete any successful volume images.

Type: Grant

Filed: June 3, 2019

Date of Patent: September 28, 2021

Assignee: NetApp Inc.

Inventors: Stephen Wu, Prathamesh Deshpande, Manan Patel

System and method for utilizing operation identifiers for communicating with storage systems to perform a dataset image operation

Patent number: 11132261

Abstract: An application may store data to a dataset comprising a plurality of volumes stored on a plurality of storage systems. The application may request a dataset image of the dataset, the dataset image comprising a volume image of each volume of the dataset. A dataset image manager operates with a plurality of volume image managers in parallel to produce the dataset image, each volume image manager executing on a storage system. The plurality of volume image managers respond by performing requested operations and sending responses to the dataset image manager in parallel. Each volume image manager on a storage system may manage and produce a volume image for each volume of the dataset stored to the storage system. If a volume image for any volume of the dataset fails, or a timeout period expires, a cleanup procedure is performed to delete any successful volume images.

Type: Grant

Filed: June 3, 2019

Date of Patent: September 28, 2021

Assignee: NetApp Inc.

Inventors: Stephen Wu, Prathamesh Deshpande, Manan Patel

Techniques for efficiently transferring data to a processor

Patent number: 11080051

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

Type: Grant

Filed: December 12, 2019

Date of Patent: August 3, 2021

Assignee: NVIDIA Corporation

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

TECHNIQUES FOR EFFICIENTLY TRANSFERRING DATA TO A PROCESSOR

Publication number: 20210124582

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

Type: Application

Filed: December 12, 2019

Publication date: April 29, 2021

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz

Binding constants at runtime for improved resource utilization

Patent number: 10877757

Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.

Type: Grant

Filed: February 14, 2018

Date of Patent: December 29, 2020

Assignee: NVIDIA Corporation

Inventors: Ajay Tirumala, Jack Choquette, Manan Patel, Shirish Gadre, Praveen Kaushik, Amanpreet Grewal, Shekhar Divekar, Andrei Khodakovsky
