Patents by Inventor Manan Patel

Manan Patel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SUGGESTING COFFEE BEAN GRIND SIZE FOR BEVERAGE MACHINES

Publication number: 20250235034

Abstract: Various illustrative systems, devices, and methods for beverage machines (e.g., drip coffee machines, espresso machines, etc.) are provided. In an exemplary implementation, an espresso machine is configured to brew and dispense espresso. In an exemplary implementation, the espresso machine is configured to determine a recommended coffee bean grind size for a beverage, e.g., an espresso or a sprover-style drink, selected by a user and to provide the recommended coffee bean grind size to the user.

Type: Application

Filed: May 1, 2024

Publication date: July 24, 2025

Inventors: Korkut Bekiroglu, Manan Patel, Steven P. Carter, Scott John Shaw, Eli Piscitelli, Ethan T. Brown, James Parry, Joshua Cole Dettmar, Karol Rabalski
Cooperative group arrays

Patent number: 12333311

Abstract: A new level(s) of hierarchy—Cooperate Group Arrays (CGAs)—and an associated new hardware-based work distribution/execution model is described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.

Type: Grant

Filed: March 10, 2022

Date of Patent: June 17, 2025

Assignee: NVIDIA Corporation

Inventors: Greg Palmer, Gentaro Hirota, Ronny Krashinsky, Ze Long, Brian Pharris, Rajballav Dash, Jeff Tuckey, Jerome F. Duluk, Jr., Lacky Shah, Luke Durant, Jack Choquette, Eric Werness, Naman Govil, Manan Patel, Shayani Deb, Sandeep Navada, John Edmondson, Prakash Bangalore Prabhakar, Wish Gandhi, Ravi Manyam, Apoorv Parle, Olivier Giroux, Shirish Gadre, Steve Heinrich
Distributed Shared Memory

Publication number: 20250173152

Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache.

Type: Application

Filed: January 31, 2025

Publication date: May 29, 2025

Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
Distributed shared memory

Patent number: 12248788

Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache.

Type: Grant

Filed: March 10, 2022

Date of Patent: March 11, 2025

Assignee: NVIDIA Corporation

Inventors: Prakash Bangalore Prabhakar, Gentaro Hirota, Ronny Krashinsky, Ze Long, Brian Pharris, Rajballav Dash, Jeff Tuckey, Jerome F. Duluk, Jr., Lacky Shah, Luke Durant, Jack Choquette, Eric Werness, Naman Govil, Manan Patel, Shayani Deb, Sandeep Navada, John Edmondson, Greg Palmer, Wish Gandhi, Ravi Manyam, Apoorv Parle, Olivier Giroux, Shirish Gadre, Steve Heinrich
METHOD AND APPARATUS FOR WEIGHT-STATIONARY DIRECT CONVOLUTION CALCULATION

Publication number: 20250077615

Abstract: Systems and methods for efficient convolution based on matrix multiply and add (MMA) are described. An example processor having a plurality of processing lanes is configured to perform convolution of a matrix of activation elements and a filter matrix in accordance with a configurable series of instructions including a plurality of MMA instructions and shift instructions while reusing activation elements already loaded to the processor or associated memory over a plurality of MMA operations. The filter elements are held stationary at inputs to the processor for multiple cycles for multiple MMA operations while activations are streamed in. Associated methods are also described.

Type: Application

Filed: August 30, 2023

Publication date: March 6, 2025

Inventors: Manan PATEL, Jack CHOQUETTE, Alexander L. MINKIN, Maciej Tyrlik
METHOD AND APPARATUS FOR DIRECT CONVOLUTION CALCULATION

Publication number: 20250060938

Abstract: Systems and methods for efficient convolution based on matrix multiply and add (MMA) are described. An example processor having a plurality of processing lanes is configured to perform convolution of a matrix of activation elements and a filter matrix in accordance with a configurable series of instructions including a plurality of MMA instructions and shift instructions while reusing activation elements already loaded to the datapath or associated memory over a plurality of MMA operations. Associated methods are also described.

Type: Application

Filed: August 14, 2023

Publication date: February 20, 2025

Inventors: Jack CHOQUETTE, Po-An TSAI, Alexander L. MINKIN, Manan PATEL, Neal Clayton CRAGO, Daniel STIFFLER, Kefeng DUAN, Yu-Jung CHEN, Jing LI, Qian WANG, Ronny KRASHINSKY, Jun YANG, Feng XIE
SELF-SYNCHRONIZING REMOTE MEMORY OPERATIONS IN A MULTIPROCESSOR SYSTEM

Publication number: 20240393951

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.

Type: Application

Filed: July 10, 2024

Publication date: November 28, 2024

Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
Method and apparatus for efficient access to multidimensional data structures and/or other large data blocks

Patent number: 12141082

Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.

Type: Grant

Filed: March 10, 2022

Date of Patent: November 12, 2024

Assignee: NVIDIA CORPORATION

Inventors: Alexander L. Minkin, Alan Kaatz, Oliver Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
SELF-SYNCHRONIZING REMOTE MEMORY OPERATIONS IN A DATA CENTER OR MULTIPROCESSOR SYSTEM

Publication number: 20240354106

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a data center or multiprocessor computing system. During a remote memory operation, a source processor transmits multiple data segments to a destination processor. For each data segment, the source processor transmits a remote memory operation to the destination processor that includes associated metadata that identifies the memory location of a corresponding synchronization object representing a count of data segments to be stored or a flag for each data segment to be stored. The remote memory operation along with the metadata is transmitted as a single unit to the destination processor. The destination processor splits the operation into the remote memory operation and the memory synchronization operation.

Type: Application

Filed: June 26, 2024

Publication date: October 24, 2024

Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN, Gregory Michael THORSON
Self-synchronizing remote memory operations in a multiprocessor system

Patent number: 12105960

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.

Type: Grant

Filed: August 31, 2022

Date of Patent: October 1, 2024

Assignee: NVIDIA CORPORATION

Inventors: Srinivas Santosh Kumar Madugula, Olivier Giroux, Wishwesh Anil Gandhi, Michael Allen Parker, Raghuram L, Ivan Tanasic, Manan Patel, Mark Hummel, Alexander L. Minkin
PROGRAMMATICALLY CONTROLLED DATA MULTICASTING ACROSS MULTIPLE COMPUTE ENGINES

Publication number: 20240289132

Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization enabling strong scaling and smaller tile sizes.

Type: Application

Filed: May 10, 2024

Publication date: August 29, 2024

Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin
Programmatically controlled data multicasting across multiple compute engines

Patent number: 12020035

Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization enabling strong scaling and smaller tile sizes.

Type: Grant

Filed: March 10, 2022

Date of Patent: June 25, 2024

Assignee: NVIDIA CORPORATION

Inventors: Apoorv Parle, Ronny Krashinsky, John Edmondson, Jack Choquette, Shirish Gadre, Steve Heinrich, Manan Patel, Prakash Bangalore Prabhakar, Jr., Ravi Manyam, Wish Gandhi, Lacky Shah, Alexander L. Minkin
System and method for managing and producing a dataset image across multiple storage systems

Patent number: 11966301

Abstract: An application may store data to a dataset comprising a plurality of volumes stored on a plurality of storage systems. The application may request a dataset image of the dataset, the dataset image comprising a volume image of each volume of the dataset. A dataset image manager operates with a plurality of volume image managers in parallel to produce the dataset image, each volume image manager executing on a storage system. The plurality of volume image managers respond by performing requested operations and sending responses to the dataset image manager in parallel. Each volume image manager on a storage system may manage and produce a volume image for each volume of the dataset stored to the storage system. If a volume image for any volume of the dataset fails, or a timeout period expires, a cleanup procedure is performed to delete any successful volume images.

Type: Grant

Filed: September 27, 2021

Date of Patent: April 23, 2024

Assignee: NetApp, Inc.

Inventors: Stephen Wu, Prathamesh Deshpande, Manan Patel
SELF-SYNCHRONIZING REMOTE MEMORY OPERATIONS IN A MULTIPROCESSOR SYSTEM

Publication number: 20240069736

Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.

Type: Application

Filed: August 31, 2022

Publication date: February 29, 2024

Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
Techniques for efficiently transferring data to a processor

Patent number: 11907717

Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

Type: Grant

Filed: February 8, 2023

Date of Patent: February 20, 2024

Assignee: NVIDIA Corporation

Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
DATASET IMAGE CREATION

Publication number: 20240012715

Abstract: An application may store data to a dataset comprising a plurality of volumes stored on a plurality of storage systems. The application may request a dataset image of the dataset, the dataset image comprising a volume image of each volume of the dataset. A dataset image manager operates with a plurality of volume image managers in parallel to produce the dataset image, each volume image manager executing on a storage system. The plurality of volume image managers respond by performing requested operations and sending responses to the dataset image manager in parallel. Each volume image manager on a storage system may manage and produce a volume image for each volume of the dataset stored to the storage system. If a volume image for any volume of the dataset fails, or a timeout period expires, a cleanup procedure is performed to delete any successful volume images.

Type: Application

Filed: September 25, 2023

Publication date: January 11, 2024

Inventors: Stephen Wu, Prathamesh Deshpande, Manan Patel
FAST DATA SYNCHRONIZATION IN PROCESSORS AND MEMORY

Publication number: 20230315655

Abstract: A new synchronization system synchronizes data exchanges between producer processes and consumer processes which may be on the same or different processors in a multiprocessor system. The synchronization incurs less than one roundtrip of latency - in some implementations, in approximately 0.5 roundtrip times. A key aspect of the fast synchronization is that the producer’s data store is followed without delay with the updating of a barrier on which the consumer is waiting.

Type: Application

Filed: March 10, 2022

Publication date: October 5, 2023

Inventors: Jack CHOQUETTE, Ronny KRASHINSKY, Timothy GUO, Carter EDWARDS, Steve HEINRICH, John EDMONDSON, Prakash Bangalore PRABHAKAR, Apoorv PARLE, JR., Manan PATEL, Olivier GIROUX, Michael PELLAUER
Rollback procedure for failed dataset image operation

Patent number: 11768737

Abstract: An application may store data to a dataset comprising a plurality of volumes stored on a plurality of storage systems. The application may request a dataset image of the dataset, the dataset image comprising a volume image of each volume of the dataset. A dataset image manager operates with a plurality of volume image managers in parallel to produce the dataset image, each volume image manager executing on a storage system. The plurality of volume image managers respond by performing requested operations and sending responses to the dataset image manager in parallel. Each volume image manager on a storage system may manage and produce a volume image for each volume of the dataset stored to the storage system. If a volume image for any volume of the dataset fails, or a timeout period expires, a cleanup procedure is performed to delete any successful volume images.

Type: Grant

Filed: November 14, 2019

Date of Patent: September 26, 2023

Assignee: NetApp, Inc.

Inventors: Stephen Wu, Prathamesh Deshpande, Manan Patel
RECONFIGURING REGISTER AND SHARED MEMORY USAGE IN THREAD ARRAYS

Publication number: 20230297426

Abstract: Various embodiments include techniques for utilizing resources on a processing unit. Thread groups executing on a processor begin execution with specified resources, such as a number of registers and an amount of shared memory. During execution, one or more thread groups may determine that the thread groups have excess resources needed to execute the current functions. Such thread groups can deallocate the excess resources to a free pool. Similarly, during execution, one or more thread groups may determine that the thread groups have fewer resources needed to execute the current functions. Such thread groups can allocate the needed resources from the free pool. Further, producer thread groups that generate data for consumer thread groups can deallocate excess resources prior to completion. The consumer thread groups can allocate the excess resources and initiate execution while the producer thread groups complete execution, thereby decreasing latency between producer and consumer thread groups.

Type: Application

Filed: March 18, 2022

Publication date: September 21, 2023

Inventors: Rajballav DASH, Stephen JONES, Jack Hilaire CHOQUETTE, Manan PATEL, Ronny M. KRASHINSKY, Shirish GADRE, Lixia QIN
Efficient Matrix Multiply and Add with a Group of Warps

Publication number: 20230289398

Abstract: This specification describes techniques for implementing matrix multiply and add (MMA) operations in graphics processing units (GPU)s and other processors. The implementations provide for a plurality of warps of threads to collaborate in generating the result matrix by enabling each thread to share its respective register files to be accessed by the datapaths associated with other threads in the group of warps. A state machine circuit controls a MMA execution among the warps executing on asynchronous computation units. A group MMA (GMMA) instruction provides for a descriptor to be provided as parameter where the descriptor may include information regarding size and formats of input data to be loaded into shared memory and/or the datapath.

Type: Application

Filed: March 10, 2022

Publication date: September 14, 2023

Inventors: Jack CHOQUETTE, Manan PATEL, Matt TYRLIK, Ronny KRASHINSKY

1 2 3 4 next