Patents by Inventor Sreenivas Aerra Reddy

Sreenivas Aerra Reddy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for queuing commands in a deep learning processor

Patent number: 11941440

Abstract: A method includes: dequeuing a signal primitive from a signaling command queue in the set of command queues, the signal primitive pointing to a waiting command queue; in response to the signal primitive pointing to the waiting command queue, incrementing a number of pending signal primitives in the signal-wait counter matrix; dequeuing a wait primitive from the waiting command queue, the wait primitive pointing to the signaling command queue; in response to the wait primitive pointing to the signaling command queue, accessing the register to read the number of pending signal primitives; in response to the number of pending signal primitives indicating at least one pending signal primitive: decrementing the number of pending signal primitives; and dequeuing an instruction from the waiting command queue; and dispatching a control signal representing the instruction to a resource.

Type: Grant

Filed: October 25, 2022

Date of Patent: March 26, 2024

Assignee: Deep Vision Inc.

Inventors: Mohamed Shahim, Sreenivas Aerra Reddy, Raju Datla, Lava Kumar Bokam, Suresh Kumar Vennam, Sameek Banerjee
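
The signal/wait handshake in the abstract above can be sketched roughly as follows. This is an illustrative model, not the patented implementation: the class name `SignalWaitQueues`, the tuple command encoding, and the choice to leave an unsatisfied wait primitive at the head of its queue are all assumptions made for the example.

```python
from collections import deque


class SignalWaitQueues:
    """Toy model of command queues synchronized by a signal-wait counter matrix."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]
        # pending[s][w]: signals issued by queue s that queue w has not yet consumed.
        self.pending = [[0] * num_queues for _ in range(num_queues)]
        # Stand-in for control signals dispatched to hardware resources.
        self.dispatched = []

    def enqueue(self, q, command):
        """command is (kind, arg) with kind in {"signal", "wait", "instr"}."""
        self.queues[q].append(command)

    def step(self, q):
        """Process the head of queue q; return True if progress was made."""
        queue = self.queues[q]
        if not queue:
            return False
        kind, arg = queue[0]
        if kind == "signal":
            # Signal primitive points at waiting queue `arg`: bump its counter.
            queue.popleft()
            self.pending[q][arg] += 1
            return True
        if kind == "wait":
            # Wait primitive points at signaling queue `arg`: stall until a signal is pending.
            if self.pending[arg][q] == 0:
                return False
            queue.popleft()
            self.pending[arg][q] -= 1
            return True
        # Ordinary instruction: dequeue it and dispatch it to a resource.
        queue.popleft()
        self.dispatched.append((q, arg))
        return True
```

With a `wait` at the head of one queue and the matching `signal` still queued elsewhere, `step` on the waiting queue returns `False` until the signaling queue has advanced past its signal primitive.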
PROXY SYSTEMS AND METHODS FOR MULTIPROCESSING ARCHITECTURES

Publication number: 20230409936

Abstract: Proxy systems and methods for multiprocessing architectures are described. One method includes receiving an inference request and a statistics request from a client computing system. The method may access a load state of each processing device in a subset of processing devices preloaded with the neural network model, and select a target processing device from the subset based on the load states. One aspect includes transmitting the inference request to the target processing device, and monitoring an execution of the inference request by the target processing device based on the neural network model. The method may receive an inference result generated by the target processing device after executing the inference request, and compute an average inference time for the inference request execution based on the monitoring. The method may transmit the inference result and the average inference time to the client computing system.

Type: Application

Filed: May 17, 2023

Publication date: December 21, 2023

Inventors: Lava Kumar Bokam, Sriduth Jayhari, Divya Vipin, Rajashekar Reddy Ereddy, Snigdha Alkanti, Venkateswara Rao Andole Mankali, Suresh Kumar Vennam, Mohammed Mujahid, Sreenivas Aerra Reddy

PROXY SYSTEMS AND METHODS FOR MULTIPROCESSING ARCHITECTURES

Publication number: 20230376728

Abstract: Proxy systems and methods for multiprocessing architectures are described. One method includes receiving a neural network model from a client computing system. System resource availability on a plurality of processing devices may be assessed, and a subset of available processing devices may be selected based on the system resource availability. In one aspect, the neural network model is loaded into each processing device in the subset. The method may include receiving an inference request from the client computing system. A load state of each processing device in the subset may be accessed, and a target processing device from the subset may be selected based on the load states. The inference request may be transmitted to the target processing device.

Type: Application

Filed: May 17, 2023

Publication date: November 23, 2023

Inventors: Lava Kumar Bokam, Sriduth Jayhari, Divya Vipin, Rajashekar Reddy Ereddy, Snigdha Alkanti, Venkateswara Rao Andole Mankali, Suresh Kumar Vennam, Mohammed Mujahid, Sreenivas Aerra Reddy
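
The proxy flow described in the two abstracts above (preload a model on devices with spare resources, route each inference request to the least-loaded device, and report an average inference time) might look something like the sketch below. The names `Device`, `Proxy`, `free_memory_mb`, and `active_requests` are hypothetical, and using free memory as the resource check and in-flight request count as the load state are assumptions for illustration only.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    free_memory_mb: int          # assumed resource-availability measure
    active_requests: int = 0     # assumed load-state measure
    models: set = field(default_factory=set)


class Proxy:
    def __init__(self, devices):
        self.devices = list(devices)
        self.latencies = []  # seconds per completed inference request

    def load_model(self, model_name, required_mb):
        """Load the model on every device with enough free memory; return that subset."""
        subset = [d for d in self.devices if d.free_memory_mb >= required_mb]
        for device in subset:
            device.free_memory_mb -= required_mb
            device.models.add(model_name)
        return subset

    def select_target(self, model_name):
        """Pick the least-loaded device among those preloaded with the model."""
        candidates = [d for d in self.devices if model_name in d.models]
        if not candidates:
            raise LookupError(f"no device has {model_name!r} loaded")
        return min(candidates, key=lambda d: d.active_requests)

    def infer(self, model_name, run):
        """Run one request via run(device); return (result, average latency in seconds)."""
        target = self.select_target(model_name)
        target.active_requests += 1
        start = time.perf_counter()
        try:
            result = run(target)
        finally:
            target.active_requests -= 1
        self.latencies.append(time.perf_counter() - start)
        return result, sum(self.latencies) / len(self.latencies)
```

Here `run` is a caller-supplied callable standing in for whatever transport actually sends the request to the device.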
A PROCESSOR SYSTEM AND METHOD FOR INCREASING DATA-TRANSFER BANDWIDTH DURING EXECUTION OF A SCHEDULED PARALLEL PROCESS

Publication number: 20230063751

Abstract: A broadcast subsystem of a processor system includes: a set of broadcast buses, each broadcast bus in the set of broadcast buses electrically coupled to a subset of primary memory units in the set of primary memory units; a primary memory unit queue: configured to store a first set of data transfer requests associated with the set of primary memory units; and electrically coupled to the data buffer; and a broadcast scheduler: electrically coupled to the primary memory unit queue; electrically coupled to the set of broadcast buses; and configured to transfer source data from the data buffer to a target subset of primary memory units in the set of primary memory units via the set of broadcast buses based on the set of data transfer requests stored in the primary memory unit queue.

Type: Application

Filed: November 10, 2022

Publication date: March 2, 2023

Inventors: Raju Datla, Mohamed Shahim, Suresh Kumar Vennam, Sreenivas Aerra Reddy

SYSTEM AND METHOD FOR QUEUING COMMANDS IN A DEEP LEARNING PROCESSOR

Publication number: 20230052277

Abstract: A method includes: dequeuing a signal primitive from a signaling command queue in the set of command queues, the signal primitive pointing to a waiting command queue; in response to the signal primitive pointing to the waiting command queue, incrementing a number of pending signal primitives in the signal-wait counter matrix; dequeuing a wait primitive from the waiting command queue, the wait primitive pointing to the signaling command queue; in response to the wait primitive pointing to the signaling command queue, accessing the register to read the number of pending signal primitives; in response to the number of pending signal primitives indicating at least one pending signal primitive: decrementing the number of pending signal primitives; and dequeuing an instruction from the waiting command queue; and dispatching a control signal representing the instruction to a resource.

Type: Application

Filed: October 25, 2022

Publication date: February 16, 2023

Inventors: Mohamed Shahim, Sreenivas Aerra Reddy, Raju Datla, Lava Kumar Bokam, Suresh Kumar Vennam, Sameek Banerjee

Processor system and method for increasing data-transfer bandwidth during execution of a scheduled parallel process

Patent number: 11526767

Abstract: A broadcast subsystem of a processor system includes: a set of broadcast buses, each broadcast bus in the set of broadcast buses electrically coupled to a subset of primary memory units in the set of primary memory units; a primary memory unit queue: configured to store a first set of data transfer requests associated with the set of primary memory units; and electrically coupled to the data buffer; and a broadcast scheduler: electrically coupled to the primary memory unit queue; electrically coupled to the set of broadcast buses; and configured to transfer source data from the data buffer to a target subset of primary memory units in the set of primary memory units via the set of broadcast buses based on the set of data transfer requests stored in the primary memory unit queue.

Type: Grant

Filed: August 30, 2021

Date of Patent: December 13, 2022

Assignee: Deep Vision Inc.

Inventors: Raju Datla, Mohamed Shahim, Suresh Kumar Vennam, Sreenivas Aerra Reddy

System and method for queuing commands in a deep learning processor

Patent number: 11513847

Abstract: A method includes: dequeuing a signal primitive from a signaling command queue in the set of command queues, the signal primitive pointing to a waiting command queue; in response to the signal primitive pointing to the waiting command queue, incrementing a number of pending signal primitives in the signal-wait counter matrix; dequeuing a wait primitive from the waiting command queue, the wait primitive pointing to the signaling command queue; in response to the wait primitive pointing to the signaling command queue, accessing the register to read the number of pending signal primitives; in response to the number of pending signal primitives indicating at least one pending signal primitive: decrementing the number of pending signal primitives; and dequeuing an instruction from the waiting command queue; and dispatching a control signal representing the instruction to a resource.

Type: Grant

Filed: March 24, 2021

Date of Patent: November 29, 2022

Assignee: Deep Vision Inc.

Inventors: Mohamed Shahim, Sreenivas Aerra Reddy, Raju Datla, Lava Kumar Bokam, Suresh Kumar Vennam, Sameek Banerjee

PROCESSOR SYSTEM AND METHOD FOR INCREASING DATA-TRANSFER BANDWIDTH DURING EXECUTION OF A SCHEDULED PARALLEL PROCESS

Publication number: 20220066963

Abstract: A broadcast subsystem of a processor system includes: a set of broadcast buses, each broadcast bus in the set of broadcast buses electrically coupled to a subset of primary memory units in the set of primary memory units; a primary memory unit queue: configured to store a first set of data transfer requests associated with the set of primary memory units; and electrically coupled to the data buffer; and a broadcast scheduler: electrically coupled to the primary memory unit queue; electrically coupled to the set of broadcast buses; and configured to transfer source data from the data buffer to a target subset of primary memory units in the set of primary memory units via the set of broadcast buses based on the set of data transfer requests stored in the primary memory unit queue.

Type: Application

Filed: August 30, 2021

Publication date: March 3, 2022

Inventors: Raju Datla, Mohamed Shahim, Suresh Kumar Vennam, Sreenivas Aerra Reddy

PROCESSOR SYSTEM AND METHOD FOR INCREASING DATA-TRANSFER BANDWIDTH DURING EXECUTION OF A SCHEDULED PARALLEL PROCESS

Publication number: 20220067536

Abstract: A broadcast subsystem of a processor system includes: a set of broadcast buses, each broadcast bus in the set of broadcast buses electrically coupled to a subset of primary memory units in the set of primary memory units; a primary memory unit queue: configured to store a first set of data transfer requests associated with the set of primary memory units; and electrically coupled to the data buffer; and a broadcast scheduler: electrically coupled to the primary memory unit queue; electrically coupled to the set of broadcast buses; and configured to transfer source data from the data buffer to a target subset of primary memory units in the set of primary memory units via the set of broadcast buses based on the set of data transfer requests stored in the primary memory unit queue.

Type: Application

Filed: August 30, 2021

Publication date: March 3, 2022

Inventors: Raju Datla, Mohamed Shahim, Suresh Kumar Vennam, Sreenivas Aerra Reddy

SYSTEM AND METHOD FOR QUEUING COMMANDS IN A DEEP LEARNING PROCESSOR

Publication number: 20210303346

Abstract: A method includes: dequeuing a signal primitive from a signaling command queue in the set of command queues, the signal primitive pointing to a waiting command queue; in response to the signal primitive pointing to the waiting command queue, incrementing a number of pending signal primitives in the signal-wait counter matrix; dequeuing a wait primitive from the waiting command queue, the wait primitive pointing to the signaling command queue; in response to the wait primitive pointing to the signaling command queue, accessing the register to read the number of pending signal primitives; in response to the number of pending signal primitives indicating at least one pending signal primitive: decrementing the number of pending signal primitives; and dequeuing an instruction from the waiting command queue; and dispatching a control signal representing the instruction to a resource.

Type: Application

Filed: March 24, 2021

Publication date: September 30, 2021

Inventors: Mohamed Shahim, Sreenivas Aerra Reddy, Raju Datla, Lava Kumar Bokam, Suresh Kumar Vennam, Sameek Banerjee

METHOD FOR STATIC SCHEDULING OF ARTIFICIAL NEURAL NETWORKS FOR A PROCESSOR

Publication number: 20210191765

Abstract: A method for scheduling an artificial neural network includes: accessing a processor representation of a multicore processor comprising processor cores, direct memory access cores, and a cost model; and accessing a network structure defining a set of layers. The method also includes, for each layer in the set of layers: generating a graph based on the processor representation, the graph defining compute nodes, data transfer nodes, and edges representing dependencies between the compute nodes and the data transfer nodes; and generating a schedule for the layer based on the graph, the schedule assigning the compute nodes to the processor cores and assigning the data transfer nodes to the direct memory access cores. The method further includes aggregating the schedule for each layer in the set of layers to generate a complete schedule for the artificial neural network.

Type: Application

Filed: December 18, 2020

Publication date: June 24, 2021

Inventors: Lava Kumar Bokam, Sameek Banerjee, Abhilash Bharath Ghanore, Rajashekar Reddy Ereddy, Wajahat Qadeer, Rehan Hameed, Mohamed Shahim, Sreenivas Aerra Reddy
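
As a rough illustration of the per-layer scheduling idea in the abstract above, the sketch below builds a dependency graph of compute and data-transfer nodes, assigns each node to the earliest-free unit of the matching kind, and concatenates the per-layer schedules. It is a simple greedy list scheduler chosen for the example, not the method claimed in the application; the function names, the fixed per-node `cost`, and the layer-after-layer aggregation are all assumptions. It relies on `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter


def schedule_layer(nodes, deps, num_cores, num_dma):
    """Greedy schedule for one layer.

    nodes: {name: (kind, cost)} with kind "compute" or "transfer"
    deps:  {name: set of predecessor names}
    Returns a list of (name, kind, unit, start, end) tuples.
    """
    # Time at which each processor core / DMA core next becomes free.
    free_at = {"compute": [0] * num_cores, "transfer": [0] * num_dma}
    finish = {}
    schedule = []
    order = TopologicalSorter({n: deps.get(n, ()) for n in nodes}).static_order()
    for name in order:
        kind, cost = nodes[name]
        # A node may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in deps.get(name, ())), default=0)
        units = free_at[kind]
        unit = min(range(len(units)), key=units.__getitem__)
        start = max(ready, units[unit])
        finish[name] = start + cost
        units[unit] = finish[name]
        schedule.append((name, kind, unit, start, finish[name]))
    return schedule


def schedule_network(layers, num_cores, num_dma):
    """Aggregate per-layer schedules, running the layers back to back."""
    complete, offset = [], 0
    for nodes, deps in layers:
        layer = schedule_layer(nodes, deps, num_cores, num_dma)
        complete.extend((n, k, u, s + offset, e + offset) for n, k, u, s, e in layer)
        offset += max((e for *_, e in layer), default=0)
    return complete
```

A real cost model would derive each node's cost from the processor representation rather than take it as a constant, and could overlap data transfers for one layer with compute for the previous one.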
Maintaining optimum voltage supply to match performance of an integrated circuit

Patent number: 9134782

Abstract: Power supply voltage to an integrated circuit (IC) or a portion of an IC is maintained at an optimum level matching the IC performance. Voltage ranges and delay measures for corresponding operating frequencies are stored in tables in a voltage control block. When a new frequency of operation is desired, the voltage control block measures delay performance of the IC, and sets the supply voltage to a value specified in a corresponding entry in a table. The voltage control block then continues to measure delay performance, and dynamically adjusts the power supply voltage to an optimum value thereby minimizing power consumption.

Type: Grant

Filed: May 7, 2007

Date of Patent: September 15, 2015

Assignee: NVIDIA CORPORATION

Inventors: Sreenivas Aerra Reddy, Srinivasan Arulanandam, Venkataraman Rajaraman

Maintaining Optimum Voltage Supply To Match Performance Of An Integrated Circuit

Publication number: 20080282102

Abstract: Power supply voltage to an integrated circuit (IC) or a portion of an IC is maintained at an optimum level matching the IC performance. Voltage ranges and delay measures for corresponding operating frequencies are stored in tables in a voltage control block. When a new frequency of operation is desired, the voltage control block measures delay performance of the IC, and sets the supply voltage to a value specified in a corresponding entry in a table. The voltage control block then continues to measure delay performance, and dynamically adjusts the power supply voltage to an optimum value thereby minimizing power consumption.

Type: Application

Filed: May 7, 2007

Publication date: November 13, 2008

Applicant: NVIDIA Corporation

Inventors: Sreenivas Aerra Reddy, Srinivasan Arulanandam, Venkataraman Rajaraman
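
The table-driven voltage control described in the abstract above can be pictured with a small closed-loop sketch. Everything concrete here is assumed for illustration: the `TableEntry` fields, the integer millivolt and picosecond units, the fixed 25 mV step, and the simple rule of raising voltage when the measured delay is too long and lowering it when there is slack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    v_min_mv: int          # lowest allowed supply voltage at this frequency
    v_max_mv: int          # highest allowed supply voltage at this frequency
    target_delay_ps: int   # delay measure the circuit should meet


def adjust_voltage(table, freq_mhz, voltage_mv, measured_delay_ps, step_mv=25):
    """Return the next supply voltage for one iteration of the control loop."""
    entry = table[freq_mhz]
    if measured_delay_ps > entry.target_delay_ps:
        voltage_mv += step_mv   # circuit is too slow: raise the supply
    elif measured_delay_ps < entry.target_delay_ps:
        voltage_mv -= step_mv   # timing slack: lower the supply to save power
    # Stay inside the voltage range stored for this operating frequency.
    return max(entry.v_min_mv, min(voltage_mv, entry.v_max_mv))
```

Calling `adjust_voltage` repeatedly with fresh delay measurements nudges the supply toward the lowest voltage that still meets the target delay for the current frequency.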