Patents by Inventor Satyaki Koneru

Satyaki Koneru has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240036921
    Abstract: Methods, systems, and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processor includes a processor array, and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer, wherein for each of the plurality of shared command buffers a graph streaming processor writes commands to the shared command buffer as indicated by the write pointer of the shared command buffer and the graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.
    Type: Application
    Filed: October 16, 2023
    Publication date: February 1, 2024
    Applicant: Blaize, Inc.
    Inventors: Venkata Ganapathi Puppala, Sarvendra Govindammagari, Lokesh Agarwal, Satyaki Koneru
  • Patent number: 11853762
    Abstract: Systems, apparatuses and methods are disclosed for efficient management of registers in a graph stream processing (GSP) system. The GSP system includes a thread scheduler module operative to initiate a Single Instruction Multiple Data (SIMD) thread, the SIMD thread including a dispatch mask with an initial value. A thread arbiter module operative to select an instruction from the instructions and provide the instruction to each of one or more compute resources, and an instruction iterator module, associated with the each of one or more compute resources operative to determine a data type of the instruction. The instruction iterator module iteratively executes the instruction based on the data type and the dispatch mask.
    Type: Grant
    Filed: May 20, 2022
    Date of Patent: December 26, 2023
    Assignee: Blaize, Inc.
    Inventors: Kamaraj Thangam, Srinivasulu Nagisetty, Venkata Divya Bharathi Palaparthy, Aswathy Asok, Satyaki Koneru
  • Patent number: 11822960
    Abstract: Methods, systems, and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processor includes a processor array, and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer, wherein for each of the plurality of shared command buffers a first graph streaming processor writes commands to the shared command buffer as indicated by the write pointer of the shared command buffer and a second graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.
    Type: Grant
    Filed: June 7, 2022
    Date of Patent: November 21, 2023
    Assignee: Blaize, Inc.
    Inventors: Venkata Ganapathi Puppala, Sarvendra Govindammagari, Lokesh Agarwal, Satyaki Koneru
  • Patent number: 11755368
    Abstract: Systems and methods are disclosures for scheduling code in a multiprocessor system. Code is portioned into code blocks by a compiler. The compiler schedules execution of code blocks in nodes. The nodes are connected in a directed acyclical graph with a top node, terminal node and a plurality of intermediate nodes. Execution of the top node is initiated by the compiler. After executing at least one instance of the top node, an instruction in the code block indicates to the scheduler to initiate at least one intermediary node. The scheduler schedules a thread for execution of the intermediary node. The data for the nodes resides in a plurality of data buffers; the index to the data buffer is stored in a command buffer.
    Type: Grant
    Filed: August 8, 2021
    Date of Patent: September 12, 2023
    Assignee: Blaize , Inc.
    Inventors: Satyaki Koneru, Val G. Cook, Ke Yin
  • Patent number: 11734065
    Abstract: Systems, apparatuses, and methods are disclosed for scheduling threads comprising of code blocks in a graph streaming processor (GSP) system. One system includes a scheduler for scheduling plurality of prefetch threads, main threads, invalidate threads. The plurality of prefetch threads includes prefetching data from main memory required for execution of the main threads of the next stage. The plurality of main threads includes a set of instructions operating on the graph streaming processors of GSP system. The plurality of the invalidate threads includes invalidating data location/s consumed by the plurality of the main threads of the previous stage. A portion of the scheduler is implemented in hardware.
    Type: Grant
    Filed: October 30, 2022
    Date of Patent: August 22, 2023
    Inventor: Satyaki Koneru
  • Patent number: 11669366
    Abstract: Methods, systems, and apparatuses for graph streaming processing system are disclosed. One system includes a plurality of graph streaming processors operative to process a plurality of threads, wherein the plurality of threads is organized as nodes. The system further includes a scheduler that includes a plurality of stages. Each stage includes a command parser operative to interpret commands within a corresponding input command buffer, an alternate command buffer, and a thread generator coupled to the command parser. The thread generator is operative to generate the plurality of threads, and dispatch the plurality of threads, where the processing of the plurality of thread for each stage includes storing write commands in the corresponding output command buffer or in the alternate command buffer.
    Type: Grant
    Filed: July 16, 2022
    Date of Patent: June 6, 2023
    Assignee: Blaize, Inc.
    Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
  • Patent number: 11593184
    Abstract: Methods, systems and apparatuses for graph processing are disclosed. One graph streaming processor includes a thread manager, wherein the thread manager is operative to dispatch operation of the plurality of threads of a plurality of thread processors before dependencies of the dependent threads have been resolved, maintain a scorecard of operation of the plurality of threads of the plurality of thread processors, and provide an indication to at least one of the plurality of thread processors when a dependency between the at least one of the plurality of threads that a request has or has not been satisfied. Further, a producer thread provides a response to the dependency when the dependency has been satisfied, and each of the plurality of thread processors is operative to provide processing updates to the thread manager, and provide queries to the thread manager upon reaching a dependency.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: February 28, 2023
    Assignee: Blaize, Inc.
    Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
  • Patent number: 11593911
    Abstract: The described embodiments include systems, methods, and apparatuses for increased efficiency processing flow. One method includes a plurality of stages configured to process an execution graph that includes a plurality of logical nodes with defined properties and resources associated with each logical node of the plurality of logical nodes, a recirculating ring buffer, wherein the recirculating ring buffer is configured to holding only any one of a control information, input, and, or out data necessary to stream a temporary data between each logical node of the execution graph, and a data producer, wherein the data producer is configured to stall from writing control information into a command buffer upon the command buffer being full, preventing command buffer over-writing.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: February 28, 2023
    Assignee: Blaze, Inc.
    Inventors: Val G. Cook, Satyaki Koneru, Ke Yin, Dinakar C. Munagala
  • Patent number: 11593114
    Abstract: Methods, systems and apparatuses for performing walk operations of single instruction, multiple data (SIMD) instructions are disclosed. One method includes initiating, by a scheduler, a SIMD thread, where the scheduler is operative to schedule the SIMD thread. The method further includes fetching a plurality of instructions for the SIMD thread. The method further includes determining, by a thread arbiter, at least one instruction that is a walk instruction, where the walk instruction iterates a block of instructions for a subset of channels of the SIMD thread, where the walk instruction includes a walk size, and where the walk size is a number of channels in the subset of channels of the SIMD thread that are processed in a walk iteration in association with the walk instruction. The method further includes executing the walk instruction based on the walk size.
    Type: Grant
    Filed: March 16, 2022
    Date of Patent: February 28, 2023
    Assignee: Blaize, Inc.
    Inventors: Satyaki Koneru, Kamaraj Thangam
  • Publication number: 20230051505
    Abstract: Systems, apparatuses, and methods are disclosed for scheduling threads comprising of code blocks in a graph streaming processor (GSP) system. One system includes a scheduler for scheduling plurality of prefetch threads, main threads, invalidate threads. The plurality of prefetch threads includes prefetching data from main memory required for execution of the main threads of the next stage. The plurality of main threads includes a set of instructions operating on the graph streaming processors of GSP system. The plurality of the invalidate threads includes invalidating data location/s consumed by the plurality of the main threads of the previous stage. A portion of the scheduler is implemented in hardware.
    Type: Application
    Filed: October 30, 2022
    Publication date: February 16, 2023
    Applicant: Blaize, Inc.
    Inventor: Satyaki Koneru
  • Patent number: 11513845
    Abstract: Systems, apparatuses, and methods are disclosed for scheduling threads comprising of code blocks in a graph streaming processor (GSP) system. One system includes a scheduler for scheduling plurality of prefetch threads, main threads, invalidate threads. The plurality of prefetch threads includes prefetching data from main memory required for execution of the main threads of the next stage. The plurality of main threads includes a set of instructions operating on the graph streaming processors of GSP system. The plurality of the invalidate threads includes invalidating data location/s consumed by the plurality of the main threads of the previous stage. A portion of the scheduler is implemented in hardware.
    Type: Grant
    Filed: November 6, 2020
    Date of Patent: November 29, 2022
    Assignee: Blaize, Inc.
    Inventor: Satyaki Koneru
  • Publication number: 20220350653
    Abstract: Methods, systems, and apparatuses for graph streaming processing system are disclosed. One system includes a plurality of graph streaming processors operative to process a plurality of threads, wherein the plurality of threads is organized as nodes. The system further includes a scheduler that includes a plurality of stages. Each stage includes a command parser operative to interpret commands within a corresponding input command buffer, an alternate command buffer, and a thread generator coupled to the command parser. The thread generator is operative to generate the plurality of threads, and dispatch the plurality of threads, where the processing of the plurality of thread for each stage includes storing write commands in the corresponding output command buffer or in the alternate command buffer.
    Type: Application
    Filed: July 16, 2022
    Publication date: November 3, 2022
    Applicant: Blaize, Inc.
    Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
  • Patent number: 11481223
    Abstract: Methods, systems and apparatuses for reducing operations of Sum-Of-Multiply-Accumulate (SOMAC) instructions are disclosed. One method includes scheduling, by a scheduler, a thread for execution, executing, by a processor of a plurality of processors, the thread, fetching, by the processor, a plurality of instructions for the thread from a memory, selecting, by a thread arbiter of the processor, an instruction of the plurality of instructions for execution in an arithmetic logic unit (ALU) pipeline of the processor, and reading the instruction, and determining, by a macro-instruction iterator of the processor, whether the instruction is a Sum-Of-Multiply-Accumulate (SOMAC) instruction with an instruction size, wherein the instruction size indicates a number of iterations that the SOMAC instruction is to be executed.
    Type: Grant
    Filed: August 8, 2019
    Date of Patent: October 25, 2022
    Assignee: Blaize, Inc.
    Inventors: Kamaraj Thangam, Palaparthy Venkata Divya Bharathi, Satyaki Koneru
  • Publication number: 20220300322
    Abstract: Methods, systems, and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processor includes a processor array, and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer, wherein for each of the plurality of shared command buffers a first graph streaming processor writes commands to the shared command buffer as indicated by the write pointer of the shared command buffer and a second graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.
    Type: Application
    Filed: June 7, 2022
    Publication date: September 22, 2022
    Applicant: Blaize, Inc.
    Inventors: Venkata Ganapathi Puppala, Sarvendra Govindammagari, Lokesh Agarwal, Satyaki Koneru
  • Patent number: 11436045
    Abstract: Methods, systems and apparatuses for graph streaming processing system are disclosed. One system includes a plurality of graph streaming processors operative to process a plurality of threads, wherein the plurality of threads is organized as nodes. The system further includes a scheduler that includes a plurality of stages. Each stage includes a command parser operative to interpret commands within a corresponding input command buffer, an alternate command buffer, and a thread generator coupled to the command parser. The thread generator is operative to generate the plurality of threads, and dispatch the plurality of threads, where the processing of the plurality of thread for each stage includes storing write commands in the corresponding output command buffer or in the alternate command buffer.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: September 6, 2022
    Assignee: Blaize, Inc.
    Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
  • Patent number: 11416261
    Abstract: Methods, systems and apparatuses for graph streaming processing are disclosed. One method includes loading, by a group load register, a subset of a an input tensor from a data cache, wherein the group load register provides the subset of the input tensor to all of a plurality of processors, loading, by a plurality of weight data registers, a plurality of weights of a weight tensor, wherein each of the weight data registers provide an weight to a single of the plurality of processors, and performing, by the plurality of processors, a SOMAC (Sum-Of-Multiply-Accumulate) instruction, including simultaneously determining, by each of the plurality of processors, an instruction size of the SOMAC instruction, wherein the instruction size indicates a number of iterations that the SOMAC instruction is to be executed and is equal to a number of outputs within a subset of a plurality of output tensors.
    Type: Grant
    Filed: July 15, 2020
    Date of Patent: August 16, 2022
    Assignee: Blaize, Inc.
    Inventors: Satyaki Koneru, Kamaraj Thangam, Sruthikesh Surineni
  • Patent number: 11416282
    Abstract: Systems, apparatuses and methods are disclosed for scheduling threads comprising of code blocks in a graph streaming processor (GSP) system. One system includes a scheduler for scheduling plurality of threads, the plurality of threads includes a set of instructions operating on the graph streaming processors of GSP system. The scheduler comprises a plurality of stages where each stage is coupled to an input command buffer and an output command buffer. A portion of the scheduler is implemented in hardware and comprises of a command parser operative to interpret commands within a corresponding input command buffer, a thread generator coupled to the command parser operate to generate the plurality of threads, and a thread scheduler coupled to the thread generator for dispatching the plurality of threads for operating on the plurality of graph streaming processors.
    Type: Grant
    Filed: April 14, 2019
    Date of Patent: August 16, 2022
    Assignee: Blaize, Inc.
    Inventors: Satyaki Koneru, Val G. Cook, Ke Yin
  • Patent number: 11379262
    Abstract: Methods, systems and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processor includes a processor array, and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer, wherein for each of the plurality of shared command buffers a first graph streaming processor writes commands to the shared command buffer as indicated by the write pointer of the shared command buffer and a second graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: July 5, 2022
    Assignee: Blaize, Inc.
    Inventors: Venkata Ganapathi Puppala, Sarvendra Govindammagari, Lokesh Agarwal, Satyaki Koneru
  • Patent number: 11366664
    Abstract: Systems, apparatuses and methods are disclosed for efficient management of registers in a graph stream processing (GSP) system. The GSP system includes a thread scheduler module operative to initiate a Single Instruction Multiple Data (SIMD) thread, the SIMD thread including a dispatch mask with an initial value. A thread arbiter module operative to select an instruction from the instructions and provide the instruction to each of one or more compute resources, and an instruction iterator module, associated with the each of one or more compute resources operative to determine a data type of the instruction. The instruction iterator module iteratively executes the instruction based on the data type and the dispatch mask.
    Type: Grant
    Filed: December 8, 2019
    Date of Patent: June 21, 2022
    Assignee: Blaize, Inc.
    Inventors: Kamaraj Thangam, Srinivasulu Nagisetty, Venkata Divya Bharathi Palaparthy, Aswathy Asok, Satyaki Koneru
  • Publication number: 20220147384
    Abstract: Systems, apparatuses, and methods are disclosed for scheduling threads comprising of code blocks in a graph streaming processor (GSP) system. One system includes a scheduler for scheduling plurality of prefetch threads, main threads, invalidate threads. The plurality of prefetch threads includes prefetching data from main memory required for execution of the main threads of the next stage. The plurality of main threads includes a set of instructions operating on the graph streaming processors of GSP system. The plurality of the invalidate threads includes invalidating data location/s consumed by the plurality of the main threads of the previous stage. A portion of the scheduler is implemented in hardware.
    Type: Application
    Filed: November 6, 2020
    Publication date: May 12, 2022
    Applicant: Blaize, Inc.
    Inventor: Satyaki Koneru