Patents Assigned to Blaize, Inc.

Cascading of Graph Streaming Processors

Publication number: 20240036921

Abstract: Methods, systems, and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processors includes a processor array, and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer, wherein for each of the plurality of shared command buffers a graph streaming processor writes commands to the shared command buffer as indicated by the write pointer of the shared command buffer and the graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.

Type: Application

Filed: October 16, 2023

Publication date: February 1, 2024

Applicant: Blaize, Inc.

Inventors: Venkata Ganapathi Puppala, Sarvendra Govindammagari, Lokesh Agarwal, Satyaki Koneru
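The write-pointer and read-pointer scheme in the abstract above resembles a classic ring buffer. The following is a minimal sketch of that general idea, not the patented design: the class name `SharedCommandBuffer`, the monotonically increasing pointers, and the choice to refuse a write when the buffer is full are all assumptions made for illustration.

```python
class SharedCommandBuffer:
    """Hypothetical fixed-capacity ring buffer with write and read pointers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_ptr = 0  # total number of commands written so far
        self.read_ptr = 0   # total number of commands read so far

    def free_slots(self):
        # Slots that hold no unread command and are safe to overwrite.
        return self.capacity - (self.write_ptr - self.read_ptr)

    def write(self, command):
        # Refuse the write rather than overwrite a command not yet read.
        if self.free_slots() == 0:
            return False
        self.slots[self.write_ptr % self.capacity] = command
        self.write_ptr += 1
        return True

    def read(self):
        # Nothing to read when the reader has caught up with the writer.
        if self.read_ptr == self.write_ptr:
            return None
        command = self.slots[self.read_ptr % self.capacity]
        self.read_ptr += 1
        return command


def demo():
    buf = SharedCommandBuffer(capacity=2)
    # The third write is rejected because both slots hold unread commands.
    results = [buf.write("cmd0"), buf.write("cmd1"), buf.write("cmd2")]
    first = buf.read()                 # frees one slot
    results.append(buf.write("cmd2"))  # now succeeds
    return results, first, buf.read(), buf.read(), buf.read()
```

In this sketch the producer stalls (gets `False`) instead of overwriting; a hardware scheduler could equally delay the writer until the read pointer advances.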
GRAPH STREAMING NEURAL NETWORK PROCESSING SYSTEM AND METHOD THEREOF

Publication number: 20230418666

Abstract: Disclosed herein is a graph streaming neural network processing system comprising a first processor array, a second processor, and a thread scheduler. The thread scheduler dispatches a thread of a first node to the first processor array or the second processor, wherein the thread is executed to generate output data comprising a data unit stored in a private data buffer of the second processor. The thread scheduler determines that the data unit is sufficient for executing a thread of a second node. The second node is dependent on the output data generated by execution of a plurality of threads of the first node. Upon determining that the data unit is sufficient, the thread scheduler dispatches the thread of the second node. The thread scheduler determines to dispatch a subsequent thread of the first node for execution when a predefined threshold buffer size is available on the private data buffer.

Type: Application

Filed: June 16, 2023

Publication date: December 28, 2023

Applicant: Blaize Inc.

Inventors: Venkata Ganapathi Puppala, Val G. Cook, Srinivasulu Nagisetty
Single instruction multiple data execution with variable size logical registers

Patent number: 11853762

Abstract: Systems, apparatuses and methods are disclosed for efficient management of registers in a graph stream processing (GSP) system. The GSP system includes a thread scheduler module operative to initiate a Single Instruction Multiple Data (SIMD) thread, the SIMD thread including a dispatch mask with an initial value. A thread arbiter module operative to select an instruction from the instructions and provide the instruction to each of one or more compute resources, and an instruction iterator module, associated with the each of one or more compute resources operative to determine a data type of the instruction. The instruction iterator module iteratively executes the instruction based on the data type and the dispatch mask.

Type: Grant

Filed: May 20, 2022

Date of Patent: December 26, 2023

Assignee: Blaize, Inc.

Inventors: Kamaraj Thangam, Srinivasulu Nagisetty, Venkata Divya Bharathi Palaparthy, Aswathy Asok, Satyaki Koneru
Method of optimizing register memory allocation for vector instructions and a system thereof

Patent number: 11829736

Abstract: The present disclosure relates to a system and a method of optimizing register allocation by a processor. The method comprising receiving an intermediate representation (IR) code of a source code and initializing single instruction multiple data (SIMD) width for the IR code. The method comprising analyzing each basic block of the IR code to classify one or more instructions of the IR code as vector instructions, wherein each basic block is one of LOAD, STORE and arithmetic logical and multiply (ALM) instructions. The method comprising dynamically setting the SIMD width for each of the vector instructions.

Type: Grant

Filed: February 9, 2022

Date of Patent: November 28, 2023

Assignee: Blaize, Inc.

Inventors: Pathikonda Datta Nagraj, Aravind Rajulapudi, Ravi Korsa
Cascading of graph streaming processors

Patent number: 11822960

Abstract: Methods, systems, and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processors includes a processor array, and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer, wherein for each of the plurality of shared command buffers a first graph streaming processor writes commands to the shared command buffer as indicated by the write pointer of the shared command buffer and a second graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.

Type: Grant

Filed: June 7, 2022

Date of Patent: November 21, 2023

Assignee: Blaize, Inc.

Inventors: Venkata Ganapathi Puppala, Sarvendra Govindammagari, Lokesh Agarwal, Satyaki Koneru
Configurable scheduler for graph processing on multi-processor computing systems

Patent number: 11755368

Abstract: Systems and methods are disclosed for scheduling code in a multiprocessor system. Code is portioned into code blocks by a compiler. The compiler schedules execution of code blocks in nodes. The nodes are connected in a directed acyclical graph with a top node, terminal node and a plurality of intermediate nodes. Execution of the top node is initiated by the compiler. After executing at least one instance of the top node, an instruction in the code block indicates to the scheduler to initiate at least one intermediary node. The scheduler schedules a thread for execution of the intermediary node. The data for the nodes resides in a plurality of data buffers; the index to the data buffer is stored in a command buffer.

Type: Grant

Filed: August 8, 2021

Date of Patent: September 12, 2023

Assignee: Blaize, Inc.

Inventors: Satyaki Koneru, Val G. Cook, Ke Yin
METHOD AND SYSTEM FOR GENERATING A MIXED PRECISION MODEL

Publication number: 20230281423

Abstract: Disclosed herein is a method and a system for generating a mixed precision quantization model for performing image processing. The method comprises receiving a validation dataset of images to train a neural network model. The method comprises for each image of the validation dataset, generating a union sensitivity list, selecting a group of layers, generating a mixed precision quantization model by quantizing the selected group of layers into a high precision format; computing accuracy of the mixed precision quantization model for comparison with a target accuracy; in response to determining the accuracy is less than the target accuracy, generating another mixed precision model by selecting a next group of layers and computing the accuracy. In response to determining the accuracy is greater than or equal to the target accuracy, storing the mixed precision quantization model as a final mixed precision quantization model for image processing.

Type: Application

Filed: December 1, 2022

Publication date: September 7, 2023

Applicant: Blaize, Inc.

Inventors: Deepak Chandra Bijalwan, Mounika Gude, Pratyusha Musunuru
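The search loop described in the abstract above (promote a group of layers to high precision, measure accuracy, and move to the next group if the target is not met) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `evaluate` callback, the `group_size` parameter, the cumulative promotion of groups, and the integer-percent accuracy values are all hypothetical and are not taken from the application.

```python
def build_mixed_precision_plan(ranked_layers, evaluate, target_accuracy, group_size=2):
    """Promote layers to high precision, group by group, until the target is met.

    ranked_layers: layer names, most quantization-sensitive first (assumed input).
    evaluate: hypothetical callback returning accuracy for a set of
              high-precision layers.
    """
    high_precision = []
    accuracy = evaluate(frozenset(high_precision))
    if accuracy >= target_accuracy:
        return high_precision, accuracy
    for start in range(0, len(ranked_layers), group_size):
        # Assumption: groups accumulate, so earlier promotions are kept.
        high_precision.extend(ranked_layers[start:start + group_size])
        accuracy = evaluate(frozenset(high_precision))
        if accuracy >= target_accuracy:
            break
    return high_precision, accuracy


def toy_evaluate(high_precision_layers):
    # Toy stand-in for a validation run: accuracy in integer percent.
    return 70 + 5 * len(high_precision_layers)


layers = ["conv1", "conv5", "fc", "conv3", "conv2"]
plan, accuracy = build_mixed_precision_plan(layers, toy_evaluate, target_accuracy=85)
```

With the toy evaluator, the first group reaches 80 percent, so a second group is promoted before the loop stops at 90 percent.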
METHOD OF OPTIMIZING REGISTER MEMORY ALLOCATION FOR VECTOR INSTRUCTIONS AND A SYSTEM THEREOF

Publication number: 20230251836

Abstract: The present disclosure relates to a system and a method of optimizing register allocation by a processor. The method comprising receiving an intermediate representation (IR) code of a source code and initializing single instruction multiple data (SIMD) width for the IR code. The method comprising analyzing each basic block of the IR code to classify one or more instructions of the IR code as vector instructions, wherein each basic block is one of LOAD, STORE and arithmetic logical and multiply (ALM) instructions. The method comprising dynamically setting the SIMD width for each of the vector instructions.

Type: Application

Filed: February 9, 2022

Publication date: August 10, 2023

Applicant: Blaize, Inc.

Inventors: Pathikonda Datta Nagraj, Aravind Rajulapudi, Ravi Korsa
Adaptive Power Supply Voltage Transient Protection

Publication number: 20230178979

Abstract: Methods, systems, and apparatuses for adaptive power supply voltage transient protection are disclosed. One system includes a system on a chip (SOC), wherein the SOC includes a power supply, a voltage transient sensor, and a power control processing entity. The power supply operates to provide power to one or more processors operating on the SOC. The voltage transient sensor is connected to the power supply and operates to sense voltage transients on the power supply at greater than a predetermined speed or rate. The power control processing entity operates to receive a digital representation of the sensed voltage transients and adjust a power load of the SOC based on the sensed voltage transients.

Type: Application

Filed: December 7, 2021

Publication date: June 8, 2023

Applicant: Blaize, Inc.

Inventor: Sebastian Artur Ciesluk
Reduction of a number of stages of a graph streaming processor

Patent number: 11669366

Abstract: Methods, systems, and apparatuses for a graph streaming processing system are disclosed. One system includes a plurality of graph streaming processors operative to process a plurality of threads, wherein the plurality of threads is organized as nodes. The system further includes a scheduler that includes a plurality of stages. Each stage includes a command parser operative to interpret commands within a corresponding input command buffer, an alternate command buffer, and a thread generator coupled to the command parser. The thread generator is operative to generate the plurality of threads, and dispatch the plurality of threads, where the processing of the plurality of threads for each stage includes storing write commands in the corresponding output command buffer or in the alternate command buffer.

Type: Grant

Filed: July 16, 2022

Date of Patent: June 6, 2023

Assignee: Blaize, Inc.

Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
Method of using multidimensional blockification to optimize computer program and device thereof

Patent number: 11640285

Abstract: Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon determining as eligible, the group of basic block instructions are blockified using one of one dimensional SIMD vectorization and two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code which is translated to executable target code with more efficient processing capacity.

Type: Grant

Filed: April 5, 2022

Date of Patent: May 2, 2023

Assignee: Blaize, Inc.

Inventors: Ravi Korsa, Aravind Rajulapudi, Pathikonda Datta Nagraj
Accelerated operation of a graph streaming processor

Patent number: 11593184

Abstract: Methods, systems and apparatuses for graph processing are disclosed. One graph streaming processor includes a thread manager, wherein the thread manager is operative to dispatch operation of the plurality of threads of a plurality of thread processors before dependencies of the dependent threads have been resolved, maintain a scorecard of operation of the plurality of threads of the plurality of thread processors, and provide an indication to at least one of the plurality of thread processors when a dependency between the at least one of the plurality of threads that a request has or has not been satisfied. Further, a producer thread provides a response to the dependency when the dependency has been satisfied, and each of the plurality of thread processors is operative to provide processing updates to the thread manager, and provide queries to the thread manager upon reaching a dependency.

Type: Grant

Filed: August 11, 2021

Date of Patent: February 28, 2023

Assignee: Blaize, Inc.

Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
Iterating group sum of multiple accumulate operations

Patent number: 11593114

Abstract: Methods, systems and apparatuses for performing walk operations of single instruction, multiple data (SIMD) instructions are disclosed. One method includes initiating, by a scheduler, a SIMD thread, where the scheduler is operative to schedule the SIMD thread. The method further includes fetching a plurality of instructions for the SIMD thread. The method further includes determining, by a thread arbiter, at least one instruction that is a walk instruction, where the walk instruction iterates a block of instructions for a subset of channels of the SIMD thread, where the walk instruction includes a walk size, and where the walk size is a number of channels in the subset of channels of the SIMD thread that are processed in a walk iteration in association with the walk instruction. The method further includes executing the walk instruction based on the walk size.

Type: Grant

Filed: March 16, 2022

Date of Patent: February 28, 2023

Assignee: Blaize, Inc.

Inventors: Satyaki Koneru, Kamaraj Thangam
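The walk instruction in the abstract above iterates a block of instructions over a subset of the SIMD thread's channels, with the walk size setting how many channels each iteration covers. A minimal software sketch of that iteration pattern follows; the function name `walk`, the representation of instructions as Python callables, and the returned iteration count are assumptions for illustration, not the patented hardware behavior.

```python
def walk(channels, walk_size, block):
    """Apply a block of instructions to channels, walk_size channels at a time.

    channels: per-channel input values of a hypothetical SIMD thread.
    block: list of callables, each standing in for one instruction.
    Returns the per-channel results and the number of walk iterations.
    """
    if walk_size <= 0:
        raise ValueError("walk_size must be positive")
    out = list(channels)
    iterations = 0
    for start in range(0, len(out), walk_size):
        end = min(start + walk_size, len(out))
        # Run the whole block on this subset before moving to the next one.
        for instruction in block:
            for ch in range(start, end):
                out[ch] = instruction(out[ch])
        iterations += 1
    return out, iterations


# Eight channels, walk size 4: the two-instruction block runs in 2 iterations.
result, iterations = walk([1, 2, 3, 4, 5, 6, 7, 8], 4,
                          [lambda x: x * 2, lambda x: x + 1])
```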
METHOD AND MACHINE LEARNING SYSTEM TO PERFORM QUANTIZATION OF NEURAL NETWORK

Publication number: 20230058500

Abstract: The present disclosure relates to a system and method of performing quantization of a neural network having multiple layers. The method comprises receiving a floating-point dataset as input dataset and determining a first shift constant for first layer of the neural network based on the input dataset. The method also comprises performing quantization for the first layer using the determined shift constant of the first layer. The method further comprises determining a next shift constant for next layer of the neural network based on output of a layer previous to the next layer, and performing quantization for the next layer using the determined next shift constant. The method further comprises iterating the steps of determining shift constant and performing quantization for all layers of the neural network to generate fixed point dataset as output.

Type: Application

Filed: March 21, 2022

Publication date: February 23, 2023

Applicant: Blaize, Inc.

Inventors: Deepak Chandra Bijalwan, Pratyusha Musunuru
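The layer-by-layer procedure in the abstract above (derive a shift constant from the data entering a layer, quantize, then repeat using the previous layer's output) can be sketched as below. Everything specific here is an assumption for illustration: the helper names, the rule that picks the largest power-of-two scale fitting signed 8-bit values, and the modelling of layers as simple elementwise functions are not taken from the application.

```python
def choose_shift(values, bits=8):
    """Largest shift s such that max(|v|) * 2**s fits in a signed `bits` integer."""
    limit = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0
    shift = 0
    while max_abs * 2 ** (shift + 1) <= limit:
        shift += 1
    while max_abs * 2 ** shift > limit:
        shift -= 1
    return shift


def quantize(values, shift):
    # Fixed-point encoding: scale by 2**shift and round to an integer.
    return [round(v * 2 ** shift) for v in values]


def dequantize(q_values, shift):
    return [q / 2 ** shift for q in q_values]


def quantize_network(inputs, layers):
    """Return the per-layer shift constants and the fixed-point final output."""
    data = list(inputs)
    shifts = []
    for layer in layers:
        shift = choose_shift(data)  # based on the previous layer's output
        shifts.append(shift)
        q = quantize(data, shift)
        data = [layer(x) for x in dequantize(q, shift)]
    out_shift = choose_shift(data)
    return shifts, quantize(data, out_shift), out_shift


shifts, fixed_point_output, out_shift = quantize_network(
    [0.5, -1.5, 0.25],
    [lambda x: 2.0 * x, lambda x: x + 1.0],
)
```

The toy inputs are chosen so that every scaling is an exact power-of-two multiplication, which keeps the rounding step unambiguous.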
CONFIGURABLE SCHEDULER WITH PRE-FETCH AND INVALIDATE THREADS IN A GRAPH STREAM PROCESSING SYSTEM

Publication number: 20230051505

Abstract: Systems, apparatuses, and methods are disclosed for scheduling threads comprising code blocks in a graph streaming processor (GSP) system. One system includes a scheduler for scheduling a plurality of prefetch threads, main threads, and invalidate threads. The plurality of prefetch threads includes prefetching data from main memory required for execution of the main threads of the next stage. The plurality of main threads includes a set of instructions operating on the graph streaming processors of the GSP system. The plurality of the invalidate threads includes invalidating one or more data locations consumed by the plurality of the main threads of the previous stage. A portion of the scheduler is implemented in hardware.

Type: Application

Filed: October 30, 2022

Publication date: February 16, 2023

Applicant: Blaize, Inc.

Inventor: Satyaki Koneru
Configurable scheduler with pre-fetch and invalidate threads in a graph stream processing system

Patent number: 11513845

Abstract: Systems, apparatuses, and methods are disclosed for scheduling threads comprising code blocks in a graph streaming processor (GSP) system. One system includes a scheduler for scheduling a plurality of prefetch threads, main threads, and invalidate threads. The plurality of prefetch threads includes prefetching data from main memory required for execution of the main threads of the next stage. The plurality of main threads includes a set of instructions operating on the graph streaming processors of the GSP system. The plurality of the invalidate threads includes invalidating one or more data locations consumed by the plurality of the main threads of the previous stage. A portion of the scheduler is implemented in hardware.

Type: Grant

Filed: November 6, 2020

Date of Patent: November 29, 2022

Assignee: Blaize, Inc.

Inventor: Satyaki Koneru
Reduction of a Number of Stages of a Graph Streaming Processor

Publication number: 20220350653

Abstract: Methods, systems, and apparatuses for a graph streaming processing system are disclosed. One system includes a plurality of graph streaming processors operative to process a plurality of threads, wherein the plurality of threads is organized as nodes. The system further includes a scheduler that includes a plurality of stages. Each stage includes a command parser operative to interpret commands within a corresponding input command buffer, an alternate command buffer, and a thread generator coupled to the command parser. The thread generator is operative to generate the plurality of threads, and dispatch the plurality of threads, where the processing of the plurality of threads for each stage includes storing write commands in the corresponding output command buffer or in the alternate command buffer.

Type: Application

Filed: July 16, 2022

Publication date: November 3, 2022

Applicant: Blaize, Inc.

Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
Reducing operations of sum-of-multiply-accumulate (SOMAC) instructions

Patent number: 11481223

Abstract: Methods, systems and apparatuses for reducing operations of Sum-Of-Multiply-Accumulate (SOMAC) instructions are disclosed. One method includes scheduling, by a scheduler, a thread for execution, executing, by a processor of a plurality of processors, the thread, fetching, by the processor, a plurality of instructions for the thread from a memory, selecting, by a thread arbiter of the processor, an instruction of the plurality of instructions for execution in an arithmetic logic unit (ALU) pipeline of the processor, and reading the instruction, and determining, by a macro-instruction iterator of the processor, whether the instruction is a Sum-Of-Multiply-Accumulate (SOMAC) instruction with an instruction size, wherein the instruction size indicates a number of iterations that the SOMAC instruction is to be executed.

Type: Grant

Filed: August 8, 2019

Date of Patent: October 25, 2022

Assignee: Blaize, Inc.

Inventors: Kamaraj Thangam, Palaparthy Venkata Divya Bharathi, Satyaki Koneru
Cascading of Graph Streaming Processors

Publication number: 20220300322

Abstract: Methods, systems, and apparatuses for graph stream processing are disclosed. One apparatus includes a cascade of graph streaming processors, wherein each of the graph streaming processors includes a processor array, and a graph streaming processor scheduler. The cascade of graph streaming processors further includes a plurality of shared command buffers, wherein each shared command buffer includes a buffer address, a write pointer, and a read pointer, wherein for each of the plurality of shared command buffers a first graph streaming processor writes commands to the shared command buffer as indicated by the write pointer of the shared command buffer and a second graph streaming processor reads commands from the shared command buffer as indicated by the read pointer, wherein at least one graph streaming processor scheduler operates to manage the write pointer and the read pointer to avoid overwriting unused commands of the shared command buffer.

Type: Application

Filed: June 7, 2022

Publication date: September 22, 2022

Applicant: Blaize, Inc.

Inventors: Venkata Ganapathi Puppala, Sarvendra Govindammagari, Lokesh Agarwal, Satyaki Koneru
Reduction of a number of stages of a graph streaming processor

Patent number: 11436045

Abstract: Methods, systems and apparatuses for a graph streaming processing system are disclosed. One system includes a plurality of graph streaming processors operative to process a plurality of threads, wherein the plurality of threads is organized as nodes. The system further includes a scheduler that includes a plurality of stages. Each stage includes a command parser operative to interpret commands within a corresponding input command buffer, an alternate command buffer, and a thread generator coupled to the command parser. The thread generator is operative to generate the plurality of threads, and dispatch the plurality of threads, where the processing of the plurality of threads for each stage includes storing write commands in the corresponding output command buffer or in the alternate command buffer.

Type: Grant

Filed: April 30, 2019

Date of Patent: September 6, 2022

Assignee: Blaize, Inc.

Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
