Patents by Inventor Ram Sivaramakrishnan

Ram Sivaramakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MULTIPLE TILING CONFIGURATIONS IN SINGLE PROCESSING GRAPH

Publication number: 20250190749

Abstract: A device may pad a first input into a first padded input, read a first set of input tiles from the first padded input in a first input tiling configuration, process the first set of input tiles through a first section of a graph to generate a first set of output tiles in a first target tiling configuration, and pad the first set of output tiles to generate a first set of padded output tiles. A device may arrange the first set of padded output tiles into a second input comprising a second set of input tiles, read the second set of input tiles from the second input in a second input tiling configuration, and process the second set of input tiles through a second section of the graph to generate a second set of output tiles in a second target tiling configuration, different than the first target tiling configuration.

Type: Application

Filed: February 24, 2025

Publication date: June 12, 2025

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
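
The pad, tile, and process flow in the abstract above can be illustrated with a minimal one-dimensional sketch. The 3-tap filter standing in for a "section of a graph", the tile sizes, and all function names are assumptions made for illustration and do not come from the application. For brevity the sketch pads the composed output rather than each output tile, and it assumes the output length divides evenly by the tile size.

```python
def pad(signal, width):
    """Zero-pad a 1-D list on both sides."""
    return [0] * width + list(signal) + [0] * width


def conv_valid(signal, kernel):
    """Sliding-window filter without padding; output shrinks by len(kernel) - 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def read_tiles(padded, out_tile, kernel_len):
    """Read overlapping input tiles; each one yields out_tile outputs."""
    in_tile = out_tile + kernel_len - 1          # overlap is kernel_len - 1
    n_out = len(padded) - kernel_len + 1
    return [padded[s:s + in_tile] for s in range(0, n_out, out_tile)]


kernel = [1, 2, 1]                               # hypothetical graph section
x = [3, 1, 4, 1, 5, 9, 2, 6]

# First section: input tiles of 6 elements, output tiles of 4 elements.
padded = pad(x, 1)
tiles = read_tiles(padded, out_tile=4, kernel_len=len(kernel))
out_tiles = [conv_valid(t, kernel) for t in tiles]

# Compose the outputs, pad again, and read them back in a different
# tiling configuration (input tiles of 4, output tiles of 2).
composed = [v for t in out_tiles for v in t]
padded2 = pad(composed, 1)
tiles2 = read_tiles(padded2, out_tile=2, kernel_len=len(kernel))
out_tiles2 = [conv_valid(t, kernel) for t in tiles2]
```

In this sketch the tiled result matches the untiled filter output in both sections, which is the property the overlapping input tiles are meant to preserve.
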
MANAGING DISPARATE TILING CONFIGURATIONS IN A COARSE-GRAINED RECONFIGURABLE ARCHITECTURE

Publication number: 20250190750

Abstract: A device may write a composed input in memory, wherein the composed input is constructed by composing tiles in a first set of tiles, wherein the tiles in the first set of tiles have a first tiling configuration. A device may read a second set of tiles from the composed input, wherein tiles in the second set of tiles have a second tiling configuration that is different from the first tiling configuration.

Type: Application

Filed: February 24, 2025

Publication date: June 12, 2025

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
RETILING A TENSOR AFTER ZERO PADDING

Publication number: 20250190751

Abstract: A device may cause a first section of a graph to generate a first plurality of tiles of a tensor, the first plurality of tiles having a first size. A device may initialize a memory area having a second size, larger than the first size, to zeros. A device may write the first plurality of tiles in the memory area, such that a zero padding is formed around edges of the first plurality of tiles written to the memory area, wherein a total width of the zero padding is based on a width difference between the second size and the first size. A device may, subsequent to writing the first plurality of tiles, retile the combination of the first plurality of tiles and the zero padding to generate a second plurality of tiles. A device may cause a second section of the graph to process the second plurality of tiles.

Type: Application

Filed: February 24, 2025

Publication date: June 12, 2025

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
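
The zero-padding and retiling steps in the abstract above can be sketched in plain Python as follows. The function names, the choice to split the padding evenly between opposite edges, and the 2x2 and 3x3 tile sizes are assumptions for illustration only.

```python
def write_tiles_with_padding(tiles, tile_h, tile_w, area_h, area_w):
    """Write tiles into a zero-initialised area so zeros remain around the edges.

    tiles maps (tile_row, tile_col) to a 2-D list of size tile_h x tile_w.
    """
    grid_h = (max(r for r, _ in tiles) + 1) * tile_h
    grid_w = (max(c for _, c in tiles) + 1) * tile_w
    top = (area_h - grid_h) // 2      # assumed: padding split evenly
    left = (area_w - grid_w) // 2
    area = [[0] * area_w for _ in range(area_h)]   # initialise to zeros
    for (tr, tc), tile in tiles.items():
        for i in range(tile_h):
            for j in range(tile_w):
                area[top + tr * tile_h + i][left + tc * tile_w + j] = tile[i][j]
    return area


def retile(area, tile_h, tile_w):
    """Cut the padded area into non-overlapping tiles in row-major order."""
    return [
        [row[c:c + tile_w] for row in area[r:r + tile_h]]
        for r in range(0, len(area), tile_h)
        for c in range(0, len(area[0]), tile_w)
    ]


# Four 2x2 tiles that together form a 4x4 tensor.
first_tiles = {
    (0, 0): [[1, 2], [5, 6]],
    (0, 1): [[3, 4], [7, 8]],
    (1, 0): [[9, 10], [13, 14]],
    (1, 1): [[11, 12], [15, 16]],
}

area = write_tiles_with_padding(first_tiles, 2, 2, area_h=6, area_w=6)
second_tiles = retile(area, 3, 3)     # second plurality of tiles, 3x3 each
```

Here the width difference between the 6-wide memory area and the 4-wide tensor is 2, so one column of zeros ends up on each side.
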
Lossless tiling in convolution networks—data flow logic

Patent number: 12321843

Abstract: A data processing system includes memory and reconfigurable processors, operatively coupled to the memory, configured to execute a sequence of subgraphs of a graph. The sequence of subgraphs includes a preceding subgraph and a succeeding subgraph. The data processing system also includes data flow logic, operatively coupled to the reconfigurable processors and the memory, configured to store a tiled output of the preceding subgraph as a composed input in the memory and make available parts of the composed input for processing by the succeeding subgraph.

Type: Grant

Filed: March 21, 2022

Date of Patent: June 3, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
Multiple contexts for a compute unit in a reconfigurable data processor

Patent number: 12314754

Abstract: A data processing system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor. The CGR processor includes an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU comprises a plurality of single-instruction multiple data (SIMD) units configurable to form a datapath. The CGR processor is coupled to configure a datapath including a SIMD, using a set of configuration bits corresponding to an operation related to the task. The CGR processor is coupled to switch among the plurality of tasks and their corresponding PCU contexts during execution of the dataflow graph. The CGR processor is coupled to switch among tasks via static switching or dynamic switching, in response to the triggering of a task complete event generated by a preset counter, indicating completion of a current task.

Type: Grant

Filed: August 22, 2023

Date of Patent: May 27, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Ram Sivaramakrishnan, David Brian Jackson, Pramod Nataraja
Dynamic Exponent Bias Method for Neural Network Training

Publication number: 20250117647

Abstract: A method that may be computer implemented converts a tensor value from a first format to a second format and trains a neural network. The method determines a maximum exponent code in the first format and subtracts a first bias to obtain the highest needed exponent (HNE). It determines a second bias from the highest available code (HAC) in the second format and the HNE, and converts the tensor value from the first format to the second format by using the second bias instead of the first bias. The method uses the second format to train the neural network. The method may round the mantissa of the tensor value in the first format to obtain a rounded mantissa of the tensor value for the second format.

Type: Application

Filed: October 5, 2023

Publication date: April 10, 2025

Applicant: SambaNova Systems, Inc.

Inventors: Valentina Popescu, Jeffrey S. Brooks, Ram SIVARAMAKRISHNAN, Matthew William Ashcraft, Vinh Quang Nguyen, Gang Liu, Raghu PRABHAKAR, Yongning SHENG
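
The bias arithmetic described in the abstract above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the second bias is assumed to be HAC minus HNE, every exponent code of the second format is assumed to be usable, and underflow and mantissa rounding are not handled.

```python
def second_bias(exponent_codes, first_bias, highest_available_code):
    """Pick a bias that maps the largest exponent in use onto the HAC."""
    hne = max(exponent_codes) - first_bias        # highest needed exponent
    return highest_available_code - hne           # assumed: HAC - HNE


def convert_exponent(code, first_bias, new_bias):
    """Re-encode one exponent code using the new bias instead of the first."""
    return code - first_bias + new_bias           # no underflow handling


# Hypothetical tensor: exponent codes in a format with bias 127,
# converted to a format with 4 exponent bits (HAC assumed to be 15).
codes = [120, 125, 130]
bias2 = second_bias(codes, first_bias=127, highest_available_code=15)
converted = [convert_exponent(c, 127, bias2) for c in codes]
```

With these inputs the HNE is 3, so the largest value in the tensor lands exactly on the highest code of the narrower format.
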
Dynamic destination id in an array level network of a reconfigurable dataflow processor

Patent number: 12267256

Abstract: A tile of an embodiment of a coarse-grain reconfigurable architecture (CGRA) is based on an array of fused compute-memory units (FCMUs), pattern memory units (PMUs), and/or pattern compute units (PCUs) arranged in two dimensions, M×N. Unless clearly noted from context, any reference to a FCMU, PCU, or PMU may refer to one or more of the other units. The communication between a set of FCMUs is performed over a (M+1)×(N+1) switch fabric called the array-level network (ALN) where each switch has connections to its neighboring FCMUs and to neighboring switches in each of the four directions.

Type: Grant

Filed: August 23, 2022

Date of Patent: April 1, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Ram Sivaramakrishnan, Mark Luttrell, Sumti Jairath, Raghu Prabhakar, Gregory Frederick Grohoski
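
The relationship between the M×N unit array and the (M+1)×(N+1) switch fabric in the abstract above can be sketched as follows. The sketch assumes that switches sit at the corners of the unit grid, so that a switch's neighboring units are the units sharing that corner. That placement and the function name are assumptions, not details stated in the abstract.

```python
def switch_neighbours(r, c, m, n):
    """Return (units, switches) adjacent to switch (r, c).

    Units are indexed 0..m-1 by 0..n-1; switches 0..m by 0..n.
    """
    units = [
        (ur, uc)
        for ur in (r - 1, r)
        for uc in (c - 1, c)
        if 0 <= ur < m and 0 <= uc < n
    ]
    switches = [
        (sr, sc)
        for sr, sc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        if 0 <= sr <= m and 0 <= sc <= n
    ]
    return units, switches


M, N = 2, 3
all_switches = [(r, c) for r in range(M + 1) for c in range(N + 1)]
corner_units, corner_switches = switch_neighbours(0, 0, M, N)
inner_units, inner_switches = switch_neighbours(1, 1, M, N)
```

Under this assumption a corner switch touches one unit and two switches, while an interior switch touches four of each.
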
Lossless tiling in convolution networks—section cuts

Patent number: 12210953

Abstract: A data processing system receives a graph that includes a sequence of layers and executes graph cuts between a preceding layer in the graph and a succeeding layer in the graph that succeeds the preceding layer. The preceding layer generates a set of tiles on a tile-by-tile basis and the succeeding layer processes a tensor that includes multiple tiles in the set of tiles. Thus the graph is partitioned into a sequence of subgraphs, and a subgraph in the sequence of subgraphs includes a sub-sequence of layers in the sequence of layers. One or more configuration files are generated to configure runtime logic to execute the sequence of subgraphs, and the one or more configuration files are stored on a computer-readable medium.

Type: Grant

Filed: March 4, 2022

Date of Patent: January 28, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
Reconfigurable dataflow unit with remote read/write functionality

Patent number: 12206579

Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit sending a request to access an external memory over the first internal network, a second configurable unit receiving the request on the first internal network, obtaining a memory address, determining an identifier for the target reconfigurable processing unit, and sending the request, identifier, and memory address over the second internal network, and a third configurable unit receiving the request, identifier, and memory address on the second internal network, determining a routable address on the external network based on the identifier, synthesizing a payload with the request, address, and identifier, and sending the payload to the routable address on the external network.

Type: Grant

Filed: October 25, 2023

Date of Patent: January 21, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Manish K. Shah, Ram Sivaramakrishnan, Gregory Frederick Grohoski, Raghu Prabhakar
Convolution Calculation Engine Using Look-Up Tables for Address Calculation

Publication number: 20240378147

Abstract: A convolution calculation engine includes a kernel element counter for a convolution operation between a kernel and an input tensor. The kernel element counter wraps back to an initial kernel count value after reaching a maximum kernel count value. The convolution calculation engine also includes an offset look-up table (LUT) that provides a relative input offset into the input tensor based on an output of the kernel element counter and input location calculation logic that provides an input location within an input tensor for the convolution operation based on the relative input offset provided by the offset LUT.

Type: Application

Filed: May 8, 2023

Publication date: November 14, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Mark William Gottscho, Ram SIVARAMAKRISHNAN, David Brian JACKSON, Ruddhi CHAPHEKAR, Tuowen Zhao, Lei Xia
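
The counter and look-up-table address calculation in the abstract above can be modeled in software as follows. The row-major input layout, the way the LUT is built, and all names are assumptions for illustration; the abstract does not specify them.

```python
def build_offset_lut(kernel_h, kernel_w, input_w):
    """Relative offset into a row-major flattened input for each kernel element."""
    return [kr * input_w + kc for kr in range(kernel_h) for kc in range(kernel_w)]


class KernelElementCounter:
    """Counts from an initial value to a maximum, then wraps back."""

    def __init__(self, initial, maximum):
        self.initial = initial
        self.maximum = maximum
        self.value = initial

    def step(self):
        """Return the current count and advance, wrapping after the maximum."""
        current = self.value
        self.value = self.initial if current == self.maximum else current + 1
        return current


def input_locations(base, lut, steps):
    """Input location = base location + LUT offset selected by the counter."""
    counter = KernelElementCounter(0, len(lut) - 1)
    return [base + lut[counter.step()] for _ in range(steps)]


lut = build_offset_lut(kernel_h=2, kernel_w=2, input_w=5)
locations = input_locations(base=7, lut=lut, steps=6)
```

In this model the LUT replaces a per-element multiply with a table read, and the wrap-around restarts the kernel walk after the last element.
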
Convolution Calculation Engine

Publication number: 20240378259

Abstract: A convolution calculation engine to perform a convolution operation includes a convolution address compute unit. The convolution address compute unit includes an outer output base location register to provide an outer output base location for the convolution operation and an outer input base location register to provide an outer input base location for the convolution operation. It also includes a kernel element counter that starts to count from an initial kernel count value to a maximum kernel count value in response to a change in the outer output base location and a kernel offset generator to generate a kernel offset based on an output of the kernel element counter. In addition, the convolution address compute unit includes inner location logic to calculate an output location based on the outer output base location and an input location based on the outer input base location and output of the kernel element counter.

Type: Application

Filed: May 8, 2023

Publication date: November 14, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Mark William Gottscho, Ram SIVARAMAKRISHNAN, David Brian JACKSON, Ruddhi CHAPHEKAR, Tuowen Zhao, Lei Xia
Peer-to-peer communication between reconfigurable dataflow units

Patent number: 12143298

Abstract: A computing system is disclosed, comprising a plurality of interconnected reconfigurable dataflow units (RDUs). Each RDU includes configurable units, internal networks, and external interfaces. The first configurable unit of the first RDU sends a request to access an external memory attached to the second RDU over its first internal network. The second configurable unit of the first RDU obtains a memory address for the request, determines an identifier for the second RDU, and sends the request, identifier, and memory address to the third configurable unit of the first RDU over its second internal network. The third configurable unit of the first RDU generates a routable address on the external network, synthesizes a payload, and sends it through an external network interface. The third configurable unit of the second RDU receives the payload, and the fourth configurable unit of the second RDU uses the address to access the external memory.

Type: Grant

Filed: October 25, 2023

Date of Patent: November 12, 2024

Assignee: SambaNova Systems, Inc.

Inventors: Manish K. Shah, Ram Sivaramakrishnan, Gregory Frederick Grohoski, Raghu Prabhakar
COARSE-GRAINED RECONFIGURABLE PROCESSOR ARRAY WITH OPTIMIZED BUFFERS

Publication number: 20240370240

Abstract: A system and method for transforming a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The high-level program is transformed into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines. A first buffer is identified that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage. The system determines limitations associated with the array, and selects for implementation the lowest-cost buffer topology, chosen from a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology, where cost is based on the number of memory units and on the number of times data is written into a memory unit while traveling through the first buffer. Optimal configuration data for the array is generated and stored.

Type: Application

Filed: July 17, 2024

Publication date: November 7, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Nathan Francis SHEELEY, Weihang FAN, Matheen MUSADDIQ, Ram SIVARAMAKRISHNAN
Lossless tiling in convolution networks—resetting overlap factor to zero at section boundaries

Patent number: 12112250

Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configures the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.

Type: Grant

Filed: April 4, 2022

Date of Patent: October 8, 2024

Assignee: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
Method And System For Unloading Configuration Data In A Reconfigurable Processor Array

Publication number: 20240296141

Abstract: A method and system for unloading configuration data in a reconfigurable processor array comprises a bus system and an array of processor units connected to the bus system, the processor units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to corresponding processor units. A configuration unload controller is connected to the bus system, including logic to execute an array configuration unload process, including distributing a command to a plurality of the processor units in the array to unload the unit files particular to corresponding processor units, the unit files each comprising a plurality of ordered sub-files, receiving sub-files via the bus system from the array of processor units, and assembling an unload configuration file by arranging the received sub-files in memory according to the processor unit of the unit file of which the sub-file is a part, and the order of the sub-file in the unit file.

Type: Application

Filed: May 13, 2024

Publication date: September 5, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Manish K. Shah, Ram Sivaramakrishnan, Mark Luttrell, David B. Jackson, Raghu Prabhakar, Sumti Jairath, Gregory Frederick Grohoski, Pramod Nataraja
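
The assembly step in the abstract above, which arranges received sub-files by processor unit and by order within the unit file, can be sketched as follows. The unit-major memory layout, the fixed sub-file size, the equal number of sub-files per unit, and the function name are all assumptions made for illustration.

```python
def assemble_unload_file(received, num_units, subfiles_per_unit, subfile_size):
    """Place each received sub-file at the slot given by its unit and order.

    received is a list of (unit_index, subfile_order, payload) tuples in
    arbitrary arrival order; payload is a bytes object of subfile_size bytes.
    """
    memory = bytearray(num_units * subfiles_per_unit * subfile_size)
    for unit, order, payload in received:
        if len(payload) != subfile_size:
            raise ValueError("unexpected sub-file size")
        # Assumed layout: all sub-files of unit 0, then unit 1, and so on.
        start = (unit * subfiles_per_unit + order) * subfile_size
        memory[start:start + subfile_size] = payload
    return bytes(memory)


# Sub-files arrive out of order from two hypothetical processor units.
arrivals = [(1, 0, b"C0"), (0, 1, b"A1"), (1, 1, b"C1"), (0, 0, b"A0")]
unload_file = assemble_unload_file(
    arrivals, num_units=2, subfiles_per_unit=2, subfile_size=2
)
```

Because each slot is computed from the unit and order alone, the arrival order of sub-files on the bus does not affect the assembled file.
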
Lossless tiling in convolution networks—materialization of tensors

Patent number: 12079156

Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.

Type: Grant

Filed: July 23, 2021

Date of Patent: September 3, 2024

Assignee: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
MATRIX SUMMATION USING INTEGRATED MATRICES

Publication number: 20240256631

Abstract: A computing method comprises combining an M×K multiplicand matrix and P number of addend vectors to generate an M×(K+P) integrated matrix. The addend vectors can comprise a vector of constants and/or a column of an addend matrix. The method further comprises generating a row-extended matrix comprising a K×N multiplicand matrix and P rows of a constant vector. The method computes (K+P) products of a row of the integrated matrix multiplied by a column of the row-extended matrix and computes an integrated sum of the products. A multiply-accumulate computation can compute the integrated sum and is equivalent to a sum of K number of products of a column of the M×K matrix multiplied by a row of the K×N multiplicand matrix and added to the P number of addend vectors. A computing system can implement the method and can include a matrix computation unit.

Type: Application

Filed: January 27, 2023

Publication date: August 1, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
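
The integrated-matrix idea in the abstract above can be checked with a short pure-Python sketch. It assumes the constant vector in the row-extended matrix is all ones, so each addend vector is added to every column of the product. That choice and the function names are assumptions, not details given in the abstract.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def integrate(a, addends):
    """Append P addend vectors (each of length M) as extra columns: M x (K+P)."""
    return [row + [v[i] for v in addends] for i, row in enumerate(a)]


def row_extend(b, p):
    """Append P rows of ones (assumed constant vector) to a K x N matrix."""
    n = len(b[0])
    return b + [[1] * n for _ in range(p)]


A = [[1, 2], [3, 4]]                      # M x K = 2 x 2
B = [[5, 6, 7], [8, 9, 10]]               # K x N = 2 x 3
addends = [[10, 20], [100, 200]]          # P = 2 vectors of length M

integrated = integrate(A, addends)        # 2 x 4
extended = row_extend(B, len(addends))    # 4 x 3
fused = matmul(integrated, extended)      # one multiply-accumulate pass

# Reference: multiply first, then add every addend vector to each column.
product = matmul(A, B)
reference = [
    [product[i][j] + sum(v[i] for v in addends) for j in range(len(B[0]))]
    for i in range(len(A))
]
```

The fused product and the reference agree, so the additions ride along in the same multiply-accumulate pass as the matrix product.
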
Skip buffer splitting

Patent number: 12045591

Abstract: A compiler transforms a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The compiler includes a method that identifies a skip buffer in a dataflow graph, determines limitations associated with the array, and searches for a lowest cost implementation topology and stage depth. At least three topologies are considered, including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology. The lowest cost implementation topology and stage depth are based on the size of the buffered data (usually, the size of a tensor), the depth of the skip buffer, and the array's limitations. The hybrid buffer topology includes multiple sections of parallel memory units. The data travels between memory units in one section to adjacent memory units in a next section without intervening reorder buffers.

Type: Grant

Filed: September 14, 2022

Date of Patent: July 23, 2024

Assignee: SambaNova Systems, Inc.

Inventors: Nathan Sheeley, Weihang Fan, Matheen Musaddiq, Ram Sivaramakrishnan
Computer System with Reconfigurable Processors

Publication number: 20240220325

Abstract: A computer system includes an array of reconfigurable processor blocks which execute fragments of a larger data processing operation. An array controller distributes a control signal to the reconfigurable processors in the array and receives control signals for the respective execution fragments. The control signal may include quiesce logic or other control methods to execute the effective execution fragments of the larger data processing operation when individual processors become available.

Type: Application

Filed: March 12, 2024

Publication date: July 4, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Manish K. Shah, Pramod Nataraja, David Brian Jackson, Kin Hing Leung, Ram Sivaramakrishnan, Sumti Jairath, Gregory Frederick Grohoski
Lossless tiling in convolution networks—graph metadata generation

Patent number: 12001936

Abstract: A processing graph of an application with a sequence of processing nodes is obtained which processes an input and generates an intermediate representation, a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.

Type: Grant

Filed: March 21, 2022

Date of Patent: June 4, 2024

Assignee: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
