Patents by Inventor Raghu Prabhakar
Raghu Prabhakar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250117647
Abstract: A method that may be computer implemented converts a tensor value from a first format to a second format and trains a neural network. The method determines a maximum exponent code in the first format and subtracts a first bias to obtain the highest needed exponent (HNE). It determines a second bias from the highest available code (HAC) in the second format and the HNE, and converts the tensor value from the first format to the second format by using the second bias instead of the first bias. The method uses the second format to train the neural network. The method may round the mantissa of the tensor value in the first format to obtain a rounded mantissa of the tensor value for the second format.
Type: Application
Filed: October 5, 2023
Publication date: April 10, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Valentina Popescu, Jeffrey S. Brooks, Ram SIVARAMAKRISHNAN, Matthew William Ashcraft, Vinh Quang Nguyen, Gang Liu, Raghu PRABHAKAR, Yongning SHENG
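The exponent re-biasing described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract does not state how the second bias is derived from the HAC and the HNE, so the rule `second_bias = HAC - HNE` is an assumption, as are the function names, the treatment of every exponent code as usable (no codes reserved for infinity or NaN), and the clamping of underflow to code 0. Mantissa rounding is left out.

```python
def choose_second_bias(exponent_codes, first_bias, second_exponent_bits):
    """Pick a bias for the second format so the largest value still fits."""
    max_code = max(exponent_codes)          # maximum exponent code in the first format
    hne = max_code - first_bias             # highest needed exponent (HNE)
    hac = (1 << second_exponent_bits) - 1   # highest available code (HAC); assumes no reserved codes
    return hac - hne                        # assumed rule: the HNE maps onto the HAC


def convert_exponent(code, first_bias, second_bias):
    """Re-encode one exponent code using the second bias instead of the first."""
    new_code = code - first_bias + second_bias
    return max(new_code, 0)                 # assumption: clamp underflow to code 0


# Hypothetical tensor whose exponent codes use an 8-bit field with bias 127,
# converted to a format with a 4-bit exponent field.
codes = [120, 127, 130]
second_bias = choose_second_bias(codes, first_bias=127, second_exponent_bits=4)
converted = [convert_exponent(c, 127, second_bias) for c in codes]
```

Under these assumptions the largest exponent in the tensor lands exactly on the highest code of the narrower format, so the available range is spent on the values the tensor actually contains.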
-
Patent number: 12267256
Abstract: A tile of an embodiment of a coarse-grain reconfigurable architecture (CGRA) is based on an array of fused compute-memory units (FCMUs), pattern memory units (PMUs), and/or pattern compute units (PCUs) arranged in two dimensions, M×N. Unless clearly noted from context, any reference to a FCMU, PCU, or PMU may refer to one or more of the other units. The communication between a set of FCMUs is performed over a (M+1)×(N+1) switch fabric called the array-level network (ALN) where each switch has connections to its neighboring FCMUs and to neighboring switches in each of the four directions.
Type: Grant
Filed: August 23, 2022
Date of Patent: April 1, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Ram Sivaramakrishnan, Mark Luttrell, Sumti Jairath, Raghu Prabhakar, Gregory Frederick Grohoski
-
Publication number: 20250077239
Abstract: A reconfigurable data processor includes a bus system, an array of configurable units, and a configuration load controller connected to the bus system and coupled to a memory. The configuration load controller incorporates a first set of registers accessible from a host processor for storing addresses of a first configuration file, a second set of registers loaded by loading a configuration file for storing addresses of a second configuration file, and an address generation unit with working address registers. The processor is configured to load a first configuration file from the memory and initiate execution based on a request from runtime software. Additional configuration files are automatically loaded upon completion of a previous configuration file based on information stored in the previous configuration file.
Type: Application
Filed: September 4, 2024
Publication date: March 6, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Manish K. Shah, Denis Sokolov, Raghu Prabhakar, Arjun Sabnis, Joshua Earle Polzin, Arnav Goel
-
Patent number: 12236220
Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
Type: Grant
Filed: June 7, 2023
Date of Patent: February 25, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Chaphekar, Ajit Punj, Sumti Jairath
-
Patent number: 12210953
Abstract: A data processing system receives a graph that includes a sequence of layers and executes graph cuts between a preceding layer in the graph and a succeeding layer in the graph that succeeds the preceding layer. The preceding layer generates a set of tiles on a tile-by-tile basis and the succeeding layer processes a tensor that includes multiple tiles in the set of tiles. Thus the graph is partitioned into a sequence of subgraphs, with a subgraph in the sequence of subgraphs including a sub-sequence of layers in the sequence of layers. One or more configuration files are generated to configure runtime logic to execute the sequence of subgraphs, and the one or more configuration files are stored on a computer-readable medium.
Type: Grant
Filed: March 4, 2022
Date of Patent: January 28, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
-
Publication number: 20250028786
Abstract: A compiler tool and a method of operating a compiler tool for selecting between executing a matrix multiplication operation in a weight stationary mode or in an output stationary mode on a systolic array with reconfigurable processing elements are presented. The compiler tool receives configuration parameters, energy parameters, and performance parameters as well as dimensions of the matrices to be multiplied and estimates energy consumptions and performance numbers for executing the matrix multiplication operation in the weight stationary and the output stationary modes. The compiler tool selects between operating the matrix multiplication operation in the weight stationary and the output stationary mode based on the estimated energy consumption and the estimated performance numbers.
Type: Application
Filed: July 22, 2024
Publication date: January 23, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Mark William Gottscho, Nasim Farahini, Raghu PRABHAKAR, Hakan Zeffer
-
Publication number: 20250028785
Abstract: A reconfigurable processing element for a systolic array that is configurable for multiplying a first matrix with a second matrix to determine a result matrix in a weight stationary mode or in an output stationary mode is presented. Furthermore, a systolic array for performing matrix multiplication of a first matrix and a second matrix to determine a result matrix is presented that includes a plurality of reconfigurable processing elements that are configurable for operating in a weight stationary mode or in an output stationary mode. Moreover, a method of operating a reconfigurable processing element for a systolic array that is configured for performing matrix multiplication of a first matrix and a second matrix to determine a result matrix in a weight stationary mode or in an output stationary mode is presented.
Type: Application
Filed: July 22, 2024
Publication date: January 23, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Mark William Gottscho, Nasim Farahini, Raghu PRABHAKAR, Hakan Zeffer
-
Patent number: 12206579
Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit sending a request to access an external memory over the first internal network, a second configurable unit receiving the request on the first internal network, obtaining a memory address, determining an identifier for the target reconfigurable processing unit, and sending the request, identifier, and memory address over the second internal network, and a third configurable unit receiving the request, identifier, and memory address on the second internal network, determining a routable address on the external network based on the identifier, synthesizing a payload with the request, address, and identifier, and sending the payload to the routable address on the external network.
Type: Grant
Filed: October 25, 2023
Date of Patent: January 21, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Manish K. Shah, Ram Sivaramakrishnan, Gregory Frederick Grohoski, Raghu Prabhakar
-
Patent number: 12189564
Abstract: A data processing system for implementing operations that generate a dynamically-sized output is presented. The data processing system includes a reconfigurable processor that is configured to implement a first operation, a second operation, a recording unit, and a control unit. The first operation generates an output, wherein a size of the output is unknown during a configuration phase. The second operation receives the output of the first operation as an input. The recording unit generates control data that is indicative of the size of the output. The control unit provides the control data to the second operation, wherein the second operation processes the input based on the control data.
Type: Grant
Filed: February 14, 2023
Date of Patent: January 7, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Abhishek Srivastava, Matthew Vilim, Raghu Prabhakar, Sankar Rachuru, Zhekun Zhang, Matheen Musaddiq, Apurv Vivek, Sitanshu Gupta, Ayesha Siddiqua
-
Patent number: 12190084
Abstract: A coarse-grained reconfigurable (CGR) processor includes a configurable unit comprising a fracturable data path with a plurality of sub-paths. The fracturable data path includes multiple stages that each include an arithmetic logic unit (ALU), selection logic to select two or more inputs for the ALU, and sub-path pipeline registers. The fracturable data path also includes a first output configurable to provide first data selected from any one of the sub-path pipeline registers and a second output configurable to provide second data selected from any one of the sub-path pipeline registers. The configurable unit includes a configuration store to store configuration data to provide two or more immediate data fields for each stage of the fracturable data path and configuration information for the ALUs, the selection logic, and to select the first data and the second data for the first output and the second output.
Type: Grant
Filed: January 19, 2023
Date of Patent: January 7, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Raghu Prabhakar, David Brian Jackson
-
Publication number: 20250004971
Abstract: A data processing system for implementing operations that generate a dynamically-sized output includes a reconfigurable processor that is configured to implement first and second operations and control circuitry. The first operation generates an output, whose size is unknown during a configuration phase. The second operation receives the output as an input. During a write operation, the first operation is enabled to write a first portion of the output to a first portion of a buffer, while the second operation reads a first portion of the input that is different than the first portion of the output from a second portion of the buffer that is different than the first portion of the buffer. The control circuitry includes a control unit that directs the second operation during a read operation following the write operation to read data as the input from the buffer that was stored during the write operation.
Type: Application
Filed: September 13, 2024
Publication date: January 2, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
-
Publication number: 20250004972
Abstract: A data processing system for implementing operations that generate a dynamically-sized output comprises a reconfigurable processor and a compiler. The compiler generates configuration data for configuring the reconfigurable processor to implement first and second operations and first and second connections. The first operation generates an output, and the second operation receives the output of the first operation as an input. The size of the output is unknown when generating the configuration data, and the output comprises a number of elements that is smaller than or equal to a predetermined maximum number of elements. The first connection for the output and the second connection for the input are both suitable for a transmission of the predetermined maximum number of elements. The reconfigurable processor is configured with the configuration data such that the reconfigurable processor implements the first operation, the second operation, the first connection, and the second connection.
Type: Application
Filed: September 13, 2024
Publication date: January 2, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
-
Publication number: 20240427727
Abstract: In some aspects, a program is executed on a coarse-grained reconfigurable (CGR) processor. The CGR determines that the program produces an output that includes a variable length tensor, determines a maximum size of the variable length tensor and sets, based on the maximum size, a maximum of a counter associated with the program. The counter is set to an initial value of zero. The CGR initiates execution of the program, causing the program to receive an input tensor. Based on determining that the program is operating on a first portion of the input tensor, the CGR performs an update to the counter, to create an updated counter, and communicates the updated counter to one or more consumers within the program. After determining that the program has completed operating on the input tensor, a final size of the output is communicated to one or more downstream consumers external to the program.
Type: Application
Filed: June 23, 2023
Publication date: December 26, 2024
Applicant: SambaNova Systems, Inc.
Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA
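The counter protocol in this abstract can be modeled in software roughly as below. This is only a sketch under stated assumptions: the abstract describes hardware behavior on a CGR processor, and the choice of a "drop zeros" filter as the variable-length operation, the function name `run_variable_length_filter`, the `notify` callback standing in for consumers within the program, and the portion size are all hypothetical.

```python
def run_variable_length_filter(input_tensor, portion_size, max_size, notify):
    """Process the input portion by portion, tracking the output size in a counter."""
    counter = 0                      # counter starts at an initial value of zero
    output = []
    for start in range(0, len(input_tensor), portion_size):
        portion = input_tensor[start:start + portion_size]
        kept = [v for v in portion if v != 0]   # hypothetical variable-length operation
        if counter + len(kept) > max_size:      # the counter may never exceed its maximum
            raise ValueError("output exceeds the configured maximum size")
        output.extend(kept)
        counter += len(kept)         # update the counter after each portion
        notify(counter)              # communicate the updated counter to consumers
    return output, counter           # the final size goes to downstream consumers


updates = []
result, final_size = run_variable_length_filter(
    [1, 0, 2, 0, 0, 3], portion_size=2, max_size=6, notify=updates.append
)
```

The maximum is fixed before execution because the true output length is only known once the whole input has been processed; consumers use the running counter in the meantime.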
-
Patent number: 12143298
Abstract: A computing system is disclosed, comprising a plurality of interconnected reconfigurable dataflow units (RDUs). Each RDU includes configurable units, internal networks, and external interfaces. The first configurable unit of the first RDU sends a request to access an external memory attached to the second RDU over its first internal network. The second configurable unit of the first RDU obtains a memory address for the request, determines an identifier for the second RDU, and sends the request, identifier, and memory address to the third configurable unit of the first RDU over its second internal network. The third configurable unit of the first RDU generates a routable address on the external network, synthesizes a payload, and sends it through an external network interface. The third configurable unit of the second RDU receives the payload, and the fourth configurable unit of the second RDU uses the address to access the external memory.
Type: Grant
Filed: October 25, 2023
Date of Patent: November 12, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Manish K. Shah, Ram Sivaramakrishnan, Gregory Frederick Grohoski, Raghu Prabhakar
-
Patent number: 12112250
Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configures the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.
Type: Grant
Filed: April 4, 2022
Date of Patent: October 8, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
-
Publication number: 20240296141
Abstract: A method and system for unloading configuration data in a reconfigurable processor array comprises a bus system and an array of processor units connected to the bus system, the processor units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding processor units. A configuration unload controller is connected to the bus system, including logic to execute an array configuration unload process, including distributing a command to a plurality of the processor units in the array to unload the unit files particular to the corresponding processor units, the unit files each comprising a plurality of ordered sub-files, receiving sub-files via the bus system from the array of processor units, and assembling an unload configuration file by arranging the received sub-files in memory according to the processor unit of the unit file of which the sub-file is a part, and the order of the sub-file in the unit file.
Type: Application
Filed: May 13, 2024
Publication date: September 5, 2024
Applicant: SambaNova Systems, Inc.
Inventors: Manish K. Shah, Ram Sivaramakrishnan, Mark Luttrell, David B. Jackson, Raghu Prabhakar, Sumti Jairath, Gregory Frederick Grohoski, Pramod Nataraja
-
Patent number: 12079156
Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.
Type: Grant
Filed: July 23, 2021
Date of Patent: September 3, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
-
Patent number: 12056506
Abstract: In a Coarse-Grained Reconfigurable Architecture (CGRA) system, two configuration files are used. The CGRA system has an array of configurable units that includes a plurality of switches, a print configurable unit, a source configurable unit, and one or more sink configurable units. The first configuration file, upon being executed by the CGRA system, configures the CGRA system to send output data directly from the source configurable unit to the one or more sink configurable units through the plurality of switches. The second configuration file, upon being executed by the CGRA system, configures the CGRA system to send the output data from the source configurable unit to the print configurable unit through the plurality of switches, and to send the output data from the print configurable unit to both a memory that is accessible by a host computing unit and the one or more sink configurable units.
Type: Grant
Filed: December 15, 2022
Date of Patent: August 6, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Joshua Brot, Raghu Prabhakar, Subhra Mazumdar, James Decker, Tram Tran
-
Publication number: 20240256631
Abstract: A computing method comprises combining an M×K multiplicand matrix and P number of addend vectors to generate an M×(K+P) integrated matrix. The addend vectors can comprise a vector of constants and/or a column of an addend matrix. The method further comprises generating a row-extended matrix comprising a K×N multiplicand matrix and P rows of a constant vector. The method computes (K+P) products of a row of the integrated matrix multiplied by a column of the row-extended matrix and computes an integrated sum of the products. A multiply-accumulate computation can compute the integrated sum and is equivalent to a sum of K number of products of a column of the M×K matrix multiplied by a row of the K×N multiplicand matrix and added to the P number of addend vectors. A computing system can implement the method and can include a matrix computation unit.
Type: Application
Filed: January 27, 2023
Publication date: August 1, 2024
Applicant: SambaNova Systems, Inc.
Inventors: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
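The idea of folding an addition into a matrix multiplication can be illustrated with a small worked example. It is a sketch under assumptions, not the claimed method: the abstract does not say what the constant vector holds, so rows of ones are assumed here (which adds each addend vector to every output column), and the helper names `integrate`, `row_extend`, and `matmul` are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def integrate(a, addends):
    """Append P addend vectors (each of length M) as extra columns of the M x K matrix."""
    return [row + [vec[i] for vec in addends] for i, row in enumerate(a)]


def row_extend(b, p):
    """Append P rows of ones (assumed constant vector) to the K x N matrix."""
    n = len(b[0])
    return b + [[1] * n for _ in range(p)]


a = [[1, 2], [3, 4]]            # M x K multiplicand
b = [[5, 6], [7, 8]]            # K x N multiplicand
bias = [10, 20]                 # one addend vector, so P = 1

integrated = integrate(a, [bias])        # M x (K + P)
extended = row_extend(b, 1)              # (K + P) x N
fused = matmul(integrated, extended)     # one multiply-accumulate pass

# Reference: multiply first, then add the bias to every column.
separate = [[v + bias[i] for v in row] for i, row in enumerate(matmul(a, b))]
```

Each entry of `fused` is a sum of K+P products, and the last P products contribute the addend terms, so the separate addition step disappears into the multiply-accumulate.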
-
Publication number: 20240233068
Abstract: A statically reconfigurable dataflow architecture processor (SRDAP) that performs an N-dimensional affine transform specified by a matrix on an input image to produce an output image includes pattern compute units (PCUs) and pattern memory units (PMUs) interconnected by switches. PCUs have vector pipelines of functional units that perform operations on operands received from previous pipeline stages, another PCU, and/or PMUs. PMUs have memories loadable with the input image. The PCUs and PMUs are statically reconfigurable to, for all the output pixels: apply the matrix to vectors of output pixel coordinates to calculate corresponding vectors of input pixel coordinates, flatten the vectors of input pixel coordinates into vectors of PMU addresses of the input pixels, read values of the input pixels from the PMUs at the calculated input pixel addresses, and write vectors of the input pixel values to PMUs to form the output image.
Type: Application
Filed: January 10, 2023
Publication date: July 11, 2024
Applicant: SambaNova Systems, Inc.
Inventors: Matthew Vilim, Raghu Prabhakar, Matt Feldman, Yaqi Zhang
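The four steps listed in this abstract (map output coordinates to input coordinates, flatten to addresses, read, write) amount to a gather over the input image. The scalar Python sketch below shows the 2-D case only. It assumes a 2×3 matrix in homogeneous form, a row-major flat image, nearest-neighbor sampling via `round()`, and a fill value for coordinates that fall outside the image; none of these details come from the abstract, and the function name `affine_gather` is hypothetical.

```python
def affine_gather(image, height, width, matrix, fill=0):
    """Build the output image by reading input pixels at transformed coordinates."""
    output = []
    for y in range(height):
        for x in range(width):
            # Apply the matrix to the output pixel coordinate (x, y, 1).
            src_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
            src_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
            ix, iy = round(src_x), round(src_y)   # assumed nearest-neighbor sampling
            if 0 <= ix < width and 0 <= iy < height:
                address = iy * width + ix         # flatten the coordinate to an address
                output.append(image[address])     # read the input pixel, write the output
            else:
                output.append(fill)               # assumed handling of out-of-range reads
    return output


# A 2 x 2 image stored row-major, flipped horizontally: src_x = 1 - x, src_y = y.
pixels = [1, 2,
          3, 4]
flip = [[-1, 0, 1],
        [0, 1, 0]]
flipped = affine_gather(pixels, height=2, width=2, matrix=flip)
```

Iterating over output pixels rather than input pixels guarantees that every output location is written exactly once, which is why the matrix maps output coordinates back to input coordinates.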