Patents by Inventor Ram Sivaramakrishnan
Ram Sivaramakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12001936Abstract: A processing graph of an application with a sequence of processing nodes is obtained which processes an input and generates an intermediate representation a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.Type: GrantFiled: March 21, 2022Date of Patent: June 4, 2024Assignee: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
-
Patent number: 11995529Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections including a first section and a second section. The compile time logic is to configure the first section with a first topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the first section, and configure the second section with a second topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the second section. The data processing system further includes runtime logic configured with the compile time logic to execute the first section to generate the inputs, intermediate outputs, and final outputs of the first section in the first topology of tiling configurations, and execute the second section to generate the inputs, intermediate outputs, and final outputs of the second section in the second topology of tiling configurations.Type: GrantFiled: June 30, 2021Date of Patent: May 28, 2024Assignee: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
-
Publication number: 20240168913Abstract: Disclosed is a method that includes sectioning a graph into a sequence of sections, the sequence of sections including at least a first section followed by a second section. The first section is configured to generate a first output in a first target tiling configuration in response to processing a first input in a first input tiling configuration. The graph is configured to reconfigure the first output in the first target tiling configuration to a second input in a second input tiling configuration. The second section is configured to generate a second output in a second target tiling configuration in response to processing the second input in the second input tiling configuration.Type: ApplicationFiled: November 24, 2023Publication date: May 23, 2024Applicant: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
-
Patent number: 11983140Abstract: A reconfigurable data processor comprises a bus system, and an array of configurable units connected to the bus system, configurable units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. A configuration unload controller connected to the bus system, including logic to execute an array configuration unload process, including distributing a command to a plurality of the configurable units in the array to unload the unit files particular to the corresponding configurable units, the unit files each comprising a plurality of ordered sub-files, receiving sub-files via the bus system from the array of configurable units, and assembling an unload configuration file by arranging the received sub-files in memory according to the configurable unit of the unit file of which the sub-file is a part, and the order of the sub-file in the unit file.Type: GrantFiled: November 22, 2021Date of Patent: May 14, 2024Assignee: SambaNova Systems, Inc.Inventors: Manish K. Shah, Ram Sivaramakrishnan, Mark Luttrell, David B. Jackson, Raghu Prabhakar, Sumti Jairath, Gregory Frederick Grohoski, Pramod Nataraja
-
Patent number: 11971846Abstract: A logic unit in an array of processing units is configurable to consume source tokens and a status signal and to produce barrier tokens and an enable signal based on the source tokens and the status signal.Type: GrantFiled: February 14, 2023Date of Patent: April 30, 2024Assignee: SambaNova Systems, Inc.Inventors: Raghu Prabhakar, Manish K. Shah, Ram Sivaramakrishnan, Pramod Nataraja, David Brian Jackson, Gregory Frederick Grohoski
-
Patent number: 11934343Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.Type: GrantFiled: July 23, 2021Date of Patent: March 19, 2024Assignee: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
-
Patent number: 11928512Abstract: A reconfigurable data processor comprises an array of configurable units configurable to allocate a plurality of sets of configurable units in the array to implement respective execution fragments of the data processing operation. Quiesce logic is coupled to configurable units in the array, configurable to respond to a quiesce control signal to quiesce the sets of configurable units in the array on quiesce boundaries of the respective execution fragments, and to forward quiesce ready signals for the respective execution fragments when the corresponding sets of processing units are ready. An array quiesce controller distributes the quiesce control signal to configurable units in the array, and receives quiesce ready signals for the respective execution fragments from the quiesce logic.Type: GrantFiled: May 17, 2021Date of Patent: March 12, 2024Assignee: SambaNova Systems, Inc.Inventors: Raghu Prabhakar, Manish K. Shah, Pramod Nataraja, David Brian Jackson, Kin Hing Leung, Ram Sivaramakrishnan, Sumti Jairath, Gregory Frederick Grohoski
-
Publication number: 20240073136Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit sending a request to access an external memory over the first internal network, a second configurable unit receiving the request on the first internal network, obtaining a memory address, determining an identifier for the target reconfigurable processing unit, and sending the request, identifier, and memory address over the second internal network, and a third configurable unit receiving the request, identifier, and memory address on the second internal network, determining a routable address on the external network based on the identifier, synthesizing a payload with the request, address, and identifier, and sending the payload to the routable address on the external network.Type: ApplicationFiled: October 25, 2023Publication date: February 29, 2024Applicant: SambaNova Systems, Inc.Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
-
Publication number: 20240070106Abstract: A reconfigurable processing unit includes a first and second internal network, an interface to an external network, a first configurable unit coupled to the first internal network, a second configurable unit coupled to both internal networks, and a third configurable unit coupled to both the second internal network and the interface to the external network. The third configurable unit is configured to receive a payload containing a transaction type identifier and an identifier of the second configurable unit through the interface to the external network, and send a first packet including the transaction type identifier to the second configurable unit over the second internal network. The second configurable unit is configured to increment a counter in response to a particular transaction type identifier, and send a token to the first configurable unit over the first internal network while the counter is non-zero and the first configurable unit is executing.Type: ApplicationFiled: October 25, 2023Publication date: February 29, 2024Applicant: SambaNova Systems, Inc.Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
-
Publication number: 20240070111Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit connected to the first internal network, a second configurable unit connected to both the first internal network and the second internal network, and a third configurable unit connected to both the second internal network and the interface to the external network. The third configurable unit is configured to receive a payload from the external network and send the transaction type identifier and the source application ID to the second configurable unit over the second internal network. The second configurable unit sends information to the first configurable unit based on the transaction type identifier and the source application ID matching the local application ID retrieved from the register.Type: ApplicationFiled: October 25, 2023Publication date: February 29, 2024Applicant: SambaNova Systems, Inc.Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
-
Publication number: 20240073129Abstract: A computing system is disclosed, comprising a plurality of interconnected reconfigurable dataflow units (RDUs). Each RDU includes configurable units, internal networks, and external interfaces. The first configurable unit of the first RDU sends a request to access an external memory attached to the second RDU over its first internal network. The second configurable unit of the first RDU obtains a memory address for the request, determines an identifier for the second RDU, and sends the request, identifier, and memory address to the third configurable unit of the first RDU over its second internal network. The third configurable unit of the first RDU generates a routable address on the external network, synthesizes a payload, and sends it through an external network interface. The third configurable unit of the second RDU receives the payload, and the fourth configurable unit of the second RDU uses the address to access the external memory.Type: ApplicationFiled: October 25, 2023Publication date: February 29, 2024Applicant: SambaNova Systems, Inc.Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
-
Patent number: 11886931Abstract: The technology disclosed relates to inter-node execution of configuration files on reconfigurable processors using network interface controller (NIC) buffers. In particular, the technology disclosed relates to a runtime logic that is configured to execute configuration files that define applications and application data for applications using a first reconfigurable processor connected to a first host, and a second reconfigurable processor connected to a second host. The first reconfigurable processor is configured to push input data for the applications in a first plurality of buffers. The first host is configured to cause a first network interface controller (NIC) to stream the input data to a second plurality of buffers from the first plurality of buffers. The second host is configured to cause a second NIC to stream the input data to the second reconfigurable processor from the second plurality of buffers.Type: GrantFiled: November 9, 2021Date of Patent: January 30, 2024Assignee: SambaNova Systems, Inc.Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
-
Patent number: 11886930Abstract: The technology disclosed relates to runtime execution of functions across reconfigurable processor. In particular, the technology disclosed relates to a runtime logic that is configured to execute a first set of functions in a plurality of functions and/or data therefor on a first reconfigurable processor, and a second set of functions in the plurality of functions and/or data therefor on additional reconfigurable processors. Functions in the second set of functions and/or the data therefor are transmitted to the additional reconfigurable processors using one or more of a first reconfigurable processor-to-additional reconfigurable processors buffers, and results of executing the functions and/or the data therefor on the additional reconfigurable processors are transmitted to the first reconfigurable processor using one or more of additional reconfigurable processors-to-first reconfigurable processor buffers.Type: GrantFiled: November 9, 2021Date of Patent: January 30, 2024Assignee: SambaNova Systems, Inc.Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
-
Publication number: 20230385043Abstract: A compiler transforms a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The compiler includes a method that identifies a skip buffer in a dataflow graph, determines limitations associated with the array, and searches for a lowest cost implementation topology and stage depth. At least three topologies are considered, including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology. The lowest cost implementation topology and stage depth are based on the size of the buffered data (usually, the size of a tensor), the depth of the skip buffer, and the array's limitations. The hybrid buffer topology includes multiple sections of parallel memory units. The data travels between memory units in one section to adjacent memory units in a next section without intervening reorder buffers.Type: ApplicationFiled: September 14, 2022Publication date: November 30, 2023Applicant: SambaNova Systems, Inc.Inventors: Nathan SHEELEY, Weihang FAN, Matheen MUSADDIQ, Ram SIVARAMAKRISHNAN
-
Publication number: 20230367845Abstract: A method comprises executing (K+P) number of transposition cycles to generate a transpose-extended matrix having N rows and (K+P) columns, in which columns 1 to K comprise a transposition of a first matrix having K rows and N columns, and columns (K+1) to (K+P) comprise constants or elements of an N×1 matrix. The method includes computing a sum-product of a row of a second matrix, having M rows and N columns, multiplied by a column among columns 1 to K of the transpose-extended matrix; and, computing a second sum-product of the row of the second matrix multiplied by a column among columns (K+1) to (K+P) of the transpose-extended matrix. The sum-products can comprise gradients of input matrices. A transpose processing unit can execute the transposition cycles to read K rows of the first matrix and insert P number of constant or N×1 columns to generate the transpose-extended matrix.Type: ApplicationFiled: July 24, 2023Publication date: November 16, 2023Applicant: SambaNova Systems, Inc.Inventors: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
-
Publication number: 20230367844Abstract: A computing method comprises generating an integrated matrix having (K+P) number of columns, columns 1 through K of the integrated matrix comprising columns 1 through K of a multiplicand matrix and columns (K+1) though P of the integrated matrix comprising addend columns. The method computes K number of products of elements of a row of the integrated matrix multiplied by elements of a column of a second multiplicand matrix; computes a (K+1) product comprising an element of an addend column multiplied by a constant; and, computes a sum of the K number of products added to the (K+1) product. The sum is equivalent to a sum of products of a column of the M×K matrix multiplied by a row of the K×N matrix added to the an element of an addend column of the integrated matrix. A computing system and a computer program product can implement the method.Type: ApplicationFiled: July 24, 2023Publication date: November 16, 2023Applicant: SambaNova Systems, Inc.Inventors: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
-
Patent number: 11782729Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.Type: GrantFiled: August 18, 2020Date of Patent: October 10, 2023Assignee: SambaNova Systems, Inc.Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
-
Publication number: 20230289310Abstract: A reconfigurable data processor comprises an array of configurable units and a bus system. The bus system is connected to the array of configurable units. The bus system includes a top level network and an array level network. The top level network is connected to an external data interface for communication with memory outside of the array of configurable units. The array level network is connected to configurable units in the array of configurable units.Type: ApplicationFiled: May 18, 2023Publication date: September 14, 2023Applicant: SambaNova Systems, Inc.Inventors: Gregory Frederick GROHOSKI, Sumti JAIRATH, Mark LUTTRELL, Raghu PRABHAKAR, Ram SIVARAMAKRISHNAN, Manish K. SHAH
-
Publication number: 20230244748Abstract: A method for multiplying matrices in a coarse-grained computing grid includes assigning each compute unit c of C compute units to a unique submatrix Rc of a result matrix R, wherein the C compute units are arranged in a 2D computing grid, configuring one or more source memory units to provide relevant matrix A data and matrix B data to the C compute units via a plurality of packets, configuring each compute unit c to produce the unique submatrix Rc and send the unique submatrix Rc to one or more desired memory units. The method also includes initiating data flow in the computing grid to produce the result matrix R within the desired memory units. To reduce packet traffic, Matrix B data corresponding to a column of compute units may be narrow-casted to each column of compute units. A corresponding system and computer-readable medium are also disclosed herein.Type: ApplicationFiled: May 25, 2022Publication date: August 3, 2023Applicant: SambaNova Systems, Inc.Inventors: Pramod Natarja, Sitanshu Gupta, Ram Sivaramakrishnan, Ajit Punj
-
Publication number: 20230195686Abstract: A logic unit in an array of processing units is configurable to consume source tokens and a status signal and to produce barrier tokens and an enable signal based on the source tokens and the status signal.Type: ApplicationFiled: February 14, 2023Publication date: June 22, 2023Applicant: SambaNova Systems, Inc.Inventors: Raghu PRABHAKAR, Manish K. SHAH, Ram SIVARAMAKRISHNAN, Pramod NATARAJA, David Brian JACKSON, Gregory Frederick GROHOSKI