Patents by Inventor Mingran WANG
Mingran WANG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260147728
Abstract: In one implementation, a computer-implemented method for predicting new information based on previous information may include creating a draft cache and a target cache within a memory of the computer, wherein the draft cache and the target cache have a fixed length. The method may include storing tokens representing the information into the caches; the storing may start at a leftmost cache location. The method may also include forming a mask and an index or pointer for each cache. As new entries are formed, they are added to the caches, and the masks and pointers are updated to include the entries that are added.
Type: Application
Filed: November 20, 2025
Publication date: May 28, 2026
Applicant: SambaNova Systems, Inc.
Inventors: John LONG, Bo Li, Reid GOODBAR, Tuowen Zhao, Mingran WANG, Leon ZHANG
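The fixed-length cache with a mask and a pointer described in this abstract can be pictured with a minimal sketch. The class name `FixedCache`, the token values, and the rule for which proposals the target cache keeps are all hypothetical and are not taken from the application.

```python
class FixedCache:
    """Fixed-length token cache with a validity mask and a write pointer."""

    def __init__(self, length):
        self.tokens = [None] * length  # fixed-length storage
        self.mask = [0] * length       # 1 marks a filled slot
        self.pointer = 0               # next free slot, starting at the leftmost

    def append(self, new_tokens):
        """Add entries left to right and update the mask and pointer."""
        for tok in new_tokens:
            if self.pointer >= len(self.tokens):
                raise OverflowError("cache is full")
            self.tokens[self.pointer] = tok
            self.mask[self.pointer] = 1
            self.pointer += 1

    def valid(self):
        """Return only the slots that the mask marks as filled."""
        return [t for t, m in zip(self.tokens, self.mask) if m]


draft = FixedCache(8)
target = FixedCache(8)
prompt = [101, 7, 42]      # hypothetical prompt tokens
draft.append(prompt)
target.append(prompt)
draft.append([13, 5])      # hypothetical draft proposals
target.append([13])        # hypothetical: only the first proposal is kept
```

Because both caches have a fixed length, the mask rather than the list size tells a consumer which positions hold real tokens.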
-
Publication number: 20260147571
Abstract: A method and system provide a dataflow using a reconfigurable dataflow architecture (RDA) with repeated layered structure having synchronization boundaries between layers for a large language model used in machine learning for artificial intelligence (AI). The method first identifies in the dataflow consecutive calls to a same kernel, wherein the consecutive calls are made in an inference process involving generating a sequence of output tokens based on a given set of input tokens. The method then applies kernel looping to transform the identified consecutive calls in the dataflow into a single call to the kernel by identifying, from a list of pattern-matched calls, the ones to include in the single call to the kernel.
Type: Application
Filed: January 15, 2026
Publication date: May 28, 2026
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Darshan GANDHI, Pushkar Shridhar NANDKAR, Nathan Francis SHEELEY, Matheen MUSADDIQ, Leon ZHANG, Reid GOODBAR, Matthew SHAFFER, Han Wang, Angela WANG, Mingran WANG, Raghu PRABHAKAR, Peter Buckman
-
Publication number: 20260147570
Abstract: A system and method provide kernel looping to eliminate synchronization boundaries to achieve peak inference performance in dataflow accelerators. The system provides a dataflow using a reconfigurable dataflow architecture (RDA) with repeated layered structure having synchronization boundaries between layers for a large language model used in machine learning. The method provides kernel looping between layers by first identifying in the dataflow consecutive calls to a same kernel, wherein the consecutive calls are made in an inference process involving generating a sequence of output tokens based on a given set of input tokens. The method then applies kernel looping to transform the identified consecutive calls in the dataflow into a single call to the kernel. The method further modifies the kernel to contain a pipelined outer loop.
Type: Application
Filed: November 24, 2025
Publication date: May 28, 2026
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Darshan GANDHI, Pushkar Shridhar NANDKAR, Nathan Francis SHEELEY, Matheen MUSADDIQ, Leon ZHANG, Reid GOODBAR, Matthew SHAFFER, Han Wang, Angela WANG, Mingran WANG, Raghu PRABHAKAR, Peter Buckman
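The transformation of consecutive calls to the same kernel into a single looped call can be illustrated in software. This is only a sketch: the call-list representation and the kernel names `decoder` and `lm_head` are hypothetical, and the loop here runs sequentially rather than being pipelined in hardware.

```python
from itertools import groupby


def loop_kernels(calls):
    """Collapse runs of consecutive calls to the same kernel into one looped call."""
    fused = []
    for kernel, run in groupby(calls, key=lambda c: c[0]):
        fused.append((kernel, [arg for _, arg in run]))
    return fused


def execute(fused, kernels, x):
    """Run each fused call; the per-layer loop now lives inside the single call."""
    for name, arg_list in fused:
        kernel = kernels[name]
        for arg in arg_list:
            x = kernel(x, arg)
    return x


# Hypothetical dataflow: three identical decoder layers, then an output head.
calls = [("decoder", 0), ("decoder", 1), ("decoder", 2), ("lm_head", None)]
kernels = {
    "decoder": lambda x, layer: 2 * x + layer,  # stand-in for a layer computation
    "lm_head": lambda x, _: x + 1,              # stand-in for the output head
}
fused = loop_kernels(calls)
result = execute(fused, kernels, 1)
```

Four separate calls become two, so the boundaries between the three decoder calls disappear while the computed result is unchanged.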
-
Patent number: 12632269
Abstract: A method for improving runtime performance and alleviating place and route issues in a reconfigurable computing system includes receiving a compute graph for execution on a reconfigurable dataflow processor. The compute graph includes a node specifying a template-based operation on a first and second tensor having a shared batch dimension B. The node may be split into B nodes. Each of the template-based operations on the pair of tensors may be replaced with a GeMM operation on the first reduced rank tensor slice and a tile. B nodes that specify the GeMM operation may be appended with at least one first addition node that accepts input from the B nodes to produce a first modified compute graph. The first modified compute graph may be executed. The method describes a significant improvement to overall compute utilization across gradient-sections. Spatial tiling of tensors facilitates gradient calculation without the use of accumulators.
Type: Grant
Filed: May 26, 2023
Date of Patent: May 19, 2026
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Leon Zhang
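One way to read the split into B GeMM nodes followed by an addition node is as a sum of per-batch matrix products. The sketch below assumes that reading; the helper names and the tiny tensors are hypothetical, and tiling is not modeled.

```python
def gemm(a, b):
    """Plain matrix multiply on nested lists: (m x k) @ (k x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(mats):
    """Addition node: elementwise sum of equally shaped matrices."""
    return [[sum(m[i][j] for m in mats) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


def split_and_add(x, y):
    """Replace one batched node with B GeMM nodes feeding a single addition node."""
    partials = [gemm(x[b], y[b]) for b in range(len(x))]  # one GeMM per batch slice
    return add(partials)


# Hypothetical tensors with shared batch dimension B = 2.
x = [[[1, 2]], [[3, 4]]]           # two 1 x 2 slices
y = [[[1], [1]], [[2], [0]]]       # two 2 x 1 slices
total = split_and_add(x, y)
```

Each slice drops the batch dimension, so every GeMM node works on a tensor of reduced rank.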
-
Publication number: 20260064627
Abstract: A method for a reconfigurable computing system includes receiving a compute graph for execution on multiple RDPs interconnected with a ring network having R interconnected RDPs. A compute graph with a node specifying a reduction operation for a first and second tensor is detected. The compute graph is executed on the multiple RDPs.
Type: Application
Filed: November 7, 2025
Publication date: March 5, 2026
Applicant: SambaNova Systems, Inc.
Inventor: Mingran WANG
-
Patent number: 12487965
Abstract: A method for a reconfigurable computing system includes receiving a compute graph for execution on multiple RDPs interconnected with a ring network having R interconnected RDPs. A compute graph with a node specifying a reduction operation for a first and second tensor is detected. The detected compute graph node is partitioned into a compute subgraph corresponding to an RDP of the R interconnected RDPs. A first node is inserted into the compute subgraph that specifies a partial reduction operation for producing a partial reduction result corresponding to a shard of the first tensor and a shard of the second tensor. A second node is inserted for communicating the partial reduction result to an adjacent RDP. A third node is inserted that specifies a reduction operation for producing a total reduction result. A fourth node is inserted for communicating the total reduction result to at least one other RDP.
Type: Grant
Filed: June 9, 2023
Date of Patent: December 2, 2025
Assignee: SambaNova Systems, Inc.
Inventor: Mingran Wang
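The four inserted nodes can be simulated on a single machine. The sketch below assumes the reduction is a dot product of two vectors sharded across the R devices; that choice, the function name, and the data are hypothetical, and communication is modeled as plain assignment.

```python
def ring_reduce(shards_a, shards_b):
    """Simulate R ring-connected devices reducing two sharded vectors."""
    r_count = len(shards_a)
    # First node: each device reduces its own pair of shards.
    partial = [sum(a * b for a, b in zip(shards_a[r], shards_b[r]))
               for r in range(r_count)]
    # Second and third nodes: a running sum moves to the adjacent device,
    # which adds its own partial result, until the total is complete.
    running = partial[0]
    for r in range(1, r_count):
        running += partial[r]
    # Fourth node: the last device forwards the total around the ring.
    results = [None] * r_count
    holder = r_count - 1
    for step in range(r_count):
        results[(holder + step) % r_count] = running
    return partial, results


# Hypothetical data: R = 3 devices, two elements per shard.
shards_a = [[1, 2], [3, 4], [5, 6]]
shards_b = [[1, 1], [2, 2], [1, 0]]
partial, results = ring_reduce(shards_a, shards_b)
```

Only neighbor-to-neighbor transfers are used, which matches the ring topology named in the abstract.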
-
Publication number: 20250278242
Abstract: A circuit for function calculation includes input lines for providing an input value, a first lookup table with an address input connected to a first subset of the input lines to generate a first output, and a second lookup table with an address input connected to a second subset of the input lines to generate a second output. A range detector is connected to at least some of the input lines to produce a range output indicating whether the input value is within a specified range. An output stage, connected to the first and second outputs and the range output, generates the function calculation output for the input value based on these outputs.
Type: Application
Filed: May 16, 2025
Publication date: September 4, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
-
Patent number: 12333270
Abstract: A computation unit includes input lines to provide a floating-point value, a first lookup table, a second lookup table, a range detector, and an output stage. The input lines include exponent lines and mantissa lines. The first lookup table has a first address input coupled to a first subset of the input lines to provide a first output. The second lookup table has a second address input coupled to a second subset of the input lines to provide a second output. The range detector is coupled to at least some of the input lines and indicates whether the floating-point value provided on the input lines is within a specified range on a range output. The output stage is operatively coupled to the first output, the second output and the range output, to generate a function output based on the first output, the second output, and the range output.
Type: Grant
Filed: May 5, 2022
Date of Patent: June 17, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
-
Publication number: 20230409520
Abstract: A method for a reconfigurable computing system includes receiving a compute graph for execution on multiple RDPs interconnected with a ring network having R interconnected RDPs. A compute graph with a node specifying a reduction operation for a first and second tensor is detected. The detected compute graph node is partitioned into a compute subgraph corresponding to an RDP of the R interconnected RDPs. A first node is inserted into the compute subgraph that specifies a partial reduction operation for producing a partial reduction result corresponding to a shard of the first tensor and a shard of the second tensor. A second node is inserted for communicating the partial reduction result to an adjacent RDP. A third node is inserted that specifies a reduction operation for producing a total reduction result. A fourth node is inserted for communicating the total reduction result to at least one other RDP.
Type: Application
Filed: June 9, 2023
Publication date: December 21, 2023
Applicant: SambaNova Systems, Inc.
Inventor: Mingran WANG
-
Publication number: 20230385077
Abstract: A method for improving runtime performance and alleviating place and route issues in a reconfigurable computing system includes receiving a compute graph for execution on a reconfigurable dataflow processor. The compute graph includes a node specifying a template-based operation on a first and second tensor having a shared batch dimension B. The node may be split into B nodes. Each of the template-based operations on the pair of tensors may be replaced with a GeMM operation on the first reduced rank tensor slice and a tile. B nodes that specify the GeMM operation may be appended with at least one first addition node that accepts input from the B nodes to produce a first modified compute graph. The first modified compute graph may be executed. The method describes a significant improvement to overall compute utilization across gradient-sections. Spatial tiling of tensors facilitates gradient calculation without the use of accumulators.
Type: Application
Filed: May 26, 2023
Publication date: November 30, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Mingran WANG, Leon ZHANG
-
Patent number: 11443014
Abstract: The technology disclosed relates to matrix multiplication where the multiplier can be a sparse matrix. In particular, a multiplication device includes first circuitry configured to obtain the multiplicand matrix and an index of columns of the multiplier matrix and to generate an intermediate matrix that has one row per entry in the index copied from a respective row of the multiplicand matrix based on a value of a corresponding entry in the index. The device also includes second circuitry configured to receive the intermediate matrix from the first circuitry, obtain non-zero values of the multiplier matrix and a list of a number of non-zero entries per row of the multiplier matrix, and generate a product matrix as a result of multiplies of the non-zero values of the multiplier matrix and the intermediate matrix.
Type: Grant
Filed: November 5, 2021
Date of Patent: September 13, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Raghu Prabhakar, Darshan Dhimantkumar Gandhi, Maulik Subhash Desai, Nathan Francis Sheeley, Scott Layson Burson, Sitanshu Gupta
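The two stages in this abstract resemble a gather followed by a scale-and-accumulate. The sketch below assumes the product is the sparse multiplier times the dense multiplicand, with the sparse matrix held as non-zero values, their column indices, and a count of non-zeros per row; the function and argument names are hypothetical.

```python
def sparse_times_dense(values, col_index, nnz_per_row, dense):
    """Multiply a sparse matrix (values, column index, per-row counts) by a dense one."""
    # Stage 1: build the intermediate matrix, one dense row per non-zero entry,
    # chosen by that entry's column index.
    intermediate = [dense[c] for c in col_index]
    # Stage 2: scale each gathered row by its non-zero value and accumulate
    # the rows that belong to the same row of the sparse matrix.
    width = len(dense[0])
    product = []
    k = 0
    for count in nnz_per_row:
        row = [0] * width
        for _ in range(count):
            for j in range(width):
                row[j] += values[k] * intermediate[k][j]
            k += 1
        product.append(row)
    return product


# Hypothetical sparse multiplier [[2, 0, 0], [0, 0, 3], [0, 0, 0]].
values = [2, 3]
col_index = [0, 2]
nnz_per_row = [1, 1, 0]
dense = [[1, 2], [3, 4], [5, 6]]
product = sparse_times_dense(values, col_index, nnz_per_row, dense)
```

The work in the second stage scales with the number of non-zero entries rather than with the full size of the sparse matrix.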
-
Publication number: 20220261220
Abstract: A computation unit includes input lines to provide a floating-point value, a first lookup table, a second lookup table, a range detector, and an output stage. The input lines include exponent lines and mantissa lines. The first lookup table has a first address input coupled to a first subset of the input lines to provide a first output. The second lookup table has a second address input coupled to a second subset of the input lines to provide a second output. The range detector is coupled to at least some of the input lines and indicates whether the floating-point value provided on the input lines is within a specified range on a range output. The output stage is operatively coupled to the first output, the second output and the range output, to generate a function output based on the first output, the second output, and the range output.
Type: Application
Filed: May 5, 2022
Publication date: August 18, 2022
Applicant: SambaNova Systems, Inc.
Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
-
Patent number: 11327923
Abstract: A functional unit for a data processor comprises an input register to store a variable X; a first circuit, having an input connected to the input register and an output, to generate a value e^X on its output; a second circuit, having an input connected to the input register and an output, to generate an output which is a value (tanh(X/2)+1)/2 on its output; a comparator, having an input connected to the input register and an output, to generate a line on its output based on a comparison between X and a constant; and a selector to select between inputs connected to the outputs of the first circuit and the second circuit, in response to the output of the comparator, and provide an output representing a value sigmoid(X).
Type: Grant
Filed: September 4, 2019
Date of Patent: May 10, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Mark Luttrell, Yongning Sheng
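The selection between the two circuits can be sketched numerically. The identity sigmoid(X) = (tanh(X/2)+1)/2 holds for all X, and for strongly negative X the value e^X is a close approximation of sigmoid(X). The threshold of -8.0 and the direction of the comparison are assumptions made for illustration; the abstract only says that X is compared with a constant.

```python
import math


def sigmoid_unit(x, threshold=-8.0):
    """Select between e^x and (tanh(x/2) + 1) / 2 based on a comparison with a constant."""
    if x < threshold:                      # comparator output (assumed direction)
        return math.exp(x)                 # first circuit: e^x
    return (math.tanh(x / 2) + 1) / 2      # second circuit: exact sigmoid identity


def reference_sigmoid(x):
    """Textbook definition, used only for comparison."""
    return 1 / (1 + math.exp(-x))


samples = [-20.0, -10.0, -1.0, 0.0, 1.0, 10.0]
errors = [abs(sigmoid_unit(x) - reference_sigmoid(x)) for x in samples]
```

A plausible motivation for the split is that tanh(X/2)+1 loses precision when tanh(X/2) is very close to -1, while e^X stays accurate there.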
-
Patent number: 11328038
Abstract: Herein are disclosed computation units for batch normalization. A computation unit may include a first circuit to traverse a batch of input elements x_i having a first format, to produce a mean μ1 in the first format and a mean μ2 in a second format, the second format having more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to traverse the batch of input elements x_i to produce a standard deviation σ for the batch using the mean μ1 in the first format. The computation unit may also include a third circuit operatively coupled to the second circuit to traverse the batch of input elements x_i to produce a normalized set of values y_i using the mean μ2 in the second format and the standard deviation σ.
Type: Grant
Filed: November 25, 2019
Date of Patent: May 10, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
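The three traversals can be mimicked in software. In this sketch, IEEE half precision stands in for the first format and Python's 64-bit float for the wider second format; both choices, the small constant `eps`, and the function names are assumptions and do not come from the patent.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to IEEE half precision (stand-in for the first format)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def batch_norm(xs, eps=1e-5):
    """Normalize a batch using a narrow mean for the deviation and a wide mean for centering."""
    n = len(xs)
    mu2 = sum(xs) / n                            # first pass: mean in the wider format
    mu1 = to_fp16(mu2)                           # same mean rounded to the narrower format
    var = sum((x - mu1) ** 2 for x in xs) / n    # second pass: deviation uses mu1
    sigma = math.sqrt(var + eps)                 # eps is an assumed stabilizer
    return [(x - mu2) / sigma for x in xs]       # third pass: centering uses mu2


batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)
```

Centering with the wider mean keeps the output close to zero mean even when the narrow mean has been rounded.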
-
Patent number: 11327713
Abstract: A computation unit comprises a floating point input having X bits including a sign bit, an E-bit exponent and an M-bit mantissa. A first circuit is operatively coupled to receive X-N bits of the input, including e1 bits of the exponent and m1 bits of the mantissa, where e1≤E, and m1≤M, to output values over a first domain of the input. A second circuit is operatively coupled to receive X-K bits of the input, including e2 bits of the exponent, e2<e1, and m2 bits of the mantissa, m2>m1, to output values over a second domain of the input. A range detector is operatively coupled to the input, to indicate a range in response to a value of the input. A selector can select the output of the first circuit or of the second circuit in response to the range detector.
Type: Grant
Filed: October 1, 2019
Date of Patent: May 10, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
-
Patent number: 11327717
Abstract: A computation unit computes a function f(I). The function f(I) has a target output range over a first domain of an input I encoded using a first format. A first circuit receives the encoded input I in the first format including X bits, to add an offset C to the encoded input I to generate an offset input SI=I+C, in a second format including fewer than X bits. The offset C is equal to a difference between the first domain in f(I) and a higher precision domain of the second format for the offset input SI. A second circuit is operatively coupled to receive the offset input SI in the second format, to output a value equal to a function f(SI) to provide an encoded output value f(I).
Type: Grant
Filed: November 19, 2019
Date of Patent: May 10, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
-
Patent number: 11250105
Abstract: A computation unit that comprises (i) a multiplicand vector decomposer that generates a decomposed multiplicand vector which uses a sequence of first and second concatenated multiplicand sub-elements (1st2ndCMCSE) in a lower-precision format (LPF) to represent corresponding ones of multiplicand elements in a multiplicand vector in a higher-precision format (HPF), (ii) a multiplier vector decomposer that generates a decomposed multiplier vector which uses a sequence of first and second concatenated multiplier sub-elements (1st2ndCMLSE) in the LPF to represent corresponding ones of multiplier elements in a multiplier vector in the HPF, (iii) a multiplicand tensor encoder that encodes double reads of the sequence of the 1st2ndCMCSE in a decomposed multiplicand tensor, and (iv) a product vector generator that generates a product vector containing a sequence of first and second concatenated product sub-elements by executing general matrix-matrix multiplication (GeMM) operations between the double reads of the 1st2
Type: Grant
Filed: May 12, 2020
Date of Patent: February 15, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
-
Publication number: 20210357475
Abstract: A computation unit that comprises (i) a multiplicand vector decomposer that generates a decomposed multiplicand vector which uses a sequence of first and second concatenated multiplicand sub-elements (1st2ndCMCSE) in a lower-precision format (LPF) to represent corresponding ones of multiplicand elements in a multiplicand vector in a higher-precision format (HPF), (ii) a multiplier vector decomposer that generates a decomposed multiplier vector which uses a sequence of first and second concatenated multiplier sub-elements (1st2ndCMLSE) in the LPF to represent corresponding ones of multiplier elements in a multiplier vector in the HPF, (iii) a multiplicand tensor encoder that encodes double reads of the sequence of the 1st2ndCMCSE in a decomposed multiplicand tensor, and (iv) a product vector generator that generates a product vector containing a sequence of first and second concatenated product sub-elements by executing general matrix-matrix multiplication (GeMM) operations between the double reads of the 1st2
Type: Application
Filed: May 12, 2020
Publication date: November 18, 2021
Applicant: SambaNova Systems, Inc.
Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
-
Patent number: 11150872
Abstract: Herein are disclosed computation units for element approximation. A computation unit may include a first circuit to compute a first projection ? of an input element x_i from a first range to a second range. In the first circuit, the input element x_i may have a first format and the projected element y_i may have a second format. In addition, in the first circuit, the second format may have more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to produce a reduction z_i in the first format using the projected element y_i in the second format. The computation unit may also include a third circuit operatively coupled to the second circuit to compute a second projection ? of the reduction z_i from the second range to the first range to produce an approximation w_i.
Type: Grant
Filed: December 17, 2019
Date of Patent: October 19, 2021
Assignee: SambaNova Systems, Inc.
Inventors: Mingran Wang, Xiaoyan Li, Mark Luttrell, Yongning Sheng, Gregory Frederick Grohoski
-
Publication number: 20210182021
Abstract: Herein are disclosed computation units for element approximation. A computation unit may include a first circuit to compute a first projection ? of an input element x_i from a first range to a second range. In the first circuit, the input element x_i may have a first format and the projected element y_i may have a second format. In addition, in the first circuit, the second format may have more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to produce a reduction z_i in the first format using the projected element y_i in the second format. The computation unit may also include a third circuit operatively coupled to the second circuit to compute a second projection ? of the reduction z_i from the second range to the first range to produce an approximation w_i.
Type: Application
Filed: December 17, 2019
Publication date: June 17, 2021
Applicant: SambaNova Systems, Inc.
Inventors: Mingran WANG, Xiaoyan Li, Mark Luttrell, Yongning Sheng, Gregory Frederick Grohoski