Patents by Inventor Mingran WANG

Mingran WANG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230409520
    Abstract: A method for a reconfigurable computing system includes receiving a compute graph for execution on multiple RDPs interconnected with a ring network having R interconnected RDPs. A compute graph with a node specifying a reduction operation for a first and second tensor is detected. The detected compute graph node is partitioned into a compute subgraph corresponding to an RDP of the R interconnected RDPs. A first node is inserted into the compute subgraph that specifies a partial reduction operation for producing a partial reduction result corresponding to a shard of the first tensor and a shard of the second tensor. A second node is inserted for communicating the partial reduction result to an adjacent RDP. A third node is inserted that specifies a reduction operation for producing a total reduction result. A fourth node is inserted for communicating the total reduction result to at least one other RDP.
    Type: Application
    Filed: June 9, 2023
    Publication date: December 21, 2023
    Applicant: SambaNova Systems, Inc.
    Inventor: Mingran WANG
  • Publication number: 20230385077
    Abstract: A method for improving runtime performance and alleviating place and route issues in a reconfigurable computing system includes receiving a compute graph for execution on a reconfigurable dataflow processor. The compute graph includes a node specifying a template-based operation on a first and second tensor having a shared batch dimension B. The node may be split into B nodes. Each of the template-based operations on the pair of tensors may be replaced with a GeMM operation on the first reduced rank tensor slice and a tile. B nodes that specify the GeMM operation may be appended with at least one first addition node that accepts input from the B nodes to produce a first modified compute graph. The first modified compute graph may be executed. The method describes a significant improvement to overall compute utilization across gradient-sections. Spatial tiling of tensors facilitates gradient calculation without the use of accumulators.
    Type: Application
    Filed: May 26, 2023
    Publication date: November 30, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Leon ZHANG
  • Patent number: 11443014
    Abstract: The technology disclosed relates to matrix multiplication where the multiplier can be a sparse matrix. In particular, a multiplication device includes first circuitry configured to obtain the multiplicand matrix and an index of columns of the multiplier matrix and to generate an intermediate matrix that has one row per entry in the index copied from a respective row of the multiplicand matrix based on a value of a corresponding entry in the index. The device also includes second circuitry configured to receive the intermediate matrix from the first circuitry, obtain non-zero values of the multiplier matrix and a list of a number of non-zero entries per row of the multiplier matrix, and generate a product matrix as a result of multiplies of the non-zero values of the multiplier matrix and the intermediate matrix.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: September 13, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Raghu Prabhakar, Darshan Dhimantkumar Gandhi, Maulik Subhash Desai, Nathan Francis Sheeley, Scott Layson Burson, Sitanshu Gupta
  • Publication number: 20220261220
    Abstract: A computation unit includes input lines to provide a floating-point value, a first lookup table, a second lookup table, a range detector, and an output stage. The input lines include exponent lines and mantissa lines. The first lookup table has a first address input coupled to a first subset of the input lines to provide a first output. The second lookup table has a second address input coupled to a second subset of the input lines to provide a second output. The range detector is coupled to at least some of the input lines and indicates whether the floating-point value provided on the input lines is within a specified range on a range output. The output stage is operatively coupled to the first output, the second output and the range output, to generate a function output based on the first output, the second output, and the range output.
    Type: Application
    Filed: May 5, 2022
    Publication date: August 18, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
  • Patent number: 11328038
    Abstract: Herein are disclosed computation units for batch normalization. A computation unit may include a first circuit to traverse a batch of input elements xi having a first format, to produce a mean μ1 in the first format and a mean μ2 in a second format, the second format having more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to traverse the batch of input elements xi to produce a standard deviation σ for the batch using the mean μ1 in the first format. The computation unit may also include a third circuit operatively coupled to the second circuit to traverse the batch of input elements xi to produce a normalized set of values yi using the mean μ2 in the second format and the standard deviation σ.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: May 10, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
  • Patent number: 11327923
    Abstract: A functional unit for a data processor comprises an input register to store a variable X; a first circuit, having an input connected to the input register and an output, to generate a value e^X on its output; a second circuit, having an input connected to the input register and an output, to generate an output which is a value (tanh(X/2)+1)/2 on its output; a comparator, having an input connected to the input register and an output, to generate a line on its output based on a comparison between X and a constant; and a selector to select between inputs connected to the outputs of the first circuit and the second circuit, in response to the output of the comparator, and provide an output representing a value sigmoid(X).
    Type: Grant
    Filed: September 4, 2019
    Date of Patent: May 10, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Mark Luttrell, Yongning Sheng
  • Patent number: 11327713
    Abstract: A computation unit comprises a floating point input having X bits including a sign bit, an E bit exponent and an M bit mantissa. A first circuit is operatively coupled to receive X-N bits of the input, including e1 bits of the exponent and m1 bits of the mantissa, where e1≤E, and m1≤M, to output values over a first domain of the input. A second circuit is operatively coupled to receive X-K bits of the input, including e2 bits of the exponent, e2<e1, and m2 bits of the mantissa, m2>m1, to output values, over a second domain of the input. A range detector is operatively coupled to the input, to indicate a range in response to a value of the input. A selector can select the output of the first circuit or of the second circuit in response to the range detector.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: May 10, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
  • Patent number: 11327717
    Abstract: A computation unit computes a function f(I). The function f(I) has a target output range over a first domain of an input I encoded using a first format. A first circuit receives the encoded input I in the first format including X bits, to add an offset C to the encoded input I to generate an offset input SI=I+C, in a second format including fewer than X bits. The offset C is equal to a difference between the first domain in f(I) and a higher precision domain of the second format for the offset input SI. A second circuit is operatively coupled to receive the offset input SI in the second format, to output a value equal to a function f(SI) to provide an encoded output value f(I).
    Type: Grant
    Filed: November 19, 2019
    Date of Patent: May 10, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
  • Patent number: 11250105
    Abstract: A computation unit that comprises (i) a multiplicand vector decomposer that generates a decomposed multiplicand vector which uses a sequence of first and second concatenated multiplicand sub-elements (1st2ndCMCSE) in a lower-precision format (LPF) to represent corresponding ones of multiplicand elements in a multiplicand vector in a higher-precision format (HPF), (ii) a multiplier vector decomposer that generates a decomposed multiplier vector which uses a sequence of first and second concatenated multiplier sub-elements (1st2ndCMLSE) in the LPF to represent corresponding ones of multiplier elements in a multiplier vector in the HPF, (iii) a multiplicand tensor encoder that encodes double reads of the sequence of the 1st2ndCMCSE in a decomposed multiplicand tensor, and (iv) a product vector generator that generates a product vector containing a sequence of first and second concatenated product sub-elements by executing general matrix-matrix multiplication (GeMM) operations between the double reads of the 1st2
    Type: Grant
    Filed: May 12, 2020
    Date of Patent: February 15, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
  • Publication number: 20210357475
    Abstract: A computation unit that comprises (i) a multiplicand vector decomposer that generates a decomposed multiplicand vector which uses a sequence of first and second concatenated multiplicand sub-elements (1st2ndCMCSE) in a lower-precision format (LPF) to represent corresponding ones of multiplicand elements in a multiplicand vector in a higher-precision format (HPF), (ii) a multiplier vector decomposer that generates a decomposed multiplier vector which uses a sequence of first and second concatenated multiplier sub-elements (1st2ndCMLSE) in the LPF to represent corresponding ones of multiplier elements in a multiplier vector in the HPF, (iii) a multiplicand tensor encoder that encodes double reads of the sequence of the 1st2ndCMCSE in a decomposed multiplicand tensor, and (iv) a product vector generator that generates a product vector containing a sequence of first and second concatenated product sub-elements by executing general matrix-matrix multiplication (GeMM) operations between the double reads of the 1st2
    Type: Application
    Filed: May 12, 2020
    Publication date: November 18, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
  • Patent number: 11150872
    Abstract: Herein are disclosed computation units for element approximation. A computation unit may include a first circuit to compute a first projection of an input element xi from a first range to a second range. In the first circuit, the input element xi may have a first format and the projected element yi may have a second format. In addition, in the first circuit, the second format may have more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to produce a reduction zi in the first format using the projected element yi in the second format. The computation unit may also include a third circuit operatively coupled to the second circuit to compute a second projection of the reduction zi from the second range to the first range to produce an approximation wi.
    Type: Grant
    Filed: December 17, 2019
    Date of Patent: October 19, 2021
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Xiaoyan Li, Mark Luttrell, Yongning Sheng, Gregory Frederick Grohoski
  • Publication number: 20210182021
    Abstract: Herein are disclosed computation units for element approximation. A computation unit may include a first circuit to compute a first projection of an input element xi from a first range to a second range. In the first circuit, the input element xi may have a first format and the projected element yi may have a second format. In addition, in the first circuit, the second format may have more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to produce a reduction zi in the first format using the projected element yi in the second format. The computation unit may also include a third circuit operatively coupled to the second circuit to compute a second projection of the reduction zi from the second range to the first range to produce an approximation wi.
    Type: Application
    Filed: December 17, 2019
    Publication date: June 17, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Xiaoyan Li, Mark Luttrell, Yongning Sheng, Gregory Frederick Grohoski
  • Publication number: 20210157550
    Abstract: Herein are disclosed computation units for batch normalization. A computation unit may include a first circuit to traverse a batch of input elements xi having a first format, to produce a mean μ1 in the first format and a mean μ2 in a second format, the second format having more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to traverse the batch of input elements xi to produce a standard deviation σ for the batch using the mean μ1 in the first format. The computation unit may also include a third circuit operatively coupled to the second circuit to traverse the batch of input elements xi to produce a normalized set of values yi using the mean μ2 in the second format and the standard deviation σ.
    Type: Application
    Filed: November 25, 2019
    Publication date: May 27, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
  • Publication number: 20210149634
    Abstract: A computation unit computes a function f(I). The function f(I) has a target output range over a first domain of an input I encoded using a first format. A first circuit receives the encoded input I in the first format including X bits, to add an offset C to the encoded input I to generate an offset input SI=I+C, in a second format including fewer than X bits. The offset C is equal to a difference between the first domain in f(I) and a higher precision domain of the second format for the offset input SI. A second circuit is operatively coupled to receive the offset input SI in the second format, to output a value equal to a function f(SI) to provide an encoded output value f(I).
    Type: Application
    Filed: November 19, 2019
    Publication date: May 20, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
  • Publication number: 20210096816
    Abstract: A computation unit comprises a floating point input having X bits including a sign bit, an E bit exponent and an M bit mantissa. A first circuit is operatively coupled to receive X-N bits of the input, including e1 bits of the exponent and m1 bits of the mantissa, where e1≤E, and m1≤M, to output values over a first domain of the input. A second circuit is operatively coupled to receive X-K bits of the input, including e2 bits of the exponent, e2<e1, and m2 bits of the mantissa, m2>m1, to output values, over a second domain of the input. A range detector is operatively coupled to the input, to indicate a range in response to a value of the input. A selector can select the output of the first circuit or of the second circuit in response to the range detector.
    Type: Application
    Filed: October 1, 2019
    Publication date: April 1, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG
  • Publication number: 20210064568
    Abstract: A functional unit for a data processor comprises an input register to store a variable X; a first circuit, having an input connected to the input register and an output, to generate a value e^X on its output; a second circuit, having an input connected to the input register and an output, to generate an output which is a value (tanh(X/2)+1)/2 on its output; a comparator, having an input connected to the input register and an output, to generate a line on its output based on a comparison between X and a constant; and a selector to select between inputs connected to the outputs of the first circuit and the second circuit, in response to the output of the comparator, and provide an output representing a value sigmoid(X).
    Type: Application
    Filed: September 4, 2019
    Publication date: March 4, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Mingran WANG, Mark LUTTRELL, Yongning SHENG
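
Illustrative note on the sigmoid functional unit (patent 11327923 / publication 20210064568): the abstract describes two paths, one producing e^X and one producing (tanh(X/2)+1)/2, with a comparator and selector choosing between them. A minimal software sketch of that selection is shown below. It is not the patented circuit. The function names, the THRESHOLD value, and the direction of the comparison are assumptions made for illustration, since the abstract only states that X is compared with "a constant". The sketch relies on two standard facts: (tanh(X/2)+1)/2 is algebraically equal to sigmoid(X), and e^X approaches sigmoid(X) as X becomes very negative.

```python
import math

# Hypothetical crossover point; the abstract only mentions "a constant".
THRESHOLD = -20.0


def exp_path(x: float) -> float:
    """First path: e^X, close to sigmoid(X) when X is very negative."""
    return math.exp(x)


def tanh_path(x: float) -> float:
    """Second path: (tanh(X/2) + 1) / 2, algebraically equal to sigmoid(X)."""
    return (math.tanh(x / 2.0) + 1.0) / 2.0


def sigmoid(x: float) -> float:
    """Selector: pick a path based on the comparator result (assumed X < constant)."""
    use_exp = x < THRESHOLD  # comparator output
    return exp_path(x) if use_exp else tanh_path(x)
```

One plausible reason for having two paths, offered here as an interpretation rather than a statement from the abstract, is that tanh(X/2)+1 cancels toward zero for very negative X, whereas e^X keeps its relative precision in that region.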