Patents by Inventor Mingran WANG

Mingran WANG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Calculating a floating-point function using multiple lookup tables

Patent number: 12333270

Abstract: A computation unit includes input lines to provide a floating-point value, a first lookup table, a second lookup table, a range detector, and an output stage. The input lines include exponent lines and mantissa lines. The first lookup table has a first address input coupled to a first subset of the input lines to provide a first output. The second lookup table has a second address input coupled to a second subset of the input lines to provide a second output. The range detector is coupled to at least some of the input lines and indicates whether the floating-point value provided on the input lines is within a specified range on a range output. The output stage is operatively coupled to the first output, the second output and the range output, to generate a function output based on the first output, the second output, and the range output.

Type: Grant

Filed: May 5, 2022

Date of Patent: June 17, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng

All Reduce Across Multiple Reconfigurable Dataflow Processors

Publication number: 20230409520

Abstract: A method for a reconfigurable computing system includes receiving a compute graph for execution on multiple RDPs interconnected with a ring network having R interconnected RDPs. A compute graph with a node specifying a reduction operation for a first and second tensor is detected. The detected compute graph node is partitioned into a compute subgraph corresponding to an RDP of the R interconnected RDPs. A first node is inserted into the compute subgraph that specifies a partial reduction operation for producing a partial reduction result corresponding to a shard of the first tensor and a shard of the second tensor. A second node is inserted for communicating the partial reduction result to an adjacent RDP. A third node is inserted that specifies a reduction operation for producing a total reduction result. A fourth node is inserted for communicating the total reduction result to at least one other RDP.

Type: Application

Filed: June 9, 2023

Publication date: December 21, 2023

Applicant: SambaNova Systems, Inc.

Inventor: Mingran WANG
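
The steps in the abstract above (partial reduction per shard, passing the partial result to an adjacent device, a final reduction, then passing the total result on) resemble the classic ring all-reduce pattern. The sketch below simulates that general pattern in plain Python for R workers on a ring. It is not the patented compiler transformation; the function name `ring_all_reduce` and the list-based buffers are illustrative assumptions, and the reduction is assumed to be elementwise addition.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) over R equally sized buffers.

    buffers[i] is the local vector held by worker i. Hypothetical helper:
    the name and data layout are assumptions made for illustration.
    """
    r = len(buffers)
    n = len(buffers[0])
    assert n % r == 0, "vector length must divide evenly into R shards"
    chunk = n // r
    data = [list(b) for b in buffers]  # copy so the inputs stay untouched

    def shard(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1 (reduce-scatter): each worker sends one shard to its right-hand
    # neighbour, which adds it to its own copy (a partial reduction).
    for step in range(r - 1):
        sends = [((rank - step) % r, data[rank][shard((rank - step) % r)])
                 for rank in range(r)]
        for rank in range(r):
            c, payload = sends[(rank - 1) % r]  # receive from left neighbour
            local = data[rank][shard(c)]
            data[rank][shard(c)] = [a + b for a, b in zip(local, payload)]

    # Worker i now holds the total for shard (i + 1) % r.
    # Phase 2 (all-gather): circulate the finished shards around the ring.
    for step in range(r - 1):
        sends = [((rank + 1 - step) % r, data[rank][shard((rank + 1 - step) % r)])
                 for rank in range(r)]
        for rank in range(r):
            c, payload = sends[(rank - 1) % r]
            data[rank][shard(c)] = list(payload)

    return data


result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
```

Each worker sends and receives only one shard per step, so the traffic per link stays roughly constant as R grows, which is the usual motivation for the ring layout.
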

Forward-style Gradient GeMMs

Publication number: 20230385077

Abstract: A method for improving runtime performance and alleviating place and route issues in a reconfigurable computing system includes receiving a compute graph for execution on a reconfigurable dataflow processor. The compute graph includes a node specifying a template-based operation on a first and second tensor having a shared batch dimension B. The node may be split into B nodes. Each of the template-based operations on the pair of tensors may be replaced with a GeMM operation on the first reduced rank tensor slice and a tile. B nodes that specify the GeMM operation may be appended with at least one first addition node that accepts input from the B nodes to produce a first modified compute graph. The first modified compute graph may be executed. The method describes a significant improvement to overall compute utilization across gradient-sections. Spatial tiling of tensors facilitates gradient calculation without the use of accumulators.

Type: Application

Filed: May 26, 2023

Publication date: November 30, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Leon ZHANG

Sparse matrix multiplier in hardware and a reconfigurable data processor including same

Patent number: 11443014

Abstract: The technology disclosed relates to matrix multiplication where the multiplier can be a sparse matrix. In particular, a multiplication device includes first circuitry configured to obtain the multiplicand matrix and an index of columns of the multiplier matrix and to generate an intermediate matrix that has one row per entry in the index copied from a respective row of the multiplicand matrix based on a value of a corresponding entry in the index. The device also includes second circuitry configured to receive the intermediate matrix from the first circuitry, obtain non-zero values of the multiplier matrix and a list of a number of non-zero entries per row of the multiplier matrix, and generate a product matrix as a result of multiplies of the non-zero values of the multiplier matrix and the intermediate matrix.

Type: Grant

Filed: November 5, 2021

Date of Patent: September 13, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Raghu Prabhakar, Darshan Dhimantkumar Gandhi, Maulik Subhash Desai, Nathan Francis Sheeley, Scott Layson Burson, Sitanshu Gupta
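
The two-stage flow in the abstract above can be sketched in software, assuming the sparse multiplier is stored in a compressed-row style (non-zero values, their column indices, and a count of non-zeros per row) and that the product is multiplier times multiplicand. The function names `gather_rows` and `sparse_matmul` are hypothetical, and the code only illustrates the data movement, not the patented circuitry.

```python
def gather_rows(multiplicand, col_index):
    """Stage 1: build the intermediate matrix, one row per index entry.

    Entry k of col_index selects which multiplicand row is copied to row k.
    """
    return [list(multiplicand[j]) for j in col_index]


def sparse_matmul(values, col_index, nnz_per_row, multiplicand):
    """Stage 2: scale each gathered row by its non-zero value and
    accumulate the rows that belong to the same multiplier row."""
    intermediate = gather_rows(multiplicand, col_index)
    width = len(multiplicand[0])
    product = []
    pos = 0
    for count in nnz_per_row:
        acc = [0] * width
        for k in range(pos, pos + count):
            for c in range(width):
                acc[c] += values[k] * intermediate[k][c]
        product.append(acc)
        pos += count
    return product


# Multiplier [[1, 0, 2], [0, 0, 3]] in compressed form (assumed layout):
values = [1, 2, 3]
col_index = [0, 2, 2]
nnz_per_row = [2, 1]
multiplicand = [[1, 2], [3, 4], [5, 6]]
product = sparse_matmul(values, col_index, nnz_per_row, multiplicand)
```

Only the three non-zero entries are multiplied here; the zeros in the multiplier never reach the second stage.
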

Calculating a Floating-Point Function using Multiple Lookup Tables

Publication number: 20220261220

Abstract: A computation unit includes input lines to provide a floating-point value, a first lookup table, a second lookup table, a range detector, and an output stage. The input lines include exponent lines and mantissa lines. The first lookup table has a first address input coupled to a first subset of the input lines to provide a first output. The second lookup table has a second address input coupled to a second subset of the input lines to provide a second output. The range detector is coupled to at least some of the input lines and indicates whether the floating-point value provided on the input lines is within a specified range on a range output. The output stage is operatively coupled to the first output, the second output and the range output, to generate a function output based on the first output, the second output, and the range output.

Type: Application

Filed: May 5, 2022

Publication date: August 18, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG

Sigmoid function in hardware and a reconfigurable data processor including same

Patent number: 11327923

Abstract: A functional unit for a data processor comprises an input register to store a variable X; a first circuit, having an input connected to the input register and an output, to generate a value e^X on its output; a second circuit, having an input connected to the input register and an output, to generate an output which is a value (tanh(X/2)+1)/2 on its output; a comparator, having an input connected to the input register and an output, to generate a line on its output based on a comparison between X and a constant; and a selector to select between inputs connected to the outputs of the first circuit and the second circuit, in response to the output of the comparator, and provide an output representing a value sigmoid(X).

Type: Grant

Filed: September 4, 2019

Date of Patent: May 10, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Mark Luttrell, Yongning Sheng
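
The selector scheme in the abstract above rests on two facts: sigmoid(X) equals (tanh(X/2)+1)/2 exactly, and for very negative X it is closely approximated by e^X. A minimal software sketch follows. The threshold value, the choice of the e^X path for inputs below it, and the name `sigmoid_select` are assumptions; the abstract only states that X is compared with a constant.

```python
import math

# Hypothetical switch-over point; the abstract only says "a constant".
THRESHOLD = -16.0


def sigmoid_select(x: float, threshold: float = THRESHOLD) -> float:
    """Pick between the two paths named in the abstract.

    Hardware would evaluate both circuits in parallel and let the selector
    choose; here only the chosen path is evaluated.
    """
    if x < threshold:
        # Far-negative inputs: sigmoid(x) = e^x / (1 + e^x) is close to e^x.
        return math.exp(x)
    # Elsewhere: exact identity sigmoid(x) = (tanh(x / 2) + 1) / 2.
    return (math.tanh(x / 2.0) + 1.0) / 2.0


def sigmoid_reference(x: float) -> float:
    """Textbook definition, used only to check the sketch."""
    return 1.0 / (1.0 + math.exp(-x))
```

In floating point, tanh(X/2) rounds to -1 for sufficiently negative X, so the tanh path would return exactly zero there. Switching to e^X keeps small but non-zero outputs.
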

Computational units for batch normalization

Patent number: 11328038

Abstract: Herein are disclosed computation units for batch normalization. A computation unit may include a first circuit to traverse a batch of input elements xi having a first format, to produce a mean μ1 in the first format and a mean μ2 in a second format, the second format having more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to traverse the batch of input elements xi to produce a standard deviation σ for the batch using the mean μ1 in the first format. The computation unit may also include a third circuit operatively coupled to the second circuit to traverse the batch of input elements xi to produce a normalized set of values yi using the mean μ2 in the second format and the standard deviation σ.

Type: Grant

Filed: November 25, 2019

Date of Patent: May 10, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng
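
The three passes in the abstract above can be sketched as follows, assuming the first format is IEEE half precision and the second is single precision (the abstract only says the second format has more bits). The `eps` term and the helper names `to_half`, `to_single`, and `batch_norm` are also assumptions added for illustration.

```python
import math
import struct


def to_half(v: float) -> float:
    """Round a Python float to IEEE half precision (assumed first format)."""
    return struct.unpack("e", struct.pack("e", v))[0]


def to_single(v: float) -> float:
    """Round a Python float to IEEE single precision (assumed second format)."""
    return struct.unpack("f", struct.pack("f", v))[0]


def batch_norm(batch, eps=1e-5):
    """Three traversals of the batch, mirroring the three circuits.

    eps is an assumption (not in the abstract) that guards against a zero
    standard deviation.
    """
    n = len(batch)
    # Pass 1: one mean kept in the low-precision format, one in the wider one.
    mean = sum(batch) / n
    mu1 = to_half(mean)
    mu2 = to_single(mean)
    # Pass 2: standard deviation computed from the low-precision mean.
    variance = sum((x - mu1) ** 2 for x in batch) / n
    sigma = math.sqrt(variance + eps)
    # Pass 3: normalize using the higher-precision mean and sigma.
    return [(x - mu2) / sigma for x in batch]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

Here the mean 2.5 is representable in both formats, so the two means coincide. With less convenient inputs they differ slightly, which is where carrying the wider mean into the final pass matters.
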

Computation units for functions based on lookup tables

Patent number: 11327713

Abstract: A computation unit comprises a floating point input having X bits including a sign bit, an E bit exponent and an M bit mantissa. A first circuit is operatively coupled to receive X-N bits of the input, including e1 bits of the exponent and m1 bits of the mantissa, where e1≤E, and m1≤M, to output values over a first domain of the input. A second circuit is operatively coupled to receive X-K bits of the input, including e2 bits of the exponent, e2<e1, and m2 bits of the mantissa, m2>m1, to output values, over a second domain of the input. A range detector is operatively coupled to the input, to indicate a range in response to a value of the input. A selector can select the output of the first circuit or of the second circuit in response to the range detector.

Type: Grant

Filed: October 1, 2019

Date of Patent: May 10, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng

Look-up table with input offsetting

Patent number: 11327717

Abstract: A computation unit computes a function f(I). The function f(I) has a target output range over a first domain of an input I encoded using a first format. A first circuit receives the encoded input I in the first format including X bits, to add an offset C to the encoded input I to generate an offset input SI=I+C, in a second format including fewer than X bits. The offset C is equal to a difference between the first domain in f(I) and a higher precision domain of the second format for the offset input SI. A second circuit is operatively coupled to receive the offset input SI in the second format, to output a value equal to a function f(SI) to provide an encoded output value f(I).

Type: Grant

Filed: November 19, 2019

Date of Patent: May 10, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng

Computationally efficient general matrix-matrix multiplication (GeMM)

Patent number: 11250105

Abstract: A computation unit that comprises (i) a multiplicand vector decomposer that generates a decomposed multiplicand vector which uses a sequence of first and second concatenated multiplicand sub-elements (1st2ndCMCSE) in a lower-precision format (LPF) to represent corresponding ones of multiplicand elements in a multiplicand vector in a higher-precision format (HPF), (ii) a multiplier vector decomposer that generates a decomposed multiplier vector which uses a sequence of first and second concatenated multiplier sub-elements (1st2ndCMLSE) in the LPF to represent corresponding ones of multiplier elements in a multiplier vector in the HPF, (iii) a multiplicand tensor encoder that encodes double reads of the sequence of the 1st2ndCMCSE in a decomposed multiplicand tensor, and (iv) a product vector generator that generates a product vector containing a sequence of first and second concatenated product sub-elements by executing general matrix-matrix multiplication (GeMM) operations between the double reads of the 1st2

Type: Grant

Filed: May 12, 2020

Date of Patent: February 15, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Xiaoyan Li, Yongning Sheng

Computationally Efficient General Matrix-Matrix Multiplication (GeMM)

Publication number: 20210357475

Abstract: A computation unit that comprises (i) a multiplicand vector decomposer that generates a decomposed multiplicand vector which uses a sequence of first and second concatenated multiplicand sub-elements (1st2ndCMCSE) in a lower-precision format (LPF) to represent corresponding ones of multiplicand elements in a multiplicand vector in a higher-precision format (HPF), (ii) a multiplier vector decomposer that generates a decomposed multiplier vector which uses a sequence of first and second concatenated multiplier sub-elements (1st2ndCMLSE) in the LPF to represent corresponding ones of multiplier elements in a multiplier vector in the HPF, (iii) a multiplicand tensor encoder that encodes double reads of the sequence of the 1st2ndCMCSE in a decomposed multiplicand tensor, and (iv) a product vector generator that generates a product vector containing a sequence of first and second concatenated product sub-elements by executing general matrix-matrix multiplication (GeMM) operations between the double reads of the 1st2

Type: Application

Filed: May 12, 2020

Publication date: November 18, 2021

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG

Computational units for element approximation

Patent number: 11150872

Abstract: Herein are disclosed computation units for element approximation. A computation unit may include a first circuit to compute a first projection ? of an input element xi from a first range to a second range. In the first circuit, the input element xi may have a first format and the projected element yi may have a second format. In addition, in the first circuit, the second format may have more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to produce a reduction zi in the first format using the projected element yi in the second format. The computation unit may also include a third circuit operatively coupled to the second circuit to compute a second projection ? of the reduction zi from the second range to the first range to produce an approximation wi.

Type: Grant

Filed: December 17, 2019

Date of Patent: October 19, 2021

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Xiaoyan Li, Mark Luttrell, Yongning Sheng, Gregory Frederick Grohoski

Computational Units for Element Approximation

Publication number: 20210182021

Abstract: Herein are disclosed computation units for element approximation. A computation unit may include a first circuit to compute a first projection ? of an input element xi from a first range to a second range. In the first circuit, the input element xi may have a first format and the projected element yi may have a second format. In addition, in the first circuit, the second format may have more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to produce a reduction zi in the first format using the projected element yi in the second format. The computation unit may also include a third circuit operatively coupled to the second circuit to compute a second projection ? of the reduction zi from the second range to the first range to produce an approximation wi.

Type: Application

Filed: December 17, 2019

Publication date: June 17, 2021

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Xiaoyan Li, Mark Luttrell, Yongning Sheng, Gregory Frederick Grohoski

Computational Units for Batch Normalization

Publication number: 20210157550

Abstract: Herein are disclosed computation units for batch normalization. A computation unit may include a first circuit to traverse a batch of input elements xi having a first format, to produce a mean μ1 in the first format and a mean μ2 in a second format, the second format having more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to traverse the batch of input elements xi to produce a standard deviation σ for the batch using the mean μ1 in the first format. The computation unit may also include a third circuit operatively coupled to the second circuit to traverse the batch of input elements xi to produce a normalized set of values yi using the mean μ2 in the second format and the standard deviation σ.

Type: Application

Filed: November 25, 2019

Publication date: May 27, 2021

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG

LOOK-UP TABLE WITH INPUT OFFSETTING

Publication number: 20210149634

Abstract: A computation unit computes a function f(I). The function f(I) has a target output range over a first domain of an input I encoded using a first format. A first circuit receives the encoded input I in the first format including X bits, to add an offset C to the encoded input I to generate an offset input SI=I+C, in a second format including fewer than X bits. The offset C is equal to a difference between the first domain in f(I) and a higher precision domain of the second format for the offset input SI. A second circuit is operatively coupled to receive the offset input SI in the second format, to output a value equal to a function f(SI) to provide an encoded output value f(I).

Type: Application

Filed: November 19, 2019

Publication date: May 20, 2021

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG

COMPUTATION UNITS FOR FUNCTIONS BASED ON LOOKUP TABLES

Publication number: 20210096816

Abstract: A computation unit comprises a floating point input having X bits including a sign bit, an E bit exponent and an M bit mantissa. A first circuit is operatively coupled to receive X-N bits of the input, including e1 bits of the exponent and m1 bits of the mantissa, where e1≤E, and m1≤M, to output values over a first domain of the input. A second circuit is operatively coupled to receive X-K bits of the input, including e2 bits of the exponent, e2<e1, and m2 bits of the mantissa, m2>m1, to output values, over a second domain of the input. A range detector is operatively coupled to the input, to indicate a range in response to a value of the input. A selector can select the output of the first circuit or of the second circuit in response to the range detector.

Type: Application

Filed: October 1, 2019

Publication date: April 1, 2021

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Xiaoyan LI, Yongning SHENG

SIGMOID FUNCTION IN HARDWARE AND A RECONFIGURABLE DATA PROCESSOR INCLUDING SAME

Publication number: 20210064568

Abstract: A functional unit for a data processor comprises an input register to store a variable X; a first circuit, having an input connected to the input register and an output, to generate a value e^X on its output; a second circuit, having an input connected to the input register and an output, to generate an output which is a value (tanh(X/2)+1)/2 on its output; a comparator, having an input connected to the input register and an output, to generate a line on its output based on a comparison between X and a constant; and a selector to select between inputs connected to the outputs of the first circuit and the second circuit, in response to the output of the comparator, and provide an output representing a value sigmoid(X).

Type: Application

Filed: September 4, 2019

Publication date: March 4, 2021

Applicant: SambaNova Systems, Inc.

Inventors: Mingran WANG, Mark LUTTRELL, Yongning SHENG