Patents by Inventor Weihang FAN

Weihang FAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12164463
    Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system, comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These one or more logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then splitting the selected buffer, in response to the determining step, to produce first and second buffers. Dataflow through memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer splitting optimization reduces memory unit consumption.
    Type: Grant
    Filed: April 4, 2023
    Date of Patent: December 10, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Weihang FAN
  • Publication number: 20240370240
    Abstract: A system and method for transforming a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The high-level program is transformed into a dataflow graph that includes multiple interdependent asynchronously performing meta-pipelines. A first buffer is identified that stores data that is passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage. The system determines limitations associated with the array and selects for implementation the lowest-cost buffer topology, chosen from a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology, where cost is determined by the number of memory units and by the number of times data is written into a memory unit while traveling through the first buffer. Optimal configuration data for the array is generated and stored.
    Type: Application
    Filed: July 17, 2024
    Publication date: November 7, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Nathan Francis SHEELEY, Weihang FAN, Matheen MUSADDIQ, Ram SIVARAMAKRISHNAN
  • Patent number: 12045591
    Abstract: A compiler transforms a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The compiler includes a method that identifies a skip buffer in a dataflow graph, determines limitations associated with the array, and searches for a lowest-cost implementation topology and stage depth. At least three topologies are considered, including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology. The lowest-cost implementation topology and stage depth are based on the size of the buffered data (usually, the size of a tensor), the depth of the skip buffer, and the array's limitations. The hybrid buffer topology includes multiple sections of parallel memory units. Data travels from memory units in one section to adjacent memory units in the next section without intervening reorder buffers.
    Type: Grant
    Filed: September 14, 2022
    Date of Patent: July 23, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Nathan SHEELEY, Weihang FAN, Matheen MUSADDIQ, Ram SIVARAMAKRISHNAN
  • Publication number: 20230385043
    Abstract: A compiler transforms a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The compiler includes a method that identifies a skip buffer in a dataflow graph, determines limitations associated with the array, and searches for a lowest-cost implementation topology and stage depth. At least three topologies are considered, including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology. The lowest-cost implementation topology and stage depth are based on the size of the buffered data (usually, the size of a tensor), the depth of the skip buffer, and the array's limitations. The hybrid buffer topology includes multiple sections of parallel memory units. Data travels from memory units in one section to adjacent memory units in the next section without intervening reorder buffers.
    Type: Application
    Filed: September 14, 2022
    Publication date: November 30, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Nathan SHEELEY, Weihang FAN, Matheen MUSADDIQ, Ram SIVARAMAKRISHNAN
  • Publication number: 20230325346
    Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system, comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These one or more logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then splitting the selected buffer, in response to the determining step, to produce first and second buffers. Dataflow through memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer splitting optimization reduces memory unit consumption.
    Type: Application
    Filed: April 4, 2023
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Weihang FAN
  • Publication number: 20230325312
    Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.
    Type: Application
    Filed: October 27, 2022
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN
  • Publication number: 20230315411
    Abstract: A method for improving throughput in a reconfigurable computing system includes detecting, in an algebraic representation of a computing task for a reconfigurable dataflow processor, an outer meta-pipeline loop, detecting an inner meta-pipeline loop nested within the outer meta-pipeline loop, and determining that the inner meta-pipeline loop and the outer meta-pipeline loop each conduct a common operation. The method also includes fusing the common operation for the inner meta-pipeline loop and the outer meta-pipeline loop into a single operation within the inner meta-pipeline loop. The instances of the common operation may be fused if the output of a first instance of the common operation is the source for a second instance of the common operation. Examples of the common operation include an accumulator operation, a re-read operation, and a temporal (chip buffer synchronized) operation such as a temporal concatenation operation and a temporal slicing operation.
    Type: Application
    Filed: April 4, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Fei WANG, Weihang FAN, David Alan KOEPLINGER
  • Publication number: 20230297349
    Abstract: A computer-implemented method of transforming a high-level program for mapping onto a coarse-grained reconfigurable (CGR) processor with an array of CGR units, including sectioning a dataflow graph into a plurality of sections; extracting performance information for each of the plurality of sections; on a CGR unit: assigning to a section at least two computations dependent on a first data element; scheduling an additional load of the first data element in response to available memory bandwidth for that section; eliminating a buffer between the additional load of the first data element and one of the two computations, for that section; generating configuration data for the and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
    Type: Application
    Filed: March 15, 2023
    Publication date: September 21, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Gao DENG, Weihang FAN, Fei WANG, Yun DU
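The buffer-topology entries above (20240370240 and 12045591) describe picking the lowest-cost topology among cascaded, hybrid, and striped options, subject to the array's limitations. The Python sketch below illustrates only that final selection step, under stated assumptions: the `Candidate` type, the linear `cost` function and its weights, the single `max_units` limit, and the example numbers are all hypothetical and are not taken from the patents. The abstracts say cost depends on the number of memory units and on how many times data is written into a memory unit, but not how those two quantities are combined or how per-topology estimates are produced.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    topology: str      # "cascaded", "hybrid", or "striped"
    memory_units: int  # hypothetical estimate of memory units consumed
    writes: int        # hypothetical count of writes as data passes through the buffer


def cost(c: Candidate, unit_weight: float = 1.0, write_weight: float = 1.0) -> float:
    # Hypothetical linear cost model: a weighted sum of memory-unit usage
    # and write count. The real weighting is not stated in the abstracts.
    return unit_weight * c.memory_units + write_weight * c.writes


def select_topology(candidates: Sequence[Candidate], max_units: int) -> Optional[Candidate]:
    # Drop candidates that exceed the (assumed) memory-unit limit of the array,
    # then return the cheapest remaining one, or None if nothing fits.
    feasible = [c for c in candidates if c.memory_units <= max_units]
    if not feasible:
        return None
    return min(feasible, key=cost)


# Illustrative numbers only, not measurements from any real CGR array.
candidates = [
    Candidate("cascaded", memory_units=6, writes=6),   # cost 12
    Candidate("hybrid", memory_units=8, writes=3),     # cost 11
    Candidate("striped", memory_units=12, writes=1),   # cost 13
]

best = select_topology(candidates, max_units=10)
print(best.topology if best else "no feasible topology")  # hybrid
```

With a limit of 10 memory units the striped option is excluded and the hybrid option wins on cost; tightening the limit to 7 leaves only the cascaded option. Keeping feasibility filtering separate from the cost function makes it easy to swap in a different cost model without touching the limit check.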