Patents by Inventor Weihang FAN
Weihang FAN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12164463
Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These logical operations are preceded or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then, in response to the determining step, splitting the selected buffer to produce first and second buffers. Dataflow through the memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer splitting optimization reduces memory unit consumption.
Type: Grant
Filed: April 4, 2023
Date of Patent: December 10, 2024
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Weihang Fan
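The determine-then-split step in this abstract can be illustrated as a cost comparison over buffer candidates. The sketch below is hypothetical: the memory-unit capacity and the toy cost model (a buffer wider than one memory unit pays an assumed interleaving overhead) are illustrative assumptions, not the patented cost function.

```python
import math
from dataclasses import dataclass

MEM_UNIT_BYTES = 512 * 1024  # assumed capacity of one grid memory unit


@dataclass(frozen=True)
class Buffer:
    tensor_bytes: int  # bytes this buffer must hold


def cost(buf: Buffer) -> int:
    """Toy cost: memory units consumed, plus an assumed interleaving
    overhead of one extra unit for each unit beyond the first."""
    units = math.ceil(buf.tensor_bytes / MEM_UNIT_BYTES)
    return units + max(0, units - 1)


def maybe_split(buf: Buffer) -> list:
    """Split `buf` into first and second buffers only when the combined
    cost of the halves is lower, mirroring the abstract's decision step."""
    first = Buffer(buf.tensor_bytes // 2)
    second = Buffer(buf.tensor_bytes - first.tensor_bytes)
    if cost(first) + cost(second) < cost(buf):
        return [first, second]
    return [buf]
```

Under this toy model a 4-unit buffer (cost 7) splits into two 2-unit buffers (cost 3 each), while a buffer that already fits in one memory unit is left alone.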
-
Publication number: 20240370240
Abstract: A system and method for transforming a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The high-level program is transformed into a dataflow graph that includes multiple interdependent, asynchronously executing meta-pipelines. A first buffer is identified that stores data passed from a producer in a first meta-pipeline stage to a consumer in a second meta-pipeline stage. The system determines limitations associated with the array and selects for implementation the lowest-cost buffer topology, chosen from a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology, where cost is determined by the number of memory units and the number of times data is written into a memory unit while traveling through the first buffer. Optimal configuration data for the array is generated and stored.
Type: Application
Filed: July 17, 2024
Publication date: November 7, 2024
Applicant: SambaNova Systems, Inc.
Inventors: Nathan Francis SHEELEY, Weihang FAN, Matheen MUSADDIQ, Ram SIVARAMAKRISHNAN
-
Patent number: 12045591
Abstract: A compiler transforms a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The compiler includes a method that identifies a skip buffer in a dataflow graph, determines limitations associated with the array, and searches for a lowest-cost implementation topology and stage depth. At least three topologies are considered: a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology. The lowest-cost implementation topology and stage depth are based on the size of the buffered data (usually, the size of a tensor), the depth of the skip buffer, and the array's limitations. The hybrid buffer topology includes multiple sections of parallel memory units. The data travels from memory units in one section to adjacent memory units in the next section without intervening reorder buffers.
Type: Grant
Filed: September 14, 2022
Date of Patent: July 23, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Nathan Sheeley, Weihang Fan, Matheen Musaddiq, Ram Sivaramakrishnan
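The topology search described in this abstract and in publication 20240370240 weighs memory-unit count against the number of times data is rewritten while traversing the buffer. The sketch below is a toy illustration of that kind of comparison; the unit counts, per-topology overheads (one assumed address/reorder unit per stripe or per section), and weights are invented for illustration and are not the patented formulas.

```python
import math

MEM_UNIT_BYTES = 512 * 1024  # assumed memory-unit capacity


def select_topology(tensor_bytes: int, depth: int, sections: int = 2,
                    unit_weight: float = 1.0, write_weight: float = 1.0) -> str:
    """Pick the cheapest of the three skip-buffer topologies named in the
    abstract, under a toy cost = unit_weight * units + write_weight * writes."""
    stage_units = math.ceil(tensor_bytes / MEM_UNIT_BYTES)
    candidates = {
        # cascaded: data hops through every stage, so it is written `depth` times
        "cascaded": (stage_units * depth, depth),
        # striped: `depth` parallel stripes written once; assume one extra
        # address-generation/reorder unit per stripe
        "striped": (stage_units * depth + depth, 1),
        # hybrid: parallel sections; data hops once per section; assume one
        # extra reorder unit per section
        "hybrid": (stage_units * depth + sections, sections),
    }

    def total(name: str) -> float:
        units, writes = candidates[name]
        return unit_weight * units + write_weight * writes

    return min(candidates, key=total)
```

With these toy weights, a shallow buffer favors the cascaded topology, a deep buffer favors the hybrid topology, and a deep buffer with expensive writes favors the striped topology.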
-
Publication number: 20230385043
Abstract: A compiler transforms a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The compiler includes a method that identifies a skip buffer in a dataflow graph, determines limitations associated with the array, and searches for a lowest-cost implementation topology and stage depth. At least three topologies are considered: a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology. The lowest-cost implementation topology and stage depth are based on the size of the buffered data (usually, the size of a tensor), the depth of the skip buffer, and the array's limitations. The hybrid buffer topology includes multiple sections of parallel memory units. The data travels from memory units in one section to adjacent memory units in the next section without intervening reorder buffers.
Type: Application
Filed: September 14, 2022
Publication date: November 30, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Nathan SHEELEY, Weihang FAN, Matheen MUSADDIQ, Ram SIVARAMAKRISHNAN
-
Publication number: 20230325346
Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These logical operations are preceded or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then, in response to the determining step, splitting the selected buffer to produce first and second buffers. Dataflow through the memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer splitting optimization reduces memory unit consumption.
Type: Application
Filed: April 4, 2023
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Weihang FAN
-
Publication number: 20230325312
Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph, responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer-readable medium are also disclosed herein.
Type: Application
Filed: October 27, 2022
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN
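The merge test in this abstract has three parts: the producer is a memory-indexing operation with exactly one consumer, the two operations are mergeable, and the merged node costs less than the two separate buffer nodes. A minimal sketch of that check follows; the op names, the mergeability table, and the unit costs are illustrative assumptions, not the patented rules.

```python
from dataclasses import dataclass, field

# Hypothetical mergeability table: pairs of back-to-back memory-indexing
# operations assumed to collapse into a single merged buffer node.
MERGEABLE = {("slice", "transpose"), ("slice", "slice")}
INDEXING_OPS = {"slice", "transpose"}


@dataclass
class Node:
    name: str
    op: str
    consumers: list = field(default_factory=list)
    cost: int = 1  # assumed memory units for a separate buffer node


def try_merge(producer: Node, merged_cost: int = 1):
    """Return a merged buffer node replacing `producer` and its sole
    consumer when the merge is legal and cheaper; otherwise None."""
    if producer.op not in INDEXING_OPS or len(producer.consumers) != 1:
        return None  # must be a memory-indexing op feeding exactly one node
    consumer = producer.consumers[0]
    if (producer.op, consumer.op) not in MERGEABLE:
        return None  # the two operations cannot form a merged indexing op
    if merged_cost >= producer.cost + consumer.cost:
        return None  # merging must reduce resource cost
    return Node(f"{producer.name}+{consumer.name}", "merged_index",
                consumers=list(consumer.consumers), cost=merged_cost)
```

The merged node inherits the consumer's downstream edges, which is what lets it replace both originals in the graph.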
-
Publication number: 20230315411
Abstract: A method for improving throughput in a reconfigurable computing system includes detecting, in an algebraic representation of a computing task for a reconfigurable dataflow processor, an outer meta-pipeline loop, detecting an inner meta-pipeline loop nested within the outer meta-pipeline loop, and determining that the inner meta-pipeline loop and the outer meta-pipeline loop each conduct a common operation. The method also includes fusing the common operation for the inner meta-pipeline loop and the outer meta-pipeline loop into a single operation within the inner meta-pipeline loop. The instances of the common operation may be fused if the output of a first instance of the common operation is the source for a second instance of the common operation. Examples of the common operation include an accumulator operation, a re-read operation, and a temporal (chip buffer synchronized) operation such as a temporal concatenation operation and a temporal slicing operation.
Type: Application
Filed: April 4, 2023
Publication date: October 5, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Fei WANG, Weihang FAN, David Alan KOEPLINGER
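The fusion condition above (the inner instance's output is the source of the outer instance of the same operation) can be sketched over a toy loop IR. Everything here is an illustrative assumption: the `Op`/`Loop` classes, the set of fusible kinds, and the rewrite that keeps the single fused instance in the inner loop.

```python
from dataclasses import dataclass, field

# Assumed fusible operation kinds, loosely modeled on the abstract's examples.
FUSIBLE_KINDS = {"accumulate", "reread", "temporal_concat", "temporal_slice"}


@dataclass
class Op:
    kind: str
    source: "Op | None" = None  # op whose output this op consumes


@dataclass
class Loop:
    ops: list = field(default_factory=list)
    inner: "Loop | None" = None  # nested meta-pipeline loop, if any


def fuse_common_ops(outer: Loop) -> int:
    """Fuse an operation that appears in both the outer loop and its nested
    inner loop into a single op inside the inner loop, when the inner
    instance's output is the source of the outer instance. Returns the
    number of fusions performed."""
    inner = outer.inner
    if inner is None:
        return 0
    fused = 0
    for outer_op in list(outer.ops):
        if outer_op.kind not in FUSIBLE_KINDS:
            continue
        src = outer_op.source
        if src is not None and src.kind == outer_op.kind and src in inner.ops:
            outer.ops.remove(outer_op)  # single fused instance stays inner
            fused += 1
    return fused
```

After fusion the outer loop no longer repeats work the inner loop already performs, which is the throughput improvement the abstract targets.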
-
Publication number: 20230297349
Abstract: A computer-implemented method of transforming a high-level program for mapping onto a coarse-grained reconfigurable (CGR) processor with an array of CGR units, including: sectioning a dataflow graph into a plurality of sections; extracting performance information for each of the plurality of sections; on a CGR unit, assigning to a section at least two computations dependent on a first data element; scheduling an additional load of the first data element in response to available memory bandwidth for that section; eliminating a buffer between the additional load of the first data element and one of the two computations for that section; generating configuration data for the array of CGR units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
Type: Application
Filed: March 15, 2023
Publication date: September 21, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Gao DENG, Weihang FAN, Fei WANG, Yun DU
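The core trade-off in this abstract is spending spare memory bandwidth on a second load of a shared data element so that a buffer between the load and one of the computations can be eliminated. A minimal per-section sketch of that decision follows; the bandwidth accounting and threshold are illustrative assumptions, not the patented method.

```python
from dataclasses import dataclass


@dataclass
class Section:
    bandwidth_avail: float  # fraction of the section's memory bandwidth still free (assumed metric)
    load_cost: float        # bandwidth fraction one extra load of the element would use (assumed)


def plan_section(sec: Section, computations: int) -> dict:
    """Decide whether to schedule an additional load of the shared data
    element (eliminating the buffer to one of the computations) or keep
    the buffer, based on the section's available memory bandwidth."""
    if computations >= 2 and sec.load_cost <= sec.bandwidth_avail:
        return {"extra_loads": 1, "buffers_eliminated": 1}
    return {"extra_loads": 0, "buffers_eliminated": 0}
```

When bandwidth is scarce, the sketch keeps the buffer; when it is plentiful, it trades an extra load for the buffer's memory units.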