Patents by Inventor David Alan Koeplinger
David Alan Koeplinger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12164463
Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then, in response to that determination, splitting the selected buffer to produce first and second buffers. Dataflow through the memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer-splitting optimization reduces memory unit consumption.
Type: Grant
Filed: April 4, 2023
Date of Patent: December 10, 2024
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Weihang Fan
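For illustration only, here is a minimal Python sketch of the kind of cost-driven split decision this abstract describes. Every name (Buffer, unit_cost, maybe_split) and the toy convex cost model are invented for the sketch and are not taken from the patent.

```python
import math

MEMORY_UNIT_KB = 512  # assumed capacity of one grid memory unit (illustrative)

class Buffer:
    def __init__(self, name, size_kb):
        self.name = name
        self.size_kb = size_kb

def unit_cost(buf):
    """Toy cost model: memory units used, squared, so buffers spanning many
    units are disproportionately expensive and splitting can pay off."""
    units = math.ceil(buf.size_kb / MEMORY_UNIT_KB)
    return units * units

def maybe_split(buf):
    """Split `buf` into first and second buffers only when that reduces cost."""
    first = Buffer(buf.name + ".a", buf.size_kb // 2)
    second = Buffer(buf.name + ".b", buf.size_kb - first.size_kb)
    if unit_cost(first) + unit_cost(second) < unit_cost(buf):
        return [first, second]   # splitting yields a reduced cost
    return [buf]                 # keep the original buffer intact

# Example: a 4096 KB buffer (8 units, cost 64) splits into two 2048 KB buffers
# (4 units each, cost 16 + 16 = 32), so the split is taken.
assert len(maybe_split(Buffer("x", 4096))) == 2
```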
-
Patent number: 12105630
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
Type: Grant
Filed: January 24, 2022
Date of Patent: October 1, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Kevin James Brown, David Alan Koeplinger, Weiwei Chen, Xiaoming Gu
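A hedged illustration of the conflict-detection-and-buffering idea: the edge representation and the resolve_conflicts name below are invented; only the rule (insert a buffer on any edge whose producer write order differs from the consumer read order) comes from the abstract.

```python
def resolve_conflicts(edges):
    """edges: (producer, consumer) pairs, each a dict with a 'name' and an
    'order' field (element order written or read, e.g. 'row_major').
    Returns a new edge list in which every mismatched edge goes through a buffer."""
    resolved = []
    for producer, consumer in edges:
        if producer["order"] != consumer["order"]:
            # Conflict: the consumer reads tensor elements in a different order
            # than the producer writes them, so stage the tensor in a buffer.
            buf = {"name": f"buf_{producer['name']}_{consumer['name']}",
                   "order": consumer["order"]}
            resolved += [(producer, buf), (buf, consumer)]
        else:
            resolved.append((producer, consumer))
    return resolved

matmul = {"name": "matmul", "order": "row_major"}
transpose = {"name": "transpose", "order": "col_major"}
print(resolve_conflicts([(matmul, transpose)]))  # edge is broken by an inserted buffer
```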
-
Publication number: 20230409233
Abstract: Buffer assignment in a contiguous area in a coarse-grained reconfigurable (CGR) array is optimized by temporarily assigning a first buffer portion and a second buffer portion to first and second physical memory units, routing connections in the contiguous area, and calculating a first cost. A list of candidates for a third physical memory unit is created, and a best cost and a best candidate are initialized. For each candidate, the first and second buffer portions are reassigned to the candidate, connections for data and dataflow control information in the contiguous area are routed, and a second cost is calculated. If the second cost is better than the best cost, the best cost and the best candidate are updated.
Type: Application
Filed: November 15, 2022
Publication date: December 21, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Nathan Francis SHEELEY, Raghu PRABHAKAR, David Alan KOEPLINGER
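The candidate loop reads like a straightforward best-cost search; the sketch below shows that control flow only, with placeholder route and cost callables standing in for the routing and cost-calculation steps described in the abstract.

```python
def best_placement(initial_unit, candidates, route, cost):
    """route(unit): routes data and dataflow-control connections for that
    assignment; cost(routing): scalar cost of the routed result, lower is better."""
    best_unit, best_cost = initial_unit, cost(route(initial_unit))  # first cost
    for candidate in candidates:
        candidate_cost = cost(route(candidate))                     # second cost
        if candidate_cost < best_cost:
            best_unit, best_cost = candidate, candidate_cost        # update best
    return best_unit, best_cost

# Toy example: pretend the routing cost is just the distance from unit 0.
assert best_placement(5, [2, 7, 1], route=lambda u: u, cost=abs) == (1, 1)
```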
-
Publication number: 20230376292
Abstract: The technology disclosed relates to automatically assigning and optimizing the physical memory layouts of all intermediate dense tensor data in a program. The technology disclosed is an implementation of a compiler analysis and transformation pass which automatically determines required physical layouts in light of kernel operation and performance requirements. The proposed solution also inserts physical layout conversion operations where necessary in cases of unresolvable incompatibilities. The pass takes as input a program acyclic dataflow graph and a set of physical layout constraints for every known operation.
Type: Application
Filed: April 18, 2023
Publication date: November 23, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin BROWN, Xiaoming GU
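A sketch of what such a layout-assignment pass could look like, under the assumption (not stated in the publication) that layouts are simple string tags and constraints are sets of acceptable tags; assign_layouts and all data shapes are invented.

```python
def assign_layouts(topo_order, constraints, producers):
    """topo_order: operation names in dependency order.
    constraints: op -> set of physical layouts that op's kernel accepts.
    producers: op -> producing op, or None for graph inputs."""
    layout = {}       # chosen layout for each operation's output tensor
    conversions = []  # (tensor, from_layout, to_layout) conversion ops to insert
    for op in topo_order:
        allowed, src = constraints[op], producers.get(op)
        if src is not None and layout[src] in allowed:
            layout[op] = layout[src]                   # incoming layout is fine
        else:
            layout[op] = sorted(allowed)[0]            # pick any acceptable layout
            if src is not None:
                # Unresolvable incompatibility: materialize a conversion op.
                conversions.append((src, layout[src], layout[op]))
    return layout, conversions

topo = ["load", "gemm", "softmax"]
constraints = {"load": {"row_major"}, "gemm": {"col_major"}, "softmax": {"col_major"}}
producers = {"load": None, "gemm": "load", "softmax": "gemm"}
print(assign_layouts(topo, constraints, producers))
# A single conversion ("load", "row_major", "col_major") is inserted before gemm.
```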
-
Publication number: 20230325346
Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then, in response to that determination, splitting the selected buffer to produce first and second buffers. Dataflow through the memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer-splitting optimization reduces memory unit consumption.
Type: Application
Filed: April 4, 2023
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Weihang FAN
-
Publication number: 20230325163
Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
Type: Application
Filed: June 7, 2023
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
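As a rough illustration of a control connection gating writes on a data connection, here is a tiny invented queue class; nothing about the actual hardware or its token format is taken from the publication.

```python
from collections import deque

class ControlledEdge:
    """A data connection whose writes are gated by a separate control connection."""
    def __init__(self):
        self.data = deque()
        self.write_grants = 0   # tokens received on the control connection

    def grant_write(self):
        """Consumer side: allow one more write via the control connection."""
        self.write_grants += 1

    def try_write(self, value):
        """Producer side: write only if a control token has been received."""
        if self.write_grants == 0:
            return False         # blocked until the consumer grants a write
        self.write_grants -= 1
        self.data.append(value)
        return True

edge = ControlledEdge()
assert not edge.try_write("t0")  # no grant yet, the write is held back
edge.grant_write()
assert edge.try_write("t0")      # grant consumed, data flows
```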
-
Publication number: 20230325312
Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.
Type: Application
Filed: October 27, 2022
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN
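A compact sketch of the merge decision: maybe_merge, the dict node format, and the toy costs are invented; the rule itself (merge only when legal and when the merged node is cheaper than the separate nodes) follows the abstract.

```python
def maybe_merge(op_node, consumer, can_merge, cost):
    """op_node: a memory-indexing operation with exactly one consuming node.
    can_merge(a, b): whether the two operations fuse into one indexing op.
    cost(node): resource cost of realizing a node on the device."""
    if not can_merge(op_node, consumer):
        return [op_node, consumer]                       # keep separate nodes
    merged = {"kind": "merged_buffer", "ops": (op_node, consumer)}
    if cost(merged) < cost(op_node) + cost(consumer):
        return [merged]                                  # cheaper: merge them
    return [op_node, consumer]

# Toy demo: separate nodes cost 2 memory units each, the merged node costs 3.
a, b = {"kind": "index_op"}, {"kind": "consumer_op"}
print(maybe_merge(a, b, can_merge=lambda x, y: True,
                  cost=lambda n: 3 if n["kind"] == "merged_buffer" else 2))
```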
-
Patent number: 11782729
Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
Type: Grant
Filed: August 18, 2020
Date of Patent: October 10, 2023
Assignee: SambaNova Systems, Inc.
Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
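Purely as an illustration of the allocate-then-load flow, here is an invented helper; the real runtime's configuration-file format and allocation policy are not described here.

```python
def allocate_and_load(config_file, free_units):
    """config_file: {'virtual_units': [...], 'bitstream': ...} (invented format).
    free_units: list of available physical configurable unit ids."""
    needed = config_file["virtual_units"]
    if len(free_units) < len(needed):
        raise RuntimeError("not enough physical configurable units available")
    # Bind each virtual data flow resource to a distinct physical unit.
    allocation = {virtual: free_units.pop() for virtual in needed}
    # "Load" the configuration file onto every allocated physical unit.
    loaded = {phys: config_file["bitstream"] for phys in allocation.values()}
    return allocation, loaded

alloc, loaded = allocate_and_load(
    {"virtual_units": ["vPCU0", "vPMU0"], "bitstream": "app.cfg"},
    free_units=[0, 1, 2, 3])
print(alloc, loaded)
```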
-
Publication number: 20230315411
Abstract: A method for improving throughput in a reconfigurable computing system includes detecting, in an algebraic representation of a computing task for a reconfigurable dataflow processor, an outer meta-pipeline loop, detecting an inner meta-pipeline loop nested within the outer meta-pipeline loop, and determining that the inner meta-pipeline loop and the outer meta-pipeline loop each conduct a common operation. The method also includes fusing the common operation for the inner meta-pipeline loop and the outer meta-pipeline loop into a single operation within the inner meta-pipeline loop. The instances of the common operation may be fused if the output of a first instance of the common operation is the source for a second instance of the common operation. Examples of the common operation include an accumulator operation, a re-read operation, and a temporal (chip buffer synchronized) operation such as a temporal concatenation operation and a temporal slicing operation.
Type: Application
Filed: April 4, 2023
Publication date: October 5, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Fei WANG, Weihang FAN, David Alan KOEPLINGER
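A schematic sketch of the fusion rule using an invented list-of-ops IR: when the inner and outer loops each contain the same kind of operation and the inner instance's output is the outer instance's source, the two collapse into one operation kept in the inner loop.

```python
def fuse_common_ops(outer, inner):
    """outer/inner: {'ops': [{'kind': ..., 'src': ..., 'out': ...}, ...]}."""
    for outer_op in list(outer["ops"]):
        for inner_op in inner["ops"]:
            same_kind = outer_op["kind"] == inner_op["kind"]
            feeds_outer = outer_op["src"] == inner_op["out"]
            if same_kind and feeds_outer:
                outer["ops"].remove(outer_op)      # drop the outer instance
                inner_op["out"] = outer_op["out"]  # fused op now produces its result
    return outer, inner

inner = {"ops": [{"kind": "accumulate", "src": "partial", "out": "inner_sum"}]}
outer = {"ops": [{"kind": "accumulate", "src": "inner_sum", "out": "total"}]}
fuse_common_ops(outer, inner)
print(outer["ops"], inner["ops"])  # outer accumulate is gone; inner writes "total"
```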
-
Publication number: 20230315406
Abstract: In a method, a compiler performs a trial compilation of a high-level (HL) decision for executing a dataflow application on a computing system, producing a low-level (LL) intermediate representation (IR). The LLIR comprises hardware resources to execute the application based on the HL decision, and the compiler determines a trial result based on LL execution metrics associated with the trial compilation. The compiler performs a trial compilation of a second HL decision to a second LLIR and determines a trial result based on LL execution metrics associated with the second trial compilation. The compiler evaluates the trial results and, based on the evaluations, selects one or both of the HL decisions for executing the dataflow application. A computer program product and a computing system can implement the method.
Type: Application
Filed: March 31, 2023
Publication date: October 5, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Blaine RISTER, Haocheng DONG, David Alan KOEPLINGER, Yaqi ZHANG, Junjue WANG, Zhuo CHEN, Arvind SUJEETH
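The control flow amounts to compile, measure, select; the sketch below shows only that skeleton, with trial_compile and score as placeholder callables and a made-up metric name.

```python
def pick_decision(decisions, trial_compile, score):
    """decisions: candidate high-level mapping decisions.
    trial_compile(d): lowers decision d to a low-level IR with execution metrics.
    score(llir): scalar derived from those metrics, lower is better."""
    trials = [(score(trial_compile(d)), d) for d in decisions]
    best_score, best_decision = min(trials, key=lambda t: t[0])
    return best_decision, best_score

best, cycles = pick_decision(
    decisions=["tile_8x8", "tile_16x16"],
    trial_compile=lambda d: {"est_cycles": 120 if d == "tile_16x16" else 200},
    score=lambda llir: llir["est_cycles"])
assert best == "tile_16x16"
```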
-
Publication number: 20230305823
Abstract: A method in a reconfigurable computing system includes connecting a plurality of tensor consumers to their corresponding tensor producers via skip-buffers, which generates a plurality of skip-buffers. The method includes determining that at least one skip-buffer of the plurality of skip-buffers corresponding to a first set of tensor consumers and at least one skip-buffer of the plurality of skip-buffers corresponding to a second set of tensor consumers are compatible to wholly or partially merge. The method also includes merging, wholly or partially, the compatible skip-buffers to produce a merged skip-buffer having a minimal buffer depth. The described method may reduce memory unit consumption and latency.
Type: Application
Filed: March 27, 2023
Publication date: September 28, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Fei WANG, David Alan KOEPLINGER, Kevin BROWN, Weiwei CHEN
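An invented illustration of merging skip-buffers: here two skip-buffers fed by the same producer are assumed mergeable, and the merged buffer takes the smallest depth that still satisfies every consumer; the real compatibility test is certainly richer than this.

```python
def merge_skip_buffers(buffers):
    """buffers: [{'producer': ..., 'consumers': [...], 'depth': int}, ...]."""
    merged = {}
    for buf in buffers:
        key = buf["producer"]
        if key in merged:
            merged[key]["consumers"] += buf["consumers"]
            # Minimal depth that still covers every consumer of the merged buffer.
            merged[key]["depth"] = max(merged[key]["depth"], buf["depth"])
        else:
            merged[key] = {**buf, "consumers": list(buf["consumers"])}
    return list(merged.values())

print(merge_skip_buffers([
    {"producer": "conv1", "consumers": ["add"], "depth": 2},
    {"producer": "conv1", "consumers": ["concat"], "depth": 5},
]))  # one skip-buffer of depth 5 feeding both consumers
```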
-
Publication number: 20230273879
Abstract: A method for reducing latency and increasing throughput in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising multiple stages. Each stage includes one or more logical operations executable via dataflow through compute units, and each stage is preceded by and followed by a buffer, each buffer corresponding to one or more memory units. The method includes detecting a memory mapping operation within a critical stage and moving the memory mapping operation to an adjacent stage, wherein the memory mapping operation is executable by memory units within the adjacent stage and dataflow through the buffer is controlled by one or more memory units within the grid of memory units.
Type: Application
Filed: February 28, 2023
Publication date: August 31, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Adam BORDELON, David Alan KOEPLINGER
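As a sketch only, the rebalancing step might look like the following, where stages are plain lists of operations and latency is a caller-supplied estimate; none of these names come from the publication.

```python
def rebalance(stages, is_memory_mapping, latency):
    """Move one memory-mapping operation out of the critical (slowest) stage
    into an adjacent stage. stages: list of lists of ops, in pipeline order."""
    if len(stages) < 2:
        return stages
    critical = max(range(len(stages)), key=lambda i: latency(stages[i]))
    for op in stages[critical]:
        if is_memory_mapping(op):
            neighbor = critical - 1 if critical > 0 else critical + 1
            stages[critical].remove(op)
            stages[neighbor].append(op)   # now executed by the adjacent stage
            break
    return stages

stages = [["gemm"], ["map_hi", "map_lo", "relu"], ["softmax"]]
print(rebalance(stages, is_memory_mapping=lambda op: op.startswith("map"),
                latency=len))  # the middle (critical) stage sheds one mapping op
```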
-
Patent number: 11714780
Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.
Type: Grant
Filed: May 20, 2021
Date of Patent: August 1, 2023
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
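A compressed sketch of the lowering flow (partition, designate virtual units, allocate physical units); the graph encoding and the round-robin allocation are invented simplifications, and the further split of execution fragments into memory and compute fragments is omitted.

```python
def lower(graph, physical_memory_units, physical_compute_units):
    # 1. Partition the dataflow graph into memory allocations and execution fragments.
    mem_allocs = [n for n in graph if n["kind"] == "alloc"]
    exec_frags = [n for n in graph if n["kind"] == "op"]
    # 2. Designate them to virtual memory units and virtual compute units.
    virtual_mem = {a["name"]: f"vmem{i}" for i, a in enumerate(mem_allocs)}
    virtual_cmp = {f["name"]: f"vpcu{i}" for i, f in enumerate(exec_frags)}
    # 3. Allocate virtual units onto physical units (simple round-robin here).
    phys_mem = {v: physical_memory_units[i % len(physical_memory_units)]
                for i, v in enumerate(virtual_mem.values())}
    phys_cmp = {v: physical_compute_units[i % len(physical_compute_units)]
                for i, v in enumerate(virtual_cmp.values())}
    return virtual_mem, virtual_cmp, phys_mem, phys_cmp

graph = [{"kind": "alloc", "name": "weights"},
         {"kind": "alloc", "name": "activations"},
         {"kind": "op", "name": "matmul"}]
print(lower(graph, ["PMU0", "PMU1"], ["PCU0"]))
```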
-
Patent number: 11709664
Abstract: A compiler is configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as the buffer depth of a corresponding downstream memory node. The ready-to-read credit counter is configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a read ready token from the corresponding downstream memory node. The write credit counter of the particular upstream memory node is initialized with one or more write credits and is configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a write done token from the corresponding downstream memory node.
Type: Grant
Filed: June 2, 2020
Date of Patent: July 25, 2023
Assignee: SambaNova Systems, Inc.
Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
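A behavioral sketch of the two counters, using an invented Python class as a stand-in for the configured hardware: read credits start at the downstream buffer depth and write credits at one, and each decrements and increments as the abstract describes.

```python
class UpstreamNode:
    def __init__(self, downstream_buffer_depth, write_credits=1):
        self.read_credits = downstream_buffer_depth  # ready-to-read credit counter
        self.write_credits = write_credits           # write credit counter

    def can_write(self):
        return self.read_credits > 0 and self.write_credits > 0

    def start_write(self):
        self.write_credits -= 1   # decrement when a write to downstream begins

    def wrote_unit(self):
        self.read_credits -= 1    # decrement when a buffer data unit is written

    def on_read_ready_token(self):
        self.read_credits += 1    # downstream signalled a slot was read and freed

    def on_write_done_token(self):
        self.write_credits += 1   # downstream acknowledged the write completed

node = UpstreamNode(downstream_buffer_depth=3)
node.start_write(); node.wrote_unit()
node.on_write_done_token(); node.on_read_ready_token()
assert node.can_write()
```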
-
Publication number: 20230205501
Abstract: The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
Type: Application
Filed: December 27, 2022
Publication date: June 29, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER
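The balancing rule itself is simple enough to show directly: delay the faster producer path by the latency difference so both outputs reach the consumer stage together. The function and numbers below are illustrative only, not taken from the publication.

```python
def balancing_delay(latency_a, latency_b):
    """Register-storage delay (in stages) to add on each producer path so the
    two outputs arrive at the consumer hardware stage in the same cycle."""
    diff = latency_a - latency_b
    return (max(-diff, 0), max(diff, 0))

# Example: a 3-cycle multiply and a 1-cycle add feeding the same consumer need
# 0 and 2 delay registers respectively, compensating the 2-cycle difference.
assert balancing_delay(3, 1) == (0, 2)
```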
-
Patent number: 11645057
Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
Type: Grant
Filed: September 24, 2020
Date of Patent: May 9, 2023
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Weiwei Chen, Kevin James Brown, Xiaoming Gu
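The abstract names the layout components directly (vector dimension, vector ordering, data alignment), so a small illustrative record plus a compatibility check is easy to sketch; the class name, field values, and check below are otherwise invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryLayout:
    vector_dimension: int   # which tensor dimension is vectorized
    vector_ordering: str    # e.g. "blocked" or "interleaved"
    data_alignment: int     # byte alignment of the vectorized data

def needs_conversion(current: MemoryLayout, expected: MemoryLayout) -> bool:
    """A tensor must be re-laid-out when its current layout does not match the
    layout expected by the operation unit that will consume it."""
    return current != expected

produced = MemoryLayout(vector_dimension=1, vector_ordering="blocked", data_alignment=64)
expected = MemoryLayout(vector_dimension=0, vector_ordering="blocked", data_alignment=64)
assert needs_conversion(produced, expected)  # the vector dimensions differ
```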
-
Publication number: 20220147328
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
Type: Application
Filed: January 24, 2022
Publication date: May 12, 2022
Applicant: SambaNova Systems, Inc.
Inventors: Kevin James BROWN, David Alan KOEPLINGER, Weiwei CHEN, Xiaoming GU
-
Publication number: 20220092247
Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin James BROWN, Xiaoming GU
-
Publication number: 20220058034
Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
Type: Application
Filed: August 18, 2020
Publication date: February 24, 2022
Applicant: SambaNova Systems, Inc.
Inventors: Gregory Frederick GROHOSKI, Manish K. SHAH, Raghu PRABHAKAR, Mark LUTTRELL, Ravinder KUMAR, Kin Hing LEUNG, Ranen CHATTERJEE, Sumti JAIRATH, David Alan KOEPLINGER, Ram SIVARAMAKRISHNAN, Matthew Thomas GRIMM
-
Patent number: 11237971
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
Type: Grant
Filed: September 16, 2020
Date of Patent: February 1, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Kevin James Brown, David Alan Koeplinger, Weiwei Chen, Xiaoming Gu