Patents by Inventor David Alan Koeplinger
David Alan Koeplinger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12164463
Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then, in response to that determination, splitting the selected buffer to produce first and second buffers. Dataflow through the memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer-splitting optimization reduces memory unit consumption.
Type: Grant
Filed: April 4, 2023
Date of Patent: December 10, 2024
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Weihang Fan
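For illustration only, here is a minimal Python sketch of the kind of cost-driven split decision this abstract describes. Every name (Buffer, unit_cost, maybe_split) and the toy convex cost model are invented for the sketch and are not taken from the patent.

```python
import math

MEMORY_UNIT_KB = 512  # assumed capacity of one grid memory unit (illustrative)

class Buffer:
    def __init__(self, name, size_kb):
        self.name = name
        self.size_kb = size_kb

def unit_cost(buf):
    """Toy cost model: memory units used, squared, so buffers spanning many
    units are disproportionately expensive and splitting can pay off."""
    units = math.ceil(buf.size_kb / MEMORY_UNIT_KB)
    return units * units

def maybe_split(buf):
    """Split `buf` into first and second buffers only when that reduces cost."""
    first = Buffer(buf.name + ".a", buf.size_kb // 2)
    second = Buffer(buf.name + ".b", buf.size_kb - first.size_kb)
    if unit_cost(first) + unit_cost(second) < unit_cost(buf):
        return [first, second]   # splitting yields a reduced cost
    return [buf]                 # keep the original buffer intact

# Example: a 4096 KB buffer (8 units, cost 64) splits into two 2048 KB buffers
# (4 units each, cost 16 + 16 = 32), so the split is taken.
assert len(maybe_split(Buffer("x", 4096))) == 2
```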
-
Patent number: 12105630
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
Type: Grant
Filed: January 24, 2022
Date of Patent: October 1, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Kevin James Brown, David Alan Koeplinger, Weiwei Chen, Xiaoming Gu
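A hedged illustration of the conflict-detection-and-buffering idea: the edge representation and the resolve_conflicts name below are invented; only the rule (insert a buffer on any edge whose producer write order differs from the consumer read order) comes from the abstract.

```python
def resolve_conflicts(edges):
    """edges: (producer, consumer) pairs, each a dict with a 'name' and an
    'order' field (element order written or read, e.g. 'row_major').
    Returns a new edge list in which every mismatched edge goes through a buffer."""
    resolved = []
    for producer, consumer in edges:
        if producer["order"] != consumer["order"]:
            # Conflict: the consumer reads tensor elements in a different order
            # than the producer writes them, so stage the tensor in a buffer.
            buf = {"name": f"buf_{producer['name']}_{consumer['name']}",
                   "order": consumer["order"]}
            resolved += [(producer, buf), (buf, consumer)]
        else:
            resolved.append((producer, consumer))
    return resolved

matmul = {"name": "matmul", "order": "row_major"}
transpose = {"name": "transpose", "order": "col_major"}
print(resolve_conflicts([(matmul, transpose)]))  # edge is broken by an inserted buffer
```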
-
Publication number: 20230409233
Abstract: Buffer assignment in a contiguous area in a coarse-grained reconfigurable (CGR) array is optimized by temporarily assigning a first buffer portion and a second buffer portion to first and second physical memory units, routing connections in the contiguous area, and calculating a first cost. A list of candidates for a third physical memory unit is created, and a best cost and a best candidate are initialized. For each candidate, the first and second buffer portions are reassigned to the candidate, connections for data and dataflow control information in the contiguous area are routed, and a second cost is calculated. If the second cost is better than the best cost, the best cost and the best candidate are updated.
Type: Application
Filed: November 15, 2022
Publication date: December 21, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Nathan Francis SHEELEY, Raghu PRABHAKAR, David Alan KOEPLINGER
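The candidate loop reads like a straightforward best-cost search; the sketch below shows that control flow only, with placeholder route and cost callables standing in for the routing and cost-calculation steps described in the abstract.

```python
def best_placement(initial_unit, candidates, route, cost):
    """route(unit): routes data and dataflow-control connections for that
    assignment; cost(routing): scalar cost of the routed result, lower is better."""
    best_unit, best_cost = initial_unit, cost(route(initial_unit))  # first cost
    for candidate in candidates:
        candidate_cost = cost(route(candidate))                     # second cost
        if candidate_cost < best_cost:
            best_unit, best_cost = candidate, candidate_cost        # update best
    return best_unit, best_cost

# Toy example: pretend the routing cost is just the distance from unit 0.
assert best_placement(5, [2, 7, 1], route=lambda u: u, cost=abs) == (1, 1)
```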
-
Publication number: 20230376292
Abstract: The technology disclosed relates to automatically assigning and optimizing the physical memory layouts of all intermediate dense tensor data in a program. The technology disclosed is an implementation of a compiler analysis and transformation pass which automatically determines required physical layouts in light of kernel operation and performance requirements. The proposed solution also inserts physical layout conversion operations where necessary in cases of unresolvable incompatibilities. The pass takes as input a program acyclic dataflow graph and a set of physical layout constraints for every known operation.
Type: Application
Filed: April 18, 2023
Publication date: November 23, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin BROWN, Xiaoming GU
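A sketch of what such a layout-assignment pass could look like, under the assumption (not stated in the publication) that layouts are simple string tags and constraints are sets of acceptable tags; assign_layouts and all data shapes are invented.

```python
def assign_layouts(topo_order, constraints, producers):
    """topo_order: operation names in dependency order.
    constraints: op -> set of physical layouts that op's kernel accepts.
    producers: op -> producing op, or None for graph inputs."""
    layout = {}       # chosen layout for each operation's output tensor
    conversions = []  # (tensor, from_layout, to_layout) conversion ops to insert
    for op in topo_order:
        allowed, src = constraints[op], producers.get(op)
        if src is not None and layout[src] in allowed:
            layout[op] = layout[src]                   # incoming layout is fine
        else:
            layout[op] = sorted(allowed)[0]            # pick any acceptable layout
            if src is not None:
                # Unresolvable incompatibility: materialize a conversion op.
                conversions.append((src, layout[src], layout[op]))
    return layout, conversions

topo = ["load", "gemm", "softmax"]
constraints = {"load": {"row_major"}, "gemm": {"col_major"}, "softmax": {"col_major"}}
producers = {"load": None, "gemm": "load", "softmax": "gemm"}
print(assign_layouts(topo, constraints, producers))
# A single conversion ("load", "row_major", "col_major") is inserted before gemm.
```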
-
Publication number: 20230325346
Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then, in response to that determination, splitting the selected buffer to produce first and second buffers. Dataflow through the memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer-splitting optimization reduces memory unit consumption.
Type: Application
Filed: April 4, 2023
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Weihang FAN
-
Publication number: 20230325163
Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
Type: Application
Filed: June 7, 2023
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
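As a rough illustration of a control connection gating writes on a data connection, here is a tiny invented queue class; nothing about the actual hardware or its token format is taken from the publication.

```python
from collections import deque

class ControlledEdge:
    """A data connection whose writes are gated by a separate control connection."""
    def __init__(self):
        self.data = deque()
        self.write_grants = 0   # tokens received on the control connection

    def grant_write(self):
        """Consumer side: allow one more write via the control connection."""
        self.write_grants += 1

    def try_write(self, value):
        """Producer side: write only if a control token has been received."""
        if self.write_grants == 0:
            return False         # blocked until the consumer grants a write
        self.write_grants -= 1
        self.data.append(value)
        return True

edge = ControlledEdge()
assert not edge.try_write("t0")  # no grant yet, the write is held back
edge.grant_write()
assert edge.try_write("t0")      # grant consumed, data flows
```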
-
Publication number: 20230325312
Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.
Type: Application
Filed: October 27, 2022
Publication date: October 12, 2023
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN
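A compact sketch of the merge decision: maybe_merge, the dict node format, and the toy costs are invented; the rule itself (merge only when legal and when the merged node is cheaper than the separate nodes) follows the abstract.

```python
def maybe_merge(op_node, consumer, can_merge, cost):
    """op_node: a memory-indexing operation with exactly one consuming node.
    can_merge(a, b): whether the two operations fuse into one indexing op.
    cost(node): resource cost of realizing a node on the device."""
    if not can_merge(op_node, consumer):
        return [op_node, consumer]                       # keep separate nodes
    merged = {"kind": "merged_buffer", "ops": (op_node, consumer)}
    if cost(merged) < cost(op_node) + cost(consumer):
        return [merged]                                  # cheaper: merge them
    return [op_node, consumer]

# Toy demo: separate nodes cost 2 memory units each, the merged node costs 3.
a, b = {"kind": "index_op"}, {"kind": "consumer_op"}
print(maybe_merge(a, b, can_merge=lambda x, y: True,
                  cost=lambda n: 3 if n["kind"] == "merged_buffer" else 2))
```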
-
Patent number: 11782729
Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
Type: Grant
Filed: August 18, 2020
Date of Patent: October 10, 2023
Assignee: SambaNova Systems, Inc.
Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
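Purely as an illustration of the allocate-then-load flow, here is an invented helper; the real runtime's configuration-file format and allocation policy are not described here.

```python
def allocate_and_load(config_file, free_units):
    """config_file: {'virtual_units': [...], 'bitstream': ...} (invented format).
    free_units: list of available physical configurable unit ids."""
    needed = config_file["virtual_units"]
    if len(free_units) < len(needed):
        raise RuntimeError("not enough physical configurable units available")
    # Bind each virtual data flow resource to a distinct physical unit.
    allocation = {virtual: free_units.pop() for virtual in needed}
    # "Load" the configuration file onto every allocated physical unit.
    loaded = {phys: config_file["bitstream"] for phys in allocation.values()}
    return allocation, loaded

alloc, loaded = allocate_and_load(
    {"virtual_units": ["vPCU0", "vPMU0"], "bitstream": "app.cfg"},
    free_units=[0, 1, 2, 3])
print(alloc, loaded)
```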
-
Publication number: 20230315411
Abstract: A method for improving throughput in a reconfigurable computing system includes detecting, in an algebraic representation of a computing task for a reconfigurable dataflow processor, an outer meta-pipeline loop, detecting an inner meta-pipeline loop nested within the outer meta-pipeline loop, and determining that the inner meta-pipeline loop and the outer meta-pipeline loop each conduct a common operation. The method also includes fusing the common operation for the inner meta-pipeline loop and the outer meta-pipeline loop into a single operation within the inner meta-pipeline loop. The instances of the common operation may be fused if the output of a first instance of the common operation is the source for a second instance of the common operation. Examples of the common operation include an accumulator operation, a re-read operation, and a temporal (chip buffer synchronized) operation such as a temporal concatenation operation and a temporal slicing operation.
Type: Application
Filed: April 4, 2023
Publication date: October 5, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Fei WANG, Weihang FAN, David Alan KOEPLINGER
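A schematic sketch of the fusion rule using an invented list-of-ops IR: when the inner and outer loops each contain the same kind of operation and the inner instance's output is the outer instance's source, the two collapse into one operation kept in the inner loop.

```python
def fuse_common_ops(outer, inner):
    """outer/inner: {'ops': [{'kind': ..., 'src': ..., 'out': ...}, ...]}."""
    for outer_op in list(outer["ops"]):
        for inner_op in inner["ops"]:
            same_kind = outer_op["kind"] == inner_op["kind"]
            feeds_outer = outer_op["src"] == inner_op["out"]
            if same_kind and feeds_outer:
                outer["ops"].remove(outer_op)      # drop the outer instance
                inner_op["out"] = outer_op["out"]  # fused op now produces its result
    return outer, inner

inner = {"ops": [{"kind": "accumulate", "src": "partial", "out": "inner_sum"}]}
outer = {"ops": [{"kind": "accumulate", "src": "inner_sum", "out": "total"}]}
fuse_common_ops(outer, inner)
print(outer["ops"], inner["ops"])  # outer accumulate is gone; inner writes "total"
```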
-
Publication number: 20230315406
Abstract: In a method, a compiler performs a trial compilation of a high-level (HL) decision for executing a dataflow application on a computing system, producing a low-level (LL) intermediate representation (IR). The LLIR comprises hardware resources to execute the application based on the HL decision, and the compiler determines a trial result based on LL execution metrics associated with the trial compilation. The compiler performs a trial compilation of a second HL decision to a second LLIR and determines a trial result based on LL execution metrics associated with the second trial compilation. The compiler evaluates the trial results and, based on the evaluations, selects one or both of the HL decisions for executing the dataflow application. A computer program product and a computing system can implement the method.
Type: Application
Filed: March 31, 2023
Publication date: October 5, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Blaine RISTER, Haocheng DONG, David Alan KOEPLINGER, Yaqi ZHANG, Junjue WANG, Zhuo CHEN, Arvind SUJEETH
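The control flow amounts to compile, measure, select; the sketch below shows only that skeleton, with trial_compile and score as placeholder callables and a made-up metric name.

```python
def pick_decision(decisions, trial_compile, score):
    """decisions: candidate high-level mapping decisions.
    trial_compile(d): lowers decision d to a low-level IR with execution metrics.
    score(llir): scalar derived from those metrics, lower is better."""
    trials = [(score(trial_compile(d)), d) for d in decisions]
    best_score, best_decision = min(trials, key=lambda t: t[0])
    return best_decision, best_score

best, cycles = pick_decision(
    decisions=["tile_8x8", "tile_16x16"],
    trial_compile=lambda d: {"est_cycles": 120 if d == "tile_16x16" else 200},
    score=lambda llir: llir["est_cycles"])
assert best == "tile_16x16"
```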
-
Publication number: 20230305823
Abstract: A method in a reconfigurable computing system includes connecting a plurality of tensor consumers to their corresponding tensor producers via skip-buffers, which generates a plurality of skip-buffers. The method includes determining that at least one skip-buffer of the plurality of skip-buffers corresponding to a first set of tensor consumers and at least one skip-buffer of the plurality of skip-buffers corresponding to a second set of tensor consumers are compatible to wholly or partially merge. The method also includes merging, wholly or partially, the compatible skip-buffers to produce a merged skip-buffer having a minimal buffer depth. The described method may reduce memory unit consumption and latency.
Type: Application
Filed: March 27, 2023
Publication date: September 28, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Fei WANG, David Alan KOEPLINGER, Kevin BROWN, Weiwei CHEN
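An invented illustration of merging skip-buffers: here two skip-buffers fed by the same producer are assumed mergeable, and the merged buffer takes the smallest depth that still satisfies every consumer; the real compatibility test is certainly richer than this.

```python
def merge_skip_buffers(buffers):
    """buffers: [{'producer': ..., 'consumers': [...], 'depth': int}, ...]."""
    merged = {}
    for buf in buffers:
        key = buf["producer"]
        if key in merged:
            merged[key]["consumers"] += buf["consumers"]
            # Minimal depth that still covers every consumer of the merged buffer.
            merged[key]["depth"] = max(merged[key]["depth"], buf["depth"])
        else:
            merged[key] = {**buf, "consumers": list(buf["consumers"])}
    return list(merged.values())

print(merge_skip_buffers([
    {"producer": "conv1", "consumers": ["add"], "depth": 2},
    {"producer": "conv1", "consumers": ["concat"], "depth": 5},
]))  # one skip-buffer of depth 5 feeding both consumers
```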
-
Publication number: 20230273879
Abstract: A method for reducing latency and increasing throughput in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising multiple stages. Each stage includes one or more logical operations executable via dataflow through compute units, and each stage is preceded by and followed by a buffer, each buffer corresponding to one or more memory units. The method includes detecting a memory mapping operation within a critical stage and moving the memory mapping operation to an adjacent stage, wherein the memory mapping operation is executable by memory units within the adjacent stage and dataflow through the buffer is controlled by one or more memory units within the grid of memory units.
Type: Application
Filed: February 28, 2023
Publication date: August 31, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Adam BORDELON, David Alan KOEPLINGER
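As a sketch only, the rebalancing step might look like the following, where stages are plain lists of operations and latency is a caller-supplied estimate; none of these names come from the publication.

```python
def rebalance(stages, is_memory_mapping, latency):
    """Move one memory-mapping operation out of the critical (slowest) stage
    into an adjacent stage. stages: list of lists of ops, in pipeline order."""
    if len(stages) < 2:
        return stages
    critical = max(range(len(stages)), key=lambda i: latency(stages[i]))
    for op in stages[critical]:
        if is_memory_mapping(op):
            neighbor = critical - 1 if critical > 0 else critical + 1
            stages[critical].remove(op)
            stages[neighbor].append(op)   # now executed by the adjacent stage
            break
    return stages

stages = [["gemm"], ["map_hi", "map_lo", "relu"], ["softmax"]]
print(rebalance(stages, is_memory_mapping=lambda op: op.startswith("map"),
                latency=len))  # the middle (critical) stage sheds one mapping op
```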
-
Patent number: 11714780
Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.
Type: Grant
Filed: May 20, 2021
Date of Patent: August 1, 2023
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
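A compressed sketch of the lowering flow (partition, designate virtual units, allocate physical units); the graph encoding and the round-robin allocation are invented simplifications, and the further split of execution fragments into memory and compute fragments is omitted.

```python
def lower(graph, physical_memory_units, physical_compute_units):
    # 1. Partition the dataflow graph into memory allocations and execution fragments.
    mem_allocs = [n for n in graph if n["kind"] == "alloc"]
    exec_frags = [n for n in graph if n["kind"] == "op"]
    # 2. Designate them to virtual memory units and virtual compute units.
    virtual_mem = {a["name"]: f"vmem{i}" for i, a in enumerate(mem_allocs)}
    virtual_cmp = {f["name"]: f"vpcu{i}" for i, f in enumerate(exec_frags)}
    # 3. Allocate virtual units onto physical units (simple round-robin here).
    phys_mem = {v: physical_memory_units[i % len(physical_memory_units)]
                for i, v in enumerate(virtual_mem.values())}
    phys_cmp = {v: physical_compute_units[i % len(physical_compute_units)]
                for i, v in enumerate(virtual_cmp.values())}
    return virtual_mem, virtual_cmp, phys_mem, phys_cmp

graph = [{"kind": "alloc", "name": "weights"},
         {"kind": "alloc", "name": "activations"},
         {"kind": "op", "name": "matmul"}]
print(lower(graph, ["PMU0", "PMU1"], ["PCU0"]))
```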
-
Patent number: 11709664
Abstract: A compiler is configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as the buffer depth of a corresponding downstream memory node. The ready-to-read credit counter is configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a read ready token from the corresponding downstream memory node. The write credit counter of the particular upstream memory node is initialized with one or more write credits and is configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a write done token from the corresponding downstream memory node.
Type: Grant
Filed: June 2, 2020
Date of Patent: July 25, 2023
Assignee: SambaNova Systems, Inc.
Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
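A behavioral sketch of the two counters, using an invented Python class as a stand-in for the configured hardware: read credits start at the downstream buffer depth and write credits at one, and each decrements and increments as the abstract describes.

```python
class UpstreamNode:
    def __init__(self, downstream_buffer_depth, write_credits=1):
        self.read_credits = downstream_buffer_depth  # ready-to-read credit counter
        self.write_credits = write_credits           # write credit counter

    def can_write(self):
        return self.read_credits > 0 and self.write_credits > 0

    def start_write(self):
        self.write_credits -= 1   # decrement when a write to downstream begins

    def wrote_unit(self):
        self.read_credits -= 1    # decrement when a buffer data unit is written

    def on_read_ready_token(self):
        self.read_credits += 1    # downstream signalled a slot was read and freed

    def on_write_done_token(self):
        self.write_credits += 1   # downstream acknowledged the write completed

node = UpstreamNode(downstream_buffer_depth=3)
node.start_write(); node.wrote_unit()
node.on_write_done_token(); node.on_read_ready_token()
assert node.can_write()
```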
-
Publication number: 20230205501
Abstract: The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
Type: Application
Filed: December 27, 2022
Publication date: June 29, 2023
Applicant: SambaNova Systems, Inc.
Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER
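The balancing rule itself is simple enough to show directly: delay the faster producer path by the latency difference so both outputs reach the consumer stage together. The function and numbers below are illustrative only, not taken from the publication.

```python
def balancing_delay(latency_a, latency_b):
    """Register-storage delay (in stages) to add on each producer path so the
    two outputs arrive at the consumer hardware stage in the same cycle."""
    diff = latency_a - latency_b
    return (max(-diff, 0), max(diff, 0))

# Example: a 3-cycle multiply and a 1-cycle add feeding the same consumer need
# 0 and 2 delay registers respectively, compensating the 2-cycle difference.
assert balancing_delay(3, 1) == (0, 2)
```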
-
Patent number: 11645057
Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
Type: Grant
Filed: September 24, 2020
Date of Patent: May 9, 2023
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Weiwei Chen, Kevin James Brown, Xiaoming Gu
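The abstract names the layout components directly (vector dimension, vector ordering, data alignment), so a small illustrative record plus a compatibility check is easy to sketch; the class name, field values, and check below are otherwise invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryLayout:
    vector_dimension: int   # which tensor dimension is vectorized
    vector_ordering: str    # e.g. "blocked" or "interleaved"
    data_alignment: int     # byte alignment of the vectorized data

def needs_conversion(current: MemoryLayout, expected: MemoryLayout) -> bool:
    """A tensor must be re-laid-out when its current layout does not match the
    layout expected by the operation unit that will consume it."""
    return current != expected

produced = MemoryLayout(vector_dimension=1, vector_ordering="blocked", data_alignment=64)
expected = MemoryLayout(vector_dimension=0, vector_ordering="blocked", data_alignment=64)
assert needs_conversion(produced, expected)  # the vector dimensions differ
```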
-
Publication number: 20220147328
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
Type: Application
Filed: January 24, 2022
Publication date: May 12, 2022
Applicant: SambaNova Systems, Inc.
Inventors: Kevin James BROWN, David Alan KOEPLINGER, Weiwei CHEN, Xiaoming GU
-
Publication number: 20220092247
Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Applicant: SambaNova Systems, Inc.
Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin James BROWN, Xiaoming GU
-
Publication number: 20220058034
Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
Type: Application
Filed: August 18, 2020
Publication date: February 24, 2022
Applicant: SambaNova Systems, Inc.
Inventors: Gregory Frederick GROHOSKI, Manish K. SHAH, Raghu PRABHAKAR, Mark LUTTRELL, Ravinder KUMAR, Kin Hing LEUNG, Ranen CHATTERJEE, Sumti JAIRATH, David Alan KOEPLINGER, Ram SIVARAMAKRISHNAN, Matthew Thomas GRIMM
-
Patent number: 11237971
Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
Type: Grant
Filed: September 16, 2020
Date of Patent: February 1, 2022
Assignee: SambaNova Systems, Inc.
Inventors: Kevin James Brown, David Alan Koeplinger, Weiwei Chen, Xiaoming Gu