Patents by Inventor David Alan Koeplinger

David Alan Koeplinger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230409233
    Abstract: Buffer assignment in a contiguous area in a coarse-grained reconfigurable (CGR) array is optimized by temporarily assigning a first buffer portion and a second buffer portion to first and second physical memory units, routing connections in the contiguous area, and calculating a first cost. A list of candidates for a third physical memory unit is created, and a best cost and a best candidate are initialized. For each candidate, the first and second buffer portions are reassigned to the candidate, connections for data and dataflow control information in the contiguous area are routed, and a second cost is calculated. If the second cost is better than the best cost, the best cost and the best candidate are updated.
    Type: Application
    Filed: November 15, 2022
    Publication date: December 21, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Nathan Francis SHEELEY, Raghu PRABHAKAR, David Alan KOEPLINGER
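
A minimal sketch of the candidate-search loop described in the abstract above. The helpers route_contiguous_area() and cost(), and the plain-dict assignment, are hypothetical stand-ins for the placer and its cost model; the actual implementation operates on a CGR array model.

```python
def choose_memory_unit(portions, first_pmu, second_pmu, candidates,
                       route_contiguous_area, cost):
    """Return the candidate physical memory unit with the best routing cost
    when both buffer portions are reassigned to it (hypothetical helpers)."""
    # Baseline: the two buffer portions temporarily placed on separate units.
    first_cost = cost(route_contiguous_area({portions[0]: first_pmu,
                                             portions[1]: second_pmu}))
    best_cost, best_candidate = first_cost, None   # initialize best cost/candidate

    for candidate in candidates:
        # Reassign both buffer portions to this candidate unit, then re-route
        # the data and dataflow-control connections in the contiguous area.
        assignment = {portions[0]: candidate, portions[1]: candidate}
        second_cost = cost(route_contiguous_area(assignment))
        if second_cost < best_cost:                # better than the best so far
            best_cost, best_candidate = second_cost, candidate

    return best_candidate, best_cost
```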
  • Publication number: 20230376292
    Abstract: The technology disclosed relates to automatically assigning and optimizing the physical memory layouts of all intermediate dense tensor data in a program. The technology disclosed is an implementation of a compiler analysis and transformation pass which automatically determines required physical layouts in light of kernel operation and performance requirements. The proposed solution also inserts physical layout conversion operations where necessary in cases of unresolvable incompatibilities. The pass takes as input the program's acyclic dataflow graph and a set of physical layout constraints for every known operation.
    Type: Application
    Filed: April 18, 2023
    Publication date: November 23, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin BROWN, Xiaoming GU
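
A simplified sketch of the pass described above: walk the acyclic dataflow graph in topological order, pick a physical layout for each tensor from the operations' constraints, and insert a conversion where constraints cannot be reconciled. The graph/constraint representation and insert_conversion() are assumptions; the real pass works on the compiler's IR with a richer constraint and cost model.

```python
from collections import deque

def topological_order(graph):
    """Kahn's algorithm over an adjacency-list graph (node -> consumers)."""
    indegree = {n: 0 for n in graph}
    for node, consumers in graph.items():
        for c in consumers:
            indegree[c] = indegree.get(c, 0) + 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for c in graph.get(node, ()):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order

def assign_layouts(graph, constraints, insert_conversion):
    """Pick a physical layout per tensor and insert conversion operations on
    edges whose producer/consumer layout constraints are incompatible."""
    chosen = {}
    for node in topological_order(graph):
        options = constraints[node]
        consumers = graph.get(node, ())
        # Prefer a layout the most consumers accept directly; the real pass
        # also weighs kernel operation and performance requirements.
        chosen[node] = max(options,
                           key=lambda lay: sum(lay in constraints[c] for c in consumers))
        for consumer in consumers:
            if chosen[node] not in constraints[consumer]:
                # Unresolvable incompatibility: materialize an explicit
                # physical layout conversion operation on this edge.
                insert_conversion(node, consumer,
                                  chosen[node], next(iter(constraints[consumer])))
```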
  • Publication number: 20230325312
    Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.
    Type: Application
    Filed: October 27, 2022
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN
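
A hedged sketch of the merge decision described above. The graph interface, node attributes, and the cost/merge helpers are hypothetical placeholders for the compiler's real node classes and resource-cost model.

```python
def try_merge_indexing_buffer(graph, producer, can_merge, merged_cost,
                              buffer_cost, replace_with_merged_buffer):
    """Merge a memory-indexing producer node with its single consumer when
    the merged buffer node is cheaper than two separate buffer nodes."""
    consumers = graph.consumers_of(producer)          # hypothetical accessor
    # Only consider a producer that is a memory indexing operation and that
    # feeds exactly one consuming node.
    if not producer.is_memory_indexing or len(consumers) != 1:
        return False
    consumer = consumers[0]
    if not can_merge(producer.operation, consumer.operation):
        return False
    # Merge only if it actually saves resources.
    if merged_cost(producer, consumer) < buffer_cost(producer) + buffer_cost(consumer):
        replace_with_merged_buffer(graph, producer, consumer)
        return True
    return False
```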
  • Publication number: 20230325346
    Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system, comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These one or more logical operations are preceded by or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then splitting the selected buffer, in response to the determining step, to produce first and second buffers. Dataflow through memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer splitting optimization reduces memory unit consumption.
    Type: Application
    Filed: April 4, 2023
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Weihang FAN
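
An illustrative sketch of the split decision above: split a buffer only when the estimated cost after splitting is lower. propose_split(), cost_of(), and split() are hypothetical stand-ins for the compiler's cost model and graph rewrite.

```python
def maybe_split_buffer(buffer, cost_of, split):
    """Split `buffer` into first and second buffers if that reduces cost."""
    current_cost = cost_of([buffer])
    first, second = buffer.propose_split()        # candidate split (hypothetical)
    if cost_of([first, second]) < current_cost:   # splitting yields a reduced cost
        return split(buffer, first, second)       # rewrite the graph
    return buffer                                 # otherwise keep the original buffer
```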
  • Publication number: 20230325163
    Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
    Type: Application
    Filed: June 7, 2023
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
  • Patent number: 11782729
    Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
    Type: Grant
    Filed: August 18, 2020
    Date of Patent: October 10, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
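
A high-level sketch of the runtime flow described above, assuming hypothetical configuration-file and resource-pool objects; the real runtime processor manages arrays of physical configurable units and memory on the device.

```python
def run_applications(pool, config_files, load, execute):
    """Allocate physical units and memory for each application's virtual
    data flow resources, load its configuration, then execute it."""
    for config in config_files:
        # Map the virtual data flow resources named in the configuration
        # file onto free physical configurable units and memory.
        allocation = {virtual: pool.allocate(virtual.kind)
                      for virtual in config.virtual_resources}
        load(config, allocation)       # load configuration onto allocated units
        execute(config, allocation)    # run the user application
        for unit in allocation.values():
            pool.release(unit)         # return resources to the pool
```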
  • Publication number: 20230315411
    Abstract: A method for improving throughput in a reconfigurable computing system includes detecting, in an algebraic representation of a computing task for a reconfigurable dataflow processor, an outer meta-pipeline loop, detecting an inner meta-pipeline loop nested within the outer meta-pipeline loop, and determining that the inner meta-pipeline loop and the outer meta-pipeline loop each conduct a common operation. The method also includes fusing the common operation for the inner meta-pipeline loop and the outer meta-pipeline loop into a single operation within the inner meta-pipeline loop. The instances of the common operation may be fused if the output of a first instance of the common operation is the source for a second instance of the common operation. Examples of the common operation include an accumulator operation, a re-read operation, and a temporal (chip buffer synchronized) operation such as a temporal concatenation operation and a temporal slicing operation.
    Type: Application
    Filed: April 4, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Fei WANG, Weihang FAN, David Alan KOEPLINGER
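
A sketch of the fusion check described above, assuming simple loop and operation objects with caller-supplied predicates; the real pass operates on the algebraic IR for the dataflow processor and handles accumulators, re-reads, and temporal operations.

```python
def fuse_common_ops(outer_loop, inner_loop, same_operation, feeds, fuse):
    """Fuse a duplicated operation shared by an outer meta-pipeline loop
    and the inner meta-pipeline loop nested within it."""
    for outer_op in list(outer_loop.ops):
        for inner_op in list(inner_loop.ops):
            # Both loops must conduct a common operation, and the output of
            # one instance must be the source of the other instance.
            if same_operation(outer_op, inner_op) and (
                    feeds(outer_op, inner_op) or feeds(inner_op, outer_op)):
                # Replace the pair with a single operation in the inner loop.
                fuse(outer_op, inner_op, inner_loop)
```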
  • Publication number: 20230315406
    Abstract: In a method a compiler performs a trial compilation to a low level (LL) intermediate representation (IR) of a high level (HL) decision to execute a dataflow application on a computing system. The LLIR comprises hardware resources to execute the application based on the HL decision and the compiler determines a trial result based on LL execution metrics associated with the trial compilation. The compiler performs a trial compilation of a second HL decision to a second LLIR and determines a trial result based on LL execution metrics associated with the second trial compilation. The compiler evaluates the trial results and, based on the evaluations, selects one or both of the HL decisions for executing the dataflow application. A computer program product and a computing system can implement the method.
    Type: Application
    Filed: March 31, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Blaine RISTER, Haocheng DONG, David Alan KOEPLINGER, Yaqi ZHANG, Junjue WANG, Zhuo CHEN, Arvind SUJEETH
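
A minimal sketch of the trial-compilation loop above: compile each high-level (HL) decision down to a low-level IR, score it from execution metrics, and keep the best. compile_to_llir(), metrics(), and score() are hypothetical; the actual compiler may also select multiple decisions.

```python
def pick_hl_decision(hl_decisions, compile_to_llir, metrics, score):
    """Return the HL decision whose trial compilation scores best."""
    best_decision, best_score = None, float("inf")
    for decision in hl_decisions:
        llir = compile_to_llir(decision)        # trial compilation to LLIR
        trial_result = score(metrics(llir))     # LL execution metrics -> scalar
        if trial_result < best_score:           # lower score is better here
            best_decision, best_score = decision, trial_result
    return best_decision
```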
  • Publication number: 20230305823
    Abstract: A method in a reconfigurable computing system includes connecting a plurality of tensor consumers to their corresponding tensor producers via skip-buffers, thereby generating a plurality of skip-buffers. The method includes determining that at least one skip-buffer of the plurality of skip-buffers corresponding to a first set of tensor consumers and at least one skip-buffer of the plurality of skip-buffers corresponding to a second set of tensor consumers are compatible to wholly or partially merge. The method also includes merging, wholly or partially, the compatible skip-buffers to produce a merged skip-buffer having a minimal buffer depth. The described method may reduce memory unit consumption and latency.
    Type: Application
    Filed: March 27, 2023
    Publication date: September 28, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Fei WANG, David Alan KOEPLINGER, Kevin BROWN, Weiwei CHEN
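
A rough sketch of skip-buffer merging under simplifying assumptions: each skip-buffer is modeled as a dict recording its producer, depth, and consumers, and buffers fed by the same producer are treated as compatible. The compatibility rule and representation are illustrative, not the patented criteria.

```python
def merge_skip_buffers(skip_buffers):
    """skip_buffers: list of dicts with 'producer', 'depth', 'consumers'.
    Buffers fed by the same producer are merged; the merged buffer keeps the
    smallest depth that still covers every consumer it now serves, i.e. the
    maximum of the individual depths."""
    merged = {}
    for buf in skip_buffers:
        key = buf["producer"]
        if key not in merged:
            merged[key] = {"producer": key,
                           "depth": buf["depth"],
                           "consumers": list(buf["consumers"])}
        else:
            merged[key]["depth"] = max(merged[key]["depth"], buf["depth"])
            merged[key]["consumers"].extend(buf["consumers"])
    return list(merged.values())
```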
  • Publication number: 20230273879
    Abstract: A method for reducing latency and increasing throughput in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system, comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising multiple stages. Each stage includes one or more logical operations executable via dataflow through compute units, and each stage is preceded by and followed by a buffer, each buffer corresponding to one or more memory units. The method includes detecting a memory mapping operation within a critical stage and moving the memory mapping operation to an adjacent stage, wherein the memory mapping operation is executable by memory units within the adjacent stage and dataflow through the buffer is controlled by one or more memory units within the grid of memory units.
    Type: Application
    Filed: February 28, 2023
    Publication date: August 31, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Adam BORDELON, David Alan KOEPLINGER
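
A sketch of the rebalancing step described above: if the critical (slowest) stage contains a memory-mapping operation, hoist it into an adjacent stage so the critical stage's latency shrinks. The stage objects, latency model, and move() helper are hypothetical.

```python
def rebalance_stages(stages, latency, is_memory_mapping, move):
    """stages: ordered list of pipeline stages, each with an `ops` list."""
    critical = max(stages, key=latency)            # stage limiting throughput
    idx = stages.index(critical)
    neighbors = [s for s in (stages[idx - 1] if idx > 0 else None,
                             stages[idx + 1] if idx + 1 < len(stages) else None)
                 if s is not None]
    for op in list(critical.ops):
        if is_memory_mapping(op) and neighbors:
            # Execute the mapping op on memory units of the adjacent stage.
            target = min(neighbors, key=latency)
            move(op, critical, target)
```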
  • Patent number: 11714780
    Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.
    Type: Grant
    Filed: May 20, 2021
    Date of Patent: August 1, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
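
A compressed sketch of the compile flow described above: partition the dataflow graph into memory allocations and execution fragments, split the fragments into memory and compute pieces, bind them to virtual units, and finally place virtual units onto physical units. The helper object `c` and its methods are hypothetical placeholders for the named compiler passes.

```python
def compile_dataflow_graph(graph, c):
    """c: hypothetical helper object providing the passes named below."""
    allocations, fragments = c.partition(graph)              # memory allocations + execution fragments
    vmus = [c.virtual_memory_unit(a) for a in allocations]   # designate allocations to virtual memory units
    vcus = []
    for frag in fragments:
        mem_frags, compute_frags = c.split(frag)             # memory vs. compute fragments
        for m in mem_frags:
            c.assign_to_memory(m, vmus)                      # memory fragment -> a virtual memory unit
        for comp in compute_frags:
            vcus.append(c.virtual_compute_unit(comp))        # compute fragment -> virtual compute unit
    # Allocate virtual memory/compute units onto physical memory/compute units.
    return c.place(vmus, vcus)
```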
  • Patent number: 11709664
    Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
    Type: Grant
    Filed: June 2, 2020
    Date of Patent: July 25, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
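
An executable toy model of the credit scheme described above: the upstream node tracks read credits (initialized to the downstream buffer depth) and write credits, decrementing on writes and incrementing on the read-ready and write-done tokens from downstream. This is a simplified, single-threaded illustration; the real counters are compiler-configured hardware state in the memory nodes.

```python
class UpstreamNode:
    def __init__(self, downstream_buffer_depth, write_credits=1):
        self.read_credits = downstream_buffer_depth   # ready-to-read credit counter
        self.write_credits = write_credits             # write credit counter

    def can_write(self):
        return self.read_credits > 0 and self.write_credits > 0

    def begin_write(self):
        assert self.can_write()
        self.read_credits -= 1    # decrement when a buffer data unit is written downstream
        self.write_credits -= 1   # decrement when the write begins

    def on_read_ready_token(self):
        self.read_credits += 1    # downstream node freed a buffer slot

    def on_write_done_token(self):
        self.write_credits += 1   # downstream node acknowledged the write
```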
  • Publication number: 20230205501
    Abstract: The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
    Type: Application
    Filed: December 27, 2022
    Publication date: June 29, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER
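
A small worked example of the latency balancing idea above: if two producer stages feed one consumer but finish at different stage latencies, the faster path receives register storage delay equal to the difference so both outputs arrive together. The arithmetic is purely illustrative, not the hardware mechanism.

```python
def register_delays(producer_latencies):
    """Given per-producer stage latencies (in cycles), return the delay
    registers to insert on each path so the consumer sees aligned inputs."""
    slowest = max(producer_latencies.values())
    return {name: slowest - lat for name, lat in producer_latencies.items()}

# Example: a 3-cycle producer and a 5-cycle producer feeding one consumer.
assert register_delays({"mul": 3, "exp": 5}) == {"mul": 2, "exp": 0}
```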
  • Patent number: 11645057
    Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: May 9, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: David Alan Koeplinger, Weiwei Chen, Kevin James Brown, Xiaoming Gu
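
A sketch of the layout record implied by the abstract above: each memory layout carries a vector dimension plus at least one of a vector ordering and a data alignment, and compile time compares current layouts against expected producer/consumer layouts. Field names and the mismatch rule are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class MemoryLayout:
    vector_dimension: int                            # which dimension is vectorized
    vector_ordering: Optional[Tuple[int, ...]] = None
    data_alignment: Optional[int] = None             # e.g. in elements

def needs_conversion(current: MemoryLayout, expected: MemoryLayout) -> bool:
    """True when a tensor's current layout does not match the layout the
    producing or consuming operation expects."""
    return current != expected
```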
  • Publication number: 20220147328
    Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
    Type: Application
    Filed: January 24, 2022
    Publication date: May 12, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Kevin James BROWN, David Alan KOEPLINGER, Weiwei CHEN, Xiaoming GU
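
A toy version of the conflict check above: the producer's write access pattern and the consumer's read access pattern are modeled as orderings of tensor element indices, and any mismatch triggers insertion of a buffer between them. The pattern representation and insert_buffer() are illustrative assumptions.

```python
def has_conflict(write_order, read_order):
    """Orders are sequences of element indices; any difference is a conflict."""
    return list(write_order) != list(read_order)

def resolve_conflicts(edges, insert_buffer):
    """edges: iterable of (producer, consumer, write_order, read_order)."""
    for producer, consumer, write_order, read_order in edges:
        if has_conflict(write_order, read_order):
            insert_buffer(producer, consumer)   # decouple the mismatched orders

# Example: row-major writes vs. column-major reads of a 2x2 tensor conflict.
assert has_conflict([(0, 0), (0, 1), (1, 0), (1, 1)],
                    [(0, 0), (1, 0), (0, 1), (1, 1)])
```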
  • Publication number: 20220092247
    Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin James BROWN, Xiaoming GU
  • Publication number: 20220058034
    Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
    Type: Application
    Filed: August 18, 2020
    Publication date: February 24, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Gregory Frederick GROHOSKI, Manish K. SHAH, Raghu PRABHAKAR, Mark LUTTRELL, Ravinder KUMAR, Kin Hing LEUNG, Ranen CHATTERJEE, Sumti JAIRATH, David Alan KOEPLINGER, Ram SIVARAMAKRISHNAN, Matthew Thomas GRIMM
  • Patent number: 11237971
    Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
    Type: Grant
    Filed: September 16, 2020
    Date of Patent: February 1, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Kevin James Brown, David Alan Koeplinger, Weiwei Chen, Xiaoming Gu
  • Publication number: 20210373867
    Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
    Type: Application
    Filed: June 2, 2020
    Publication date: December 2, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi Arun CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
  • Publication number: 20210271630
    Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.
    Type: Application
    Filed: May 20, 2021
    Publication date: September 2, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Raghu PRABHAKAR, Sumti JAIRATH