Patents by Inventor Davor Capalija

Davor Capalija has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Seamless place and route for heterogenous network of processor cores

Patent number: 11960885

Abstract: Methods and systems related to parallel computing using heterogeneous networks of computational nodes are disclosed herein. A method for executing a complex computation on a heterogeneous set of computational nodes linked together by a set of links in a network is disclosed. The method includes compiling, using a table of bandwidth values for the set of links in the network, a set of instructions for routing data for the execution of the complex computation. The method also includes configuring a set of programmable controllers on the heterogeneous set of computational nodes with the set of instructions. The method also includes executing the set of instructions using the set of programmable controllers. The method also includes routing data through the network to facilitate the execution of the complex computation by the heterogeneous set of computational nodes and in response to the execution of the instructions.

Type: Grant

Filed: April 11, 2022

Date of Patent: April 16, 2024

Assignee: Tenstorrent Inc.

Inventors: Jasmina Vasiljevic, Ljubisa Bajic, Davor Capalija, Stanislav Sokorac
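
As a rough illustration of the compile step described in the abstract above, the sketch below picks a route between two nodes by consulting a table of per-link bandwidth values. The abstract does not say how the table is used, so the widest-path (maximum-bottleneck) criterion, the `widest_path` function, and the node names in the usage note are all assumptions made for illustration.

```python
import heapq


def widest_path(bandwidth, src, dst):
    """Return (bottleneck, path) that maximizes the minimum link bandwidth.

    `bandwidth` maps (node_a, node_b) -> bandwidth of the directed link.
    Returns (0, []) when dst cannot be reached from src.
    """
    # Build an adjacency list from the bandwidth table.
    neighbors = {}
    for (a, b), bw in bandwidth.items():
        neighbors.setdefault(a, []).append((b, bw))

    best = {src: float("inf")}  # best known bottleneck to each node
    prev = {}                   # predecessor on the best route found so far
    heap = [(-best[src], src)]  # max-heap emulated with negated bottlenecks
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == dst:
            break
        if width < best.get(node, 0):
            continue  # stale heap entry
        for nxt, bw in neighbors.get(node, []):
            cand = min(width, bw)
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (-cand, nxt))

    if dst not in best:
        return 0, []
    # Walk the predecessor links back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]
```

With a hypothetical table such as `{("cpu0", "gpu0"): 16, ("cpu0", "acc0"): 4, ("gpu0", "acc1"): 8, ("acc0", "acc1"): 32}`, calling `widest_path(table, "cpu0", "acc1")` selects the route through `gpu0`, whose slowest link (8) is faster than the slowest link on the alternative route (4).
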
Application data flow graph execution using network-on-chip overlay

Patent number: 11934897

Abstract: Methods and systems for executing an application data flow graph using a network of computational nodes are disclosed. In specific examples, the network of computational nodes can be a network-on-chip for a multicore processor. One method includes transitioning first application data from a first source computational node to an intermediary computational node. The method can also include providing second application data, from a computation layer of the network of computational nodes, on the intermediary computational node. The method can also include multicasting the first application data in combination with the second application data from the intermediary computational node to at least two destination computational nodes. The first source computational node, the intermediary computational node, and the at least two destination computational nodes are all in the network of computational nodes.

Type: Grant

Filed: January 29, 2021

Date of Patent: March 19, 2024

Assignee: Tenstorrent Inc.

Inventors: Jasmina Vasiljevic, Davor Capalija, Zahi Moudallal, Utku Aydonat, Joseph Chu, S. Alexander Chin, Ljubisa Bajic

Processor cores using packet identifiers for routing and computation

Patent number: 11829752

Abstract: Processor cores using packet identifiers for routing and computation are disclosed. One method includes executing a complex computation using a set of processing cores. The method includes routing a set of packets using a set of packet identifiers and executing a set of instructions. The set of instructions are defined using a set of operand identifiers. The operand identifiers represent packet identifiers in the set of packet identifiers. In specific implementations the set of the operand identifiers represent packet identifiers in the set of packet identifiers in that a set of memories on the set of processing cores stores data values in common association with both the set of packets, and a set of operands identified by the set of operand identifiers. In specific implementations the set of operand identifiers and packet identifiers are unambiguously mapped to an underlying set of application datums of the complex computation.

Type: Grant

Filed: March 3, 2022

Date of Patent: November 28, 2023

Assignee: Tenstorrent Inc.

Inventors: Davor Capalija, Ljubisa Bajic, Jasmina Vasiljevic

Memory-Size- and Bandwidth-Efficient Method for Feeding Systolic Array Matrix Multipliers

Publication number: 20230359695

Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.

Type: Application

Filed: July 17, 2023

Publication date: November 9, 2023

Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr

SEAMLESS PLACE AND ROUTE FOR HETEROGENOUS NETWORK OF PROCESSOR CORES

Publication number: 20230325183

Abstract: Methods and systems related to parallel computing using heterogenous networks of computational nodes are disclosed herein. A method for executing a complex computation on a heterogenous set of computational nodes linked together by a set of links in a network is disclosed. The method includes compiling, using a table of bandwidth values for the set of links in the network, a set of instructions for routing data for the execution of the complex computation. The method also includes configuring a set of programmable controllers on the heterogenous set of computational nodes with the set of instructions. The method also includes executing the set of instructions using the set of programmable controllers. The method also includes routing data through the network, to facilitate the execution of the complex computation by the heterogenous set of computational nodes, and in response to the execution of the instructions.

Type: Application

Filed: April 11, 2022

Publication date: October 12, 2023

Applicant: Tenstorrent Inc.

Inventors: Jasmina Vasiljevic, Ljubisa Bajic, Davor Capalija, Stanislav Sokorac

SPARSITY UNIFORMITY ENFORCEMENT FOR MULTICORE PROCESSOR

Publication number: 20230325160

Abstract: Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.

Type: Application

Filed: May 25, 2023

Publication date: October 12, 2023

Inventors: Ljubisa Bajic, Davor Capalija, Yu Ting Chen, Andrew Grebenisan, Hassan Farooq, Akhmed Rakhmati, Stephen Chin, Vladimir Blagojevic, Almeet Bhullar, Jasmina Vasiljevic
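
The two steps named in the abstract above, determining a sparsity distribution and balancing it among the processing cores, can be sketched in software as follows. This is only an illustrative model: the use of per-row non-zero counts as the sparsity measure, the greedy densest-first placement, and the name `balance_by_sparsity` are assumptions, not details taken from the filing.

```python
import heapq


def balance_by_sparsity(rows, num_cores):
    """Assign each row of operand data to a core, balancing non-zero counts.

    Returns (assignment, loads): assignment[c] lists the row indices given
    to core c, and loads[c] is the total number of non-zeros on that core.
    """
    # Step 1: sparsity distribution, modeled as the non-zero count per row.
    nnz = [sum(1 for v in row if v != 0) for row in rows]

    # Step 2: place the densest rows first, always on the least-loaded core.
    heap = [(0, core) for core in range(num_cores)]  # (load, core id)
    assignment = [[] for _ in range(num_cores)]
    for idx in sorted(range(len(rows)), key=lambda i: -nnz[i]):
        load, core = heapq.heappop(heap)
        assignment[core].append(idx)
        heapq.heappush(heap, (load + nnz[idx], core))

    loads = [sum(nnz[i] for i in on_core) for on_core in assignment]
    return assignment, loads
```

A greedy heuristic is used here because it is short and easy to follow; it keeps the per-core loads close to each other but does not guarantee an optimal split.
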
OVERLAY LAYER HARDWARE UNIT FOR NETWORK OF PROCESSOR CORES

Publication number: 20230281155

Abstract: Methods and systems for executing an application data flow graph on a set of computational nodes are disclosed. The computational nodes can each include a programmable controller from a set of programmable controllers, a memory from a set of memories, a network interface unit from a set of network interface units, and an endpoint from a set of endpoints. A disclosed method comprises configuring the programmable controllers with instructions. The method also comprises independently and asynchronously executing the instructions using the set of programmable controllers in response to a set of events exchanged between the programmable controllers themselves, between the programmable controllers and the network interface units, and between the programmable controllers and the set of endpoints. The method also comprises transitioning data in the set of memories on the computational nodes in accordance with the application data flow graph and in response to the execution of the instructions.

Type: Application

Filed: May 11, 2023

Publication date: September 7, 2023

Applicant: Tenstorrent Inc.

Inventors: Ivan Matosevic, Davor Capalija, Jasmina Vasiljevic, Utku Aydonat, S. Alexander Chin, Djordje Maksimovic, Ljubisa Bajic

Overlay layer hardware unit for network of processor cores

Patent number: 11734224

Abstract: Methods and systems for executing an application data flow graph on a set of computational nodes are disclosed. The computational nodes can each include a programmable controller from a set of programmable controllers, a memory from a set of memories, a network interface unit from a set of network interface units, and an endpoint from a set of endpoints. A disclosed method comprises configuring the programmable controllers with instructions. The method also comprises independently and asynchronously executing the instructions using the set of programmable controllers in response to a set of events exchanged between the programmable controllers themselves, between the programmable controllers and the network interface units, and between the programmable controllers and the set of endpoints. The method also comprises transitioning data in the set of memories on the computational nodes in accordance with the application data flow graph and in response to the execution of the instructions.

Type: Grant

Filed: September 28, 2020

Date of Patent: August 22, 2023

Assignee: Tenstorrent Inc.

Inventors: Ivan Matosevic, Davor Capalija, Jasmina Vasiljevic, Utku Aydonat, S. Alexander Chin, Djordje Maksimovic, Ljubisa Bajic

RUNTIME PREDICTORS FOR COMPUTATION REDUCTION IN DEPENDENT COMPUTATIONS

Publication number: 20230259579

Abstract: Methods and systems relating to reducing the number of computations required to execute an artificial neural network (ANN) are disclosed herein. A disclosed method includes: generating a summary of a set of data which is an input for a composite computation; executing a simplified composite computation, using the summary, to produce a simplified output; and executing a second simplified composite computation, using the simplified output, to produce a second simplified output which is a predictor. The second simplified composite computation is a simplification of a second composite computation. The composite computations are both part of a complex computation for the directed graph. The second composite computation depends on the composite computation in the directed graph. The method further includes suppressing, while executing the complex computation, a set of component computations from the second composite computation. The set of component computations are selected for suppression based on the predictor.

Type: Application

Filed: January 31, 2022

Publication date: August 17, 2023

Applicant: Tenstorrent Inc.

Inventors: Ljubisa Bajic, Davor Capalija, Yu Ting Chen, Andrew Grebenisan, Hassan Farooq, Akhmed Rakhmati, Stephen Chin, Vladimir Blagojevic, Almeet Bhullar, Jasmina Vasiljevic

PROCESSOR CORES USING CONTENT OBJECT IDENTIFIERS FOR ROUTING AND COMPUTATION

Publication number: 20230236831

Abstract: Processor cores using content object identifiers for routing and computation are disclosed. One method includes executing a complex computation using a set of processing cores. The method includes routing a set of content objects using a set of content object identifiers and executing a set of instructions. The set of instructions are defined using a set of operand identifiers. The operand identifiers represent content object identifiers in the set of content object identifiers. The content objects can be routed according to a named data networking (NDN) or content-centric networking (CCN) paradigm with the content object identifiers mentioned above serving as the names for the computation data being routed by the network.

Type: Application

Filed: March 31, 2023

Publication date: July 27, 2023

Inventors: Davor Capalija, Ljubisa Bajic, Jasmina Vasiljevic, Yongbum Kim

Sparsity uniformity enforcement for multicore processor

Patent number: 11709662

Abstract: Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.

Type: Grant

Filed: November 5, 2021

Date of Patent: July 25, 2023

Assignee: Tenstorrent Inc.

Inventors: Ljubisa Bajic, Davor Capalija, Yu Ting Chen, Andrew Grebenisan, Hassan Farooq, Akhmed Rakhmati, Stephen Chin, Vladimir Blagojevic, Almeet Bhullar, Jasmina Vasiljevic

Sparsity uniformity enforcement for multicore processor

Patent number: 11693639

Abstract: Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.

Type: Grant

Filed: November 5, 2021

Date of Patent: July 4, 2023

Assignee: Tenstorrent Inc.

Inventors: Ljubisa Bajic, Davor Capalija, Yu Ting Chen, Andrew Grebenisan, Hassan Farooq, Akhmed Rakhmati, Stephen Chin, Vladimir Blagojevic, Almeet Bhullar, Jasmina Vasiljevic

RUNTIME PREDICTORS FOR NEURAL NETWORK COMPUTATION REDUCTION

Publication number: 20230196124

Abstract: Methods and systems relating to reducing the number of computations required to execute an artificial neural network (ANN) are disclosed herein. The methods include a computer-implemented method conducted during an execution of an ANN. The method includes generating a set of execution data, generating a summary of a set of neural network data of the ANN, generating a summary of a set of execution data of the execution of the ANN, generating a prediction using the summary of the set of neural network data and the summary of the set of execution data, and executing a composite computation. The composite computation is required for the execution of the ANN. The method also includes suppressing a set of component computations of the composite computation. The set of suppressed component computations is at least partly determined by the prediction.

Type: Application

Filed: December 22, 2021

Publication date: June 22, 2023

Applicant: Tenstorrent Inc.

Inventors: Ljubisa Bajic, Davor Capalija, Yu Ting Chen, Andrew Grebenisan, Hassan Farooq, Akhmed Rakhmati, Stephen Chin, Vladimir Blagojevic, Almeet Bhullar, Jasmina Vasiljevic

COMPUTATIONAL CIRCUIT WITH HIERARCHICAL ACCUMULATOR

Publication number: 20230177106

Abstract: Methods and systems relating to computational circuitry are disclosed herein. A disclosed computational circuit includes a math circuit, a first accumulator, and a second accumulator. The first accumulator has a first memory. The second accumulator has a second memory. The first accumulator is communicatively connected to the math circuit and accumulates values from the math circuit in the first memory. The second accumulator is communicatively connected to the first memory and accumulates values from the first memory in the second memory. The first memory is faster and smaller than the second memory.

Type: Application

Filed: December 8, 2021

Publication date: June 8, 2023

Applicant: Tenstorrent Inc.

Inventors: Davor Capalija, Ljubisa Bajic, Alex Cejkov
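
The two-level accumulation described in the abstract above can be modeled in software as follows. This is a behavioral sketch only, not the disclosed circuit: treating the math circuit as a multiplier, using a single-value first memory, and the `chunk` parameter and function name `hierarchical_dot` are all assumptions made for illustration.

```python
def hierarchical_dot(a_rows, b, chunk):
    """Compute dot(row, b) for every row using two accumulation levels."""
    # Larger second memory: one running total per output value.
    second = [0] * len(a_rows)
    for out, row in enumerate(a_rows):
        for start in range(0, len(row), chunk):
            # Small first memory: a single partial sum for the current chunk.
            first = 0
            pairs = zip(row[start:start + chunk], b[start:start + chunk])
            for x, y in pairs:
                # First accumulator: collects values from the math circuit.
                first += x * y
            # Second accumulator: collects values from the first memory.
            second[out] += first
    return second
```

In this model the second memory is written once per chunk rather than once per product, which mirrors the abstract's pairing of a small, fast first memory with a larger second memory.
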
Sparsity Uniformity Enforcement for Multicore Processor

Publication number: 20230146541

Abstract: Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.

Type: Application

Filed: November 5, 2021

Publication date: May 11, 2023

Applicant: Tenstorrent Inc.

Inventors: Ljubisa Bajic, Davor Capalija, Yu Ting Chen, Andrew Grebenisan, Hassan Farooq, Ahmed Rakhmati, Stephen Chin, Vladimir Blagojevic, Almeet Bhullar, Jasmina Vasiljevic

Sparsity Uniformity Enforcement for Multicore Processor

Publication number: 20230143538

Abstract: Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.

Type: Application

Filed: November 5, 2021

Publication date: May 11, 2023

Applicant: Tenstorrent Inc.

Inventors: Ljubisa Bajic, Davor Capalija, Yu Ting Chen, Andrew Grebenisan, Hassan Farooq, Ahmed Rakhmati, Stephen Chin, Vladimir Blagojevic, Almeet Bhullar, Jasmina Vasiljevic

DATA STRUCTURE OPTIMIZED DEDICATED MEMORY CACHES

Publication number: 20230062891

Abstract: Methods and systems associated with caches are disclosed. One disclosed system includes at least one memory storing at least two data structures. The at least two data structures include a first data structure and a second data structure. The system also includes at least two caches with a first cache which caches the first data structure and a second cache which caches the second data structure. The system also includes a controller communicatively coupled to the at least two caches. The controller separately configures the first cache based on the first data structure and the second cache based on the second data structure. The system also comprises at least one processor communicatively coupled to the at least two caches. The processor accesses each of the at least two data structures using the at least two caches and during the execution of a complex computation.

Type: Application

Filed: November 7, 2022

Publication date: March 2, 2023

Inventors: Ljubisa Bajic, Davor Capalija, Ivan Matosevic, Alex Cejkov

Memory-Size- and Bandwidth-Efficient Method for Feeding Systolic Array Matrix Multipliers

Publication number: 20230064381

Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. A bandwidth of external memory may be reduced by a factor of reduction based on interleaving of the matrix data via a feeding pattern of the column feeder array and the row feeder array.

Type: Application

Filed: May 9, 2022

Publication date: March 2, 2023

Inventors: Jack Z. Yinger, Andrew Ling, Tomasz Czajkowski, Davor Capalija, Eriko Nurvitadhi, Deborah Marr

OVERLAY LAYER FOR NETWORK OF PROCESSOR CORES

Publication number: 20230041130

Abstract: Methods and systems related to the efficient execution of complex computations by a multicore processor and the movement of data among the various processing cores in the multicore processor are disclosed. A multicore processor includes a set of processing cores and associated sets of processing pipelines, core controllers, routers, and network interface units. The multicore processor also includes a computation layer, for conducting computations using the set of processing cores, with executable instructions for the set of processing pipelines which are executed by the set of core controllers. The multicore processor also includes a network-on-chip layer, for connecting the set of processing cores in the multicore processor, with executable instructions for the set of routers and the set of network interface units.

Type: Application

Filed: September 14, 2022

Publication date: February 9, 2023

Inventors: Davor Capalija, Ivan Matosevic, Jasmina Vasiljevic, Utku Aydonat, Andrew Lewycky, S. Alexander Chin, Ljubisa Bajic, Alex Cejkov, Milos Trajkovic

Data structure optimized dedicated memory caches

Patent number: 11520701

Abstract: Methods and systems associated with caches are disclosed. One disclosed system includes at least one memory storing at least two data structures. The at least two data structures include a first data structure and a second data structure. The system also includes at least two caches with a first cache which caches the first data structure and a second cache which caches the second data structure. The system also includes a controller communicatively coupled to the at least two caches. The controller separately configures the first cache based on the first data structure and the second cache based on the second data structure. The system also comprises at least one processor communicatively coupled to the at least two caches. The processor accesses each of the at least two data structures using the at least two caches and during the execution of a complex computation.

Type: Grant

Filed: April 2, 2021

Date of Patent: December 6, 2022

Assignee: Tenstorrent Inc.

Inventors: Ljubisa Bajic, Davor Capalija, Ivan Matosevic, Alex Cejkov
