Patents by Inventor Ljubisa Bajic

Ljubisa Bajic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230041130
    Abstract: Methods and systems related to the efficient execution of complex computations by a multicore processor and the movement of data among the various processing cores in the multicore processor are disclosed. A multicore processor includes a set of processing cores and associated sets of processing pipelines, core controllers, routers, and network interface units. The multicore processor also includes a computation layer, for conducting computations using the set of processing cores, with executable instructions for the set of processing pipelines which are executed by the set of core controllers. The multicore processor also includes a network-on-chip layer, for connecting the set of processing cores in the multicore processor, with executable instructions for the set of routers and the set of network interface units.
    Type: Application
    Filed: September 14, 2022
    Publication date: February 9, 2023
    Inventors: Davor Capalija, Ivan Matosevic, Jasmina Vasiljevic, Utku Aydonat, Andrew Lewycky, S. Alexander Chin, Ljubisa Bajic, Alex Cejkov, Milos Trajkovic
  • Patent number: 11520701
    Abstract: Methods and systems associated with caches are disclosed. One disclosed system includes at least one memory storing at least two data structures. The at least two data structures include a first data structure and a second data structure. The system also includes at least two caches with a first cache which caches the first data structure and a second cache which caches the second data structure. The system also includes a controller communicatively coupled to the at least two caches. The controller separately configures the first cache based on the first data structure and the second cache based on the second data structure. The system also comprises at least one processor communicatively coupled to the at least two caches. The processor accesses each of the at least two data structures using the at least two caches and during the execution of a complex computation.
    Type: Grant
    Filed: April 2, 2021
    Date of Patent: December 6, 2022
    Assignee: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Davor Capalija, Ivan Matosevic, Alex Cejkov
  • Patent number: 11467846
    Abstract: Methods and systems related to the efficient execution of complex computations by a multicore processor and the movement of data among the various processing cores in the multicore processor are disclosed. A multicore processor stack for the multicore processor can include a computation layer, for conducting computations using the processing cores in the multicore processor, with executable instructions for processing pipelines in the processing cores. The multicore processor stack can also include a network-on-chip layer, for connecting the processing cores in the multicore processor, with executable instructions for routers and network interface units in the multicore processor. The computation layer and the network-on-chip layer can be logically isolated by a network-on-chip overlay layer.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: October 11, 2022
    Assignee: Tenstorrent Inc.
    Inventors: Davor Capalija, Ivan Matosevic, Jasmina Vasiljevic, Utku Aydonat, Andrew Lewycky, S. Alexander Chin, Ljubisa Bajic
  • Publication number: 20220318614
    Abstract: Methods and systems for the for the accelerated execution of a directed graph are disclosed. The execution can involve the generation of an inference from a set of inputs provided to an artificial neural network. In a specific example, a method for executing a directed graph includes receiving at least two batches of indices. The batches of indices, when used to access a set of embeddings, provide at least two batches of embedding outputs and execute a layer of the directed graph. The method further includes accessing the set of embeddings using the at least two batches of indices. The method further includes rearranging, based on a set of latencies for the accessing step, the at least two batches of embedding outputs into at least two batches or rearranged embeddings. The method further includes providing the at least two batches of rearranged embeddings to a subsequent layer of the directed graph.
    Type: Application
    Filed: April 2, 2021
    Publication date: October 6, 2022
    Applicant: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Davor Capalija, Ivan Matosevic, Alex Cejkov
  • Publication number: 20220318144
    Abstract: Methods and systems associated with caches are disclosed. One disclosed system includes at least one memory storing at least two data structures. The at least two data structures include a first data structure and a second data structure. The system also includes at least two caches with a first cache which caches the first data structure and a second cache which caches the second data structure. The system also includes a controller communicatively coupled to the at least two caches. The controller separately configures the first cache based on the first data structure and the second cache based on the second data structure. The system also comprises at least one processor communicatively coupled to the at least two caches. The processor accesses each of the at least two data structures using the at least two caches and during the execution of a complex computation.
    Type: Application
    Filed: April 2, 2021
    Publication date: October 6, 2022
    Applicant: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Davor Capalija, Ivan Matosevic, Alex Cejkov
  • Publication number: 20220245009
    Abstract: Methods and systems for the for executing an application data flow graph using a network of computational nodes are disclosed. In specific examples, the network of computational nodes can be a network-on-chip for a multicore processor. One method includes transitioning first application data from a first source computational node to an intermediary computational node. The method can also include providing second application data, from a computation layer of the network of computational nodes, on the intermediary computational node. The method can also include multicasting the first application data in combination with the second application data from the intermediary computational node to at least two destination computational nodes. The first source computational node, the intermediary computational node, and the at least two destination computational nodes are all in the network of computational nodes.
    Type: Application
    Filed: January 29, 2021
    Publication date: August 4, 2022
    Applicant: Tenstorrent Inc.
    Inventors: Jasmina Vasiljevic, Davor Capalija, Zahi Moudallal, Utku Aydonat, Joseph Chu, S. Alexander Chin, Ljubisa Bajic
  • Publication number: 20220222086
    Abstract: Processing cores with the ability to suppress operations based on a contribution estimate for those operations for purposes of increasing the overall performance of the core are disclosed. Associated methods that can be conducted by such processing cores are also disclosed. One such method includes generating a reference value for a composite computation. A complete execution of the composite computation generates a precise output and requires execution of a set of component computations. The method also includes generating a component computation approximation. The method also includes evaluating the component computation approximation with the reference value. The method also includes executing a partial execution of the composite computation using the component computation approximation to produce an estimated output.
    Type: Application
    Filed: March 28, 2022
    Publication date: July 14, 2022
    Applicant: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Milos Trajkovic, Ivan Hamer, Syed Gilani
  • Publication number: 20220188106
    Abstract: Processor cores using packet identifiers for routing and computation are disclosed. One method includes executing a complex computation using a set of processing cores. The method includes routing a set of packets using a set of packet identifiers and executing a set of instructions. The set of instructions are defined using a set of operand identifiers. The operand identifiers represent packet identifiers in the set of packet identifiers. In specific implementations the set of the operand identifiers represent packet identifiers in the set of packet identifiers in that a set of memories on the set of processing cores stores data values in common association with both the set of packets, and a set of operands identified by the set of operand identifiers. In specific implementations the set of operand identifiers and packet identifiers are unambiguously mapped to an underlying set of application datums of the complex computation.
    Type: Application
    Filed: March 3, 2022
    Publication date: June 16, 2022
    Applicant: Tenstorrent Inc.
    Inventors: Davor Capalija, Ljubisa Bajic, Jasmina Vasiljevic
  • Patent number: 11301264
    Abstract: Processing cores with the ability to suppress operations based on a contribution estimate for those operations for purposes of increasing the overall performance of the core are disclosed. Associated methods that can be conducted by such processing cores are also disclosed. One such method includes generating a reference value for a composite computation. A complete execution of the composite computation generates a precise output and requires execution of a set of component computations. The method also includes generating a component computation approximation. The method also includes evaluating the component computation approximation with the reference value. The method also includes executing a partial execution of the composite computation using the component computation approximation to produce an estimated output.
    Type: Grant
    Filed: February 11, 2020
    Date of Patent: April 12, 2022
    Assignee: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Milos Trajkovic, Ivan Hamer, Syed Gilani
  • Publication number: 20220100503
    Abstract: Methods and systems for executing an application data flow graph on a set of computational nodes are disclosed. The computational nodes can each include a programmable controller from a set of programmable controllers, a memory from a set of memories, a network interface unit from a set of network interface units, and an endpoint from a set of endpoints. A disclosed method comprises configuring the programmable controllers with instructions. The method also comprises independently and asynchronously executing the instructions using the set of programmable controllers in response to a set of events exchanged between the programmable controllers themselves, between the programmable controllers and the network interface units, between the programmable controllers and the set of endpoints. The method also comprises transitioning data in the set of memories on the computational nodes in accordance with the application data flow graph and in response to the execution of the instructions.
    Type: Application
    Filed: September 28, 2020
    Publication date: March 31, 2022
    Applicant: Tenstorrent Inc.
    Inventors: Ivan Matosevic, Davor Capalija, Jasmina Vasiljevic, Utku Aydonat, S. Alexander Chin, Djordje Maksimovic, Ljubisa Bajic
  • Patent number: 11269628
    Abstract: Processor cores using packet identifiers for routing and computation are disclosed. One method includes executing a complex computation using a set of processing cores. The method includes routing a set of packets using a set of packet identifiers and executing a set of instructions. The set of instructions are defined using a set of operand identifiers. The operand identifiers represent packet identifiers in the set of packet identifiers. In specific implementations the set of the operand identifiers represent packet identifiers in the set of packet identifiers in that a set of memories on the set of processing cores stores data values in common association with both the set of packets, and a set of operands identified by the set of operand identifiers. In specific implementations the set of operand identifiers and packet identifiers are unambiguously mapped to an underlying set of application datums of the complex computation.
    Type: Grant
    Filed: June 15, 2020
    Date of Patent: March 8, 2022
    Assignee: Tenstorrent Inc.
    Inventors: Davor Capalija, Ljubisa Bajic, Jasmina Vasiljevic
  • Patent number: 11245643
    Abstract: Methods and systems related to speculative resource allocation for routing on an interconnect fabric are disclosed herein. One disclosed method includes speculatively allocating a collection of resources to support a set of paths through an interconnect fabric. The method also includes aggregating a set of responses from the set of paths at a branch node on the set of paths. If a resource contention is detected, the set of responses will include an indicator of a resource contention. The method will then further include transmitting, from the branch node and in response to the indicator of the resource contention, a deallocate message downstream and the indicator of the resource contention upstream, and reallocating resources for the multicast after a hold period.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: February 8, 2022
    Assignee: Tenstorrent Inc.
    Inventors: Ivan Matosevic, Ljubisa Bajic
  • Publication number: 20210382716
    Abstract: A processing core and associated methods for the efficient execution of a directed graph are disclosed. A disclosed processing core includes a memory and a first data tile stored in the memory. The first data tile includes a first set of data elements and metadata stored in association with the first set of data elements. The processing core also includes a second data tile stored in the memory. The second data tile includes a second set of data elements. The processing core also includes an arithmetic logic unit configured to conduct an arithmetic logic operation using data from the first set of data elements and the second set of data elements. The processing core also includes a control unit configured to evaluate the metadata and control the arithmetic logic unit to conditionally execute the arithmetic logic operation based on the evaluation of the metadata.
    Type: Application
    Filed: August 23, 2021
    Publication date: December 9, 2021
    Applicant: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Milos Trajkovic, Ivan Hamer, Lejla Bajic, Aleksandar Cejkov
  • Publication number: 20210367905
    Abstract: Methods and systems related to speculative resource allocation for routing on an interconnect fabric are disclosed herein. One disclosed method includes speculatively allocating a collection of resources to support a set of paths through an interconnect fabric. The method also includes aggregating a set of responses from the set of paths at a branch node on the set of paths. If a resource contention is detected, the set of responses will include an indicator of a resource contention. The method will then further include transmitting, from the branch node and in response to the indicator of the resource contention, a deallocate message downstream and the indicator of the resource contention upstream, and reallocating resources for the multicast after a hold period.
    Type: Application
    Filed: May 20, 2020
    Publication date: November 25, 2021
    Applicant: Tenstorrent Inc.
    Inventors: Ivan Matosevic, Ljubisa Bajic
  • Patent number: 11113051
    Abstract: A processing core and associated methods for the efficient execution of a directed graph are disclosed. A disclosed processing core comprises a memory and a first data tile stored in the memory. The first data tile includes a first set of data elements and metadata stored in association with the first set of data elements. The processing core also comprises a second data tile stored in the memory. The second data tile includes a second set of data elements. The processing core also comprises an arithmetic logic unit configured to conduct an arithmetic logic operation using data from the first set of data elements and the second set of data elements. The processing core also comprises a control unit configured to evaluate the metadata and control the arithmetic logic unit to conditionally execute the arithmetic logic operation based on the evaluation of the metadata.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: September 7, 2021
    Assignee: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Milos Trajkovic, Ivan Hamer, Lejla Bajic, Aleksandar Cejkov
  • Publication number: 20210271450
    Abstract: Processing cores with data associative adaptive rounding and associated methods are disclosed herein. One disclosed processing core comprises an arithmetic logic unit cluster configured to generate a value for a unit of directed graph data using input directed graph data, a comparator coupled to a threshold register and a data register, a core controller configured to load a threshold value into the threshold register when the value for the unit of directed graph data is loaded into the data register, and a rounding circuit. The rounding circuit is configured to receive the value for the unit of directed graph data from the arithmetic logic unit cluster and conditionally round the value for the unit of directed graph data based on a comparator output from the comparator.
    Type: Application
    Filed: May 17, 2021
    Publication date: September 2, 2021
    Applicant: Tenstorrent Inc
    Inventors: Ljubisa Bajic, Alex Cejkov, Lejla Bajic
  • Patent number: 11010132
    Abstract: Processing cores with data associative adaptive rounding and associated methods are disclosed herein. One disclosed processing core comprises an arithmetic logic unit cluster configured to generate a value for a unit of directed graph data using input directed graph data, a comparator coupled to a threshold register and a data register, a core controller configured to load a threshold value into the threshold register when the value for the unit of directed graph data is loaded into the data register, and a rounding circuit. The rounding circuit is configured to receive the value for the unit of directed graph data from the arithmetic logic unit cluster and conditionally round the value for the unit of directed graph data based on a comparator output from the comparator.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: May 18, 2021
    Assignee: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Alex Cejkov, Lejla Bajic
  • Patent number: 10938413
    Abstract: Methods and systems regarding the rapid and efficient compression and decompression of sparse data are disclosed. One method for compressing a set of data from a sparse matrix includes, evaluating a sequence of data entries from the set of data, extracting a sequence of sparse data values from the sequence, extracting a sequence of non-sparse data value run lengths from the sequence, formulating a set of row pointers from the sequence, storing the sequence of sparse data values in a first set of memory addresses, and storing the sequence of non-sparse data value run lengths in a second set of memory addresses. The set of row pointers identify a set of rows of the sparse matrix in both the first and second sets of memory addresses. Rapid decompression can be conducted using the row pointers.
    Type: Grant
    Filed: April 17, 2020
    Date of Patent: March 2, 2021
    Assignee: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Alex Cejkov, Lejla Bajic
  • Publication number: 20210042118
    Abstract: A processing core for the efficient execution of a directed graph is disclosed. The processing core includes a memory and a first and a second data tile stored in the memory. The first and second data tiles include a first and a second set of data elements stored contiguously in the memory. The processing core also includes metadata relationally stored with the first data tile in the memory. The processing core also includes an execution engine, a control unit, and an instruction. Execution of the instruction uses the execution engine, a first data element in the first set of data elements, and a second data element in the second set of data elements. The control unit conditions execution of the instruction using the metadata. A standard execution of the instruction generates a standard output. A conditional execution of the instruction operation generates a conditionally executed output.
    Type: Application
    Filed: October 26, 2020
    Publication date: February 11, 2021
    Applicant: Tenstorrent Inc.
    Inventors: Ljubisa Bajic, Milos Trajkovic, Ivan Hamer
  • Publication number: 20210034373
    Abstract: Methods and systems related to the efficient execution of complex computations by a multicore processor and the movement of data among the various processing cores in the multicore processor are disclosed. A multicore processor stack for the multicore processor can include a computation layer, for conducting computations using the processing cores in the multicore processor, with executable instructions for processing pipelines in the processing cores. The multicore processor stack can also include a network-on-chip layer, for connecting the processing cores in the multicore processor, with executable instructions for routers and network interface units in the multicore processor. The computation layer and the network-on-chip layer can be logically isolated by a network-on-chip overlay layer.
    Type: Application
    Filed: July 29, 2020
    Publication date: February 4, 2021
    Applicant: Tenstorrent Inc.
    Inventors: Davor Capalija, Ivan Matosevic, Jasmina Vasiljevic, Utku Aydonat, Andrew Lewycky, S. Alexander Chin, Ljubisa Bajic