Patents Assigned to Graphcore Limited
  • Publication number: 20200225960
    Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
    Type: Application
    Filed: May 22, 2019
    Publication date: July 16, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
  • Publication number: 20200218523
    Abstract: A method for generating a program to run on multiple tiles. The method comprises: receiving an input graph comprising data nodes, compute vertices and edges; receiving an initial tile-mapping specifying which data nodes and vertices are allocated to which tile; and determining a subgraph of the input graph that meets one or more heuristic rules. The rules comprises: the subgraph comprises at least one data node, the subgraph spans no more than a threshold number of tiles in the initial tile-mapping, and the subgraph comprises at least a minimum number of edges outputting to one or more vertices on one or more other tiles. The method further comprises adapting the initial mapping to migrate the data nodes and any vertices of the determined subgraph to said one or more other tiles, and compiling the executable program from the graph with the vertices and data nodes allocated by the adapted mapping.
    Type: Application
    Filed: February 15, 2019
    Publication date: July 9, 2020
    Applicant: Graphcore Limited
    Inventors: Mark Lloyd Pupilli, David Lacey
  • Patent number: 10705998
    Abstract: A processing system comprising: multiple processor modules, each comprising a respective execution unit memory; and an interconnect for exchanging data between different sets of the processor modules. A group of the processor modules operates in a series of BSP supersteps. For the exchange phase of each superstep, each receiving processor module that is to receive data from outside its own set is pre-programmed with a value representing the number of units of data to receive. Starting from the pre-programmed value, it then counts out the number of data units remaining to be received each time a data unit is received. Each receiving processor module is further arranged to perform an exchange synchronization whereby, before advancing from the exchange phase to the compute phase of the current superstep, the receiving processor module waits until no units of data remain to be received according to the count.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: July 7, 2020
    Assignee: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Alan Graham Alexander
  • Publication number: 20200210175
    Abstract: A processor comprising a barrel-threaded execution unit for executing concurrent threads, and one or more register files comprising a respective set of context registers for each concurrent thread. One of the register files further comprises a set of shared weights registers common to some or all of the concurrent threads. The types of instruction defined in the instruction set of the processor include an arithmetic instruction having operands specifying a source and a destination from amongst a respective set of arithmetic registers of the thread in which the arithmetic instruction is executed. The execution unit is configured so as, in response to the opcode of the arithmetic instruction, to perform an operation comprising multiplying an input from the source by at least one of the weights from at least one of the shared weights registers, and to place a result in the destination.
    Type: Application
    Filed: February 15, 2019
    Publication date: July 2, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
  • Publication number: 20200210192
    Abstract: A processor comprising: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.
    Type: Application
    Filed: February 15, 2019
    Publication date: July 2, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore, Jonathan Louis Ferguson
  • Publication number: 20200210187
    Abstract: A processor having an instruction set including a load-store instruction having operands specifying, from amongst the registers in at least one register file, a respective destination of each of two load operations, a respective source of a store operation, and a pair of address registers arranged to hold three memory addresses, the three memory addresses being a respective load address for each the two load operations and a respective store address for the store operation. The load-store instruction further includes three immediate stride operands each specifying a respective stride value for each of the two load addresses and one store address, wherein at least some possible values of each immediate stride operand specify the respective stride value by specifying one of a plurality of fields within a stride register in one of the one or more register files, each field holding a different stride value.
    Type: Application
    Filed: February 15, 2019
    Publication date: July 2, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
  • Publication number: 20200210364
    Abstract: A processing system comprising: multiple processor modules, each comprising a respective execution unit memory; and an interconnect for exchanging data between different sets of the processor modules. A group of the processor modules operates in a series of BSP supersteps. For the exchange phase of each superstep, each receiving processor module that is to receive data from outside its own set is pre-programmed with a value representing the number of units of data to receive. Starting from the pre-programmed value, it then counts out the number of data units remaining to be received each time a data unit is received. Each receiving processor module is further arranged to perform an exchange synchronization whereby, before advancing from the exchange phase to the compute phase of the current superstep, the receiving processor module waits until no units of data remain to be received according to the count.
    Type: Application
    Filed: February 15, 2019
    Publication date: July 2, 2020
    Applicant: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Alan Graham Alexander
  • Publication number: 20200201636
    Abstract: A multi-tile processing system has a plurality of tiles each having an execution unit, and an interconnect operable to conduct communications between a group of the tiles according to a bulk synchronous parallel scheme. The execution unit is operable to execute instructions of an instruction set which has a synchronisation instruction for execution by each tile upon completion of its compute phase. The execution of the synchronisation instruction depends on the state of an exception enable flag. In one state, the synchronisation instruction causes the execution unit to send the synchronisation request to hardware logic in the interconnect. In another state of the exception enable flag the synchronisation instruction does not send the synchronisation request, but sets an exception events status to permit interrogation access to the tile. A corresponding method of controlling the debug states of the processing system is provided.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Matthew David Fyles
  • Publication number: 20200201600
    Abstract: A method and apparatus for handling overflow conditions resulting from arithmetic operations involving floating point numbers. An indication is stored as part of a thread's context indicating one of two possible modes for handling overflow conditions. In a first mode, a result of an arithmetic operation is set to the limit representable in the floating point format. In a second mode, a result of an arithmetic operation is set to a NaN.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Edward Andrews, Stephen Felix, Mrudula Chidambar Gore
  • Publication number: 20200201810
    Abstract: A method of recording tile identifiers in each of a plurality of tiles of a multitile processor is described. Tiles are arranged in columns, each column having a plurality of processing circuits, each processing circuit comprising one or more tiles, wherein a base processing circuit in each column is connected to a set of processing circuit identifier wires. A base value is generated on each of the set of processing circuit identifier wires for the base processing circuit in each column. At the base processing circuit, the base value on the set of processing circuit identifier wires is read and incremented by one. The incremented value is propagated to a next processing circuit in the column, and at the next processing circuit a unique identifier is recorded by concatenating an identifier of the column and the incremented value.
    Type: Application
    Filed: May 22, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Stephen Felix, Jonathan Mangnall
  • Publication number: 20200201811
    Abstract: A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets.
    Type: Application
    Filed: May 22, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Stephen Felix, Simon Christian Knowles
  • Publication number: 20200201601
    Abstract: A function approximation system is disclosed for determining output floating point values of functions calculated using floating point numbers. Complex functions have different shapes in different subsets of their input domain, making them difficult to predict for different values of the input variable. The function approximation system comprises an execution unit configured to determine corresponding values of a given function given a floating point input to the function; a plurality of look up tables for each function type; a correction table of values which determines if corrections to the output value are required; and a table selector for finding an appropriate table for a given function.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Jonathan Mangnall, Stephen Felix
  • Publication number: 20200201704
    Abstract: A processor comprises a plurality of processing units, wherein there is a fixed transmission time for transmitting a message from a sending processing unit to a receiving processing unit, based on the physical positions of the sending and receiving processing units in the processor. The processing units are arranged in a column, and the fixed transmission time depends on the position of a processing circuit in the column. An exchange fabric is provided for exchanging messages between sending and receiving processing units, the columns being arranged with respect to the exchange fabric such that the fixed transmission time depends on the distances of the processing circuits with respect to the exchange fabric. The processor comprises at least one delay stage for each processing circuit and switching circuitry for selectively switching the delay stage into or out of a communication path involved in message exchange.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventor: Stephen Felix
  • Publication number: 20200201794
    Abstract: The present disclosure relates to a method of scheduling messages to be exchanged between tiles in a computer where there is a fixed transmission time between sending and receiving tiles. According to the method a total size of message data to be sent or received by each tile is determined. One of the tiles is selected based at least on the size of the message data to schedule a first message. The first message to be scheduled is selected from the set of messages on that tile. In order to schedule the message the other end points of this selected message are determined, and then respective time slots are allocated at the sending and receiving tiles for that message. The size of the selected message is then deducted from each of the tiles acting as end points for the message, and then the sequence is carried out again until all messages have been scheduled. This technique optimises message exchange in an exchange phase of a BSP system.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Richard Luke Southwell Osborne, Stephen Felix
  • Publication number: 20200201604
    Abstract: A pseudo random number generator implemented in hardware. The pseudo random number generator comprises a state post processing circuit for processing two state values to produce a random number. The circuit having a first combinatorial logic comprising a XOR or XNOR gate configured to process a first pair of bits from the state values, a second combinatorial logic comprising an OR or AND gate configured to process a second pair of bits from the state value, and third combinatorial logic comprising an OR or AND gate configured or process a third pair of bits from the state value. The circuit has fourth combinatorial logic configured to process the outputs of the first three set of combinatorial logic so as to provide a result bit of the random number. The fourth combinatorial logic comprises an AND or OR gate and a XOR or XNOR gate.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Stephen Felix, James William Hanlon
  • Publication number: 20200201652
    Abstract: A multitile processing system has an execution unit on each tile, and an interconnect which conducts communications between the tiles according to a bulk synchronous parallel scheme. Each tile performs an on-tile compute phase followed by an intertile exchange phase, where the exchange phase is held back until all tiles in a particular group have completed the compute phase. On completion of the compute phase, each tile generates a synchronisation request and pauses an issue of instructions until it receives a synchronisation acknowledgement. If a tile attains an excepted state, it raises an exception signal and pauses instruction issue until the excepted state has been resolved. However, tiles which are not in the excepted state can continue to perform their on-tile computer phase, and will issue their own synchronisation request in their own normal time frame.
    Type: Application
    Filed: May 22, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Matthew David Fyles
  • Patent number: 10691432
    Abstract: A method for generating a program to run on multiple tiles. The method comprises: receiving an input graph comprising data nodes, compute vertices and edges; receiving an initial tile-mapping specifying which data nodes and vertices are allocated to which tile; and determining a subgraph of the input graph that meets one or more heuristic rules. The rules comprises: the subgraph comprises at least one data node, the subgraph spans no more than a threshold number of tiles in the initial tile-mapping, and the subgraph comprises at least a minimum number of edges outputting to one or more vertices on one or more other tiles. The method further comprises adapting the initial mapping to migrate the data nodes and any vertices of the determined subgraph to said one or more other tiles, and compiling the executable program from the graph with the vertices and data nodes allocated by the adapted mapping.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: June 23, 2020
    Assignee: Graphcore Limited
    Inventors: Mark Lloyd Pupilli, David Lacey
  • Patent number: 10628377
    Abstract: A processing system comprising an arrangement of tiles and synchronization logic in the form of hardware logic for coordinating between a group of some or all of said tiles. The instruction set comprises a synchronization instruction which causes an instance of a synchronization request to be transmitted from the respective tile to the synchronization logic, and suspends instruction issue on the respective tile pending a synchronization acknowledgement. In response to receiving an instance of the synchronization request from all of the tiles of the group, the synchronization logic returns the synchronization acknowledgment back to each of the tiles in the group to allow the instruction issue to resume. The instruction set further comprises an abstain instruction, which sends an instance of the synchronization request but does not suspend instruction issue on the respective tile pending the synchronization acknowledgement, instead allowing the instruction issue on the respective tile to continue.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: April 21, 2020
    Assignee: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
  • Patent number: 10613833
    Abstract: The present relates to invention deals with an execution unit configured to execute a computer program instruction to generate random numbers based on a predetermined probability distribution. The execution unit comprises a hardware pseudorandom number generator configured to generate at least randomised bit string on execution of the instruction and adding circuitry which is configured to receive a number of bit sequences of a predetermined bit length selected from the randomised bit string and to sum them to produce a result.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: April 7, 2020
    Assignee: Graphcore Limited
    Inventors: Stephen Felix, Godfrey Da Costa
  • Patent number: 10606641
    Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set being represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.
    Type: Grant
    Filed: October 19, 2018
    Date of Patent: March 31, 2020
    Assignee: Graphcore Limited
    Inventor: Simon Christian Knowles