Patents Assigned to Graphcore Limited
  • Patent number: 11237882
    Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway comprises a streaming engine having a data mover engine and a memory management engine, the data mover engine and memory management engine being configured to execute instructions in coordination from work descriptors. The memory management engine is configured to execute instructions from the work descriptor to transfer data between external storage and the local memory associated with the gateway. The data mover engine is configured to execute instructions from the work descriptor to transfer data between the local memory associated with the gateway and the subsystem.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: February 1, 2022
    Assignee: Graphcore Limited
    Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Brian Manula, Harald Høeg
  • Patent number: 11176066
    Abstract: The present disclosure relates to a method of scheduling messages to be exchanged between tiles in a computer where there is a fixed transmission time between sending and receiving tiles. According to the method a total size of message data to be sent or received by each tile is determined. One of the tiles is selected based at least on the size of the message data to schedule a first message. The first message to be scheduled is selected from the set of messages on that tile. In order to schedule the message the other end points of this selected message are determined, and then respective time slots are allocated at the sending and receiving tiles for that message. The size of the selected message is then deducted from each of the tiles acting as end points for the message, and then the sequence is carried out again until all messages have been scheduled. This technique optimises message exchange in an exchange phase of a BSP system.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: November 16, 2021
    Assignee: Graphcore Limited
    Inventors: Richard Luke Southwell Osborne, Stephen Felix
  • Patent number: 11169777
    Abstract: A method and apparatus for handling overflow conditions resulting from arithmetic operations involving floating point numbers. An indication is stored as part of a thread's context indicating one of two possible modes for handling overflow conditions. In a first mode, a result of an arithmetic operation is set to the limit representable in the floating point format. In a second mode, a result of an arithmetic operation is set to a NaN.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: November 9, 2021
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Edward Andrews, Stephen Felix, Mrudula Chidambar Gore
  • Patent number: 11106432
    Abstract: An execution unit is described which is particularly configured to generate an exponential of an operand floating point format. The operand is multiplied by a fixed multiplicand, logged to the base 2 (e) to generate a multiplication result. An integer part and a fractional part are extracted from the multiplication result. An exponent register stores the integer part to form the exponent of the exponential result. A lookup table has a plurality of entries each providing a value of 2f for a fractional part f used to access a lookup table. The fractional part is derived from a mantissa of the operand. That is, first and second bit sequences are extracted from the mantissa. One of the bit sequences is used to generate an estimated fractional component, and the other is used to access a value from the lookup table.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: August 31, 2021
    Assignee: Graphcore Limited
    Inventors: Jonathan Mangnall, Stephen Felix
  • Patent number: 11061679
    Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: July 13, 2021
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
  • Patent number: 11048563
    Abstract: A processing system comprising: a subsystem for acting as a work accelerator to a host processor, the subsystem comprising an arrangement of tiles; and an interconnect for communicating between the tiles and connecting the subsystem to the host. The interconnect comprises synchronization logic to coordinate barrier synchronizations between a group of the tiles. The synchronization logic comprises a host sync proxy module, comprising a counter written with a number of credits by the host processor, and being configured to automatically decrement the number of credits each time one of the barrier synchronizations requiring host involvement is performed. When the number of credits in the counter is exhausted, the barrier is not released until a further write from the host to the host sync proxy module, but when the number is credits in the counter is not exhausted the barrier is released without a separate write from the host.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: June 29, 2021
    Assignee: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Matthew David Fyles, Richard Luke Southwell Osborne
  • Patent number: 11036673
    Abstract: A method of recording tile identifiers in each of a plurality of tiles of a multitile processor is described. Tiles are arranged in columns, each column having a plurality of processing circuits, each processing circuit comprising one or more tiles, wherein a base processing circuit in each column is connected to a set of processing circuit identifier wires. A base value is generated on each of the set of processing circuit identifier wires for the base processing circuit in each column. At the base processing circuit, the base value on the set of processing circuit identifier wires is read and incremented by one. The incremented value is propagated to a next processing circuit in the column, and at the next processing circuit a unique identifier is recorded by concatenating an identifier of the column and the incremented value.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: June 15, 2021
    Assignee: Graphcore Limited
    Inventors: Stephen Felix, Jonathan Mangnall
  • Patent number: 11023290
    Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: June 1, 2021
    Assignee: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
  • Patent number: 11023239
    Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: June 1, 2021
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
  • Patent number: 10970131
    Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host, the gateway enabling the transfer of batches of data to and from the subsystem at pre-compiled data exchange synchronisation points attained by the subsystem. The gateway is configured to: receive from a storage system data determined by the host to be processed by the subsystem; store a number of credits indicating the availability of data for transfer to the subsystem at each pre-compiled data exchange synchronisation point; receive a synchronisation request from the subsystem when it attains a data exchange synchronisation point; and in response to determining that the number of credits comprises a non-zero number of credits: transmit a synchronisation acknowledgment to the subsystem; and cause the received data to be transferred to the subsystem.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: April 6, 2021
    Assignee: Graphcore Limited
    Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Matthew David Fyles, Brian Manula, Harald Høeg
  • Patent number: 10963315
    Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: March 30, 2021
    Assignee: Graphcore Limited
    Inventors: David Lacey, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles
  • Patent number: 10956165
    Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set being represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: March 23, 2021
    Assignee: Graphcore Limited
    Inventor: Simon Christian Knowles
  • Patent number: 10936008
    Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: March 2, 2021
    Assignee: Graphcore Limited
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Patent number: 10922063
    Abstract: A computer system comprises a work accelerator, a gateway the transfer of data to the accelerator from external storage, the accelerator executes a first compiled code sequence to perform computations on data transferred to the accelerator from the gateway. The first compiled code sequence comprises a synchronisation instruction indicating a barrier between a compute phase in which the compute instructions are executed and an exchange phase, wherein execution of the synchronisation instruction causes an indication of a pre-compiled data exchange synchronisation point to be transferred to the gateway. The gateway comprises a streaming engine storing a second compiled code sequence in the form of a set of data transfer instructions executable by the streaming engine to perform data transfer operations to stream data through the gateway in the exchange phase, wherein the first and second compiled code sequences are generated as a related set at compile time.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: February 16, 2021
    Assignee: Graphcore Limited
    Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Brian Manula, Harald Høeg
  • Patent number: 10802536
    Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: October 13, 2020
    Assignee: Graphcore Limited
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Publication number: 20200319974
    Abstract: A system comprising: a first subsystem comprising at least one first processor, and a second subsystem comprising one or more second processors. A first program is arranged to run on the at least one first processor, the first program being configured to send data from the first subsystem to the second subsystem. A second program is arranged to run on the one more second processors, the second program being configured to operate on the data content from the first subsystem. The first program is configured to set a checkpoint at successive points in time. At each checkpoint it records in memory of the first subsystem i) a program state of the second program, comprising a state of one or more registers on each of the second processors at the time of the checkpoint, and ii) a copy of the data content sent to the second subsystem since the respective checkpoint.
    Type: Application
    Filed: May 22, 2019
    Publication date: October 8, 2020
    Applicant: Graphcore Limited
    Inventors: David Lacey, Daniel John Pelham Wilkinson
  • Publication number: 20200293278
    Abstract: An execution unit for a processor, the execution unit comprising: a look up table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the look up table using an index value dependent upon the operand to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as perform processing, by the plurality of processing units, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton Raphson iterations to the first initial estimate.
    Type: Application
    Filed: April 26, 2019
    Publication date: September 17, 2020
    Applicant: Graphcore Limited
    Inventors: Jonathan Mangnall, Stephen Felix
  • Publication number: 20200293315
    Abstract: An execution unit is described which is particularly configured to generate an exponential of an operand floating point format. The operand is multiplied by a fixed multiplicant, logged to the base 2 (e) to generate a multiplication result. An integer part and a fractional part are extracted from the multiplication result. An exponent register stores the integer part to form the exponent of the exponential result. A lookup table has a plurality of entries each providing a value of 2f for a fractional part f used to access a lookup table. The fractional part is derived from a mantissa of the operand. That is, first and second bit sequences are extracted from the mantissa. One of the bit sequences is used to generate an estimated fractional component, and the other is used to access a value from the lookup table. The estimated fractional component and a looked up value are multiplied.
    Type: Application
    Filed: May 31, 2019
    Publication date: September 17, 2020
    Applicant: Graphcore Limited
    Inventors: Jonathan Mangnall, Stephen Felix
  • Publication number: 20200293285
    Abstract: An execution unit for a processor, the execution unit comprising: a look up table; a preparatory circuit configured to determine an index value in dependence upon the operand and search the look up table using the index value to locate an entry comprising a natural logarithm associated with the index value; control circuitry configured to provide a first value determined in dependence upon the operand and a second value determined in dependence upon the operand as inputs to at least one multiplier circuit of the execution unit so as to evaluate terms of a Taylor series expansion of a natural logarithm, wherein the control circuitry is configured to provide the natural logarithm associated with the index value and the terms of the Taylor series expansion as inputs to at least one addition circuit so as to generate a mantissa of a natural logarithm of the operand.
    Type: Application
    Filed: May 31, 2019
    Publication date: September 17, 2020
    Applicant: Graphcore Limited
    Inventor: Jonathan Mangnall
  • Publication number: 20200233670
    Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
    Type: Application
    Filed: April 19, 2019
    Publication date: July 23, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore