Patents Assigned to Graphcore Limited

Scheduling tasks in a multi-threaded processor

Patent number: 10606641

Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.

Type: Grant

Filed: October 19, 2018

Date of Patent: March 31, 2020

Assignee: Graphcore Limited

Inventor: Simon Christian Knowles
Parallel computing

Patent number: 10585716

Abstract: A method for executing a computer program, the method implemented by a processor comprising a plural number of computing units and an interconnect connected to the computing units, wherein each computing unit comprises a processing unit and a memory having at least two memory ports, each port assignable to one or more respective regions of the memory, wherein the method comprises at each computing unit: performing an initial step of the program to write: an initial output value to an output region of the memory, and an initial input value to an input region of the memory; and performing a subsequent step of the program by: in a compute phase: assigning one of the two ports to both the input region and the output region; executing code sequences on the processing unit to compute an output set of one or more new output values, and writing the output set to the output region, the output set computed from the initial output and initial input values, each of which is retrieved via said one port in the compute phas

Type: Grant

Filed: February 1, 2018

Date of Patent: March 10, 2020

Assignee: Graphcore Limited

Inventor: Simon Christian Knowles
Synchronization in a multi-tile, multi-chip processing arrangement

Patent number: 10579585

Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.

Type: Grant

Filed: February 1, 2018

Date of Patent: March 3, 2020

Assignee: Graphcore Limited

Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
Controlling timing in computer processing

Patent number: 10579582

Abstract: A computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising: a switch control instruction which when executed causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined received time, the switch control instruction comprising a delay control field which holds a value defining a delay between issuance of the instruction in the sequence of instructions and its execution by the execution unit.

Type: Grant

Filed: February 1, 2018

Date of Patent: March 3, 2020

Assignee: Graphcore Limited

Inventors: Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix
Synchronization in a multi-tile processing arrangement

Patent number: 10564970

Abstract: A processing system comprising multiple tiles and an interconnect between the tiles. The interconnect is used to communicate between a group of some or all of the tiles according to a bulk synchronous parallel scheme, whereby each tile in the group performs an on-tile compute phase followed by an inter-tile exchange phase with the exchange phase being held back until all tiles in the group have completed the compute phase. Each tile in the group has a local exit state upon completion of the compute phase. The instruction set comprises a synchronization instruction for execution by each tile upon completion of its compute phase to signal a sync request to logic in the interconnect. In response to receiving the sync request from all the tiles in the group, the logic releases the next exchange phase and also makes available an aggregated state of all the tiles in the group.

Type: Grant

Filed: February 1, 2018

Date of Patent: February 18, 2020

Assignee: Graphcore Limited

Inventors: Simon Christian Knowles, Alan Graham Alexander
Sending data off-chip

Patent number: 10558595

Abstract: A processor comprising multiple tiles on the same chip, and an external interconnect for communicating data off-chip in the form of packets. The external interconnect comprises an external exchange block configured to provide flow control and queuing of the packets. One of the tiles is nominated by the compiler to send an external exchange request message to the exchange block on behalf of others with data to send externally. The exchange sends an exchange-on message to a first of these tiles, to cause the first tile to start sending packets via the external interconnect. Then, once this tile has sent its last data packet, the exchange block sends an exchange-off control packet to this tile to cause it to stop sending packets, and sends another exchange-on message to the next tile with data to send, and so forth.

Type: Grant

Filed: October 19, 2018

Date of Patent: February 11, 2020

Assignee: Graphcore Limited

Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Graham Bernard Cunningham, Alan Graham Alexander
SYNCHRONIZATION AND EXCHANGE OF DATA BETWEEN PROCESSORS

Publication number: 20200012536

Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.

Type: Application

Filed: February 15, 2019

Publication date: January 9, 2020

Applicant: Graphcore Limited

Inventors: David Lacey, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles
GATEWAY FABRIC PORTS

Publication number: 20200014560

Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway acts to route data between accelerators which are connected in a scaled system of multiple gateways and accelerators using a global address space set up at compile time of an application to run on the computer system.

Type: Application

Filed: December 28, 2018

Publication date: January 9, 2020

Applicant: Graphcore Limited

Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Brian Manula
CODE COMPILATION FOR SCALING ACCELERATORS

Publication number: 20200012482

Abstract: A computer system comprises a work accelerator and a gateway enabling the transfer of data to the accelerator from external storage. The accelerator executes a first compiled code sequence to perform computations on data transferred to the accelerator from the gateway. The first compiled code sequence comprises a synchronisation instruction indicating a barrier between a compute phase in which the compute instructions are executed and an exchange phase, wherein execution of the synchronisation instruction causes an indication of a pre-compiled data exchange synchronisation point to be transferred to the gateway. The gateway comprises a streaming engine storing a second compiled code sequence in the form of a set of data transfer instructions executable by the streaming engine to perform data transfer operations to stream data through the gateway in the exchange phase, wherein the first and second compiled code sequences are generated as a related set at compile time.

Type: Application

Filed: December 28, 2018

Publication date: January 9, 2020

Applicant: Graphcore Limited

Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Brian Manula, Harald Høeg
GATEWAY TO GATEWAY SYNCHRONISATION

Publication number: 20200012533

Abstract: A gateway in a computing system for interfacing a host with a subsystem for acting as a work accelerator to the host, the gateway having: an accelerator interface for enabling the transfer of batches of data to the subsystem at pre-compiled data exchange synchronisation points attained by the subsystem; a data connection interface for receiving data to be processed from storage; and a gateway interface for connection to a third gateway. The gateway is configured to store a number of credits indicating at least one of: the availability of data for transfer to the subsystem at a pre-compiled data exchange synchronisation point; and the availability of storage for receiving data from the subsystem at a pre-compiled data exchange synchronisation point. The gateway uses these credits to control whether or not a synchronisation barrier is passed by transmitting synchronisation requests upstream to the third gateway or simply acknowledging the requests received.

Type: Application

Filed: December 28, 2018

Publication date: January 9, 2020

Applicant: Graphcore Limited

Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Brian Manula, Harald Høeg
DATA THROUGH GATEWAY

Publication number: 20200012609

Abstract: A gateway for use in a computing system to interface a host with a subsystem for acting as a work accelerator to the host, the gateway having: an accelerator interface for connection to the subsystem to enable transfer of batches of data between the subsystem and the gateway; a data connection interface for connection to external storage for exchanging data between the gateway and storage; a gateway interface for connection to at least one second gateway; a memory interface connected to a local memory associated with the gateway; and a streaming engine for controlling the streaming of batches of data into and out of the gateway in response to pre-compiled data exchange synchronisation points attained by the subsystem, wherein the streaming of batches of data is selectively via at least one of the accelerator interface, data connection interface, gateway interface and memory interface.

Type: Application

Filed: December 28, 2018

Publication date: January 9, 2020

Applicant: Graphcore Limited

Inventors: Ola Tørudbakken, Brian Manula, Harald Høeg
STREAMING ENGINE

Publication number: 20200012534

Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway comprises a streaming engine having a data mover engine and a memory management engine, the data mover engine and memory management engine being configured to execute instructions in coordination from work descriptors. The memory management engine is configured to execute instructions from the work descriptor to transfer data between external storage and the local memory associated with the gateway. The data mover engine is configured to execute instructions from the work descriptor to transfer data between the local memory associated with the gateway and the subsystem.

Type: Application

Filed: December 28, 2018

Publication date: January 9, 2020

Applicant: Graphcore Limited

Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Brian Manula, Harald Høeg
HOST PROXY ON GATEWAY

Publication number: 20200014631

Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host, the gateway enabling the transfer of batches of data to and from the subsystem at pre-compiled data exchange synchronisation points attained by the subsystem. The gateway is configured to: receive from a storage system data determined by the host to be processed by the subsystem; store a number of credits indicating the availability of data for transfer to the subsystem at each pre-compiled data exchange synchronisation point; receive a synchronisation request from the subsystem when it attains a data exchange synchronisation point; and in response to determining that the number of credits comprises a non-zero number of credits: transmit a synchronisation acknowledgment to the subsystem; and cause the received data to be transferred to the subsystem.

Type: Application

Filed: December 28, 2018

Publication date: January 9, 2020

Applicant: Graphcore Limited

Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Matthew David Fyles, Brian Manula, Harald Høeg
Combining states of multiple threads in a multi-threaded processor

Patent number: 10452396

Abstract: A processor comprising: an execution unit, multiple context register sets, a scheduler arranged to control the execution unit to provide a repeating sequence of temporally interleaved time slots, thereby enabling at least one respective worker thread to be allocated for execution in each respective one of some or all of the time slots, wherein a program state of the respective worker thread currently executing in each time slot is maintained in a respective one of the context register sets; and an exit state register arranged to store an aggregated exit state of the worker threads. The instruction set comprises an exit instruction for inclusion in each worker thread, the exit instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective worker and also causes the individual exit state specified in the operand to contribute to the aggregated exit state.

Type: Grant

Filed: February 1, 2018

Date of Patent: October 22, 2019

Assignee: Graphcore Limited

Inventor: Simon Christian Knowles
Direction indicator

Patent number: 10360175

Abstract: An indication of a direction of transmission over the switching fabric is inserted into a data packet that is transmitted from a tile. The indication of direction may indicate directions from the transmitting tile in which intended recipient tiles are present. The switching fabric prevents (e.g. by blocking the data packet at one of a series of latches) the transmission in a direction not indicated in the data packet. Hence, power saving may be achieved, by preventing the unnecessary transmission of data packets over parts of the switching fabric.

Type: Grant

Filed: February 1, 2018

Date of Patent: July 23, 2019

Assignee: Graphcore Limited

Inventors: Stephen Felix, Jonathan Mangnall
SYNCHRONIZATION IN A MULTI-TILE PROCESSING ARRAY

Publication number: 20190155328

Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection

Type: Application

Filed: October 19, 2018

Publication date: May 23, 2019

Applicant: Graphcore Limited

Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
SENDING DATA OFF-CHIP

Publication number: 20190155768

Abstract: A processor comprising multiple tiles on the same chip, and an external interconnect for communicating data off-chip in the form of packets. The external interconnect comprises an external exchange block configured to provide flow control and queuing of the packets. One of the tiles is nominated by the compiler to send an external exchange request message to the exchange block on behalf of others with data to send externally. The exchange sends an exchange-on message to a first of these tiles, to cause the first tile to start sending packets via the external interconnect. Then, once this tile has sent its last data packet, the exchange block sends an exchange-off control packet to this tile to cause it to stop sending packets, and sends another exchange-on message to the next tile with data to send, and so forth.

Type: Application

Filed: October 19, 2018

Publication date: May 23, 2019

Applicant: Graphcore Limited

Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Graham Bernard Cunningham, Alan Graham Alexander
SCHEDULING TASKS IN A MULTI-THREADED PROCESSOR

Publication number: 20190155648

Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.

Type: Application

Filed: October 19, 2018

Publication date: May 23, 2019

Applicant: Graphcore Limited

Inventor: Simon Christian Knowles
SYNCHRONIZATION AMONGST PROCESSOR TILES

Publication number: 20190121679

Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.

Type: Application

Filed: February 1, 2018

Publication date: April 25, 2019

Applicant: Graphcore Limited

Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
COMBINING STATES OF MULTIPLE THREADS IN A MULTI-THREADED PROCESSOR

Publication number: 20190121638

Abstract: A processor comprising: an execution unit, multiple context register sets, a scheduler arranged to control the execution unit to provide a repeating sequence of temporally interleaved time slots, thereby enabling at least one respective worker thread to be allocated for execution in each respective one of some or all of the time slots, wherein a program state of the respective worker thread currently executing in each time slot is maintained in a respective one of the context register sets; and an exit state register arranged to store an aggregated exit state of the worker threads. The instruction set comprises an exit instruction for inclusion in each worker thread, the exit instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective worker and also causes the individual exit state specified in the operand to contribute to the aggregated exit state.

Type: Application

Filed: February 1, 2018

Publication date: April 25, 2019

Applicant: Graphcore Limited

Inventor: Simon Christian Knowles
