Patents by Inventor Alan Graham Alexander

Alan Graham Alexander has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Synchronization Amongst Processor Tiles

Publication number: 20210271527

Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgment being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.

Type: Application

Filed: May 14, 2021

Publication date: September 2, 2021

Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
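
The barrier behaviour described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `SyncLogic`, the method `sync_request`, and the mode names `"pair"` and `"all"` are hypothetical, and a returned empty set stands in for a tile whose instruction issue remains suspended.

```python
class SyncLogic:
    """Toy model of the interconnect's synchronization logic (hypothetical API)."""

    def __init__(self, modes):
        # modes: mode name -> tile ids that make up the group for that mode
        self.modes = {name: frozenset(tiles) for name, tiles in modes.items()}
        self.pending = {name: set() for name in modes}

    def sync_request(self, tile, mode):
        """Record a request and return the set of tiles to acknowledge.

        An empty set means the group is incomplete, so the requesting
        tile stays suspended.
        """
        group = self.modes[mode]
        if tile not in group:
            raise ValueError(f"tile {tile} is not a member of mode {mode!r}")
        self.pending[mode].add(tile)
        if self.pending[mode] == group:
            self.pending[mode] = set()  # reset for the next barrier
            return set(group)           # acknowledge every tile in the group
        return set()


logic = SyncLogic({"pair": {0, 1}, "all": {0, 1, 2, 3}})
first = logic.sync_request(0, "pair")   # tile 1 has not arrived yet
second = logic.sync_request(1, "pair")  # group complete, both tiles released
```

The operand of the synchronization instruction corresponds here to the `mode` argument, which selects the group whose requests must all arrive before any acknowledgment is returned.
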
Double-load instruction using a fixed stride and a variable stride for updating addresses between successive instructions

Patent number: 11061679

Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.

Type: Grant

Filed: April 19, 2019

Date of Patent: July 13, 2021

Assignee: Graphcore Limited

Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
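
As a rough illustration of the double-load idea, the sketch below loads two values per call and advances one address by a fixed stride and the other by a variable stride. It assumes the strides are applied after each load, and the names `double_load`, `addr_fixed`, `addr_var` and `var_stride` are hypothetical; `var_stride` stands in for the value held in the variable stride register.

```python
def double_load(memory, addr_fixed, addr_var, var_stride, fixed_stride=1):
    """Load two values, then advance both addresses (assumed post-increment)."""
    first = memory[addr_fixed]   # first load, strided by the fixed stride
    second = memory[addr_var]    # second load, strided by the variable stride
    return (first, second), addr_fixed + fixed_stride, addr_var + var_stride


memory = list(range(100, 116))   # toy memory holding 16 words
p, q = 0, 8
loaded = []
for _ in range(3):               # three successive double-load instructions
    pair, p, q = double_load(memory, p, q, var_stride=2)
    loaded.append(pair)
```

After the loop, the fixed-stride address has walked through consecutive words while the variable-stride address has skipped every other word.
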
Double-load instruction using a fixed stride and a variable stride for updating addresses between successive instructions

Patent number: 11023239

Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.

Type: Grant

Filed: April 19, 2019

Date of Patent: June 1, 2021

Assignee: Graphcore Limited

Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
Synchronization amongst processor tiles

Patent number: 11023290

Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgment being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.

Type: Grant

Filed: February 1, 2018

Date of Patent: June 1, 2021

Assignee: Graphcore Limited

Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
Synchronization in a multi-tile, multi-chip processing arrangement

Patent number: 11023413

Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.

Type: Grant

Filed: December 23, 2019

Date of Patent: June 1, 2021

Assignee: GRAPHCORE LIMITED

Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
Synchronization in a multi-tile processing array

Patent number: 10963003

Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection

Type: Grant

Filed: October 19, 2018

Date of Patent: March 30, 2021

Assignee: GRAPHCORE LIMITED

Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
Synchronization in a multi-tile processing array

Patent number: 10936008

Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection

Type: Grant

Filed: February 1, 2018

Date of Patent: March 2, 2021

Assignee: Graphcore Limited

Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
Sending data from an arrangement of processor modules

Patent number: 10817444

Abstract: A system comprising an arrangement of multiple processor modules, and an external interconnect for communicating data in the form of packets to outside the arrangement. The interconnect comprises an exchange block configured to provide flow control. One of the processor modules is arranged to send an exchange request message to the exchange block on behalf of others with data to send outside the arrangement. The exchange block sends an exchange-on message to a first of these processor modules, to cause the first module to start sending packets via the interconnect. Then, once this processor module has sent its last data packet, the exchange block sends an exchange-off message to this processor module to cause it to stop sending packets, and sends another exchange-on message to the next processor module with data to send, and so forth.

Type: Grant

Filed: July 30, 2019

Date of Patent: October 27, 2020

Assignee: GRAPHCORE LIMITED

Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Graham Bernard Cunningham, Alan Graham Alexander
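
The turn-taking flow control in this abstract can be sketched as a simple message log. This is an illustrative model only: the function `run_exchange`, the tuple-based message format and the module names are hypothetical, and the initial exchange request sent on behalf of the other modules is represented simply by the `pending` argument.

```python
def run_exchange(pending):
    """Return the message sequence an exchange block might produce.

    pending: dict mapping module id -> list of packets to send externally,
    in the order the modules are to be served.
    """
    log = []
    for module, packets in pending.items():
        log.append(("exchange-on", module))        # module may start sending
        for packet in packets:
            log.append(("packet", module, packet))
        log.append(("exchange-off", module))       # last packet sent, stop
    return log


log = run_exchange({"A": ["a0", "a1"], "B": ["b0"]})
```

Only one module is switched on at a time, so packets from different modules never interleave on the external interconnect in this model.
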
Compiler method

Patent number: 10802536

Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal

Type: Grant

Filed: February 1, 2018

Date of Patent: October 13, 2020

Assignee: Graphcore Limited

Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
Debugging Mechanism

Publication number: 20200319991

Abstract: A processor comprising at least one processing module, each processing module comprising: an execution pipeline; memory; an instruction fetch unit operable to switch between an operational mode and a debugging mode, the instruction fetch unit being configured so as, when in the operational mode, to fetch machine code instructions from the memory into the execution pipeline to be executed; and a debug interface for connecting to a debug adapter. The debug interface comprises a debug instruction register enabling the debug adapter to write a machine code instruction to the debug instruction register, and wherein the instruction fetch unit is configured so as, when in the debug mode, to fetch instructions from the debug instruction register into the pipeline instead of from the memory.

Type: Application

Filed: July 31, 2019

Publication date: October 8, 2020

Inventors: Alan Graham Alexander, Graham Bernard Cunningham
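
A minimal sketch of the fetch-path switch described above, assuming instructions can be modelled as strings. The class `FetchUnit` and its attribute names are hypothetical and are not taken from the application.

```python
class FetchUnit:
    """Toy instruction fetch unit with operational and debug modes."""

    def __init__(self, memory):
        self.memory = memory                    # program held in tile memory
        self.pc = 0                             # program counter
        self.debug_mode = False
        self.debug_instruction_register = None  # written by the debug adapter

    def fetch(self):
        if self.debug_mode:
            # Debug mode: take the instruction from the debug register
            # and leave the program counter untouched.
            return self.debug_instruction_register
        instruction = self.memory[self.pc]
        self.pc += 1
        return instruction


unit = FetchUnit(["load", "add", "store"])
normal = unit.fetch()                           # fetched from memory
unit.debug_mode = True
unit.debug_instruction_register = "read_reg"    # injected by the adapter
injected = unit.fetch()
unit.debug_mode = False
resumed = unit.fetch()                          # continues where it left off
```
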
DOUBLE LOAD INSTRUCTION

Publication number: 20200233670

Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.

Type: Application

Filed: April 19, 2019

Publication date: July 23, 2020

Applicant: Graphcore Limited

Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
HANDLING EXCEPTIONS IN A MACHINE LEARNING PROCESSOR

Publication number: 20200225960

Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing it with a vertex break identifier in the debug hardware, and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.

Type: Application

Filed: May 22, 2019

Publication date: July 16, 2020

Applicant: Graphcore Limited

Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
Exchange of data between processor modules

Patent number: 10705998

Abstract: A processing system comprising: multiple processor modules, each comprising a respective execution unit memory; and an interconnect for exchanging data between different sets of the processor modules. A group of the processor modules operates in a series of BSP supersteps. For the exchange phase of each superstep, each receiving processor module that is to receive data from outside its own set is pre-programmed with a value representing the number of units of data to receive. Starting from the pre-programmed value, it then counts out the number of data units remaining to be received each time a data unit is received. Each receiving processor module is further arranged to perform an exchange synchronization whereby, before advancing from the exchange phase to the compute phase of the current superstep, the receiving processor module waits until no units of data remain to be received according to the count.

Type: Grant

Filed: February 15, 2019

Date of Patent: July 7, 2020

Assignee: Graphcore Limited

Inventors: Daniel John Pelham Wilkinson, Alan Graham Alexander
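
The count-down synchronization in this abstract can be sketched as follows. The class `Receiver` and its methods are hypothetical names chosen for illustration; the sketch assumes one call to `receive` per unit of data.

```python
class Receiver:
    """Toy receiving module that counts down expected data units."""

    def __init__(self, expected_units):
        # Pre-programmed before the exchange phase begins.
        self.remaining = expected_units
        self.inbox = []

    def receive(self, unit):
        if self.remaining == 0:
            raise RuntimeError("received more units than were pre-programmed")
        self.inbox.append(unit)
        self.remaining -= 1          # one fewer unit left to receive

    def may_advance(self):
        """True once no units remain, so the compute phase may begin."""
        return self.remaining == 0


rx = Receiver(expected_units=2)
rx.receive("x0")
ready_after_one = rx.may_advance()
rx.receive("x1")
ready_after_two = rx.may_advance()
```

Because the receiver knows in advance how much data to expect, it needs no separate end-of-transfer message to decide when the exchange phase is over.
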
Exchange of data between processor modules

Patent number: 10705999

Abstract: A processing system comprising: multiple processor modules, each comprising a respective execution unit memory; and an interconnect for exchanging data between different sets of the processor modules. A group of the processor modules operates in a series of steps. For an exchange phase of each step by each receiving processor module that is to receive data from outside its own set, the receiving module is pre-programmed with a value representing the number of units of data to receive. Starting from the pre-programmed value, it then counts out the number of data units remaining to be received each time a data unit is received. Each receiving processor module is further arranged to perform an exchange synchronization whereby, before advancing from the exchange phase to the compute phase of the current step, the receiving processor module waits until no units of data remain to be received according to the count.

Type: Grant

Filed: August 22, 2019

Date of Patent: July 7, 2020

Assignee: GRAPHCORE LIMITED

Inventors: Daniel John Pelham Wilkinson, Alan Graham Alexander
EXCHANGE OF DATA BETWEEN PROCESSOR MODULES

Publication number: 20200210364

Abstract: A processing system comprising: multiple processor modules, each comprising a respective execution unit memory; and an interconnect for exchanging data between different sets of the processor modules. A group of the processor modules operates in a series of BSP supersteps. For the exchange phase of each superstep, each receiving processor module that is to receive data from outside its own set is pre-programmed with a value representing the number of units of data to receive. Starting from the pre-programmed value, it then counts out the number of data units remaining to be received each time a data unit is received. Each receiving processor module is further arranged to perform an exchange synchronization whereby, before advancing from the exchange phase to the compute phase of the current superstep, the receiving processor module waits until no units of data remain to be received according to the count.

Type: Application

Filed: February 15, 2019

Publication date: July 2, 2020

Applicant: Graphcore Limited

Inventors: Daniel John Pelham Wilkinson, Alan Graham Alexander
INSTRUCTION CACHE IN A MULTI-THREADED PROCESSOR

Publication number: 20200210192

Abstract: A processor comprising: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.

Type: Application

Filed: February 15, 2019

Publication date: July 2, 2020

Applicant: Graphcore Limited

Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore, Jonathan Louis Ferguson
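
The shared repeat cache can be modelled in a few lines, assuming each instruction is a Python callable that mutates a per-thread state dictionary. The names `RepeatCache`, `repeat` and `add_one` are hypothetical, and the sketch omits how the cache is released.

```python
class RepeatCache:
    """Toy repeat cache shared between threads."""

    def __init__(self):
        self.claimed = False
        self.code = None

    def repeat(self, loop_body, count, state):
        """Run the cached body `count` times, caching `loop_body` if unclaimed."""
        if not self.claimed:
            self.code = list(loop_body)   # first thread fills the cache
            self.claimed = True           # record the cache as claimed
        for _ in range(count):
            for instruction in self.code:
                instruction(state)
        return state


def add_one(state):
    state["acc"] += 1


cache = RepeatCache()
thread_a = cache.repeat([add_one], 3, {"acc": 0})    # caches the loop body
thread_b = cache.repeat([add_one], 5, {"acc": 10})   # reuses the cached body
```

Each thread supplies its own repeat count and its own state, while the loop body is stored only once.
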
REGISTER FILES IN A MULTI-THREADED PROCESSOR

Publication number: 20200210175

Abstract: A processor comprising a barrel-threaded execution unit for executing concurrent threads, and one or more register files comprising a respective set of context registers for each concurrent thread. One of the register files further comprises a set of shared weights registers common to some or all of the concurrent threads. The types of instruction defined in the instruction set of the processor include an arithmetic instruction having operands specifying a source and a destination from amongst a respective set of arithmetic registers of the thread in which the arithmetic instruction is executed. The execution unit is configured so as, in response to the opcode of the arithmetic instruction, to perform an operation comprising multiplying an input from the source by at least one of the weights from at least one of the shared weights registers, and to place a result in the destination.

Type: Application

Filed: February 15, 2019

Publication date: July 2, 2020

Applicant: Graphcore Limited

Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
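
A small sketch of per-thread context registers combined with shared weights registers. The class `RegisterFiles`, the method `mul_weight` and the register indices are hypothetical; the arithmetic instruction is reduced to a single multiply for clarity.

```python
class RegisterFiles:
    """Toy register files: private context per thread plus shared weights."""

    def __init__(self, num_threads, regs_per_thread, weights):
        self.context = [[0.0] * regs_per_thread for _ in range(num_threads)]
        self.weights = list(weights)   # common to all threads in this model

    def mul_weight(self, thread, dst, src, weight_index=0):
        """dst <- src * shared weight, using the calling thread's registers."""
        regs = self.context[thread]
        regs[dst] = regs[src] * self.weights[weight_index]


rf = RegisterFiles(num_threads=2, regs_per_thread=4, weights=[0.5])
rf.context[0][1] = 8.0
rf.context[1][1] = 3.0
rf.mul_weight(thread=0, dst=0, src=1)
rf.mul_weight(thread=1, dst=0, src=1)
```

Both threads read the same weight, but each writes its result into its own context registers.
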
EXCHANGE OF DATA BETWEEN PROCESSOR MODULES

Publication number: 20200210365

Abstract: A processing system comprising: multiple processor modules, each comprising a respective execution unit memory; and an interconnect for exchanging data between different sets of the processor modules. A group of the processor modules operates in a series of steps. For an exchange phase of each step by each receiving processor module that is to receive data from outside its own set, the receiving module is pre-programmed with a value representing the number of units of data to receive. Starting from the pre-programmed value, it then counts out the number of data units remaining to be received each time a data unit is received. Each receiving processor module is further arranged to perform an exchange synchronization whereby, before advancing from the exchange phase to the compute phase of the current step, the receiving processor module waits until no units of data remain to be received according to the count.

Type: Application

Filed: August 22, 2019

Publication date: July 2, 2020

Inventors: Daniel John Pelham Wilkinson, Alan Graham Alexander
LOAD-STORE INSTRUCTION

Publication number: 20200210187

Abstract: A processor having an instruction set including a load-store instruction having operands specifying, from amongst the registers in at least one register file, a respective destination of each of two load operations, a respective source of a store operation, and a pair of address registers arranged to hold three memory addresses, the three memory addresses being a respective load address for each of the two load operations and a respective store address for the store operation. The load-store instruction further includes three immediate stride operands each specifying a respective stride value for each of the two load addresses and one store address, wherein at least some possible values of each immediate stride operand specify the respective stride value by specifying one of a plurality of fields within a stride register in one of the one or more register files, each field holding a different stride value.

Type: Application

Filed: February 15, 2019

Publication date: July 2, 2020

Applicant: Graphcore Limited

Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
OVERFLOW CONDITION

Publication number: 20200201600

Abstract: A method and apparatus for handling overflow conditions resulting from arithmetic operations involving floating point numbers. An indication is stored as part of a thread's context indicating one of two possible modes for handling overflow conditions. In a first mode, a result of an arithmetic operation is set to the limit representable in the floating point format. In a second mode, a result of an arithmetic operation is set to a NaN.

Type: Application

Filed: April 26, 2019

Publication date: June 25, 2020

Applicant: Graphcore Limited

Inventors: Alan Graham Alexander, Edward Andrews, Stephen Felix, Mrudula Chidambar Gore
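
The two overflow modes can be sketched in software as below. This is an illustration under assumptions: it uses the IEEE 754 single-precision limit as the representable maximum, and the function `handle_overflow` and its boolean `saturate` flag (standing in for the indication stored in the thread's context) are hypothetical.

```python
import math

# Largest finite IEEE 754 single-precision value (assumed target format).
FLOAT32_MAX = (2 - 2**-23) * 2.0**127


def handle_overflow(value, saturate):
    """Apply the thread's overflow mode to an arithmetic result."""
    if math.isnan(value) or abs(value) <= FLOAT32_MAX:
        return value                              # no overflow
    if saturate:
        # First mode: clamp to the limit representable in the format.
        return math.copysign(FLOAT32_MAX, value)
    # Second mode: replace the result with a NaN.
    return math.nan


clamped = handle_overflow(1e39, saturate=True)
clamped_negative = handle_overflow(-1e39, saturate=True)
as_nan = handle_overflow(1e39, saturate=False)
```
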
