Patents Assigned to Graphcore Limited
  • Patent number: 11644884
    Abstract: There is disclosed a method of controlling the frequency of a clock signal in a processor. The method selects a first clock generator to provide a processor clock signal for executing an application. If a threshold event is detected, a second clock generator is selected. The method reduces the frequency of a clock signal generated by the first clock generator while a processor clock signal is being provided for execution of an application from the second clock generator. The second clock generator generates a clock at a lower speed than the first clock generator. After a predetermined time, the first clock generator is reselected to provide the processor clock signal. The threshold detection is repeated until an optimum clock frequency is discovered.
    Type: Grant
    Filed: August 17, 2021
    Date of Patent: May 9, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Mrudula Gore
  • Patent number: 11645225
    Abstract: A computer, including a plurality of processing nodes arranged in two-dimensional arrays in respective front and rear layers. Each processing node has a set of activatable links. When activated, transmission of data items between the nodes connected via the activated link is enabled. When not activated, transmission of data items between the nodes is prevented. The set of activatable links includes a respective link which connects the processing node to each adjacent node in the array, and to a facing processing node in the other layer. An allocation engine is configured to receive an allocation instruction and connected to the processing nodes to selectively activate the links in a configuration.
    Type: Grant
    Filed: August 10, 2022
    Date of Patent: May 9, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Simon Knowles
  • Patent number: 11637682
    Abstract: An apparatus is provided for converting the form in which a synchronisation request for a barrier synchronisation is provided. The synchronisation request is provided from a first synchronisation circuitry to a second synchronisation circuitry by asserting one of a set of separate signals that may each correspond to a bit in a register or a signal on a wire. The second synchronisation circuitry provides for the packetisation of the sync request by sending a packet comprising the sync request over a network to be received at a further subsystem.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: April 25, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Martin Vickers, Daniel John Pelham Wilkinson
  • Patent number: 11635966
    Abstract: Aspects of the present disclosure provide a processor having: an execution unit configured to execute machine code instructions, at least one of the machine code instructions requiring multiple cycles for its execution; instruction memory holding instructions for execution, wherein the execution unit is configured to access the memory to fetch instructions for execution; an instruction injection mechanism configured to inject an instruction into the execution pipeline during execution of the at least one machine code instruction fetched from the memory; the execution unit configured to pause execution of the at least one machine code instruction, to execute the injected instruction to termination, to detect termination of the injected instruction and to automatically recommence execution of the at least one machine code instruction on detection of termination of the injected instruction.
    Type: Grant
    Filed: June 1, 2021
    Date of Patent: April 25, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: James Pallister, Jamie Hanlon
  • Patent number: 11630986
    Abstract: A method for generating an executable program to run on a system of one or more processor chips each comprising a plurality of tiles. The method comprises: receiving a graph comprising a plurality of data nodes, compute vertices and directional edges, wherein the graph is received in a first graph format that does not specify which data nodes and vertices are allocated to which of the tiles; and generating an application programming interface, API, for converting the graph, to determine a tile-mapping allocating the data nodes and vertices amongst the tiles. The generating of the API comprises searching the graph to identify compute vertices which match any of a predetermined set of one or more compute vertex types. The API is then called to convert the graph to a second graph format that includes the tile-mapping, including the allocation by the assigned memory allocation functions.
    Type: Grant
    Filed: April 2, 2020
    Date of Patent: April 18, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: David Norman
  • Patent number: 11630983
    Abstract: A method for generating an executable program to run on a system of one or more processor chips each comprising a plurality of tiles. The method comprises: receiving a graph comprising a plurality of data nodes, compute vertices and directional edges, wherein the graph is received in a first graph format that does not specify which data nodes and vertices are allocated to which of the tiles; and generating an application programming interface, API, for converting the graph, to determine a tile-mapping allocating the data nodes and vertices amongst the tiles. The generating of the API comprises searching the graph to identify compute vertices which match any of a predetermined set of one or more compute vertex types. The API is then called to convert the graph to a second graph format that includes the tile-mapping, including the allocation by the assigned memory allocation functions.
    Type: Grant
    Filed: July 31, 2019
    Date of Patent: April 18, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: David Norman
  • Patent number: 11625356
    Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least one respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one dimensional paths and to transmit data around each of the two embedded one dimensional paths, each embedded one dimensional path using all processing nodes of the computer in such a manner that the two embedded one dimensional paths operate simultaneously without sharing links.
    Type: Grant
    Filed: March 24, 2021
    Date of Patent: April 11, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Simon Knowles
  • Patent number: 11625061
    Abstract: Two clocks, a fast clock and a slow clock, are provided for clocking a processing unit. A plurality of frequency settings, referred to as gears, are defined for the two clocks. Each of these gears indicates a maximum frequency for the fast clock and a minimum frequency for the slow clock, such that the gap between the two frequencies may be kept to a manageable level so as to reduce transients upon switching between the two clocks. The system switches between the gears as required. In response to a determination to increase the frequency of the clock signal, a higher gear is selected at which the maximum and minimum frequencies defined for that gear are higher than those of the previously selected gear.
    Type: Grant
    Filed: June 16, 2021
    Date of Patent: April 11, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Simon Douglas Chambers, Stephen Felix, Ian Malcolm King
  • Patent number: 11625357
    Abstract: A data processing system comprising a plurality of processors, wherein each of the processors is configured to perform data transfer operations to transfer outgoing data to one or more others of the processors during a first of the exchange stages; receive incoming data from the one or more others of the processors during the first of the exchange stages; determine further outgoing data in dependence upon at least part of the incoming data; count an amount of at least part of the incoming data received during the first of the exchange stages from the one or more others of the processors; and in response to determining that the amount of the at least part of the incoming data received has reached a predefined amount, perform data transfer operations to transfer the further outgoing data to the one or more others of the processors during a second of the exchange stages.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: April 11, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Lars Paul Huse
  • Patent number: 11614789
    Abstract: A system and method for clocking a processing unit are provided. According to the method, the system dithers between the two signals provided by the two clock generators so as to clock the processing unit at an average clock frequency having a value between the frequencies of the two signals. The average clock frequency is adjusted by modifying the proportion of time spent on one clock signal vs the other clock signal.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: March 28, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Ian Malcolm King
  • Patent number: 11615053
    Abstract: A processor in a network has a plurality of processing units arranged on a chip. An on-chip interconnect enables data to be exchanged between the processing units. A plurality of external interfaces are configured to communicate data off chip in the form of packets, each packet having a destination address identifying a destination of the packet. The external interfaces are connected to respective additional connected processors. A routing bus routes packets between the processing units and the external interfaces. A routing register defines a routing domain for the processor, the routing domain comprising one or more of the additional processors, and at least a subset of further additional processors of the network, wherein the additional processors of the subset are directly or indirectly connected to the processor. The routing domain can be modified by changing the contents of the routing register as a sliding window domain.
    Type: Grant
    Filed: July 13, 2021
    Date of Patent: March 28, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Daniel John Pelham Wilkinson, Lars Paul Huse, Richard Luke Southwell Osborne, Graham Bernard Cunningham, Hachem Yassine
  • Patent number: 11615038
    Abstract: A gateway for use in a computing system to interface a host with the subsystem for acting as a work accelerator to the host, the gateway having: an accelerator interface for connection to the subsystem to enable transfer of batches of data between the subsystem and the gateway; a data connection interface for connection to external storage for exchanging data between the gateway and storage; a gateway interface for connection to at least one second gateway; a memory interface connected to a local memory associated with the gateway; and a streaming engine for controlling the streaming of batches of data into and out of the gateway in response to pre-compiled data exchange synchronisation points attained by the subsystem, wherein the streaming of batches of data is selectively via at least one of the accelerator interface, data connection interface, gateway interface and memory interface.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: March 28, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Ola Tørudbakken, Brian Manula, Harald Høeg
  • Patent number: 11614946
    Abstract: A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in rings such that each processing node is a member of a single clique and a single ring. The processing nodes of all cliques are configured to exchange in each exchange step of a machine learning collective via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: March 28, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Simon Knowles
  • Patent number: 11599363
    Abstract: A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
    Type: Grant
    Filed: April 6, 2020
    Date of Patent: March 7, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Richard Osborne, Matthew Fyles
  • Patent number: 11593185
    Abstract: A processing system comprising multiple tiles and an interconnect between the tiles. The interconnect is used to communicate between a group of some or all of the tiles according to a bulk synchronous parallel scheme, whereby each tile in the group performs an on-tile compute phase followed by an inter-tile exchange phase with the exchange phase being held back until all tiles in the group have completed the compute phase. Each tile in the group has a local exit state upon completion of the compute phase. The instruction set comprises a synchronization instruction for execution by each tile upon completion of its compute phase to signal a sync request to logic in the interconnect. In response to receiving the sync request from all the tiles in the group, the logic releases the next exchange phase and also makes available an aggregated state of all the tiles in the group.
    Type: Grant
    Filed: November 19, 2019
    Date of Patent: February 28, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Simon Christian Knowles, Alan Graham Alexander
  • Patent number: 11586483
    Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgement to the tiles in the specified group.
    Type: Grant
    Filed: May 14, 2021
    Date of Patent: February 21, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
  • Patent number: 11567768
    Abstract: A processor is disclosed including: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: January 31, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore, Jonathan Louis Ferguson
  • Patent number: 11561799
    Abstract: An execution unit comprising a processing pipeline configured to perform calculations to evaluate a plurality of mathematical functions. The processing pipeline comprises a plurality of stages through which each calculation for evaluating a mathematical function progresses to an end result. Each of a plurality of processing circuits in the pipeline is configured to perform an operation on input values during at least one stage of the plurality of stages. The plurality of processing circuits include multiplier circuits. A first multiplier circuit and a second multiplier circuit are configured to operate in parallel, such that at the same stage in the processing pipeline, the first multiplier circuit and the second multiplier circuit perform their processing. A third multiplier circuit is arranged in series with the first multiplier circuit and the second multiplier circuit and processes outputs from the first multiplier circuit and the second multiplier circuit.
    Type: Grant
    Filed: June 3, 2021
    Date of Patent: January 24, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Jonathan Mangnall
  • Patent number: 11561926
    Abstract: A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets. The computer comprises: a plurality of processing units each having an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by its associated processing unit; the processing units arranged in columns, each column having a base processing unit proximate the switching fabric and multiple processing units one adjacent the other in respective positions in the direction of the column.
    Type: Grant
    Filed: January 20, 2022
    Date of Patent: January 24, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Simon Christian Knowles
  • Patent number: 11550639
    Abstract: A work accelerator is connected to a gateway. The gateway enables the transfer of data to the work accelerator from an external storage at pre-compiled data synchronisation points attained by the work accelerator. The work accelerator is configured to send to a register of the gateway an indication of a sync group comprising the gateway. The work accelerator then sends to the gateway a synchronisation request for a synchronisation to be performed at an upcoming pre-compiled data exchange synchronisation point. The sync propagation circuits are each configured to receive at least one synchronisation request and propagate or acknowledge the synchronisation request in dependence upon the indication of the sync group received from the work accelerator.
    Type: Grant
    Filed: May 6, 2021
    Date of Patent: January 10, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Brian Manula
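
Illustrative note on patent 11614789 (dithering between two clock generators): the abstract says the average clock frequency lies between the two source frequencies and is adjusted by changing the proportion of time spent on each signal. The arithmetic behind that statement can be sketched as below. This is only a sketch, not the patented method: it assumes a simple time-weighted average with no switching overhead, and the function names and example frequencies are hypothetical.

```python
def average_frequency(f_a_hz, f_b_hz, fraction_on_a):
    """Time-weighted average frequency when a fraction of the time is
    spent on clock A and the remainder on clock B.

    Assumption: switching between clocks costs no time or cycles.
    """
    if not 0.0 <= fraction_on_a <= 1.0:
        raise ValueError("fraction_on_a must be between 0 and 1")
    return fraction_on_a * f_a_hz + (1.0 - fraction_on_a) * f_b_hz


def fraction_for_target(f_a_hz, f_b_hz, target_hz):
    """Fraction of time to spend on clock A so that the time-weighted
    average equals target_hz (inverse of average_frequency)."""
    low, high = sorted((f_a_hz, f_b_hz))
    if not low <= target_hz <= high:
        raise ValueError("target must lie between the two clock frequencies")
    if f_a_hz == f_b_hz:
        return 1.0  # both clocks are identical, so any split works
    return (target_hz - f_b_hz) / (f_a_hz - f_b_hz)


if __name__ == "__main__":
    # Hypothetical example values: 1.6 GHz and 1.2 GHz clocks.
    share = fraction_for_target(1.6e9, 1.2e9, 1.3e9)
    print(share, average_frequency(1.6e9, 1.2e9, share))
```

Under these assumptions, raising the share of time on the faster clock moves the average toward the faster frequency, which matches the adjustment the abstract describes.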