Patents Assigned to Graphcore Limited
-
Patent number: 11119559Abstract: There is disclosed a method of controlling the frequency of a clock signal in a processor. The method selects a first clock generator to provide a processor clock signal for executing an application. If a threshold event is detected, a second clock generator is selected. The method reduces the frequency of a clock signal generated by the first clock generator while a processor clock signal is being provided for execution of an application from the second clock generator. The second clock generator generates a clock at a lower speed than the first clock generator. After a predetermined time the first clock generator is reselected to provide the processor clock signal. The threshold detection is repeated until an optimum clock frequency is discovered.Type: GrantFiled: May 31, 2019Date of Patent: September 14, 2021Assignee: GRAPHCORE LIMITEDInventors: Stephen Felix, Mrudula Gore
-
Patent number: 11119873Abstract: A processor comprises a plurality of processing units, wherein there is a fixed transmission time for transmitting a message from a sending processing unit to a receiving processing unit, based on the physical positions of the sending and receiving processing units in the processor. The processing units are arranged in a column, and the fixed transmission time depends on the position of a processing circuit in the column. An exchange fabric is provided for exchanging messages between sending and receiving processing units, the columns being arranged with respect to the exchange fabric such that the fixed transmission time depends on the distances of the processing circuits with respect to the exchange fabric. The processor comprises at least one delay stage for each processing circuit and switching circuitry for selectively switching the delay stage into or out of a communication path involved in message exchange.Type: GrantFiled: April 26, 2019Date of Patent: September 14, 2021Assignee: GRAPHCORE LIMITEDInventor: Stephen Felix
-
Patent number: 11119952Abstract: A gateway for use in a computing system to interface a host with the subsystem for acting as a work accelerator to the host, the gateway having an streaming engine for controlling the streaming of batches of data into and out of the gateway in response to pre-compiled data exchange synchronisation points attained by the subsystem, wherein the streaming of batches of data is selectively via at least one of an accelerator interface, a data connection interface, a gateway interface and an memory interface, wherein the streaming engine is configured to perform data preparation processing of the batches of data streamed into the gateway prior to said batches of data being streamed out of the gateway, wherein the data preparation processing comprises at least one of: data augmentation; decompression; and decryption.Type: GrantFiled: July 2, 2019Date of Patent: September 14, 2021Assignee: GRAPHCORE LIMITEDInventors: Ola Torudbakken, Brian Manula
-
Patent number: 11113060Abstract: A processing apparatus comprising one or more processing modules, each comprising an execution unit. The one or more processing modules are operable to run a plurality of parallel or concurrent threads, and the processing apparatus further comprises a storage location for storing an aggregated exit state of the plurality of threads. An instruction set of the processing apparatus comprises an exit instruction for inclusion in each of the plurality of threads, the exit state instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective thread and also causes the individual exit state specified in the operand to contribute to the aggregated exit state.Type: GrantFiled: August 1, 2019Date of Patent: September 7, 2021Assignee: GRAPHCORE LIMITEDInventor: Simon Christian Knowles
-
Patent number: 11106510Abstract: A processing system comprising: a subsystem for acting as a work accelerator to a host processor, the subsystem comprising an arrangement of tiles; and an interconnect for communicating between the tiles and connecting the subsystem to the host. The interconnect comprises synchronization logic to coordinate barrier synchronizations between a group of the tiles. The synchronization logic comprises a host sync proxy module, comprising a counter written with a number of credits by the host processor, and being configured to automatically decrement the number of credits each time one of the barrier synchronizations requiring host involvement is performed. When the number of credits in the counter is exhausted, the barrier is not released until a further write from the host to the host sync proxy module, but when the number is credits in the counter is not exhausted the barrier is released without a separate write from the host.Type: GrantFiled: July 17, 2019Date of Patent: August 31, 2021Assignee: GRAPHCORE LIMITEDInventors: Daniel John Pelham Wilkinson, Stephen Felix, Matthew David Fyles, Richard Luke Southwell Osborne
-
Patent number: 11106432Abstract: An execution unit is described which is particularly configured to generate an exponential of an operand floating point format. The operand is multiplied by a fixed multiplicand, logged to the base 2 (e) to generate a multiplication result. An integer part and a fractional part are extracted from the multiplication result. An exponent register stores the integer part to form the exponent of the exponential result. A lookup table has a plurality of entries each providing a value of 2f for a fractional part f used to access a lookup table. The fractional part is derived from a mantissa of the operand. That is, first and second bit sequences are extracted from the mantissa. One of the bit sequences is used to generate an estimated fractional component, and the other is used to access a value from the lookup table.Type: GrantFiled: May 31, 2019Date of Patent: August 31, 2021Assignee: Graphcore LimitedInventors: Jonathan Mangnall, Stephen Felix
-
Patent number: 11061679Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.Type: GrantFiled: April 19, 2019Date of Patent: July 13, 2021Assignee: Graphcore LimitedInventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
-
Patent number: 11048563Abstract: A processing system comprising: a subsystem for acting as a work accelerator to a host processor, the subsystem comprising an arrangement of tiles; and an interconnect for communicating between the tiles and connecting the subsystem to the host. The interconnect comprises synchronization logic to coordinate barrier synchronizations between a group of the tiles. The synchronization logic comprises a host sync proxy module, comprising a counter written with a number of credits by the host processor, and being configured to automatically decrement the number of credits each time one of the barrier synchronizations requiring host involvement is performed. When the number of credits in the counter is exhausted, the barrier is not released until a further write from the host to the host sync proxy module, but when the number is credits in the counter is not exhausted the barrier is released without a separate write from the host.Type: GrantFiled: February 1, 2018Date of Patent: June 29, 2021Assignee: Graphcore LimitedInventors: Daniel John Pelham Wilkinson, Stephen Felix, Matthew David Fyles, Richard Luke Southwell Osborne
-
Patent number: 11048844Abstract: A method and system for improved design verification for a data processing device when performing a logic simulation. The system identifies certain corresponding coverpoints at different points in results of a logic simulation for a design. Using coverage results obtained for the design, a merging of the results is performed for the certain corresponding coverpoints in the design. In the merged results, a coverpoint is considered as covered if at least one corresponding coverpoint is covered during the logic simulation.Type: GrantFiled: June 26, 2020Date of Patent: June 29, 2021Assignee: GRAPHCORE LIMITEDInventors: Paul Miseldine, Michael Davie
-
Patent number: 11036673Abstract: A method of recording tile identifiers in each of a plurality of tiles of a multitile processor is described. Tiles are arranged in columns, each column having a plurality of processing circuits, each processing circuit comprising one or more tiles, wherein a base processing circuit in each column is connected to a set of processing circuit identifier wires. A base value is generated on each of the set of processing circuit identifier wires for the base processing circuit in each column. At the base processing circuit, the base value on the set of processing circuit identifier wires is read and incremented by one. The incremented value is propagated to a next processing circuit in the column, and at the next processing circuit a unique identifier is recorded by concatenating an identifier of the column and the incremented value.Type: GrantFiled: May 22, 2019Date of Patent: June 15, 2021Assignee: Graphcore LimitedInventors: Stephen Felix, Jonathan Mangnall
-
Patent number: 11029962Abstract: An execution unit comprising a processing pipeline configured to perform calculations to evaluate a plurality of mathematical functions. The processing pipeline comprises a plurality of stages through which each calculation for evaluating a mathematical function progresses to an end result. Each of a plurality of processing circuits in the pipeline is configured to perform an operation on input values during at least one stage of the plurality of stages. The plurality of processing circuits include multiplier circuits. A first multiplier circuit and a second multiplier circuit are configured to operate in parallel, such that at the same stage in the processing pipeline, the first multiplier circuit and the second multiplier circuit perform their processing. A third multiplier circuit is arranged in series with the first multiplier circuit and the second multiplier circuit and processes outputs from the first multiplier circuit and the second multiplier circuit.Type: GrantFiled: May 31, 2019Date of Patent: June 8, 2021Assignee: GRAPHCORE LIMITEDInventor: Jonathan Mangnall
-
Patent number: 11023239Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.Type: GrantFiled: April 19, 2019Date of Patent: June 1, 2021Assignee: Graphcore LimitedInventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
-
Patent number: 11023290Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.Type: GrantFiled: February 1, 2018Date of Patent: June 1, 2021Assignee: Graphcore LimitedInventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
-
Patent number: 11023413Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.Type: GrantFiled: December 23, 2019Date of Patent: June 1, 2021Assignee: GRAPHCORE LIMITEDInventors: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
-
Patent number: 11010208Abstract: A work accelerator is connected to a gateway. The gateway enables the transfer of data to the work accelerator from an external storage at pre-compiled data synchronisation points attained by the work accelerator. The work accelerator is configured to send to a register of the gateway an indication of a sync group comprising the gateway. The work accelerator then sends to the gateway, a synchronisation request for a synchronisation to be performed at an upcoming pre-compiled data exchange synchronisation point. The sync propagation circuits are each configured to receive at least one synchronisation request and propagate or acknowledge the synchronisation request in dependence upon the indication of the sync group received from the work accelerator.Type: GrantFiled: October 25, 2019Date of Patent: May 18, 2021Assignee: GRAPHCORE LIMITEDInventor: Brian Manula
-
Patent number: 10970131Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host, the gateway enabling the transfer of batches of data to and from the subsystem at pre-compiled data exchange synchronisation points attained by the subsystem. The gateway is configured to: receive from a storage system data determined by the host to be processed by the subsystem; store a number of credits indicating the availability of data for transfer to the subsystem at each pre-compiled data exchange synchronisation point; receive a synchronisation request from the subsystem when it attains a data exchange synchronisation point; and in response to determining that the number of credits comprises a non-zero number of credits: transmit a synchronisation acknowledgment to the subsystem; and cause the received data to be transferred to the subsystem.Type: GrantFiled: December 28, 2018Date of Patent: April 6, 2021Assignee: Graphcore LimitedInventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Matthew David Fyles, Brian Manula, Harald Høeg
-
Patent number: 10963315Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.Type: GrantFiled: February 15, 2019Date of Patent: March 30, 2021Assignee: Graphcore LimitedInventors: David Lacey, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles
-
Patent number: 10963003Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connectionType: GrantFiled: October 19, 2018Date of Patent: March 30, 2021Assignee: GRAPHCORE LIMITEDInventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
-
Patent number: 10956234Abstract: A system comprising a gateway for interfacing external data sources with one or more accelerators. The gateway comprises a plurality of virtual gateways, each of which is configured to stream data from the external data sources to one or more associated accelerators. The plurality of virtual gateways are each configured to stream data from external data sources so that the data is received at an associated accelerator in response to a synchronisation point being obtained by a synchronisation zone. Each of the virtual gateways is assigned a virtual ID so that when data is received at the gateway, data can be delivered to the appropriate gateway.Type: GrantFiled: May 31, 2019Date of Patent: March 23, 2021Assignee: GRAPHCORE LIMITEDInventors: Brian Manula, Harald Hoeg, Ola Torudbakken
-
Patent number: 10956165Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set being represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.Type: GrantFiled: February 1, 2018Date of Patent: March 23, 2021Assignee: Graphcore LimitedInventor: Simon Christian Knowles