Patents by Inventor Stephen Felix

Stephen Felix has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11036673
    Abstract: A method of recording tile identifiers in each of a plurality of tiles of a multitile processor is described. Tiles are arranged in columns, each column having a plurality of processing circuits, each processing circuit comprising one or more tiles, wherein a base processing circuit in each column is connected to a set of processing circuit identifier wires. A base value is generated on each of the set of processing circuit identifier wires for the base processing circuit in each column. At the base processing circuit, the base value on the set of processing circuit identifier wires is read and incremented by one. The incremented value is propagated to a next processing circuit in the column, and at the next processing circuit a unique identifier is recorded by concatenating an identifier of the column and the incremented value.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: June 15, 2021
    Assignee: Graphcore Limited
    Inventors: Stephen Felix, Jonathan Mangnall
  • Patent number: 11023290
    Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: June 1, 2021
    Assignee: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
  • Patent number: 11023413
    Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: June 1, 2021
    Assignee: GRAPHCORE LIMITED
    Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
  • Patent number: 10970131
    Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host, the gateway enabling the transfer of batches of data to and from the subsystem at pre-compiled data exchange synchronisation points attained by the subsystem. The gateway is configured to: receive from a storage system data determined by the host to be processed by the subsystem; store a number of credits indicating the availability of data for transfer to the subsystem at each pre-compiled data exchange synchronisation point; receive a synchronisation request from the subsystem when it attains a data exchange synchronisation point; and in response to determining that the number of credits comprises a non-zero number of credits: transmit a synchronisation acknowledgment to the subsystem; and cause the received data to be transferred to the subsystem.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: April 6, 2021
    Assignee: Graphcore Limited
    Inventors: Ola Tørudbakken, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Matthew David Fyles, Brian Manula, Harald Høeg
  • Patent number: 10963003
    Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection
    Type: Grant
    Filed: October 19, 2018
    Date of Patent: March 30, 2021
    Assignee: GRAPHCORE LIMITED
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Publication number: 20210091786
    Abstract: A hardware module comprising circuity configured to: store a sequence of n bits in a register of the hardware module; generate a signed integer comprising a magnitude component and a sign bit by: if the most significant bit of the sequence of n bits is equal to one: set each of the n?1 of the most significant bits of the magnitude component to be equal to the corresponding bit of the n?1 least significant bits of the sequence of n bits; and set the sign bit to be zero; if the most significant bit of the sequence of n bits is equal to zero: set each of the n?1 of the most significant bits of the magnitude component to be equal to the inverse of the corresponding bit of the n?1 least significant bits of the sequence of n bits; and set the sign bit to be one.
    Type: Application
    Filed: June 21, 2019
    Publication date: March 25, 2021
    Inventors: STEPHEN FELIX, MRUDULA GORE
  • Patent number: 10936008
    Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: March 2, 2021
    Assignee: Graphcore Limited
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Patent number: 10817459
    Abstract: An indication of a direction of transmission over the switching fabric is inserted into a data packet that is transmitted from a tile. The indication of direction may indicate directions from the transmitting tile in which intended recipient tiles are present. The switching fabric prevents (e.g. by blocking the data packet at one of a series of latches) the transmission in a direction not indicated in the data packet. Hence, power saving may be achieved, by preventing the unnecessary transmission of data packets over parts of the switching fabric.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: October 27, 2020
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Jonathan Mangnall
  • Patent number: 10817444
    Abstract: A system comprising an arrangement of multiple processor modules, and an external interconnect for communicating data in the form of packets to outside the arrangement. The interconnect comprises an exchange block configured to provide flow control. One of the processor modules is arranged to send an exchange request message to the exchange block on behalf of others with data to send outside the arrangement. The exchange block sends an exchange-on message to a first of these processor modules, to cause the first module to start sending packets via the interconnect. Then, once this processor module has sent its last data packet, the exchange block sends an exchange-off message to this processor module to cause it to stop sending packets, and sends another exchange-on message to the next processor module with data to send, and so forth.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: October 27, 2020
    Assignee: GRAPHCORE LIMITED
    Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Graham Bernard Cunningham, Alan Graham Alexander
  • Patent number: 10820409
    Abstract: According to a first aspect, there is provided a computer structure comprising a first silicon substrate and a second silicon substrate. Computer circuitry configured to perform computing operations is formed in the first silicon substrate, which has a self-supporting depth and an inner facing surface. A plurality of distributed capacitance units are formed in the second silicon substrate, which has an inner facing surface located in overlap with the inner facing surface of the first substrate and is connected to the first substrate via a set of connectors arranged extending depthwise of the structure between the inner facing surfaces. The inner facing surfaces have matching planar surface dimensions. The second substrate has an outer facing surface on which are arranged a plurality of connector terminals for connecting the computer structure to a supply voltage. The second substrate has a smaller depth than the first substrate.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: October 27, 2020
    Assignee: GRAPHCORE LIMITED
    Inventor: Stephen Felix
  • Patent number: 10802536
    Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: October 13, 2020
    Assignee: Graphcore Limited
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Publication number: 20200293278
    Abstract: An execution unit for a processor, the execution unit comprising: a look up table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the look up table using an index value dependent upon the operand to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as perform processing, by the plurality of processing units, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton Raphson iterations to the first initial estimate.
    Type: Application
    Filed: April 26, 2019
    Publication date: September 17, 2020
    Applicant: Graphcore Limited
    Inventors: Jonathan Mangnall, Stephen Felix
  • Publication number: 20200293315
    Abstract: An execution unit is described which is particularly configured to generate an exponential of an operand floating point format. The operand is multiplied by a fixed multiplicant, logged to the base 2 (e) to generate a multiplication result. An integer part and a fractional part are extracted from the multiplication result. An exponent register stores the integer part to form the exponent of the exponential result. A lookup table has a plurality of entries each providing a value of 2f for a fractional part f used to access a lookup table. The fractional part is derived from a mantissa of the operand. That is, first and second bit sequences are extracted from the mantissa. One of the bit sequences is used to generate an estimated fractional component, and the other is used to access a value from the lookup table. The estimated fractional component and a looked up value are multiplied.
    Type: Application
    Filed: May 31, 2019
    Publication date: September 17, 2020
    Applicant: Graphcore Limited
    Inventors: Jonathan Mangnall, Stephen Felix
  • Publication number: 20200201794
    Abstract: The present disclosure relates to a method of scheduling messages to be exchanged between tiles in a computer where there is a fixed transmission time between sending and receiving tiles. According to the method a total size of message data to be sent or received by each tile is determined. One of the tiles is selected based at least on the size of the message data to schedule a first message. The first message to be scheduled is selected from the set of messages on that tile. In order to schedule the message the other end points of this selected message are determined, and then respective time slots are allocated at the sending and receiving tiles for that message. The size of the selected message is then deducted from each of the tiles acting as end points for the message, and then the sequence is carried out again until all messages have been scheduled. This technique optimises message exchange in an exchange phase of a BSP system.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Richard Luke Southwell Osborne, Stephen Felix
  • Publication number: 20200201604
    Abstract: A pseudo random number generator implemented in hardware. The pseudo random number generator comprises a state post processing circuit for processing two state values to produce a random number. The circuit having a first combinatorial logic comprising a XOR or XNOR gate configured to process a first pair of bits from the state values, a second combinatorial logic comprising an OR or AND gate configured to process a second pair of bits from the state value, and third combinatorial logic comprising an OR or AND gate configured or process a third pair of bits from the state value. The circuit has fourth combinatorial logic configured to process the outputs of the first three set of combinatorial logic so as to provide a result bit of the random number. The fourth combinatorial logic comprises an AND or OR gate and a XOR or XNOR gate.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Stephen Felix, James William Hanlon
  • Publication number: 20200201601
    Abstract: A function approximation system is disclosed for determining output floating point values of functions calculated using floating point numbers. Complex functions have different shapes in different subsets of their input domain, making them difficult to predict for different values of the input variable. The function approximation system comprises an execution unit configured to determine corresponding values of a given function given a floating point input to the function; a plurality of look up tables for each function type; a correction table of values which determines if corrections to the output value are required; and a table selector for finding an appropriate table for a given function.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Jonathan Mangnall, Stephen Felix
  • Publication number: 20200201602
    Abstract: A hardware module comprising at least one of: one or more field programmable gate arrays and one or more application specific integrated circuits configured to: receive a number in floating-point representation at a first precision level, the number comprising an exponent and a first mantissa; apply a first random number to the first mantissa to generate a first carry; truncate the first mantissa to a level specified by a second precision level; add the first carry to the least significant bit of the mantissa truncated to the level specified by the second precision level to form a mantissa for the number in floating-point representation at the second precision level.
    Type: Application
    Filed: July 26, 2019
    Publication date: June 25, 2020
    Inventors: Stephen Felix, Mrudula Gore, Alan Graham Alexander
  • Publication number: 20200201412
    Abstract: There is disclosed a method of controlling the frequency of a clock signal in a processor. The method selects a first clock generator to provide a processor clock signal for executing an application. If a threshold event is detected, a second clock generator is selected. The method reduces the frequency of a clock signal generated by the first clock generator while a processor clock signal is being provided for execution of an application from the second clock generator. The second clock generator generates a clock at a lower speed than the first clock generator. After a predetermined time the first clock generator is reselected to provide the processor clock signal. The threshold detection is repeated until an optimum clock frequency is discovered.
    Type: Application
    Filed: May 31, 2019
    Publication date: June 25, 2020
    Inventors: Stephen Felix, Mrudula Gore
  • Publication number: 20200201704
    Abstract: A processor comprises a plurality of processing units, wherein there is a fixed transmission time for transmitting a message from a sending processing unit to a receiving processing unit, based on the physical positions of the sending and receiving processing units in the processor. The processing units are arranged in a column, and the fixed transmission time depends on the position of a processing circuit in the column. An exchange fabric is provided for exchanging messages between sending and receiving processing units, the columns being arranged with respect to the exchange fabric such that the fixed transmission time depends on the distances of the processing circuits with respect to the exchange fabric. The processor comprises at least one delay stage for each processing circuit and switching circuitry for selectively switching the delay stage into or out of a communication path involved in message exchange.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventor: Stephen Felix
  • Publication number: 20200201600
    Abstract: A method and apparatus for handling overflow conditions resulting from arithmetic operations involving floating point numbers. An indication is stored as part of a thread's context indicating one of two possible modes for handling overflow conditions. In a first mode, a result of an arithmetic operation is set to the limit representable in the floating point format. In a second mode, a result of an arithmetic operation is set to a NaN.
    Type: Application
    Filed: April 26, 2019
    Publication date: June 25, 2020
    Applicant: Graphcore Limited
    Inventors: Alan Graham Alexander, Edward Andrews, Stephen Felix, Mrudula Chidambar Gore