Patents Assigned to Graphcore Limited
  • Patent number: 12645755
    Abstract: A computer-implemented method comprising: processing data in a neural network to compute a network tensor comprising a plurality of tensor elements represented in an initial numerical format; computing a histogram of tensor elements; selecting a target numerical format, the target numerical format having a lower precision than the initial numerical format; evaluating a metric based on the histogram of tensor elements and the target numerical format, the metric indicating a degree of accuracy of a representation of the network tensor in the target numerical format; and based on the evaluated metric, converting the plurality of tensor elements from the initial numerical format to the target numerical format.
    Type: Grant
    Filed: December 15, 2022
    Date of Patent: June 2, 2026
    Assignee: GRAPHCORE LIMITED
    Inventors: Godfrey Da Costa, Badreddine Noune, Daniel Justus, Carlo Luschi
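    Illustrative sketch (not part of the patent record): the flow in the abstract above can be pictured as "build a histogram, score the target format against it, convert only if the score is good enough". The abstract does not specify the histogram layout, the metric, or a threshold, so everything here is an assumption: `exponent_histogram`, `in_range_fraction`, `should_convert`, the float16-like exponent range (-14 to 15) and the 0.99 threshold are hypothetical names and values.

```python
import math
from collections import Counter

def exponent_histogram(values):
    """Histogram of binary exponents floor(log2(|x|)) over the nonzero elements."""
    hist = Counter()
    for x in values:
        if x != 0.0:
            # frexp returns (m, e) with 0.5 <= m < 1, so floor(log2(|x|)) is e - 1.
            hist[math.frexp(abs(x))[1] - 1] += 1
    return hist

def in_range_fraction(hist, min_exp, max_exp):
    """Assumed metric: share of elements whose exponent fits the target format."""
    total = sum(hist.values())
    if total == 0:
        return 1.0
    ok = sum(n for e, n in hist.items() if min_exp <= e <= max_exp)
    return ok / total

def should_convert(values, min_exp=-14, max_exp=15, threshold=0.99):
    """Decide on conversion from the histogram-based metric (threshold is an assumption)."""
    hist = exponent_histogram(values)
    return in_range_fraction(hist, min_exp, max_exp) >= threshold
```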
  • Patent number: 12613732
    Abstract: A processing device comprising: at least one execution unit configured to interleave execution of a plurality of worker threads, wherein each of the worker threads is configured to execute a same set of code to perform operations on a different set of data held in an input buffer of a memory of the processing device and output the results data to an output buffer. An instruction is executed so as to cause a plurality of operand registers, each of which is associated with one of the worker threads, to be populated with one or more variables enabling each worker to determine where in the input buffer is located its set of input data and where to store its results data.
    Type: Grant
    Filed: October 28, 2022
    Date of Patent: April 28, 2026
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Alexander, Stephen Felix, Edward Andrews, Godfrey Da Costa
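    Illustrative sketch (not part of the patent record): one way to picture the per-worker variables is as an (offset, count) pair that tells each worker which slice of the input buffer it owns, with the same pair reused for the output buffer. The abstract does not say how the variables are computed; the even split below and the name `worker_slices` are assumptions.

```python
def worker_slices(num_workers, total_items):
    """Return one (offset, count) pair per worker, splitting the buffer as evenly as possible."""
    base, extra = divmod(total_items, num_workers)
    slices = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional item each.
        count = base + (1 if worker < extra else 0)
        slices.append((start, count))
        start += count
    return slices
```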
  • Patent number: 12596595
    Abstract: A set of configurable sync groupings (which may be referred to as sync zones) are defined. Any of the processors may belong to any of the sync zones. Each of the processors comprises a register indicating to which of the sync zones it belongs. If a processor does not belong to a sync zone, it continually asserts a sync request for that sync zone to the sync controller. If a processor does belong to a sync zone, it will only assert its sync request for that sync zone upon arriving at a synchronisation point for that sync zone indicated in its compiled code set.
    Type: Grant
    Filed: July 11, 2022
    Date of Patent: April 7, 2026
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Richard Osborne
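    Illustrative sketch (not part of the patent record): the request rule in the abstract means a sync controller can treat a zone as ready once every processor's request line for that zone is asserted, because non-members hold theirs asserted permanently. The dictionary layout and the names `sync_request` and `zone_ready` are assumptions made for this example.

```python
def sync_request(member_zones, zone, waiting_on):
    """Request line one processor drives for `zone`."""
    if zone not in member_zones:
        return True               # non-members continually assert the request
    return waiting_on == zone     # members assert only at their sync point for this zone

def zone_ready(processors, zone):
    """The zone can synchronise once every processor's request is asserted."""
    return all(sync_request(p["zones"], zone, p["waiting_on"]) for p in processors)
```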
  • Patent number: 12596550
    Abstract: By providing a mode indication, an execution unit is operable to operate in two separate modes, each of which causes the execution unit to perform calculations by interpreting the same bit string (the first of the bit strings) as representing one of two different values. When operating in the first mode, the first of the bit strings represents an undefined value, in other words a NaN. When operating in the second mode, the first of the bit strings represents a negative zero. Hence, the same string of bits can represent either a NaN or a negative zero depending upon the mode of operation of the processor. Since it is not necessary to reserve more than one bit string to represent these two special values, the remaining combinations of bits are available to represent other values.
    Type: Grant
    Filed: January 18, 2023
    Date of Patent: April 7, 2026
    Assignee: GRAPHCORE LIMITED
    Inventor: Alan Alexander
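    Illustrative sketch (not part of the patent record): the mode-dependent reading of a single bit string can be shown with a tiny decoder. The abstract does not say which bit string is shared or how wide the format is; the 8-bit pattern 0x80 (sign bit set, all other bits clear) and the name `decode_special` are assumptions.

```python
import math

SPECIAL = 0x80  # assumed shared bit string: sign bit set, all other bits clear

def decode_special(bits, nan_mode):
    """Return the value SPECIAL stands for in the given mode, or None for other bit strings."""
    if bits != SPECIAL:
        return None
    return math.nan if nan_mode else -0.0
```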
  • Patent number: 12588501
    Abstract: A heatsink is provided for a memory and routing module with a lower and upper side, both sides having multiple semiconductor chips attached. The lower side of the module has a connection component attached for connection to a motherboard. The heatsink includes a module receiving region configured to receive a lower side of the module, including a first thermally conductive portion arranged to face the semiconductor chips, an aperture through the lower heatsink component and a thermally conductive peripheral region disposed around the module receiving region. The heatsink includes an upper heatsink component which is configured to connect to the lower heatsink component at the peripheral region to retain the module. The upper heatsink component includes a lower side. The lower side includes a second thermally conductive portion arranged to face the semiconductor chips disposed on an upper side of the module and multiple second heat dissipating elements.
    Type: Grant
    Filed: February 27, 2023
    Date of Patent: March 24, 2026
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Bodiley, David Japp, Stephen Felix
  • Patent number: 12561114
    Abstract: A computer comprising a plurality of processing units, each processing unit having an execution unit and access to computer memory which stores code executable by the execution unit and input values of an input vector to be processed by the code, the code, when executed, configured to access the computer memory to obtain multiple pairs of input values of the input vector, determine a maximum or corrected maximum input value of each pair as a maximum result element, determine and store in a computer memory a maximum or corrected maximum result of each pair of maximum result elements as an approximation to the natural log of the sum of the exponents of the input values and access the computer memory to obtain each input value and apply it to the maximum or corrected maximum result to generate each output value of a Softmax output vector.
    Type: Grant
    Filed: June 1, 2021
    Date of Patent: February 24, 2026
    Assignee: GRAPHCORE LIMITED
    Inventors: Carlo Luschi, Godfrey Da Costa, Badreddine Noune
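    Illustrative sketch (not part of the patent record): a pairwise reduction with a "corrected maximum" can reproduce the log of the sum of exponentials, which is then subtracted from each input to give Softmax. The abstract does not state the correction term; the identity log(exp(a) + exp(b)) = max(a, b) + log(1 + exp(-|a - b|)) used below is a standard one and is an assumption about what "corrected maximum" means, as are the function names. Passing plain `max` as `pair_op` gives the uncorrected approximation. The input is assumed to be non-empty.

```python
import math

def corrected_max(a, b):
    """max(a, b) plus a correction term, equal to log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def log_sum_exp(values, pair_op=corrected_max):
    """Reduce the vector pair by pair until a single value remains."""
    level = list(values)
    while len(level) > 1:
        nxt = [pair_op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element is carried up unchanged
        level = nxt
    return level[0]

def softmax(values, pair_op=corrected_max):
    """Apply each input to the reduced result to produce the output vector."""
    lse = log_sum_exp(values, pair_op)
    return [math.exp(v - lse) for v in values]
```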
  • Patent number: 12554490
    Abstract: An execution unit performs a byte-wise rotation of an input data block. An input data array receives an input data block. Two first layer multiplexer arrays each receive a first layer data block comprising a respective subset of bytes of the input data block and a first layer control signal, and rotate the first layer data block by an amount indicated by the first layer control signal. The second layer multiplexer array receives a second control signal and selects between a corresponding byte of the first and second rotated first layer data blocks based on the second control signal. The execution unit also includes a control signal generator, configured to generate the first layer control signal and second layer control signal based on a received computer program instruction. Results of smaller block rotations are thus used as partial results for larger block rotation, avoiding large multiplexer arrays with complex wiring.
    Type: Grant
    Filed: December 18, 2023
    Date of Patent: February 17, 2026
    Assignee: GRAPHCORE LIMITED
    Inventor: Mark Sheppard
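    Illustrative sketch (not part of the patent record): the two-layer idea can be modelled in software by rotating two half-blocks independently and then choosing, byte by byte, which rotated half feeds each output position. The 8-byte width, the low/high split, the left rotation direction and the names `rotate4` and `rotate8` are assumptions; the abstract fixes none of them.

```python
def rotate4(block, amount):
    """First layer: rotate a 4-byte block left by `amount` (taken modulo 4)."""
    amount %= 4
    return block[amount:] + block[:amount]

def rotate8(block, amount):
    """Rotate an 8-byte block left using two 4-byte rotations and a per-byte select."""
    if len(block) != 8:
        raise ValueError("expected an 8-byte block")
    amount %= 8
    low = rotate4(block[:4], amount)    # first-layer array fed with bytes 0..3
    high = rotate4(block[4:], amount)   # first-layer array fed with bytes 4..7
    out = bytearray(8)
    for i in range(8):
        # Second layer: the source byte (i + amount) mod 8 lives in the low half
        # when its index is below 4, otherwise in the high half.
        from_low = (i + amount) % 8 < 4
        out[i] = low[i % 4] if from_low else high[i % 4]
    return bytes(out)
```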
  • Patent number: 12549649
    Abstract: In order to provide for the extension of either the MAC address or the VLAN identifier as required, a sliding cursor functionality between the MAC address and the VLAN identifier is provided. The MAC address may be extended by borrowing bits conventionally used for representing part of the VLAN identifier. Similarly, the VLAN identifier may be extended by borrowing bits conventionally used for representing part of the MAC address.
    Type: Grant
    Filed: October 19, 2022
    Date of Patent: February 10, 2026
    Assignee: GRAPHCORE LIMITED
    Inventors: Bjorn Dag Johnsen, Brian Edward Manula
  • Patent number: 12519722
    Abstract: A data processing system having an address resolution function for deriving MAC addresses. The set of MACs defined for the devices on the network encode physical position or logical identifier information of those devices. Therefore, each of these MACs is derivable using a mapping function that maps the physical position or logical identifier information supplied by an application to the MAC addresses of the devices on the network. When the protocol processing entity has to send data over the network, it can obtain the MAC address for the destination determined on the basis of the physical position or logical identifier supplied by the application. In this way, since the MACs are derivable on the basis of the physical positions or logical identifiers, the broadcasting of ARP request messages, which would otherwise be required when the protocol processing entity requires the MAC for the destination, may be avoided.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: January 6, 2026
    Assignee: GRAPHCORE LIMITED
    Inventors: Bjorn Dag Johnsen, Brian Edward Manula
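    Illustrative sketch (not part of the patent record): if position information is encoded directly in the MAC address, a sender can compute the destination MAC locally instead of broadcasting an ARP request. The abstract does not define the mapping function; the rack/chassis/slot fields, the one-octet-per-field layout, the locally administered prefix 02:00:00 and the name `mac_from_position` are all assumptions.

```python
def mac_from_position(rack, chassis, slot):
    """Derive a MAC address string from an assumed (rack, chassis, slot) position."""
    for field in (rack, chassis, slot):
        if not 0 <= field <= 0xFF:
            raise ValueError("each position field must fit in one octet")
    # 0x02 in the first octet marks a locally administered unicast address.
    octets = [0x02, 0x00, 0x00, rack, chassis, slot]
    return ":".join(f"{octet:02x}" for octet in octets)
```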
  • Patent number: 12469783
    Abstract: In a stacked integrated circuit device, there are two components, one in a first of the die and another in a second of the die. Each of the components is provided with two output connections, one leading above and one leading below the die, and two input connections, one leading above and one leading below the die, either of the two die. As a result of the redundancy, both die may be used in either position in the stacked structure. If either of the die is used as the top die, it sends data on its second output path and receives data on its second input path. On the other hand, when one of the die is used as the bottom die, it sends data on its first output path and receives data on its first input path. In this way, the same design may be used for the connections between each of the die.
    Type: Grant
    Filed: September 28, 2022
    Date of Patent: November 11, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Alexander MacFaden, Stephen Felix
  • Patent number: 12468534
    Abstract: A processing device comprises a register configured to store a count value indicating a number of times overflow events have resulted from arithmetic operations performed by the processing device. An execution unit of the device, in response to performing an arithmetic operation having a result which extends beyond one of the predefined limit values for the floating-point format, stores a result value that is within the predefined limit values, and causes the count value to be incremented. The count value provides a performant way of determining the number of overflow events that have occurred during the arithmetic processing performed by the execution unit. The count value provides a metric that provides a measure of the inaccuracy imparted into the results of the application processing by overflow events.
    Type: Grant
    Filed: October 10, 2023
    Date of Patent: November 11, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Alexander, Dominic Masters
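    Illustrative sketch (not part of the patent record): a software model of an overflow counter keeps results inside the format's limits and bumps a counter whenever it had to do so. The abstract only says the stored value is within the limits; clamping to the nearest limit, the float16 maximum of 65504 and the class name `SaturatingUnit` are assumptions.

```python
class SaturatingUnit:
    """Toy execution unit that clamps out-of-range results and counts the events."""

    def __init__(self, limit=65504.0):
        self.limit = limit          # assumed symmetric limit (largest finite float16)
        self.overflow_count = 0     # models the count register

    def _store(self, result):
        if result > self.limit:
            self.overflow_count += 1
            return self.limit
        if result < -self.limit:
            self.overflow_count += 1
            return -self.limit
        return result

    def add(self, a, b):
        return self._store(a + b)

    def mul(self, a, b):
        return self._store(a * b)
```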
  • Patent number: 12430130
    Abstract: A hardware module is provided in an execution unit and is responsive to execution of multiple instances of a new type of instruction to perform a plurality of reductions in parallel. The hardware module comprises: a first accumulator storing first state associated with a first of the reductions; and a second accumulator storing second state associated with a second of the reductions. Upon execution of each of the multiple instances of the first type of instruction: an input value for the respective instance is provided to a first processing circuit of the hardware module such that the first processing circuit performs a first type of operation to update the first state; and the same input value is provided to the second processing circuit of the hardware module such that the second processing circuit performs a second type of operation to update the second state.
    Type: Grant
    Filed: February 2, 2023
    Date of Patent: September 30, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Alexander, Mrudula Gore
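    Illustrative sketch (not part of the patent record): the point of the two accumulators is that one pass over the inputs updates two different reductions at once. The abstract does not name the two operation types; a running sum and a running maximum are assumed here, and `DualReducer` is a hypothetical name.

```python
class DualReducer:
    """Two accumulators updated from the same input on every step."""

    def __init__(self):
        self.total = 0.0                 # first accumulator: running sum (assumed operation)
        self.maximum = float("-inf")     # second accumulator: running max (assumed operation)

    def step(self, x):
        """Models one instance of the instruction: feed x to both processing circuits."""
        self.total += x
        self.maximum = max(self.maximum, x)
```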
  • Patent number: 12430552
    Abstract: A computer-implemented method of training a deep neural network, comprising, for each of one or more batches of training examples: processing the data in a forward pass through the layers of the network, by: applying a set of network weights to the input data to obtain a set of weighted inputs, normalising the weighted inputs based on statistics computed for each training example, transforming the normalised inputs by affine transformation parameters, applying an activation function to the transformed normalised inputs to obtain post-activation values, and normalizing the post-activation values based on one or more proxy variables sampled from a distribution defined by proxy distribution parameters, the normalization applied independently of training example; processing the data in a backward pass through the network to determine updates to learnable parameters comprising network weights, affine transformation parameters, and proxy distribution parameters, and updating the learnable parameters to optimise a p
    Type: Grant
    Filed: September 29, 2021
    Date of Patent: September 30, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Antoine Labatie, Dominic Alexander Masters, Zachary Frank Eaton-Rosen, Carlo Luschi
  • Patent number: 12399717
    Abstract: A processing device comprising: a control register configured to store a scaling factor; at least one execution unit configured to execute instructions to perform arithmetic operations on input floating-point numbers provided according to a first floating-point format, wherein each of the input floating-point numbers provided according to the first floating-point format comprises a predetermined number of bits, wherein the at least one execution unit is configured to, in response to execution of an instance of a first of the instructions: perform processing of a first set of the input floating-point numbers to generate a result value, the result value provided in a further format and comprising more than the predetermined number of bits, enabling representation of a greater range of values than is representable in the first floating-point format; and apply the scaling factor specified in the control register to increase or decrease an exponent of the result value.
    Type: Grant
    Filed: February 27, 2023
    Date of Patent: August 26, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Alexander, Simon Knowles, Stephen Felix, Carlo Luschi, Badreddine Noune, Mrudula Gore, Godfrey Da Costa, Edward Andrews, Dominic Masters
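    Illustrative sketch (not part of the patent record): adjusting the exponent of a wider intermediate result is equivalent to multiplying it by a power of two. The abstract does not say what processing produces the result value; a dot product is assumed here, Python's 64-bit float stands in for the wider format, and `scaled_dot` and `scale_exponent` are hypothetical names.

```python
import math

def scaled_dot(xs, ws, scale_exponent):
    """Accumulate a dot product, then shift the result's binary exponent."""
    acc = sum(x * w for x, w in zip(xs, ws))   # wider intermediate result
    return math.ldexp(acc, scale_exponent)     # acc * 2**scale_exponent
```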
  • Patent number: 12400971
    Abstract: A substrate and a method for manufacturing the substrate. The substrate is suitable for mounting at least one semiconductor die onto a printed circuit board. The substrate comprises two opposing stacks, with each stack comprising alternating layers of copper and electrically insulating film. The film and the copper have different co-efficients of thermal expansion, allowing the warpage behaviour of the substrate to be controlled by providing the substrate with different film thicknesses between the opposing stacks.
    Type: Grant
    Filed: July 21, 2021
    Date of Patent: August 26, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Simon Jonathan Stacey, Yang Chih Wang
  • Patent number: 12386761
    Abstract: A memory and routing module includes a substrate and a connection component. The connection component is attached to the substrate and includes multiple pins that connect the module to a corresponding connection component on a motherboard. The substrate is connected to a dynamic random-access memory, DRAM, chip, and a routing chip. The routing chip includes a memory controller, multiple connections, and routing logic. The multiple connections include a first group between the memory controller and the DRAM chip and a second group of connections with the pins of the connection component. The routing logic routes data between the second group of connections and the first group of connections.
    Type: Grant
    Filed: December 2, 2022
    Date of Patent: August 12, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Simon Stacey
  • Patent number: 12374657
    Abstract: The first logic wafer is attached to a supporting wafer, which adds sufficient depth to this bonded structure such that the first logic wafer may be thinned during the manufacturing process. The first logic wafer is thinned such that the through silicon vias may be etched in the substrate of the first logic wafer so as to provide adequate connectivity to a second logic wafer, which is bonded to the first logic wafer. The second logic wafer adds sufficient depth to this bonded structure to allow the supporting wafer to then be thinned to enable through silicon vias to be added to the supporting wafer so as to provide appropriate connectivity for the entire stacked structure. The thinned supporting wafer is retained in the finished stacked wafer structure and may comprise additional components (e.g. capacitors) supporting the operation of the processing circuitry in the logic wafers.
    Type: Grant
    Filed: October 5, 2022
    Date of Patent: July 29, 2025
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Phillip Horsfield, Simon Jonathan Stacey
  • Patent number: 12368680
    Abstract: A bypass path is provided in the node for reducing the latency and power consumption associated with writing to and reading from the VC buffer, and is enabled when certain conditions are met. Bypass is enabled for a received packet when there is no other data that is ready to be sent from the VC buffer, which is the case when all VCs either have zero credits or an empty partition in the buffer. In this way, data arriving at the node is prevented from using the bypass path to take priority over data already held in the VC buffer and ready for transmission.
    Type: Grant
    Filed: November 9, 2023
    Date of Patent: July 22, 2025
    Assignee: GRAPHCORE LIMITED
    Inventor: Ashley Robinson
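    Illustrative sketch (not part of the patent record): the bypass condition in the abstract reduces to a single predicate over the virtual channels. Representing each VC as a (credits, queued_packets) pair and the name `bypass_enabled` are assumptions.

```python
def bypass_enabled(vcs):
    """True when no VC has both credits and buffered data, so nothing is ready to send."""
    return all(credits == 0 or queued == 0 for credits, queued in vcs)
```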
  • Patent number: 12367043
    Abstract: A processor comprising a barrel-threaded execution unit for executing concurrent threads, and one or more register files comprising a respective set of context registers for each concurrent thread. One of the one or more register files further comprises a set of shared weights registers common to some or all of the concurrent threads. The types of instructions defined in the instruction set of the processor include an arithmetic instruction having operands specifying a source and a destination from amongst a respective set of arithmetic registers of the thread in which the arithmetic instruction is executed. The execution unit is configured so as, in response to the opcode of the arithmetic instruction, to perform an operation comprising multiplying an input from the source by at least one of the weights from at least one of the shared weights registers, and to place a result in the destination.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: July 22, 2025
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
  • Patent number: 12346284
    Abstract: A computer comprising a plurality of processor devices connected in a ring, wherein each of the processor devices is connected to each of two neighbouring ones of the processor devices by a respective physical inter-processor link. Each of a set of external memory devices stores a local portion of the externally stored dataset. Each processor device executes instructions to: determine that a synchronisation point has been reached by the plurality of processor devices; responsive to the determination, access from its connected external memory device its local portion of the externally stored dataset; record a copy of its local portion of the externally stored dataset in its local memory; transmit its local portion of the externally stored dataset to at least one of its connected neighbouring processing devices; and receive an incoming portion of the externally stored dataset from at least one of its connected neighbouring processing devices.
    Type: Grant
    Filed: September 12, 2022
    Date of Patent: July 1, 2025
    Assignee: GRAPHCORE LIMITED
    Inventor: Simon Christian Knowles
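    Illustrative sketch (not part of the patent record): the load, record, transmit and receive steps in the abstract resemble one round of a ring exchange. The simulation below assumes more than the abstract states: that each device forwards in one direction only and keeps forwarding what it receives until every device holds every portion. The name `ring_all_gather` is hypothetical.

```python
def ring_all_gather(portions):
    """Simulate devices in a ring sharing their local portions with each other."""
    n = len(portions)
    # Each device first records a copy of its own portion in local memory.
    local = [{i: portions[i]} for i in range(n)]
    in_flight = list(range(n))  # index of the portion each device sends next
    for _ in range(n - 1):
        # Every device receives what its anticlockwise neighbour sent.
        received = [in_flight[(d - 1) % n] for d in range(n)]
        for device, index in enumerate(received):
            local[device][index] = portions[index]
        in_flight = received    # forward the newly received portion next round
    return local
```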