Patents by Inventor Simon Knowles

Simon Knowles has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210342284
    Abstract: A network comprising interconnected first and second processors, each processor comprising one or more of: multiple processing units arranged on a chip configured to execute program code; an on-chip interconnect comprising groups of exchange paths connected to receive data from corresponding groups of the processing units; external interfaces configured to communicate data off-chip as packets, each having a destination address, external interfaces of the first and second processors being connected by an external link; multiple exchange blocks, each connected to groups of the exchange paths; a routing bus configured to route packets between the exchange blocks and the external interfaces. Processing units of the first processor generate off-chip packets such that the group of processing units serviced by the first exchange block on the first processor address off-chip packets to the group of processing units on the second processor serviced by the corresponding first exchange block of the second processor.
    Type: Application
    Filed: July 13, 2021
    Publication date: November 4, 2021
    Inventors: Simon KNOWLES, Hachem YASSINE
  • Publication number: 20210303505
    Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layer by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one dimensional paths and to transmit data around each of the two embedded one dimensional paths, each embedded one dimensional path using all processing nodes of the computer in such a manner that the two embedded one dimensional paths operate simultaneously without sharing links.
    Type: Application
    Filed: March 24, 2021
    Publication date: September 30, 2021
    Inventor: Simon KNOWLES
  • Publication number: 20200310819
    Abstract: A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in rings such that each processing node is a member of a single clique and a single ring. The processing nodes of all cliques are configured to exchange in each exchange step of a machine learning collective via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node.
    Type: Application
    Filed: March 26, 2020
    Publication date: October 1, 2020
    Inventor: Simon KNOWLES
  • Publication number: 20200311020
    Abstract: One aspect of the invention provides a computer comprising a plurality of interconnected processing nodes arranged in a ladder configuration comprising a plurality of facing pairs of processing nodes. The processing nodes of each pair are connected to each other by two links. A processing node in each pair is connected to a corresponding processing node in an adjacent pair by at least one link. The processing nodes are programmed to operate the ladder configuration to transmit data around two embedded one-dimensional rings formed by respective sets of processing nodes and links, each ring using all processing nodes in the ladder once only.
    Type: Application
    Filed: March 26, 2020
    Publication date: October 1, 2020
    Inventor: Simon KNOWLES
  • Publication number: 20200311529
    Abstract: According to an aspect of the invention, there is provided a computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple stacked layers. Each layer comprises four processing nodes connected by respective links between the processing nodes. In end layers of the stack, the four processing nodes are interconnected in a ring formation by two links between the nodes, the two links adapted to operate simultaneously. Processing nodes in the multiple stacked layers provide four faces, each face comprising multiple layers, each layer comprising a pair of processing nodes. The processing nodes are programmed to operate a configuration to transmit data around embedded one-dimensional rings, each ring formed by processing nodes in two opposing faces.
    Type: Application
    Filed: March 26, 2020
    Publication date: October 1, 2020
    Inventor: Simon KNOWLES
  • Publication number: 20200311017
    Abstract: A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in rings such that each processing node is a member of a single clique and a single ring. The processing nodes of all cliques are configured to exchange in each exchange step of a machine learning collective via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node.
    Type: Application
    Filed: March 26, 2020
    Publication date: October 1, 2020
    Inventor: Simon KNOWLES
  • Publication number: 20200311528
    Abstract: A computer comprising a plurality of interconnected processing nodes arranged in multiple stacked layers forming a multi-face prism is provided. Each face of the prism comprises multiple stacked pairs of nodes. Said nodes are connected by at least two intralayer links. Each node is connected to a corresponding node in an adjacent pair by an interlayer link. The corresponding nodes are connected by respective interlayer links to form respective edges. Each pair forms part of a layers, each layer comprising multiple nodes, each node connected to their neighbouring nodes in the layer by at least one of the intralayer links to form a ring. Data is transmitted around paths formed by respective sets of nodes and links, each path having a first portion between a first and second endmost layers, and a second portion provided between the second and first endmost layers and comprising one of the edges.
    Type: Application
    Filed: March 26, 2020
    Publication date: October 1, 2020
    Inventor: Simon KNOWLES
  • Publication number: 20200293478
    Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers is provided. Each layer comprises a plurality of processing nodes connected in a ring by an intralayer respective set of links between each pair of neighbouring processing nodes, the links adapted to operate simultaneously. Nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link. Each processing node in the first endmost layer is connected to a corresponding node in the second endmost layer. Data is transmitted around a plurality of embedded one-dimensional logical rings with an asymmetric bandwidth utilisation, each logical ring using all processing nodes of the computer in such a manner that the plurality of embedded one-dimensional logical rings operate simultaneously.
    Type: Application
    Filed: March 26, 2020
    Publication date: September 17, 2020
    Inventor: Simon KNOWLES
  • Patent number: 9477475
    Abstract: According to embodiments disclosed herein, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment, the computer processor includes: (1) a decode unit for decoding instruction packets fetched from a memory holding the instruction packets, (2) a control processing channel capable of performing control operations and (3) a data processing channel capable of performing data processing operations, wherein, in use the decode unit causes instructions of instruction packets comprising a plurality of only control instructions to be executed sequentially on the control processing channel, and wherein, in use the decode unit causes instructions of instruction packets comprising a plurality of instructions comprising at least one data processing instruction to be executed simultaneously on the data processing channel.
    Type: Grant
    Filed: April 30, 2015
    Date of Patent: October 25, 2016
    Assignee: Nvidia Technology UK Limited
    Inventor: Simon Knowles
  • Publication number: 20150234659
    Abstract: According to embodiments disclosed herein, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment, the computer processor includes: (1) a decode unit for decoding instruction packets fetched from a memory holding the instruction packets, (2) a control processing channel capable of performing control operations and (3) a data processing channel capable of performing data processing operations, wherein, in use the decode unit causes instructions of instruction packets comprising a plurality of only control instructions to be executed sequentially on the control processing channel, and wherein, in use the decode unit causes instructions of instruction packets comprising a plurality of instructions comprising at least one data processing instruction to be executed simultaneously on the data processing channel.
    Type: Application
    Filed: April 30, 2015
    Publication date: August 20, 2015
    Inventor: Simon Knowles
  • Patent number: 9047094
    Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment according to the invention, there is provided a computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence of instruction packets; and first and second processing channels, each channel comprising a plurality of functional units, wherein the first processing channel is capable of performing control operations and comprises a control register file having a relatively narrower bit width, and the second processing channel is capable of performing data processing operations at least one input of which is a vector and comprises a data register file having a relatively wider bit width.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: June 2, 2015
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Patent number: 8966223
    Abstract: A configurable execution unit comprises operators capable of being dynamically configured by an instruction at the level of processing multi-bit operand values. The unit comprises one or more dynamically configurable operator modules, each module being connectable to receive input operands indicated in an instruction, and a programmable lookup table connectable to receive dynamic configuration information determined from an opcode portion of the instruction and capable of generating operator configuration settings defining an aspect of the function or behavior of a configurable operator module, responsive to said dynamic configuration information in the instruction.
    Type: Grant
    Filed: May 5, 2005
    Date of Patent: February 24, 2015
    Assignee: Icera, Inc.
    Inventor: Simon Knowles
  • Patent number: 8782376
    Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: July 15, 2014
    Assignee: Icera Inc.
    Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty
  • Patent number: 8671268
    Abstract: A configurable execution unit comprises operators capable of being dynamically configured by an instruction at the level of processing multi-bit operand values. The unit comprises one or more dynamically configurable operator modules, the or each module being connectable to receive input operands indicated in an instruction, and a programmable lookup table connectable to receive dynamic configuration information determined from an opcode portion of the instruction and capable of generating operator configuration settings defining an aspect of the function or behavior of a configurable operator module, responsive to the dynamic configuration information in the instruction.
    Type: Grant
    Filed: March 11, 2011
    Date of Patent: March 11, 2014
    Assignee: ICERA, Inc.
    Inventor: Simon Knowles
  • Patent number: 8484442
    Abstract: A computer processor comprises a decode unit and a processing channel. The decode unit decodes a stream of instruction packets from a memory, each instruction packet comprising a plurality of instructions. The processing channel comprises a plurality of functional units and operable to perform control processing operations. The decode unit is operable to receive and decode instruction packets of a bit length of 64 bits and to detect if the instruction packet defines three control instructions each having a length of 21 bits. The decode unit detects that the instruction packet comprises the three control instructions. The control instructions are supplied to the processing channel for execution in the order in which they appear in the instruction packet. The detection uses an identification bit in the instruction packet.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: July 9, 2013
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Patent number: 8484441
    Abstract: A computer processor with control and data processing capabilities comprises a decode unit for decoding instructions. A data processing facility comprises a first data execution path including fixed operators and a second data execution path including at least configurable operators, the configurable operators having a plurality of predefined configurations, at least some of which are selectable by means of an opcode portion of a data processing instruction. The decode unit is operable to detect whether a data processing instruction defines a fixed data processing operation or a configurable data processing operation, said decode unit causing the computer system to supply data for processing to said first data execution path when a fixed data processing instruction is detected and to said configurable data execution path when a configurable data processing instruction is detected.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: July 9, 2013
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Publication number: 20120221834
    Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.
    Type: Application
    Filed: August 26, 2011
    Publication date: August 30, 2012
    Applicant: ICERA INC
    Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty, Fabienne Hegarty
  • Publication number: 20110161640
    Abstract: A configurable execution unit comprises operators capable of being dynamically configured by an instruction at the level of processing multi-bit operand values. The unit comprises one or more dynamically configurable operator modules, the or each module being connectable to receive input operands indicated in an instruction, and a programmable lookup table connectable to receive dynamic configuration information determined from an opcode portion of the instruction and capable of generating operator configuration settings defining an aspect of the function or behavior of a configurable operator module, responsive to the dynamic configuration information in the instruction.
    Type: Application
    Filed: March 11, 2011
    Publication date: June 30, 2011
    Inventor: Simon Knowles
  • Patent number: 7949856
    Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: May 24, 2011
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Patent number: 7933405
    Abstract: According to embodiments of the invention, there is disclosed a data processing unit, a method of operating the same, computer program product and an instruction. In one embodiment according to the invention, there is provided a data processing unit for a computer processor, the data processing unit comprising a deep register access mechanism capable of performing a permutation operation on at least one data operand accessed from a register file of the computer processor, the permutation operation being performed in series with (i) register access for the data operand and (ii) execution of a data processing operation on the operand.
    Type: Grant
    Filed: April 8, 2005
    Date of Patent: April 26, 2011
    Assignee: Icera Inc.
    Inventors: Simon Knowles, Stephen Felix