Patents by Inventor Stephen Felix

Stephen Felix has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190155768
    Abstract: A processor comprising multiple tiles on the same chip, and an external interconnect for communicating data off-chip in the form of packets. The external interconnect comprises an external exchange block configured to provide flow control and queuing of the packets. One of the tiles is nominated by the compiler to send an external exchange request message to the exchange block on behalf of others with data to send externally. The exchange sends an exchange-on message to a first of these tiles, to cause the first tile to start sending packets via the external interconnect. Then, once this tile has sent its last data packet, the exchange block sends an exchange-off control packet to this tile to cause it to stop sending packets, and sends another exchange-on message to the next tile with data to send, and so forth.
    Type: Application
    Filed: October 19, 2018
    Publication date: May 23, 2019
    Applicant: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Graham Bernard Cunningham, Alan Graham Alexander
  • Publication number: 20190121679
    Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
  • Publication number: 20190121784
    Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
  • Publication number: 20190121388
    Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Publication number: 20190121616
    Abstract: The present relates to invention deals with an execution unit configured to execute a computer program instruction to generate random numbers based on a predetermined probability distribution. The execution unit comprises a hardware pseudorandom number generator configured to generate at least randomised bit string on execution of the instruction and adding circuitry which is configured to receive a number of bit sequences of a predetermined bit length selected from the randomised bit string and to sum them to produce a result.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Stephen Felix, Godfrey Da Costa
  • Publication number: 20190121777
    Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instruction which, when executed, implements: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Publication number: 20190121785
    Abstract: A processing system comprising an arrangement of tiles and synchronization logic in the form of hardware logic for coordinating between a group of some or all of said tiles. The instruction set comprises a synchronization instruction which causes an instance of a synchronization request to be transmitted from the respective tile to the synchronization logic, and suspends instruction issue on the respective tile pending a synchronization acknowledgement. In response to receiving an instance of the synchronization request from all of the tiles of the group, the synchronization logic returns the synchronization acknowledgment back to each of the tiles in the group to allow the instruction issue to resume. The instruction set further comprises an abstain instruction, which sends an instance of the synchronization request but does not suspend instruction issue on the respective tile pending the synchronization acknowledgement, instead allowing the instruction issue on the respective tile to continue.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
  • Publication number: 20190121680
    Abstract: A processing system comprising: a subsystem for acting as a work accelerator to a host processor, the subsystem comprising an arrangement of tiles; and an interconnect for communicating between the tiles and connecting the subsystem to the host. The interconnect comprises synchronization logic to coordinate barrier synchronizations between a group of the tiles. The synchronization logic comprises a host sync proxy module, comprising a counter written with a number of credits by the host processor, and being configured to automatically decrement the number of credits each time one of the barrier synchronizations requiring host involvement is performed. When the number of credits in the counter is exhausted, the barrier is not released until a further write from the host to the host sync proxy module, but when the number is credits in the counter is not exhausted the barrier is released without a separate write from the host.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Matthew David Fyles, Richard Luke Southwell Osborne
  • Publication number: 20190121779
    Abstract: A computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising: a switch control instruction which when executed causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined received time, the switch control instruction comprising a delay control field which holds a value defining a delay between issuance of the instruction in the sequence of instructions and its execution by the execution unit.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix
  • Publication number: 20190121387
    Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
  • Publication number: 20190121778
    Abstract: An indication of a direction of transmission over the switching fabric is inserted into a data packet that is transmitted from a tile. The indication of direction may indicate directions from the transmitting tile in which intended recipient tiles are present. The switching fabric prevents (e.g. by blocking the data packet at one of a series of latches) the transmission in a direction not indicated in the data packet. Hence, power saving may be achieved, by preventing the unnecessary transmission of data packets over parts of the switching fabric.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Stephen Felix, Jonathan Mangnall
  • Publication number: 20190121639
    Abstract: The present invention relates to an execution unit for executing a computer program comprising a sequence of instructions, which include a masking instruction. The execution unit is configured to execute the masking instruction which when executed by the execution unit masks randomly selected values from a source operand of n values and retains other original values from the source operand to generate a result which includes original values from the source operand and the masked values in their respective original locations.
    Type: Application
    Filed: February 1, 2018
    Publication date: April 25, 2019
    Applicant: Graphcore Limited
    Inventors: Stephen Felix, Simon Christian Knowles, Godfrey Da Costa
  • Patent number: 9766649
    Abstract: A system is based on an IC. A first component of the IC generates a signal that clocks the IC at a target operating frequency. A period corresponding to the target clock frequency exceeds a duration of a longest critical path associated with the IC. The first component and synchronous logic of the IC clocked therewith, each functions with the core supply voltage, which may be supplied to each via the same power supply rail. A second IC component detects errors that relate to an operation of the IC at the target clock frequency and determines a level for adjusting the core supply voltage. The Vdd adjustment ameliorates the frequency error. The voltage determination uses closed loop dynamic voltage and frequency scaling.
    Type: Grant
    Filed: July 22, 2013
    Date of Patent: September 19, 2017
    Assignee: Nvidia Corporation
    Inventors: Stephen Felix, Jeffery Bond, Tezaswi Raja, Kalyana Bollapalli, Vikram Mehta
  • Patent number: 9639327
    Abstract: A circuit for multiplying a digital signal by a variable gain, controlled in dependence on a digital gain control value. The circuit comprises: a multiplier input for receiving the digital signal; a multiplier output for outputting the digital signal multiplied by the gain; a plurality of multiplier stages each arranged to multiply by a respective predetermined multiplication factor; and switching circuitry arranged so as to apply selected ones of the multiplier stages in a multiplication path between the input and output, in dependence on the digital gain control value. The multiplication factors are arranged such that binary steps in the digital gain control value result in logarithmic steps in said gain.
    Type: Grant
    Filed: April 7, 2011
    Date of Patent: May 2, 2017
    Assignee: Nvidia Corporation
    Inventor: Stephen Felix
  • Publication number: 20160336054
    Abstract: A subsystem configured to select the power supply to a static random access memory cell compares the level of a dedicated memory supply voltage to the primary system supply voltage. The subsystem then switches the primary system supply to the SRAM cell when the system voltage is higher than the memory supply voltage with some margin. When the system voltage is lower than the memory supply voltage, with margin, the subsystem switches the memory supply to the SRAM cell. When the system voltage is comparable to the memory supply, the subsystem switches the system voltage to the SRAM cell if performance is a prioritized consideration, but switches the memory supply to the SRAM cell if power reduction is a prioritized consideration. In this manner, the system achieves optimum performance without incurring steady state power losses and avoids timing issues in accessing memory.
    Type: Application
    Filed: May 13, 2015
    Publication date: November 17, 2016
    Inventors: Stephen FELIX, Hwong-Kwo LIN, Spencer GOLD, Jing GUO, Andreas GOTTERBA, Jason GOLBUS, Karthik NATARAJAN, Jun YANG, Zhenye JIANG, Ge YANG, Lei WANG, Yong LI, Hua CHEN, Haiyan GONG, Beibei REN, Eric VOELKEL
  • Patent number: 9494641
    Abstract: A degradation detector for an integrated circuit (IC), a method of detecting aging in an IC and an IC incorporating the degradation detector or the method. In one embodiment, the degradation detector includes: (1) an offline ring oscillator (RO) coupled to a power gate and a clock gate, (2) a frozen RO coupled to a clock gate, (3) an online RO and (4) an analyzer coupled to the offline RO, the frozen RO and the online RO and operable to place the degradation detector in a normal state in which the offline RO is disconnected from both the drive voltage source and the clock source, the frozen RO is connected to the drive voltage source but disconnected from the clock source and the online RO is connected to both the drive voltage source and the clock source.
    Type: Grant
    Filed: January 24, 2014
    Date of Patent: November 15, 2016
    Assignee: Nvidia Corporation
    Inventors: Brian Smith, Stephen Felix, Tezaswi Raja, Roman Surgutchik
  • Patent number: 9484115
    Abstract: A subsystem configured to select the power supply to a static random access memory cell compares the level of a dedicated memory supply voltage to the primary system supply voltage. The subsystem then switches the primary system supply to the SRAM cell when the system voltage is higher than the memory supply voltage with some margin. When the system voltage is lower than the memory supply voltage, with margin, the subsystem switches the memory supply to the SRAM cell. When the system voltage is comparable to the memory supply, the subsystem switches the system voltage to the SRAM cell if performance is a prioritized consideration, but switches the memory supply to the SRAM cell if power reduction is a prioritized consideration. In this manner, the system achieves optimum performance without incurring steady state power losses and avoids timing issues in accessing memory.
    Type: Grant
    Filed: May 13, 2015
    Date of Patent: November 1, 2016
    Assignee: NVIDIA Corporation
    Inventors: Stephen Felix, Hwong-Kwo Lin, Spencer Gold, Jing Guo, Andreas Gotterba, Jason Golbus, Karthik Natarajan, Jun Yang, Zhenye Jiang, Ge Yang, Lei Wang, Yong Li, Hua Chen, Haiyan Gong, Beibei Ren, Eric Voelkel
  • Patent number: 9389622
    Abstract: A voltage margin controller, an IC included the same and a method of controlling voltage margin for a voltage domain of an IC are disclosed herein. In one embodiment, the voltage margin controller includes: (1) monitoring branches including circuit function indicators configured to indicate whether circuitry in the voltage domain could operate at corresponding candidate reduced voltage levels and (2) a voltage margin adjuster coupled to the monitoring branches and configured to develop a voltage margin adjustment for a voltage regulator of the voltage domain based upon an operating number of the circuit function indicators.
    Type: Grant
    Filed: October 6, 2015
    Date of Patent: July 12, 2016
    Assignee: Nvidia Corporation
    Inventors: Brian L. Smith, Stephen Felix, Jesse Max Guss, Tezaswi Raja
  • Publication number: 20160026195
    Abstract: A voltage margin controller, an IC included the same and a method of controlling voltage margin for a voltage domain of an IC are disclosed herein. In one embodiment, the voltage margin controller includes: (1) monitoring branches including circuit function indicators configured to indicate whether circuitry in the voltage domain could operate at corresponding candidate reduced voltage levels and (2) a voltage margin adjuster coupled to the monitoring branches and configured to develop a voltage margin adjustment for a voltage regulator of the voltage domain based upon an operating number of the circuit function indicators.
    Type: Application
    Filed: October 6, 2015
    Publication date: January 28, 2016
    Inventors: Brian L. Smith, Stephen Felix, Jesse Max Guss, Tezaswi Raja
  • Patent number: 9245595
    Abstract: A method and a system are provided for performing memory access assist using voltage boost. A memory access request is received at a storage cell array that comprises two or more subarrays, each subarray including at least one row of storage cells. The voltage boost is applied, during the memory access, to a first negative supply voltage of a first storage cell subarray of the two or more storage cell subarrays. The first negative supply voltage of the first storage cell subarray is lower than a second negative supply voltage of a second storage cell subarray of the two or more storage cell subarrays.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: January 26, 2016
    Assignee: NVIDIA Corporation
    Inventors: Stephen Felix, Stéphane Badel