Patents by Inventor Stephen Felix
Stephen Felix has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190155768
Abstract: A processor comprising multiple tiles on the same chip, and an external interconnect for communicating data off-chip in the form of packets. The external interconnect comprises an external exchange block configured to provide flow control and queuing of the packets. One of the tiles is nominated by the compiler to send an external exchange request message to the exchange block on behalf of others with data to send externally. The exchange sends an exchange-on message to a first of these tiles, to cause the first tile to start sending packets via the external interconnect. Then, once this tile has sent its last data packet, the exchange block sends an exchange-off control packet to this tile to cause it to stop sending packets, and sends another exchange-on message to the next tile with data to send, and so forth.
Type: Application
Filed: October 19, 2018
Publication date: May 23, 2019
Applicant: Graphcore Limited
Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Stephen Felix, Graham Bernard Cunningham, Alan Graham Alexander
-
Publication number: 20190121679
Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
-
Publication number: 20190121784
Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
-
Publication number: 20190121388
Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
-
Publication number: 20190121616
Abstract: The present invention relates to an execution unit configured to execute a computer program instruction to generate random numbers based on a predetermined probability distribution. The execution unit comprises a hardware pseudorandom number generator configured to generate at least one randomised bit string on execution of the instruction and adding circuitry which is configured to receive a number of bit sequences of a predetermined bit length selected from the randomised bit string and to sum them to produce a result.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Stephen Felix, Godfrey Da Costa
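As a rough software illustration of the summing step this abstract describes, the sketch below slices a randomised bit string into fixed-width sequences and adds them. The 64-bit string, the count of 12 sequences, the 5-bit width and the function name are all assumptions chosen for the example, not values from the application. Summing several independent uniform values gives a bell-shaped spread of results, which is one way such a scheme could target a non-uniform distribution.

```python
import random


def sum_bit_sequences(bit_string: int, num_sequences: int = 12, seq_bits: int = 5) -> int:
    """Sum `num_sequences` fields of `seq_bits` bits taken from `bit_string`.

    Hypothetical helper: the field count and width are illustrative assumptions.
    """
    mask = (1 << seq_bits) - 1
    total = 0
    for i in range(num_sequences):
        # Select the i-th fixed-width sequence, starting from the low-order bits.
        total += (bit_string >> (i * seq_bits)) & mask
    return total


def sample(rng: random.Random) -> int:
    """Draw one value; Python's PRNG stands in for the hardware generator."""
    return sum_bit_sequences(rng.getrandbits(64))


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample(rng) for _ in range(5)])
```

With these assumed widths each result lies between 0 and 12 * 31 = 372.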
-
Publication number: 20190121777
Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instructions which, when executed, implement: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
-
Publication number: 20190121785
Abstract: A processing system comprising an arrangement of tiles and synchronization logic in the form of hardware logic for coordinating between a group of some or all of said tiles. The instruction set comprises a synchronization instruction which causes an instance of a synchronization request to be transmitted from the respective tile to the synchronization logic, and suspends instruction issue on the respective tile pending a synchronization acknowledgement. In response to receiving an instance of the synchronization request from all of the tiles of the group, the synchronization logic returns the synchronization acknowledgment back to each of the tiles in the group to allow the instruction issue to resume. The instruction set further comprises an abstain instruction, which sends an instance of the synchronization request but does not suspend instruction issue on the respective tile pending the synchronization acknowledgement, instead allowing the instruction issue on the respective tile to continue.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
-
Publication number: 20190121680
Abstract: A processing system comprising: a subsystem for acting as a work accelerator to a host processor, the subsystem comprising an arrangement of tiles; and an interconnect for communicating between the tiles and connecting the subsystem to the host. The interconnect comprises synchronization logic to coordinate barrier synchronizations between a group of the tiles. The synchronization logic comprises a host sync proxy module, comprising a counter written with a number of credits by the host processor, and being configured to automatically decrement the number of credits each time one of the barrier synchronizations requiring host involvement is performed. When the number of credits in the counter is exhausted, the barrier is not released until a further write from the host to the host sync proxy module, but when the number of credits in the counter is not exhausted the barrier is released without a separate write from the host.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Matthew David Fyles, Richard Luke Southwell Osborne
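The credit-counter behaviour in this abstract can be modelled in a few lines of software. This is only a sketch of the described behaviour; the class and method names are hypothetical, and the choice that a host write overwrites the counter (rather than adding to it) is an assumption.

```python
class HostSyncProxy:
    """Toy model of a credit counter gating barriers that need host involvement.

    Hypothetical names; the real module is hardware logic, not a Python class.
    """

    def __init__(self) -> None:
        self.credits = 0

    def host_write(self, credits: int) -> None:
        # Assumption: a host write sets the counter to the given value.
        self.credits = credits

    def barrier_reached(self) -> bool:
        """Return True if the barrier is released without a further host write."""
        if self.credits > 0:
            self.credits -= 1  # one credit consumed per host-involving barrier
            return True
        return False  # credits exhausted: barrier held until the host writes again


if __name__ == "__main__":
    proxy = HostSyncProxy()
    proxy.host_write(2)
    print([proxy.barrier_reached() for _ in range(3)])
```

The point of the scheme, as the abstract describes it, is that the host can grant several barriers in advance with a single write instead of being consulted at every one.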
-
Publication number: 20190121779
Abstract: A computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising: a switch control instruction which when executed causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time, the switch control instruction comprising a delay control field which holds a value defining a delay between issuance of the instruction in the sequence of instructions and its execution by the execution unit.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix
-
Publication number: 20190121387
Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
-
Publication number: 20190121778
Abstract: An indication of a direction of transmission over the switching fabric is inserted into a data packet that is transmitted from a tile. The indication of direction may indicate directions from the transmitting tile in which intended recipient tiles are present. The switching fabric prevents (e.g. by blocking the data packet at one of a series of latches) the transmission in a direction not indicated in the data packet. Hence, power saving may be achieved, by preventing the unnecessary transmission of data packets over parts of the switching fabric.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Stephen Felix, Jonathan Mangnall
-
Publication number: 20190121639
Abstract: The present invention relates to an execution unit for executing a computer program comprising a sequence of instructions, which include a masking instruction. The execution unit is configured to execute the masking instruction which when executed by the execution unit masks randomly selected values from a source operand of n values and retains other original values from the source operand to generate a result which includes original values from the source operand and the masked values in their respective original locations.
Type: Application
Filed: February 1, 2018
Publication date: April 25, 2019
Applicant: Graphcore Limited
Inventors: Stephen Felix, Simon Christian Knowles, Godfrey Da Costa
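The effect of such a masking instruction can be imitated in software as below. This is a sketch under assumptions that the abstract does not state: masked values are replaced by zero, each value is masked independently, and the per-value keep probability is a parameter. The function name is hypothetical.

```python
import random
from typing import List


def random_mask(source: List[float], keep_probability: float, rng: random.Random) -> List[float]:
    """Return a copy of `source` with randomly selected values masked.

    Assumptions for illustration: masking means replacing with 0.0, and each
    value is kept independently with probability `keep_probability`.
    """
    # Retained values stay at their original positions; masked ones become 0.0.
    return [v if rng.random() < keep_probability else 0.0 for v in source]


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_mask([1.0, 2.0, 3.0, 4.0], 0.5, rng))
```

The result always has the same length as the source operand, so every value, masked or not, stays in its original location.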
-
Patent number: 9766649
Abstract: A system is based on an IC. A first component of the IC generates a signal that clocks the IC at a target operating frequency. A period corresponding to the target clock frequency exceeds a duration of a longest critical path associated with the IC. The first component and the synchronous logic of the IC clocked therewith each function with the core supply voltage, which may be supplied to each via the same power supply rail. A second IC component detects errors that relate to an operation of the IC at the target clock frequency and determines a level for adjusting the core supply voltage. The Vdd adjustment ameliorates the frequency error. The voltage determination uses closed loop dynamic voltage and frequency scaling.
Type: Grant
Filed: July 22, 2013
Date of Patent: September 19, 2017
Assignee: Nvidia Corporation
Inventors: Stephen Felix, Jeffery Bond, Tezaswi Raja, Kalyana Bollapalli, Vikram Mehta
-
Patent number: 9639327
Abstract: A circuit for multiplying a digital signal by a variable gain, controlled in dependence on a digital gain control value. The circuit comprises: a multiplier input for receiving the digital signal; a multiplier output for outputting the digital signal multiplied by the gain; a plurality of multiplier stages each arranged to multiply by a respective predetermined multiplication factor; and switching circuitry arranged so as to apply selected ones of the multiplier stages in a multiplication path between the input and output, in dependence on the digital gain control value. The multiplication factors are arranged such that binary steps in the digital gain control value result in logarithmic steps in said gain.
Type: Grant
Filed: April 7, 2011
Date of Patent: May 2, 2017
Assignee: Nvidia Corporation
Inventor: Stephen Felix
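One way to obtain logarithmic gain steps from binary control steps, sketched here as an assumption rather than as the patented circuit, is to give stage k the fixed factor r^(2^k) and let bit k of the control value switch that stage into the path. The overall gain is then r raised to the control value, so each increment of the control value multiplies the gain by the same ratio r. The function names and the base step r are hypothetical.

```python
from typing import List


def stage_factors(base_step: float, num_stages: int) -> List[float]:
    """Fixed factor for each stage: base_step ** (2 ** k) for stage k (assumed scheme)."""
    return [base_step ** (1 << k) for k in range(num_stages)]


def apply_gain(sample: float, control: int, factors: List[float]) -> float:
    """Multiply `sample` by every stage whose control bit is set."""
    out = sample
    for k, factor in enumerate(factors):
        if (control >> k) & 1:  # bit k switches stage k into the multiplication path
            out *= factor
    return out


if __name__ == "__main__":
    factors = stage_factors(2.0 ** 0.5, 3)  # base step of about 3 dB, three stages
    for code in range(8):
        print(code, round(apply_gain(1.0, code, factors), 4))
```

With a base step of the square root of 2, control values 0 to 7 give gains of 1, 1.4142, 2, 2.8284, 4, 5.6569, 8 and 11.3137, that is, equal ratios between neighbouring codes.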
-
Publication number: 20160336054
Abstract: A subsystem configured to select the power supply to a static random access memory cell compares the level of a dedicated memory supply voltage to the primary system supply voltage. The subsystem then switches the primary system supply to the SRAM cell when the system voltage is higher than the memory supply voltage with some margin. When the system voltage is lower than the memory supply voltage, with margin, the subsystem switches the memory supply to the SRAM cell. When the system voltage is comparable to the memory supply, the subsystem switches the system voltage to the SRAM cell if performance is a prioritized consideration, but switches the memory supply to the SRAM cell if power reduction is a prioritized consideration. In this manner, the system achieves optimum performance without incurring steady state power losses and avoids timing issues in accessing memory.
Type: Application
Filed: May 13, 2015
Publication date: November 17, 2016
Inventors: Stephen FELIX, Hwong-Kwo LIN, Spencer GOLD, Jing GUO, Andreas GOTTERBA, Jason GOLBUS, Karthik NATARAJAN, Jun YANG, Zhenye JIANG, Ge YANG, Lei WANG, Yong LI, Hua CHEN, Haiyan GONG, Beibei REN, Eric VOELKEL
-
Patent number: 9494641
Abstract: A degradation detector for an integrated circuit (IC), a method of detecting aging in an IC and an IC incorporating the degradation detector or the method. In one embodiment, the degradation detector includes: (1) an offline ring oscillator (RO) coupled to a power gate and a clock gate, (2) a frozen RO coupled to a clock gate, (3) an online RO and (4) an analyzer coupled to the offline RO, the frozen RO and the online RO and operable to place the degradation detector in a normal state in which the offline RO is disconnected from both the drive voltage source and the clock source, the frozen RO is connected to the drive voltage source but disconnected from the clock source and the online RO is connected to both the drive voltage source and the clock source.
Type: Grant
Filed: January 24, 2014
Date of Patent: November 15, 2016
Assignee: Nvidia Corporation
Inventors: Brian Smith, Stephen Felix, Tezaswi Raja, Roman Surgutchik
-
Patent number: 9484115
Abstract: A subsystem configured to select the power supply to a static random access memory cell compares the level of a dedicated memory supply voltage to the primary system supply voltage. The subsystem then switches the primary system supply to the SRAM cell when the system voltage is higher than the memory supply voltage with some margin. When the system voltage is lower than the memory supply voltage, with margin, the subsystem switches the memory supply to the SRAM cell. When the system voltage is comparable to the memory supply, the subsystem switches the system voltage to the SRAM cell if performance is a prioritized consideration, but switches the memory supply to the SRAM cell if power reduction is a prioritized consideration. In this manner, the system achieves optimum performance without incurring steady state power losses and avoids timing issues in accessing memory.
Type: Grant
Filed: May 13, 2015
Date of Patent: November 1, 2016
Assignee: NVIDIA Corporation
Inventors: Stephen Felix, Hwong-Kwo Lin, Spencer Gold, Jing Guo, Andreas Gotterba, Jason Golbus, Karthik Natarajan, Jun Yang, Zhenye Jiang, Ge Yang, Lei Wang, Yong Li, Hua Chen, Haiyan Gong, Beibei Ren, Eric Voelkel
-
Patent number: 9389622
Abstract: A voltage margin controller, an IC including the same and a method of controlling voltage margin for a voltage domain of an IC are disclosed herein. In one embodiment, the voltage margin controller includes: (1) monitoring branches including circuit function indicators configured to indicate whether circuitry in the voltage domain could operate at corresponding candidate reduced voltage levels and (2) a voltage margin adjuster coupled to the monitoring branches and configured to develop a voltage margin adjustment for a voltage regulator of the voltage domain based upon an operating number of the circuit function indicators.
Type: Grant
Filed: October 6, 2015
Date of Patent: July 12, 2016
Assignee: Nvidia Corporation
Inventors: Brian L. Smith, Stephen Felix, Jesse Max Guss, Tezaswi Raja
-
Publication number: 20160026195
Abstract: A voltage margin controller, an IC including the same and a method of controlling voltage margin for a voltage domain of an IC are disclosed herein. In one embodiment, the voltage margin controller includes: (1) monitoring branches including circuit function indicators configured to indicate whether circuitry in the voltage domain could operate at corresponding candidate reduced voltage levels and (2) a voltage margin adjuster coupled to the monitoring branches and configured to develop a voltage margin adjustment for a voltage regulator of the voltage domain based upon an operating number of the circuit function indicators.
Type: Application
Filed: October 6, 2015
Publication date: January 28, 2016
Inventors: Brian L. Smith, Stephen Felix, Jesse Max Guss, Tezaswi Raja
-
Patent number: 9245595
Abstract: A method and a system are provided for performing memory access assist using voltage boost. A memory access request is received at a storage cell array that comprises two or more subarrays, each subarray including at least one row of storage cells. The voltage boost is applied, during the memory access, to a first negative supply voltage of a first storage cell subarray of the two or more storage cell subarrays. The first negative supply voltage of the first storage cell subarray is lower than a second negative supply voltage of a second storage cell subarray of the two or more storage cell subarrays.
Type: Grant
Filed: December 20, 2013
Date of Patent: January 26, 2016
Assignee: NVIDIA Corporation
Inventors: Stephen Felix, Stéphane Badel