Patents by Inventor John Nicol

John Nicol has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10719470
    Abstract: Techniques are disclosed for data manipulation. Data is obtained from a first switching element where the first switching element is controlled by a first circular buffer. Data is sent to a second switching element where the second switching element is controlled by a second circular buffer. Data is controlled by a third switching element that is controlled by a third circular buffer. The third switching element hierarchically controls the first switching element and the second switching element. Data is routed through a fourth switching element that is controlled by a fourth circular buffer. The circular buffers are statically scheduled. The obtaining data from a first switching element and the sending the data to a second switching element includes a direct memory access (DMA). The switching elements can operate as a master controller or as a slave device. The switching elements can comprise clusters within an asynchronous reconfigurable fabric.
    Type: Grant
    Filed: September 22, 2017
    Date of Patent: July 21, 2020
    Assignee: Wave Computing, Inc.
    Inventor: Christopher John Nicol
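The statically scheduled circular-buffer control described in the abstract above can be illustrated with a minimal sketch. All names here (`CircularBuffer`, `SwitchingElement`, the port labels) are hypothetical illustrations, not taken from the patent; the point is only that each switching element replays a fixed, rotating sequence of routing instructions.

```python
from collections import deque

class CircularBuffer:
    """Rotating buffer of instructions; contents fixed at configuration time."""
    def __init__(self, instructions):
        self.slots = deque(instructions)

    def current(self):
        return self.slots[0]

    def rotate(self):
        # Advance to the next instruction each cycle.
        self.slots.rotate(-1)

class SwitchingElement:
    """Routes a data word according to its buffer's current instruction."""
    def __init__(self, buffer):
        self.buffer = buffer

    def step(self, data):
        port = self.buffer.current()  # e.g. which output port to drive
        self.buffer.rotate()
        return (port, data)

# Statically scheduled: the routing sequence repeats as the buffer rotates.
sw = SwitchingElement(CircularBuffer(["north", "east", "south"]))
outputs = [sw.step(d) for d in (10, 20, 30, 40)]
```

After four steps the schedule has wrapped around, so the fourth word is routed to the same port as the first.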
  • Publication number: 20200167309
    Abstract: Techniques for reconfigurable fabric configuration using spatial and temporal routing are disclosed. A plurality of clusters within a reconfigurable fabric is allocated, where the plurality of clusters is configured to execute one or more functions. A first spatial routing and a first temporal routing through the reconfigurable fabric are calculated. A second spatial routing and a second temporal routing through the reconfigurable fabric are calculated. The first and second spatial routings and the first and second temporal routings are optimized. The one or more functions are executed using routings that were optimized. The first spatial routing and the second spatial routing enable a logical connection for data transfer between at least two clusters of the plurality of clusters. The optimizing places routing instructions in clusters along a routing path within the reconfigurable fabric. The routing instructions are placed in unused cluster control instruction locations to enable spatial routing.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 28, 2020
    Inventor: Christopher John Nicol
  • Patent number: 10656911
    Abstract: Techniques are disclosed for power conservation. A plurality of processing elements and a plurality of instructions are configured. The plurality of processing elements is controlled by instructions contained in a plurality of circular buffers. The plurality of processing elements can comprise a data flow processor. A first processing element, from the plurality of interconnected processing elements, is set into a sleep state by a first instruction from the plurality of instructions. The first processing element is woken from the sleep state as a result of valid data being presented to the first processing element. A subsection of the plurality of interconnected processing elements is also set into a sleep state based on the first processing element being set into a sleep state.
    Type: Grant
    Filed: February 11, 2019
    Date of Patent: May 19, 2020
    Assignee: Wave Computing, Inc.
    Inventor: Christopher John Nicol
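The sleep-and-wake behavior in the abstract above reduces power by idling elements until they have work. This toy sketch, with hypothetical names, shows the two triggers the abstract describes: a sleep instruction sets the element to sleep, and the arrival of valid data wakes it.

```python
class ProcessingElement:
    """A processing element that sleeps until valid data arrives."""
    def __init__(self):
        self.asleep = False
        self.processed = []

    def sleep(self):
        # A sleep instruction from the circular buffer idles the element.
        self.asleep = True

    def present(self, data, valid):
        # Invalid data is ignored; valid data wakes the element for processing.
        if not valid:
            return
        self.asleep = False
        self.processed.append(data)

pe = ProcessingElement()
pe.sleep()
pe.present(None, valid=False)  # no valid data: element stays asleep
pe.present(42, valid=True)     # valid data wakes the element
```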
  • Patent number: 10659396
    Abstract: Techniques are disclosed for managing data within a reconfigurable computing environment. In a multiple processing element environment, such as a mesh network or other suitable topology, there is an inherent need to pass data between processing elements. Subtasks are divided among multiple processing elements. The output resulting from the subtasks is then merged by a downstream processing element. In such cases, a join operation can be used to combine data from multiple upstream processing elements. A control agent executes on each processing element. A memory buffer is disposed between upstream processing elements and the downstream processing element. The downstream processing element is configured to automatically perform an operation based on the availability of valid data from the upstream processing elements.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: May 19, 2020
    Assignee: Wave Computing, Inc.
    Inventor: Christopher John Nicol
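The join operation in the abstract above fires automatically once every upstream buffer holds valid data. A minimal sketch of that firing rule, with illustrative names not drawn from the patent:

```python
class JoinElement:
    """Downstream element that fires once all upstream buffers hold valid data."""
    def __init__(self, n_inputs, op):
        self.buffers = [[] for _ in range(n_inputs)]  # one buffer per upstream element
        self.op = op
        self.results = []

    def receive(self, idx, value):
        self.buffers[idx].append(value)
        self._maybe_fire()

    def _maybe_fire(self):
        # Fire only while every upstream buffer has data available.
        while all(self.buffers):
            args = [b.pop(0) for b in self.buffers]
            self.results.append(self.op(*args))

join = JoinElement(2, op=lambda a, b: a + b)
join.receive(0, 1)   # only one input ready: no fire
join.receive(1, 10)  # both ready: fires
join.receive(1, 20)  # waits for the other input
join.receive(0, 2)   # fires again
```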
  • Patent number: 10592444
    Abstract: A plurality of software programmable processors is disclosed. The software programmable processors are controlled by rotating circular buffers. A first processor and a second processor within the plurality of software programmable processors are individually programmable. The first processor within the plurality of software programmable processors is coupled to neighbor processors within the plurality of software programmable processors. The first processor sends and receives data from the neighbor processors. The first processor and the second processor are configured to operate on a common instruction cycle. An output of the first processor from a first instruction cycle is an input to the second processor on a subsequent instruction cycle.
    Type: Grant
    Filed: March 3, 2017
    Date of Patent: March 17, 2020
    Assignee: Wave Computing, Inc.
    Inventors: Christopher John Nicol, Samit Chaudhuri, Radoslav Danilak
  • Patent number: 10564929
    Abstract: A combination of memory units and dataflow processing units is disclosed for computation. A first memory unit is interposed between a first dataflow processing unit and a second dataflow processing unit. Operations for a dataflow graph are allocated across the first dataflow processing unit and the second dataflow processing unit. The first memory unit passes data between the first dataflow processing unit and the second dataflow processing unit to execute the dataflow graph. The first memory unit is a high bandwidth, shared memory device including a hybrid memory cube. The first dataflow processing unit and second dataflow processing unit include a plurality of circular buffers containing instructions for controlling data transfer between the first dataflow processing unit and second dataflow processing unit. Additional dataflow processing units and additional memory units are included for additional functionality and efficiency.
    Type: Grant
    Filed: August 1, 2017
    Date of Patent: February 18, 2020
    Assignee: Wave Computing, Inc.
    Inventors: Christopher John Nicol, Derek William Meyer
  • Patent number: 10505704
    Abstract: Disclosed embodiments provide an interface circuit for the transfer of data from a synchronous circuit to an asynchronous circuit. Data from the synchronous circuit is received into a memory in the interface circuit. The data in the memory is then sent to the asynchronous circuit based on an instruction in a circular buffer that is part of the interface circuit. Processing elements within the interface circuit execute instructions contained within the circular buffer. The circular buffer rotates to provide new instructions to the processing elements. Flow control paces the data from the synchronous circuit to the asynchronous circuit.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: December 10, 2019
    Assignee: Wave Computing, Inc.
    Inventor: Christopher John Nicol
  • Patent number: 10437728
    Abstract: Circular buffers containing instructions that enable the execution of operations on logical elements are described where data in the circular buffers is swapped to storage. The instructions comprise a branchless instruction set. Data stored in circular buffers is paged in and out to a second level memory. State information for each logical element is also saved and restored using paging memory. Instructions are provided to logical elements, such as processing elements, via circular buffers. The instructions enable a group of processing elements to perform operations implementing a desired functionality. That functionality is changed by updating the circular buffers with new instructions that are transferred from paging memory. The previous instructions can be saved off in paging memory before the new instructions are copied over to the circular buffers. This enables the hardware to be rapidly reconfigured amongst multiple functions.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: October 8, 2019
    Assignee: Wave Computing, Inc.
    Inventor: Christopher John Nicol
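The save-and-restore mechanism described in the abstract above can be sketched as paging a buffer's instruction contents to a second-level memory before overwriting them. The names (`PagedBuffer`, `paging_memory`, the instruction strings) are assumptions for illustration only.

```python
class PagedBuffer:
    """Circular buffer whose contents can be paged out to second-level memory."""
    def __init__(self, instructions):
        self.instructions = list(instructions)

    def page_out(self, paging_memory, key):
        # Save the current instructions before reconfiguring the hardware.
        paging_memory[key] = list(self.instructions)

    def page_in(self, paging_memory, key):
        # Restore a previously saved function's instructions.
        self.instructions = list(paging_memory[key])

paging_memory = {}
buf = PagedBuffer(["add", "shift", "route"])
buf.page_out(paging_memory, "func_a")  # save function A
buf.instructions = ["mul", "route"]    # reconfigure for function B
buf.page_in(paging_memory, "func_a")   # rapidly switch back to function A
```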
  • Publication number: 20190279038
    Abstract: Techniques are disclosed for data flow graph node parallel update for machine learning. A first plurality of processing elements is configured to implement a portion of a data flow graph. The nodes include at least one variable node and implement part of a neural network. A second plurality of processing elements is configured to implement a second portion of the data flow graph. These nodes include at least one additional variable node and implement an additional part of the neural network. Training data is issued to the first plurality of processing elements. The training data is used to update variables within the at least one variable node. Additional variables are updated within the at least one additional variable node. The updating includes forwarding training data from the first plurality to the second plurality. The neural network is trained based on the variables that were updated and the additional variables.
    Type: Application
    Filed: May 27, 2019
    Publication date: September 12, 2019
    Inventor: Christopher John Nicol
  • Publication number: 20190279086
    Abstract: Techniques are disclosed for data flow graph node update for machine learning. A plurality of processing elements is configured within a reconfigurable fabric to implement a data flow graph. The nodes of the data flow graph include one or more variable nodes, and the data flow graph implements a neural network. N copies of a variable contained in a variable node are issued, where the N copies are used for distribution within the data flow graph, and where N is an integer greater than or equal to one and less than or equal to the total number of nodes in the graph. The N copies of a variable are distributed within the data flow graph. The neural network is updated based on the N copies of a variable. Results from the distribution are averaged. The averaging includes parallel training of different data for machine learning.
    Type: Application
    Filed: May 27, 2019
    Publication date: September 12, 2019
    Inventors: Christopher John Nicol, Lin Zhong
  • Patent number: 10374605
    Abstract: Techniques are disclosed for designing a reconfigurable fabric. The reconfigurable fabric is designed using logical elements, configurable connections between and among the logical elements, and rotating circular buffers. The circular buffers contain configuration instructions. The configuration instructions control connections between and among logical elements. The logical elements change operation based on the instructions that rotate through the circular buffers. Clusters of logical elements are interconnected by a switching fabric. Each cluster contains processing elements, storage elements, and switching elements. A circular buffer within a cluster contains multiple switching instructions to control the flow of data throughout the switching fabric. The circular buffer provides a pipelined execution of switching instructions for the implementation of multiple functions.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: August 6, 2019
    Assignee: Wave Computing, Inc.
    Inventor: Christopher John Nicol
  • Patent number: 10374981
    Abstract: An interface circuit is disclosed for the transfer of data from a synchronous circuit, with multiple source elements, to an asynchronous circuit. Data from the synchronous circuit is received into a memory in the interface circuit. The data in the memory is then sent to the asynchronous circuit based on an instruction in a circular buffer that is part of the interface circuit. Processing elements within the interface circuit execute instructions contained within the circular buffer. The circular buffer rotates to provide new instructions to the processing elements. Flow control paces the data from the synchronous circuit to the asynchronous circuit.
    Type: Grant
    Filed: August 2, 2016
    Date of Patent: August 6, 2019
    Assignee: Wave Computing, Inc.
    Inventor: Christopher John Nicol
  • Publication number: 20190228037
    Abstract: Techniques are disclosed for checkpointing data flow graph computation for machine learning. Processing elements within a reconfigurable fabric are configured to implement a data flow graph. Nodes of the data flow graph can include variable nodes. The processing elements are loaded with process agents. Valid data is executed by a first process agent. The first process agent corresponds to a starting node of the data flow graph. Invalid data is sent to the first process agent. The invalid data initiates a checkpoint operation for the data flow graph. Invalid data is propagated from the starting node of the data flow graph to other nodes within the data flow graph. The variable nodes are paused upon receiving invalid data. Paused variable nodes within the data flow graph are restarted by issuing a run command, and valid data is sent to the starting node of the data flow graph.
    Type: Application
    Filed: March 29, 2019
    Publication date: July 25, 2019
    Inventors: Christopher John Nicol, Keith Mark Evans, Mehran Ramezani
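The checkpoint mechanism in the abstract above uses invalid data as a pause signal and a run command as a restart. A minimal sketch of one variable node's behavior, with hypothetical names:

```python
class VariableNode:
    """Variable node that pauses on invalid data and resumes on a run command."""
    def __init__(self):
        self.paused = False
        self.seen = []

    def feed(self, data, valid):
        if not valid:
            # Invalid data propagating through the graph pauses the node.
            self.paused = True
            return
        if not self.paused:
            self.seen.append(data)

    def run(self):
        # The run command restarts a paused node.
        self.paused = False

node = VariableNode()
node.feed(1, valid=True)
node.feed(None, valid=False)  # checkpoint begins: node pauses
node.feed(2, valid=True)      # dropped while paused
node.run()
node.feed(3, valid=True)      # processing resumes
```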
  • Publication number: 20190228340
    Abstract: Techniques are disclosed for data manipulation that enables data flow graph computation for machine learning. A plurality of processing elements within a reconfigurable fabric is configured to implement a data flow graph. The nodes of the data flow graph include variable nodes. The plurality of processing elements is initialized with a plurality of process agents. A first set of buffers is initialized for a first process agent, where the first process agent corresponds to a starting node of the data flow graph. A fire signal is issued for the starting node based on the first set of buffers being initialized. Results of operations are collected by a further process agent following receipt of the fire signal. The data flow graph computation can be paused by loading invalid data or by withholding new data from entering the data flow graph. The pausing can be controlled by an execution manager.
    Type: Application
    Filed: March 29, 2019
    Publication date: July 25, 2019
    Inventor: Christopher John Nicol
  • Publication number: 20190197018
    Abstract: Techniques are disclosed for dynamic reconfiguration using data transfer control. Clusters on a reconfigurable fabric are accessed to implement a logical operation. The logical operation can include a Boolean operation, a matrix operation, a tensor operation, etc. Clusters from the plurality of clusters are provisioned for implementation of a first agent on the reconfigurable fabric. The clusters can include quads. The one or more clusters provisioned for the first agent include a first data transfer control block. Additional clusters from the plurality of clusters are provisioned for implementation of a second agent on the reconfigurable fabric. The additional clusters provisioned for the second agent include a second data transfer control block. The logical operation is performed using the first agent. Control information is transferred from the first data transfer control block to the second data transfer control block.
    Type: Application
    Filed: March 1, 2019
    Publication date: June 27, 2019
    Inventors: Keith Mark Evans, Christopher John Nicol, Mehran Ramezani
  • Publication number: 20190171416
    Abstract: Techniques are disclosed for power conservation. A plurality of processing elements and a plurality of instructions are configured. The plurality of processing elements is controlled by instructions contained in a plurality of circular buffers. The plurality of processing elements can comprise a data flow processor. A first processing element, from the plurality of interconnected processing elements, is set into a sleep state by a first instruction from the plurality of instructions. The first processing element is woken from the sleep state as a result of valid data being presented to the first processing element. A subsection of the plurality of interconnected processing elements is also set into a sleep state based on the first processing element being set into a sleep state.
    Type: Application
    Filed: February 11, 2019
    Publication date: June 6, 2019
    Inventor: Christopher John Nicol
  • Publication number: 20190138373
    Abstract: Techniques are disclosed for multithreaded data flow processing within a reconfigurable fabric. Code is obtained for performing data manipulation within a reconfigurable fabric. The code is segmented into a plurality of data manipulation operations. A first segment from the segmenting is allocated to a first set of processing elements within a plurality of processing elements comprising a reconfigurable fabric. A second segment from the segmenting is allocated to a second set of processing elements within the reconfigurable fabric. The first segment is executed on the first set of processing elements while the second segment is executed on the second set of processing elements. The first kernel and the second kernel comprise multithreaded operation.
    Type: Application
    Filed: December 28, 2018
    Publication date: May 9, 2019
    Inventors: Christopher John Nicol, Derek William Meyer
  • Publication number: 20190130269
    Abstract: Techniques are disclosed for pipelined tensor manipulation within a reconfigurable fabric. A tensor is obtained for processing on a reconfigurable fabric comprised of a plurality of processing elements. The tensor is applied as input to a pipeline of agents running on the plurality of processing elements. The tensor is sectioned into subsections. A first subsection from the one or more subsections is applied to a first agent in the pipeline of agents. A first result is calculated by the first agent for the first subsection. The first result is output to a second agent in the pipeline of agents. A second result is calculated, by the second agent, based on the first result. A subsection done indication is sent, by the second agent, to the first agent, when the calculating the second result is accomplished. The second result is output to a third agent in the pipeline of agents.
    Type: Application
    Filed: December 4, 2018
    Publication date: May 2, 2019
    Inventor: Christopher John Nicol
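The pipelined flow in the abstract above, where each tensor subsection passes through a chain of agents and each completed stage is acknowledged, can be sketched as below. The stage functions and the counter standing in for the per-subsection done indication are illustrative assumptions, not the patent's mechanism.

```python
def pipeline(subsections, stages):
    """Run each subsection through the agents (stage functions) in order."""
    done_signals = 0
    results = []
    for sub in subsections:
        value = sub
        for stage in stages:
            value = stage(value)
            done_signals += 1  # stand-in for the subsection-done indication
        results.append(value)
    return results, done_signals

# Two hypothetical agents applied per subsection: scale, then offset.
results, signals = pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1])
```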
  • Publication number: 20190130291
    Abstract: Techniques are disclosed for dynamic reconfiguration with partially resident agents. A plurality of clusters on a reconfigurable fabric is accessed to implement a logical operation. Two or more clusters from the plurality of clusters are provisioned for implementation of a first agent on the reconfigurable fabric wherein the first agent is comprised of an agent control unit and an agent kernel. The logical operation is executed, on the two or more clusters of the reconfigurable fabric, using the first agent. The agent kernel is removed from the reconfigurable fabric while leaving the agent control unit resident on the reconfigurable fabric. The agent control unit is further used to buffer data incoming to the first agent. The agent control unit is further used to provide data to logic downstream from the first agent. A second agent provides data incoming to the first agent.
    Type: Application
    Filed: December 21, 2018
    Publication date: May 2, 2019
    Inventor: Christopher John Nicol
  • Publication number: 20190130270
    Abstract: Techniques are disclosed for tensor manipulation within a reconfigurable fabric using pointers. A first tensor is obtained for processing on a reconfigurable fabric comprised of a plurality of processing, storage, and switching elements. A first agent is deployed on one or more of the plurality of processing elements of the reconfigurable fabric. The first tensor is manipulated by the first agent. The results of the manipulating the first tensor are stored in a storage element external from the first agent. A pointer is provided to a second agent deployed on one or more of the plurality of processing elements of the reconfigurable fabric, wherein the pointer identifies an address of the storage element at which the first tensor is stored. A transfer buffer is used between the first agent and the second agent within the reconfigurable fabric to facilitate tensor transfers between the first agent and the second agent.
    Type: Application
    Filed: December 4, 2018
    Publication date: May 2, 2019
    Inventors: Christopher John Nicol, David Jay O'Shea
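The pointer-based hand-off in the final abstract above avoids copying a tensor between agents: the first agent stores its result in external storage and passes only an address. In this sketch, the shared dictionary, the `"slot0"` key, and both agent functions are hypothetical stand-ins for the storage element and deployed agents.

```python
# Storage element external to both agents; the pointer is just a key into it.
storage = {}

def first_agent(tensor):
    result = [x * x for x in tensor]  # manipulate the tensor
    pointer = "slot0"                 # address within the external storage
    storage[pointer] = result
    return pointer

def second_agent(pointer):
    # The second agent dereferences the pointer instead of copying the tensor.
    return sum(storage[pointer])

ptr = first_agent([1, 2, 3])
total = second_agent(ptr)
```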