Patents by Inventor Simon C. Steely

Simon C. Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200409709
    Abstract: Systems, methods, and apparatuses relating to time-multiplexing circuitry in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of processing elements. In another embodiment, a configurable spatial accelerator (CSA) includes a plurality of time-multiplexed processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of time-multiplexed processing elements.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Inventors: Kermin ChoFleming, Simon C. Steely, JR., Mitchell Diamond
  • Publication number: 20200371566
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: April 29, 2020
    Publication date: November 26, 2020
    Inventors: Simon C. Steely, JR., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
  • Publication number: 20200310994
    Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.
    Type: Application
    Filed: March 30, 2019
    Publication date: October 1, 2020
    Inventors: Kermin ChoFleming, Yu Bai, Simon C. Steely
  • Patent number: 10691182
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: June 23, 2020
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
  • Patent number: 10572376
    Abstract: An integrated circuit includes a memory interface, coupled to a memory to store data corresponding to instructions, and an operations queue to buffer memory operations corresponding to the instructions. The integrated circuit may include acceleration hardware to execute a sub-program corresponding to the instructions. A set of input queues may include an address queue to receive, from the acceleration hardware, an address of the memory associated with a second memory operation of the memory operations, and a dependency queue to receive, from the acceleration hardware, a dependency token associated with the address. The dependency token indicates a dependency on data generated by a first memory operation of the memory operations. A scheduler circuit may schedule issuance of the second memory operation to the memory in response to the dependency queue receiving the dependency token and the address queue receiving the address.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: February 25, 2020
    Assignee: Intel Corporation
    Inventors: Kermin Elliott Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop
  • Patent number: 10558575
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: February 11, 2020
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Jr., Kent D. Glossop, Simon C. Steely, Jr., Jinjie Tang, Alan G. Gara
  • Patent number: 10515049
    Abstract: Methods and apparatuses relating to distributed memory hazard detection and error recovery are described. In one embodiment, a memory circuit includes a memory interface circuit to service memory requests from a spatial array of processing elements for data stored in a plurality of cache banks; and a hazard detection circuit in each of the plurality of cache banks, wherein a first hazard detection circuit for a speculative memory load request from the memory interface circuit, that is marked with a potential dynamic data dependency, to an address within a first cache bank of the first hazard detection circuit, is to mark the address for tracking of other memory requests to the address, store data from the address in speculative completion storage, and send the data from the speculative completion storage to the spatial array of processing elements when a memory dependency token is received for the speculative memory load request.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: December 24, 2019
    Assignee: intel corporation
    Inventors: Kermin E. Fleming, Simon C. Steely, Kent D. Glossop
  • Patent number: 10515046
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: December 24, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.
  • Publication number: 20190384601
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
    Type: Application
    Filed: August 9, 2019
    Publication date: December 19, 2019
    Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, JR., Samantika S. Sury
  • Patent number: 10496574
    Abstract: Systems, methods, and apparatuses relating to a memory fence mechanism in a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a plurality of operations, each by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. The processor also includes a fence manager to manage a memory fence between a first operation and a second operation of the plurality of operations.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: December 3, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.
  • Publication number: 20190354146
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: May 20, 2019
    Publication date: November 21, 2019
    Inventors: Simon C. Steely, JR., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
  • Patent number: 10474375
    Abstract: An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: November 12, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Elliott Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop
  • Patent number: 10467183
    Abstract: Methods and apparatuses relating to pipelined runtime services in spatial arrays are described.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: November 5, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop
  • Patent number: 10469397
    Abstract: Systems, methods, and apparatuses relating to configurable network-based dataflow operator circuits are described. In one embodiment, a processor includes a spatial array of processing elements, and a packet switched communications network to route data within the spatial array between processing elements according to a dataflow graph to perform a first dataflow operation of the dataflow graph, wherein the packet switched communications network further comprises a plurality of network dataflow endpoint circuits to perform a second dataflow operation of the dataflow graph.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: November 5, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.
  • Patent number: 10445098
    Abstract: Methods and apparatuses relating to privileged configuration in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller coupled to a first subset and a second, different subset of the plurality of processing elements, the first subset having an output coupled to an input of the second, different subset, wherein the configuration controller is to configure the interconnect network between the first subset and the second, different subset of the plurality of processing elements to not allow communication on the interconnect network between the first subset and the second, different subset when a privilege bit is set to a first value and to allow communication on the interconnect network between the first subset and the second, different subset of the plurality of processing elements when the privilege bit is set to a second value.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Simon C. Steely, Kent D. Glossop
  • Patent number: 10445250
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
    Type: Grant
    Filed: December 30, 2017
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Kent D. Glossop, Simon C. Steely
  • Patent number: 10445234
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr., Samantika S. Sury
  • Patent number: 10445451
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. At least one of the plurality of processing elements includes a plurality of control inputs.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr., Ping Tak Peter Tang
  • Publication number: 20190303263
    Abstract: Systems, methods, and apparatuses relating to integrated performance monitoring in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first performance monitoring circuit coupled to a first proper subset of processing elements by a network to receive at least one monitoring value from each of the first plurality of the processing elements, generate a first aggregated monitoring value based on the at least one monitoring value from each of the first plurality of the processing elements, and send the first aggregated monitoring value to a performance manager circuit on a different network when a first threshold value is exceeded by the first aggregated monitoring value; and the performance manager circuit is to perform an action based on the first aggregated monitoring value.
    Type: Application
    Filed: March 30, 2018
    Publication date: October 3, 2019
    Inventors: KERMIN E. FLEMING, JR., SIMON C. STEELY, JR., JINJIE TANG
  • Publication number: 20190303297
    Abstract: Systems, methods, and apparatuses relating to remote memory access in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first memory interface circuit coupled to a first processing element and a cache, the first memory interface circuit to issue a memory request to the cache, the memory request comprising a field to identify a second memory interface circuit as a receiver of data for the memory request; and the second memory interface circuit coupled to a second processing element and the cache, the second memory interface circuit to send a credit return value to the first memory interface circuit, to cause the first memory interface circuit to mark the memory request as complete, when the data for the memory request arrives at the second memory interface circuit and a completion configuration register of the second memory interface circuit is set to a remote response value.
    Type: Application
    Filed: April 2, 2018
    Publication date: October 3, 2019
    Inventors: KERMIN E. FLEMING, JR., SIMON C. STEELY, JR., KENT D. GLOSSOP