Patents by Inventor Simon C. Steely

Simon C. Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10430252
    Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 15, 2018
    Date of Patent: October 1, 2019
    Assignee: Intel Corporation
    Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, Jr.
  • Patent number: 10416999
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: September 17, 2019
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.
  • Patent number: 10417175
    Abstract: Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.
    Type: Grant
    Filed: December 30, 2017
    Date of Patent: September 17, 2019
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Simon C. Steely, Jr., Kent D. Glossop
  • Patent number: 10402168
    Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.
    Type: Grant
    Filed: October 1, 2016
    Date of Patent: September 3, 2019
    Assignee: Intel Corporation
    Inventors: William C. Hasenplaugh, Kermin E. Fleming, Jr., Tryggve Fossum, Simon C. Steely, Jr.
  • Publication number: 20190258481
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Application
    Filed: April 29, 2019
    Publication date: August 22, 2019
    Inventors: Edward T. GROCHOWSKI, Asit K. MISHRA, Robert VALENTINE, Mark J. CHARNEY, Simon C. STEELY, JR.
  • Patent number: 10387319
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: August 20, 2019
    Assignee: Intel Corporation
    Inventors: Michael C. Adler, Chiachen Chou, Neal C. Crago, Kermin Fleming, Kent D. Glossop, Aamer Jaleel, Pratik M. Marolia, Simon C. Steely, Jr., Samantika S. Sury
  • Patent number: 10380063
    Abstract: Systems, methods, and apparatuses relating to a sequencer dataflow operator of a configurable spatial accelerator are described. In one embodiment, an interconnect network between a plurality of processing elements receives an input of a dataflow graph comprising a plurality of nodes forming a loop construct, wherein the dataflow graph is overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements and at least one dataflow operator controlled by a sequencer dataflow operator of the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements and the sequencer dataflow operator generates control signals for the at least one dataflow operator in the plurality of processing elements.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: August 13, 2019
    Assignee: Intel Corporation
    Inventors: Jinjie Tang, Kermin E. Fleming, Simon C. Steely, Kent D. Glossop, Jim Sukha
  • Patent number: 10379855
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: August 13, 2019
    Assignee: Intel Corporation
    Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
  • Publication number: 20190205284
    Abstract: Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.
    Type: Application
    Filed: December 30, 2017
    Publication date: July 4, 2019
    Inventors: Kermin E. Fleming, Simon C. Steely, Jr., Kent D. Glossop
  • Publication number: 20190205263
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
    Type: Application
    Filed: December 30, 2017
    Publication date: July 4, 2019
    Inventors: Kermin E. Fleming, Kent D. Glossop, Simon C. Steely
  • Patent number: 10275243
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Grant
    Filed: July 2, 2016
    Date of Patent: April 30, 2019
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
  • Publication number: 20190102295
    Abstract: A method for adaptively performing a set of data transfer processes in a multi-core processor is described. The method may include receiving, by a shared cache from a first core cache, a first request for a cache line; determining, by the shared cache in response to receipt of the first request, whether the cache line is a widely-shared cache line or a single-producer-single-consumer cache line; and performing, by the first core cache and a second core cache, a three-hop data transfer process in response to determining that the cache line is a single-producer-single-consumer cache line, wherein the three-hop data transfer process transfers the cache line directly from the second core cache to the first core cache.
    Type: Application
    Filed: September 29, 2017
    Publication date: April 4, 2019
    Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, JR., Yen-Cheng Liu
  • Publication number: 20190102179
    Abstract: Methods and apparatuses relating to privileged configuration in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller coupled to a first subset and a second, different subset of the plurality of processing elements, the first subset having an output coupled to an input of the second, different subset, wherein the configuration controller is to configure the interconnect network between the first subset and the second, different subset of the plurality of processing elements to not allow communication on the interconnect network between the first subset and the second, different subset when a privilege bit is set to a first value and to allow communication on the interconnect network between the first subset and the second, different subset of the plurality of processing elements when the privilege bit is set to a second value.
    Type: Application
    Filed: September 30, 2017
    Publication date: April 4, 2019
    Inventors: KERMIN E. FLEMING, SIMON C. STEELY, KENT D. GLOSSOP
  • Publication number: 20190102338
    Abstract: Systems, methods, and apparatuses relating to a sequencer dataflow operator of a configurable spatial accelerator are described. In one embodiment, an interconnect network between a plurality of processing elements receives an input of a dataflow graph comprising a plurality of nodes forming a loop construct, wherein the dataflow graph is overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements and at least one dataflow operator controlled by a sequencer dataflow operator of the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements and the sequencer dataflow operator generates control signals for the at least one dataflow operator in the plurality of processing elements.
    Type: Application
    Filed: September 30, 2017
    Publication date: April 4, 2019
    Inventors: JINJIE TANG, KERMIN E. FLEMING, SIMON C. STEELY, KENT D. GLOSSOP, JIM SUKHA
  • Publication number: 20190095383
    Abstract: Systems, methods, and apparatuses relating to debugging a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. At least a first of the plurality of processing elements is to enter a halted state in response to being represented as a first of the plurality of dataflow operators.
    Type: Application
    Filed: September 28, 2017
    Publication date: March 28, 2019
    Inventors: Kermin Fleming, Simon C. Steely, JR., Kent D. Glossop
  • Publication number: 20190095369
    Abstract: Systems, methods, and apparatuses relating to a memory fence mechanism in a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a plurality of operations, each by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. The processor also includes a fence manager to manage a memory fence between a first operation and a second operation of the plurality of operations.
    Type: Application
    Filed: September 28, 2017
    Publication date: March 28, 2019
    Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, JR.
  • Publication number: 20190087240
    Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.
    Type: Application
    Filed: November 15, 2018
    Publication date: March 21, 2019
    Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, JR.
  • Publication number: 20190042513
    Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
    Type: Application
    Filed: June 30, 2018
    Publication date: February 7, 2019
    Inventors: Kermin E. FLEMING, JR., Simon C. STEELY, JR., Kent D. GLOSSOP, Mitchell DIAMOND, Benjamin KEEN, Dennis BRADFORD, Fabrizio Petrini, Barry TANNENBAUM, Yonghzi ZHANG
  • Publication number: 20190042534
    Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: May 15, 2018
    Publication date: February 7, 2019
    Inventors: William J. Butera, Simon C. Steely, JR., Richard J. Dischler
  • Publication number: 20190018815
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described.
    Type: Application
    Filed: July 1, 2017
    Publication date: January 17, 2019
    Inventors: KERMIN FLEMING, KENT D. GLOSSOP, SIMON C. STEELY, JR.