Patents by Inventor Simon C. Steely, Jr.

Simon C. Steely, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11907713
    Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
  • Publication number: 20230409478
    Abstract: Latency on the miss path to a cache level in a CPU module is reduced by predicting when a cache miss is likely. Main memory is directly accessed in parallel with the access to the cache level in the CPU module based on the prediction that a cache miss is likely in the cache level.
    Type: Application
    Filed: September 1, 2023
    Publication date: December 21, 2023
    Inventors: Kermin ChoFleming, Yu Bai, Simon C. Steely, Jr.
  • Publication number: 20230409318
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Application
    Filed: August 30, 2023
    Publication date: December 21, 2023
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
  • Patent number: 11698787
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: July 11, 2023
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
  • Patent number: 11656662
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: February 11, 2021
    Date of Patent: May 23, 2023
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
  • Patent number: 11593295
    Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
    Type: Grant
    Filed: December 14, 2021
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop, Mitchell Diamond, Benjamin Keen, Dennis Bradford, Fabrizio Petrini, Barry Tannenbaum, Yongzhi Zhang
  • Patent number: 11269805
    Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: March 8, 2022
    Assignee: Intel Corporation
    Inventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
  • Patent number: 11200186
    Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
    Type: Grant
    Filed: June 30, 2018
    Date of Patent: December 14, 2021
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop, Mitchell Diamond, Benjamin Keen, Dennis Bradford, Fabrizio Petrini, Barry Tannenbaum, Yongzhi Zhang
  • Publication number: 20210326131
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Application
    Filed: June 29, 2021
    Publication date: October 21, 2021
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
  • Publication number: 20210255674
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: February 11, 2021
    Publication date: August 19, 2021
    Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
  • Patent number: 11086816
    Abstract: Systems, methods, and apparatuses relating to debugging a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. At least a first of the plurality of processing elements is to enter a halted state in response to being represented as a first of the plurality of dataflow operators.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: August 10, 2021
    Assignee: Intel Corporation
    Inventors: Kermin Fleming, Simon C. Steely, Jr., Kent D. Glossop
  • Publication number: 20210224213
    Abstract: Examples include techniques for near data acceleration for a multi-core architecture. A near data processor included in a memory controller of a processor may access data maintained in a memory device coupled with the near data processor via one or more memory channels responsive to a work request to execute a kernel, an application or a loop routine using the accessed data to generate values. The near data processor provides an indication to the requestor of the work request that values have been generated.
    Type: Application
    Filed: March 19, 2021
    Publication date: July 22, 2021
    Inventors: Swapna Raj, Samantika S. Sury, Kermin ChoFleming, Simon C. Steely, Jr.
  • Patent number: 11068264
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
    Type: Grant
    Filed: August 9, 2019
    Date of Patent: July 20, 2021
    Assignee: Intel Corporation
    Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
  • Publication number: 20210200540
    Abstract: Systems, methods, and apparatuses relating to fused operations in a configurable spatial accelerator are described.
    Type: Application
    Filed: December 28, 2019
    Publication date: July 1, 2021
    Inventors: Kermin E. ChoFleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
  • Patent number: 11048508
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: June 29, 2021
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
  • Patent number: 10963022
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: April 29, 2020
    Date of Patent: March 30, 2021
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
  • Patent number: 10891254
    Abstract: Embodiments relate to a computational device including multiple processor tiles on a die that may have multiple switchable topologies. A topology of the computational device may include one or more virtual circuits. A virtual circuit may include multiple processor tiles. A processor tile of a virtual circuit of a topology may include a configuration vector to control a connection between the processor tile and a neighboring processor tile. A first topology of the computational device may correspond to a first phase of a computation of a program, and a second topology of the computational device may correspond to a second phase of the computation of the program. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: January 12, 2021
    Assignee: Intel Corporation
    Inventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
  • Publication number: 20200409709
    Abstract: Systems, methods, and apparatuses relating to time-multiplexing circuitry in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of processing elements. In another embodiment, a configurable spatial accelerator (CSA) includes a plurality of time-multiplexed processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of time-multiplexed processing elements.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Inventors: Kermin ChoFleming, Simon C. Steely, Jr., Mitchell Diamond
  • Publication number: 20200371566
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: April 29, 2020
    Publication date: November 26, 2020
    Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
  • Patent number: 10691182
    Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: June 23, 2020
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung