Patents by Inventor Simon C. Steely

Simon C. Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques for near data acceleration for a multi-core architecture

Patent number: 12204478

Abstract: Examples include techniques for near data acceleration for a multi-core architecture. A near data processor included in a memory controller of a processor may access data maintained in a memory device coupled with the near data processor via one or more memory channels responsive to a work request to execute a kernel, an application or a loop routine using the accessed data to generate values. The near data processor provides an indication to the requestor of the work request that values have been generated.

Type: Grant

Filed: March 19, 2021

Date of Patent: January 21, 2025

Assignee: Intel Corporation

Inventors: Swapna Raj, Samantika S. Sury, Kermin Chofleming, Simon C. Steely, Jr.
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Patent number: 12204898

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Grant

Filed: August 30, 2023

Date of Patent: January 21, 2025

Assignee: Intel Corporation

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Patent number: 12050912

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Grant

Filed: July 10, 2023

Date of Patent: July 30, 2024

Assignee: Intel Corporation

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR LOADING A TILE OF A MATRIX OPERATIONS ACCELERATOR

Publication number: 20240220323

Abstract: Systems, methods, and apparatuses relating to floating-point support circuitry to implement floating-point operations on a two-dimensional grid of fixed-point processing elements are described.

Type: Application

Filed: December 30, 2022

Publication date: July 4, 2024

Inventors: Gregory Henry, Kermin E. Chofleming, Simon C. Steely, JR.
Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator

Patent number: 11907713

Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described.

Type: Grant

Filed: December 28, 2019

Date of Patent: February 20, 2024

Assignee: Intel Corporation

Inventors: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Patent number: 11698787

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Grant

Filed: June 29, 2021

Date of Patent: July 11, 2023

Assignee: INTEL CORPORATION

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
Layered super-reticle computing : architectures and methods

Patent number: 11656662

Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.

Type: Grant

Filed: February 11, 2021

Date of Patent: May 23, 2023

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
Apparatuses, methods, and systems for operations in a configurable spatial accelerator

Patent number: 11593295

Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.

Type: Grant

Filed: December 14, 2021

Date of Patent: February 28, 2023

Assignee: Intel Corporation

Inventors: Kermin E. Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop, Mitchell Diamond, Benjamin Keen, Dennis Bradford, Fabrizio Petrini, Barry Tannenbaum, Yongzhi Zhang
Signal pathways in multi-tile processors

Patent number: 11269805

Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.

Type: Grant

Filed: May 15, 2018

Date of Patent: March 8, 2022

Assignee: Intel Corporation

Inventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
Apparatuses, methods, and systems for operations in a configurable spatial accelerator

Patent number: 11200186

Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.

Type: Grant

Filed: June 30, 2018

Date of Patent: December 14, 2021

Assignee: Intel Corporation

Inventors: Kermin E. Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop, Mitchell Diamond, Benjamin Keen, Dennis Bradford, Fabrizio Petrini, Barry Tannenbaum, Yongzhi Zhang
LAYERED SUPER-RETICLE COMPUTING : ARCHITECTURES AND METHODS

Publication number: 20210255674

Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.

Type: Application

Filed: February 11, 2021

Publication date: August 19, 2021

Inventors: Simon C. Steely, JR., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
Processors, methods, and systems for debugging a configurable spatial accelerator

Patent number: 11086816

Abstract: Systems, methods, and apparatuses relating to debugging a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. At least a first of the plurality of processing elements is to enter a halted state in response to being represented as a first of the plurality of dataflow operators.

Type: Grant

Filed: September 28, 2017

Date of Patent: August 10, 2021

Assignee: Intel Corporation

Inventors: Kermin Fleming, Simon C. Steely, Jr., Kent D. Glossop
Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers

Patent number: 11068264

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.

Type: Grant

Filed: August 9, 2019

Date of Patent: July 20, 2021

Assignee: Intel Corporation

Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Patent number: 11048508

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Grant

Filed: April 29, 2019

Date of Patent: June 29, 2021

Assignee: Intel Corporation

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
Layered super-reticle computing : architectures and methods

Patent number: 10963022

Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.

Type: Grant

Filed: April 29, 2020

Date of Patent: March 30, 2021

Assignee: Intel Corporation

Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
Apparatuses, methods, and systems for memory interface circuit allocation in a configurable spatial accelerator

Patent number: 10915471

Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.

Type: Grant

Filed: March 30, 2019

Date of Patent: February 9, 2021

Assignee: Intel Corporation

Inventors: Kermin ChoFleming, Yu Bai, Simon C. Steely
Switchable topology processor tile and computing machine

Patent number: 10891254

Abstract: Embodiments relate to a computational device including multiple processor tiles on a die that may have multiple switchable topologies. A topology of the computational device may include one or more virtual circuits. A virtual circuit may include multiple processor tiles. A processor tile of a virtual circuit of a topology may include a configuration vector to control a connection between the processor tile and a neighboring processor tile. A first topology of the computation device may correspond to a first phase of a computation of a program, and a second topology of the computation device may correspond to a second phase of the computation of the program. Other embodiments may be described and/or claimed.

Type: Grant

Filed: June 29, 2017

Date of Patent: January 12, 2021

Assignee: Intel Corporation

Inventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
APPARATUSES, METHODS, AND SYSTEMS FOR TIME-MULTIPLEXING IN A CONFIGURABLE SPATIAL ACCELERATOR

Publication number: 20200409709

Abstract: Systems, methods, and apparatuses relating to time-multiplexing circuitry in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of processing elements. In another embodiment, a configurable spatial accelerator (CSA) includes a plurality of time-multiplexed processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of time-multiplexed processing elements.

Type: Application

Filed: June 29, 2019

Publication date: December 31, 2020

Inventors: Kermin ChoFleming, Simon C. Steely, JR., Mitchell Diamond
LAYERED SUPER-RETICLE COMPUTING : ARCHITECTURES AND METHODS

Publication number: 20200371566

Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.

Type: Application

Filed: April 29, 2020

Publication date: November 26, 2020

Inventors: Simon C. Steely, JR., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
APPARATUSES, METHODS, AND SYSTEMS FOR MEMORY INTERFACE CIRCUIT ALLOCATION IN A CONFIGURABLE SPATIAL ACCELERATOR

Publication number: 20200310994

Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.

Type: Application

Filed: March 30, 2019

Publication date: October 1, 2020

Inventors: Kermin ChoFleming, Yu Bai, Simon C. Steely

1 2 3 4 5 … next