Patents by Inventor Simon C. Steely
Simon C. Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12204478Abstract: Examples include techniques for near data acceleration for a multi-core architecture. A near data processor included in a memory controller of a processor may access data maintained in a memory device coupled with the near data processor via one or more memory channels responsive to a work request to execute a kernel, an application or a loop routine using the accessed data to generate values. The near data processor provides an indication to the requestor of the work request that values have been generated.Type: GrantFiled: March 19, 2021Date of Patent: January 21, 2025Assignee: Intel CorporationInventors: Swapna Raj, Samantika S. Sury, Kermin Chofleming, Simon C. Steely, Jr.
-
Patent number: 12204898Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.Type: GrantFiled: August 30, 2023Date of Patent: January 21, 2025Assignee: Intel CorporationInventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
-
Patent number: 12050912Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.Type: GrantFiled: July 10, 2023Date of Patent: July 30, 2024Assignee: Intel CorporationInventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
-
Publication number: 20240220323Abstract: Systems, methods, and apparatuses relating to floating-point support circuitry to implement floating-point operations on a two-dimensional grid of fixed-point processing elements are described.Type: ApplicationFiled: December 30, 2022Publication date: July 4, 2024Inventors: Gregory Henry, Kermin E. Chofleming, Simon C. Steely, JR.
-
Patent number: 11907713Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described.Type: GrantFiled: December 28, 2019Date of Patent: February 20, 2024Assignee: Intel CorporationInventors: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
-
Patent number: 11698787Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.Type: GrantFiled: June 29, 2021Date of Patent: July 11, 2023Assignee: INTEL CORPORATIONInventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
-
Patent number: 11656662Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.Type: GrantFiled: February 11, 2021Date of Patent: May 23, 2023Assignee: Intel CorporationInventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
-
Patent number: 11593295Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.Type: GrantFiled: December 14, 2021Date of Patent: February 28, 2023Assignee: Intel CorporationInventors: Kermin E. Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop, Mitchell Diamond, Benjamin Keen, Dennis Bradford, Fabrizio Petrini, Barry Tannenbaum, Yongzhi Zhang
-
Patent number: 11269805Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.Type: GrantFiled: May 15, 2018Date of Patent: March 8, 2022Assignee: Intel CorporationInventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
-
Patent number: 11200186Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.Type: GrantFiled: June 30, 2018Date of Patent: December 14, 2021Assignee: Intel CorporationInventors: Kermin E. Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop, Mitchell Diamond, Benjamin Keen, Dennis Bradford, Fabrizio Petrini, Barry Tannenbaum, Yongzhi Zhang
-
Publication number: 20210255674Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.Type: ApplicationFiled: February 11, 2021Publication date: August 19, 2021Inventors: Simon C. Steely, JR., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
-
Patent number: 11086816Abstract: Systems, methods, and apparatuses relating to debugging a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. At least a first of the plurality of processing elements is to enter a halted state in response to being represented as a first of the plurality of dataflow operators.Type: GrantFiled: September 28, 2017Date of Patent: August 10, 2021Assignee: Intel CorporationInventors: Kermin Fleming, Simon C. Steely, Jr., Kent D. Glossop
-
Patent number: 11068264Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.Type: GrantFiled: August 9, 2019Date of Patent: July 20, 2021Assignee: Intel CorporationInventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
-
Patent number: 11048508Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.Type: GrantFiled: April 29, 2019Date of Patent: June 29, 2021Assignee: Intel CorporationInventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
-
Patent number: 10963022Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.Type: GrantFiled: April 29, 2020Date of Patent: March 30, 2021Assignee: Intel CorporationInventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
-
Patent number: 10915471Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.Type: GrantFiled: March 30, 2019Date of Patent: February 9, 2021Assignee: Intel CorporationInventors: Kermin ChoFleming, Yu Bai, Simon C. Steely
-
Patent number: 10891254Abstract: Embodiments relate to a computational device including multiple processor tiles on a die that may have multiple switchable topologies. A topology of the computational device may include one or more virtual circuits. A virtual circuit may include multiple processor tiles. A processor tile of a virtual circuit of a topology may include a configuration vector to control a connection between the processor tile and a neighboring processor tile. A first topology of the computation device may correspond to a first phase of a computation of a program, and a second topology of the computation device may correspond to a second phase of the computation of the program. Other embodiments may be described and/or claimed.Type: GrantFiled: June 29, 2017Date of Patent: January 12, 2021Assignee: Intel CorporationInventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
-
Publication number: 20200409709Abstract: Systems, methods, and apparatuses relating to time-multiplexing circuitry in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of processing elements. In another embodiment, a configurable spatial accelerator (CSA) includes a plurality of time-multiplexed processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of time-multiplexed processing elements.Type: ApplicationFiled: June 29, 2019Publication date: December 31, 2020Inventors: Kermin ChoFleming, Simon C. Steely, JR., Mitchell Diamond
-
Publication number: 20200371566Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.Type: ApplicationFiled: April 29, 2020Publication date: November 26, 2020Inventors: Simon C. Steely, JR., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung
-
Publication number: 20200310994Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.Type: ApplicationFiled: March 30, 2019Publication date: October 1, 2020Inventors: Kermin ChoFleming, Yu Bai, Simon C. Steely