Patents by Inventor Simon C. Steely, Jr.

Simon C. Steely, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processors, methods, and systems with a configurable spatial accelerator

Patent number: 10558575

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.

Type: Grant

Filed: December 30, 2016

Date of Patent: February 11, 2020

Assignee: Intel Corporation

Inventors: Kermin E. Fleming, Jr., Kent D. Glossop, Simon C. Steely, Jr., Jinjie Tang, Alan G. Gara
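The firing rule that recurs in the abstracts of this patent family (a dataflow operator performs its operation when an incoming operand set arrives) can be illustrated with a minimal Python sketch. The class `DataflowOperator`, its queue layout, and the two-node graph are hypothetical illustrations and are not taken from the patent.

```python
from collections import deque


class DataflowOperator:
    """Hypothetical node: fires once every input queue holds an operand."""

    def __init__(self, func, num_inputs):
        self.func = func
        self.inputs = [deque() for _ in range(num_inputs)]
        self.consumers = []  # (operator, input_index) pairs fed by this node
        self.results = []    # outputs kept locally when there is no consumer

    def connect(self, consumer, input_index):
        self.consumers.append((consumer, input_index))

    def receive(self, input_index, value):
        self.inputs[input_index].append(value)
        self._try_fire()

    def _try_fire(self):
        # Fire only while a complete operand set is available.
        while all(self.inputs):
            operands = [queue.popleft() for queue in self.inputs]
            result = self.func(*operands)
            if not self.consumers:
                self.results.append(result)
            for consumer, index in self.consumers:
                consumer.receive(index, result)


# Tiny graph computing (a + b) * c.
add = DataflowOperator(lambda x, y: x + y, 2)
mul = DataflowOperator(lambda x, y: x * y, 2)
add.connect(mul, 0)

mul.receive(1, 4)   # c arrives first; mul cannot fire yet
add.receive(0, 2)   # a arrives; add still waits for b
add.receive(1, 3)   # b arrives; add fires, then mul fires
```

In this sketch no program counter orders the two operations; the multiply runs only because its second operand became available.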
Processors, methods, and systems with a configurable spatial accelerator

Patent number: 10515046

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described.

Type: Grant

Filed: July 1, 2017

Date of Patent: December 24, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.

PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO LOAD MULTIPLE DATA ELEMENTS TO DESTINATION STORAGE LOCATIONS OTHER THAN PACKED DATA REGISTERS

Publication number: 20190384601

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.

Type: Application

Filed: August 9, 2019

Publication date: December 19, 2019

Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
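As a rough illustration of the behavior the abstract describes, the sketch below gathers one element per packed address and writes the results to a destination that is not a packed data register. The function `gather_to_buffer`, the dict-based memory model, and the sample addresses are assumptions made for this example only.

```python
def gather_to_buffer(memory, address_vector, destination):
    """Load one element per address and store it into `destination`.

    `memory` is a dict mapping addresses to values, `address_vector`
    models the packed memory address data elements, and `destination`
    is a plain list standing in for storage that is not a packed data
    register.
    """
    for lane, address in enumerate(address_vector):
        destination[lane] = memory[address]
    return destination


memory = {0x10: 7, 0x24: 11, 0x38: 13, 0x4C: 17}
addresses = [0x38, 0x10, 0x4C, 0x24]   # packed memory address information
scratch = [0] * len(addresses)         # hypothetical non-register storage
gather_to_buffer(memory, addresses, scratch)
```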
Processors, methods, and systems for a memory fence in a configurable spatial accelerator

Patent number: 10496574

Abstract: Systems, methods, and apparatuses relating to a memory fence mechanism in a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a plurality of operations, each by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. The processor also includes a fence manager to manage a memory fence between a first operation and a second operation of the plurality of operations.

Type: Grant

Filed: September 28, 2017

Date of Patent: December 3, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.

LAYERED SUPER-RETICLE COMPUTING: ARCHITECTURES AND METHODS

Publication number: 20190354146

Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.

Type: Application

Filed: May 20, 2019

Publication date: November 21, 2019

Inventors: Simon C. Steely, Jr., Richard Dischler, David Bach, Olivier Franza, William J. Butera, Christian Karl, Benjamin Keen, Brian Leung

Runtime address disambiguation in acceleration hardware

Patent number: 10474375

Abstract: An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.

Type: Grant

Filed: December 30, 2016

Date of Patent: November 12, 2019

Assignee: Intel Corporation

Inventors: Kermin Elliott Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop
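The forwarding behavior in this abstract (a load that conflicts with an earlier pending store receives the store's data through the completion queue) can be sketched as follows. The class `Disambiguator`, its method names, and the youngest-first search order are assumptions for illustration; the patent's actual circuit may differ.

```python
from collections import deque


class Disambiguator:
    """Hypothetical model of store-to-load forwarding between queues."""

    def __init__(self, memory):
        self.memory = memory
        self.store_address_queue = deque()
        self.store_data_queue = deque()
        self.completion_queue = deque()

    def enqueue_store(self, address, data):
        self.store_address_queue.append(address)
        self.store_data_queue.append(data)

    def load(self, address):
        # Search pending stores youngest-first so the latest value wins.
        pending = list(zip(self.store_address_queue, self.store_data_queue))
        for store_address, store_data in reversed(pending):
            if store_address == address:
                self.completion_queue.append(store_data)  # forwarded
                return
        self.completion_queue.append(self.memory[address])  # no conflict

    def drain_stores(self):
        # Commit pending stores to memory in program order.
        while self.store_address_queue:
            address = self.store_address_queue.popleft()
            self.memory[address] = self.store_data_queue.popleft()


unit = Disambiguator({0x100: 1, 0x200: 2})
unit.enqueue_store(0x100, 99)
unit.load(0x100)   # conflicts with the pending store
unit.load(0x200)   # no conflict, served from memory
unit.drain_stores()
```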
Processors and methods for pipelined runtime services in a spatial array

Patent number: 10467183

Abstract: Methods and apparatuses relating to pipelined runtime services in spatial arrays are described.

Type: Grant

Filed: July 1, 2017

Date of Patent: November 5, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Jr., Simon C. Steely, Jr., Kent D. Glossop

Processors and methods with configurable network-based dataflow operator circuits

Patent number: 10469397

Abstract: Systems, methods, and apparatuses relating to configurable network-based dataflow operator circuits are described. In one embodiment, a processor includes a spatial array of processing elements, and a packet switched communications network to route data within the spatial array between processing elements according to a dataflow graph to perform a first dataflow operation of the dataflow graph, wherein the packet switched communications network further comprises a plurality of network dataflow endpoint circuits to perform a second dataflow operation of the dataflow graph.

Type: Grant

Filed: July 1, 2017

Date of Patent: November 5, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.

Processors, methods, and systems for a configurable spatial accelerator with transactional and replay features

Patent number: 10445234

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.

Type: Grant

Filed: July 1, 2017

Date of Patent: October 15, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr., Samantika S. Sury

Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features

Patent number: 10445451

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. At least one of the plurality of processing elements includes a plurality of control inputs.

Type: Grant

Filed: July 1, 2017

Date of Patent: October 15, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr., Ping Tak Peter Tang

Synchronization logic for memory requests

Patent number: 10430252

Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.

Type: Grant

Filed: November 15, 2018

Date of Patent: October 1, 2019

Assignee: Intel Corporation

Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, Jr.

Apparatus, methods, and systems for memory consistency in a configurable spatial accelerator

Patent number: 10417175

Abstract: Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.

Type: Grant

Filed: December 30, 2017

Date of Patent: September 17, 2019

Assignee: Intel Corporation

Inventors: Kermin E. Fleming, Simon C. Steely, Jr., Kent D. Glossop

Processors, methods, and systems with a configurable spatial accelerator

Patent number: 10416999

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.

Type: Grant

Filed: December 30, 2016

Date of Patent: September 17, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr.

Low energy consumption mantissa multiplication for floating point multiply-add operations

Patent number: 10402168

Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.

Type: Grant

Filed: October 1, 2016

Date of Patent: September 3, 2019

Assignee: Intel Corporation

Inventors: William C. Hasenplaugh, Kermin E. Fleming, Jr., Tryggve Fossum, Simon C. Steely, Jr.
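One way to picture a mantissa multiplier whose least significant bit portion can be switched off is a truncated multiplier that skips low-order partial-product columns. The sketch below is a software model under that assumption; the function `mantissa_product`, the `gated_columns` parameter, and the operand values are hypothetical and do not describe the patented circuit.

```python
def mantissa_product(a, b, width=24, gated_columns=0):
    """Multiply two `width`-bit mantissas, skipping low-order columns.

    Partial-product bits that fall in columns below `gated_columns` are
    not computed, modeling a least significant bit portion that has been
    switched off. `gated_columns=0` gives the exact product.
    """
    mask = ~((1 << gated_columns) - 1)
    product = 0
    for bit in range(width):
        if (b >> bit) & 1:
            # Keep only the part of this partial product at or above
            # the first active column.
            product += (a << bit) & mask
    return product


a = 0xC90FDB  # hypothetical 24-bit mantissa patterns
b = 0xADF854
exact = mantissa_product(a, b)
approx = mantissa_product(a, b, gated_columns=16)
error = exact - approx
```

Each of the 24 partial-product rows loses less than 2^16 in this model, so `error` stays below 24 * 2^16. Varying `gated_columns` corresponds loosely to the different possible sizes of the least significant bit portion mentioned in the abstract.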
Processors, methods, and systems for a configurable spatial accelerator with memory system performance, power reduction, and atomics support features

Patent number: 10387319

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.

Type: Grant

Filed: July 1, 2017

Date of Patent: August 20, 2019

Assignee: Intel Corporation

Inventors: Michael C. Adler, Chiachen Chou, Neal C. Crago, Kermin Fleming, Kent D. Glossop, Aamer Jaleel, Pratik M. Marolia, Simon C. Steely, Jr., Samantika S. Sury

Processors, methods, systems, and instructions to load multiple data elements to destination storage locations other than packed data registers

Patent number: 10379855

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.

Type: Grant

Filed: September 30, 2016

Date of Patent: August 13, 2019

Assignee: Intel Corporation

Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury

APPARATUS, METHODS, AND SYSTEMS FOR MEMORY CONSISTENCY IN A CONFIGURABLE SPATIAL ACCELERATOR

Publication number: 20190205284

Abstract: Methods and apparatuses relating to consistency in an accelerator are described. In one embodiment, request address file (RAF) circuits are coupled to a spatial array by a first network, a memory is coupled to the RAF circuits by a second network, a RAF circuit is to not issue, into the second network, a request to the memory marked with a program order dependency on a previous request until receiving a first token generated by completion of the previous request to the memory by another RAF circuit, and a second RAF circuit is to not issue, into the second network, a second request to the memory marked with a program order dependency on a first request until receiving a second token sent by a first RAF circuit when a predetermined time period has lapsed since the first request was issued by the first RAF circuit into the second network.

Type: Application

Filed: December 30, 2017

Publication date: July 4, 2019

Inventors: Kermin E. Fleming, Simon C. Steely, Jr., Kent D. Glossop

Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Patent number: 10275243

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Grant

Filed: July 2, 2016

Date of Patent: April 30, 2019

Assignee: Intel Corporation

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
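The completion progress indicator described in this abstract can be illustrated with a resumable matrix multiply. In the sketch below, progress is tracked per result row and an interruption is simulated with a step budget; both choices, along with the function name `matmul_resumable`, are assumptions for illustration and are not the granularity or mechanism claimed in the patent.

```python
def matmul_resumable(a, b, result, progress=0, budget=None):
    """Multiply `a` by `b` into `result`, one output row per step.

    `progress` is the number of result rows already stored. `budget`
    limits how many rows are computed before a simulated interruption.
    Returns the updated progress indicator.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    steps = 0
    while progress < rows:
        if budget is not None and steps == budget:
            break  # interrupted: the caller keeps `progress` for later
        result[progress] = [
            sum(a[progress][k] * b[k][j] for k in range(inner))
            for j in range(cols)
        ]
        progress += 1
        steps += 1
    return progress


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
c = [[0, 0] for _ in a]
saved = matmul_resumable(a, b, c, budget=2)        # interrupted after 2 rows
done = matmul_resumable(a, b, c, progress=saved)   # restart from the indicator
```

Because rows stored before the interruption are never recomputed, restarting from `saved` finishes the product without repeating completed work.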
METHOD AND APPARATUS FOR ADAPTIVELY SELECTING DATA TRANSFER PROCESSES FOR SINGLE-PRODUCER-SINGLE-CONSUMER AND WIDELY SHARED CACHE LINES

Publication number: 20190102295

Abstract: A method for adaptively performing a set of data transfer processes in a multi-core processor is described. The method may include receiving, by a shared cache from a first core cache, a first request for a cache line; determining, by the shared cache in response to receipt of the first request, whether the cache line is a widely-shared cache line or a single-producer-single-consumer cache line; and performing, by the first core cache and a second core cache, a three-hop data transfer process in response to determining that the cache line is a single-producer-single-consumer cache line, wherein the three-hop data transfer process transfers the cache line directly from the second core cache to the first core cache.

Type: Application

Filed: September 29, 2017

Publication date: April 4, 2019

Inventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, Jr., Yen-Cheng Liu

PROCESSORS, METHODS, AND SYSTEMS FOR DEBUGGING A CONFIGURABLE SPATIAL ACCELERATOR

Publication number: 20190095383

Abstract: Systems, methods, and apparatuses relating to debugging a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. At least a first of the plurality of processing elements is to enter a halted state in response to being represented as a first of the plurality of dataflow operators.

Type: Application

Filed: September 28, 2017

Publication date: March 28, 2019

Inventors: Kermin Fleming, Simon C. Steely, Jr., Kent D. Glossop
