Patents by Inventor Robert Pawlowski

Robert Pawlowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210149683
    Abstract: Examples include techniques for in-network acceleration of a parallel prefix-scan operation. Examples include configuring registers of a node included in a plurality of nodes on the same semiconductor package. The registers are configured responsive to receiving an instruction that indicates a logical tree to map to a network topology that includes the node; the instruction is associated with a prefix-scan operation to be executed by at least a portion of the plurality of nodes.
    Type: Application
    Filed: December 21, 2020
    Publication date: May 20, 2021
    Inventors: Ankit MORE, Fabrizio PETRINI, Robert PAWLOWSKI, Shruti SHARMA, Sowmya PITCHAIMOORTHY
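    The abstract above concerns accelerating a parallel prefix-scan by mapping a logical tree onto the node network. As a point of reference only, here is a minimal software sketch of a tree-structured (Blelloch-style up-sweep/down-sweep) exclusive prefix scan; it is not drawn from the patent, and the function name and power-of-two restriction are assumptions for illustration.

        def exclusive_prefix_scan(values, op=lambda a, b: a + b, identity=0):
            """Exclusive prefix scan over a power-of-two-length list."""
            data = list(values)
            n = len(data)
            assert n and (n & (n - 1)) == 0, "sketch assumes a power-of-two input"
            # Up-sweep (reduce) phase: each tree level combines pairs of partial results.
            stride = 1
            while stride < n:
                for i in range(stride * 2 - 1, n, stride * 2):
                    data[i] = op(data[i - stride], data[i])
                stride *= 2
            # Down-sweep phase: push prefixes back down the logical tree.
            data[n - 1] = identity
            stride = n // 2
            while stride >= 1:
                for i in range(stride * 2 - 1, n, stride * 2):
                    left = data[i - stride]
                    data[i - stride] = data[i]
                    data[i] = op(left, data[i])
                stride //= 2
            return data

        print(exclusive_prefix_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]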
  • Patent number: 10983793
    Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and the performance of one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry, performs the broadcast and reduction operations, system speed and efficiency are beneficially enhanced.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: April 20, 2021
    Assignee: Intel Corporation
    Inventors: Joshua Fryman, Ankit More, Jason Howard, Robert Pawlowski, Yigit Demir, Nick Pepperling, Fabrizio Petrini, Sriram Aananthakrishnan, Shaden Smith
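    For orientation only, the following lines model the broadcast and reduction behavior described above in plain Python, with a dict standing in for system memory and a class standing in for the DMA control circuitry; the class and method names are invented for this sketch and do not come from the patent or the ISA it describes.

        from functools import reduce

        class DmaEngineModel:
            def __init__(self):
                self.memory = {}

            def broadcast_value(self, value, dest_addrs):
                # Broadcast a single data value to each destination address.
                for addr in dest_addrs:
                    self.memory[addr] = value

            def broadcast_array(self, array, dest_base_addrs):
                # Broadcast a data array starting at each destination base address.
                for base in dest_base_addrs:
                    for offset, value in enumerate(array):
                        self.memory[base + offset] = value

            def reduce_from(self, src_addrs, op):
                # Gather from each source address and combine with a binary operation.
                return reduce(op, (self.memory[a] for a in src_addrs))

        dma = DmaEngineModel()
        dma.broadcast_value(7, dest_addrs=[0x100, 0x200, 0x300])
        dma.broadcast_array([1, 2, 3], dest_base_addrs=[0x400, 0x500])
        print(dma.reduce_from([0x100, 0x200, 0x300], op=lambda a, b: a + b))  # 21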
  • Patent number: 10929132
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to access a compressed graphic list. In one example, a processor includes fetch and decode circuitry to fetch and decode the single instruction to access the compressed graphic list, and execution circuitry to execute the decoded single instruction to cause access to the compressed graphic list by: receiving, from a load store queue, at a first op-engine associated with a first data location, an indirection request, computing, via the first op-engine, a second data location associated with a second op-engine, computing, via the second op-engine, a third data location associated with a third op-engine responsive to the indirection request, and providing, via the third op-engine, a data response to the load store queue responsive to receiving data from the third data location.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: February 23, 2021
    Assignee: Intel Corporation
    Inventors: Robert Pawlowski, Scott Hagan Schmittel, Joshua Fryman, Wim Heirman, Jason Howard, Ankit More, Shaden Smith, Scott Cline
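    As an informal illustration of the chained indirection described above, the sketch below resolves a two-level index chain of the kind used by compressed graph/list formats, with each step commented as the op-engine that would conceptually handle it; the data layout and function name are assumptions, not details from the patent.

        row_ptr   = [0, 2, 5, 6]          # first level: where each vertex's edge list starts
        col_index = [1, 2, 0, 2, 3, 3]    # second level: neighbor ids
        weights   = [5, 7, 1, 9, 4, 2]    # final payload, indexed by edge position

        def op_engine_chain(vertex, edge_slot):
            # First op-engine: resolve the starting position of this vertex's edges.
            first = row_ptr[vertex] + edge_slot
            # Second op-engine: resolve the neighbor id at that position.
            neighbor = col_index[first]
            # Third op-engine: return the payload to the load store queue.
            return neighbor, weights[first]

        print(op_engine_chain(vertex=1, edge_slot=2))  # (3, 4)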
  • Publication number: 20200401412
    Abstract: Disclosed embodiments relate to hardware support for dual-memory atomic operations. In one example, a processor includes multiple cores, each including multiple multi-threaded pipelines (MTPs), each associated with a memory, an atomic unit (ATMU) to perform atomic operations and a write-combine buffer (WCB) to manage access to and locks of cache lines in the associated memory, each MTP including fetch and decode stages to fetch and decode an instruction having fields to specify first and second memory locations and an opcode calling for a first MTP to send a request to a second MTP of the multiple MTPs, the second MTP being associated with a memory to which the first memory location is mapped, and to perform an atomic dual-memory operation on the first and second memory locations using its associated ATMU and WCB to perform the request.
    Type: Application
    Filed: June 24, 2019
    Publication date: December 24, 2020
    Applicant: Intel Corporation
    Inventors: Robert PAWLOWSKI, Joshua B. FRYMAN, Vincent CAVE, Eric M. SCHWARTZ, Ivan B. GANEV, Jason M. HOWARD, Ankit MORE, Shaden SMITH
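    As a loose software analogue of a dual-memory atomic, the sketch below updates two memory locations under per-location locks, which stand in for the cache-line locking that the ATMU and write-combine buffer provide in hardware; the transfer operation and the lock-ordering scheme are choices made for this illustration only.

        import threading

        memory = {0x10: 100, 0x20: 0}
        locks = {addr: threading.Lock() for addr in memory}

        def dual_memory_atomic_transfer(src, dst, amount):
            first, second = sorted((src, dst))   # consistent lock order prevents deadlock
            with locks[first], locks[second]:
                memory[src] -= amount
                memory[dst] += amount            # both locations change together

        threads = [threading.Thread(target=dual_memory_atomic_transfer,
                                    args=(0x10, 0x20, 1)) for _ in range(100)]
        for t in threads: t.start()
        for t in threads: t.join()
        print(memory)  # {16: 0, 32: 100}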
  • Patent number: 10795819
    Abstract: Disclosed embodiments relate to a system with configurable cache sub-domains and cross-die memory coherency. In one example, a system includes R racks, each rack housing N nodes, each node incorporating D dies, each die containing C cores and a die shadow tag, each core including P pipelines and a core shadow tag, each pipeline associated with a data cache and data cache tags and being either non-coherent or coherent and in one of X coherency domains, wherein each pipeline, when needing to read a cache line, issues a read request to its associated data cache, then, if need be, issues a read request to its associated core-level cache, then, if need be, issues a read request to its associated die-level cache, then, if need be, issues a no-cache remote read request to a target die mapped to hold the cache line.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: October 6, 2020
    Assignee: Intel Corporation
    Inventors: Robert Pawlowski, Bharadwaj Krishnamurthy, Vincent Cave, Jason M. Howard, Ankit More, Joshua B. Fryman
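    The escalating read path in the abstract (pipeline cache, then core-level, then die-level, then a no-cache remote read to the home die) can be pictured with the short sketch below; the dict-based caches and the address-to-die mapping are placeholders invented for illustration.

        def read_cache_line(addr, pipeline_cache, core_cache, die_cache, remote_dies, num_dies=4):
            for level in (pipeline_cache, core_cache, die_cache):
                if addr in level:
                    return level[addr]             # hit at this cache level
            target_die = addr % num_dies           # assumed home-die mapping
            return remote_dies[target_die][addr]   # no-cache remote read

        pipeline, core, die = {}, {}, {0x40: "die-local"}
        remote = {d: {} for d in range(4)}
        remote[1][0x41] = "remote-value"
        print(read_cache_line(0x40, pipeline, core, die, remote))  # die-local
        print(read_cache_line(0x41, pipeline, core, die, remote))  # remote-value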
  • Publication number: 20200310795
    Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and the performance of one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry, performs the broadcast and reduction operations, system speed and efficiency are beneficially enhanced.
    Type: Application
    Filed: March 29, 2019
    Publication date: October 1, 2020
    Applicant: Intel Corporation
    Inventors: Joshua Fryman, Ankit More, Jason Howard, Robert Pawlowski, Yigit Demir, Nick Pepperling, Fabrizio Petrini, Sriram Aananthakrishnan, Shaden Smith
  • Publication number: 20200104164
    Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations and then accumulate the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 2, 2020
    Inventors: Robert PAWLOWSKI, Ankit MORE, Jason M. HOWARD, Joshua B. FRYMAN, Tina C. ZHONG, Shaden SMITH, Sowmya PITCHAIMOORTHY, Samkit JAIN, Vincent CAVE, Sriram AANANTHAKRISHNAN, Bharadwaj KRISHNAMURTHY
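    The scheduler optimization mentioned above, assigning multiple threads to produce partial results of a commutative operation and then accumulating them, corresponds roughly to the software pattern sketched below; the thread count and chunking are arbitrary choices for the example.

        from concurrent.futures import ThreadPoolExecutor
        from functools import reduce

        def parallel_reduce(data, op, num_threads=4):
            chunk = max(1, len(data) // num_threads)
            slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
            with ThreadPoolExecutor(max_workers=num_threads) as pool:
                partials = list(pool.map(lambda s: reduce(op, s), slices))  # per-thread partial results
            return reduce(op, partials)                                     # accumulate the partials

        print(parallel_reduce(list(range(1, 101)), op=lambda a, b: a + b))  # 5050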
  • Publication number: 20200004587
    Abstract: Embodiments of apparatuses, methods, and systems for a multithreaded processor core with hardware-assisted task scheduling are described. In an embodiment, a processor includes a first hardware thread, a second hardware thread, and a task manager. The task manager is to issue a task to the first hardware thread. The task manager includes a hardware task queue in which to store a plurality of task descriptors. Each of the task descriptors is to represent one of a single task, a collection of iterative tasks, and a linked list of tasks.
    Type: Application
    Filed: June 29, 2018
    Publication date: January 2, 2020
    Inventors: Paul Griffin, Joshua Fryman, Jason Howard, Sang Phill Park, Robert Pawlowski, Michael Abbott, Scott Cline, Samkit Jain, Ankit More, Vincent Cave, Fabrizio Petrini, Ivan Ganev
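    To make the three descriptor kinds above concrete (a single task, a collection of iterative tasks, and a linked list of tasks), here is a small software sketch of a task queue being drained; the class and function names are invented for the example and are not the patent's terminology.

        from collections import deque
        from dataclasses import dataclass
        from typing import Callable, Optional

        @dataclass
        class SingleTask:
            fn: Callable[[], None]

        @dataclass
        class IterativeTasks:
            fn: Callable[[int], None]
            count: int

        @dataclass
        class LinkedTask:
            fn: Callable[[], None]
            next: Optional["LinkedTask"] = None

        def drain_task_queue(queue):
            while queue:
                desc = queue.popleft()
                if isinstance(desc, SingleTask):
                    desc.fn()
                elif isinstance(desc, IterativeTasks):
                    for i in range(desc.count):   # one task per iteration index
                        desc.fn(i)
                elif isinstance(desc, LinkedTask):
                    node = desc
                    while node is not None:       # walk the linked list of tasks
                        node.fn()
                        node = node.next

        q = deque([SingleTask(lambda: print("single")),
                   IterativeTasks(lambda i: print("iteration", i), count=3),
                   LinkedTask(lambda: print("head"), LinkedTask(lambda: print("tail")))])
        drain_task_queue(q)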
  • Publication number: 20200004602
    Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.
    Type: Application
    Filed: June 27, 2018
    Publication date: January 2, 2020
    Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
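    As a minimal software stand-in for a barrier group, the sketch below uses a thread barrier: every member of the group signals arrival and none proceeds until the whole group has reached the barrier, which is roughly the condition the pipeline and core barrier circuits track in hardware. The group size and thread bodies are arbitrary.

        import threading

        group_size = 4
        barrier = threading.Barrier(group_size)

        def worker(tid):
            print(f"thread {tid} reached the barrier")
            barrier.wait()                 # block until every group member arrives
            print(f"thread {tid} released")

        threads = [threading.Thread(target=worker, args=(t,)) for t in range(group_size)]
        for t in threads: t.start()
        for t in threads: t.join()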
  • Patent number: 10476492
    Abstract: Embodiments herein may present an integrated circuit including a switch, where the switch together with other switches forms a network of switches to perform a sequence of operations according to a structure of a collective tree. The switch includes a first number of input ports, a second number of output ports, a configurable crossbar to selectively couple the first number of input ports to the second number of output ports, and a computation engine coupled to the first number of input ports, the second number of output ports, and the crossbar. The computation engine of the switch performs an operation corresponding to an operation represented by a node of the collective tree. The switch further includes one or more registers to selectively configure the first number of input ports and the configurable crossbar. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: November 27, 2018
    Date of Patent: November 12, 2019
    Assignee: Intel Corporation
    Inventors: Ankit More, Jason M. Howard, Robert Pawlowski, Fabrizio Petrini, Shaden Smith
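    For intuition only, the sketch below reduces values through a tree of "switch" nodes, each combining the values arriving on its inputs and forwarding one result upward, mirroring the role of the computation engine at each node of the collective tree; the tree shape and the operation are assumptions for the example.

        def switch_reduce(node, op):
            # Leaf nodes carry raw values; internal nodes combine their children's results.
            if isinstance(node, (int, float)):
                return node
            results = [switch_reduce(child, op) for child in node]
            out = results[0]
            for r in results[1:]:
                out = op(out, r)          # the switch's computation-engine step
            return out

        # Leaves 1..8 arranged as a binary collective tree.
        tree = (((1, 2), (3, 4)), ((5, 6), (7, 8)))
        print(switch_reduce(tree, op=lambda a, b: a + b))  # 36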
  • Publication number: 20190303159
    Abstract: Disclosed embodiments relate to an instruction set architecture to facilitate energy-efficient computing for exascale architectures. In one embodiment, a processor includes a plurality of accelerator cores, each having a corresponding instruction set architecture (ISA); a fetch circuit to fetch one or more instructions specifying one of the accelerator cores, a decode circuit to decode the one or more fetched instructions, and an issue circuit to translate the one or more decoded instructions into the ISA corresponding to the specified accelerator core, collate the one or more translated instructions into an instruction packet, and issue the instruction packet to the specified accelerator core; and, wherein the plurality of accelerator cores comprise a memory engine (MENG), a collective engine (CENG), a queue engine (QENG), and a chain management unit (CMU).
    Type: Application
    Filed: March 29, 2018
    Publication date: October 3, 2019
    Inventors: Joshua B. FRYMAN, Jason M. HOWARD, Priyanka SURESH, Banu Meenakshi NAGASUNDARAM, Srikanth DAKSHINAMOORTHY, Ankit MORE, Robert PAWLOWSKI, Samkit JAIN, Pranav YEOLEKAR, Avinash M. SEEGEHALLI, Surhud KHARE, Dinesh SOMASEKHAR, David S. DUNNING, Romain E. Cledat, William Paul GRIFFIN, Bhavitavya B. BHADVIYA, Ivan B. GANEV
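    The decode-translate-collate-issue flow described above can be pictured with the toy dispatcher below, which rewrites decoded operations into a per-engine encoding and groups them into a packet; the engine mnemonics echo the abstract, but the encodings and packet format are invented for the sketch.

        def translate(op, engine):
            # Hypothetical per-engine encodings; only the engine names come from the abstract.
            encodings = {"MENG": "mem.", "CENG": "coll.", "QENG": "q."}
            return encodings[engine] + op

        def issue_packet(decoded_ops, engine):
            packet = [translate(op, engine) for op in decoded_ops]   # collate into one packet
            print(f"issue to {engine}: {packet}")

        issue_packet(["copy", "fill"], engine="MENG")
        issue_packet(["reduce"], engine="CENG")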
  • Publication number: 20190109590
    Abstract: Embodiments herein may present an integrated circuit including a switch, where the switch together with other switches forms a network of switches to perform a sequence of operations according to a structure of a collective tree. The switch includes a first number of input ports, a second number of output ports, a configurable crossbar to selectively couple the first number of input ports to the second number of output ports, and a computation engine coupled to the first number of input ports, the second number of output ports, and the crossbar. The computation engine of the switch performs an operation corresponding to an operation represented by a node of the collective tree. The switch further includes one or more registers to selectively configure the first number of input ports and the configurable crossbar. Other embodiments may be described and/or claimed.
    Type: Application
    Filed: November 27, 2018
    Publication date: April 11, 2019
    Inventors: Ankit More, Jason M. Howard, Robert Pawlowski, Fabrizio Petrini, Shaden Smith
  • Publication number: 20180285252
    Abstract: Optimized memory access bandwidth devices, systems, and methods for processing low spatial locality data are disclosed and described. A system memory is divided into a plurality of memory subsections, where each memory subsection is communicatively coupled to an independent memory channel to a memory controller. Memory access requests from a processor are thereby sent by the memory controller to only the appropriate memory subsection.
    Type: Application
    Filed: April 1, 2017
    Publication date: October 4, 2018
    Applicant: Intel Corporation
    Inventors: Kon-Woo Kwon, Vivek Kozhikkottu, Sang Phill Park, Ankit More, William P. Griffin, Robert Pawlowski, Jason M. Howard, Joshua B. Fryman
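    The routing idea in the last abstract, each memory subsection served only through its own channel, reduces to a simple address-to-subsection map, sketched below; the subsection size and modulo mapping are assumptions made for the example.

        SUBSECTION_WORDS = 1024
        subsections = [dict() for _ in range(4)]        # 4 subsections, one channel each

        def route(addr):
            # Map an address to the subsection (and channel) that owns it.
            return subsections[(addr // SUBSECTION_WORDS) % len(subsections)]

        def write_word(addr, value):
            route(addr)[addr] = value                   # only the owning channel is used

        def read_word(addr):
            return route(addr).get(addr)

        write_word(10, "a")       # lands in subsection 0
        write_word(3000, "b")     # lands in subsection 2
        print(read_word(10), read_word(3000))           # a b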