Patents by Inventor Joshua B. Fryman

Joshua B. Fryman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200233819
    Abstract: An apparatus is described. The apparatus includes a rank of memory chips to couple to a memory channel. The memory channel is characterized as having eight transfers of eight bits of raw data per burst access. The rank of memory chips has first, second, and third X4 memory chips. The X4 memory chips conform to a JEDEC dual data rate (DDR) memory interface specification. The first and second X4 memory chips are to couple to an eight-bit raw data portion of the memory channel's data bus. The third X4 memory chip is to couple to an error correction coding (ECC) information portion of the memory channel's data bus.
    Type: Application
    Filed: March 27, 2020
    Publication date: July 23, 2020
    Inventors: Byoungchan Oh, Sai Dheeraj Polagani, Joshua B. Fryman
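
The widths implied by this abstract work out to 64 raw data bits and 32 ECC bits per burst: the two X4 chips together drive the 8-bit data portion on each of the eight transfers, while the third X4 chip drives 4 ECC bits per transfer. The short C sketch below only reproduces that arithmetic; the constant names are illustrative and not taken from the filing.

```c
#include <stdio.h>

/* Back-of-the-envelope widths suggested by the abstract of publication
 * 20200233819; these names are illustrative assumptions, not claim language. */
enum {
    TRANSFERS_PER_BURST = 8,   /* eight transfers per burst access             */
    DATA_BITS_PER_XFER  = 8,   /* two X4 chips together drive 8 raw data bits  */
    ECC_BITS_PER_XFER   = 4,   /* the third X4 chip drives 4 ECC bits          */
};

int main(void) {
    int data_bits = TRANSFERS_PER_BURST * DATA_BITS_PER_XFER;  /* 64 */
    int ecc_bits  = TRANSFERS_PER_BURST * ECC_BITS_PER_XFER;   /* 32 */
    printf("raw data per burst: %d bits, ECC per burst: %d bits\n",
           data_bits, ecc_bits);
    return 0;
}
```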
  • Patent number: 10684858
    Abstract: Disclosed embodiments relate to an indirect memory fetch (IMF) unit. In one example, an apparatus includes circuitry to fetch and decode an instruction specifying a sparse operand array including N operands, and an index array including N contiguously-addressed indices. The apparatus further includes a processing engine associated with an IMF unit to respond to the decoded instruction by initializing the IMF unit to fetch the N operands in order, probing the IMF unit to determine that a fetched operand is ready to retrieve, retrieving the fetched operand from the IMF unit, and repeating the probing and retrieving until all N operands have been retrieved. The IMF unit, independent of the processing engine, is to fetch the N contiguously-addressed indices from the index array, use the N fetched indices to calculate memory addresses for the N operands, and issue a plurality of read requests to fetch the N operands in order.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: June 16, 2020
    Assignee: Intel Corporation
    Inventors: Stijn Eyerman, Wim Heirman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
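
For orientation, the access pattern that the IMF unit offloads is an indexed gather over a sparse operand array. The sketch below is a minimal software-only rendering of that pattern, with hypothetical array names and element types; it is not the patented hardware mechanism, which performs the index fetch, address calculation, and read issue independently of the processing engine.

```c
#include <stddef.h>

/* Software view of the pattern described in the abstract: read N
 * contiguously-addressed indices, then fetch the N sparse operands they
 * address, in order.  Names and types here are illustrative assumptions. */
void gather_operands(const double *sparse_array,
                     const size_t *index_array,   /* N contiguous indices */
                     double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* each index selects one operand; the IMF unit would issue the
         * equivalent read requests and present operands as they become ready */
        out[i] = sparse_array[index_array[i]];
    }
}
```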
  • Publication number: 20200174929
    Abstract: In one embodiment, an apparatus includes a memory access circuit to receive memory access instructions and provide at least some of the memory access instructions to a memory subsystem for execution. The memory access circuit may have a conversion circuit to convert a first memory access instruction to a first subline memory access instruction, e.g., based at least in part on an access history for the first memory access instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: November 29, 2018
    Publication date: June 4, 2020
    Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
  • Publication number: 20200104164
    Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations and then accumulate the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 2, 2020
    Inventors: Robert Pawlowski, Ankit More, Jason M. Howard, Joshua B. Fryman, Tina C. Zhong, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Bharadwaj Krishnamurthy
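
The reduction optimization mentioned in the abstract, multiple threads producing partial results of a commutative operation followed by accumulation of those results, corresponds to a familiar software pattern. The sketch below shows that pattern with plain POSIX threads; the thread count, data, and function names are illustrative assumptions, not details from the filing.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4                   /* illustrative thread count */
#define N 1024

static double values[N];
static double partial[NUM_THREADS];     /* one partial result per thread */

/* Each thread reduces its own slice; because addition is commutative, the
 * partial results can be accumulated afterwards in any order. */
static void *reduce_slice(void *arg) {
    long t = (long)arg;
    double sum = 0.0;
    for (int i = t * (N / NUM_THREADS); i < (t + 1) * (N / NUM_THREADS); i++)
        sum += values[i];
    partial[t] = sum;
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++)
        values[i] = 1.0;
    pthread_t threads[NUM_THREADS];
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, reduce_slice, (void *)t);
    double total = 0.0;
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += partial[t];            /* accumulate the generated results */
    }
    printf("total = %f\n", total);      /* prints 1024.000000 */
    return 0;
}
```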
  • Publication number: 20200004602
    Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.
    Type: Application
    Filed: June 27, 2018
    Publication date: January 2, 2020
    Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
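
As a point of reference, the software-visible behavior that these per-pipeline and core barrier circuits track for a barrier group resembles an ordinary thread barrier over a group of two or more threads. The following POSIX sketch illustrates only that semantics, under an assumed group size of four; it is not the patented circuit.

```c
#include <pthread.h>
#include <stdio.h>

#define GROUP_SIZE 4            /* a barrier group of four threads (assumed) */

static pthread_barrier_t group_barrier;

static void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld reached the barrier\n", id);
    /* every thread in the group blocks here until all GROUP_SIZE arrive,
     * which is the condition the hardware barrier status would track */
    pthread_barrier_wait(&group_barrier);
    printf("thread %ld released\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[GROUP_SIZE];
    pthread_barrier_init(&group_barrier, NULL, GROUP_SIZE);
    for (long i = 0; i < GROUP_SIZE; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < GROUP_SIZE; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&group_barrier);
    return 0;
}
```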
  • Publication number: 20190369998
    Abstract: Disclosed embodiments relate to an indirect memory fetch (IMF) unit. In one example, an apparatus includes circuitry to fetch and decode an instruction specifying a sparse operand array including N operands, and an index array including N contiguously-addressed indices. The apparatus further includes a processing engine associated with an IMF unit to respond to the decoded instruction by initializing the IMF unit to fetch the N operands in order, probing the IMF unit to determine that a fetched operand is ready to retrieve, retrieving the fetched operand from the IMF unit, and repeating the probing and retrieving until all N operands have been retrieved. The IMF unit, independent of the processing engine, is to fetch the N contiguously-addressed indices from the index array, use the N fetched indices to calculate memory addresses for the N operands, and issue a plurality of read requests to fetch the N operands in order.
    Type: Application
    Filed: June 1, 2018
    Publication date: December 5, 2019
    Inventors: Stijn Eyerman, Wim Heirman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
  • Publication number: 20190303159
    Abstract: Disclosed embodiments relate to an instruction set architecture to facilitate energy-efficient computing for exascale architectures. In one embodiment, a processor includes a plurality of accelerator cores, each having a corresponding instruction set architecture (ISA); a fetch circuit to fetch one or more instructions specifying one of the accelerator cores; a decode circuit to decode the one or more fetched instructions; and an issue circuit to translate the one or more decoded instructions into the ISA corresponding to the specified accelerator core, collate the one or more translated instructions into an instruction packet, and issue the instruction packet to the specified accelerator core, wherein the plurality of accelerator cores comprises a memory engine (MENG), a collective engine (CENG), a queue engine (QENG), and a chain management unit (CMU).
    Type: Application
    Filed: March 29, 2018
    Publication date: October 3, 2019
    Inventors: Joshua B. Fryman, Jason M. Howard, Priyanka Suresh, Banu Meenakshi Nagasundaram, Srikanth Dakshinamoorthy, Ankit More, Robert Pawlowski, Samkit Jain, Pranav Yeolekar, Avinash M. Seegehalli, Surhud Khare, Dinesh Somasekhar, David S. Dunning, Romain E. Cledat, William Paul Griffin, Bhavitavya B. Bhadviya, Ivan B. Ganev
  • Patent number: 10409763
    Abstract: Various embodiments of the invention are described, including: (1) a method and apparatus for intelligently allocating threads within a binary translation system; (2) data cache way prediction guided by binary translation code morphing software; (3) fast interpreter hardware support on the data side; (4) out-of-order retirement; (5) decoupled load retirement in an atomic OOO processor; (6) handling transactional and atomic memory in an out-of-order binary translation based processor; and (7) speculative memory management in a binary translation based out-of-order processor.
    Type: Grant
    Filed: June 30, 2014
    Date of Patent: September 10, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Ethan Schuchman, David Keppel, Denis M. Khartikov, Polychronis Xekalakis, Joshua B. Fryman, Allan D. Knies, Naveen Neelakantam, Gregor Stellpflug, John H. Kelm, Mirem Hyuseinova Seidahmedova, Demos Pavlou, Jaroslaw Topp
  • Patent number: 10296338
    Abstract: In one embodiment, a processor includes: an accelerator associated with a first address space; a core associated with a second address space and including an alternate address space configuration register to store configuration information to enable the core to execute instructions from the first address space; and a control logic to configure the core based in part on information in the alternate address space configuration register. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: Brent R. Boswell, Banu Meenakshi Nagasundaram, Michael D. Abbott, Srikanth Dakshinamoorthy, Jason M. Howard, Joshua B. Fryman
  • Publication number: 20190042613
    Abstract: Methods, apparatus, systems and articles of manufacture to build a storage architecture for graph data are disclosed herein. Disclosed example apparatus include a neighbor identifier to identify respective sets of neighboring vertices of a graph. The neighboring vertices included in the respective sets are adjacent to respective ones of a plurality of vertices of the graph, and the respective sets of neighboring vertices are represented as respective lists of neighboring vertex identifiers. The apparatus also includes an element creator to create, in a cache memory, an array of elements that are unpopulated. The array elements have lengths equal to a length of a cache line. In addition, the apparatus includes an element populator to populate the elements with neighboring vertex identifiers. Each of the elements stores neighboring vertex identifiers of respective ones of the lists of neighboring vertex identifiers.
    Type: Application
    Filed: March 30, 2018
    Publication date: February 7, 2019
    Inventors: Stijn Eyerman, Jason M. Howard, Ibrahim Hur, Ivan B. Ganev, Fabrizio Petrini, Joshua B. Fryman
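
The cache-line-sized, initially unpopulated elements described in the abstract can be pictured with a small layout sketch. The structure and sizes below (a 64-byte line, 32-bit vertex identifiers, and the helper name) are assumptions made for illustration only.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_LINE_BYTES 64                                     /* assumed cache-line length      */
#define IDS_PER_ELEMENT  (CACHE_LINE_BYTES / sizeof(uint32_t))  /* 16 identifiers per element     */

/* One array element sized to a cache line, holding part of one vertex's
 * neighbor-identifier list as the abstract describes.  Illustrative only. */
typedef struct {
    uint32_t neighbor_ids[IDS_PER_ELEMENT];
} neighbor_element_t;

/* Populate one element from a vertex's neighbor list (hypothetical helper). */
void populate_element(neighbor_element_t *elem,
                      const uint32_t *neighbors, size_t count)
{
    memset(elem, 0, sizeof(*elem));       /* element starts unpopulated        */
    if (count > IDS_PER_ELEMENT)
        count = IDS_PER_ELEMENT;          /* keep to one cache line's worth    */
    memcpy(elem->neighbor_ids, neighbors, count * sizeof(uint32_t));
}

int main(void) {
    uint32_t neighbors[] = { 3, 7, 9 };   /* one vertex's adjacent vertex ids  */
    neighbor_element_t elem;
    populate_element(&elem, neighbors, 3);
    printf("first stored neighbor id: %u\n", elem.neighbor_ids[0]);
    return 0;
}
```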
  • Publication number: 20180285252
    Abstract: Devices, systems, and methods for optimizing memory access bandwidth when processing low-spatial-locality data are disclosed and described. A system memory is divided into a plurality of memory subsections, where each memory subsection is communicatively coupled to a memory controller by an independent memory channel. Memory access requests from a processor are thereby sent by the memory controller to only the appropriate memory subsection.
    Type: Application
    Filed: April 1, 2017
    Publication date: October 4, 2018
    Applicant: Intel Corporation
    Inventors: Kon-Woo Kwon, Vivek Kozhikkottu, Sang Phill Park, Ankit More, William P. Griffin, Robert Pawlowski, Jason M. Howard, Joshua B. Fryman
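
One way to picture a request reaching only the appropriate memory subsection is a simple address-to-subsection decode. The subsection count and size below are assumptions chosen for illustration; the abstract does not specify them.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_SUBSECTIONS   4                      /* assumed number of memory subsections */
#define SUBSECTION_BYTES  (256u * 1024 * 1024)   /* assumed 256 MiB per subsection       */

/* Decode which subsection (and therefore which independent channel) owns a
 * given physical address.  Illustrative sketch only. */
unsigned subsection_for(uint64_t phys_addr) {
    return (unsigned)((phys_addr / SUBSECTION_BYTES) % NUM_SUBSECTIONS);
}

int main(void) {
    uint64_t addr = 0x1234ABCDu;
    printf("address 0x%llx -> subsection %u\n",
           (unsigned long long)addr, subsection_for(addr));
    return 0;
}
```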
  • Publication number: 20180165203
    Abstract: In one embodiment, a processor includes: an accelerator associated with a first address space; a core associated with a second address space and including an alternate address space configuration register to store configuration information to enable the core to execute instructions from the first address space; and a control logic to configure the core based in part on information in the alternate address space configuration register. Other embodiments are described and claimed.
    Type: Application
    Filed: December 9, 2016
    Publication date: June 14, 2018
    Inventors: Brent R. Boswell, Banu Meenakshi Nagasundaram, Michael D. Abbott, Srikanth Dakshinamoorthy, Jason M. Howard, Joshua B. Fryman
  • Patent number: 9477628
    Abstract: A collective communication apparatus and method for parallel computing systems. For example, one embodiment of an apparatus comprises a plurality of processor elements (PEs); collective interconnect logic to dynamically form a virtual collective interconnect (VCI) between the PEs at runtime without global communication among all of the PEs, the VCI defining a logical topology between the PEs in which each PE is directly communicatively coupled to only a subset of the remaining PEs; and execution logic to execute collective operations across the PEs, wherein one or more of the PEs receive first results from a first portion of the subset of the remaining PEs, perform a portion of the collective operations, and provide second results to a second portion of the subset of the remaining PEs.
    Type: Grant
    Filed: September 28, 2013
    Date of Patent: October 25, 2016
    Assignee: Intel Corporation
    Inventors: Allan D. Knies, David Pardo Keppel, Dong Hyuk Woo, Joshua B. Fryman
  • Patent number: 9405724
    Abstract: A reconfigurable tree apparatus with a bypass mode and a method of using the reconfigurable tree apparatus are disclosed. The reconfigurable tree apparatus uses a short-circuit register to selectively designate participating agents for such operations as barriers, multicast, and reductions. The reconfigurable tree apparatus enables an agent to initiate a barrier, multicast, or reduction operation, leaving software to determine the participating agents for each operation. Although the reconfigurable tree apparatus is implemented using a small number of wires, multiple in-flight barrier, multicast, and reduction operations can take place. The method and apparatus have low complexity and easy reconfigurability, and provide the energy savings necessary for future exascale machines.
    Type: Grant
    Filed: June 28, 2013
    Date of Patent: August 2, 2016
    Assignee: Intel Corporation
    Inventors: Jianping Xu, Asit K. Mishra, Joshua B. Fryman, David S. Dunning
  • Publication number: 20150378731
    Abstract: Various embodiments of the invention are described, including: (1) a method and apparatus for intelligently allocating threads within a binary translation system; (2) data cache way prediction guided by binary translation code morphing software; (3) fast interpreter hardware support on the data side; (4) out-of-order retirement; (5) decoupled load retirement in an atomic OOO processor; (6) handling transactional and atomic memory in an out-of-order binary translation based processor; and (7) speculative memory management in a binary translation based out-of-order processor.
    Type: Application
    Filed: June 30, 2014
    Publication date: December 31, 2015
    Inventors: Patrick P. Lai, Ethan Schuchman, David Keppel, Denis M. Khartikov, Polychronis Xekalakis, Joshua B. Fryman, Allan D. Knies, Naveen Neelakantam, Gregor Stellpflug, John H. Kelm, Mirem Hyuseinova, Demos Pavlou, Jaroslaw Topp
  • Publication number: 20150095542
    Abstract: A collective communication apparatus and method for parallel computing systems. For example, one embodiment of an apparatus comprises a plurality of processor elements (PEs); collective interconnect logic to dynamically form a virtual collective interconnect (VCI) between the PEs at runtime without global communication among all of the PEs, the VCI defining a logical topology between the PEs in which each PE is directly communicatively coupled to only a subset of the remaining PEs; and execution logic to execute collective operations across the PEs, wherein one or more of the PEs receive first results from a first portion of the subset of the remaining PEs, perform a portion of the collective operations, and provide second results to a second portion of the subset of the remaining PEs.
    Type: Application
    Filed: September 28, 2013
    Publication date: April 2, 2015
    Inventors: Allan D. Knies, David Pardo Keppel, Dong Hyuk Woo, Joshua B. Fryman
  • Publication number: 20150006849
    Abstract: A reconfigurable tree apparatus with a bypass mode and a method of using the reconfigurable tree apparatus are disclosed. The reconfigurable tree apparatus uses a short-circuit register to selectively designate participating agents for such operations as barriers, multicast, and reductions. The reconfigurable tree apparatus enables an agent to initiate a barrier, multicast, or reduction operation, leaving software to determine the participating agents for each operation. Although the reconfigurable tree apparatus is implemented using a small number of wires, multiple in-flight barrier, multicast, and reduction operations can take place. The method and apparatus have low complexity and easy reconfigurability, and provide the energy savings necessary for future exascale machines.
    Type: Application
    Filed: June 28, 2013
    Publication date: January 1, 2015
    Inventors: Jianping Xu, Asit K. Mishra, Joshua B. Fryman, David S. Dunning
  • Publication number: 20140258685
    Abstract: A processor may be built with cores that execute only a partial set of the instructions needed to be fully backward compatible. Thus, in some embodiments, power consumption may be reduced by providing partial cores that execute only certain instructions and not others. The instructions that are not supported may be handled in other, more energy-efficient ways, so that the overall processor, including the partial core, may be fully backward compatible.
    Type: Application
    Filed: December 30, 2011
    Publication date: September 11, 2014
    Inventors: Srihari Makineni, Steven R. King, Alexander Redkin, Joshua B. Fryman, Ravishankar Iyer, Pavel S. Smirnov, Dmitry Gusev, Dmitri Pavlov
  • Publication number: 20140095896
    Abstract: A processor includes at least one power domain, each power domain including at least one core that switchably receives a power supply from a voltage regulator and switchably receives a clock signal from a clock source; a cache; and at least one control register having stored thereon data indicating power management states of the at least one power domain and the cache.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Nicholas P. Carter, Joshua B. Fryman, Robert C. Knauerhase, Aditya B. Agrawal, Josep Torrellas
  • Patent number: 8601242
    Abstract: A technique to perform a fast compare-exchange operation is disclosed. More specifically, a machine-readable medium, processor, and system are described that implement a fast compare-exchange operation as well as a cache line mark operation that enables the fast compare-exchange operation.
    Type: Grant
    Filed: December 18, 2009
    Date of Patent: December 3, 2013
    Assignee: Intel Corporation
    Inventors: Joshua B. Fryman, Andrew Thomas Forsyth, Edward Grochowski
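
For context, the conventional compare-exchange operation that this patent's fast variant and cache line mark operation relate to can be written with C11 atomics. The sketch below shows only that baseline semantics, with a hypothetical helper function; it is not the patented technique.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Baseline compare-exchange loop using C11 atomics.  The patent describes a
 * faster hardware variant plus a cache line mark operation; this sketch shows
 * only the conventional semantics for comparison. */
int atomic_add_if_below(atomic_int *counter, int limit) {
    int observed = atomic_load(counter);
    while (observed < limit) {
        /* try to swap in observed+1 only if *counter still equals observed */
        if (atomic_compare_exchange_weak(counter, &observed, observed + 1))
            return 1;            /* success: the increment was performed      */
        /* on failure 'observed' is reloaded automatically; loop and retry    */
    }
    return 0;                    /* limit reached, nothing changed            */
}

int main(void) {
    atomic_int counter = 0;
    while (atomic_add_if_below(&counter, 5))
        ;
    printf("final counter: %d\n", atomic_load(&counter));   /* prints 5 */
    return 0;
}
```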