Patents by Inventor Christopher J. Hughes
Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12287843
Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
Type: Grant
Filed: November 6, 2023
Date of Patent: April 29, 2025
Assignee: Intel Corporation
Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
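The multiply-accumulate semantics described in this abstract can be sketched in plain Python. This is an illustrative model only, not the patented hardware, and the name `sparse_mma` is invented for the sketch:

```python
def sparse_mma(a, b, c):
    """Multiply and accumulate matching non-zero (NZ) elements of A and B
    with corresponding elements of C. A: M x K, B: K x N, C: M x N."""
    m, k, n = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    # NZ bitmasks: hardware uses these to skip products that are zero anyway.
    a_nz = [[x != 0 for x in row] for row in a]
    b_nz = [[x != 0 for x in row] for row in b]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                if a_nz[i][p] and b_nz[p][j]:  # matching NZ pair
                    out[i][j] += a[i][p] * b[p][j]
    return out
```

Because zero products contribute nothing, the result equals C + A·B; the NZ bitmasks only let the hardware skip the wasted multiplications.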
-
Patent number: 12277419
Abstract: Systems, methods, and apparatuses relating to instructions to convert 16-bit floating-point formats are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a source vector comprising N plurality of 16-bit half-precision floating-point elements, and a destination vector to store N plurality of 16-bit bfloat floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the source vector from 16-bit half-precision floating-point format to 16-bit bfloat floating-point format and store each converted element into a corresponding location of the destination vector, decode circuitry to decode the fetched single instruction into a decoded single instruction, and the execution circuitry to respond to the decoded single instruction as specified by the opcode.
Type: Grant
Filed: December 24, 2020
Date of Patent: April 15, 2025
Assignee: Intel Corporation
Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Menachem Adelman, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
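The format conversion can be modeled with stdlib bit manipulation. A minimal sketch, assuming Python floats stand in for the half-precision source elements (every FP16 value is exactly representable in FP32, so the conversion reduces to rounding the FP32 encoding to bfloat16), and assuming round-to-nearest-even on the discarded bits; the function names are invented:

```python
import struct

def to_bfloat16_bits(x):
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    # bfloat16 is the top 16 bits of float32; round-to-nearest-even
    # on the discarded lower half before truncating.
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded >> 16) & 0xFFFF

def convert_vector(src):
    # Element-wise conversion of a source vector, one converted element
    # stored per corresponding destination location.
    return [to_bfloat16_bits(v) for v in src]
```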
-
Patent number: 12265826
Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector; decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
Type: Grant
Filed: December 28, 2023
Date of Patent: April 1, 2025
Assignee: Intel Corporation
Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
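A plain-Python sketch of the move semantics, with hypothetical helper names; the "group" here is a rectangular sub-tile given by row and column ranges, one of the shapes the opcode can select:

```python
def move_1d_to_group(tile, vec, rows, cols):
    """Move contents of a 1D vector into the specified group of elements
    of a 2D matrix (tile), filling the group in row-major order."""
    it = iter(vec)
    for r in rows:
        for c in cols:
            tile[r][c] = next(it)
    return tile

def move_group_to_1d(tile, rows, cols):
    # Opposite direction: flatten the specified group into a 1D vector.
    return [tile[r][c] for r in rows for c in cols]
```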
-
Publication number: 20250103343
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
Type: Application
Filed: November 21, 2024
Publication date: March 27, 2025
Applicant: INTEL CORPORATION
Inventors: Christopher J. HUGHES, Prasoonkumar SURTI, Guei-Yuan LUEH, Adam T. LAKE, Jill BOYCE, Subramaniam MAIYURAN, Lidong XU, James M. HOLLAND, Vasanth RANGANATHAN, Nikos KABURLASOS, Altug KOKER, Abhishek R. Appu
-
Publication number: 20250103337
Abstract: An apparatus and method for partitioned shuffling of data elements. A first partition is associated with a first number of source data elements corresponding to a first plurality of lanes having a first plurality of lane identifiers (IDs), and a second partition is associated with a second number of source data elements corresponding to a second plurality of lanes having a second plurality of lane IDs. A bounded offset vector is generated based on allowable ranges for a plurality of offset values associated with the source data elements. An index vector is generated by permuting the first and second plurality of lane IDs in accordance with the bounded offset vector.
Type: Application
Filed: September 27, 2023
Publication date: March 27, 2025
Inventors: Simon PENNYCOOK, Christopher J. HUGHES
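A sketch of the described shuffle in Python. Several details are assumptions made for illustration: offsets are bounded by reducing them modulo the partition size (so every shuffled lane stays inside its own partition), and partitions occupy contiguous lane ranges:

```python
def partitioned_shuffle(lane_ids, partition_sizes, offsets):
    """Permute lane IDs within each partition according to per-lane
    offsets, bounding each offset to the partition's allowable range."""
    out, base, k = [], 0, 0
    for size in partition_sizes:
        part = lane_ids[base:base + size]
        for i in range(size):
            bounded = offsets[k] % size          # bound offset to allowable range
            out.append(part[(i + bounded) % size])  # permuted lane ID
            k += 1
        base += size
    return out
```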
-
Publication number: 20250068424
Abstract: Methods, apparatus, and computer programs are disclosed for data/instruction access based on performance hints. In some embodiments, a method comprises decoding an instruction to access data or code by a core of a computer processor, the instruction to provide one or more hints on how the data or code is to be processed through a cache hierarchy of the computer processor based on the instruction, the one or more hints indicating which level of the cache hierarchy or which cache in a level of the cache hierarchy to load or store the data or code, a priority of the data or code in a cache, or how the data or code is to be shared among multiple cores of the computer processor. The method further comprises processing the data or code based on the one or more hints responsive to the decoded instruction.
Type: Application
Filed: November 8, 2024
Publication date: February 27, 2025
Inventors: Duane GALBI, Christopher J. HUGHES, Dan BAUM
-
Publication number: 20250068422
Abstract: Methods, apparatus, and computer programs are disclosed for context switching. In some embodiments, a method comprises dedicating a first subset of a plurality of vector registers to a first thread of a plurality of threads for thread execution; and responsive to a context switch from the first thread to a second thread, bypassing saving a state of the first subset of the plurality of vector registers and saving a state of a second subset of the plurality of vector registers, wherein the second subset of the plurality of vector registers is not dedicated to the first thread, and wherein the first and second subsets are mutually exclusive.
Type: Application
Filed: November 8, 2024
Publication date: February 27, 2025
Inventors: Duane GALBI, Christopher J. HUGHES, Dan BAUM, H. Peter ANVIN, Stijn EYERMAN
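The save-bypass idea can be sketched in a few lines of Python; register names and the dict-based model are invented for illustration:

```python
def save_on_context_switch(vector_regs, dedicated_to_thread):
    """On a context switch away from a thread, skip saving the vector
    registers dedicated to that thread; save only the non-dedicated
    (mutually exclusive) subset."""
    saved = {}
    for name, value in vector_regs.items():
        if name in dedicated_to_thread:
            continue            # dedicated registers: state stays in place
        saved[name] = value     # shared registers must be saved
    return saved
```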
-
Patent number: 12216579
Abstract: Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the atomic operation at the first level or at the second level, and whether to copy the cache line from the shared cache to the local cache.
Type: Grant
Filed: December 25, 2020
Date of Patent: February 4, 2025
Assignee: Intel Corporation
Inventors: Carl J. Beckmann, Samantika S. Sury, Christopher J. Hughes, Lingxiang Xiang, Rahul Agrawal
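One way to sketch the adaptive decision in Python. The promotion counter and threshold are invented for illustration; the abstract leaves the exact adaptation policy open:

```python
COPY_THRESHOLD = 2  # hypothetical "hot line" threshold

def adaptive_remote_atomic(local_cache, shared_cache, hot_counts, addr, op):
    """Perform the atomic locally when the local cache holds the line,
    otherwise remotely at the shared cache; a simple per-line access
    counter decides whether to copy the line into the local cache."""
    if addr in local_cache:                       # execute at the first level
        local_cache[addr] = op(local_cache[addr])
        return 'local'
    shared_cache[addr] = op(shared_cache[addr])   # execute at the second level
    hot_counts[addr] = hot_counts.get(addr, 0) + 1
    if hot_counts[addr] >= COPY_THRESHOLD:        # adaptively promote the line
        local_cache[addr] = shared_cache[addr]
    return 'remote'
```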
-
Patent number: 12210446
Abstract: An embodiment of an integrated circuit may comprise circuitry communicatively coupled to two or more sub-non-uniform memory access clusters (SNCs) to allocate a specified memory space in the two or more SNCs in accordance with a SNC memory allocation policy indicated by a request to initialize the specified memory space. An embodiment of an apparatus may comprise decode circuitry to decode a single instruction, the single instruction to include a field for an opcode, and execution circuitry to execute the decoded instruction according to the opcode to provide an indicated SNC memory allocation policy (e.g., a SNC policy hint). Other embodiments are disclosed and claimed.
Type: Grant
Filed: June 21, 2021
Date of Patent: January 28, 2025
Assignee: Intel Corporation
Inventors: Zhe Wang, Lingxiang Xiang, Christopher J. Hughes
-
Patent number: 12190118
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
Type: Grant
Filed: June 22, 2023
Date of Patent: January 7, 2025
Assignee: INTEL CORPORATION
Inventors: Christopher J. Hughes, Prasoonkumar Surti, Guei-Yuan Lueh, Adam T. Lake, Jill Boyce, Subramaniam Maiyuran, Lidong Xu, James M. Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Abhishek R. Appu
-
Publication number: 20250004773
Abstract: An apparatus and method are described for prefetching data with hints. For example, one embodiment of a processor comprises: a plurality of cores to process instructions; a first core of the plurality of cores comprising: decoder circuitry to decode instructions indicating memory operations including load operations of a first type with shared data hints and load operations of a second type without shared data hints; execution circuitry to execute the instructions to perform the memory operations; data prefetch circuitry to store tracking data in a tracking data structure responsive to the memory operations, a portion of the tracking data associated with the first type of load operations; and the data prefetch circuitry to detect memory access patterns using the tracking data, the data prefetch circuitry to responsively issue one or more prefetch operations using shared data hints based, at least in part, on the portion of the tracking data associated with the first type of load operations.
Type: Application
Filed: June 30, 2023
Publication date: January 2, 2025
Inventors: Christopher J. HUGHES, Zhe WANG, Dan BAUM, Venkateswara Rao MADDURI, Chen DAN, Joseph NUZMAN
-
Publication number: 20250004764
Abstract: Techniques for providing operands of 512 bits or smaller are described. In some examples, a prefix of an instruction is utilized to define the operand (vector) length. For example, an instruction is to at least include fields for a prefix, an opcode, and operand addressing information, wherein the prefix and addressing information are to be used by decoder circuitry to determine support for a particular vector length for one or more operands of the instance of the single instruction, and the opcode is to indicate one or more operations to perform on the one or more operands.
Type: Application
Filed: July 1, 2023
Publication date: January 2, 2025
Inventors: Michael ESPIG, Menachem ADELMAN, Jonathan COMBS, Amit GRADSTEIN, Christopher J. HUGHES, Vivekananthan SANJEEPAN, Wing Shek WONG
-
Publication number: 20250004765
Abstract: Techniques are described for loading data with a hint related to data sharing with other cores. For example, one embodiment of an apparatus comprises: a plurality of cores to process instructions; a first core of the plurality of cores comprising: decoder circuitry to decode a single instruction, the single instruction having a first field for an opcode to indicate a load operation to read data from a memory, a second field to indicate a memory address for a location of the data in the memory, and a third field to store a value to indicate whether the data is expected to be shared between the first core and at least a second core of the plurality of cores; execution circuitry to execute the single instruction to read the data from the location in the memory; and cache controller circuitry to store the data in one or more caches in a state selected based on the value.
Type: Application
Filed: June 30, 2023
Publication date: January 2, 2025
Inventors: Christopher J. HUGHES, Zhe WANG, Dan BAUM, Venkateswara Rao MADDURI, Alexander HEINECKE, Evangelos GEORGANAS, Chen DAN, Joseph NUZMAN
-
Patent number: 12175246
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices; decode circuitry to decode the fetched compress instruction; and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
Type: Grant
Filed: September 1, 2023
Date of Patent: December 24, 2024
Assignee: Intel Corporation
Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
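The first compress variant (packing non-zero elements together and recording each element's position in a header) is essentially a coordinate-list sparse format. A sketch, with invented function names, of that variant and its inverse:

```python
def compress_matrix(mat):
    """Pack non-zero elements together; the header records the matrix
    position of each packed element."""
    header, packed = [], []
    for i, row in enumerate(mat):
        for j, v in enumerate(row):
            if v != 0:
                header.append((i, j))  # position of this non-zero element
                packed.append(v)
    return header, packed

def decompress_matrix(header, packed, shape):
    # Rebuild the dense matrix: zeros everywhere except recorded positions.
    rows, cols = shape
    mat = [[0] * cols for _ in range(rows)]
    for (i, j), v in zip(header, packed):
        mat[i][j] = v
    return mat
```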
-
Publication number: 20240362021
Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier; decoding, by decode circuitry, the fetched instruction; selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system; scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.
Type: Application
Filed: May 21, 2024
Publication date: October 31, 2024
Inventors: Doddaballapur N. Jayasimha, Jonas Svennebring, Samantika S. Sury, Christopher J. Hughes, Jong Soo Park, Lingxiang Xiang
-
Patent number: 12130740
Abstract: Embodiments of a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
Type: Grant
Filed: April 4, 2022
Date of Patent: October 29, 2024
Assignee: Intel Corporation
Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David B. Papworth, James D. Allen
-
Publication number: 20240354107
Abstract: In one example, a processor includes: at least one core to execute instructions; and at least one cache memory coupled to the at least one core, the at least one cache memory to store data, at least some of the data being a copy of data stored in a memory. The at least one core is to determine whether to conditionally offload a sequence of instructions for execution on a compute circuit associated with the memory, based at least in part on whether one or more first data is present in the at least one cache memory, the one or more first data for use during execution of the sequence of instructions. Other embodiments are described and claimed.
Type: Application
Filed: June 26, 2024
Publication date: October 24, 2024
Inventors: Frank Hady, Christopher J. Hughes, Scott Peterson
-
Patent number: 12112167
Abstract: Embodiments for gathering and scattering matrix data by row are disclosed. In an embodiment, a processor includes a storage matrix, a decoder, and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode and a first operand field to specify a set of irregularly spaced memory locations. The execution circuitry is to, in response to the decoded instruction, calculate a set of addresses corresponding to the set of irregularly spaced memory locations and transfer a set of rows of data between the storage and the set of irregularly spaced memory locations.
Type: Grant
Filed: June 27, 2020
Date of Patent: October 8, 2024
Assignee: Intel Corporation
Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Menachem Adelman, Evangelos Georganas, Mark J. Charney, Nikita A. Shustrov, Sara Baghsorkhi
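Row-granularity gather/scatter can be sketched over a flat memory model; the element-indexed addressing and function names here are assumptions for illustration:

```python
def gather_rows(memory, row_addrs, row_len):
    """Gather whole rows from irregularly spaced memory locations
    (element addresses) into a tile, one row per address."""
    return [memory[a:a + row_len] for a in row_addrs]

def scatter_rows(memory, row_addrs, tile):
    # Inverse direction: write each tile row back to its irregular location.
    for a, row in zip(row_addrs, tile):
        memory[a:a + len(row)] = row
    return memory
```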
-
Publication number: 20240329938
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
Type: Application
Filed: March 15, 2024
Publication date: October 3, 2024
Applicant: Intel Corporation
Inventors: Menachem Adelman, Robert Valentine, Barukh Ziv, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Mark J. Charney, Christopher J. Hughes, Alexander F. Heinecke, Evangelos Georganas, Binh Pham
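The fused operation computes Aᵀ·B in one step. A reference sketch of the semantics (not the hardware datapath), with an invented function name:

```python
def transpose_multiply(a, b):
    """Transpose the first source matrix (K x M), then matrix-multiply
    the transposed matrix (M x K) with the second source (K x N)."""
    at = [list(col) for col in zip(*a)]  # transpose: K x M -> M x K
    m, k, n = len(at), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                out[i][j] += at[i][p] * b[p][j]
    return out
```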
-
Patent number: 12106104
Abstract: A processor is provided that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and store the compressed read-only data block in multiple adjacent blocks in the memory. During execution of an application to operate on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer, and decompressed in the memory controller. In response to a subsequent request during execution of the application for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
Type: Grant
Filed: December 23, 2020
Date of Patent: October 1, 2024
Assignee: Intel Corporation
Inventors: Zhe Wang, Alaa R. Alameldeen, Christopher J. Hughes
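The read path can be sketched with a stdlib compressor standing in for the memory controller's hardware codec; the class name, block/group sizes, and the unbounded buffer are assumptions for illustration:

```python
import zlib

class CompressedReadOnlyMemory:
    """Adjacent fixed-size blocks are compressed together as a group.
    Reading one block decompresses the whole group into a prefetch
    buffer; adjacent blocks are then served from the buffer without
    another memory access."""

    def __init__(self, data, block_size=4, group=2):
        self.block_size, self.group = block_size, group
        span = block_size * group
        self.groups = [zlib.compress(data[i:i + span])
                       for i in range(0, len(data), span)]
        self.prefetch = {}  # group index -> decompressed bytes

    def read_block(self, idx):
        g = idx // self.group
        if g not in self.prefetch:  # miss: decompress the whole group
            self.prefetch[g] = zlib.decompress(self.groups[g])
        off = (idx % self.group) * self.block_size
        return self.prefetch[g][off:off + self.block_size]
```

A real design would bound the prefetch buffer; an unbounded dict keeps the sketch short.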