Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE

Publication number: 20220237123

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Type: Application

Filed: April 4, 2022

Publication date: July 28, 2022

Applicant: Intel Corporation

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David B. Papworth, James D. Allen
METHOD AND APPARATUS FOR PERFORMING REDUCTION OPERATIONS ON A PLURALITY OF ASSOCIATED DATA ELEMENT VALUES

Publication number: 20220229661

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.

Type: Application

Filed: April 4, 2022

Publication date: July 21, 2022

Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
Systems and methods for performing instructions to transform matrices into row-interleaved format

Patent number: 11392381

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.

Type: Grant

Filed: March 29, 2021

Date of Patent: July 19, 2022

Assignee: Intel Corporation

Inventors: Raanan Sade, Robert Valentine, Bret Toll, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
No-locality hint vector memory access processors, methods, systems, and instructions

Patent number: 11392500

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

Type: Grant

Filed: October 21, 2020

Date of Patent: July 19, 2022

Assignee: Intel Corporation

Inventor: Christopher J. Hughes
APPARATUSES, METHODS, AND SYSTEMS FOR 8-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

Publication number: 20220206801

Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.

Type: Application

Filed: December 26, 2020

Publication date: June 30, 2022

Applicant: Intel Corporation

Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
ADAPTIVE REMOTE ATOMICS

Publication number: 20220206945

Abstract: Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, and a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is a storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the first atomic operation at the first level or at the second level and whether to copy the cache line from the shared cache to the local cache.

Type: Application

Filed: December 25, 2020

Publication date: June 30, 2022

Applicant: Intel Corporation

Inventors: Carl J. Beckmann, Samantika S. Sury, Christopher J. Hughes, Lingxiang Xiang, Rahul Agrawal
PROCESSOR INSTRUCTIONS FOR DATA COMPRESSION AND DECOMPRESSION

Publication number: 20220197642

Abstract: A processor that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and store the compressed read-only data block in multiple adjacent blocks in the memory is provided. During execution of an application to operate on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer and decompressed in the memory controller. In response to a subsequent request during execution of the application for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.

Type: Application

Filed: December 23, 2020

Publication date: June 23, 2022

Inventors: Zhe WANG, Alaa R. ALAMELDEEN, Christopher J. HUGHES
Method and apparatus for data-ready memory operations

Patent number: 11360771

Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and in response to determining that the data access condition: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.

Type: Grant

Filed: June 30, 2017

Date of Patent: June 14, 2022

Assignee: Intel Corporation

Inventors: William M. Brown, Mikhail Plotnikov, Christopher J. Hughes
SYSTEMS AND METHODS FOR PERFORMING MATRIX COMPRESS AND DECOMPRESS INSTRUCTIONS

Publication number: 20220171627

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Application

Filed: February 15, 2022

Publication date: June 2, 2022

Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
Apparatuses and methods for a processor architecture

Patent number: 11294809

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Type: Grant

Filed: August 28, 2018

Date of Patent: April 5, 2022

Assignee: Intel Corporation

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
Method and apparatus for performing reduction operations on a plurality of associated data element values

Patent number: 11294670

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.

Type: Grant

Filed: March 27, 2019

Date of Patent: April 5, 2022

Assignee: INTEL CORPORATION

Inventors: Christopher J. Hughes, Jonathan D. Pearce, Guei-Yuan Lueh, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar
Systems and methods for performing duplicate detection instructions on 2D data

Patent number: 11294671

Abstract: Disclosed embodiments relate to systems and methods for performing duplicate detection instructions on two-dimensional (2D) data. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix comprising M×N elements and a destination, the opcode to indicate execution circuitry is to use a plurality of comparators to discover duplicates in the source matrix, and store indications of locations of discovered duplicates in the destination. The execution circuitry to execute the decoded instruction as per the opcode.

Type: Grant

Filed: December 26, 2018

Date of Patent: April 5, 2022

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Michael Espig, Dan Baum, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS TO CONVERT 16-BIT FLOATING-POINT FORMATS

Publication number: 20220100507

Abstract: Systems, methods, and apparatuses relating to instructions to convert 16-bit floating-point formats are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a source vector comprising N plurality of 16-bit half-precision floating-point elements, and a destination vector to store N plurality of 16-bit bfloat floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the source vector from 16-bit half-precision floating-point format to 16-bit bfloat floating-point format and store each converted element into a corresponding location of the destination vector, decode circuitry to decode the fetched single instruction into a decoded single instruction, and the execution circuitry to respond to the decoded single instruction as specified by the opcode.

Type: Application

Filed: December 24, 2020

Publication date: March 31, 2022

Inventors: ALEXANDER F. HEINECKE, ROBERT VALENTINE, MARK J. CHARNEY, MENACHEM ADELMAN, CHRISTOPHER J. HUGHES, EVANGELOS GEORGANAS, ZEEV SPERBER, AMIT GRADSTEIN, SIMON RUBANOVICH
SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

Publication number: 20220100505

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.

Type: Application

Filed: December 13, 2021

Publication date: March 31, 2022

Inventors: Bret TOLL, Christopher J. HUGHES, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR LOADING DATA AND PADDING INTO A TILE OF A MATRIX OPERATIONS ACCELERATOR

Publication number: 20220100513

Abstract: Systems, methods, and apparatuses relating to one or more instructions that load data into a tile register and pad a row (or column) with a pad value from a padding circuit are described.

Type: Application

Filed: December 24, 2020

Publication date: March 31, 2022

Inventors: CHRISTOPHER J. HUGHES, ALEXANDER HEINECKE, ROBERT VALENTINE, MENACHEM ADELMAN, EVANGELOS GEORGANAS, MARK CHARNEY
SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

Publication number: 20220100515

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.

Type: Application

Filed: December 13, 2021

Publication date: March 31, 2022

Inventors: Bret TOLL, Christopher J. HUGHES, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

Publication number: 20220100502

Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described.

Type: Application

Filed: December 24, 2020

Publication date: March 31, 2022

Inventors: ALEXANDER F. HEINECKE, ROBERT VALENTINE, MARK J. CHARNEY, MENACHEM ADELMAN, CHRISTOPHER J. HUGHES, EVANGELOS GEORGANAS, ZEEV SPERBER, AMIT GRADSTEIN, SIMON RUBANOVICH
REMOTE ATOMIC OPERATIONS IN MULTI-SOCKET SYSTEMS

Publication number: 20220091983

Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.

Type: Application

Filed: October 5, 2021

Publication date: March 24, 2022

Inventors: Doddaballapur N. Jayasimha, Samantika S. Sury, Christopher J. Hughes, Jonas Svennebring, Yen-Cheng Liu, Stephen R. Van Doren, David A. Koufaty
Systems and methods for performing matrix compress and decompress instructions

Patent number: 11249761

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Grant

Filed: July 20, 2020

Date of Patent: February 15, 2022

Assignee: Intel Corporation

Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
SYSTEMS AND METHODS OF INSTRUCTIONS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES USING BITMASKS THAT IDENTIFY NON-ZERO ELEMENTS

Publication number: 20220012305

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.

Type: Application

Filed: September 24, 2021

Publication date: January 13, 2022

Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE

prev 1 2 3 4 5 6 7 … next