Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240329938
    Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
    Type: Application
    Filed: March 15, 2024
    Publication date: October 3, 2024
    Applicant: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Barukh Ziv, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Mark J. Charney, Christopher J. Hughes, Alexander F. Heinecke, Evangelos Georganas, Binh Pham
  • Patent number: 12106104
    Abstract: A processor that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and store the compressed read-only data block in multiple adjacent blocks in the memory is provided. During execution of an application to operate on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer and decompressed in the memory controller. In response to a subsequent request during execution of the application for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: October 1, 2024
    Assignee: Intel Corporation
    Inventors: Zhe Wang, Alaa R. Alameldeen, Christopher J. Hughes
  • Publication number: 20240303195
    Abstract: In one embodiment, a processor includes interconnect circuitry, processing circuitry, a first cache, and cache controller circuitry. The interconnect circuitry communicates over a processor interconnect with a second processor that includes a second cache. The processing circuitry generates a memory read request for a corresponding memory address of a memory. Based on the memory read request, the cache controller circuitry detects a cache miss in the first cache, which indicates that the first cache does not contain a valid copy of data for the corresponding memory address. Based on the cache miss, the cache controller circuitry requests the data from the second cache or the memory based on a current bandwidth utilization of the processor interconnect.
    Type: Application
    Filed: December 15, 2021
    Publication date: September 12, 2024
    Applicant: Intel Corporation
    Inventors: Keqiang Wu, Lingxiang Xiang, Heidi Pan, Christopher J. Hughes, Zhe Wang
  • Patent number: 12056489
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Grant
    Filed: May 5, 2023
    Date of Patent: August 6, 2024
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Publication number: 20240241722
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Application
    Filed: March 28, 2024
    Publication date: July 18, 2024
    Applicant: Intel Corporation
    Inventors: Naveen MELLEMPUDI, Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Christopher J. HUGHES, Evangelos GEORGANAS, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
  • Patent number: 12020028
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Grant
    Filed: December 26, 2020
    Date of Patent: June 25, 2024
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 11989555
    Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier, decoding, by decode circuitry, the fetched instruction, selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system, scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: May 21, 2024
    Assignee: Intel Corporation
    Inventors: Doddaballapur N. Jayasimha, Jonas Svennebring, Samantika S. Sury, Christopher J. Hughes, Jong Soo Park, Lingxiang Xiang
  • Publication number: 20240152448
    Abstract: An embodiment of an integrated circuit may comprise circuitry communicatively coupled to two or more sub-non-uniform memory access clusters (SNCs) to allocate a specified memory space in the two or more SNCs in accordance with a SNC memory allocation policy indicated from a request to initialize the specified memory space. An embodiment of an apparatus may comprise decode circuitry to decode a single instruction, the single instruction to include a field for an opcode, and execution circuitry to execute the decoded instruction according to the opcode to provide an indicated SNC memory allocation policy (e.g., a SNC policy hint). Other embodiments are disclosed and claimed.
    Type: Application
    Filed: June 21, 2021
    Publication date: May 9, 2024
    Applicant: Intel Corporation
    Inventors: Zhe Wang, Lingxiang Xiang, Christopher J. Hughes
  • Patent number: 11972230
    Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
    Type: Grant
    Filed: June 27, 2020
    Date of Patent: April 30, 2024
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Barukh Ziv, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Mark J. Charney, Christopher J. Hughes, Alexander F. Heinecke, Evangelos Georganas, Binh Pham
  • Publication number: 20240126555
    Abstract: A method of an aspect includes receiving a request for a chained accelerator operation, and configuring a chain of accelerators to perform the chained accelerator operation. This may include configuring a first accelerator to access an input data from a source memory location in system memory, process the input data, and generate first intermediate data. This may also include configuring a second accelerator to receive the first intermediate data, without the first intermediate data having been sent to the system memory, process the first intermediate data, and generate additional data. Other apparatus, methods, systems, and machine-readable medium are disclosed.
    Type: Application
    Filed: October 17, 2022
    Publication date: April 18, 2024
    Inventors: Saurabh GAYEN, Christopher J. HUGHES, Utkarsh Y. KAKAIYA, Alexander F. HEINECKE
  • Publication number: 20240126551
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
    Type: Application
    Filed: December 28, 2023
    Publication date: April 18, 2024
    Inventors: Bret TOLL, Christopher J. HUGHES, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20240127392
    Abstract: A chip or other apparatus of an aspect includes a first accelerator and a second accelerator. The first accelerator has support for a chained accelerator operation. The first accelerator is to be controlled as part of the chained accelerator operation to access an input data from a source memory location in system memory, process the input data, generate first intermediate data, and store the first intermediate data to a storage. The second accelerator also has support for the chained accelerator operation. The second accelerator is to be controlled as part of the chained accelerator operation to receive the first intermediate data from the storage, without the first intermediate data having been sent to the system memory, process the first intermediate data, and generate additional data. Other apparatus, methods, systems, and machine-readable medium are disclosed.
    Type: Application
    Filed: October 17, 2022
    Publication date: April 18, 2024
    Inventors: Christopher J. HUGHES, Saurabh GAYEN, Utkarsh Y. KAKAIYA, Alexander F. HEINECKE
  • Publication number: 20240126613
    Abstract: A chip or other apparatus of an aspect includes a first accelerator and a second accelerator. The first accelerator has support for a chained accelerator operation. The first accelerator is to be controlled as part of the chained accelerator operation to access an input data from a source memory location in system memory, process the input data, and generate first intermediate data. The second accelerator also has support for the chained accelerator operation. The second accelerator is to be controlled as part of the chained accelerator operation to receive the first intermediate data, without the first intermediate data having been sent to the system memory, process the first intermediate data, and generate additional data. Other apparatus, methods, systems, and machine-readable medium are disclosed.
    Type: Application
    Filed: October 17, 2022
    Publication date: April 18, 2024
    Inventors: Saurabh GAYEN, Christopher J. HUGHES, Utkarsh Y. KAKAIYA, Alexander F. HEINECKE
  • Patent number: 11954490
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
    Type: Grant
    Filed: April 28, 2023
    Date of Patent: April 9, 2024
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Robert Valentine, Bret Toll, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
  • Patent number: 11954489
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
    Type: Grant
    Filed: December 13, 2021
    Date of Patent: April 9, 2024
    Assignee: Intel Corporation
    Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Publication number: 20240103867
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
    Type: Application
    Filed: November 28, 2023
    Publication date: March 28, 2024
    Inventors: Bret TOLL, Christopher J. HUGHES, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Patent number: 11941395
    Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Menachem Adelman, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 11941394
    Abstract: A processor includes a decode unit to decode an instruction indicating a source packed data operand having source data elements and indicating a destination storage location. Each of the source data elements has a source data element value and a source data element position. An execution unit, in response to the instruction, stores a result packed data operand having result data elements each having a result data element value and a result data element position. Each result data element value is one of: (1) equal to a source data element position of a source data element, closest to one end of the source operand, having a source data element value equal to the result data element position of the result data element; and (2) a replacement value, when no source data element has a source data element value equal to the result data element position of the result data element.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Jong Soo Park
  • Patent number: 11934830
    Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations.
    Type: Grant
    Filed: June 13, 2022
    Date of Patent: March 19, 2024
    Assignee: Intel Corporation
    Inventors: William M. Brown, Mikhail Plotnikov, Christopher J. Hughes
  • Publication number: 20240078285
    Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.
    Type: Application
    Filed: November 6, 2023
    Publication date: March 7, 2024
    Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE