Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200310809
    Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.
    Type: Application
    Filed: March 27, 2019
    Publication date: October 1, 2020
    Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
  • Patent number: 10782971
    Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specifies an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: September 22, 2020
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar
  • Publication number: 20200272466
    Abstract: An apparatus and method for processing efficient multicast operation.
    Type: Application
    Filed: May 13, 2020
    Publication date: August 27, 2020
    Inventors: CHRISTOPHER J. HUGHES, DAN BAUM
  • Patent number: 10719323
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: July 21, 2020
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Patent number: 10705964
    Abstract: In one embodiment, a processor includes a control logic to determine whether to enable an incoming data block associated with a first priority to displace, in a cache memory coupled to the processor, a candidate victim data block associated with a second priority and stored in the cache memory, based at least in part on the first and second priorities, a first access history associated with the incoming data block and a second access history associated with the candidate victim data block. Other embodiments are described and claimed.
    Type: Grant
    Filed: April 28, 2015
    Date of Patent: July 7, 2020
    Assignee: Intel Corporation
    Inventors: Kshitij A. Doshi, Christopher J. Hughes
  • Publication number: 20200210174
    Abstract: Systems, methods, and apparatuses relating to performing stencil configuration and computation operations are described.
    Type: Application
    Filed: December 29, 2018
    Publication date: July 2, 2020
    Inventors: Michael Espig, Christopher J. Hughes
  • Publication number: 20200210182
    Abstract: Disclosed embodiments relate to systems and methods for performing duplicate detection instructions on two-dimensional (2D) data. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix comprising M×N elements and a destination, the opcode to indicate execution circuitry is to use a plurality of comparators to discover duplicates in the source matrix, and store indications of locations of discovered duplicates in the destination. The execution circuitry to execute the decoded instruction as per the opcode.
    Type: Application
    Filed: December 26, 2018
    Publication date: July 2, 2020
    Inventors: Christopher J. HUGHES, Michael ESPIG, Dan BAUM, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL
  • Publication number: 20200210517
    Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.
    Type: Application
    Filed: December 27, 2018
    Publication date: July 2, 2020
    Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20200210173
    Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
    Type: Application
    Filed: December 26, 2018
    Publication date: July 2, 2020
    Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20200210188
    Abstract: Disclosed embodiments relate to systems and methods for performing matrix row-wise and column-wise permute instructions. In one example, a processor includes fetch circuitry to fetch an instruction, decoding, using decode circuitry, the fetched instruction having fields to specify an opcode and locations of a source matrix and a destination matrix, the opcode indicating the processor is to perform a permutation by copying, into each of a plurality of equal-sized logical partitions of the destination matrix, a selected logical partition of a same size from the source matrix, the selection being indicated by a permute control, and execution circuitry to execute the decoded instruction as per the opcode.
    Type: Application
    Filed: December 27, 2018
    Publication date: July 2, 2020
    Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20200210516
    Abstract: Systems, methods, and apparatuses relating to performing fast Fourier transform (FFT) configuration and computation operations are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of processing element circuits; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to cause the two-dimensional grid of processing element circuits to operate on a first packed data input value and a first complex twiddle factor value to produce a first result and a second result.
    Type: Application
    Filed: December 29, 2018
    Publication date: July 2, 2020
    Inventors: Michael Espig, Christopher J. Hughes, Jongsoo Park
  • Publication number: 20200201640
    Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode, the execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
    Type: Application
    Filed: December 21, 2018
    Publication date: June 25, 2020
    Inventors: Alexander F. HEINECKE, Evangelos GEORGANAS, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE
  • Patent number: 10678689
    Abstract: Technologies for migration of dynamic home tile mapping are described. An apparatus includes means for receiving coherence messages from other processor cores on the die, means for recording locations from which the coherence messages originate and means for determining distances between the requested home tiles and the locations from which the coherence messages originate. The apparatus includes means for determining whether an average distance between a particular home tile, whose identifier is stored in the home tile table, exceeds a threshold. When the average distance exceeds the defined threshold, the apparatus includes means for migrating the particular home tile to another location.
    Type: Grant
    Filed: April 12, 2019
    Date of Patent: June 9, 2020
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Daehyun Kim, Jong Soo Park, Richard M. Yoo
  • Publication number: 20200174790
    Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements and determining a number of instances of each distinct data value within the vector. A system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, a source vector identifier, and an immediate value, wherein the execution circuit is to, for each data element position of a source vector, determine a number of matching data element positions in the source vector storing a same data value as stored at the data element position, the matching data element positions located between the data element position and a least significant data element position of the source vector, and store in a corresponding data element position of a destination vector identified by the destination vector identifier, a value representing the number of matching data element positions.
    Type: Application
    Filed: June 30, 2017
    Publication date: June 4, 2020
    Applicant: Intel Corporation
    Inventors: Mikhail PLOTNIKOV, Christopher J. HUGHES, Andrey NARAIKIN
  • Patent number: 10664273
    Abstract: An apparatus and method for processing efficient multicast operation.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: May 26, 2020
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Dan Baum
  • Patent number: 10664287
    Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: May 26, 2020
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10664199
    Abstract: A processor includes a processing core to generate a memory request for an application data in an application. The processor also includes a virtual page group memory management (VPGMM) unit coupled to the processing core to specify a caching priority (CP) to the application data for the application. The caching priority identifies importance of the application data in a cache.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: May 26, 2020
    Assignee: Intel Corporation
    Inventors: Subramanya R. Dulloor, Rajesh M. Sankaran, David A. Koufaty, Christopher J. Hughes, Jong Soo Park, Sheng Li
  • Publication number: 20200142699
    Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and in response to determining that the data access condition: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.
    Type: Application
    Filed: June 30, 2017
    Publication date: May 7, 2020
    Applicant: Intel Corporation
    Inventors: William M. BROWN, Mikhail PLOTNIKOV, Christopher J. HUGHES
  • Publication number: 20200117451
    Abstract: A processor includes a decode unit to decode an instruction indicating a source packed data operand having source data elements and indicating a destination storage location. Each of the source data elements has a source data element value and a source data element position. An execution unit, in response to the instruction, stores a result packed data operand having result data elements each having a result data element value and a result data element position. Each result data element value is one of: (1) equal to a source data element position of a source data element, closest to one end of the source operand, having a source data element value equal to the result data element position of the result data element; and (2) a replacement value, when no source data element has a source data element value equal to the result data element position of the result data element.
    Type: Application
    Filed: December 9, 2019
    Publication date: April 16, 2020
    Inventors: Christopher J. Hughes, Jong Soo Park
  • Publication number: 20200104132
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions structured to compute a min/max value of a vector. In one example, a processor executes a decoded single instruction to determine on a per data element position of the identified first and second operands a maximum or minimum, store the determined maximum or minimums in corresponding data element positions of the identified first operand, and determine and store, in each data element position of the identified third operand, an indication of where the maximum or minimum came from.
    Type: Application
    Filed: September 29, 2018
    Publication date: April 2, 2020
    Inventors: Sunny L. GOGAR, Rama Kishan V. MALLADI, Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES