Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHOD AND APPARATUS FOR PERFORMING REDUCTION OPERATIONS ON A PLURALITY OF DATA ELEMENT VALUES

Publication number: 20200310809

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. Execution includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing results of the one or more reduction operations in the output register.
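
The abstract does not fix the grouping rule or the reduction operator. The following minimal Python sketch shows one plausible reading. It assumes that lanes whose indices hold the same value are "associated", that the default reduction is a sum, and that every output lane receives its group's result. The function name `indexed_reduce` is hypothetical.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def indexed_reduce(values, indices, op=add):
    """Reduce lanes that share an index value (hypothetical semantics).

    values[i] and indices[i] model lane i of the input and index registers.
    Each output lane receives the reduction of every lane in its group.
    """
    if len(values) != len(indices):
        raise ValueError("input and index registers must have the same lane count")
    groups = defaultdict(list)
    for value, index in zip(values, indices):
        groups[index].append(value)
    results = {index: reduce(op, members) for index, members in groups.items()}
    return [results[index] for index in indices]


print(indexed_reduce([1, 2, 3, 4, 5], [0, 1, 0, 2, 1]))  # [4, 7, 4, 4, 7]
```

You can swap the operator, for example passing `op=max`, to model other reductions.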

Type: Application

Filed: March 27, 2019

Publication date: October 1, 2020

Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR

Method and apparatus for vector-matrix comparison

Patent number: 10782971

Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction whose operands specify an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. Execution includes mapping each of the data element values of the vector to one of consecutive rows of the matrix and, for each data element value of the vector, comparing it with the data element values in the respective row of the matrix to obtain data element match results. Execution further includes storing the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.
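
The following Python sketch shows one way to read these semantics. It assumes that a column "matches" only when every element in it equals the vector element mapped to its row. The abstract does not define the vector match result this precisely. `vector_matrix_compare` is a hypothetical name.

```python
def vector_matrix_compare(vector, matrix):
    """Return one match flag per matrix column (hypothetical semantics).

    vector[i] is mapped to row i of the matrix. Column j reports 1 only if
    every element in that column equals the vector value mapped to its row.
    """
    if len(vector) != len(matrix):
        raise ValueError("need one matrix row per vector element")
    results = []
    for j in range(len(matrix[0])):
        # Per-element match results for column j, one per (vector value, row) pair.
        element_matches = [row[j] == value for value, row in zip(vector, matrix)]
        results.append(int(all(element_matches)))
    return results


m = [[1, 7, 1],
     [2, 2, 9]]
print(vector_matrix_compare([1, 2], m))  # [1, 0, 0]
```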

Type: Grant

Filed: March 30, 2019

Date of Patent: September 22, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar

APPARATUS AND METHOD FOR PROCESSING EFFICIENT MULTICAST OPERATION

Publication number: 20200272466

Abstract: An apparatus and method for processing efficient multicast operation.

Type: Application

Filed: May 13, 2020

Publication date: August 27, 2020

Inventors: CHRISTOPHER J. HUGHES, DAN BAUM

Systems and methods for performing matrix compress and decompress instructions

Patent number: 10719323

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
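
The following Python sketch illustrates the first compression scheme, which packs non-zero elements and records their positions in a header. For readability, it assumes a header layout of (row, column) pairs plus the matrix shape. The abstract does not describe the actual encoding, and the sketch does not model the second scheme (fewer bits per element).

```python
def compress(matrix):
    """Pack non-zero elements and record each one's (row, col) position in a header."""
    rows, cols = len(matrix), len(matrix[0])
    header, values = [], []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 0:
                header.append((r, c))
                values.append(matrix[r][c])
    return {"shape": (rows, cols), "header": header, "values": values}


def decompress(compressed):
    """Rebuild the dense matrix, filling every unlisted position with zero."""
    rows, cols = compressed["shape"]
    matrix = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(compressed["header"], compressed["values"]):
        matrix[r][c] = value
    return matrix


m = [[0, 5, 0],
     [3, 0, 0]]
packed = compress(m)
print(packed["values"], packed["header"])  # [5, 3] [(0, 1), (1, 0)]
```

A round trip through `decompress` restores the original matrix. The scheme only saves space when the matrix is sparse enough that the header costs less than the zeros it replaces.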

Type: Grant

Filed: September 27, 2018

Date of Patent: July 21, 2020

Assignee: Intel Corporation

Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke

Controlling displacement in a co-operative and adaptive multiple-level memory system

Patent number: 10705964

Abstract: In one embodiment, a processor includes a control logic to determine whether to enable an incoming data block associated with a first priority to displace, in a cache memory coupled to the processor, a candidate victim data block associated with a second priority and stored in the cache memory, based at least in part on the first and second priorities, a first access history associated with the incoming data block and a second access history associated with the candidate victim data block. Other embodiments are described and claimed.
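
The abstract names the inputs to the displacement decision but not the rule itself. The following Python sketch uses a hypothetical rule, chosen only to show how both priorities and both access histories might feed one decision:

- A higher-priority incoming block always displaces the victim.
- A lower-priority incoming block never does.
- Ties are broken by comparing recent hit counts, which stand in for the access histories.

```python
from dataclasses import dataclass


@dataclass
class Block:
    priority: int      # larger value = more important (assumption)
    recent_hits: int   # stand-in for the block's access history (assumption)


def may_displace(incoming: Block, victim: Block) -> bool:
    """Return True if `incoming` may evict `victim` (hypothetical policy)."""
    if incoming.priority != victim.priority:
        return incoming.priority > victim.priority
    # Equal priorities: let the access histories break the tie.
    return incoming.recent_hits >= victim.recent_hits


print(may_displace(Block(priority=2, recent_hits=0), Block(priority=1, recent_hits=9)))  # True
print(may_displace(Block(priority=1, recent_hits=1), Block(priority=1, recent_hits=3)))  # False
```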

Type: Grant

Filed: April 28, 2015

Date of Patent: July 7, 2020

Assignee: Intel Corporation

Inventors: Kshitij A. Doshi, Christopher J. Hughes

APPARATUSES, METHODS, AND SYSTEMS FOR STENCIL CONFIGURATION AND COMPUTATION INSTRUCTIONS

Publication number: 20200210174

Abstract: Systems, methods, and apparatuses relating to performing stencil configuration and computation operations are described.

Type: Application

Filed: December 29, 2018

Publication date: July 2, 2020

Inventors: Michael Espig, Christopher J. Hughes

SYSTEMS AND METHODS FOR PERFORMING DUPLICATE DETECTION INSTRUCTIONS ON 2D DATA

Publication number: 20200210182

Abstract: Disclosed embodiments relate to systems and methods for performing duplicate detection instructions on two-dimensional (2D) data. In one example, a processor includes fetch circuitry to fetch an instruction and decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix comprising M×N elements and a destination, the opcode indicating that execution circuitry is to use a plurality of comparators to discover duplicates in the source matrix and to store indications of the locations of discovered duplicates in the destination. The execution circuitry is to execute the decoded instruction as per the opcode.
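
A brute-force Python sketch makes the comparator idea concrete. Each pairwise equality test stands in for one hardware comparator. The output format is an assumption: an M×N mask in which 1 marks a duplicated value. The abstract only says that indications of the duplicates' locations are stored.

```python
from itertools import combinations


def find_duplicates(matrix):
    """Flag every element of an M x N matrix that equals some other element.

    Each pairwise comparison models one comparator; the result is an
    M x N mask where 1 marks an element whose value appears more than once.
    """
    rows, cols = len(matrix), len(matrix[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    mask = [[0] * cols for _ in range(rows)]
    for (r1, c1), (r2, c2) in combinations(cells, 2):
        if matrix[r1][c1] == matrix[r2][c2]:
            mask[r1][c1] = mask[r2][c2] = 1
    return mask


print(find_duplicates([[4, 1], [4, 2]]))  # [[1, 0], [1, 0]]
```

An M×N matrix needs (MN)(MN-1)/2 such comparisons, which is why a hardware version would run many comparators in parallel.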

Type: Application

Filed: December 26, 2018

Publication date: July 2, 2020

Inventors: Christopher J. HUGHES, Michael ESPIG, Dan BAUM, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL

SYSTEMS AND METHODS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES

Publication number: 20200210517

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix. The processor is to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
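
The core of this scheme is intersecting non-zero bitmasks so that only matching NZ pairs are multiplied. The following Python sketch models that arithmetic for C += A·B. It assumes plain lists of lists as input and computes one output element at a time. It does not model the PE grid, the two-element broadcast, or the stored NZ element mentioned in the abstract.

```python
def nz_bitmask(vector):
    """Return an integer whose bit k is set when vector[k] is non-zero."""
    mask = 0
    for k, value in enumerate(vector):
        if value != 0:
            mask |= 1 << k
    return mask


def sparse_matmul_accumulate(a, b, c):
    """Compute C += A @ B, multiplying only pairs where both operands are non-zero."""
    inner = len(b)
    for i, a_row in enumerate(a):
        row_mask = nz_bitmask(a_row)
        for j in range(len(b[0])):
            b_col = [b[k][j] for k in range(inner)]
            both = row_mask & nz_bitmask(b_col)  # positions where both are non-zero
            for k in range(inner):
                if (both >> k) & 1:
                    c[i][j] += a_row[k] * b_col[k]
    return c


print(sparse_matmul_accumulate([[1, 0], [0, 2]], [[3, 0], [0, 4]], [[1, 1], [1, 1]]))
# [[4, 1], [1, 9]]
```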

Type: Application

Filed: December 27, 2018

Publication date: July 2, 2020

Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE

SYSTEMS AND METHODS FOR PERFORMING NIBBLE-SIZED OPERATIONS ON MATRIX ELEMENTS

Publication number: 20200210173

Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction and decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices. The opcode indicates that the processor is to perform three steps for each pair of corresponding elements of the first and second source matrices: logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
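
The following Python sketch models nibble-partitioned element-wise operations. It assumes 8-bit elements by default and truncates each per-nibble result to 4 bits, so no carry crosses into a neighboring nibble. The abstract does not state which operations the instruction supports or how overflow is handled; `add` here is only an example.

```python
from operator import add


def nibble_op(x, y, op, element_bits=8):
    """Apply `op` independently to each 4-bit partition of elements x and y.

    Each per-nibble result is truncated to 4 bits, so carries never cross
    a nibble boundary. element_bits must be a multiple of 4.
    """
    if element_bits % 4:
        raise ValueError("element width must be a multiple of 4 bits")
    result = 0
    for shift in range(0, element_bits, 4):
        a = (x >> shift) & 0xF
        b = (y >> shift) & 0xF
        result |= (op(a, b) & 0xF) << shift
    return result


def nibble_matrix_op(src1, src2, op, element_bits=8):
    """Apply nibble_op to each pair of corresponding elements of two matrices."""
    return [[nibble_op(a, b, op, element_bits) for a, b in zip(row1, row2)]
            for row1, row2 in zip(src1, src2)]


print(hex(nibble_op(0x9F, 0x81, add)))  # 0x10
```

In this example, the low nibbles 0xF + 0x1 wrap to 0x0 without carrying into the high nibble, whose sum 0x9 + 0x8 wraps to 0x1.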

Type: Application

Filed: December 26, 2018

Publication date: July 2, 2020

Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE

SYSTEMS AND METHODS FOR PERFORMING MATRIX ROW- AND COLUMN-WISE PERMUTE INSTRUCTIONS

Publication number: 20200210188

Abstract: Disclosed embodiments relate to systems and methods for performing matrix row-wise and column-wise permute instructions. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction as per the opcode. The instruction has fields to specify an opcode and locations of a source matrix and a destination matrix. The opcode indicates that the processor is to perform a permutation by copying, into each of a plurality of equal-sized logical partitions of the destination matrix, a selected logical partition of a same size from the source matrix, the selection being indicated by a permute control.
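
The following Python sketch shows the permutation. It assumes that the logical partitions are whole rows or whole columns and that the permute control is a list of source indices, one per destination partition. The abstract allows equal-sized partitions more generally.

```python
def permute_rows(source, control):
    """Row-wise permute: destination row i is a copy of source row control[i]."""
    return [list(source[sel]) for sel in control]


def permute_columns(source, control):
    """Column-wise permute: destination column j is a copy of source column control[j]."""
    return [[row[sel] for sel in control] for row in source]


src = [[1, 2],
       [3, 4],
       [5, 6]]
print(permute_rows(src, [2, 0, 0]))  # [[5, 6], [1, 2], [1, 2]]
print(permute_columns(src, [1, 0]))  # [[2, 1], [4, 3], [6, 5]]
```

Because each destination partition selects its source independently, a control may repeat an index, which broadcasts one source row or column to several destinations.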

Type: Application

Filed: December 27, 2018

Publication date: July 2, 2020

Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE

APPARATUSES, METHODS, AND SYSTEMS FOR FAST FOURIER TRANSFORM CONFIGURATION AND COMPUTATION INSTRUCTIONS

Publication number: 20200210516

Abstract: Systems, methods, and apparatuses relating to performing fast Fourier transform (FFT) configuration and computation operations are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of processing element circuits; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to cause the two-dimensional grid of processing element circuits to operate on a first packed data input value and a first complex twiddle factor value to produce a first result and a second result.
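
The abstract describes producing a first and a second result from one input and one complex twiddle factor. This corresponds naturally to a radix-2 butterfly, although the abstract does not state the radix. The following Python sketch assumes that reading and builds a small recursive FFT from the butterfly. It models the arithmetic only, not the processing-element grid or the register layout.

```python
import cmath


def butterfly(a: complex, b: complex, twiddle: complex):
    """Radix-2 butterfly: produce two results from one input pair and one twiddle factor."""
    t = twiddle * b
    return a + t, a - t


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```

For example, `fft([0, 1, 0, 0])` returns values within floating-point rounding of `[1, -1j, -1, 1j]`.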

Type: Application

Filed: December 29, 2018

Publication date: July 2, 2020

Inventors: Michael Espig, Christopher J. Hughes, Jongsoo Park

SYSTEMS AND METHODS TO TRANSPOSE VECTORS ON-THE-FLY WHILE LOADING FROM MEMORY

Publication number: 20200201640

Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction as per the opcode. The instruction has fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer. The opcode indicates that the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register. The execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
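
The load-time transpose regroups N groups of elements into N-tuples. The following minimal Python sketch assumes that the N groups are contiguous and equal-sized in the source. It shows only the data movement, not the shuffle pipeline that performs it in hardware.

```python
def transpose_load(source, n_groups):
    """Interleave N equal-sized groups into N-tuples.

    With groups [a0 a1 a2][b0 b1 b2], the result is a0 b0 a1 b1 a2 b2.
    """
    if n_groups <= 0 or len(source) % n_groups:
        raise ValueError("source length must be a positive multiple of n_groups")
    group_len = len(source) // n_groups
    groups = [source[g * group_len:(g + 1) * group_len] for g in range(n_groups)]
    # Emit element i of every group before moving on to element i + 1.
    return [groups[g][i] for i in range(group_len) for g in range(n_groups)]


print(transpose_load(["a0", "a1", "a2", "b0", "b1", "b2"], 2))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

A typical use is converting structure-of-arrays data into array-of-structures order (or vice versa) during the load itself, rather than in a separate shuffle step.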

Type: Application

Filed: December 21, 2018

Publication date: June 25, 2020

Inventors: Alexander F. HEINECKE, Evangelos GEORGANAS, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE

Dynamic home tile mapping

Patent number: 10678689

Abstract: Technologies for migration of dynamic home tile mapping are described. An apparatus includes means for receiving coherence messages from other processor cores on the die, means for recording locations from which the coherence messages originate, and means for determining distances between the requested home tiles and the locations from which the coherence messages originate. The apparatus also includes means for determining whether the average distance between a particular home tile, whose identifier is stored in the home tile table, and the locations from which its coherence messages originate exceeds a defined threshold. When the average distance exceeds the defined threshold, the apparatus includes means for migrating the particular home tile to another location.
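
The following rough software model of the migration test relies on three assumptions that the abstract does not state:

- Tiles are (x, y) coordinates on a 2D mesh.
- Distance is counted in Manhattan hops.
- A migrating home tile moves to whichever candidate location is closest, on average, to the recorded requesters.

`HomeTileMonitor` and its methods are hypothetical names.

```python
from collections import defaultdict


def hops(a, b):
    """Manhattan distance between two (x, y) tile coordinates on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


class HomeTileMonitor:
    """Record where coherence messages for each home tile originate (hypothetical model)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.origins = defaultdict(list)  # home tile -> origins of its coherence messages

    def record(self, home_tile, origin):
        self.origins[home_tile].append(origin)

    def average_distance(self, home_tile, location=None):
        """Average hops from `location` (default: the home tile) to recorded origins."""
        location = home_tile if location is None else location
        origins = self.origins[home_tile]
        return sum(hops(location, o) for o in origins) / len(origins)

    def maybe_migrate(self, home_tile, candidates):
        """Return the home tile's new location, or the current one if no move is needed."""
        if not self.origins[home_tile] or self.average_distance(home_tile) <= self.threshold:
            return home_tile
        # Assumed policy: move to the candidate closest, on average, to the requesters.
        return min(candidates, key=lambda c: self.average_distance(home_tile, c))


monitor = HomeTileMonitor(threshold=2)
for _ in range(3):
    monitor.record((0, 0), (3, 3))
print(monitor.maybe_migrate((0, 0), [(2, 2), (3, 2)]))  # (3, 2)
```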

Type: Grant

Filed: April 12, 2019

Date of Patent: June 9, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Daehyun Kim, Jong Soo Park, Richard M. Yoo

METHOD AND APPARATUS FOR VECTORIZING HISTOGRAM LOOPS

Publication number: 20200174790

Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements and determining a number of instances of each distinct data value within the vector. A system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, a source vector identifier, and an immediate value, wherein the execution circuit is to, for each data element position of a source vector, determine a number of matching data element positions in the source vector storing a same data value as stored at the data element position, the matching data element positions located between the data element position and a least significant data element position of the source vector, and store in a corresponding data element position of a destination vector identified by the destination vector identifier, a value representing the number of matching data element positions.
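
A vectorized histogram loop gathers bin counts, increments them, and scatters them back. Without help, duplicate bins within one vector would lose updates. The following Python sketch shows how per-lane counts of earlier matching lanes fix this. It assumes that a lane does not count itself and ignores the immediate operand, whose role the abstract does not describe. The histogram update is an illustrative use of the instruction's result, not part of the instruction.

```python
def conflict_counts(source):
    """For each lane, count lower-numbered lanes holding the same value."""
    return [source[:lane].count(value) for lane, value in enumerate(source)]


def vector_histogram_update(histogram, bins):
    """Apply one vector of bin indices to a histogram using the conflict counts."""
    counts = conflict_counts(bins)
    gathered = [histogram[b] for b in bins]          # gather every lane before updating
    for b, old, count in zip(bins, gathered, counts):
        histogram[b] = old + count + 1               # scatter in lane order; last lane wins
    return histogram


print(conflict_counts([5, 3, 5, 5, 3]))  # [0, 0, 1, 2, 1]
```

The last lane holding each bin value carries the largest count. When the scatter writes lanes in order, that lane's write lands last and leaves the correct total.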

Type: Application

Filed: June 30, 2017

Publication date: June 4, 2020

Applicant: Intel Corporation

Inventors: Mikhail PLOTNIKOV, Christopher J. HUGHES, Andrey NARAIKIN

Delayed prefetch manager to multicast an updated cache line to processor cores requesting the updated data

Patent number: 10664273

Abstract: An apparatus and method for processing efficient multicast operation.

Type: Grant

Filed: March 30, 2018

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Dan Baum

Systems and methods for implementing chained tile operations

Patent number: 10664287

Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.
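
The following Python sketch is a hypothetical software model of the chaining idea: a result moves directly from one processing step to the next, and the first instruction's specified destination is set aside. Tiles are modeled as lists of rows, and instructions as (op, src_a, src_b, dst) tuples. The sketch assumes that the intermediate value is not read anywhere outside the chain. It does not model the dynamic selection of SIMD paths.

```python
from operator import add, mul


def elementwise(op, tile_a, tile_b):
    """Apply a binary op to corresponding elements of two tiles (lists of rows)."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(tile_a, tile_b)]


def run_chain(registers, chain):
    """Execute (op, src_a, src_b, dst) tile instructions, forwarding chained results.

    If the next instruction reads this instruction's destination, the result is
    handed straight to it and the destination register is not written.
    """
    forwarded = {}  # result passed from the previous "PE", keyed by its destination name
    for i, (op, src_a, src_b, dst) in enumerate(chain):
        a = forwarded[src_a] if src_a in forwarded else registers[src_a]
        b = forwarded[src_b] if src_b in forwarded else registers[src_b]
        result = elementwise(op, a, b)
        following = chain[i + 1] if i + 1 < len(chain) else None
        if following is not None and dst in following[1:3]:
            forwarded = {dst: result}   # route to the next PE; set aside the destination
        else:
            registers[dst] = result
            forwarded = {}
    return registers


regs = {"t0": [[1, 2]], "t1": [[3, 4]], "t2": [[10, 10]], "t3": [[0, 0]], "t4": [[0, 0]]}
run_chain(regs, [(add, "t0", "t1", "t3"), (mul, "t3", "t2", "t4")])
print(regs["t3"], regs["t4"])  # [[0, 0]] [[40, 60]]
```

In this example, register `t3` keeps its old contents, because the sum is consumed directly by the multiply.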

Type: Grant

Filed: March 30, 2018

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall

Application driven hardware cache management

Patent number: 10664199

Abstract: A processor includes a processing core to generate a memory request for application data of an application. The processor also includes a virtual page group memory management (VPGMM) unit coupled to the processing core to specify a caching priority (CP) for the application data. The caching priority identifies the importance of the application data in a cache.
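
The following Python sketch models how a per-page-group caching priority might steer eviction. Several details are assumptions, not statements from the abstract:

- Pages are 4 KiB.
- A higher CP means "keep longer".
- Unassigned pages get CP 0.
- The victim is the cached address with the lowest CP.

`PageGroupPriorities` and `choose_victim` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageGroupPriorities:
    """Hypothetical software model of a VPGMM-style unit.

    Virtual pages are grouped, and each group carries a caching priority (CP);
    a higher CP marks data as more important to keep in the cache.
    """

    def __init__(self, default_cp=0):
        self.default_cp = default_cp
        self.group_of_page = {}   # virtual page number -> group name
        self.cp_of_group = {}     # group name -> caching priority

    def assign(self, group, pages, cp):
        for page in pages:
            self.group_of_page[page] = group
        self.cp_of_group[group] = cp

    def cp_for(self, address):
        group = self.group_of_page.get(address // PAGE_SIZE)
        return self.cp_of_group.get(group, self.default_cp)


def choose_victim(priorities, cached_addresses):
    """Evict the cached address whose page group has the lowest caching priority."""
    return min(cached_addresses, key=priorities.cp_for)


cps = PageGroupPriorities()
cps.assign("lookup_table", pages=[0], cp=3)
cps.assign("stream_buffer", pages=[1], cp=1)
print(choose_victim(cps, [100, PAGE_SIZE + 8]))  # 4104
```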

Type: Grant

Filed: November 13, 2018

Date of Patent: May 26, 2020

Assignee: Intel Corporation

Inventors: Subramanya R. Dulloor, Rajesh M. Sankaran, David A. Koufaty, Christopher J. Hughes, Jong Soo Park, Sheng Li

METHOD AND APPARATUS FOR DATA-READY MEMORY OPERATIONS

Publication number: 20200142699

Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask. For each data element, the execution circuit is to determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator. If the data access condition is not satisfied, the execution circuit is to generate a prefetch request for the data element and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.
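
The following Python sketch models the per-element readiness check. The abstract leaves the condition to the data readiness indicator; here the data access condition is assumed to be "the element's cache line is already resident", with 64-byte lines. Lanes that are not ready are left as `None` and get a prefetch request plus a mask bit. The function name and all parameters are hypothetical.

```python
LINE_BYTES = 64  # assumed cache-line size


def data_ready_load(memory, addresses, cached_lines, prefetch_queue):
    """Load each element only if its cache line is already resident.

    Returns the loaded values (None for lanes that were not ready) and a mask
    with bit `lane` set when that lane's request did not meet the condition.
    """
    values, not_ready_mask = [], 0
    for lane, address in enumerate(addresses):
        line = address // LINE_BYTES
        if line in cached_lines:
            values.append(memory[address])
        else:
            prefetch_queue.append(line)       # generate a prefetch request for the element
            not_ready_mask |= 1 << lane       # record that the condition was not satisfied
            values.append(None)
    return values, not_ready_mask


queue = []
print(data_ready_load({0: 7, 128: 9}, [0, 128], cached_lines={0}, prefetch_queue=queue))
# ([7, None], 2)
print(queue)  # [2]
```

A caller can use the mask to process ready elements now and retry the others later, after the prefetches have had time to complete.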

Type: Application

Filed: June 30, 2017

Publication date: May 7, 2020

Applicant: Intel Corporation

Inventors: William M. BROWN, Mikhail PLOTNIKOV, Christopher J. HUGHES

DATA ELEMENT REARRANGEMENT, PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20200117451

Abstract: A processor includes a decode unit to decode an instruction indicating a source packed data operand having source data elements and indicating a destination storage location. Each of the source data elements has a source data element value and a source data element position. An execution unit, in response to the instruction, stores a result packed data operand having result data elements each having a result data element value and a result data element position. Each result data element value is one of: (1) equal to a source data element position of a source data element, closest to one end of the source operand, having a source data element value equal to the result data element position of the result data element; and (2) a replacement value, when no source data element has a source data element value equal to the result data element position of the result data element.
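
Read literally, the result is an inverse mapping: result position j receives the position of a source element whose value is j. The following Python sketch makes two assumptions: "closest to one end" means the least significant (lowest) position, and -1 is the replacement value.

```python
def rearrange(source, replacement=-1):
    """result[j] = lowest position p with source[p] == j, else `replacement`."""
    result = [replacement] * len(source)
    # Walk from the highest position down so the lowest matching position wins.
    for position in reversed(range(len(source))):
        value = source[position]
        if 0 <= value < len(result):
            result[value] = position
    return result


print(rearrange([2, 0, 2, 5]))  # [1, -1, 0, -1]
print(rearrange([1, 2, 0]))     # [2, 0, 1]
```

When the source is a permutation, as in the second call, the result is exactly its inverse permutation.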

Type: Application

Filed: December 9, 2019

Publication date: April 16, 2020

Inventors: Christopher J. Hughes, Jong Soo Park

SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS SPECIFYING VECTOR TILE LOGIC OPERATIONS

Publication number: 20200104132

Abstract: Disclosed embodiments relate to systems and methods for performing instructions structured to compute a min/max value of a vector. In one example, a processor executes a decoded single instruction to do three things. First, it determines, for each data element position of the identified first and second operands, a maximum or minimum. Second, it stores the determined maxima or minima in corresponding data element positions of the identified first operand. Third, it determines and stores, in each data element position of the identified third operand, an indication of where the maximum or minimum came from.
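
The following Python sketch models the per-lane selection. It writes the winners back into the first operand and returns the "where it came from" indication as a separate list. The 0/1 encoding (0 = first operand, 1 = second) and keeping the first operand on ties are assumptions; the abstract does not specify either.

```python
def select_with_source(first, second, minimum=False):
    """Per-lane max (or min) of two operands, written back into `first`.

    Returns a per-lane indicator of where each result came from:
    0 = first operand, 1 = second operand. Ties keep the first operand.
    """
    origin = []
    for lane, (a, b) in enumerate(zip(first, second)):
        take_second = b < a if minimum else b > a
        if take_second:
            first[lane] = b
        origin.append(int(take_second))
    return origin


a = [1, 5, 3]
print(select_with_source(a, [4, 2, 3]), a)  # [1, 0, 0] [4, 5, 3]
```

Keeping the source indication alongside the values is what lets an argmax or argmin be tracked across repeated applications without a second pass.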

Type: Application

Filed: September 29, 2018

Publication date: April 2, 2020

Inventors: Sunny L. GOGAR, Rama Kishan V. MALLADI, Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES
