Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Systems and methods for performing instructions to transform matrices into row-interleaved format

Patent number: 10963256

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.

Type: Grant

Filed: September 28, 2018

Date of Patent: March 30, 2021

Assignee: Intel Corporation

Inventors: Raanan Sade, Robert Valentine, Bret Toll, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions

Patent number: 10942985

Abstract: Systems, methods, and apparatuses relating to performing fast Fourier transform (FFT) configuration and computation operations are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of processing element circuits; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to cause the two-dimensional grid of processing element circuits to operate on a first packed data input value and a first complex twiddle factor value to produce a first result and a second result.

Type: Grant

Filed: December 29, 2018

Date of Patent: March 9, 2021

Assignee: Intel Corporation

Inventors: Michael Espig, Christopher J. Hughes, Jongsoo Park
Systems, apparatuses, and methods for data speculation execution

Patent number: 10942744

Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for DSX comprises execution hardware to execute instructions to begin and end a data speculative execution (DSX) and speculative instructions during the DSX, and DSX tracking hardware to track speculative memory accesses and detect ordering violations in a DSX of speculative instructions using a sequence number, addresses of instruction accesses, and whether an instruction being tracked is a write, and to trigger a mis-speculation upon an ordering violation.

Type: Grant

Filed: December 24, 2014

Date of Patent: March 9, 2021

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar
Method and apparatus for efficient matrix alignment in a systolic array

Patent number: 10929143

Abstract: An apparatus and method for efficient matrix alignment in a systolic array.

Type: Grant

Filed: September 28, 2018

Date of Patent: February 23, 2021

Assignee: Intel Corporation

Inventors: Mike Espig, Bret Toll, Raanan Sade, Bob Valentine, Alexander Heinecke, Christopher J. Hughes
No-locality hint vector memory access processors, methods, systems, and instructions

Patent number: 10929298

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

Type: Grant

Filed: February 15, 2019

Date of Patent: February 23, 2021

Assignee: Intel Corporation

Inventor: Christopher J. Hughes
Systems and methods for performing vector max/min instructions that also generate index values

Patent number: 10922080

Abstract: Disclosed embodiments relate to systems and methods for performing instructions structured to compute a min/max value of a vector. In one example, a processor executes a decoded single instruction to determine on a per data element position of the identified first and second operands a maximum or minimum, store the determined maximum or minimums in corresponding data element positions of the identified first operand, and determine and store, in each data element position of the identified third operand, an indication of where the maximum or minimum came from.

Type: Grant

Filed: September 29, 2018

Date of Patent: February 16, 2021

Assignee: Intel Corporation

Inventors: Sunny L. Gogar, Rama Kishan V. Malladi, Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes
Apparatuses, methods, and systems for stencil configuration and computation instructions

Patent number: 10922077

Abstract: Systems, methods, and apparatuses relating to performing stencil configuration and computation operations are described.

Type: Grant

Filed: December 29, 2018

Date of Patent: February 16, 2021

Assignee: Intel Corporation

Inventors: Michael Espig, Christopher J. Hughes
Systems for performing instructions for fast element unpacking into 2-dimensional registers

Patent number: 10896043

Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.

Type: Grant

Filed: September 28, 2018

Date of Patent: January 19, 2021

Assignee: Intel Corporation

Inventors: Bret Toll, Alexander F. Heinecke, Christopher J. Hughes, Ronen Zohar, Michael Espig, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall
HARDWARE/SOFTWARE CO-OPTIMIZATION TO IMPROVE PERFORMANCE AND ENERGY FOR INTER-VM COMMUNICATION FOR NFVS AND OTHER PRODUCER-CONSUMER WORKLOADS

Publication number: 20210004328

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Application

Filed: September 21, 2020

Publication date: January 7, 2021

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
System and method of loop vectorization by compressing indices and data elements from iterations based on a control mask

Patent number: 10884744

Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.

Type: Grant

Filed: August 18, 2017

Date of Patent: January 5, 2021

Assignee: Intel Corporation

Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
Systems and methods for performing instructions to transpose rectangular tiles

Patent number: 10866786

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.

Type: Grant

Filed: September 27, 2018

Date of Patent: December 15, 2020

Assignee: Intel Corporation

Inventors: Raanan Sade, Robert Valentine, Mark J. Charney, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Bret Toll, Jesus Corbal, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall
SYSTEMS AND METHODS FOR IMPLEMENTING CHAINED TILE OPERATIONS

Publication number: 20200387383

Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.

Type: Application

Filed: April 30, 2020

Publication date: December 10, 2020

Applicant: Intel Corporation

Inventors: Christopher J. HUGHES, Alexander F. HEINECKE, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
Apparatus and method for processing structure of arrays (SoA) and array of structures (AoS) data

Patent number: 10838734

Abstract: An apparatus and method for processing array of structures (AoS) and structure of arrays (SoA) data. For example, one embodiment of a processor comprises: a destination tile register to store data elements in a structure of arrays (SoA) format; a first source tile register to store indices associated with the data elements; instruction fetch circuitry to fetch an array of structures (AoS) gather instruction comprising operands identifying the first source tile register and the destination tile register; a decoder to decode the AoS gather instruction; and execution circuitry to determine a plurality of system memory addresses based on the indices from the first source tile register, to read data elements from the system memory addresses in an AoS format, and to load the data elements to the destination tile register in an SoA format.

Type: Grant

Filed: September 24, 2018

Date of Patent: November 17, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Bret Toll, Alexander Heinecke, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark Charney
SYSTEMS AND METHODS FOR PERFORMING MATRIX COMPRESS AND DECOMPRESS INSTRUCTIONS

Publication number: 20200348937

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Application

Filed: July 20, 2020

Publication date: November 5, 2020

Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
Method and apparatus for vector-matrix comparison

Patent number: 10817297

Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specifies an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.

Type: Grant

Filed: March 30, 2019

Date of Patent: October 27, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar
Hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads

Patent number: 10817425

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Grant

Filed: December 26, 2014

Date of Patent: October 27, 2020

Assignee: Intel Corporation

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
SPATIAL AND TEMPORAL MERGING OF REMOTE ATOMIC OPERATIONS

Publication number: 20200319886

Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations.

Type: Application

Filed: February 24, 2020

Publication date: October 8, 2020

Applicant: Intel Corporation

Inventors: Christopher J. Hughes, Joseph Nuzman, Jonas Svennebring, Doddaballapur N. Jayasimha, Samantika S. Sury, David A. Koufaty, Niall D. McDonnell, Yen-Cheng Liu, Stephen R. Van Doren, Stephen J. Robinson
METHOD AND APPARATUS FOR VECTOR-MATRIX COMPARISON

Publication number: 20200310804

Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specifies an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.

Type: Application

Filed: March 30, 2019

Publication date: October 1, 2020

Inventors: Christopher J. HUGHES, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
METHOD AND APPARATUS FOR PERFORMING REDUCTION OPERATIONS ON A PLURALITY OF DATA ELEMENT VALUES

Publication number: 20200310809

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.

Type: Application

Filed: March 27, 2019

Publication date: October 1, 2020

Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
SYSTEMS AND METHODS FOR RECONFIGURABLE SYSTOLIC ARRAYS

Publication number: 20200311021

Abstract: Systems and techniques are provided for hardware architecture used in parallel computing applications to improve computation efficiency. An integrated circuit system may include a data store that stores data for processing and a reconfigurable systolic array that may process the data. The reconfigurable systolic array may include a first row of processing elements (PE) that process the data according to a first function and a second row of PE that process the data according to a second function. The reconfigurable systolic array may also include a routing block coupled to the first row of PE, the second row of PE, and the data store. Further, the reconfigurable systolic array may receive data from the first row of PE, transmit the data received from the first row of PE to the second row of PE, and transmit data output by the second row of PE to the first row of PE.

Type: Application

Filed: March 31, 2019

Publication date: October 1, 2020

Inventors: Kamlesh R. Pillai, Christopher J. Hughes

prev 1 2 3 4 5 6 7 8 9 … next