Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods, apparatus, instructions and logic to provide vector packed tuple cross-comparison functionality

Patent number: 10203955

Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.

Type: Grant

Filed: December 31, 2014

Date of Patent: February 12, 2019

Assignee: Intel Corporation

Inventors: Robert Valentine, Christopher J. Hughes, Mark J. Charney, Zeev Sperber, Amit Gradstein, Simon Rubanovich, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil
SYSTEMS FOR PERFORMING INSTRUCTIONS FOR FAST ELEMENT UNPACKING INTO 2-DIMENSIONAL REGISTERS

Publication number: 20190042245

Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.

Type: Application

Filed: September 28, 2018

Publication date: February 7, 2019

Inventors: Bret TOLL, Alexander F. HEINECKE, Christopher J. HUGHES, Ronen ZOHAR, Michael ESPIG, Dan BAUM, Raanan SADE, Robert VALENTINE
SYSTEMS AND METHODS FOR PERFORMING MATRIX COMPRESS AND DECOMPRESS INSTRUCTIONS

Publication number: 20190042257

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Application

Filed: September 27, 2018

Publication date: February 7, 2019

Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS SPECIFYING TERNARY TILE LOGIC OPERATIONS

Publication number: 20190042260

Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.

Type: Application

Filed: September 14, 2018

Publication date: February 7, 2019

Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Bret TOLL, Dan BAUM, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
SYSTEMS AND METHODS FOR PERFORMING HORIZONTAL TILE OPERATIONS

Publication number: 20190042261

Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations.

Type: Application

Filed: September 14, 2018

Publication date: February 7, 2019

Inventors: Christopher J. HUGHES, Bret TOLL, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSPOSE RECTANGULAR TILES

Publication number: 20190042202

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.

Type: Application

Filed: September 27, 2018

Publication date: February 7, 2019

Inventors: Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Bret TOLL, Jesus CORBAL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL
Hybrid hardware and software implementation of transactional memory access

Patent number: 10180903

Abstract: Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations.

Type: Grant

Filed: April 1, 2017

Date of Patent: January 15, 2019

Assignee: INTEL CORPORATION

Inventors: Sanjeev Kumar, Christopher J. Hughes, Partha Kundu, Anthony Nguyen
Read and Write Masks Update Instruction for Vectorization of Recursive Computations Over Independent Data

Publication number: 20190012171

Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.

Type: Application

Filed: April 2, 2018

Publication date: January 10, 2019

Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE

Publication number: 20190012266

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Type: Application

Filed: August 28, 2018

Publication date: January 10, 2019

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
Gathering and scattering multiple data elements

Patent number: 10175990

Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.

Type: Grant

Filed: May 20, 2013

Date of Patent: January 8, 2019

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Yen-Kuang (Y. K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
INSTRUCTIONS FOR REMOTE ATOMIC OPERATIONS

Publication number: 20190004810

Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier, decoding, by decode circuitry, the fetched instruction, selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system, scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.

Type: Application

Filed: June 29, 2017

Publication date: January 3, 2019

Inventors: Doddaballapur N. Jayasimha, Jonas Svennebring, Samantika S. Sury, Christopher J. Hughes, Jong Soo Park, Lingxiang Xiang
Methods, apparatus, instructions and logic to provide permute controls with leading zero count functionality

Patent number: 10162637

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

Type: Grant

Filed: March 5, 2018

Date of Patent: December 25, 2018

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
Methods, apparatus, instructions and logic to provide permute controls with leading zero count functionality

Patent number: 10162639

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

Type: Grant

Filed: March 5, 2018

Date of Patent: December 25, 2018

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
Methods, apparatus, instructions and logic to provide permute controls with leading zero count functionality

Patent number: 10162638

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

Type: Grant

Filed: March 5, 2018

Date of Patent: December 25, 2018

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
Instruction and logic to provide pushing buffer copy and store functionality

Patent number: 10152325

Abstract: Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.

Type: Grant

Filed: February 7, 2017

Date of Patent: December 11, 2018

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Changkyu Kim, Daehyun Kim, Victor W. Lee, Jong Soo Park
Application driven hardware cache management

Patent number: 10126985

Abstract: A processor includes a processing core to generate a memory request for an application data in an application. The processor also includes a virtual page group memory management (VPGMM) unit coupled to the processing core to specify a caching priority (CP) to the application data for the application. The caching priority identifies importance of the application data in a cache.

Type: Grant

Filed: June 24, 2015

Date of Patent: November 13, 2018

Assignee: Intel Corporation

Inventors: Subramanya R. Dulloor, Rajesh M. Sankaran, David A. Koufaty, Christopher J. Hughes, Jong Soo Park, Sheng Li
Apparatus and method for lazy translation lookaside buffer (TLB) coherence

Patent number: 10120814

Abstract: An apparatus and method are described for managing TLB coherence. For example, one embodiment of a processor comprises: one or more cores to execute instructions and process data; one or more translation lookaside buffers (TLBs) each comprising a plurality of entries to cache virtual-to-physical address translations usable by the set of one or more cores when executing the instructions; one or more epoch counters each programmed with a specified epoch value; and TLB validation logic to validate a specified set of TLB entries at intervals specified by the epoch value.

Type: Grant

Filed: April 1, 2016

Date of Patent: November 6, 2018

Assignee: Intel Corporation

Inventors: Kshitij A. Doshi, Christopher J. Hughes
Gathering and scattering multiple data elements

Patent number: 10114651

Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.

Type: Grant

Filed: January 4, 2018

Date of Patent: October 30, 2018

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Yen-Kuang (Y. K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
Hybrid hardware and software implementation of transactional memory access

Patent number: 10102123

Abstract: Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations.

Type: Grant

Filed: October 20, 2016

Date of Patent: October 16, 2018

Assignee: Intel Corporation

Inventors: Sanjeev Kumar, Christopher J. Hughes, Partha Kundu, Anthony Nguyen
System, Apparatus And Method For Selective Enabling Of Locality-Based Instruction Handling

Publication number: 20180285280

Abstract: In an embodiment, a processor includes a sparse access buffer having a plurality of entries each to store for a memory access instruction to a particular address, address information and count information; and a memory controller to issue read requests to a memory, the memory controller including a locality controller to receive a memory access instruction having a no-locality hint and override the no-locality hint based at least in part on the count information stored in an entry of the sparse access buffer. Other embodiments are described and claimed.

Type: Application

Filed: March 31, 2017

Publication date: October 4, 2018

Inventors: Berkin Akin, Rajat Agarwal, Jong Soo Park, Christopher J. Hughes, Chiachen Chou

prev … 5 6 7 8 9 10 11 12 13 … next