Patents by Inventor Christopher J. Hughes

Christopher J. Hughes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10203955
    Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: February 12, 2019
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Christopher J. Hughes, Mark J. Charney, Zeev Sperber, Amit Gradstein, Simon Rubanovich, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil
  • Publication number: 20190042245
    Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
    Type: Application
    Filed: September 28, 2018
    Publication date: February 7, 2019
    Inventors: Bret TOLL, Alexander F. HEINECKE, Christopher J. HUGHES, Ronen ZOHAR, Michael ESPIG, Dan BAUM, Raanan SADE, Robert VALENTINE
  • Publication number: 20190042257
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Application
    Filed: September 27, 2018
    Publication date: February 7, 2019
    Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
  • Publication number: 20190042260
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.
    Type: Application
    Filed: September 14, 2018
    Publication date: February 7, 2019
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Bret TOLL, Dan BAUM, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20190042261
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations.
    Type: Application
    Filed: September 14, 2018
    Publication date: February 7, 2019
    Inventors: Christopher J. HUGHES, Bret TOLL, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20190042202
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
    Type: Application
    Filed: September 27, 2018
    Publication date: February 7, 2019
    Inventors: Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Bret TOLL, Jesus CORBAL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL
  • Patent number: 10180903
    Abstract: Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations.
    Type: Grant
    Filed: April 1, 2017
    Date of Patent: January 15, 2019
    Assignee: INTEL CORPORATION
    Inventors: Sanjeev Kumar, Christopher J. Hughes, Partha Kundu, Anthony Nguyen
  • Publication number: 20190012171
    Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.
    Type: Application
    Filed: April 2, 2018
    Publication date: January 10, 2019
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Christopher J. Hughes
  • Publication number: 20190012266
    Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
    Type: Application
    Filed: August 28, 2018
    Publication date: January 10, 2019
    Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
  • Patent number: 10175990
    Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.
    Type: Grant
    Filed: May 20, 2013
    Date of Patent: January 8, 2019
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Yen-Kuang (Y. K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
  • Publication number: 20190004810
    Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier, decoding, by decode circuitry, the fetched instruction, selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system, scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.
    Type: Application
    Filed: June 29, 2017
    Publication date: January 3, 2019
    Inventors: Doddaballapur N. Jayasimha, Jonas Svennebring, Samantika S. Sury, Christopher J. Hughes, Jong Soo Park, Lingxiang Xiang
  • Patent number: 10162637
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10162639
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10162638
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 10152325
    Abstract: Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core.
    Type: Grant
    Filed: February 7, 2017
    Date of Patent: December 11, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Changkyu Kim, Daehyun Kim, Victor W. Lee, Jong Soo Park
  • Patent number: 10126985
    Abstract: A processor includes a processing core to generate a memory request for an application data in an application. The processor also includes a virtual page group memory management (VPGMM) unit coupled to the processing core to specify a caching priority (CP) to the application data for the application. The caching priority identifies importance of the application data in a cache.
    Type: Grant
    Filed: June 24, 2015
    Date of Patent: November 13, 2018
    Assignee: Intel Corporation
    Inventors: Subramanya R. Dulloor, Rajesh M. Sankaran, David A. Koufaty, Christopher J. Hughes, Jong Soo Park, Sheng Li
  • Patent number: 10120814
    Abstract: An apparatus and method are described for managing TLB coherence. For example, one embodiment of a processor comprises: one or more cores to execute instructions and process data; one or more translation lookaside buffers (TLBs) each comprising a plurality of entries to cache virtual-to-physical address translations usable by the set of one or more cores when executing the instructions; one or more epoch counters each programmed with a specified epoch value; and TLB validation logic to validate a specified set of TLB entries at intervals specified by the epoch value.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: November 6, 2018
    Assignee: Intel Corporation
    Inventors: Kshitij A. Doshi, Christopher J. Hughes
  • Patent number: 10114651
    Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.
    Type: Grant
    Filed: January 4, 2018
    Date of Patent: October 30, 2018
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Yen-Kuang (Y. K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
  • Patent number: 10102123
    Abstract: Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations.
    Type: Grant
    Filed: October 20, 2016
    Date of Patent: October 16, 2018
    Assignee: Intel Corporation
    Inventors: Sanjeev Kumar, Christopher J. Hughes, Partha Kundu, Anthony Nguyen
  • Publication number: 20180285280
    Abstract: In an embodiment, a processor includes a sparse access buffer having a plurality of entries each to store for a memory access instruction to a particular address, address information and count information; and a memory controller to issue read requests to a memory, the memory controller including a locality controller to receive a memory access instruction having a no-locality hint and override the no-locality hint based at least in part on the count information stored in an entry of the sparse access buffer. Other embodiments are described and claimed.
    Type: Application
    Filed: March 31, 2017
    Publication date: October 4, 2018
    Inventors: Berkin Akin, Rajat Agarwal, Jong Soo Park, Christopher J. Hughes, Chiachen Chou